Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example of multidimensional transform? #85

Open
shssoichiro opened this issue Apr 19, 2022 · 7 comments
Open

Example of multidimensional transform? #85

shssoichiro opened this issue Apr 19, 2022 · 7 comments

Comments

@shssoichiro
Copy link

Hi, I have a case where I need to implement a 3D FFT. This crate doesn't seem to have an existing utility function for it. I figured I could potentially create my own by combining multiple 1D FFTs, but it seems to have resulted in garbage output. Could an example be added to the repository of how to do a multidimensional transform correctly with this crate? (Even better would be if there were wrapper functions provided by the crate.)

My current attempt looks like this:

// Applies a real-to-complex 3-dimensional FFT to `real`
fn real_to_complex_3d(
  &self, real: &[f32; BLOCK_VOLUME], output: &mut [Complex<f32>; CCNT],
) {
  let scratch_len = self
    .ft
    .0
    .get_inplace_scratch_len()
    .max(self.ft.1.get_inplace_scratch_len())
    .max(self.ft.2.get_inplace_scratch_len());
  let mut scratch = vec![Complex::default(); scratch_len];

  let mut complex =
    real.iter().map(|val| Complex::from(*val)).collect::<Vec<_>>();
  self.ft.0.process_with_scratch(&mut complex, &mut scratch);
  self.ft.1.process_with_scratch(&mut complex, &mut scratch);
  self.ft.2.process_with_scratch(&mut complex, &mut scratch);

  output.iter_mut().zip(complex.iter()).for_each(|(out, inn)| {
    *out = *inn;
  });
}
@HEnquist
Copy link
Contributor

Your solution is not far off, you "just" need to transpose the data between the FFT steps so that each FFT runs along the correct axis.

But have you looked at ndrustfft? I haven't tried it but looks like it should be able to do what you need.
https://crates.io/crates/ndrustfft

@shssoichiro
Copy link
Author

shssoichiro commented Apr 20, 2022

I see, I'll "just" do that if I can. 😆 Thanks. 🙂

I did look at ndrustfft, but I wanted to avoid the dependency on ndarray. Last time I tried ndarray, it was quite slow. Although that was years ago, it looks like they've been iterating, so I guess that could also be worth trying once more.

@HEnquist
Copy link
Contributor

Yeah there is quite a lot hidden in that "just" 😆
BTW roughly how large is your data? If it fits in the cpu cache, then you can get away with using simple transpose algorithms. But those get painfully slow for larger data. From a quick look I think ndrustfft is using something simple, could explain why you thought it was slow. Speeding up transposing is quite some work. This is for example what is used in RustFFT, and that's only for 2d! https://github.com/ejmahler/transpose/blob/master/src/out_of_place.rs

@shssoichiro
Copy link
Author

shssoichiro commented Apr 20, 2022

I can't remember anymore, this was quite some time ago when I was messing around with some ML stuff in Rust. ndrustfft itself doesn't quite have just a "xd_transform" function, but they did have examples showing how to do it, so I ended up using that. (Although I haven't benchmarked it yet because there's a bug somewhere else in my code 🙂 ) The data I'm working with can be several MB but the transforms are in 16x16x3 chunks, so not terribly large and easily doable in cache.

@ejmahler
Copy link
Owner

I think RustFFT would benefit from something built in that does this. It seems like a common request, and easy enough to get wrong that people would benefit from a central utility.

I unfortunately don't have a lot of motivation for side projects right now, but if someone else makes a PR for a n-d FFT, tests it, etc I'd be happy to accept.

@preiter93
Copy link

preiter93 commented Aug 4, 2022

Hi,

author of ndrustfft here; I crossed this issue by chance. The topic seems to be quite important because of its relevance to image processing and numerical computation, so I decided to do some tests.

Assuming we have a two-dimensional array of size n x n, the two approaches are:

"transpose"

  • Transpose data, if necessary
  • Compute 1 fft of size n x n
  • Transpose back, if necessary

The transpose is done out of place, hopefully effciently with transposes very similar to
https://github.com/ejmahler/transpose
I think multidimensional transposes can be done in a similar way by adjusting
the row and col sizes appropriately, however, I haven't thought too deeply about that.

"ndrustfft"

  • Iterate over 1d lanes -> Array1View
  • Copy data into slice, if 1d view is not contiguous,
  • Compute n ffts of size n

The code can be found here:
https://github.com/preiter93/ndfft_test
(This is only a proof of concept)

Here is a performance comparison (time in ms):

FFT along outer axis (transpose is not necessary)

N ndrustfft transpose
128 0.09 0.04
256 0.4 0.24
512 2.22 1.64
1024 10.3 4.77

FFT along inner axis (transpose is necessary)

N ndrustfft transpose
128 0.24 0.1
256 1.12 0.56
512 8.41 5.08
1024 42.92 19.49

So I think the transpose-based approach is promising. The advantages are
(1) Better Performance (~50%)
(2) No additional dependencies
However, the disadvantages are
(1) More memory is needed if an out of place transpose is used. In-place transposes are generally less efficient, see ejmahler/transpose#4.
(2) it is harder to parallelize, whereas ndrustfft is very easy to parallelize and scales quite well.

I don't know how much time I have to pursue this topic further, but I hope this has given a little insight.

@skewballfox
Copy link

hey I wasn't sure if I should post here or on #90 . Are you open to PRs for this? What are the constraints? I was thinking about doing something similar to how burn handles multiple tensor implementations: providing a trait that could be implemented on matrix/tensor types

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants