
Supporting Eigen & FFTW when STAN_THREADS is enabled #3025

Open
rok-cesnovar opened this issue Nov 10, 2023 · 6 comments

@rok-cesnovar (Member) commented Nov 10, 2023

Description:

The FFTW library can be used seamlessly as the backend of Eigen's FFT implementation, which makes it simple to use with Stan. Using FFTW has proven to yield huge speedups in our model.

There is, however, a problem when using threading. When using FFTW in multi-threaded applications, you need to call fftw_make_planner_thread_safe() once before calling fft().

This means I need to add

#ifdef EIGEN_FFTW_DEFAULT
    fftw_make_planner_thread_safe();
#endif

in main() of cmdstan.

Do we think this might be worth adding for all users? Or is this too niche?

Current Version:

v2.33.1

@WardBrian (Member) commented:

I would be in favor of this. I think you'd want the condition to actually be #if defined EIGEN_FFTW_DEFAULT && defined STAN_THREADS.

I'm curious about your application that got faster - I had previously tried to use FFTW for a problem which required a 2-D fft, but that was not much faster in Stan-Math because Eigen (as of 3.4, anyway) only exposes a 1-D FFT function, so our 2-D FFTs are just loops still.
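For reference, a minimal sketch of what the combined guard could look like at the top of an entry point. fftw_make_planner_thread_safe() is FFTW's actual API; everything else (the surrounding main() and its body) is illustrative, not the actual CmdStan code:

```cpp
#include <fftw3.h>  // declares fftw_make_planner_thread_safe()

int main(int argc, char** argv) {
  // FFTW's planner is not thread safe by default. When Eigen's FFT is
  // backed by FFTW (EIGEN_FFTW_DEFAULT) and Stan runs with threading
  // (STAN_THREADS), the planner must be made thread safe once, up front,
  // before any thread calls fft().
#if defined(EIGEN_FFTW_DEFAULT) && defined(STAN_THREADS)
  fftw_make_planner_thread_safe();
#endif
  // ... rest of the entry point ...
  return 0;
}
```

Since this is a conditional-compilation fragment, it is a no-op unless both macros are defined at build time.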

@rok-cesnovar (Member, Author) commented Nov 16, 2023

Oh yeah, #if defined EIGEN_FFTW_DEFAULT && defined STAN_THREADS sounds good.

The application is running a bunch of 1-D convolutions of vectors y (sizes between 800 and 1k) with filters x that vary in size (typically 100-400). Both are parameters.

The speedup of just the fft call is around 2x (but in my model that is a ton, because the convolution, as you can imagine, is about 90% of the runtime).

Gradient evaluation times for fft() are shown below; the x-axis is the size of y.

[plot: gradient evaluation time vs. size of y]
black = non-fft convolution
red = native Eigen fft
green = FFTW

@WardBrian (Member) commented:

@rok-cesnovar would it make more sense for this to be in stan::math::init_threadpool_tbb()?

@rok-cesnovar (Member, Author) commented:

Hmm, I think you are right, because that way it's not limited to CmdStan. Will move to math.

@rok-cesnovar rok-cesnovar transferred this issue from stan-dev/cmdstan Feb 14, 2024
@WardBrian (Member) commented:

For a model using 2-D FFTs, I saw about a 1.5x speedup by using FFTW. This also required some changes to our FFT code, since Eigen exposes the 2-D transforms conditionally:

https://github.com/WardBrian/math/tree/experiment/eigen-fftw

@rok-cesnovar (Member, Author) commented:

Awesome! Yeah, FFTW is the real deal :)
