Add threshold for fft_small which is set during configuration #1791
TIMEIT_STOP_VALUES(__, t1)

TIMEIT_START
mpn_mul_default_mpn_ctx(s, x, n, y, n);
Please make sure that `make profile` works on machines that don't have this function.
Thanks for observing this.
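One way to keep the profile program building on machines without the function is a preprocessor guard. The following is only a minimal sketch: `HAVE_FFT_SMALL`, `profile_fft_small_available` and `run_profile` are hypothetical names chosen for illustration, not identifiers taken from this PR, and the real build may expose availability differently.

```c
#include <stdio.h>

/* Assumption: HAVE_FFT_SMALL stands in for whatever macro the build
   defines when fft_small is available. Default to "not available". */
#ifndef HAVE_FFT_SMALL
# define HAVE_FFT_SMALL 0
#endif

/* Returns 1 if the fft_small comparison can run in this build, else 0. */
int profile_fft_small_available(void)
{
#if HAVE_FFT_SMALL
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical profile entry point: skips the comparison instead of
   failing to compile or link when the function is missing. */
int run_profile(void)
{
    if (!profile_fft_small_available())
    {
        printf("fft_small not available; skipping comparison\n");
        return 0;
    }

    /* The timing loop calling mpn_mul_default_mpn_ctx would go here,
       itself wrapped in the same #if so it is never compiled without it. */
    return 1;
}
```

With the macro left undefined, `run_profile()` prints the skip message and returns 0, so the profile target still builds and runs.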
#define FLINT_FFT_MUL_THRESHOLD 400
#define FLINT_FFT_SQR_THRESHOLD 800
# define FLINT_FFT_MUL_THRESHOLD FLINT_FFT_SMALL_THRESHOLD
# define FLINT_FFT_SQR_THRESHOLD (2 * FLINT_FFT_SMALL_THRESHOLD)
Is the squaring threshold really 2x on Intel too?
Unfortunately it is exactly 2x on my machine... We should really look into whether we can improve this.
And zooming in:
One could argue that the peaks in performance surrounding these valleys are worth it, but I'd rather have smooth performance. Either way, this should be fixed.
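For reference, the relationship between the two thresholds in the diff can be sketched as follows. This is an illustration under assumptions: the fallback value of 400 mirrors the old hard-coded multiplication threshold, `use_fft_small` is a hypothetical helper, and the `>=` comparison is a guess at how the threshold is applied, not something stated in this PR.

```c
/* Assumption: configure writes FLINT_FFT_SMALL_THRESHOLD into a generated
   header. The fallback below is only for this standalone sketch. */
#ifndef FLINT_FFT_SMALL_THRESHOLD
# define FLINT_FFT_SMALL_THRESHOLD 400
#endif

/* As in the diff: squaring switches over at twice the multiplication threshold. */
#define FLINT_FFT_MUL_THRESHOLD FLINT_FFT_SMALL_THRESHOLD
#define FLINT_FFT_SQR_THRESHOLD (2 * FLINT_FFT_SMALL_THRESHOLD)

/* Hypothetical helper: returns 1 if an operand of size n (in the same
   units as the thresholds) should go through fft_small, else 0. */
int use_fft_small(long n, int squaring)
{
    long threshold = squaring ? FLINT_FFT_SQR_THRESHOLD
                              : FLINT_FFT_MUL_THRESHOLD;
    return n >= threshold;
}
```

With the fallback in place this reproduces the previous constants (400 for multiplication, 800 for squaring); a configure-time value simply scales both.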
Also use preprocessor instead of compiler in the configuration process to speed things up.
b6e892b to 8e0a503
Also use preprocessor instead of compiler in the configuration process to speed things up.
Solves #1790 and #1789.
On Skylake, I get the following timings with `fft_small/profile/p-fft_small_vs_gmp.c`:
And zooming in on the range 1500-1600, we see something weird:
Hence, I set the threshold to 400 for CPUs with fast `vroundpd`, and to 1540 for CPUs with slow `vroundpd`.
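The choice described above amounts to a two-way selection. A minimal sketch, assuming the configuration step has already classified the CPU; the function name `fft_small_threshold_for` and the boolean flag are hypothetical, and how the PR actually detects a fast `vroundpd` is not shown here:

```c
/* Hypothetical helper mirroring the values stated in the description:
   400 when vroundpd is fast, 1540 when it is slow. */
long fft_small_threshold_for(int has_fast_vroundpd)
{
    return has_fast_vroundpd ? 400 : 1540;
}
```

In the PR itself the value is fixed during configuration rather than at run time, so a function like this would only correspond to the logic that picks the constant.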