Add threshold for fft_small which is set during configuration #1791

Merged
merged 3 commits into from
Feb 24, 2024


albinahlback
Collaborator

Also use preprocessor instead of compiler in the configuration process to speed things up.

Solves #1790 and #1789.

On Skylake, I get the following timings with fft_small/profile/p-fft_small_vs_gmp.c:

mpn_mul_n vs flint_mpn_mul_n
n =  900: 1.06
n =  936: 0.90
n =  973: 0.92
n = 1011: 0.87
n = 1051: 0.90
n = 1093: 0.93
n = 1136: 0.95
n = 1181: 1.04
n = 1228: 1.05
n = 1277: 1.07
n = 1328: 1.09
n = 1381: 0.81
n = 1436: 0.84
n = 1493: 0.86
n = 1552: 1.18
n = 1614: 1.22
n = 1678: 1.01
n = 1745: 1.28
n = 1814: 1.33
n = 1886: 1.01
n = 1961: 1.06
n = 2039: 1.04
n = 2120: 1.07
n = 2204: 1.14
n = 2292: 1.16
n = 2383: 1.38
n = 2478: 1.39
n = 2577: 1.41
n = 2680: 1.44
n = 2787: 1.03
n = 2898: 1.08
n = 3013: 1.08
n = 3133: 1.57
n = 3258: 1.60
n = 3388: 1.63
n = 3523: 1.64
n = 3663: 1.71
n = 3809: 1.26
n = 3961: 1.58

And zooming in on the range 1500-1600, we see something weird:

mpn_mul_n vs flint_mpn_mul_n

n = 1500: 0.86
n = 1505: 0.90
n = 1510: 0.89
n = 1515: 0.89
n = 1520: 0.89
n = 1525: 0.89
n = 1530: 0.89
n = 1535: 0.89
n = 1540: 1.17
n = 1545: 1.19
n = 1550: 1.19
n = 1555: 1.20
n = 1560: 1.20
n = 1565: 1.20
n = 1570: 1.22
n = 1575: 1.22
n = 1580: 1.21
n = 1585: 1.22
n = 1590: 1.22
n = 1595: 1.22
n = 1600: 1.22

Hence, I set the threshold to 400 for CPUs with fast vroundpd, and to 1540 for CPUs with slow vroundpd.
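As a rough illustration of how the value 1540 can be read off the zoomed-in table, the sketch below scans the samples from the largest n downwards and reports the smallest n from which the ratio stays above 1.0. It assumes that a ratio above 1.0 means the fft_small path was the faster one, which is inferred from the threshold choice rather than stated in the profile output. The names `sample_t`, `zoom` and `find_crossover` are hypothetical and not part of FLINT.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark sample: operand size in limbs and the printed ratio.
   Assumption: ratio > 1.0 means the fft_small path was faster. */
typedef struct { int n; double ratio; } sample_t;

/* Zoomed-in measurements copied from the table above (n = 1500..1600). */
static const sample_t zoom[] = {
    {1500, 0.86}, {1505, 0.90}, {1510, 0.89}, {1515, 0.89}, {1520, 0.89},
    {1525, 0.89}, {1530, 0.89}, {1535, 0.89}, {1540, 1.17}, {1545, 1.19},
    {1550, 1.19}, {1555, 1.20}, {1560, 1.20}, {1565, 1.20}, {1570, 1.22},
    {1575, 1.22}, {1580, 1.21}, {1585, 1.22}, {1590, 1.22}, {1595, 1.22},
    {1600, 1.22}
};

/* Return the smallest n such that this sample and every later sample
   has ratio > 1.0, or -1 if the last sample is not above 1.0. */
static int find_crossover(const sample_t *s, size_t len)
{
    int crossover = -1;
    size_t i;

    assert(s != NULL);
    for (i = len; i-- > 0; ) {
        if (s[i].ratio <= 1.0)
            break;              /* first sample (from the top) where GMP wins */
        crossover = s[i].n;
    }
    return crossover;
}
```

Calling `find_crossover(zoom, sizeof(zoom) / sizeof(zoom[0]))` on this data yields 1540. Scanning from the top rather than taking the first sample above 1.0 is deliberate, since the coarse table shows the ratio dipping below 1.0 again after isolated peaks.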

TIMEIT_STOP_VALUES(__, t1)

TIMEIT_START
mpn_mul_default_mpn_ctx(s, x, n, y, n);
Collaborator

Please make sure that make profile works on machines that don't have this function.

Collaborator Author

Thanks for observing this.
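One common way to keep a profile program building on machines without a given routine is a compile-time guard around the call. The sketch below is only an illustration of that pattern, not the change made in this pull request: `DEMO_HAVE_FFT_SMALL` is a hypothetical stand-in for whatever feature macro the build system provides, and `routines_to_time` is an invented helper.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the build defines a feature macro when fft_small is
   available. DEMO_HAVE_FFT_SMALL is a hypothetical stand-in for it. */
#ifndef DEMO_HAVE_FFT_SMALL
# define DEMO_HAVE_FFT_SMALL 0
#endif

/* Fill names[] with the routines the profile would time and return
   how many there are. The guarded entry disappears entirely when the
   feature macro is 0, so nothing references the missing function. */
static int routines_to_time(const char **names, int cap)
{
    int count = 0;

    assert(names != NULL && cap >= 2);
    names[count++] = "mpn_mul_n";               /* baseline, always timed */
#if DEMO_HAVE_FFT_SMALL
    names[count++] = "mpn_mul_default_mpn_ctx"; /* only when available */
#endif
    return count;
}
```

With the macro left at its default of 0, the function reports a single routine, so a caller can print a "skipped" note instead of failing to link.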

#define FLINT_FFT_MUL_THRESHOLD 400
#define FLINT_FFT_SQR_THRESHOLD 800
# define FLINT_FFT_MUL_THRESHOLD FLINT_FFT_SMALL_THRESHOLD
# define FLINT_FFT_SQR_THRESHOLD (2 * FLINT_FFT_SMALL_THRESHOLD)
Collaborator

Is the squaring threshold really 2x on Intel too?

Collaborator Author

Unfortunately, it is exactly 2x on my machine... We should really look into whether we can improve this.
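To make the relationship between the two thresholds concrete, here is a minimal sketch of how a single configured value could drive both cutoffs. The macro definitions mirror the ones quoted in the diff above; the fallback value 1540 is the one quoted for CPUs with slow vroundpd. The enum `algo_t`, the function `choose_algorithm`, and the use of `>=` as the comparison are assumptions made for illustration, not FLINT's actual dispatch code.

```c
#include <assert.h>

/* Assumption: configuration normally supplies this value. The fallback
   of 1540 is the threshold quoted for CPUs with slow vroundpd. */
#ifndef FLINT_FFT_SMALL_THRESHOLD
# define FLINT_FFT_SMALL_THRESHOLD 1540
#endif

/* As in the diff: squaring switches over at twice the multiplication size. */
#define FLINT_FFT_MUL_THRESHOLD FLINT_FFT_SMALL_THRESHOLD
#define FLINT_FFT_SQR_THRESHOLD (2 * FLINT_FFT_SMALL_THRESHOLD)

/* Hypothetical labels for the two code paths. */
typedef enum { USE_GMP, USE_FFT_SMALL } algo_t;

/* Pick a code path for operands of n limbs. Whether the real cutoff is
   inclusive (>=) or exclusive (>) is an assumption of this sketch. */
static algo_t choose_algorithm(long n, int is_square)
{
    long threshold = is_square ? FLINT_FFT_SQR_THRESHOLD
                               : FLINT_FFT_MUL_THRESHOLD;

    assert(n >= 0);
    return n >= threshold ? USE_FFT_SMALL : USE_GMP;
}
```

With the fallback value, multiplication switches at n = 1540 and squaring at n = 3080, which lines up with the squaring timings in this thread, where the ratio first reaches 1.01 at n = 3080.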

@albinahlback
Collaborator Author

mpn_sqr vs fft_small

n = 1400: 0.69
n = 1470: 0.71
n = 1543: 1.17
n = 1620: 1.20
n = 1701: 0.90
n = 1786: 1.26
n = 1875: 0.89
n = 1968: 0.92
n = 2066: 0.93
n = 2169: 0.97
n = 2277: 1.01
n = 2390: 1.29
n = 2509: 1.31
n = 2634: 1.33
n = 2765: 0.86
n = 2903: 0.91
n = 3048: 0.95
n = 3200: 1.56
n = 3360: 1.60
n = 3528: 1.66
n = 3704: 1.72
n = 3889: 1.20

And zooming in:

mpn_sqr vs fft_small

n = 2900: 0.89
n = 2910: 0.90
n = 2920: 0.90
n = 2930: 0.90
n = 2940: 0.90
n = 2950: 0.93
n = 2960: 0.91
n = 2970: 0.91
n = 2980: 0.92
n = 2990: 0.91
n = 3000: 0.93
n = 3010: 0.91
n = 3020: 0.91
n = 3030: 0.91
n = 3040: 0.93
n = 3050: 0.95
n = 3060: 0.92
n = 3070: 0.89
n = 3080: 1.01
n = 3090: 1.42
n = 3100: 1.45
n = 3110: 1.44
n = 3120: 1.45
n = 3130: 1.46
n = 3140: 1.45
n = 3150: 1.45
n = 3160: 1.46
n = 3170: 1.47
n = 3180: 1.47
n = 3190: 1.47
n = 3200: 1.47

@albinahlback
Collaborator Author

One could argue that the peaks in performance surrounding these valleys are worth it, but I'd rather have smooth performance. Either way, this should be fixed.

@albinahlback albinahlback merged commit babbdc2 into flintlib:main Feb 24, 2024
15 checks passed
@albinahlback albinahlback deleted the fft_small_threshold branch March 14, 2024 13:22