Add threshold for fft_small which is set during configuration #1791

albinahlback · 2024-02-24T00:30:13Z

Also use preprocessor instead of compiler in the configuration process to speed things up.

On Skylake, I get the following timings with fft_small/profile/p-fft_small_vs_gmp.c:

mpn_mul_n vs flint_mpn_mul_n
n =  900: 1.06
n =  936: 0.90
n =  973: 0.92
n = 1011: 0.87
n = 1051: 0.90
n = 1093: 0.93
n = 1136: 0.95
n = 1181: 1.04
n = 1228: 1.05
n = 1277: 1.07
n = 1328: 1.09
n = 1381: 0.81
n = 1436: 0.84
n = 1493: 0.86
n = 1552: 1.18
n = 1614: 1.22
n = 1678: 1.01
n = 1745: 1.28
n = 1814: 1.33
n = 1886: 1.01
n = 1961: 1.06
n = 2039: 1.04
n = 2120: 1.07
n = 2204: 1.14
n = 2292: 1.16
n = 2383: 1.38
n = 2478: 1.39
n = 2577: 1.41
n = 2680: 1.44
n = 2787: 1.03
n = 2898: 1.08
n = 3013: 1.08
n = 3133: 1.57
n = 3258: 1.60
n = 3388: 1.63
n = 3523: 1.64
n = 3663: 1.71
n = 3809: 1.26
n = 3961: 1.58

And zooming in on the range 1500-1600, we see something weird:

mpn_mul_n vs flint_mpn_mul_n

n = 1500: 0.86
n = 1505: 0.90
n = 1510: 0.89
n = 1515: 0.89
n = 1520: 0.89
n = 1525: 0.89
n = 1530: 0.89
n = 1535: 0.89
n = 1540: 1.17
n = 1545: 1.19
n = 1550: 1.19
n = 1555: 1.20
n = 1560: 1.20
n = 1565: 1.20
n = 1570: 1.22
n = 1575: 1.22
n = 1580: 1.21
n = 1585: 1.22
n = 1590: 1.22
n = 1595: 1.22
n = 1600: 1.22

Hence, I set the threshold to 400 for CPUs with fast vroundpd, and to 1540 for CPUs with slow vroundpd.
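As a rough sketch of how a configuration-time choice like this could feed the multiplication and squaring thresholds: the macro `HYPOTHETICAL_FAST_VROUNDPD` and the helper `use_fft_small_for_mul` are made up for illustration, and the `>=` comparison is an assumption. Only the values 400 and 1540 and the `FLINT_FFT_*` macro names come from this PR.

```c
/* Hypothetical macro: assume the configuration step defines it to 1
   on CPUs with fast vroundpd and to 0 otherwise. */
#ifndef HYPOTHETICAL_FAST_VROUNDPD
# define HYPOTHETICAL_FAST_VROUNDPD 0
#endif

/* Threshold values taken from the timings in this PR. */
#if HYPOTHETICAL_FAST_VROUNDPD
# define FLINT_FFT_SMALL_THRESHOLD 400
#else
# define FLINT_FFT_SMALL_THRESHOLD 1540
#endif

/* Same derivation as in the mpn_extras.h change. */
#define FLINT_FFT_MUL_THRESHOLD FLINT_FFT_SMALL_THRESHOLD
#define FLINT_FFT_SQR_THRESHOLD (2 * FLINT_FFT_SMALL_THRESHOLD)

/* Hypothetical helper: nonzero if an n-limb product should go to fft_small.
   Whether the real dispatch uses >= or > is an assumption here. */
static int use_fft_small_for_mul(long n)
{
    return n >= FLINT_FFT_MUL_THRESHOLD;
}
```

With the default above (slow vroundpd), `use_fft_small_for_mul(1539)` is 0 and `use_fft_small_for_mul(1540)` is 1.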

fredrik-johansson · 2024-02-24T07:50:00Z

src/fft_small/profile/p-fft_small_vs_gmp.c

+        TIMEIT_STOP_VALUES(__, t1)
+
+        TIMEIT_START
+        mpn_mul_default_mpn_ctx(s, x, n, y, n);


Please make sure that make profile works on machines that don't have this function.

Thanks for observing this.
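One way to keep the profile buildable on machines without fft_small is to compile the extra timing case only behind a feature macro. This is a minimal sketch, not the actual fix: `HYPOTHETICAL_HAVE_FFT_SMALL` and `num_profiled_implementations` are illustrative names, and it is an assumption that the build system can provide such a macro.

```c
/* Hypothetical feature macro; assume the build system defines it to 1
   only when fft_small is available on the target machine. */
#ifndef HYPOTHETICAL_HAVE_FFT_SMALL
# define HYPOTHETICAL_HAVE_FFT_SMALL 0
#endif

/* Number of implementations the profile would time. */
static int num_profiled_implementations(void)
{
    int count = 1;   /* the GMP baseline is always timed */
#if HYPOTHETICAL_HAVE_FFT_SMALL
    count++;         /* the fft_small call is compiled only when available */
#endif
    return count;
}
```

The same guard would wrap the `TIMEIT_START` block that calls `mpn_mul_default_mpn_ctx`, so the function is never referenced where it does not exist.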

fredrik-johansson · 2024-02-24T07:50:44Z

src/mpn_extras.h

-#define FLINT_FFT_MUL_THRESHOLD 400
-#define FLINT_FFT_SQR_THRESHOLD 800
+# define FLINT_FFT_MUL_THRESHOLD FLINT_FFT_SMALL_THRESHOLD
+# define FLINT_FFT_SQR_THRESHOLD (2 * FLINT_FFT_SMALL_THRESHOLD)


Is the squaring threshold really 2x on Intel too?

Unfortunately it is exactly 2x on my machine... We should really look into whether we can improve this.

albinahlback · 2024-02-24T11:16:30Z

mpn_sqr vs fft_small

n = 1400: 0.69
n = 1470: 0.71
n = 1543: 1.17
n = 1620: 1.20
n = 1701: 0.90
n = 1786: 1.26
n = 1875: 0.89
n = 1968: 0.92
n = 2066: 0.93
n = 2169: 0.97
n = 2277: 1.01
n = 2390: 1.29
n = 2509: 1.31
n = 2634: 1.33
n = 2765: 0.86
n = 2903: 0.91
n = 3048: 0.95
n = 3200: 1.56
n = 3360: 1.60
n = 3528: 1.66
n = 3704: 1.72
n = 3889: 1.20

And zooming in:

mpn_sqr vs fft_small

n = 2900: 0.89
n = 2910: 0.90
n = 2920: 0.90
n = 2930: 0.90
n = 2940: 0.90
n = 2950: 0.93
n = 2960: 0.91
n = 2970: 0.91
n = 2980: 0.92
n = 2990: 0.91
n = 3000: 0.93
n = 3010: 0.91
n = 3020: 0.91
n = 3030: 0.91
n = 3040: 0.93
n = 3050: 0.95
n = 3060: 0.92
n = 3070: 0.89
n = 3080: 1.01
n = 3090: 1.42
n = 3100: 1.45
n = 3110: 1.44
n = 3120: 1.45
n = 3130: 1.46
n = 3140: 1.45
n = 3150: 1.45
n = 3160: 1.46
n = 3170: 1.47
n = 3180: 1.47
n = 3190: 1.47
n = 3200: 1.47
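The crossover in the zoomed squaring table can be located mechanically. The sketch below assumes a ratio above 1 means fft_small is faster, and takes the crossover to be the smallest n from which every later sample stays above 1. The names `sample_t`, `sqr_zoom` and `crossover` are illustrative; the data is copied from the table above.

```c
#include <stddef.h>

typedef struct { long n; double ratio; } sample_t;

/* mpn_sqr vs fft_small, n = 2900..3200 in steps of 10 (from the table). */
static const sample_t sqr_zoom[] = {
    {2900, 0.89}, {2910, 0.90}, {2920, 0.90}, {2930, 0.90}, {2940, 0.90},
    {2950, 0.93}, {2960, 0.91}, {2970, 0.91}, {2980, 0.92}, {2990, 0.91},
    {3000, 0.93}, {3010, 0.91}, {3020, 0.91}, {3030, 0.91}, {3040, 0.93},
    {3050, 0.95}, {3060, 0.92}, {3070, 0.89}, {3080, 1.01}, {3090, 1.42},
    {3100, 1.45}, {3110, 1.44}, {3120, 1.45}, {3130, 1.46}, {3140, 1.45},
    {3150, 1.45}, {3160, 1.46}, {3170, 1.47}, {3180, 1.47}, {3190, 1.47},
    {3200, 1.47}
};

/* Smallest n such that the ratio exceeds 1 there and at every later
   sample; returns -1 if the last sample is not above 1. */
static long crossover(const sample_t *s, size_t len)
{
    long result = -1;
    /* Walk backwards and stop at the first sample that is not above 1. */
    for (size_t i = len; i-- > 0; )
    {
        if (s[i].ratio > 1.0)
            result = s[i].n;
        else
            break;
    }
    return result;
}
```

On this data the crossover is n = 3080, which is 2 * 1540, i.e. twice the multiplication threshold chosen for CPUs with slow vroundpd.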

albinahlback · 2024-02-24T11:18:20Z

One could argue that the peaks in performance surrounding these valleys are worth it, but I'd rather have smooth performance. Either way, this should be fixed.


fredrik-johansson reviewed Feb 24, 2024


albinahlback added 3 commits February 24, 2024 12:20

Add threshold for fft_small which is set during configuration

5be4be2

Also use preprocessor instead of compiler in the configuration process to speed things up.

Fix MSVC runner

a0e35c0

Add reasonable threshold for fft_small for all architectures using CMake

8e0a503

albinahlback force-pushed the fft_small_threshold branch from b6e892b to 8e0a503 Compare February 24, 2024 11:20

albinahlback merged commit babbdc2 into flintlib:main Feb 24, 2024
15 checks passed

albinahlback deleted the fft_small_threshold branch March 14, 2024 13:22
