[REVIEW] Use CuPy v8 FFT cache plan #254

mnicely · 2020-10-04T22:56:07Z

Closes #253

This PR adds a check for whether CuPy v7 or v8 is installed and, on v8, uses CuPy's internal FFT plan cache.
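For illustration, a version gate of this kind could look like the sketch below. It is not the code from this PR: the helper name `cupy_has_plan_cache` is hypothetical, and it assumes the version string has the usual `major.minor.patch` form, as reported by `cupy.__version__`.

```python
def cupy_has_plan_cache(version: str) -> bool:
    """Return True when the CuPy version string is v8 or newer.

    Hypothetical helper: assumes a 'major.minor.patch' style string,
    such as the value of cupy.__version__.
    """
    major = int(version.split(".")[0])
    return major >= 8


# Example: choose a code path from a version string alone.
for v in ("7.8.0", "8.0.0"):
    path = "CuPy's plan cache" if cupy_has_plan_cache(v) else "the pre-v8 path"
    print(f"CuPy {v}: use {path}")
```

Comparing only the major version keeps the check robust to suffixes such as release-candidate tags after the patch number.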

Without cache + CuPy v7.8
-------------------------------------------------------------------------------- benchmark 'FFTConvolve': 3 tests --------------------------------------------------------------------------------
Name (time in ms)                        Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fftconvolve_gpu[same-32768]      0.9265 (1.0)      2.3050 (1.37)     1.0593 (1.0)      0.1013 (1.16)     1.0293 (1.0)      0.0479 (1.0)       103;130  943.9904 (1.0)         967           1
test_fftconvolve_gpu[full-32768]      0.9506 (1.03)     1.9485 (1.16)     1.0928 (1.03)     0.0870 (1.0)      1.0477 (1.02)     0.0912 (1.90)       148;39  915.0630 (0.97)       1039           1
test_fftconvolve_gpu[valid-32768]     0.9488 (1.02)     1.6771 (1.0)      1.1023 (1.04)     0.0895 (1.03)     1.0592 (1.03)     0.1142 (2.38)       182;25  907.2237 (0.96)        963           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

With cache + CuPy v7.8
---------------------------------------------------------------------------------------- benchmark 'FFTConvolve': 3 tests ----------------------------------------------------------------------------------------
Name (time in us)                          Min                   Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fftconvolve_gpu[same-32768]      664.1870 (1.01)     1,199.3530 (1.0)      691.5575 (1.0)      31.8148 (1.0)      688.4840 (1.0)      25.0215 (1.0)         78;65        1.4460 (1.0)        1464           1
test_fftconvolve_gpu[full-32768]      675.1820 (1.02)     1,245.5840 (1.04)     707.1642 (1.02)     33.9103 (1.07)     697.7085 (1.01)     40.2040 (1.61)       113;17        1.4141 (0.98)       1394           1
test_fftconvolve_gpu[valid-32768]     659.9210 (1.0)      1,539.0550 (1.28)     708.1542 (1.02)     52.8239 (1.66)     691.9645 (1.01)     27.5100 (1.10)        55;56        1.4121 (0.98)       1460           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Without cache + CuPy v8.0
----------------------------------------------------------------------------------------- benchmark 'FFTConvolve': 3 tests ----------------------------------------------------------------------------------------
Name (time in us)                          Min                   Max                Mean             StdDev              Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fftconvolve_gpu[same-32768]      404.8580 (1.01)     1,034.3980 (1.19)     423.5631 (1.0)      26.2704 (1.0)      419.7810 (1.0)       11.3210 (1.0)       183;262        2.3609 (1.0)        2450           1
test_fftconvolve_gpu[valid-32768]     417.4070 (1.04)       869.7810 (1.0)      436.5062 (1.03)     28.0625 (1.07)     430.5310 (1.03)      13.3815 (1.18)      175;314        2.2909 (0.97)       2308           1
test_fftconvolve_gpu[full-32768]      400.9510 (1.0)      1,124.1380 (1.29)     458.4183 (1.08)     66.2477 (2.52)     423.2620 (1.01)     140.0130 (12.37)       476;1        2.1814 (0.92)       1798           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

With cache + CuPy v8.0
----------------------------------------------------------------------------------------- benchmark 'FFTConvolve': 3 tests ----------------------------------------------------------------------------------------
Name (time in us)                          Min                   Max                Mean             StdDev              Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_fftconvolve_gpu[same-32768]      473.9490 (1.0)      1,200.2100 (1.39)     491.7297 (1.0)      26.6978 (1.24)     481.9450 (1.0)       15.4428 (1.14)      238;245        2.0336 (1.0)        2073           1
test_fftconvolve_gpu[valid-32768]     489.1830 (1.03)       866.0560 (1.0)      507.5161 (1.03)     21.5673 (1.0)      498.9190 (1.04)      13.5545 (1.0)       275;267        1.9704 (0.97)       1968           1
test_fftconvolve_gpu[full-32768]      475.2700 (1.00)     1,209.9800 (1.40)     540.0543 (1.10)     61.9996 (2.87)     511.7405 (1.06)     106.6135 (7.87)        314;3        1.8517 (0.91)       1592           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

GPUtester · 2020-10-04T22:56:34Z

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

leofang · 2020-10-05T01:03:29Z

Hi @mnicely, thanks for the benchmark data. I am a bit confused --- any chance the outcomes for v8 + cache and v8 no cache are swapped? The cached performance seems to be worse if I read it right.

mnicely · 2020-10-05T01:06:52Z

> Hi @mnicely, thanks for the benchmark data. I am a bit confused --- any chance the outcomes for v8 + cache and v8 no cache are swapped? The cached performance seems to be worse if I read it right.

Not swapped, just bad wording! That is the cuSignal cache I created. When I turn it off and use CuPy’s, the FFT is faster!

leofang · 2020-10-05T01:52:55Z

Ahh I see, thanks for clarifying, Matt. The number looks very good then! I wonder if all can be attributed to CuPy's cache, or there are additional nice changes made to v8?

mnicely · 2020-10-05T02:04:34Z

I believe the speedup between cuSignal’s cache + CuPy v7.8 and cuSignal (with no cache) + CuPy v8.0 is due solely to CuPy’s cache.

And the difference for cuSignal (with no cache) between v7.8 and v8.0 can be attributed to the many improvements in v8.0.
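As a rough check on those two comparisons, the ratios can be worked out from the mean times of the `same-32768` case in the benchmark tables above. This is only arithmetic on the posted numbers. The dictionary labels are shorthand introduced here, and "no cuSignal cache" on v8.0 means CuPy's own plan cache is in use.

```python
# Mean times for test_fftconvolve_gpu[same-32768], in microseconds,
# copied from the four benchmark tables (1.0593 ms == 1059.3 us).
mean_us = {
    "v7.8, no cuSignal cache": 1059.3,
    "v7.8, cuSignal cache": 691.5575,
    "v8.0, no cuSignal cache": 423.5631,
    "v8.0, cuSignal cache": 491.7297,
}


def speedup(slow: str, fast: str) -> float:
    """Ratio of mean times: how many times faster `fast` is than `slow`."""
    return mean_us[slow] / mean_us[fast]


# cuSignal's cache on v7.8 versus CuPy's cache on v8.0
print(f"{speedup('v7.8, cuSignal cache', 'v8.0, no cuSignal cache'):.2f}x")
# No cuSignal cache on either version: v7.8 versus v8.0
print(f"{speedup('v7.8, no cuSignal cache', 'v8.0, no cuSignal cache'):.2f}x")
```

On these numbers the first ratio is about 1.6x and the second about 2.5x.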

leofang · 2020-10-05T18:00:26Z

Thanks, @mnicely! I wonder if you could do one additional test for me when you have time: Use CuPy v8, but turn off all caches (either cuSignal's or CuPy's). The latter can be turned off this way:

import cupy as cp

cache = cp.fft.config.get_plan_cache()
cache.set_size(0)

Note the cache object is per thread & per device, so if your tests span over threads and/or devices, you need to turn them all off in the proper context. For confirmation, if you do cache.show_info(), you'd see a line saying cache enabled? False.
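Since the cache is per device, one way to cover every visible GPU from the current thread might look like the sketch below. It uses only the calls mentioned above plus `cupy.cuda.Device` and `cupy.cuda.runtime.getDeviceCount`. The function name and the `xp` parameter are hypothetical, added so the loop can be exercised with a stand-in module on a machine without a GPU.

```python
def disable_plan_caches(xp=None):
    """Set the FFT plan cache size to 0 on every visible device.

    Only affects the calling thread, because the cache is per thread.
    `xp` is a hypothetical hook for passing in the cupy module (or a stub).
    Returns the number of devices visited.
    """
    if xp is None:
        import cupy as xp  # needs CuPy v8+ and at least one CUDA device

    n_devices = xp.cuda.runtime.getDeviceCount()
    for dev_id in range(n_devices):
        # The plan cache returned depends on the current device,
        # so switch devices before fetching it.
        with xp.cuda.Device(dev_id):
            cache = xp.fft.config.get_plan_cache()
            cache.set_size(0)
    return n_devices
```

If the tests use several threads, the same call would need to run once inside each thread; `cache.show_info()` under a given device can then confirm the setting.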

mnicely · 2020-10-05T18:12:54Z

> Thanks, @mnicely! I wonder if you could do one additional test for me when you have time: Use CuPy v8, but turn off all caches (either cuSignal's or CuPy's). The latter can be turned off this way:
>
> import cupy as cp
>
> cache = cp.fft.config.get_plan_cache()
> cache.set_size(0)
>
> Note the cache object is per thread & per device, so if your tests span over threads and/or devices, you need to turn them all off in the proper context. For confirmation, if you do cache.show_info(), you'd see a line saying cache enabled? False.

Sure, I'll try to have you something by the end of this week!

Use CuPy v8 FFT cache plan

48c1437

mnicely added the "3 - Ready for Review" label Oct 4, 2020

mnicely added this to the 0.16 milestone Oct 4, 2020

mnicely requested a review from awthomp October 4, 2020 22:56

mnicely requested a review from a team as a code owner October 4, 2020 22:56

mnicely self-assigned this Oct 4, 2020

mnicely added this to PR-WIP in v0.16 Release via automation Oct 4, 2020

mnicely added 3 commits October 4, 2020 18:57

Update CHANGELOG.md

d4b1b8b

Fix formatting

b22e5a5

Remove leftover print

2c62f07

awthomp approved these changes Oct 5, 2020

v0.16 Release automation moved this from PR-WIP to PR-Reviewer approved Oct 5, 2020

awthomp merged commit 9f6eab0 into rapidsai:branch-0.16 Oct 5, 2020

v0.16 Release automation moved this from PR-Reviewer approved to Done Oct 5, 2020

leofang mentioned this pull request Oct 5, 2020

Add a cuFFT plan cache cupy/cupy#3730

Merged

8 tasks

mnicely deleted the global_fft_cache branch October 5, 2020 18:12
