
CUDA accelerated propagation #239

Closed
mperrin opened this issue Aug 27, 2018 · 7 comments
mperrin commented Aug 27, 2018

Issue by douglase
Monday Nov 13, 2017 at 00:03 GMT
Originally opened as mperrin/poppy#239


This is a proof-of-concept addition to see if CUDA FFTs would significantly accelerate the angular spectrum code. They do, but in the course of implementation and benchmarking (profiling with IPython %prun) I discovered that np.fft.fftshift() and np.exp() were also significant bottlenecks. These are addressed in this branch as well, the former with a Numba GPU implementation of fftshift and the latter using the NumExpr library.
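
For concreteness, here is a minimal sketch (not this branch's actual code) of the kind of NumExpr substitution described above, replacing a call like np.exp(1j * phase) with a single compiled, multithreaded expression:

import numpy as np
import numexpr as ne

def expi_numexpr(phase):
    # Evaluate exp(1j*phase) in one multithreaded NumExpr pass,
    # avoiding the large temporaries the pure-NumPy version allocates.
    return ne.evaluate("exp(1j * phase)")

phase = np.random.uniform(-np.pi, np.pi, (2048, 2048))
np.testing.assert_allclose(expi_numexpr(phase), np.exp(1j * phase))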

This obviously adds a lot of optional dependencies, but my philosophy was to follow the FFTW approach and gracefully revert to NumPy everywhere, so each addition gives a partial improvement on its own. On my machine the tests pass both with and without Numba or CUDA installed.
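
As a rough illustration of that philosophy (a sketch of the pattern, not this branch's actual module layout), each accelerated path can be guarded by an import check so the optional dependency only ever adds speed:

import numpy as np

try:
    import numexpr as ne
    HAVE_NUMEXPR = True
except ImportError:
    HAVE_NUMEXPR = False

def fast_exp(x):
    # Use NumExpr when available, otherwise fall back transparently to
    # np.exp so results are identical with or without the dependency.
    if HAVE_NUMEXPR:
        return ne.evaluate("exp(x)")
    return np.exp(x)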

While further optimization is possible, for a toy case with 3 optical surfaces (below), generating a PSF with the Fresnel wavefront class is accelerated by as much as a factor of 3:

[benchmark timing figure]

Benchmark optical system:


import astropy.units as u
import poppy

wavelen = 770e-9  # wavelength in meters

D_prim = 2.37 * u.m
fr_pri = 7.8
fl_pri = D_prim * fr_pri

D_relay = 20 * u.mm
fl_m2 = fl_pri * D_relay / D_prim
fr_m3 = 20.
fl_m3 = fr_m3 * D_relay

def wfirst_sys(npix=1024, ratio=0.25):
    wfirst_optsys = poppy.FresnelOpticalSystem(pupil_diameter=D_prim, npix=npix, beam_ratio=ratio)

    m1 = poppy.QuadraticLens(fl_pri, name='Primary')
    m2 = poppy.QuadraticLens(fl_m2, name='M2')
    m3 = poppy.QuadraticLens(fl_m3, name='M3')
    m4 = poppy.QuadraticLens(fl_m3, name='M4')  # defined but not added to this simplified system

    wfirst_optsys.add_optic(poppy.CircularAperture(radius=D_prim.value/2))
    wfirst_optsys.add_optic(m1)
    wfirst_optsys.add_optic(m2, distance=fl_pri + fl_m2)
    wfirst_optsys.add_optic(m3, distance=fl_m2 + fl_m3)

    # Alternative: treat the focus as an image plane instead of an intermediate plane
    #wfirst_optsys.add_optic(poppy.ScalarTransmission(planetype=poppy.poppy_core.PlaneType.image, name='focus'),
    #                        distance=fl_m3)

    wfirst_optsys.add_optic(poppy.ScalarTransmission(planetype=poppy.poppy_core.PlaneType.intermediate,
                                                     name='focus'), distance=fl_m3)
    return wfirst_optsys

wfirst_optsys4096 = wfirst_sys(npix=1024, ratio=0.25)
%timeit wfirst_optsys4096.calcPSF(wavelength=wavelen, display_intermediates=False, return_intermediates=False)

Benchmark computer: Google Compute Engine

Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Haswell
GPUs: 1 x NVIDIA Tesla K80


douglase included the following code: https://github.com/mperrin/poppy/pull/239/commits

mperrin commented Aug 27, 2018

Comment by douglase
Monday Nov 13, 2017 at 00:20 GMT


Also, h/t to @neilzim for the benchmark prescription.

I will investigate the Travis build failure.

mperrin commented Aug 27, 2018

Comment by mperrin
Tuesday Nov 14, 2017 at 01:53 GMT


This is spectacular! I've been wanting to dig into all these performance optimization toolkits but just haven't had time. In addition to CUDA and NumExpr, did you do any tests with Numba? As I recall, @josePhoenix had done some tests with Cython but wasn't able to beat a well-optimized NumPy for the array operations. That was in the Fraunhofer code rather than the Fresnel code, though. I'm excited to see you got a 3x speedup using CUDA!
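
(For reference, a numba.vectorize version of the elementwise complex exponential might look roughly like the sketch below; this is illustrative only, not code from the PR.)

import numpy as np
from numba import vectorize

@vectorize(["complex128(float64)"], target="parallel")
def expi_numba(phase):
    # Compiled elementwise exp(1j*phase); numba broadcasts this over arrays
    # using multiple threads (or target="cuda" for the GPU, if available).
    return np.cos(phase) + 1j * np.sin(phase)

phase = np.linspace(-np.pi, np.pi, 2**22)
result = expi_numba(phase)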

mperrin commented Aug 27, 2018

Comment by joseph-long
Tuesday Nov 14, 2017 at 17:54 GMT


Awesome stuff, @douglase! I remember that last time I looked at this, I was trying to target OpenCL, which alas seems to be dead in the water, and numba's CUDA support was still proprietary/paid-license only. Exciting to see that support has matured so much!

mperrin commented Aug 27, 2018

Comment by douglase
Friday Nov 17, 2017 at 17:05 GMT


(much of the benchmarking for this project is recreated in this notebook: https://gist.github.com/douglase/3846e8f105cd9baec96706681d0b8ee5)

mperrin commented Aug 27, 2018

Comment by mperrin
Friday Nov 17, 2017 at 22:47 GMT


Wow, those benchmarks are really striking. NumExpr is way out in front, and for much less programming effort than the more manual Numba vectorize and CUDA versions.

I've done some benchmarking here as well and added numexpr calls to parts of the Fraunhofer pathway too, and gotten some nice speedups there. But the real heavy lifting in that pathway is still the calls to the NumPy dot product, which doesn't fit easily into what numexpr can help with.
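
(For context, the dot products in question are the matrix-DFT style transforms, roughly as in the simplified sketch below; this is not POPPY's actual MFT implementation. The work is a pair of dense complex matrix multiplications, which BLAS already handles well and which NumExpr's elementwise model cannot express.)

import numpy as np

def matrix_dft(field, u_out, x_in, v_out, y_in):
    # Simplified matrix Fourier transform: the transform is applied as two
    # dense matrix products, E_ux @ field @ E_yv, so the runtime is
    # dominated by np.dot (BLAS) rather than by elementwise math.
    E_ux = np.exp(-2j * np.pi * np.outer(u_out, x_in))   # shape (M_u, N_x)
    E_yv = np.exp(-2j * np.pi * np.outer(y_in, v_out))   # shape (N_y, M_v)
    return np.dot(np.dot(E_ux, field), E_yv)             # field shape (N_x, N_y)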

mperrin commented Aug 27, 2018

Comment by mperrin
Sunday Jan 07, 2018 at 21:39 GMT


@douglase I'm assuming you're at AAS this week? I'm going to try restarting this PR's Travis build now to see if it works (it was never clear why it died before). The actual code worked fine in my local tests. If necessary, maybe we can meet up for half an hour sometime this week to figure out how to get this running in the Travis CI setup?

mperrin commented Aug 27, 2018

Comment by mperrin
Monday Jan 08, 2018 at 19:05 GMT


I ended up just merging this manually from the command line - trying to do anything clever with pushing the conflict fix onto your branch first just led me into an unproductive morass of inscrutable git error messages. It was much simpler to just merge into master manually instead.
