Skip to content

Using numpy, cupy, and numba to compare convolution implementations.

Notifications You must be signed in to change notification settings

randompast/python-convolution-comparisons

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Python Convolution Comparisons

Convolutions, or sliding dot products, are fundamental operations in scientific computing. The goal here is to explore the possible approaches in python. The graphs shown are number of elements vs runtimes.

Ideal implementation would have the inputs (x,K) where x is 1d or 3d K is [0th order, 1st order, 2nd, order, ... nth order].

Example usages:

x1 = np.random.uniform(-1,1,100)
x3 = np.random.uniform(-1,1,(100,10,10))
k1 = np.random.uniform(-1,1,10))
k2 = np.random.uniform(-1,1,(10,10))
k3 = np.random.uniform(-1,1,(10,10,10))
conv( x3, k3 )
conv( x1, k1 )
conv( x1, k2 )
conv( x1, k3 )
conv( x1, [ k1 , k2 , k3 ] )

1D First Order

An example 1D signal could be audio amplitudes. An example filter would be an impulse response. The filter slides over a window of the larger audio signal. Corresponding terms are multiplied and then summed together to construct a filtered version of the signal.

It was found that numpy.convolve is the fastest for a large variety of inputs, but after a sufficiently large size it makes sense to use cupy. The gpu implementations started to become worth using at around 10^7.

1D, First Order

Best to worst for larger arrays:

  • cpste - cupy strided @Alexer
  • nbc_gm, nbc - numba cuda improvement from @gmarkall via numba community support
  • cpv, spfcv, spvc, npc
    • cpv - cupyx.scipy.ndimage,
    • spvc - scipy.signal.convolve
    • spfvc - scipy.signal.fftconvolve
    • npc - numpy.convolve
  • npste, nbpd_np, nbp, nbnpc, nbnpc_nj
    • npste - numpy strided @Alexer
    • nbpd_np - numba parallel dot, no gil
    • nbp - numba parallel
    • nbnpc_nj - @njit numba numpy convolve
  • npd - numpy dot
  • npfe - numpy for einsum
  • naive - (not shown) two for loops, significantly worse

1D Second Order

1D, Second Order

Best to worst for larger arrays:

  • nbc - numba cuda (naive implementation with @cuda.njit)
  • nbconv - numba njit (naive implementation with njit)
  • npfe - numpy for loop einsum (generate the outer product on demand)
  • npe - numpy einsum (generate the outer product all at once first)

1D Third Order

1D, Third Order

Best to worst for larger arrays:

  • nbcg, nbcg1024, nbc
  • nbcg - numba cuda (naive implementation with @cuda.njit and grid)
  • nbcg1024 - numba cuda (naive implementation with @cuda.njit and grid, 1024 threads/block)
  • nbc - numba cuda (naive implementation with @cuda.njit)
  • nbconv - numba njit (naive implementation with njit)

3D First Order

A video would be a good example of a 3D signal.

3D, First Order

Best to worst for larger arrays:

  • cpv - cupyx.scipy.ndimage
  • cpste - cupy stride tricks @Alexer
  • nbcg - numba cuda (naive implementation with @cuda.njit and grid)
  • npste - numpy stride tricks @Alexer
  • nbconv - numba njit (naive implementation with njit)
  • spv - scipy.signal.convolve valid

About

Using numpy, cupy, and numba to compare convolution implementations.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages