Convolution Notes #1

jamienoss · 2017-12-18T16:27:28Z

References:

Astropy:

SciPy:

Other:

Perf:

Driver:

mprof run --include-children ../convolve.py

import astropy
print(astropy.version.version)

import astropy.convolution as astroconv
import scipy.ndimage.filters as sciconv
import scipy.signal as scisig
#from scipy.ndimage.filters import convolve
import numpy as np
import gputools

iLength = 10000
jLength = iLength
image = np.random.random((iLength, jLength))

size = 111
ker = astroconv.Gaussian2DKernel(20,x_size=size,y_size=size)
#for 10k+1 x 10k+1 tests
#ker = np.random.random((iLength+1, jLength+1))

#smoothed = scisig.fftconvolve(image, ker.array)
#smoothed = astroconv.convolve_fft(image, ker, allow_huge=True)

#smoothed = sciconv.convolve(image, ker.array, mode='wrap')
smoothed = astroconv.convolve(image, ker, boundary='wrap')
#smoothed = gputools.convolve(image, ker.array)

Benchmark

Env

python 3.5.4-he720263_23
numpy 1.13.3-py35hfd7066c_0
scipy 1.0.0-py35h8b35106_0
astropy 2.0.3 (built on the box below with clang and -O3); v3.0rc2 has no significant code changes.
libcxx 4.0.1-h579ed51_0
libcxxabi 4.0.1-hebd6815_0
clang Apple LLVM version 7.3.0 (clang-703.0.31)
memory-profiler 0.50.0-pip

Box

Model Name: Mac Pro
Model Identifier: MacPro6,1
Processor Name: 6-Core Intel Xeon E5
- Processor Speed: 3.5 GHz
- Number of Processors: 1
- Total Number of Cores: 6 (12 with hyperthreading on, the default)
- L2 Cache (per Core): 256 KB
- L3 Cache: 12 MB
Memory: 32 GB
Boot ROM Version: MP61.0116.B25
SMC Version (system): 2.20f18
Illumination Version: 1.4a6
Serial Number (system): F5KS302VF9VN
Hardware UUID: 549FB335-E005-524B-85C3-B582CA595478
Disk: APPLE SSD SM1024G
GPU: Dual AMD FirePro D300
- 2GB of GDDR5 VRAM (each)
- 1280 stream processors
- 256-bit-wide memory bus
- 160GB/s memory bandwidth
- 2 teraflops performance

Direct

astropy.convolution.convolve

astropy.convolution.convolve(array, kernel,
                             boundary='fill',
                             fill_value=0.0,
                             nan_treatment='interpolate',
                             normalize_kernel=True,
                             mask=None,
                             preserve_nan=False,
                             normalization_zero_tol=1e-08)

scipy.ndimage.filters.convolve

scipy.ndimage.filters.convolve(input, weights, output=None, mode='reflect', cval=0.0, origin=0)

NOTE: the above two functions and their tests utilize only a single thread.

gputools.convolve. This routine is very unstable, hanging my machine on most runs: only 2 of 12 runs completed without doing so. Tested at #061fa99 (master) and v0.2.2 (latest release).

gputools.convolve(data, h, res_g=None, sub_blocks=None)

FFT

astropy.convolution.convolve_fft

astropy.convolution.convolve_fft(array, kernel, boundary='fill', fill_value=0.0,
                                 nan_treatment='interpolate', normalize_kernel=True,
                                 normalization_zero_tol=1e-08,
                                 preserve_nan=False, mask=None, crop=True, return_fft=False,
                                 fft_pad=None, psf_pad=None,
                                 quiet=False, min_wt=0.0, allow_huge=False, fftn=<function fftn>,
                                 ifftn=<function ifftn>, complex_dtype=<class 'complex'>)

scipy.signal.fftconvolve

scipy.signal.fftconvolve(in1, in2, mode='full')

NOTE: the above two functions and their tests utilize only a single thread.
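
For the wrap boundary used in the driver, the direct and FFT approaches should produce the same result, because multiplying FFTs corresponds to circular convolution. The following is a minimal NumPy-only sketch of that equivalence on a small array. The helper names `convolve_wrap_direct` and `convolve_wrap_fft` are hypothetical, the kernel is assumed to be odd-sized and no larger than the image, and this is not how astropy or scipy implement their routines.

```python
import numpy as np


def convolve_wrap_direct(image, kernel):
    """Direct circular (wrap-around) convolution with a centred, odd-sized kernel."""
    ky, kx = kernel.shape
    cy, cx = ky // 2, kx // 2
    out = np.zeros(image.shape, dtype=float)
    for i in range(ky):
        for j in range(kx):
            # np.roll by (i - cy, j - cx) gives image[y - (i - cy), x - (j - cx)]
            out += kernel[i, j] * np.roll(image, (i - cy, j - cx), axis=(0, 1))
    return out


def convolve_wrap_fft(image, kernel):
    """Same circular convolution, computed as a product in Fourier space."""
    ky, kx = kernel.shape
    padded = np.zeros(image.shape, dtype=float)
    padded[:ky, :kx] = kernel
    # Move the kernel centre to index (0, 0) so the output is not shifted.
    padded = np.roll(padded, (-(ky // 2), -(kx // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernel = rng.random((5, 5))

direct = convolve_wrap_direct(image, kernel)
via_fft = convolve_wrap_fft(image, kernel)
print(np.allclose(direct, via_fft))
```

The direct version loops over kernel elements, so its cost grows with the kernel area, while the FFT version's cost depends only on the image size. That difference is what makes the 10k+1 x 10k+1 kernel case impractical for the direct routines.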

NOTE: There appears to be a memory ceiling at ~25GB beyond which mprof is unable to observe. E.g., when running astropy.convolution.convolve_fft(image, kernel, allow_huge=True), the OSX Activity Monitor records a peak usage of ~47GB with ~22GB compressed, equating to ~25GB "uncompressed". Other apps open during testing account for ~6GB of background usage, which leaves roughly 25GB of the 32GB total available. It appears that mprof is oblivious to this compressed memory.
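
One possible cross-check that does not depend on resident memory is the standard library's `tracemalloc`, which counts bytes requested through Python's allocators rather than pages the OS currently holds uncompressed. The sketch below is only an illustration: `peak_traced_bytes` and `allocate` are hypothetical helpers, and whether NumPy array buffers show up in the trace depends on the NumPy and Python versions in use, so that should be verified before relying on it. Allocations made by C libraries that bypass Python's allocators are not counted.

```python
import tracemalloc


def peak_traced_bytes(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def allocate(n_bytes):
    """Stand-in workload: allocate a zero-filled buffer of n_bytes."""
    return bytearray(n_bytes)


buf, peak = peak_traced_bytes(allocate, 10_000_000)
print(f"peak traced: {peak / 1e6:.1f} MB")
```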

Results (10k x 10k with 111 x 111)

Memory Deck

The minimum memory required for both arrays to be stored in memory:

0.8GB (float64)
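
The floor can be reproduced from the array shapes alone. This is a small sketch assuming dense float64 arrays (8 bytes per element) and 1 GB = 1e9 bytes; `array_bytes` is a hypothetical helper. It also evaluates the 10k+1 x 10k+1 kernel case from the driver comments.

```python
def array_bytes(shape, itemsize=8):
    """Bytes needed for a dense array of the given shape (8 bytes per float64)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


image_shape = (10000, 10000)
small_kernel = (111, 111)
large_kernel = (10001, 10001)

small_case = array_bytes(image_shape) + array_bytes(small_kernel)
large_case = array_bytes(image_shape) + array_bytes(large_kernel)

print(f"{small_case / 1e9:.2f} GB")  # 0.80 GB
print(f"{large_case / 1e9:.2f} GB")  # 1.60 GB
```

With the 111 x 111 kernel, the kernel itself contributes under 0.1 MB, so the floor is essentially the image alone.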

Direct

The above linear increase in memory as a function of time suggests that the routine is "leaking" (though it may not be a literal leak).

FFT

Results (10k x 10k with 10k+1 x 10k+1)

NOTE: The kernel size in the plot titles below is incorrect: the +1 is missing in both dimensions.

Memory Deck

The minimum memory required for both arrays to be stored in memory:

1.6GB (float64)

Direct

Estimates assuming linear scaling:

Astropy default (None) : ~141 days. Actual memory used ~2.3GB (incomplete run).
Scipy default (reflect) : ~101.5 days. Immediately fails with MemoryError. ~3GB reached at fail-time.
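
A linear-scaling estimate of this kind can be sketched as follows, under the assumption that the run time of a direct convolution is proportional to the number of kernel elements for a fixed image size. `extrapolate_direct_time` is a hypothetical helper, and the 600-second input is a placeholder rather than a measured value from these notes.

```python
def extrapolate_direct_time(measured_seconds, kernel_from, kernel_to):
    """Scale a measured run time by the ratio of kernel element counts."""
    elems_from = kernel_from[0] * kernel_from[1]
    elems_to = kernel_to[0] * kernel_to[1]
    return measured_seconds * elems_to / elems_from


# Scale factor between the two kernel sizes used in these notes.
scale = extrapolate_direct_time(1.0, (111, 111), (10001, 10001))
print(f"scale factor: {scale:.1f}")

# Placeholder timing for the 111 x 111 run (hypothetical, not measured here).
hypothetical_seconds = 600.0
estimate_days = extrapolate_direct_time(
    hypothetical_seconds, (111, 111), (10001, 10001)
) / 86400.0
print(f"estimated: {estimate_days:.1f} days")
```

The scale factor is roughly 8100, which is why run times of minutes for the small kernel turn into months for the large one.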

FFT

NOTE: effectively the PSF deconvolution benchmark.

Scipy default (full) :
Astropy default (fill) : Aborted by the OS with signal 9. The OSX Activity Monitor indicated significantly higher memory consumption than mprof reported. This was also true when not profiling, so it cannot be attributed to mprof overhead. Approx. 70GB of memory was recorded, with ~50GB compressed.

Repository owner locked and limited conversation to collaborators Jan 25, 2018
