Convolution Notes #1

Open
jamienoss opened this issue Dec 18, 2017 · 0 comments

References:

Astropy:

SciPy:

Other:

Perf:

Driver:

mprof run --include-children ../convolve.py

import astropy
print(astropy.version.version)

import astropy.convolution as astroconv
import scipy.ndimage.filters as sciconv
import scipy.signal as scisig
import numpy as np
import gputools

# Test image: 10k x 10k array of uniform random float64 values
iLength = 10000
jLength = iLength
image = np.random.random((iLength, jLength))

# Kernel: 111 x 111 Gaussian with a standard deviation of 20 pixels
size = 111
ker = astroconv.Gaussian2DKernel(20, x_size=size, y_size=size)
# For the 10k+1 x 10k+1 tests, use a random kernel instead
# (a plain ndarray, so pass ker rather than ker.array below):
#ker = np.random.random((iLength+1, jLength+1))

# Uncomment exactly one call per profiling run.
# FFT:
#smoothed = scisig.fftconvolve(image, ker.array)
#smoothed = astroconv.convolve_fft(image, ker, allow_huge=True)

# Direct:
#smoothed = sciconv.convolve(image, ker.array, mode='wrap')
smoothed = astroconv.convolve(image, ker, boundary='wrap')
#smoothed = gputools.convolve(image, ker.array)

Benchmark

Env

  • python 3.5.4-he720263_23
  • numpy 1.13.3-py35hfd7066c_0
  • scipy 1.0.0-py35h8b35106_0
  • astropy 2.0.3 (built on the box below with clang and -O3); v3.0rc2 has no significant code changes.
  • libcxx 4.0.1-h579ed51_0
  • libcxxabi 4.0.1-hebd6815_0
  • clang Apple LLVM version 7.3.0 (clang-703.0.31)
  • memory-profiler 0.50.0-pip

Box

  • Model Name: Mac Pro
  • Model Identifier: MacPro6,1
  • Processor Name: 6-Core Intel Xeon E5
    • Processor Speed: 3.5 GHz
    • Number of Processors: 1
    • Total Number of Cores: 6 (12 with hyperthreading ON, the default)
    • L2 Cache (per Core): 256 KB
    • L3 Cache: 12 MB
  • Memory: 32 GB
  • Boot ROM Version: MP61.0116.B25
  • SMC Version (system): 2.20f18
  • Illumination Version: 1.4a6
  • Serial Number (system): F5KS302VF9VN
  • Hardware UUID: 549FB335-E005-524B-85C3-B582CA595478
  • Disk: APPLE SSD SM1024G
  • GPU: Dual AMD FirePro D300
    • 2GB of GDDR5 VRAM (each)
    • 1280 stream processors
    • 256-bit-wide memory bus
    • 160GB/s memory bandwidth
    • 2 teraflops performance

Direct

astropy.convolution.convolve(array, kernel, boundary='fill', fill_value=0.0,
                             nan_treatment='interpolate', normalize_kernel=True,
                             mask=None, preserve_nan=False,
                             normalization_zero_tol=1e-08)
scipy.ndimage.filters.convolve(input, weights, output=None, mode='reflect', cval=0.0, origin=0)

NOTE: the above two functions and their tests utilize only a single thread.

gputools.convolve(data, h, res_g=None, sub_blocks=None)

figure_1-2

direct_conv_bench

FFT

astropy.convolution.convolve_fft(array, kernel, boundary='fill', fill_value=0.0,
                                 nan_treatment='interpolate', normalize_kernel=True,
                                 normalization_zero_tol=1e-08,
                                 preserve_nan=False, mask=None, crop=True, return_fft=False,
                                 fft_pad=None, psf_pad=None,
                                 quiet=False, min_wt=0.0, allow_huge=False, fftn=<function fftn>,
                                 ifftn=<function ifftn>, complex_dtype=<class 'complex'>)
scipy.signal.fftconvolve(in1, in2, mode='full')

NOTE: the above two functions and their tests utilize only a single thread.
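For readers comparing the Direct and FFT sections, the following is a minimal sketch (not the Astropy or SciPy implementation) showing that the two approaches compute the same full linear convolution. It uses only NumPy; the function names `direct_convolve_full` and `fft_convolve_full` are hypothetical helpers introduced for illustration, and boundary handling, NaN treatment and kernel normalization are deliberately left out.

```python
import numpy as np


def direct_convolve_full(image, kernel):
    """Full 2-D linear convolution by explicit summation.

    Cost grows with (image pixels) x (kernel pixels).
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    # Each kernel element adds a shifted, scaled copy of the image.
    for i in range(kh):
        for j in range(kw):
            out[i:i + ih, j:j + iw] += kernel[i, j] * image
    return out


def fft_convolve_full(image, kernel):
    """Full 2-D linear convolution via zero-padded real FFTs."""
    shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
    # Zero-padding both inputs to the full output shape turns the
    # circular convolution implied by the FFT into a linear one.
    spectrum = np.fft.rfft2(image, s=shape) * np.fft.rfft2(kernel, s=shape)
    return np.fft.irfft2(spectrum, s=shape)


rng = np.random.default_rng(0)
small_image = rng.random((16, 12))
small_kernel = rng.random((5, 3))
agree = np.allclose(direct_convolve_full(small_image, small_kernel),
                    fft_convolve_full(small_image, small_kernel))
print(agree)
```

The padded FFT arrays are the main reason the FFT route can need far more memory than the direct route for large kernels, since both inputs are expanded to the combined output shape before transforming.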

fft_conv_bench

fft_conv_bench

figure_1

figure_1-1

NOTE: There appears to be a memory ceiling at ~25GB beyond which mprof cannot observe. For example, when running astropy.convolution.convolve_fft(image, kernel, allow_huge=True), the OSX Activity Monitor records a peak usage of ~47GB with ~22GB compressed, i.e. ~25GB "uncompressed". Other apps open during testing account for ~6GB of background usage, which leaves roughly 25GB of the 32GB total available. mprof appears to be oblivious to the compressed memory.
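One possible cross-check on the mprof figures is to ask the OS for the process's peak resident set size at the end of the driver. The sketch below uses the standard-library `resource` module (Unix only); `peak_rss_gb` is a hypothetical helper name. Whether this figure includes memory that macOS has compressed is not established by these notes, so treat it as a second data point rather than ground truth.

```python
import resource
import sys


def peak_rss_gb(who=resource.RUSAGE_SELF):
    """Peak resident set size of this process, in GB (10**9 bytes).

    ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    """
    peak = resource.getrusage(who).ru_maxrss
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return peak * bytes_per_unit / 1e9


# Call this after the convolution has finished.
print("peak RSS: %.3f GB" % peak_rss_gb())
```

Passing `resource.RUSAGE_CHILDREN` instead reports the peak for terminated child processes, which is loosely analogous to the `--include-children` flag used with mprof above.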

Results (10k x 10k with 111 x 111)

Memory Deck

The minimum memory required for both arrays to be stored in memory:

  • 0.8GB (float64)
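The floor quoted above is simple arithmetic: element count times 8 bytes per float64. A short sketch, using a hypothetical helper `min_memory_gb` and taking 1GB as 10**9 bytes, reproduces it for the 111 x 111 kernel and for the 10k+1 x 10k+1 kernel mentioned in the driver:

```python
def min_memory_gb(image_shape, kernel_shape, itemsize=8):
    """Bytes needed just to hold the image and kernel, in GB (10**9 bytes).

    itemsize=8 corresponds to float64.
    """
    n_image = image_shape[0] * image_shape[1]
    n_kernel = kernel_shape[0] * kernel_shape[1]
    return (n_image + n_kernel) * itemsize / 1e9


small_case = min_memory_gb((10000, 10000), (111, 111))
large_case = min_memory_gb((10000, 10000), (10001, 10001))
print("111 x 111 kernel:     %.2f GB" % small_case)
print("10k+1 x 10k+1 kernel: %.2f GB" % large_case)
```

This counts inputs only; the output array and any padded or temporary copies made by a given implementation come on top of it.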

Direct

direct_conv_bench
The linear increase in memory over time shown above is indicative of a "leak" (though it may not be a literal leak).

FFT

fft_conv_bench

Results (10k x 10k with 10k+1 x 10k+1)

NOTE: The kernel sizes in the plot titles below are incorrect: they are missing the +1 in both dimensions.

Memory Deck

The minimum memory required for both arrays to be stored in memory:

  • 1.6GB (float64)

Direct

Estimates assuming linear scaling:

  • Astropy default (None) : ~141 days. Actual memory used ~2.3GB (incomplete run).
  • Scipy default (reflect) : ~101.5 days. Fails immediately with MemoryError. ~3GB reached at the time of failure.
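The notes do not spell out how the linear-scaling estimates were derived. One plausible reading, sketched below as an assumption, is that direct convolution time is proportional to (image pixels) x (kernel pixels), so a time measured with the 111 x 111 kernel is scaled by the ratio of kernel areas. The helper names are hypothetical, and the 60-second measurement is a placeholder, not a value from these benchmarks.

```python
def direct_cost(image_shape, kernel_shape):
    """Multiply-add count for direct convolution (assumed cost model)."""
    return (image_shape[0] * image_shape[1]
            * kernel_shape[0] * kernel_shape[1])


def extrapolate_seconds(measured_seconds, image_shape, small_kernel, large_kernel):
    """Scale a measured runtime linearly with the operation count."""
    ratio = direct_cost(image_shape, large_kernel) / direct_cost(image_shape, small_kernel)
    return measured_seconds * ratio


image_shape = (10000, 10000)
scale = direct_cost(image_shape, (10001, 10001)) / direct_cost(image_shape, (111, 111))

# Placeholder measurement (hypothetical, NOT taken from the plots above).
measured_seconds = 60.0
estimate_days = extrapolate_seconds(
    measured_seconds, image_shape, (111, 111), (10001, 10001)) / 86400.0
print("scale factor: %.0f, estimate: %.2f days" % (scale, estimate_days))
```

Under this model the kernel-area ratio is roughly 8,000, which is why runtimes of minutes at 111 x 111 turn into estimates of months at 10k+1 x 10k+1.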

FFT

NOTE: effectively the PSF deconvolution benchmark.

  • Scipy default (full) :
    scipy
  • Astropy default (fill) : Aborted by the OS with signal 9. The OSX Activity Monitor indicated significantly higher memory consumption than mprof reported. The same was true of unprofiled runs, so it cannot be attributed to mprof overhead. Approximately 70GB of memory was recorded, with ~50GB compressed.

figure_1

Repository owner locked and limited conversation to collaborators Jan 25, 2018