In [1]:
import numpy as np

### How does NumPy manage memory?

In [2]:
x = np.array([100.12, 120.23, 130.91])
x.dtype

dtype('float64')

**NumPy** supports many numerical data types, such as bool_, int_, intc, intp, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float_, float16, float32, float64, complex_, complex64, and complex128. (The float_ and complex_ aliases were removed in NumPy 2.0; use float64 and complex128 instead.)
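Each of these types has a fixed storage size per element, which can be queried through `np.dtype(...).itemsize`. A short sketch listing a few of them:

```python
import numpy as np

# Per-element storage size, in bytes, of some common NumPy scalar types
types = [np.bool_, np.int8, np.int16, np.int32, np.int64,
         np.float16, np.float32, np.float64, np.complex64, np.complex128]
sizes = {np.dtype(t).name: np.dtype(t).itemsize for t in types}

for name, size in sizes.items():
    print(f"{name:>10}: {size} byte(s)")
```

Complex types store a real and an imaginary float side by side, which is why complex128 takes 16 bytes.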

In [3]:
# np.sctypes was removed in NumPy 2.0, so this cell needs NumPy 1.x
np.sctypes

{'int': [numpy.int8, numpy.int16, numpy.int32, numpy.int64],
 'uint': [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint64],
 'float': [numpy.float16, numpy.float32, numpy.float64, numpy.float128],
 'complex': [numpy.complex64, numpy.complex128, numpy.complex256],
 'others': [bool, object, bytes, str, numpy.void]}

In [4]:
np.float64.mro()

[numpy.float64,
 numpy.floating,
 numpy.inexact,
 numpy.number,
 numpy.generic,
 float,
 object]

In [5]:
np.float64(100.12).nbytes

8

In [6]:
np.str_('n').nbytes

4

In [7]:
np.str_('numpy').nbytes

20

In [8]:
x.nbytes  # 3 elements * 8 bytes each (float64)

24

In [9]:
x2 = x.astype(np.float32)
x2

array([100.12, 120.23, 130.91], dtype=float32)

In [10]:
x2.nbytes  # 3 elements * 4 bytes each (float32)

12
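The byte counts above follow a simple rule: an array's `nbytes` is its number of elements times its per-element `itemsize`, so casting x from float64 to float32 halves its memory. Strings (`np.str_`) are stored as fixed-width UTF-32, 4 bytes per character, which is why 'numpy' occupies 20 bytes. The halving is not free: float32 keeps only about 7 significant decimal digits, so the cast introduces a small rounding error. A quick check:

```python
import numpy as np

x = np.array([100.12, 120.23, 130.91])   # float64 by default
x2 = x.astype(np.float32)

# nbytes == number of elements * bytes per element
print(x.nbytes, x.size * x.itemsize)      # 24 24
print(x2.nbytes, x2.size * x2.itemsize)   # 12 12

# Strings use 4 bytes per character: 'numpy' has dtype '<U5'
print(np.str_('numpy').nbytes, np.dtype('U5').itemsize)  # 20 20

# Casting back to float64 exposes the rounding error introduced by float32
err = np.abs(x - x2.astype(np.float64)).max()
print(err)
```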

In [11]:
x.__array_interface__

{'data': (140542492335712, False),
 'strides': None,
 'descr': [('', '<f8')],
 'typestr': '<f8',
 'shape': (3,),
 'version': 3}
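In `__array_interface__`, 'data' is a tuple of the buffer's memory address and a read-only flag, and `'strides': None` indicates a C-contiguous array. A minimal sketch that inspects the address, using a slice as the example: basic slicing returns a view whose data pointer is simply offset into the same buffer.

```python
import numpy as np

x = np.array([100.12, 120.23, 130.91])
y = x[1:]  # basic slicing: a view, not a copy

base = x.__array_interface__['data'][0]
view = y.__array_interface__['data'][0]

# The view starts one float64 (8 bytes) further into the same buffer
print(view - base)             # 8
print(np.shares_memory(x, y))  # True

# Both are C-contiguous, so 'strides' is reported as None
print(x.__array_interface__['strides'], y.__array_interface__['strides'])  # None None
```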

In [12]:
X = np.array([1,2,3,2,1,3,9,8,11,12,10,11,14,25,26,24,30,22,24,27])

In [13]:
X[::4]

array([ 1,  1, 11, 14, 30])

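Slicing an existing ndarray, as in X[::4], does not copy any data: it returns a view that shares the original array's memory. When the slice uses a step, the view's elements are no longer adjacent in memory, which can degrade the performance of later operations on it.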
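Because a slice is a view, writing through it also changes the original array. The sketch below reuses the values of X from above to show the view's non-contiguity and the explicit `.copy()` that produces an independent, contiguous array when one is needed:

```python
import numpy as np

X = np.array([1, 2, 3, 2, 1, 3, 9, 8, 11, 12, 10, 11, 14, 25, 26, 24, 30, 22, 24, 27])
v = X[::4]          # view: every 4th element of X's buffer
c = X[::4].copy()   # independent, contiguous copy

print(np.shares_memory(X, v), np.shares_memory(X, c))    # True False
print(v.flags['C_CONTIGUOUS'], c.flags['C_CONTIGUOUS'])  # False True

v[0] = 100          # writes into X's buffer
print(X[0], c[0])   # 100 1
```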

In [15]:
nd_1 = np.random.randn(4,6,8)
nd_1.shape

(4, 6, 8)

In [16]:
nd_2 = nd_1[:, ::2, ::2]
nd_2.shape

(4, 3, 4)

In [17]:
nd_1.__array_interface__

{'data': (140542496927744, False),
 'strides': None,
 'descr': [('', '<f8')],
 'typestr': '<f8',
 'shape': (4, 6, 8),
 'version': 3}

In [18]:
nd_2.__array_interface__

{'data': (140542496927744, False),
 'strides': (384, 128, 16),
 'descr': [('', '<f8')],
 'typestr': '<f8',
 'shape': (4, 3, 4),
 'version': 3}

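nd_2 reports the same data address as nd_1, so it is a view into nd_1's memory rather than a copy. Its strides, (384, 128, 16), give the number of bytes to step along each of its dimensions within nd_1's buffer. For nd_1, 'strides' is None, meaning the array is C-contiguous with implicit strides of (6*8*8, 8*8, 8) = (384, 64, 8) bytes; taking every second element along the last two axes doubles those two strides.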
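The stride arithmetic can be checked directly. This sketch computes the byte offset of one element of nd_2 from its strides and compares it with the offset of the corresponding nd_1 element in nd_1's contiguous C-order layout (the indices i, j, k are arbitrary choices for illustration):

```python
import numpy as np

nd_1 = np.random.randn(4, 6, 8)
nd_2 = nd_1[:, ::2, ::2]   # view: every 2nd element along axes 1 and 2

print(nd_1.strides, nd_2.strides)  # (384, 64, 8) (384, 128, 16)

# Byte offset of nd_2[i, j, k] from the shared start address, via nd_2's strides
i, j, k = 3, 2, 1
offset = i * nd_2.strides[0] + j * nd_2.strides[1] + k * nd_2.strides[2]

# Byte offset of nd_1[i, 2*j, 2*k] in nd_1's contiguous C-order layout
flat_offset = np.ravel_multi_index((i, 2 * j, 2 * k), nd_1.shape) * nd_1.itemsize

print(offset, flat_offset)                     # 1424 1424
print(nd_2[i, j, k] == nd_1[i, 2 * j, 2 * k])  # True
```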

In [19]:
nd_1 = np.random.randn(400, 600)

In [20]:
nd_2 = np.random.randn(400, 600*20)[::, ::20]

In [21]:
print(nd_1.shape, nd_2.shape)

(400, 600) (400, 600)


In [23]:
# Measure the time spent computing the cumulative product of the elements of nd_1 and nd_2
%timeit np.cumprod(nd_1)

651 µs ± 12.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [24]:
%timeit np.cumprod(nd_2)

3.47 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


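Both arrays have the same shape, yet np.cumprod takes roughly five times longer on nd_2. nd_2 is a strided view: consecutive elements along its last axis sit 20 * 8 = 160 bytes apart, so reading them from memory into the CPU means jumping between distant locations, and on typical hardware with 64-byte cache lines each line fetched carries only one useful 8-byte element. Moreover, np.cumprod without an axis argument operates on the flattened array, and flattening a non-contiguous view requires copying it first. When the elements are stored sequentially in one contiguous block, as in nd_1, memory is read linearly and the CPU cache is used efficiently, as the timings show. In short, smaller strides make better use of the CPU cache.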
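If a strided view will be used repeatedly, one option is to make a contiguous copy once with `np.ascontiguousarray`. A minimal sketch that rebuilds nd_2 as above, checks the strides of both versions, and times them with the standard `timeit` module (no particular speedup is claimed; it depends on the machine):

```python
import timeit
import numpy as np

# Same construction as above: a (400, 600) strided view of a wider array
nd_2 = np.random.randn(400, 600 * 20)[:, ::20]
# Contiguous copy with identical values (allocates and copies once)
nd_2c = np.ascontiguousarray(nd_2)

print(nd_2.strides, nd_2c.strides)                              # (96000, 160) (4800, 8)
print(nd_2.flags['C_CONTIGUOUS'], nd_2c.flags['C_CONTIGUOUS'])  # False True

t_view = timeit.timeit(lambda: np.cumprod(nd_2), number=20)
t_contig = timeit.timeit(lambda: np.cumprod(nd_2c), number=20)
print(f"strided view: {t_view:.4f} s, contiguous copy: {t_contig:.4f} s")
```

The copy itself costs time and memory, so it pays off mainly when the same array feeds several operations.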

In [30]:
import numexpr as ne 
%timeit 2*nd_2 + 48

2.88 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [31]:
%timeit ne.evaluate("2*nd_2 + 48")

1.24 ms ± 37.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
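numexpr evaluates the whole expression in cache-sized chunks (and can spread the work across several threads), so it avoids materializing a full-size intermediate result for 2*nd_2. Within plain NumPy, a related memory-saving technique is to preallocate one buffer and pass it as the out argument of the ufuncs np.multiply and np.add. This is a different approach from what numexpr does internally, and its speed relative to numexpr is not measured here:

```python
import numpy as np

nd_2 = np.random.randn(400, 600 * 20)[:, ::20]

# Plain expression: allocates a result array (and possibly an intermediate for 2*nd_2)
expected = 2 * nd_2 + 48

# Preallocate once, then let each ufunc write into the same buffer
out = np.empty(nd_2.shape)
np.multiply(nd_2, 2, out=out)
np.add(out, 48, out=out)

print(np.array_equal(out, expected))  # True
```

Inside a loop, the same out buffer can be reused on every iteration, so no new array is allocated per step.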


### Be aware of implicit copying

In [32]:
shape = (400,400,400)
x = np.random.random_sample(shape)

In [34]:
import cProfile

In [36]:
cProfile.run('x *= 2')

         3 function calls in 0.191 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.191    0.191    0.191    0.191 <string>:1(<module>)
        1    0.000    0.000    0.191    0.191 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




In [37]:
cProfile.run('x = x*2')

         3 function calls in 0.301 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.301    0.301    0.301    0.301 <string>:1(<module>)
        1    0.000    0.000    0.301    0.301 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
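To confirm which of the two statements reuses memory, the array's data address can be compared before and after each operation. A sketch using a smaller array for illustration:

```python
import numpy as np

x = np.random.random_sample((100, 100))   # smaller than the 400x400x400 array above
addr = x.__array_interface__['data'][0]

x *= 2                      # in place: results go into the existing buffer
same_buffer = x.__array_interface__['data'][0] == addr
print(same_buffer)          # True

old = x                     # keep a reference so the old buffer is not freed
x = x * 2                   # allocates a new array, then rebinds the name x
print(np.shares_memory(x, old))  # False
```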




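The in-place statement x *= 2 writes its results back into x's existing buffer, whereas x = x*2 first allocates a new 400×400×400 float64 array (about 512 MB) to hold the result and then rebinds x to it, which is why it takes longer here.

Many array operations return a new array for their results. This is expected behavior, but the repeated allocation hurts performance in iterative tasks that may run millions or billions of iterations. Many NumPy functions, including ufuncs such as np.multiply and accumulations such as np.cumprod, accept an out argument: you allocate the output array once, and the function writes its results into that array instead of creating a new one. This way, your program manages memory better and runs faster: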

In [38]:
shape = (8000,3000)
x = np.random.random_sample(shape)

In [39]:
%timeit np.cumprod(x)

83.1 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [40]:
output_array = np.empty(x.size)  # cumprod without an axis returns a flattened 1-D result
%timeit np.cumprod(x, out=output_array)

55.1 ms ± 454 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
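The benefit of out grows when the same operation runs many times. A minimal sketch with a hypothetical iterative update (repeatedly halving and adding one, chosen only for illustration) compares allocating fresh arrays on every step with reusing a single buffer:

```python
import timeit
import numpy as np

def update_alloc(x, n_steps):
    """Hypothetical update: each step allocates new arrays for its results."""
    y = x
    for _ in range(n_steps):
        y = y * 0.5 + 1.0
    return y

def update_out(x, n_steps):
    """Same update, writing every intermediate result into one buffer."""
    buf = x.copy()  # single allocation up front; the input x is left untouched
    for _ in range(n_steps):
        np.multiply(buf, 0.5, out=buf)
        np.add(buf, 1.0, out=buf)
    return buf

x = np.random.random_sample((500, 500))
a = update_alloc(x, 50)
b = update_out(x, 50)
print(np.array_equal(a, b))  # True: same operations in the same order

t_alloc = timeit.timeit(lambda: update_alloc(x, 50), number=3)
t_out = timeit.timeit(lambda: update_out(x, 50), number=3)
print(f"allocating: {t_alloc:.3f} s, out=: {t_out:.3f} s")
```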
