# Understanding the `numpy.ndarray` internals

In [2]:
import numpy as np


In [3]:
x = np.array([[0, 1, 2, 3],[4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]], dtype=np.int8)
x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=int8)

In [4]:
x.strides

(4, 1)

In [5]:
y = np.array([[0, 1, 2, 3],[4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]], dtype=np.int16, order='F')
y.strides

(2, 8)

<mark>Easy way to calculate the strides (in bytes) for each array</mark>

* For row major order (C convention):
  
  $s_i = b_{el}\cdot\prod_{j=i+1}^{N-1} d_j$

* For column major order (Fortran convention):

  $s_i = b_{el}\cdot\prod_{j=0}^{i-1} d_j$

where $s_i$ is the stride for dimension $i$, $b_{el}$ is the number of bytes per array element, $d_j$ is the number of elements along dimension $j$, and $N$ is the number of dimensions.
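As a quick check of the two formulas, here is a minimal sketch that evaluates them directly. The helper name `expected_strides` is made up for this illustration and is not part of NumPy; it assumes a freshly created, contiguous array.

```python
import math

def expected_strides(shape, itemsize, order="C"):
    """Byte strides of a contiguous array, following the two formulas above."""
    strides = []
    for i in range(len(shape)):
        # C order: product of the dimensions after i.
        # Fortran order: product of the dimensions before i.
        dims = shape[i + 1:] if order == "C" else shape[:i]
        strides.append(itemsize * math.prod(dims))
    return tuple(strides)

c_strides = expected_strides((4, 4), 1, order="C")  # int8, row major
f_strides = expected_strides((4, 4), 2, order="F")  # int16, column major
print(c_strides, f_strides)  # (4, 1) (2, 8)
```

The two results match the values of `x.strides` and `y.strides` shown above.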

***
### 1. Understanding strides
<mark>Question</mark>: Determine the strides for the following arrays. Check your answer with the `strides` attribute of each array.

In [11]:
# 1.1
y = x.reshape((2, 8))
y.strides

(8, 1)

In [12]:
# 1.2
z = x.reshape((1, 16))
z.strides

(16, 1)

In [13]:
# 1.3
a = np.array([[0, 1, 2, 3],[4, 5, 6, 7],[8, 9, 10, 11], [12, 13, 14, 15]], dtype=np.int16)
a.strides

(8, 2)

***
### 2. Metadata modification vs copying the data buffer

<mark>Question</mark>: How do you explain the next result? Is the result the same when using `x.flatten()` instead of `x.ravel()`?

> Note: Both `flatten()` and `ravel()` return a flattened version of an array.

In [14]:
x = np.arange(5)
print('x =', x)

y = x.ravel()  #  assign to y a flattened version of the array x
y[0] = 5       #  change the first element of the array y

print('\nx =', x)

x = [0 1 2 3 4]

x = [5 1 2 3 4]
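One way to investigate this is to ask NumPy whether two arrays share the same data buffer. The sketch below uses `np.shares_memory` for that; the variable names `ravel_shares` and `flatten_shares` are only illustrative.

```python
import numpy as np

x = np.arange(5)

y = x.ravel()    # x is already 1-D and contiguous, so ravel can return a view
z = x.flatten()  # flatten always returns a copy

ravel_shares = np.shares_memory(x, y)
flatten_shares = np.shares_memory(x, z)

z[0] = 5         # modifies only the copy, x is left untouched
print(ravel_shares, flatten_shares, x)  # True False [0 1 2 3 4]
```

Because `y` is a view, writing to `y[0]` in the cell above also changes `x`. With `flatten()` the write lands in a separate buffer.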


<mark>Question</mark>: The next three cells do the same two operations: transposing a matrix and flattening it. How do you explain the difference in execution time?

In [15]:
x = np.random.rand(5000, 5000)

In [16]:
%%timeit
# 2.1
x.T
x.ravel()

273 ns ± 4.37 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [17]:
%%timeit
# 2.2
x.T
x.flatten()

103 ms ± 323 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [18]:
%%timeit
# 2.3
x.T.ravel()

416 ms ± 3.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [21]:
x

array([[0.96451957, 0.01328671, 0.41359046, ..., 0.73230555, 0.70712528,
        0.12283355],
       [0.6178087 , 0.30226109, 0.60967047, ..., 0.82422884, 0.12438844,
        0.81950957],
       [0.64637061, 0.5991913 , 0.03109118, ..., 0.44338154, 0.54956496,
        0.8755781 ],
       ...,
       [0.5209305 , 0.15357812, 0.41602991, ..., 0.54619562, 0.93587219,
        0.72213734],
       [0.1138394 , 0.4213797 , 0.52823719, ..., 0.74479063, 0.10899269,
        0.72953893],
       [0.61491811, 0.8344198 , 0.00629097, ..., 0.10871373, 0.86237908,
        0.5014326 ]])

In [20]:
x.T

array([[0.96451957, 0.6178087 , 0.64637061, ..., 0.5209305 , 0.1138394 ,
        0.61491811],
       [0.01328671, 0.30226109, 0.5991913 , ..., 0.15357812, 0.4213797 ,
        0.8344198 ],
       [0.41359046, 0.60967047, 0.03109118, ..., 0.41602991, 0.52823719,
        0.00629097],
       ...,
       [0.73230555, 0.82422884, 0.44338154, ..., 0.54619562, 0.74479063,
        0.10871373],
       [0.70712528, 0.12438844, 0.54956496, ..., 0.93587219, 0.10899269,
        0.86237908],
       [0.12283355, 0.81950957, 0.8755781 , ..., 0.72213734, 0.72953893,
        0.5014326 ]])

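<mark>Answer</mark>: `ravel` returns a view whenever the flattened array can be described by changing only the metadata (shape and strides), while `flatten` always copies the data buffer. In 2.1 and 2.2 the transpose `x.T` is itself only a view and is discarded, so the cost comes from flattening `x`: almost nothing for `ravel`, a full copy for `flatten`. In 2.3, the elements of `x.T` are no longer laid out in row-major order, so the `ravel` of the transpose can't be expressed only by changing the metadata. As a result, `ravel` creates a new data buffer. That copy has to read the source with a large stride instead of sequentially, which is most likely why 2.3 is slower than the contiguous copy in 2.2.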
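The same behaviour can be checked on a small array. This is a minimal sketch using the `strides` and `flags` attributes together with `np.shares_memory`; the names `view_of_x` and `copy_of_xt` are chosen for this illustration.

```python
import numpy as np

x = np.arange(12, dtype=np.int8).reshape(3, 4)
xt = x.T  # a view: only shape and strides are swapped

print(x.strides, xt.strides)  # (4, 1) (1, 4)
print(xt.flags['C_CONTIGUOUS'], xt.flags['F_CONTIGUOUS'])  # False True

# Flattening x needs no copy, flattening its transpose does.
view_of_x = np.shares_memory(x, x.ravel())
copy_of_xt = not np.shares_memory(x, xt.ravel())
print(view_of_x, copy_of_xt)  # True True
```

The transposed view is column-major with respect to its own shape, so a row-major flattening has to gather the elements into a new buffer.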