# Understanding the `numpy.ndarray` internals

In [1]:
import numpy as np

In [2]:
x = np.array([[0, 1, 2, 3],[4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]], dtype=np.int8)
x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=int8)

In [3]:
x.strides

(4, 1)

In [4]:
y = np.array([[0, 1, 2, 3],[4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]], dtype=np.int16, order='F')
y.strides

(2, 8)

<mark>A simple way to calculate the strides (in bytes) of a contiguous array</mark>

* For row major order (C convention):
  
  $s_i = b_{el}\cdot\prod_{j=i+1}^{N-1} d_j$

* For column major order (Fortran convention):

  $s_i = b_{el}\cdot\prod_{j=0}^{i-1} d_j$

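where $N$ is the number of dimensions, $s_i$ is the stride for dimension $i$, $b_{el}$ is the number of bytes of an array element and $d_j$ is the number of elements along dimension $j$. An empty product equals 1, so the stride of the last dimension (C order) or of the first dimension (Fortran order) is simply $b_{el}$.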
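The two formulas translate almost directly into code. The sketch below is a minimal illustration: `strides_from_formula` is a hypothetical helper (not part of NumPy), it assumes `order` is either `"C"` or `"F"`, and it only applies to freshly allocated, contiguous arrays. It prints the formula's result next to the `strides` NumPy reports, which should match, e.g. `(4, 1)` for the 4×4 `int8` array and `(2, 8)` for the Fortran-ordered `int16` one.

```python
import math
import numpy as np

def strides_from_formula(shape, itemsize, order="C"):
    """Hypothetical helper: byte strides of a contiguous array ('C' or 'F' order)."""
    strides = []
    for i in range(len(shape)):
        if order == "C":
            # row major: product of the sizes of all dimensions after i
            n_elements = math.prod(shape[i + 1:])
        else:
            # column major: product of the sizes of all dimensions before i
            n_elements = math.prod(shape[:i])
        # math.prod of an empty slice is 1, so the innermost stride is the itemsize
        strides.append(itemsize * n_elements)
    return tuple(strides)

# Compare with the strides NumPy reports for freshly allocated (contiguous) arrays
cases = [((4, 4), np.int8, "C"), ((4, 4), np.int16, "F"),
         ((2, 3, 5), np.float64, "C"), ((2, 3, 5), np.float64, "F")]
for shape, dtype, order in cases:
    arr = np.zeros(shape, dtype=dtype, order=order)
    print(shape, arr.dtype, order, strides_from_formula(shape, arr.itemsize, order), arr.strides)
```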

***
### 1. Understanding strides
<mark>Question</mark>: Determine the strides for the following arrays. Check your answer with the `strides` attribute of each array.

In [10]:
# 1.1
y = x.reshape((2, 8))
y

# my guess: (8, 1)
print(y.strides)

(8, 1)


In [11]:
# 1.2
z = x.reshape((1, 16))
z

# my guess: (16,1)
print(z.strides)

(16, 1)


In [12]:
# 1.3
a = np.array([[0, 1, 2, 3],[4, 5, 6, 7],[8, 9, 10, 11], [12, 13, 14, 15]], dtype=np.int16)
a

# my guess: (8,2)
print(a.strides)

(8, 2)
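Strides are what NumPy uses to locate an element: the byte offset of the element at index $(i_0, \dots, i_{N-1})$ from the start of the data buffer is $\sum_k i_k s_k$. As a small check, the sketch below computes that offset with a hypothetical helper `byte_offset` and reads the element from a copy of the raw bytes, for a C-ordered and a Fortran-ordered `int16` array holding the same values. It assumes both arrays are contiguous, so that `tobytes(order="A")` returns the bytes in memory order. Both rows should print the value 11 twice.

```python
import numpy as np

def byte_offset(index, strides):
    """Hypothetical helper: bytes from the buffer start to the element at `index`."""
    return sum(i * s for i, s in zip(index, strides))

a = np.arange(16, dtype=np.int16).reshape(4, 4)  # same values as `a` above, strides (8, 2)
f = np.asfortranarray(a)                           # same values, column major, strides (2, 8)

for arr in (a, f):
    # Copy of the raw bytes in memory order, reinterpreted as a 1-D array;
    # order="A" keeps the column-major layout for the F-contiguous array
    flat = np.frombuffer(arr.tobytes(order="A"), dtype=arr.dtype)
    offset = byte_offset((2, 3), arr.strides)
    print(arr.strides, offset, flat[offset // arr.itemsize], arr[2, 3])
```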


***
### 2. Metadata modification vs copying the data buffer

<mark>Question</mark>: How do you explain the next result? Is the result the same when using `x.flatten()` instead of `x.ravel()`?

> Note: Both `flatten()` and `ravel()` return a flattened version of an array.

In [13]:
x = np.arange(5)
print('x =', x)

y = x.ravel()  #  assign to y a flattened version of the array x
y[0] = 5       #  change the first element of the array y

print('\nx =', x)

x = [0 1 2 3 4]

x = [5 1 2 3 4]
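One way to check the second part of the question is to run both methods side by side and ask NumPy whether the result shares memory with `x`. `np.shares_memory` reports whether two arrays overlap in memory; the variable names `r` and `f` are only illustrative.

```python
import numpy as np

x = np.arange(5)
r = x.ravel()    # x is already contiguous, so ravel returns a view of its buffer
f = x.flatten()  # flatten always allocates a new buffer and copies the data

print(np.shares_memory(x, r), np.shares_memory(x, f))  # True False

f[0] = 5
print('x =', x)  # the copy is independent: x = [0 1 2 3 4]

r[0] = 5
print('x =', x)  # the view writes through: x = [5 1 2 3 4]
```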


<mark>Question</mark>: The three `%%timeit` cells below all flatten the matrix `x`; cells 2.2 and 2.3 transpose it first. How do you explain the difference in execution time?

In [14]:
x = np.random.rand(5000, 5000)

In [19]:
%%timeit
# 2.1 (no transpose, only flattening)
x.ravel()

126 ns ± 0.495 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [20]:
%%timeit
# 2.2
x.T.flatten()  # always creates a copy

414 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [21]:
%%timeit
# 2.3
x.T.ravel()

414 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


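<mark>Answer</mark>: `ravel` returns a view whenever it can: it only creates new metadata (shape and strides) that points to the same data buffer, which is why modifying `y` also modifies `x`. `flatten` always copies the data into a new buffer, so with `x.flatten()` the original `x` would stay unchanged. This also explains why 2.1 takes nanoseconds while 2.2 takes hundreds of milliseconds. In 2.3, `x.T` is itself only a view (its strides are swapped), but flattening it in row-major order means reading the buffer in a non-uniform pattern (jumping a whole row of `x` between consecutive elements, then jumping back), which cannot be described by a single stride. The result therefore cannot be expressed by changing the metadata alone, so `ravel` creates a new data buffer and costs as much as `flatten`.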
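The answer can be checked on a small matrix. In this sketch a 4×5 `float64` matrix stands in for the 5000×5000 one (an assumption made only to keep it fast; the memory layout logic is the same), and `np.shares_memory` tells whether each result reuses the buffer of `x`. The last lines use `ravel(order='K')`, which reads the elements in the order they occur in memory: for the F-contiguous `x.T` that avoids the copy, at the price of returning the elements in the order of `x.ravel()` rather than `x.T.ravel()`.

```python
import numpy as np

x = np.arange(20, dtype=np.float64).reshape(4, 5)  # small stand-in for the 5000 x 5000 matrix
t = x.T                                             # view: only shape and strides are swapped

print(x.strides, t.strides)                               # (40, 8) (8, 40)
print(t.flags['C_CONTIGUOUS'], t.flags['F_CONTIGUOUS'])  # False True
print(np.shares_memory(x, t))                             # True: no data copied

print(np.shares_memory(x, x.ravel()))   # True: x is C-contiguous, ravel returns a view
print(np.shares_memory(x, t.ravel()))   # False: row-major order of t needs a copy

k = t.ravel(order='K')                  # flatten t in memory order instead
print(np.shares_memory(x, k), np.array_equal(k, x.ravel()))  # True True
```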