# NUMPY

## What is on the program
- The different structures in NumPy
- Shape (dimensions)
- Basic operations and methods

Exercises: https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.ipynb

Solutions: https://numpy.org/numpy-tutorials/

Sources for this course:
- [NumPy docs](https://numpy.org/learn/)
- [NumPy Illustrated: The Visual Guide to NumPy](https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d)

In [1]:
import numpy as np

# Numpy Array vs. Python List


In [2]:
# ✅
# Python list

a = [1, 2, 3]
[q*2 for q in a]

[2, 4, 6]

In [3]:
a = np.array([1, 2, 3])
a * 2

array([2, 4, 6])

In [4]:
# Python "2D" list

[
    [1,2,3],
    [4,5,6]
]

[[1, 2, 3], [4, 5, 6]]

In [5]:
# Numpy 2D array

np.array(
    [
        [1,2,3],
        [4,5,6]
    ]
)

array([[1, 2, 3],
       [4, 5, 6]])

## Numpy arrays

- More compact in RAM than Python lists
- Arrays are designed to be homogeneous:
  - they are only fast when all elements share one type
  - heterogeneous types are still possible in an array, but that defeats the purpose of using NumPy
- Faster than lists:
  - when the operation can be vectorized
- Slower than lists:
  - when you append elements to the end (each append builds a new array and copies the data)

![numpy memory layout](https://miro.medium.com/v2/resize:fit:720/format:webp/1*D-I8hK4WXC8wtpR5tvR0fw.png)
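
As a small illustration of the homogeneity point above, the sketch below shows how NumPy settles on a single dtype when given mixed inputs, and how much memory the raw data takes. The variable names are made up for this example.

```python
import numpy as np

# Mixed ints and floats are upcast to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)          # float64

# Arbitrary Python objects are possible with dtype=object, but each element
# is then a reference to a Python object and vectorized speed is lost.
objects = np.array([1, "two", 3.0], dtype=object)
print(objects.dtype)        # object

# A homogeneous array stores fixed-size elements in a single data buffer.
ints = np.arange(1000, dtype=np.int64)
print(ints.itemsize)        # 8 (bytes per element)
print(ints.nbytes)          # 8000 (bytes for the data buffer)
```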

## Numpy shape and dtype

In [6]:
# ✅
a_int = np.array(
    [ 1, 2, 3, 4, 5 ,6]
)

print(f"{a_int.shape = }")
print(f"{a_int.dtype = }")
a_int

a_int.shape = (6,)
a_int.dtype = dtype('int64')


array([1, 2, 3, 4, 5, 6])

In [7]:
a_2d_int = np.array(
    [
        [1, 2, 3],
        [4, 5 ,6],
    ]
)

print(f"{a_2d_int.shape = }")
print(f"{a_2d_int.dtype = }")
a_2d_int

a_2d_int.shape = (2, 3)
a_2d_int.dtype = dtype('int64')


array([[1, 2, 3],
       [4, 5, 6]])

In [8]:
a_float = np.array(
    [ 1.0, 2.0, 3.0, 4.0, 5.0 ,6.0]
)

print(f"{a_float.shape = }")
print(f"{a_float.dtype = }")
a_float

a_float.shape = (6,)
a_float.dtype = dtype('float64')


array([1., 2., 3., 4., 5., 6.])

## Creating arrays

### Methods for uniform values

In [9]:
a = np.zeros(5, int)
# a = np.zeros_like(a_2d_int)
a

array([0, 0, 0, 0, 0])

In [10]:
# a = np.ones(5, int)
a = np.ones_like(a_float)
a

array([1., 1., 1., 1., 1., 1.])

In [11]:
array_shape = (3,)
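array_dtype = int

np.empty(array_shape, array_dtype)
# np.empty_like(a_float)

# np.empty is faster than the other methods because the memory is not initialized at creation.
# The values shown are whatever was already in memory, so fill the array before reading it.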

array([1, 2, 3])

In [12]:
array_shape = (3,)
fill_value = 42.

# np.full(array_shape, fill_value)
np.full_like(a_2d_int, fill_value)

# The dtype of the template array takes priority: 42. is cast to int

array([[42, 42, 42],
       [42, 42, 42]])
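
To make the dtype priority concrete, here is a minimal sketch with a non-integer fill value. The names `template`, `as_template` and `as_float` are only illustrative.

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])   # integer dtype

# The fill value is cast to the template's dtype, so the fraction is dropped.
as_template = np.full_like(template, 42.7)
print(as_template.dtype == template.dtype)    # True
print(as_template[0, 0])                      # 42

# Passing dtype explicitly overrides the template's dtype.
as_float = np.full_like(template, 42.7, dtype=float)
print(as_float[0, 0])                         # 42.7
```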

### Methods for monotonic sequences

In [13]:
start = 5
stop = 8
num = 9

np.linspace(start, stop, num)

array([5.   , 5.375, 5.75 , 6.125, 6.5  , 6.875, 7.25 , 7.625, 8.   ])

In [14]:
start = 5
stop = 8
step = .3

np.arange(start, stop, step)

array([5. , 5.3, 5.6, 5.9, 6.2, 6.5, 6.8, 7.1, 7.4, 7.7])

In [15]:
# Type sensitive
start = 5.
stop = 8.
step = .3

np.arange(start, stop, step)

array([5. , 5.3, 5.6, 5.9, 6.2, 6.5, 6.8, 7.1, 7.4, 7.7])

In [16]:
# CAREFUL WITH ARANGE & floats

a_expected = np.arange(0.4, 0.8, 0.1)
print(f"{a_expected = }")
# 0.8 is not included, as expected


a_anomaly = np.arange(0.5, 0.8, 0.1)
print(f"{a_anomaly = }")
# 0.8 is included, contrary to what is expected.
# Floats have finite precision: the computed length (0.8 - 0.5) / 0.1 comes out
# slightly above 3 and is rounded up to 4 elements

a_expected = array([0.4, 0.5, 0.6, 0.7])
a_anomaly = array([0.5, 0.6, 0.7, 0.8])
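
Two common ways to sidestep this surprise are sketched below: ask `np.linspace` for an explicit number of points, or build the sequence from integers and scale it afterwards. This is one possible workaround, not the only one.

```python
import numpy as np

# Option 1: state the number of points and exclude the endpoint.
a_linspace = np.linspace(0.5, 0.8, num=3, endpoint=False)
print(a_linspace.size)      # 3

# Option 2: build the sequence with integers, then scale.
a_scaled = np.arange(5, 8) / 10
print(a_scaled.size)        # 3
```

In both cases the number of elements no longer depends on a floating-point division.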


## Manipulating shapes

### Reshape
![](https://numpy.org/devdocs/_images/np_reshape.png)

In [17]:
a = np.arange(12)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [18]:
a.reshape((3, 4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [19]:
# We can use `-1` once for numpy to deduce the size of the dimension
a.reshape((2, -1))

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
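
The `-1` only works when the remaining dimensions divide the total number of elements. The following sketch shows both the successful inference and the failure case (the flag name `reshape_failed` is just for illustration).

```python
import numpy as np

a = np.arange(12)

# With -1, NumPy computes the missing dimension: 12 / 2 = 6.
print(a.reshape((2, -1)).shape)     # (2, 6)
print(a.reshape((-1, 4)).shape)     # (3, 4)

# 12 elements cannot be split into 5 equal rows, so this fails.
try:
    a.reshape((5, -1))
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)               # True
```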

In [20]:
a = np.ones((2, 3))
a

array([[1., 1., 1.],
       [1., 1., 1.]])

In [21]:
b = np.zeros((2, 3))
b

array([[0., 0., 0.],
       [0., 0., 0.]])

In [22]:
np.concatenate((a, b), axis=0)

array([[1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [23]:
np.concatenate((a, b), axis=1)

array([[1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.]])

## Random

In [24]:
import random

# Python way
random.randint(0, 10)       # sample in [0, 10]

# NumPy way
np.random.randint(0, 10)    # sample in [0, 10)

5

In [25]:
np.random.rand(3)           # sample uniformly in [0, 1)

array([0.69539249, 0.13275387, 0.70164942])

In [26]:
low = 2
high = 5
shape = (3,)

np.random.uniform(          # sample uniformly in [low, high)
    low, 
    high, 
    shape
)

array([4.15280253, 3.88562437, 2.21349074])

### Sample in normal distribution

![](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Standard_deviation_diagram_micro.svg/1024px-Standard_deviation_diagram_micro.svg.png)


![](https://www.gstatic.com/education/formulas2/553212783/en/normal_distribution.svg)

In [27]:
shape = (3,)

np.random.randn(          # sample from normal distribution with μ = 0 and σ = 1
    *shape
)

array([ 0.02617697, -0.16640602,  0.47099794])

In [28]:
μ = 10
σ = 3
shape = (3,)

np.random.normal(          # sample from normal distribution with μ and σ
    μ, 
    σ, 
    shape
)

array([ 8.82743052, 11.91198652,  8.1976463 ])
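
The calls above rely on NumPy's global random state, so the outputs change on every run. As a hedged aside, the NumPy documentation suggests the `Generator` API (`np.random.default_rng`) for new code. The sketch below reproduces the same kinds of samples with it; the seed `42` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)     # 42 is an arbitrary seed

ints = rng.integers(0, 10, size=5)          # integers in [0, 10)
uniform = rng.uniform(2, 5, size=3)         # floats in [2, 5)
normal = rng.normal(10, 3, size=3)          # normal with μ = 10, σ = 3

# Re-creating the generator with the same seed replays the same numbers.
rng_again = np.random.default_rng(42)
ints_again = rng_again.integers(0, 10, size=5)
print(np.array_equal(ints, ints_again))     # True
```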

## Indexing

In [29]:
a = np.arange(0, 6)
a

array([0, 1, 2, 3, 4, 5])

In [30]:
a[1]

1

In [31]:
a[-1]

5

In [32]:
# Slices like python lists

a[2:4]

array([2, 3])

In [33]:
a[-2:]

array([4, 5])

In [34]:
a[::2]

array([0, 2, 4])

### Multiple dimensions indexing

In [35]:
a_2d = np.arange(12).reshape((3, -1))
a_2d

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [36]:
# We separate dimensions by comma, 
#   and we can apply our previous indexing methods
a_2d[: , 2]

array([ 2,  6, 10])
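
A few more combinations on the same array, as a quick sketch: one index or slice per dimension, separated by commas.

```python
import numpy as np

a_2d = np.arange(12).reshape((3, -1))

row = a_2d[1, :]            # second row
print(row)                  # [4 5 6 7]

block = a_2d[0:2, 1:3]      # rows 0-1, columns 1-2
print(block)
# [[1 2]
#  [5 6]]

element = a_2d[-1, -1]      # last row, last column
print(element)              # 11
```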

#### RGB to BGR

In [37]:
a = np.arange(24).reshape((2, 4, 3))
a[:,:,0] = 0
a[:,:,1] = 1
a[:,:,2] = 2
a

array([[[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]],

       [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]])

In [38]:
a = a[:,:,[2,1,0]]
a

array([[[2, 1, 0],
        [2, 1, 0],
        [2, 1, 0],
        [2, 1, 0]],

       [[2, 1, 0],
        [2, 1, 0],
        [2, 1, 0],
        [2, 1, 0]]])
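
Reversing the channel axis with a slice gives the same values as the index list. The sketch below compares the two on a hypothetical tiny "image"; the difference is that the slice returns a view while the index list returns a copy.

```python
import numpy as np

# Hypothetical tiny "image": height 2, width 4, 3 channels (R, G, B).
rgb = np.arange(24).reshape((2, 4, 3))

bgr_fancy = rgb[:, :, [2, 1, 0]]    # index list: returns a copy
bgr_slice = rgb[:, :, ::-1]         # reversed slice: returns a view

print(np.array_equal(bgr_fancy, bgr_slice))     # True
print(np.shares_memory(rgb, bgr_slice))         # True
print(np.shares_memory(rgb, bgr_fancy))         # False
```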

### Ellipsis

In [39]:
a = np.arange(120).reshape(2, 3, 4, 5)
a

array([[[[  0,   1,   2,   3,   4],
         [  5,   6,   7,   8,   9],
         [ 10,  11,  12,  13,  14],
         [ 15,  16,  17,  18,  19]],

        [[ 20,  21,  22,  23,  24],
         [ 25,  26,  27,  28,  29],
         [ 30,  31,  32,  33,  34],
         [ 35,  36,  37,  38,  39]],

        [[ 40,  41,  42,  43,  44],
         [ 45,  46,  47,  48,  49],
         [ 50,  51,  52,  53,  54],
         [ 55,  56,  57,  58,  59]]],


       [[[ 60,  61,  62,  63,  64],
         [ 65,  66,  67,  68,  69],
         [ 70,  71,  72,  73,  74],
         [ 75,  76,  77,  78,  79]],

        [[ 80,  81,  82,  83,  84],
         [ 85,  86,  87,  88,  89],
         [ 90,  91,  92,  93,  94],
         [ 95,  96,  97,  98,  99]],

        [[100, 101, 102, 103, 104],
         [105, 106, 107, 108, 109],
         [110, 111, 112, 113, 114],
         [115, 116, 117, 118, 119]]]])

In [40]:
a[:, :, :, 0]

array([[[  0,   5,  10,  15],
        [ 20,  25,  30,  35],
        [ 40,  45,  50,  55]],

       [[ 60,  65,  70,  75],
        [ 80,  85,  90,  95],
        [100, 105, 110, 115]]])

In [41]:
a[..., 0]

array([[[  0,   5,  10,  15],
        [ 20,  25,  30,  35],
        [ 40,  45,  50,  55]],

       [[ 60,  65,  70,  75],
        [ 80,  85,  90,  95],
        [100, 105, 110, 115]]])

In [42]:
a[0, :, :, 0]

array([[ 0,  5, 10, 15],
       [20, 25, 30, 35],
       [40, 45, 50, 55]])

In [43]:
a[0, ..., 0]

array([[ 0,  5, 10, 15],
       [20, 25, 30, 35],
       [40, 45, 50, 55]])

## Copy

### Python list
```py
a = [1, 2, 3]
b = a           # No copy: b is the same list object
b = a[:]        # Copy
b = a.copy()    # Copy
```

### NumPy array
```py
a = np.array([1, 2, 3])
b = a           # No copy: b is the same array object
b = a[:]        # No copy /!\ b is a view on the same data
b = a.copy()    # Copy
```
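
The difference shows up as soon as one of the arrays is modified. A minimal sketch, using `np.shares_memory` to check whether two arrays use the same data:

```python
import numpy as np

a = np.array([1, 2, 3])

view = a[:]             # shares a's data
copy = a.copy()         # owns its own data

view[0] = 99            # also changes a
copy[1] = -1            # does not change a

print(a)                            # [99  2  3]
print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copy))    # False
```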

In [44]:
a = np.arange(0, 6)
a

array([0, 1, 2, 3, 4, 5])

### Fancy indexing:

In [45]:
# Fancy indexing: 
#   indexing with an array of the desired indices

a[[1, 2, -1]]

array([1, 2, 5])
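
Unlike a slice, reading with fancy indexing produces a new array, while assigning through fancy indexing modifies the original in place. A short sketch:

```python
import numpy as np

a = np.arange(0, 6)

picked = a[[1, 2, -1]]      # new array holding a[1], a[2], a[-1]
picked[0] = 100             # a is left untouched
print(a)                    # [0 1 2 3 4 5]

a[[1, 2, -1]] = 0           # assignment through fancy indexing is in place
print(a)                    # [0 0 0 3 4 0]
```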

### Boolean indexing:

In [46]:
# Boolean indexing

a = np.arange(0, 10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [47]:
a[ a > 2 ]

array([3, 4, 5, 6, 7, 8, 9])

In [48]:
a[ ~(a == 2) ]

array([0, 1, 3, 4, 5, 6, 7, 8, 9])

In [49]:
a[ (a > 2) & (a < 7) ]

array([3, 4, 5, 6])

In [50]:
a[ (a < 2) | (a > 7) ]

array([0, 1, 8, 9])

In [52]:
# /!\ 
#   The parentheses are not optional
#   when there are multiple conditions:
#   `|` and `&` bind more tightly than `<` and `>`

# a[ a < 2 | a > 7 ]
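
To see what goes wrong without the parentheses, the sketch below runs the commented-out expression inside a `try` block (the flag name `raised` is illustrative).

```python
import numpy as np

a = np.arange(0, 10)

# Without parentheses Python reads the expression as `a < (2 | a) > 7`,
# a chained comparison that needs the truth value of a whole array.
try:
    a[a < 2 | a > 7]
    raised = False
except ValueError:
    raised = True
print(raised)                       # True

# With parentheses each comparison is evaluated first.
print(a[(a < 2) | (a > 7)])         # [0 1 8 9]
```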

In [53]:
value_if_true = 12
value_if_false = -3

np.where(
    a < 4, 
    value_if_true,
    value_if_false
)

array([12, 12, 12, 12, -3, -3, -3, -3, -3, -3])
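
The two value arguments of `np.where` can also be arrays, in which case elements are picked position by position. A small sketch:

```python
import numpy as np

a = np.arange(0, 10)

# Keep even values, replace odd values with -1.
evens_only = np.where(a % 2 == 0, a, -1)
print(evens_only)       # [ 0 -1  2 -1  4 -1  6 -1  8 -1]
```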

## Operations

In [54]:
a = np.full((4,), 10, dtype=float)
a

array([10., 10., 10., 10.])

In [55]:
b = np.full_like(a, 2)
b

array([2., 2., 2., 2.])

### Basic operations


In [56]:
a + b

array([12., 12., 12., 12.])

In [57]:
a * b

array([20., 20., 20., 20.])

In [58]:
a / b

array([5., 5., 5., 5.])

In [59]:
a // b

array([5., 5., 5., 5.])

### Broadcasting

In [60]:
a = np.arange(1, 4)
a

array([1, 2, 3])

In [61]:
a * 2

array([2, 4, 6])


![](https://numpy.org/doc/stable/_images/broadcasting_1.png)

In [62]:
a + 1

array([2, 3, 4])

In [63]:
a / 2

array([0.5, 1. , 1.5])

In [64]:
a // 2

array([0, 1, 1])
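
Broadcasting is not limited to scalars. Shapes are compared from the right, and a dimension of size 1 (or a missing one) is stretched to match. The sketch below uses made-up arrays `m`, `row` and `col` to show two working cases and one failing case.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

print(m + row)
# [[10 21 32]
#  [13 24 35]]

print(m + col)
# [[100 101 102]
#  [203 204 205]]

# Shapes (2, 3) and (2,) do not line up from the right, so this fails.
try:
    m + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)             # True
```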

### Advanced operations

In [65]:
a ** 2

array([1, 4, 9])

In [66]:
np.sqrt(a)

array([1.        , 1.41421356, 1.73205081])

In [67]:
np.exp(a)

array([ 2.71828183,  7.3890561 , 20.08553692])

In [68]:
np.log(a)

array([0.        , 0.69314718, 1.09861229])

#### Dot product

![](https://d138zd1ktt9iqe.cloudfront.net/media/seo_landing_files/matrix-representation-of-dot-product-1626103121.png)

In [69]:
a = np.arange(1, 11)
b = np.arange(4, 14)

np.dot(a, b)

550
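
For two 1D arrays, the dot product is the sum of the element-wise products. A quick check of the result above:

```python
import numpy as np

a = np.arange(1, 11)
b = np.arange(4, 14)

by_hand = (a * b).sum()     # multiply element-wise, then sum
print(by_hand)              # 550
print(by_hand == np.dot(a, b))      # True
print(by_hand == a @ b)             # True
```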

In [70]:
a = np.arange(0, 3, .7)
a

array([0. , 0.7, 1.4, 2.1, 2.8])

In [71]:
np.floor(a)     # Rounds to lower integer

array([0., 0., 1., 2., 2.])

In [72]:
np.ceil(a)      # Rounds to upper integer

array([0., 1., 2., 3., 3.])

In [73]:
np.round(a)     # Rounds to nearest integer

array([0., 1., 1., 2., 3.])

In [74]:
# Matrix multiplication

a_2d = np.arange(12).reshape(3, -1)
b_2d = np.arange(12).reshape(-1, 3)

print(f"{a_2d.shape = }")
print(f"{b_2d.shape = }")

np.matmul(      # equivalent to `a_2d @ b_2d`
    a_2d,
    b_2d
)

a_2d.shape = (3, 4)
b_2d.shape = (4, 3)


array([[ 42,  48,  54],
       [114, 136, 158],
       [186, 224, 262]])
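
The shape rule for matrix multiplication is that the inner dimensions must match: `(n, k) @ (k, m)` gives `(n, m)`. A minimal sketch with the same arrays (the flag name `matmul_failed` is illustrative):

```python
import numpy as np

a_2d = np.arange(12).reshape(3, 4)
b_2d = np.arange(12).reshape(4, 3)

# (3, 4) @ (4, 3) -> (3, 3): the inner dimensions (4 and 4) match.
product = a_2d @ b_2d
print(product.shape)                                    # (3, 3)
print(np.array_equal(product, np.matmul(a_2d, b_2d)))   # True

# (3, 4) @ (3, 4) has mismatched inner dimensions (4 and 3).
try:
    a_2d @ a_2d
    matmul_failed = False
except ValueError:
    matmul_failed = True
print(matmul_failed)                                    # True
```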

## Statistics

In [75]:
a = np.arange(12, dtype=float)
a

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

In [76]:
a.max()

11.0

In [77]:
a.min()

0.0

In [78]:
a.argmax()

11

In [79]:
a.argmin()

0

### Sum

$$
\sum_{i=1}^n x_i
$$

In [80]:
a.sum()

66.0

### Mean

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i
$$

In [81]:
a.mean()

5.5

### Standard deviation

$$
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2}
$$

In [82]:
a.std()

3.452052529534663

### Variance

$$
\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2
$$

In [83]:
a.var()

11.916666666666666
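
The formulas above divide by `n`, which is what `std()` and `var()` do by default. The sketch below recomputes the variance by hand and shows the `ddof` parameter, which switches the divisor to `n - ddof`.

```python
import numpy as np

a = np.arange(12, dtype=float)

# Population variance, written out from the formula above.
var_by_hand = ((a - a.mean()) ** 2).sum() / a.size
print(np.isclose(var_by_hand, a.var()))             # True
print(np.isclose(np.sqrt(var_by_hand), a.std()))    # True

# ddof=1 divides by n - 1 instead of n (sample variance).
print(a.var(ddof=1))                                # 13.0
```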

### Axis argument
![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*jmXqsVUNaBaUsBAkHgqb3A.png)

In [84]:
a_3d = np.arange(24).reshape(2, 3, 4)
a_3d

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [85]:
a_3d.sum()

276

In [86]:
a_3d.sum(axis=0)

array([[12, 14, 16, 18],
       [20, 22, 24, 26],
       [28, 30, 32, 34]])

In [87]:
a_3d.sum(axis=1)


array([[12, 15, 18, 21],
       [48, 51, 54, 57]])

In [88]:
a_3d.sum(axis=2)

array([[ 6, 22, 38],
       [54, 70, 86]])
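
A useful way to read the `axis` argument: the axis you reduce over disappears from the shape. The sketch below checks this on the same array and also shows a tuple of axes and `keepdims`.

```python
import numpy as np

a_3d = np.arange(24).reshape(2, 3, 4)

# Summing over an axis removes that axis from the shape.
print(a_3d.sum(axis=0).shape)       # (3, 4)
print(a_3d.sum(axis=1).shape)       # (2, 4)
print(a_3d.sum(axis=2).shape)       # (2, 3)

# Several axes can be reduced at once.
print(a_3d.sum(axis=(0, 2)))        # [ 60  92 124]

# keepdims=True keeps the reduced axis with length 1.
print(a_3d.sum(axis=1, keepdims=True).shape)    # (2, 1, 4)
```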

## Numpy special values

In [89]:
np.nan          # the np.NaN alias was removed in NumPy 2.0

nan

In [90]:
np.nan == np.nan

False

In [91]:
np.inf

inf

In [92]:
np.inf * -1

-inf
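
Because NaN is not equal to itself, it has to be detected with `np.isnan`. The sketch below also shows how NaN propagates through a reduction and how the nan-aware variants skip it.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])

# `==` cannot find NaN, because NaN is not equal to itself.
print(np.isnan(a))          # [False  True False]

# NaN propagates through ordinary reductions...
print(np.isnan(a.mean()))   # True

# ...while the nan-aware variants skip it.
print(np.nanmean(a))        # 2.0
print(np.nansum(a))         # 4.0

print(np.isinf(-np.inf))    # True
```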

### It's best to avoid rank-one arrays

Rank-one arrays are arrays with shape `(n,)`. They behave as neither row vectors nor column vectors.

#### Problematic example

In [93]:
a = np.random.randn(5)
a

array([ 0.00534551,  0.95111409,  0.32562704, -0.23681591,  0.71547581])

In [94]:
a.shape

(5,)

In [95]:
# Transposing a rank-one array gives back the same array,
#   which is a common source of errors
a.T

array([ 0.00534551,  0.95111409,  0.32562704, -0.23681591,  0.71547581])

In [96]:
np.dot(a, a.T)

1.5786669694180417

#### Expected behavior

In [97]:
a_2d = a.copy().reshape((5, 1))
a_2d

array([[ 0.00534551],
       [ 0.95111409],
       [ 0.32562704],
       [-0.23681591],
       [ 0.71547581]])

In [98]:
a_2d.shape

(5, 1)

In [99]:
a_2d.T

array([[ 0.00534551,  0.95111409,  0.32562704, -0.23681591,  0.71547581]])

In [100]:
np.dot(a_2d, a_2d.T)

array([[ 2.85744377e-05,  5.08418640e-03,  1.74064140e-03,
        -1.26590094e-03,  3.82458044e-03],
       [ 5.08418640e-03,  9.04618022e-01,  3.09708468e-01,
        -2.25238950e-01,  6.80499123e-01],
       [ 1.74064140e-03,  3.09708468e-01,  1.06032970e-01,
        -7.71136640e-02,  2.32978269e-01],
       [-1.26590094e-03, -2.25238950e-01, -7.71136640e-02,
         5.60817752e-02, -1.69436054e-01],
       [ 3.82458044e-03,  6.80499123e-01,  2.32978269e-01,
        -1.69436054e-01,  5.11905629e-01]])

#### Solution

In [101]:
def add_dimension_to_rank_one_array(a: np.ndarray) -> np.ndarray:
    if a.ndim == 1:             # Check if rank-one array
        a = a.reshape((1, -1))  # Reshape to a (1, n) row vector
    return a

add_dimension_to_rank_one_array(a).shape

(1, 5)
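
An alternative to `reshape` is indexing with `np.newaxis`, which makes the intended orientation explicit. This is a sketch of that option with illustrative names, not a replacement for the helper above.

```python
import numpy as np

a = np.arange(5, dtype=float)       # rank-one, shape (5,)

column = a[:, np.newaxis]           # shape (5, 1)
row = a[np.newaxis, :]              # shape (1, 5)

print(column.shape, row.shape)      # (5, 1) (1, 5)

# Column times row gives the (5, 5) outer product.
outer = column @ row
print(outer.shape)                              # (5, 5)
print(np.array_equal(outer, np.outer(a, a)))    # True
```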