___

<a href='https://oxiane-institut.com/'> <img src='../oxiane.jpg' /></a>
___

# NumPy

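NumPy (or Numpy) is the core array-computing library for Python: it provides an N-dimensional array type together with fast element-wise operations, linear algebra and random sampling routines. It matters for data science with Python because almost all libraries in the PyData ecosystem rely on NumPy as one of their main building blocks.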

Exercises: https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.ipynb

Solutions: https://numpy.org/numpy-tutorials/

Sources for this course:
- [NumPy docs](https://numpy.org/learn/)
- [NumPy Illustrated: The Visual Guide to NumPy](https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d)

In [2]:
import numpy as np

# Numpy Array vs. Python List


In [3]:
# Python list

my_array = [1, 2, 3]
[q * 2 for q in my_array]

[2, 4, 6]

In [4]:
np.array(my_array)

array([1, 2, 3])

In [5]:
a = np.array([1, 2, 3])
a * 2

array([2, 4, 6])
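
The two results above match only because the list version goes through a comprehension. Applied directly to a list, `*` repeats the list instead of scaling its elements. A minimal sketch (the variable names are illustrative, not part of the course):

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array(py_list)

# On a list, * means repetition
repeated = py_list * 2

# On an array, * is applied element by element
scaled = np_array * 2

print(repeated)
print(scaled)
```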

In [6]:
# Python "2D" list

[
    [1,2,3],
    [4,5,6]
]

[[1, 2, 3], [4, 5, 6]]

In [7]:
# Numpy 2D array

np.array(
    [
        [1,2,3],
        [4,5,6]
    ]
)

array([[1, 2, 3],
       [4, 5, 6]])

## Numpy arrays

NumPy is also fast, because its core routines are implemented in C and operate on whole arrays at once.


- **More compact** in RAM than a Python list
- Arrays are designed to hold elements of a **homogeneous type**:
  - operations are only fast when all elements share one type
  - heterogeneous types are still possible in an array, but they defeat the purpose of using NumPy
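
To illustrate the homogeneous-type point, here is a small sketch of what NumPy does when it is given mixed inputs. The array names are hypothetical, and the exact string dtype width may vary, so only the dtype kind is printed for the text case:

```python
import numpy as np

# Mixed ints and floats are upcast to a single float dtype
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)

# Mixing numbers and strings turns every element into a string
mixed_text = np.array([1, "two", 3.0])
print(mixed_text.dtype.kind)    # 'U' stands for unicode string

# dtype=object keeps the original Python objects, at the cost of speed
as_objects = np.array([1, "two", 3.0], dtype=object)
print(as_objects.dtype)
```

In the `object` case the array only stores references to Python objects, so most of the benefits of a compact, typed buffer are lost.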

![numpy memory layout](https://miro.medium.com/v2/resize:fit:720/format:webp/1*D-I8hK4WXC8wtpR5tvR0fw.png)


Some calculations on the RAM usage of NumPy arrays vs. lists:
http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists
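
A rough way to see the difference in RAM usage for yourself is sketched below. It assumes CPython, since `sys.getsizeof` reports interpreter-specific sizes, and it forces `int64` so that the array uses 8 bytes per element:

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Raw data buffer of the array: 8 bytes per int64 element
array_bytes = np_array.nbytes

# A list stores pointers, and each int is a separate Python object
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

print(f"{array_bytes = }")
print(f"{list_bytes = }")
```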

As a consequence, NumPy is:
- Faster than lists:
  - when the operation can be vectorized
- Slower than lists:
  - when you append elements to the end, since each append copies the whole array
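
The two cases above can be tried with a quick, informal measurement. This is only a sketch: the timings depend on the machine, and the array size and repeat count are arbitrary choices.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n)

# Vectorized operation vs. list comprehension
t_list = timeit.timeit(lambda: [x * 2 for x in py_list], number=20)
t_array = timeit.timeit(lambda: np_array * 2, number=20)
print(f"list comprehension: {t_list:.4f} s, array: {t_array:.4f} s")

# np.append returns a new array (a full copy); the original is unchanged
small = np.array([1, 2, 3])
grown = np.append(small, 4)
print(small, grown)
```

When many elements must be appended, a common approach is to collect them in a Python list and convert it to an array once at the end.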




## Numpy shape and dtype

In [8]:
a_int = np.array(
    [ 1, 2, 3, 4, 5 ,6]
)

print(f"{a_int.shape = }")
print(f"{a_int.dtype = }")
a_int

a_int.shape = (6,)
a_int.dtype = dtype('int64')


array([1, 2, 3, 4, 5, 6])

In [9]:
a_2d_int = np.array(
    [
        [1, 2, 3],
        [4, 5 ,6],
    ]
)

print(f"{a_2d_int.shape = }")
print(f"{a_2d_int.dtype = }")
a_2d_int

a_2d_int.shape = (2, 3)
a_2d_int.dtype = dtype('int64')


array([[1, 2, 3],
       [4, 5, 6]])

In [10]:
a_float = np.array(
    [ 1.0, 2.0, 3.0, 4.0, 5.0 ,6.0]
)

print(f"{a_float.shape = }")
print(f"{a_float.dtype = }")
a_float

a_float.shape = (6,)
a_float.dtype = dtype('float64')


array([1., 2., 3., 4., 5., 6.])
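
The dtype is inferred from the input values, and the default integer type shown above may differ on other platforms. It can also be set explicitly or converted afterwards, as in this sketch (the variable names are illustrative):

```python
import numpy as np

# Choose the dtype at creation time
a_f32 = np.array([1, 2, 3], dtype=np.float32)

# Convert an existing array; astype returns a new array
values = np.array([1.7, 2.2, -3.9])
truncated = values.astype(np.int64)   # the fractional part is dropped

print(f"{a_f32.dtype = }")
print(f"{truncated = }")
```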

## Creating arrays

### Methods for uniform values

In [11]:
a = np.zeros(5, int)
# a = np.zeros_like(a_2d_int)
a

array([0, 0, 0, 0, 0])

In [12]:
# a = np.ones(5, int)
a = np.ones_like(a_float)
a

array([1., 1., 1., 1., 1., 1.])

In [13]:
array_shape = (3,)
array_dtype = int

np.empty(array_shape, array_dtype)
# np.empty_like(a_float)

# Empty is faster than other methods, as the memory is not initialized at creation.
# The values shown are therefore arbitrary (whatever was already in memory).

array([1, 2, 3])
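
Because the contents of an empty array are arbitrary, it is only safe to read it after every element has been written. A minimal sketch of that pattern, with a hypothetical `squares` array:

```python
import numpy as np

n = 5
squares = np.empty(n, dtype=np.int64)   # contents are arbitrary at this point

# Every element is written before the array is read
for i in range(n):
    squares[i] = i * i

print(squares)
```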

In [14]:
array_shape = (3,)
fill_value = 42.

# np.full(array_shape, fill_value)
np.full_like(a_2d_int, fill_value)

# The dtype of the reference array takes priority: the float fill value 42. is cast to int

array([[42, 42, 42],
       [42, 42, 42]])
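
The effect of the reference dtype is easier to see with a fill value that is not a whole number. In this sketch, `reference` is a stand-in for `a_2d_int`, and the `dtype` argument overrides the inherited type:

```python
import numpy as np

reference = np.array([[1, 2, 3], [4, 5, 6]])   # integer array

# The fill value is cast to the integer dtype of the reference
kept_int = np.full_like(reference, 42.5)

# An explicit dtype overrides the one inherited from the reference
as_float = np.full_like(reference, 42.5, dtype=float)

print(kept_int)
print(as_float)
```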

In [15]:
# Identity matrix

np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

### Methods for monotonic sequences

In [16]:
start = 5
stop = 8
num = 9

np.linspace(start, stop, num)

array([5.   , 5.375, 5.75 , 6.125, 6.5  , 6.875, 7.25 , 7.625, 8.   ])
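
`np.linspace` includes the `stop` value by default. Two optional parameters are worth knowing: `retstep=True` also returns the spacing, and `endpoint=False` excludes `stop`. A short sketch with the same bounds as above:

```python
import numpy as np

# retstep=True returns the spacing between consecutive points as well
points, step = np.linspace(5, 8, 9, retstep=True)
print(f"{step = }")

# endpoint=False leaves the stop value out
open_ended = np.linspace(5, 8, 6, endpoint=False)
print(open_ended)
```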

In [17]:
start = 5
stop = 8
step = .3

np.arange(start, stop, step)

array([5. , 5.3, 5.6, 5.9, 6.2, 6.5, 6.8, 7.1, 7.4, 7.7])

In [18]:
# Type sensitive: the dtype of the result follows the types of the arguments
start = 5.
stop = 8.
step = .3

np.arange(start, stop, step)

array([5. , 5.3, 5.6, 5.9, 6.2, 6.5, 6.8, 7.1, 7.4, 7.7])

In [19]:
# CAREFUL WITH ARANGE & floats

a_expected = np.arange(0.4, 0.8, 0.1)
print(f"{a_expected = }")
# 0.8 is not included, as expected


a_anomaly = np.arange(0.5, 0.8, 0.1)
print(f"{a_anomaly = }")
# 0.8 is included, although the stop value should be excluded.
# The length is computed as ceil((stop - start) / step), and in floating point
# (0.8 - 0.5) / 0.1 comes out slightly above 3, so a fourth element is produced.

a_expected = array([0.4, 0.5, 0.6, 0.7])
a_anomaly = array([0.5, 0.6, 0.7, 0.8])
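
Two common ways to avoid this pitfall are sketched below: state the number of points with `np.linspace`, or build an integer range and scale it afterwards. Both are suggestions, not the only options:

```python
import numpy as np

# Option 1: give the number of points explicitly
with_linspace = np.linspace(0.5, 0.8, 3, endpoint=False)

# Option 2: integer arange (exact), then scale to floats
with_scaling = np.arange(5, 8) / 10

print(f"{with_linspace = }")
print(f"{with_scaling = }")
```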


## Manipulating shapes

### Reshape
![](https://numpy.org/devdocs/_images/np_reshape.png)

In [20]:
a = np.arange(12)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [21]:
a.reshape((3, 4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [22]:
# We can use `-1` once for numpy to deduce the size of the dimension
a.reshape((2, -1))

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
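
Two details of `reshape` are easy to miss. For a contiguous array like the one above it returns a view that shares memory with the original, and it raises an error when the requested shape does not match the number of elements. A small sketch:

```python
import numpy as np

a = np.arange(12)
b = a.reshape((3, 4))

# Here b is a view on the same data, so writing to b also changes a
print(np.shares_memory(a, b))
b[0, 0] = 99
print(a[0])

# 12 elements cannot be split into 5 rows
reshape_failed = False
try:
    a.reshape((5, -1))
except ValueError as err:
    reshape_failed = True
    print(f"ValueError: {err}")
```

Calling `.copy()` on the reshaped array gives an independent array when the original must stay untouched.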

In [23]:
a = np.ones((2, 3))
a

array([[1., 1., 1.],
       [1., 1., 1.]])

In [24]:
b = np.zeros((2, 3))
b

array([[0., 0., 0.],
       [0., 0., 0.]])

In [25]:
np.concatenate((a, b), axis=0)

array([[1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [26]:
np.concatenate((a, b), axis=1)

array([[1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.]])
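
For 2D arrays, `np.vstack` and `np.hstack` are shorthands for the two concatenations above. The sketch below checks the equivalence on the same `a` and `b`:

```python
import numpy as np

a = np.ones((2, 3))
b = np.zeros((2, 3))

stacked_rows = np.vstack((a, b))   # same as concatenate with axis=0
stacked_cols = np.hstack((a, b))   # same as concatenate with axis=1 for 2D arrays

print(stacked_rows.shape)
print(stacked_cols.shape)
```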

## Random

In [27]:
import random

# Python way
random.randint(0, 10)       # sample in [0, 10]

# NumPy way
np.random.randint(0, 10)    # sample in [0, 10)

9
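
The difference between the two upper bounds can be checked by drawing many samples. The sample count below is an arbitrary choice:

```python
import random
import numpy as np

n = 10_000
py_samples = [random.randint(0, 10) for _ in range(n)]
np_samples = np.random.randint(0, 10, size=n)

print(max(py_samples))    # can reach 10: the upper bound is included
print(np_samples.max())   # never above 9: the upper bound is excluded
```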

In [28]:
np.random.rand(3)           # sample uniformly in [0, 1)

array([0.38207255, 0.94191638, 0.98178386])

In [29]:
low = 2
high = 5
shape = (3,)

np.random.uniform(          # sample uniformly in [low, high)
    low, 
    high, 
    shape
)

array([3.50144317, 3.02286775, 2.34847656])
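
The values above change at every run. When reproducible results are needed, the generator can be seeded. The sketch below shows both the legacy `np.random.seed` call, which matches the functions used in this course, and the newer `np.random.default_rng` generator; the seed value 42 is arbitrary:

```python
import numpy as np

# Legacy global state: same seed, same sequence
np.random.seed(42)
first = np.random.uniform(2, 5, 3)
np.random.seed(42)
second = np.random.uniform(2, 5, 3)
print(np.array_equal(first, second))

# Generator API: each generator carries its own state
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(np.array_equal(rng_a.uniform(2, 5, 3), rng_b.uniform(2, 5, 3)))
```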

### Sample in normal distribution

![](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Standard_deviation_diagram_micro.svg/1024px-Standard_deviation_diagram_micro.svg.png)


![](https://www.gstatic.com/education/formulas2/553212783/en/normal_distribution.svg)

In [30]:
shape = (3,)

np.random.randn(          # sample from the standard normal distribution (μ = 0 and σ = 1)
    *shape
)

array([-0.40888919, -0.53810345,  1.18696572])

In [31]:
μ = 10
σ = 3
shape = (3,)

np.random.normal(          # sample from a normal distribution with mean μ and standard deviation σ
    μ, 
    σ, 
    shape
)

array([12.63091071, 14.08926086, 12.87408545])
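
The two functions are related: shifting and scaling standard normal samples gives a normal distribution with the chosen μ and σ. The sketch below compares both on a large sample, with an arbitrary seed and sample size; the printed statistics are only approximately equal to μ and σ:

```python
import numpy as np

np.random.seed(0)
mu, sigma, n = 10, 3, 100_000

direct = np.random.normal(mu, sigma, n)
rescaled = mu + sigma * np.random.randn(n)

# Both sample means should be close to mu, both standard deviations close to sigma
print(direct.mean(), direct.std())
print(rescaled.mean(), rescaled.std())
```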