# Introduction to numpy

The code in this notebook is largely inspired by [this tutorial from Stanford University's CS231n course](https://cs231n.github.io/python-numpy-tutorial) on Convolutional Neural Networks for Visual Recognition and by [Dataquest's "Data Analyst in Python" course](https://www.dataquest.io/path/data-analyst/).


## Doing math with lists
Last week we discussed lists, which store an **ordered collection of objects**. They are, however, not designed for scientific computing in Python. Try, for instance, the following code:

In [1]:
my_list = [1., 4., 2.5]
my_list * 3

[1.0, 4.0, 2.5, 1.0, 4.0, 2.5, 1.0, 4.0, 2.5]

In [5]:
my_list = [1., 4., 2.5]
my_list + 1.

TypeError: can only concatenate list (not "float") to list

In [4]:
my_list1 = [1., 4.,  2.5]
my_list2 = [5., 3.1, 6. ]
my_list1 + my_list2

[1.0, 4.0, 2.5, 5.0, 3.1, 6.0]

**Exercise**:
Write code to:
- Multiply each element of a list by a scalar
- Add a scalar to each element of a list
- Add the elements of two lists of same length

In [8]:
def multiply_list_by_scalar(my_list, scalar):
    "Multiply each element of my_list by scalar"
    new_list = []
    for e in my_list:
        new_list.append(e * scalar)
    
    return new_list

multiply_list_by_scalar([1., 4., 2.5], 3)

[3.0, 12.0, 7.5]

In [9]:
def add_scalar_to_list(my_list, scalar):
    "Add scalar to each element of my_list"
    new_list = []
    for e in my_list:
        new_list.append(e + scalar)    

    return new_list

add_scalar_to_list([1., 4., 2.5], 1.)

[2.0, 5.0, 3.5]

In [11]:
# Bonus here if you use the "zip" function
def add_elements_of_two_lists(my_list1, my_list2):
    "Return a list containing the element-wise sum of two input lists"
    output = []
    for e1, e2 in zip(my_list1, my_list2):
        output.append(e1 + e2)
    return output

add_elements_of_two_lists([1., 4.,  2.5], [5., 3.1, 6. ])

[6.0, 7.1, 8.5]
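
The three loops above can also be written more compactly with list comprehensions. This is only a sketch of an alternative style, reusing the same function names as the exercise; it still processes the elements one by one in pure Python.

```python
def multiply_list_by_scalar(my_list, scalar):
    "Multiply each element of my_list by scalar"
    return [e * scalar for e in my_list]


def add_scalar_to_list(my_list, scalar):
    "Add scalar to each element of my_list"
    return [e + scalar for e in my_list]


def add_elements_of_two_lists(my_list1, my_list2):
    "Return a list containing the element-wise sum of two input lists"
    return [e1 + e2 for e1, e2 in zip(my_list1, my_list2)]


print(multiply_list_by_scalar([1., 4., 2.5], 3))
print(add_scalar_to_list([1., 4., 2.5], 1.))
print(add_elements_of_two_lists([1., 4., 2.5], [5., 3.1, 6.]))
```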

## Basic operations using Numpy arrays
[Numpy](https://numpy.org/) is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The shape of an array is a tuple of integers giving the size of the array along each dimension.

### Initializing an array from lists 
We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [13]:
import numpy as np  # np is a common alias for numpy

a = np.array([1, 2, 3])   # Create a 1-dimensional array
a

array([1, 2, 3])

In [14]:
print(type(a))  # This will display <class 'numpy.ndarray'>

<class 'numpy.ndarray'>


In [15]:
b = np.array([[2., 3.], [0, 1]])  # Create a 2-dimensional array
b

array([[2., 3.],
       [0., 1.]])

In [16]:
# Get the shape of an array
print(a.shape)
print(b.shape)

(3,)
(2, 2)


Different datatypes (integers, float, boolean...) can be stored in a numpy array, see [numpy's documentation](https://numpy.org/doc/stable/reference/arrays.dtypes.html) for an extensive list.

In [19]:
print(a)
print(b)

# Get the data type of an array
print(a.dtype)
print(b.dtype)

[1 2 3]
[[2. 3.]
 [0. 1.]]
int64
float64


**Warning** The datatype and the axis sizes must be consistent when initializing a numpy array: 

In [20]:
np.array([[0, 1, 2], [0, 1]])

array([list([0, 1, 2]), list([0, 1])], dtype=object)

In [21]:
np.array([1.5, 1])  # Everything is converted to float

array([1.5, 1. ])

In [22]:
np.array(["a", False, 1.5, 1])  # Everything is converted to string

array(['a', 'False', '1.5', '1'], dtype='<U5')
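
Rather than relying on these implicit conversions, the data type can be requested explicitly. The sketch below uses the `dtype` argument of `np.array` and the `astype` method; the variable names are only illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])                 # integer dtype inferred from the values
floats = np.array([1, 2, 3], dtype=float)  # request floats at creation time
converted = ints.astype(float)             # astype returns a new, converted array

print(ints.dtype.kind)   # 'i' stands for a signed integer type
print(floats.dtype)
print(converted)
```

Note that `astype` leaves the original array unchanged.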

### Alternatives for creating an array

In [26]:
# TODO: Create a 2x3 array filled with 0. 
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [27]:
# TODO: Create a 2x3 array filled with 1. 
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [28]:
# TODO: Create an array with integers from 0 to 9
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [29]:
# Create a 3x3x3 array filled with random values between 0 and 1
np.random.random((3, 3, 3))

array([[[0.21807181, 0.38960798, 0.595758  ],
        [0.06635626, 0.23749029, 0.13262771],
        [0.21919289, 0.6529043 , 0.47909137]],

       [[0.98303586, 0.29723003, 0.67325219],
        [0.37803114, 0.322364  , 0.50350814],
        [0.45476925, 0.77685381, 0.41565801]],

       [[0.09090066, 0.37547742, 0.14981935],
        [0.02162273, 0.62759267, 0.6279036 ],
        [0.13925423, 0.28509275, 0.29973595]]])

In [31]:
# TODO: Create a 2x3 array filled with False 
np.zeros((2, 3), dtype=bool)

array([[False, False, False],
       [False, False, False]])

In [32]:
# TODO: Create a 2x3 array filled with True 
np.ones((2, 3), dtype=bool)

array([[ True,  True,  True],
       [ True,  True,  True]])

Other constructors are also available (`np.eye`, `np.full`, `np.linspace`...):

In [33]:
# 2x2 Identity matrix
np.eye(2)

array([[1., 0.],
       [0., 1.]])

In [34]:
# 3x3 array filled with 5
np.full((3,3), 5)

array([[5, 5, 5],
       [5, 5, 5],
       [5, 5, 5]])

In [35]:
# Array of 5 elements linearly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

### Accessing elements of a numpy array
Numpy offers several ways to index into arrays.

#### Slicing
Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:

In [36]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2
b = a[:2, 1:3]
b

array([[2, 3],
       [6, 7]])

In [37]:
# You can also directly modify a value stored in an array as follows:
a[1, 1] = 42
a

array([[ 1,  2,  3,  4],
       [ 5, 42,  7,  8],
       [ 9, 10, 11, 12]])

#### Mixing integer indexing with slice indexing
You can also mix integer indexing with slice indexing.

**WARNING** Doing so yields an array with fewer dimensions than the original array!

In [38]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

# Grab column with index 1 from a:
a[:, 1]

array([ 2,  6, 10])

In [39]:
# Note that the resulting array is 1-dimensional
a[:, 1].shape

(3,)
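
If the column should stay 2-dimensional, a slice of length one can be used in place of the integer index. A minimal sketch comparing the two forms:

```python
import numpy as np

a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

col_1d = a[:, 1]     # integer index: the column axis is dropped
col_2d = a[:, 1:2]   # slice of length one: the column axis is kept

print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
```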

**Exercise**: Select the second value of the first and last rows of `a` to obtain a one-dimensional array:


In [44]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
# TODO, should return array([ 2, 10])
a[(0, 2), 1]

array([ 2, 10])

#### Boolean array indexing
Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition.

In [46]:
a = np.arange(5)
a 

array([0, 1, 2, 3, 4])

In [47]:
bool_idx = np.array([False, True, True, False, False])
a[bool_idx]

array([1, 2])

### Dealing with arrays' shape

In [48]:
# Reminder: you can access an array's shape as follows:
a = np.zeros((2, 6))
a.shape

(2, 6)

#### Reshaping
You can reshape an array as follows:

In [49]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
a.reshape(2, 6)

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

Or alternatively:

In [50]:
np.reshape(a, (2, 6))  # ! Note that we used the tuple (2, 6) here

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [52]:
# TODO: Combine np.arange and the reshape method to generate an array with content:
# [[1,  2,  3,  4],
#  [5,  6,  7,  8],
#  [9, 10, 11, 12]]
np.arange(1, 13).reshape(3, 4)

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

### Main operations using one numpy array
#### Array math
Mathematical operations (+, -, *, /, **...) between a numpy array and a scalar are applied to each element of the array:

In [53]:
a = np.arange(12)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [54]:
a + 2.5  # Add 2.5 to each element

array([ 2.5,  3.5,  4.5,  5.5,  6.5,  7.5,  8.5,  9.5, 10.5, 11.5, 12.5,
       13.5])

In [55]:
a * 5  # Multiply each element by 5

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55])

In [56]:
a ** 3  # Raise each element to the power 3

array([   0,    1,    8,   27,   64,  125,  216,  343,  512,  729, 1000,
       1331])

#### Boolean operations
Numpy arrays support the usual comparison operators (==, !=, <, <=, >, >=), which are applied to each element:

In [57]:
a = np.arange(8)
a

array([0, 1, 2, 3, 4, 5, 6, 7])

In [58]:
a > 3

array([False, False, False, False,  True,  True,  True,  True])

**Exercise**:

Combine array math, boolean operations and Boolean array indexing to select the values of `a` whose square is bigger than 40 **without using any loop**:

In [62]:
a = np.arange(-10, 10, 2)
# TODO: should return array([-10,  -8,   8])
a[a ** 2 > 40]

array([-10,  -8,   8])

#### Other useful functions
Numpy provides many useful functions for performing computations on arrays; one of the most useful is `sum` (many others such as `mean`, `diff` or `count_nonzero` are also available, see [numpy's documentation](https://numpy.org/doc/stable/reference/routines.math.html) for a full list):

In [63]:
a = np.array([[1.5, 2.],
              [2.1, 5.]])
a

array([[1.5, 2. ],
       [2.1, 5. ]])

In [64]:
# Sum all elements of a
np.sum(a)

10.6

In [68]:
# Sum the elements of a along an axis
np.sum(a, axis=0)

array([3.6, 7. ])

In [None]:
# sum can also be called as follows
a.sum()
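
The `axis` argument works the same way for other reductions such as `mean`. The following sketch, using the same array `a`, shows which axis is collapsed in each case and how `keepdims=True` preserves the number of dimensions:

```python
import numpy as np

a = np.array([[1.5, 2.],
              [2.1, 5.]])

col_sums = a.sum(axis=0)    # collapse the rows: one sum per column
row_sums = a.sum(axis=1)    # collapse the columns: one sum per row
row_means = a.mean(axis=1)  # same logic with the mean

# keepdims=True keeps the collapsed axis with size 1
row_sums_2d = a.sum(axis=1, keepdims=True)

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)  # (2,) (2,) (2, 1)
```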

### Main operations using two (or more) numpy arrays
#### Mathematical operators on two arrays
Basic element-wise mathematical operations (+, -, /, **...) between two arrays are also available as follows:

In [74]:
a = np.array([1., 1.5, 2.])
b = np.array([2., 1.5, 0.])
a + b

array([3., 3., 2.])

Equivalently, you can call the corresponding numpy function, for instance:

In [75]:
np.add(a, b)

array([3., 3., 2.])

**WARNING** The `*` operator between two arrays corresponds to element-wise multiplication, not matrix multiplication (you can use `numpy.dot` or the `@` operator for that).

In [78]:
a * b

array([2.  , 2.25, 0.  ])
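
To make the difference concrete, here is a small sketch with two 2x2 matrices (the names `A` and `B` are only illustrative) comparing the element-wise product with the matrix product:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[0., 1.],
              [1., 0.]])

elementwise = A * B         # multiplies entries at the same position
matrix_product = A @ B      # matrix multiplication
same_with_dot = np.dot(A, B)

print(elementwise)      # [[0. 2.] [3. 0.]]
print(matrix_product)   # [[2. 1.] [4. 3.]]
```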

Many more operations are available in [numpy's documentation](https://numpy.org/doc/stable/reference/routines.math.html)

**Exercise:**
Write a function that computes the euclidean norm of an array along an axis:

In [85]:
def norm(arr, axis=None):
    """Return the euclidean norm of an array along an axis.
    
    By default (axis=None), return the euclidean norm using all values.
    """
    sum_of_squares = np.sum(arr ** 2, axis=axis)
    return np.sqrt(sum_of_squares)


a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
print(norm(a, axis=1))
print(norm(a))

[ 5.47722558 13.19090596 21.11871208]
25.495097567963924
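
One way to check such a function is to compare it with `np.linalg.norm`, which ships with numpy. The sketch below redefines `norm` so that it runs on its own; for a 2-dimensional input and `axis=None`, `np.linalg.norm` returns the Frobenius norm, which is the same square root of the sum of squares.

```python
import numpy as np


def norm(arr, axis=None):
    "Return the euclidean norm of an array along an axis."
    return np.sqrt(np.sum(arr ** 2, axis=axis))


a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

# Both comparisons should print True
print(np.allclose(norm(a, axis=1), np.linalg.norm(a, axis=1)))
print(np.allclose(norm(a), np.linalg.norm(a)))
```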


#### Boolean operations
Boolean operations between two arrays can also be performed as follows (more operations can be found in the [documentation](https://numpy.org/doc/stable/reference/routines.logic.html)):

In [86]:
np.arange(12) > 3

array([False, False, False, False,  True,  True,  True,  True,  True,
        True,  True,  True])

In [87]:
a = np.array([True, False, True, False])
b = np.array([False, True, True, False])

# a AND b
a & b  # equivalent to np.logical_and(a, b)

array([False, False,  True, False])

In [88]:
# a OR b
a | b  # equivalent to np.logical_or(a, b)

array([ True,  True,  True, False])

In [89]:
# not a
~a  # np.logical_not(a)

array([False,  True, False,  True])

You can find the indices at which a boolean array is True by using `np.where` (note the format of the output):

In [90]:
a = np.array([True, False, True, False])
# TODO
np.where(a)

(array([0, 2]),)

**Exercise** Find the indices where the following array's values switch from False to True **without using any loop**:

In [94]:
a = np.array([False, True, False, True, True, True, False, False, True, False, True])
# TODO: should print out array([ 1,  3,  8, 10])
np.where(~a[:-1] & a[1:])[0] + 1

array([ 1,  3,  8, 10])

In [None]:
value_is_false = ~a[:-1]
following_value_is_true = a[1:]
# True when the value is false and the following one is True
going_up = value_is_false & following_value_is_true
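
An alternative sketch of the same idea relies on `np.diff`: after converting the booleans to integers, a switch from False to True shows up as a difference of exactly 1. This is only one possible variant, not the reference solution.

```python
import numpy as np

a = np.array([False, True, False, True, True, True,
              False, False, True, False, True])

# diff[i] = a[i + 1] - a[i]; it equals 1 only when going from False to True
steps = np.diff(a.astype(int))
switch_indices = np.where(steps == 1)[0] + 1

print(switch_indices)  # [ 1  3  8 10]
```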


In [96]:
a = np.array([True, False, True, False])
~a

array([False,  True, False,  True])

#### Broadcasting
Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) on arrays of different sizes.

The [rules for broadcasting](https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html) are:

- **Rule 1**: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
- **Rule 2**: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
- **Rule 3**: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.


In [None]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9],
              [10, 11, 12]])

v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)
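
The three rules can be followed step by step on small shapes. In the sketch below, `col` and `w` are extra illustrative arrays that are not part of the example above:

```python
import numpy as np

x = np.arange(1, 13).reshape(4, 3)  # shape (4, 3)
v = np.array([1, 0, 1])             # shape (3,)

# Rule 1: v is treated as shape (1, 3)
# Rule 2: its axis of size 1 is stretched to 4, giving shape (4, 3)
y = x + v
print(y.shape)  # (4, 3)

# A (4, 1) column and a (3,) vector are both stretched to (4, 3)
col = np.arange(4).reshape(4, 1)
print((col + v).shape)  # (4, 3)

# Rule 3: shapes (4, 3) and (4,) disagree on the last axis (3 vs 4)
w = np.array([1, 0, 1, 0])
try:
    x + w
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("Broadcasting failed:", err)
```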

### A few tricks that could make your life easier
#### You will never beat numpy's performance by coding a function yourself
If you can avoid a loop by using a numpy function, by all means DO IT.

In [97]:
%%timeit
n = 10000
scalar = 3.2

l = list(range(n))
for i, e in enumerate(l):
    l[i] = e * scalar

2.44 ms ± 346 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [100]:
%%timeit
n = 10000
scalar = 3.2
l = np.arange(n)
l = scalar * l

40 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


#### Tuples as arguments for specifying shape
Many numpy functions require a tuple to specify the shape of the output. Most of the time, a `TypeError` is raised if several integers are passed directly:

In [103]:
# TODO: correct the following code to get a 3x2 array
np.zeros(3, 2)

TypeError: Cannot interpret '2' as a data type
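
One possible correction, as a short sketch: wrap the dimensions in a tuple so that they are passed as a single shape argument. In the failing call, the second positional argument was interpreted as the data type.

```python
import numpy as np

# The shape is a single tuple; the second positional argument of np.zeros is the dtype
arr = np.zeros((3, 2))
print(arr.shape)  # (3, 2)

# The reshape method is more lenient and accepts both forms
print(arr.reshape(2, 3).shape)    # (2, 3)
print(arr.reshape((2, 3)).shape)  # (2, 3)
```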

#### Using -1 and reshape
Passing -1 for one dimension lets numpy infer its size from the other dimensions:

In [104]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [105]:
# TODO: reshape a so that it has 2 rows
a.reshape(2, -1)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

#### Using -1 and reshape or np.newaxis to add a new dimension
Sometimes you will need to add a dimension to an array, for instance so that the shapes of two arrays match for a particular operation. This can be done as follows:

In [106]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [107]:
# Reshape a as a 10x1 array
a.reshape(-1, 1)

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])

In [108]:
# Same result using np.newaxis
a[:, np.newaxis]

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])
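
A typical use of the extra dimension is to combine two 1-dimensional arrays into a 2-dimensional table through broadcasting. A minimal sketch, with illustrative names `rows` and `cols`:

```python
import numpy as np

rows = np.arange(3)  # shape (3,)
cols = np.arange(4)  # shape (4,)

# (3, 1) combined with (4,) broadcasts to (3, 4)
table = rows[:, np.newaxis] * 10 + cols

print(table.shape)  # (3, 4)
print(table)
```

Without `np.newaxis`, adding shapes `(3,)` and `(4,)` would raise an error because the sizes disagree.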

#### Transposing an array
You can transpose an array by using the `numpy.transpose` function or by calling `array.T`:

In [109]:
a = np.arange(10).reshape(2, 5)
a

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [110]:
a.T  # Equivalent to np.transpose(a)

array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])

#### A slice is a view into the same data
**WARNING** A slice of an array is a view into the same data, so modifying it will modify the original array:

In [111]:
a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

b = a[:2, 1:3]
# b is now: 
# array([[2, 3],
#        [6, 7]])

a[1, 1] = 42


# b is a view of the same data as a,
# so modifying a also modified b
b 

array([[ 2,  3],
       [42,  7]])

To get an independent array instead of a view, call the `copy` method on the slice, e.g. `a[:2, 1:3].copy()`.
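
The following sketch contrasts a view with an explicit copy of the same slice. `np.shares_memory` is used here only to report whether two arrays share memory.

```python
import numpy as np

a = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

view = a[:2, 1:3]           # shares its data with a
copied = a[:2, 1:3].copy()  # owns its own data

a[1, 1] = 42

print(view)    # reflects the change: [[ 2  3] [42  7]]
print(copied)  # unchanged: [[2 3] [6 7]]
print(np.shares_memory(a, view), np.shares_memory(a, copied))  # True False
```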

### Exercises
You are not allowed to use loops!

Swap rows 1 and 2 (0-based indices) in the array `arr`:

In [114]:
arr = np.arange(9).reshape(3,3)
# TODO
arr[[0, 2, 1]]

array([[0, 1, 2],
       [6, 7, 8],
       [3, 4, 5]])

Write a function that rescales the values of an array to a given range:

In [116]:
def rescale_array(arr, new_min=0., new_max=1.):
    "Rescale an array's values between new_min and new_max."
    arr = (arr - arr.min()) / (arr.max() - arr.min())
    arr = arr * (new_max - new_min) + new_min
    return arr

arr = np.arange(9).reshape(3,3)
rescale_array(arr, new_min=-1., new_max=1.)
# Should print:
# array([[-1.  , -0.75, -0.5 ],
#        [-0.25,  0.  ,  0.25],
#        [ 0.5 ,  0.75,  1.  ]])

array([[-1.  , -0.75, -0.5 ],
       [-0.25,  0.  ,  0.25],
       [ 0.5 ,  0.75,  1.  ]])

Write a function that replaces missing values with a constant:

In [118]:
arr = np.array([0., 1.5, .75, np.nan, 0., np.nan])
np.isnan(arr)

array([False, False, False,  True, False,  True])

In [120]:
def replace_nan_by_value(arr, new_value=0.):
    "Replace the nan values in an array by a new value."
    arr[np.isnan(arr)] = new_value
    return arr

arr = np.array([0., 1.5, .75, np.nan, 0., np.nan])
replace_nan_by_value(arr, new_value = 42)

array([ 0.  ,  1.5 ,  0.75, 42.  ,  0.  , 42.  ])
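
Note that the function above modifies its input array in place. If the original array must be preserved, a non-mutating variant can be sketched with `np.where`; the name `replace_nan_by_value_copy` is hypothetical and only used for this illustration.

```python
import numpy as np


def replace_nan_by_value_copy(arr, new_value=0.):
    "Return a new array with nan values replaced; arr itself is left untouched."
    return np.where(np.isnan(arr), new_value, arr)


arr = np.array([0., 1.5, .75, np.nan, 0., np.nan])
cleaned = replace_nan_by_value_copy(arr, new_value=42)

print(cleaned)               # nan values replaced by 42
print(np.isnan(arr).sum())   # 2: the original still contains its nan values
```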

Write a function that returns the number of unique values in a numpy array:

In [122]:
def count_unique(arr):
    "Return the number of unique values in an array"
    return len(np.unique(arr))


arr = np.array([0, 1, 2, 2, 2, 3, 1, 3, 0]).reshape(3, -1)
count_unique(arr)

4
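
If the number of occurrences of each value is also of interest, `np.unique` accepts a `return_counts=True` argument. A short sketch on the same array:

```python
import numpy as np

arr = np.array([0, 1, 2, 2, 2, 3, 1, 3, 0]).reshape(3, -1)

# values holds the sorted unique values, counts their number of occurrences
values, counts = np.unique(arr, return_counts=True)

print(values)  # [0 1 2 3]
print(counts)  # [2 2 3 2]
```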