# Introduction to NumPy

This material is inspired by several sources:

* https://github.com/SciTools/courses
* https://github.com/paris-saclay-cds/python-workshop/blob/master/Day_1_Scientific_Python/01-numpy-introduction.ipynb

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 1. Creating NumPy arrays

We can create a NumPy array from scratch using the function `np.array`.

In [2]:
np.array([0, 1, 2, 3])

array([0, 1, 2, 3])

Sometimes we want the array to have a particular content: only zeros (`np.zeros`), only ones (`np.ones`), equally spaced values (`np.linspace`), logarithmically spaced values (`np.logspace`), etc.
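
As a minimal sketch of the constructors listed above (the variable names are only illustrative), each call below builds a small array whose content is easy to check by eye:

```python
import numpy as np

zeros = np.zeros(4)          # four zeros, float64 by default
ones = np.ones((2, 3))       # 2 x 3 array of ones
lin = np.linspace(0, 1, 5)   # 5 points from 0 to 1, both ends included
log = np.logspace(0, 3, 4)   # 10**0, 10**1, 10**2, 10**3

print(zeros)
print(ones.shape)
print(lin)
print(log)
```

Note that `np.linspace` takes the number of points, while `np.logspace` takes the exponents of the first and last values (base 10 by default).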

### Exercise

Try out some of these ways of creating NumPy arrays. See if you can:

* create a NumPy array from a list of integer numbers. Use the function [`np.array()`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html) and pass the Python list. You can refer to the example from the documentation.

In [7]:
# %load solutions/01_solutions.py
x = np.array([1, 2, 3.0, 4, 5])

In [8]:
x.dtype

dtype('float64')

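While checking the documentation of [np.array](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html), an interesting parameter to pay attention to is `dtype`. This parameter forces the data type of the elements in the array. In the example above, the single float `3.0` in the list was enough for the whole array to be stored as `float64`.

In [None]:
arr = np.array([1, 2, 3, 4, 5], dtype=np.float64)  # force the data type
arr.dtype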

* create a 3-dimensional NumPy array filled with zeros or ones. You can check the documentation of [np.zeros](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.zeros.html) and [np.ones](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html).

In [9]:
# %load solutions/03_solutions.py
x = np.ones(shape=(2, 3, 4))

In [10]:
x

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

* create a NumPy array filled with a constant value other than 0 or 1. (Hint: this can be achieved using the last array you created, or you could use [np.empty](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.empty.html) and find a way of filling the array with a constant value.)

In [11]:
# %load solutions/04_solutions.py
np.ones(shape=(3, 2)) * 5

array([[5., 5.],
       [5., 5.],
       [5., 5.]])

In [13]:
x = np.empty(shape=(3, 2))
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        x[i, j] = 5

In [15]:
x.shape

(3, 2)
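
The nested loops above work, but the same result can be obtained without looping in Python. The following is a sketch of two alternatives, not the content of the solution file: `np.full` creates and fills the array in one call, and the `fill` method fills an existing array in place.

```python
import numpy as np

# np.full builds the array and fills it in a single call.
a = np.full((3, 2), 5.0)

# Alternatively, allocate with np.empty and fill in place, without Python loops.
b = np.empty((3, 2))
b.fill(5)  # same effect as b[:] = 5

print(np.array_equal(a, b))
```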

* create a NumPy array of 8 elements with a range of values starting from 0 and a spacing of 3 between each element. (Hint: check the function [np.arange](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.arange.html).)

In [17]:
# %load solutions/05_solutions.py
np.arange(0, 3 * 8, 3)
list(range(0, 3 * 8, 3))

[0, 3, 6, 9, 12, 15, 18, 21]

## 2. Manipulating NumPy arrays

### 2.1 Indexing

Note that NumPy arrays are zero-indexed:

In [18]:
data = np.random.randn(10000, 5)

In [19]:
data[0, 0]

-0.8331115872639144

This means that the third element in the first row has an index of `[0, 2]`:

In [20]:
data[0, 2]

-0.28041732235097483

We can also assign a new value to an element:

In [21]:
data[0, 2] = 100.
print(data[0, 2])

100.0


NumPy (and Python in general) checks the bounds of the array:

In [22]:
print(data.shape)
data[60, 10]

(10000, 5)


IndexError: index 10 is out of bounds for axis 1 with size 5
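
When an index may fall outside the array, the error can be caught like any other Python exception. This is a minimal sketch on a hypothetical small array (`small` is not part of the notebook):

```python
import numpy as np

small = np.zeros((4, 5))  # hypothetical 4 x 5 array for illustration

try:
    small[2, 10]  # column 10 does not exist: axis 1 has size 5
except IndexError as err:
    message = str(err)
    print("Caught:", message)
```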

Finally, we can ask for several elements at once:

In [23]:
data[0, [0, 3]]

array([-0.83311159,  0.09025131])

You can even pass a negative index, which counts from the end of the array.

In [24]:
data[-1, -1]

-0.7555092677828522
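
Because `data` is random, the values above are hard to verify by eye. The sketch below uses a hypothetical array `grid` holding the integers 0 to 11 to show that `-1` refers to the last position along an axis and `-2` to the one before it:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # hypothetical 3 x 4 array holding 0..11

last = grid[-1, -1]  # last row, last column
same = grid[grid.shape[0] - 1, grid.shape[1] - 1]  # same element, positive indices
second_to_last_row = grid[-2]  # the whole row before the last one

print(last, same)
print(second_to_last_row)
```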

### 2.2 Slices

We can use the same slicing syntax as with Python lists or pandas DataFrames to get elements along one of the axes.

In [25]:
data[0, 0:2]

array([-0.83311159, -1.77108826])

Note that the returned array does not include the third column (with index 2).

You can omit the first or last index of a slice, which means taking the values from the beginning or up to the end:

In [26]:
data[0, :2]

array([-0.83311159, -1.77108826])

If you omit both indices and leave only the colon (`:`), you get all the columns of this row:

In [27]:
data[0, :]

array([-8.33111587e-01, -1.77108826e+00,  1.00000000e+02,  9.02513125e-02,
        9.14507270e-02])
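
The slicing rules above can be checked on an array with predictable content. The sketch below uses a hypothetical array `grid`; it also shows the optional third slice component (the step), which works as it does for Python lists:

```python
import numpy as np

grid = np.arange(20).reshape(4, 5)  # rows hold 0..4, 5..9, 10..14, 15..19

first_two = grid[0, :2]     # columns 0 and 1 of row 0
from_third = grid[0, 2:]    # columns 2 to the end of row 0
every_other = grid[0, ::2]  # columns 0, 2 and 4 of row 0
column = grid[:, 1]         # column 1 across all rows

print(first_two, from_third, every_other, column)
```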

### 2.3 Filtering data

In [28]:
data

array([[-8.33111587e-01, -1.77108826e+00,  1.00000000e+02,
         9.02513125e-02,  9.14507270e-02],
       [-2.56356209e-01,  1.08827881e-01, -6.70361108e-01,
         1.10235224e-01, -2.46563648e-01],
       [ 1.80046719e+00,  3.20248266e+00, -8.51545307e-01,
         1.00735716e+00, -1.17905167e+00],
       ...,
       [ 6.41858473e-01, -1.23949172e+00, -1.28838428e+00,
         8.48656117e-02, -9.70344397e-01],
       [ 6.84254937e-01,  1.25926465e+00, -2.53637105e-01,
         1.10568800e+00,  7.99706070e-01],
       [-7.24345283e-01,  2.10235427e+00, -3.40693048e-03,
         1.69422507e+00, -7.55509268e-01]])

We can produce a boolean array when using comparison operators.

In [29]:
data > 0

array([[False, False,  True,  True,  True],
       [False,  True, False,  True, False],
       [ True,  True, False,  True, False],
       ...,
       [ True, False, False,  True, False],
       [ True,  True, False,  True,  True],
       [False,  True, False,  True, False]])

This mask can be used to select some specific data.

In [30]:
data[data > 0]

array([1.00000000e+02, 9.02513125e-02, 9.14507270e-02, ...,
       7.99706070e-01, 2.10235427e+00, 1.69422507e+00])

It can also be used to assign new values:

In [31]:
data[data > 0] = np.inf
data

array([[-0.83311159, -1.77108826,         inf,         inf,         inf],
       [-0.25635621,         inf, -0.67036111,         inf, -0.24656365],
       [        inf,         inf, -0.85154531,         inf, -1.17905167],
       ...,
       [        inf, -1.23949172, -1.28838428,         inf, -0.9703444 ],
       [        inf,         inf, -0.25363711,         inf,         inf],
       [-0.72434528,         inf, -0.00340693,         inf, -0.75550927]])
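
The same masking steps can be followed on a small array with known values. This is a sketch with hypothetical names (`values`, `clipped`); note that selecting with a mask returns a 1D array, and that summing a boolean array counts its `True` entries:

```python
import numpy as np

values = np.array([[-1.0, 2.0, -3.0],
                   [4.0, -5.0, 6.0]])

mask = values > 0         # boolean array with the same shape as `values`
positives = values[mask]  # 1D array of the selected elements, row by row
n_positive = mask.sum()   # each True counts as 1

clipped = values.copy()   # copy so that `values` stays untouched
clipped[clipped < 0] = 0  # assign through a mask

print(positives, n_positive)
print(clipped)
```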

### 2.4 Quiz

Answer the following quiz questions:

In [32]:
data = np.random.randn(20, 20)

* Print the element in the $1^{st}$ row and $10^{th}$ column of the data.

In [33]:
# %load solutions/08_solutions.py
data[0, 9]

0.5393863254604203

* Print the elements in the $3^{rd}$ row and the $3^{rd}$ and $15^{th}$ columns.

In [36]:
# %load solutions/09_solutions.py
data[2, [2, 14]]

array([1.09295543, 0.03040971])

* Print the elements in the $4^{th}$ row and columns from the $3^{rd}$ to the $15^{th}$.

In [37]:
# %load solutions/10_solutions.py
data[3, 2:15]

array([ 0.44510063,  0.13113525,  0.42117336,  1.14596017, -2.93637642,
       -0.83312801,  0.43292784, -1.40298019, -0.1894745 , -0.32558389,
       -2.59143553, -0.96943721, -0.77007638])

* Print all the elements in the $15^{th}$ column whose value is above 0.

In [41]:
# %load solutions/11_solutions.py
mask = data[:, 14] > 0
data[mask, 14]

array([0.94622835, 0.03040971, 1.09693719, 0.30777835, 0.26987648,
       0.43566839, 0.51311511, 0.61619646, 0.25576709, 0.91602943,
       1.39374041])

In [43]:
np.allclose([1, 2, 3], [1, 2, 3])

True

In [46]:
1 - 0.5 + 1

1.5

In [50]:
(1 + 1 + 0.5 - 1) == (1 - 0.5 + 1)

True

## 3. Numerical analysis

Vectorizing code is the key to writing efficient numerical calculations with Python/NumPy. This means that as much of a program as possible should be formulated in terms of matrix and vector operations.
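
To make the idea concrete, the sketch below evaluates the same formula twice: once with an explicit Python loop and once as a single array expression. The formula and the names `looped` and `vectorized` are chosen only for illustration. Both versions give the same values, but the vectorized one delegates the loop to NumPy's compiled code and is typically much faster on large arrays.

```python
import numpy as np

x = np.arange(1000, dtype=float)

# Explicit Python loop: one interpreter-level operation per element.
looped = np.empty_like(x)
for i in range(x.size):
    looped[i] = 3 * x[i] ** 2 + 1

# Vectorized: the same formula applied to the whole array at once.
vectorized = 3 * x ** 2 + 1

print(np.allclose(looped, vectorized))
```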

### 3.1 Scalar-array operations

We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.

In [52]:
v1 = np.arange(0, 5)
v1

array([0, 1, 2, 3, 4])

In [53]:
v1 * 2

array([0, 2, 4, 6, 8])

In [54]:
v1 + 2

array([2, 3, 4, 5, 6])

In [56]:
np.sin([1, 2, 3])  # other element-wise functions: np.log, np.arctan, ...

array([0.84147098, 0.90929743, 0.14112001])

### 3.2 Element-wise array-array operations

When we add, subtract, multiply and divide arrays with each other, the default behaviour is **element-wise** operations:

In [57]:
A = np.array([[1, 2], [3, 4]])

In [58]:
A * A  # element-wise multiplication

array([[ 1,  4],
       [ 9, 16]])

In [59]:
v1 * v1

array([ 0,  1,  4,  9, 16])
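
Since `*` is element-wise, it does not compute the matrix product from linear algebra. As a short aside, the sketch below contrasts the two on the same matrix, using the `@` operator (equivalent to `np.matmul` for 2D arrays) for the matrix product:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

elementwise = A * A     # each entry multiplied by itself
matrix_product = A @ A  # rows of A times columns of A

print(elementwise)
print(matrix_product)
```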

### 3.3 Calculations

Often it is useful to store datasets in NumPy arrays. NumPy provides a number of functions to calculate statistics of datasets in arrays.

In [60]:
a = np.random.random(40)

Several frequently used operations are available:

In [61]:
print('Mean value is', np.mean(a))
print('Median value is', np.median(a))
print('Std is', np.std(a))
print('Variance is', np.var(a))
print('Min is', a.min())
print('Element of minimum value is', a.argmin())
print('Max is', a.max())
print('Sum is', np.sum(a))
print('Prod', np.prod(a))
print('Cumsum is', np.cumsum(a)[-1])
print('CumProd of 5 first elements is', np.cumprod(a)[4])
print('Unique values in this array are:', np.unique(np.random.randint(1, 6, 10)))
print('85% Percentile value is: ', np.percentile(a, 85))

Mean value is 0.5711467824931927
Median value is 0.6864579559230671
Std is 0.31202572852291166
Variance is 0.09736005526025376
Min is 0.006292991970671125
Element of minimum value is 10
Max is 0.9961317888109937
Sum is 22.84587129972771
Prod 1.4993621102634058e-17
Cumsum is 22.845871299727715
CumProd of 5 first elements is 0.0008069549748583338
Unique values in this array are: [1 2 3]
85% Percentile value is:  0.856412996894152


In [62]:
a = np.random.random(40)
print(a.argsort())
a.sort()  # sorts in place!
print(a.argsort())

[ 4  6  9 29  5 14 23  8 19 21 27 38 30 36 16  1 17 10 12 39 26 20  0  7
 11  3 33 18 24 28 13 15 37  2 25 22 35 32 31 34]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]
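
The output above shows that `argsort` returns the indices that would sort the array, which is why it yields `0, 1, 2, ...` once the array is already sorted. A small sketch with a hypothetical array `scores` shows how these indices can be used without modifying the original array:

```python
import numpy as np

scores = np.array([0.3, 0.1, 0.9, 0.5])

order = scores.argsort()          # indices that would sort the array
sorted_copy = scores[order]       # sorted copy; `scores` itself is unchanged
largest_two = scores[order[-2:]]  # the two largest values, in ascending order

print(order)
print(sorted_copy)
print(largest_two)
```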


#### Calculations with higher-dimensional data

When functions such as `min`, `max`, etc. are applied to a multidimensional array, it is sometimes useful to apply the calculation to the entire array, and sometimes only on a row or column basis. Using the `axis` argument, we can specify how these functions should behave:

In [63]:
m = np.random.rand(3, 3)
m

array([[0.22119293, 0.92572553, 0.82771447],
       [0.92673716, 0.21930434, 0.14973744],
       [0.34904653, 0.50401576, 0.50897114]])

In [64]:
# global max
m.max()

0.9267371567212536

In [65]:
# max in each column
m.max(axis=0)

array([0.92673716, 0.92572553, 0.82771447])

In [66]:
# max in each row
m.max(axis=1)

array([0.92572553, 0.92673716, 0.50897114])

Many other NumPy functions and `ndarray` methods accept the same (optional) `axis` keyword argument.
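
As an illustration with `sum` and `mean`, the sketch below uses a hypothetical 2 x 3 array `table`. With `axis=0` the reduction runs down the rows and returns one value per column; with `axis=1` it runs across the columns and returns one value per row:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

total = table.sum()             # over the whole array
col_sums = table.sum(axis=0)    # one value per column
row_sums = table.sum(axis=1)    # one value per row
row_means = table.mean(axis=1)  # mean of each row

print(total, col_sums, row_sums, row_means)
```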

## 4. Data reshaping and merging

* How could you change the shape of the 8-element array you created previously to have shape (2, 2, 2)? Hint: this can be done without creating a new array.

In [None]:
arr = np.arange(8)

In [None]:
# %load solutions/07_solutions.py
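
One possible approach, which is not necessarily what the solution file contains, is to call `reshape`. For a contiguous array such as the one returned by `np.arange`, the result is a view that shares its data with the original, which the sketch checks with `np.shares_memory`:

```python
import numpy as np

arr = np.arange(8)
cube = arr.reshape(2, 2, 2)  # new shape, same underlying data

shares = np.shares_memory(arr, cube)
print(cube.shape, shares)

cube[0, 0, 0] = 100  # writing through the view also changes `arr`
print(arr[0])
```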

* Could you reshape the same 8-element array into a column vector? Do the same to get a row vector. You can use `np.reshape` or `np.newaxis`.

In [None]:
# %load solutions/22_solutions.py
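
A possible sketch, again not necessarily the content of the solution file, uses `reshape` for the column vector and `np.newaxis` for the row vector. Passing `-1` to `reshape` lets NumPy infer that dimension from the number of elements:

```python
import numpy as np

arr = np.arange(8)

column = arr.reshape(-1, 1)  # 8 rows, 1 column
row = arr[np.newaxis, :]     # 1 row, 8 columns

print(column.shape, row.shape)
```

Either technique works for both cases: `arr[:, np.newaxis]` also gives the column vector, and `arr.reshape(1, -1)` the row vector.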

* Stack two 1D NumPy arrays of size 10 vertically. Then, stack them horizontally. You can use the functions [np.hstack](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.hstack.html) and [np.vstack](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.vstack.html). Repeat those two operations using the function [np.concatenate](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.concatenate.html) with two 2D NumPy arrays of size 5 x 2.

In [None]:
# %load solutions/20_solutions.py

In [None]:
# %load solutions/21_solutions.py
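
The sketch below shows one way to approach this exercise, without claiming to match the solution files; the array names are illustrative. For `np.concatenate`, `axis=0` stacks along the rows (vertically) and `axis=1` along the columns (horizontally):

```python
import numpy as np

a = np.arange(10)
b = np.arange(10, 20)

stacked_v = np.vstack([a, b])  # two rows of 10 elements
stacked_h = np.hstack([a, b])  # a single 1D array of 20 elements

c = np.arange(10).reshape(5, 2)
d = np.arange(10, 20).reshape(5, 2)

concat_v = np.concatenate([c, d], axis=0)  # rows of d below rows of c
concat_h = np.concatenate([c, d], axis=1)  # columns of d next to columns of c

print(stacked_v.shape, stacked_h.shape, concat_v.shape, concat_h.shape)
```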