# Introduction to NumPy

This material is inspired by several sources:

* https://github.com/SciTools/courses
* https://github.com/paris-saclay-cds/python-workshop/blob/master/Day_1_Scientific_Python/01-numpy-introduction.ipynb

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 1. Creating NumPy arrays

We can easily create a NumPy array from a sequence using the function `np.array`.

In [None]:
np.array([0, 1, 2, 3])

Sometimes we want an array with a particular structure: only zeros (`np.zeros`), only ones (`np.ones`), evenly spaced values (`np.linspace`), logarithmically spaced values (`np.logspace`), etc.

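A minimal sketch of the creation functions listed above. The variable names are illustrative only; shapes and ranges were picked arbitrarily for the demo.

```python
import numpy as np

zeros = np.zeros((2, 3))     # 2 x 3 array filled with 0.0 (float64 by default)
ones = np.ones(4)            # 1D array of four 1.0 values
lin = np.linspace(0, 1, 5)   # 5 evenly spaced values from 0 to 1, both ends included
logs = np.logspace(0, 3, 4)  # 10**0, 10**1, 10**2, 10**3

print(zeros.shape)
print(lin)
print(logs)
```

Note that `np.linspace` takes the *number of points*, while `np.logspace` takes the start and stop *exponents* (base 10 by default).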
### Exercise

Try out some of these ways of creating NumPy arrays. See if you can:

* create a NumPy array from a list of integers. Use the function [`np.array()`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html) and pass it the Python list. You can refer to the examples in the documentation.

In [None]:
# %load solutions/01_solutions.py

In [None]:
x.dtype

When reading the documentation of [np.array](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html), one parameter worth paying attention to is `dtype`: it forces the data type of the array's elements.

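A short illustration of the `dtype` parameter. The names are illustrative. The default integer type is platform dependent (for example `int64` on most Linux systems), so the sketch only checks that it is *some* integer type.

```python
import numpy as np

ints = np.array([1, 2, 3])                      # dtype inferred from the data
floats = np.array([1, 2, 3], dtype=np.float64)  # dtype forced to 64-bit floats
truncated = np.array([1.7, 2.2], dtype=int)     # fractional parts are discarded

print(ints.dtype, floats.dtype)
print(floats)
print(truncated)
```

Forcing an integer `dtype` on floating-point data silently drops the fractional part, so choose the type deliberately.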
In [None]:
arr.dtype

* create a 3-dimensional NumPy array filled with zeros, or with ones. You can check the documentation of [np.zeros](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.zeros.html) and [np.ones](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html).

In [None]:
# %load solutions/03_solutions.py

In [None]:
x

* create a NumPy array filled with a constant value other than 0 or 1. (Hint: this can be achieved using the last array you created, or you could use [np.empty](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.empty.html) and find a way to fill the array with a constant value.)

In [None]:
# %load solutions/04_solutions.py

* create a NumPy array of 8 elements, starting from 0 with a spacing of 3 between consecutive elements. (Hint: check the function [np.arange](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.arange.html).)

In [None]:
# %load solutions/05_solutions.py

## 2. Manipulating NumPy arrays

### 2.1 Indexing

Note that NumPy arrays are zero-indexed:

In [None]:
data = np.random.randn(10000, 5)

In [None]:
data[0, 0]

This means that the third element in the first row has index [0, 2]:

In [None]:
data[0, 2]

We can also assign a new value to an element:

In [None]:
data[0, 2] = 100.
print(data[0, 2])

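NumPy (like Python in general) checks array bounds: indexing outside the array raises an `IndexError`. Here `data` has only 5 columns, so column index 10 is out of range: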

In [None]:
print(data.shape)
data[60, 10]
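If you want the notebook to keep running past an out-of-bounds access, the error can be caught like any Python exception. A minimal sketch; `grid` and `caught` are illustrative names chosen so the tutorial's `data` array is not overwritten.

```python
import numpy as np

grid = np.zeros((10000, 5))  # same shape as `data`, but a separate array

caught = False
try:
    grid[60, 10]  # axis 1 only has indices 0..4
except IndexError as err:
    caught = True
    print("IndexError:", err)
```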

We can also ask for several elements at once by passing a list of indices:

In [None]:
data[0, [0, 3]]

You can also pass a negative index, which counts from the end of the array:

In [None]:
data[-1, -1]
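Because `data` is random, its printed values change on every run. The same indexing rules on a small deterministic array (an illustrative `demo` array, not part of the original material) make the results easy to check by eye:

```python
import numpy as np

demo = np.arange(12).reshape(3, 4)
# demo is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_row_third = demo[0, 2]  # third element of the first row
picked = demo[0, [0, 3]]      # first and fourth elements of the first row
last = demo[-1, -1]           # last element of the last row
second_to_last = demo[-2]     # whole second-to-last row

print(first_row_third, picked, last, second_to_last)
```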

### 2.2 Slices

As with Python lists or pandas DataFrames, we can use slicing to select a range of elements along an axis:

In [None]:
data[0, 0:2]

Note that the returned array does not include the third column (index 2): the stop index of a slice is excluded.

You can omit the start or the stop index, which means taking the values from the beginning or up to the end, respectively:

In [None]:
data[0, :2]

If you omit both indices and keep only the colon (`:`), you get all the columns of this row:

In [None]:
data[0, :]

### 2.3 Filtering data

In [None]:
data

Comparison operators produce a boolean array:

In [None]:
data > 0

This boolean mask can be used to select the matching elements (the result is a 1D array):

In [None]:
data[data > 0]

It can also be used to assign new values to the selected elements:

In [None]:
data[data > 0] = np.inf
data
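The same masking steps on a small deterministic array (an illustrative `values` array) show exactly which elements are picked. Unlike a basic slice, boolean indexing returns a *copy*, so `positives` keeps its values after the assignment.

```python
import numpy as np

values = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
mask = values > 0          # [False, False, False, True, True]
positives = values[mask]   # copy of the selected elements: [1.0, 2.0]

values[mask] = np.inf      # assign through the mask
print(values)              # the two positive entries are now inf
print(positives)           # unchanged copy
```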

### 2.4 Quiz

Answer the following quiz:

In [None]:
data = np.random.randn(20, 20)

* Print the element in the $1^{st}$ row and $10^{th}$ column of the data.

In [None]:
# %load solutions/08_solutions.py

* Print the elements in the $3^{rd}$ row and the $3^{rd}$ and $15^{th}$ columns.

In [None]:
# %load solutions/09_solutions.py

* Print the elements in the $4^{th}$ row and the columns from the $3^{rd}$ to the $15^{th}$.

In [None]:
# %load solutions/10_solutions.py

* Print all the elements of the $15^{th}$ column whose value is above 0.

In [None]:
# %load solutions/11_solutions.py

## 3. Numerical analysis

Vectorizing code is the key to writing efficient numerical calculations with Python/NumPy. This means that as much of a program as possible should be expressed in terms of whole-array (matrix and vector) operations instead of explicit Python loops.

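A minimal sketch of the difference, squaring every element of an array (the array size and variable names are arbitrary choices for the demo). Both versions give the same result; in a notebook you can compare their speed with the `%timeit` magic.

```python
import numpy as np

samples = np.random.rand(100_000)

# Loop version: one Python-level operation per element (slow).
squared_loop = np.empty_like(samples)
for i in range(samples.size):
    squared_loop[i] = samples[i] ** 2

# Vectorized version: a single array expression evaluated in compiled code.
squared_vec = samples ** 2

print(np.allclose(squared_loop, squared_vec))
```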
### 3.1 Scalar-array operations

We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.

In [None]:
v1 = np.arange(0, 5)
v1

In [None]:
v1 * 2

In [None]:
v1 + 2

In [None]:
np.sin([1, 2, 3])  # other element-wise functions: np.log, np.arctan, ...

### 3.2 Element-wise array-array operations

When we add, subtract, multiply or divide arrays with each other, the operations are applied **element-wise** by default:

In [None]:
A = np.array([[1, 2], [3, 4]])

In [None]:
A * A  # element-wise multiplication

In [None]:
v1 * v1
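Since `*` is element-wise, the matrix product needs a different operator. A short comparison on an illustrative 2 x 2 matrix `M` (the same values as `A` above), using the `@` operator (Python 3.5+), which is equivalent to `np.dot` for 2D arrays:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

elementwise = M * M     # [[ 1,  4], [ 9, 16]]
matrix_product = M @ M  # [[ 7, 10], [15, 22]]

print(elementwise)
print(matrix_product)
```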

### 3.3 Calculations

It is often useful to store datasets in NumPy arrays. NumPy provides a number of functions to calculate statistics of the data they contain.

In [None]:
a = np.random.random(40)

Here are some frequently used operations:

In [None]:
print('Mean value is', np.mean(a))
print('Median value is', np.median(a))
print('Std is', np.std(a))
print('Variance is', np.var(a))
print('Min is', a.min())
print('Index of the minimum value is', a.argmin())
print('Max is', a.max())
print('Sum is', np.sum(a))
print('Product is', np.prod(a))
print('Last element of the cumulative sum (equal to the sum) is', np.cumsum(a)[-1])
print('Cumulative product of the first 5 elements is', np.cumprod(a)[4])
print('Unique values of 10 random integers in [1, 5] are:', np.unique(np.random.randint(1, 6, 10)))
print('85th percentile is:', np.percentile(a, 85))

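With random data the printed statistics are hard to verify. A small fixed array (illustrative `sample`) makes them easy to check by hand. Note that `np.var` and `np.std` divide by N by default (`ddof=0`); pass `ddof=1` for the unbiased sample estimate.

```python
import numpy as np

sample = np.array([1.0, 2.0, 3.0, 4.0])

mean_val = np.mean(sample)            # 2.5
median_val = np.median(sample)        # 2.5
var_val = np.var(sample)              # 1.25, population variance (ddof=0)
sample_var = np.var(sample, ddof=1)   # 5/3, divides by N - 1
std_val = np.std(sample)              # sqrt(1.25), about 1.118
total = np.sum(sample)                # 10.0
product = np.prod(sample)             # 24.0
running = np.cumsum(sample)           # 1, 3, 6, 10

print(mean_val, median_val, var_val, sample_var, std_val)
print(total, product, running)
```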
In [None]:
a = np.random.random(40)
print(a.argsort())  # indices that would sort a
a.sort()  # sorts in place: a itself is modified
print(a.argsort())  # a is now sorted, so this prints 0, 1, ..., 39

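If the original order must be kept, `np.sort` returns a sorted copy instead of sorting in place. A minimal sketch on an illustrative `nums` array:

```python
import numpy as np

nums = np.array([3, 1, 2])
order = nums.argsort()       # [1, 2, 0]: indices that would sort nums
sorted_copy = np.sort(nums)  # new sorted array: [1, 2, 3]
print(nums)                  # still [3 1 2]
print(nums[order])           # same values as sorted_copy

nums.sort()                  # in-place sort; the method returns None
print(nums)                  # now [1 2 3]
```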
#### Calculations with higher-dimensional data

When functions such as `min`, `max`, etc. are applied to multidimensional arrays, it is sometimes useful to apply the calculation to the entire array, and sometimes only on a row or column basis. The `axis` argument specifies how these functions should behave:

In [None]:
m = np.random.rand(3, 3)
m

In [None]:
# global max
m.max()

In [None]:
# max in each column
m.max(axis=0)

In [None]:
# max in each row
m.max(axis=1)

Many other NumPy functions, as well as methods of `ndarray` (and of the older `matrix` class) such as `sum`, `mean`, `std` and `argmax`, accept the same optional `axis` keyword argument.

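A deterministic sketch of `axis` with `sum` and `argmax` (the `grid` array is illustrative). `axis=0` collapses the rows and gives one value per column; `axis=1` collapses the columns and gives one value per row.

```python
import numpy as np

grid = np.array([[1, 5],
                 [3, 2]])

col_sums = grid.sum(axis=0)       # [4, 7]: one value per column
row_sums = grid.sum(axis=1)       # [6, 5]: one value per row
row_argmax = grid.argmax(axis=1)  # [1, 0]: column index of each row's max
overall_mean = grid.mean()        # 2.75: no axis, so the whole array

print(col_sums, row_sums, row_argmax, overall_mean)
```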
## 4. Data reshaping and merging

* How could you change the shape of the 8-element array you created previously to have shape (2, 2, 2)? Hint: this can be done without creating a new array.

In [None]:
arr = np.arange(8)

In [None]:
# %load solutions/07_solutions.py

* Reshape the same 8-element array into a column vector, then into a row vector. You can use `np.reshape` or `np.newaxis`.

In [None]:
# %load solutions/22_solutions.py

* Stack two 1D NumPy arrays of size 10 vertically, then horizontally. You can use the functions [np.vstack](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.vstack.html) and [np.hstack](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.hstack.html). Then repeat both operations with the function [np.concatenate](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.concatenate.html) on two 2D NumPy arrays of shape 5 x 2.

In [None]:
# %load solutions/20_solutions.py

In [None]:
# %load solutions/21_solutions.py