# NumPy

Unlike MATLAB or R, Python is a general-purpose programming language that was not designed specifically for scientific computing or data analysis. For these tasks, we rely on the well-known NumPy (Numerical Python) package.

In [None]:
import numpy as np

def printbl(*args):
    print(*args, '\n')

## References

* Python for Data Analysis, Chapter 4, by Wes McKinney, O'REILLY
* [NumPy Reference](http://docs.scipy.org/doc/numpy-dev/reference/index.html)
* [SciPy2017 tutorial: Introduction to Numerical Computing with NumPy](https://github.com/enthought/Numpy-Tutorial-SciPyConf-2017)

## Multidimensional Array : ndarray

Execute the following cells and explain (with comments) what they do. Compare the following two cells. What do you conclude from this ?

In [None]:
data1 = [6, 7.5, 8, 0, 1] 
print(type(data1))
print(type(data1[0]))
print(type(data1[1]))

In [None]:
data2 = np.array([6, 7.5, 8, 0, 1])
print(type(data2))
print(type(data2[0]))
print(type(data2[1]))

Compare the following two cells. What do you conclude from this ?

In [None]:
data1 * 2

In [None]:
data2 * 2

What is the number of dimensions of the `ndarray` referred to by `data4` ?

In [None]:
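data3 = [[1, 2, 3], [4, 5]]
# Rows of unequal length cannot form a regular 2-D array: recent NumPy
# versions raise a ValueError unless dtype=object is requested explicitly.
data4 = np.array(data3, dtype=object)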
print(data4)

print()
print('Data Type :', data4.dtype)
print('Shape :', data4.shape)
print('Object type :', type(data4[0]))

What is the number of dimensions of the `ndarray` referred to by `data5` ?

In [None]:
data5 = np.array([[1, 2, 3], [4, 5, 6]])
print(data5)

print()
print('Data Type :', data5.dtype)
print('Shape :', data5.shape)
print('Object type :', type(data5[0]))

Compare and explain the output from the following expressions :

In [None]:
data4 * 2

In [None]:
data5 * 2

What is the shape of the following `ndarray` ?

In [None]:
data6 = np.array([[1, 2, 3], [4, 5, 6], [4, 5, 6], [4, 5, 6]])
print(data6.ndim)

### Exercises

* Using lists, create a 2-dimensional `ndarray` of shape `(2, 5)` that contains all the digits.
* Create the same `ndarray` with the help of [`numpy.arange`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html) and [`reshape`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html#numpy.reshape).

### Indexing

In [None]:
x = np.arange(10, dtype=np.float64)
print('ID :', id(x))

x[2:7] = 8.6
print(x)

print('ID :', id(x))

* What does `y.base` refer to ? What is it used for ?
* What is the meaning of `y[:] = 1` ? Why don't we write `y = 1` ?

In [None]:
y = x[:5]

print('IDs :', id(x), id(y), id(y.base))

y[::2] = 1.12

print(x)
print(y)

y[:] = 1

print(x)

In [None]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2d[0][2], 'is equivalent to', arr2d[0, 2])

In [None]:
arr2d = np.arange(15).reshape(3, 5)
print(arr2d)

print()
arr2d[1:, 2:4] = 0
print(arr2d)

Indexing by arrays of indices (fancy indexing) instead of slicing creates a copy :

In [None]:
arr2d = np.arange(15).reshape(3, 5)
xpos = [1, 2, 2]
ypos = [0, 2, 4]
x = arr2d[xpos, ypos]
print(x)
x[:] = 6
print(arr2d)
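
The difference between a slice (a view) and fancy indexing (a copy) can be checked directly. The following is a minimal sketch, assuming `np.shares_memory` is an acceptable way to test whether two arrays overlap in memory; the names `view` and `copy` are introduced here for illustration only.

```python
import numpy as np

arr2d = np.arange(15).reshape(3, 5)

# Basic slicing returns a view that shares memory with arr2d.
view = arr2d[1:, 2:4]
# Fancy indexing (lists of integer indices) returns a new, independent array.
copy = arr2d[[1, 2, 2], [0, 2, 4]]

print(np.shares_memory(arr2d, view))
print(np.shares_memory(arr2d, copy))

# Writing through the copy leaves the original array untouched.
copy[:] = 6
print(arr2d[1, 0])
```

Note that fancy indexing on the left-hand side of an assignment, such as `arr2d[xpos, ypos] = 6`, does modify `arr2d` in place; only the array obtained by reading with fancy indexing is a copy.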

Slicing and fancy indexing can be mixed :

In [None]:
arr2d[1:, [1, 3]]

### Exercise

Consider the following matrix :

$$
\begin{pmatrix}
  0 & 1 & 2 & 3 & 4 \\
  5 & 6 & 7 & 8 & 9 \\
  10 & 11 & 12 & 13 & 14 \\
  15 & 16 & 17 & 18 & 19 \\
  20 & 21 & 22 & 23 & 24 \\
\end{pmatrix}
$$

Create a 2-dimensional NumPy array equivalent to this matrix and write expressions that extract the following matrices without creating a copy:

$$
\begin{pmatrix}
  1 & 2\\
  6 & 7\\
  11 & 12\\
  16 & 17\\
  21 & 22\\
\end{pmatrix},
\begin{pmatrix}
  5 & 7\\
  15 & 17
\end{pmatrix},
\begin{pmatrix}
  20 & 21 & 22 & 23 & 24 \\
\end{pmatrix}
$$

Use fancy indexing to create the following matrix :


$$
\begin{pmatrix}
1 & 7 & 13 & 19
\end{pmatrix}
$$

### Boolean Indexing

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4)
print(data, end='\n\n')

print(names == 'Bob')
print(data[names == 'Bob'], end='\n\n')

print(data[(names == 'Bob') | (names == 'Will')])
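
Boolean masks are combined with the elementwise operators `|`, `&` and `~` rather than with the Python keywords `or`, `and` and `not`, which raise a `ValueError` on arrays with more than one element. A small sketch reusing the arrays above (the names `not_bob` and `joe_or_will` are illustrative):

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape(7, 4)

# ~ negates a mask; | and & combine masks elementwise.
# The parentheses are needed because | and & bind more tightly than ==.
not_bob = ~(names == 'Bob')
joe_or_will = (names == 'Joe') | (names == 'Will')

# For this particular array, both masks select the same rows.
print(np.array_equal(not_bob, joe_or_will))
print(data[not_bob].shape)
```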

In [None]:
data = np.random.randn(7, 4)
print(data)

data[data < 0] = 0
print()
print(data)

### Exercise

Consider the following matrix :

$$
\begin{pmatrix}
  0 & 1 & 2 & 3 & 4 \\
  5 & 6 & 7 & 8 & 9 \\
  10 & 11 & 12 & 13 & 14 \\
  15 & 16 & 17 & 18 & 19 \\
  20 & 21 & 22 & 23 & 24 \\
\end{pmatrix}
$$

Create a 2-dimensional NumPy array equivalent to this matrix and use boolean indexing to extract all the odd numbers.

### Swapping Axes

In [None]:
X = np.arange(30).reshape((3, 10))
print(X)
np.swapaxes(X, 0, 1)

### Numerical operations (Elementwise)

In [None]:
X = np.zeros((3,6))
print(X)

print()

X = X + 3
print(X)

In [None]:
Y = np.ones((3,6))
X - Y

In [None]:
x = np.array([1, 2, 3, 1.5])
x * 2

In [None]:
print(np.ones((3,3)) - np.identity(3))

In [None]:
empty_array = np.empty(8, dtype=np.float64)

In [None]:
x = np.arange(10)
print(id(x))
x **= 2
print(id(x))
print(x)
x = x ** 2
print(id(x))
print(x)
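
The identities printed above reflect the difference between an in-place update and rebinding a name. Here is a hedged sketch of the same idea using a second name, `alias` (introduced here for illustration), instead of `id`:

```python
import numpy as np

x = np.arange(10)
alias = x          # second name for the same array object

x **= 2            # in-place: the existing buffer is overwritten
print(alias is x)  # both names still refer to the same object
print(alias[3])

x = x ** 2         # builds a new array and rebinds the name x to it
print(alias is x)  # alias still refers to the previous array
print(alias[3], x[3])
```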

In [None]:
X = np.arange(15).reshape((3, 5))
printbl(X)
printbl(X * X) # Elementwise product, not matrix multiplication
printbl(np.dot(X, X.T)) # Matrix product of X with its transpose
printbl(np.sqrt(X))
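
As a complement, the `@` operator gives the same result as `np.dot` for 2-D arrays. A minimal sketch of the contrast between the elementwise and the matrix product (the names `elementwise` and `matrix_prod` are illustrative):

```python
import numpy as np

X = np.arange(15).reshape((3, 5))

elementwise = X * X      # shape (3, 5): each entry is squared
matrix_prod = X @ X.T    # shape (3, 3): rows of X dotted with rows of X

print(elementwise.shape, matrix_prod.shape)
print(np.array_equal(matrix_prod, np.dot(X, X.T)))
```

With these shapes, `X @ X` would raise a `ValueError`, because the inner dimensions (5 and 3) do not match.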

### Exercises

Use the previous operations to verify the following properties for two randomly generated square matrices $A$ and $B$ of order 4 and a scalar $c$ (see [https://en.wikipedia.org/wiki/Transpose](https://en.wikipedia.org/wiki/Transpose)):

1. $(A^T)^T = A$
1. $(A + B)^T = A^T + B^T$
1. $(AB)^T = B^TA^T$
1. $(cA)^T = cA^T$

Does transposing create a copy ?

## Reductions

The operations considered in this section are known as reductions: they combine the elements of an array into fewer values, for example a single value from a 1D array, or one value per row or column when an axis is given.

In [None]:
x = np.array([3, 6, 2, 4])
np.sum(x)

In [None]:
x = np.array([3, 6, 2, 4])
x.sum()

In [None]:
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
printbl(X)
printbl("Reduction applied on the flattened array :", np.sum(X))
printbl("Sum along the first axis (0), in other words, the reduction applied on columns :", np.sum(X, axis=0))
printbl("Sum along the second axis (1) :", np.sum(X, axis=1))
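
A reduction along an axis removes that axis from the result. The sketch below illustrates this with the same matrix, and also shows the optional `keepdims=True` argument, which keeps the reduced axis with length 1 so that the result can be combined with the original array (the variable names are illustrative):

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_sums = np.sum(X, axis=0)                    # one value per column, shape (3,)
row_sums = np.sum(X, axis=1)                    # one value per row, shape (3,)
row_sums_2d = np.sum(X, axis=1, keepdims=True)  # shape (3, 1)

print(col_sums, row_sums)
print(row_sums_2d.shape)

# Dividing by the (3, 1) result normalizes each row so that it sums to 1.
normalized = X / row_sums_2d
print(normalized.sum(axis=1))
```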

In [None]:
x = np.random.randn(100)
print((x > 0).any())
print((x > 0).all())
print(np.all(x > 0))

### Exercises

Let us consider the following array : `a = np.arange(-15, 15).reshape(5, 6) ** 2` and compute :

* The maximum of each row
* The mean of each column (see, [https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html](https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html))

### Sorting and Searching (see [here](http://docs.scipy.org/doc/numpy/reference/routines.sort.html))

In [None]:
x = np.random.randn(1000)
print(np.sort(x)[-1])
print(np.max(x))

In [None]:
x = np.random.randn(4, 4)
printbl(x)
np.where(x > 0, 2, -2)

In [None]:
x = np.random.randn(4, 4)
printbl(x)
np.where(x < 0, 0, x)
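
The three-argument form `np.where(condition, a, b)` builds a new array and leaves its input unchanged. As a sketch, on a small fixed array (chosen here instead of random data so the result is predictable), it matches boolean-mask assignment on a copy:

```python
import numpy as np

x = np.array([[-1.0, 2.0], [3.0, -4.0]])

# New array: 0 where x is negative, the value of x elsewhere.
clipped = np.where(x < 0, 0, x)

# Same result using boolean indexing on an explicit copy.
masked = x.copy()
masked[masked < 0] = 0

print(clipped)
print(np.array_equal(clipped, masked))
print(x[0, 0])  # the original array is unchanged
```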

In [None]:
X = np.random.randn(2,5) * 0.1 + 10
printbl(X)
printbl(np.argmin(X)) # returns the index in the flattened array
printbl(np.argmin(X, axis=0)) # indices with respect to the first axis
printbl(np.argmin(X, axis=1)) # indices with respect to the second axis
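
The indices returned by `np.argmin` along an axis can be used to read the corresponding values back. A minimal sketch, assuming a seeded `np.random.default_rng` generator (not used elsewhere in this notebook) to make the data reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator: an assumption for reproducibility
X = rng.standard_normal((2, 5)) * 0.1 + 10

# One row index per column: the row where each column reaches its minimum.
rows = np.argmin(X, axis=0)

# Pair each row index with its column index to read the minima back.
col_minima = X[rows, np.arange(X.shape[1])]

print(rows)
print(np.array_equal(col_minima, X.min(axis=0)))
```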

## Exercises

##### Using the `%timeit` magic command (see [http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit](http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit)), compare the performance of summing the elements of `np.arange(1000)` (with the `numpy.sum()` function) and of `range(1000)` (with the built-in `sum()` function).

In [None]:
a = np.arange(1000)
l = range(1000)

##### Without using control flow statements, replace with zero all the values of the following matrix that lie outside the interval $[\mu - \sigma, \mu + \sigma]$ (see [statistics](http://docs.scipy.org/doc/numpy/reference/routines.statistics.html)).

In [None]:
X = np.random.randn(5, 4)

##### Write a function that computes the median of each of the following two arrays without using [numpy.median](http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.median.html). Compare your results with the values returned by numpy.median.

In [None]:
x = np.random.randn(1001)
y = np.random.randn(1000)

##### Without using control flow statements, determine the indices along the two axes of the first maximal element in the following matrix.

In [None]:
X = np.random.randn(5, 8)

##### Without the use of control flow statements, compute the mean of each variable from the following data matrix (D) (see, [numpy.mean](http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html#numpy.mean)).

In [None]:
D = np.random.randn(5, 4)

print(D)

for i in range(4):
    data_i = D[:,i]
    print("Data from variable {} : {} (mean : {})".format(i, data_i, np.mean(data_i)))

###### What is the difference between np.random.randn() and np.random.rand() ?

###### Use routines from numpy.linalg (see [Linear algebra](http://docs.scipy.org/doc/numpy/reference/routines.linalg.html)) to solve the following system of equations (from [System of linear equations ](https://en.wikipedia.org/wiki/System_of_linear_equations)) :  $$ \begin{alignat}{7} 3x &&\; + \;&& 2y             &&\; - \;&& z  &&\; = \;&& 1 & \\ 2x &&\; - \;&& 2y             &&\; + \;&& 4z &&\; = \;&& -2 & \\ -x &&\; + \;&& \tfrac{1}{2} y &&\; - \;&& z  &&\; = \;&& 0 & \end{alignat} $$

###### Complete the following code by using routines from numpy.linalg to determine a line of equation $y = mx + p$ that fits the given data points.

In [None]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

x = np.arange(-0.5, 0.5, 0.01)
y = (2 * x + 3) + np.random.randn(100) * 0.1

plt.plot(x, y, 'o', label='Original data', markersize=5)
    