This notebook can be run on mybinder:  [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fgricad-gitlab.univ-grenoble-alpes.fr%2Fchatelaf%2Fconference-ia/master?urlpath=lab/tree/notebooks/0_python_in_a_nutshell/N0b_introduction_scipy.ipynb)

# Introduction to Numpy and Scipy

Data analysis needs efficient computational resources to read, write and process data. Usually, the data set to be processed is a set of arrays.

The main structure provided by [Numpy](https://numpy.org/) is the *fixed-type array*, **ndarray**. It is an efficient way of storing and processing data.


The [Scipy](https://www.scipy.org/) (*Scientific Python*) package builds on Numpy to operate on *ndarray* efficiently. Quoting the *FAQ*, Scipy is a "*set of open source (BSD licensed) scientific and numerical tools for Python. It currently supports special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, parallel programming tools, an expression-to-C++ compiler for fast execution, and others. A good rule of thumb is that if it’s covered in a general textbook on numerical computing (for example, the well-known Numerical Recipes series), it’s probably implemented in scipy*". It sits at the core of most data analysis packages in Python.

In [None]:
# Import the scipy and numpy modules and define aliases in the local namespace
import scipy as sp
import numpy as np

In [None]:
A = np.array(range(10)) # Create an array from a range of integers
print(f"A = {A}") # note that there are 10 elements: 0,1,...,9
B = np.arange(10) # Create array from scratch
print(f"B = {B}")
C = np.arange(3,18,2) # from 3 to 18 excluded, with a step size of 2
print(f"C = {C}")
help(np.array)

## Basics of Arrays 

There are plenty of functions to create and initialize specific arrays (np.zeros, np.ones, np.empty ...). In each case, the element type (int8, uint8, float64 ...) can be set with the `dtype` parameter. More information regarding the different array types can be found here: https://numpy.org/doc/stable/user/basics.types.html and https://numpy.org/doc/stable/reference/arrays.dtypes.html.
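
As a minimal sketch of the creation functions mentioned above, the cell below builds three small arrays with explicit types. The variable names (`zeros_f`, `ones_i`, `empty_u`) are only illustrative.

```python
import numpy as np

zeros_f = np.zeros((2, 3), dtype=np.float64)  # 2x3 array filled with 0.0
ones_i = np.ones(4, dtype=np.int8)            # four 8-bit integers, all equal to 1
empty_u = np.empty((2, 2), dtype=np.uint8)    # allocated but NOT initialized: contents are arbitrary

print(zeros_f.dtype, ones_i.dtype, empty_u.dtype)
print(ones_i.itemsize)  # number of bytes used by each element
```

Note that np.empty only reserves memory, so its content should always be overwritten before use.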

### Getting attributes


In [None]:
# Attributes
print(f"Number of elements in A: {A.size}")
print(f"Number of dimensions of A: {A.ndim}")
print(f"Shape of A: {A.shape}")
print(f"Type of the elements of A: {A.dtype}")

It is possible to modify some attributes explicitly, in particular the *shape*:

In [None]:
B.shape = (2,5) # Change the shape to 2 rows and 5 columns -> the total number of elements must stay the same
print("B = \n {}".format(B))
C = B.reshape(10) # reshape returns an array with the requested shape (a view of the same data when possible)
print(B.shape)
print(C.shape)
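
The following sketch illustrates two details of reshape that are easy to miss: a dimension given as -1 is inferred from the total size, and for a contiguous array like this one the result shares its memory with the original. The names `vec` and `mat` are chosen here for illustration only.

```python
import numpy as np

vec = np.arange(10)
mat = vec.reshape(2, -1)   # -1 lets NumPy infer the missing dimension (here 5)
print(mat.shape)

mat[0, 0] = 100            # mat is a view on vec in this case, so vec changes too
print(vec[0])
print(np.shares_memory(vec, mat))
```

If an independent array is needed, calling `.copy()` on the result avoids this coupling.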

### Accessing elements

In [None]:
print("A = {}".format(A))
print(A[0]) # First element
print(A[1]) # Second element
print(A[-1]) # Last element
print(A[-2]) # Penultimate (second to last) element

In [None]:
# Some slicing
print(A[0:3]) # Return an array of elements of A from the first (index 0) to the third (index 2)
print(A[::2]) # All elements with a step of 2
print(A[-3:-1]) # Negative indices also work in slices (the stop index is excluded)
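
The same indexing and slicing rules extend to several dimensions, with one index or slice per axis separated by commas. Below is a short sketch on a small 2-D array (the name `grid` is only illustrative).

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns, values 0..11
print(grid)
print(grid[1, 2])     # element at row 1, column 2
print(grid[0])        # first row
print(grid[:, 1])     # second column
print(grid[:2, ::2])  # first two rows, every other column
```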

## Computation on Array
### Universal functions
A general comment for interpreted languages: **avoid loops whenever you can**! They are slow and inefficient.

This comment applies to Python. Numpy and Scipy provide a large set of operations that are optimized to work on arrays directly (as in Matlab, R ...). In particular, *universal functions* (ufuncs) are a set of functions for fast element-wise operations (+, -, power ...). Let us see a short example:

In [None]:
def my_add(M,N): # Suppose that M and N have the same shape
    P = np.empty_like(M)
    nl, nr = M.shape
    for i in range(nl):
        for j in range(nr):
            P[i,j] = M[i,j] + N[i,j]
    return P

M, N = np.arange(100000).reshape(1000,100), np.arange(100000).reshape(1000,100)

# Evaluate the execution time with %timeit, which repeats the statement over several runs
print('using loops')
%timeit my_add(M,N) # using loops
print('using ufuncs')
%timeit M + N # using ufuncs, equivalent to np.add(M, N)

Most conventional functions exist as ufuncs: arithmetic, trigonometric, log/exp ... A detailed list is available here: https://numpy.org/doc/stable/reference/ufuncs.html
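
As a small illustration of the ufuncs listed above, the sketch below applies a few of them element-wise to a 1-D array, without any explicit loop. The array name `t` is an arbitrary choice.

```python
import numpy as np

t = np.linspace(0, np.pi, 5)  # 5 evenly spaced points between 0 and pi

print(np.sin(t))       # trigonometric ufunc, applied to each element
print(np.exp(t))       # exponential ufunc
print(np.add(t, 1))    # same result as t + 1
print(np.power(t, 2))  # same result as t ** 2
```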

### Reduction
Numpy provides a set of functions to extract values from the whole array or along a specific dimension of the array.

In [None]:
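A = np.random.rand(5,4) # 5x4 array of uniform random numbers in [0, 1)
print("A = \n{}".format(A))
print(A.sum()) # Sum over all elements
print(A.sum(axis=0)) # Sum over the rows (axis 0): returns one value per column
print(A.sum(axis=1)) # Sum over the columns (axis 1): returns one value per row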

Using the same convention, it is possible to get the cumulative sum (cumsum), the product of the elements (prod, cumprod), the maximum/minimum value (max, min) and their position (argmax, argmin), and the first and second statistical moments (mean, var/std). It is also possible to check whether a condition is fulfilled for all or any elements of the array.

In [None]:
np.any(A>0)

In [None]:
np.all(A>0.5)

In [None]:
print(A>0.5) # Boolean array of True/False values
print(np.where(A>0.5)) # tuple of index arrays (one per dimension)

In [None]:
A[A>0.5] # extract the elements that satisfy the condition
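
The other reductions mentioned above follow the same `axis` convention. Here is a minimal sketch on a small integer array with known values (the name `R` is only illustrative), so that the results can be checked by hand.

```python
import numpy as np

R = np.array([[1, 5, 2],
              [4, 0, 3]])

print(R.cumsum(axis=1))  # cumulative sum along each row
print(R.prod(axis=0))    # product down each column
print(R.argmax(axis=0))  # row index of the maximum in each column
print(R.mean())          # mean over all elements
print(R.var(), R.std())  # variance and standard deviation over all elements
```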

### Some exercises
- Find the maximum and minimum value of A
- Find the maximum of each row
- Find the mean value of each row
- Find the position of the minimum value of each row

### Broadcasting
Broadcasting allows efficient operations between arrays of different sizes, provided their shapes are compatible. An extreme example is adding a scalar to a matrix:

In [None]:
A+3

Easy? Now, if we need to center the data, it is also very easy:

In [None]:
print('Shape of A: {}'.format(A.shape))
print('Shape of the average of A over the rows: {}'.format(A.mean(axis=0).shape))
# Suppose that each row is a sample, and each column a measurement (i.e., a variable)
Ac = A - A.mean(axis=0)
print('Shape of centered A: {}'.format(Ac.shape))
print('Ac=\n{}'.format(Ac))

If we need to standardize the data (subtract the mean and divide by the standard deviation), it can be achieved easily:

In [None]:
As = (A-A.mean(axis=0))/A.std(axis=0)
print(As)
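
A quick way to check a standardization is to verify that each column of the result has zero mean and unit standard deviation. The sketch below does this on its own random array; the names `rng`, `data` and `data_std` are illustrative, and a seeded generator is used only to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded random generator (assumption: NumPy >= 1.17)
data = rng.random((5, 4))       # 5 samples (rows), 4 variables (columns)
data_std = (data - data.mean(axis=0)) / data.std(axis=0)

# Both checks compare up to floating-point tolerance
print(np.allclose(data_std.mean(axis=0), 0.0))
print(np.allclose(data_std.std(axis=0), 1.0))
```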

Note that here the mean and std are 1-D arrays (one value per column) that are broadcast across the rows of A. Arrays can also be broadcast along other dimensions by adding axes with None or np.newaxis (the two are equivalent).

In [None]:
M = np.random.rand(4,5) # array of random numbers
N = np.arange(5) # array of integers 0, 1, ..., 4
P = M + N[None, :] # N is automatically expanded to the shape of M: it is duplicated along the rows
print(M)
print(N)
print(P)
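
Adding the new axis in the second position instead duplicates the vector along the columns. The sketch below, with illustrative names `base`, `col` and `shifted`, also shows that omitting the extra axis fails when the trailing dimensions do not match.

```python
import numpy as np

base = np.ones((3, 4))
col = np.arange(3)                   # shape (3,)

shifted = base + col[:, np.newaxis]  # shapes (3, 4) and (3, 1) broadcast to (3, 4)
print(shifted)

try:
    base + col                       # shapes (3, 4) and (3,) are not compatible
except ValueError as err:
    print("Broadcasting failed:", err)
```

The rule of thumb is that shapes are compared from the last dimension backwards, and each pair of dimensions must be equal or one of them must be 1.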

### Exercise
Create a two-dimensional random array. Center and normalize each *row* in one line of code. Do the same for columns.

More details about broadcasting can be found here: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html

## Plotting in Python
The package [Matplotlib](https://matplotlib.org/) offers several functions to plot data. Below is an example using 2D data; more complicated plots can be constructed when needed.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
x = np.arange(0,10,0.01)
y = x**2
plt.plot(x,y)
plt.grid()
plt.show()