
#NumPy

[NumPy](http://www.numpy.org/) is a fundamental package for efficient scientific computing in Python.

The `ndarray` object is the core of the NumPy package. It is an $n$-dimensional array whose elements all share the same data type (homogeneous data).

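Many computations are faster with NumPy because of ***vectorization***: operations are applied to whole arrays at once, in compiled code, instead of looping over individual elements in Python. Related to this, NumPy uses *broadcasting*: a set of rules that lets element-by-element operations combine arrays of different but compatible shapes (for example, a matrix and a scalar).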

The fundamental difference between a `list` and an `ndarray` is that the former can contain different data types. NumPy is more efficient because, by enforcing a homogeneous data type, it uses less memory and allows more compact operations.
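
As a minimal sketch of vectorization and broadcasting (the variable names and values below are illustrative assumptions, not part of the original notebook), the same doubling is written once as a Python loop over a `list` and once as a single array expression:

```python
import numpy as np

prices = [1.0, 2.0, 3.0]  # illustrative data, not from the tutorial

# Plain Python: an explicit loop over the elements of a list
doubled_list = [2 * p for p in prices]

# NumPy: one vectorized expression applied to the whole array
prices_arr = np.array(prices)
doubled_arr = 2 * prices_arr

# Broadcasting: the 1-D array of length 3 is added to every row of a (2, 3) array
grid = np.zeros((2, 3))
shifted = grid + prices_arr

print(doubled_list)
print(doubled_arr)
print(shifted)
```

Both approaches give the same numbers; the array version avoids the explicit loop.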

In [None]:
import numpy as np
print(np.__version__)

1.19.5



With the above code we imported the NumPy library. Note that you may have to install it first if you use Jupyter or another Python console. You can do this with the command `pip install numpy`. We also printed the version of NumPy we are currently using.

##Array creation:
Let's get started with array creation by using the command `np.array()`, which converts a `list` into an `ndarray`:

In [None]:
a = np.array([1, 2, 3, 4]) 
print('a:')
print(a)
b = np.array(([1, 2, 3, 4],[5, 6, 7, 8],[9, 10, 11, 12]))  # combining several lists into a 2-dimensional array
print('\n')
print('b:')
print(b)

a:
[1 2 3 4]


b:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


* $a$ is a 1-dimensional array
* $b$ is a 2-dimensional array that is created from a list of lists and can be seen as a matrix

A 3-dimensional array is created from 2-dimensional arrays nested in another list:





In [None]:
c = np.array([b,b])
print(c)

[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]]


###Intrinsic array creation

So far we have created arrays from lists. Now we create arrays from scratch:

* `np.linspace(g,h,N)` creates a 1-D array with $N$ values equally spaced from $g$ to $h$;
* `np.arange(N)` creates a 1-D array with the $N$ integers from $0$ to $N-1$;

* `np.zeros(N)` creates a 1-D array of N zeros;

* `np.ones(N)` creates a 1-D array of N ones;

* `np.zeros((m,k))` creates a 2-D array of zeros with $m$ rows and $k$ columns;

* `np.ones((m,k))` creates a 2-D array of ones with $m$ rows and $k$ columns;

* `np.eye(N)` creates a 2-D array with $N$ rows and $N$ columns with ones on the diagonal and zero otherwise (identity matrix of size $N$).
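
A short sketch of two of these functions with explicit bounds may help. The use of `np.arange` with a start, stop and step is an addition to the list above, and the values are chosen only for illustration:

```python
import numpy as np

# arange with start, stop (excluded) and step
evens = np.arange(2, 10, 2)

# linspace includes both endpoints
quarters = np.linspace(0, 1, 5)

# identity matrix of size 3
identity3 = np.eye(3)

print(evens)
print(quarters)
print(identity3)
```

Note the difference: `np.arange` excludes the stop value, while `np.linspace` includes the upper endpoint.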



In [None]:
array_eq = np.linspace(1,9,10)
print(array_eq)

[1.         1.88888889 2.77777778 3.66666667 4.55555556 5.44444444
 6.33333333 7.22222222 8.11111111 9.        ]


In [None]:
array_ar = np.arange(10)
print(array_ar)

[0 1 2 3 4 5 6 7 8 9]


In [None]:
zeros = np.zeros(10)
print(zeros)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [None]:
ones = np.ones(10)
print(ones)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [None]:
zeros1 = np.zeros((5,10))
print(zeros1)

[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]


In [None]:
ones1 = np.ones((5,10))
print(ones1)

[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]


In [None]:
I = np.eye(10)

We can also create arrays with NaN values:

In [None]:
nan = np.empty(10)
nan[:] = np.nan
print(nan)

[nan nan nan nan nan nan nan nan nan nan]


In [None]:
nan1 = np.empty((5,5))
nan1[:] = np.nan
print(nan1)

[[nan nan nan nan nan]
 [nan nan nan nan nan]
 [nan nan nan nan nan]
 [nan nan nan nan nan]
 [nan nan nan nan nan]]


The same approach works for both 1-D and 2-D arrays.
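
As an alternative sketch (not used in the original notebook), `np.full` creates an array and fills it with a given value in one step, which avoids the separate assignment after `np.empty`:

```python
import numpy as np

# np.full(shape, fill_value) allocates and fills the array in one call
nan_vector = np.full(10, np.nan)
nan_matrix = np.full((5, 5), np.nan)

print(nan_vector.shape, nan_matrix.shape)
print(np.isnan(nan_vector).all(), np.isnan(nan_matrix).all())
```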

## Random number generator

To generate random numbers we import the `random` module from the NumPy library:

In [None]:
from numpy import random

* `random.randint(N)` returns a random integer from 0 to $N-1$ (the upper bound $N$ is excluded);

* `random.rand()` returns a random float in the interval $[0, 1)$.
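
The bounds of the two functions can be checked empirically. The sketch below draws many values at once; fixing the seed with `random.seed(0)` is an assumption added here only to make the draws reproducible:

```python
import numpy as np
from numpy import random

random.seed(0)  # assumption: seed fixed only for reproducibility

int_draws = random.randint(10, size=1000)   # integers in 0..9
float_draws = random.rand(1000)             # floats in [0, 1)

print(int_draws.min(), int_draws.max())
print(float_draws.min() >= 0.0, float_draws.max() < 1.0)
```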


In [None]:
r_int = random.randint(10)
r_float = random.rand()

print(r_int)
print(r_float)

4
0.30642792096806626


We can check the data type of a single value with the built-in function `type()`:

In [None]:
print(type(r_int))
print(type(r_float))

<class 'int'>
<class 'float'>


We can also generate arrays of random values with the commands `randint` and `rand`:

In [None]:
r1 = random.randint(10, size=(3, 5))
print(r1)

[[4 8 2 4 1]
 [9 5 7 4 1]
 [2 5 5 0 8]]


We generated a 2-D array of shape (3, 5) whose values are random integers between 0 and 9.

In [None]:
r2 = random.rand(3,5)
print(r2)

[[0.97158925 0.4261574  0.21430886 0.4349311  0.65853235]
 [0.6152668  0.71804872 0.28164187 0.23444046 0.64629005]
 [0.16732008 0.13800897 0.75076149 0.87308107 0.32077934]]


We generated a 2-D array of shape (3, 5) containing random floats.

We can access the data type of an array with the attribute `dtype`:

In [None]:
print(r1.dtype)
print(r2.dtype)

int64
float64


## Stacking arrays

Concatenating arrays is a very useful and common practice in data science. Let's say we want to horizontally concatenate two matrices with the same number of rows:

In [None]:
stack_h = np.hstack((r1,r2))
print(stack_h)

[[4.         8.         2.         4.         1.         0.97158925
  0.4261574  0.21430886 0.4349311  0.65853235]
 [9.         5.         7.         4.         1.         0.6152668
  0.71804872 0.28164187 0.23444046 0.64629005]
 [2.         5.         5.         0.         8.         0.16732008
  0.13800897 0.75076149 0.87308107 0.32077934]]


We can do the same vertically if the matrices have the same number of columns:

In [None]:
stack_v = np.vstack((r1,r2))
print(stack_v)

[[4.         8.         2.         4.         1.        ]
 [9.         5.         7.         4.         1.        ]
 [2.         5.         5.         0.         8.        ]
 [0.97158925 0.4261574  0.21430886 0.4349311  0.65853235]
 [0.6152668  0.71804872 0.28164187 0.23444046 0.64629005]
 [0.16732008 0.13800897 0.75076149 0.87308107 0.32077934]]
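
In the outputs above the integers are printed as floats: when an integer array is stacked with a float array, the result takes the float type. The sketch below uses small illustrative arrays (not the random ones above) to show this, and also shows that `np.hstack` on 2-D arrays matches `np.concatenate` along `axis=1`:

```python
import numpy as np

ints = np.array([[1, 2], [3, 4]])            # integer array (illustrative values)
floats = np.array([[0.5, 1.5], [2.5, 3.5]])  # float array

side_by_side = np.hstack((ints, floats))
same_result = np.concatenate((ints, floats), axis=1)
on_top = np.vstack((ints, floats))

print(side_by_side.shape, on_top.shape)
print(side_by_side.dtype)
print(np.array_equal(side_by_side, same_result))
```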


##Access array elements
Once an array is created we want to access its elements (keep in mind that indexing starts at 0 in Python):

In [None]:
print(a[0])
print('\n')
print(a[1:])


1


[2 3 4]


When we access an array we talk about *indexing* and *slicing*:
* indexing consists in accessing single entries of the array;
* slicing consists in accessing rows/columns (subarrays) of the array.

Let's use the 2-dimensional array we created before:

In [None]:
print(b)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


Now we access the entry at row index 1 and column index 2 of the 2-dimensional array b:

In [None]:
print(b[1,2])

7


We can also access subarrays:

In [None]:
print(b[0,:])
print('\n')
print(b[:,1])

[1 2 3 4]


[ 2  6 10]


Negative indices can also be used in NumPy: they count from the end of each axis.

Access the top-right entry of the array b:

In [None]:
print(b[-3,-1])

4
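
A few more combinations of slicing and negative indexing, sketched on the same array `b` (redefined here so that the block is self-contained):

```python
import numpy as np

b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

last_row = b[-1, :]       # the last row
bottom_right = b[-1, -1]  # the last entry of the last row
block = b[:2, 1:3]        # rows 0-1 and columns 1-2

print(last_row)
print(bottom_right)
print(block)
```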


##Accessing information on dimension/shape/size

* `ndim` gives the number of dimensions of the array;

* `shape` gives the shape of the array;

* `size` gives the size of the array (total number of elements).

Let's consider the arrays we created before:


In [None]:
print('a:')
print(a)
print('\n')
print('b:')
print(b)
print('\n')
print('c:')
print(c)

a:
[1 2 3 4]


b:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


c:
[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]]


In [None]:
print(a.ndim)

print(b.ndim)

print(c.ndim)


1
2
3


In [None]:
print(a.shape)

print(b.shape)

print(c.shape)

(4,)
(3, 4)
(2, 3, 4)


The shape gives the number of elements along each dimension.
For a 2-D array the shape is (number of rows, number of columns).

You can access a single entry of the shape by indexing.
The number of rows of $b$ is given by:

In [None]:
print(b.shape[0])

3


while the number of columns is given by:

In [None]:
print(b.shape[1])

4


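Notice that $a$ is a vector with no column dimension: for Python, $a$ does not have a `shape[1]` because it is a 1-D array. This can be a problem if you want to concatenate it with a matrix, or if you want to apply linear algebra to it. In that case you can turn it into a column vector, for instance with the command `np.resize(a, new_shape)` (when the number of elements does not change, `np.reshape` gives the same result and is the more common choice, since `np.resize` silently repeats or truncates the data if the sizes differ):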

In [None]:
a_resized = np.resize(a,(a.shape[0],1))
print(a_resized)
print('\n')
print(a_resized.shape)

[[1]
 [2]
 [3]
 [4]]
(4, 1)
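
The same column vector can be obtained in other ways. The sketch below shows `reshape` and `np.newaxis`, which are alternatives not used in the original notebook, and illustrates how `np.resize` behaves when the requested size differs from the original one:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Two equivalent ways to get a (4, 1) column vector
col_reshape = a.reshape(-1, 1)   # -1 lets NumPy infer the number of rows
col_newaxis = a[:, np.newaxis]

# np.resize does not raise when the sizes differ: it repeats the data
repeated = np.resize(a, (2, 3))

print(col_reshape.shape, col_newaxis.shape)
print(repeated)
```

With `reshape`, asking for an incompatible shape such as `(2, 3)` raises an error instead, which makes mistakes easier to notice.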


In [None]:
print(a.size)
print(b.size)
print(c.size)

4
12
24


The size is the total number of elements. For a 1-D array it equals the single entry of the shape, while for an N-D array it is the product of the entries of the shape.

In [None]:
print(len(a))
print(len(b))
print(len(c))

4
3
2


`len()` gives the size of the first dimension: for a 1-D array this equals its size, and for a matrix (2-D array) it is the number of rows.
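
The relations between `shape`, `size` and `len()` can be checked directly. This is a small sketch on an array of zeros with the same shape as `c`:

```python
import numpy as np

c_like = np.zeros((2, 3, 4))  # same shape as the 3-D array c above

size_from_shape = int(np.prod(c_like.shape))  # product of the shape entries
first_dim = c_like.shape[0]

print(c_like.size == size_from_shape)
print(len(c_like) == first_dim)
```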

## Statistical operations

Statistical functions are among the core tools of the NumPy package for preliminary data analysis. We now have a look at the main functions. For further details you can look at [Statistics](https://numpy.org/doc/stable/reference/routines.statistics.html).

* `np.amin()`/`np.amax()` give the minimum/maximum value in an array;
* `np.mean()` gives the mean value in an array;
* `np.std()` gives the standard deviation in an array;
* `np.percentile(x,q)` gives the $q$-th percentile of array $x$.

In [None]:
array = np.random.rand(100,5)
print(array)

[[0.1497877  0.10364235 0.13015798 0.30271202 0.17025151]
 [0.5205516  0.75180653 0.47654947 0.71352741 0.67507964]
 [0.67648215 0.97504108 0.34964243 0.70597421 0.59808539]
 [0.93850115 0.05570064 0.70859212 0.15136929 0.39373385]
 [0.48027065 0.51992717 0.49873619 0.58463595 0.59852071]
 [0.49584349 0.88580623 0.02681783 0.86098337 0.49726247]
 [0.64884898 0.34560399 0.48306078 0.09599486 0.6603347 ]
 [0.92977293 0.13339204 0.04740141 0.80688897 0.29066933]
 [0.8293901  0.60147447 0.32130179 0.54273706 0.37990219]
 [0.6039478  0.88840045 0.73457441 0.33693696 0.04208687]
 [0.13440399 0.77533389 0.83399346 0.7652942  0.54431486]
 [0.01307942 0.63244863 0.87671006 0.06236033 0.53191683]
 [0.48735913 0.6064992  0.53802888 0.97847595 0.34684754]
 [0.57378271 0.24335827 0.42230261 0.2919588  0.13355422]
 [0.86254219 0.31489153 0.2262731  0.44503319 0.82350266]
 [0.90310528 0.15251959 0.88392728 0.52130197 0.44258202]
 [0.43062853 0.91189066 0.23419999 0.5422257  0.27761995]
 [0.2717107  0

In [None]:
print(np.amin(array))
print(np.amax(array))

0.0026854731417872424
0.9990785034110865


In [None]:
print(np.mean(array[:,0]))
print(np.std(array[:,0]))

0.49179849970242345
0.2907963864161126


We just printed the mean and standard deviation of the first column of the array, which is a vector of 100 entries.

In [None]:
print(np.percentile(array[:,0],95))

0.9423185462730549


We printed the 95th percentile of the first column.
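
Instead of selecting one column at a time, these functions also accept an `axis` argument. The sketch below uses a small deterministic array (an assumption made so that the results can be checked by hand) rather than the random one above:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]

col_means = np.mean(data, axis=0)   # one mean per column
row_means = np.mean(data, axis=1)   # one mean per row
col_max = np.amax(data, axis=0)     # one maximum per column
median_col0 = np.percentile(data[:, 0], 50)  # 50th percentile of the first column

print(col_means)
print(row_means)
print(col_max)
print(median_col0)
```

Here `axis=0` collapses the rows (one result per column), while `axis=1` collapses the columns (one result per row).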

## Linear algebra operations
NumPy supports linear algebra operations, mostly through the functions in `numpy.linalg`. All the functions can be found at [Linear algebra](https://numpy.org/doc/stable/reference/routines.linalg.html#module-numpy.linalg). Let's see the basics:
* `np.dot(A,B)` computes the matrix product of two 2-D arrays (and the inner product of two 1-D arrays);
* `np.linalg.inv(A)` computes the inverse of matrix $A$;
* `np.linalg.det(A)` computes the determinant of matrix $A$;
* `np.linalg.matrix_power(A,n)` computes the $n$-th power of the square matrix $A$.


In [None]:
A = random.randint(10, size=(4, 4))  # randint already returns an ndarray
B = random.randint(10, size=(4, 4))
print(A)
print('\n')
print(B)

[[9 2 0 1]
 [7 1 9 6]
 [1 2 0 9]
 [9 4 6 7]]


[[8 5 1 0]
 [5 6 1 3]
 [3 9 4 5]
 [4 5 8 0]]


Basic operations can be applied element-wise to matrices (and we can do the same with scalars):

In [None]:
total = A + B  # named `total` to avoid shadowing the built-in `sum`
print(total)
print('\n')
diff = A - B
print(diff)

[[17  7  1  1]
 [12  7 10  9]
 [ 4 11  4 14]
 [13  9 14  7]]


[[ 1 -3 -1  1]
 [ 2 -5  8  3]
 [-2 -7 -4  4]
 [ 5 -1 -2  7]]


You can do the same with vectors of the same length:

In [None]:
vector_operation = A[:,0] + A[:,1] - B[:,2]
print(vector_operation)

[10  7 -1  5]


In [None]:
product = np.dot(A,B)
print(product)

[[ 86  62  19   6]
 [112 152  92  48]
 [ 54  62  75   6]
 [138 158  93  42]]


We just computed the matrix product. 
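
The matrix product should not be confused with the element-wise product given by `*`. A minimal sketch on two small matrices, with values chosen only for illustration:

```python
import numpy as np

A_small = np.array([[1, 2], [3, 4]])
B_small = np.array([[0, 1], [1, 0]])

elementwise = A_small * B_small           # multiplies matching entries
matrix_product = np.dot(A_small, B_small) # rows of A_small times columns of B_small

print(elementwise)
print(matrix_product)
```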

In [None]:
det = np.linalg.det(A)
print(det)

1344.0000000000002


We just computed the determinant of the matrix $A$.

In [None]:
A_inv = np.linalg.inv(A)
print(A_inv)

[[ 0.16071429  0.07142857  0.01785714 -0.10714286]
 [-0.24107143 -0.35714286 -0.15178571  0.53571429]
 [-0.12202381  0.04761905 -0.0922619   0.0952381 ]
 [ 0.03571429  0.07142857  0.14285714 -0.10714286]]


We computed the inverse of $A$.

In [None]:
A_squared = np.linalg.matrix_power(A,2)
print(A_squared)

[[104  24  24  28]
 [133  57  45 136]
 [104  40  72  76]
 [178  62  78 136]]


We computed $A^2$.
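
These results can be sanity-checked numerically. The sketch below uses a small fixed matrix `M` (an assumption, chosen so that the determinant is easy to verify by hand) and the `@` operator, which is equivalent to `np.dot` for 2-D arrays. Floating-point results are compared with `np.allclose`/`np.isclose` rather than exact equality:

```python
import numpy as np

M = np.array([[2.0, 1.0], [1.0, 3.0]])  # det = 2*3 - 1*1 = 5

M_inv = np.linalg.inv(M)
M_squared = np.linalg.matrix_power(M, 2)

# A matrix times its inverse should be (numerically) the identity
inverse_ok = np.allclose(M @ M_inv, np.eye(2))
det_ok = np.isclose(np.linalg.det(M), 5.0)
power_ok = np.allclose(M_squared, np.dot(M, M))

print(inverse_ok, det_ok, power_ok)
```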

##Conclusion

In this tutorial, we applied the primary functions of the NumPy library. More functions can be found in the official documentation. In the following tutorial, you will get a flavor of Pandas, another essential Python package. You will see that combining functions from these two libraries is fundamental for working with neural networks, deep learning and any other data science application.