# Tutorial: numpy

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/insop/ML_crash_course/blob/main/tutorial_numpy.ipynb)


## Contents

1. [Motivation](#Motivation)
1. [Vectors](#Vectors)
  1. [Vector initialization](#Vector-initialization)
  1. [Vector indexing](#Vector-indexing)
  1. [Vector assignment](#Vector-assignment)
  1. [Vectorized operations](#Vectorized-operations)
  1. [Why vectorized operations](#Why-vectorized-operations)
1. [Matrices](#Matrices)
  1. [Matrix initialization](#Matrix-initialization)
  1. [Matrix indexing](#Matrix-indexing)
  1. [Matrix assignment](#Matrix-assignment)
  1. [Matrix reshaping](#Matrix-reshaping)
  1. [Numeric operations](#Numeric-operations)
1. [Credits](#Credits)
1. [References](#References)

## Motivation
- it lets us compute with vectors and matrices efficiently
- it is much more performant than native Python
- other ML/DL frameworks ([TensorFlow](https://www.tensorflow.org), [PyTorch](https://pytorch.org)) use numpy-style tensor operations

In Jupyter Notebook, `numpy` help is just two clicks away (Help->NumPy Reference).

## Vectors

### Vector initialization

A numpy array (`ndarray`) is a grid of values, all of the same type, arranged in one or more dimensions.

Elements are indexed by tuples of non-negative integers.

The rank is the number of dimensions of the array.

The `shape` is a tuple giving the size of the array along each dimension.
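
As a small illustration of these terms, the sketch below builds one vector and one matrix and inspects them. The names `v` and `m` are made up for this example; `ndim` reports the rank, `shape` the size along each dimension, and `dtype` the common element type.

```python
import numpy as np

v = np.array([1, 2, 3])              # rank-1 array (vector)
m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # rank-2 array (matrix)

print(v.ndim, v.shape)   # 1 (3,)
print(m.ndim, m.shape)   # 2 (2, 3)
print(m.dtype)           # float64
print(m[1, 2])           # 6.0, indexed by a tuple of non-negative integers
```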

In [None]:
import numpy as np

In [None]:
# create a rank-1 array of 5 zeros
np.zeros(5)

In [None]:
np.ones(5)


In [None]:
a=np.zeros(6)
print("a: {}".format(a))
print("a's shape is {}".format(a.shape))

In [None]:
# initialize array with type
np.array([1,2,3,4,5], dtype='float')

In [None]:
# 10 random floats in [0,1)
np.random.random(10)

In [None]:
# 10 random integers in [11, 20)
np.random.randint(11,20, size=10) 

### Vector indexing

In [None]:
x = np.array([10,20,30,40,50])

In [None]:
x[2]

In [None]:
# slicing
x[2:4]

In [None]:
# last value
x[-1]

In [None]:
# pick values by indices
x[[1,2,-1]]

### Vector assignment

!!! Be careful when assigning an array to a new variable: plain assignment does not copy the data, so both names refer to the same array.

In [None]:
# what will happen?
x2 = x
# x2 = x.copy()

In [None]:
x2[0]=10

In [None]:
x2[[1,2]] = 90
x2

In [None]:
x2[[3,4]] = [-1,-2]
x2

In [None]:
# check the original array (x), what should we do?
x
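
To make the difference between plain assignment and `copy()` concrete, here is a minimal sketch. The names `alias` and `independent` are chosen only for this illustration; `np.shares_memory` is used to check whether two arrays share the same underlying data.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])

alias = x                  # no data is copied; both names point to the same array
alias[0] = -1
print(x[0])                # -1: the original changed too

independent = x.copy()     # a separate array with its own data
independent[1] = -2
print(x[1])                # 20: the original is untouched

print(np.shares_memory(x, alias))        # True
print(np.shares_memory(x, independent))  # False
```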

### Vectorized operations

In [None]:
x.sum()

In [None]:
x.mean()

In [None]:
x.argmax()

In [None]:
np.log(x)

In [None]:
np.exp(x)

### Why vectorized operations

Compared to Python lists, `numpy` vectorized operations lead to __huge__ performance gains. The following compares a Python list comprehension with a numpy vectorized operation on 10 million values.

In [None]:
# take the log of every value one by one, returning a Python list
def listlog(vals):
    return [np.log(y) for y in vals]

In [None]:
# get random array
random_nums = np.random.random_sample(int(1e7))+1
random_nums[0:10]

In [None]:
%time _ = np.log(random_nums)

In [None]:
%time _ = listlog(random_nums)
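
`%time` is an IPython magic, so it only works inside a notebook or IPython session. Outside of one, a rough equivalent can be sketched with `time.perf_counter`. The sketch below assumes a smaller array (100,000 values) so it finishes quickly; the timings it prints depend on the machine, so none are shown here.

```python
import time
import numpy as np

vals = np.random.random_sample(100_000) + 1    # values in [1, 2)

start = time.perf_counter()
vectorized = np.log(vals)                      # one call over the whole array
t_vectorized = time.perf_counter() - start

start = time.perf_counter()
looped = [np.log(y) for y in vals]             # one call per element
t_looped = time.perf_counter() - start

print("vectorized: {:.4f}s, loop: {:.4f}s".format(t_vectorized, t_looped))
print(np.allclose(vectorized, looped))         # both approaches give the same values
```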

## Matrices

A matrix (a two-dimensional array) is the main data object of machine learning operations.

### Matrix initialization

In [None]:
np.array([[1,2,3],[4,5,6]])

In [None]:
np.array([[1,2,3],[4,5,6]], dtype='float') 

In [None]:
np.zeros((3,5))

### Matrix indexing

In [None]:
X=np.array([[1,2,3],[4,5,6]])
X

In [None]:
X[0] # row

In [None]:
X[0,0] # row, col

In [None]:
X[0,:] # row; ':' means all, i.e. from start to end

In [None]:
X[:,1] # column

In [None]:
X[:,[0,2]]

In [None]:
# find numbers greater than or equal to 4
X[X>=4] # try `X>=4` first if this is not clear
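
Boolean indexing works in two steps, which the following sketch spells out. The name `mask` is introduced only for this illustration.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

mask = X >= 4        # boolean array with the same shape as X
print(mask)
# [[False False False]
#  [ True  True  True]]

print(X[mask])       # [4 5 6], the selected values as a rank-1 array
print(mask.sum())    # 3, because True counts as 1
```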

### Matrix assignment

As with vector assignment, be careful when assigning: use `copy()` when necessary.

In [None]:
X2=X.copy()

In [None]:
X2[0,0] = 20
X2

In [None]:
X2[0] = 100  # row, all cols
X2

In [None]:
X2[:,-1] = [1,2]
X2
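
The same caution applies to rows and slices taken from a matrix: basic indexing returns a view onto the original data rather than a copy. A minimal sketch, with the names `row` and `safe_row` made up for this example:

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

row = X[0]                 # basic indexing returns a view, not a copy
row[0] = 99
print(X[0, 0])             # 99: writing through the view changed X

safe_row = X[1].copy()     # an independent copy of the second row
safe_row[0] = -5
print(X[1, 0])             # 4: X is unchanged
```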

### Matrix reshaping

It is important to check shapes, and reshape where needed, as data moves through the pipeline and the model.

In [None]:
z = np.arange(1,7)
z

In [None]:
z.shape

In [None]:
z_t = z.reshape(2,3)
z_t

In [None]:
z_t.shape

In [None]:
z_t[0]


In [None]:
z_t.reshape(6)
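
Two details of `reshape` are worth a quick sketch: a dimension given as `-1` is inferred from the others, and the total number of elements must stay the same. The shapes used below are arbitrary choices for illustration.

```python
import numpy as np

z = np.arange(1, 7)                  # [1 2 3 4 5 6]

print(z.reshape(3, -1).shape)        # (3, 2): -1 lets numpy infer the missing dimension
print(z.reshape(2, 3).T.shape)       # (3, 2): transpose swaps the axes

try:
    z.reshape(4, 2)                  # 8 slots for 6 values
except ValueError as err:
    print("reshape failed:", err)
```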


### Numeric operations

In [None]:
A = np.array(range(1,7), dtype='float').reshape(2,3)
A

In [None]:
B = np.array([1, 2, 3])
B

In [None]:
# dot product (here, matrix times vector); A.dot(B), np.dot(A,B), and A@B are all equivalent
A.dot(B)

In [None]:
# this is element-wise multiplication, not a dot product; `broadcasting` is happening
A*B # A has shape (2,3); B has shape (3,) and is broadcast across the rows of A
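
The sketch below puts the two operations side by side, using the same `A` and `B` as above. The names `elementwise` and `product` are introduced only for this illustration.

```python
import numpy as np

A = np.arange(1, 7, dtype=float).reshape(2, 3)   # [[1. 2. 3.], [4. 5. 6.]]
B = np.array([1, 2, 3])

elementwise = A * B          # B is broadcast across both rows of A
print(elementwise.shape)     # (2, 3)
print(elementwise)
# [[ 1.  4.  9.]
#  [ 4. 10. 18.]]

product = A @ B              # matrix-vector product
print(product)               # [14. 32.]

# Summing the element-wise products along each row gives the same result.
print(np.array_equal(product, elementwise.sum(axis=1)))   # True
```

Each entry of `A @ B` is the sum of one row of `A * B`, which is why the two operations are easy to confuse.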

## Credits


This notebook closely follows this tutorial, [Numpy tutorial](https://github.com/cgpotts/cs224u/blob/23b120f5f57ee45bc9414d38dc426f76a86f0578/tutorial_numpy.ipynb).

## References

[Numpy tutorial](https://github.com/cgpotts/cs224u/blob/23b120f5f57ee45bc9414d38dc426f76a86f0578/tutorial_numpy.ipynb)

[Python Numpy Tutorial (with Jupyter and Colab)](https://cs231n.github.io/python-numpy-tutorial/)