

![alt text](https://codelearnstorage.s3.amazonaws.com/Upload/Blog/thu-vien-numpy-trong-python-63724882299.0039.png)

`numpy` is one of the most essential libraries for Machine Learning and Deep Learning as it allows us to work with large, multi-dimensional arrays and matrices, and perform high-level mathematical functions on these arrays. 

+ Why use NumPy?

NumPy arrays are faster and more compact than Python lists. An array stores its elements in one block of memory with a single data type, so it uses much less memory than a list of separate Python objects. NumPy also lets you specify the data type explicitly, which allows the code to be optimized even further.
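As a rough illustration of the memory claim, the sketch below compares a Python list of 1,000 integers with a NumPy array holding the same values. The variable names and the choice of `int64` and `int8` are assumptions made for this example, and the exact list size depends on the Python build.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
# An int64 array stores 8 bytes per element in one buffer.
array_bytes = np_array.nbytes

print("list bytes :", list_bytes)
print("array bytes:", array_bytes)

# Picking a smaller dtype shrinks the array further (values must fit in it).
small = np.arange(100, dtype=np.int8)
print("int8 array bytes:", small.nbytes)
```

The list figure is only an estimate, but it is typically several times larger than the array's 8 bytes per element.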



# Set up

In [None]:
# Install numpy:
! pip install numpy

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# %matplotlib inline
import warnings
warnings.filterwarnings('ignore')
sns.set_style("whitegrid")

# NumPy is much faster than Python lists

**Numpy stands for Numerical Python**

NumPy is, like SciPy, Scikit-Learn and Pandas, one of the packages you can't skip when learning data science. Its array data structure has several benefits over Python lists: it is more compact, it is faster to read and write items, and it is more convenient and efficient to work with.
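One way to see the speed difference is to time the same computation on a list and on an array. This is a minimal sketch using the standard `timeit` module; the array size and repeat count are arbitrary choices for this example, and the measured times will vary from machine to machine.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Same computation both ways: double every element, then sum.
py_total = sum(v * 2 for v in py_list)
np_total = int((np_array * 2).sum())

# Time 10 repetitions of each version.
list_time = timeit.timeit(lambda: sum(v * 2 for v in py_list), number=10)
array_time = timeit.timeit(lambda: (np_array * 2).sum(), number=10)

print("same result:", py_total == np_total)
print(f"list : {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```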

NumPy arrays are constrained to hold elements of a single type. If the types do not match, NumPy will upcast if possible; for example, mixing integers and floats produces a floating-point array.
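A small sketch of this upcasting behaviour, with variable names chosen only for illustration:

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers: integer dtype
mixed = np.array([1, 2, 3.5])     # one float: everything is upcast to float
print(ints.dtype)
print(mixed.dtype)
print(mixed)

# The dtype can also be requested explicitly.
explicit = np.array([1, 2, 3], dtype=np.float32)
print(explicit.dtype)
```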

## Create a basic array

Let's take a look at how to create arrays (tensors) with NumPy.

<div align="left">
<img src="https://i.imgur.com/VSCCjUx.png" width="650">
</div>


👉 np.array()

In [None]:
# scalar:
np.array(6)

In [None]:
# One-dim array:
np.array([6, 7, 8])

In [None]:
# 2-dim array:
np.array([[1, 2, 3], [4, 5, 6]])

In [None]:
# 3-dim array:
np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

In [None]:
def get_array_info(ip):
    x = np.array(ip)
    print ("x: ", x)
    # Number of dimensions
    print ("x.ndim: ", x.ndim)
    # Shape (number of elements at each dimension)
    print ("x.shape:", x.shape)
    # Size (total number of elements stored in x)
    print ("x.size: ", x.size)
    # Data type
    print ("x.dtype: ", x.dtype)

In [None]:
# scalar
get_array_info(6)

In [None]:
# One-dim array:
get_array_info([6, 7, 8])

In [None]:
# 2-dim array:
get_array_info([[1, 2, 3], [4, 5, 6]])

In [None]:
# 3-dim array:
get_array_info([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

👉 np.zeros(`n`)

In [None]:
np.zeros(10)

👉 np.ones(`n`)

In [None]:
np.ones(7)

👉 np.arange(`start`, `end`, `step`)

In [None]:
np.arange(1, 10, 2)

👉 np.linspace(`start`, `end`, `number of samples`)

In [None]:
np.linspace(0, 10, 5)
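The two functions treat the end value differently, which is easy to miss. The sketch below uses arbitrary example values to show that `np.arange` takes a step and excludes the end value, while `np.linspace` takes a number of samples and includes the end value by default.

```python
import numpy as np

# arange: fixed step, end value excluded
stepped = np.arange(0, 10, 2.5)
print(stepped)

# linspace: fixed number of samples, end value included
spaced = np.linspace(0, 10, 5)
print(spaced)
```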

## Concatenate multiple arrays
`np.concatenate()` joins a sequence of arrays along an existing axis.

In [None]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])
print('x', x)
print('x.shape', x.shape)
print('=='*10)
print('y', y)
print('y.shape', y.shape)


In [None]:
np.concatenate([x, y], axis = 0)
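Concatenation only works when the arrays agree on every axis except the one being joined. A minimal sketch with the same `x` and `y`; transposing `y` to join along axis 1 is a choice made for this example.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])   # shape (2, 2)
y = np.array([[5, 6]])           # shape (1, 2)

# Axis 0: stack y below x. The column counts (2 and 2) match.
rows = np.concatenate([x, y], axis=0)
print(rows.shape)

# Axis 1: the row counts (2 and 1) differ, so this fails.
axis1_failed = False
try:
    np.concatenate([x, y], axis=1)
except ValueError as err:
    axis1_failed = True
    print("axis=1 failed:", err)

# Transposing y to shape (2, 1) makes the row counts match.
cols = np.concatenate([x, y.T], axis=1)
print(cols)
```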

## Reshape an array

+ `np.reshape()`, `array.reshape()`

+ Using `arr.reshape()` gives a new shape to an array without changing its data. Just remember that the array you want to produce needs to have the same number of elements as the original array. If you start with an array with 12 elements, your new array must also have a total of 12 elements. **We can also use `-1` for one dimension and NumPy will infer its size from the number of elements and the other dimensions.**

<div align="left">
<img src="https://raw.githubusercontent.com/GokuMohandas/MadeWithML/main/images/foundations/numpy/reshape.png" width="450">
</div>


In [None]:
a = np.arange(12)
print(a)

In [None]:
# Using array.reshape()
b = a.reshape(3, 4)
print(b)

In [None]:
# using np.reshape()
c = np.reshape(a, (6, 2))  # pass the shape positionally; the `newshape` keyword is deprecated in recent NumPy releases
print(c)

In [None]:
# Reshape to 3-dim array:
d = a.reshape(2, 3, -1)
d
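The element-count rule and the `-1` shortcut can be checked directly. This is a small sketch; the shapes are arbitrary examples.

```python
import numpy as np

a = np.arange(12)

# -1 lets NumPy work out the missing dimension: 12 / 4 = 3 rows.
inferred = a.reshape(-1, 4)
print(inferred.shape)

# 12 elements cannot be split into 5 equal rows, so this raises.
reshape_failed = False
try:
    a.reshape(5, -1)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```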

## Add a new axis to an array

`np.newaxis`, `np.expand_dims`: will increase the dimensions of your array by one dimension when used once. This means that a 1D array will become a 2D array, a 2D array will become a 3D array, and so on.

In [None]:
a = np.array([1, 2, 3, 4, 5, 6])
print('a', a)
print('a.shape', a.shape)

In [None]:
a2 = a[np.newaxis, :]
print('a2', a2)
print('a2.shape', a2.shape)

In [None]:
a2 = a[:, np.newaxis]
print('a2', a2)
print('a2.shape', a2.shape)

In [None]:
a3 = np.expand_dims(a, axis = 0)
print('a3', a3)
print('a3.shape', a3.shape)

In [None]:
a3 = np.expand_dims(a, axis = 1)
print('a3', a3)
print('a3.shape', a3.shape)
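The two approaches above produce the same arrays. A quick check, using illustrative variable names:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

row_a = a[np.newaxis, :]            # shape (1, 6)
row_b = np.expand_dims(a, axis=0)   # shape (1, 6)
col_a = a[:, np.newaxis]            # shape (6, 1)
col_b = np.expand_dims(a, axis=1)   # shape (6, 1)

print(np.array_equal(row_a, row_b), row_a.shape)
print(np.array_equal(col_a, col_b), col_a.shape)
```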

## Transposing

We often need to swap the dimensions of a matrix for operations like matrix multiplication.

To transpose a matrix there are two steps:
1. Rotate the matrix 90° clockwise
2. Reverse the order of elements in each row (e.g. [a b] becomes [b a])
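The two steps can be reproduced with NumPy to confirm that they give the transpose. This sketch assumes the rotation is clockwise, which corresponds to `k=-1` in `np.rot90`.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# Step 1: rotate 90 degrees clockwise (k=-1; positive k is counterclockwise).
rotated = np.rot90(x, k=-1)
# Step 2: reverse the order of the elements in each row.
flipped = rotated[:, ::-1]

print(flipped)
print(np.array_equal(flipped, x.T))
```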


<div align="left">
<img src="https://raw.githubusercontent.com/GokuMohandas/MadeWithML/main/images/foundations/numpy/transpose.png" width="400">
</div>

- Syntax: `array.T` (or `np.transpose(array, (1, 0))`)

In [None]:
x = np.array([[1,2,3], [4,5,6]])
print ("x:\n", x)

print('\n x shape:', x.shape)


In [None]:
# Transpose matrix:
x.T

In [None]:
# Transpose
np.transpose(x)

In [None]:
# Transpose 3D tensor array
a = np.arange(24).reshape(2, 3, -1)
print('a', a)
print('\n shape', a.shape)

In [None]:
# Transpose the 3D tensor: swap axis 0 and axis 1, keep axis 2
b = np.transpose(a, axes = (1, 0, 2))
b
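To see what the `axes` argument does, compare shapes and a single element before and after. A minimal sketch; the index `(1, 2, 3)` is an arbitrary choice.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
b = np.transpose(a, axes=(1, 0, 2))

# Axes 0 and 1 trade places, axis 2 stays where it is.
print(a.shape, "->", b.shape)

# The element at a[i, j, k] is now found at b[j, i, k].
print(a[1, 2, 3], b[2, 1, 3])

# With no axes argument, the order of all axes is reversed.
print(a.T.shape)
```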

## Indexing and slicing

<div align="left">
<img src="https://i.imgur.com/y0CL0S4.png" width="300">
</div>

In [None]:
a1 = np.arange(1, 20, 2)
a1

In [None]:
# Get the first value:
a1[0]

In [None]:
# Get the last value:
a1[-1]

In [None]:
# Get the values at indices 3 to 5 (the end index 6 is excluded):
a1[3: 6]

In [None]:
a2 = np.arange(12).reshape(3, 4)
print(a2, '\n')
print(a2.shape)


In [None]:
# Get [ 4  5  6  7]
a2[1, :] # or a2[1]

In [None]:
# Get [2 6 10]
a2[:, 2]

In [None]:
# get [5, 6]:
a2[1, 1:3]

In [None]:
# Get [[5, 6, 7], [9, 10, 11]]
a2[1:, 1:]

## Get values based on condition

In [None]:
a2

In [None]:
a2>5

In [None]:
# Get all values > 5:
a2[a2>5]

In [None]:
# Get all values > 5 and < 10:
a2[(a2 > 5) & (a2 < 10)]

In [None]:
# Get all values < 5 or == 10:
a2[(a2 < 5) | (a2 == 10)]

## Basic Operations

<div align="left">
<img src="https://numpy.org/doc/stable/_images/np_data_plus_ones.png" width="900">
<img src="https://numpy.org/doc/stable/_images/np_sub_mult_divide.png" width="900">

</div>

In [None]:
a1 = np.arange(1, 20, 2).reshape(2, 5)
a1

In [None]:
a2 = np.arange(20, 30).reshape(2, 5)
a2

In [None]:
a1 + a2

In [None]:
a1 - a2


In [None]:
a1 * a2

In [None]:
a1 / a2


### 👉 Dot product 2 vectors:

<div align="left">
<img src="https://d138zd1ktt9iqe.cloudfront.net/media/seo_landing_files/dot-product-of-vectors-1626103027.png" width="450">
</div>

The dot product of $\vec{a}$ and $\vec{b}$
- **is positive** when they point in similar directions. Bigger = more similar
- **equals 0** when they are perpendicular.
- **is negative** when they point in dissimilar directions. Smaller (more negative) = more dissimilar
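The three cases can be checked with small 2D vectors. The vectors and the `cosine` helper below are illustrative choices for this sketch, not part of NumPy.

```python
import numpy as np

a = np.array([1, 0])
similar = np.array([1, 1])         # less than 90 degrees from a
perpendicular = np.array([0, 1])   # exactly 90 degrees from a
opposite = np.array([-1, 0])       # points the other way

print(a @ similar)
print(a @ perpendicular)
print(a @ opposite)

def cosine(u, v):
    """Hypothetical helper: dot product scaled by the vector lengths."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Dividing by the lengths leaves only the direction, in the range [-1, 1].
print(cosine(a, similar), cosine(a, opposite))
```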

In [None]:
v1 = np.array([2, 3])
v2 = np.array([-3, -10])

# Option 1: @
v1 @ v2

In [None]:
# Option 2: np.dot()
np.dot(v1, v2)


### 👉 Matrix multiplication

One of the most common NumPy operations we'll use in machine learning is matrix multiplication using the dot product. Each entry of the output is the dot product of one row of the first matrix with one column of the second. With 2 rows in the first matrix and 2 columns in the second, the output has shape `[2 X 2]`.

Not all pairs of matrices can be multiplied, and the shapes of the inputs determine the shape of the product:

1. The number of columns in the first matrix must equal the number of rows in the second
2. **The product of an M x N matrix and an N x K matrix is an M x K matrix**. The new matrix takes the rows of the first and columns of the second.
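Both rules can be verified on shapes alone. A small sketch with arbitrary sizes M=2, N=3, K=4:

```python
import numpy as np

m, n, k = 2, 3, 4
A = np.ones((m, n))
B = np.ones((n, k))

# (M x N) @ (N x K) gives (M x K).
C = A @ B
print(C.shape)

# (2 x 3) @ (2 x 3): columns of the first (3) != rows of the second (2).
mismatch_raised = False
try:
    A @ A
except ValueError as err:
    mismatch_raised = True
    print("cannot multiply:", err)
```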

<div align="left">
<img src="https://raw.githubusercontent.com/GokuMohandas/MadeWithML/main/images/foundations/numpy/dot.gif" width="450">
</div>

In [None]:
a1 = np.arange(1, 7).reshape(2, 3)
a1

In [None]:
a2 = np.arange(7, 13).reshape(3, 2)
a2

In [None]:
a1 @ a2

In [None]:
np.dot(a1, a2)

### 👉 Axis operations

We can also do operations across a specific axis.

<div align="left">
<img src="https://raw.githubusercontent.com/GokuMohandas/MadeWithML/main/images/foundations/numpy/axis.gif" width="450">
</div>

In [None]:
# Sum across a dimension
x = np.array([[1,2,3],[3,4,5]])

print (x)


In [None]:
# sum all elements
np.sum(x)

In [None]:
# sum across rows (axis 0 is collapsed, giving one sum per column)
np.sum(x, axis=0)


In [None]:
# sum across columns (axis 1 is collapsed, giving one sum per row)
np.sum(x, axis=1)
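A useful way to remember the `axis` argument is that it names the dimension that disappears from the result. The sketch below shows this on the same `x`; the `keepdims=True` call is an extra illustration of keeping the collapsed axis with size 1.

```python
import numpy as np

x = np.array([[1, 2, 3], [3, 4, 5]])   # shape (2, 3)

col_sums = x.sum(axis=0)   # axis 0 (size 2) disappears, shape (3,)
row_sums = x.sum(axis=1)   # axis 1 (size 3) disappears, shape (2,)
print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)

# keepdims=True keeps the collapsed axis with size 1.
kept = x.sum(axis=1, keepdims=True)
print(kept.shape)
```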

In [None]:
# Sum across a dimension
x = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
print (x)


In [None]:
# question: to have this output [10,10,10], what should be the axis here?


In [None]:
# question: to have this output [3,6,9,12], what should be the axis here?


👉 Other functions such as `min, max, mean, var, std` accept the `axis` argument in the same way

In [None]:
np.min(x)

In [None]:
np.min(x, axis=1)

## Broadcasting

What is broadcasting?

**Broadcasting** is a NumPy feature that lets arrays of different shapes take part in the same element-wise operation. When the shapes differ but are compatible, NumPy automatically expands (broadcasts) the arrays to a common shape.

![](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)

Steps NumPy follows when broadcasting:

+ Step 1: Make the number of dimensions of the 2 arrays equal. The array with fewer dimensions gets dimensions of size "1" added to the left of its shape until it has as many dimensions as the other array.
+ Step 2: Compare each `pair of dimensions` of the 2 arrays, from `right to left`. If `all pairs` satisfy `one of the following two conditions`, broadcasting can be used:

    **+ They are equal**
    
    **+ One of them equals 1**
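The two steps can be written as a short shape check. `can_broadcast` is a hypothetical helper written for this sketch, not a NumPy function; `np.broadcast_shapes` is used only as a cross-check and is assumed to be available (NumPy 1.20 or later).

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: apply the two broadcasting steps to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Step 1: pad the shorter shape with 1s on the left.
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    # Step 2: every pair must be equal, or one of the two must be 1.
    return all(p == q or p == 1 or q == 1 for p, q in zip(padded_a, padded_b))

examples = [((3, 4), (4,)), ((3, 4), (3,)), ((3, 4), (3, 1)), ((5, 3, 4), (3, 1))]
for sa, sb in examples:
    print(sa, sb, can_broadcast(sa, sb))

# Cross-check one compatible pair against NumPy itself.
print(np.broadcast_shapes((3, 4), (3, 1)))
```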

**Matrix and scalar**

Use case: Normalize your **entire** dataset by subtracting **the mean (a scalar)** from the whole dataset (a matrix) and dividing by the **standard deviation (another scalar)**.
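A minimal sketch of this use case on a tiny made-up dataset. Both scalars are broadcast across every element of the matrix.

```python
import numpy as np

# Made-up 2 x 2 dataset, for illustration only.
data = np.array([[1.0, 2.0],
                 [3.0, 4.0]])

mean = data.mean()   # scalar mean over all elements
std = data.std()     # scalar standard deviation over all elements

# matrix - scalar and matrix / scalar are applied element by element.
normalized = (data - mean) / std

print(normalized)
print(normalized.mean(), normalized.std())
```

After normalization the data has mean 0 and standard deviation 1, up to floating-point rounding.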

In [None]:
# Case1: matrix (3, 4) +  (4,)
a = np.array([[4, 3, 9, 1],
            [6, 2, 3, 6],
            [10, 3, 2, 1]])

b = np.array([1, 2, 3, 4])

print('array a:\n', a)
print('shape of a:', a.shape)
print('=='*15)
print('array b:\n', b)
print('shape of b:', b.shape)
print('=='*15)
print('a + b:\n', a + b)

In [None]:
# Case2: matrix (3, 4) +  (3,)
# This cell raises a ValueError: the trailing dimensions (4 and 3) are neither equal nor 1.
a = np.array([[4, 3, 9, 1],
            [6, 2, 3, 6],
            [10, 3, 2, 1]])

b = np.array([1, 2, 3])

print('array a:\n', a)
print('shape of a:', a.shape)

print('=='*15)

print('array b:\n', b)
print('shape of b:', b.shape)

print('=='*15)

print('a + b:\n', a + b)

In [None]:
# Case2 (fixed): matrix (3, 4) +  (3,).
# To make broadcasting possible, add a new dimension at axis 1 of b so that its shape becomes (3, 1):
a = np.array([[4, 3, 9, 1],
            [6, 2, 3, 6],
            [10, 3, 2, 1]])

b = np.array([1, 2, 3])
b = np.expand_dims(b, axis=1)  # equivalent to b = b[:, np.newaxis]

print('array a:\n', a)
print('shape of a:', a.shape)

print('=='*15)

print('array b:\n', b)
print('shape of b:', b.shape)

print('=='*15)

print('a + b:\n', a + b)

🤔 Question

Check whether each of these pairs can be broadcast, using the broadcasting rules:

- A matrix of shape (2,3) + a vector of shape (2,)
- A matrix of shape (3,2) + a vector of shape (3,)
- A matrix of shape (2,3) + a 3d tensor of shape (4,2,3)


# Additional resources

* If you want to learn more, check out the [NumPy reference manual](https://docs.scipy.org/doc/numpy-1.15.1/reference/).
* [Stanford CS231n - Python Numpy tutorial](http://cs231n.github.io/python-numpy-tutorial/#numpy)
* [TutorialsPoint - A comprehensive guide to Python Numpy](https://www.tutorialspoint.com/numpy/index.htm)
* [Datacamp - Numpy Cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)
* Another useful tutorial: https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d