### Numpy / Package

NumPy (Numerical Python) is a `library` that supports large, multi-dimensional arrays and matrices.

In [11]:
# pip install numpy

import numpy as np

### Numpy / Vectors

A vector is a one-dimensional array. In NumPy a `row` vector is written as a 1-D array, and a column vector as a 2-D array with a single column.

In [12]:
a = np.array([1, 2, 3])
b = np.array([[1], 
              [4], 
              [3]])

print("Row vector: \n", a)
print("Column vector: \n", b)

Row vector: 
 [1 2 3]
Column vector: 
 [[1]
 [4]
 [3]]
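
The two arrays above differ in shape, not only in how they print. A small sketch (the variable names are illustrative) that checks the shapes and converts between the two forms:

```python
import numpy as np

row = np.array([1, 2, 3])            # 1-D array, shape (3,)
col = np.array([[1], [4], [3]])      # 2-D array with one column, shape (3, 1)

print(row.shape)                     # (3,)
print(col.shape)                     # (3, 1)

# reshape(-1, 1) turns a 1-D array into a column; ravel() flattens a column back
row_as_col = row.reshape(-1, 1)
col_as_row = col.ravel()

print(row_as_col.shape)              # (3, 1)
print(col_as_row)                    # [1 4 3]
```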


### Numpy / Matrices

Numpy's `main` data structure is the multidimensional array (matrix).

In [13]:
# matrix with three rows, four columns

A = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
])

print("Matrix: \n", A)
print("Shape:", A.shape)
print("Size:", A.size)
print("Number of array dimensions:", A.ndim)

Matrix: 
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Shape: (3, 4)
Size: 12
Number of array dimensions: 2



### Matrix / Accessing Elements

The elements of an array can be accessed with `[start:stop]` slice notation: `start` is included, `stop` is excluded, and either one can be omitted.

In [14]:
a = np.array([1, 2, 3, 4, 5, 6])

print("a[:]    =", a[:])        # entire range of elements
print("a[:3]   =", a[:3])       # 0 to 3 (not included)
print("a[0:3]  =", a[0:3])      # 0 to 3 (not included)
print("a[3:]   =", a[3:])       # 3 (included) to last
print("a[-1]   =", a[-1])       # last
print("a[3:-1] =", a[3:-1])     # 3 to last (not included)

a[:]    = [1 2 3 4 5 6]
a[:3]   = [1 2 3]
a[0:3]  = [1 2 3]
a[3:]   = [4 5 6]
a[-1]   = 6
a[3:-1] = [4 5]
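
Slices also accept an optional third value, the step (`[start:stop:step]`), which the cell above does not use. A short sketch on the same array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

every_second = a[::2]     # start at index 0, take every second element
reversed_a = a[::-1]      # a negative step walks the array backwards
last_two = a[-2:]         # a negative start counts from the end

print(every_second)       # [1 3 5]
print(reversed_a)         # [6 5 4 3 2 1]
print(last_two)           # [5 6]
```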


### Matrix / Zero Indexed

Arrays are zero-indexed: the `first` element has index 0.

In [15]:
A = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
])

print("A[1,1] =", A[1,1])                 # second row, second column
print("A[:2, :] ="); print(A[:2, :])      # rows 0 and 1 (2 not included), all columns
print("A[:, 1:2] ="); print(A[:, 1:2])    # all rows, second column

A[1,1] = 6
A[:2, :] =
[[1 2 3 4]
 [5 6 7 8]]
A[:, 1:2] =
[[ 2]
 [ 6]
 [10]]
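
Note that `A[:, 1:2]` keeps the result two-dimensional. Indexing with a plain integer instead drops that axis. A minimal comparison (variable names are illustrative):

```python
import numpy as np

A = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
])

col_1d = A[:, 1]      # integer index: the column axis is dropped
col_2d = A[:, 1:2]    # slice: the column axis is kept

print(col_1d, col_1d.shape)   # [ 2  6 10] (3,)
print(col_2d.shape)           # (3, 1)
```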


### Sparse Matrices

A sparse matrix stores `only` the non-zero elements, which saves memory and computation when most entries are zero.

In [16]:
from scipy import sparse

A = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], 
    [3, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
])
B = sparse.csr_matrix(A)
print(B)

  (1, 1)	1
  (2, 0)	3
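
The (row, column) value pairs printed above can be reproduced with plain NumPy. This is only a sketch of the idea of keeping coordinates and values; it is not how `csr_matrix` lays out its data internally.

```python
import numpy as np

A = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [3, 0, 0, 0, 0, 0, 0, 0, 0, 0],
])

rows, cols = np.nonzero(A)        # indices of the non-zero entries
values = A[rows, cols]            # the non-zero entries themselves

for r, c, v in zip(rows, cols, values):
    print((int(r), int(c)), int(v))

print("stored:", values.size, "of", A.size, "elements")
```

Only 2 of the 30 elements need to be kept, which is where the savings come from.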


### Matrix / Vectorize

Vectorization is used to `speed` up Python code by avoiding explicit loops.  
Instead of operating on a single value at a time, it operates on a `set of values` (vector) at a time. 
[<u>more details</u>](https://www.geeksforgeeks.org/vectorization-in-python/)

In [28]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
B = np.vectorize(lambda x: x + 100)(A)
print(B)

[[101 102 103]
 [104 105 106]
 [107 108 109]]
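
NumPy's documentation notes that `np.vectorize` is provided for convenience rather than performance, since it still applies the Python function element by element. For simple arithmetic, operating on the array directly gives the same result. A minimal comparison (`add_100` is an illustrative name):

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

add_100 = np.vectorize(lambda x: x + 100)   # applies the Python lambda element by element
B = add_100(A)
C = A + 100                                 # element-wise arithmetic on the whole array

print(np.array_equal(B, C))   # True
```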


### Broadcasting

Broadcasting allows NumPy to handle arrays of `different` shapes during arithmetic operations.

In [18]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
B = A + 100
print(B)

[[101 102 103]
 [104 105 106]
 [107 108 109]]
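
Broadcasting is not limited to scalars. As a sketch, a 1-D array of length 3 and a 3x1 column can both be combined with the 3x3 matrix (the names `row` and `col` are illustrative):

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
row = np.array([10, 20, 30])             # shape (3,): the same values are added to every row
col = np.array([[100], [200], [300]])    # shape (3, 1): one value per row, repeated across columns

row_result = A + row
col_result = A + col

print(row_result)
print(col_result)
```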


### Text / Vectorizer

We can represent `texts` as vectors and count word frequency. 
[<u>more details</u>](https://medium.com/@sumanadhikari/building-a-movie-recommendation-engine-using-scikit-learn-8dbb11c5aa4b)

In [19]:
from sklearn.feature_extraction.text import CountVectorizer

A = ['London Paris London', 'Paris Paris London']

vectorizer = CountVectorizer()
A = vectorizer.fit_transform(A)

print("A vectorized ="); print(A)
print("A toarray() ="); print(A.toarray())

A vectorized =
  (0, 0)	2
  (0, 1)	1
  (1, 0)	1
  (1, 1)	2
A toarray() =
[[2 1]
 [1 2]]
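
In the output above, column 0 counts `london` and column 1 counts `paris`. The same counts can be reproduced by hand. This is a simplified sketch that assumes lowercasing, whitespace splitting, and an alphabetically sorted vocabulary; it is not `CountVectorizer`'s actual implementation.

```python
from collections import Counter

texts = ['London Paris London', 'Paris Paris London']

# lowercase and split on whitespace (a simplification of real tokenization)
tokenized = [t.lower().split() for t in texts]

# vocabulary in alphabetical order, one column per word
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# one row per text, one count per vocabulary word
counts = [[Counter(tokens)[word] for word in vocabulary] for tokens in tokenized]

print(vocabulary)   # ['london', 'paris']
print(counts)       # [[2, 1], [1, 2]]
```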


### Matrix / Axes Operations

We can apply operations along the `axes` (rows or columns).


In [20]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

max_value = np.max(A)
min_value = np.min(A)
avg_value = np.mean(A)
max_in_each_row = np.max(A, axis=1)
max_in_each_col = np.max(A, axis=0)

print("max(A) =", max_value)
print("min(A) =", min_value)
print("mean(A) =", avg_value) 
print("Max in each row =", max_in_each_row)
print("Max in each col =", max_in_each_col)

max(A) = 9
min(A) = 1
mean(A) = 5.0
Max in each row = [3 6 9]
Max in each col = [7 8 9]
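
Here `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row). The same convention applies to other reductions; a short sketch with `np.sum` and `np.mean`:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

col_sums = np.sum(A, axis=0)     # collapse the rows: one value per column
row_sums = np.sum(A, axis=1)     # collapse the columns: one value per row
row_means = np.mean(A, axis=1)

print(col_sums)     # [12 15 18]
print(row_sums)     # [ 6 15 24]
print(row_means)    # [2. 5. 8.]
```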


### Addition

To add two matrices of the same shape, simply use the `+ operator`; the addition is element-wise.

In [21]:
A = np.array([
    [1, 1],
    [2, 2],
])
B = np.array([
    [1, 1],
    [3, 3],
])
C = A + B

print(C)

[[2 2]
 [5 5]]


### Multiplication

For element-wise multiplication use the `* operator`; for the matrix product use the `@ operator`. &nbsp; [<u>more details</u>](https://www.mathsisfun.com/algebra/matrix-multiplying.html), [<u>more details 2</u>](https://mkang32.github.io/python/2020/08/30/numpy-matmul.html)

In [22]:
A = np.array([
    [1, 1],
    [2, 2],
])
B = np.array([
    [1, 1],
    [3, 3],
])

C = A * B
D = A @ B

print(C)
print(D)

[[1 1]
 [6 6]]
[[4 4]
 [8 8]]
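
Each entry of the matrix product is the dot product of a row of `A` with a column of `B`. A small sketch that recomputes the first column of `D` by hand (`d_00` and `d_10` are illustrative names):

```python
import numpy as np

A = np.array([
    [1, 1],
    [2, 2],
])
B = np.array([
    [1, 1],
    [3, 3],
])

D = A @ B

# entry (i, j) of the product = row i of A dotted with column j of B
d_00 = np.dot(A[0, :], B[:, 0])   # 1*1 + 1*3
d_10 = np.dot(A[1, :], B[:, 0])   # 2*1 + 2*3

print(d_00, d_10)                 # 4 8
print(D[0, 0], D[1, 0])           # 4 8
```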


### Reshape

Reshape `maintains` the data but arranges it in a different number of rows and columns.

In [23]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
])

B = A.reshape(2, 6)     # same number of elements (12) as the original matrix
C = A.reshape(1, -1)    # -1 means 'as many as needed'
D = A.flatten()         # transform the matrix into a vector (one-dimensional array)

print("Reshape(2, 6) = ");  print(B)
print("Reshape(1, -1) =");  print(C)
print("Flatten =");         print(D)

Reshape(2, 6) = 
[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
Reshape(1, -1) =
[[ 1  2  3  4  5  6  7  8  9 10 11 12]]
Flatten =
[ 1  2  3  4  5  6  7  8  9 10 11 12]
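
One practical difference between the two calls: `reshape` returns a view of the same data when it can, while `flatten` always returns a copy. A minimal sketch, building the same values with `np.arange` for brevity:

```python
import numpy as np

A = np.arange(1, 13).reshape(4, 3)   # same values as the matrix above

B = A.reshape(2, 6)
D = A.flatten()

print(np.shares_memory(A, B))   # True: here reshape returns a view
print(np.shares_memory(A, D))   # False: flatten always copies

B[0, 0] = 100                   # writing through the view changes A too
print(A[0, 0])                  # 100
print(D[0])                     # 1
```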


### Transpose

The transpose of a matrix is a `flipped version` of the original matrix: it is flipped over its main diagonal, so rows become columns. [<u>more details</u>](https://mathinsight.org/matrix_transpose)

In [24]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
B = A.T
B

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
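
The effect on the shape is easier to see with a non-square matrix. A short sketch, assuming a 2x3 example that is not part of the cell above:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

B = A.T

print(A.shape, B.shape)            # (2, 3) (3, 2)
print(A[0, 2] == B[2, 0])          # True: element (i, j) moves to (j, i)
print(np.array_equal(B.T, A))      # True: transposing twice restores A
```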

### Inverse

$
    AA^{-1} = I
$

When A is multiplied by its `inverse` matrix, the result is the `identity` matrix. [<u>more details</u>](http://www.mathwords.com/i/inverse_of_a_matrix.htm) 

In [25]:
A = np.array([
    [4, 3],
    [3, 2],
])
I = np.array([
    [1, 0],
    [0, 1],
])
B = np.linalg.inv(A) # inverse

assert np.allclose(A @ B, I) # all matrix elements are equal (up to floating-point rounding)
assert np.allclose(B @ A, I) # a matrix and its inverse can be multiplied in either order
print(A)
print(B)
print(A @ B)

[[4 3]
 [3 2]]
[[-2.  3.]
 [ 3. -4.]]
[[1. 0.]
 [0. 1.]]
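
A common use of the inverse is solving a linear system $Ax = b$ as $x = A^{-1}b$. The sketch below assumes an example right-hand side `b` (not from the cell above) and compares the result with `np.linalg.solve`, which solves the system without forming the inverse explicitly.

```python
import numpy as np

A = np.array([
    [4, 3],
    [3, 2],
])
b = np.array([7, 5])                 # example right-hand side (an assumption)

x_inv = np.linalg.inv(A) @ b         # x = A^{-1} b
x_solve = np.linalg.solve(A, b)      # solves A x = b directly

print(np.allclose(x_inv, [1, 1]))    # True
print(np.allclose(x_inv, x_solve))   # True
```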


### Data / Variance

$
    \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
$

Variance is a measure of the `spread` of the data. [<u>more details</u>](https://www.statology.org/sample-variance-vs-population-variance/)

In [26]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
B = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 900], # look here: a single outlier
])

# Algorithm
def population_variance(X):
    N = X.size
    avg = np.mean(X)
    variance = (1/N) * np.sum((X - avg)**2)
    return variance

A_variance = population_variance(A)
B_variance = population_variance(B)

assert B_variance > A_variance
print("A_variance = ", A_variance.round(2))
print("B_variance = ", B_variance.round(2))
print("np.var(A) = ",  np.var(A).round(2)) # built-in
print("np.var(B) = ",  np.var(B).round(2))


A_variance =  6.67
B_variance =  79206.67
np.var(A) =  6.67
np.var(B) =  79206.67
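
The formula above is the population variance, which divides by N. For the sample variance, which divides by N - 1, `np.var` accepts a `ddof` argument. A minimal sketch on matrix A:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

population_var = np.var(A)          # divides by N
sample_var = np.var(A, ddof=1)      # divides by N - 1

print(population_var.round(2))      # 6.67
print(sample_var.round(2))          # 7.5
```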


### Data / Standard Deviation

$
    \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}
$

Standard deviation is `more intuitive` than variance, as it is expressed in the same units as the data. [<u>more details</u>](https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/variance-standard-deviation-population/a/calculating-standard-deviation-step-by-step)

In [27]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
B = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 900],
])

# Algorithm
def population_variance(X):
    N = X.size
    avg = np.mean(X)
    variance = (1/N) * np.sum((X - avg)**2)
    return variance

def standard_deviation(X):
    return np.sqrt(population_variance(X))

A_std = standard_deviation(A)
B_std = standard_deviation(B)

assert B_std > A_std
print("A standard deviation =", A_std.round(2))
print("B standard deviation =", B_std.round(2))
print("np.std(A) =", np.std(A).round(2)) # built-in
print("np.std(B) =", np.std(B).round(2))

A standard deviation = 2.58
B standard deviation = 281.44
np.std(A) = 2.58
np.std(B) = 281.44
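
Since the standard deviation is the square root of the variance, squaring `np.std` should give back `np.var`. The sketch below checks this on matrix A and also shows the sample version via `ddof=1`:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

std = np.std(A)                    # population standard deviation
sample_std = np.std(A, ddof=1)     # sample standard deviation (divides by N - 1)

print(np.isclose(std ** 2, np.var(A)))   # True
print(sample_std.round(2))               # 2.74
```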
