# **Section 2: Numpy** 
<a href="https://colab.research.google.com/github/osuranyi/UdemyCourses/blob/main/NumpyStack/Section2_Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **1. Introduction**

Central objects: arrays

What can NumPy be used for?
* Matrix operations (addition, multiplication, etc.)
* Solving linear systems
* Calculating inverse and determinant of matrices
* Generating random numbers

Applications:
* Linear regression
* Logistic regression
* Deep neural networks
* K-means clustering
* Density estimation
* Principal components analysis
* Matrix factorization (recommender systems)
* Support vector machines
* Markov models
* Control systems
* Game theory
* Operations research
* Portfolio optimization

*Remark:* In NumPy, vectors are 1D arrays of shape `(N,)`, not N x 1 "2D arrays".
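The remark above can be illustrated with a small sketch that compares the shape of a 1D vector with an explicit N x 1 column. The variable names `v` and `col` are only illustrative.

```python
import numpy as np

v = np.array([1, 2, 3])     # a 1D vector
col = v.reshape(-1, 1)      # an explicit N x 1 column (2D array)

print(v.shape)    # (3,)
print(col.shape)  # (3, 1)
print(v.T.shape)  # (3,)  transposing a 1D array has no effect
```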

## **2. Arrays vs. lists**

In [118]:
import numpy as np

In [119]:
L = [1,2,3]

In [120]:
A = np.array([1,2,3])

In [121]:
for e in L:
  print(e)

1
2
3


In [122]:
for e in A:
  print(e)

1
2
3


In [123]:
L.append(4)

In [124]:
L

[1, 2, 3, 4]

The size of an array is fixed, so arrays have no `append` method:

In [125]:
#A.append(4)
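If a longer array is needed anyway, one option is `np.append`, which builds a new array instead of modifying the old one. The sketch below assumes the same `A` as above; `A2` is just an illustrative name.

```python
import numpy as np

A = np.array([1, 2, 3])

# A.append(4) would raise AttributeError: ndarray has no append method.
# np.append returns a new, larger array and leaves A unchanged.
A2 = np.append(A, 4)

print(A)   # [1 2 3]
print(A2)  # [1 2 3 4]
```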

Lists can be concatenated:

In [126]:
L + [5]

[1, 2, 3, 4, 5]

Arrays behave differently: `+` adds the value to every element of `A`. This is called broadcasting:

In [127]:
A + np.array([4])

array([5, 6, 7])

When two arrays of the same size are added, the addition is element-wise:

In [128]:
A + np.array([4,5,6])

array([5, 7, 9])

But broadcasting does not work when we add two vectors of different sizes, neither of which has length one:

In [129]:
#A + np.array([4,5])
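To see what the commented-out line does without stopping the notebook, the failing addition can be wrapped in `try`/`except`. This is a minimal sketch; the `failed` flag is only there for illustration.

```python
import numpy as np

A = np.array([1, 2, 3])

try:
    A + np.array([4, 5])   # shapes (3,) and (2,) are incompatible
    failed = False
except ValueError as err:
    failed = True
    print("Broadcasting failed:", err)
```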

Scalar multiplication works as expected in case of arrays:

In [130]:
2 * A

array([2, 4, 6])

But multiplying a list by 2 repeats it twice:

In [131]:
2 * L

[1, 2, 3, 4, 1, 2, 3, 4]

Using a loop to add a value to each list element:

In [132]:
L2 = []
for e in L:
  L2.append(e+3)

In [133]:
L2

[4, 5, 6, 7]

Same with list comprehension:

In [134]:
L2 = [e + 3 for e in L]

In [135]:
L2

[4, 5, 6, 7]

This is pretty flexible, e.g. squaring every list element:

In [136]:
L2 = [e**2 for e in L]
L2

[1, 4, 9, 16]

But using arrays, this is much easier:

In [137]:
A**2

array([1, 4, 9])

Most functions are applied element-wise to arrays:

In [138]:
np.sqrt(A)

array([1.        , 1.41421356, 1.73205081])

In [139]:
np.log(A)

array([0.        , 0.69314718, 1.09861229])

In [140]:
np.tanh(A)

array([0.76159416, 0.96402758, 0.99505475])

A list looks like an array, but it is a more general data structure. NumPy arrays exist for mathematics.

## **3. Dot product**

$$
a \cdot b = a^T b = \sum_{d=1}^D a_d b_d
$$

In [141]:
a = np.array([1,2])
b = np.array([3,4])

Performing the dot product "by hand":

In [142]:
dot = 0
for e, f in zip(a,b):
  dot += e*f
dot

11

In [143]:
dot = 0
for i in range(len(a)):
  dot += a[i] * b[i]
dot

11

What happens if we use `*` operator?

In [144]:
a * b

array([3, 8])

The result is element-wise, but it can be summed to get the dot product:

In [145]:
np.sum(a * b)

11

In [146]:
(a * b).sum()

11

Using the dedicated `dot` function:

In [147]:
np.dot(a,b)

11

Also works as an instance method:

In [148]:
a.dot(b)

11

The symbol `@` also performs the dot product:

In [149]:
a @ b

11

Alternative definition of dot product:
$$
a^T b = \|a\| \, \|b\| \cos\theta
$$

In [150]:
amag = np.sqrt(a@a)
amag

2.23606797749979

In [151]:
np.linalg.norm(a)

2.23606797749979

In [152]:
cosangle = a@b / (np.linalg.norm(a) * np.linalg.norm(b))

In [153]:
angle = np.arccos(cosangle)
angle

0.17985349979247847
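The angle above is in radians. As a small sketch, it can be converted to degrees with `np.degrees`; the `np.clip` call is an added safeguard (not part of the original calculation) against rounding errors pushing the cosine slightly outside [-1, 1].

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])

cosangle = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Clipping guards against values like 1.0000000000000002 for parallel vectors
angle_rad = np.arccos(np.clip(cosangle, -1.0, 1.0))
angle_deg = np.degrees(angle_rad)

print(angle_rad)  # about 0.18 radians
print(angle_deg)  # about 10.3 degrees
```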

## **4. Speed test**

In this part, we will measure how much faster the dot product is in NumPy. First we define two vectors:

In [154]:
from datetime import datetime

a = np.random.randn(100)
b = np.random.randn(100)
T = 100000

Then create a function for the "by hand" calculation:

In [155]:
def slow_dot_product(a,b):
  result = 0
  for e,f in zip(a,b):
    result += e*f
  return result

Also a comprehension-style function, which passes a generator expression to `sum`:

In [156]:
def list_comprehension_dot_product(a,b):
  return sum(e*f for e,f in zip(a,b))

Now we time both functions and the built-in dot product:

In [157]:
t0 = datetime.now()
for t in range(T):       # running dot product T times to be more accurate
  slow_dot_product(a,b)
dt1 = datetime.now()-t0

t0 = datetime.now()
for t in range(T):
  list_comprehension_dot_product(a,b)
dt2 = datetime.now()-t0

t0 = datetime.now()
for t in range(T):
  a.dot(b)
dt3 = datetime.now()-t0


print("List comprehension is faster by this factor:",dt1.total_seconds() / dt2.total_seconds())
print("Numpy method is faster by this factor:",dt1.total_seconds() / dt3.total_seconds())

List comprehension is faster by this factor: 1.0359727764188864
Numpy method is faster by this factor: 58.46312387611131
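As an alternative to timing with `datetime`, the standard-library `timeit` module can run the same comparison. This is a sketch, not the method used above: the repetition count `n` is an arbitrary choice, and the measured factor will vary by machine.

```python
import timeit
import numpy as np

a = np.random.randn(100)
b = np.random.randn(100)

def slow_dot_product(a, b):
    # Same "by hand" loop as in the cell above
    result = 0
    for e, f in zip(a, b):
        result += e * f
    return result

n = 10000  # repetitions per measurement (arbitrary)
t_loop = timeit.timeit(lambda: slow_dot_product(a, b), number=n)
t_numpy = timeit.timeit(lambda: a.dot(b), number=n)

print("Loop: ", t_loop, "s")
print("NumPy:", t_numpy, "s")
print("Speed-up factor:", t_loop / t_numpy)
```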


## **5. Matrices**

There is a `numpy.matrix` object, but its use is not recommended. Use arrays instead, because an array can have any number of dimensions. Sparse matrices are the exception.

Creating matrix with list of lists:

In [159]:
L = [[1,2],[3,4]]
L

[[1, 2], [3, 4]]

Get first row:

In [160]:
L[0]

[1, 2]

Access element:

In [None]:
L[0][1]

Using numpy.array:

In [161]:
A = np.array([[1, 2], [3, 4]])
A

array([[1, 2],
       [3, 4]])

Accessing elements in numpy.array:

In [None]:
A[0][1]
A[0,1]

Now we can retrieve a column:

In [162]:
A[:,0]

array([1, 3])

Transpose of matrix:

In [164]:
A.T

array([[1, 3],
       [2, 4]])

Element-wise exponentiation (other functions work the same way):

In [165]:
np.exp(A)

array([[ 2.71828183,  7.3890561 ],
       [20.08553692, 54.59815003]])

A list can also be passed; it is converted to an `np.array` automatically:

In [167]:
np.exp(L)

array([[ 2.71828183,  7.3890561 ],
       [20.08553692, 54.59815003]])

Matrix multiplication (inner dimensions must match!):

In [169]:
B = np.array([[1,2,3],[4,5,6]])

In [170]:
A.dot(B)

array([[ 9, 12, 15],
       [19, 26, 33]])
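The rule about inner dimensions can be checked directly from the shapes. In this sketch, `A @ B` works because the shapes are (2, 2) and (2, 3), while the reversed product `B @ A` raises a `ValueError`. The `mismatch` flag is only illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])          # shape (2, 2)
B = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)

print((A @ B).shape)   # (2, 3): inner dimensions 2 and 2 match

try:
    B @ A              # (2, 3) times (2, 2): inner dimensions 3 and 2 differ
    mismatch = False
except ValueError as err:
    mismatch = True
    print("Shape mismatch:", err)
```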

Determinant:

In [171]:
np.linalg.det(A)

-2.0000000000000004

Inverse:

In [172]:
np.linalg.inv(A)

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

Checking if this is really the inverse:

In [173]:
np.linalg.inv(A).dot(A)

array([[1.00000000e+00, 0.00000000e+00],
       [1.11022302e-16, 1.00000000e+00]])
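The product is the identity matrix up to rounding error (note the `1.11e-16` entry). Rather than reading the digits, one way to confirm this is to compare against `np.eye(2)` with a tolerance. A minimal sketch, where `Ainv` and `is_inverse` are illustrative names:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
Ainv = np.linalg.inv(A)

# Compare with the identity matrix, allowing for floating-point rounding
is_inverse = np.allclose(Ainv @ A, np.eye(2))
print(is_inverse)  # True
```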

Trace:

In [174]:
np.trace(A)

5

Extracting diagonal elements into a vector:

In [175]:
np.diag(A)

array([1, 4])

Creating a diagonal matrix from a vector uses the same function, which is overloaded:

In [176]:
np.diag([1,4])

array([[1, 0],
       [0, 4]])

Eigenvalues and eigenvectors:

In [177]:
Lam, V = np.linalg.eig(A)
Lam
V

array([[-0.82456484, -0.41597356],
       [ 0.56576746, -0.90937671]])

Checking the eigenvalue equation $A v = \lambda v$ for the first eigenvalue and eigenvector:

In [178]:
V[:,0] * Lam[0] == A @ V[:,0]

array([ True, False])

This should have been `[True, True]`. The mismatch comes from floating-point rounding; the printed values look identical:

In [179]:
V[:,0] * Lam[0], A @ V[:,0]

(array([ 0.30697009, -0.21062466]), array([ 0.30697009, -0.21062466]))

We should use the `np.allclose(u,v)` function instead, which checks whether the elements of u and v are within a small $\varepsilon$ distance of each other:

In [180]:
np.allclose(V[:,0] * Lam[0], A @ V[:,0])

True
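How small the distance has to be is controlled by the `rtol` and `atol` parameters of `np.allclose`. The sketch below uses two made-up vectors `u` and `v` that differ by `1e-10` to show exact comparison, the default tolerances, and a stricter setting.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = u + 1e-10   # tiny perturbation, far below the default tolerances

print(u == v)              # [False False]
print(np.allclose(u, v))   # True

# The check is |u - v| <= atol + rtol * |v|, with defaults rtol=1e-05, atol=1e-08
print(np.allclose(u, v, rtol=0.0, atol=1e-12))   # False
```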

We can also check all eigenvalues/vectors at once using matrix notation:

In [182]:
np.allclose(V @ np.diag(Lam), A @ V)

True

For symmetric matrices, it is better to use `numpy.linalg.eigh`, which exploits the symmetry and returns real eigenvalues in ascending order.
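A short sketch of `np.linalg.eigh` on a symmetric matrix follows. The matrix `S` is a made-up example, not one used earlier in this section.

```python
import numpy as np

S = np.array([[2.0, 1.0], [1.0, 2.0]])   # symmetric example matrix (hypothetical)

lam, V = np.linalg.eigh(S)   # eigenvalues are returned in ascending order
print(lam)                   # approximately [1. 3.]

print(np.allclose(S @ V, V @ np.diag(lam)))   # eigenvalue equation holds
print(np.allclose(V.T @ V, np.eye(2)))        # eigenvectors are orthonormal
```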

## **6. Solving linear systems**

This is a very common problem in all areas of science and engineering. Example problem:
\begin{align}
x_1 + x_2 &= 2200 \\
1.5x_1 + 4x_2 &= 5500
\end{align}

This could be written as a matrix equation:
$$
\mathbf{A} \mathbf{x} = \mathbf{b}
$$

In [184]:
A = np.array([[1,1],[1.5,4]])
b = np.array([2200,5500])

In theory, this could be solved by inverting $\mathbf{A}$:
$$ \mathbf{x} = \mathbf{A}^{-1} \mathbf{b} $$
But in practice, computing the inverse explicitly is inefficient in most cases. There are better algorithms, such as Gaussian elimination. NumPy has a built-in function for solving linear systems:

In [185]:
np.linalg.solve(A, b)

array([1320.,  880.])
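As a sanity check, the solution can be substituted back into the system and compared with the explicit-inverse approach. This is a sketch for this small example only; `x` and `x_inv` are illustrative names.

```python
import numpy as np

A = np.array([[1, 1], [1.5, 4]])
b = np.array([2200, 5500])

x = np.linalg.solve(A, b)
x_inv = np.linalg.inv(A) @ b   # explicit inverse, shown only for comparison

print(np.allclose(A @ x, b))   # True: the solution satisfies the system
print(np.allclose(x, x_inv))   # True: both approaches agree on this small system
```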