# **Section 2: Numpy** 
<a href="https://colab.research.google.com/github/osuranyi/UdemyCourses/blob/main/NumpyStack/Section2_Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **1. Introduction**

The central object in Numpy is the array.

What can Numpy be used for?
* Matrix operations (addition, multiplication, etc.)
* Solving linear systems
* Calculating inverse and determinant of matrices
* Generating random numbers
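
As a rough illustration of the operations listed above, here is a minimal sketch. The matrix `M`, the right-hand side `rhs` and the seed are made-up values chosen for this example only.

```python
import numpy as np

# Hypothetical 2x2 system: 2x + y = 5 and x + 3y = 10
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
rhs = np.array([5.0, 10.0])

x = np.linalg.solve(M, rhs)   # solves M @ x = rhs
M_inv = np.linalg.inv(M)      # matrix inverse
det = np.linalg.det(M)        # determinant: 2*3 - 1*1 = 5

# Random numbers from a seeded generator (the seed 0 is arbitrary)
rng = np.random.default_rng(0)
samples = rng.standard_normal(3)

print(x)    # close to [1. 3.]
print(det)  # close to 5.0
```

For solving a system, `np.linalg.solve` is generally preferable to multiplying by the explicit inverse.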

Applications:
* Linear regression
* Logistic regression
* Deep neural networks
* K-means clustering
* Density estimation
* Principal components analysis
* Matrix factorization (recommender systems)
* Support vector machines
* Markov models
* Control systems
* Game theory
* Operations research
* Portfolio optimization

*Remark:* In Numpy, vectors are 1D arrays (not N x 1 "2D arrays").
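
The difference between a 1D vector and an N x 1 array can be seen from the `shape` attribute. A small sketch (the names `v` and `col` are chosen for this example only):

```python
import numpy as np

v = np.array([1, 2, 3])   # 1D vector
col = v.reshape(-1, 1)    # explicit N x 1 column (2D array)

print(v.shape)    # (3,)
print(col.shape)  # (3, 1)
print(v.ndim, col.ndim)  # 1 2
```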

## **2. Arrays vs. lists**

In [52]:
import numpy as np

In [53]:
L = [1,2,3]

In [54]:
A = np.array([1,2,3])

In [55]:
for e in L:
  print(e)

1
2
3


In [56]:
for e in A:
  print(e)

1
2
3


In [57]:
L.append(4)

In [58]:
L

[1, 2, 3, 4]

The size of an array is fixed, so arrays have no append method:

In [59]:
A.append(4)

AttributeError: 'numpy.ndarray' object has no attribute 'append'
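
If a longer array is really needed, one option is to build a new one. The sketch below uses `np.append` and `np.concatenate`, which both return a new array and leave the original untouched (the names `A_longer` and `A_concat` are chosen for this example only):

```python
import numpy as np

A = np.array([1, 2, 3])

A_longer = np.append(A, 4)                         # new array with 4 at the end
A_concat = np.concatenate([A, np.array([4, 5])])   # joins two arrays

print(A)         # [1 2 3]  (unchanged)
print(A_longer)  # [1 2 3 4]
print(A_concat)  # [1 2 3 4 5]
```

Because every call copies the data, growing an array this way inside a loop tends to be slow. Collecting values in a list and converting once is usually the better choice.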

Lists can be concatenated:

In [60]:
L + [5]

[1, 2, 3, 4, 5]

Arrays behave differently: adding a length-one array adds its element to every element of A. This is called broadcasting:

In [61]:
A + np.array([4])

array([5, 6, 7])

When two arrays of the same size are added, they are summed element-wise:

In [62]:
A + np.array([4,5,6])

array([5, 7, 9])

But broadcasting fails when we try to add two vectors of different sizes (neither of which has length one):

In [63]:
A + np.array([4,5])

ValueError: operands could not be broadcast together with shapes (3,) (2,) 
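
The three cases can be put side by side in one sketch. The 2D array `grid` is a made-up example added here to show that broadcasting also works across dimensions. It is not part of the cells above.

```python
import numpy as np

A = np.array([1, 2, 3])

# Length-one array: its single element is stretched across A.
stretched = A + np.array([4])

# Shapes (2, 3) and (3,) are compatible, so A is added to every row.
grid = np.zeros((2, 3), dtype=int)
row_added = grid + A

# Shapes (3,) and (2,) are incompatible: they differ and neither is 1.
try:
    A + np.array([4, 5])
    failed = False
except ValueError as err:
    failed = True
    print("Broadcast failed:", err)

print(stretched)  # [5 6 7]
print(row_added)  # two rows, each [1 2 3]
```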

Scalar multiplication works as expected for arrays:

In [64]:
2 * A

array([2, 4, 6])

But a list gets repeated twice:

In [65]:
2 * L

[1, 2, 3, 4, 1, 2, 3, 4]

Adding a value to each element of a list requires a loop:

In [66]:
L2 = []
for e in L:
  L2.append(e+3)

In [67]:
L2

[4, 5, 6, 7]

The same with a list comprehension:

In [68]:
L2 = [e + 3 for e in L]

In [69]:
L2

[4, 5, 6, 7]

This is pretty flexible, e.g. squaring every list element:

In [70]:
L2 = [e**2 for e in L]
L2

[1, 4, 9, 16]

With arrays, this is much easier:

In [71]:
A**2

array([1, 4, 9])

Numpy functions are mostly applied element-wise to arrays:

In [72]:
np.sqrt(A)

array([1.        , 1.41421356, 1.73205081])

In [73]:
np.log(A)

array([0.        , 0.69314718, 1.09861229])

In [74]:
np.tanh(A)

array([0.76159416, 0.96402758, 0.99505475])

A list looks like an array, but it is a more general data structure. Numpy arrays exist for mathematics.
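
One way to see the difference is to look at what each container can hold. In this sketch (all names are chosen for this example only), a list stores arbitrary objects, while an array has a single element type, its `dtype`:

```python
import numpy as np

mixed = [1, "two", 3.0]          # a list can hold objects of any type

ints = np.array([1, 2, 3])       # all integers: integer dtype
floats = np.array([1, 2.5, 3])   # mixed int/float input is upcast to float64

print(ints.dtype)    # an integer dtype (the exact width depends on the platform)
print(floats.dtype)  # float64
```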

## **3. Dot product**

$$
a \cdot b = a^T b = \sum_{d=1}^D a_d b_d
$$

In [75]:
a = np.array([1,2])
b = np.array([3,4])

Performing the dot product "by hand":

In [76]:
dot = 0
for e, f in zip(a,b):
  dot += e*f
dot

11

In [77]:
dot = 0
for i in range(len(a)):
  dot += a[i] * b[i]
dot

11

What happens if we use the * operator?

In [78]:
a * b

array([3, 8])

The multiplication is element-wise, but the result can be summed to obtain the dot product:

In [79]:
np.sum(a * b)

11

In [80]:
(a * b).sum()

11

Using the dedicated *dot* function:

In [81]:
np.dot(a,b)

11

Also works as an instance method:

In [82]:
a.dot(b)

11

The symbol @ also performs the dot product:

In [83]:
a @ b

11
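
The `@` operator is not limited to two vectors. As a small sketch (the matrix `M` and the vector `v` are made-up values), multiplying a 2D array by a 1D array takes the dot product of each row with the vector:

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])
v = np.array([3, 4])

rows = M @ v                                 # [1*3 + 2*4, 3*3 + 4*4]
by_hand = np.array([row @ v for row in M])   # same result, one row at a time

print(rows)     # [11 25]
print(by_hand)  # [11 25]
```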

Alternative definition of the dot product, where $\theta$ is the angle between $a$ and $b$:
$$
a^T b = \|a\| \, \|b\| \cos\theta
$$

In [84]:
amag = np.sqrt(a@a)
amag

2.23606797749979

In [85]:
np.linalg.norm(a)

2.23606797749979

In [86]:
cosangle = a@b / (np.linalg.norm(a) * np.linalg.norm(b))

In [87]:
angle = np.arccos(cosangle)
angle

0.17985349979247847
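
The angle calculation can be wrapped in a small function. This is a sketch, not part of the original cells: `angle_between` is a hypothetical helper, and the `np.clip` call is an added safeguard, since floating-point rounding can push the ratio slightly outside [-1, 1], where `np.arccos` returns `nan`.

```python
import numpy as np

def angle_between(u, v):
    """Angle in radians between two nonzero 1D vectors (hypothetical helper)."""
    cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Guard against rounding errors that leave the valid arccos domain.
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

a = np.array([1, 2])
b = np.array([3, 4])
theta = angle_between(a, b)

print(theta)              # about 0.18 radians
print(np.degrees(theta))  # roughly 10.3 degrees
```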

## **4. Speed test**

In this part, we measure how much faster the dot product is in Numpy. First we define two vectors:

In [88]:
from datetime import datetime

a = np.random.randn(100)
b = np.random.randn(100)
T = 100000

Then create a function for the "by hand" calculation:

In [89]:
def slow_dot_product(a,b):
  result = 0
  for e,f in zip(a,b):
    result += e*f
  return result

Also a function that computes the dot product with a comprehension (strictly speaking, a generator expression passed to `sum`):

In [95]:
def list_comprehension_dot_product(a,b):
  result = sum(e*f for e,f in zip(a,b))
  return result

Now we time these functions and the built-in dot product:

In [97]:
t0 = datetime.now()
for t in range(T):       # running dot product T times to be more accurate
  slow_dot_product(a,b)
dt1 = datetime.now()-t0

t0 = datetime.now()
for t in range(T):
  list_comprehension_dot_product(a,b)
dt2 = datetime.now()-t0

t0 = datetime.now()
for t in range(T):
  a.dot(b)
dt3 = datetime.now()-t0


print("List comprehension is faster by this factor:",dt1.total_seconds() / dt2.total_seconds())
print("Numpy method is faster by this factor:",dt1.total_seconds() / dt3.total_seconds())

List comprehension is faster by this factor: 0.9663062697702786
Numpy method is faster by this factor: 56.20730057998908
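
As an alternative to `datetime`, the standard-library `timeit` module can do the repetition and timing. The sketch below makes several assumptions: it is self-contained, so it redefines the vectors and `slow_dot_product`; the seed is arbitrary; and `n_runs` is smaller than `T` above to keep it quick. The measured factor will vary from machine to machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducible vectors
a = rng.standard_normal(100)
b = rng.standard_normal(100)

def slow_dot_product(a, b):
    result = 0
    for e, f in zip(a, b):
        result += e * f
    return result

n_runs = 10_000  # number of repetitions per measurement

# timeit.timeit returns the total time in seconds for n_runs calls
t_loop = timeit.timeit(lambda: slow_dot_product(a, b), number=n_runs)
t_numpy = timeit.timeit(lambda: a.dot(b), number=n_runs)

print("Numpy method is faster by this factor:", t_loop / t_numpy)
```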
