<a href="https://colab.research.google.com/github/kis-balazs/machine-learning/blob/main/MLCourse_np-pd-plt-ip.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import os
from pprint import pprint
from pathlib import Path

# used for parameter type declaration
from typing import Union

---
*Author*: Balázs Kis

*Email*: balazskis@gmail.com

---
# Setup evaluation function

**PLEASE DO NOT ALTER** this code, since it is used to set up the solution database together with the evaluate function.

---

---

*We would like to give you an easy and pleasant evaluation process for the problems we prepared for you, and this cell is essential for that. If you change anything in the cell, please revert it to the original version so that the evaluation stays correct. Otherwise we cannot account for any difference between your results and the results we prepared.*

In [None]:
assert not Path('./le').exists(), 'Please DO NOT attempt to run this cell more than once for each runtime!'

# dependencies
%pip install cryptography --quiet
%pip install git+https://github.com/ozgur/python-firebase --quiet

# setup for the evaluation
os.system("git clone -l -s https://gist.github.com/kis-balazs/872f4e35871942f3f7076bbc5626c226 le")
os.chdir('le')

from load_evaluate import *

os.chdir('..')

# Useful documentation

 - [Python3 Cheatsheet](https://github.com/kis-balazs/machine-learning/blob/main/res/python3_cheatsheet.pdf)
 - ...

---
# **NumPy**

Official reference: https://numpy.org/doc/stable/reference/index.html

$\color{#F0000C}{Hint: check \nobreakspace documentation \nobreakspace CONSTANTLY}$

In [None]:
import numpy as np
print(np.__version__)

Building blocks of almost every problem in computer science:
 - numbers
 - arrays (1D, 2D, 3D, 4D(?), ...)
 - operations & underlying mathematics

*We will see that many problems in computer science can be solved using numpy, i.e. <ins>mathematics</ins> :D*

## 1) Python < Numpy

### 1.1) Python and NumPy arrays



In [None]:
arr = [3, 4, 5, 2, 6, 1]

In [None]:
arr[0] = 3.0

In [None]:
print(type(arr))
print(arr[0], type(arr[0]))

In [None]:
arr_np = np.array(arr)
print(type(arr_np))
print(arr_np[0], type(arr_np[0]))

In [None]:
arr_np1 = np.array(arr)
print(type(arr_np1))
print(arr_np1[0], type(arr_np1[0]))

In [None]:
arr_np[0] = 4

In [None]:
print(arr_np == arr_np1)
# unified answer?
print((arr_np == arr_np1).all())
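The cells above hint at a key difference: a Python list can mix types, while an ndarray stores a single dtype for all of its elements. A minimal sketch of that behaviour (the names `mixed` and `ints` are only illustrative):

```python
import numpy as np

mixed = [3.0, 4, 5, 2, 6, 1]           # a Python list may mix floats and ints
print([type(e).__name__ for e in mixed])

arr_np = np.array(mixed)               # an ndarray has one dtype for all elements
print(arr_np.dtype)                    # float64: the ints were upcast

ints = np.array([3, 4, 5])
ints[0] = 4.9                          # assigning a float to an int array truncates it
print(ints)
```

This is also why `arr_np[0]` printed a NumPy float above, even though most elements of the list were Python ints.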

### 1.2) 1D

In [None]:
# the array below holds the average temperature in Cairo, Egypt in the week of 20th-26th (Monday-Sunday) April 1992
temps = [20, 20, 17, 16, 16, 17, 19]

# in parallel, using numpy
temps_np = np.array(temps)

In [None]:
# compute the highest temperature of the week
# python
print('Max temp (Python):', max(temps))
# numpy
print('Max temp (NumPy):', temps_np.max())

In [None]:
# compute the mean temperature of the week
# python
print('Mean temp (Python):', sum(temps) / len(temps))
# numpy
print('Mean temp (NumPy):', temps_np.mean())

In [None]:
# compute the standard deviation of the temperature of the week (2 decimals)
# Hint: https://datascienceparichay.com/wp-content/uploads/2021/09/standard-deviation-formula-768x444.png.webp

# python
mean = sum(temps) / len(temps)
print('Std dev (Python):', (sum(pow((elem - mean), 2) for elem in temps) / len(temps)) ** .5)  # look carefully at the last operation
# numpy
print('Std dev (NumPy):', temps_np.std())
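The pure-Python formula above divides by N, which is the population standard deviation, and that is also what `ndarray.std()` computes by default. A small sketch of the difference to the sample standard deviation (dividing by N - 1), selected with the `ddof` parameter and cross-checked against the standard library:

```python
import statistics

import numpy as np

temps = [20, 20, 17, 16, 16, 17, 19]
temps_np = np.array(temps)

# population standard deviation: divide by N (NumPy default, ddof=0)
pop_std = temps_np.std()
# sample standard deviation: divide by N - 1 (ddof=1)
sample_std = temps_np.std(ddof=1)

print('Population std:', round(pop_std, 2), '| statistics.pstdev:', round(statistics.pstdev(temps), 2))
print('Sample std:    ', round(sample_std, 2), '| statistics.stdev: ', round(statistics.stdev(temps), 2))
```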

#### Problems

In [None]:
# ######
# Problem 1:
# We have an ndarray given below.
# What is the index of the smallest value in the array?
# ######
arr = np.array([-1, 0, -1, -2, -2, 2, 3, -3, 1])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/1D/p1', submitted_solution=solution)

In [None]:
# ######
# Problem 2:
# We have an ndarray given below.
# What is the sum of the positive values in the array?
# ######
arr = np.array([-1, 0, -1, -2, -2, 2, 3, -3, 1])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/1D/p2', submitted_solution=solution)

In [None]:
# ######
# Problem 3:
# We have an ndarray given below.
# What is the sum of values in the array, after rounding to 2 decimals? Hint: result is float!
# ######
arr = np.array([3.57042396, 8.35559273, 0.7621579 , 3.53006476,
                0.98356653, 7.66801256, 3.32831346, 4.70137415])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/1D/p3', submitted_solution=solution)

### 1.3) 2(+)D

 - **2D**: two-dimensional array (i.e. matrix)
 - 3D: three-dimensional array
 - 4D: four-dimensional array
 - ...

*Observation*: arrays with 3 or more dimensions are named after their dimensionality, i.e. n-dimensional arrays, or **tensors**, as we will see in the near future.

In [None]:
# build the I3 (identity) matrix; try I4, I5, ... In;
# Important notation: When n is known from context, the identity matrix is written as I.
n = 3
# python
print('I{} (Python):\n'.format(n), [[1 if j == i else 0 for j in range(0, n)] for i in range(0, n)])
# numpy
print('I{} (NumPy):\n'.format(n), np.identity(n))  # eye?

---

In [None]:
# given a 2x2 matrix of arbitrary values, compute its determinant
A = [
     [5, 2],
     [4, 0.5]
]
# python
print('Determinant (Python):', A[0][0] * A[1][1] - A[0][1] * A[1][0])  # Sarrus rule
# numpy
print('Determinant (NumPy):', np.linalg.det(np.array(A)))  # linalg?

In [None]:
# given a 3x3 matrix of arbitrary values, compute its determinant
A = [
     [5, 2, 3],
     [4, 0.5, 1.7],
     [-2, 6, 3]
]
# python
from functools import reduce 

def derive_comp(A, _j, n):
    component_array = [A[(cnt) % n][(cnt + _j) % n] for cnt in range(0, n)]
    return reduce((lambda a, b: a*b), component_array) 

# Sarrus rule 3x3
def det(A, n):
    pos_comps = [derive_comp(A, _j, n) for _j in range(0, n)]
    neg_comps = [derive_comp(A[::-1], _j, n) for _j in range(0, n)]

    return sum(pos_comps) - sum(neg_comps)


print('Determinant (Python):', det(A, 3))
# numpy
print('Determinant (NumPy):', np.linalg.det(np.array(A)))  # linalg?

In [None]:
# given a 4x4 matrix of arbitrary values, compute its determinant
A = [
     [5, 2, 3, 1],
     [4, 0.5, 1.7, -2],
     [-2, 6, 3, 1],
     [-1, 1, -1, 1]
]
# python
# BAD NEWS: https://www.quora.com/Can-the-Sarrus-rule-be-applied-to-4x4-determinants?share=1
# perhaps use Laplace rule / Gaussian elimination
print('Determinant (Python): ???')
# numpy
print('Determinant (NumPy):', np.linalg.det(np.array(A)))  # linalg?
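The Laplace rule mentioned in the cell above can be sketched in pure Python as follows. This is only an illustration: `det_laplace` is a hypothetical helper name, and the recursion becomes very slow for larger matrices because the number of sub-determinants grows factorially.

```python
import numpy as np


def det_laplace(M):
    """Determinant of a square list-of-lists matrix via Laplace expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # minor: drop row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        # the sign alternates along the row: +, -, +, ...
        total += (-1) ** j * M[0][j] * det_laplace(minor)
    return total


A = [
    [5, 2, 3, 1],
    [4, 0.5, 1.7, -2],
    [-2, 6, 3, 1],
    [-1, 1, -1, 1],
]
print('Determinant (Python, Laplace):', det_laplace(A))
print('Determinant (NumPy):', np.linalg.det(np.array(A)))
```

The two values should agree up to floating-point rounding, while NumPy stays fast for any matrix size.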

---

In [None]:
# given a 2x2 matrix of arbitrary values, compute its inverse
n = 2
A = [
     [5, 2],
     [4, 0.5]
]
# python

def inv2x2(A):
    # inverse = adjugate / determinant (only defined when the determinant is non-zero)
    inv_det = 1.0 / (A[0][0] * A[1][1] - A[0][1] * A[1][0])
    adjugate = [
        [A[1][1], -A[0][1]],
        [-A[1][0], A[0][0]]
    ]
    return [[inv_det * adjugate[i][j] for j in range(2)] for i in range(2)]

print('Inverse (Python):\n', inv2x2(A))
# numpy
print('Inverse (NumPy):\n', np.linalg.inv(np.array(A)))  # linalg?

In [None]:
# 3x3 inverse: computing the adjugate in pure Python already gets quite complicated, while NumPy just scales :D

#### Problems

In [None]:
# ######
# Problem 1:
# We have a 2D ndarray given below.
# What is the inverse of the matrix?
# ######
A = np.array([[1, 3], [2, 6]])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/2D/p1', submitted_solution=solution)

In [None]:
# ######
# Problem 2:
# We have a 4x4 2D ndarray given below.
# What is the determinant of the (matrix multiplied by its inverse)?
# Hint1: np.dot() (inner product)
# Hint2: round result matrix to int!
# ######
A = np.array([[5, 2, 3, 1], [4, 0.5, 1.7, -2], [-2, 6, 3, 1], [-1, 1, -1, 1]])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/2D/p2', submitted_solution=solution)

In [None]:
# ######
# Problem 3:
# Formalise the previous problem as mathematical equation using the following notation:
# Inverse of a matrix: A^-1
# Dot (inner) product between two matrices: A dot B
# ######
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/2D/p3', submitted_solution=solution)

### 1.4) Conclusion

A fair conclusion is that even for such simple problems, NumPy is a far better solution than pure Python, for arrays of **any dimension**.

In the upcoming sections we will see more *complex functions* and use cases that better represent the real power of NumPy across an immense variety of computer science-related mathematics:
 - $\color{#20b2aa}{signal \nobreakspace processing}$ (1D, 2D (images), etc.)
 - $\color{#00ff00}{linear \nobreakspace algebra}$
 - $\color{#ffbf00}{analytical \nobreakspace geometry}$
 - ...

$\color{#F0000C}{In \nobreakspace the \nobreakspace upcoming \nobreakspace sections \nobreakspace pay \nobreakspace extra \nobreakspace attention \nobreakspace to \nobreakspace how \nobreakspace numpy \nobreakspace handles \nobreakspace \textbf{dimensions}.}$

## 2) NumPy in computer science

### 2.1) NumPy ndarray operations

#### 2.1.1) 1D ndarray

In [None]:
# create an ndarray of 0s
print(np.zeros(4))

In [None]:
# create an ndarray of random numbers (in range [0, 1))
print(np.random.random(4))

In [None]:
# create an ndarray with values in a range
print(np.arange(10, 20, step=1))

In [None]:
# sort an ndarray of random values in ascending order
print(np.sort(np.random.random(3)))

# descending?
print(np.sort(np.random.random(3))[::-1])

---

In [None]:
# two arrays of 5 values
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([-1, 3, -2, 4, -7])

In [None]:
print(arr1)  # shape?

In [None]:
print(arr2)

In [None]:
# access elements in the 1d ndarray:
print(arr2[0])

###### W/ constants

In [None]:
# add/sub of a constant
print(arr1 + 2)
print(arr2 - .7)

In [None]:
# mult/div with a constant - automatic float?
print(arr1 * 1.5)
print(arr2 / 3)

In [None]:
# append constant to ndarray
print(np.append(arr1, -1))

---
###### W/ 1D ndarrays

In [None]:
# add/sub
print(arr1 + arr2)  # np.add() | np.append(arr2, 1)
print(arr1 - arr2)  # np.subtract()
print(arr2 - arr1)

In [None]:
# mult/div 
print(arr1 * arr2)
print(arr1 / arr2)

In [None]:
# append array to array
print(np.append(arr1, arr2))

---
###### Relevant functions

In [None]:
# transpose
print(arr2)

print(arr2.T)  # no difference? why?
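Why does `.T` make no difference here? A 1D ndarray has a single axis, so there is nothing to swap. A minimal sketch of how to get an explicit row or column vector instead (the names `row` and `col` are illustrative):

```python
import numpy as np

arr = np.array([-1, 3, -2, 4, -7])

print(arr.shape, arr.T.shape)        # both are (5,): a 1D array has a single axis

row = arr.reshape(1, -1)             # shape (1, 5): one row, five columns
col = arr.reshape(-1, 1)             # shape (5, 1): five rows, one column
print(row.shape, row.T.shape)        # transposing a 2D row gives a column
print(col.shape)
```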

In [None]:
# norm of an ndarray - distance from origin (square root of the dot product of the array with itself)
print(np.linalg.norm(arr1))

In [None]:
# dot product (inner product)
print(np.dot(arr1, arr2))

# print(arr1 @ arr2)

What are **norm and dot product** good for? Quite a lot, actually!

[$\color{#00ff00}{linear \nobreakspace algebra}$/$\color{#ffbf00}{analytical \nobreakspace geometry}$] Computing the angle between two vectors

In [None]:
# ndarrays representing points in n dimensions
v1 = [1, 6, 5]
v2 = [4, 4, 5]

angle = np.arccos(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))  # radian!
print('Angle between v1 and v2:', np.rad2deg(angle))
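The formula used above is cos(angle) = (v1 · v2) / (|v1| |v2|). Wrapped in a reusable function it could look like the sketch below. `angle_between` is a hypothetical helper, and the `np.clip` call is an added safeguard, since rounding can push the cosine slightly outside [-1, 1].

```python
import numpy as np


def angle_between(u, v, degrees=True):
    """Angle between two non-zero vectors, from the dot-product formula."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # rounding can push the value slightly outside [-1, 1], where arccos is undefined
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return np.rad2deg(theta) if degrees else theta


print(angle_between([1, 0], [0, 1]))        # perpendicular vectors
print(angle_between([1, 6, 5], [4, 4, 5]))  # the vectors used above
print(angle_between([2, 2], [4, 4]))        # parallel vectors
```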

In [None]:
# 2D
assert(len(v1) == 2)
assert(len(v2) == 2)

import matplotlib.pyplot as plt

plt.plot([0, v1[0]], [0, v1[1]], color='blue')
plt.plot([0, v2[0]], [0, v2[1]], color='red')
plt.show()

In [None]:
# 3D
assert(len(v1) == 3)
assert(len(v2) == 3)

import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection="3d")
ax.plot3D([0, v1[0]], [0, v1[1]], [0, v1[2]], color='blue')
ax.plot3D([0, v2[0]], [0, v2[1]], [0, v2[2]], color='red')
plt.show()

#### 2.1.2) 2D ndarray

In [None]:
# create a 2x2 2d ndarray of 0s
print(np.zeros(4).reshape((2, 2)))

print(np.zeros((2, 2)))  # shape

In [None]:
# create a 2d ndarray of random numbers (in range [0, 1))
# how can we construct shapes? (row, col) !
arr = np.random.random(6)
# arr = arr.reshape(2, 3)
# arr = np.resize(arr, (3, 2))
print(arr)

# print(arr.reshape(2, 2))

In [None]:
# create a 3x3 2d ndarray of random values without reshaping a 1d ndarray
print(np.random.random((3, 3)))

In [None]:
# create an ndarray with values in a range
print(np.arange(10, 20, step=1).reshape((2, 5)))  # alternatives?

---

In [None]:
# two matrices of uniform values in [-1, 1)  ! check documentation!
mat1 = np.random.uniform(-1, 1, size=(3, 2))
mat2 = np.random.uniform(-1, 1, size=(2, 2))

In [None]:
print(mat1)
print(mat1.shape)

In [None]:
print(mat2)
print(mat2.shape)

In [None]:
# access elements in the 2d ndarray:
print(mat1[0])
i, j = 1, 1
print(mat1[i][j])
print(mat1[i, j])

##### W/ constants

In [None]:
# add/sub of a constant
print(mat1 + 2)
print(mat2 - .7)

In [None]:
# mult/div with a constant - automatic float?
print(mat2 * 1.5)
print(mat1 / 3)

In [None]:
# append constant to ndarray
_mat = np.append(mat1, -1)
print(_mat)  # what???
print(_mat.shape)
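What happened above? Without an `axis` argument, `np.append` flattens the matrix before appending. A short sketch of the difference, using a small example matrix (`mat`, `with_row` and `with_col` are illustrative names):

```python
import numpy as np

mat = np.arange(6).reshape(3, 2)

flat = np.append(mat, -1)                               # no axis: mat is flattened first
print(flat.shape)                                       # (7,)

with_row = np.append(mat, [[-1, -1]], axis=0)           # append one row: shape (4, 2)
with_col = np.append(mat, [[-1], [-1], [-1]], axis=1)   # append one column: shape (3, 3)
print(with_row)
print(with_col)
```

With `axis` set, the appended values must have the same number of dimensions as the matrix, which is why they are written as nested lists.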

##### W/ 1D arrays

In [None]:
# add/sub
print(mat1 + np.array([-1, 1]))  # np.add()
print(mat2 - np.array([1, 1]))  # np.subtract()

In [None]:
# mult/div 
print(mat2 * np.array([-1, 1]))
print(mat1 / np.array([.5, .25]))

In [None]:
# append array to array
_mat = np.append(mat1, np.array([5, 5]))
print(_mat)

# print(_mat.reshape(?, ?))

##### W/ 2D arrays

In [None]:
# add/sub
print(mat1 + mat2)  # np.add(); raises ValueError: shapes (3, 2) and (2, 2) cannot be broadcast together
print(mat2 - mat1)  # np.subtract(); same shape mismatch

In [None]:
# mult/div 
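print(mat2 * mat1)  # raises ValueError: shapes (2, 2) and (3, 2) cannot be broadcast together
print(mat1 / np.eye(3))  # would fail too: (3, 2) vs (3, 3)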
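The element-wise operations in the two cells above fail because the shapes of `mat1` (3, 2) and `mat2` (2, 2) are not compatible, while the earlier operations with 1D arrays of length 2 worked. The rule behind this is called broadcasting. A minimal sketch, assuming a NumPy version that provides `np.broadcast_shapes` (1.20 or newer):

```python
import numpy as np

# shapes are compared from the last axis backwards;
# two sizes are compatible when they are equal or one of them is 1
print(np.broadcast_shapes((3, 2), (2,)))     # (3, 2): the 1D array is applied to every row
print(np.broadcast_shapes((3, 2), (3, 1)))   # (3, 2): the single column is stretched over all columns

try:
    np.broadcast_shapes((3, 2), (2, 2))      # 3 vs 2 on the first axis: incompatible
    compatible = True
except ValueError as err:
    compatible = False
    print('Not broadcastable:', err)

mat = np.arange(6).reshape(3, 2)
print(mat + np.array([10, 20]))              # adds 10 to column 0 and 20 to column 1
print(mat + np.array([[1], [2], [3]]))       # adds 1, 2, 3 to rows 0, 1, 2
```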

In [None]:
# append array to array
_mat = np.append(mat1, mat2)
print(_mat)

# print(_mat.reshape(?, ?))

##### Relevant functions

In [None]:
# transpose
print(mat1)
print()
print(mat1.T)  # huh

In [None]:
# norm of a ndarray - distance from origin
print(np.linalg.norm(mat2))  # what are we computing here exactly?

# print(np.linalg.norm(mat2.reshape(4, 1)))

# Does this make sense?

In [None]:
# dot product with 1d ndarray
mat1 = np.arange(6).reshape(3, 2)
arr1 = np.array([1, 1])

print(mat1, '\n\n', arr1, '\n\nDot:')
print(mat1 @ arr1) # rule?

In [None]:
# dot product with 2d array
mat1 = np.arange(6).reshape(3, 2)
mat2 = np.eye(2).reshape(2, 2)
# mat2[[0, 1]] = mat2[[1, 0]]  # what did I do here?
print(mat1, '\n\n', mat2, '\n\nDot:')

_dot = np.dot(mat1, mat2)
print(_dot.shape)  # rule?
# print(_dot)

# print(mat1 @ mat2)
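The "rule?" asked for in the comments above is the usual matrix-multiplication shape rule: the inner sizes must match, and the outer sizes give the result shape. A small sketch with example matrices (`A` and `B` are illustrative):

```python
import numpy as np

A = np.arange(6).reshape(3, 2)      # shape (3, 2)
B = np.arange(8).reshape(2, 4)      # shape (2, 4)

# (n, m) @ (m, k) gives (n, k): the inner sizes must match
print((A @ B).shape)                # (3, 4)

# a 1D array of length m is treated as a vector: (n, m) @ (m,) gives (n,)
print((A @ np.ones(2)).shape)       # (3,)

try:
    B @ A                           # (2, 4) @ (3, 2): inner sizes 4 and 3 differ
    multiplied = True
except ValueError as err:
    multiplied = False
    print('Shape mismatch:', err)
```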

In [None]:
# interchange rows/columns in a matrix
mat1 = np.random.random((3, 3))
print(mat1)

print('\nInterchange rows:')
rmat = mat1[[2, 1, 0]]
print(rmat)

print('\nInterchange columns:')
# mat1[:] is the whole matrix again, so mat1[:][[1, 2, 0]] would reorder rows, not columns
cmat = mat1[:, [1, 2, 0]]
print(cmat)

#### 2.1.3) 3(+)D ndarray

So-called [tensors](https://en.wikipedia.org/wiki/Tensor) or higher dimensional matrices.

*In theory*: everything is a tensor, but "missing" a couple dimensions :)

In [None]:
# create a 3x3x2 3d ndarray
arr1 = np.random.randint(0, 5, size=(3, 3, 2))
print(arr1)

print(arr1[0])  # [0]...

In [None]:
# create a 4x4x3x2 4d ndarray
arr1 = np.random.randint(0, 5, size=(4, 4, 3, 2))
print(arr1)

print(arr1[0])  # [0]...

Can this go to "infinity"?

**Be careful with the size parameter, and with how size is defined for other data structures!**
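To make the different notions of "size" concrete, here is a short sketch of the attributes an ndarray exposes (the array `t` is just an example):

```python
import numpy as np

t = np.zeros((4, 4, 3, 2))
print(t.ndim)      # number of axes: 4
print(t.shape)     # length of each axis: (4, 4, 3, 2)
print(t.size)      # total number of elements: 4 * 4 * 3 * 2 = 96
print(len(t))      # length of the first axis only: 4
print(t[0].shape)  # indexing the first axis drops it: (4, 3, 2)
```

Note that `len()` on an ndarray behaves like `len()` on a nested Python list: it only counts along the first axis.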

---
##### Apply known functions/operations on tensors

In [None]:
arr2 = np.random.randint(-5, 5, size=(2, 2, 2))
print(arr2.max())

In [None]:
print(np.linalg.inv(arr2))  # we will have to believe them...

In [None]:
res = np.dot(arr1, arr2)
print(res.shape)

In [None]:
res = arr1 * arr2  # np.random.randint(0, 5, size=(?))
print(res.shape)

In [None]:
res = arr1 * np.random.randint(0, 3, size=3)
print(res.shape)

In [None]:
# let's try some together!

##### Why are tensors **useful**?

Classify an image: take an NxN pixel image, feed it through a network, and get a label (e.g. cat/dog).

* Modern machine learning frameworks allow us to "stack" multiple images in **batches**, i.e. store multiple images together. With a batch size of 32, this creates tensors of shape (32xNxN).

* These tensors are fed through the network, and using highly optimized and parallelized GPUs (Graphics Processing Units, initially developed for video output, but heavily used in machine learning) the network can load all 32 images (of size 64x64) and produce 32 labels **at once**.

* Batches also improve some mathematical aspects, such as a "smoother" loss computation, but we will get to that later!

---
Let's understand why using batches is useful with a very simple example.

[!!!] Please note that this is a purely theoretical illustration; we only want you to understand the practicality of batches, i.e. tensors, in machine learning!

In practice the speed-up depends on the GPU's specifications and size, even though neural networks are [*embarrassingly* parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel).

In [None]:
# let's assume that a neural network NN takes 10µs (1e-5s) *per pixel* to produce a label

def network_timing(input: np.ndarray, batch_size: int = 1) -> None:
    # toy cost model: processing time grows linearly with the number of values in the input
    NN = lambda x: np.prod(x.shape) * 1e-5
    comp_time = NN(input)
    # only a stack of images (3D input) can be split into batches that run in parallel
    if len(input.shape) == 3:
        comp_time /= batch_size
    print('To process the input NN takes {} seconds'.format(comp_time))

In [None]:
# create an image, pixel values are NOT important, only the size
img = np.random.random(size=(16, 16))
network_timing(img)

---
Create a set of 10,000 images (16x16) to run through the network:

In [None]:
img_set = np.random.random(size=(10_000, 16, 16))  # careful at order!

In [None]:
print(img_set[0].shape)

In [None]:
network_timing(img_set)

---
Let's use a more realistic example in which 200_000 images of shape 64x64 are loaded in the network:

In [None]:
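# free the memory of a previous run, if any (a plain `del` fails on the first run)
if 'img_set_large' in globals():
    del img_set_large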

In [None]:
img_set_large = np.random.random(size=(200_000, 64, 64))

In [None]:
print(img_set_large[0].shape)

In [None]:
# let's see how much memory the large image set takes
print('{} MiB'.format(img_set_large.size * img_set_large.itemsize / 1024 ** 2))

In [None]:
network_timing(img_set_large)

---
Let's utilize our GPU which can process 32 images in parallel at once!

In [None]:
network_timing(img, batch_size=32)  # no effect?

In [None]:
network_timing(img_set, batch_size=32)

In [None]:
network_timing(img_set_large, batch_size=32)

In real life, GPUs support much larger batch sizes, and there are multiple constraints such as RAM (the GPU uses pre-loaded data, so the data has to fit in the RAM) and many other details. [Further reading](https://stackoverflow.com/questions/45132809/how-to-select-batch-size-automatically-to-fit-gpu).

Let's just experiment with larger batch sizes :)

In [None]:
network_timing(img_set_large, batch_size=512)
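The timing function above is only a cost model. To see what "processing a batch at once" means in NumPy terms, here is a small sketch in which a hypothetical `toy_network` (simply the mean pixel value, not a real network) is applied first image by image and then to the whole 3D tensor in one call:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random(size=(32, 16, 16))   # 32 images of 16x16 pixels


# hypothetical stand-in for a network: one "score" per image (its mean pixel value)
def toy_network(image):
    return image.mean()


# one image at a time
scores_loop = np.array([toy_network(image) for image in batch])

# the whole batch at once: reduce over the two pixel axes, keep the batch axis
scores_batch = batch.mean(axis=(1, 2))

print(scores_batch.shape)                        # (32,): one score per image
print(np.allclose(scores_loop, scores_batch))    # same results, no Python loop
```

The batch axis comes first, which is the same ordering used for `img_set` above.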

[(Optional) Why deep learning uses GPUs?](https://towardsdatascience.com/why-deep-learning-uses-gpus-c61b399e93a0)

### 2.2) NumPy "everywhere"

#### Linear equations

[$\color{#00ff00}{linear \nobreakspace algebra}$] Classical problem of solving n-unknown/n-equation problems.

Solve the following problem using numpy:

![](https://i.pinimg.com/474x/68/1d/b4/681db4a743b93089bf6fab5f01a4eb61.jpg)

In [None]:
fruits = [
    [3, 0, 0],
    [1, 4, 0],
    [1, 3, 1]
]

results = [120, 100, 105]

In [None]:
fruit_prices = np.linalg.solve(fruits, results)

In [None]:
print('apple = {}; banana = {}; plum = {}'.format(fruit_prices[0], fruit_prices[1], fruit_prices[2]))

In [None]:
print('Solution to question: {}'.format(fruit_prices[1] + fruit_prices[2]))
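A solution of a linear system is easy to verify: substituting it back into the equations must reproduce the right-hand side. A minimal sketch using the same `fruits` matrix and `results` vector as above (renamed `A` and `b` here):

```python
import numpy as np

A = np.array([[3, 0, 0],
              [1, 4, 0],
              [1, 3, 1]])
b = np.array([120, 100, 105])

x = np.linalg.solve(A, b)
print(x)

# substituting the solution back must reproduce the right-hand side
print(np.allclose(A @ x, b))

# mathematically equivalent, but solve() is generally preferred over forming the inverse
x_inv = np.linalg.inv(A) @ b
print(np.allclose(x, x_inv))
```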

#### Normalization

Used in almost every field, such as machine learning, but also statistics, linear algebra, etc.

In [None]:
# normalize ndarray: what does that mean?
# https://en.wikipedia.org/wiki/Normalization_(statistics)

m = np.random.randint(1, 10, size=(5, 5))
print(m)

print(m.mean())

In [None]:
# solution: standard normalization ((x - mean) / stddev)
stdM = (m - m.mean()) / m.std()
print(stdM)  # range of values?

In [None]:
# alternative: min-max normalization ((x - min) / (max - min))
denominator = m.max() - m.min()
minMaxM = (m - m.min()) / denominator
print(minMaxM)  # range of values?

In practice we are going to use a more generic library for handling such (and not only) preprocessing tasks:
  - [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
  - [MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) 
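The "range of values?" questions above can be checked directly. The sketch below uses a small fixed matrix instead of random values and also shows a per-column variant, on the assumption that the scikit-learn scalers linked above normalize each column (feature) separately rather than the whole array:

```python
import numpy as np

m = np.array([[1., 4., 7.],
              [2., 9., 3.],
              [8., 5., 6.]])

# whole-array normalization, as in the cells above
std_m = (m - m.mean()) / m.std()
minmax_m = (m - m.min()) / (m.max() - m.min())
print(std_m.mean().round(6), std_m.std().round(6))   # approximately 0 and 1
print(minmax_m.min(), minmax_m.max())                 # 0.0 and 1.0

# per-column normalization: statistics are computed along axis 0
col_std = (m - m.mean(axis=0)) / m.std(axis=0)
print(col_std.mean(axis=0).round(6))                  # approximately 0 for every column
print(col_std.std(axis=0).round(6))                   # approximately 1 for every column
```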

#### Peak detection

[$\color{#20b2aa}{signal \nobreakspace processing}$] Given an ndarray containing values from a signal (e.g. temperature, number of people crossing in front of a sensor on a street, stocks, etc.), find local maximum/minimum points, so-called peaks.

Yearly temperature anomalies since 1880:

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
gt = pd.read_csv(r'https://datahub.io/core/global-temp/r/0.csv').iloc[::2, :][['Year', 'Mean']][::-1]

y = np.array(gt['Year'])
# mean temperature
mt = np.array(gt['Mean'])
del gt

In [None]:
plt.plot(y, mt)
plt.title('Mean temperature anomalies over years')
plt.show()

In [None]:
idx = 100
print('At year {}, mean temperature was: {}'.format(y[idx], mt[idx]))

In [None]:
# utility function
def plot_peaks(peaks):
    peaks = np.asarray(peaks)
    # accept either a boolean mask or a list of indices
    peak_idx = np.flatnonzero(peaks) if peaks.dtype == bool else peaks.astype(int)

    plt.plot(y, mt)
    # use the actual year of every peak as x coordinate
    plt.plot(y[peak_idx], mt[peak_idx], 'ro')
    plt.title('Local peaks in mean temp anomalies')
    plt.show()

##### NumPy solution

In [None]:
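def peak_detection(signal: np.ndarray, w_size: int = 1) -> np.ndarray:
    # start with all True because the conditions are combined with logical AND
    peaks = np.ones(len(signal), dtype=bool)
    for step in range(1, w_size + 1):
        # compare with the neighbour `step` positions earlier (np.roll wraps around at the edges)
        peaks &= (signal > np.roll(signal, step))
        # compare with the neighbour `step` positions later
        peaks &= (signal > np.roll(signal, -step))
    return peaks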
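One detail of the `np.roll` approach is worth seeing on a tiny signal: `np.roll` shifts circularly, so the first and last elements get compared with each other. The sketch below contrasts that with a variant that only considers interior points (the toy `signal` and the variable names are illustrative):

```python
import numpy as np

signal = np.array([5, 1, 3, 2, 4, 0, 6])

# np.roll shifts circularly: the last element wraps around to the front
print(np.roll(signal, 1))        # [6 5 1 3 2 4 0]

# roll-based comparison with both direct neighbours, as in peak_detection with w_size=1
wrapped = (signal > np.roll(signal, 1)) & (signal > np.roll(signal, -1))
print(np.flatnonzero(wrapped))   # [2 4 6]: index 6 counts because it is compared with index 0

# strict local maxima among interior points only (no wrap-around)
is_peak = np.zeros(len(signal), dtype=bool)
is_peak[1:-1] = (signal[1:-1] > signal[:-2]) & (signal[1:-1] > signal[2:])
print(np.flatnonzero(is_peak))   # [2 4]
```

For the temperature data this only affects the first and last year, but it matters for signals where the edges are extreme values.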

In [None]:
peaks = peak_detection(mt, w_size=1)
plot_peaks(peaks)

##### Alternative method: scipy

In [None]:
from scipy.signal import find_peaks

peaks_scipy = list(find_peaks(mt, distance=10)[0])
plot_peaks(peaks_scipy)

#### Rotation matrices

[$\color{#ffbf00}{analytical \nobreakspace geometry}$] 2/3 dimensional rotation matrices.

Examples of how rotations are used in robotics. [Link](https://en.wikipedia.org/wiki/Rotation_matrix)

In [None]:
a2r = lambda angle: angle * np.pi / 180.0
r2a = lambda radian: radian * 180.0 / np.pi

###### 2D

In [None]:
v2d = [1, 1]

In [None]:
angle = a2r(30)  # radian!

rot_matrix = [
    [np.cos(angle), -np.sin(angle)],
    [np.sin(angle), np.cos(angle)]
]

_v2d = np.dot(rot_matrix, v2d)

In [None]:
print(_v2d)

In [None]:
assert(len(v2d) == 2)
assert(len(_v2d) == 2)

import matplotlib.pyplot as plt

plt.plot([0, v2d[0]], [0, v2d[1]], color='blue')
plt.plot([0, _v2d[0]], [0, _v2d[1]], color='red')
plt.show()

In [None]:
angle = np.arccos(np.dot(v2d, _v2d) / (np.linalg.norm(v2d) * np.linalg.norm(_v2d)))  # radian!
print('Angle between v2d and _v2d:', r2a(angle))
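The check above recovers the 30 degrees we rotated by. A rotation matrix has a few more properties that are easy to verify numerically; the sketch below wraps the matrix in a hypothetical helper `rotation_2d` that takes degrees:

```python
import numpy as np


def rotation_2d(degrees):
    """Counter-clockwise 2D rotation matrix for an angle given in degrees."""
    theta = np.deg2rad(degrees)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])


R = rotation_2d(30)
v = np.array([1.0, 1.0])

print(np.allclose(R.T @ R, np.eye(2)))                                   # orthogonal: R^T R = I
print(np.isclose(np.linalg.det(R), 1.0))                                 # determinant is 1
print(np.isclose(np.linalg.norm(R @ v), np.linalg.norm(v)))              # length is preserved
print(np.allclose(rotation_2d(30) @ rotation_2d(60), rotation_2d(90)))   # angles add up
```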

###### 3D

In [None]:
v3d = [1, 1, 1]

In [None]:
# radian!
angleX = a2r(30)
angleY = a2r(0)
angleZ = a2r(30)

#
rotX_matrix = [
    [1,             0,                 0],
    [0, np.cos(angleX),  -np.sin(angleX)],
    [0, np.sin(angleX),   np.cos(angleX)]
]

rotY_matrix = [
    [np.cos(angleY),    0,  np.sin(angleY)],
    [0,                 1,               0],
    [-np.sin(angleY),   0,  np.cos(angleY)]
]

rotZ_matrix = [
    [np.cos(angleZ), -np.sin(angleZ),   0],
    [np.sin(angleZ),  np.cos(angleZ),   0],
    [0,               0,                1]
]

# Rz(Ry(Rx(p)))
_v3d = np.dot(rotZ_matrix, np.dot(rotY_matrix, np.dot(rotX_matrix, v3d)))
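The order Rz(Ry(Rx(p))) matters: unlike in 2D, rotations around different axes in 3D generally do not commute. A small sketch with hypothetical helpers `rot_x` and `rot_z`, built from the same matrices as above:

```python
import numpy as np


def rot_x(theta):
    """Rotation around the x axis (theta in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])


def rot_z(theta):
    """Rotation around the z axis (theta in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])


theta = np.deg2rad(30)
v = np.array([1.0, 1.0, 1.0])

# the matrix closest to the vector is applied first
x_then_z = rot_z(theta) @ rot_x(theta) @ v
z_then_x = rot_x(theta) @ rot_z(theta) @ v

print(x_then_z)
print(z_then_x)
print(np.allclose(x_then_z, z_then_x))   # False: the two orders give different vectors
```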

In [None]:
print(_v3d)

In [None]:
assert(len(v3d) == 3)
assert(len(_v3d) == 3)

import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection="3d")
ax.plot3D([0, v3d[0]], [0, v3d[1]], [0, v3d[2]], color='blue')
ax.plot3D([0, _v3d[0]], [0, _v3d[1]], [0, _v3d[2]], color='red')
plt.show()

In [None]:
angle = np.arccos(np.dot(v3d, _v3d) / (np.linalg.norm(v3d) * np.linalg.norm(_v3d)))  # radian!
print('Angle between v3d and _v3d:', r2a(angle))

#### Intro to NN

[$\color{#00ff00}{linear \nobreakspace algebra}$] Rosenblatt’s perceptron, the first modern neural network. 

Adapted after original code, **Using only NumPy!**

[Source](https://towardsdatascience.com/rosenblatts-perceptron-the-very-first-neural-network-37a3ec09038a)

In [None]:
# Problem: using a bunch of randomly generated ndarrays of length 5, train a *neuron* so that
# it classifies correctly based on a mathematical condition.

n = 5
test_size = 100_000


def gen_set(size, function):
    ndarrays = [np.random.uniform(-1, 1, n) for _ in range(0, size)]
    return [(ndarr, function(ndarr)) for ndarr in ndarrays]


# create the training & testing sets as lists of (array, label) tuples, where label = function(array)
# fct = lambda x: x.sum() > 0
fct = lambda x: x[0] < x[1:].sum()
# fct = isPrime(), perfectNumber() ???? nonlinear functions?
train_set = gen_set(
    size=test_size,
    function=fct
)
test_set = gen_set(
    size=100,
    function=fct
)
print(train_set[0])

In [None]:
node_weights = np.random.uniform(-1, 1, n)
node_bias = 0.0

def train_node(input: tuple, weights: np.ndarray, bias: np.float64):
    y = np.dot(input[0], weights) > bias
    if y != input[1]:
        if y:
            weights -= input[0]
            # bias -= 1
        else:
            weights += input[0]
            # bias += 1
    return weights, bias

def classify_input(input_arr: np.ndarray, weights: np.ndarray, bias: np.float64) -> bool:
    return np.dot(input_arr, weights) > bias

# training loop
print('! training... ', end='')
for i, input in enumerate(train_set):
    if i % (test_size // 10) == 0:
        print(i, end=' ')
    node_weights, node_bias = train_node(input, node_weights, node_bias)

In [None]:
# check what percentage of the test set is classified correctly
correct = 0
for ta in test_set:
    if classify_input(ta[0], node_weights, node_bias) == ta[1]:
        correct += 1
print('Classifier correct: {}%'.format(100 * correct / len(test_set)))

In [None]:
# check for one input
test_array = test_set[np.random.randint(0, len(test_set))]
print('> test array:', test_array[0], ' (sum:', test_array[0].sum(), '\b)\n\tfulfills condition?', test_array[1])
print('\n> Classifier prediction:', classify_input(test_array[0], node_weights, node_bias))
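The evaluation above loops over the test set one sample at a time. Because the neuron is just a dot product followed by a threshold, the whole test set can also be classified with a single matrix-vector product. A self-contained sketch under simplified assumptions (smaller data set, fixed seed, bias fixed at 0, and all names chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5


# label rule: x[0] < sum of the remaining entries (same rule as `fct` above)
def label(X):
    return X[:, 0] < X[:, 1:].sum(axis=1)


X_train = rng.uniform(-1, 1, size=(2000, n))
y_train = label(X_train)
X_test = rng.uniform(-1, 1, size=(200, n))
y_test = label(X_test)

# perceptron updates, one sample at a time (bias fixed at 0, as above)
weights = rng.uniform(-1, 1, n)
for x, target in zip(X_train, y_train):
    predicted = np.dot(x, weights) > 0
    if predicted != target:
        weights += x if target else -x

# evaluation without a Python loop: one matrix-vector product for the whole test set
predictions = X_test @ weights > 0
accuracy = (predictions == y_test).mean()
print('Accuracy: {:.1%}'.format(accuracy))
```

Storing the samples as one 2D ndarray of shape (samples, n) instead of a list of tuples is what makes the vectorized evaluation possible.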

**Discussion**: Why does this count as a neural network? Is a neural network really this simple? What happens if we want more, and what would "more" mean?

![](https://miro.medium.com/max/1400/1*ofVdu6L3BDbHyt1Ro8w07Q.png)

Source: https://towardsdatascience.com/rosenblatts-perceptron-the-very-first-neural-network-37a3ec09038a

### 2.3) Problems

In [None]:
# ######
# Problem 1:
# We have an ndarray given below.
# What is the size of the biggest 2D square matrix that can be constructed, assuming that we cannot append any value to it?
# Note: Elements can be deleted if needed.
# ######
arr = np.array([-1, 0, -1, -2, -2, 2, 3, -3, 1, -1, 1])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/pbs/p1', submitted_solution=solution)

In [None]:
# ######
# Problem 2:
# We have an ndarray given below.
# Open the numpy docs and find the function which returns the index of the biggest element in the ndarray.
# ######
arr = np.array([-1, 0, -1, -2, -2, 2, -3, 3, 1, -1, 1])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/pbs/p2', submitted_solution=solution)

In [None]:
# ######
# Problem 3:
# We have an ndarray given below.
# What is the product of the indices where the array's value is -2?
# Hint: np.where()
# ######
arr = np.array([-1, 0, -1, -2, -2, 2, -3, 3, 1, -1, 1])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/pbs/p3', submitted_solution=solution)

In [None]:
# ######
# Problem 4:
# We have a 2D ndarray given below.
# What is the largest length of an ndarray with which the matrix can be multiplied successfully?
# ######
arr = np.random.randint(0, 5, size=(6, 4))
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/pbs/p4', submitted_solution=solution)

In [None]:
# ######
# Problem 5:
# Formalise the previous (for 2D ndarrays) problem answering the question:
# What is the generic rule for 2D ndarray multiplication of matrix (NxM)?
# Hint: use # for does not matter. 2D ndarrays are of shape (#x#) generally.
# ######
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/pbs/p5', submitted_solution=solution)

In [None]:
# ######
# Problem 6:
# We have a 2D ndarray given below.
# Provide a 1D ndarray and use it to sum the elements of the 2D ndarray row by row. What is the mean of the 1D ndarray?
# ######
arr = np.array([[1, 3], [2, 2], [2, 4]])
summer = # YOUR CODE HERE
print(np.dot(arr, summer).reshape(3, 1))
solution = summer.mean()
evaluate(problem_id='numpy/pbs/p6', submitted_solution=solution)

In [None]:
# ######
# Problem 7:
# Create an ndarray of 9 equidistant values in [0, 3]. How much is the product of the elements of the ndarray?
# ######
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/pbs/p7', submitted_solution=solution)

In [None]:
# ######
# Problem 8:
# We have the 2D ndarray given below.
# Compute the dot product between the two matrices created by cumulatively summing along rows & columns of the ndarray.
# What is the value at index M[0][2]?
# ######
arr = np.array([[1, 2, 3], [0, 0, 1], [1, 3, 0]])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/pbs/p8', submitted_solution=solution)

In [None]:
# ######
# Problem 9:
# We have the sine wave defined on range (-pi, pi). What is the integral of the ndarray in discrete space (take 2 decimals)?
# ######
t = np.linspace(-np.pi, np.pi)
sin = np.sin(t)
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/pbs/p9', submitted_solution=solution)

In [None]:
# ######
# Problem 10:
# We have the following linear equation
# /  2x - 3y =  0
# \ -4x + 2y = -8
# Provide the integer values of x & y.
# ######
x = # YOUR CODE HERE
evaluate(problem_id='numpy/pbs/p10/x', submitted_solution=x)
y = # YOUR CODE HERE
evaluate(problem_id='numpy/pbs/p10/y', submitted_solution=y)

---
# **pandas**

Official reference: https://pandas.pydata.org/pandas-docs/stable/reference/index.html

$\color{#F0000C}{Hint: check \nobreakspace documentation \nobreakspace CONSTANTLY}$

In [None]:
import pandas as pd
print(pd.__version__)

## Data. How, what, why?

### Data

To be added: reasoning about how to store data, what options are there to keep data, advantages/disadvantages.

Encoding of data, why is it needed?

Scaling?

Useful functions...

### Let's see some data:

In [None]:
df = pd.read_csv('./sample_data/mnist_test.csv')  # absolute/relative path?

In [None]:
print(df.columns)

In [None]:
df

#### What are we looking at?

In [None]:
# change index to see other values
sample = df.iloc[0]
# label = index 0, image = index 1 onwards, sqrt(784) = 28
label = sample.iloc[0]  # positional access; sample[0] would be a label lookup on the column names
img = np.array(sample.iloc[1:]).reshape((28, 28))

In [None]:
import matplotlib.pyplot as plt

print('Label: {}'.format(sample.iloc[0]))
plt.imshow(img)
plt.show()

#### More info

In [None]:
df.info()

In [None]:
df.describe()

### Work with dataframes

#### One dataframe

In [None]:
# todo

#### Multiple dataframes

In [None]:
# todo

## Pandas in computer science

### Data exploration

In [None]:
# useful functions, data discovery and data exploration tasks

### Preprocessing

In [None]:
# encoding techniques, scaling, separation of data, etc.

### Postprocessing

In [None]:
# inverse scaling & encoding for result analysis after evaluation, good 

---
# **Matplotlib**

Official reference: https://matplotlib.org/stable/index.html

$\color{#F0000C}{Hint: check \nobreakspace documentation \nobreakspace CONSTANTLY}$

In [None]:
import matplotlib
import matplotlib.pyplot as plt
print(matplotlib.__version__)

---
# **Image Processing**

## setup code

In [None]:
# IP
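try:
    from scipy import datasets as misc  # face() and ascent() live here in newer SciPy versions
except ImportError:
    from scipy import misc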
from PIL import Image

In [None]:
def show_image(img: np.ndarray) -> None:
    print('Image shape:', img.shape)
    plt.grid(False)
    plt.gray()
    plt.axis('off')
    plt.imshow(img)
    plt.show()

In [None]:
# raccoon
img = np.array(Image.fromarray(misc.face()).resize((400, 300)), dtype=np.int32)

In [None]:
# stairs
img = np.array(Image.fromarray(misc.ascent().astype('uint8')).resize((256, 256)), dtype=np.float64)

In [None]:
show_image(img)

## 1) Basic operations on images

### 1.1) Pixel and image structure

In [None]:
# print the first pixel of the image
x = 0
y = 0
print(img[x, y])  # play along with x and y

In [None]:
# iterate over the whole image and split the red, green and blue values into three separate images
# (requires an RGB image, e.g. the raccoon; the stairs image has no color channels)
r_img = np.zeros(img.shape, dtype=np.int32)
g_img = np.zeros(img.shape, dtype=np.int32)
b_img = np.zeros(img.shape, dtype=np.int32)

for i in range(0, img.shape[0]):  # rows (vertical axis)
    for j in range(0, img.shape[1]):  # columns (horizontal axis)
        pixel = img[i, j]  # [R, G, B]

        r_img[i, j] = [pixel[0], 0, 0]
        g_img[i, j] = [0, pixel[1], 0]
        b_img[i, j] = [0, 0, pixel[2]]


print('Extracted RED channel:')
show_image(r_img)
print('Extracted GREEN channel:')
show_image(g_img)
print('Extracted BLUE channel:')
show_image(b_img)
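
The same channel separation can be written without explicit loops by slicing the last axis. This is a sketch on a small random image (`rgb` is a made-up stand-in, not the raccoon image).

```python
import numpy as np

# Hypothetical tiny RGB image: 4 rows, 6 columns, 3 channels.
rng = np.random.default_rng(seed=0)
rgb = rng.integers(0, 256, size=(4, 6, 3)).astype(np.int32)

# Start from black images and copy one channel into each of them.
r_only = np.zeros_like(rgb)
g_only = np.zeros_like(rgb)
b_only = np.zeros_like(rgb)
r_only[..., 0] = rgb[..., 0]  # channel 0 = red
g_only[..., 1] = rgb[..., 1]  # channel 1 = green
b_only[..., 2] = rgb[..., 2]  # channel 2 = blue

# Adding the three single-channel images gives back the original image.
assert np.array_equal(r_only + g_only + b_only, rgb)
```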

RGB Color picker: https://www.rapidtables.com/web/color/color-picker.html

In [None]:
# white - to - blue
img_1 = np.zeros((10, 10, 3), dtype=np.int32)
show_image(img_1)

# chessboard?
for i in range(0, 10):
    for j in range(0, 10):
        if (i + j) % 2 == 0:
            img_1[i, j] = [255, 255, 255]  # what color is this?
show_image(img_1)
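
The nested loops above can also be replaced by a boolean mask. A minimal sketch, assuming the same 10 x 10 size (the names `mask` and `board` are made up here):

```python
import numpy as np

# Row and column index of every pixel in a 10 x 10 grid.
rows, cols = np.indices((10, 10))

# True where the sum of the indices is even, i.e. the same condition as in the loop.
mask = (rows + cols) % 2 == 0

board = np.zeros((10, 10, 3), dtype=np.int32)
board[mask] = [255, 255, 255]  # assign the pixel value to all selected positions at once

print(mask.sum(), 'of', mask.size, 'pixels were changed')
```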

### 1.2) Color image to Grayscale

In [None]:
# ######
# Math:
#  s, d in R{n, n}; s[0][0] in R{3}; d[0][0] in R;
#  d|i, j in {0, n}|[i][j] = sum(s[i][j]) / count(s[i][j])
# ######
def color_to_grayscale(img: np.ndarray):
    img_grayscale = np.zeros((img.shape[0], img.shape[1]))
    
    for i in range(0, img.shape[0]):
        for j in range(0, img.shape[1]):
            # print(np.average(img[i, j]))
            img_grayscale[i, j] = int(np.average(img[i, j]))
    return img_grayscale

In [None]:
img_grayscale = color_to_grayscale(img)
show_image(img_grayscale)
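
For larger images the per-pixel loop is slow. A vectorized sketch of the same averaging rule is given below. The name `color_to_grayscale_vectorized` and the tiny test image are assumptions for illustration, and `np.floor` stands in for the `int(...)` truncation, which is equivalent for non-negative pixel values.

```python
import numpy as np

def color_to_grayscale_vectorized(rgb: np.ndarray) -> np.ndarray:
    # Average over the channel axis, then drop the fractional part.
    return np.floor(rgb.mean(axis=2))

# Hypothetical 2 x 2 RGB image.
tiny = np.array([[[10, 20, 33], [0, 0, 0]],
                 [[255, 255, 255], [1, 2, 2]]], dtype=np.int32)

gray = color_to_grayscale_vectorized(tiny)
print(gray)
```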

In [None]:
# only for stairs!
img_grayscale = img

### 1.3) Problems

In [None]:
# ######
# Problem 1:
# We have an RGB pixel with value [35, 200, 155]. How much is the RED value?
# ######

pixel = [35, 200, 155]
solution = # YOUR CODE HERE
evaluate(problem_id='IP/basic/p1', submitted_solution=solution)

In [None]:
# ######
# Problem 2:
# We have an RGB pixel with value given below.
# What will be the grayscale value according to our algorithm used before?
# ######

pixel = [120, 175, 4]
solution = # YOUR CODE HERE
evaluate(problem_id='IP/basic/p2', submitted_solution=solution)

## 2) Complex operations on images

### 2.1) Conv2D

#### Theory

Convolutions are used to:
  - extract local features (e.g. smoothing, edges) while keeping the spatial information of images
  - make it possible to shrink images while keeping the relevant information

![](https://drive.google.com/uc?id=1I-ksqhe13i04_qePM9_I8JrNw9mUCcIg)


![](https://drive.google.com/uc?export=view&id=1I3gPiYAWFPbbEKTi_L4KyVAOXZN4dUXM)


Source: https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks

#### Code

In [None]:
# each filter is stored as (coefficients, weight);
# if the coefficients sum to more than 1, dividing by the weight normalizes the result

# average smoothing
three = [[1, 1, 1], [1, 1, 1], [1, 1, 1]], 9
five = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]], 25
gaussian = [[1, 2, 1], [2, 4, 2], [1, 2, 1]], 16

# edge enhancement
#  generic
laplace = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], 1
highpass = [[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]], 1
#  directional (Sobel)
sobel_vertical = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], 1
sobel_horizontal = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]], 1
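
To see what the weights do, the sketch below sums the coefficients of a few of the filters defined above and divides by the weight. The dictionary `kernels` is only a container introduced for this example.

```python
import numpy as np

# (coefficients, weight) pairs copied from the definitions above.
kernels = {
    'three': ([[1, 1, 1], [1, 1, 1], [1, 1, 1]], 9),
    'gaussian': ([[1, 2, 1], [2, 4, 2], [1, 2, 1]], 16),
    'laplace': ([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], 1),
    'highpass': ([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]], 1),
    'sobel_vertical': ([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], 1),
}

# Sum of the coefficients after dividing by the weight.
normalized_sums = {name: np.sum(coeffs) / weight
                   for name, (coeffs, weight) in kernels.items()}

for name, total in normalized_sums.items():
    print(f'{name:>15}: {total}')
```

The smoothing filters sum to 1 after weighting, so flat regions keep their brightness. The Laplace and Sobel filters sum to 0, so flat regions become 0 and only changes in intensity remain.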

In [None]:
print(np.array(sobel_vertical[0]))

In [None]:
# ######
# Math: 
#  s, d in R{n, n}; s[0][0], d[0][0] in R;
#  d|i, j in {1, n - 1}|[i][j] =
#    |_i, _j in {0, filter_width}|{sum|i, j in {1, n-1}|(s[i - filter_width + _i][j - filter_width + _j] * filter[_i][_j])}
# ######
def conv2d(img: np.ndarray, filter_obj: tuple[list, int]) -> np.ndarray:
    # our implementation only supports grayscale images with float64 pixels
    assert isinstance(img[0, 0], np.float64)

    # copy the image so that the border pixels keep their original values
    image_transformed = np.copy(img)

    # size of the (original) image
    size_x = image_transformed.shape[0]
    size_y = image_transformed.shape[1]

    # collect filter and weight from the composite filter_object
    filter, weight = filter_obj
    size_of_filter = len(filter[0])
    filter_width = size_of_filter // 2  # border of matrix

    # iterate over the image !! careful at boundaries
    for x in range(filter_width, size_x - filter_width):
        for y in range(filter_width, size_y - filter_width):
            convolution = 0.0
            for i in range(0, size_of_filter):
                for j in range(0, size_of_filter):
                    convolution += img[x - filter_width + i, y - filter_width + j] * filter[i][j]
            # div by weight !!
            convolution /= weight
            # clamp to [0, 255], the range of an 8-bit grayscale pixel
            convolution = min(255, max(0, convolution))

            image_transformed[x, y] = convolution
    return image_transformed
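
The four nested loops are easy to follow but slow on large images. Below is a sketch of a vectorized variant that should give the same result. It assumes NumPy >= 1.20 for `sliding_window_view`, and the name `conv2d_vectorized` and the ramp test image are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_vectorized(img: np.ndarray, filter_obj: tuple) -> np.ndarray:
    kernel, weight = filter_obj
    kernel = np.asarray(kernel, dtype=np.float64)
    size = kernel.shape[0]
    border = size // 2

    # windows[x, y] is the (size x size) neighbourhood whose top-left corner is (x, y)
    windows = sliding_window_view(img, (size, size))
    response = (windows * kernel).sum(axis=(2, 3)) / weight

    # keep the original border pixels, overwrite only the interior
    out = img.astype(np.float64).copy()
    out[border:img.shape[0] - border, border:img.shape[1] - border] = np.clip(response, 0, 255)
    return out

# Hypothetical test image: brightness grows by 10 per column.
ramp = np.tile(np.arange(5) * 10.0, (5, 1))
sobel_vertical = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], 1

out = conv2d_vectorized(ramp, sobel_vertical)
print(out)
```

On the ramp image every interior pixel sees the same left-to-right change, so the interior of the result is constant while the border keeps the original values.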

In [None]:
res = conv2d(img_grayscale, sobel_vertical)
show_image(res)

In [None]:
res = conv2d(img_grayscale, sobel_horizontal)
show_image(res)

#### Problems

In [None]:
# ...

### 2.2) MaxPool2D

#### Theory

Pooling:
  - operation used for dimensional downsampling
  - Alternatives:
    - max
    - min
    - average
    - ?

![](https://drive.google.com/uc?export=view&id=1IC7l4ug6v9nitk8ToiISVjz8pRrMOntq)


Source: https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks
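
The pooling alternatives listed above can be compared on a small example. The sketch below assumes a 4 x 4 image with made-up values and a pool size of 2, and uses a reshape so that each 2 x 2 block gets its own pair of axes.

```python
import numpy as np

# Hypothetical 4 x 4 grayscale image.
small = np.array([[1, 5, 2, 0],
                  [3, 4, 8, 6],
                  [9, 7, 1, 1],
                  [2, 0, 3, 5]], dtype=np.float64)
pool_size = 2

# blocks[a, i, b, j] == small[a * pool_size + i, b * pool_size + j]
blocks = small.reshape(small.shape[0] // pool_size, pool_size,
                       small.shape[1] // pool_size, pool_size)

max_pooled = blocks.max(axis=(1, 3))
min_pooled = blocks.min(axis=(1, 3))
avg_pooled = blocks.mean(axis=(1, 3))

print(max_pooled)
print(min_pooled)
print(avg_pooled)
```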

#### Code

In [None]:
# ######
# Math: 
#  s in R{n, n}; s[0][0] in R; d in R{m, m}; d[0][0] in R; m = n / pool_size
#  d|i, j in {0, m}|[i][j] = max|_i, _j in {0, pool_size}|(s[i * pool_size + _i][j * pool_size + _j])
# ######
def maxpool2d(img: np.ndarray, pool_size: int) -> np.ndarray:
    # our implementation only supports grayscale images with float64 pixels
    assert isinstance(img[0, 0], np.float64)

    # size of the (original) image
    size_x = img.shape[0]
    size_y = img.shape[1]

    new_x = size_x // pool_size
    new_y = size_y // pool_size

    # Create blank image with reduced dimensions
    image_transformed = np.zeros((new_x, new_y))

    # Iterate over the image
    for x in range(0, size_x, pool_size):
        for y in range(0, size_y, pool_size):
            pixels = []
            for i in range(0, pool_size):
                for j in range(0, pool_size):
                    pixels.append(img[x + i, y + j])

            # Get only the largest value and assign to the reduced image
            image_transformed[x // pool_size, y // pool_size] = max(pixels)
    return image_transformed

In [None]:
res = maxpool2d(img_grayscale, pool_size=2)  # careful, pool_size HAS TO DIVIDE both dimensions of the input image
# res = maxpool2d(img_grayscale, 8)
show_image(res)

#### Problems

In [None]:
# ######
# Problem 1:
# We have an image with pixel values (INT) as presented below.
# Use MaxPool2D of 2x2. What is the sum of the pixels in the resulting image?
# ######

img = [[1, 2, 3, 4],[2, 3, 4, 5],[3, 4, 5, 6],[4, 5, 6, 7]]
solution = # YOUR CODE HERE
evaluate(problem_id='IP/complex/mp/p1', submitted_solution=solution)

In [None]:
# ######
# Problem 2:
# We have an image with pixel values (INT) as presented below.
# Use MaxPool2D of 2x2. What is the sum of the pixels in the resulting image?
# ######

img = [[1, 2, 3],[2, 3, 4],[3, 4, 5]]
solution = # YOUR CODE HERE
evaluate(problem_id='IP/complex/mp/p2', submitted_solution=solution)

### 2.3) Conv2D + Pool2D

In [None]:
show_image(img_grayscale)
res = conv2d(img_grayscale, laplace)
res = maxpool2d(res, pool_size=2)
res = conv2d(res, sobel_horizontal)
res = maxpool2d(res, pool_size=2)
show_image(res)

## 3) Commonly used IP concepts/techniques