<a href="https://colab.research.google.com/github/kis-balazs/machine-learning/blob/main/MLCourse_np-pd-ip.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import os
from pprint import pprint
from pathlib import Path

# used for parameter type declaration
from typing import Union

---
*Author*: Balázs Kis

*Email*: balazskis@gmail.com

---
# Setup evaluation function

**PLEASE DO NOT ALTER** this code, since it is used to set up the solution database together with the evaluate function.

---

---

*We would like to give you an easy and pleasant evaluation process for the problems we prepared, and this cell is essential for that. If you change anything in the cell, please revert it to the original version so that the evaluation stays correct. Otherwise we cannot account for differences between your results and the reference results we prepared.*

In [None]:
assert not Path('./le').exists(), 'Please DO NOT attempt to run this cell more than once for each runtime!'

# dependencies
%pip install cryptography --quiet
%pip install git+https://github.com/ozgur/python-firebase --quiet

# setup for the evaluation
os.system("git clone -l -s https://gist.github.com/kis-balazs/872f4e35871942f3f7076bbc5626c226 le")
os.chdir('le')

from load_evaluate import *

os.chdir('..')

# Useful documentation

 - [Python3 Cheatsheet](https://github.com/kis-balazs/machine-learning/blob/main/res/python3_cheatsheet.pdf)
 - ...

---
# **NumPy**

Official reference: https://numpy.org/doc/stable/reference/index.html

$\color{#F0000C}{Hint: check \nobreakspace documentation \nobreakspace CONSTANTLY}$

In [None]:
import numpy as np
print(np.__version__)

Building blocks of almost every problem in computer science:
 - numbers
 - arrays (1D, 2D, 3D, 4D(?), ...)
 - operations & underlying mathematics

*We will see that many problems in computer science can be solved using numpy, i.e. mathematics :D*

## 1) Python < Numpy

### 1.1) Python and NumPy arrays



In [None]:
arr = [3, 4, 5, 2, 6, 1]

In [None]:
arr[0] = 3.0

In [None]:
print(type(arr))
print(arr[0], type(arr[0]))

In [None]:
arr_np = np.array(arr)
print(type(arr_np))
print(arr_np[0], type(arr_np[0]))

In [None]:
arr_np1 = np.array(arr)
print(type(arr_np1))
print(arr_np1[0], type(arr_np1[0]))

In [None]:
arr_np[0] = 4

In [None]:
print(arr_np == arr_np1)
# unified answer?
print((arr_np == arr_np1).all())
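
One difference the cells above hint at: a Python list can hold mixed types, while a NumPy array has a single dtype, so mixed input is converted to one common type. A minimal sketch of that behaviour (the names `mixed`, `mixed_np` and `other` are illustrative only and not part of the exercises):

```python
import numpy as np

mixed = [3.0, 4, 5, 2, 6, 1]          # Python list: one float, five ints
mixed_np = np.array(mixed)            # NumPy picks one common dtype for all elements

print([type(v).__name__ for v in mixed])   # element types stay mixed in the list
print(mixed_np.dtype)                      # a single floating-point dtype

# element-wise comparison returns an array of booleans; .all() / .any() reduce it
other = mixed_np.copy()
other[0] = 4
print(mixed_np == other)
print((mixed_np == other).all(), (mixed_np == other).any())
```

`.all()` answers "are all elements equal?", `.any()` answers "is at least one element equal?".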

### 1.2) 1D

In [None]:
# the array below holds the average daily temperature in Cairo, Egypt for the week of 20-26 April 1992 (Monday-Sunday)
temps = [20, 20, 17, 16, 16, 17, 19]

# in parallel, using numpy
temps_np = np.array(temps)

In [None]:
# compute the highest temperature of the week
# python
print('Max temp (Python):', max(temps))
# numpy
print('Max temp (NumPy):', temps_np.max())

In [None]:
# compute the mean temperature of the week
# python
print('Mean temp (Python):', sum(temps) / len(temps))
# numpy
print('Mean temp (NumPy):', temps_np.mean())

In [None]:
# compute the standard deviation of the temperature of the week (2 decimals)
# Hint: https://datascienceparichay.com/wp-content/uploads/2021/09/standard-deviation-formula-768x444.png.webp

# python
mean = sum(temps) / len(temps)
print('Std temp (Python):', round((sum((elem - mean) ** 2 for elem in temps) / len(temps)) ** .5, 2))  # ** .5 is the square root
# numpy
print('Std temp (NumPy):', round(temps_np.std(), 2))
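
The formula used above divides by N (population standard deviation), which is also what `std()` does by default. Many textbooks divide by N - 1 instead (sample standard deviation); in NumPy that is selected with the `ddof` parameter. A short sketch comparing both, reusing the temperature data (the helper names `squared_diffs`, `population_std` and `sample_std` are made up for this example):

```python
import numpy as np

temps = [20, 20, 17, 16, 16, 17, 19]
temps_np = np.array(temps)

mean = sum(temps) / len(temps)
squared_diffs = sum((t - mean) ** 2 for t in temps)

population_std = (squared_diffs / len(temps)) ** 0.5        # divide by N
sample_std = (squared_diffs / (len(temps) - 1)) ** 0.5      # divide by N - 1

# each line prints the pure-Python value next to the NumPy value
print(round(population_std, 2), round(float(temps_np.std()), 2))
print(round(sample_std, 2), round(float(temps_np.std(ddof=1)), 2))
```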

#### Problems

In [None]:
# ######
# Problem 1:
# We have an array given below.
# What is the index of the smallest value in the array?
# ######
arr = np.array([-1, 0, -1, -2, -2, 2, 3, -3, 1])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/1D/p1', submitted_solution=solution)

In [None]:
# ######
# Problem 2:
# We have an array given below.
# What is the sum of the positive values in the array?
# ######
arr = np.array([-1, 0, -1, -2, -2, 2, 3, -3, 1])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/1D/p2', submitted_solution=solution)

In [None]:
# ######
# Problem 3:
# We have an array given below.
# What is the sum of values in the array, after rounding to 2 decimals? Hint: result is float!
# ######
arr = np.array([3.57042396, 8.35559273, 0.7621579 , 3.53006476,
                0.98356653, 7.66801256, 3.32831346, 4.70137415])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/1D/p3', submitted_solution=solution)

### 1.3) 2(+)D

 - **2D**: two-dimensional array (i.e. matrix)
 - 3D: three-dimensional array
 - 4D: four-dimensional array
 - ...

*Observation*: arrays with three or more dimensions are named by their number of dimensions, i.e. n-dimensional arrays, or **tensors**, as we will see in the near future.

In [None]:
# build the I3 (identity) matrix; try I4, I5, ... In;
# Important notation: When n is known from context, the identity matrix is written as I.
n = 3
# python
print('I3 (Python):\n', [[1 if j == i else 0 for j in range(0, n)] for i in range(0, n)])
# numpy
print('I3 (NumPy):\n', np.identity(n))  # eye?
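
To follow up on the `# eye?` hint: `np.identity` and `np.eye` build the same square identity matrix, and `np.eye` additionally accepts a column count and a diagonal offset `k`. A small sketch:

```python
import numpy as np

n = 3
print(np.array_equal(np.identity(n), np.eye(n)))  # same square identity matrix

# np.eye additionally accepts a column count and a diagonal offset k
print(np.eye(2, 3))        # 2x3 matrix with ones on the main diagonal
print(np.eye(3, k=1))      # ones on the first diagonal above the main one
```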

---

In [None]:
# given a 2x2 matrix of arbitrary values, compute its determinant
A = [
     [5, 2],
     [4, 0.5]
]
# python
print('Determinant (Python):', A[0][0] * A[1][1] - A[0][1] * A[1][0])  # main diagonal product minus anti-diagonal product
# numpy
print('Determinant (NumPy):', np.linalg.det(np.array(A)))  # linalg?

In [None]:
# given a 3x3 matrix of arbitrary values, compute its determinant
A = [
     [5, 2, 3],
     [4, 0.5, 1.7],
     [-2, 6, 3]
]
# python
from functools import reduce 

def derive_comp(A, _j, n):
    component_array = [A[(cnt) % n][(cnt + _j) % n] for cnt in range(0, n)]
    return reduce((lambda a, b: a*b), component_array) 

# Sarrus rule 3x3
def det(A, n):
    pos_comps = [derive_comp(A, _j, n) for _j in range(0, n)]
    neg_comps = [derive_comp(A[::-1], _j, n) for _j in range(0, n)]

    return sum(pos_comps) - sum(neg_comps)


print('Determinant (Python):', det(A, 3))
# numpy
print('Determinant (NumPy):', np.linalg.det(np.array(A)))  # linalg?

In [None]:
# given a 4x4 matrix of arbitrary values, compute its determinant
A = [
     [5, 2, 3, 1],
     [4, 0.5, 1.7, -2],
     [-2, 6, 3, 1],
     [-1, 1, -1, 1]
]
# python
# BAD NEWS: https://www.quora.com/Can-the-Sarrus-rule-be-applied-to-4x4-determinants?share=1
# perhaps use Laplace rule / Gaussian elimination
print('Determinant (Python): ???')
# numpy
print('Determinant (NumPy):', np.linalg.det(np.array(A)))  # linalg?
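
One possible way to fill in the `???` above is the Laplace (cofactor) expansion mentioned in the comment. The sketch below is only an illustration, not the reference solution of this course; `det_laplace` is a name made up for this example. It expands along the first row and recurses on the smaller minors:

```python
import numpy as np

def det_laplace(M):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        # minor: remove row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det_laplace(minor)
    return total

A = [
    [5, 2, 3, 1],
    [4, 0.5, 1.7, -2],
    [-2, 6, 3, 1],
    [-1, 1, -1, 1],
]
print('Determinant (Python, Laplace):', det_laplace(A))
print('Determinant (NumPy):', np.linalg.det(np.array(A)))
```

The recursion does on the order of n! multiplications, so it is fine for 4x4 but quickly becomes impractical for larger matrices. The two printed values may differ in the last decimals because of floating-point round-off.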

---

In [None]:
# given a 2x2 matrix of arbitrary values, compute its inverse
n = 2
A = [
     [5, 2],
     [4, 0.5]
]
# python

def inv2x2(A):
    inv_det = 1.0 / (A[0][0] * A[1][1] - A[0][1] * A[1][0])  # 1 / determinant
    adjugate = [
        [A[1][1], -A[0][1]],
        [-A[1][0], A[0][0]]
    ]
    return [[inv_det * adjugate[i][j] for j in range(0, n)] for i in range(0, n)]

print('Inverse (Python):\n', inv2x2(A))  # adjugate divided by the determinant
# numpy
print('Inverse (NumPy):\n', np.linalg.inv(np.array(A)))  # linalg?

In [None]:
# 3x3 inverse: already gets quite complicated to compute adjugate using Python, Numpy just scales :D
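
A quick way to convince yourself that NumPy's inverse is right without computing the adjugate by hand: multiply the matrix by its inverse and compare with the identity. The sketch below reuses the 3x3 matrix from the determinant example; `@` is the matrix-multiplication operator, equivalent to `np.dot` for 2D arrays.

```python
import numpy as np

A = np.array([
    [5, 2, 3],
    [4, 0.5, 1.7],
    [-2, 6, 3],
])
A_inv = np.linalg.inv(A)

# floating-point round-off means the product is only approximately the identity
product = A @ A_inv
print(np.round(product, 10))
print(np.allclose(product, np.identity(3)))
```

`np.allclose` compares with a small tolerance, which is the usual way to compare floating-point results; `==` would be too strict here.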

#### Problems

In [None]:
# ######
# Problem 1:
# We have a 2x2 matrix given below.
# What is the inverse of the matrix?
# ######
A = np.array([[1, 3], [2, 6]])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/2D/p1', submitted_solution=solution)

In [None]:
# ######
# Problem 2:
# We have a 4x4 matrix given below.
# What is the determinant of the matrix multiplied by its inverse?
# Hint1: np.dot()
# Hint2: round result matrix to int!
# ######
A = np.array([[5, 2, 3, 1], [4, 0.5, 1.7, -2], [-2, 6, 3, 1], [-1, 1, -1, 1]])
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/2D/p2', submitted_solution=solution)

In [None]:
# ######
# Problem 3:
# Formalise the previous problem as a mathematical equation using the following notation:
# Inverse of a matrix: A^-1
# Dot product between two matrices: A dot B
# ######
solution = # YOUR CODE HERE
evaluate(problem_id='numpy/2D/p3', submitted_solution=solution)

### 1.4) Conclusion

Even for problems this simple, NumPy is shorter and less error-prone than pure Python, for arrays of **any dimension**.

In the upcoming sections we will see more *complex functions* and use cases that better show the real power of NumPy across a wide variety of computer-science-related mathematics:
 - $\color{#20b2aa}{signal \nobreakspace processing}$ (1D, 2D (images), etc.)
 - $\color{#00ff00}{linear \nobreakspace algebra}$
 - $\color{#ffbf00}{analytical \nobreakspace geometry}$
 - ...

$\color{#F0000C}{In \nobreakspace the \nobreakspace upcoming \nobreakspace sections, \nobreakspace pay \nobreakspace extra \nobreakspace attention \nobreakspace to \nobreakspace how \nobreakspace NumPy \nobreakspace handles \nobreakspace \textbf{dimensions}.}$

## 2) NumPy in computer science

### 2.1) NumPy ndarray operations

#### 2.1.1) 1D ndarray

In [None]:
# create an ndarray of 0s
print(np.zeros(4))

In [None]:
# create an ndarray of random numbers (in range [0, 1))
print(np.random.random(4))

In [None]:
# create an ndarray with values in a range
print(np.arange(10, 20, step=1))

In [None]:
# sort an ndarray of random values in ascending order
print(np.sort(np.random.random(3)))

# descending?
print(np.sort(np.random.random(3))[::-1])

---

In [None]:
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([-1, 3, -2, 4, -7])

###### W/ constants

In [None]:
# add/sub of a constant
print(arr1 + 2)
print(arr2 - .7)

In [None]:
# mult/div with a constant - automatic float?
print(arr1 * 1.5)
print(arr2 / 3)
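
To answer the "automatic float?" question: yes, NumPy promotes the result dtype when needed. A short sketch that prints the dtype of each result (the exact integer width, e.g. `int64` or `int32`, depends on the platform):

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
print(arr1.dtype)            # integer dtype (exact width is platform dependent)
print((arr1 + 2).dtype)      # int + int stays integer
print((arr1 * 1.5).dtype)    # int * float is promoted to float64
print((arr1 / 3).dtype)      # true division always yields float64
print((arr1 // 3).dtype)     # floor division keeps the integer dtype
```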

In [None]:
# append constant to ndarray
print(np.append(arr1, -1))

---
###### W/ 1D ndarrays

In [None]:
# add/sub
print(arr1 + arr2)  # np.add()
print(arr1 - arr2)  # np.subtract()
print(arr2 - arr1)

In [None]:
# mult/div 
print(arr1 * arr2)
print(arr1 / arr2)

In [None]:
# append array to array
print(np.append(arr1, arr2))

---
###### Relevant functions

In [None]:
# norm of a ndarray - distance from origin
print(np.linalg.norm(arr1))

In [None]:
# dot product
print(np.dot(arr1, arr2))

What are the **norm and dot product** good for? Quite a lot, actually!

Example [$\color{#00ff00}{linear \nobreakspace algebra}$/$\color{#ffbf00}{analytical \nobreakspace geometry}$] : computing the angle between two vectors

In [None]:
# ndarrays representing points in n dimensions
v1 = [6, 1]
v2 = [4, 4]

angle = np.arccos(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))  # radian!
print('Angle between v1 and v2:', angle * 180 / np.pi)
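
The same computation can be wrapped in a reusable function. The sketch below is one possible version (`angle_between` is a name made up for this example); it adds `np.clip` because round-off can push the cosine slightly outside [-1, 1], and it works for vectors of any dimension:

```python
import numpy as np

def angle_between(u, v) -> float:
    """Angle between two non-zero vectors, in degrees."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # round-off can push the cosine slightly outside [-1, 1], where arccos returns nan
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

print(angle_between([6, 1], [4, 4]))
print(angle_between([1, 0], [0, 1]))         # perpendicular vectors
print(angle_between([1, 2, 3], [2, 4, 6]))   # parallel vectors, also works in 3D
```

`np.degrees` does the same conversion as multiplying by `180 / np.pi`.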

In [None]:
# 2D
assert(len(v1) == 2)
assert(len(v2) == 2)

import matplotlib.pyplot as plt

plt.plot([0, v1[0]], [0, v1[1]], color='blue')
plt.plot([0, v2[0]], [0, v2[1]], color='red')
plt.show()

In [None]:
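# 3D
# requires 3D vectors: redefine v1 and v2 with three components before running this cell
assert(len(v1) == 3)
assert(len(v2) == 3)

import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection="3d")
# plot3D takes the x, y and z coordinates of the segment end points
ax.plot3D([0, v1[0]], [0, v1[1]], [0, v1[2]], color='blue')
ax.plot3D([0, v2[0]], [0, v2[1]], [0, v2[2]], color='red')
plt.show()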

#### 2.1.2) 2D ndarray

In [None]:
# TBD: construction types, operations with [constants, 1D ndarrays, 2D ndarrays], important functions

#### 2.1.3) 3(+)D ndarray

In [None]:
# generalization of 2D ndarrays, important functions

### 2.2) NumPy "everywhere"

In [None]:
# examples, problems and a simple introduction to NNs

#### Intro to NN

Single-Layer Perceptron [$\color{#00ff00}{linear \nobreakspace algebra}$]

Adapted after Rosenblatt (1957) - **Using only NumPy!**

In [None]:
# Problem: using a bunch of randomly generated ndarrays of length 5, train a *neuron* so that
# it classifies correctly based on a mathematical condition.
# Code is an adaptation of https://towardsdatascience.com/rosenblatts-perceptron-the-very-first-neural-network-37a3ec09038a


n = 5
test_size = 1_000  # number of *training* samples (the test set below has 100)


def gen_set(size, function):
    ndarrays = [np.random.uniform(-1, 1, n) for _ in range(0, size)]
    return [(ndarr, function(ndarr)) for ndarr in ndarrays]


# create the training & testing sets as lists of (array, label) pairs, where label = fct(array)
fct = lambda x: x.sum() > 0
# fct = lambda x: x[0] < x[1:].sum()
# fct = isPrime(), perfectNumber() ???? nonlinear functions?
train_set = gen_set(
    size=test_size,
    function=fct
)
test_set = gen_set(
    size=100,
    function=fct
)
print(train_set[0])

In [None]:
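node_weights = np.random.uniform(-1, 1, n)
node_bias = 0.0

def train_node(sample: tuple, weights: np.ndarray, bias: float):
    # sample = (input array, expected label); the node fires when the weighted sum exceeds the bias (threshold)
    y = np.dot(sample[0], weights) > bias
    if y != sample[1]:
        if y:
            # fired but should not have: move the weights away from the input and raise the threshold
            weights -= sample[0]
            bias += 1
        else:
            # did not fire but should have: move the weights towards the input and lower the threshold
            weights += sample[0]
            bias -= 1
    # floats are immutable, so the updated bias has to be returned explicitly
    return weights, bias

def classify_input(input_arr: np.ndarray, weights: np.ndarray, bias: float) -> bool:
    return np.dot(input_arr, weights) > bias

# training loop
print('! training... ', end='')
for i, sample in enumerate(train_set):
    if i % (test_size // 10) == 0:
        print(i, end=' ')
    node_weights, node_bias = train_node(sample, node_weights, node_bias)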

In [None]:
# check what percentage of the test set is classified correctly
correct = 0
for ta in test_set:
    if classify_input(ta[0], node_weights, node_bias) == ta[1]:
        correct += 1
print('Classifier correct: {}%'.format(100 * correct / len(test_set)))
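
The training and scoring loops above can also be written on whole arrays. The sketch below is a simplified, self-contained variant, not the notebook's exact code: it stores the samples as 2D arrays, leaves the bias out (threshold fixed at 0, which is enough for the condition `sum > 0`), and uses a fixed random seed chosen only to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only to make the run repeatable
n = 5

X_train = rng.uniform(-1, 1, size=(1000, n))
y_train = X_train.sum(axis=1) > 0
X_test = rng.uniform(-1, 1, size=(100, n))
y_test = X_test.sum(axis=1) > 0

weights = rng.uniform(-1, 1, n)
for x, target in zip(X_train, y_train):
    prediction = np.dot(x, weights) > 0
    if prediction != target:
        # move the weights towards x for a missed positive, away from x for a false positive
        weights += x if target else -x

predictions = X_test @ weights > 0          # classify the whole test set at once
accuracy = (predictions == y_test).mean()   # fraction of correct answers
print('Classifier correct: {:.1f}%'.format(100 * accuracy))
```

Taking the mean of a boolean array counts `True` as 1 and `False` as 0, so it directly gives the fraction of correct predictions.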

In [None]:
# check for one input
test_array = test_set[np.random.randint(0, len(test_set))]
print('> test array:', test_array[0], ' (sum:', test_array[0].sum(), '\b)\n\tfulfills condition?', test_array[1])
print('\n> Classifier prediction:', classify_input(test_array[0], node_weights, node_bias))

**Discussion**: Why does this count as a neural network? Are neural networks really this simple? What happens if we want more, and what would "more" be?

![](https://miro.medium.com/max/1400/1*ofVdu6L3BDbHyt1Ro8w07Q.png)

---
## **pandas**

Official reference: https://pandas.pydata.org/pandas-docs/stable/reference/index.html

$\color{#F0000C}{Hint: check \nobreakspace documentation \nobreakspace CONSTANTLY}$

In [None]:
import pandas as pd
print(pd.__version__)

---
## **Matplotlib**

Official reference: https://matplotlib.org/stable/index.html

$\color{#F0000C}{Hint: check \nobreakspace documentation \nobreakspace CONSTANTLY}$

In [None]:
import matplotlib  # `plt` is conventionally reserved for matplotlib.pyplot
print(matplotlib.__version__)

---
# **Image Processing**

## setup code

In [None]:
# IP
# note: recent SciPy releases provide these sample images in scipy.datasets instead of scipy.misc
from scipy import misc
from PIL import Image

In [None]:
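img = misc.ascent()  # grayscale sample image
img = misc.face()    # RGB sample image; overwrites the line above, and the cells below expect this RGB image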

In [None]:
# resize image to 400x300 (width x height), i.e. array shape (300, 400, 3), so that the pixel loops below run quickly
img = np.array(Image.fromarray(img).resize((400, 300)), dtype=np.int32)

In [None]:
import matplotlib.pyplot as plt

def show_image(img: np.ndarray) -> None:
    print('Image shape:', img.shape)
    plt.grid(False)
    plt.gray()
    plt.axis('off')
    plt.imshow(img)
    plt.show()

In [None]:
show_image(img)

## 1) Basic operations on images

### 1.1) Pixel and image structure

In [None]:
# print the first pixel of the image
x = 0
y = 0
print(img[x, y])  # play around with x (row) and y (column)

In [None]:
# iterate over the whole image and separate red green and blue values from it into three separate images
r_img = np.zeros(img.shape, dtype=np.int32)
g_img = np.zeros(img.shape, dtype=np.int32)
b_img = np.zeros(img.shape, dtype=np.int32)

for i in range(0, img.shape[0]):  # rows (vertical axis)
    for j in range(0, img.shape[1]):  # columns (horizontal axis)
        pixel = img[i, j]  # [R, G, B]

        r_img[i, j] = [pixel[0], 0, 0]
        g_img[i, j] = [0, pixel[1], 0]
        b_img[i, j] = [0, 0, pixel[2]]


print('Extracted RED channel:')
show_image(r_img)
print('Extracted GREEN channel:')
show_image(g_img)
print('Extracted BLUE channel:')
show_image(b_img)
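
The nested loop above makes the pixel structure explicit, but the same split can be done with slicing on the last axis. A minimal sketch on a small random image (`demo` is hypothetical stand-in data with the same rows x columns x 3 layout as `img`):

```python
import numpy as np

# small random RGB image standing in for `img` (hypothetical data, same layout: rows x columns x 3)
rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(4, 6, 3)).astype(np.int32)

r_only = np.zeros_like(demo)
g_only = np.zeros_like(demo)
b_only = np.zeros_like(demo)

# index 0, 1, 2 on the last axis select the red, green and blue values of every pixel
r_only[:, :, 0] = demo[:, :, 0]
g_only[:, :, 1] = demo[:, :, 1]
b_only[:, :, 2] = demo[:, :, 2]

# the three single-channel images add back up to the original
print(np.array_equal(r_only + g_only + b_only, demo))
```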

RGB Color picker: https://www.rapidtables.com/web/color/color-picker.html

In [None]:
# white - to - blue
img_1 = np.zeros((10, 10, 3), dtype=np.int32)
show_image(img_1)

# chessboard?
for i in range(0, 10):
    for j in range(0, 10):
        if (i + j) % 2 == 0:
            img_1[i, j] = [255, 255, 255]  # what color is this?
show_image(img_1)
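
The chessboard can also be built without explicit loops, using a boolean mask over the pixel coordinates. A sketch, assuming the same 10x10 size (`board` is a name made up for this example):

```python
import numpy as np

board = np.zeros((10, 10, 3), dtype=np.int32)    # all black
rows, cols = np.indices((10, 10))                # row index and column index of every pixel
board[(rows + cols) % 2 == 0] = [255, 255, 255]  # white where row + column is even

print(board[:2, :4, 0])   # top-left corner of the red channel
```

Replacing `[255, 255, 255]` with another RGB triple from the colour picker changes the colour of the light squares.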

### 1.2) Color image to Grayscale

In [None]:
# ######
# Math:
#  s, d in R{n, n}; s[0][0] in R{3}; d[0][0] in R;
#  d|i, j in {0, n}|[i][j] = sum(s[i][j]) / count(s[i][j])
# ######
def color_to_grayscale(img: np.ndarray):
    img_grayscale = np.zeros((img.shape[0], img.shape[1]))
    
    for i in range(0, img.shape[0]):
        for j in range(0, img.shape[1]):
            # print(np.average(img[i, j]))
            img_grayscale[i, j] = int(np.average(img[i, j]))
    return img_grayscale

In [None]:
img_grayscale = color_to_grayscale(img)
show_image(img_grayscale)
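
For larger images the pixel-by-pixel loop gets slow. The same averaging can be expressed as a mean over the colour axis; the sketch below is one possible vectorised version (`color_to_grayscale_fast` and `demo` are made up for this example). It returns `float64` values, like the loop version, which matters because the convolution code in this notebook asserts that dtype.

```python
import numpy as np

def color_to_grayscale_fast(img: np.ndarray) -> np.ndarray:
    """Average the colour channels of every pixel and truncate to a whole number."""
    return np.floor(img.mean(axis=2))   # floor == int() for non-negative values

rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(4, 6, 3)).astype(np.int32)   # hypothetical test image

gray = color_to_grayscale_fast(demo)
print(gray.shape, gray.dtype)
```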

### 1.3) Problems

In [None]:
# ######
# Problem 1:
# We have an RGB pixel with value [35, 200, 155]. What is the RED value?
# ######

pixel = [35, 200, 155]
solution = # YOUR CODE HERE
evaluate(problem_id='IP/basic/p1', submitted_solution=solution)

In [None]:
# ######
# Problem 2:
# We have an RGB pixel with value given below.
# What will be the grayscale value according to our algorithm used before?
# ######

pixel = [120, 175, 4]
solution = # YOUR CODE HERE
evaluate(problem_id='IP/basic/p2', submitted_solution=solution)

## 2) Complex operations on images

### 2.1) Conv2D

#### Theory

Convolutions are used to:
  - keep spatial information of images
  - maximize the shrinking potential of images while keeping relevant information

![](https://drive.google.com/uc?id=1I-ksqhe13i04_qePM9_I8JrNw9mUCcIg)


![](https://drive.google.com/uc?export=view&id=1I3gPiYAWFPbbEKTi_L4KyVAOXZN4dUXM)


Source: https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks
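
To make the operation concrete before the full implementation: one output pixel is the element-wise product of a small neighbourhood and the filter, summed up and normalised. A worked sketch for a single position, with a made-up 3x3 neighbourhood (`patch`) and a 3x3 averaging filter:

```python
import numpy as np

patch = np.array([
    [10, 10, 10],
    [10, 50, 10],
    [10, 10, 10],
], dtype=np.float64)                      # hypothetical 3x3 neighbourhood of a pixel

kernel = np.ones((3, 3))                  # 3x3 averaging filter
weight = kernel.sum()                     # 9, keeps the result in the input range

# one output pixel = element-wise product of patch and kernel, summed, then normalised
value = (patch * kernel).sum() / weight
print(value)
```

The bright centre pixel (50) is pulled towards its darker neighbours, which is exactly the smoothing effect of an averaging filter.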

#### Code

In [None]:
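# each filter is stored as (kernel, weight): the convolution result is divided by the weight,
# so that a kernel whose entries sum to more than 1 does not brighten the image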

# average smoothing
three = [[1, 1, 1], [1, 1, 1], [1, 1, 1]], 9
five = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]], 25
gaussian = [[1, 2, 1], [2, 4, 2], [1, 2, 1]], 16

# edge enhancement
#  generic
laplace = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], 1
highpass = [[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]], 1
#
sobel_vertical = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], 1
sobel_horizontal = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]], 1

In [None]:
print(np.array(sobel_horizontal[0]))  # np.array is preferred over the legacy np.matrix class

In [None]:
# ######
# Math:
#  s, d in R{n, n}; s[0][0], d[0][0] in R; k = filter size; r = k // 2
#  d|i, j in {r, n - r}|[i][j] =
#    (sum|_i, _j in {0, k}|(s[i - r + _i][j - r + _j] * filter[_i][_j])) / weight
# ######
def conv2d(img: np.ndarray, filter_obj: tuple):
    # check grayscale for transformation using our code
    assert isinstance(img[0, 0], np.float64)

    # copy image to a numpy array => borders!
    image_transformed = np.copy(img)

    # size of the (original) image
    size_x = image_transformed.shape[0]
    size_y = image_transformed.shape[1]

    # collect filter and weight from the composite filter_object
    filter, weight = filter_obj
    size_of_filter = len(filter[0])
    filter_width = size_of_filter // 2  # border of matrix

    # iterate over the image !! careful at boundaries
    for x in range(filter_width, size_x - filter_width):
        for y in range(filter_width, size_y - filter_width):
            convolution = 0.0
            for i in range(0, size_of_filter):
                for j in range(0, size_of_filter):
                    convolution += img[x - filter_width + i, y - filter_width + j] * filter[i][j]
            # div by weight !!
            convolution /= weight
            # bound between the boundaries of 1byte ~= grayscale
            convolution = min(255, max(0, convolution))

            image_transformed[x, y] = convolution
    return image_transformed

In [None]:
res = conv2d(img_grayscale, sobel_vertical)
show_image(res)

res = conv2d(img_grayscale, sobel_horizontal)
show_image(res)
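
The four nested loops are easy to follow but slow on large images. The sketch below is one possible vectorised variant, under stated assumptions: it relies on `sliding_window_view` (available in NumPy 1.20 and later), assumes a square kernel of odd size, and mirrors the behaviour of the loop version (borders copied from the input, results clipped to [0, 255]). `conv2d_fast` and `demo` are names made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_fast(img: np.ndarray, kernel, weight: float = 1.0) -> np.ndarray:
    """Same result as a nested-loop filter: borders are copied, values clipped to [0, 255]."""
    kernel = np.asarray(kernel, dtype=np.float64)
    k = kernel.shape[0]                     # assumes a square kernel with odd size
    r = k // 2
    out = img.astype(np.float64)            # astype returns a copy, so borders keep the input values
    windows = sliding_window_view(img, (k, k))          # shape: (rows - k + 1, cols - k + 1, k, k)
    inner = (windows * kernel).sum(axis=(2, 3)) / weight
    out[r:img.shape[0] - r, r:img.shape[1] - r] = np.clip(inner, 0, 255)
    return out

rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(6, 8)).astype(np.float64)   # hypothetical grayscale image
sobel_vertical = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
print(conv2d_fast(demo, sobel_vertical).shape)
```

Note that, like the loop version, this applies the filter without flipping it, which strictly speaking is a cross-correlation; for symmetric filters the result is the same.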

#### Problems

In [None]:
# ...

### 2.2) MaxPool2D

#### Theory

Pooling:
  - operation used for dimensional downsampling
  - Alternatives:
    - max
    - min
    - average
    - ?

![](https://drive.google.com/uc?export=view&id=1IC7l4ug6v9nitk8ToiISVjz8pRrMOntq)


Source: https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks

#### Code

In [None]:
# ######
# Math:
#  s in R{n, n}; s[0][0] in R; d in R{m, m}; d[0][0] in R; m = n / conv_size
#  d|i, j in {0, m}|[i][j] = max|_i, _j in {0, conv_size}|(s[i * conv_size + _i][j * conv_size + _j])
# ######
def maxpool2d(img: np.ndarray, conv_size: int) -> np.ndarray:
    # check grayscale for transformation using our code
    assert isinstance(img[0, 0], np.float64)

    # size of the (original) image
    size_x = img.shape[0]
    size_y = img.shape[1]

    new_x = size_x // conv_size
    new_y = size_y // conv_size

    # Create blank image with reduced dimensions
    image_transformed = np.zeros((new_x, new_y))

    # Iterate over the image
    for x in range(0, size_x, conv_size):
        for y in range(0, size_y, conv_size):
            pixels = []
            for i in range(0, conv_size):
                for j in range(0, conv_size):
                    pixels.append(img[x + i, y + j])

            # Get only the largest value and assign to the reduced image
            image_transformed[x // conv_size, y // conv_size] = max(pixels)
    return image_transformed

In [None]:
res = maxpool2d(img_grayscale, 1)  # careful, conv_size HAS TO DIVIDE both dimensions of the input image (here 300x400)
# res = maxpool2d(img_grayscale, 4)
show_image(res)
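
Max pooling can also be written without loops by reshaping the image into blocks. A minimal sketch, assuming a 2D image whose dimensions are divisible by the pool size (`maxpool2d_fast` and `demo` are made up for this example):

```python
import numpy as np

def maxpool2d_fast(img: np.ndarray, pool: int) -> np.ndarray:
    rows, cols = img.shape
    assert rows % pool == 0 and cols % pool == 0, 'pool size must divide both image dimensions'
    # split each axis into (block index, position inside block), then reduce inside the blocks
    blocks = img.reshape(rows // pool, pool, cols // pool, pool)
    return blocks.max(axis=(1, 3))

demo = np.arange(24, dtype=np.float64).reshape(4, 6)   # hypothetical 4x6 image
pooled = maxpool2d_fast(demo, 2)
print(pooled)
```

Swapping `max` for `min` or `mean` gives the other pooling alternatives listed in the theory part.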

#### Problems

In [None]:
# ######
# Problem 1:
# We have an image with pixel values (INT) as presented below.
# Use MaxPool2D of 2x2. What is the sum of the pixels in the resulting image?
# ######

img = [[1, 2, 3, 4],[2, 3, 4, 5],[3, 4, 5, 6],[4, 5, 6, 7]]
solution = # YOUR CODE HERE
evaluate(problem_id='IP/complex/mp/p1', submitted_solution=solution)

In [None]:
# ######
# Problem 2
# We have an image with pixel values (INT) as presented below.
# Use MaxPool2D of 2x2. What is the sum of the pixels in the resulting image?
# ######

img = [[1, 2, 3, 4],[2, 3, 4, 5],[3, 4, 5, 6],[4, 5, 6, 7]]
solution = # YOUR CODE HERE
evaluate(problem_id='IP/complex/mp/p2', submitted_solution=solution)

## 3) Commonly used IP concepts/techniques