# Homework 2 - Perceptron <br>
We aim to build a model that can recognize grayscale images of handwritten digits.<br>

### Data initialization<br>
We use the MNIST dataset to train and test the model.

In [6]:
import math

import numpy as np
from torchvision.datasets import MNIST

def download_mnist(is_train: bool):
    # each image is converted to a flat array of 784 pixel values
    dataset = MNIST(root='./data',
                    transform=lambda x: np.array(x).flatten(),
                    download=True,
                    train=is_train)
    mnist_data = []
    mnist_labels = []
    for image, label in dataset:
        mnist_data.append(image)
        mnist_labels.append(label)
    return mnist_data, mnist_labels

In [7]:
def initialize_data() -> (np.array, np.array, np.array, np.array):
    train_x, train_y = download_mnist(True)
    test_x, test_y = download_mnist(False)
    
    # convert the data to NumPy arrays
    train_x = np.array(train_x)
    train_y = np.array(train_y)
    test_x = np.array(test_x)
    test_y = np.array(test_y)
    
    return train_x, train_y, test_x, test_y
    

### Normalizing the dataset + one-hot encoding<br>
The values in the datasets are integers in the interval $[0, 255]$. To normalize the data, we divide by 256, which brings the values into the interval $[0, 1)$. <br>
<br>
Q: Why does normalizing the dataset help?<br>
I think I answered my own question: without normalization, the weighted sums over the 784 pixels become very large, and the exponentials in softmax can overflow.

In [8]:
def normalize_data(v: np.array) -> np.array:
    return v / 256
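
To make the answer above concrete, here is a minimal sketch of how the scale of the inputs carries over to the class scores. The image and the weight row are randomly generated stand-ins (assumptions for illustration, not MNIST data or trained weights); the last part shows that `math.e ** x`, the expression used later in `softmax`, raises `OverflowError` once the exponent gets large.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical image: 784 raw pixel intensities in [0, 255].
raw_pixels = rng.integers(0, 256, size=784).astype(float)
# Hypothetical weight row in [-1, 1], similar in scale to a random initialization.
weight_row = rng.uniform(-1.0, 1.0, size=784)

# The class score is a dot product, so dividing the pixels by 256
# divides the score by 256 as well.
raw_score = float(np.dot(weight_row, raw_pixels))
scaled_score = float(np.dot(weight_row, raw_pixels / 256))
print(raw_score, scaled_score)

# Exponentiating a very large score overflows a Python float.
try:
    math.e ** 1000.0
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)
```

The exact scores depend on the random draw; the point is only that the normalized score is 256 times smaller in magnitude.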

In [9]:
def one_hot_encode(v: np.array, nr_classes: int) -> np.array:
    return np.array([np.array([int(i == label) for i in range(nr_classes)]) for label in v])
# test_Y = np.array([np.array([int(i == label) for i in range(10)]) for label in test_Y]) # never again forget to encode the test labels too
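
The same encoding can be written without Python loops. The sketch below is one possible alternative, assuming the labels are integers in `range(nr_classes)`; the name `one_hot_encode_vectorized` is introduced here for illustration only.

```python
import numpy as np

def one_hot_encode_vectorized(labels: np.ndarray, nr_classes: int) -> np.ndarray:
    # Row k of the identity matrix is the one-hot vector for class k,
    # so indexing the identity matrix with the labels encodes them all at once.
    return np.eye(nr_classes, dtype=int)[labels]

labels = np.array([3, 0, 9])
encoded = one_hot_encode_vectorized(labels, nr_classes=10)
print(encoded)
```

Taking `argmax` along each row recovers the original labels, which is what the accuracy check later relies on.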

### Forward propagation<br>
We use the softmax function to turn the class scores into a normalized output. Because the function exponentiates, differences between the class scores (the values obtained after multiplying by the weights) have a much larger impact on the resulting probabilities.<br>
It also guarantees that the outputs are non-negative and sum to 1, so they can be read as probabilities.


In [10]:
import math

def softmax(z: np.array) -> np.array:
    # exponentiate each score, then divide by the total so the outputs sum to 1
    exp_sum = sum([math.e**element for element in z])
    return np.array([math.e**element / exp_sum for element in z])

def make_prediction(weights: np.array, bias: np.array, instance: np.array) -> np.array:
    # one score per class: dot product of that class's weight row with the instance, plus its bias
    # (equivalent to np.dot(weights, instance) + bias)
    z = [np.dot(weights[i], instance) + bias[i] for i in range(len(weights))]
    return softmax(z)
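
A common variant of softmax subtracts the largest score before exponentiating. This does not change the result mathematically (the shift cancels in the ratio) but keeps the exponentials from overflowing. The sketch below is an alternative to the cell above, not the version used for the reported results; `softmax_stable` and `make_prediction_vectorized` are names introduced here.

```python
import numpy as np

def softmax_stable(z) -> np.ndarray:
    z = np.asarray(z, dtype=float)
    # Shifting by the maximum makes the largest exponent 0, so np.exp never overflows.
    exps = np.exp(z - z.max())
    return exps / exps.sum()

def make_prediction_vectorized(weights: np.ndarray, bias: np.ndarray, instance: np.ndarray) -> np.ndarray:
    # weights has shape (classes, features); the matrix-vector product
    # computes all class scores in one call.
    return softmax_stable(weights @ instance + bias)

# Scores this large would overflow with math.e ** x.
probs = softmax_stable([1000.0, 1001.0, 1002.0])
print(probs, probs.sum())

# With all-zero parameters every class gets the same probability.
uniform = make_prediction_vectorized(np.zeros((3, 4)), np.zeros(3), np.ones(4))
print(uniform)
```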
    

### Training within one epoch

We use the model to make a prediction for each training instance. We compute the error by comparing the prediction with the instance's true label, and we scale this error by the learning rate and by the input instance itself (the outer product of the error with the instance). We accumulate these scaled errors in an array with the same shape as the weights, to be used for the update afterwards. We proceed similarly for the bias, where the error is scaled only by the learning rate.

In [11]:
def train(x: np.array, y: np.array, weights: np.ndarray, bias: np.ndarray, alpha: float) -> (np.array, np.array):
    # accumulated updates for the weights (delta) and the bias (beta)
    delta = np.zeros_like(weights)
    beta = np.zeros_like(bias)
    for i in range(len(x)):
        predicted_y = make_prediction(weights, bias, x[i])
        error = y[i] - predicted_y
        # outer product of the error (classes x 1) with the instance (1 x features), scaled by alpha
        delta = delta + alpha * np.dot(error.reshape(len(predicted_y), 1), np.atleast_2d(x[i]))
        #delta = delta + alpha * error.reshape(len(predicted_y), 1) * np.tile(x[i], (len(weights), 1))
        beta = beta + alpha * error
    return delta, beta
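
Because the weights stay fixed while a batch is processed, the per-instance loop can be collapsed into matrix operations: the sum of the outer products of each error with its instance equals `error.T @ x`. The sketch below is an assumed equivalent formulation, not the code used for the reported timings; `softmax_rows` and `train_vectorized` are names introduced here.

```python
import numpy as np

def softmax_rows(z: np.ndarray) -> np.ndarray:
    # Row-wise softmax; subtracting each row's maximum avoids overflow.
    exps = np.exp(z - z.max(axis=1, keepdims=True))
    return exps / exps.sum(axis=1, keepdims=True)

def train_vectorized(x, y, weights, bias, alpha):
    # x: (batch, features), y: (batch, classes) one-hot,
    # weights: (classes, features), bias: (classes,)
    probs = softmax_rows(x @ weights.T + bias)  # predictions for the whole batch
    error = y - probs                           # (batch, classes)
    delta = alpha * (error.T @ x)               # sum of per-instance outer products
    beta = alpha * error.sum(axis=0)            # sum of per-instance errors
    return delta, beta

# Tiny made-up batch, only to show the shapes involved.
rng = np.random.default_rng(1)
x_demo = rng.random((5, 4))
y_demo = np.eye(3)[rng.integers(0, 3, size=5)]
w_demo = rng.standard_normal((3, 4))
b_demo = rng.standard_normal(3)
delta_demo, beta_demo = train_vectorized(x_demo, y_demo, w_demo, b_demo, alpha=0.1)
print(delta_demo.shape, beta_demo.shape)
```

The returned arrays have the same shapes as `weights` and `bias`, so they can be added to the parameters exactly as the loop-based version's results are.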

### Running the epochs and initializing the parameters<br>
We train the model on the training set for a predefined number of epochs. In each epoch we shuffle the dataset, split it into batches, compute the update for each batch, and apply it to the weights and bias before moving on to the next batch (that is, mini-batch training).<br>
<br>
Q: Why do we use batches if the algorithm cannot be parallelized?

In [12]:
def make_model(weights: np.array, bias: np.array, x: np.array, y: np.array, alpha: float, epochs: int, batch_size: int) -> None:
    # weights and bias are updated in place
    for _ in range(epochs):
        # shuffle instances and labels with the same permutation
        perm = np.random.permutation(len(x))
        x = x[perm]
        y = y[perm]
        # instances left over after the last full batch are skipped
        for i in range(len(x) // batch_size):
            batch = slice(i * batch_size, (i + 1) * batch_size)
            delta, beta = train(x[batch], y[batch], weights, bias, alpha)
            weights += delta
            bias += beta
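
The shuffle-and-split step can also be isolated in a small generator. The sketch below is an illustrative variant (the name `iterate_minibatches` is introduced here): unlike the loop above, it also yields the final, shorter batch when the dataset size is not a multiple of `batch_size`. With 60000 training images and a batch size of 100 the two behave the same.

```python
import numpy as np

def iterate_minibatches(x: np.ndarray, y: np.ndarray, batch_size: int, rng: np.random.Generator):
    # One permutation indexes both arrays, so every instance keeps its own label.
    perm = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = perm[start:start + batch_size]
        yield x[idx], y[idx]

# Toy data in which the label equals the single feature, to make the pairing visible.
x_toy = np.arange(10).reshape(10, 1)
y_toy = np.arange(10)
batches = list(iterate_minibatches(x_toy, y_toy, batch_size=4, rng=np.random.default_rng(0)))
print([len(xb) for xb, _ in batches])
```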

We initialize the weights and the bias with random values, which we then rescale to the interval $[-1, 1]$.

In [13]:
def random_init_model() -> (np.array, np.array):
    w = np.random.randn(10, 784)
    b = np.random.randn(10)
    
    #normalize
    w = (w - w.min()) / (w.max() - w.min())
    b = (b - b.min()) / (b.max() - b.min())
    
    #scale
    w = 2 * w - 1
    b = 2 * b - 1
    
    return w, b

In [14]:
# compute the accuracy of a model on a given dataset
def assert_accuracy(weights: np.array, bias: np.array, test_x: np.array, test_y: np.array) -> float:
    nr_successes = 0
    for i in range(len(test_x)):
        predicted_y = make_prediction(weights, bias, test_x[i])
        if predicted_y.argmax() == test_y[i].argmax():
            nr_successes += 1
    return nr_successes / len(test_x)
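
Since softmax preserves the ordering of the scores, the predicted class is simply the index of the largest score, so accuracy can be computed for the whole set at once without calling softmax. The sketch below is an assumed alternative; `accuracy_vectorized` and the toy inputs are introduced here for illustration.

```python
import numpy as np

def accuracy_vectorized(weights: np.ndarray, bias: np.ndarray, x: np.ndarray, y_one_hot: np.ndarray) -> float:
    # Scores for every instance at once: (instances, classes).
    scores = x @ weights.T + bias
    predicted = scores.argmax(axis=1)
    actual = y_one_hot.argmax(axis=1)
    return float(np.mean(predicted == actual))

# Toy model that predicts the index of the largest input feature.
weights_toy = np.eye(3)
bias_toy = np.zeros(3)
x_toy = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
y_toy = np.eye(3, dtype=int)[[0, 1, 2, 2]]  # the last label disagrees with the toy model
acc = accuracy_vectorized(weights_toy, bias_toy, x_toy, y_toy)
print(acc)
```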

In [497]:
# #check the accuracy of the randomly initialized model
# #it should be ~0.1
# random_init_acc = assert_accuracy(w, b, test_X, test_Y)
# print(random_init_acc)

0.1253


In [503]:
# make_model(w, b, train_X, train_Y, alpha=0.001, epochs=20, batch_size=100)

In [504]:
# #check the accuracy of the trained model
# model_acc = assert_accuracy(w, b, test_X, test_Y)
# print(model_acc)

0.8956


In [31]:
params_list = [
    {"Epochs": 1, "Learning Rate": 0.1},
    {"Epochs": 1, "Learning Rate": 0.01},
    {"Epochs": 10, "Learning Rate": 0.1},
    {"Epochs": 10, "Learning Rate": 0.01},
    {"Epochs": 20, "Learning Rate": 0.1},
    {"Epochs": 20, "Learning Rate": 0.01},
    # {"Epochs": 100, "Learning Rate": 0.1},
    # {"Epochs": 100, "Learning Rate": 0.01},
]

In [32]:
import time
train_X, train_Y, test_X, test_Y = initialize_data()

train_X = normalize_data(train_X)
test_X = normalize_data(test_X)

train_Y = one_hot_encode(train_Y, nr_classes = 10)
test_Y = one_hot_encode(test_Y, nr_classes = 10)

nr_tests_per_param = 10

results = []
for params in params_list:
    for _ in range(nr_tests_per_param):
        w, b = random_init_model()
        
        start = time.time()
        make_model(w, b, train_X, train_Y, alpha=params["Learning Rate"], epochs=params["Epochs"], batch_size=100)
        end = time.time()
        
        results.append({"Epochs": params["Epochs"], "Learning Rate": params["Learning Rate"], "Training Time": end - start, "Accuracy": assert_accuracy(w, b, test_X, test_Y)})

In [25]:
w, b = random_init_model()
make_model(w, b, train_X, train_Y, alpha=0.1, epochs=10, batch_size=100)

In [26]:
model_acc = assert_accuracy(w, b, test_X, test_Y)
print(model_acc)

0.9037


### Results

In [None]:
import pandas as pd

df = pd.DataFrame(results)
df.to_csv("model_training_results.csv", index=False)

In [39]:
averaged_results = df.groupby(["Epochs", "Learning Rate"]).agg({
    "Accuracy": "mean",
    "Training Time": "mean"
}).reset_index()

print(averaged_results)

   Epochs  Learning Rate  Accuracy  Training Time
0       1           0.01   0.90353       2.993196
1       1           0.10   0.86499       2.996382
2      10           0.01   0.91687      30.360469
3      10           0.10   0.88158      29.725730
4      20           0.01   0.91976      59.127596
5      20           0.10   0.88567      58.794753
