# Computational Intelligence Methods in Data Analysis

[Tasks](https://pages.mini.pw.edu.pl/~karwowskij/mioad/lab-sieci.html)

### Lab 6: NN5: Testing different activation functions (1 week, 2 points)

Extend the existing implementation of the network and the training method with a choice of activation function:
- sigmoid,
- linear,
- tanh,
- ReLU.

Additional question: does everyone implement the gradient of the ReLU function exactly?

Compare the training speed and effectiveness of the network depending on the number of neurons in each layer and the type of activation function.

Keep in mind that different activation functions may perform differently depending on the number of neurons and the number of layers.

**Test networks with one, two, and three hidden layers.** As in the previous week, the training process has to be adapted to the derivatives of the new activation functions.
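As a rough illustration of what the task asks for, the sketch below pairs each of the four activation functions with its derivative and checks the derivatives against a central finite difference. The class and method names (`Sigmoid`, `forward`, `derivative`, and so on) are hypothetical and are not taken from the `MultiLayerPerceptron` module used later in this notebook.

```python
import numpy as np

# Hypothetical helper classes for illustration only;
# they are NOT the classes from the MultiLayerPerceptron module.
class Sigmoid:
    def forward(self, z):
        return 1.0 / (1.0 + np.exp(-z))

    def derivative(self, z):
        s = self.forward(z)
        return s * (1.0 - s)


class Linear:
    def forward(self, z):
        return z

    def derivative(self, z):
        return np.ones_like(z)


class Tanh:
    def forward(self, z):
        return np.tanh(z)

    def derivative(self, z):
        return 1.0 - np.tanh(z) ** 2


class ReLU:
    def forward(self, z):
        return np.maximum(z, 0.0)

    def derivative(self, z):
        # Convention chosen here: the gradient at z == 0 is 0.
        return (z > 0).astype(float)


def numerical_derivative(f, z, eps=1e-6):
    """Central finite difference, used only to sanity-check the formulas."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


# The test points avoid z == 0, where ReLU has a kink.
z = np.array([-2.0, -0.5, 0.3, 1.7])
for act in (Sigmoid(), Linear(), Tanh(), ReLU()):
    matches = np.allclose(act.derivative(z),
                          numerical_derivative(act.forward, z), atol=1e-5)
    print(type(act).__name__, matches)
```

During backpropagation, the derivative evaluated at a layer's pre-activation is what multiplies the error signal coming from the next layer, so swapping the activation only changes this one factor.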

**Run preliminary tests on the multimodal-large dataset (regression) for all three architectures and all four activation functions.**

For the remaining datasets, choose the two best configurations and evaluate their effectiveness:
- regression
  - steps-large,
- classification
  - rings5-regular,
  - rings3-regular.

#### TODO:

- ~~implement sigmoid, linear, tanh, ReLU functions~~
- ~~answer additional question regarding ReLU gradient implementation~~
- ~~compare speed of learning and accuracy of networks with different number of hidden layers and activation functions on multimodal-large dataset~~
- ~~for each other dataset choose two best sets and test their accuracy~~

In [56]:
import MultiLayerPerceptron as mlp
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

## Dataset 1: multimodal-large

In [57]:
# Load each split from its matching file (the file names were swapped before)
df_train = pd.read_csv('data/regression/multimodal-large-training.csv')
df_test = pd.read_csv('data/regression/multimodal-large-test.csv')
print(df_test.head())

x_train = [[x] for x in df_train.loc[:,"x"]]
y_train = [[y] for y in df_train.loc[:,"y"]]
x_test = [[x] for x in df_test.loc[:,"x"]]
y_test = [[y] for y in df_test.loc[:,"y"]]

          x           y
0  0.493292  -98.208166
1 -0.470203  -55.283891
2  1.869983  100.299997
3 -1.040446    2.720629
4 -0.616507  -75.991636


### Comparison of speed and accuracy of networks with different number of hidden layers and activation functions

Done in a separate notebook with results presented below.

All 12 networks (4 activation functions × 3 architectures) were trained on the multimodal-large dataset with a total of 120 hidden neurons, a batch size of 1, and 200 epochs. The learning rate was selected manually for each network. Total execution time: 42 min on my laptop.
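The grid of 12 configurations can be enumerated as in the sketch below. The source only states the total of 120 hidden neurons, so the even split across hidden layers (120, 60+60, 40+40+40) and the names `hidden_layout` and `configs` are assumptions made for illustration.

```python
from itertools import product

TOTAL_HIDDEN_NEURONS = 120
activations = ["relu", "linear", "sigmoid", "tanh"]
hidden_layer_counts = [1, 2, 3]


def hidden_layout(n_layers, total=TOTAL_HIDDEN_NEURONS):
    # Assumption: the hidden neurons are split evenly across the layers.
    return [total // n_layers] * n_layers


# One entry per (activation, number of hidden layers) pair.
configs = [
    {"activation": act, "hidden": n, "layout": hidden_layout(n)}
    for act, n in product(activations, hidden_layer_counts)
]

for cfg in configs:
    print(cfg)
```

Each entry would then be turned into a network, trained with its manually chosen learning rate, and its final train and test loss stored as one row of the results file.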

In [54]:
pd.read_csv('lab6_model_selection_results.csv',index_col=0)

| | activation | hidden layers | learning rate | loss_train | loss_test |
|---|---|---|---|---|---|
| 0 | relu | 1 | 0.0003 | 1602.898149 | 1566.403175 |
| 1 | relu | 2 | 0.0003 | 144.59368 | 143.693841 |
| 2 | relu | 3 | 0.0001 | 7.60662 | 12.495676 |
| 3 | linear | 1 | 0.0001 | 4459.693009 | 4424.354227 |
| 4 | linear | 2 | 0.0001 | 4583.188541 | 4548.167808 |
| 5 | linear | 3 | 0.0001 | 4649.009926 | 4613.98424 |
| 6 | sigmoid | 1 | 0.003 | 4.527554 | 9.502291 |
| 7 | sigmoid | 2 | 0.0003 | 1583.002296 | 1445.241671 |
| 8 | sigmoid | 3 | 0.0003 | 416.797194 | 374.565892 |
| 9 | tanh | 1 | 0.001 | 3.368971 | 8.402341 |


### Regression - steps-large

In [45]:
df_train = pd.read_csv('data/regression/steps-large-training.csv',index_col=0)
df_test = pd.read_csv('data/regression/steps-large-test.csv',index_col=0)
print(df_test.head())

x_train = [[x] for x in df_train.loc[:,"x"]]
y_train = [[y] for y in df_train.loc[:,"y"]]
x_test = [[x] for x in df_test.loc[:,"x"]]
y_test = [[y] for y in df_test.loc[:,"y"]]

          x    y
1  1.706990  160
2 -0.604580  -80
3 -0.674405  -80
4  1.341562   80
5 -1.427434  -80


#### Model 1

In [47]:
net = mlp.NeuralNetwork()
net.add(mlp.Layer(neurons_count=1, add_bias=True))
net.add(mlp.Layer(neurons_count=30, activation_fun=mlp.ActivationTanh(), add_bias=True))
net.add(mlp.Layer(neurons_count=1, add_bias=False))
net.train(x_train, y_train,x_test,y_test, epochs=100, learning_rate=0.001, batch_size=1)

Epoch:    1/100,  MSE loss train:   148.31,  test:  153.276
Epoch:   11/100,  MSE loss train:   56.259,  test:    50.93
Epoch:   21/100,  MSE loss train:   44.133,  test:   38.191
Epoch:   31/100,  MSE loss train:    38.07,  test:   31.991
Epoch:   41/100,  MSE loss train:   34.409,  test:   27.908
Epoch:   51/100,  MSE loss train:    32.06,  test:   25.053
Epoch:   61/100,  MSE loss train:    30.26,  test:   22.768
Epoch:   71/100,  MSE loss train:   29.014,  test:   21.086
Epoch:   81/100,  MSE loss train:   28.101,  test:   19.797
Epoch:   91/100,  MSE loss train:   27.356,  test:   18.744
Epoch:  100/100,  MSE loss train:   26.665,  test:   17.719


#### Model 2

In [60]:
net = mlp.NeuralNetwork()
net.add(mlp.Layer(neurons_count=1, add_bias=True))
net.add(mlp.Layer(neurons_count=30, activation_fun=mlp.ActivationReLU(), add_bias=True))
net.add(mlp.Layer(neurons_count=30, activation_fun=mlp.ActivationReLU(), add_bias=True))
net.add(mlp.Layer(neurons_count=30, activation_fun=mlp.ActivationReLU(), add_bias=True))
net.add(mlp.Layer(neurons_count=1, add_bias=False))
net.train(x_train, y_train,x_test,y_test, epochs=100, learning_rate=0.0001, batch_size=1)

Epoch:    1/100,  MSE loss train: 3300.485,  test: 3243.998
Epoch:   11/100,  MSE loss train:   412.03,  test:  428.202
Epoch:   21/100,  MSE loss train:  265.872,  test:  283.168
Epoch:   31/100,  MSE loss train:  226.766,  test:  243.377
Epoch:   41/100,  MSE loss train:   85.135,  test:   94.242
Epoch:   51/100,  MSE loss train:   64.996,  test:   65.314
Epoch:   61/100,  MSE loss train:   25.996,  test:    33.02
Epoch:   71/100,  MSE loss train:   76.055,  test:   74.339
Epoch:   81/100,  MSE loss train:  277.262,  test:  281.756
Epoch:   91/100,  MSE loss train:   10.903,  test:    17.01
Epoch:  100/100,  MSE loss train:    7.981,  test:   13.861


### Classification - rings5-regular

In [42]:
df_train = pd.read_csv('data/classification/rings5-regular-training.csv').sample(frac=1)
df_test = pd.read_csv('data/classification/rings5-regular-test.csv').sample(frac=1)
print(df_test.head())

# onehot encoding
x_train = df_train.loc[:,df_train.columns!='c'].to_numpy().tolist()
y_train = pd.get_dummies(df_train.loc[:,df_train.columns=='c'].squeeze(axis=1), prefix='class').to_numpy().tolist()
x_test = df_test.loc[:,df_test.columns!='c'].to_numpy().tolist()
y_test = pd.get_dummies(df_test.loc[:,df_test.columns=='c'].squeeze(axis=1), prefix='class').to_numpy().tolist()

print(f"\nUnique classes: {np.array(y_train).shape[1]}")

              x          y  c
706  -97.899164  17.106432  4
460  -98.035256 -28.845421  3
1234  98.816094 -93.602525  3
833   46.096214 -19.163398  1
1062  16.987171 -93.895154  2

Unique classes: 5


#### Model 1

In [43]:
net = mlp.NeuralNetwork()
net.add(mlp.Layer(2))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationReLU()))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationReLU()))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationReLU()))
net.add(mlp.Layer(5, activation_fun=mlp.ActivationTanh(), add_bias=False))
net.train(x_train, y_train, x_test, y_test, epochs=100, learning_rate=0.01, batch_size=10, loss_function=mlp.LossMSE(f1_score=True))

Epoch:    1/100,  MSE loss train:    0.509,  test:    0.488   |   F1 macro train:    0.318,  test:    0.298
Epoch:   11/100,  MSE loss train:    0.106,  test:     0.12   |   F1 macro train:    0.641,  test:    0.571
Epoch:   21/100,  MSE loss train:    0.099,  test:    0.111   |   F1 macro train:    0.686,  test:    0.623
Epoch:   31/100,  MSE loss train:    0.093,  test:    0.107   |   F1 macro train:    0.695,  test:    0.617
Epoch:   41/100,  MSE loss train:    0.089,  test:    0.103   |   F1 macro train:    0.709,  test:    0.623
Epoch:   51/100,  MSE loss train:    0.087,  test:      0.1   |   F1 macro train:     0.73,  test:    0.646
Epoch:   61/100,  MSE loss train:    0.087,  test:    0.098   |   F1 macro train:    0.718,  test:    0.656
Epoch:   71/100,  MSE loss train:    0.083,  test:    0.096   |   F1 macro train:     0.75,  test:    0.672
Epoch:   81/100,  MSE loss train:    0.081,  test:    0.095   |   F1 macro train:    0.748,  test:    0.676
Epoch:   91/100,  MSE loss t

#### Model 2

In [44]:
net = mlp.NeuralNetwork()
net.add(mlp.Layer(2))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationTanh()))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationTanh()))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationTanh()))
net.add(mlp.Layer(5, activation_fun=mlp.ActivationTanh(), add_bias=False))
net.train(x_train, y_train, x_test, y_test, epochs=100, learning_rate=0.01, batch_size=10, loss_function=mlp.LossMSE(f1_score=True))

Epoch:    1/100,  MSE loss train:    0.148,  test:    0.157   |   F1 macro train:    0.328,  test:    0.246
Epoch:   11/100,  MSE loss train:    0.131,  test:    0.145   |   F1 macro train:    0.404,  test:    0.333
Epoch:   21/100,  MSE loss train:    0.128,  test:    0.142   |   F1 macro train:    0.458,  test:    0.381
Epoch:   31/100,  MSE loss train:    0.125,  test:    0.141   |   F1 macro train:     0.47,  test:    0.389
Epoch:   41/100,  MSE loss train:    0.124,  test:     0.14   |   F1 macro train:    0.478,  test:    0.389
Epoch:   51/100,  MSE loss train:    0.121,  test:    0.138   |   F1 macro train:    0.498,  test:    0.372
Epoch:   61/100,  MSE loss train:     0.12,  test:    0.137   |   F1 macro train:    0.505,  test:    0.367
Epoch:   71/100,  MSE loss train:    0.119,  test:    0.136   |   F1 macro train:     0.51,  test:     0.37
Epoch:   81/100,  MSE loss train:    0.118,  test:    0.136   |   F1 macro train:    0.522,  test:    0.396
Epoch:   91/100,  MSE loss t

### Classification - rings3-regular

In [39]:
df_train = pd.read_csv('data/classification/rings3-regular-training.csv').sample(frac=1)
df_test = pd.read_csv('data/classification/rings3-regular-test.csv').sample(frac=1)
print(df_test.head())

# onehot encoding
x_train = df_train.loc[:,df_train.columns!='c'].to_numpy().tolist()
y_train = pd.get_dummies(df_train.loc[:,df_train.columns=='c'].squeeze(axis=1), prefix='class').to_numpy().tolist()
x_test = df_test.loc[:,df_test.columns!='c'].to_numpy().tolist()
y_test = pd.get_dummies(df_test.loc[:,df_test.columns=='c'].squeeze(axis=1), prefix='class').to_numpy().tolist()

print(f"\nUnique classes: {np.array(y_train).shape[1]}")

              x          y  c
1589  93.057692 -51.823544  2
100   48.683090  83.667869  1
1184  59.250966  13.685368  1
609    1.597480 -47.192650  0
867   97.458982  61.258681  2

Unique classes: 3


#### Model 1

In [40]:
net = mlp.NeuralNetwork()
net.add(mlp.Layer(2))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationReLU()))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationReLU()))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationReLU()))
net.add(mlp.Layer(3, activation_fun=mlp.ActivationTanh(), add_bias=False))
net.train(x_train, y_train, x_test, y_test, epochs=100, learning_rate=0.01, batch_size=10, loss_function=mlp.LossMSE(f1_score=True))

Epoch:    1/100,  MSE loss train:     0.66,  test:    0.663   |   F1 macro train:    0.305,  test:    0.303
Epoch:   11/100,  MSE loss train:    0.171,  test:    0.175   |   F1 macro train:    0.591,  test:    0.579
Epoch:   21/100,  MSE loss train:    0.157,  test:    0.162   |   F1 macro train:    0.669,  test:    0.647
Epoch:   31/100,  MSE loss train:    0.147,  test:    0.155   |   F1 macro train:    0.719,  test:     0.69
Epoch:   41/100,  MSE loss train:    0.133,  test:    0.142   |   F1 macro train:    0.738,  test:    0.719
Epoch:   51/100,  MSE loss train:     0.13,  test:    0.141   |   F1 macro train:    0.742,  test:    0.712
Epoch:   61/100,  MSE loss train:     0.13,  test:    0.144   |   F1 macro train:    0.737,  test:    0.704
Epoch:   71/100,  MSE loss train:    0.127,  test:    0.142   |   F1 macro train:    0.738,  test:    0.712
Epoch:   81/100,  MSE loss train:    0.114,  test:    0.126   |   F1 macro train:    0.784,  test:    0.759
Epoch:   91/100,  MSE loss t

#### Model 2

In [41]:
net = mlp.NeuralNetwork()
net.add(mlp.Layer(2))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationTanh()))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationTanh()))
net.add(mlp.Layer(20, activation_fun=mlp.ActivationTanh()))
net.add(mlp.Layer(3, activation_fun=mlp.ActivationTanh(), add_bias=False))
net.train(x_train, y_train, x_test, y_test, epochs=100, learning_rate=0.01, batch_size=10, loss_function=mlp.LossMSE(f1_score=True))

Epoch:    1/100,  MSE loss train:    0.202,  test:    0.198   |   F1 macro train:     0.49,  test:    0.509
Epoch:   11/100,  MSE loss train:     0.19,  test:    0.188   |   F1 macro train:    0.515,  test:    0.548
Epoch:   21/100,  MSE loss train:    0.184,  test:    0.184   |   F1 macro train:    0.546,  test:    0.543
Epoch:   31/100,  MSE loss train:    0.178,  test:    0.177   |   F1 macro train:    0.576,  test:    0.565
Epoch:   41/100,  MSE loss train:    0.177,  test:    0.177   |   F1 macro train:    0.563,  test:    0.558
Epoch:   51/100,  MSE loss train:    0.169,  test:    0.168   |   F1 macro train:     0.58,  test:     0.57
Epoch:   61/100,  MSE loss train:    0.175,  test:    0.174   |   F1 macro train:    0.576,  test:    0.554
Epoch:   71/100,  MSE loss train:    0.158,  test:    0.164   |   F1 macro train:    0.609,  test:    0.598
Epoch:   81/100,  MSE loss train:    0.154,  test:     0.16   |   F1 macro train:    0.622,  test:    0.604
Epoch:   91/100,  MSE loss t

## Summary

### Additional question regarding ReLU gradient implementation

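Implementations of the ReLU gradient can differ. ReLU is not differentiable at 0, so the value used at that point (commonly 0, sometimes 1) is a convention rather than an exact gradient. Some implementations also replace the zero gradient for negative inputs with a small slope, for example 0.1; strictly speaking this is a different function, called Leaky ReLU.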
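A minimal sketch of these variants is shown below. The function names and the `grad_at_zero` and `alpha` parameters are illustrative and do not come from the `MultiLayerPerceptron` module.

```python
import numpy as np


def relu_grad(z, grad_at_zero=0.0):
    """Gradient of ReLU; the value used at z == 0 is a convention."""
    g = (z > 0).astype(float)
    g[z == 0] = grad_at_zero
    return g


def leaky_relu(z, alpha=0.1):
    """Leaky ReLU: identity for z > 0, slope alpha otherwise."""
    return np.where(z > 0, z, alpha * z)


def leaky_relu_grad(z, alpha=0.1):
    """Gradient of Leaky ReLU: 1 for z > 0, alpha otherwise."""
    return np.where(z > 0, 1.0, alpha)


z = np.array([-1.0, 0.0, 2.0])
print(relu_grad(z))                    # gradient 0 at z == 0
print(relu_grad(z, grad_at_zero=1.0))  # gradient 1 at z == 0
print(leaky_relu_grad(z))              # small slope for z <= 0
```

With floating-point inputs an exact zero is rare, so the choice at `z == 0` seldom matters in practice, while the slope for negative inputs changes training behaviour noticeably.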

### Comparison of speed and accuracy of networks with different number of hidden layers and activation functions

Done in a separate notebook with results presented above.

In general, the best results were obtained by networks with a single hidden layer of 120 neurons and a sigmoid or tanh activation function (test MSE of about 9.5 and 8.4), and by the network with 3 hidden layers and ReLU activation (test MSE of about 12.5).

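The linear activation function is of little use here: no matter how many layers and neurons are used, a composition of linear layers is itself a linear (affine) function of the input, so the network cannot fit the multimodal target. This matches the results above, where all three linear networks end with a test loss above 4400.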
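The collapse of stacked linear layers into a single affine map can be checked numerically. The sketch below uses randomly drawn weights and arbitrary layer sizes (1 → 5 → 4 → 1) purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three layers with linear (identity) activation: 1 -> 5 -> 4 -> 1.
W1, b1 = rng.normal(size=(5, 1)), rng.normal(size=(5, 1))
W2, b2 = rng.normal(size=(4, 5)), rng.normal(size=(4, 1))
W3, b3 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))


def deep_linear(x):
    h1 = W1 @ x + b1
    h2 = W2 @ h1 + b2
    return W3 @ h2 + b3


# The same mapping written as one affine function W x + b.
W = W3 @ W2 @ W1
b = W3 @ (W2 @ b1 + b2) + b3

x = rng.normal(size=(1, 10))  # 10 scalar inputs stored as columns
print(np.allclose(deep_linear(x), W @ x + b))
```

Since the inputs here are scalars, `W` and `b` are single numbers, so the whole network can only represent a straight line regardless of its depth.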

It is also worth noting that training with ReLU and tanh ran faster than with sigmoid. The cause is implementation-specific: ReLU and tanh are computed with native NumPy operations, which turn out to be much faster than the vectorized Python function used for sigmoid.
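The difference can be reproduced with the sketch below. It assumes the slow version wraps a scalar Python function with `np.vectorize`, which is a guess at what "vectorized Python function" means here; the function names are illustrative.

```python
import math
import timeit

import numpy as np


def sigmoid_scalar(z):
    """Sigmoid for a single Python float."""
    return 1.0 / (1.0 + math.exp(-z))


# np.vectorize is a convenience wrapper: it still calls the Python
# function once per element, so it is not faster than a plain loop.
sigmoid_vectorized = np.vectorize(sigmoid_scalar)


def sigmoid_numpy(z):
    """Sigmoid computed with NumPy ufuncs on the whole array at once."""
    return 1.0 / (1.0 + np.exp(-z))


z = np.linspace(-5, 5, 1001)
print(np.allclose(sigmoid_vectorized(z), sigmoid_numpy(z)))

t_vec = timeit.timeit(lambda: sigmoid_vectorized(z), number=100)
t_np = timeit.timeit(lambda: sigmoid_numpy(z), number=100)
print(f"np.vectorize: {t_vec:.4f}s, NumPy ufuncs: {t_np:.4f}s")
```

Both versions return the same values, so switching sigmoid to the NumPy form should remove the speed gap without changing the training results.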

### Performance on other datasets

#### Regression - steps-large

Model 1 - 1x30 neurons, tanh activation function (linear activation on the output layer)
- mse train:    26.665
- mse test:     17.719

Model 2 - 3x30 neurons, ReLU activation function (on output layer linear activation)
- mse train:    7.981
- mse test:     13.861

#### Classification - rings5-regular

Model 1 - 3x20 neurons, ReLU activation function (tanh activation on the output layer)
- f1 train: 0.76
- f1 test:  0.7

Model 2 - 3x20 neurons, tanh activation function (on output layer too)
- f1 train: 0.529
- f1 test:  0.407

#### Classification - rings3-regular

Model 1 - 3x20 neurons, ReLU activation function (tanh activation on the output layer)
- f1 train: 0.79
- f1 test:  0.774

Model 2 - 3x20 neurons, tanh activation function (on output layer too)
- f1 train: 0.665
- f1 test:  0.639
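
For reference, the macro F1 score reported for the classification models can be computed from one-hot targets and raw network outputs as in the sketch below. This is an illustrative implementation, not the one behind `mlp.LossMSE(f1_score=True)`; the function name `macro_f1` and the choice to assign an F1 of 0 to a class with no true or predicted samples are assumptions.

```python
import numpy as np


def macro_f1(y_true_onehot, y_pred_scores):
    """Unweighted mean of per-class F1; predicted class = argmax of the outputs."""
    y_true_onehot = np.asarray(y_true_onehot)
    y_true = np.argmax(y_true_onehot, axis=1)
    y_pred = np.argmax(np.asarray(y_pred_scores), axis=1)

    f1_per_class = []
    for c in range(y_true_onehot.shape[1]):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true and no predicted samples scores 0.
        f1_per_class.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_per_class))


y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y_scores = [[0.9, 0.1, -0.2], [0.2, 0.7, 0.1], [0.6, 0.1, 0.3], [0.8, 0.0, 0.1]]
print(macro_f1(y_true, y_scores))
```

Writing F1 as 2TP / (2TP + FP + FN) needs only one guarded division per class, which avoids dividing by zero when a class is never predicted early in training.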
