# Implementacja propagacji wstecznej błędu

## [Zadanie](http://pages.mini.pw.edu.pl/~karwowskij/mioad/lab-sieci.html#org6058800)



W ramach tego laboratorium trzeba zaimplementować uczenie sieci neuronowej propagacją wsteczną błędu.  
Aby sprawdzić implementację, należy wykonać uczenie na prostych danych do uczenia dostarczonych na zajęciach. Następnie należy zaimplementować metodę wizualizacji wartości wag sieci w kolejnych iteracjach i, w przypadku gdy nie udaje się nauczyć sieci, spróbować wykorzystać te wizualizacje do ustalenia przyczyny problemu.  
Zaimplementować wersję z aktualizacją wag po prezentacji wszystkich wzorców i wersję z aktualizacją po prezentacji kolejnych porcji (batch). Porównać szybkość uczenia dla każdego z wariantów.

Rozważyć trzy warianty inicjowania wag do procesu uczenia w tym wagi z rozkładu jednostajnego na przedziale [0,1]. Opcjonalnie zaimplementować inną metodą inicjowania wag. Albo metodę He albo Xavier.

Przetestować uczenie sieci na następujących zbiorach:
- square-simple (jeśli sieć nie jest w stanie się nauczyć tej funkcji to coś jest bardzo źle zrobione),
- steps-small,
- multimodal-large.




References:
- https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
- http://neuralnetworksanddeeplearning.com/chap1.html
- http://neuralnetworksanddeeplearning.com/chap2.html

In [1]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn import metrics
import random
import time

# the fixed grain ensures the reproducibility of the results
random.seed(123)

In [2]:
#### Miscellaneous functions
def activation_func(name):
    """
    The activation function.
    """
    if(name == 'linear'):
        return lambda x: x
    elif(name == 'sigmoid'):
        return lambda x: (1 / (1 + np.exp(-x)))
    else:
        print('Unknown activation function - using sigmoid')
        return lambda x: (1 / (1 + np.exp(-x)))
    
def activation_prime(name):
    """
    Derivative of the activation function.
    """
    if(name == 'linear'):
        return lambda x: 1
    elif(name == 'sigmoid'):
        sigmoid = lambda x: (1 / (1 + np.exp(-x))) 
        return lambda x: sigmoid(x) * (1-sigmoid(x)) 
    else:
        print('Unrecognized activation function has been replaced with the default sigmoid ')
        sigmoid = lambda x: (1 / (1 + np.exp(-x)))                           
        return lambda x: sigmoid(x) * (1-sigmoid(x)) 

### Neural network initialization

In [3]:
class MLP:
    def __init__(self, layers, activation_functions, initial_dist = 'default'):
        """
        Presence of at least one hidden layer is an accompanying assumption        

        Takes:        
        layers - list of numbers of neurons in subsequent layers
        activation_functions - list of names of functions ie ('sigmoid', 'linear')
        
        Remarks: 
          - Length of layers list should be equal to length of activation_functions list + 1 
          - The biases and weights for the network are initialized randomly, using continuous uniform 
          distribution with certain bounds between 0 and 1 or -1 and 1 or Gaussian distribution with mean 0, 
          and variance 1.
        """
        
        self.layers = layers
        self.activation_functions = activation_functions
        self.weights = [] 
        self.biases = [] 
        
        if(initial_dist == 'gaussian'): 
            for i in range(len(layers) - 1):
                self.weights.append(np.random.randn(layers[i+1], layers[i]))
                self.biases.append(np.random.randn(layers[i+1], 1))
                
        elif(initial_dist == 'uniform'): 
            for i in range(len(layers) - 1):
                self.weights.append(np.random.uniform(-1, 1, size=(layers[i+1], layers[i]))) 
                self.biases.append(np.random.uniform(-1, 1, size=(layers[i+1], 1))) 
                
        else:
            # print('Unrecognized initial distribution has been replaced with the default uniform distribution bounded with 0 and 1.')
            for i in range(len(layers) - 1):
                self.weights.append(np.random.uniform(0, 1, size=(layers[i+1], layers[i])))
                self.biases.append(np.random.uniform(0, 1, size=(layers[i+1], 1)))
            
    def forward(self, x):
        """
        Returns the output of the network if x is an input.
        """
        
        a = x
        z = []
        activations = [a]
        
        for i in range(len(self.weights)):
            activation_function = activation_func(self.activation_functions[i])
            z.append(self.weights[i].dot(a) + self.biases[i])
            
            a = activation_function(z[-1])
            activations.append(a)
            
        return (z, activations)
                  
    def backprop(self, y, z, activations):
        """
        Function performing backpropagation
        Returns nabla_b and nabla_w representing the
        derivatives by weights and biases respectively. 
        nabla_b and nabla_w are calculated layer-by-layer.
        """ 
        
        nabla_w = [] 
        nabla_b = []

        # deltas for each layer 
        d = [0 for i in range (len(self.weights))]  
                  
        # the last layer error
        d[-1] = ((activations[-1] - y) * (activation_prime(self.activation_functions[-1]))(z[-1])) 
        
        # backward pass
        for i in reversed(range(len(d)-1)):
            d[i] = self.weights[i+1].T.dot(d[i+1]) * (activation_prime(self.activation_functions[i])(z[i]))  
                  
        nabla_b = [d.dot(np.ones((y.shape[1],1))) for d in d]
        nabla_w = [d.dot(activations[i].T) for i,d in enumerate(d)]
                  
        return nabla_w, nabla_b
                  
                  
    def train(self, x, y, batch_size = 10, epochs = 100, eta = 0.01):
        """
        Updates weights and biases based on the output using backpropagation.
        
        The version with the update of the scales after the presentation 
        of all patterns and the version with the update after the presentation
        of subsequent portions (batch) was performed within one function. 
        To get the first variant of operation as batch size the number of all
        observations should be given.
        """
        for epoch in range(epochs):
            
            a = np.arange(len(y[0]))
            random.shuffle(a)
            updated_x = np.array([[x[0][i] for i in a]])
            updated_y = np.array([[y[0][i] for i in a]])
            i = 0 
            
            while(i < len(y)):
                
                x_batch = updated_x[0][i : (i + batch_size)].reshape(1,-1)
                y_batch = updated_y[0][i : (i + batch_size)].reshape(1,-1)
                i += batch_size
                
                z, activations = self.forward(x_batch)
                nabla_w, nabla_b = self.backprop(y_batch, z, activations)
                
                self.weights = [w - (eta / batch_size) * dw for w, dw in zip(self.weights, nabla_w)]
                # print(self.weights)
                self.biases = [b - (eta / batch_size) * db for b, db in zip(self.biases, nabla_b)]

## 1. Square-simple dataset

In [4]:
train_df = pd.read_csv('../data/mio1/regression/square-simple-training.csv', index_col=0)
X = train_df['x']
X = X.values.reshape(1,-1)
y = train_df['y']
y = y.values.reshape(1,-1)
test_df = pd.read_csv('../data/mio1/regression/square-simple-test.csv', index_col=0)
X_test = test_df['x']
X_test = X_test.values.reshape(1,-1)
y_test = test_df['y']
y_test = y_test.values.reshape(1,-1)

In [5]:
# Rozkład jednostajny U([0,1])

means = []
mins = []
maxs = []
stds = []
mean_times = []
for i in np.arange(start = 10, stop = (len(X[0])+1), step = len(X[0])/10):
    scores = [] #mean absolute error
    times = []
    for j in range(50):
        start = time.time()
        mlp = MLP([1, 20,  1], activation_functions = ['sigmoid', 'linear'])
        mlp.train(X, y, epochs=1000, batch_size=int(i), eta = 0.01)
        z, activations = mlp.forward(X_test)
        end = time.time()
        scores.append(metrics.mean_absolute_error(y_test[0], activations[-1][0]))
        times.append(end - start)
    means.append(np.mean(scores))
    mins.append(min(scores))
    maxs.append(max(scores))
    stds.append(np.std(scores))
    mean_times.append(np.mean(times))
output = pd.DataFrame({'Nr of batches':np.arange(start = 10, stop = (len(X[0])+1), step = len(X[0])/10), 'Mean MSE':means, 'Min MSE':mins, 'Max MSE':maxs, 'MSE Std':stds, 'Mean training time':mean_times})
output

Unnamed: 0,Nr of batches,Mean MSE,Min MSE,Max MSE,MSE Std,Mean training time
0,10.0,12.643634,5.912571,33.312178,5.335768,0.506815
1,20.0,13.460079,5.873654,21.805111,4.325653,0.597413
2,30.0,14.290073,7.298766,31.875251,5.198911,0.561309
3,40.0,13.502314,6.818468,24.233312,3.853637,0.512476
4,50.0,13.889574,8.892049,32.43478,4.022268,0.523701
5,60.0,13.908477,8.267224,31.658309,5.121636,0.530092
6,70.0,13.966092,6.257157,31.561458,4.929973,0.586362
7,80.0,12.368534,7.723691,21.901756,2.971547,0.513405
8,90.0,13.01048,8.191591,21.963261,2.989532,0.594555
9,100.0,12.485091,7.892613,20.493086,3.084025,0.556966


In [6]:
# Rozkład jednostajny U([-1,1])

means = []
mins = []
maxs = []
stds = []
mean_times = []
for i in np.arange(start = 10, stop = (len(X[0])+1), step = len(X[0])/10):
    scores = [] #mean absolute error
    times = []
    for j in range(50):
        start = time.time()
        mlp = MLP([1, 20,  1], activation_functions = ['sigmoid', 'linear'], initial_dist='uniform')
        mlp.train(X, y, epochs=1000, batch_size=int(i), eta = 0.01)
        z, activations = mlp.forward(X_test)
        end = time.time()
        scores.append(metrics.mean_absolute_error(y_test[0], activations[-1][0]))
        times.append(end - start)
    means.append(np.mean(scores))
    mins.append(min(scores))
    maxs.append(max(scores))
    stds.append(np.std(scores))
    mean_times.append(np.mean(times))
output = pd.DataFrame({'Nr of batches':np.arange(start = 10, stop = (len(X[0])+1), step = len(X[0])/10), 'Mean MSE':means, 'Min MSE':mins, 'Max MSE':maxs, 'MSE Std':stds, 'Mean training time':mean_times})
output

Unnamed: 0,Nr of batches,Mean MSE,Min MSE,Max MSE,MSE Std,Mean training time
0,10.0,9.799406,5.047427,21.933413,3.556502,0.468309
1,20.0,9.860416,4.87133,19.811685,3.340989,0.539376
2,30.0,10.025123,5.300886,25.641195,4.294306,0.509884
3,40.0,9.702336,5.372084,16.917427,2.756387,0.502832
4,50.0,9.95539,4.490533,20.926792,3.416104,0.490538
5,60.0,10.048605,4.39794,19.389944,3.097819,0.644819
6,70.0,10.204976,4.585366,17.97154,2.887872,0.510437
7,80.0,10.406578,5.451966,21.665282,4.133785,0.563854
8,90.0,9.954448,5.688356,31.368066,4.036751,0.551121
9,100.0,10.358034,4.496592,21.307828,3.522311,0.504609


In [7]:
# Rozkład gaussowski N(0,1)

means = []
mins = []
maxs = []
stds = []
mean_times = []
for i in np.arange(start = 10, stop = (len(X[0])+1), step = len(X[0])/10):
    scores = [] #mean absolute error
    times = []
    for j in range(50):
        start = time.time()
        mlp = MLP([1, 20,  1], activation_functions = ['sigmoid', 'linear'], initial_dist='gaussian')
        mlp.train(X, y, epochs=1000, batch_size=int(i), eta = 0.01)
        z, activations = mlp.forward(X_test)
        end = time.time()
        scores.append(metrics.mean_absolute_error(y_test[0], activations[-1][0]))
        times.append(end - start)
    means.append(np.mean(scores))
    mins.append(min(scores))
    maxs.append(max(scores))
    stds.append(np.std(scores))
    mean_times.append(np.mean(times))
output = pd.DataFrame({'Nr of batches':np.arange(start = 10, stop = (len(X[0])+1), step = len(X[0])/10), 'Mean MSE':means, 'Min MSE':mins, 'Max MSE':maxs, 'MSE Std':stds, 'Mean training time':mean_times})
output

Unnamed: 0,Nr of batches,Mean MSE,Min MSE,Max MSE,MSE Std,Mean training time
0,10.0,8.049504,4.523435,16.315374,2.50795,0.423929
1,20.0,7.91005,4.04417,18.897859,2.975172,0.44362
2,30.0,7.075544,4.302903,16.107871,2.518223,0.463364
3,40.0,7.048299,4.239753,14.212465,2.040417,0.488425
4,50.0,7.093778,4.133146,11.163663,1.821443,0.528831
5,60.0,7.354486,3.958426,16.063226,2.41636,0.512568
6,70.0,6.77091,3.788175,11.255257,1.73965,0.495077
7,80.0,6.925562,3.532236,17.70284,2.682641,0.508775
8,90.0,7.210558,4.068948,14.41889,2.314474,0.529134
9,100.0,7.657029,4.846737,16.723033,2.424914,0.609939


## 2. Steps-small dataset

In [27]:
train_df = pd.read_csv('../data/mio1/regression/steps-small-training.csv', index_col=0)
X = train_df['x']
X = X.values.reshape(1,-1)
y = train_df['y']
y = y.values.reshape(1,-1)
test_df = pd.read_csv('../data/mio1/regression/steps-small-test.csv', index_col=0)
X_test = test_df['x']
X_test = X_test.values.reshape(1,-1)
y_test = test_df['y']
y_test = y_test.values.reshape(1,-1)

In [28]:
# Rozkład jednostajny U([0,1])

means = []
mins = []
maxs = []
stds = []
mean_times = []
for i in np.arange(start = 5, stop = (len(X[0])+1), step = len(X[0])/10):
    scores = [] #mean absolute error
    times = []
    for j in range(50):
        start = time.time()
        mlp = MLP([1, 20,  1], activation_functions = ['sigmoid', 'linear'])
        mlp.train(X, y, epochs=1000, batch_size=int(i), eta = 0.01)
        z, activations = mlp.forward(X_test)
        end = time.time()
        scores.append(metrics.mean_absolute_error(y_test[0], activations[-1][0]))
        times.append(end - start)
    means.append(np.mean(scores))
    mins.append(min(scores))
    maxs.append(max(scores))
    stds.append(np.std(scores))
    mean_times.append(np.mean(times))
output = pd.DataFrame({'Nr of batches':np.arange(start = 5, stop = (len(X[0])+1), step = len(X[0])/10), 'Mean MSE':means, 'Min MSE':mins, 'Max MSE':maxs, 'MSE Std':stds, 'Mean training time':mean_times})
output

Unnamed: 0,Nr of batches,Mean MSE,Min MSE,Max MSE,MSE Std,Mean training time
0,5.0,12.568424,9.986691,14.983697,1.206049,0.381441
1,10.0,12.831569,9.766502,14.864,1.037339,0.442443
2,15.0,12.650521,9.644683,14.009029,1.080831,0.443363
3,20.0,12.673775,9.84931,14.100047,1.068098,0.438783
4,25.0,12.692304,9.78529,14.0204,1.055783,0.438197
5,30.0,12.559726,9.654451,13.677073,1.141574,0.452159
6,35.0,13.06196,12.380738,13.881919,0.354035,0.454705
7,40.0,12.905561,9.850711,13.93889,0.685141,0.375914
8,45.0,12.547677,9.795282,13.3691,1.046786,0.442319
9,50.0,12.823889,10.080947,13.645882,0.663135,0.490067


In [29]:
# Rozkład jednostajny U([-1,1])

means = []
mins = []
maxs = []
stds = []
mean_times = []
for i in np.arange(start = 5, stop = (len(X[0])+1), step = len(X[0])/10):
    scores = [] #mean absolute error
    times = []
    for j in range(50):
        start = time.time()
        mlp = MLP([1, 20,  1], activation_functions = ['sigmoid', 'linear'], initial_dist='uniform')
        mlp.train(X, y, epochs=1000, batch_size=int(i), eta = 0.01)
        z, activations = mlp.forward(X_test)
        end = time.time()
        scores.append(metrics.mean_absolute_error(y_test[0], activations[-1][0]))
        times.append(end - start)
    means.append(np.mean(scores))
    mins.append(min(scores))
    maxs.append(max(scores))
    stds.append(np.std(scores))
    mean_times.append(np.mean(times))
output = pd.DataFrame({'Nr of batches':np.arange(start = 5, stop = (len(X[0])+1), step = len(X[0])/10), 'Mean MSE':means, 'Min MSE':mins, 'Max MSE':maxs, 'MSE Std':stds, 'Mean training time':mean_times})
output

Unnamed: 0,Nr of batches,Mean MSE,Min MSE,Max MSE,MSE Std,Mean training time
0,5.0,10.476052,9.155576,12.181807,0.751752,0.442858
1,10.0,10.451113,9.606521,12.668735,0.536183,0.452118
2,15.0,10.559929,9.612576,11.306888,0.367685,0.442738
3,20.0,10.53018,9.452257,11.604194,0.40094,0.498157
4,25.0,10.496054,9.777637,11.295098,0.322044,0.493117
5,30.0,10.500243,9.945143,11.301036,0.291394,0.450026
6,35.0,10.540347,9.888917,11.049901,0.310978,0.482623
7,40.0,10.597314,9.984754,11.298997,0.296725,0.508169
8,45.0,10.651978,10.035162,11.294728,0.272011,0.570515
9,50.0,10.675836,9.940731,11.206103,0.258057,0.525219


In [30]:
# Rozkład gaussowski N(0,1)

means = []
mins = []
maxs = []
stds = []
mean_times = []
for i in np.arange(start = 5, stop = (len(X[0])+1), step = len(X[0])/10):
    scores = [] #mean absolute error
    times = []
    for j in range(50):
        start = time.time()
        mlp = MLP([1, 20,  1], activation_functions = ['sigmoid', 'linear'], initial_dist='gaussian')
        mlp.train(X, y, epochs=1000, batch_size=int(i), eta = 0.01)
        z, activations = mlp.forward(X_test)
        end = time.time()
        scores.append(metrics.mean_absolute_error(y_test[0], activations[-1][0]))
        times.append(end - start)
    means.append(np.mean(scores))
    mins.append(min(scores))
    maxs.append(max(scores))
    stds.append(np.std(scores))
    mean_times.append(np.mean(times))
output = pd.DataFrame({'Nr of batches':np.arange(start = 5, stop = (len(X[0])+1), step = len(X[0])/10), 'Mean MSE':means, 'Min MSE':mins, 'Max MSE':maxs, 'MSE Std':stds, 'Mean training time':mean_times})
output

Unnamed: 0,Nr of batches,Mean MSE,Min MSE,Max MSE,MSE Std,Mean training time
0,5.0,10.35359,8.891288,12.832247,0.680927,0.387595
1,10.0,10.292974,9.324535,11.908711,0.474993,0.395539
2,15.0,10.213512,9.154566,11.087745,0.504216,0.427227
3,20.0,10.329257,9.685127,11.23658,0.338272,0.511446
4,25.0,10.283892,9.723963,11.067686,0.329195,0.464543
5,30.0,10.38821,9.847602,10.89133,0.271163,0.492169
6,35.0,10.437272,9.690542,11.011493,0.308507,0.453783
7,40.0,10.424373,9.861852,11.237751,0.31351,0.486737
8,45.0,10.498563,9.909122,11.471396,0.308443,0.423937
9,50.0,10.492616,9.610139,11.142139,0.299499,0.533022


### 3. Multimodal-large dataset

In [22]:
train_df = pd.read_csv('../data/mio1/regression/multimodal-large-training.csv', index_col=0)
X = train_df['x']
X = X.values.reshape(1,-1)
y = train_df['y']
y = y.values.reshape(1,-1)
test_df = pd.read_csv('../data/mio1/regression/multimodal-large-test.csv', index_col=0)
X_test = test_df['x']
X_test = X_test.values.reshape(1,-1)
y_test = test_df['y']
y_test = y_test.values.reshape(1,-1)

In [23]:
# Rozkład jednostajny U([0,1])

means = []
mins = []
maxs = []
stds = []
mean_times = []
for i in np.array([10, 100, 1000, 10000]):
    scores = [] #mean absolute error
    times = []
    for j in range(10):
        start = time.time()
        mlp = MLP([1, 5, 5,  1], activation_functions = ['sigmoid', 'sigmoid', 'linear'])
        mlp.train(X, y, epochs=1000, batch_size=int(i), eta = 0.01)
        z, activations = mlp.forward(X_test)
        end = time.time()
        scores.append(metrics.mean_absolute_error(y_test[0], activations[-1][0]))
        times.append(end - start)
    means.append(np.mean(scores))
    mins.append(min(scores))
    maxs.append(max(scores))
    stds.append(np.std(scores))
    mean_times.append(np.mean(times))
output = pd.DataFrame({'Nr of batches': np.array([10, 100, 1000, 10000]), 'Mean MSE':means, 'Min MSE':mins, 'Max MSE':maxs, 'MSE Std':stds, 'Mean training time':mean_times})
output

Unnamed: 0,Nr of batches,Mean MSE,Min MSE,Max MSE,MSE Std,Mean training time
0,10,15.706951,13.06204,20.762024,2.670063,40.645264
1,100,11.165347,9.713305,13.446099,1.161271,43.980804
2,1000,11.109059,9.398124,12.462422,1.01508,29.927933
3,10000,10.664581,10.082848,11.215142,0.387911,45.858199


In [24]:
# Rozkład jednostajny U([-1,1])

means = []
mins = []
maxs = []
stds = []
mean_times = []
for i in np.array([10, 100, 1000, 10000]):
    scores = [] #mean absolute error
    times = []
    for j in range(10):
        start = time.time()
        mlp = MLP([1, 5, 5,  1], activation_functions = ['sigmoid', 'sigmoid', 'linear'], initial_dist='uniform')
        mlp.train(X, y, epochs=1000, batch_size=int(i), eta = 0.01)
        z, activations = mlp.forward(X_test)
        end = time.time()
        scores.append(metrics.mean_absolute_error(y_test[0], activations[-1][0]))
        times.append(end - start)
    means.append(np.mean(scores))
    mins.append(min(scores))
    maxs.append(max(scores))
    stds.append(np.std(scores))
    mean_times.append(np.mean(times))
output = pd.DataFrame({'Nr of batches':np.array([10, 100, 1000, 10000]), 'Mean MSE':means, 'Min MSE':mins, 'Max MSE':maxs, 'MSE Std':stds, 'Mean training time':mean_times})
output

Unnamed: 0,Nr of batches,Mean MSE,Min MSE,Max MSE,MSE Std,Mean training time
0,10,10.372553,6.606052,14.898121,2.389733,29.246943
1,100,7.173462,5.161227,11.052933,2.063298,29.443649
2,1000,7.90149,5.28908,10.355405,1.991918,29.55215
3,10000,8.77832,5.46149,11.678431,2.399101,48.865691


In [26]:
# Rozkład gaussowski N(0,1)

means = []
mins = []
maxs = []
stds = []
mean_times = []
for i in np.array([10, 100, 1000, 10000]):
    scores = [] #mean absolute error
    times = []
    for j in range(10):
        start = time.time()
        mlp = MLP([1, 5, 5,  1], activation_functions = ['sigmoid', 'sigmoid', 'linear'], initial_dist='gaussian')
        mlp.train(X, y, epochs=1000, batch_size=int(i), eta = 0.01)
        z, activations = mlp.forward(X_test)
        end = time.time()
        scores.append(metrics.mean_absolute_error(y_test[0], activations[-1][0]))
        times.append(end - start)
    means.append(np.mean(scores))
    mins.append(min(scores))
    maxs.append(max(scores))
    stds.append(np.std(scores))
    mean_times.append(np.mean(times))
output = pd.DataFrame({'Nr of batches': np.array([10, 100, 1000, 10000]), 'Mean MSE':means, 'Min MSE':mins, 'Max MSE':maxs, 'MSE Std':stds, 'Mean training time':mean_times})
output

Unnamed: 0,Nr of batches,Mean MSE,Min MSE,Max MSE,MSE Std,Mean training time
0,10,10.817211,6.755954,17.260537,3.240975,29.799897
1,100,7.264094,4.6727,12.540535,2.636687,28.711503
2,1000,7.797173,4.835234,10.545822,2.152942,29.064719
3,10000,7.07895,5.468716,11.143545,1.933736,49.700941
