# Feedforward neural network. Backpropagation.

## Calculating output of neural network

Right now the class "NeuralNet" is of little use: we want to use neural networks as approximators of complicated functions, but this class cannot even compute an output **y** from an input **x**. So, following the weight notation, we will write code that computes the values of the next layer from the values of the previous layer and the weights of the connections between them.

In [None]:
import random
import math

def sigmoid(x):
    return 1/(1+math.exp(-x))
    
weights=[]
n_input=3
n_hidden=4
epsilon=0.1
for i in range(n_input):
    weights_row=[] #creating empty row of weights from one input neuron
    for j in range(n_hidden):
        weights_row.append(random.uniform(-epsilon,epsilon)) #appending random weights
    weights.append(weights_row) #appending row to a table of weights    

x=[]
for i in range(n_input):
    x.append(random.random())

y=[]
for j in range(n_hidden):
    #preparing to sum product of values and weights
    yj=0
    for i in range(n_input):
        yj+=x[i]*weights[i][j] #i - number of input neuron , j - number of output neuron
        
    yj=sigmoid(yj) #applying sigmoid function
    y.append(yj)
    
print(y)

This algorithm can now be applied twice inside the "NeuralNet" class, once for each pair of adjacent layers:

In [None]:
class NeuralNet:
    def __init__(self,inp,hid,out,epsilon):
        #initial parameters are numbers of neurons in input (inp), hidden (hid), and output (out) layers 
        #plus epsilon value for weight initialization
        #firstly, write down numbers of neurons in each layer 
        self.inp=inp
        self.hid=hid
        self.out=out
        #secondly, create weights
        #wa - weights of connection between input and hidden layers
        #wb - weights of connection between hidden and output layers
        import random
        self.wa=[]
        self.wb=[]
        
        #fill weight tables with random values from -epsilon to epsilon
        for i in range(self.inp):
            weights_row=[] 
            for j in range(self.hid):
                weights_row.append(random.uniform(-epsilon,epsilon)) 
            self.wa.append(weights_row)  
            
        for i in range(self.hid):
            weights_row=[] 
            for j in range(self.out):
                weights_row.append(random.uniform(-epsilon,epsilon)) 
            self.wb.append(weights_row) 
            
    #define activation function
    def sigmoid(self,x):
        return 1/(1+math.exp(-x))
    
    #calculate y output
    def y(self,x):
        
        #firstly, calculate outputs of hidden layer neurons
        hidden_values=[]
        for j in range(self.hid):    
            yj=0
            for i in range(self.inp):
                yj+=x[i]*self.wa[i][j] #i - number of input layer neuron , j - number of hidden layer neuron

            yj=self.sigmoid(yj)
            hidden_values.append(yj)

        #secondly, hidden layer is treated as an input layer
        output_values=[]
        for k in range(self.out):
            zk=0
            for j in range(self.hid):
                zk+=hidden_values[j]*self.wb[j][k] #j - number of hidden layer neuron , k - number of output layer neuron

            zk=self.sigmoid(zk)
            output_values.append(zk)
            
        return output_values

In [None]:
nn=NeuralNet(3,4,2,0.1) #create the neural net from the picture: 3 input, 4 hidden, 2 output neurons
print(nn.y([1,0,1])) #get y for x=[1,0,1]

## Backpropagation

The main purpose of a neural network is still not accomplished. We need a way to change something in the network so that it approximates a given function. Its architecture and activation function are already fixed, so only one thing is left to change: the weights.

First, we need to show the network the way by writing down not only x-values but also the desired y-values. These pairs are called **training examples**. Specifying both inputs and desired outputs is called supervised learning, and it has a wide range of applications. In essence, we are trying to build a function that passes through the (x,y) points we have fixed. Neural networks are well suited to such tasks because of their plasticity, that is, their ability to change behavior quickly by changing weights.

This task is usually solved as follows:
1. We declare a **loss** function, which shows how well the neural network fits the training examples
2. Treating the loss function as a function of several variables, the weights, we calculate its partial derivatives with respect to the weights
3. We shift the weights iteratively using vanilla gradient descent

Task 1 is quite easy: the loss function can be defined as $$\mathcal{L}=\frac{1}{2}\sum_i(y_i(\mathbf{\hat{x}})-\hat{y}_i)^2$$where the pair of lists of numerical values $(\mathbf{\hat{x}},\mathbf{\hat{y}})$ is a training example and the sum runs over the output layer neurons.
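The loss defined above can be sketched in a few lines of Python. The function name `loss` and the sample numbers are assumptions made for illustration; they are not part of the "NeuralNet" class.

```python
def loss(outputs, targets):
    """Half the sum of squared differences between network outputs and desired outputs."""
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    return 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, targets))

# hypothetical network output and desired output for one training example
print(loss([0.8, 0.3], [1.0, 0.0]))  # approximately 0.065
```

The loss is zero only when every output neuron matches its desired value exactly, and it grows as the outputs move away from the targets.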

Task 3 was discussed in the previous lecture and can be described in one line: $$w^{t+1}=w^t-\eta\frac{\partial\mathcal{L}}{\partial w}$$where $\eta$ is the learning rate (`eta` in the code below).
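As a minimal sketch of this update rule, consider a toy loss with a single weight, $\mathcal{L}(w)=(w-3)^2$, whose derivative is $2(w-3)$. The toy loss, the helper name `gradient_descent`, and the step size value are assumptions chosen for illustration; the step is scaled by `eta`, the name used for the learning rate in the code of this notebook.

```python
def gradient_descent(grad, w0, eta, steps):
    """Repeatedly shift w against the gradient, scaled by the learning rate eta."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# toy loss L(w) = (w - 3)**2 has derivative 2*(w - 3) and its minimum at w = 3
w = gradient_descent(lambda w: 2 * (w - 3), w0=0.0, eta=0.1, steps=100)
print(w)  # very close to 3
```

In the neural network the same step is applied to every weight at once, with the partial derivative of the loss with respect to that weight in place of `grad(w)`.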

Task 2 can be solved with partial derivatives, but the derivation can be hard to follow. The solution is an algorithm called backpropagation; its derivation will be written up for the interested reader in a separate pdf file in the same github directory as this Jupyter notebook. Nevertheless, we will make full use of its results in the following code, the final iteration of the NeuralNet class, which should be placed in a separate library 'neural_net_lib'.

In [None]:
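#imports at module level, needed when this class is saved in a separate library
import math
import random

class NeuralNet: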
    def __init__(self,inp,hid,out,epsilon):
        #initial parameters are numbers of neurons in input (inp), hidden (hid), and output (out) layers 
        #plus epsilon value for weight initialization
        #firstly, write down numbers of neurons in each layer 
        self.inp=inp
        self.hid=hid
        self.out=out
        #secondly, create weights
        #wa - weights of connection between input and hidden layers
        #wb - weights of connection between hidden and output layers
        import random
        self.wa=[]
        self.wb=[]
        
        #fill weight tables with random values from -epsilon to epsilon
        for i in range(self.inp):
            weights_row=[] 
            for j in range(self.hid):
                weights_row.append(random.uniform(-epsilon,epsilon)) 
            self.wa.append(weights_row)  
            
        for i in range(self.hid):
            weights_row=[] 
            for j in range(self.out):
                weights_row.append(random.uniform(-epsilon,epsilon)) 
            self.wb.append(weights_row) 
            
    #define activation function
    def sigmoid(self,x):
        return 1/(1+math.exp(-x))
    
    #calculate y output
    def y(self,x):
        
        #firstly, calculate outputs of hidden layer neurons
        hidden_values=[]
        for j in range(self.hid):    
            yj=0
            for i in range(self.inp):
                yj+=x[i]*self.wa[i][j] #i - number of input layer neuron , j - number of hidden layer neuron

            yj=self.sigmoid(yj)
            hidden_values.append(yj)

        #secondly, hidden layer is treated as an input layer
        output_values=[]
        for k in range(self.out):
            zk=0
            for j in range(self.hid):
                zk+=hidden_values[j]*self.wb[j][k] #j - number of hidden layer neuron , k - number of output layer neuron

            zk=self.sigmoid(zk)
            output_values.append(zk)
            
        return output_values
    
    def train(self,x_data,y_data,eta):
        for t in range(len(x_data)):
            #firstly, calculate outputs of hidden layer neurons
            hidden_values=[]            
            for j in range(self.hid):    
                yj=0
                for i in range(self.inp):
                    yj+=x_data[t][i]*self.wa[i][j]#i - number of input layer neuron , j - number of hidden layer neuron

                yj=self.sigmoid(yj)
                hidden_values.append(yj)

            #secondly, hidden layer is treated as an input layer
            output_values=[]            
            for k in range(self.out):
                zk=0
                for j in range(self.hid):
                    zk+=hidden_values[j]*self.wb[j][k] #j - number of hidden layer neuron , k - number of output layer neuron

                zk=self.sigmoid(zk)
                output_values.append(zk)
            
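            #output layer error terms: (output - desired value) times sigmoid derivative z*(1-z)
            delta_k=[]
            for k in range(self.out):
                delta_k.append((output_values[k]-y_data[t][k])*output_values[k]*(1-output_values[k]))

            #hidden layer error terms: output error terms propagated back through wb
            delta_j=[]
            for j in range(self.hid):
                s=0
                for k in range(self.out):
                    s+=delta_k[k]*self.wb[j][k]
                delta_j.append(s*hidden_values[j]*(1-hidden_values[j]))

            #gradient descent step for weights between hidden and output layers
            for j in range(self.hid):
                for k in range(self.out):
                    self.wb[j][k]-=eta*delta_k[k]*hidden_values[j]

            #gradient descent step for weights between input and hidden layers
            for i in range(self.inp):
                for j in range(self.hid):
                    self.wa[i][j]-=eta*delta_j[j]*x_data[t][i]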
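
One way to gain confidence in the error-term formulas used in `train` without reading the full derivation is a numerical gradient check: compare the backpropagation gradients with finite-difference estimates of the loss. The sketch below is a standalone re-implementation of the same forward pass and formulas using plain functions; the names `forward`, `loss`, `backprop_gradients` and `numerical_gradient`, the weight range, and the sample data are assumptions made for illustration and are not part of 'neural_net_lib'.

```python
import math
import random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def forward(wa, wb, x):
    """Return hidden and output layer values for input x (no biases, as in NeuralNet)."""
    hidden = [sigmoid(sum(x[i] * wa[i][j] for i in range(len(wa))))
              for j in range(len(wb))]
    output = [sigmoid(sum(hidden[j] * wb[j][k] for j in range(len(wb))))
              for k in range(len(wb[0]))]
    return hidden, output

def loss(wa, wb, x, y):
    """Half the sum of squared errors for one training example."""
    _, output = forward(wa, wb, x)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, y))

def backprop_gradients(wa, wb, x, y):
    """Gradients of the loss with respect to wa and wb, using the same formulas as train."""
    hidden, output = forward(wa, wb, x)
    delta_k = [(output[k] - y[k]) * output[k] * (1 - output[k])
               for k in range(len(output))]
    delta_j = [sum(delta_k[k] * wb[j][k] for k in range(len(output)))
               * hidden[j] * (1 - hidden[j]) for j in range(len(hidden))]
    grad_wa = [[delta_j[j] * x[i] for j in range(len(hidden))] for i in range(len(x))]
    grad_wb = [[delta_k[k] * hidden[j] for k in range(len(output))]
               for j in range(len(hidden))]
    return grad_wa, grad_wb

def numerical_gradient(wa, wb, x, y, table, i, j, h=1e-6):
    """Central finite difference of the loss with respect to table[i][j] (table is wa or wb)."""
    original = table[i][j]
    table[i][j] = original + h
    plus = loss(wa, wb, x, y)
    table[i][j] = original - h
    minus = loss(wa, wb, x, y)
    table[i][j] = original  # restore the weight
    return (plus - minus) / (2 * h)

random.seed(0)
n_inp, n_hid, n_out = 3, 4, 2
wa = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_inp)]
wb = [[random.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hid)]
x, y = [1.0, 0.0, 1.0], [1.0, 0.0]

grad_wa, grad_wb = backprop_gradients(wa, wb, x, y)
max_diff = 0.0
for table, grad in ((wa, grad_wa), (wb, grad_wb)):
    for i in range(len(table)):
        for j in range(len(table[i])):
            estimate = numerical_gradient(wa, wb, x, y, table, i, j)
            max_diff = max(max_diff, abs(estimate - grad[i][j]))

# largest disagreement between backpropagation and finite differences; should be tiny
print(max_diff)
```

If the two gradients disagreed noticeably, that would point to a mistake in the error terms or in the indexing of the weight tables.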

To start with, we will test how well the neural network finds patterns in truth tables of logic operators such as AND, OR and XOR.

In [None]:
#this example shows how the new train method works on the XOR truth table
#x1 x2 y1
#0  0  0
#0  1  1
#1  0  1
#1  1  0
nn=NeuralNet(2,5,1,0.1)
x_data=[[0,0],[0,1],[1,0],[1,1]]
y_data=[[0],[1],[1],[0]]

In [None]:
#train neural net iteratively with learning rate eta=1
eta=1
for i in range(10001):
    #every 1000 iterations print the actual neural net output next to the desired y value
    if i%1000==0:
        print()
        for k in range(len(x_data)):
            print(nn.y(x_data[k]),y_data[k])          
    
    nn.train(x_data,y_data,eta)

It works! After several thousand iterations the neural net's outputs are very close to the desired ones!

## Homework

1. Implement a neural network **without** hidden layers from scratch as a class "NeuralNet0" and save it in 'neural_net_lib'
2. Train the neural network "NeuralNet" on the AND, OR and XOR logic gates. Try different numbers of hidden neurons. **Remember!** The sizes of the x-value and y-value lists must match the numbers of input and output neurons, respectively.
3. Repeat task 2 for "NeuralNet0"
4. Compare typical values of the loss function from tasks 2 and 3. Which neural network is better?