# Neural networks basics

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
from IPython.display import Markdown, display, HTML

# Work around the "dying kernel" problem (it only occurs on some installations; remove these lines if everything works without them)
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

**Task 1.** Write a method for performing feed forward for a network with 2 input neurons, 3 hidden neurons with ReLU as the activation function, and 1 output neuron with no activation function. The interface of the method should be as follows:

    feed_forward_1(x, w1, w2)
    
where x is a numpy array of size 1x2, w1 is a numpy array of size 3x2 of weights from the input layer to the hidden layer, w2 is a numpy array of size 1x3. The result should be a single number.

As a helper function you can write the relu function:

    relu(x)
   
which takes a numpy array as input and applies the ReLU function element-wise.

Test it on 2-3 examples. Also calculate the result with pen and paper (it's very important to do this at least once in your life!) and compare it with the function's output.

In [2]:
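def relu(x):
    # Element-wise ReLU: max(x, 0)
    return np.maximum(x, 0)

def feed_forward_1(x, w1, w2):
    # Hidden layer: ReLU(w1 @ x); output neuron: w2 @ hidden, no activation
    return np.matmul(w2, relu(np.matmul(w1, x)))

# Test

x = np.array([1, 2])
w1 = np.array([[1, 2],
               [3, 2],
               [2, 2]])
w2 = np.array([3, 2, 1])
print(feed_forward_1(x, w1, w2))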

35
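Working the test example above through by hand (same x, w1 and w2) shows where 35 comes from:

$$
w_1 x = \begin{pmatrix} 1 \cdot 1 + 2 \cdot 2 \\ 3 \cdot 1 + 2 \cdot 2 \\ 2 \cdot 1 + 2 \cdot 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 7 \\ 6 \end{pmatrix}, \qquad
h = \mathrm{ReLU}(w_1 x) = \begin{pmatrix} 5 \\ 7 \\ 6 \end{pmatrix}, \qquad
w_2 h = 3 \cdot 5 + 2 \cdot 7 + 1 \cdot 6 = 35.
$$

All hidden pre-activations are positive here, so ReLU leaves them unchanged; a second example with a w1 row that produces a negative pre-activation is worth trying, because it shows ReLU switching that hidden neuron off.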


**Task 2.** Write a method for performing feed forward for a network with 2 input neurons, 3 hidden neurons with sigmoid activation function, and 1 output neuron with sigmoid activation function. The interface of the method should be as follows:

    feed_forward_2(x, w1, w2)
    
where x is a numpy array of size 1x2, w1 is a numpy array of size 3x2 of weights from the input layer to the hidden layer, w2 is a numpy array of size 1x3. The result should be a single number.

As a helper function you can write the sigmoid function:

    sigmoid(x)
   
which takes a numpy array as input and applies the logistic function element-wise.

Test it on 2-3 examples. Try to estimate the result by hand and compare both results.

In [3]:
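def sigmoid(x):
    # Element-wise logistic function
    return 1 / (1 + np.exp(-x))

def feed_forward_2(x, w1, w2):
    # Sigmoid on both the hidden layer and the output neuron
    return sigmoid(np.matmul(w2, sigmoid(np.matmul(w1, x))))

# Test

x = np.array([1, 2])
w1 = np.array([[1, 2],
               [3, 2],
               [2, 2]])
w2 = np.array([3, 2, 1])
feed_forward_2(x, w1, w2)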

0.9974665250265122
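A quick hand estimate is possible here: the hidden pre-activations (5, 7, 6) are all large, so every hidden sigmoid is close to 1 and the output is roughly sigmoid(3 + 2 + 1) = sigmoid(6). A minimal sketch comparing that estimate with the exact forward pass, assuming the same test weights as above:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([1, 2])
w1 = np.array([[1, 2], [3, 2], [2, 2]])
w2 = np.array([3, 2, 1])

exact = sigmoid(w2 @ sigmoid(w1 @ x))   # full forward pass of network 2
estimate = sigmoid(np.sum(w2))          # pretend every hidden neuron outputs exactly 1
print(exact, estimate)                  # both are about 0.997
```

The estimate is slightly too high because each hidden output is a little below 1.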

**Task 3.** Write a method for performing feed forward for a network with 2 input neurons, 3 hidden neurons with ReLU activation, and 3 output neurons with no activation function but with softmax applied to them. The interface of the method should be as follows:

    feed_forward_3(x, w1, w2)
    
where x is a numpy array of size 1x2, w1 is a numpy array of size 3x2 of weights from the input layer to the hidden layer, w2 is a numpy array of size 3x3. The result should be a numpy array of size 1x3.

As a helper function you can write the softmax function:

    softmax(x)
   
which takes a numpy array as input, applies softmax to it and returns a numpy array of the same size.

Test it on 2-3 examples. Try to estimate the result by hand and compare both results. Verify that the sum of output neurons is equal to 1.

In [4]:
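def softmax(x):
    # Subtracting the maximum leaves the result unchanged but keeps np.exp from overflowing
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def feed_forward_3(x, w1, w2):
    # Hidden layer: ReLU; output layer: linear, followed by softmax
    return softmax(np.matmul(w2, relu(np.matmul(w1, x))))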

x = np.array([-1, 1])
w1 = np.array([[2, -1],
               [-5, -2],
               [1, 3]])
w2 = np.array([[-1, -2, 1],
               [1, 0, -1],
               [2, -3, 0.5]])
print(feed_forward_3(x, w1, w2))

[0.11894324 0.87887824 0.00217852]
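Task 3 also asks to verify that the output neurons sum to 1. A minimal sketch of that check on a few random networks, assuming the max-shifted softmax (subtracting max(x) before exponentiating, a standard trick that does not change the result). Without the shift, np.exp overflows to inf for inputs above roughly 709 and the division returns nan.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def softmax(x):
    # Shift by the maximum so the largest exponent is exp(0) = 1
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def feed_forward_3(x, w1, w2):
    return softmax(w2 @ relu(w1 @ x))

rng = np.random.default_rng(0)
for _ in range(3):
    x = rng.normal(size=2)
    w1 = rng.normal(size=(3, 2))
    w2 = rng.normal(size=(3, 3))
    p = feed_forward_3(x, w1, w2)
    print(p, p.sum())   # the sum is 1 up to floating-point rounding

# Large logits: the shifted version still returns a valid distribution
print(softmax(np.array([1000.0, 1001.0, 1002.0])))
```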


**Task 4.** Write a method for calculating the squared error:

    se(y_bar, y)
    
where y_bar is a numpy array of predicted results and y is a numpy array of real values to be predicted.

Test it on the 2-3 examples you created for feed_forward_1 and compute the SE against target values chosen by hand. Experiment with several values to get a feel for how SE works.

In [5]:
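def se(y_bar, y):
    # Squared error: sum of squared differences (just (y_bar - y)**2 for a single output)
    return np.sum((np.asarray(y_bar) - np.asarray(y))**2)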
# Test

########################
# Write your code here #
########################
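One possible test for the placeholder above, reusing the Task 1 example (whose output is 35) with a few hand-picked target values. The targets are arbitrary and only serve to show how SE behaves; se is written as a sum of squared differences.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def feed_forward_1(x, w1, w2):
    return w2 @ relu(w1 @ x)

def se(y_bar, y):
    return np.sum((np.asarray(y_bar) - np.asarray(y)) ** 2)

x = np.array([1, 2])
w1 = np.array([[1, 2], [3, 2], [2, 2]])
w2 = np.array([3, 2, 1])
y_bar = feed_forward_1(x, w1, w2)   # 35

for y in [35, 30, 40, 45]:
    print(y, se(y_bar, y))          # 0, 25, 25, 100
```

Overshooting and undershooting by the same amount cost the same, and doubling the distance quadruples the error.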

**Task 5.** Write a method for calculating the cross-entropy loss:

    crossentropy(y_bar, y)
    
where y_bar is a numpy array of predicted results and y is a numpy array of real values to be predicted (y must contain one value of 1 and 0 on all other positions - just like in one-hot encoding).

Test it on the 2-3 examples you created for feed_forward_3 and compute the cross-entropy against one-hot target vectors chosen by hand. Experiment with several target vectors to get a feel for how cross-entropy works.

In [6]:
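def crossentropy(y_bar, y):
    # Sums a binary cross-entropy over every output neuron; the categorical cross-entropy
    # usually paired with softmax keeps only the term -np.sum(y * np.log(y_bar))
    return -np.sum(y*np.log(y_bar) + (1-y)*np.log(1-y_bar))

# Test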


x = np.array([1, 2])
w1 = np.array([[-1, 1],
               [-0.5, 2.5],
               [0, 3]])
w2 = np.array([[-1, 2, 1],
               [1, 0, -1],
               [2, -3, 0.5]])
print(crossentropy(feed_forward_3(x, w1, w2), np.array([1.0, 0.0, 0.0])))

1.154397247699546e-08
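The crossentropy above sums a binary cross-entropy over every output neuron, so it also penalizes probability assigned to the wrong classes. With a softmax output and a one-hot target, the usual categorical cross-entropy keeps only the term of the true class, -Σ y·log(ŷ). A sketch comparing both on a hand-made prediction (categorical_ce and binary_ce_sum are hypothetical names used only here):

```python
import numpy as np

def categorical_ce(y_bar, y):
    # Only the probability assigned to the true class contributes
    return -np.sum(y * np.log(y_bar))

def binary_ce_sum(y_bar, y):
    # Same formula as crossentropy above: one binary cross-entropy per output neuron
    return -np.sum(y * np.log(y_bar) + (1 - y) * np.log(1 - y_bar))

y_bar = np.array([0.7, 0.2, 0.1])
for y in [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]:
    print(y, categorical_ce(y_bar, y), binary_ce_sum(y_bar, y))
```

With the first target the prediction is mostly right and the loss is small (-log 0.7 ≈ 0.36 for the categorical version); with the second the true class only gets 0.1 and the loss jumps to -log 0.1 ≈ 2.30.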


**Task 6.** Write a method which calculates the error term for network 2:

    error_term(x, w1, w2, y)
    
where x is a numpy array of size 1x2, w1 is a numpy array of size 3x2 of weights from the input layer to the hidden layer, w2 is a numpy array of size 1x3, y is a float.

The solution is extremely simple - it's just the difference between y and the result of feed_forward_2. This task is solely to bring this notion to your attention.

Test it on the 2-3 examples you created for feed_forward_2 and several values of y. Remember that y should lie in the interval [0, 1]: the sigmoid activation on the output squashes the network's result into this interval, so the output can be read as a probability.

In [7]:
def error_term(x, w1, w2, y):
    # Positive when the network's output is below the target, negative when above
    return y - feed_forward_2(x, w1, w2)

# Test

x = np.array([1, 2])
w1 = np.array([[-1, 1],
               [0.5, -2],
               [0, 3]])
w2 = np.array([-1, 2, 1])
print(error_term(x, w1, w2, 0.5))

-0.08056501459743926
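The error term matters because, up to a factor of -2, it is the derivative of the squared error with respect to the prediction: d/dŷ (y - ŷ)² = -2(y - ŷ). This factor of 2 is the one that reappears in the backpropagation updates of Task 8. A small finite-difference check of the identity, using the prediction from the test above (about 0.5806) and y = 0.5:

```python
def squared_error(y_hat, y):
    return (y - y_hat) ** 2

y, y_hat = 0.5, 0.5805650145974393
error = y - y_hat                       # the error term, about -0.0806
eps = 1e-6
# Central-difference approximation of d(squared_error)/d(y_hat)
numeric = (squared_error(y_hat + eps, y) - squared_error(y_hat - eps, y)) / (2 * eps)
print(numeric, -2 * error)              # both about 0.161
```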


**Task 7.** Write a method which calculates the derivative of the sigmoid function on every element of a numpy array:

    sigmoid_derivative(x)
    
where x is a numpy array. The result should have the same size as the input.

Use the formula:

<center>
$$
    \sigma'(x) = \sigma(x) (1 - \sigma(x))
$$
</center>

Verify this formula by differentiating the sigmoid function by hand.
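One way to do the differentiation by hand is to apply the chain rule to $\sigma(x) = (1 + e^{-x})^{-1}$:

$$
\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr),
$$

because $\frac{e^{-x}}{1 + e^{-x}} = 1 - \frac{1}{1 + e^{-x}} = 1 - \sigma(x)$. In particular $\sigma'(0) = \tfrac{1}{2} \cdot \tfrac{1}{2} = 0.25$, which is the largest value the derivative takes.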

Test the sigmoid_derivative method on a numpy array with several values (e.g. -3, -2, -1, 0, 1, 2, 3).

In [8]:
def sigmoid_derivative(x):
    s = sigmoid(x)  # evaluate the sigmoid once and reuse it
    return s * (1 - s)

# Test

print(sigmoid_derivative(np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])))

[0.04517666 0.10499359 0.19661193 0.25       0.19661193 0.10499359
 0.04517666]
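A finite-difference check confirms the implementation numerically as well. A minimal sketch, assuming a central difference with step 1e-5 (its truncation error shrinks with the square of the step):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
h = 1e-5
# Central difference: (f(x + h) - f(x - h)) / (2h) approximates f'(x)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.max(np.abs(numeric - sigmoid_derivative(x))))   # far below 1e-8
```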


**Task 8.** Write a method for performing the backpropagation step for network 2. The method should have the following interface:

    backpropagate(x, w1, w2, y, alpha)
    
where x is a numpy array of size 1x2, w1 is a numpy array of size 3x2 of weights from the input layer to the hidden layer, w2 is a numpy array of size 1x3, y is a float, alpha is the learning rate. The method should return a tuple with updated matrices w1 and w2.

Test it on 2-3 examples. Test several values of the learning rate alpha on the same input to see how it affects the update step.
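The update rule follows from the chain rule. Treating x as a column vector and writing $z_1 = w_1 x$, $h = \sigma(z_1)$, $z_2 = w_2 h$, $\hat{y} = \sigma(z_2)$ and, assuming the squared error of Task 4 as the loss, $E = (y - \hat{y})^2$:

$$
\frac{\partial E}{\partial w_2} = -2\,(y - \hat{y})\,\sigma'(z_2)\,h^\top, \qquad
\frac{\partial E}{\partial w_1} = -2\,(y - \hat{y})\,\sigma'(z_2)\,\bigl(w_2^\top \odot \sigma'(z_1)\bigr)\,x^\top,
$$

so a gradient descent step $w \leftarrow w - \alpha\,\partial E / \partial w$ becomes

$$
w_2 \leftarrow w_2 + 2\alpha\,(y - \hat{y})\,\sigma'(z_2)\,h^\top, \qquad
w_1 \leftarrow w_1 + 2\alpha\,(y - \hat{y})\,\sigma'(z_2)\,\bigl(w_2^\top \odot \sigma'(z_1)\bigr)\,x^\top,
$$

where $\odot$ is element-wise multiplication and $y - \hat{y}$ is the error term from Task 6. The shapes match: $w_2^\top \odot \sigma'(z_1)$ is 3x1 and $x^\top$ is 1x2, giving the 3x2 shape of $w_1$.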

In [13]:
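def backpropagate(x, w1, w2, y, alpha):
    h = sigmoid(np.matmul(w1, x))                            # hidden activations
    error = error_term(x, w1, w2, y)                         # y - y_hat
    curr_grad = sigmoid_derivative(np.matmul(w2, h))         # sigma'(z2) at the output
    prev_grad = w2 * sigmoid_derivative(np.matmul(w1, x))    # w2 * sigma'(z1) at the hidden layer
    prev_grad = np.outer(prev_grad, x)                       # 3x2, same shape as w1
    # Gradient descent on (y - y_hat)**2; build new arrays instead of modifying the caller's in place
    w1 = w1 + 2 * alpha * error * curr_grad * prev_grad
    w2 = w2 + 2 * alpha * error * curr_grad * h
    return w1, w2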

# Test

x = np.array([1.0, 2.0])
w1 = np.array([[-1.0, 1.0],
               [0.5, -2],
               [0.0, 3.0]])
w2 = np.array([-1.0, 2.0, 1.0])

print(feed_forward_2(x, w1, w2))
print()

y = -0.78
alpha = 0.1
for _ in range(10):
    w1, w2 = backpropagate(x, w1, w2, y, alpha)
    print(w1)
    print(w2)
    print(feed_forward_2(x, w1, w2))
    print()

0.5805650145974393

[[-9.86972092e-01  1.02605582e+00]
 [ 4.96229289e-01 -2.00754142e+00]
 [-1.63435940e-04  2.99967313e+00]]
[-1.04844143  1.99805771  0.9339018 ]
0.5521595151335615

[[-9.73804803e-01  1.05239039e+00]
 [ 4.92549709e-01 -2.01490058e+00]
 [-3.15319668e-04  2.99936936e+00]]
[-1.09743683  1.99616156  0.86818172]
0.5231787430565403

[[-9.60643225e-01  1.07871355e+00]
 [ 4.88984236e-01 -2.02203153e+00]
 [-4.54768343e-04  2.99909046e+00]]
[-1.14659236  1.99432343  0.80332383]
0.49405460583983957

[[-9.47629028e-01  1.10474194e+00]
 [ 4.85552880e-01 -2.02889424e+00]
 [-5.81257317e-04  2.99883749e+00]]
[-1.19550609  1.9925537   0.73978796]
0.465228374281893

[[-9.34890815e-01  1.13021837e+00]
 [ 4.82271500e-01 -2.03545700e+00]
 [-6.94643334e-04  2.99861071e+00]]
[-1.24379456  1.99086061  0.6779813 ]
0.43711299579491325

[[-9.22536617e-01  1.15492677e+00]
 [ 4.79151098e-01 -2.04169780e+00]
 [-7.95145920e-04  2.99840971e+00]]
[-1.29111617  1.98924993  0.61823694]
0.4100612926570
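Because the update rule is easy to get wrong, a numerical gradient check is worth doing once. The sketch below (loss, backprop_step and numerical_grad are hypothetical helper names) verifies that one backpropagation step equals -alpha times a finite-difference gradient of the squared error, then runs a single step for several learning rates from the same starting weights.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def feed_forward_2(x, w1, w2):
    return sigmoid(w2 @ sigmoid(w1 @ x))

def loss(x, w1, w2, y):
    # Squared error of network 2
    return (y - feed_forward_2(x, w1, w2)) ** 2

def backprop_step(x, w1, w2, y, alpha):
    # Same update as backpropagate above, without in-place modification
    h = sigmoid(w1 @ x)
    y_hat = sigmoid(w2 @ h)
    delta = 2 * (y - y_hat) * y_hat * (1 - y_hat)          # 2 (y - y_hat) sigma'(z2)
    new_w1 = w1 + alpha * delta * np.outer(w2 * h * (1 - h), x)
    new_w2 = w2 + alpha * delta * h
    return new_w1, new_w2

def numerical_grad(f, w, eps=1e-6):
    # Central-difference gradient of the scalar function f with respect to every entry of w
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        grad[idx] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

x = np.array([1.0, 2.0])
w1 = np.array([[-1.0, 1.0], [0.5, -2.0], [0.0, 3.0]])
w2 = np.array([-1.0, 2.0, 1.0])
y, alpha = -0.78, 0.1

new_w1, new_w2 = backprop_step(x, w1, w2, y, alpha)
g1 = numerical_grad(lambda w: loss(x, w, w2, y), w1)
g2 = numerical_grad(lambda w: loss(x, w1, w, y), w2)
# A gradient descent step changes the weights by -alpha * gradient, so both values should be near 0
print(np.max(np.abs((new_w1 - w1) + alpha * g1)), np.max(np.abs((new_w2 - w2) + alpha * g2)))

# Effect of the learning rate on a single step from the same starting point
for a in [0.01, 0.1, 1.0]:
    a_w1, a_w2 = backprop_step(x, w1, w2, y, a)
    print(a, feed_forward_2(x, a_w1, a_w2))
```

A larger alpha moves the output further towards the target in one step; for alpha = 0.1 the new output matches the first iteration printed above.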

**Task 9.** Write a method for performing the backpropagation step for network 2 but without the activation function on the output neuron.
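With $z_1 = w_1 x$, $h = \sigma(z_1)$ and a linear output $\hat{y} = w_2 h$, the derivative of the output with respect to $z_2 = w_2 h$ is 1, so the $\sigma'(z_2)$ factor drops out of the Task 8 updates:

$$
w_2 \leftarrow w_2 + 2\alpha\,(y - \hat{y})\,h^\top, \qquad
w_1 \leftarrow w_1 + 2\alpha\,(y - \hat{y})\,\bigl(w_2^\top \odot \sigma'(z_1)\bigr)\,x^\top.
$$

Here $\hat{y}$ must be computed without the output sigmoid, so the error term of Task 6 (which calls feed_forward_2) does not apply as-is.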

In [16]:
def backpropagate(x, w1, w2, y, alpha):
    h = sigmoid(np.matmul(w1, x))
    # No output activation: the prediction is w2 @ h, so feed_forward_2 (and error_term) cannot be used here
    error = y - np.matmul(w2, h)
    # The output derivative is 1, so the sigmoid_derivative factor of Task 8 disappears
    prev_grad = np.outer(w2 * sigmoid_derivative(np.matmul(w1, x)), x)   # 3x2, same shape as w1
    w1 = w1 + 2 * alpha * error * prev_grad
    w2 = w2 + 2 * alpha * error * h
    return w1, w2

# Test


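A possible test for the linear-output version, as a sketch with hypothetical helper names (forward_linear, backprop_linear, numerical_grad). It first checks one step against a finite-difference gradient, then repeats the step: since the output is no longer squashed into [0, 1], a negative target such as y = -0.78 can now actually be reached.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_linear(x, w1, w2):
    # Network 2 without the output sigmoid
    return w2 @ sigmoid(w1 @ x)

def backprop_linear(x, w1, w2, y, alpha):
    h = sigmoid(w1 @ x)
    error = y - w2 @ h
    new_w1 = w1 + 2 * alpha * error * np.outer(w2 * h * (1 - h), x)
    new_w2 = w2 + 2 * alpha * error * h
    return new_w1, new_w2

def numerical_grad(f, w, eps=1e-6):
    # Central-difference gradient of the scalar function f with respect to every entry of w
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        step = np.zeros_like(w)
        step[idx] = eps
        grad[idx] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

x = np.array([1.0, 2.0])
w1 = np.array([[-1.0, 1.0], [0.5, -2.0], [0.0, 3.0]])
w2 = np.array([-1.0, 2.0, 1.0])
y, alpha = -0.78, 0.1

# Gradient check on the starting weights: a descent step equals -alpha * gradient
new_w1, new_w2 = backprop_linear(x, w1, w2, y, alpha)
g1 = numerical_grad(lambda v: (y - forward_linear(x, v, w2)) ** 2, w1)
g2 = numerical_grad(lambda v: (y - forward_linear(x, w1, v)) ** 2, w2)
check = max(np.max(np.abs(new_w1 - w1 + alpha * g1)), np.max(np.abs(new_w2 - w2 + alpha * g2)))
print(check)   # close to 0

# Repeated steps drive the linear output towards the negative target
for _ in range(100):
    w1, w2 = backprop_linear(x, w1, w2, y, alpha)
print(forward_linear(x, w1, w2))   # close to -0.78
```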