##### y = Fnn(x) = F3(F2(F1(x)))
More generally, each layer l computes:
##### Fl(z) = Gl(Wl*z+Bl)
W -> matrix holding the weights for a layer. Each row of Wl holds the weights of one unit in layer l, so the number of columns must match the length of the vector z, and the number of rows equals the number of units in the layer. Multiplying Wl by z therefore yields one weighted sum per unit.

z -> input vector data

B -> bias vector for a particular layer

G -> activation function defined for a particular layer. It is applied elementwise to the vector Wl*z + Bl, mapping each unit's weighted sum to that unit's output, and is usually nonlinear.
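
To make the shapes in Fl(z) = Gl(Wl*z + Bl) concrete, here is a minimal NumPy sketch of a single layer. The helper name `layer_forward` and the sizes (3 inputs, 4 units, all-zero weights) are assumptions chosen for illustration and are not part of the network trained later in this notebook.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def layer_forward(W, z, b, activation=sigmoid):
    """Compute G(W @ z + b) for one layer (hypothetical helper)."""
    return activation(W @ z + b)

# Hypothetical sizes: 3 inputs feeding 4 units.
W = np.zeros((4, 3))            # one row of weights per unit
b = np.zeros(4)                 # one bias per unit
z = np.array([1.0, 2.0, 3.0])   # input vector, length matches W's columns

out = layer_forward(W, z, b)
print(out.shape)  # (4,): one output per unit
print(out)        # all entries are 0.5, because sigmoid(0) = 0.5
```

With zero weights and biases every weighted sum is 0, so the output depends only on the activation function, which makes the shape bookkeeping easy to check by hand.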


In [2]:
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Problems: vanishing and exploding gradients. With backpropagation, the chain rule is applied through the layers of the NN. This multiplies many matrices together, so small gradients shrink toward zero and large gradients can blow up.
# Exploding gradients are commonly mitigated with gradient clipping or L1/L2 regularization.
# Vanishing gradients were the bigger problem for some time. Examples of solutions: ReLU, LSTM (or other gated units), skip connections, and modifications to gradient descent.
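
A small numeric sketch may help here. It shows how quickly the best-case sigmoid gradient factor shrinks with depth, and one common form of gradient clipping (rescaling by L2 norm). The function names `sigmoid_grad` and `clip_by_norm` and the example numbers are assumptions for illustration; the depth calculation ignores the weight matrices, so it is only an upper bound on the activation's contribution.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_grad(t):
    """Derivative of the sigmoid with respect to its input t."""
    s = sigmoid(t)
    return s * (1.0 - s)

# The sigmoid derivative peaks at 0.25 (at t = 0), so a chain of n sigmoid
# layers scales the gradient by at most 0.25 ** n, ignoring the weights.
max_grad = sigmoid_grad(0.0)
for depth in (1, 5, 10, 20):
    print(depth, max_grad ** depth)

def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

# A gradient of norm 50 is scaled down to norm 5; its direction is unchanged.
clipped = clip_by_norm(np.array([30.0, 40.0]), max_norm=5.0)
print(clipped)
```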

# Activation functions:
# Purpose: introduce nonlinearity and map the output of a unit to a range we care about (e.g., (0, 1) or (-1, 1))
# Sigmoid: A differentiable function with range (0, 1), good for predicting probabilities. The function is monotonic, but its derivative is not. Good for classification.
# Tanh: A sigmoidal (and differentiable) function with range (-1, 1). The function is monotonic, but its derivative is not. Good for classification.
# ReLU (Rectified Linear Unit): Very common (widely used in convolutional NNs). 0 when x < 0, x otherwise. Range [0, inf). Both the function and its derivative are monotonic.
# Some more information with derivatives of functions: https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
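
The three activation functions above and their derivatives can be sketched in NumPy as follows. This is an illustrative sketch: the sample inputs are arbitrary, and the ReLU derivative at exactly 0 is set to 0 by convention (an assumption, since it is undefined there).

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def relu(t):
    return np.maximum(0.0, t)

t = np.array([-2.0, 0.0, 2.0])

s = sigmoid(t)   # values in (0, 1), equal to 0.5 at t = 0
h = np.tanh(t)   # values in (-1, 1), equal to 0 at t = 0
r = relu(t)      # 0 for negative inputs, identity otherwise

# Derivatives with respect to the input t
d_sigmoid = s * (1.0 - s)        # peaks at 0.25 when t = 0
d_tanh = 1.0 - h ** 2            # peaks at 1 when t = 0
d_relu = (t > 0).astype(float)   # 0 for t <= 0 (convention at 0), 1 for t > 0

for name, values in [("sigmoid", s), ("tanh", h), ("relu", r),
                     ("d_sigmoid", d_sigmoid), ("d_tanh", d_tanh),
                     ("d_relu", d_relu)]:
    print(name, values)
```

The sigmoid and tanh derivatives rise and then fall around t = 0, which is why they are not monotonic even though the functions themselves are.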

In [54]:
# Input data with targets
X=np.array(([0,0,1],[0,1,1],[1,0,1],[1,1,1]), dtype=float)
y=np.array(([0, 1],[1, 0],[1, 0],[0, 1]), dtype=float)  

# Sigmoid activation function
def sigmoid(t):
    return 1/(1+np.exp(-t))

# Derivative of the sigmoid, required for backpropagation.
# Note: p is the sigmoid output (already activated), not the pre-activation input.
def sigmoid_derivative(p):
    return p * (1 - p)

class NeuralNetwork:
    def __init__(self, x,y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1], 4) # 3x4: maps the 3 input features to 4 hidden units
        self.weights2 = np.random.rand(4, 2) # 4x2: maps the 4 hidden units to 2 output units, one per class ([0, 1] or [1, 0]). Use 4x1 (with y of shape 4x1) for a single output.
        self.y = y
        self.output = np.zeros(y.shape)
        
    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1)) # Matrix multiplication (np.dot acts as matrix multiplication on 2-D arrays): 4x3 * 3x4 -> 4x4
        self.layer2 = sigmoid(np.dot(self.layer1, self.weights2)) # 4x4 * 4x2 -> 4x2
        return self.layer2
        
    def backprop(self):
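        # Chain rule applied layer by layer for a sum-of-squares loss with sigmoid activations.
        # Using (y - output) yields the negative gradient, so the weights are updated with += (implicit learning rate of 1).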
        d_weights2 = np.dot(self.layer1.T, 2*(self.y -self.output)*sigmoid_derivative(self.output))
        d_weights1 = np.dot(self.input.T, np.dot(2*(self.y -self.output)*sigmoid_derivative(self.output), self.weights2.T)*sigmoid_derivative(self.layer1))
    
        self.weights1 += d_weights1
        self.weights2 += d_weights2

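    def train(self, X, y):
        # One full-batch training step on (X, y): forward pass, then weight update
        self.input, self.y = X, y
        self.output = self.feedforward()
        self.backprop()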
        

# print(X.shape)
# print(np.random.rand(X.shape[1],4))
NN = NeuralNetwork(X,y)

print(NN.weights1)

layer1 = sigmoid(np.dot(X, NN.weights1))
# print(sigmoid(np.dot(layer1, NN.weights2)))

for i in range(1001): # 1001 full-batch iterations; progress is printed every 250
    if i % 250 == 0: 
        print ("Iteration " + str(i) + "\n")
        print ("Predicted Output: \n" + str(NN.feedforward()) + "\nActual output: \n" + str(y))
        print ("Loss: \n" + str(np.mean(np.square(y - NN.feedforward())))) # mean squared error
        print ("\n")
  
    NN.train(X, y)

[[0.61134166 0.37097371 0.77215203 0.17046251]
 [0.38451581 0.34695895 0.4318071  0.78177983]
 [0.48417303 0.79401445 0.5764889  0.34206943]]
Iteration 0

Predicted Output: 
[[0.76883729 0.68987526]
 [0.80550308 0.71962289]
 [0.8006851  0.70888651]
 [0.8285938  0.73426905]]
Actual output: 
[[0. 1.]
 [1. 0.]
 [1. 0.]
 [0. 1.]]
Loss: 
0.31780017901073604


Iteration 250

Predicted Output: 
[[0.03849973 0.96132707]
 [0.89207693 0.10690209]
 [0.89202988 0.10691238]
 [0.13569405 0.86573032]]
Actual output: 
[[0. 1.]
 [1. 0.]
 [1. 0.]
 [0. 1.]]
Loss: 
0.010697787438138196


Iteration 500

Predicted Output: 
[[0.01600774 0.9837091 ]
 [0.94502741 0.05471682]
 [0.94502708 0.05471754]
 [0.06876965 0.93160418]]
Actual output: 
[[0. 1.]
 [1. 0.]
 [1. 0.]
 [0. 1.]]
Loss: 
0.0027451053784770165


Iteration 750

Predicted Output: 
[[0.01109431 0.98867059]
 [0.95888612 0.04097901]
 [0.95888687 0.04097854]
 [0.05138734 0.94881835]]
Actual output: 
[[0. 1.]
 [1. 0.]
 [1. 0.]
 [0. 1.]]
Loss: 
0.001531352