## Introduction
Inspired by [Machine Learning from Andrew Ng](https://www.coursera.org/learn/machine-learning) on Coursera and the wonderful [blogs](http://iamtrask.github.io/2015/07/27/python-network-part2/) from Trask, I am attempting to implement simple neural networks from my own understanding and memory. I find this the best way to understand the architecture of the networks and the mechanism behind the back propagation algorithm.

In [1]:
import numpy as np

### Sigmoid function and its derivative

In [2]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
    
def sigmoid_out_derivative(out):
    # derivative of the sigmoid written in terms of its output, out = sigmoid(x)
    return out * (1 - out)
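
As a quick sanity check, the derivative helper can be compared against a finite-difference approximation. This is only a sketch: the test points in `x` and the step size `eps` are arbitrary choices made for illustration, not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_out_derivative(out):
    # expects the sigmoid *output*, not the pre-activation input
    return out * (1 - out)

x = np.linspace(-5, 5, 11)  # arbitrary test points
eps = 1e-6                  # finite-difference step (an assumption)

# central finite-difference approximation of d(sigmoid)/dx
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
# analytic derivative, computed from the sigmoid output
analytic = sigmoid_out_derivative(sigmoid(x))

max_abs_diff = np.max(np.abs(numeric - analytic))
print(max_abs_diff)
```

The two should agree to many decimal places. Note that the helper must be given `sigmoid(x)`, not `x` itself, which is how the training loops call it.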

### Toy data example

In [3]:
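# each row is one sample; the third column is constant 1 and acts as a bias input
X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
# target: XOR of the first two columns
y = np.array([[0,1,1,0]]).T
# alternative target that is linearly separable (it equals the first column):
# y = np.array([[0,0,1,1]]).T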

In [4]:
X.shape, y.shape

((4, 3), (4, 1))
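
The target chosen here is the XOR of the first two input columns, which is what makes this toy problem interesting. A minimal check, where `xor_of_inputs` is a name introduced only for this illustration:

```python
import numpy as np

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T

# XOR of the first two input columns, reshaped to a column vector like y
xor_of_inputs = np.logical_xor(X[:, 0], X[:, 1]).astype(int).reshape(-1, 1)

print(np.array_equal(xor_of_inputs, y))  # prints True
```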

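### 2 Layer Neural Network
This is almost the same as logistic regression, and the training result is not promising: the target is the XOR of the first two inputs, which is not linearly separable, so every prediction ends up at 0.5.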

In [5]:
# 2 Layer Neural Network

# initialize the weights with mean 0
Weights0 = np.random.randn(X.shape[1], 1) / np.sqrt(X.shape[1])
# learning rate for gradient descent
alpha = 0.1

for j in range(60000):
    # forward propagation to compute the prediction
    l0 = X
    l1 = sigmoid(l0.dot(Weights0))
    
    # error between the target and the prediction
    l1_error = l1 - y
    
    ## back propagation
    
    # multiply the error by the slope of the sigmoid at the values of l1
    l1_delta = l1_error * sigmoid_out_derivative(l1)
    # l1_delta = l1_error  # alternative: the delta that a cross-entropy loss would give
    
    # update the weights
    w0_derivative = l0.T.dot(l1_delta)
    Weights0 -= alpha * w0_derivative   
    
print("Output after training:")
print(l1)
print("Weights for the neural network")
print(Weights0)

Output after training:
[[ 0.5]
 [ 0.5]
 [ 0.5]
 [ 0.5]]
Weights for the neural network
[[  2.44488557e-16]
 [  2.47109102e-16]
 [ -3.30474445e-16]]
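
The weights above collapse to (numerically) zero. One way to see why training gets stuck there is to evaluate the same gradient expression at all-zero weights. This is a sketch using the data and update rule from the cell above; `W` is a stand-in name for the weights, introduced here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T

# all-zero weights: every prediction is sigmoid(0) = 0.5
W = np.zeros((3, 1))
l1 = sigmoid(X.dot(W))

# same gradient expression as in the training loop
l1_delta = (l1 - y) * l1 * (1 - l1)
w0_derivative = X.T.dot(l1_delta)

print(l1.ravel())
print(w0_derivative.ravel())
```

Every component of the gradient cancels out, so once the weights reach zero the update `Weights0 -= alpha * w0_derivative` no longer moves them, and the predictions stay at 0.5.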


### 3 Layer Neural Network
With just one hidden layer added, we can see the power of a neural network, especially for data with a non-linear decision boundary.

In [38]:
# 3 Layer Neural Network

# initialize the weights with mean 0
Weights0 = np.random.randn(X.shape[1], 10) / np.sqrt(X.shape[1])
Weights1 = np.random.randn(10, 1) / np.sqrt(4)
# learning rate for gradient descent
alpha = 0.25

for j in range(60000):
    # forward propagation to compute the prediction
    l0 = X
    l1 = sigmoid(l0.dot(Weights0))
    l2 = sigmoid(l1.dot(Weights1))
    
    # error between the target and the prediction
    l2_error = l2 - y
    
    ## back propagation
    
    # multiply the error by the slope of the sigmoid at the values of l2
    l2_delta = l2_error * sigmoid_out_derivative(l2)
    
    # back propagate the error to layer 1
    l1_error = l2_delta.dot(Weights1.T)
    
    # multiply the error by the slope of the sigmoid at the values of l1
    l1_delta = l1_error * sigmoid_out_derivative(l1)
    
    # update the weights for the layers
    w0_derivative = l0.T.dot(l1_delta)
    Weights0 -= alpha * w0_derivative
    
    w1_derivative = l1.T.dot(l2_delta)
    Weights1 -= alpha * w1_derivative
    
print("Output after training:")
print(l2)
# print("Weights for the neural network")
# print(Weights0)
# print(Weights1)

Output after training:
[[ 0.00637546]
 [ 0.99274119]
 [ 0.99250258]
 [ 0.00817472]]
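
A common way to gain confidence in a hand-written back propagation step is a numerical gradient check. The sketch below assumes the loss being minimized is half the sum of squared errors, whose gradient matches the update rule used above. The helpers `loss`, `backprop_gradients` and `numeric_grad`, the fixed seed, and the step size `eps` are all introduced here for illustration and are not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(W0, W1, X, y):
    # assumed loss: half the sum of squared errors
    l2 = sigmoid(sigmoid(X.dot(W0)).dot(W1))
    return 0.5 * np.sum((l2 - y) ** 2)

def backprop_gradients(W0, W1, X, y):
    # same forward and backward pass as the training loop above
    l1 = sigmoid(X.dot(W0))
    l2 = sigmoid(l1.dot(W1))
    l2_delta = (l2 - y) * l2 * (1 - l2)
    l1_delta = l2_delta.dot(W1.T) * l1 * (1 - l1)
    return X.T.dot(l1_delta), l1.T.dot(l2_delta)

def numeric_grad(f, W, eps=1e-6):
    # central finite difference, perturbing one weight at a time in place
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        f_plus = f()
        W[idx] = old - eps
        f_minus = f()
        W[idx] = old  # restore the original weight
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0, 1, 1, 0]], dtype=float).T

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
W0 = rng.standard_normal((3, 10)) / np.sqrt(3)
W1 = rng.standard_normal((10, 1)) / np.sqrt(4)

g0, g1 = backprop_gradients(W0, W1, X, y)
num_g0 = numeric_grad(lambda: loss(W0, W1, X, y), W0)
num_g1 = numeric_grad(lambda: loss(W0, W1, X, y), W1)

diff_w0 = np.max(np.abs(num_g0 - g0))
diff_w1 = np.max(np.abs(num_g1 - g1))
print(diff_w0, diff_w1)
```

If both differences are tiny, the analytic gradients `w0_derivative` and `w1_derivative` computed in the loop are consistent with the assumed loss.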
