# Neural Network Architecture #

* A basic neural network (NN) consists of three types of layers: an input layer, one or more hidden layers, and an output layer.
* A network in which information flows forward from the inputs all the way to the outputs in a single pass, with no loops, is called a feed-forward NN.
* The strength of each connection is expressed as a weight. Each unit computes a weighted sum of its inputs, and this sum is passed into the activation function, which converts it into the unit's output.
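
The weighted-sum-then-activation step for a single unit can be sketched as follows. This is a minimal illustration, not part of the original notebook: the input values, weights, and bias are hypothetical numbers chosen only to make the arithmetic easy to follow, and sigmoid is assumed as the activation function.

```python
import numpy as np

def sigmoid(z):
    """Squash a real number (or array) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only for illustration
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, -0.2])
bias = 0.1

# Weighted sum of the inputs: 0.2 - 0.3 - 0.4 + 0.1 = -0.4
z = np.dot(inputs, weights) + bias

# The activation function converts the weighted sum into the unit's output
activation = sigmoid(z)
print(z, activation)
```

A full layer repeats this computation for every unit, which is why the forward pass is usually written as a matrix product.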

**Output range**

This is the range of values that the activation function can output.

**Active range**

This is the range of inputs over which the gradient of the activation function is large enough to contribute meaningfully to the weight updates. Outside this range, the gradient is near zero and adds little to the parameter updates during learning. This problem of a close-to-zero gradient is called the **vanishing gradient problem**, and it is mitigated by the **ReLU (Rectified Linear Unit)** activation function, whose gradient does not shrink for positive inputs.
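
To make the near-zero gradient concrete, the sketch below compares the derivative of sigmoid with the derivative of ReLU at a few sample points. The sample points are arbitrary, and the ReLU derivative at exactly 0 is set to 0 here by convention; both are assumptions of this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of sigmoid: s(z) * (1 - s(z)), which peaks at 0.25 when z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of max(0, z): 1 for positive inputs, 0 otherwise."""
    return (z > 0).astype(float)

# Arbitrary sample points, from far below zero to far above it
z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

sig_g = sigmoid_grad(z)
relu_g = relu_grad(z)

for zi, sg, rg in zip(z, sig_g, relu_g):
    print(f"z={zi:6.1f}  sigmoid'={sg:.6f}  relu'={rg:.1f}")
```

The sigmoid gradient is largest at 0 and collapses toward zero for large positive or negative inputs, while the ReLU gradient stays at 1 for every positive input.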

Activation function options:

* sigmoid
    - Active range: [-sqrt(3), sqrt(3)]
    - Output range: (0,1)
* tanh (a rescaled version of the sigmoid function)
    - Active range: [-2,2]
    - Output range: (-1,1)
* ReLU (Rectified Linear Unit) - well suited to large NNs and helps mitigate the vanishing gradient problem.
    - Active range: [0, infinity)
    - Output range: [0, infinity)
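
The output ranges listed above can be checked numerically. The following sketch evaluates each function on an arbitrary grid of inputs and records the smallest and largest outputs. The grid and the `ranges` dictionary are illustrative choices, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

# Arbitrary grid of inputs from -6 to 6 in steps of 1
z = np.linspace(-6.0, 6.0, 13)

# Smallest and largest output of each activation function on the grid
ranges = {}
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    out = fn(z)
    ranges[name] = (out.min(), out.max())
    print(f"{name:8s} min={out.min():.4f}  max={out.max():.4f}")

# tanh is a rescaled sigmoid: tanh(z) = 2 * sigmoid(2z) - 1
rescaled = 2.0 * sigmoid(2.0 * z) - 1.0
print(np.allclose(tanh(z), rescaled))
```

On this grid, sigmoid stays strictly between 0 and 1, tanh stays strictly between -1 and 1, and ReLU never goes below 0 but grows without bound as the input grows.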

## Softmax for classification ##

Softmax regression (also known as multinomial logistic regression, maximum entropy classifier, or multi-class logistic regression) is a generalization of logistic regression that we can use for **multi-class classification**, under the assumption that the classes are mutually exclusive. In contrast, standard logistic regression is used for binary classification tasks.

The raw outputs of the final layer need to be converted to probabilities so that a final class prediction can be made. Softmax exponentiates each output and divides it by the sum of all the exponentials, so the results are positive and sum to 1. If an output is significantly lower than the maximum of all output values, softmax maps it to a value near 0.
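
The effect on a score far below the maximum can be seen in a small sketch. The three scores are hypothetical. Subtracting the maximum before exponentiating is a common numerical-stability step that is assumed here; it does not change the result.

```python
import numpy as np

def softmax(scores):
    """Convert a 1-D vector of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # stability shift; result is unchanged
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Hypothetical raw outputs for three mutually exclusive classes
scores = np.array([8.0, 7.5, 1.0])
probs = softmax(scores)

print(probs)             # the third probability is close to 0
print(probs.sum())       # probabilities sum to 1
print(np.argmax(probs))  # index of the predicted class
```

The two scores near the maximum share almost all of the probability mass, while the score of 1.0 receives a probability below 0.001.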

## Forward Propagation ##

Forward propagation proceeds as follows:

* Take the dot product of the inputs with the weights between the first (input) and second (hidden) layer, then transform the result with the activation function.
* Take the dot product of the outputs of the hidden layer with the weights between the second (hidden) and third (output) layer.
* Apply the output activation function (softmax for classification) to the resulting vector.

In [1]:
'''
Implementation of Forward propagation
'''
import numpy as np

b1 = 0  # bias for the hidden layer
b2 = 0  # bias for the output layer

# Activation function - sigmoid function (np.exp works element-wise on arrays)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

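# Softmax function applied at the output layer for classification.
# Normalizes across the last axis (the classes), so that each sample's
# probabilities sum to 1.
def softmax(x):
    # Subtract the per-sample maximum for numerical stability; the result is unchanged
    l_exp = np.exp(x - np.max(x, axis=-1, keepdims=True))
    sm = l_exp / np.sum(l_exp, axis=-1, keepdims=True)
    return sm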
    
# Test Input dataset with 3 features
X = np.array([  [.35,.21,.33],
                [.2,.4,.3],
                [.4,.34,.5],
                [.18,.21,.16] ])

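# Number of samples in the dataset
len_X = len(X)

# Input layer dimensionality
input_dim = 3

# Output layer dimensionality (number of classes; softmax needs at least 2,
# so 2 is used here as an illustrative choice)
output_dim = 2

# Number of units in the hidden layer
hidden_units = 4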
  
np.random.seed(22)

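# Random weight matrix between input and hidden layer,
# shape (input_dim, hidden_units), values drawn uniformly from [0, 2)
theta0 = 2 * np.random.random((input_dim, hidden_units))

# Random weight matrix between hidden and output layer,
# shape (hidden_units, output_dim), values drawn uniformly from [0, 2)
theta1 = 2 * np.random.random((hidden_units, output_dim))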


'''
Forward propagation:

- Dot product of the inputs with the weights between the first (input) and
  second (hidden) layer, plus the bias, transformed with the activation function.

- Dot product of the hidden layer outputs with the weights between the
  second (hidden) and third (output) layer, plus the bias.

- Apply softmax to the output of the final layer.
'''
d1 = X.dot(theta0) + b1
l1 = sigmoid(d1)
l2 = l1.dot(theta1) + b2

# Apply softmax to the output of the final layer
output = softmax(l2)

## Backpropagation ##

