## Artificial Neural Networks Classification Model
- Intuition of a perceptron 
    - Linear function that computes hyperplanes
    - You have the following inputs and weights
        - X, inputs
        - W, weights
        - Activation = sum(X*W)
            - θ is the threshold
            - If Activation >= θ
                - Yes, y=1
            - If Activation < θ
                - No, y=0

_I have a more [technical guide](http://www.ritchieng.com/neural-networks-learning/) by Andrew Ng that will give you clarity on the exact mathematics behind a neural network._

**Rules**
1. Perceptron rule (threshold)
2. Gradient descent or delta rule (unthresholded)

**Perceptron rule**
- Converges in finite time when the data is linearly separable
- ![](nn1.png)

- We include θ as a weight to make the math easier, such that
    - y_hat = sum(X*W) >= 0
    - X includes a constant bias input of 1, and W includes its weight, -θ
- The algorithm (3 lines) is repeated over the training data until the weights stop changing
- This only works on data that is linearly separable
- Intuition of algorithm
    - When y = y_hat = 0 or y = y_hat = 1
        - This means we've **predicted correctly**
        - So there's no need for learning, Δwi = 0
    - When y = 0, y_hat = 1 or y = 1, y_hat = 0
        - This means we've **predicted wrongly**
        - There is a need for learning, Δwi would be non-zero
        - We want to make sure we don't make huge changes, so that's where the learning rate comes in
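
The update described above can be sketched in a few lines. This is a minimal illustration rather than the exact algorithm from the figure: the function name `train_perceptron`, the OR data set, the learning rate and the epoch cap are all assumptions made for the example. The threshold is folded in as the weight on a constant input of 1.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, max_epochs=100):
    """Perceptron rule with the threshold folded in as a bias weight."""
    # Append a constant input of 1 so that its weight plays the role of -theta
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(Xb, y):
            y_hat = int(np.dot(x_i, w) >= 0)
            # (y - y_hat) is 0 when correct and +1 or -1 when wrong
            w = w + eta * (y_i - y_hat) * x_i
            errors += int(y_i != y_hat)
        if errors == 0:  # a full pass without mistakes: stop
            break
    return w

# Hypothetical, linearly separable data: logical OR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
w = train_perceptron(X, y)
predictions = (np.hstack([X, np.ones((4, 1))]) @ w >= 0).astype(int)
print(predictions)
```

Because OR is linearly separable, the loop stops after a handful of passes and the predictions match `y`.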
        

**Gradient Descent**
- More robust when the data is not linearly separable
- ![](nn2.png)
    - Here, we do not have a threshold
    - We simply minimize the error directly
    - We scale the negative derivative of the error by a learning rate, which gives Δwi
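
As a rough sketch of this idea, the code below runs batch gradient descent on an unthresholded unit. It assumes the usual squared error E(w) = 1/2 * sum((y - a)^2) with a = sum(X*W), which may differ in detail from the figure; the function name `delta_rule`, the data and the hyperparameters are made up for illustration.

```python
import numpy as np

def delta_rule(X, y, eta=0.05, epochs=500):
    """Batch gradient descent on E(w) = 1/2 * sum((y - a)^2), with a = X @ w."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        a = X @ w              # unthresholded activation
        grad = -(y - a) @ X    # dE/dw
        w = w - eta * grad     # equivalent to w + eta * (y - a) @ X
    return w

# Hypothetical data generated from known weights [2, -1],
# so the minimum of the error is known in advance
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -1.0])
w = delta_rule(X, y)
print(np.round(w, 3))
```

The weights approach [2, -1] gradually instead of landing on them after a finite number of updates.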

**Comparison of learning rules**
- ![](nn3.png)
    - The perceptron rule is guaranteed to converge in finite time, provided the data is linearly separable
    - Gradient descent uses calculus and is more robust, but it converges only in the limit
        - We can't take the derivative of y_hat because y_hat holds binary values, 0 or 1, so it is not differentiable
        - However, this can be solved with a sigmoid function, which is explained below

**Building a Network**
- We can build a network for any boolean function
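
A small sketch of what this means in practice: single threshold units can compute AND, OR and NOT, and a network of them can compute XOR. The helper `unit` and the particular weights and thresholds are assumptions chosen for the example; many other choices work.

```python
import numpy as np

def unit(weights, theta):
    """Return a threshold unit that fires (1) when sum(x * w) >= theta."""
    w = np.asarray(weights)
    return lambda *x: int(np.dot(x, w) >= theta)

AND = unit([1, 1], theta=2)
OR = unit([1, 1], theta=1)
NOT = unit([-1], theta=0)

def XOR(a, b):
    # XOR is not linearly separable, so it needs more than one unit:
    # (a OR b) AND NOT (a AND b)
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b), XOR(a, b))
```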

**Sigmoid: Differentiable Threshold**
- ![](nn4.png)
- The sigmoid, σ(a) = 1 / (1 + e^(-a)), is a smooth version of the threshold, and its derivative has a simple closed form: σ'(a) = σ(a)(1 - σ(a))
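
The derivative can be checked numerically. The sketch below assumes the standard logistic sigmoid and compares the closed form σ(a)(1 - σ(a)) against a central finite difference; the function names and the sample points are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_prime(a):
    # Closed form of the derivative, written in terms of the sigmoid itself
    s = sigmoid(a)
    return s * (1.0 - s)

a = np.linspace(-4, 4, 9)
h = 1e-6
numeric = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)
print(np.allclose(numeric, sigmoid_prime(a), atol=1e-8))
```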


**Neural Network**
- Many local optima
- Neural network with 2 hidden layers
    - ![](https://upload.wikimedia.org/wikipedia/commons/7/7f/Two_layer_ann.svg)
    - Each circle is a sigmoid function
- It uses backpropagation
    - Computationally beneficial organization of the chain rule
    - Calculates the error between the actual and predicted values and sends it backwards through the network
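
To make the chain-rule bookkeeping concrete, here is a minimal sketch of backpropagation for a tiny network of sigmoid units (3 inputs, 2 hidden units, 1 output). The layer sizes, the squared-error loss, the random weights and the function names are all assumptions for illustration. The last lines compare one backpropagated gradient with a finite-difference estimate.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, W2):
    h = sigmoid(W1 @ x)        # hidden layer activations
    y_hat = sigmoid(W2 @ h)    # output activation
    return h, y_hat

def backprop(x, y, W1, W2):
    """Gradients of E = 1/2 * (y - y_hat)^2 with respect to W1 and W2."""
    h, y_hat = forward(x, W1, W2)
    # Error signal at the output unit
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)
    # Send it backwards through W2 to get the hidden-layer error signal
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    return np.outer(delta_hid, x), np.outer(delta_out, h)

rng = np.random.default_rng(0)
x, y = np.array([0.5, -1.0, 2.0]), np.array([1.0])
W1, W2 = rng.normal(size=(2, 3)), rng.normal(size=(1, 2))
gW1, gW2 = backprop(x, y, W1, W2)

# Finite-difference check of a single entry of W1
eps = 1e-6
W1p, W1m = W1.copy(), W1.copy()
W1p[0, 1] += eps
W1m[0, 1] -= eps
loss = lambda W: 0.5 * np.sum((y - forward(x, W, W2)[1]) ** 2)
numeric = (loss(W1p) - loss(W1m)) / (2 * eps)
print(np.isclose(numeric, gW1[0, 1], atol=1e-8))
```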

**Optimizing (learning) weights**
1. Gradient Descent
2. Advanced methods
    - Momentum
    - Higher order derivatives
    - Randomized optimization
    - Penalty for "complexity"
        - For neural networks, complexity grows with more nodes and layers
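
As one illustration of the advanced methods, the sketch below adds a momentum term to plain gradient descent. The error surface, the step size `eta` and the momentum coefficient `beta` are hypothetical values picked for the example, and momentum is not guaranteed to help on every problem.

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, beta=0.0, steps=50):
    """Gradient descent; beta > 0 adds a momentum term (beta = 0 is plain descent)."""
    w = np.array(w0, dtype=float)
    velocity = np.zeros_like(w)
    for _ in range(steps):
        # Keep a decaying running sum of past gradient steps
        velocity = beta * velocity - eta * grad(w)
        w = w + velocity
    return w

# Hypothetical error surface: E(w) = 1/2 * (w0^2 + 10 * w1^2), minimum at the origin
grad = lambda w: np.array([1.0, 10.0]) * w

plain = gradient_descent(grad, [5.0, 5.0], eta=0.02)
with_momentum = gradient_descent(grad, [5.0, 5.0], eta=0.02, beta=0.8)
print(np.linalg.norm(plain), np.linalg.norm(with_momentum))
```

On this surface, the run with momentum ends closer to the minimum after the same number of steps.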

**Restriction Bias**
- Representational power
    - What you are able to represent
- Set of hypotheses that we will consider
- Perceptron: 
    - Linear
    - Half spaces
- Sigmoids: 
    - Much more complex
    - Not much restriction
- What kind of functions can we represent?
    - Boolean: 
        - Network of threshold-like units
    - Continuous function:
        - As input changes, output changes 
        - Connected, no jumps     

**Preference Bias**
- Algorithm's selection of one representation over another
- What algorithm?
    - We have to initialize the weights with some values
    - We can start with small random values
- Occam's razor
    - Entities should not be multiplied unnecessarily
        - Added complexity is only necessary when it fits the data better
        - Otherwise, choose the hypothesis that is less complex
        - We'll get better generalization with simpler hypotheses


**Manually implementing the perceptron algorithm**
- activate() function
    - y_hat = 1 if sum(X*W) > θ, else 0
        - sum(X*W) > θ
            - 1
        - sum(X*W) <= θ
            - 0
- update() function
    - w_i = w_i + Δw_i
    - Δw_i = (η)(y - y_hat)(x_i)

```python
# Assume this file's name is p1_activation.py
import numpy as np

class Perceptron:
    """
    This class models an artificial neuron with step activation function.
    """

    # Constructor: a special method that is called automatically when an
    # object of the class is created, e.g. p1 = Perceptron() (an instantiation).
    # The default weights and threshold are used if no arguments are passed.
    def __init__(self, weights=np.array([1]), threshold=0):
        """
        Initialize weights and threshold based on input arguments. Note that no
        type-checking is being performed here for simplicity.
        """
        self.weights = weights
        self.threshold = threshold

    # A method: after instantiation, call it with the inputs as the argument,
    # e.g. p1.activate(values)
    def activate(self, values):
        """
        Takes in @param values, a list of numbers equal to length of weights.
        @return the output of a threshold perceptron with given inputs based on
        perceptron weights and threshold.
        """

        # First calculate the strength with which the perceptron fires
        strength = np.dot(values, self.weights)

        # Then return 0 or 1 depending on strength compared to threshold
        return int(strength > self.threshold)

    def update(self, values, train, eta=.1):
        """
        Takes in a 2D array @param values consisting of a LIST of inputs and a
        1D array @param train, consisting of a corresponding list of expected
        outputs. Updates internal weights according to the perceptron training
        rule using these values and an optional learning rate, @param eta.
        """
        # For each data point:
        #   1. Obtain the neuron's prediction for that point, y_hat
        #   2. Update self.weights based on prediction accuracy, learning rate and input value
        #        Δw_i = (η)(y - y_hat)(x_i)
        #        w_i = w_i + Δw_i

        # enumerate yields (index, row) pairs; see the enumerate example below
        for (index, row) in enumerate(values):
            self.weights = self.weights + eta * (train[index] - self.activate(row)) * row

def test():
    """
    A few tests to make sure that the perceptron class performs as expected.
    Nothing should show up in the output if all the assertions pass.
    """

    # assert raises an AssertionError if the statement is false

    def sum_almost_equal(array1, array2, tol=1e-6):
        return sum(abs(array1 - array2)) < tol

    # Instantiation of class Perceptron
    p1 = Perceptron(np.array([1, 1, 1]), 0)
    p1.update(np.array([[2, 0, -3]]), np.array([1]))
    assert sum_almost_equal(p1.weights, np.array([1.2, 1, 0.7]))

    p2 = Perceptron(np.array([1, 2, 3]), 0)
    p2.update(np.array([[3, 2, 1], [4, 0, -1]]), np.array([0, 0]))
    assert sum_almost_equal(p2.weights, np.array([0.7, 1.8, 2.9]))

    p3 = Perceptron(np.array([3, 0, 2]), 0)
    p3.update(np.array([[2, -2, 4], [-1, -3, 2], [0, 2, 1]]), np.array([0, 1, 0]))
    assert sum_almost_equal(p3.weights, np.array([2.7, -0.3, 1.7]))

# __name__ is a module-level variable set by Python
# __name__ == "__main__" is true when this file is run directly
# It is not true when another file, say b.py, imports p1_activation.py;
# in that case __name__ is "p1_activation", the name of the imported module
if __name__ == "__main__":
    test()
```

```python
# Enumerate function example
choices = ['pizza', 'pasta', 'salad', 'nachos']
for index, item in enumerate(choices):
    print(index, item)
```

```
0 pizza
1 pasta
2 salad
3 nachos
```


**Layered Network Calculation**
- ![](https://cdn-enterprise.discourse.org/udacity/uploads/default/original/3X/9/4/942a1b4ed344d005e2ba810380f049e8cfdf23fb.png)
- What will be the network output?

```python
# Layered network calculation
import numpy as np

# Hidden layer weights, one row vector per hidden node
w_h1_1 = np.array([1, 1, -5]).reshape(1, 3)
w_h1_2 = np.array([3, -4, 2]).reshape(1, 3)

# Stack the two row vectors into a 2 x 3 weight matrix
weights = np.concatenate((w_h1_1, w_h1_2), axis=0)

# Output weights
w_o = np.array([2, -1])

# Input
inp = np.array([1, 2, 3])

# Print the shapes to see which arrays need reshaping
print('weight vector 1', w_h1_1.shape)
print('weight vector 2', w_h1_2.shape)
print('output weights', w_o.shape)
print('input vector', inp.shape)
print('overall weights matrix', weights.shape)

# Hidden node values
# (2 x 3) . (3,) gives (2,), reshaped here to (1 x 2)
nodes_matrix = np.dot(weights, inp).reshape(1, 2)
print('Nodes matrix', nodes_matrix.shape)

# (1 x 2) nodes . (2,) output weights gives (1,)
np.dot(nodes_matrix, w_o)
```

```
weight vector 1 (1, 3)
weight vector 2 (1, 3)
output weights (2,)
input vector (3,)
overall weights matrix (2, 3)
Nodes matrix (1, 2)

array([-25])
```
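
One observation about this calculation, offered as a side sketch: the units here apply no activation function, so the two layers can be folded into a single weight vector that gives the same output. The variable name `combined` is introduced only for this illustration.

```python
import numpy as np

weights = np.array([[1, 1, -5], [3, -4, 2]])  # hidden layer, one row per node
w_o = np.array([2, -1])                        # output weights
inp = np.array([1, 2, 3])

# Fold the two linear layers into one weight vector: w_o . weights
combined = w_o @ weights
print(combined)
print(combined @ inp)  # -25, the same output as the layered calculation
```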