# Neural Networks

Today, we'll explore [Neural Networks](https://en.wikipedia.org/wiki/Artificial_neural_network) in depth. They are a wide class of algorithms that _learn_ using a structure loosely inspired by the human brain. In particular, we're going to implement two types of neural networks.

- [Perceptron](https://en.wikipedia.org/wiki/Perceptron)
- [Feedforward Neural Network](https://en.wikipedia.org/wiki/Feedforward_neural_network), specifically a [Multilayer Perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron)

To better understand this tutorial, we'll need to understand the basics of Object-Oriented Programming in Python. Here's a basic [guide](https://realpython.com/python3-object-oriented-programming/).

The most important parts to understand are:
- What is a class?
  - __ANS__: A _class_ is a _template_ for data.
- What is an Object?
  - __ANS__: When you use the template, you create an _object_.
- How do we store attributes in the class for use later?
  - __ANS__: The _class_ definition has attributes created in the `__init__()` function. These are variables stored separately for each _object_ created from the _class_.
- How do we call an instance method (function) from another instance method?
  - __ANS__: From any other method, we simply say `self.function_name()`, where `self` is a special variable that means _"this object itself"_

## OOP Example

Here, we show an example of how to create a _class_ in Python. The class is called _Dog_, and it has two functions, `bark()` and `bark_twice()`, as well as two attributes, `name` and `breed`.

In [1]:
class Dog:
    """
    This is the class definition. Everything indented over one from
    this point onward is part of the class definition for a dog.
    """
    
    def __init__(self, name):
        """
        This is a special function used to initialize an object. We'll
        see how it's called later, but just understand that "self" here
        refers to "this object".
        
        In this initialization function, we do "self.name" and define it
        to be the name variable passed in this init function.
        
        We can create as many fields as we want.
        """
        # This field was passed into the init func
        self.name = name
        
        # This field was not but we can still initialize it to something
        self.breed = "Unknown Doggo"
        
    def bark(self):
        # We can access this object's attributes by using
        # self.attribute_name in our function
        print('"Woof" - {}'.format(self.name))
        
    def bark_twice(self):
        # We can also call other functions or "instance methods" on this
        # object from other functions. Here, we'll just call the bark()
        # function twice.
        self.bark()
        self.bark()

In [3]:
wilfred = Dog("Wilfred")  # Notice here we only pass one variable, the name
wilfred.bark() # Notice here we pass in no var, not even "self"
print("Wilfred barked once")
wilfred.bark_twice() # Notice here we pass in no var, not even "self"
print("Wilfred barked twice")

"Woof" - Wilfred
Wilfred barked once
"Woof" - Wilfred
"Woof" - Wilfred
Wilfred barked twice



# Perceptron Classifier

Now we know enough OOP to proceed to creating a Perceptron Binary Classifier.

Here are the main ideas:
- A single perceptron has a single output, a `0` or a `1`. For now let's say we're only trying to ever predict `positive` or `negative`
- That perceptron also accepts as many inputs as your dataset requires. Ex: The X=(age, weight, height), y=(gender) dataset has 3 features, so a Perceptron for that problem would have 3 weights
- I lied, there is an additional weight, that is called the _bias_, so this example problem needs _4_ weights

Here's effectively the formula being applied to get a `1` or `0` for a single `X` vector, $X_{i}$:

$$
\text{prediction} = W_1 * X_{i, 1} + W_2 * X_{i, 2} + W_3 * X_{i, 3} + \text{bias}
$$

$$
= \left(\sum_{j} W_j * X_{i, j}\right) + b
$$

$$
\hat{y} = \text{round(prediction)}
$$

Where:
- $X_{i, j}$ is the $j$-th element in vector $X_{i}$, i.e. the value of the $j$-th feature for example $i$
- $W_{j}$ is the $j$-th element in vector $W$
- $b$ is the _bias_ weight, also just called the _bias_
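- $\text{prediction}$ is the raw number that comes out of this computation. Nothing in the formula restricts it to the range `0` to `1`; it can be any real number
- $\hat{y}$ is the actual class label prediction, which we get by turning the $\text{prediction}$ into one of two possible class labels (e.g. `negative` is `0`, `positive` is `1`). In the formula above this is written as rounding, which yields `0` or `1` only when the prediction is close to those values; the code below instead thresholds at `0` and uses the labels `-1` and `1`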
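
To make the formula concrete, here is a small worked example in NumPy. The feature values, weights, and bias are made up purely for illustration. It shows the rounding rule from the formula next to the threshold-at-zero rule that the `Perceptron` class in this tutorial uses in its `predict()` method.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
x_i = np.array([2.0, -1.0, 0.5])  # one example with 3 features
W = np.array([0.4, 0.3, 0.2])     # one weight per feature
b = 0.1                           # the bias

# prediction = W_1 * X_i1 + W_2 * X_i2 + W_3 * X_i3 + b
#            = 0.8 - 0.3 + 0.1 + 0.1
prediction = np.dot(W, x_i) + b

# Rule from the formula: round the raw prediction to get a 0/1 label.
y_hat_rounded = int(np.round(prediction))

# Rule used by the Perceptron class: threshold at 0 to get a 1/-1 label.
y_hat_threshold = int(np.where(prediction >= 0.0, 1, -1))

print(prediction, y_hat_rounded, y_hat_threshold)
```

Both rules pick the positive class here, but they use different label values, so stick to one convention for a given dataset.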


## Training Perceptron
To train your perceptron, you're trying to find weights $W$ (and also _bias_ $b$) so that this $\hat{y}$ prediction matches the true $y$ for as much of your training set as possible.

The way we do this is by defining an error function that we can find the [derivative](https://en.wikipedia.org/wiki/Derivative) function of (this is Calculus-based), then computing the derivative of that function using a specific datapoint (each $X_i$ individually), then updating the weights by a teeny bit.

__NOTE:__ A _derivative_ is conceptually the same as the slope of a line, but it applies to curves rather than just straight lines. It is a slope re-computed at every point along the curve.

![Derivative Example](https://upload.wikimedia.org/wikipedia/commons/thumb/0/0f/Tangent_to_a_curve.svg/400px-Tangent_to_a_curve.svg.png)
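
As a rough illustration of "compute the slope, then nudge the weight a teeny bit", the sketch below uses a made-up, one-weight error curve (a parabola) and approximates its slope numerically. The function names `error` and `slope` and the numbers are assumptions for this example only; the perceptron itself uses its own update rule.

```python
def error(w):
    """A hypothetical error curve: a parabola whose lowest point is at w = 3."""
    return (w - 3.0) ** 2


def slope(f, w, h=1e-6):
    """Approximate the derivative of f at w with a central difference."""
    return (f(w + h) - f(w - h)) / (2 * h)


w = 0.0              # starting guess for the weight
learning_rate = 0.1  # how big each nudge is

for step in range(50):
    # Move against the slope: downhill on the error curve.
    w -= learning_rate * slope(error, w)

print(w)  # ends up very close to 3, the bottom of the parabola
```

A negative slope means the error drops as `w` grows, so subtracting the slope pushes `w` in the direction that lowers the error.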


In [1]:
import numpy as np

import pandas as pd

# Seed for our random number generator.
seed = 42

In [18]:
class Perceptron(object):
    """
    Perceptron classifier:

    Params:
    :learning_rate: float
        Learning rate
    :max_iter: int
        Maximum number of passes over the training dataset
    :random_state: int
        Random number generator seed for random weight initialization
    """

    def __init__(self, learning_rate=0.01, max_iter=50, random_state=seed):
        self.learning_rate = learning_rate
        self.max_iter = max_iter
        self.random_state = random_state
        self.errors = []
        self.bias = 0 # We'll change this later.

    def dotprod(self, xi):
        """
        Return the dot product of the input and the weights, with the bias added.
        
        This computes the output of the perceptron itself, without rounding up or down
        to make a label prediction. This is matrix multiplication. 
        """
        # This is equivalent to "y-hat = Wx + b" with x being a single example
        # and y-hat being the prediction.
        return np.dot(xi, self.weights) + self.bias

    def predict(self, xi):
        """
        Return class label 1 if the output is 0 or above
        Return class label -1 otherwise
        """
        # This makes all positive results class label 1, otherwise class label -1
        return np.where(self.dotprod(xi) >= 0.0, 1, -1)
    
    def score(self, X, y):
        """
        Evaluate the percentage of predictions that are correct.
        """
        predictions = self.predict(X)
        total = X.shape[0]
        correct = (predictions == y).sum()
        # TODO: Return the percentage of predictions that are correct.
        return 0

    def fit(self, X, y):
        # Create a random number generator from the random_state.
        rgen = np.random.RandomState(self.random_state)

        # Initialize with random weights. They must be relatively small, so
        # loc=0.0 and scale=0.01 set the mean and standard deviation, and each
        # number gets pulled from a normal distribution with those params.
        # We need (num_attributes) weights and then a separate one for the bias
        num_attributes = X.shape[1]
        self.weights = rgen.normal(loc=0.0, scale=0.01, size=num_attributes)
        
        # TODO: Initialize a size 1 random number just like the line above.
        self.bias = 0

        # This is where iterations come in.
        for idx in range(self.max_iter):
            differences = []  # The differences between predictions and actual vals
            for xi, target in zip(X, y):
                # Compute the raw (unrounded) output for this single instance.
                output = self.dotprod(xi)
                
                # Compute the difference between the target and prediction
                difference = (target - self.predict(xi))

                # TODO: The update is then the learning_rate times the difference
                update = 0
                
                # All weights except the bias get modified with this formula.
                # They are modified by the product of the update with the value
                # of each attribute in this example
                self.weights += update * xi  # Scalar times vector: scales every element of xi by update

                # The bias unit doesn't have a feature associated with it, so it's updated directly
                self.bias += update

                differences.append(difference)

            differences = np.array(differences)  # Convert list to np.array
            # This is our error function, which forms a parabola
            error = (differences ** 2).sum()
            self.errors.append(error)

            if error == 0.0:
                print("Reached convergence after {} iterations".format(idx+1))
                break

        # Return the train error at each iteration
        return self.errors
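
If you get stuck on the `TODO` lines, the standalone sketch below shows one possible way to write the update step and the accuracy computation. It is not the only valid answer. The function names `predict` and `train` and the toy dataset are assumptions made for this example, and the bias starts at `0.0` here rather than being drawn at random.

```python
import numpy as np


def predict(weights, bias, x):
    """Threshold the weighted sum at 0: label 1 if >= 0, otherwise -1."""
    return np.where(np.dot(x, weights) + bias >= 0.0, 1, -1)


def train(X, y, learning_rate=0.01, max_iter=50, random_state=42):
    """Perceptron training loop; one possible way to fill in the update."""
    rgen = np.random.RandomState(random_state)
    weights = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
    bias = 0.0
    for _ in range(max_iter):
        mistakes = 0
        for xi, target in zip(X, y):
            # difference is 0 when correct, otherwise +2 or -2
            difference = target - predict(weights, bias, xi)
            # The update is the learning rate times the difference
            update = learning_rate * difference
            weights += update * xi
            bias += update
            mistakes += int(difference != 0)
        if mistakes == 0:
            break  # every training example was classified correctly
    return weights, bias


# Tiny, hypothetical, linearly separable dataset with labels 1 and -1
X_toy = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y_toy = np.array([1, 1, -1, -1])

weights, bias = train(X_toy, y_toy)

# Accuracy: the fraction of predictions that match the true labels
accuracy = (predict(weights, bias, X_toy) == y_toy).mean()
print("Accuracy: {:.2f}%".format(accuracy * 100))
```

Because the toy points can be split by a straight line, the loop stops as soon as one full pass makes no mistakes, and all four points are then classified correctly.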

In [19]:
import sklearn

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

In [20]:
iris = load_iris()

In [21]:
X_orig = iris.data
X_orig

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [22]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

# Shortcut, don't do this. Always do fit(train_X), then transform on train_X and test_X
X = scaler.fit_transform(X_orig)
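
To show why the comment above recommends fitting on the training data only, here is a NumPy-only sketch of the same idea with made-up numbers. The arrays `train` and `test` are hypothetical stand-ins for a real split. With `StandardScaler`, the equivalent calls would be `scaler.fit(train_X)` followed by `scaler.transform(train_X)` and `scaler.transform(test_X)`.

```python
import numpy as np

# Hypothetical data standing in for a train/test split
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# "fit": learn the per-column mean and standard deviation from train only
mean = train.mean(axis=0)
std = train.std(axis=0)

# "transform": apply those same train statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled)
print(test_scaled)
```

The test set never influences `mean` or `std`, so no information about it leaks into the preprocessing step.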

In [29]:
y_originals = iris.target
y_originals

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [30]:
# We need to change our problem from having 3 classes to having two.
# Let's use label "1" for anything that is 1 or 2, and label "-1" for
# anything that is currently 0.
y = []

for y_orig in y_originals:
    if y_orig >= 1:
        y.append(1)
    else:
        y.append(-1)

y = np.array(y)
y

array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1])
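
The same relabeling can also be written without a loop using `np.where`. The short array below is a hypothetical stand-in for `iris.target`, used only so the example runs on its own.

```python
import numpy as np

# Hypothetical stand-in for iris.target: a few labels from each class
y_originals = np.array([0, 0, 1, 1, 2, 2])

# Label 1 for anything that is 1 or 2, label -1 for anything that is 0
y = np.where(y_originals >= 1, 1, -1)
print(y)
```

This is the same `np.where(condition, value_if_true, value_if_false)` pattern that the `predict()` method uses.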

In [31]:
train_X, test_X, train_y, test_y = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=seed)

In [33]:
perceptron = Perceptron(random_state=seed)
errors = perceptron.fit(train_X, train_y)

Reached convergence after 2 iterations


In [34]:
train_acc = perceptron.score(train_X, train_y)
test_acc = perceptron.score(test_X, test_y)

print("Train accuracy: {:.2f}%".format(train_acc * 100))
print("Test accuracy: {:.2f}%".format(test_acc * 100))

105
45
Train accuracy: 100.00%
Test accuracy: 100.00%


In [35]:
# Try for yourself to see if this worked
sample_x = train_X[11]
sample_y = train_y[11]

pred = perceptron.predict(sample_x)

print(sample_x)
print("y: {}\tpred: {}".format(sample_y, pred))

[-0.7795133   0.80065426 -1.3412724  -1.31297673]
y: -1	pred: [-1]


## TODO
Use the `people.csv` dataset again and use these same techniques and this perceptron to see what performance you get.