<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add image, diagrams, whatever you need to ensure that you understand the concepts below.

### Input Layer:
Input nodes such as the ones representing pixels on an image. They do not take info from previous layers.

### Hidden Layer:
Layers in between the input layer and the output layer to add additional complexity to the model. Tuned 
by process backpropagation in order to adjust itself and correctly predict the output based on the
signals from the input layer.

### Output Layer:
The target layer in the model. Each node might represent a different classification. Used to asses the model
and tune the hidden layers based on whether or not the output layer gave the correct prediction.

### Neuron:
Node in a graph in relation to neural networks.

### Weight:
Parameter used to adjust each input in a neuron (which is adjusted by a fitness function and gradient descent). 
Weights and the inputs are combined and summed to produce a weighted sum.

### Activation Function:
Might be a step function or a sigmoid function or relu and it maps its input between a range like between 0 and 1.
![Activation Functions](https://miro.medium.com/max/1005/1*p_hyqAtyI8pbt2kEl6siOQ.png)

### Node Map:
A representation of the neural network in terms of a graph with nodes and edges. 

### Perceptron:
Simplest unit of computation in a neural network. It is a binary classifier and is a type of linear classifier.

![Perceptron](https://s2.qwant.com/thumbr/0x0/6/b/7237d55675ed90a0458b307f6f7b398237c3c9a2ea9055a23d511b61e81987/Rosenblattperceptron.png?u=https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Ff%2Fff%2FRosenblattperceptron.png&q=0&b=1&p=0&a=1)

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?
Inputs are mapped into the input layer which send that message forward to neurons in the hidden in the layer.
Those neurons take the weights and the inputs from the input layer and inputs into a transfer function, which 
is probably the weighted sums plus a bias, and that is mapped over the activation function such as sigmoid to 
produce an output to the next layer. Backpropagation is used based on the output to adjust the weights from
earlier steps using gradient descent and the derivative of the activation function. 

#### Your Answer Here

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [6]:
import pandas as pd
import numpy as np
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]}

df = pd.DataFrame.from_dict(data).astype('int')

In [265]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx =  sigmoid(x)
    return sx * (1-sx)

class Perceptron(object):
    def init_weights(self,X):
        self.weights = np.random.random((X.shape[1], 1))

    def init_bias(self, X):
        self.bias = np.random.random((X.shape[0], 1))
        
    def get_weighted_sum(self, X):
        return np.dot(X, self.weights) + self.bias
    
    def __init__(self,  n=1000):
        self.n = n
    
    def fit(self, X, y):
        self.init_weights(X)
        self.init_bias(X)
        for i in range(self.n):
            # the inputs adjusted by weights and bias
            weighted_sum = self.get_weighted_sum(X)
            # the prediction of the perceptron
            activated = sigmoid(weighted_sum) 
            # adjust based on error gradient descent 
            err = np.array([y]).T - activated
            adjusted = err * sigmoid_derivative(activated)
            
            # adjust weights
            self.weights += np.dot(X.T, adjusted)
            self.bias += err
    
    def predict(self, X):
        result = sigmoid(self.get_weighted_sum(X))
        return result
            
    def get_info(self, X):
        result = pd.DataFrame(sigmoid( percept.get_weighted_sum(X) ).round(), columns=['Predicted'])
        return result

In [243]:
p = Perceptron()

In [244]:
p.fit(df[_in], df[target])

In [241]:
p.get_weighted_sum(df[_in])

array([[ 6.90516837],
       [ 6.87524701],
       [ 6.87538393],
       [-6.97099495]])

In [229]:
p.get_info(df[_in])

Unnamed: 0,Predicted
0,1.0
1,1.0
2,1.0
3,0.0


In [87]:
err

0    0.500000
1    0.367638
2    0.382817
3   -0.734967
Name: y, dtype: float64

In [88]:
adjusted = err * sigmoid_derivative(activated)

In [93]:
# Update our weights 10,000 times - (fingers crossed that this process reduces error)
def train(n=1_000):
    inputs = df[_in]
    correct_outputs = [df[target]]
    weights = init_weights(inputs)
    for iteration in range(n):
        # Weighted sum of inputs / weights
        weighted_sum = np.dot(inputs, weights)
        # Activate!
        activated_output = sigmoid(weighted_sum)
        # Cac error
        error = correct_outputs - activated_output
        adjustments = error * sigmoid_derivative(activated_output)
        # Update the Weights
        weights += np.dot(inputs.T, adjustments)
    return dict(weights=weights, activated_output=activated_output)

In [99]:
train_info = train()
weights_df = pd.DataFrame(train_info['weights'])
output_df = pd.DataFrame(train_info['activated_output'])

## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for other's implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [230]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [231]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1]

X = diabetes[feats]
y = diabetes['Outcome']

In [248]:
min_max = MinMaxScaler().fit_transform(X)

In [249]:
normed = Normalizer().fit_transform(min_max)

In [251]:
normed.shape

(768, 8)

In [254]:
Perceptron().fit(normed)

TypeError: fit() missing 1 required positional argument: 'y'

In [293]:
from sklearn.pipeline import Pipeline
pipe = Pipeline(
    [
        #('min-max', MinMaxScaler()),
        ('norm', Normalizer()),
        ('percept', Perceptron(n=10_000))
    ]
)

In [294]:
pipe.fit(X, y)

Pipeline(memory=None,
         steps=[('norm', Normalizer(copy=True, norm='l2')),
                ('percept', <__main__.Perceptron object at 0x7fd8ef74e438>)],
         verbose=False)

In [295]:
predicted = pipe.named_steps['percept'].predict(X)

  


In [296]:
results_df = pd.DataFrame({'y' : y, 
 'predicted' : predicted[:,0].round()})

In [300]:
diabetes.shape

(768, 9)

In [301]:
# accuracy not great
results_df[results_df['y'] != results_df['predicted']]

Unnamed: 0,y,predicted
0,1,0.0
2,1,0.0
4,1,0.0
6,1,0.0
7,0,1.0
...,...,...
755,1,0.0
757,1,0.0
759,1,0.0
761,1,0.0


## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?