<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever you need to ensure that you understand the concepts below.

### Input Layer:  
* The input layer is where data enters the NN. There is one input node per feature in the dataset (a 1-to-1 relationship). It is the only layer exposed directly to our data.  
### Hidden Layer:  
* The `black box` of the NN. We cannot access or interact with the data directly in this layer. Hidden layers sit between the input and output layers.  
### Output Layer:  
* The last layer of the NN. It produces the network's result (its prediction) from everything computed in the earlier layers.   
### Neuron:  
* Synonymous with "node". A function that takes information from the previous layer, computes a weighted sum of it plus a bias, applies an activation function, and passes the result to the next layer.  
### Weight:  
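* A number attached to each connection between neurons that scales how much of a signal is passed along. Weights can be initialized randomly and are then adjusted during training. Standardizing or normalizing the input data usually helps training.  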
### Activation Function:  
* The activation function decides whether a cell "fires" or not; sometimes the cell is said to be "activated" or not. In artificial neural networks, activation functions decide how much signal to pass on to the next layer. This is why they are sometimes called transfer functions: they determine how much signal is transferred to the next layer.  
### Node Map:  
* The NN equivalent of an entity relationship diagram (ERD) in SQL: a diagram of the nodes in each layer and the connections between them.  
### Perceptron:  
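* A type of ANN with very little complexity: a single neuron that takes a weighted sum of its inputs (plus a bias) and applies an activation function to produce an output.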
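
To make the "how much signal to pass on" idea concrete, here is a minimal sketch of three common activation functions applied to the same weighted sums. The function names `step` and `relu` and the sample values in `z` are illustrative choices, not part of the assignment.

```python
import numpy as np

def step(z):
    """Fire (1) when the weighted sum is non-negative, otherwise stay silent (0)."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Pass positive signal through unchanged and block negative signal."""
    return np.maximum(0.0, z)

# Hypothetical weighted sums arriving at a neuron
z = np.array([-2.0, 0.0, 2.0])

print("step:   ", step(z))
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
```

The step function is all-or-nothing, while sigmoid and ReLU pass along a graded amount of signal.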


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

#### Your Answer Here
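
One way to picture the flow: each layer multiplies its inputs by weights, adds a bias, applies an activation function, and hands the result to the next layer until the output layer is reached. The sketch below assumes sigmoid activations throughout and uses made-up layer sizes and random weights (`forward`, `toy_layers`, and `toy_inputs` are hypothetical names), so it only illustrates the direction of travel, not a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Pass x through each (weights, bias) pair in turn and return the final activation."""
    activation = x
    for layer_weights, layer_bias in layers:
        # Weighted sum of the previous layer's output, plus the bias
        weighted_sum = activation @ layer_weights + layer_bias
        # The activation function decides how much signal moves on
        activation = sigmoid(weighted_sum)
    return activation

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
toy_layers = [
    (rng.normal(size=(2, 3)), np.zeros(3)),  # input (2 features) -> hidden (3 neurons)
    (rng.normal(size=(3, 1)), np.zeros(1)),  # hidden (3 neurons) -> output (1 neuron)
]

toy_inputs = np.array([[0.0, 1.0],
                       [1.0, 1.0]])          # two observations, two features each
toy_outputs = forward(toy_inputs, toy_layers)
print(toy_outputs.shape)  # (2, 1): one output value per observation
```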

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [1]:
import pandas as pd
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0],
         'b':  [1,1,1,1] #add bias column
       } 

df = pd.DataFrame.from_dict(data).astype('int')

In [2]:
##### Your Code Here #####

# Correct Outputs
correct_outupts = [[0],[0],[1], [1]]

In [3]:
# set random seed
import numpy as np
np.random.seed(719422)

In [4]:
# Make the DF a numpy array
# https://intellipaat.com/community/11827/convert-pandas-dataframe-to-numpy-array
# df_array = df.rename_axis('ID').values.T
# df_array

### Sigmoid activation function and its derivative for updating weights

In [5]:
# sigmoid
def sigmoid(x):
    return 1/(1 + np.exp(-x))

In [6]:
# sigmoid derivative
def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)
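
As a sanity check on the derivative formula, the sketch below compares it with a central finite difference at a few points. The step size `h` and the sample `points` are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

points = np.linspace(-4, 4, 9)   # sample inputs
h = 1e-5                         # small step for the finite difference

# Central difference approximation of the slope of sigmoid at each point
numeric_slope = (sigmoid(points + h) - sigmoid(points - h)) / (2 * h)

print(np.allclose(numeric_slope, sigmoid_derivative(points)))
```

The slope is largest at `x = 0`, where it equals 0.25, and shrinks toward zero as the sigmoid saturates in either direction.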

### Initialize random weights, one for each of the four columns of `df`

In [7]:
weights = 2 * np.random.random((4, 1)) - 1
weights

array([[ 0.75266466],
       [ 0.83857107],
       [-0.78892834],
       [ 0.11528913]])

### Calculate weighted sum of inputs and weights  
Dive deeper into [dot products](https://www.youtube.com/watch?v=LyGKycYT2v0) with 3Blue1Brown

In [8]:
weighted_sum = np.dot(df, weights)
weighted_sum

array([[-0.67363921],
       [ 0.07902545],
       [ 0.16493186],
       [ 1.70652485]])
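
The dot product here is just "multiply each input by its weight and add everything up", done for every row at once. A minimal sketch with hypothetical values (`toy_rows` and `toy_weights` are made up for illustration) that compares `np.dot` with the same sum written out by hand:

```python
import numpy as np

toy_rows = np.array([[0, 0, 1],
                     [1, 0, 1]])                 # two observations, three inputs each
toy_weights = np.array([[0.2], [-0.4], [0.1]])   # one weight per input

# Vectorized weighted sum: one value per row
via_dot = np.dot(toy_rows, toy_weights)

# The same calculation spelled out: multiply each input by its weight, then add
via_loop = np.array([
    [sum(value * weight for value, weight in zip(row, toy_weights.ravel()))]
    for row in toy_rows
])

print(np.allclose(via_dot, via_loop))
```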

### Output the activated value for the end of 1 training epoch

In [9]:
activated_output = sigmoid(weighted_sum)
activated_output

array([[0.33768244],
       [0.51974609],
       [0.54113975],
       [0.846385  ]])

### Difference of output and true values to calculate error

In [10]:
error = correct_outupts - activated_output
error

array([[-0.33768244],
       [-0.51974609],
       [ 0.45886025],
       [ 0.153615  ]])

### Gradient descent/backprop - magic!

In [11]:
adjustments = error * sigmoid_derivative(weighted_sum)
adjustments

array([[-0.07552369],
       [-0.12973387],
       [ 0.11393845],
       [ 0.01997263]])

In [12]:
adjustments.shape

(4, 1)

In [13]:
weights += np.dot(df.T, adjustments)
weights

array([[ 0.64290342],
       [ 0.97248215],
       [-0.88024745],
       [ 0.04394264]])
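
The update in cells 11 and 13 can be read as one step of gradient descent. The derivation below assumes a squared-error loss and a learning rate of 1; the notebook states neither, so both are assumptions. Write $X$ for the input matrix (`df`), $w$ for `weights`, $z = Xw$ for `weighted_sum`, $a = \sigma(z)$ for `activated_output`, and $y$ for the targets.

$$
E(w) = \frac{1}{2}\sum_i (y_i - a_i)^2, \qquad a_i = \sigma(z_i), \qquad z_i = x_i^\top w
$$

$$
\frac{\partial E}{\partial w} = -\sum_i (y_i - a_i)\,\sigma'(z_i)\,x_i = -X^\top\big[(y - a)\odot\sigma'(z)\big]
$$

$$
w \leftarrow w - \frac{\partial E}{\partial w} = w + X^\top\big[(y - a)\odot\sigma'(z)\big]
$$

Here $(y - a)$ is `error`, the elementwise product $(y - a)\odot\sigma'(z)$ is `adjustments`, and $X^\top$ is `df.T`.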

### Put it all together

In [14]:
# Steps we've already done: 
# 1. Randomly initialized the weights. Those are in memory as `weights`
# 2. Loaded the input data (`df`) and the targets (`correct_outupts`)


# Update our weights 10,000 times - (fingers crossed that this process reduces error)
for iteration in range(10000):
    
    # Weighted sum of inputs / weights
    weighted_sum = np.dot(df, weights)
    
    # Activate!
    activated_output = sigmoid(weighted_sum)
    
    # Calculate error
    error = correct_outupts - activated_output
    
    adjustments = error * sigmoid_derivative(weighted_sum)
    
    # Update the Weights
    weights += np.dot(df.T, adjustments)
    
print("Weights after training")
print(weights)

print("Output after training")
print(activated_output)

Weights after training
[[-1.19047371]
 [ 9.12044736]
 [-3.08891376]
 [-1.43435769]]
Output after training
[[0.01073746]
 [0.00328963]
 [0.99001983]
 [0.99849213]]
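
Note that the walkthrough above feeds all four columns of `df` (including `y`) into the dot product and trains against `[[0],[0],[1],[1]]`, which matches the `x2` column rather than the NAND column. A minimal sketch that trains on the NAND table itself, assuming only `x1`, `x2`, and the bias column as inputs and `y` as the target, could look like this. The `nand_` names and the seed are introduced here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Inputs: x1, x2, and a constant bias column of ones (the target is NOT an input)
nand_inputs = np.array([[0, 0, 1],
                        [1, 0, 1],
                        [0, 1, 1],
                        [1, 1, 1]])

# Targets: the y column of the NAND truth table
nand_targets = np.array([[1], [1], [1], [0]])

rng = np.random.default_rng(42)             # arbitrary seed, for repeatability only
nand_weights = 2 * rng.random((3, 1)) - 1   # one weight per input column, in [-1, 1)

for _ in range(10000):
    nand_sum = nand_inputs @ nand_weights
    nand_activated = sigmoid(nand_sum)
    nand_error = nand_targets - nand_activated
    # sigmoid'(z) written in terms of the activation itself: a * (1 - a)
    nand_adjustments = nand_error * nand_activated * (1 - nand_activated)
    nand_weights += nand_inputs.T @ nand_adjustments

# Forward pass with the trained weights, rounded to hard 0/1 labels
nand_output = sigmoid(nand_inputs @ nand_weights)
nand_predictions = np.round(nand_output).astype(int)

print(nand_predictions.ravel())
```

Because NAND is linearly separable, a single sigmoid neuron with a bias is enough to represent it; rounding the trained outputs should reproduce the `y` column.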


## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [15]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

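|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|-----|--------------------------|-----|---------|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |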


In [16]:
diabetes.shape

(768, 9)

Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [17]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1]

normz = Normalizer().fit(diabetes[feats])

normz

Normalizer(copy=True, norm='l2')

In [18]:
X = normz.transform(diabetes[feats])
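
The two scalers mentioned above do different things. As I understand the scikit-learn defaults, `MinMaxScaler` rescales each column (feature) onto the range 0 to 1, while `Normalizer` with `norm='l2'` rescales each row (sample) to unit length. The NumPy-only sketch below mimics those two operations on a hypothetical toy array, so the names `toy`, `minmax_scaled`, and `l2_normalized` are illustrative rather than anything from scikit-learn.

```python
import numpy as np

toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 40.0]])

# Column-wise min-max scaling: each feature is mapped onto [0, 1]
col_min = toy.min(axis=0)
col_max = toy.max(axis=0)
minmax_scaled = (toy - col_min) / (col_max - col_min)

# Row-wise L2 normalization: each sample is divided by its own length
row_norms = np.linalg.norm(toy, axis=1, keepdims=True)
l2_normalized = toy / row_norms

print(minmax_scaled)
print(np.linalg.norm(l2_normalized, axis=1))  # every row now has length 1
```

Row-wise normalization keeps only the proportions between a sample's features, whereas column-wise scaling keeps the differences between samples within each feature. Which one suits the diabetes data is worth checking empirically.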

In [19]:
y = diabetes['Outcome'].values

In [20]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=719422)

In [21]:
print('y: \t\t',type(y))
print('y_train/test: \t',type(y_train))
print('X_train/test: \t',type(X_train))

y: 		 <class 'numpy.ndarray'>
y_train/test: 	 <class 'numpy.ndarray'>
X_train/test: 	 <class 'numpy.ndarray'>
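
Knowing that these are NumPy arrays, it is also worth checking their shapes before doing arithmetic on them. A 1-D target array of shape `(n,)` subtracted from a column of predictions of shape `(n, 1)` broadcasts to an `(n, n)` matrix instead of giving one error per sample. A minimal sketch with hypothetical values:

```python
import numpy as np

targets_1d = np.array([1, 0, 1])                    # shape (3,), like y_train
predictions_2d = np.array([[0.9], [0.2], [0.4]])    # shape (3, 1), like np.dot(X, weights)

# Broadcasting pairs every target with every prediction
broadcast_error = targets_1d - predictions_2d

# Reshaping the targets to a column gives one error per sample
aligned_error = targets_1d.reshape(-1, 1) - predictions_2d

print(broadcast_error.shape)  # (3, 3)
print(aligned_error.shape)    # (3, 1)
```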


In [22]:
##### Update this Class #####

class Perceptron(object):
    
    def __init__(self, niter = 10):
        self.niter = niter
    
    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def __sigmoid_derivative(self, x):
        sx = self.__sigmoid(x)
        return sx * (1-sx)

    def __add_bias(self, X):
        """Append a column of ones so the last weight acts as the bias"""
        return np.hstack([X, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """
        X = self.__add_bias(X)

        # Reshape y to a column vector. Subtracting an (n, 1) array from an
        # (n,) array would broadcast to (n, n) instead of one error per sample.
        y = np.asarray(y).reshape(-1, 1)

        # Randomly initialize one weight per feature, plus one for the bias
        self.weights = 2 * np.random.random((X.shape[1], 1)) - 1

        for i in range(self.niter):
            # Weighted sum of inputs / weights
            weighted_sum = np.dot(X, self.weights)

            # Activate!
            activated_output = self.__sigmoid(weighted_sum)

            # Calculate error
            error = y - activated_output
            
            # Adjustments
            adjustments = error * self.__sigmoid_derivative(weighted_sum)

            # Update the Weights
            self.weights = self.weights + np.dot(X.T, adjustments)

        # Return only after all niter iterations have run
        return self


    def predict(self, X):
        """Return class labels (0 or 1) as a 1-D array, rounding the sigmoid output"""
        weighted_sum = np.dot(self.__add_bias(X), self.weights)
        activated_output = self.__sigmoid(weighted_sum)
        return np.round(activated_output).ravel().astype(int)


In [23]:
# Instantiate
perc = Perceptron()

In [24]:
perc.fit(X_train, y_train)

<__main__.Perceptron at 0x7fa6022c8150>

In [25]:
perc.predict(X_test)

In [26]:
# Get model accuracy
from sklearn.metrics import accuracy_score

# predict now returns a 1-D array with one label per test row
y_pred = perc.predict(X_test)
y_pred

In [27]:
# accuracy_score expects the true labels first, then the predictions
accuracy = accuracy_score(y_test, y_pred)
accuracy

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?