<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add image, diagrams, whatever you need to ensure that you understand the concepts below.

### Input Layer:
- the outermost layer, deals with the inputs only


### Hidden Layer:
- between the input and output layer
- extracts required features, then processes the inputs obtained in the previous layer
- there can be multiple hidden layers
- each layer is a collection of neurons

### Output Layer:
- the successive computations of all the activations in the hidden layer(s)
- if done correctly, we should get our desired number of values in a desired range

### Neuron:
- holds a number between 0 and 1
- takes an input (with a bias value) and outputs something
- when a signal comes, it gets multiplied by a weight value
- the activation function acts on it

### Weight:
- "Strength" of connection between two neurons from adjacent layers
- it decides how much influence the input will have on the output
- negative weights: increasing this input will decrease the output
- weights near zero: input does not really influence output
- the bias determines how high the weighted sum needs to be before the neuron needs to be active

### Activation Function:
- this is where non-linearity happens in neural networks
- there are many types of activation functions, a Sigmoid af is just one example
- they squish the values in a smaller range e.g. Sigmoids sqich values between a range of 0 to 1
- neuron "lights up" when the activation is a high number (close to 1)

### Node Map:
- activation functions map the input between the required values range like (0,1), (-1,1)

### Perceptron:
- a single layer neural network
- first, all inputs are multipied by their weights
- then, add all the multiplied values and apply to the correct activation function



## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

- the input layer receives the input values
- input values come with a bias value (a constant representating the intercept)
- when the values are passed from the input layer to the hidden layer, and then between hidden layer(s), a weight is applied to map the first layer to the second
- each layer is made up of nodes which contain activation functions, which squish the values in a specified range 
- the last hidden layer links to the output layer
- we should received our desired values within a desired range

Learning: finding the right weights and biases

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [21]:
import pandas as pd
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')

In [2]:
import numpy as np

In [3]:
# the last term is our bias, a constant to throw us off
inputs = np.array([
    [0,0,1],
    [1,0,1],
    [0,1,1],
    [1,1,1]
])

correct_outputs = [[1], [1], [1], [0]]

In [11]:
# create the sigmoid activation function 
# and its derivative for updating weights

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    
    return sx * (1-sx)

In [7]:
# intialize random weights for the three inputs
# we need a matrix of 
weights = 2 * np.random.random((3,1)) - 1
weights

array([[ 0.38638716],
       [ 0.06477086],
       [-0.9438369 ]])

In [8]:
# calculate weighted sum of inputs and weights
weighted_sum = np.dot(inputs, weights)

In [12]:
# activated output for the end of 1 training epoch
activated_output = sigmoid(weighted_sum)

In [14]:
# calculate error 
error = correct_outputs - activated_output
error

array([[ 0.71987404],
       [ 0.63586225],
       [ 0.70662865],
       [-0.37926269]])

In [16]:
# gradient descent - make adjustments
adjustments = error + sigmoid_derivative(activated_output)

In [18]:
# weights we learnt from this learning process
weights += np.dot(inputs.T, adjustments)
weights

array([[1.86581367],
       [1.69133927],
       [4.36830715]])

In [19]:
# now do this 10000 times for fine tuning
for iteration in range(10000):
    
    # weighted sum of inputs/weights
    weighted_sum = np.dot(inputs, weights)
    
    # activate!
    activated_output = sigmoid(weighted_sum)

    # calc error
    error = correct_outputs - activated_output

    adjustments = error + sigmoid_derivative(activated_output)

    # update the weights:
    weights += np.dot(inputs.T, adjustments)

print("Weights after training")
print(weights)
# the third feature is the most important as it has the highest weight

print("Output after training")
print(activated_output)
# the first three features are close to the truth, we got 1s

Weights after training
[[-1311.90833771]
 [-1311.97335912]
 [ 2624.11929888]]
Output after training
[[1.        ]
 [1.        ]
 [1.        ]
 [0.55912263]]


## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for other's implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [48]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [68]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1]

X = diabetes[feats]

In [69]:
scaler = MinMaxScaler()

# fit: compute min and max to be used for later scaling
# transform: transform features of X
X_scaled = scaler.fit_transform(X) 

In [75]:
# transform y into an array
y = diabetes.iloc[:, -1].values.reshape(-1, 1)
X_scaled.shape, y.shape

((768, 8), (768, 1))

In [80]:
##### Update this Class #####

class Perceptron(object):
    
    def __init__(self, niter = 10):
        self.niter = niter
    
    def __sigmoid(self, x):
        return None
    
    def __sigmoid_derivative(self, x):
        return None

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """

        # Randomly Initialize Weights
        weights = 2 * np.random.random((8,1)) - 1

        for i in range(self.niter):
            # Weighted sum of inputs / weights
            weighted_sum = np.dot(X, weights)
        
            # Activate!
            activated_output = sigmoid(weighted_sum)

            # Cac error
            error = y - activated_output
            
            adjustments = error + sigmoid_derivative(activated_output)
            
            # Update the Weights
            weights += np.dot(X.T, adjustments)
        
        # return Output after training
        return activated_output


    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.net_input(X) >= 0.0, 1, -1)

In [84]:
results = Perceptron()
results = results.fit(X_scaled, y)

print("Output after training")
print(results)

Output after training
[[1.15012570e-15]
 [3.72747898e-36]
 [6.50934395e-07]
 [2.80148567e-41]
 [1.05060815e-14]
 [2.40339961e-27]
 [2.68099842e-28]
 [1.00000000e+00]
 [1.37309055e-11]
 [6.98279761e-10]
 [4.26392896e-44]
 [1.91818043e-14]
 [9.92550627e-01]
 [9.99957299e-01]
 [2.64672601e-08]
 [1.00000000e+00]
 [2.20837772e-51]
 [5.98560547e-25]
 [2.24334992e-19]
 [2.03057510e-36]
 [2.39795238e-46]
 [5.75566531e-19]
 [2.33151639e-23]
 [3.02183211e-30]
 [2.53650564e-19]
 [9.96164407e-13]
 [1.38772246e-18]
 [2.43458128e-35]
 [9.96568517e-01]
 [6.38735296e-35]
 [9.40075899e-16]
 [3.53186832e-32]
 [1.27372043e-29]
 [3.30059918e-37]
 [6.95668056e-15]
 [1.31950094e-19]
 [2.21566793e-14]
 [1.12415899e-18]
 [5.70465472e-43]
 [3.92715458e-16]
 [3.41796078e-28]
 [1.00050618e-26]
 [3.61276433e-25]
 [1.10337547e-26]
 [2.03621489e-08]
 [4.77274113e-35]
 [1.69324538e-22]
 [9.82823888e-43]
 [1.70429054e-28]
 [1.00000000e+00]
 [1.05568065e-41]
 [9.66759758e-25]
 [4.90319863e-27]
 [6.92009207e-12]
 [3.14

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?