<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever else you need to make sure you understand the concepts below.

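### Input Layer: the vector of input feature values, often preprocessed (scaled or normalized)
### Hidden Layer: any layer of neurons between the input layer and the output layer
### Output Layer: the final layer, which produces the vector of results
### Neuron: a single computational unit within the neural network; it computes a weighted sum of its inputs plus a bias and passes the result through an activation function
### Weight: a learned multiplier applied to one input of a node, controlling how strongly that input influences the node
### Activation Function: a function applied to a node's weighted sum (plus bias) that determines the node's output
### Node Map: a diagram of the nodes, arranged by layer from input to output, and the connections between them
### Perceptron: the simplest feed-forward neural network: a single neuron that computes a weighted sum of its inputs plus a bias and applies a threshold activation to produce a binary output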


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

Each input is multiplied by the weight on its connection to a node. The node sums those weighted inputs and adds its bias, a constant that shifts the sum up or down and so makes the neuron more or less likely to activate. The result is passed through an activation function, which determines how much of the signal (all, part, or none) is sent onward. That output becomes an input to the nodes in the next layer, and the process repeats until the output layer produces the final result.
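The flow described above can be sketched for a single neuron. This is a minimal illustration, not part of the assignment solution; the input, weight, and bias values are made up for the example.

```python
import numpy as np

def sigmoid(z):
    # Squash the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([1.0, 0.0])     # one sample with two features
weights = np.array([0.5, -0.5])   # hypothetical weights, one per input
bias = 0.0                        # hypothetical bias

# Weighted sum of the inputs, plus the bias: 1.0*0.5 + 0.0*(-0.5) + 0.0 = 0.5
weighted_sum = np.dot(inputs, weights) + bias

# The activation function turns the weighted sum into the neuron's output (about 0.62)
output = sigmoid(weighted_sum)
print(weighted_sum, output)
```

In a multi-layer network, `output` would be one of the inputs to each node in the next layer.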

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [2]:
import pandas as pd
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')
df

|   | x1 | x2 | y |
|---|----|----|---|
| 0 | 0  | 0  | 1 |
| 1 | 1  | 0  | 1 |
| 2 | 0  | 1  | 1 |
| 3 | 1  | 1  | 0 |


In [121]:
# Split dataframe into features and target
import numpy as np

X = df.drop(columns='y')
y = np.array(df.y).reshape(-1,1)

In [110]:
##### Your Code Here #####
import numpy as np

# Activation Functions

def sigmoid(x):
    return (1 / (1 + np.exp(-x)))

def sigmoid_derivative(x):
    return np.exp(-x) * sigmoid(x)**2
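The derivative above is written in terms of the raw input `x`. A quick numerical check (a sketch, with `x_check` as an arbitrary test grid) suggests it matches the more common form `sigmoid(x) * (1 - sigmoid(x))`, which needs only the activated output:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x_check = np.linspace(-5, 5, 11)  # arbitrary test points

# Form that needs the raw input x
form_from_x = np.exp(-x_check) * sigmoid(x_check) ** 2

# Form that needs only the activated output sigmoid(x)
form_from_output = sigmoid(x_check) * (1 - sigmoid(x_check))

forms_match = np.allclose(form_from_x, form_from_output)
print(forms_match)
```

The two forms agree only when each is given the argument it expects: the first takes the pre-activation value, the second takes the activated output.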

In [111]:
# Initializing Functions

def init_weights(feature_vector):
    return np.random.random((feature_vector.shape[1], 1))

In [42]:
# Calculate weighted sum of inputs

def calc_weighted_sum(input_vector, weights):
    return np.dot(input_vector, weights)


In [65]:
# Error Functions

def calc_absolute_error(test_vector, target_vector):
    # Despite the name, this returns the signed error (target minus prediction)
    return target_vector-test_vector

In [528]:
# Gradient Descent/Backprop

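def backprop(error_vector, output_vector):
    # NOTE: sigmoid_derivative expects the pre-activation weighted sum, but the
    # calls in this notebook pass the activated output, so this is not the exact gradient
    adjustments = error_vector * sigmoid_derivative(output_vector)
    return adjustments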

def update_weights(adjustment_vector, weights, input_vector):
    scaled_adjustments = np.dot(input_vector.T, adjustment_vector)
    return (weights + scaled_adjustments)
    

In [529]:
weights = init_weights(X)

test_pass = sigmoid(calc_weighted_sum(X, weights))
errors = calc_absolute_error(test_pass, y)

update_weights(
    adjustment_vector=backprop(errors, test_pass),
    weights=weights,
    input_vector=X,
)

array([[0.60316362],
       [0.64494227]])

In [533]:
## Putting it all together

# Split dataframe 
X = df.drop(columns='y')
y = np.array(df.y).reshape(-1,1)
bias = 0.425
threshold = 0.5

# # test: engineer feature by multiplying the two
# X['x3'] = X.x1 * X.x2

epoch_weights = []
epoch_output = []


# Run 20 independent trials, each updating freshly initialized weights 10,000 times (fingers crossed that this process reduces error)
for epoch in range(20):
    # Initialize Weights
    weights = init_weights(X)
    
    for iteration in range(10000):

        # Weighted sum of inputs / weights
        weighted_sum = calc_weighted_sum(
            input_vector=X,
            weights=weights
        )

        # Calc activated output (note: the bias is added after the sigmoid here, not inside it)
        activated_output = sigmoid(weighted_sum) + bias

        # Calc error
        error = calc_absolute_error(
            test_vector=activated_output,
            target_vector=y,
        )

        # Calc adjustments
        adjustments = backprop(
            error_vector=error,
            output_vector=activated_output
        )

        # Update the Weights
        weights = update_weights(
            adjustment_vector=adjustments,
            weights=weights,
            input_vector=X
        )
    
    epoch_weights.append(weights)
    epoch_output.append(activated_output)

In [534]:
display(np.mean(epoch_weights, axis=0), np.mean(epoch_output, axis=0))

array([[-1.98788737],
       [-1.98788737]])

array([[0.925     ],
       [0.54548055],
       [0.54548055],
       [0.44341913]])

In [532]:
# Apply a threshold of 0.5 to output
output = []
for val in np.mean(epoch_output, axis=0):
    if val > threshold:
        output.append(1)
    else:
        output.append(0)
        
output

[1, 1, 1, 0]
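For comparison, here is a minimal sketch of the classic perceptron learning rule applied to the same NAND table. It is an alternative to the sigmoid approach above, not a rewrite of it: the bias sits inside the weighted sum and is learned along with the weights, and the activation is a hard threshold. The function names, learning rate, and epoch count are assumptions chosen for illustration.

```python
import numpy as np

# NAND truth table
X_nand = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y_nand = np.array([1, 1, 1, 0])

def train_perceptron(X, y, lr=1.0, n_epochs=20):
    # Start from zero weights and zero bias so the run is deterministic
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(n_epochs):
        for xi, target in zip(X, y):
            # Step activation: fire only if the weighted sum plus bias is positive
            prediction = 1 if np.dot(xi, weights) + bias > 0 else 0
            # Perceptron rule: adjust only when the prediction is wrong
            update = lr * (target - prediction)
            weights += update * xi
            bias += update
    return weights, bias

def predict(X, weights, bias):
    return (np.dot(X, weights) + bias > 0).astype(int)

nand_weights, nand_bias = train_perceptron(X_nand, y_nand)
nand_predictions = predict(X_nand, nand_weights, nand_bias)
print(nand_predictions.tolist())  # [1, 1, 1, 0]
```

Because NAND is linearly separable, the perceptron rule is guaranteed to stop making mistakes after a finite number of updates, so no averaging over random restarts is needed.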

## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [1]:
import pandas as pd
import numpy as np

diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

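|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|-----|--------------------------|-----|---------|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |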


In [2]:
# Split into features, target
X_raw = diabetes.drop(columns='Outcome')
y = np.array(diabetes.Outcome).reshape(-1,1)
display(X_raw.head(), y[0:5])

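|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|-------------|---------|---------------|---------------|---------|-----|--------------------------|-----|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 |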


array([[1],
       [0],
       [1],
       [0],
       [1]])

Although neural networks can handle non-normalized data, scaling or normalizing your data usually speeds up training. Try applying the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset.

In [3]:
# Preprocessing pipeline
from sklearn.preprocessing import MinMaxScaler, Normalizer
from sklearn.pipeline import Pipeline

process_pipe = Pipeline([
    ('scaler', MinMaxScaler()),
    ('normer', Normalizer())
])

In [4]:
# Transform features into scaled, normalized, arrays
X = process_pipe.fit_transform(X_raw)
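The two steps in the pipeline do different things. With default settings, `MinMaxScaler` rescales each feature column to the range [0, 1], while `Normalizer` rescales each sample row to unit L2 length. The sketch below reproduces both calculations in plain NumPy on a made-up array (`toy` is illustrative data, not the diabetes features) to show the difference.

```python
import numpy as np

toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 40.0]])

# Min-max scaling works per column: (x - min) / (max - min)
col_min = toy.min(axis=0)
col_max = toy.max(axis=0)
minmax_scaled = (toy - col_min) / (col_max - col_min)

# L2 normalization works per row: divide each row by its Euclidean length
row_norms = np.linalg.norm(toy, axis=1, keepdims=True)
l2_normalized = toy / row_norms

print(minmax_scaled)
print(l2_normalized)
```

Chaining the two, as the pipeline does, means the row normalization is applied to values that have already been squeezed into [0, 1] column by column.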

#### Build Network Class (container for perceptrons with general methods)

In [5]:
"""
Network builds layers of specified size
"""

class Network():
    
    def __init__(self, X, y, num_layers=1, nodes_per_layer=2, n_iter=1):
        self.X = X
        self.y = y
        self.layers = self.__create_layers(num_layers, nodes_per_layer)
        
    def __create_layers(self, num_layers, nodes_per_layer):
        # static creation of one hidden layer, one output layer
        layer_list = (
            Layer( # hidden layer
                node_size=self.X.shape[1],
                num_nodes = 2,
            ),
            Layer( # output layer: one weight per hidden-layer node
                node_size=2,
                num_nodes = 1,
            ),
        )
        return layer_list

    def fit(self):
        # at layer 0
        self.layers[0].fit(self.X)
        # at layer 1, sigmoid output from layer 0 is input
        self.layers[1].fit(self.layers[0].sigmoid_)
        # begin backprop
        self.layers[1].update_weights('output', self.y)
        # NOTE: Layer never sets errors_, so this line needs per-layer error tracking before it can run
        self.layers[0].update_weights('hidden', self.layers[1].errors_)
    
    def predict(self):
        pass

In [6]:
class Layer(Network):
    def __init__(self, node_size, num_nodes=1):
        self.nodes = self.__create_nodes(num_nodes, node_size)
        
    def __create_nodes(self, num_nodes, node_size):
        node_list = []
        for i in range(num_nodes):
            node_list.append(Perceptron(size=node_size))
        return node_list
    
    def fit(self, X, y=None):
        self.input_ = X  # keep this layer's input for the weight update
        node_outputs = []
        for node in self.nodes:
            # Weighted sum of inputs / weights
            node.calc_weighted_sum(X)
            # Calc activated output
            node.calc_sigmoid()
            node.calc_sigmoid_derivative()
            node_outputs.append(node.sigmoid_)
        # One column per node: shape (n_samples, num_nodes)
        self.sigmoid_ = np.hstack(node_outputs)
            
    def update_weights(self, layer_pos, y):
        for node in self.nodes:
            # Calc error
            node.calc_absolute_error(layer_pos, y)
            # Make adjustments to weights, using this layer's own input rather than the global X
            node.backprop(self.input_)
            
    def predict(self): 
        output_node = self.nodes[len(self.nodes)-1]
        output_node.calc_sigmoid()
            
        return pd.DataFrame(output_node.sigmoid_)
            

In [7]:
from scipy.special import expit
scaler = MinMaxScaler(feature_range=(-1,1))

class Perceptron(Layer):
    
    def __init__(self, size, bias=0):
        self.size=size
        self.bias = bias
        self.weights = self.__init_weights()
        
    def __init_weights(self):
        return np.random.random((self.size, 1))
    
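    def calc_weighted_sum(self, x):
        self.weighted_sum_ = np.dot(x, self.weights)
        # NOTE: weights has shape (n_features, 1) and x has shape (n_samples, n_features),
        # so this element-wise product cannot broadcast; it raises the ValueError shown in the cells below
        self.weighted_input_ = self.weights * x
        print(self.weighted_sum_[0:5], self.weighted_input_[0:5])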
    
    def calc_sigmoid(self):
        # Add the bias to the weighted sum before applying the sigmoid
        self.sigmoid_ = expit(self.weighted_sum_ + self.bias)
    
    def calc_sigmoid_derivative(self):
        self.sigmoid_derivative_ = self.sigmoid_ * (1 - self.sigmoid_)
        
    def calc_absolute_error(self, layer_pos, y):
        if layer_pos == 'output':
            self.error_ = y - self.sigmoid_
        elif layer_pos == 'hidden':
            self.error_ = np.dot(y, self.weights)
        
    def backprop(self, X):
        # calculate adjustments
        adjustments = self.error_ * self.sigmoid_derivative_
        # update weights with scaled adjustments
        scaled_adjustments = np.dot(X.T, adjustments)
        self.weights += scaler.fit_transform(scaled_adjustments)


In [24]:
len(X.T)

8

In [8]:
# This no longer works with network fit

layer = Layer(node_size=X.shape[1])
for i in range(1000):
    layer.fit(X, y)
comparison = layer.predict()
comparison['target'] = y
comparison

ValueError: operands could not be broadcast together with shapes (8,1) (768,8) 

In [183]:
network = Network(X, y)
network.fit()

ValueError: operands could not be broadcast together with shapes (8,1) (768,8) 

In [133]:
network.layers[1].nodes

[<__main__.Perceptron at 0x7f1f66e694d0>]
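Since the class hierarchy above does not yet run end to end, here is a minimal sketch of a single-neuron alternative that does. It is a simplification, not a fix of the `Network`/`Layer`/`Perceptron` design: one sigmoid neuron (equivalent to logistic regression) trained with batch gradient descent on log loss. The class name `SimplePerceptron`, its hyperparameters, and the toy data are all assumptions for illustration; the toy array stands in for the scaled diabetes features.

```python
import numpy as np

class SimplePerceptron:
    """Hypothetical single sigmoid neuron trained with batch gradient descent."""

    def __init__(self, n_features, lr=0.5, n_iter=5000, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(scale=0.1, size=(n_features, 1))
        self.bias = 0.0
        self.lr = lr
        self.n_iter = n_iter

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict_proba(self, X):
        # X is (n_samples, n_features); weights is (n_features, 1)
        return self._sigmoid(X @ self.weights + self.bias)

    def fit(self, X, y):
        n_samples = X.shape[0]
        for _ in range(self.n_iter):
            # For log loss with a sigmoid output, the gradient w.r.t. the
            # weighted sum is simply (prediction - target)
            error = self.predict_proba(X) - y
            self.weights -= self.lr * (X.T @ error) / n_samples
            self.bias -= self.lr * error.mean()
        return self

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) >= threshold).astype(int)

# Toy, linearly separable data standing in for the scaled diabetes features
rng = np.random.default_rng(42)
X_toy = rng.random((200, 3))
y_toy = (X_toy[:, [0]] + X_toy[:, [1]] > 1.0).astype(int)

model = SimplePerceptron(n_features=X_toy.shape[1]).fit(X_toy, y_toy)
toy_accuracy = (model.predict(X_toy) == y_toy).mean()
print(toy_accuracy)
```

To try it on the assignment data, the same calls should work with the scaled `X` and the `(n_samples, 1)` target `y` built earlier, for example `SimplePerceptron(n_features=X.shape[1]).fit(X, y)`.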

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?
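As a starting point for the last question, here is a sketch of three common activation functions side by side. The function names are made up for this example. Sigmoid and tanh both flatten out for large positive or negative inputs, which shrinks their gradients there; ReLU does not flatten for positive inputs but outputs zero for all negative ones.

```python
import numpy as np

def sigmoid_activation(z):
    # Squashes any real input into (0, 1); a common choice for binary outputs
    return 1.0 / (1.0 + np.exp(-z))

def tanh_activation(z):
    # Squashes into (-1, 1) and is centered on zero
    return np.tanh(z)

def relu_activation(z):
    # Passes positive inputs through unchanged and zeroes out negatives
    return np.maximum(0.0, z)

sample_inputs = np.array([-2.0, 0.0, 2.0])
activation_outputs = {
    "sigmoid": sigmoid_activation(sample_inputs),
    "tanh": tanh_activation(sample_inputs),
    "relu": relu_activation(sample_inputs),
}
for name, values in activation_outputs.items():
    print(name, values)
```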