<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever you need to ensure that you understand the concepts below.

### Input Layer:
A set of nodes that take values as input to a neural network. Examples are pixels in an image or thermometer readings.


### Hidden Layer:
A set of nodes that receive input from other nodes and whose values are not observed as output or prediction values. The first **hidden** layer receives the output of the **input** layer; each subsequent hidden layer receives input only from the hidden layer before it. Each hidden layer passes its output to the next hidden layer or, if it is the last one, to the output layer.

### Output Layer:  
A set of nodes that do not pass their values to any further layer. The values of these nodes are the 'prediction' or 'classification' output of the neural net.

### Neuron:
A unit that receives inputs (weighted values from 'earlier' neurons, plus a bias value), applies an activation function to their sum, and transmits the resulting single value to the following layer.

### Weight:
A coefficient by which an input from a neuron is multiplied as part of the function that determines a node/neuron's value.
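
To make the role of a weight concrete, here is a minimal sketch of the weighted sum for one neuron. The input, weight, and bias values are made up purely for illustration.

```python
import numpy as np

# Hypothetical values chosen only for illustration
inputs = np.array([0.5, -1.0, 2.0])   # outputs of three earlier neurons
weights = np.array([0.4, 0.3, -0.2])  # one weight per incoming connection
bias = 0.1

# Each input is multiplied by its weight; the products are summed with the bias
weighted_sum = np.dot(inputs, weights) + bias
print(weighted_sum)  # approximately -0.4
```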


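### Activation Function:
A function applied to a node's weighted sum of inputs plus bias to produce the node's output value. The sigmoid function used later in this notebook squashes any real number into the range 0 to 1.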
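
As a rough illustration, three commonly used activation functions can be sketched in NumPy. The function names `step` and `relu` and the sample values in `z` are choices made for this sketch; only `sigmoid` appears later in this notebook.

```python
import numpy as np

def step(z):
    """Unit step: 1 where z >= 0, else 0."""
    return np.where(z >= 0, 1, 0)

def sigmoid(z):
    """Smoothly squashes any real number into the range (0, 1)."""
    return 1 / (1 + np.exp(-z))

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])
print(step(z))     # [0 1 1]
print(sigmoid(z))
print(relu(z))     # [0. 0. 2.]
```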

### Node Map:
The interrelationship between elements of a neural network, including nodes and transfer functions.

### Perceptron:
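An early (1957) neural network design introduced by Frank Rosenblatt. In its basic form it is a single neuron: the inputs are multiplied by weights, summed with a bias, and passed through a step activation function to produce a binary output. It has no hidden layers, so it can only separate classes that are linearly separable.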
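
A minimal sketch of a single step-activated perceptron follows. The weights and bias are hand-picked for illustration (not learned), and the helper name `perceptron` is hypothetical; with these values the unit happens to reproduce the NAND truth table used later in this notebook.

```python
import numpy as np

def perceptron(x, weights, bias):
    """Weighted sum plus bias, followed by a unit step."""
    return int(np.dot(x, weights) + bias > 0)

# Hand-picked (not learned) parameters that happen to implement NAND
weights = np.array([-2, -2])
bias = 3

nand_inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]
nand_outputs = [perceptron(np.array(x), weights, bias) for x in nand_inputs]
print(nand_outputs)  # [1, 1, 1, 0]
```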

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

Information flow through a neural network:

1. Input nodes receive data from sensors or a data stream.
2. Each input node transmits its value to each of a set (layer) of nodes other than ones in the input layer.
3. Each input to each node in this next layer is multiplied by a **weight**, a coefficient.
4. The sum of the products of each input value and its weight, plus a bias value (one per node), is the value of the node.
5. An activation function is applied to the value of the node to calculate its output value.
6. This output value is then propagated to either
   - the final output layer (by definition, the final step in calculation in a trained neural network), or
   - a following hidden layer, with these steps recurring from step 3.
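
The steps above can be sketched as one forward pass through a tiny network. The layer sizes, the random weights, the zero biases (one per node), and the seed are all assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical sizes: 3 inputs -> 4 hidden nodes -> 1 output node
x = np.array([0.2, 0.7, 0.1])          # step 1: input values
W_hidden = rng.uniform(-1, 1, (3, 4))  # step 3: weights into the hidden layer
b_hidden = np.zeros(4)                 # one bias per hidden node
W_out = rng.uniform(-1, 1, (4, 1))     # weights into the output layer
b_out = np.zeros(1)

# Steps 2-5: weighted sum plus bias, then the activation function
hidden = sigmoid(x @ W_hidden + b_hidden)
# Step 6: hidden outputs feed the output layer, repeating from step 3
output = sigmoid(hidden @ W_out + b_out)

print(hidden.shape, output.shape)  # (4,) (1,)
```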

If a network is being trained, an error value is calculated and used to modify the network's weights and biases for use in another **epoch**, or iteration of the steps above.

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [1]:
import numpy as np
import pandas as pd
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
        'x3' : [1,1,1,1],
         'y':  [1,1,1,0]
       }

# x's are inputs.
# y's are the valid outputs.

df = pd.DataFrame.from_dict(data).astype('int')

df

|   | x1 | x2 | x3 | y |
|---|----|----|----|---|
| 0 | 0  | 0  | 1  | 1 |
| 1 | 1  | 0  | 1  | 1 |
| 2 | 0  | 1  | 1  | 1 |
| 3 | 1  | 1  | 1  | 0 |


In [2]:
inputs = df[['x1','x2', 'x3']]
inputs

|   | x1 | x2 | x3 |
|---|----|----|----|
| 0 | 0  | 0  | 1  |
| 1 | 1  | 0  | 1  |
| 2 | 0  | 1  | 1  |
| 3 | 1  | 1  | 1  |


In [3]:
correct_outputs = [[1], [1], [1], [0]]
correct_outputs

[[1], [1], [1], [0]]

In [4]:
# Define the activation function (the sigmoid function) and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)
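
As a sanity check, the derivative formula can be compared against a numerical slope. This is only a sketch; the sample points in `x` and the step size `h` are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

x = np.linspace(-4, 4, 9)
h = 1e-5

# Central finite difference approximates the slope of sigmoid at each x
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
matches = np.allclose(numeric, sigmoid_derivative(x))
print(matches)  # True
```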


In [5]:
weights = 2 * np.random.random((3,1)) - 1
weights

array([[ 0.82466522],
       [ 0.62099344],
       [-0.67398716]])

In [6]:
weighted_sum = np.dot(inputs, weights)
weighted_sum

array([[-0.67398716],
       [ 0.15067806],
       [-0.05299372],
       [ 0.7716715 ]])

In [7]:
activated_output = sigmoid(weighted_sum)
activated_output # True values are [1, 1, 1, 0]

array([[0.33760462],
       [0.53759841],
       [0.48675467],
       [0.68388236]])

In [8]:
error = correct_outputs - activated_output
error

array([[ 0.66239538],
       [ 0.46240159],
       [ 0.51324533],
       [-0.68388236]])

In [9]:
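# Scale each error by the sigmoid slope to get the weight adjustment.
# Note: the slope here is evaluated at activated_output, not at weighted_sum.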

adjustment = error * sigmoid_derivative(activated_output)
adjustment

array([[ 0.16096844],
       [ 0.10763438],
       [ 0.12100148],
       [-0.15244101]])
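
For comparison, the textbook form of this update evaluates the sigmoid slope at the weighted sum rather than at the activated output. The sketch below reuses the `weighted_sum` values printed earlier; the names `notebook_adjustment` and `textbook_adjustment` are introduced only for this comparison. Both slopes are positive, so the two adjustments always point in the same direction and differ only in size.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

# Values copied from the cells above
weighted_sum = np.array([[-0.67398716], [0.15067806], [-0.05299372], [0.7716715]])
correct_outputs = np.array([[1], [1], [1], [0]])

activated_output = sigmoid(weighted_sum)
error = correct_outputs - activated_output

# As written in this notebook: slope evaluated at the activated output
notebook_adjustment = error * sigmoid_derivative(activated_output)

# Textbook form: slope evaluated at the weighted sum,
# which equals activated_output * (1 - activated_output)
textbook_adjustment = error * sigmoid_derivative(weighted_sum)

print(notebook_adjustment.ravel())
print(textbook_adjustment.ravel())
```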

In [10]:
weights += np.dot(inputs.T, adjustment)
weights

array([[ 0.77985859],
       [ 0.5895539 ],
       [-0.43682387]])

### First epoch completed

In [11]:
for epoch in range(10000):
    
    weighted_sum = np.dot(inputs, weights)
    
    activated_output = sigmoid(weighted_sum)
    
    error = correct_outputs - activated_output
    
    adjustment = error * sigmoid_derivative(activated_output)
    
    weights += np.dot(inputs.T, adjustment)

print("After Training:")
print("Weights")
print(weights)
print()
print("Outputs")
print(activated_output)

After Training:
Weights
[[-11.83813596]
 [-11.83813596]
 [ 17.80606914]]

Outputs
[[0.99999998]
 [0.99744675]
 [0.99744675]
 [0.00281464]]
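
To check the result against the accuracy target, the sigmoid outputs can be thresholded into class labels. This sketch copies the printed outputs above; the 0.5 cutoff is an assumption, though a common one for sigmoid outputs.

```python
import numpy as np

# Outputs and targets copied from the cells above
activated_output = np.array([[0.99999998], [0.99744675], [0.99744675], [0.00281464]])
correct_outputs = np.array([[1], [1], [1], [0]])

# Threshold at 0.5 to turn sigmoid activations into class labels
predictions = (activated_output > 0.5).astype(int)
accuracy = (predictions == correct_outputs).mean()
print(accuracy)  # 1.0
```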


## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [12]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

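|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  | 1       |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  | 0       |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  | 1       |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  | 0       |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  | 1       |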



Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 
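
For intuition, min-max scaling can be written by hand in NumPy. The sketch below uses made-up two-column data and applies the formula that `MinMaxScaler` uses with its default settings: each column is shifted and divided so its values run from 0 to 1.

```python
import numpy as np

# Hypothetical two-feature data on very different scales
X_demo = np.array([[1.0, 100.0],
                   [2.0, 300.0],
                   [3.0, 500.0]])

# (x - column min) / (column max - column min), applied per column
col_min = X_demo.min(axis=0)
col_range = X_demo.max(axis=0) - col_min
X_scaled = (X_demo - col_min) / col_range

print(X_scaled)  # both columns become 0, 0.5, 1
```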

In [40]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

features = list(diabetes)[:-1]  # every column except the 'Outcome' label
diabetes_X = diabetes[features]

min_max_scaler = MinMaxScaler()

# fit_transform returns a NumPy array with each column rescaled to [0, 1]
numpy_scaled = min_max_scaler.fit_transform(diabetes_X)

diabetes_X_scaled = pd.DataFrame(data=numpy_scaled, columns=features)
diabetes_X_scaled.describe()

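|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  |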


In [26]:
y = diabetes['Outcome']
len(y)

list(y)[0]

1

In [16]:
# Practice with Normalizer, which rescales each row (sample) to unit length
from sklearn.preprocessing import Normalizer
transformer = Normalizer()

X = [[4, 1, 2, 2],
     [1, 3, 9, 3],
     [5, 7, 5, 1]]

transformer.transform(X)

array([[0.8, 0.2, 0.4, 0.4],
       [0.1, 0.3, 0.9, 0.3],
       [0.5, 0.7, 0.5, 0.1]])
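
The same numbers can be reproduced by hand, which shows what the transformer is doing: each row is divided by its own Euclidean (L2) length. This sketch reuses the practice matrix above; `row_norms` and `X_unit` are names introduced here.

```python
import numpy as np

X = np.array([[4, 1, 2, 2],
              [1, 3, 9, 3],
              [5, 7, 5, 1]])

# Divide each row by its own Euclidean (L2) length
row_norms = np.linalg.norm(X, axis=1, keepdims=True)  # row lengths 5, 10, 10
X_unit = X / row_norms

print(X_unit)
```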

In [17]:
X_np = np.array(X)
X_np.shape

(3, 4)

In [18]:
Xtest = diabetes_X[:5]
Xtest

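|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  |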


In [19]:
# y_test = y[1][:5]
y_test = list(y[:5])
y_test

[1, 0, 1, 0, 1]

In [80]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler


class Perceptron(object):
    """Single sigmoid unit trained by repeated weight adjustment (no bias term)."""

    def __init__(self, niter=10):
        self.niter = niter  # number of training epochs

    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def __sigmoid_derivative(self, x):
        sx = self.__sigmoid(x)
        return sx * (1 - sx)

    def fit(self, X, y):
        correct_outputs = np.array(y)

        # Scale every feature column to [0, 1] before training
        inputs = MinMaxScaler().fit_transform(np.array(X))

        # Randomly initialize one weight per feature in [-1, 1)
        weights = 2 * np.random.random(inputs.shape[1]) - 1

        for epoch in range(self.niter):
            weighted_sum = np.dot(inputs, weights)

            # Activate!
            activated_output = self.__sigmoid(weighted_sum)

            # Calculate error
            error = correct_outputs - activated_output

            # Update the weights
            adjustment = error * self.__sigmoid_derivative(activated_output)
            weights += np.dot(inputs.T, adjustment)

        self.weights = weights
        return self.weights

    def predict(self, X):
        """Return the sigmoid activation for each row in X (an iterable of rows).

        Rows are used as given: no scaling and no unit step are applied,
        so values above 0.5 can be read as class 1.
        """
        return [self.__sigmoid(np.dot(self.weights, row)) for row in X]

In [81]:
p = Perceptron()

p.fit(Xtest, y_test)
p.predict([X_test_row])
# p.weights

[1.0]
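
One possible refinement, sketched under stated assumptions rather than offered as the assignment's intended solution: remember the scaling learned in `fit` so that `predict` applies it too, add a bias through a constant input column, and threshold the output into 0/1 labels. The class name `ScaledPerceptron`, the `seed` parameter, the iteration count, and the by-hand min-max scaling are all choices made for this sketch. It is demonstrated on the NAND table from earlier so that it runs without downloading data.

```python
import numpy as np


class ScaledPerceptron:
    """Hypothetical variant: sigmoid unit with a bias and remembered scaling."""

    def __init__(self, niter=1000, seed=0):
        self.niter = niter
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def _prepare(self, X):
        # Min-max scale with the training statistics, then append a constant
        # column of ones so the last weight acts as the bias
        scaled = (np.asarray(X, dtype=float) - self.min_) / self.range_
        return np.hstack([scaled, np.ones((scaled.shape[0], 1))])

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        self.range_ = np.where(span == 0, 1.0, span)  # avoid dividing by zero
        inputs = self._prepare(X)
        self.weights = self.rng.uniform(-1, 1, inputs.shape[1])
        for _ in range(self.niter):
            output = self._sigmoid(inputs @ self.weights)
            # Error times the sigmoid slope at the weighted sum
            adjustment = (y - output) * output * (1 - output)
            self.weights += inputs.T @ adjustment
        return self

    def predict(self, X):
        """Return 0/1 labels by thresholding the sigmoid output at 0.5."""
        scores = self._sigmoid(self._prepare(X) @ self.weights)
        return (scores > 0.5).astype(int)


# NAND truth table from earlier in the notebook
X_nand = [[0, 0], [1, 0], [0, 1], [1, 1]]
y_nand = [1, 1, 1, 0]

model = ScaledPerceptron(niter=5000).fit(X_nand, y_nand)
nand_predictions = model.predict(X_nand)
print(nand_predictions)
```

The same object could then be fit on the diabetes features and `Outcome` column, since `predict` reuses the minimum and range stored during `fit`.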

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?