The goal of this project is to train a simple neural network.

We will start with the following toy model:



![image-3.png](https://drive.google.com/uc?id=1SjPymYoI0iJPYAJqSKLttuUXjJi7fgB0)



# NN Training Steps


---
## 1. Feed forward
---

- Apply a linear transformation followed by an activation function to compute the neuron values in each successive layer of the network


> \begin{array}{ccccccccc}
\text{Input = Output Layer 0} & & \text{Layer 1} & & \text{Output Layer 1} & & \text{Layer 2} & & \text{Output Layer 2}\\
\hline
\vec{x} = \vec{a}^{(0)} & ⇒  & \mathbf{W}^{(1)} \cdot \vec{x} + \vec{b}^{(1)} = \vec{z}^{(1)} & ⇒ &
 \sigma(\vec{z}^{(1)}) = \vec{a}^{(1)} & ⇒ &  \mathbf{W}^{(2)} \cdot \vec{a}^{(1)} + \vec{b}^{(2)} = \vec{z}^{(2)} & ⇒ &  \sigma(\vec{z}^{(2)}) = \vec{a}^{(2)} = \hat{y}
\end{array}

#### LAYER 1

1. Linear Transformation:

$$
\vec{z}_1 = \mathbf{W}_1 \cdot \vec{x} + \vec{b}_1
$$

2. Activation Function (sigmoid):

$$
\vec{a}_1 = \sigma(\vec{z}_1)
$$

#### LAYER 2

1. Linear Transformation:

$$
\vec{z}_2 = \mathbf{W}_2 \cdot \vec{a}_1 + \vec{b}_2
$$

2. Activation Function (sigmoid):

$$
\vec{a}_2 = \sigma(\vec{z}_2)
$$


![image-5.png](https://drive.google.com/uc?id=1ZmRXr2zCthT5kS3K4Ah532whDvKEX_mm)
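
The equations above use $\sigma$ without writing it out. Assuming it is the standard logistic sigmoid (the function the `sigmoid` helper later in this notebook is meant to implement), it and its derivative are:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
$$

For a vector $\vec{z}$, $\sigma$ is applied to each component separately.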






---
## 2. Define the loss/Cost function
---

---
## 3. Backpropagate
---

Define $\vec{\theta} = \{ W^{(1)}_{1,1},W^{(1)}_{2,1},...,b_1^{(1)},...\}$

Then the gradient of the Loss function is

$$
\nabla_{\vec{\theta}} C (\vec{\theta})
$$

We will visualize this in a smaller case for clarity:
(in this case W is used for the weight matrices instead of A)

![image-4.png](https://drive.google.com/uc?id=1xyP1ocJGA-pS6z1BUpNISUmlTK09L4g8)
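
For the two-layer network from the feed-forward section, the chain rule can be written out as below. This is a sketch under one assumption that the text does not state: the cost for a single sample is the squared error $C = \tfrac{1}{2}(\hat{y} - y)^2$. The symbols $\vec{\delta}^{(1)}$ and $\vec{\delta}^{(2)}$ are introduced here only as shorthand, and $\odot$ denotes elementwise multiplication.

$$
\begin{aligned}
\vec{\delta}^{(2)} &= (\vec{a}^{(2)} - y) \odot \vec{a}^{(2)} \odot (1 - \vec{a}^{(2)}) \\
\frac{\partial C}{\partial \mathbf{W}^{(2)}} &= \vec{\delta}^{(2)} \, (\vec{a}^{(1)})^{T}, \qquad \frac{\partial C}{\partial \vec{b}^{(2)}} = \vec{\delta}^{(2)} \\
\vec{\delta}^{(1)} &= \bigl( (\mathbf{W}^{(2)})^{T} \, \vec{\delta}^{(2)} \bigr) \odot \vec{a}^{(1)} \odot (1 - \vec{a}^{(1)}) \\
\frac{\partial C}{\partial \mathbf{W}^{(1)}} &= \vec{\delta}^{(1)} \, \vec{x}^{T}, \qquad \frac{\partial C}{\partial \vec{b}^{(1)}} = \vec{\delta}^{(1)}
\end{aligned}
$$

With a different cost function, only the factor $(\vec{a}^{(2)} - y)$ in the first line changes.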

---
## 4. Update Values of Weights
---

(h is the learning rate)

![image-5.png](https://drive.google.com/uc?id=1E5OIQQMK-wXDHxSbMdYNacifuAD9weC_)
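
In code, the update step amounts to moving each parameter a small distance against its gradient. A minimal sketch, assuming plain gradient descent; the function name `gradient_descent_step` and the sample numbers are made up for illustration:

```python
import numpy as np

def gradient_descent_step(param, grad, h):
    """Return the parameter after one gradient descent update with learning rate h."""
    return param - h * grad

# Made-up weight matrix and gradient, only to show the arithmetic.
W = np.array([[0.5, -0.2], [0.1, 0.3]])
dW = np.array([[0.1, 0.0], [-0.2, 0.4]])

W_new = gradient_descent_step(W, dW, h=0.1)
print(W_new)
```

The same step applies unchanged to the bias vectors.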

---
## 5. Repeat!
---

# Neural Network

>   #### Now we're going to complicate the simple Neural Network that we showed above.  Our input layer (in $\color{blue} {\text{blue}}$) is now in $\mathbb{R}^3$ and we are going to add a hidden layer (in $\color{lightgreen} {\text{green}}$) also in $\mathbb{R}^3$.  Our output (in $\color{red} {\text{red}}$) is still in $\mathbb{R}$.

![image1](https://drive.google.com/uc?id=16GbDnjSC5PjilwKReDqNJC_BirxuIoko)




> #### Here you can see all the neurons, weight matrices, and biases labeled and the dimension of each given.

![image2](https://drive.google.com/uc?id=1OyfiK0-U9xObswZdhxZXM0H8a0nHW_9r)




# Our dataset

> #### Let's first take a look at the dataset we will use to build our neural network, the `penguin` dataset.

In [None]:
import pandas as pd
X = pd.read_csv('penguins1.csv', usecols=range(1,4))
Y = pd.read_csv('penguins2.csv', usecols=range(1,2))

In [None]:
X

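| | bill_length_mm | bill_depth_mm | flipper_length_mm |
|---|---|---|---|
| 0 | 39.1 | 18.7 | 181 |
| 1 | 39.5 | 17.4 | 186 |
| 2 | 40.3 | 18.0 | 195 |
| 3 | 36.7 | 19.3 | 193 |
| 4 | 39.3 | 20.6 | 190 |
| ... | ... | ... | ... |
| 260 | 47.2 | 13.7 | 214 |
| 261 | 46.8 | 14.3 | 215 |
| 262 | 50.4 | 15.7 | 222 |
| 263 | 45.2 | 14.8 | 212 |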


In [None]:
Y

| | species |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| ... | ... |
| 260 | 0 |
| 261 | 0 |
| 262 | 0 |
| 263 | 0 |


> #### We notice that the `X` dataset consists of 265 different samples (penguins) with 3 features each (`bill_length_mm`, `bill_depth_mm`, and `flipper_length_mm`).  

> #### Our neural network takes as input each row of the `X` dataset (which is a vector in $\mathbb{R}^3$) and outputs a value that classifies the species of penguin.  We have 2 types of penguins in the `Y` dataset, 0 and 1.  

---
# Make Neural Network!

## TODO: MAKE A COPY OF THIS ASSIGNMENT AND EDIT

Tips:
- use `print` statements to debug your code
- code small helper functions first and check if correct by running practice tests
- check if matrices are the **correct dimension for matrix multiplication** by printing the matrix or printing the dimension of the matrix (may need to transpose by using `matrix.T` in some cases)
- make sure you understand the math behind feeding forward and backpropagating before implementing the `feed_forward` and `gradient` functions
- The internet is your friend! Look up unknown Python syntax and error messages
- Other potentially useful functions: `np.dot`, `np.random.normal`, `np.exp`, `pd.DataFrame`
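
As a quick illustration of the dimension tip, `.shape` shows whether two arrays can be multiplied, and `.T` swaps the axes when they cannot. The arrays here are placeholders, not the assignment's weights:

```python
import numpy as np

W = np.random.normal(size=(3, 1))   # a 3-by-1 matrix
a = np.random.normal(size=(3, 1))   # a column vector in R^3

# Both are (3, 1), so np.dot(W, a) would raise a shape error.
print(W.shape, a.shape)

# Transposing W gives (1, 3), and (1, 3) times (3, 1) yields a (1, 1) result.
z = np.dot(W.T, a)
print(z.shape)
```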

# Load Libraries

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize

# Create Functions

## Small Helper Functions:

#### `sigmoid(z)`
- activation function
- takes in a numpy array `z` and returns sigmoid `sig_z`

In [None]:
def sigmoid(z):
    # edit
    return sig_z
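
One possible way to fill in the stub above, shown as a sketch rather than the expected solution. It assumes the standard logistic sigmoid and relies on `np.exp` acting elementwise on arrays:

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic sigmoid, 1 / (1 + exp(-z))."""
    sig_z = 1.0 / (1.0 + np.exp(-z))
    return sig_z

# Quick practice test: values should lie strictly between 0 and 1.
z = np.array([-2.0, 0.0, 2.0])
sig_z = sigmoid(z)
print(sig_z)
```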

#### `sigmoid_derivative(a)`
- derivative of activation function - for backpropagation
- takes in numpy array `a`, such that `a` = `sigmoid(z)`, and returns the sigmoid derivative `sig_deriv_a`

In [None]:
def sigmoid_derivative(a):
    # edit
    return sig_deriv_a
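
A sketch of one way to write this helper, using the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. Note that the argument is the already-activated value `a`, not `z`; the sample `z` values are made up:

```python
import numpy as np

def sigmoid_derivative(a):
    """Derivative of the sigmoid, written in terms of a = sigmoid(z)."""
    sig_deriv_a = a * (1.0 - a)
    return sig_deriv_a

# Practice test: compute a from some z, then take the derivative.
z = np.array([-1.0, 0.0, 2.0])
a = 1.0 / (1.0 + np.exp(-z))
print(sigmoid_derivative(a))
```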

#### `initialize_random_weights(i,j)`
- initialize an `i` by `j` numpy matrix with random values (from a standard normal distribution)
- takes in integer dimensions `i` and `j` and returns a `matrix` of random values

In [None]:
def initialize_random_weights(i,j):
    # edit
    return matrix
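
A possible implementation, sketched with `np.random.normal` from the tips list. The `loc` and `scale` arguments are spelled out only to make the standard normal explicit:

```python
import numpy as np

def initialize_random_weights(i, j):
    """Return an i-by-j matrix of samples from a standard normal distribution."""
    matrix = np.random.normal(loc=0.0, scale=1.0, size=(i, j))
    return matrix

W_example = initialize_random_weights(3, 1)
print(W_example.shape)
```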

#### `cost_function(y, y_hat)`
- takes in two values `y`, `y_hat` and returns `error` between them

In [None]:
def cost_function(y, y_hat):
    # edit
    return error
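
The notebook leaves the choice of cost open. As one option, a sketch assuming the squared error $\tfrac{1}{2}(\hat{y} - y)^2$; other costs, such as cross-entropy, would also work with a matching gradient:

```python
import numpy as np

def cost_function(y, y_hat):
    """Half squared error between target y and prediction y_hat (an assumed choice)."""
    error = 0.5 * np.sum((np.asarray(y_hat) - np.asarray(y)) ** 2)
    return error

# Made-up target and prediction for a quick check.
example_error = cost_function(1, 0.5)
print(example_error)
```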

## Major Functions:

#### `feed_forward(x)`
- takes in input numpy array `x` and applies linear transformation and sigmoid function for layers 1 and 2
- returns output values for layers 1 and 2: `a1` = (a11, a12, a13) and `a2` = (a21)
- INITIALIZE your weight matrices (W1 and W2 ) before doing this (the code to initialize is below in the loading data section)

In [None]:
def feed_forward(x):
    # edit
    return a1, a2
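
A sketch of how the forward pass might look, not the only valid layout. It assumes the weights are global variables with the shapes used in the initialization cell of this notebook (`W2` stored as 3 by 1, hence the transpose), and the input vector is made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same shapes as the notebook's initialization cell.
W1 = np.random.normal(size=(3, 3))
W2 = np.random.normal(size=(3, 1))
b1 = np.random.normal(size=(3, 1))
b2 = np.random.normal(size=(1, 1))

def feed_forward(x):
    """Forward pass for one sample x of shape (3, 1)."""
    z1 = np.dot(W1, x) + b1        # (3, 3) times (3, 1) gives (3, 1)
    a1 = sigmoid(z1)
    z2 = np.dot(W2.T, a1) + b2     # W2 is stored as (3, 1), so transpose it
    a2 = sigmoid(z2)               # shape (1, 1)
    return a1, a2

x = np.array([[0.2], [0.1], [0.97]])  # hypothetical input sample
a1, a2 = feed_forward(x)
print(a1.shape, a2.shape)
```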

#### `gradient(x,y,a1,a2)`
- takes in input `x` and output `y` as well as calculated hidden layer `a1` and calculated output layer `a2`
- calculates the gradient (see equations above) and returns change in weight matrices `dW1`, `dW2` and biases `db1`, `db2`

In [None]:
def gradient(x, y, a1, a2):
    # edit
    return dW1, db1, dW2, db2
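
A sketch of the gradient computation under two assumptions: the cost is the half squared error $\tfrac{1}{2}(a_2 - y)^2$, and the weights are globals shaped as in the initialization cell (`W2` is 3 by 1). The names `delta1` and `delta2` are shorthand introduced here, and the sample `x` and `y` are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Same shapes as the notebook's initialization cell.
W1 = np.random.normal(size=(3, 3))
W2 = np.random.normal(size=(3, 1))
b1 = np.random.normal(size=(3, 1))
b2 = np.random.normal(size=(1, 1))

def feed_forward(x):
    a1 = sigmoid(np.dot(W1, x) + b1)
    a2 = sigmoid(np.dot(W2.T, a1) + b2)
    return a1, a2

def gradient(x, y, a1, a2):
    """Gradients of C = 0.5 * (a2 - y)**2 (an assumed cost) for one sample."""
    delta2 = (a2 - y) * a2 * (1.0 - a2)              # (1, 1)
    dW2 = np.dot(a1, delta2.T)                       # (3, 1), same shape as W2
    db2 = delta2
    delta1 = np.dot(W2, delta2) * a1 * (1.0 - a1)    # (3, 1)
    dW1 = np.dot(delta1, x.T)                        # (3, 3), same shape as W1
    db1 = delta1
    return dW1, db1, dW2, db2

x = np.array([[0.2], [0.1], [0.97]])  # hypothetical input sample
y = 1.0                               # hypothetical label
a1, a2 = feed_forward(x)
dW1, db1, dW2, db2 = gradient(x, y, a1, a2)
print(dW1.shape, db1.shape, dW2.shape, db2.shape)
```

Each gradient has the same shape as the parameter it belongs to, which makes the update step a plain elementwise subtraction.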

#### `train(inputs, outputs, learning_rate, epochs)`
- applies helper functions to train neural network
- does not need to return anything, works by updating weights and biases `W1`, `W2`, `b1`, `b2`

In [None]:
def train(inputs, outputs, learning_rate, epochs):
    # declare globals up front so the updated values persist outside the function
    global W1, b1, W2, b2

    for epoch in range(epochs):
        for x,y in zip(inputs,outputs):
            # reshape x to correct vector size
            x = np.reshape(x, (3,1))

            # edit: feed forward

            # edit: calculate error using cost function

            # edit: backpropagate using gradient function

            # edit: reassign matrices

#### `test(inputs, outputs)`
- run after training to see if neural network works well on other data points that it didn't see during training
- takes in test `inputs` and `outputs` and returns `percent_correct`

In [None]:
def test(inputs, outputs):

    count_correct = 0
    for x,y in zip(inputs,outputs):

        # edit: feed forward

        # edit: determine if result is correct & update count_correct value


    # edit: calculate percent_correct = 100*count_correct/total_count
    return percent_correct
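
One way to decide whether a prediction is correct, assuming the sigmoid output is read as a score for species 1 and thresholded at 0.5. Both the threshold and the helper name `predicted_label` are assumptions, and the outputs and labels below are made-up values:

```python
import numpy as np

def predicted_label(a2, threshold=0.5):
    """Map the network output a2 (shape (1, 1)) to a class label 0 or 1."""
    return int(a2.item() >= threshold)

# Hypothetical network outputs and true labels for three samples.
outputs = [np.array([[0.91]]), np.array([[0.12]]), np.array([[0.55]])]
labels = [1, 0, 0]

count_correct = sum(predicted_label(a2) == y for a2, y in zip(outputs, labels))
percent_correct = 100 * count_correct / len(labels)
print(percent_correct)
```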

# Load Data and Run Neural Network
No editing is needed below, but you can try out different values for `learning_rate` or `epochs`.

**NOTE**: make sure to rerun `initialize_random_weights` each time you train the network (otherwise the `train` function will already have edited the matrices, and you want them to start random)

### Load Data

In [None]:
X = pd.read_csv('penguins1.csv', usecols=range(1,4))
X = X.to_numpy()
Y = pd.read_csv('penguins2.csv', usecols=range(1,2))
Y = Y.to_numpy()

# normalize X dataset
X = normalize(X, axis=1, norm='l2')

### Split into test (`x_test`, `y_test`) and train (`x_train`, `y_train`) datasets

In [None]:
x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size=0.2)

### Initialize values of weights `W1`, `W2` and biases `b1`, `b2`

In [None]:
#Weights
W1 = initialize_random_weights(3,3)
W2 = initialize_random_weights(3,1)

#Biases
b1 = initialize_random_weights(3,1)
b2 = initialize_random_weights(1,1)

learning_rate = .1
epochs = 500

### `train`

In [None]:
train(x_train, y_train, learning_rate, epochs)

### `test`

In [None]:
test(x_test, y_test)

# Analysis

1. How are the weight matrices updated?
    - print the weight matrices every 50 epochs in `train` function
    
    

2. How does the error change over epochs?
    - implement a method of calculating and graphing the change in error of your neural network in the `train` function
    - I did this by returning a pandas dataframe, `error_df`, from the `train` function that contained two columns, the epoch number and the calculated error for that epoch (remember to average over the whole epoch).  Then I plotted using `error_df.plot(x = "epoch", y = "error")`. There are many different ways this could be done though!
    
    The result should look something like this:
    
    ![image-4.png](https://drive.google.com/uc?id=10LqjzKCplZjWXnSxuGjQYF8ycTToNx9K)
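
A minimal sketch of the `error_df` approach described in item 2, with made-up error values standing in for the per-epoch averages that `train` would collect:

```python
import pandas as pd

# Hypothetical mean error per epoch; in practice these come from `train`.
epoch_errors = [0.25, 0.18, 0.12, 0.09]

error_df = pd.DataFrame({
    "epoch": list(range(len(epoch_errors))),
    "error": epoch_errors,
})
print(error_df)
```

Calling `error_df.plot(x="epoch", y="error")` on this frame then draws the curve, provided matplotlib is installed.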

  