# Understanding our Class Library
We'll use the gwu_nn library to dive a bit deeper into how neural networks work and to get a better handle on how we can manipulate our layers and activation functions to build better networks.

In [1]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from gwu_nn.gwu_network import GWUNetwork
from gwu_nn.layers import Dense
from gwu_nn.activation_layers import Sigmoid

## Setting up our Data
To explore how our GWU_Network library works, we'll reuse the example data from the ML Crash Course. Because the data is not linearly separable, solving this problem will also show the flexibility of our network.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

y_col = 'Survived'
x_cols = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
df = pd.read_csv('titanic_data.csv')
y = np.array(df[y_col]).reshape(-1, 1)
orig_X = df[x_cols]

In [3]:
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Let's standardize our features
scaler = preprocessing.StandardScaler()
stand_X = scaler.fit_transform(orig_X)
X = stand_X

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)


In [4]:
print(X_train.shape)
print(X_train[0])

(478, 7)
[ 0.91123237  0.75905134 -0.32371271 -0.55170307 -0.50589515 -0.65607592
 -0.50349899]


In [5]:
print(y_train[0].shape)
print(y_train[0])

(1,)
[1]


## The GWUNetwork Class
This class is the main structure that houses our network. Like models in many standard libraries, a GWUNetwork is built up by adding layers to it.

```python
class GWUNetwork():

    def __init__(self):
        self.layers = []
        self.loss = None
        self.loss_prime = None

    # Method bodies are omitted here; only the interface is shown.
    def add(self, layer): ...

    def get_weights(self): ...

    #def set_loss(self, loss):

    def predict(self, input_data): ...

    def measure_loss(self, x, y): ...

    def fit(self, x_train, y_train, epochs): ...
```

So when we create a network, it is initially empty because it doesn't contain any layers yet.

In [6]:
network = GWUNetwork()
print(network)

Model: Empty


## Adding Layers
To begin building our network we need to start adding functionality or layers. Each layer we add will increase the complexity and depth of our model. However, we'll find that for most simple problems 3-4 layers are typically enough.

Currently the GWU_NN library only supports densely connected layers, meaning that every input node is connected to every output node.

<img src="https://i.stack.imgur.com/iHW2o.jpg">

In [7]:
network.add(Dense(16, add_bias=True, activation='relu', input_size=7))
print(network)

Model:
Dense - (Input:7, Output:16)
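To make "every input node is connected to every output node" concrete, here is a minimal NumPy sketch of the shapes involved in a 7-input, 16-output dense layer. The variable names and the random initialization are assumptions for illustration only; they are not taken from gwu_nn.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_outputs = 7, 16                        # same sizes as the layer above
weights = rng.normal(size=(n_inputs, n_outputs))   # one weight per (input, output) pair
bias = np.zeros((1, n_outputs))                    # one bias per output node

x = rng.normal(size=(1, n_inputs))                 # a single sample as a row vector
out = x @ weights + bias                           # each output is a weighted sum of all inputs

n_params = weights.size + bias.size
print(out.shape)   # (1, 16)
print(n_params)    # 128
```

The weight matrix has one entry for every input/output pair, which is why its shape is (7, 16).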



### Our Dense Layer
```python
class Dense:
    def __init__(self, output_size, add_bias=False, activation=None, input_size=None):
        super().__init__(activation)
        self.type = None
        self.name = "Dense"
        self.input_size = input_size
        self.output_size = output_size
        self.add_bias = add_bias
        
    def forward_propagation(self, input):
        self.input = input
        output = np.dot(input, self.weights)
        if self.add_bias:
            return output + self.bias
        else:
            return output

    def backward_propagation(self, output_error, learning_rate):
        input_error = np.dot(output_error, self.weights.T)
        weights_error = np.dot(self.input.T, output_error)

        self.weights -= learning_rate * weights_error
        if self.add_bias:
            self.bias -= learning_rate * output_error
        return input_error
```

Notice that our layer code defines both the forward and backward propagation steps. Because of how neural networks work (and the core principles of computational graphs), each layer can compute its forward and backward pass without knowing the structure of the rest of the network.

Forward Propagation:
 - When calculating forward propagation, a layer only needs to know the values coming into it. Thus our model simply passes each layer's output to the next layer.
 
Backward Propagation:
 - This works similarly to forward propagation but in reverse. The major difference is that as the error propagates backwards, each layer also updates its weights.
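The two passes can be sketched with a small stand-in layer. This is a hedged illustration, not the gwu_nn implementation: `DenseSketch`, `forward_all`, the squared-error loss, and the initialization scale are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

class DenseSketch:
    """Hypothetical stand-in for a dense layer; not the gwu_nn class."""

    def __init__(self, input_size, output_size, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(scale=0.1, size=(input_size, output_size))
        self.bias = np.zeros((1, output_size))

    def forward(self, x):
        self.x = x                                    # cache the input for the backward pass
        return x @ self.weights + self.bias

    def backward(self, output_error, learning_rate):
        input_error = output_error @ self.weights.T   # error handed to the previous layer
        weights_error = self.x.T @ output_error       # gradient of the loss w.r.t. the weights
        self.weights -= learning_rate * weights_error
        self.bias -= learning_rate * output_error
        return input_error


layers = [DenseSketch(7, 16, seed=1), DenseSketch(16, 1, seed=2)]
x = np.ones((1, 7))
target = np.array([[1.0]])

def forward_all(x):
    out = x
    for layer in layers:              # each layer only sees the previous layer's output
        out = layer.forward(out)
    return out

before = forward_all(x)
loss_before = float(((before - target) ** 2).sum())

error = 2 * (before - target)         # derivative of squared error w.r.t. the output
for layer in reversed(layers):        # same chain, walked in reverse
    error = layer.backward(error, learning_rate=0.01)

loss_after = float(((forward_all(x) - target) ** 2).sum())
print(loss_before, loss_after)        # the loss should drop after one update
```

Each layer needs only its own cached input and the error arriving from the layer after it, which is why the network can treat layers as interchangeable building blocks.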

## Adding a Hidden Layer
To take true advantage of our network, we need a hidden layer. In our GWU_NN models each layer defines its number of input and output nodes, so adding a second layer turns the first one into a hidden layer.

*Note: As we'll only have one hidden layer, the layer we add here is the output layer, so its output size should match the size/shape of our final prediction.*

In [8]:
network.add(Dense(1, True))
print(network)

Model:
Dense - (Input:7, Output:16)
Dense - (Input:16, Output:1)



## Adding Activation Layers/Functions
As previously discussed, activation layers help shape our network and let us interpret results for a given prediction/outcome. In this instance we will continue to use the Sigmoid activation layer/function to predict binary classes.

In [9]:
network.add(Sigmoid())
print(network)

Model:
Dense - (Input:7, Output:16)
Dense - (Input:16, Output:1)
Sigmoid Activation


### Sigmoid Activation Layer/Function

Notice that the sigmoid layer/function works nearly identically to the Dense layer, except that here:
 - activation -> forward propagation
 - activation_partial_derivative -> backward propagation
 
``` python
class SigmoidActivation(ActivationFunction):

    @classmethod
    def activation(cls, x):
        return 1 / (1 + np.exp(-x))

    @classmethod
    def activation_partial_derivative(cls, x):
        return np.exp(-x) / (1 + np.exp(-x))**2
```
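As a quick sanity check, the derivative expression above can be compared against the familiar identity s(x) * (1 - s(x)) and against a finite-difference estimate. The standalone function names below are assumptions for this sketch; they mirror the class methods but are not part of gwu_nn.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    # Same expression as activation_partial_derivative above
    return np.exp(-x) / (1 + np.exp(-x)) ** 2

x = np.linspace(-5, 5, 11)
s = sigmoid(x)

# The derivative can equivalently be written as s * (1 - s)
same_as_identity = np.allclose(sigmoid_prime(x), s * (1 - s))

# Central finite difference as an independent check
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
matches_numeric = np.allclose(sigmoid_prime(x), numeric, atol=1e-8)

print(same_as_identity, matches_numeric)
print(sigmoid(0.0), sigmoid_prime(0.0))  # 0.5 0.25
```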

## Compiling Our Model
Finally, to complete our model we need to compile it. This final step defines our loss function and learning rate; in larger libraries it also sets the optimizer and more.

In [10]:
network.compile(loss='log_loss', lr=0.01)
print(network)

Model:
Dense - (Input:7, Output:16)
Dense - (Input:16, Output:1)
Sigmoid Activation


### Our Loss Function
We can see that our **log_loss** function exactly parallels what we had in our perceptron/logistic regression computational graph.

```python
class LogLoss(LossFunction):

    @classmethod
    def loss(cls, y_true, y_pred):
        return np.mean(-np.log(y_pred)*y_true + -np.log(1-y_pred)*(1-y_true))

    @classmethod
    def loss_partial_derivative(cls, y_true, y_pred):
        return -np.sum(y_true - y_pred)
```
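A small worked example may help connect the two methods. For a single sample, `y_pred - y_true` (that is, `-(y_true - y_pred)`) equals the derivative of the log loss with respect to the value fed into the sigmoid (the logit). The sketch below checks that numerically. The function names are assumptions for illustration, and how gwu_nn combines this derivative with a separate Sigmoid layer is not covered here.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(y_true, y_pred):
    # Same formula as LogLoss.loss above
    return np.mean(-np.log(y_pred) * y_true - np.log(1 - y_pred) * (1 - y_true))

# Loss on a few hypothetical predictions
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])
batch_loss = log_loss(y_true, y_pred)
print(round(float(batch_loss), 4))   # 0.2798

# Single sample: compare the analytic gradient w.r.t. the logit to a finite difference
z, y = 0.3, 1.0
h = 1e-6
numeric = (log_loss(y, sigmoid(z + h)) - log_loss(y, sigmoid(z - h))) / (2 * h)
analytic = sigmoid(z) - y            # i.e. -(y_true - y_pred)
print(np.isclose(numeric, analytic, atol=1e-6))
```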

## Fitting or Training our Model
With everything in place, the last step is to train our model:
 - *batch_size (currently not supported): How many datapoints to train on for each backprop/gradient descent*
 - *epochs: How many times we loop over the data*

In [22]:
print(X_train.shape)
print(X_train[0].shape)
# Reshape each record to (1, n_features); using -1 keeps this cell safe to re-run
# Equivalent: np.stack([record.reshape(1, -1) for record in X_train])
X_train = X_train.reshape(X_train.shape[0], 1, -1)
print(X_train.shape)
print(X_train[0].shape)

(478, 7)
(7,)
(478, 1, 7)
(1, 7)


In [12]:
network.fit(X_train, y_train, epochs=100, batch_size=20)

epoch 1/100   error=0.623918
epoch 11/100   error=0.435275
epoch 21/100   error=0.411720
epoch 31/100   error=0.401982
epoch 41/100   error=0.394857
epoch 51/100   error=0.390294
epoch 61/100   error=0.382705
epoch 71/100   error=0.389418
epoch 81/100   error=0.391416
epoch 91/100   error=0.389163


Here is a streamlined representation of our network:

```python
network = GWUNetwork()
network.add(Dense(16, True, activation='relu', input_size=7))
network.add(Dense(1, True, activation='sigmoid'))
network.compile(loss='log_loss', lr=0.01)
network.fit(X_train, y_train, epochs=100, batch_size=20)
```

### Deeper Look at Training
```python
    def fit(self, x_train, y_train, batch_size, epochs):
        # sample dimension first
        samples = len(x_train)

        # training loop
        for i in range(epochs):
            err = 0
            for j in range(samples):
                # forward propagation
                output = x_train[j].reshape(1, -1)
                for layer in self.layers:
                    output = layer.forward_propagation(output)

                # compute loss (for display purpose only)
                err += self.loss(y_train[j], output)

                # backward propagation
                error = self.loss_prime(y_train[j], output)
                for layer in reversed(self.layers):
                    error = layer.backward_propagation(error, self.learning_rate)

            # calculate average error on all samples
            if i % 10 == 0:
                err /= samples
                print('epoch %d/%d   error=%f' % (i + 1, epochs, err))
```
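The loop above updates the weights after every single sample. Here is a minimal, self-contained sketch of that same pattern on a one-layer logistic model. The toy data, the clipping constant, and all names below are assumptions for illustration; this is not the gwu_nn code.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(y_true, y_pred):
    p = np.clip(y_pred, 1e-12, 1 - 1e-12)   # guard against log(0)
    return float(np.mean(-np.log(p) * y_true - np.log(1 - p) * (1 - y_true)))

# Hypothetical toy data: the label is 1 when the two features sum to a positive number
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

weights = np.zeros((2, 1))
bias = np.zeros((1, 1))
learning_rate = 0.1
epochs = 20
history = []

for epoch in range(epochs):
    err = 0.0
    for j in range(len(X)):
        x_j = X[j].reshape(1, -1)                   # one sample as a row vector
        pred = sigmoid(x_j @ weights + bias)        # forward pass
        err += log_loss(y[j], pred)                 # accumulated for display only
        grad = pred - y[j]                          # gradient of log loss w.r.t. the logit
        weights -= learning_rate * (x_j.T @ grad)   # update right away, sample by sample
        bias -= learning_rate * grad
    history.append(err / len(X))                    # average error for this epoch

train_acc = float(((sigmoid(X @ weights + bias) >= 0.5) == (y == 1)).mean())
print(history[0], history[-1], train_acc)
```

Because the weights change after each sample, the error reported for an epoch is an average over predictions made with slightly different weights, which is one reason it can wobble from epoch to epoch.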

In [13]:
network.predict(X_test[:2])

[array([[0.08301823]]), array([[0.82756661]])]

In [14]:
network.layers[0].weights.shape

(7, 16)

In [15]:
network.layers[0].weights

array([[ 0.19988928, -0.19867889,  0.40365918, -0.04466178,  0.36097635,
         0.52146408, -0.95912783,  0.11570432,  0.06624123,  0.18749753,
         0.45480326,  0.4796858 , -0.08597312, -0.1973516 , -1.10959328,
         0.00513415],
       [-0.08526921, -0.04602341, -0.53379116, -0.40084962, -0.15009928,
         0.21694768, -0.69326546, -0.07342725,  0.067196  ,  0.34496763,
        -0.16189939, -0.11395936,  0.47138062,  1.02167249, -0.75463628,
        -0.17937106],
       [ 0.03917447,  0.12356748, -0.15450242,  0.07116774, -0.06888449,
         0.18867709, -0.80778195,  0.12348489, -0.31324665,  0.15677182,
         0.35003437,  0.24012905,  0.26424629,  0.32659497,  0.30654192,
        -0.03321727],
       [ 0.35623195,  0.37366246, -0.16969693, -0.15839175,  0.16319463,
        -0.43418166, -0.18652879,  0.17713857, -0.16295576,  0.03891437,
         0.49707429,  0.03858724, -0.33896115, -0.3396342 ,  0.13102934,
        -0.257091  ],
       [-0.38067756,  0.24913177, -0

In [16]:
X_train[0].reshape(1, -1).dot(network.layers[0].weights)

array([[ 0.2741055 , -0.31594925,  0.22108848, -0.42694962,  0.01050662,
         0.50516216, -1.19725139, -0.40811571,  0.36027896,  0.30101866,
        -0.49110829,  0.29038851,  0.18921216,  0.36938554, -1.72211045,
         0.11841268]])

In [17]:
d_round = lambda x: 1 if x >= 0.5 else 0
predictions = [d_round(x[0][0]) for x in network.predict(X_test[:10])]
actual = [y for y in y_test[:10].reshape(-1)]

print(predictions)
print(actual)

[0, 1, 1, 1, 0, 1, 0, 1, 0, 0]
[0, 1, 1, 1, 0, 1, 1, 1, 0, 0]


## In Class:
Determine the accuracy and loss of the model using our holdout testing dataset.

In [18]:
# Joel's answer (fixed: threshold the scalar prediction and compare with the matching label)
thresh = lambda x: 0 if x < 0.5 else 1

test_size = y_test.shape[0]
correct = 0
for ix in range(test_size):
    # predict returns a list of (1, 1) arrays, so unwrap the single value
    pred = thresh(network.predict(X_test[ix:ix+1])[0][0][0])
    if pred == y_test[ix][0]:
        correct += 1

In [19]:
# Space for work - this matched with Joel's answer
raw_predictions = network.predict(X_test)
predictions = [d_round(x[0][0]) for x in raw_predictions]
actual = [y for y in y_test.reshape(-1)]
count = 0
for p,a in zip(predictions,actual):
    if p == a:
        count += 1
print("accuracy: " + str(100 * count/len(predictions)))

accuracy: 79.66101694915254


In [20]:
# My answer (def wrong: this is mean squared error, not log loss)
mse = 0
for a,r in zip(actual,raw_predictions):
    mse += (a-r)**2
mse /= len(raw_predictions)
print("loss: " + str(mse[0][0]))

loss: 0.1481354159845752


In [21]:
# Joel's answer
preds = np.array(network.predict(X_test)).reshape(-1,1)
network.loss(y_test, preds)

0.5018783489366799
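For reference, both metrics can also be computed in a vectorized way with NumPy. The sketch below uses small made-up arrays in place of `y_test` and the model output, so the numbers it prints are not results for this model; with the real data, the list returned by `network.predict` would first be converted with `np.array(...).reshape(-1, 1)` as in the cell above.

```python
import numpy as np

# Hypothetical values standing in for y_test and the model's raw outputs
y_true = np.array([0, 1, 1, 0, 1]).reshape(-1, 1)
raw_preds = np.array([0.1, 0.8, 0.4, 0.3, 0.9]).reshape(-1, 1)

labels = (raw_preds >= 0.5).astype(int)         # same 0.5 threshold as d_round
accuracy = float((labels == y_true).mean())     # fraction of matching labels

eps = 1e-12                                     # guard against log(0)
p = np.clip(raw_preds, eps, 1 - eps)
loss = float(np.mean(-np.log(p) * y_true - np.log(1 - p) * (1 - y_true)))

print(accuracy)   # 0.8
print(loss)
```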

## In Class:
Use the GWU Network to solve one of our earlier problems.

In [None]:
# Space for work