<br/>
$$ \huge{\underline{\textbf{ 2-Layer Neural Network - Regression }}} $$
<br/>

<font color="red">
    
Contents:
* [Introduction](#Introduction)
* [Load and Explore Data](#Load-and-Explore-Data)
* [Preprocess](#Preprocess)
* [Neural Network](#Neural-Network)
* [Train Estimator](#Train-Estimator)

# Introduction

This notebook presents a simple **2-layer neural network** used for regression.

**Model**

* 1st layer: **fully connected** with **sigmoid** activation 
* 2nd (output) layer: **fully connected** with **linear** activation (i.e. no activation)
* loss: **mean squared error**
* optimizer: **vanilla SGD**
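
Written out, the model listed above amounts to the equations below. This is a sketch under stated assumptions: the symbols $W_h, b_h, W_o, b_o$ mirror the parameter names used in the training code, $N$ is the mini-batch size, and the factor $\frac{1}{2}$ in the loss is an assumed convention chosen so that the output error term is simply $\hat{y} - y$.

$$
\begin{aligned}
z_h &= x W_h + b_h \\
h &= \sigma(z_h) = \frac{1}{1 + e^{-z_h}} \\
\hat{y} &= h W_o + b_o \\
L &= \frac{1}{2N} \sum_{n=1}^{N} \lVert \hat{y}_n - y_n \rVert^2
\end{aligned}
$$

Under the same assumptions, the gradients used by vanilla SGD follow from the chain rule:

$$
\begin{aligned}
\delta_o &= \hat{y} - y &
\frac{\partial L}{\partial W_o} &= \frac{1}{N} h^T \delta_o &
\frac{\partial L}{\partial b_o} &= \frac{1}{N} \sum_{n=1}^{N} \delta_{o,n} \\
\delta_h &= (\delta_o W_o^T) \odot \sigma'(z_h) &
\frac{\partial L}{\partial W_h} &= \frac{1}{N} x^T \delta_h &
\frac{\partial L}{\partial b_h} &= \frac{1}{N} \sum_{n=1}^{N} \delta_{h,n}
\end{aligned}
$$

Here $\sigma'(z_h) = \sigma(z_h)\,(1 - \sigma(z_h))$ and $\odot$ denotes element-wise multiplication.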

**Recommended Reading**

* *Neural Networks and Deep Learning* by Michael Nielsen - great free introductory book [here](http://neuralnetworksanddeeplearning.com/)

# Neural Network

In [1]:
import numpy as np
import matplotlib.pyplot as plt

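The function below contains the whole training loop: forward pass, backward pass and SGD update. Pass in the full dataset (inputs x, targets y) together with randomly initialized weights $W$ and biases $b$ for both layers. Training runs on mini-batches, and $W$ and $b$ are updated in place, so the caller's arrays hold the trained parameters when the function returns.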
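
As a rough illustration of what "randomly initialized" and "updated in-place" mean in NumPy terms, the sketch below creates the four parameter arrays with the shapes the training function expects and contrasts an in-place update with a rebinding one. The layer sizes, the seed, the 0.1 weight scale and the two `step_*` helpers are assumptions made for this example, not values taken from the notebook.

```python
import numpy as np

# Hypothetical layer sizes, chosen only for illustration
nb_inputs, nb_hidden, nb_outputs = 3, 8, 1

rng = np.random.default_rng(0)                       # fixed seed for repeatability

# Small random weights and zero biases (one common choice among several)
Wh = rng.normal(0.0, 0.1, size=(nb_inputs, nb_hidden))
bh = np.zeros((1, nb_hidden))
Wo = rng.normal(0.0, 0.1, size=(nb_hidden, nb_outputs))
bo = np.zeros((1, nb_outputs))

def step_in_place(W):
    W += 1.0                 # mutates the array object the caller passed in

def step_rebind(W):
    W = W + 1.0              # builds a new local array, the caller's array is untouched

before = Wo.copy()
step_rebind(Wo)
unchanged = np.array_equal(Wo, before)               # rebinding did not reach the caller
step_in_place(Wo)
changed = np.allclose(Wo, before + 1.0)              # in-place update did
```

This is why the update step has to use an in-place form such as `W += -lr * dW`: an assignment like `W = W - lr * dW` inside the function would leave the caller's weights unchanged.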

In [None]:
def train_classifier(x_train, y_train, nb_epochs, batch_size, lr, Wh, bh, Wo, bo):
    """Params:
        x_train - inputs  - shape: (train_dataset_size, nb_inputs)
        y_train - targets - shape: (train_dataset_size, nb_outputs)
        nb_epochs - nb of full passes over train dataset
        batch_size - mini-batch size
        lr - learning rate
        Wh - hidden layer weights, modified in place - shape: (nb_inputs, nb_hidden)
        bh - hidden layer biases, modified in place  - shape: (1, nb_hidden)
        Wo - output layer weights, modified in place - shape: (nb_hidden, nb_outputs)
        bo - output layer biases, modified in place  - shape: (1, nb_outputs)
    """
    losses = []                                                 # keep track of losses for plotting

    indices = np.arange(len(x_train))
    for e in range(nb_epochs):
        np.random.shuffle(indices)                              # new sample order every epoch
        
        for batch_idx in range(0, len(x_train), batch_size):
            
            # Pick next mini-batch (rows selected through the shuffled indices)
            batch = indices[batch_idx : batch_idx+batch_size]
            x = x_train[batch]
            y = y_train[batch]
            
            # Forward Pass
            z_hid = x @ Wh + bh                                 # (eq 1)    z_hid.shape: (batch_size, nb_hidden)
            h_hid = sigmoid(z_hid)                              # (eq 2)    h_hid.shape: (batch_size, nb_hidden)
            
            y_hat = h_hid @ Wo + bo                             #           no activation function,
                                                                #           y_hat.shape: (batch_size, nb_outputs)
                
            # Backward Pass
            ro_out = y_hat - y                                  # output error, no activation derivative
            dWo = (h_hid.T @ ro_out) / len(x)                   # backprop through matmul
            dbo = np.sum(ro_out, axis=0, keepdims=True) / len(x)

            ro_hid = (ro_out @ Wo.T) * sigmoid_deriv(z_hid)     # error propagated to hidden layer
            dWh = (x.T @ ro_hid) / len(x)
            dbh = np.sum(ro_hid, axis=0, keepdims=True) / len(x)

            # Gradient Check (numerical_gradient is defined at the end of the notebook)
            # can be run here to verify dWh, dbh, dWo, dbo

            # Vanilla SGD update, in place
            Wh += -lr * dWh
            bh += -lr * dbh
            Wo += -lr * dWo
            bo += -lr * dbo

        # Train loss, measured on the last mini-batch of the epoch
        loss_train = loss(y, y_hat)                             # mean squared error
        losses.append(loss_train)                               # save for plotting

        if e % max(1, nb_epochs // 10) == 0:                    # report roughly ten times per run
            print('loss ', np.round(loss_train, 4))
            
    return losses
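
The backward pass is the easiest place to introduce a silent bug, so it can help to compare the analytic gradients against central finite differences on a tiny random problem. The sketch below is self-contained and independent of the notebook's own gradient-check helper. The names `half_mse`, `backprop` and `numerical_grad`, the half-MSE loss convention and the array sizes are assumptions made for this example; `sigmoid` is redefined locally only so the block runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def half_mse(x, y, Wh, bh, Wo, bo):
    """Assumed loss convention: half of the squared error, averaged over the batch."""
    y_hat = sigmoid(x @ Wh + bh) @ Wo + bo
    return 0.5 * np.sum((y_hat - y) ** 2) / len(x)

def backprop(x, y, Wh, bh, Wo, bo):
    """Analytic gradients of half_mse for the 2-layer network."""
    h_hid = sigmoid(x @ Wh + bh)
    y_hat = h_hid @ Wo + bo
    ro_out = y_hat - y                                   # linear output: no activation derivative
    dWo = h_hid.T @ ro_out / len(x)
    dbo = np.sum(ro_out, axis=0, keepdims=True) / len(x)
    ro_hid = (ro_out @ Wo.T) * h_hid * (1.0 - h_hid)     # sigmoid'(z) = h * (1 - h)
    dWh = x.T @ ro_hid / len(x)
    dbh = np.sum(ro_hid, axis=0, keepdims=True) / len(x)
    return dWh, dbh, dWo, dbo

def numerical_grad(x, y, params, eps=1e-6):
    """Central-difference gradient for every entry of every parameter array."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            orig = p[idx]
            p[idx] = orig + eps                          # perturb one entry upwards
            loss_plus = half_mse(x, y, *params)
            p[idx] = orig - eps                          # and downwards
            loss_minus = half_mse(x, y, *params)
            p[idx] = orig                                # restore the original value
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                              # 5 samples, 3 inputs
y = rng.normal(size=(5, 1))                              # 1 regression target per sample
params = [rng.normal(size=(3, 4)), np.zeros((1, 4)),     # Wh, bh
          rng.normal(size=(4, 1)), np.zeros((1, 1))]     # Wo, bo

analytic = backprop(x, y, *params)
numeric = numerical_grad(x, y, params)
grads_match = all(np.allclose(a, n, atol=1e-6) for a, n in zip(analytic, numeric))
```

If `grads_match` comes out false after a change to the backward pass, the mismatching parameter usually points at the layer where the derivative went wrong.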

Helper Functions

In [None]:
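def forward(x, Wh, bh, Wo, bo):                       #                        x.shape (batch_size, nb_inputs)
    h_hid = sigmoid( x @ Wh + bh )                    # hidden layer           shape: (batch_size, nb_hidden)
    return h_hid @ Wo + bo                            # linear output layer    shape: (batch_size, nb_outputs)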