<a href="https://colab.research.google.com/github/katrina-liu/bmi707-deep-learning/blob/main/Week_2_Lab_Perceptron_by_hand.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 2 Lab: Perceptron by hand

In this example, we will implement a perceptron from scratch using only base Python and NumPy.


In [1]:
########## Answer ##########
# Adapted from http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
# Helper function
from sklearn.metrics import roc_auc_score
import matplotlib.pyplot as plt

def plot_learning_curve(train_scores, title=None, ylim=None):
    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    plt.xlabel("Training Epochs")
    plt.ylabel("Loss Value")
    plt.grid()
    plt.plot(range(len(train_scores)), train_scores, '-', color="r",
             label="Training loss")

    plt.legend(loc="best")
    plt.show()

## Question 1: Learning the OR function with a perceptron

Here we will implement a perceptron from scratch. We will start by defining four data points as the input and then build a perceptron to model them. The goal is to have the perceptron learn the __OR__ function.

In [None]:
# import numpy and define the input data
import numpy as np

X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([0, 1, 1, 1]) 

## The forward and backward passes

We first need to compute the __forward pass__ of the perceptron, which for sample $i$ is given by:


$$\phi(w_1*x_{1i} + w_2*x_{2i} + b)$$

where $\phi(h)$ is the _sigmoid_ transformation and can be computed as:

$$\phi(h) = \frac{1}{1+\exp(-h)}$$

In the example below, we will compute the forward pass for __all__ data points using a single matrix-vector multiplication. If the weights $w_1, w_2$ are stored in a single vector $w$, we can multiply our $4 \times 2$ matrix $X$ by this vector as follows:

$$ h = Xw  $$

We can then add our bias to $h$ and perform an element-wise sigmoid transformation to get a $4 \times 1$ vector of output probabilities.
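As a rough illustration of the vectorized computation described above, the sketch below evaluates $h = Xw + b$ and the element-wise sigmoid for the four lab inputs. The names `sigmoid`, `X_demo`, `w_demo`, `b_demo`, `h_demo`, and `p_demo` are chosen for this example only, and the weight and bias values are arbitrary. Note that NumPy returns a one-dimensional array of length 4 here rather than an explicit $4 \times 1$ column.

```python
import numpy as np

def sigmoid(h):
    """Element-wise sigmoid: 1 / (1 + exp(-h))."""
    return 1.0 / (1.0 + np.exp(-h))

# The four input points used in this lab
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Illustrative parameter values only (assumed for this example)
w_demo = np.array([1.0, -1.0])
b_demo = 0.0

# One matrix-vector product gives the pre-activation for every sample at once
h_demo = X_demo @ w_demo + b_demo   # shape (4,)
p_demo = sigmoid(h_demo)            # shape (4,), one probability per sample

print(h_demo)
print(p_demo)
```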

Now, recall that the gradients for sample $i$, with prediction $p$ and label $y$, are given by:

$$\frac{\partial{l}}{\partial{w_{1}}} = (p - y)*p*(1-p)*x_{1i} \\
\frac{\partial{l}}{\partial{w_{2}}} = (p - y)*p*(1-p)*x_{2i} \\
\frac{\partial{l}}{\partial{b}} = (p - y)*p*(1-p)$$
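One way to sanity-check these formulas is to compare them with finite differences. The sketch below assumes the per-sample loss is $l = \frac{1}{2}(p - y)^2$, which is the loss whose derivatives match the expressions above; the lab text does not state the loss explicitly, so treat that as an assumption. All names (`sample_loss`, `x_i`, `w_chk`, and so on) and the sample values are made up for this check.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def sample_loss(w, b, x, y):
    """Assumed per-sample loss: 0.5 * (p - y)^2."""
    p = sigmoid(x @ w + b)
    return 0.5 * (p - y) ** 2

# One illustrative sample and parameter setting (arbitrary values)
x_i = np.array([1.0, 1.0])
y_i = 1.0
w_chk = np.array([1.0, -1.0])
b_chk = 0.0

# Analytic gradients from the formulas above
p_i = sigmoid(x_i @ w_chk + b_chk)
common = (p_i - y_i) * p_i * (1 - p_i)
analytic = np.array([common * x_i[0], common * x_i[1], common])

# Central finite differences with respect to w1, w2, and b
eps = 1e-6
numeric = np.zeros(3)
for k in range(2):
    step = np.zeros(2)
    step[k] = eps
    numeric[k] = (sample_loss(w_chk + step, b_chk, x_i, y_i)
                  - sample_loss(w_chk - step, b_chk, x_i, y_i)) / (2 * eps)
numeric[2] = (sample_loss(w_chk, b_chk + eps, x_i, y_i)
              - sample_loss(w_chk, b_chk - eps, x_i, y_i)) / (2 * eps)

print(analytic)
print(numeric)
```

If the two printed vectors agree to several decimal places, the analytic expressions are consistent with the assumed loss.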

First, we're going to set up the forward pass. This function should take in the `X` matrix, the weights `w`, and the bias term `b`, and return a vector of prediction probabilities `p`, one for each sample in `X`.

Fill in the function template below:

In [None]:
# The forward pass
# X: an n x d matrix of the input features
# w: a vector of d weights for our perceptron
# b: the scalar bias (intercept) term, added separately from X
def forward_pass(X, w, b):
    p = 0 # your code for the forward pass goes here
    return p

Now the backward pass. The function should accept the label vector `y`, the data matrix `X`, and the vector of predicted probabilities `p`. Using the equations above, complete the backward pass function. 

In [None]:
# The backward pass
# X: the design matrix of the input features
# y: the ground truth of outcome labels (classes)
# p: the class probability outputted by the model
def backward_pass(y, X, p):
    # Your code for the gradients goes below:
    grad_w1 = 0
    grad_w2 = 0
    grad_b = 0
    # Average the gradients
    grad_w1 = grad_w1.mean()
    grad_w2 = grad_w2.mean()
    grad_b = grad_b.mean()
    
    return grad_w1, grad_w2, grad_b

Now we are going to put it all together and write a function that will train our perceptron. Please fill in the incomplete lines to complete the function.

In [None]:
# One possible implementation is as below
############################################################
def train(X, y, w, b, iters, lr=1):
    w_new = np.copy(w)
    b_new = np.copy(b)
    loss = []
    for i in range(iters):
        # p is a length-4 vector of predicted probabilities
        p = 0 # Call the forward pass function
        # Now we want to compute the gradients
        grad_w1, grad_w2, grad_b = 0,0,0 # Call the backward pass function
        
        # Now do the gradient descent updates using the average gradient from all four samples
        w_new[0] = 0 # Use the gradient descent update rule
        w_new[1] = 0 # Use the gradient descent update rule
        b_new = 0 # Use the gradient descent update rule

        # Calculate the loss
        mse = ((y - p)**2).mean()
        loss.append(mse)
        acc = len(np.where(np.round(p) == y)[0])/float(len(y))
        # Print out some info 
        if i % 10 == 0:
            print("Loss at iteration " + str(i) + ": " + str(mse) + '\t' + "Accuracy: " + str(acc))
    
    return w_new, b_new, loss
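For reference, the gradient descent update rule mentioned in the comments moves each parameter a small step against its gradient: `theta = theta - lr * grad`. The sketch below applies that rule to a toy one-dimensional function, $f(\theta) = (\theta - 3)^2$, rather than to the perceptron itself. The function, the starting point, the step count, and the names `theta`, `lr_demo`, and `grad` are all assumptions made for illustration.

```python
# Toy objective: f(theta) = (theta - 3)^2, minimized at theta = 3
def toy_gradient(theta):
    """Derivative of (theta - 3)^2 with respect to theta."""
    return 2.0 * (theta - 3.0)

theta = 0.0      # arbitrary starting point
lr_demo = 0.1    # learning rate (step size)

for step in range(50):
    grad = toy_gradient(theta)
    # Gradient descent update: step against the gradient
    theta = theta - lr_demo * grad

print(theta)
```

In `train`, the same rule would be applied to `w_new[0]`, `w_new[1]`, and `b_new` using the averaged gradients returned by the backward pass.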

Now we are ready to train! Assuming we've done everything correctly, the following cell should run and produce final values for the weights and bias.

In [None]:
# Initialize the weights and train the model
w = np.array([1.,-1.])
b = 0.
w_new, w_b, loss = train(X, y, w, b, iters=100, lr=1)
plot_learning_curve(loss)

## Question 2: Learning the AND function

Because we've written our functions in a modular way, we can easily repurpose them to learn from new data. Next, we will train our perceptron to learn the AND function. Run the same code again on the new data for the AND function.

In [None]:
# New data
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([0, 0, 0, 1]) 

# Reinitialize the weights and train the model
w = np.array([1.,-1.])
b = 0.
w_new, w_b, loss = train(X, y, w, b, iters=100, lr=1)
plot_learning_curve(loss)

## Question 3: (Trying to) Learn the XOR function

Now we are going to train our model on data for the XOR function. Run the code below and check whether it converges. If it doesn't, what do you think is the problem?

In [None]:
# New data
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([0, 1, 1, 0]) 

# Reinitialize the weights and train the model
w = np.array([1.,-1.])
b = 0.
w_new, w_b, loss = train(X, y, w, b, iters=100, lr=1)
plot_learning_curve(loss)
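One way to explore this question without relying on training is to ask whether any single set of weights can classify all four points correctly. The sketch below brute-forces a coarse grid of candidate values for $w_1$, $w_2$, and $b$ and records the best accuracy found for each target function. The grid range and the names `X_grid`, `targets`, `values`, and `best_accuracy` are arbitrary choices for this example, and a grid search is only a rough probe, not a proof.

```python
import itertools
import numpy as np

X_grid = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "OR": np.array([0, 1, 1, 1]),
    "AND": np.array([0, 0, 0, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

# Coarse grid of candidate parameter values (arbitrary range for illustration)
values = np.linspace(-2, 2, 9)

best_accuracy = {}
for name, y_true in targets.items():
    best = 0.0
    for w1, w2, b in itertools.product(values, repeat=3):
        # Thresholding h at 0 is equivalent to thresholding sigmoid(h) at 0.5
        preds = (X_grid @ np.array([w1, w2]) + b > 0).astype(int)
        best = max(best, float((preds == y_true).mean()))
    best_accuracy[name] = best

print(best_accuracy)
```

On this grid, OR and AND each reach an accuracy of 1.0, while XOR tops out at 0.75: no single line separates its two classes, which is a limitation of the model rather than of the training procedure.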