### Lab 2.2: Perceptron Algorithm in PyTorch

In this lab you will again implement the perceptron algorithm, but this time using PyTorch.

In [None]:
!pip install torch

In [131]:
import numpy as np
import torch

PyTorch is very similar to NumPy in its basic functionality.  In PyTorch, arrays are called tensors.

In [None]:
a = torch.tensor(5)
a

In [None]:
b = torch.tensor(6)
a+b

In [None]:
c = torch.zeros(3,5).float()
c
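To make the NumPy comparison concrete, here is a small sketch that builds a tensor from a NumPy array and inspects it. The array values and the names `arr`, `t`, `col_sums`, and `back` are made up for illustration.

```python
import numpy as np
import torch

# A NumPy array and the equivalent PyTorch tensor
arr = np.array([[1.0, 2.0], [3.0, 4.0]])
t = torch.from_numpy(arr)      # shares memory with arr and keeps dtype float64

print(t.shape)                 # like arr.shape, but a torch.Size
print(t.dtype)                 # torch.float64

# Reductions work much like NumPy, but PyTorch writes dim= where NumPy writes axis=
col_sums = t.sum(dim=0)        # sum over the rows, one value per column
back = t.numpy()               # convert back to a NumPy array
print(col_sums, back)
```

The main day-to-day differences are naming (`dim` instead of `axis`) and the default floating-point precision, which is `float32` in PyTorch and `float64` in NumPy.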

*A note on broadcasting:* You may have noticed in the previous lab that NumPy is particular about the sizes of the arrays in operations; PyTorch is the same way.

For example, if `A` has shape `(10,5)` and `b` has shape `(10,)`, then we can't compute `A*b`.  Broadcasting aligns shapes starting from the *last* dimension, not the first, and here 5 does not match 10.  One fix is to transpose `A` and compute `A.T*b`.

In [135]:
A = np.random.normal(size=(10,5))
b = np.ones(10)

In [None]:
try:
    A*b
except ValueError as e:
    print(e)

In [None]:
A.T*b

An alternative is to add an extra dimension of size one to `b`.  Note, however, that the result is the transpose of the previous one.

In [None]:
A*b[:,None]

In [None]:
A*np.expand_dims(b,-1)

In general, carefully check the sizes of all arrays in your code!
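The cells above use NumPy, but the same rules carry over to PyTorch tensors. The sketch below repeats the example with tensors; note that PyTorch reports a shape mismatch as a `RuntimeError` rather than a `ValueError`, and that `unsqueeze` plays the role of `np.expand_dims`. The names `out_t`, `out_col`, and `out_unsq` are only illustrative.

```python
import torch

A = torch.randn(10, 5)
b = torch.ones(10)

# Shapes are compared from the last dimension backwards: 5 vs 10 do not match
try:
    A * b
except RuntimeError as e:
    print(e)

# Either transpose A, or give b a trailing dimension of size one
out_t = A.T * b                  # shape (5, 10)
out_col = A * b[:, None]         # shape (10, 5)
out_unsq = A * b.unsqueeze(-1)   # same result as the line above

print(out_t.shape, out_col.shape, out_unsq.shape)
```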

In [140]:
from palmerpenguins import load_penguins
from mlxtend.plotting import plot_decision_regions
from matplotlib import pyplot as plt

Here we load and format the Palmer penguins dataset for binary classification.

In [141]:
df = load_penguins()

# drop rows with missing values
df.dropna(inplace=True)

# tricky code to randomly shuffle the rows
df = df.sample(frac=1).reset_index(drop=True)

# select only two species
df = df[(df['species']=='Adelie')|(df['species']=='Chinstrap')]

# get two features
X = df[['flipper_length_mm','bill_length_mm']].values

# convert species labels to -1 and 1
y = df['species'].map({'Adelie':-1,'Chinstrap':1}).values

To make the learning algorithm work more smoothly, we will subtract the mean of each feature.

Here `np.mean` calculates a mean, and `axis=0` tells NumPy to average over the rows, which gives one mean per column (that is, per feature).

In [142]:
X -= np.mean(X,axis=0)
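As a quick sanity check on what `axis=0` does, here is a minimal sketch on made-up data (`X_toy` is hypothetical, not the penguins matrix). After centering, every column should have mean zero.

```python
import numpy as np

# Hypothetical toy data: 4 samples (rows), 2 features (columns)
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 40.0]])

col_means = np.mean(X_toy, axis=0)   # one mean per column, shape (2,)
X_centered = X_toy - col_means       # (4,2) - (2,) broadcasts across the rows

print(col_means)                     # column means: 2.5 and 25.0
print(np.mean(X_centered, axis=0))   # both columns now average to zero
```

The subtraction works without any reshaping because the trailing dimension of `X_toy` (size 2) matches the shape of `col_means`.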

Now we will convert our `X` and `y` arrays to PyTorch tensors.

In [143]:
X = torch.tensor(X).float()
y = torch.tensor(y).float()
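The `.float()` call matters because NumPy arrays are usually `float64`, while tensors created by PyTorch itself (such as the weights later on) default to `float32`, and mixing the two in a matrix product typically raises a dtype error. The sketch below uses small made-up arrays (`X_np`, `y_np`) to show the dtypes involved.

```python
import numpy as np
import torch

X_np = np.array([[0.5, -1.5], [2.0, 3.0]])   # NumPy defaults to float64
y_np = np.array([-1, 1])                     # integer labels

X_t = torch.tensor(X_np)            # keeps the NumPy dtype: float64
X_f = torch.tensor(X_np).float()    # converted to float32
y_f = torch.tensor(y_np).float()    # integer labels converted to float32

print(X_t.dtype, X_f.dtype, y_f.dtype)
print(torch.zeros(1).dtype)         # tensors created by PyTorch default to float32
```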

In [None]:
X

### Exercises

Your task is to again complete this class for the perceptron, with two changes from last time:
- the implementation should use PyTorch tensors, not NumPy arrays;
- `train_step` now accepts the entire dataset as input and should calculate the average gradient over all examples, rather than updating the weights one data point at a time.
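Because `train_step` now sees the whole dataset at once, it helps to check the shapes involved before writing any update. The following is only a shape sketch on made-up tensors, not the update rule itself; `X_toy`, `scores`, `per_example`, and `avg` are hypothetical names.

```python
import torch

N = 4
X_toy = torch.randn(N, 2)      # hypothetical batch: N examples, 2 features
w = torch.zeros(2)             # weight vector, shape (2,)
b = torch.zeros(1)             # bias, shape (1,)

scores = X_toy @ w + b         # (N,2) @ (2,) gives (N,), and b broadcasts
print(scores.shape)

# Averaging a per-example quantity of shape (N,2) over the batch gives shape (2,),
# which matches the shape of w
per_example = torch.randn(N, 2)
avg = per_example.mean(dim=0)
print(avg.shape)
```

If an intermediate result has shape `(N,1)` or `(N,N)` where you expected `(N,)`, a broadcasting slip is the usual cause.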

In [145]:
class Perceptron:
    def __init__(self,lr=1e-3):
        # store the learning rate
        self.lr = lr

        # initialize the weights to small, normally-distributed values
        self.w = torch.normal(mean=0, std=0.01, size=(2,))

        # initialize the bias to zero
        self.b = torch.zeros(1)

    def train_step(self,X:torch.Tensor,y:torch.Tensor) -> None:
        """ Apply the first update rule shown in lecture.
            Arguments:
             X: data matrix of shape (N,2)
             y: labels of shape (N,) 
        """
        # WRITE CODE HERE
    
    def predict(self,X:torch.Tensor) -> torch.Tensor:
        """ Calculate model prediction for all data points.
            Arguments:
             X: data matrix of shape (N,2)
            Returns:
             Predicted labels (-1 or 1) of shape (N,)
        """
        # WRITE CODE HERE
    
    def score(self,X:torch.Tensor,y:torch.Tensor) -> torch.Tensor:
        """ Calculate model accuracy
            Arguments:
             X: data matrix of shape (N,2)
             y: labels of shape (N,)
            Returns:
             Accuracy score
        """
        # WRITE CODE HERE


Run the following code to train the model and print out the accuracy at each step.

In [None]:
lr = 1e-3
epochs = 100
model = Perceptron(lr)
for i in range(epochs):
    model.train_step(X,y)
    print(f'step {i}: {model.score(X,y)}')

Run the training multiple times.  Is the training the same each time, or does it vary?  Why?

Play with the learning rate and number of epochs to find the best setting.
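When comparing learning rates, it can help to make runs repeatable so that differences come from the setting rather than from chance. One option, sketched here as a suggestion rather than a required part of the lab, is to fix PyTorch's random seed before creating the model; the seed value `0` is arbitrary.

```python
import torch

torch.manual_seed(0)
w1 = torch.normal(mean=0, std=0.01, size=(2,))

torch.manual_seed(0)
w2 = torch.normal(mean=0, std=0.01, size=(2,))   # same seed, same draw

w3 = torch.normal(mean=0, std=0.01, size=(2,))   # no reseed: a fresh draw

print(torch.equal(w1, w2))
print(w1, w3)
```

Note that the row shuffle earlier in the notebook uses pandas rather than PyTorch, so it is not affected by `torch.manual_seed`; `df.sample` accepts its own `random_state` argument.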