### Lab 2.2: Perceptron Algorithm in PyTorch

In this lab you will again implement the perceptron algorithm, but this time using PyTorch.

In [1]:
!pip install torch


In [2]:
import numpy as np
import torch

PyTorch is very similar to NumPy in its basic functionality.  In PyTorch arrays are called tensors.

In [3]:
a = torch.tensor(5)
a

tensor(5)

In [4]:
b = torch.tensor(6)
a+b

tensor(11)

In [5]:
c = torch.zeros(3,5).float()
c

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

*A note on broadcasting:* You may have noticed in the previous lab that NumPy is particular about the sizes of the arrays in operations; PyTorch is the same way.

For example, if `A` has shape `(10,5)` and `b` has shape `(10,)`, then we can't compute `A*b`.  Broadcasting aligns shapes starting from the *last* dimension, so it is the last dimensions that must match, not the first ones.  One fix is to compute `A.T*b`, since `A.T` has shape `(5,10)`.

In [6]:
A = np.random.normal(size=(10,5))
b = np.ones(10)

In [7]:
try:
    A*b
except ValueError as e:
    print(e)

operands could not be broadcast together with shapes (10,5) (10,) 


In [8]:
A.T*b

array([[ 1.53849326,  0.61794924,  0.25724827,  2.15560456,  0.65239875,
         0.1881677 , -0.28086119, -0.36333908, -1.39255057, -0.77861819],
       [ 0.40925365, -1.23791575,  0.08091155,  0.87416227, -1.47180364,
        -0.91677052,  1.20006053, -0.70775214, -0.39988387,  0.19216515],
       [-0.3016241 , -1.83171793, -0.79491266,  1.18114598,  0.1918409 ,
        -0.18141798,  0.45490127, -0.87588322, -0.13709627,  0.9047354 ],
       [ 2.5698935 ,  1.141673  ,  1.27170678, -0.13059384,  0.58410113,
        -1.94559903,  0.6486021 , -0.62424314, -1.09347829,  1.75402836],
       [ 1.29259891, -0.37273317, -0.39181144,  0.99370578,  2.74927729,
         0.21047045, -0.4912661 , -1.02389111,  0.25735736,  0.99920915]])

An alternative is to add an extra dimension of size one to `b`, giving it shape `(10,1)`.  Note, however, that the result then has shape `(10,5)`: it is the transpose of the previous result.

In [9]:
A*b[:,None]

array([[ 1.53849326,  0.40925365, -0.3016241 ,  2.5698935 ,  1.29259891],
       [ 0.61794924, -1.23791575, -1.83171793,  1.141673  , -0.37273317],
       [ 0.25724827,  0.08091155, -0.79491266,  1.27170678, -0.39181144],
       [ 2.15560456,  0.87416227,  1.18114598, -0.13059384,  0.99370578],
       [ 0.65239875, -1.47180364,  0.1918409 ,  0.58410113,  2.74927729],
       [ 0.1881677 , -0.91677052, -0.18141798, -1.94559903,  0.21047045],
       [-0.28086119,  1.20006053,  0.45490127,  0.6486021 , -0.4912661 ],
       [-0.36333908, -0.70775214, -0.87588322, -0.62424314, -1.02389111],
       [-1.39255057, -0.39988387, -0.13709627, -1.09347829,  0.25735736],
       [-0.77861819,  0.19216515,  0.9047354 ,  1.75402836,  0.99920915]])

In [10]:
A*np.expand_dims(b,-1)

array([[ 1.53849326,  0.40925365, -0.3016241 ,  2.5698935 ,  1.29259891],
       [ 0.61794924, -1.23791575, -1.83171793,  1.141673  , -0.37273317],
       [ 0.25724827,  0.08091155, -0.79491266,  1.27170678, -0.39181144],
       [ 2.15560456,  0.87416227,  1.18114598, -0.13059384,  0.99370578],
       [ 0.65239875, -1.47180364,  0.1918409 ,  0.58410113,  2.74927729],
       [ 0.1881677 , -0.91677052, -0.18141798, -1.94559903,  0.21047045],
       [-0.28086119,  1.20006053,  0.45490127,  0.6486021 , -0.4912661 ],
       [-0.36333908, -0.70775214, -0.87588322, -0.62424314, -1.02389111],
       [-1.39255057, -0.39988387, -0.13709627, -1.09347829,  0.25735736],
       [-0.77861819,  0.19216515,  0.9047354 ,  1.75402836,  0.99920915]])

In general, carefully check the sizes of all arrays in your code!
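The examples above use NumPy; the same shape rules can be sketched in PyTorch. The names `A_t`, `b_t`, and `scaled` below are illustrative only, and the sketch assumes nothing beyond the shapes used above.

```python
import torch

A_t = torch.randn(10, 5)   # 10 rows, 5 columns
b_t = torch.ones(10)       # one value per row of A_t

# A_t * b_t fails because the trailing dimensions (5 and 10) do not match.
# PyTorch raises a RuntimeError here, where NumPy raised a ValueError.
try:
    A_t * b_t
    failed = False
except RuntimeError as e:
    failed = True
    print(e)

# Adding a trailing dimension of size one gives b_t the shape (10, 1),
# which broadcasts against (10, 5).
scaled = A_t * b_t[:, None]
also_scaled = A_t * b_t.unsqueeze(-1)   # same effect as b_t[:, None]

# Checking shapes explicitly catches mistakes early.
assert scaled.shape == (10, 5)
print(scaled.shape)
```

Printing or asserting `.shape` after each operation is a cheap way to follow the advice above.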

In [11]:
from palmerpenguins import load_penguins
from mlxtend.plotting import plot_decision_regions
from matplotlib import pyplot as plt

Here we load and format the Palmer penguins dataset for binary classification.

In [12]:
df = load_penguins()

# drop rows with missing values
df.dropna(inplace=True)

# tricky code to randomly shuffle the rows
df = df.sample(frac=1).reset_index(drop=True)

# select only two species
df = df[(df['species']=='Adelie')|(df['species']=='Chinstrap')]

# get two features
X = df[['flipper_length_mm','bill_length_mm']].values

# convert species labels to -1 and 1
y = df['species'].map({'Adelie':-1,'Chinstrap':1}).values

To make the learning algorithm work more smoothly, we will subtract the mean of each feature.

Here `np.mean` calculates a mean, and `axis=0` tells NumPy to average over the rows, which gives one mean per column (that is, one per feature).

In [13]:
X -= np.mean(X,axis=0)
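To see what `axis=0` does, here is a small sketch on a made-up array (`X_demo` is not part of the lab data). It also shows that subtracting the column means leaves each column with mean zero.

```python
import numpy as np

# Toy data: 3 examples (rows), 2 features (columns).
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

col_means = np.mean(X_demo, axis=0)   # shape (2,): one mean per column
X_centered = X_demo - col_means       # (3,2) - (2,) broadcasts over the rows

print(col_means)                      # [ 2. 20.]
print(np.mean(X_centered, axis=0))    # [0. 0.]
```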

Now we will convert our `X` and `y` arrays to torch Tensors.

In [14]:
X = torch.tensor(X).float()
y = torch.tensor(y).float()
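The `.float()` call matters because NumPy arrays of floats are usually 64-bit, while PyTorch creates 32-bit tensors by default (for example with `torch.zeros`). A minimal sketch with a made-up array `x_np`:

```python
import numpy as np
import torch

x_np = np.array([1.5, 2.5, 3.5])   # NumPy defaults to float64
x_t = torch.tensor(x_np)           # copies the data and keeps float64
x_f = x_t.float()                  # converts to float32

print(x_t.dtype)   # torch.float64
print(x_f.dtype)   # torch.float32
```

Keeping everything in one dtype avoids type-mismatch errors in operations such as `torch.matmul`.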

In [15]:
X

tensor([[ 9.0794e+00,  1.2195e+01],
        [ 3.0794e+00, -5.6047e+00],
        [ 6.0794e+00, -2.0467e-01],
        [ 5.0794e+00,  3.7953e+00],
        [ 1.8079e+01,  9.9953e+00],
        [ 1.0079e+01,  8.1953e+00],
        [ 4.0794e+00, -1.7047e+00],
        [ 7.0794e+00,  6.0953e+00],
        [ 5.0794e+00,  1.0695e+01],
        [-1.0921e+01,  3.9533e-01],
        [ 1.0794e+00,  4.5953e+00],
        [ 5.0794e+00,  1.9533e-01],
        [-6.9206e+00, -3.8047e+00],
        [ 4.0794e+00, -2.8047e+00],
        [-4.9206e+00, -7.5047e+00],
        [-1.9206e+00, -4.2047e+00],
        [ 1.8079e+01,  2.0953e+00],
        [ 7.0794e+00, -3.4047e+00],
        [ 9.0794e+00, -5.0467e-01],
        [-1.9206e+00,  3.8953e+00],
        [-1.9206e+00, -8.5047e+00],
        [ 5.0794e+00,  9.2953e+00],
        [-5.9206e+00, -2.4047e+00],
        [-5.9206e+00, -2.4047e+00],
        [ 1.0079e+01, -6.3047e+00],
        [-9.2056e-01,  2.9533e-01],
        [ 6.0794e+00,  7.7953e+00],
        [-4.9206e+00, -1.104

### Exercises

Your task is to again complete this class for the perceptron, with two changes from last time:
- the implementation should use PyTorch tensors, not NumPy arrays;
- `train_step` now accepts the entire dataset as input and should calculate the average gradient over all examples, rather than updating the weights one data point at a time.
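The second change can be sketched as follows. This assumes an update of the form "error times input", as used in the class below; the rule from lecture may differ, and all names here (`X_demo`, `y_demo`, `grad_w_loop`, `grad_w_vec`) are illustrative. The point is only that averaging per-example terms in a loop and averaging with one vectorized expression give the same result.

```python
import torch

torch.manual_seed(0)                 # only so the sketch is repeatable
X_demo = torch.randn(6, 2)           # 6 examples, 2 features
y_demo = torch.tensor([1., -1., 1., 1., -1., -1.])
w = torch.zeros(2)
b = torch.zeros(1)

# One data point at a time: accumulate each example's term, then average.
grad_w_loop = torch.zeros(2)
for x_i, y_i in zip(X_demo, y_demo):
    err_i = y_i - (x_i @ w + b)      # shape (1,)
    grad_w_loop += err_i * x_i       # shape (2,)
grad_w_loop /= len(X_demo)

# Whole dataset at once: err has shape (6,), err[:, None] has shape (6, 1).
err = y_demo - (X_demo @ w + b)
grad_w_vec = torch.mean(err[:, None] * X_demo, dim=0)

print(torch.allclose(grad_w_loop, grad_w_vec))
```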

In [16]:
class Perceptron:
    def __init__(self,lr=1e-3):
        # store the learning rate
        self.lr = lr

        # initialize the weights to small, normally-distributed values
        self.w = torch.normal(mean=0, std=0.01, size=(2,))

        # initialize the bias to zero
        self.b = torch.zeros(1)

    def train_step(self,X:torch.Tensor,y:torch.Tensor) -> None:
        """ Apply the first update rule shown in lecture.
            Arguments:
             X: data matrix of shape (N,2)
             y: labels of shape (N,)
        """
        # WRITE CODE HERE
        z = torch.matmul(X,self.w) + self.b
        error = (y - z)
        self.w += self.lr * torch.mean(error[:, None] * X, dim=0)  
        self.b += self.lr * torch.mean(error)                     
    
    def predict(self,X:torch.Tensor) -> torch.Tensor:
        """ Calculate model prediction for all data points.
            Arguments:
             X: data matrix of shape (N,2)
            Returns:
             Predicted labels (-1 or 1) of shape (N,)
        """
        # WRITE CODE HERE
        z = torch.matmul(X, self.w) + self.b
        return torch.where(z > 0, torch.tensor(1), torch.tensor(-1))
    
    def score(self,X:torch.Tensor,y:torch.Tensor) -> float:
        """ Calculate model accuracy
            Arguments:
             X: data matrix of shape (N,2)
             y: labels of shape (N,)
            Returns:
             Accuracy score (fraction of correct predictions) as a Python float
        """
        # WRITE CODE HERE
        pred = self.predict(X)
        return torch.mean((pred == y).float()).item()


Run the following code to train the model and print out the accuracy at each step.

In [21]:
lr = 1e-3
epochs = 100
model = Perceptron(lr)
for i in range(epochs):
    model.train_step(X,y)
    print(f'step {i}: {model.score(X,y)}')

step 0: 0.8177570104598999
step 1: 0.827102780342102
step 2: 0.8317757248878479
step 3: 0.836448609828949
step 4: 0.836448609828949
step 5: 0.836448609828949
step 6: 0.836448609828949
step 7: 0.84112149477005
step 8: 0.84112149477005
step 9: 0.84112149477005
step 10: 0.84112149477005
step 11: 0.8457943797111511
step 12: 0.8504672646522522
step 13: 0.8504672646522522
step 14: 0.8504672646522522
step 15: 0.855140209197998
step 16: 0.855140209197998
step 17: 0.855140209197998
step 18: 0.855140209197998
step 19: 0.855140209197998
step 20: 0.855140209197998
step 21: 0.855140209197998
step 22: 0.855140209197998
step 23: 0.855140209197998
step 24: 0.855140209197998
step 25: 0.855140209197998
step 26: 0.8598130941390991
step 27: 0.8598130941390991
step 28: 0.8644859790802002
step 29: 0.8644859790802002
step 30: 0.8644859790802002
step 31: 0.8644859790802002
step 32: 0.8691588640213013
step 33: 0.8691588640213013
step 34: 0.8691588640213013
step 35: 0.8738317489624023
step 36: 0.878504693508148

Run the training multiple times.  Is the training the same each time, or does it vary?  Why?

Play with the learning rate and number of epochs to find the best setting.
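One way to organize this search is a small sweep over settings. The sketch below is self-contained and uses synthetic data (`X_toy`, `y_toy`) rather than the penguins; `run` is a hypothetical helper that repeats the batch update from the `Perceptron` class above. It fixes the random seed so that every setting starts from the same weights, which is an assumption made here to keep the comparison fair.

```python
import torch

def run(lr, epochs, X, y, seed=0):
    """Hypothetical helper: train with the batch update and return accuracy."""
    torch.manual_seed(seed)                      # same starting weights each call
    w = torch.normal(mean=0.0, std=0.01, size=(2,))
    b = torch.zeros(1)
    for _ in range(epochs):
        err = y - (X @ w + b)
        w += lr * torch.mean(err[:, None] * X, dim=0)
        b += lr * torch.mean(err)
    z = X @ w + b
    pred = torch.where(z > 0, torch.tensor(1.0), torch.tensor(-1.0))
    return torch.mean((pred == y).float()).item()

# Synthetic, linearly separable toy data (not the lab dataset).
torch.manual_seed(1)
X_toy = torch.randn(200, 2)
y_toy = torch.where(X_toy.sum(dim=1) > 0, torch.tensor(1.0), torch.tensor(-1.0))

results = {}
for lr in (1e-3, 1e-2, 1e-1):
    for epochs in (10, 100):
        results[(lr, epochs)] = run(lr, epochs, X_toy, y_toy)
        print(f'lr={lr}, epochs={epochs}: {results[(lr, epochs)]:.3f}')
```

On the real data you would call the lab's own `Perceptron` class inside the loops instead of `run`.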