### Lab 2.2: Perceptron Algorithm in PyTorch

In this lab you will again implement the perceptron algorithm, but this time using PyTorch.

In [1]:
!pip install torch

Collecting torch
  Using cached torch-2.6.0-cp310-none-macosx_11_0_arm64.whl (66.5 MB)
Collecting filelock
  Using cached filelock-3.17.0-py3-none-any.whl (16 kB)
Collecting sympy==1.13.1
  Using cached sympy-1.13.1-py3-none-any.whl (6.2 MB)
Collecting fsspec
  Using cached fsspec-2024.12.0-py3-none-any.whl (183 kB)
Collecting networkx
  Using cached networkx-3.4.2-py3-none-any.whl (1.7 MB)
Collecting jinja2
  Using cached jinja2-3.1.5-py3-none-any.whl (134 kB)
Collecting mpmath<1.4,>=1.1.0
  Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Collecting MarkupSafe>=2.0
  Using cached MarkupSafe-3.0.2-cp310-cp310-macosx_11_0_arm64.whl (12 kB)
Installing collected packages: mpmath, sympy, networkx, MarkupSafe, fsspec, filelock, jinja2, torch
Successfully installed MarkupSafe-3.0.2 filelock-3.17.0 fsspec-2024.12.0 jinja2-3.1.5 mpmath-1.3.0 networkx-3.4.2 sympy-1.13.1 torch-2.6.0

In [2]:
import numpy as np
import torch

PyTorch is very similar to NumPy in its basic functionality.  In PyTorch arrays are called tensors.

In [3]:
a = torch.tensor(5)
a

tensor(5)

In [4]:
b = torch.tensor(6)
a+b

tensor(11)

In [5]:
c = torch.zeros(3,5).float()
c

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

*A note on broadcasting:* You may have noticed in the previous lab that NumPy is particular about the sizes of the arrays in operations; PyTorch is the same way.

For example, if `A` has shape `(10,5)` and `b` has shape `(10,)`, then we can't compute `A*b`.  Broadcasting aligns shapes starting from the *last* dimension, so the trailing dimensions must match (or one of them must be 1), not the leading ones.  One fix is to compute `A.T*b` instead.

In [6]:
A = np.random.normal(size=(10,5))
b = np.ones(10)

In [7]:
try:
    A*b
except ValueError as e:
    print(e)

operands could not be broadcast together with shapes (10,5) (10,) 


In [8]:
A.T*b

array([[ 3.21385079e-01,  6.59079986e-01, -3.04642670e-01,
         7.18591382e-02, -1.26587816e+00,  5.68001941e-01,
        -7.42376443e-01,  1.34727676e+00, -7.52125753e-01,
        -1.64221905e+00],
       [ 5.69431710e-01,  9.64184857e-01, -1.10652684e+00,
         2.20473236e+00, -2.32666203e+00, -9.88282527e-02,
        -9.39001607e-01,  4.24797497e-01, -1.60407313e+00,
        -4.56155906e-01],
       [-2.65462082e-01,  9.21795022e-01, -1.35530665e+00,
         5.61381138e-01, -1.25965764e+00,  6.52502607e-01,
        -9.21587467e-01,  4.06561013e-01,  1.79163361e-01,
        -5.66323026e-01],
       [ 2.38590221e-01, -8.45061273e-02,  1.22072095e+00,
         8.30033057e-01, -8.49665151e-01,  3.33083328e-01,
        -7.34611521e-01, -5.08311548e-01,  6.37104244e-01,
        -2.82181030e-01],
       [ 1.70837482e+00,  4.62799004e-01, -1.29048331e+00,
        -4.57657262e-01, -6.15198004e-04, -1.39385065e+00,
         2.81806400e-01, -1.57626369e-01, -2.40893407e+00,
         9.

An alternative is to add an extra dimension of size one to `b`.  Note, however, that this produces the transpose of the previous result.

In [9]:
A*b[:,None]

array([[ 3.21385079e-01,  5.69431710e-01, -2.65462082e-01,
         2.38590221e-01,  1.70837482e+00],
       [ 6.59079986e-01,  9.64184857e-01,  9.21795022e-01,
        -8.45061273e-02,  4.62799004e-01],
       [-3.04642670e-01, -1.10652684e+00, -1.35530665e+00,
         1.22072095e+00, -1.29048331e+00],
       [ 7.18591382e-02,  2.20473236e+00,  5.61381138e-01,
         8.30033057e-01, -4.57657262e-01],
       [-1.26587816e+00, -2.32666203e+00, -1.25965764e+00,
        -8.49665151e-01, -6.15198004e-04],
       [ 5.68001941e-01, -9.88282527e-02,  6.52502607e-01,
         3.33083328e-01, -1.39385065e+00],
       [-7.42376443e-01, -9.39001607e-01, -9.21587467e-01,
        -7.34611521e-01,  2.81806400e-01],
       [ 1.34727676e+00,  4.24797497e-01,  4.06561013e-01,
        -5.08311548e-01, -1.57626369e-01],
       [-7.52125753e-01, -1.60407313e+00,  1.79163361e-01,
         6.37104244e-01, -2.40893407e+00],
       [-1.64221905e+00, -4.56155906e-01, -5.66323026e-01,
        -2.82181030e-01

In [10]:
A*np.expand_dims(b,-1)

array([[ 3.21385079e-01,  5.69431710e-01, -2.65462082e-01,
         2.38590221e-01,  1.70837482e+00],
       [ 6.59079986e-01,  9.64184857e-01,  9.21795022e-01,
        -8.45061273e-02,  4.62799004e-01],
       [-3.04642670e-01, -1.10652684e+00, -1.35530665e+00,
         1.22072095e+00, -1.29048331e+00],
       [ 7.18591382e-02,  2.20473236e+00,  5.61381138e-01,
         8.30033057e-01, -4.57657262e-01],
       [-1.26587816e+00, -2.32666203e+00, -1.25965764e+00,
        -8.49665151e-01, -6.15198004e-04],
       [ 5.68001941e-01, -9.88282527e-02,  6.52502607e-01,
         3.33083328e-01, -1.39385065e+00],
       [-7.42376443e-01, -9.39001607e-01, -9.21587467e-01,
        -7.34611521e-01,  2.81806400e-01],
       [ 1.34727676e+00,  4.24797497e-01,  4.06561013e-01,
        -5.08311548e-01, -1.57626369e-01],
       [-7.52125753e-01, -1.60407313e+00,  1.79163361e-01,
         6.37104244e-01, -2.40893407e+00],
       [-1.64221905e+00, -4.56155906e-01, -5.66323026e-01,
        -2.82181030e-01

In general, carefully check the sizes of all arrays in your code!
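
The cells above use NumPy, so here is a minimal sketch of the same shape check in PyTorch. The variable names `out_index` and `out_unsqueeze` are illustrative and not part of the lab; `unsqueeze(-1)` plays the role of `np.expand_dims(b,-1)`.

```python
import torch

A = torch.randn(10, 5)   # shape (10, 5)
b = torch.ones(10)       # shape (10,)

# Trailing dimensions are compared first: 5 vs 10 do not match, so this fails.
try:
    A * b
except RuntimeError as e:
    print(e)

# Adding a trailing axis of size one gives b the shape (10, 1),
# which broadcasts against (10, 5).
out_index = A * b[:, None]
out_unsqueeze = A * b.unsqueeze(-1)

print(out_index.shape)                         # torch.Size([10, 5])
print(torch.equal(out_index, out_unsqueeze))   # True
```

Note that PyTorch raises a `RuntimeError` here, whereas NumPy raises a `ValueError`.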

In [11]:
from palmerpenguins import load_penguins
from mlxtend.plotting import plot_decision_regions
from matplotlib import pyplot as plt

Here we load and format the Palmer penguins dataset for binary classification.

In [12]:
df = load_penguins()

# drop rows with missing values
df.dropna(inplace=True)

# tricky code to randomly shuffle the rows
df = df.sample(frac=1).reset_index(drop=True)

# select only two species
df = df[(df['species']=='Adelie')|(df['species']=='Chinstrap')]

# get two features
X = df[['flipper_length_mm','bill_length_mm']].values

# convert species labels to -1 and 1
y = df['species'].map({'Adelie':-1,'Chinstrap':1}).values

To make the learning algorithm work more smoothly, we will subtract the mean of each feature.

Here `np.mean` calculates a mean, and `axis=0` tells NumPy to calculate the mean over the rows (calculate the mean of each column).

In [13]:
X -= np.mean(X,axis=0)
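
To see what `axis=0` does, here is a small sketch on a made-up 3x2 matrix (`X_demo` is hypothetical, not the penguin data). It shows one mean per column, followed by the same broadcasting subtraction used above.

```python
import numpy as np

# hypothetical data matrix: 3 examples (rows), 2 features (columns)
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

# averaging over the rows leaves one mean per column, shape (2,)
col_means = np.mean(X_demo, axis=0)
print(col_means)                  # [ 2. 20.]

# (3, 2) minus (2,) broadcasts the means across every row
X_centered = X_demo - col_means
print(X_centered.mean(axis=0))    # [0. 0.]
```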

Now we will convert our `X` and `y` arrays to torch Tensors.

In [14]:
X = torch.tensor(X).float()
y = torch.tensor(y).float()
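
The `.float()` call matters because `torch.tensor` keeps the dtype of the NumPy array, while weights created with `torch.normal` default to `float32`. A small sketch with made-up arrays (`X_np` and `y_np` are hypothetical names):

```python
import numpy as np
import torch

X_np = np.array([[1.5, -2.0], [0.5, 3.0]])   # float64 by default in NumPy
y_np = np.array([-1, 1])                     # integer labels

print(torch.tensor(X_np).dtype)              # torch.float64
print(torch.tensor(X_np).float().dtype)      # torch.float32
print(torch.tensor(y_np).float().dtype)      # torch.float32
```

Casting both tensors to `float32` keeps them compatible with the weights in operations such as `torch.matmul`.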

In [15]:
X

tensor([[ 1.0794e+00, -6.9047e+00],
        [ 5.0794e+00,  8.2953e+00],
        [-1.9206e+00, -7.0047e+00],
        [ 9.0794e+00,  8.4953e+00],
        [ 7.9439e-02, -4.7047e+00],
        [-6.9206e+00, -3.8047e+00],
        [-9.2056e-01, -2.4047e+00],
        [-7.9206e+00, -1.1047e+00],
        [ 6.0794e+00,  8.1953e+00],
        [ 3.0794e+00,  5.5953e+00],
        [-1.0921e+01, -5.5047e+00],
        [-1.9206e+00, -3.2047e+00],
        [ 1.0794e+00, -4.2047e+00],
        [-8.9206e+00, -4.3047e+00],
        [-2.9206e+00, -5.1047e+00],
        [ 7.0794e+00,  6.0953e+00],
        [ 7.0794e+00, -4.7047e+00],
        [ 1.0794e+00,  3.6953e+00],
        [-1.9206e+00,  4.3953e+00],
        [ 6.0794e+00, -7.4047e+00],
        [ 1.3079e+01, -9.0467e-01],
        [-1.1921e+01, -4.3047e+00],
        [ 5.0794e+00,  9.9953e+00],
        [ 7.9439e-02, -9.0467e-01],
        [-2.9206e+00, -9.0467e-01],
        [-1.9206e+00,  8.0953e+00],
        [ 9.0794e+00,  9.9953e+00],
        [-7.9206e+00, -2.204

### Exercises

Your task is to again complete this class for the perceptron, with two changes from last time:
- the implementation should use PyTorch tensors, not NumPy arrays;
- `train_step` now accepts the entire dataset as input and should calculate the average gradient over all examples, rather than updating the weights one data point at a time.

In [16]:
class Perceptron:
    def __init__(self,lr=1e-3):
        # store the learning rate
        self.lr = lr

        # initialize the weights to small, normally-distributed values
        self.w = torch.normal(mean=0, std=0.01, size=(2,))

        # initialize the bias to zero
        self.b = torch.zeros(1)

    def train_step(self,X:torch.Tensor,y:torch.Tensor) -> None:
        """ Apply the first update rule shown in lecture.
            Arguments:
             X: data matrix of shape (N,2)
             y: labels of shape (N,)
        """
        # WRITE CODE HERE
        # linear output for every example, shape (N,)
        z = torch.matmul(X,self.w) + self.b
        # note: these updates sum over examples; divide by N for the average
        self.w += self.lr*torch.matmul((y-z),X)
        self.b += self.lr*torch.sum(y-z)

    def predict(self,X:torch.Tensor) -> torch.Tensor:
        """ Calculate model prediction for all data points.
            Arguments:
             X: data matrix of shape (N,2)
            Returns:
             Predicted labels (-1 or 1) of shape (N,)
        """
        # WRITE CODE HERE
        # threshold the linear output at zero
        z = torch.matmul(X,self.w) + self.b
        return torch.where(z>0,1,-1)

    def score(self,X:torch.Tensor,y:torch.Tensor) -> float:
        """ Calculate model accuracy
            Arguments:
             X: data matrix of shape (N,2)
             y: labels of shape (N,)
            Returns:
             Accuracy score
        """
        # WRITE CODE HERE
        # fraction of predictions that match the labels
        preds = self.predict(X)
        return (preds == y).float().mean().item()
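
The exercise asks for the average gradient, while the updates in `train_step` above use `torch.matmul` and `torch.sum`, which accumulate a sum over the examples. The two differ only by a factor of N, which can be absorbed into the learning rate. As a sketch of the averaged variant, assuming the same update rule, the hypothetical helper `averaged_step` below divides by the number of examples; it is not part of the lab's class.

```python
import torch

def averaged_step(w, b, X, y, lr):
    """One full-batch update using the mean, not the sum, over examples."""
    n = X.shape[0]
    z = torch.matmul(X, w) + b      # linear outputs, shape (N,)
    err = y - z                     # residuals, shape (N,)
    w_new = w + lr * torch.matmul(err, X) / n
    b_new = b + lr * err.sum() / n
    return w_new, b_new

# tiny made-up dataset: 4 examples, 2 features
X_demo = torch.tensor([[1.0, 2.0], [-1.0, -2.0], [2.0, 1.0], [-2.0, -1.0]])
y_demo = torch.tensor([1.0, -1.0, 1.0, -1.0])
w0, b0 = torch.zeros(2), torch.zeros(1)

w1, b1 = averaged_step(w0, b0, X_demo, y_demo, lr=0.1)
print(w1, b1)
```

With averaging, the effective step size no longer grows with the size of the dataset, so a learning rate tuned on one dataset size is more likely to carry over to another.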


Run the following code to train the model and print out the accuracy at each step.

In [17]:
lr = 1e-4
epochs = 1000
model = Perceptron(lr)
for i in range(epochs):
    model.train_step(X,y)
    print(f'step {i}: {model.score(X,y)}')

step 0: 0.836448609828949
step 1: 0.9158878326416016
step 2: 0.9158878326416016
step 3: 0.9299065470695496
step 4: 0.9299065470695496
step 5: 0.9345794320106506
step 6: 0.9392523169517517
step 7: 0.9392523169517517
step 8: 0.9392523169517517
step 9: 0.9439252614974976
step 10: 0.9439252614974976
step 11: 0.9439252614974976
step 12: 0.9439252614974976
step 13: 0.9532710313796997
step 14: 0.9532710313796997
step 15: 0.9532710313796997
step 16: 0.9532710313796997
step 17: 0.9579439163208008
step 18: 0.9579439163208008
step 19: 0.9439252614974976
step 20: 0.9485981464385986
step 21: 0.9532710313796997
step 22: 0.9532710313796997
step 23: 0.9532710313796997
step 24: 0.9579439163208008
step 25: 0.9579439163208008
step 26: 0.9579439163208008
step 27: 0.9579439163208008
step 28: 0.9532710313796997
step 29: 0.9532710313796997
step 30: 0.9532710313796997
step 31: 0.9579439163208008
step 32: 0.9579439163208008
step 33: 0.9579439163208008
step 34: 0.9579439163208008
step 35: 0.9579439163208008
ste

Run the training multiple times.  Is the training the same each time, or does it vary?  Why?

Training varies from run to run. The weights are initialized with small random values, so each run starts from a different point and follows a different sequence of updates.
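
One way to check this explanation is to fix the random seed before initializing the weights. The sketch below uses `torch.manual_seed`; the helper `init_weights` is hypothetical and mirrors only the weight initialization from the class.

```python
import torch

def init_weights(seed):
    # seeding the global generator makes the draw below repeatable
    torch.manual_seed(seed)
    return torch.normal(mean=0, std=0.01, size=(2,))

w_a = init_weights(0)
w_b = init_weights(0)   # same seed as w_a
w_c = init_weights(1)   # different seed

print(torch.equal(w_a, w_b))   # True
print(w_a, w_c)
```

Calling `torch.manual_seed` once before creating the model should make repeated training runs produce the same sequence of accuracies.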

Play with the learning rate and number of epochs to find the best setting.

Decreasing the learning rate improved the accuracy considerably, because training was then able to converge. Increasing the number of epochs also improved the accuracy. The settings chosen were a learning rate of 1e-4 and 1000 epochs.
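
A search like this can be automated with a small grid over both settings. The sketch below is an illustration under stated assumptions: it uses synthetic two-blob data in place of the penguin data, a hypothetical `train` function that repeats the summed update from the class, and a fixed seed so the settings can be compared fairly.

```python
import torch

def train(X, y, lr, epochs, seed=0):
    """Full-batch training with a summed update; returns the final accuracy."""
    torch.manual_seed(seed)
    w = torch.normal(mean=0, std=0.01, size=(X.shape[1],))
    b = torch.zeros(1)
    for _ in range(epochs):
        err = y - (torch.matmul(X, w) + b)
        w = w + lr * torch.matmul(err, X)
        b = b + lr * err.sum()
    preds = torch.where(torch.matmul(X, w) + b > 0, 1.0, -1.0)
    return (preds == y).float().mean().item()

# synthetic stand-in for the dataset: two Gaussian blobs (an assumption)
torch.manual_seed(0)
n = 50
X_demo = torch.cat([torch.randn(n, 2) + 2.0, torch.randn(n, 2) - 2.0])
y_demo = torch.cat([torch.ones(n), -torch.ones(n)])

# try every combination of learning rate and epoch count
results = {}
for lr in (1e-5, 1e-4, 1e-3):
    for epochs in (10, 100, 1000):
        results[(lr, epochs)] = train(X_demo, y_demo, lr, epochs)

for (lr, epochs), acc in sorted(results.items()):
    print(f'lr={lr:g} epochs={epochs}: accuracy={acc:.3f}')
```

The values in the grid are arbitrary starting points; suitable ranges depend on the scale of the features and the number of examples.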