### Lab 3.1: Basic Neural Network in PyTorch - Solution

Let's create a linear classifier one more time, this time using PyTorch's automatic differentiation and optimization algorithms. Then you will extend the perceptron into a multi-layer perceptron (MLP).

In [1]:
import numpy as np
import torch

When creating a tensor, we need to tell PyTorch explicitly (with `requires_grad=True`) that we will later want to compute gradients with respect to it.

In [2]:
a = torch.tensor(5.,requires_grad=True)
a

tensor(5., requires_grad=True)

In [3]:
b = torch.tensor(6.,requires_grad=True)
c = 2*a+3*b
c

tensor(28., grad_fn=<AddBackward0>)

To extract the gradients, we first need to call `backward()`.

In [4]:
c.backward()

Now, to get the gradient of `c` with respect to any variable, we simply access the `grad` attribute of that variable.

In [5]:
a.grad

tensor(2.)

In [6]:
b.grad

tensor(3.)
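The same mechanism handles nonlinear expressions. The sketch below is an illustration only; the names `p`, `q`, and `d` are hypothetical and chosen so they do not overwrite the notebook's `a`, `b`, and `c`.

```python
import torch

# Hypothetical tensors for illustration (same values as a and b above)
p = torch.tensor(5., requires_grad=True)
q = torch.tensor(6., requires_grad=True)

# d = p*q + p**2, so dd/dp = q + 2p = 16 and dd/dq = p = 5
d = p * q + p ** 2
d.backward()

grad_p = p.grad.item()
grad_q = q.grad.item()
print(grad_p, grad_q)  # 16.0 5.0
```

The values match the derivatives worked out by hand in the comment, which is a quick way to sanity-check an autograd setup.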

Let's load and format the Palmer penguins dataset for multi-class classification.

In [7]:
from palmerpenguins import load_penguins
from matplotlib import pyplot as plt

In [8]:
df = load_penguins()

# drop rows with missing values
df.dropna(inplace=True)

# get two features
X = df[['flipper_length_mm','bill_length_mm']].values

# convert species labels to integers
y = df['species'].map({'Adelie':0,'Chinstrap':1,'Gentoo':2}).values

To make the learning algorithm work more smoothly, we will subtract the mean of each feature.

Here `np.mean` calculates a mean, and `axis=0` tells NumPy to average across the rows, which yields one mean per column (that is, per feature).

In [9]:
X -= np.mean(X,axis=0)
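To see what `axis=0` does, here is a small sketch on made-up data. The array `X_toy` is hypothetical and is not part of the penguins dataset.

```python
import numpy as np

# Hypothetical toy data: three samples (rows), two features (columns)
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

col_means = np.mean(X_toy, axis=0)  # one mean per column, shape (2,)
X_centered = X_toy - col_means      # broadcasting subtracts each column's mean

print(col_means)                # [ 2. 20.]
print(X_centered.mean(axis=0))  # [0. 0.]
```

After centering, every feature has mean zero, which is what the in-place `X -= ...` above does for the real data.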

Now we will convert our `X` and `y` arrays to PyTorch tensors.

In [10]:
X = torch.tensor(X).float()
y = torch.tensor(y).long()
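The `.float()` and `.long()` calls matter because `nn.Linear` weights are 32-bit floats by default and `CrossEntropyLoss` expects class indices as 64-bit integers. A minimal sketch on hypothetical arrays (`X_np`, `y_np`) shows the resulting dtypes:

```python
import numpy as np
import torch

# Hypothetical stand-ins for the feature matrix and label vector
X_np = np.array([[0.5, -1.0], [1.5, 2.0]])  # NumPy floats default to float64
y_np = np.array([0, 2])                     # integer class labels

X_t = torch.tensor(X_np).float()  # float64 -> float32
y_t = torch.tensor(y_np).long()   # any integer type -> int64

print(X_t.dtype, y_t.dtype)  # torch.float32 torch.int64
```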

In [11]:
from torch import nn

The `torch.nn.Sequential` class creates a feed-forward network from a sequence of `nn.Module` objects. Here we provide a single `nn.Linear` module, which performs an affine transformation ($Wx+b$), so the result is a linear classifier.

In [12]:
linear_model = torch.nn.Sequential(
    torch.nn.Linear(2,3), # two inputs, three outputs
)
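As a check on what `nn.Linear` computes, the sketch below compares a layer's output with the affine transformation written out by hand. The names `layer` and `x_demo` are hypothetical; for a batch stored as rows, $Wx+b$ becomes `x @ W.T + b`.

```python
import torch

torch.manual_seed(0)  # reproducible random weights and inputs

layer = torch.nn.Linear(2, 3)  # weight has shape (3, 2), bias has shape (3,)
x_demo = torch.randn(4, 2)     # hypothetical batch: 4 samples, 2 features

out = layer(x_demo)
manual = x_demo @ layer.weight.T + layer.bias  # affine map applied row by row

print(out.shape)                                # torch.Size([4, 3])
print(torch.allclose(out, manual, atol=1e-6))   # True
```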

Now we create a cross-entropy loss function object and a stochastic gradient descent (SGD) optimizer.

In [13]:
loss_fn = torch.nn.CrossEntropyLoss()
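`CrossEntropyLoss` takes raw scores (logits), not probabilities: with its default settings it applies a log-softmax internally and averages the negative log-probability of the true class. The sketch below verifies this on hypothetical logits and targets.

```python
import torch

# Hypothetical logits for 2 samples and 3 classes, with their true classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

builtin = torch.nn.CrossEntropyLoss()(logits, targets)

# Same quantity by hand: mean of -log softmax(logits)[true class]
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(torch.allclose(builtin, manual))  # True
```

This is why the models in this lab end in a plain `nn.Linear` layer with no softmax.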

In [14]:
lr = 1e-2
opt = torch.optim.SGD(linear_model.parameters(), lr=lr)
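With no momentum or weight decay (the defaults), one SGD step replaces each parameter $w$ by $w - \text{lr} \cdot \nabla w$. A minimal sketch, using a hypothetical parameter `w_demo` and a toy loss, compares the optimizer's step with that formula:

```python
import torch

# Hypothetical parameter and optimizer, separate from the notebook's model
w_demo = torch.tensor([1.0, -2.0], requires_grad=True)
demo_opt = torch.optim.SGD([w_demo], lr=0.1)

demo_loss = (w_demo ** 2).sum()  # gradient is 2 * w_demo = [2, -4]
demo_loss.backward()

expected = w_demo.detach() - 0.1 * w_demo.grad  # [0.8, -1.6], computed by hand
demo_opt.step()                                  # in-place update of w_demo

print(torch.allclose(w_demo.detach(), expected))  # True
```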

Finally, we can iteratively optimize the model. Each pass through the loop uses the whole dataset, so every epoch is a single gradient step.

In [15]:
epochs = 100
for epoch in range(epochs):
    opt.zero_grad() # zero out the gradients

    z = linear_model(X) # compute z values
    loss = loss_fn(z,y) # compute loss

    loss.backward() # compute gradients

    opt.step() # apply gradients

    print(f'epoch {epoch}: loss is {loss.item()}')

epoch 0: loss is 1.1994599103927612
epoch 1: loss is 1.1479090452194214
epoch 2: loss is 1.0977424383163452
epoch 3: loss is 1.048858880996704
epoch 4: loss is 1.0012003183364868
epoch 5: loss is 0.9547415971755981
epoch 6: loss is 0.9094857573509216
epoch 7: loss is 0.8654584288597107
epoch 8: loss is 0.8227044939994812
epoch 9: loss is 0.7812856435775757
epoch 10: loss is 0.7412770986557007
epoch 11: loss is 0.7027654647827148
epoch 12: loss is 0.6658433079719543
epoch 13: loss is 0.630605936050415
epoch 14: loss is 0.5971452593803406
epoch 15: loss is 0.5655445456504822
epoch 16: loss is 0.5358721017837524
epoch 17: loss is 0.5081758499145508
epoch 18: loss is 0.4824797809123993
epoch 19: loss is 0.4587804079055786
epoch 20: loss is 0.4370463490486145
epoch 21: loss is 0.4172193706035614
epoch 22: loss is 0.39921584725379944
epoch 23: loss is 0.3829289674758911
epoch 24: loss is 0.3682323396205902
epoch 25: loss is 0.35498398542404175
epoch 26: loss is 0.3430333733558655
epoch 27: l
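The call to `opt.zero_grad()` at the top of each iteration is needed because `backward()` adds to any gradient already stored in `.grad` rather than overwriting it. A small sketch with a hypothetical scalar `w_acc` shows the accumulation:

```python
import torch

w_acc = torch.tensor(1.0, requires_grad=True)  # hypothetical parameter

(3 * w_acc).backward()
first = w_acc.grad.item()        # 3.0

(3 * w_acc).backward()           # no zeroing in between: gradients add up
accumulated = w_acc.grad.item()  # 6.0

w_acc.grad.zero_()               # reset the stored gradient, as zero_grad() would
(3 * w_acc).backward()
after_reset = w_acc.grad.item()  # 3.0

print(first, accumulated, after_reset)  # 3.0 6.0 3.0
```

Without the reset, each training step would apply the sum of all gradients computed so far.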

### Exercises

Extend the above code to implement an MLP with a single hidden layer of size 100.

Write code to compute the accuracy of each model.

Can you get the MLP to outperform the linear model?

In [34]:
class Swish(nn.Module):
    """Swish (also called SiLU) activation: x * sigmoid(x)."""
    def forward(self,x):
        return x * torch.sigmoid(x)
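Swish with this formula is the same function that PyTorch ships as `nn.SiLU`. The sketch below compares the two on a few sample points; it assumes a PyTorch version that includes `nn.SiLU` (1.7 or later), and `x_act` is a hypothetical input.

```python
import torch
from torch import nn

class Swish(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)

x_act = torch.linspace(-3, 3, 7)  # hypothetical sample inputs, including 0

same = torch.allclose(Swish()(x_act), nn.SiLU()(x_act))
at_zero = Swish()(torch.tensor(0.0)).item()  # 0 * sigmoid(0) = 0

print(same, at_zero)  # True 0.0
```

Either module could be used in the `Sequential` model; the hand-written class simply makes the formula explicit.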

In [35]:
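# note: this solution uses two hidden layers of 100 units each
mlp_model = torch.nn.Sequential(
    torch.nn.Linear(2,100),   # two inputs -> first hidden layer
    Swish(),
    torch.nn.Linear(100,100), # first hidden layer -> second hidden layer
    Swish(),
    torch.nn.Linear(100,3),   # three outputs, one per species
)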

loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.SGD(mlp_model.parameters(), lr=lr)

epochs = 100
for epoch in range(epochs):
    opt.zero_grad() # zero out the gradients

    z = mlp_model(X) # compute z values
    loss = loss_fn(z,y) # compute loss

    loss.backward() # compute gradients

    opt.step() # apply gradients

    print(f'epoch {epoch}: loss is {loss.item()}')
    


epoch 0: loss is 1.344245433807373
epoch 1: loss is 0.6275767087936401
epoch 2: loss is 0.5275899171829224
epoch 3: loss is 0.47496917843818665
epoch 4: loss is 0.43775811791419983
epoch 5: loss is 0.4086676239967346
epoch 6: loss is 0.38483893871307373
epoch 7: loss is 0.364777147769928
epoch 8: loss is 0.34756264090538025
epoch 9: loss is 0.3325784504413605
epoch 10: loss is 0.319388210773468
epoch 11: loss is 0.3076719343662262
epoch 12: loss is 0.29718729853630066
epoch 13: loss is 0.28774598240852356
epoch 14: loss is 0.27919870615005493
epoch 15: loss is 0.27142465114593506
epoch 16: loss is 0.26432475447654724
epoch 17: loss is 0.25781649351119995
epoch 18: loss is 0.2518306374549866
epoch 19: loss is 0.24630843102931976
epoch 20: loss is 0.24119971692562103
epoch 21: loss is 0.23646116256713867
epoch 22: loss is 0.23205533623695374
epoch 23: loss is 0.22794964909553528
epoch 24: loss is 0.2241155058145523
epoch 25: loss is 0.220527783036232
epoch 26: loss is 0.21716445684432983

In [36]:
from sklearn.metrics import accuracy_score

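# predicted class = index of the largest output score;
# no_grad() skips building the autograd graph during evaluation
with torch.no_grad():
    linear_predictions = linear_model(X).argmax(dim=1).numpy()
    mlp_predictions = mlp_model(X).argmax(dim=1).numpy()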

true_labels = y.numpy()

linear_accuracy = accuracy_score(true_labels,linear_predictions)
mlp_accuracy = accuracy_score(true_labels,mlp_predictions)

print(f'linear model accuracy: {linear_accuracy}')
print(f'mlp model accuracy: {mlp_accuracy}')

linear model accuracy: 0.948948948948949
mlp model accuracy: 0.948948948948949
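Accuracy can also be computed directly in PyTorch, without scikit-learn, by comparing predicted and true class indices. The sketch below uses hypothetical logits and labels rather than the penguin models.

```python
import torch

# Hypothetical scores for 4 samples and 3 classes, with their true labels
demo_logits = torch.tensor([[2.0, 0.1, 0.3],
                            [0.2, 1.5, 0.1],
                            [0.3, 0.2, 0.1],
                            [0.1, 0.2, 4.0]])
demo_labels = torch.tensor([0, 1, 2, 2])

demo_preds = demo_logits.argmax(dim=1)  # highest-scoring class per row
demo_accuracy = (demo_preds == demo_labels).float().mean().item()

print(demo_preds.tolist(), demo_accuracy)  # [0, 1, 0, 2] 0.75
```

Note that both accuracies reported above are measured on the training data itself, since this lab does not hold out a test set.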
