### Lab 3.1: Basic Neural Network in PyTorch - Solution

Let's create a linear classifier one more time, this time using PyTorch's automatic differentiation and built-in optimizers. Then you will extend this single-layer model into a multi-layer perceptron (MLP).

In [1]:
import numpy as np
import torch

When creating a tensor, we must explicitly tell PyTorch (with `requires_grad=True`) that we will later want to compute gradients with respect to it.

In [2]:
a = torch.tensor(5.,requires_grad=True)
a

tensor(5., requires_grad=True)

In [3]:
b = torch.tensor(6.,requires_grad=True)
c = 2*a+3*b
c

tensor(28., grad_fn=<AddBackward0>)

To compute the gradients, we first call `backward()` on `c`.

In [4]:
c.backward()

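The gradient of `c` with respect to each tensor created with `requires_grad=True` is now stored in that tensor's `grad` attribute. Since $c = 2a + 3b$, we expect $\partial c/\partial a = 2$ and $\partial c/\partial b = 3$.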

In [5]:
a.grad

tensor(2.)

In [6]:
b.grad

tensor(3.)
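One detail matters for the training loops later in this lab: `backward()` adds new gradients to whatever is already stored in `.grad` instead of overwriting it. The sketch below reuses the same `a`, `b` and `c = 2*a + 3*b` to show the accumulation, and why gradients are reset (the loops below call `opt.zero_grad()` for this) before each new backward pass.

```python
import torch

a = torch.tensor(5., requires_grad=True)
b = torch.tensor(6., requires_grad=True)

# First forward/backward pass: dc/da = 2
c = 2*a + 3*b
c.backward()
first = a.grad.item()

# Second pass without resetting: the new gradient is ADDED to a.grad
c = 2*a + 3*b
c.backward()
accumulated = a.grad.item()

# Reset the stored gradients, then repeat the pass
a.grad.zero_()
b.grad.zero_()
c = 2*a + 3*b
c.backward()
after_reset = a.grad.item()

print(first, accumulated, after_reset)  # 2.0 4.0 2.0
```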

Let's load and format the Palmer penguins dataset for multi-class classification.

In [7]:
from palmerpenguins import load_penguins
from matplotlib import pyplot as plt

In [8]:
df = load_penguins()

# drop rows with missing values
df.dropna(inplace=True)

# get two features
X = df[['flipper_length_mm','bill_length_mm']].values

# convert species labels to integers
y = df['species'].map({'Adelie':0,'Chinstrap':1,'Gentoo':2}).values

To help the learning algorithm converge more smoothly, we will subtract the mean of each feature (mean-centering).

Here `np.mean` with `axis=0` averages over the rows, producing one mean per column (that is, per feature).

In [9]:
X -= np.mean(X,axis=0)
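To make the `axis=0` behavior concrete, here is a small sketch on a made-up 3×2 matrix (the values are illustrative, not penguin measurements). NumPy broadcasting subtracts the length-2 vector of column means from every row, so each centered column averages to zero.

```python
import numpy as np

# Toy feature matrix (made-up values): 3 rows (examples) x 2 columns (features)
X_toy = np.array([[190., 40.],
                  [210., 46.],
                  [200., 43.]])

col_means = np.mean(X_toy, axis=0)   # one mean per column
X_centered = X_toy - col_means        # broadcasting subtracts it from every row

print(col_means.tolist())                 # [200.0, 43.0]
print(X_centered.mean(axis=0).tolist())   # [0.0, 0.0]
```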

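Now we convert our `X` and `y` arrays to torch tensors: the features as 32-bit floats (`.float()`, matching the default dtype of the model's weights) and the labels as 64-bit integers (`.long()`, the type `CrossEntropyLoss` expects for class-index targets).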

In [10]:
X = torch.tensor(X).float()
y = torch.tensor(y).long()

In [11]:
from torch import nn

The `torch.nn.Sequential` class builds a feed-forward network from a sequence of `nn.Module` objects, applied in order. Here we provide a single `nn.Linear` layer, which performs an affine transformation ($Wx+b$), so the result is a linear classifier.

In [12]:
linear_model = torch.nn.Sequential(
    torch.nn.Linear(2,3), # two inputs, three outputs
)
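As a quick check of what `nn.Linear` computes, the sketch below compares the layer's output with the affine map written out by hand on a batch of random inputs (made-up data). `nn.Linear` stores $W$ with shape `(out_features, in_features)`, so for a batch of row vectors the product is `x @ W.T + b`.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(2, 3)   # 2 inputs -> 3 outputs (one score per class)
x = torch.randn(5, 2)           # a batch of 5 made-up examples

# Rows of x are examples, so the affine map Wx + b becomes x @ W^T + b
manual = x @ layer.weight.T + layer.bias
same = torch.allclose(layer(x), manual)

print(layer.weight.shape, layer.bias.shape, same)
```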

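Now we create a cross-entropy loss function object and a stochastic gradient descent (SGD) optimizer. Because every step below uses the full dataset, this amounts to plain (full-batch) gradient descent.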

In [13]:
loss_fn = torch.nn.CrossEntropyLoss()
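`CrossEntropyLoss` takes raw class scores (logits) and integer class labels. Internally it applies a log-softmax over the classes and averages the negative log-probability of each example's true class, which is why the model needs no softmax layer of its own. The sketch below reproduces that computation by hand on made-up logits and labels.

```python
import torch

torch.manual_seed(0)
z = torch.randn(4, 3)              # made-up logits: 4 examples, 3 classes
y = torch.tensor([0, 2, 1, 2])     # class indices; torch.tensor infers int64 here

builtin = torch.nn.CrossEntropyLoss()(z, y)

# Manual computation: log-softmax over the class dimension, take the
# log-probability of each example's true class, negate, and average
log_probs = torch.log_softmax(z, dim=1)
manual = -log_probs[torch.arange(len(y)), y].mean()

print(builtin.item(), manual.item())
```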

In [14]:
lr = 1e-2
opt = torch.optim.SGD(linear_model.parameters(), lr=lr)
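With its default settings (no momentum, no weight decay), each `opt.step()` moves every parameter $p$ to $p - \text{lr} \cdot \partial \text{loss}/\partial p$. The sketch below checks that rule on a fresh `nn.Linear(2, 3)` and random made-up data by predicting the updated values before calling `step()`.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(2, 3)
lr = 1e-2
opt = torch.optim.SGD(model.parameters(), lr=lr)

x = torch.randn(8, 2)              # made-up inputs
y = torch.randint(0, 3, (8,))      # made-up labels in {0, 1, 2}

opt.zero_grad()
loss = torch.nn.CrossEntropyLoss()(model(x), y)
loss.backward()

# Values the plain SGD rule predicts: p - lr * grad
expected = [p.detach() - lr * p.grad for p in model.parameters()]
opt.step()   # updates the parameters in place

matches = all(torch.allclose(p.detach(), e)
              for p, e in zip(model.parameters(), expected))
print(matches)  # True
```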

Finally we can iteratively optimize the model.

In [15]:
epochs = 100
for epoch in range(epochs):
    opt.zero_grad() # zero out the gradients

    z = linear_model(X) # compute z values
    loss = loss_fn(z,y) # compute loss

    loss.backward() # compute gradients

    opt.step() # apply gradients

    print(f'epoch {epoch}: loss is {loss.item()}')

epoch 0: loss is 3.1404871940612793
epoch 1: loss is 2.3001060485839844
epoch 2: loss is 1.5649179220199585
epoch 3: loss is 1.0467150211334229
epoch 4: loss is 0.7963911294937134
epoch 5: loss is 0.6812124848365784
epoch 6: loss is 0.6105504035949707
epoch 7: loss is 0.5587332844734192
epoch 8: loss is 0.5171430110931396
epoch 9: loss is 0.4821283519268036
epoch 10: loss is 0.45186424255371094
epoch 11: loss is 0.4253222346305847
epoch 12: loss is 0.40186363458633423
epoch 13: loss is 0.38104790449142456
epoch 14: loss is 0.362533837556839
epoch 15: loss is 0.34603509306907654
epoch 16: loss is 0.33130520582199097
epoch 17: loss is 0.3181341886520386
epoch 18: loss is 0.3063450753688812
epoch 19: loss is 0.2957889139652252
epoch 20: loss is 0.28633853793144226
epoch 21: loss is 0.277883917093277
epoch 22: loss is 0.27032747864723206
epoch 23: loss is 0.2635810375213623
epoch 24: loss is 0.25756365060806274
epoch 25: loss is 0.2522004544734955
epoch 26: loss is 0.2474217265844345
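The MLP solution below repeats this loop almost line for line. One option is to wrap it in a small function. `train` here is a hypothetical helper (not part of PyTorch), it logs only every `log_every` epochs, and the usage runs on synthetic data standing in for the penguin features.

```python
import torch

def train(model, X, y, epochs=100, lr=1e-2, optimizer_cls=torch.optim.SGD, log_every=10):
    # Hypothetical helper: full-batch training with cross-entropy loss
    loss_fn = torch.nn.CrossEntropyLoss()
    opt = optimizer_cls(model.parameters(), lr=lr)
    losses = []
    for epoch in range(epochs):
        opt.zero_grad()                # clear gradients from the previous step
        loss = loss_fn(model(X), y)    # forward pass on the whole dataset
        loss.backward()                # compute gradients
        opt.step()                     # apply gradients
        losses.append(loss.item())
        if log_every and epoch % log_every == 0:
            print(f'epoch {epoch}: loss is {loss.item():.4f}')
    return losses

# Usage on synthetic data (assumption: a stand-in for the penguin features)
torch.manual_seed(0)
X_demo = torch.randn(60, 2)
y_demo = (X_demo[:, 0] > 0).long() + (X_demo[:, 1] > 0).long()  # labels in {0, 1, 2}
model = torch.nn.Sequential(torch.nn.Linear(2, 3))
losses = train(model, X_demo, y_demo, epochs=100, lr=0.1)
```

Passing `optimizer_cls=torch.optim.Adam` would reproduce the optimizer choice used for the MLP below.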

### Exercises

Extend the above code to implement an MLP with a single hidden layer of size 100.

Write code to compute the accuracy of each model.

Can you get the MLP to outperform the linear model?

In [16]:
class Swish(nn.Module):
    # Swish activation: x * sigmoid(x), applied elementwise
    def forward(self,x):
        return x * torch.sigmoid(x)
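PyTorch also ships this activation as `nn.SiLU` (available since PyTorch 1.7), so the custom module could be swapped for the built-in one. A quick numerical check on a grid of inputs:

```python
import torch
from torch import nn

class Swish(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)

x = torch.linspace(-5, 5, steps=11)
custom = Swish()(x)
builtin = nn.SiLU()(x)
print(torch.allclose(custom, builtin))  # True
```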

In [17]:
mlp_model = torch.nn.Sequential(
    torch.nn.Linear(2,100),
    Swish(),
    torch.nn.Linear(100,100),
    Swish(),
    torch.nn.Linear(100,3),
)

loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.Adam(mlp_model.parameters(), lr=lr) # Adam instead of SGD, reusing lr = 1e-2 from above

epochs = 100
for epoch in range(epochs):
    opt.zero_grad() # zero out the gradients

    z = mlp_model(X) # compute z values
    loss = loss_fn(z,y) # compute loss

    loss.backward() # compute gradients

    opt.step() # apply gradients

    print(f'epoch {epoch}: loss is {loss.item()}')
epoch 0: loss is 0.9294375777244568
epoch 1: loss is 0.35687896609306335
epoch 2: loss is 0.15387175977230072
epoch 3: loss is 0.36597418785095215
epoch 4: loss is 0.1276821792125702
epoch 5: loss is 0.2119918167591095
epoch 6: loss is 0.2661876678466797
epoch 7: loss is 0.24283650517463684
epoch 8: loss is 0.19285379350185394
epoch 9: loss is 0.15709592401981354
epoch 10: loss is 0.1441585272550583
epoch 11: loss is 0.17221440374851227
epoch 12: loss is 0.18947114050388336
epoch 13: loss is 0.16572603583335876
epoch 14: loss is 0.14370092749595642
epoch 15: loss is 0.14575855433940887
epoch 16: loss is 0.15290257334709167
epoch 17: loss is 0.1561116725206375
epoch 18: loss is 0.15422974526882172
epoch 19: loss is 0.1465693563222885
epoch 20: loss is 0.13486307859420776
epoch 21: loss is 0.12525872886180878
epoch 22: loss is 0.12379186600446701
epoch 23: loss is 0.12727829813957214
epoch 24: loss is 0.12483032047748566
epoch 25: loss is 0.11666157841682434
epoch 26: loss is 0.111860051
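The exercise asks for a single hidden layer of size 100, while the solution cell above stacks two hidden layers. The sketch below builds the single-layer variant (using `nn.SiLU` in place of the custom `Swish`, an assumption that does not change the math) and compares parameter counts. For reference, the linear model has only $2 \cdot 3 + 3 = 9$ parameters.

```python
import torch
from torch import nn

def count_params(model):
    # Total number of trainable scalars (weights + biases)
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# One hidden layer of 100 units, matching the exercise statement
one_hidden = nn.Sequential(
    nn.Linear(2, 100),
    nn.SiLU(),            # same function as the Swish module above
    nn.Linear(100, 3),
)

# Two hidden layers of 100 units, matching the solution cell
two_hidden = nn.Sequential(
    nn.Linear(2, 100), nn.SiLU(),
    nn.Linear(100, 100), nn.SiLU(),
    nn.Linear(100, 3),
)

print(count_params(one_hidden), count_params(two_hidden))  # 603 10703
```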

In [18]:
from sklearn.metrics import accuracy_score

# no_grad: we only need predictions, so skip building the autograd graph
with torch.no_grad():
    linear_predictions = linear_model(X).argmax(dim=1).numpy()
    mlp_predictions = mlp_model(X).argmax(dim=1).numpy()

true_labels = y.numpy()

linear_accuracy = accuracy_score(true_labels,linear_predictions)
mlp_accuracy = accuracy_score(true_labels,mlp_predictions)

print(f'linear model accuracy: {linear_accuracy}')
print(f'mlp model accuracy: {mlp_accuracy}')

linear model accuracy: 0.9429429429429429
mlp model accuracy: 0.975975975975976
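Both accuracies above are measured on the same rows the models were trained on, so they show how well each model fits the training data rather than how well it generalizes. Below is a sketch of a held-out evaluation; `accuracy` and `split` are hypothetical helpers, and the 80/20 split and synthetic data are assumptions (with the lab's data you would pass the tensors `X` and `y` instead).

```python
import torch

def accuracy(model, X, y):
    # Fraction of rows whose highest-scoring class equals the true label
    with torch.no_grad():
        preds = model(X).argmax(dim=1)
    return (preds == y).float().mean().item()

def split(X, y, test_frac=0.2, seed=0):
    # Hypothetical helper: shuffle row indices, hold out test_frac of them
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(len(X), generator=g)
    n_test = int(len(X) * test_frac)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic stand-in for the penguin data (assumption)
torch.manual_seed(0)
X_all = torch.randn(200, 2)
y_all = (X_all[:, 0] > 0).long() + (X_all[:, 1] > 0).long()

X_tr, X_te, y_tr, y_te = split(X_all, y_all)

model = torch.nn.Sequential(torch.nn.Linear(2, 3))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X_tr), y_tr)   # fit on training rows only
    loss.backward()
    opt.step()

print(f'train accuracy: {accuracy(model, X_tr, y_tr):.3f}')
print(f'test accuracy:  {accuracy(model, X_te, y_te):.3f}')
```

Comparing the linear model and the MLP on the held-out rows gives a fairer answer to whether the extra capacity actually helps.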
