# Logistic regression (binary and multiclass)

## Binary logistic regression

We classified flowers with linear regression in `lin_reg.ipynb`. Now let's do the same with binary logistic regression.

### Setup

In [1]:
import matplotlib.pyplot as plt
import torchvision
import torch
from PIL import Image

In [2]:
size = (128, 128)

In [3]:
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(size),
    torchvision.transforms.ToTensor()
])

In [5]:
train_dataset = list(
    torchvision.datasets.Flowers102(
        "./flowers", "train", transform=transform, download=True
    )
)
test_dataset = list(
    torchvision.datasets.Flowers102(
        "./flowers", "test", transform=transform, download=True
    )
)

In [7]:
train_images = torch.stack([im for im, _ in train_dataset], dim=0)
train_labels = torch.tensor([label for _, label in train_dataset])

In [8]:
# Keep only the images of the first two classes (labels 0 and 1)
train_images_01 = train_images[train_labels <= 1]
train_labels_01 = train_labels[train_labels <= 1]

### Model

Here we set up binary logistic regression. The model is still a linear layer, as in linear regression, but it now outputs a single logit (an unnormalized log-odds score) per image, and we use `torch.nn.BCEWithLogitsLoss` as the loss function. This loss applies the sigmoid function to the logits internally and then computes the binary cross-entropy, so we don't add a sigmoid to the model ourselves.
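To make the "sigmoid inside the loss" point concrete, here is a minimal sketch on random, made-up logits and labels (not the flower data). It compares `BCEWithLogitsLoss` with an explicit sigmoid followed by binary cross-entropy, written both by hand and with `torch.nn.BCELoss`. The three values should agree up to floating-point error. PyTorch documents the fused `BCEWithLogitsLoss` as more numerically stable than a separate sigmoid plus `BCELoss`, which is why it is the usual choice.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(8)                       # hypothetical raw model outputs
targets = torch.randint(0, 2, (8,)).float()   # hypothetical 0/1 labels

# What the notebook uses: the sigmoid is applied inside the loss
with_logits = torch.nn.BCEWithLogitsLoss()(logits, targets)

# The same computation spelled out: sigmoid first, then binary cross-entropy
probs = torch.sigmoid(logits)
manual = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)).mean()
two_step = torch.nn.BCELoss()(probs, targets)

print(with_logits.item(), manual.item(), two_step.item())
```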

In [10]:
model = torch.nn.Linear(3 * 128 * 128, 1)
loss = torch.nn.BCEWithLogitsLoss()
lr = 0.0001
num_epochs = 100
momentum = 0
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [11]:
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(train_images_01.view(-1, 3 * 128 * 128))
    loss_value = loss(outputs.squeeze(), train_labels_01.float())

    # Backward pass
    optimizer.zero_grad()
    loss_value.backward()
    optimizer.step()
    print(f"Epoch {epoch}, loss: {loss_value.item()}")

Epoch 0, loss: 0.6584396362304688
Epoch 1, loss: 0.6478623151779175
Epoch 2, loss: 0.6382631063461304
Epoch 3, loss: 0.6293513178825378
Epoch 4, loss: 0.6209424138069153
Epoch 5, loss: 0.612918496131897
Epoch 6, loss: 0.6052039265632629
Epoch 7, loss: 0.5977489352226257
Epoch 8, loss: 0.5905206799507141
Epoch 9, loss: 0.5834963321685791
Epoch 10, loss: 0.5766600370407104
Epoch 11, loss: 0.5699996948242188
Epoch 12, loss: 0.5635061264038086
Epoch 13, loss: 0.5571719408035278
Epoch 14, loss: 0.550990879535675
Epoch 15, loss: 0.5449570417404175
Epoch 16, loss: 0.5390655994415283
Epoch 17, loss: 0.5333117246627808
Epoch 18, loss: 0.5276911854743958
Epoch 19, loss: 0.5221996903419495
Epoch 20, loss: 0.5168333053588867
Epoch 21, loss: 0.5115882754325867
Epoch 22, loss: 0.5064607858657837
Epoch 23, loss: 0.5014473795890808
Epoch 24, loss: 0.49654465913772583
Epoch 25, loss: 0.49174919724464417
Epoch 26, loss: 0.4870578646659851
Epoch 27, loss: 0.4824675917625427
Epoch 28, loss: 0.477975428104

The loss decreases slowly but steadily. The forward pass is exactly the linear model from `lin_reg.ipynb`; only the loss function changes. `BCEWithLogitsLoss` turns each output into a probability between 0 and 1 with the sigmoid function and then computes the binary cross-entropy of that probability against the label.
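The conversion from logit to probability to predicted label can be sketched on a few hypothetical logits. The sigmoid maps each logit to the probability of class 1, and thresholding that probability at 0.5 gives the same decision as checking whether the logit is positive, so predictions can be read straight off the raw model outputs.

```python
import torch

# Hypothetical logits, shaped like model(x).squeeze() for four images
logits = torch.tensor([-3.0, -0.1, 0.4, 2.5])

probs = torch.sigmoid(logits)    # probability of class 1, in (0, 1)
pred_from_probs = probs > 0.5    # predict class 1 when p > 0.5
pred_from_logits = logits > 0    # same decision: sigmoid(z) > 0.5 exactly when z > 0

print(probs)
print(pred_from_probs.tolist(), pred_from_logits.tolist())
```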

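The loss starts off at $\approx 0.66$, close to $-\log\frac{1}{2} = \log 2 \approx 0.69$, which is the loss of a model that predicts 50:50 for every image, and then slowly improves. Because the loss is the average negative log-probability of the correct label, we can convert between a loss and a rough "accuracy" and vice versa: $e^{-\text{loss}}$ is the geometric mean of the probability the model assigns to the correct label. It is a useful proxy, but it is not the same as the fraction of correctly classified images.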
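A small check of both claims on made-up labels and logits (not the flower data): all-zero logits (probability 0.5 everywhere) give a loss of exactly $\log 2$, and $e^{-\text{loss}}$ equals the geometric mean of the probability assigned to the true label. The last lines show that this proxy can differ from the actual accuracy on the same predictions.

```python
import math
import torch

loss_fn = torch.nn.BCEWithLogitsLoss()
labels = torch.tensor([0.0, 1.0, 1.0, 0.0])   # hypothetical labels

# Chance level: logit 0 means sigmoid(0) = 0.5 for every sample
chance_loss = loss_fn(torch.zeros(4), labels)
print(chance_loss.item(), math.log(2))

# Hypothetical logits for the same four labels
logits = torch.tensor([-2.0, 0.5, 3.0, 0.2])
mean_bce = loss_fn(logits, labels)

# Probability the model assigns to the *true* label of each sample
p = torch.sigmoid(logits)
p_true = torch.where(labels == 1, p, 1 - p)
geo_mean = torch.exp(torch.log(p_true).mean())

# exp(-loss) equals the geometric mean of p_true ...
print(torch.exp(-mean_bce).item(), geo_mean.item())

# ... but accuracy is the fraction of samples with p_true > 0.5
accuracy = (p_true > 0.5).float().mean()
print(accuracy.item())
```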

In [12]:
import numpy as np

In [20]:
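def calculate_accuracy(loss):
    # exp(-mean BCE) is the geometric mean of the probability the model assigns
    # to the correct label: a rough proxy for accuracy, not accuracy itself.
    return np.exp(-loss)


def calculate_loss(accuracy):
    # Inverse of the above: the mean BCE at which the geometric-mean
    # probability of the correct label equals `accuracy`.
    return -np.log(accuracy)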

In [21]:
desired_accuracy = 0.8
print(f"Desired accuracy: {desired_accuracy}")
desired_loss = calculate_loss(desired_accuracy)
print(f"Desired loss: {desired_loss}")

Desired accuracy: 0.8
Desired loss: 0.2231435513142097


In [22]:
observed_loss = 0.3
print(f"Observed loss: {observed_loss}")
observed_accuracy = calculate_accuracy(observed_loss)
print(f"Observed accuracy: {observed_accuracy}")

Observed loss: 0.3
Observed accuracy: 0.7408182206817179


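A loss of 0.3 gives about 0.74 from `calculate_accuracy`, i.e. the model assigns the correct label roughly 74% probability on average (geometric mean). That would not be bad for many tasks, but for this two-class task it should be higher. Let's try a larger learning rate.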

In [23]:
model = torch.nn.Linear(3 * 128 * 128, 1)
loss = torch.nn.BCEWithLogitsLoss()
lr = 0.01
num_epochs = 100
momentum = 0
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [24]:
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(train_images_01.view(-1, 3 * 128 * 128))
    loss_value = loss(outputs.squeeze(), train_labels_01.float())

    # Backward pass
    optimizer.zero_grad()
    loss_value.backward()
    optimizer.step()
    print(f"Epoch {epoch}, loss: {loss_value.item()}")

Epoch 0, loss: 0.6505982279777527
Epoch 1, loss: 0.2226615697145462
Epoch 2, loss: 0.38910719752311707
Epoch 3, loss: 6.53595495223999
Epoch 4, loss: 10.759622573852539
Epoch 5, loss: 1.10702383518219
Epoch 6, loss: 3.9149563312530518
Epoch 7, loss: 4.459329128265381
Epoch 8, loss: 6.251101493835449
Epoch 9, loss: 1.0071905851364136
Epoch 10, loss: 0.21176591515541077
Epoch 11, loss: 0.03600088879466057
Epoch 12, loss: 0.02686724066734314
Epoch 13, loss: 0.02115124836564064
Epoch 14, loss: 0.01966703310608864
Epoch 15, loss: 0.018400084227323532
Epoch 16, loss: 0.017298337072134018
Epoch 17, loss: 0.016331221908330917
Epoch 18, loss: 0.015475313179194927
Epoch 19, loss: 0.014712309464812279
Epoch 20, loss: 0.014027567580342293
Epoch 21, loss: 0.013409322127699852
Epoch 22, loss: 0.012848171405494213
Epoch 23, loss: 0.012336289510130882
Epoch 24, loss: 0.011867186054587364
Epoch 25, loss: 0.01143560465425253
Epoch 26, loss: 0.011037009768188
Epoch 27, loss: 0.010667601600289345
Epoch 28

In [25]:
observed_loss = 0.003
print(f"Observed loss: {observed_loss}")
observed_accuracy = calculate_accuracy(observed_loss)
print(f"Observed accuracy: {observed_accuracy}")

Observed loss: 0.003
Observed accuracy: 0.997004495503373


The training loss is now tiny (`calculate_accuracy` gives about 0.997), so for this very simple two-class task the model fits the training images almost perfectly. Let's see how it does on the test set, and whether the model overfits.

In [26]:
test_images = torch.stack([im for im, _ in test_dataset], dim=0)
test_labels = torch.tensor([label for _, label in test_dataset])

test_images_01 = test_images[test_labels <= 1]
test_labels_01 = test_labels[test_labels <= 1]

In [29]:
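with torch.no_grad():  # evaluation only, no gradients needed
    test_logits = model(test_images_01.view(-1, 3 * 128 * 128)).squeeze()
pred_labels = test_logits > 0  # logit > 0  <=>  sigmoid(logit) > 0.5
# Caution: this passes thresholded 0/1 predictions (not logits) to BCEWithLogitsLoss,
# so the printed "Loss" is not the model's true test cross-entropy;
# loss(test_logits, test_labels_01.float()) would give that value.
pred_loss = loss(pred_labels.float(), test_labels_01.float())
accuracy = (pred_labels == test_labels_01).float().mean()
print(f"Accuracy: {accuracy.item()}")
print(f"Loss: {pred_loss.item()}")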

Accuracy: 0.75
Loss: 0.5468730330467224


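75% accuracy on the test set. That beats the 50% chance level, but compared with the near-zero training loss it suggests the model overfits the training set: it has 49,153 parameters ($3 \cdot 128 \cdot 128$ weights plus a bias) and only 20 training images (10 per class). Note that the printed test loss is computed on thresholded 0/1 predictions rather than on logits, so it is not directly comparable to the training loss.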
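One way to compare training and test performance on equal terms is a small evaluation helper that computes both accuracy and BCE from the raw logits. `evaluate_binary` below is a hypothetical helper, not something from the notebook, and the demo runs on random made-up tensors rather than the flowers.

```python
import torch

def evaluate_binary(model, images, labels):
    """Hypothetical helper: (accuracy, mean BCE) for a model that outputs one logit per image."""
    loss_fn = torch.nn.BCEWithLogitsLoss()
    with torch.no_grad():  # evaluation only, no gradients needed
        logits = model(images.reshape(len(images), -1)).squeeze(1)
    targets = labels.float()
    accuracy = ((logits > 0).float() == targets).float().mean().item()
    bce = loss_fn(logits, targets).item()  # loss on logits, not on 0/1 predictions
    return accuracy, bce

# Tiny synthetic check (made-up data, not the flowers dataset)
torch.manual_seed(0)
demo_model = torch.nn.Linear(3 * 4 * 4, 1)
demo_images = torch.rand(6, 3, 4, 4)
demo_labels = torch.tensor([0, 1, 0, 1, 1, 0])
acc, bce = evaluate_binary(demo_model, demo_images, demo_labels)
print(acc, bce)
```

In the notebook it could be called as `evaluate_binary(model, train_images_01, train_labels_01)` and `evaluate_binary(model, test_images_01, test_labels_01)`; a large gap between the two results is the usual sign of overfitting.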

## Multiclass logistic regression

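Let's now do the same task for all 102 classes in the dataset. The setup is mostly the same, except that the model now outputs 102 values (one logit per class) instead of one, and the loss is `torch.nn.CrossEntropyLoss`. This loss applies a softmax over the 102 logits internally and takes the negative log-probability of the true class, so the targets are integer class indices.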
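As a sketch of what `CrossEntropyLoss` computes, the snippet below compares it with an explicit log-softmax followed by picking out the log-probability of each true class. The logits and targets are made up; only the shapes (rows of 102 logits, integer class indices) mirror the notebook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 102
logits = torch.randn(4, num_classes)      # hypothetical outputs for 4 images
targets = torch.tensor([0, 17, 101, 5])   # integer class indices (int64)

ce = torch.nn.CrossEntropyLoss()(logits, targets)

# Same loss spelled out: log-softmax over classes, then -log p(true class), averaged
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(ce.item(), manual.item())
```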

In [35]:
model = torch.nn.Linear(3 * 128 * 128, 102)
loss = torch.nn.CrossEntropyLoss()
lr = 0.01
num_epochs = 100
momentum = 0
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [36]:
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(train_images.view(-1, 3 * 128 * 128))
    # torch.Size([1020, 102]), for 1020 images and 102 classes
    # each prediction has 102 different values, one for each class
    # each observation is a row of 102 values.
    # print(outputs.shape)

    # cross-entropy loss expects integer labels.
    loss_value = loss(outputs, train_labels)
    # Backward pass
    optimizer.zero_grad()
    loss_value.backward()
    optimizer.step()
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, loss: {loss_value.item()}")

Epoch 0, loss: 4.659365177154541
Epoch 10, loss: 4.135303020477295
Epoch 20, loss: 3.7487950325012207
Epoch 30, loss: 3.4315574169158936
Epoch 40, loss: 3.1617236137390137
Epoch 50, loss: 2.926675319671631
Epoch 60, loss: 2.718764305114746
Epoch 70, loss: 2.532965660095215
Epoch 80, loss: 2.3657009601593018
Epoch 90, loss: 2.2142693996429443


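The loss goes down, but much more slowly than in the binary case. Part of this is that 102-way classification is a harder task, and part is the scale of the loss: the starting value of $\approx 4.66$ is close to $\log 102 \approx 4.62$, the chance-level loss for 102 classes, compared with $\log 2 \approx 0.69$ for two classes. Let's try a larger learning rate and more epochs.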
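The chance-level numbers above can be checked directly: with all-zero logits the softmax is uniform, every class gets probability $1/K$, and the cross-entropy is $-\log\frac{1}{K} = \log K$ regardless of the labels. `chance_level_loss` is a hypothetical helper for illustration.

```python
import math
import torch

def chance_level_loss(num_classes, num_samples=5):
    """Cross-entropy of a model that outputs all-zero logits (uniform softmax)."""
    logits = torch.zeros(num_samples, num_classes)
    targets = torch.randint(0, num_classes, (num_samples,))  # any labels give the same loss
    return torch.nn.CrossEntropyLoss()(logits, targets).item()

for k in (2, 102):
    print(k, chance_level_loss(k), math.log(k))
```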

In [49]:
model = torch.nn.Linear(3 * 128 * 128, 102)
loss = torch.nn.CrossEntropyLoss()
lr = 0.1
num_epochs = 250
momentum = 0
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [50]:
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(train_images.view(-1, 3 * 128 * 128))
    # torch.Size([1020, 102]), for 1020 images and 102 classes
    # each prediction has 102 different values, one for each class
    # each observation is a row of 102 values.
    # print(outputs.shape)

    # cross-entropy loss expects integer labels.
    loss_value = loss(outputs, train_labels)
    # Backward pass
    optimizer.zero_grad()
    loss_value.backward()
    optimizer.step()
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, loss: {loss_value.item()}")

Epoch 0, loss: 4.660036563873291
Epoch 10, loss: 26.879213333129883
Epoch 20, loss: 33.95164108276367
Epoch 30, loss: 29.772903442382812
Epoch 40, loss: 22.860305786132812
Epoch 50, loss: 11.575940132141113
Epoch 60, loss: 2.6904492378234863
Epoch 70, loss: 0.15246695280075073
Epoch 80, loss: 0.09906092286109924
Epoch 90, loss: 0.0782773569226265
Epoch 100, loss: 0.06751013547182083
Epoch 110, loss: 0.0602407306432724
Epoch 120, loss: 0.05471309274435043
Epoch 130, loss: 0.05028054863214493
Epoch 140, loss: 0.04662073776125908
Epoch 150, loss: 0.04353221878409386
Epoch 160, loss: 0.040880925953388214
Epoch 170, loss: 0.03857392817735672
Epoch 180, loss: 0.036543313413858414
Epoch 190, loss: 0.0347396545112133
Epoch 200, loss: 0.03312429040670395
Epoch 210, loss: 0.0316675566136837
Epoch 220, loss: 0.03034592606127262
Epoch 230, loss: 0.02914057858288288
Epoch 240, loss: 0.028035849332809448


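With `lr = 0.1` the loss first spikes (from 4.66 to about 34 by epoch 20) before coming back down, reaching a training loss of about 0.028 by epoch 240. That is a near-perfect fit of the training set; the test set will tell us how well it generalizes.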
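Why can a larger learning rate make the loss jump up? A one-dimensional toy in plain Python gives the intuition. For $f(w) = \frac{1}{2} c\, w^2$, each gradient-descent step multiplies $w$ by $1 - \text{lr}\cdot c$, so the iterates converge only when $0 < \text{lr} < 2/c$ and overshoot past the minimum when $\text{lr} > 1/c$. The real loss surface is not a fixed quadratic, so this toy only illustrates overshooting; it does not model the later recovery seen above.

```python
def gradient_descent_on_quadratic(lr, curvature=1.0, w0=1.0, steps=10):
    """Plain gradient descent on f(w) = 0.5 * curvature * w**2 (illustrative toy).

    f'(w) = curvature * w, so each step multiplies w by (1 - lr * curvature).
    """
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w ** 2)
        w = w - lr * curvature * w  # gradient step
    return losses

for lr in (0.1, 1.5, 2.5):
    losses = gradient_descent_on_quadratic(lr)
    # 0.1: smooth decrease; 1.5: w flips sign each step but still converges; 2.5: diverges
    print(lr, [round(l, 4) for l in losses[:5]])
```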

In [61]:
with torch.no_grad():  # evaluation only, no gradients needed
    pred_test = model(test_images.view(-1, 3 * 128 * 128))

In [62]:
# Top-1 accuracy: the predicted class is the one with the largest logit
test_accuracy = (pred_test.argmax(dim=1) == test_labels).float().mean()

In [63]:
print(f"Accuracy: {test_accuracy.item()}")

Accuracy: 0.14717839658260345


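So when choosing among 102 flower classes, the model picks the right one for about 15% of test images. That is far below its near-perfect fit on the training set, so the model overfits heavily (about 5 million parameters against 1,020 training images), but it is still roughly 15 times better than random guessing ($1/102 \approx 1\%$).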
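The comparison with random guessing can be sketched with a small top-1 accuracy helper and the chance baseline $1/K$. `top1_accuracy` is a hypothetical name, and the logits and labels below are made up so that exactly two of four predictions are correct.

```python
import torch

def top1_accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    return (logits.argmax(dim=1) == labels).float().mean().item()

num_classes = 102
chance = 1 / num_classes  # expected accuracy of uniform random guessing

# Made-up example: 4 images, the argmax is right for the first 2
logits = torch.zeros(4, num_classes)
logits[0, 3] = 1.0   # true label 3  -> correct
logits[1, 7] = 1.0   # true label 7  -> correct
logits[2, 0] = 1.0   # true label 50 -> wrong
logits[3, 9] = 1.0   # true label 1  -> wrong
labels = torch.tensor([3, 7, 50, 1])

acc = top1_accuracy(logits, labels)
print(acc, chance, acc / chance)  # accuracy, chance level, ratio to chance
```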

## Notes

Loading the data, defining the model, choosing the loss function, optimizer, and hyperparameters, and then running the training loop: these are the basic building blocks that most training code follows.
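The loop used in every training cell above can be factored into a reusable function. `train_full_batch` is a hypothetical helper sketching that pattern (full-batch forward pass, loss, `zero_grad`, `backward`, `step`); the demo trains a small softmax classifier on random made-up data rather than the flowers.

```python
import torch

def train_full_batch(model, loss_fn, optimizer, inputs, targets, num_epochs, log_every=10):
    """Hypothetical helper: the notebook's full-batch training loop, returning the loss history."""
    history = []
    for epoch in range(num_epochs):
        outputs = model(inputs)                 # forward pass on the whole training set
        loss_value = loss_fn(outputs, targets)  # compare predictions with labels
        optimizer.zero_grad()                   # clear gradients from the previous step
        loss_value.backward()                   # backward pass
        optimizer.step()                        # update the parameters
        history.append(loss_value.item())
        if log_every and epoch % log_every == 0:
            print(f"Epoch {epoch}, loss: {loss_value.item()}")
    return history

# Usage sketch on made-up data (not the flowers dataset)
torch.manual_seed(0)
X = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))
demo_model = torch.nn.Linear(10, 3)
history = train_full_batch(
    demo_model,
    torch.nn.CrossEntropyLoss(),
    torch.optim.SGD(demo_model.parameters(), lr=0.1),
    X, y, num_epochs=50,
)
```

For the binary case, the model output would need `.squeeze(1)` and the targets `.float()` before being passed to `BCEWithLogitsLoss`, as in the cells above.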