#PADL Week 5 Practical: Logistic Regression

##Logistic regression with scikit-learn

**Initial reading:**

Reading and understanding the scikit-learn examples on logistic regression is a good way to get started. There are no fewer than 5 examples given in the [logistic regression section](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression) of the scikit-learn User Guide. Feel free to look at all five, but be sure to look at the first two: *L1 Penalty and Sparsity in Logistic Regression* and *Regularization path of L1-Logistic Regression*. You will see that by default scikit-learn uses an L2 penalty (as in ridge regression), but it is also possible to use an L1 penalty (as in lasso regression). Built-in cross-validation support for choosing the ‘right’ value for the complexity parameter (`C`, the inverse of the regularisation strength) is also available via the [LogisticRegressionCV class](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html).
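
As a rough illustration of the cross-validation support mentioned above, the sketch below fits a `LogisticRegressionCV` model on synthetic data. The data from `make_classification`, the names `X_demo`, `y_demo` and `cv_model`, and the settings `Cs=10`, `cv=5` are all assumptions chosen for illustration; they are not part of the practical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 9 features, like the breast cancer data
X_demo, y_demo = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Try 10 candidate values of C using 5-fold cross-validation on the training set
cv_model = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000)
cv_model.fit(X_tr, y_tr)

test_accuracy = cv_model.score(X_te, y_te)
print("Selected C:", cv_model.C_)
print("Test accuracy:", test_accuracy)
```

Smaller values of `C` mean stronger regularisation, so the selected value gives a hint about how much shrinkage the data favours.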

**Diagnosing breast cancer:**

Go to the Breast Cancer Wisconsin (Diagnostic) data set [webpage](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)). If you click on the ‘Data Folder’ link near the top of the page, you will be able to get the data. It is the file [breast-cancer-wisconsin.data](https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/). You can either upload this to the session storage for your Colab notebook or mount a Google Drive folder and store the file there. Files in session storage are lost each time your session times out, but you can add `!wget -q https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data` to your script to download the file automatically to session storage each time it runs (as done below).

To save you the hassle of working out how to get this data into a Python program, here is some code to read the data in and then remove any datapoints containing missing values:

In [2]:
!wget -q https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

import numpy as np
from sklearn.linear_model import LogisticRegression

data = np.genfromtxt('breast-cancer-wisconsin.data',delimiter=',',missing_values='?')
data = data[~np.isnan(data).any(axis=1)]
X = data[:,1:-1] # ignore first column and omit class variable at the end
y = data[:,-1]
X.shape

(683, 9)

**To do:**

Now use logistic regression to build a model to predict either malignant or benign. In fact, I would like you to build a number of logistic regression models where you *vary the size of the training data* and where you *vary the complexity parameter setting*. In all cases use whatever data you have excluded from training as a test set, and compute the score.

Check that training on more data increases predictive accuracy and compare the performance of different complexity parameter settings on smaller training sets.
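
One possible way to organise these experiments is sketched below. To keep the example self-contained it uses synthetic data from `make_classification` as a stand-in (an assumption); in the practical you would use the `X` and `y` arrays loaded above instead. The particular training sizes and `C` values, and the names `X_demo`, `y_demo` and `results`, are also just illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in data (assumption): swap in X, y from breast-cancer-wisconsin.data
X_demo, y_demo = make_classification(
    n_samples=683, n_features=9, n_informative=5, random_state=0
)

results = {}
for n_train in (20, 50, 100, 400):
    for C in (0.01, 1.0, 100.0):
        # Smaller C means stronger L2 regularisation
        clf = LogisticRegression(C=C, max_iter=1000)
        clf.fit(X_demo[:n_train], y_demo[:n_train])
        # Everything not used for training acts as the test set
        accuracy = clf.score(X_demo[n_train:], y_demo[n_train:])
        results[(n_train, C)] = accuracy
        print(f"n_train={n_train:3d}  C={C:6.2f}  test accuracy={accuracy:.3f}")
```

Storing the scores in a dictionary keyed by `(n_train, C)` makes it easy to compare settings afterwards, for example by plotting accuracy against training size for each value of `C`.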

##Logistic Regression in PyTorch

Below is a straightforward re-implementation of logistic regression in PyTorch. Nearly all of this should now be very familiar to you. We put our logistic regression model in a subclass of `torch.nn.Module`. The model itself consists of a linear layer mapping 9 input features to 1 output. This is then passed through a sigmoid layer so that the model outputs the probability of one of the two classes. Since we are using binary cross entropy loss, we use `torch.nn.BCELoss` as our loss function (note: the sigmoid is applied inside the model, so we don't use `torch.nn.BCEWithLogitsLoss`, the version of the loss function that combines sigmoid and BCE, although this would be a perfectly valid alternative). We train as normal and evaluate on the test set. This time, however, we threshold the output probabilities to make our final hard class decisions and compute the percentage correct.
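
The remark about the two loss functions can be checked numerically. The short sketch below compares `torch.nn.BCELoss` applied to sigmoid outputs with `torch.nn.BCEWithLogitsLoss` applied to the raw linear outputs; the `logits` and `targets` tensors are made-up values for illustration only.

```python
import torch

torch.manual_seed(0)

# Made-up raw scores (logits) and binary labels, shape [6, 1]
logits = torch.randn(6, 1)
targets = torch.tensor([[0.0], [1.0], [1.0], [0.0], [1.0], [0.0]])

# Option 1: sigmoid applied explicitly, then plain BCE (as in the model below)
loss_sigmoid_then_bce = torch.nn.BCELoss()(torch.sigmoid(logits), targets)

# Option 2: loss that applies the sigmoid internally to the raw scores
loss_with_logits = torch.nn.BCEWithLogitsLoss()(logits, targets)

print(loss_sigmoid_then_bce.item(), loss_with_logits.item())
```

The two values should agree up to floating-point error. If you switch to the combined loss, remember to remove the sigmoid from the model's `forward` method and apply it yourself when you want probabilities at test time.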

**To do:**

Read and understand this code block. Run it. Print out the shapes of the tensors as they pass through the `logisticRegression` model (all tensors have a `shape` attribute). Make sure the shapes correspond with your understanding of what each layer is doing. Try changing the training set size, the number of training iterations and the learning rate, and see the effect.

In [5]:
import torch

class logisticRegression(torch.nn.Module):
    def __init__(self, inputSize):
        # Call superclass constructor
        super(logisticRegression, self).__init__()
        # Initialise components of model:
        # 1. Linear layer
        self.linear = torch.nn.Linear(inputSize, 1)
        # 2. Sigmoid layer
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        # Forward pass through the model:
        # 1. Apply linear layer to input
        y = self.linear(x)
        # 2. Apply sigmoid to output of linear layer
        y = self.sigmoid(y)
        return y

# Instantiate logistic regression model with 9 input features
model = logisticRegression(9)
# Instantiate loss function (binary cross entropy loss - sigmoid applied inside model)
criterion = torch.nn.BCELoss()
# Setup optimiser
optim = torch.optim.SGD(model.parameters(), lr=0.1)

training_size = 400
epochs = 100

# Convert labels to binary 0/1 classes as expected by PyTorch
y_01 = np.array([0 if x==2 else 1 for x in data[:,-1]])

# Split train/test and convert to PyTorch tensors
X_train_tensor = torch.from_numpy(np.float32(X[:training_size]))
Y_train_tensor = torch.from_numpy(np.float32(y_01[:training_size])).unsqueeze(1)
X_test_tensor = torch.from_numpy(np.float32(X[training_size:]))
Y_test_tensor = torch.from_numpy(np.float32(y_01[training_size:])).unsqueeze(1)

# Main training loop
for epoch in range(epochs):
    # Pass training data through model
    y_predict = model(X_train_tensor)
    print(f"Epoch {epoch}: y_predict shape = {y_predict.shape}")
    # Compute BCE loss
    loss = criterion(y_predict,Y_train_tensor)
    # Backward pass and gradient step
    optim.zero_grad()
    loss.backward()
    optim.step()
    if not epoch % 10:
        # Print out the loss every 10 epochs
        print('epoch {}, loss {}'.format(epoch, loss.item()))

# Pass training set through model
y_predict = model(X_train_tensor)
# Threshold probabilities to binary classes
predictions = (y_predict>0.5).float()
# Compare predicted classes to labels
correct = (predictions == Y_train_tensor).float().sum()
print("Percent training set correctly classified: {:.2f}%".format(100*correct/training_size))

# Pass test set through model
y_predict = model(X_test_tensor)
# Threshold probabilities to binary classes
predictions = (y_predict>0.5).float()
# Compare predicted classes to labels
correct = (predictions == Y_test_tensor).float().sum()
print("Percent test set correctly classified: {:.2f}%".format(100*correct/X_test_tensor.shape[0]))

Epoch 0: y_predict shape = torch.Size([400, 1])
epoch 0, loss 1.9602806568145752
Epoch 1: y_predict shape = torch.Size([400, 1])
Epoch 2: y_predict shape = torch.Size([400, 1])
Epoch 3: y_predict shape = torch.Size([400, 1])
Epoch 4: y_predict shape = torch.Size([400, 1])
Epoch 5: y_predict shape = torch.Size([400, 1])
Epoch 6: y_predict shape = torch.Size([400, 1])
Epoch 7: y_predict shape = torch.Size([400, 1])
Epoch 8: y_predict shape = torch.Size([400, 1])
Epoch 9: y_predict shape = torch.Size([400, 1])
Epoch 10: y_predict shape = torch.Size([400, 1])
epoch 10, loss 0.5802050828933716
Epoch 11: y_predict shape = torch.Size([400, 1])
Epoch 12: y_predict shape = torch.Size([400, 1])
Epoch 13: y_predict shape = torch.Size([400, 1])
Epoch 14: y_predict shape = torch.Size([400, 1])
Epoch 15: y_predict shape = torch.Size([400, 1])
Epoch 16: y_predict shape = torch.Size([400, 1])
Epoch 17: y_predict shape = torch.Size([400, 1])
Epoch 18: y_predict shape = torch.Size([400, 1])
Epoch 19: y_

##MLP in PyTorch

**To do:**

I would now like you to extend the logistic regression model to a very basic MLP. This MLP should have one hidden layer with 16 nodes and ReLU activation, followed by an output layer with sigmoid activation (to output the class probability). To do this, replace the `logisticRegression` class with an MLP model that includes the additional layers (note: this is only a small modification of the logistic regression model).

What happens to the training loss and classification accuracy? What about performance on the test set (i.e. generalisation)? How many parameters does your model have in total? Experiment with different numbers of neurons in the hidden layer. What happens to performance and generalisation? Keep in mind how many parameters your model has and how many training samples you have. Try adding a second hidden layer and, again, experiment with the effect on performance.
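
To help with the parameter-counting question, here is a small sketch in plain Python. The helper `count_mlp_parameters` is a hypothetical name introduced for illustration; it assumes fully connected layers where each layer has one weight per input-output pair plus one bias per output, as is the default for `torch.nn.Linear`.

```python
def count_mlp_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of every layer, input first, e.g. [9, 16, 1].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weight matrix of this layer
        total += n_out         # bias vector of this layer
    return total


print(count_mlp_parameters([9, 1]))      # logistic regression: 9 weights + 1 bias = 10
print(count_mlp_parameters([9, 16, 1]))  # one hidden layer of 16: 160 + 17 = 177
```

You can cross-check the result against a PyTorch model with `sum(p.numel() for p in model.parameters())`, and compare the count with the 400 training samples used above.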

**Optional Extension**

Revisit the diabetes dataset from the Week 3 practical. Try replacing linear regression with an MLP. In this case, you won't be doing classification so won't want the sigmoid activation on the output. Also, you won't use BCE loss. Can you improve performance relative to linear regression?

In [6]:
class MLP(torch.nn.Module):
    def __init__(self, inputSize):
        super(MLP, self).__init__()
        self.fc1 = torch.nn.Linear(inputSize, 64)
        self.fc2 = torch.nn.Linear(64, 64)
        self.fc3 = torch.nn.Linear(64, 1)
        self.reLU = torch.nn.ReLU()
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        y = self.fc1(x)
        y = self.reLU(y)
        y = self.fc2(y)
        y = self.reLU(y)
        y = self.fc3(y)
        y = self.sigmoid(y)
        return y

In [7]:
# Instantiate MLP model with 9 input features
model = MLP(9)
# Instantiate loss function (binary cross entropy loss - sigmoid applied inside model)
criterion = torch.nn.BCELoss()
# Setup optimiser
optim = torch.optim.SGD(model.parameters(), lr=0.65)

training_size = 400
epochs = 100

# Convert labels to binary 0/1 classes as expected by PyTorch
y_01 = np.array([0 if x == 2 else 1 for x in data[:, -1]])

# Split train/test and convert to PyTorch tensors
X_train_tensor = torch.from_numpy(np.float32(X[:training_size]))
Y_train_tensor = torch.from_numpy(np.float32(y_01[:training_size])).unsqueeze(1)
X_test_tensor = torch.from_numpy(np.float32(X[training_size:]))
Y_test_tensor = torch.from_numpy(np.float32(y_01[training_size:])).unsqueeze(1)

# Main training loop
for epoch in range(epochs):
    # Pass training data through model
    y_predict = model(X_train_tensor)
    # Compute BCE loss
    loss = criterion(y_predict, Y_train_tensor)
    # Backward pass and gradient step
    optim.zero_grad()
    loss.backward()
    optim.step()
    if not epoch % 10:
        # Print out the loss every 10 epochs
        print("epoch {}, loss {}".format(epoch, loss.item()))

# Pass training set through model
y_predict = model(X_train_tensor)
# Threshold probabilities to binary classes
predictions = (y_predict > 0.5).float()
# Compare predicted classes to labels
correct = (predictions == Y_train_tensor).float().sum()
print(
    "Percent training set correctly classified: {:.2f}%".format(
        100 * correct / training_size
    )
)

# Pass test set through model
y_predict = model(X_test_tensor)
# Threshold probabilities to binary classes
predictions = (y_predict > 0.5).float()
# Compare predicted classes to labels
correct = (predictions == Y_test_tensor).float().sum()
print(
    "Percent test set correctly classified: {:.2f}%".format(
        100 * correct / X_test_tensor.shape[0]
    )
)

epoch 0, loss 0.6903680562973022
epoch 10, loss 0.7441874146461487
epoch 20, loss 0.42092305421829224
epoch 30, loss 0.24725504219532013
epoch 40, loss 0.3616285026073456
epoch 50, loss 0.15881195664405823
epoch 60, loss 0.1622162014245987
epoch 70, loss 0.15792058408260345
epoch 80, loss 0.11028960347175598
epoch 90, loss 0.2540406286716461
Percent training set correctly classified: 97.00%
Percent test set correctly classified: 97.88%


In [8]:
class MLP(torch.nn.Module):
    def __init__(self, inputSize):
        super(MLP, self).__init__()
        self.fc1 = torch.nn.Linear(inputSize, 64)
        self.fc2 = torch.nn.Linear(64, 64)
        self.fc3 = torch.nn.Linear(64, 1)
        self.reLU = torch.nn.ReLU()
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        y = self.fc1(x)
        y = self.reLU(y)
        y = self.fc2(y)
        y = self.reLU(y)
        y = self.fc3(y)
        # y = self.sigmoid(y)  # sigmoid disabled: regression output should be unbounded
        return y

In [9]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

(X, y) = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [10]:

model = MLP(8)
criterion = torch.nn.MSELoss()

optim = torch.optim.SGD(model.parameters(), lr=0.0000019, momentum=0.9)

epochs = 300

# Split train/test and convert to PyTorch tensors
X_train_tensor = torch.from_numpy(np.float32(X_train))
Y_train_tensor = torch.from_numpy(np.float32(y_train)).unsqueeze(1)
X_test_tensor = torch.from_numpy(np.float32(X_test))
Y_test_tensor = torch.from_numpy(np.float32(y_test)).unsqueeze(1)

# Main training loop
for epoch in range(epochs + 1):
    # Pass training data through model
    y_predict = model(X_train_tensor)
    # Compute MSE loss
    loss = criterion(y_predict, Y_train_tensor)
    # Backward pass and gradient step
    optim.zero_grad()
    loss.backward()
    optim.step()
    if not epoch % 100:
        # Print out the loss every 100 epochs
        print("epoch {}, loss {}".format(epoch, loss.item()))

epoch 0, loss 5118.83251953125
epoch 100, loss 1.369065284729004
epoch 200, loss 1.3338567018508911
epoch 300, loss 1.3224562406539917


In [11]:
print(criterion(model(X_test_tensor), Y_test_tensor))

tensor(1.3408, grad_fn=<MseLossBackward0>)
