#### Complete logistic regression pipeline

The steps in this pipeline are the same as before, with a few minor differences:
<ul>
    <li>Define model in terms of input size and output size</li>
    <li>Define loss function (BCELoss here) and optimiser</li>
    <li>Training loop
    <ul>
        <li>Predict using model</li>
        <li>Compute loss</li>
        <li>loss.backward()</li>
        <li>Optimiser step</li>
        <li>Zero out gradients</li>
    </ul>
    </li>
</ul>
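
The last step in the loop matters because PyTorch adds new gradients to whatever is already stored in `.grad`. A minimal sketch of that behaviour, using a hypothetical one-parameter tensor `w` that is not part of the pipeline below:

```python
import torch

# Hypothetical single parameter, used only to illustrate gradient accumulation
w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).backward()
grad_first = w.grad.item()

# Second backward pass without zeroing: the new gradient is added to the old one
(3 * w).backward()
grad_accumulated = w.grad.item()

# Zeroing resets the stored gradient before the next iteration
w.grad.zero_()
grad_after_zero = w.grad.item()

print(grad_first, grad_accumulated, grad_after_zero)
```

Without the zeroing step, each parameter update would use the sum of all gradients computed so far rather than the gradient of the current iteration.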

In [1]:
import torch
import torch.nn as nn
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler # To standardize features to zero mean and unit variance
from sklearn.model_selection import train_test_split # To split the data into train and test sets

In [2]:
# Prepare dataset
bc = datasets.load_breast_cancer() # Breast cancer dataset presenting a binary classification problem
X, y = bc.data, bc.target

In [3]:
# Number of samples and number of input features
num_samples, num_features = X.shape

In [4]:
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1) # Take 20% of data as test data

In [5]:
# Scale features to zero mean and unit variance
sc = StandardScaler()
X_train = sc.fit_transform(X_train) # Fit to the training set
X_test = sc.transform(X_test) # Transform testing set accordingly
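
The scaler is fitted on the training set only, so the test set is transformed with the training mean and standard deviation. A small sketch of what that means, using synthetic arrays (`train`, `test`) that are assumptions for illustration rather than the breast cancer data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a train/test split (illustrative only)
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns per-feature mean and std from train
test_scaled = scaler.transform(test)        # reuses the train statistics

# The same transformation written out by hand
manual_test = (test - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(train_scaled.mean(axis=0), 0.0))  # train columns have zero mean
print(np.allclose(train_scaled.std(axis=0), 1.0))   # and unit standard deviation
print(np.allclose(test_scaled, manual_test))        # test uses train statistics
```

The scaled test set will generally not have exactly zero mean or unit variance, which is expected: no information from the test set leaks into the preprocessing.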

In [6]:
# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)

In [7]:
# Convert y to column vector
y_train = y_train.view(y_train.shape[0], 1)
y_test = y_test.view(y_test.shape[0], 1)
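
The reshape keeps the targets in the same `(N, 1)` shape as the model output. A short sketch of why this matters, with small hypothetical tensors: comparing an `(N, 1)` tensor against a flat `(N,)` tensor broadcasts to `(N, N)` instead of comparing element by element.

```python
import torch

# Hypothetical labels and predictions, for illustration only
y = torch.tensor([0.0, 1.0, 1.0])             # shape (3,)
y_col = y.view(y.shape[0], 1)                  # shape (3, 1), as in the cell above
pred = torch.tensor([[0.0], [1.0], [0.0]])     # shape (3, 1), like the model output

matched = pred.eq(y_col)   # element-wise comparison, shape (3, 1)
broadcast = pred.eq(y)     # broadcasts to shape (3, 3)

print(matched.shape, broadcast.shape)
```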

In [8]:
# Custom model class
class LogisticRegression(nn.Module):
    def __init__(self, input_features):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(input_features, 1) # Single output: a logit that the sigmoid turns into a probability
    
    def forward(self, x):
        y_predicted = torch.sigmoid(self.linear(x))
        return y_predicted
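
The forward pass can be checked in isolation. The sketch below assumes a hypothetical input with 4 features and 5 samples; it shows that the output has shape `(N, 1)`, lies strictly between 0 and 1, and matches the sigmoid formula 1 / (1 + exp(-z)).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: 5 samples with 4 features each
linear = nn.Linear(4, 1)
x = torch.randn(5, 4)

logits = linear(x)                       # unbounded real values, shape (5, 1)
probs = torch.sigmoid(logits)            # squashed into (0, 1)
manual = 1 / (1 + torch.exp(-logits))    # sigmoid written out by hand

print(probs.shape)
print(torch.allclose(probs, manual))
```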

In [9]:
# Hyperparameters
alpha = 1e-2
epochs = 1000

In [10]:
# Model definition
model = LogisticRegression(num_features)

In [11]:
# Loss function and optimizer
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=alpha)
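
`nn.BCELoss` expects probabilities, which is why the model applies the sigmoid itself. As a sanity check, the sketch below compares it against the binary cross-entropy formula written out by hand, using a few made-up probabilities and targets:

```python
import torch
import torch.nn as nn

# Made-up probabilities and targets, for illustration only
probs = torch.tensor([[0.9], [0.2], [0.7]])
targets = torch.tensor([[1.0], [0.0], [0.0]])

# Binary cross-entropy: -mean(y * log(p) + (1 - y) * log(1 - p))
manual = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)).mean()
builtin = nn.BCELoss()(probs, targets)

print(torch.allclose(manual, builtin))
```

The third sample (probability 0.7 for a target of 0) contributes the most to the loss, since confident wrong predictions are penalised more heavily.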

In [12]:
# Training loop
for epoch in range(epochs):
    # Forward pass
    predicted_output = model(X_train)
    # Loss
    loss = criterion(predicted_output, y_train)
    # Backward pass
    loss.backward()
    # Update parameters
    optimizer.step()
    # Zero out gradients
    optimizer.zero_grad()
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch: {epoch+1}, loss: {loss.item():.3f}")

Epoch: 10, loss: 0.521
Epoch: 20, loss: 0.448
Epoch: 30, loss: 0.399
Epoch: 40, loss: 0.363
Epoch: 50, loss: 0.335
Epoch: 60, loss: 0.313
Epoch: 70, loss: 0.295
Epoch: 80, loss: 0.279
Epoch: 90, loss: 0.266
Epoch: 100, loss: 0.255
Epoch: 110, loss: 0.245
Epoch: 120, loss: 0.236
Epoch: 130, loss: 0.228
Epoch: 140, loss: 0.221
Epoch: 150, loss: 0.215
Epoch: 160, loss: 0.209
Epoch: 170, loss: 0.204
Epoch: 180, loss: 0.199
Epoch: 190, loss: 0.194
Epoch: 200, loss: 0.190
Epoch: 210, loss: 0.186
Epoch: 220, loss: 0.182
Epoch: 230, loss: 0.179
Epoch: 240, loss: 0.176
Epoch: 250, loss: 0.173
Epoch: 260, loss: 0.170
Epoch: 270, loss: 0.167
Epoch: 280, loss: 0.165
Epoch: 290, loss: 0.162
Epoch: 300, loss: 0.160
Epoch: 310, loss: 0.158
Epoch: 320, loss: 0.156
Epoch: 330, loss: 0.154
Epoch: 340, loss: 0.152
Epoch: 350, loss: 0.150
Epoch: 360, loss: 0.148
Epoch: 370, loss: 0.146
Epoch: 380, loss: 0.145
Epoch: 390, loss: 0.143
Epoch: 400, loss: 0.142
Epoch: 410, loss: 0.140
Epoch: 420, loss: 0.139
...

In [13]:
# Evaluation
with torch.no_grad():
    test_prediction = model(X_test)
    test_prediction = test_prediction.round() # Binary classification ( > 0.5 -> 1, 0 otherwise)
    acc = test_prediction.eq(y_test).sum() / float(y_test.shape[0]) # Accuracy calculated as the fraction of correctly classified samples
    print(f"Accuracy: {acc:.3f}")

Accuracy: 0.965
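
One detail about the thresholding step: `torch.round` rounds halves to the nearest even integer, so a probability of exactly 0.5 becomes 0. If 0.5 should count as the positive class, an explicit comparison can be used instead. A minimal sketch with made-up probabilities and labels:

```python
import torch

# Made-up probabilities and labels, for illustration only
probs = torch.tensor([[0.2], [0.5], [0.8], [0.6]])
labels = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

rounded = probs.round()                # 0.5 is rounded to 0 (half to even)
thresholded = (probs >= 0.5).float()   # 0.5 is mapped to 1

# Accuracy as the mean of the element-wise matches
acc_rounded = rounded.eq(labels).float().mean().item()
acc_thresholded = thresholded.eq(labels).float().mean().item()

print(acc_rounded, acc_thresholded)
```

With real model outputs an exact 0.5 is rare, so the two approaches almost always agree; the explicit threshold mainly makes the decision rule visible and easy to change.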


Here a few things were done differently:
<ul>
    <li>The binary classification problem required the use of logistic regression</li>
    <li>The forward pass of logistic regression applies a sigmoid to the linear layer's output</li>
    <li>Evaluation was done on a held-out test set</li>
    <li>StandardScaler was used to transform inputs to zero mean and unit standard deviation</li>
</ul>