# Logistic Regression

Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression, which outputs continuous values, logistic regression passes its output through the logistic sigmoid function to produce a probability, which can then be mapped to two or more discrete classes. Given data on **time spent studying** and **exam scores**, linear regression and logistic regression can predict different things:

- **Linear Regression** could help us predict the student’s test score on a scale of 0 - 100. Linear regression predictions are continuous (numbers in a range).
- **Logistic Regression** could help us predict whether the student passed or failed. Logistic regression predictions are discrete (only specific values or categories are allowed). We can also view the probability scores underlying the model’s classifications.
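
To make the pass/fail example concrete, here is a minimal sketch of how a linear score is turned into a probability and then into a class. The `weight` and `bias` values and the `pass_probability` helper are hypothetical, chosen only for illustration and not fitted to any data.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, chosen only for illustration (not learned from data).
weight, bias = 1.5, -6.0

def pass_probability(hours_studied):
    """Estimated probability of passing for a given number of study hours."""
    return sigmoid(weight * hours_studied + bias)

for hours in (1, 4, 8):
    p = pass_probability(hours)
    label = "pass" if p >= 0.5 else "fail"  # threshold the probability at 0.5
    print(f"{hours} h -> p(pass) = {p:.3f} -> {label}")
```

The linear part (`weight * hours_studied + bias`) is the same computation linear regression performs; the sigmoid and the 0.5 threshold are what turn it into a classifier.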

### Types of Logistic Regression

- Binary (Pass/Fail)
- Multinomial (Cats, Dogs, Sheep)
- Ordinal (Low, Medium, High)

In this notebook, we are going to classify handwritten digits from the MNIST dataset using Logistic Regression in PyTorch. This is a multi-class classification problem.

## The MNIST Dataset

Before we define our model, let's go ahead and import our dataset. The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database has a training set of 60,000 examples, and a test set of 10,000 examples.

In [1]:
import torch
import torch.nn as nn
from torchvision import datasets
import torchvision.transforms as transforms
from torch.autograd import Variable

# Hyperparameters
input_size = 784  # Our images are 28px by 28px in size
num_classes = 10  # We have handwritten digits from 0 - 9
num_epochs = 5  # Number of epochs
batch_size = 100  # Batch size
learning_rate = 0.001  # Learning rate

transfm = transforms.ToTensor()  # Transform the dataset objects to tensors

# MNIST dataset - images and labels
train_dataset = datasets.MNIST(root='./data',
                               train=True,
                               transform=transfm,
                               download=True)

test_dataset = datasets.MNIST(root='./data',
                              train=False,
                              transform=transfm)

# Input pipeline
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

Here, the `torch.nn` module provides the building blocks for the model, and `torchvision.datasets` provides the MNIST dataset of handwritten digits that we shall be using. The `torchvision.transforms` module contains transformations for input data; we use `ToTensor()` to convert the images to PyTorch tensors. The `torch.autograd` module contains the `Variable` class, which this notebook uses to wrap tensors. Since PyTorch 0.4, `Variable` is deprecated and tensors support autograd directly, so wrapping a tensor in `Variable` simply returns a tensor and can be omitted.

Each image in our dataset is 28x28 pixels, so a flattened image has 784 values and our input size is 784. The dataset contains 10 digits (0 to 9), so the model has 10 possible outputs and we set `num_classes` to 10. We shall train the model for five epochs, that is, five full passes over the training set. Finally, we train in mini-batches of 100 images each, which keeps memory usage bounded and gives the optimiser one parameter update per batch.
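
The arithmetic behind these numbers can be checked with a short sketch. All values come from the hyperparameters and dataset sizes stated above; the variable names `batches_per_epoch`, `total_steps` and `log_lines_per_epoch` are introduced here only for illustration.

```python
# Values taken from the hyperparameters and dataset sizes stated above.
image_height, image_width = 28, 28
train_examples = 60_000
batch_size = 100
num_epochs = 5
log_every = 100  # the training loop prints the loss once every 100 batches

input_size = image_height * image_width           # features per flattened image
batches_per_epoch = train_examples // batch_size  # optimiser steps in one epoch
total_steps = batches_per_epoch * num_epochs      # optimiser steps over the whole run
log_lines_per_epoch = batches_per_epoch // log_every

print(input_size, batches_per_epoch, total_steps, log_lines_per_epoch)
# prints: 784 600 3000 6
```

This is why the training output further down shows six loss values per epoch.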

## The Model

Here, we shall initialise our model as a subclass of `torch.nn.Module` and then define the forward pass. The model returns raw scores (logits) rather than probabilities: the `nn.CrossEntropyLoss` criterion used below applies the softmax (as a log-softmax) internally, so we do not need to specify it inside the `forward()` function.

In [2]:
class LogisticRegression(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.linear = nn.Linear(input_size, num_classes)
        
    def forward(self, x):
        y_hat = self.linear(x)
        return y_hat
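
The linear layer returns one raw score per class. As a rough illustration of what the softmax does to such scores, the following standard-library sketch converts a hypothetical vector of ten scores into probabilities. The `softmax` helper and the `logits` values are assumptions made up for this example; they are not produced by the model above.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for one image, one score per digit class 0-9.
logits = [0.2, -1.3, 0.5, 2.9, -0.7, 0.1, -2.0, 1.4, 0.0, -0.4]
probs = softmax(logits)

predicted_from_logits = logits.index(max(logits))
predicted_from_probs = probs.index(max(probs))
print(predicted_from_logits, predicted_from_probs, round(sum(probs), 6))
```

Because softmax preserves the ordering of the scores, the predicted class is the same whether we take the largest score or the largest probability. This is why the model can return raw scores at prediction time.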

## Loss Function and Optimizer

Next, we set our loss function and the optimiser. We shall use the cross-entropy loss and, for the optimiser, the stochastic gradient descent algorithm with a learning rate of 0.001, as defined in the hyperparameters above.

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. For example, predicting a probability of 0.012 when the actual observation label is 1 would be bad and would result in a high loss value. A perfect model would have a log loss of 0.

![Imgur](https://i.imgur.com/uLXWRg8.png)


The graph above shows the range of possible loss values given a true observation. As the predicted probability of the true class approaches 1, log loss slowly decreases. As that predicted probability decreases, however, the log loss increases rapidly. Log loss penalizes both types of errors, but especially those predictions that are confident and wrong!

Cross-entropy and log loss are slightly different depending on context, but in machine learning, when computing a loss over predicted probabilities between 0 and 1, they resolve to the same thing.
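
As a small worked example of the behaviour described above, the loss for a single observation is the negative logarithm of the probability assigned to the true class. The `log_loss` helper below is a hypothetical name used only for this sketch.

```python
import math

def log_loss(prob_of_true_class):
    """Cross-entropy for one example: -log of the probability given to the true class."""
    return -math.log(prob_of_true_class)

# A confident wrong prediction (0.012) is penalised far more than a confident right one.
for p in (0.012, 0.5, 0.9, 0.99):
    print(f"p(true class) = {p:<5} -> loss = {log_loss(p):.3f}")
```

The loss for 0.012 is roughly 4.42, while the loss for 0.9 is roughly 0.11, which matches the steep left side and the flat right side of the curve.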

In [3]:
model = LogisticRegression(input_size, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

## Training the Model

Now, we shall start the training. Here, we shall be performing the following tasks:

1. Reset all gradients to 0.
2. Make a forward pass.
3. Calculate the loss.
4. Perform backpropagation.
5. Update all weights.
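
The five steps above can be sketched without PyTorch on a tiny binary problem, with the gradients written out by hand. This is only an illustration under stated assumptions: the toy data, the single-feature model and all names (`cross_entropy`, `mean_loss`, `grad_w`, `grad_b`) are hypothetical and are not part of the MNIST code.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(p, y):
    """Binary cross-entropy for one example, clamped to avoid log(0)."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

# Hypothetical toy data: (hours studied minus 4, passed?) pairs.
data = [(-3.0, 0), (-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1), (3.0, 1)]

def mean_loss(w, b):
    return sum(cross_entropy(sigmoid(w * x + b), y) for x, y in data) / len(data)

w, b = 0.0, 0.0          # model parameters
learning_rate = 0.1
rng = random.Random(0)   # fixed seed so the run is repeatable
initial_loss = mean_loss(w, b)

for epoch in range(50):
    rng.shuffle(data)
    for x, y in data:                  # one example per step
        grad_w, grad_b = 0.0, 0.0      # 1. reset all gradients to 0
        p = sigmoid(w * x + b)         # 2. forward pass
        loss = cross_entropy(p, y)     # 3. calculate the loss
        grad_w += (p - y) * x          # 4. backpropagation: d(loss)/dw
        grad_b += (p - y)              #    and d(loss)/db
        w -= learning_rate * grad_w    # 5. update all weights
        b -= learning_rate * grad_b

final_loss = mean_loss(w, b)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In the PyTorch loop, `optimizer.zero_grad()`, `loss.backward()` and `optimizer.step()` take the place of the hand-written steps 1, 4 and 5.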

In [4]:
# Training the model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images.view(-1, 28 * 28))  # Images flattened into 1D tensors
        labels = Variable(labels)  # Labels 
        
        # Forward -> Backprop -> Optimize
        optimizer.zero_grad()  # Manually zero the gradient buffers
        outputs = model(images)  # Forward pass: class scores (logits) for the training batch
        loss = criterion(outputs, labels)  # Cross-entropy between the scores and the true labels
        
        loss.backward()  # Compute the error gradients
        optimizer.step()  # Optimize the model via Stochastic Gradient Descent
        
        if (i + 1) % 100 == 0:
            print("Epoch {}, loss :{}".format(epoch + 1, loss.item()))

Epoch 1, loss :2.2049152851104736
Epoch 1, loss :2.0981056690216064
Epoch 1, loss :2.0286219120025635
Epoch 1, loss :1.917850375175476
Epoch 1, loss :1.8506721258163452
Epoch 1, loss :1.810417652130127
Epoch 2, loss :1.7023305892944336
Epoch 2, loss :1.680514931678772
Epoch 2, loss :1.6131799221038818
Epoch 2, loss :1.5625419616699219
Epoch 2, loss :1.525355577468872
Epoch 2, loss :1.453291416168213
Epoch 3, loss :1.4182931184768677
Epoch 3, loss :1.364874005317688
Epoch 3, loss :1.3616888523101807
Epoch 3, loss :1.3887336254119873
Epoch 3, loss :1.2908176183700562
Epoch 3, loss :1.3103325366973877
Epoch 4, loss :1.2904797792434692
Epoch 4, loss :1.2262612581253052
Epoch 4, loss :1.1611058712005615
Epoch 4, loss :1.1176444292068481
Epoch 4, loss :1.0963448286056519
Epoch 4, loss :1.0735220909118652
Epoch 5, loss :1.1075222492218018
Epoch 5, loss :1.0989426374435425
Epoch 5, loss :1.0390461683273315
Epoch 5, loss :0.968352735042572
Epoch 5, loss :1.0160298347473145
Epoch 5, loss :1.1056

## Testing

Now, let's test our model and see how accurately it classifies handwritten digits.

In [5]:
# Test the Model
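correct = 0
total = 0
with torch.no_grad():  # Gradients are not needed for evaluation
    for images, labels in test_loader:
        images = Variable(images.view(-1, 28 * 28))  # Flatten each image to 784 values
        outputs = model(images)  # Class scores (logits) for the batch
        _, predicted = torch.max(outputs, 1)  # Index of the highest score is the predicted digit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()  # Convert the count to a Python int

print('Accuracy: {}%'.format(100 * correct / total))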

Accuracy: 82.82%
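
The accuracy computation boils down to taking the index of the largest score for each example and comparing it with the label. Here is a minimal standard-library sketch of that logic; the `argmax` and `accuracy_percent` helpers and the scores and labels are hypothetical and unrelated to the MNIST result above.

```python
def argmax(scores):
    """Index of the largest score, i.e. the predicted class."""
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy_percent(batch_scores, batch_labels):
    """Percentage of examples whose highest-scoring class equals the label."""
    correct = sum(1 for s, y in zip(batch_scores, batch_labels) if argmax(s) == y)
    return 100.0 * correct / len(batch_labels)

# Hypothetical scores for four examples over three classes.
scores = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 0.9, 0.4], [1.2, 1.1, 0.3]]
labels = [0, 2, 2, 0]

print(accuracy_percent(scores, labels))  # 3 of the 4 predictions match: 75.0
```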
