# 2. Logistic regression

- Used as part of INFO8010 Deep Learning (Gilles Louppe, 2018-2019).
- Originally adapted from [PyTorch tutorial for Deep Learning researchers](https://github.com/yunjey/pytorch-tutorial) (Yunjey Choi, 2018).

---

In [None]:
# Imports
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import torch 
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

# Hyper-parameters

In [None]:
input_size = 784
num_classes = 10
num_epochs = 5
batch_size = 100
learning_rate = 0.001

# Data

In [None]:
# MNIST Dataset (Images and Labels)
train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

# Dataset Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)
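
To make the role of `batch_size` concrete, here is a minimal sketch on synthetic data. The names `fake_images` and `fake_loader` are illustrative assumptions, not part of the original notebook; random tensors stand in for MNIST so nothing needs to be downloaded.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset (assumption): 250 images, 1 channel, 28x28.
fake_images = torch.rand(250, 1, 28, 28)
fake_loader = DataLoader(TensorDataset(fake_images), batch_size=100, shuffle=False)

# TensorDataset yields tuples, so each batch is unpacked as a 1-element sequence here.
batch_shapes = [tuple(batch.shape) for (batch,) in fake_loader]

print(len(fake_loader))  # 3
print(batch_shapes)      # [(100, 1, 28, 28), (100, 1, 28, 28), (50, 1, 28, 28)]
```

The last batch is smaller because 250 is not a multiple of 100. With MNIST (60000 training images) and `batch_size = 100`, every batch is full.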

In [None]:
data_iter = iter(train_loader)
X, y = next(data_iter)

In [None]:
plt.imshow(X[0, 0])

# Model

In [None]:
# Model
class LogisticRegression(nn.Module):
    def __init__(self, input_size, num_classes):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(input_size, num_classes)
        self.softmax = nn.Softmax(dim=1)  # normalize over the class dimension
    
    def forward(self, x):
        out = self.linear(x)
        out = self.softmax(out)
        return out

model = LogisticRegression(input_size, num_classes)

In [None]:
X.view(-1, 28*28).shape

In [None]:
model(X.view(-1, 28*28))[0]
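
The two cells above flatten the images and pass them through the model. As a rough, self-contained sketch of what happens to the shapes and values, the snippet below uses a random batch instead of MNIST. `fake_batch`, `flat`, `linear`, and `probs` are hypothetical names introduced for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic batch standing in for MNIST images (assumption): 4 images, 1 channel, 28x28.
fake_batch = torch.rand(4, 1, 28, 28)

# Flatten each image into a 784-dimensional vector; -1 lets PyTorch infer the batch size.
flat = fake_batch.view(-1, 28 * 28)

# Same structure as the model above: a linear layer followed by a softmax over classes.
linear = nn.Linear(28 * 28, 10)
probs = torch.softmax(linear(flat), dim=1)

print(flat.shape)        # torch.Size([4, 784])
print(probs.sum(dim=1))  # each entry equals 1 up to floating-point error
```

Each row of `probs` is a probability distribution over the 10 classes, which is why a single row of the model output can be read as class probabilities.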

<div class="alert alert-success">
<b>EXERCISE</b>:

Print <code>y</code> and explain the data structure you see (type and content).

Is it consistent with what a neural network expects when dealing with a classification problem?
If not, why can we still train the model? 
</div>

In [None]:
# Your code

# Loss and optimizer
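
For reference, the quantities used in the next cell can be written as follows. The notation is an assumption introduced here ($K$ classes, weights $w_k$ and biases $b_k$ of the linear layer, $N$ examples per batch, $y_n$ the target of example $x_n$); `nn.NLLLoss` averages over the batch by default.

```latex
p_k(x) = \frac{\exp(w_k^\top x + b_k)}{\sum_{j=1}^{K} \exp(w_j^\top x + b_j)},
\qquad
\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \log p_{y_n}(x_n)
```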

In [None]:
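# Loss and Optimizer
# NLLLoss expects log-probabilities; the model outputs softmax probabilities,
# so the training loop applies .log() before computing the loss.
criterion = nn.NLLLoss()
# Pass the model parameters to be updated by SGD.
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)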

# Training the Model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 28*28)
        
        # Forward + Backward + Optimize
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs.log(), labels)
        loss.backward()  # <=> x.grad += dloss/dx for all parameters x
        optimizer.step()
        
    # Report the loss of the last mini-batch of the epoch as a Python float
    print('Epoch: [%d/%d], Step: [%d/%d], Loss: %.4f' 
           % (epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))
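
The comment on `loss.backward()` in the loop above says that gradients are added to `x.grad` rather than overwritten. A small sketch of that behavior, using a single hypothetical parameter `w` that is not part of the model, may help explain why the gradients are cleared at every step:

```python
import torch

# A single hypothetical parameter, purely for illustration.
w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(w^2) = 2w = [2, 4].
(w ** 2).sum().backward()
grad_first = w.grad.clone()

# Second backward pass without clearing: the new gradient is added to the old one.
(w ** 2).sum().backward()
grad_accumulated = w.grad.clone()

# Clearing the gradient first, which is the role optimizer.zero_grad() plays in the loop.
w.grad.zero_()
(w ** 2).sum().backward()
grad_after_reset = w.grad.clone()

print(grad_first)        # tensor([2., 4.])
print(grad_accumulated)  # tensor([4., 8.])
print(grad_after_reset)  # tensor([2., 4.])
```

Without the reset, each update would use the sum of the gradients from all previous mini-batches instead of the gradient of the current one.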

In [None]:
plt.imshow(X[8, 0]), y[8]

In [None]:
model(X[8].view(1, 784))

<div class="alert alert-success">
<b>EXERCISE</b>:

Update the code with the use of different optimizers and plot the resulting learning curves (check out matplotlib).
What do you observe in terms of convergence time?

</div>

In [None]:
# Your code

<div class="alert alert-success">
<b>EXERCISE</b>:

Update the code above to implement gradient descent instead of stochastic gradient descent and plot the two corresponding training curves.

</div>

In [None]:
# Your code

<div class="alert alert-success">
<b>EXERCISE</b>:

Update the structure of the architecture and the training code so that it uses <code>torch.nn.CrossEntropyLoss()</code>.

</div>

In [None]:
# Your code

# Test the model

<div class="alert alert-success">
<b>EXERCISE</b>:
<ul>
    <li> Explain what it means to switch to testing mode. </li>
    <li> The final objective is to predict the class with the maximum score. Why can't we directly optimize with respect to the maximum operator at training time? </li>
</ul>
</div>

In [None]:
# Test the Model
correct = 0
total = 0

# Gradients are not needed for evaluation
with torch.no_grad():
    for images, labels in test_loader:
        images = images.view(-1, 28*28)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the model on the 10000 test images: %d %%' % (100 * correct / total))
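
The accuracy computation above relies on `torch.max` along the class dimension. The following sketch shows the mechanics on a tiny hand-written example; the values in `scores` and `targets` are made up for illustration.

```python
import torch

# Hypothetical class scores for 3 examples and 4 classes.
scores = torch.tensor([[0.1, 0.7, 0.1, 0.1],
                       [0.6, 0.2, 0.1, 0.1],
                       [0.2, 0.2, 0.5, 0.1]])
targets = torch.tensor([1, 0, 3])

# torch.max over dim=1 returns (maximum values, indices of the maxima).
max_values, predicted = torch.max(scores, 1)

# Comparing predictions and targets gives a boolean tensor; .sum().item() counts matches.
num_correct = (predicted == targets).sum().item()
accuracy = num_correct / targets.size(0)

print(predicted)    # tensor([1, 0, 2])
print(num_correct)  # 2
```

Only the indices are needed for classification, which is why the first return value is discarded with `_` in the test cell.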

<div class="alert alert-success">
<b>EXERCISE</b>:
    
Now monitor the model's behavior on both the training and test sets during training.

</div>

In [None]:
# Your code