# Tutorial 1 - Introduction to PyTorch 
### Course: DA5001/6400 (Jul-Nov 2024)
### Course Instructor: Prof. Krishna Pillutla
### Teaching Assistant: Ganesh S (ME20B070)
PyTorch is a popular library for deep learning. This tutorial is designed to get you familiar with PyTorch. By the end, we will have built a basic neural network that classifies handwritten digits, trained on the MNIST dataset.

## Packages
Please install the `torch` and `torchvision` packages [[Instructions](https://pytorch.org/get-started/locally/)].
We recommend that you use [conda](https://docs.anaconda.com/miniconda/) as your package manager.

## Exercises
Please complete the `TODO` sections in the code.

## Load the MNIST data

### Let's load the MNIST data without using PyTorch dataloaders

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets
# Import the dataset from torchvision.datasets

train_dataset = datasets.MNIST(root="./data", train=True, download=True)
test_dataset = datasets.MNIST(root="./data", train=False, download=True)

X_train, y_train = train_dataset.data, train_dataset.targets.long()
X_test, y_test = test_dataset.data, test_dataset.targets.long()

# Flatten each 28x28 image in X_train and X_test into a vector of length 784
# Normalize pixel values from [0, 255] -> [0, 1] (the maximum pixel value is 255)
X_train = X_train.reshape(X_train.shape[0], -1).float() / 255
X_test = X_test.reshape(X_test.shape[0], -1).float() / 255

# Print the shapes of the training and test datasets
print(f'Train images shape: {X_train.shape}')
print(f'Train labels shape: {y_train.shape}')
print(f'Test images shape: {X_test.shape}')
print(f'Test labels shape: {y_test.shape}')

### We take only 10% of the data to avoid computational bottlenecks and reduce training time
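
When subsampling, the inputs and labels must be indexed with the same set of indices so that each image keeps its own label. The sketch below illustrates the idea on toy tensors; `X_toy`, `y_toy`, and `subset_size` are hypothetical names and shapes chosen for illustration, not the MNIST data used in this tutorial.

```python
import torch

torch.manual_seed(0)

# Toy stand-ins for the real data (assumed shapes, not MNIST itself).
# Row i of X_toy is [2i, 2i + 1], and its label is i.
X_toy = torch.arange(20, dtype=torch.float32).reshape(10, 2)
y_toy = torch.arange(10)

# Draw distinct random row indices, then index BOTH tensors with them.
subset_size = 4
perm = torch.randperm(X_toy.shape[0])[:subset_size]
X_sub, y_sub = X_toy[perm], y_toy[perm]

print(X_sub.shape, y_sub.shape)
```

Because the same `perm` indexes both tensors, every row of `X_sub` stays paired with its original label.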

In [None]:
# TODO: modify the code below to use 6000 random samples instead of the first 6000.
# Note: index the inputs and the labels with the same indices so that the pairs stay aligned.
X_train, y_train = X_train[:6000], y_train[:6000]

### Let's visualize some images

In [None]:
num_images = 10
plt.figure(figsize=(10, 1))
for i in range(num_images):
    plt.subplot(1, num_images, i+1)
    plt.imshow(X_train[i].reshape(28, 28), cmap='gray')
    plt.title(y_train[i].item())
    plt.axis('off')
plt.show()

## Define the Neural Network

In [None]:
import torch

### Below is the architecture of the Neural Network that will be used as a classification model for the MNIST data

In [None]:
class MLP(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size, transform=None):
        super(MLP, self).__init__()
        self.hidden1 = torch.nn.Linear(input_size, hidden_size)
        self.output = torch.nn.Linear(hidden_size, output_size)
        self.transform = transform
        
    def forward(self, x):
        if self.transform:
            x = self.transform(x)
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.hidden1(x))
        x = self.output(x)
        return x

In [None]:
model = MLP(input_size=28*28, hidden_size=100, output_size=10)

# To see the model parameters for each layer of the network
for name, param in model.named_parameters():
    print(f"Parameter name: {name}, shape: {param.shape}")

# To predict the output of a single image, before training
model.eval()
with torch.no_grad():
    # X_train[0] is already a float tensor, so no conversion is needed
    output = model(X_train[0])
prediction = torch.argmax(output, dim=1).item()
plt.imshow(X_train[0].reshape(28, 28), cmap='gray')
print(f'Prediction: {prediction}')

# Note that the prediction will likely be incorrect,
# as the model has not been trained yet.

## Train and evaluate the Neural Network

In [None]:
def evaluate_model(model, test_data, test_labels, batch_size):
    model.eval()  # Set the model to evaluation mode
    num_samples = len(test_labels)
    correct_predictions = 0

    with torch.no_grad():
        for start_idx in range(0, num_samples, batch_size):
            batch_images = test_data[start_idx:start_idx + batch_size]
            batch_labels = test_labels[start_idx:start_idx + batch_size]

            outputs = model(batch_images)
            _, predicted = torch.max(outputs, 1)
            # TODO: calculate the number of correct predictions in this batch.
            # Then increment `correct_predictions` by this amount.
            correct_predictions += # TODO: your code here.

    accuracy = correct_predictions / num_samples
    return accuracy

## Training using Stochastic Gradient Descent

The Stochastic Gradient Descent update rule is as follows:
$$
\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}(f_\theta(X_B), y_B)
$$

where
- $\theta$ is the set of parameters,
- $\eta$ is the learning rate,
- $f_\theta$ is the function represented by the neural network,
- $X_B$ and $y_B$ are a batch of input data and labels, sampled randomly from the training dataset, and
- $\mathcal{L}$ is the loss function.

The above update is repeated at each iteration until convergence.
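
As a minimal sketch of this update rule, the example below runs stochastic gradient descent on a toy linear regression problem rather than on the MNIST network. The data, the single parameter `theta`, the learning rate `eta`, and the mean squared error loss are all assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)

# Toy regression data (an assumption for illustration): y = 3x + small noise.
X = torch.randn(200, 1)
y = 3.0 * X + 0.1 * torch.randn(200, 1)

theta = torch.zeros(1, 1, requires_grad=True)  # the single parameter
eta = 0.1                                      # learning rate
batch_size = 20

for step in range(200):
    # Sample a random batch (X_B, y_B) from the training data.
    idx = torch.randperm(X.shape[0])[:batch_size]
    X_B, y_B = X[idx], y[idx]

    # Loss L(f_theta(X_B), y_B): mean squared error of a linear model.
    loss = ((X_B @ theta - y_B) ** 2).mean()

    # Gradient of the loss with respect to theta.
    (grad,) = torch.autograd.grad(loss, [theta])

    # theta <- theta - eta * grad, applied in place without tracking gradients.
    with torch.no_grad():
        theta -= eta * grad

print(f"Estimated slope: {theta.item():.3f}")
```

The estimated slope should end up close to the true value of 3 used to generate the toy data.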

## Exercise

Complete the `evaluate_model` function above and the `train_model` function below to train the model using Stochastic Gradient Descent and evaluate it on the test set. Use three different learning rates:

1. 5.0
2. 0.5
3. 0.005


What do you observe? Note that we usually want the largest learning rate for which the loss still decreases. Which of these learning rates satisfies that criterion?
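
To build intuition for how the learning rate affects the loss, here is a small sketch of plain gradient descent on the one-dimensional function $f(\theta) = \theta^2$. This toy problem and the three rates used are assumptions for illustration only; the behaviour of the MLP on MNIST can differ, so the exercise still has to be run.

```python
def run_gradient_descent(learning_rate, num_steps=20, theta0=1.0):
    """Minimise f(theta) = theta**2 and return the loss after each step."""
    theta = theta0
    losses = []
    for _ in range(num_steps):
        grad = 2.0 * theta                     # derivative of theta**2
        theta = theta - learning_rate * grad   # gradient descent update
        losses.append(theta ** 2)
    return losses

# Hypothetical rates chosen for this toy problem only.
for lr in (1.1, 0.4, 0.001):
    losses = run_gradient_descent(lr)
    print(f"lr={lr}: first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```

On this function each step multiplies $\theta$ by $(1 - 2\eta)$, so the loss grows when $\eta > 1$, shrinks quickly for moderate $\eta$, and shrinks very slowly when $\eta$ is tiny.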

In [None]:
def train_model(model, train_data, labels, batch_size, learning_rate, num_iterations):
    criterion = torch.nn.CrossEntropyLoss()

    num_samples = len(labels)
    indices = list(range(num_samples))

    losses = []
    test_accuracies = []
    for itr in range(num_iterations):
        # choose random set of integers of size batch_size
        batch_indices = np.random.choice(indices, size=batch_size, replace=False)
        batch_images = train_data[batch_indices]
        batch_labels = labels[batch_indices]

        # Forward pass
        outputs = model(batch_images)
        loss = criterion(outputs, batch_labels)

        # Compute gradients using autograd
        grads = torch.autograd.grad(outputs=loss, inputs=list(model.parameters()), create_graph=False)
        # This is a tuple of tensors. Each tensor is the gradient with respect to
        # the corresponding parameter from `model.parameters()`.
        
        # Update parameters
        with torch.no_grad():
            for param, grad in zip(model.parameters(), grads):
                # TODO: your code here
                # Note: you need to update the parameters of the model using the gradients in `grads`.
                # The update has to be in-place.
                pass
                

        iteration_loss = loss.item()
        test_accuracies.append(evaluate_model(model, X_test, y_test, batch_size))
        losses.append(iteration_loss)
        if itr % 100 == 99:
            print(f'Iteration [{itr + 1}/{num_iterations}], Loss: {iteration_loss:.4f}, Test accuracy: {test_accuracies[-1]:.4f}')

    return losses, test_accuracies

In [None]:
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)

model = MLP(input_size=28*28, hidden_size=200, output_size=10)
losses, test_accuracies = train_model(
    model, X_train, y_train, batch_size=100, learning_rate=0.005, num_iterations=10000
)

In [None]:
plt.plot(range(1, len(losses) + 1), losses)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('Loss in each iteration')
plt.grid(True)
plt.show()

In [None]:
plt.plot(range(1, len(test_accuracies) + 1), test_accuracies)
plt.xlabel('Iteration')
plt.ylabel('Test accuracy')
plt.title('Test Accuracy in each iteration')
plt.grid(True)
plt.show()

## Exercise

Vary the hidden width of the neural network. At what width do we observe interpolation? Recall that interpolation is when the training loss is exactly zero. 
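
The values stored in `losses` are measured on a single random batch, so checking for interpolation may be easier with the loss over the whole training set. The helper below is one possible sketch; `full_training_loss` is a hypothetical name, and the toy linear model and random data exist only to show the call. In floating-point arithmetic the loss may only approach zero, so comparing it against a small threshold of your choosing may be more practical than testing for exact equality.

```python
import torch

def full_training_loss(model, data, labels):
    """Return the mean cross-entropy loss of `model` over all of `data`."""
    criterion = torch.nn.CrossEntropyLoss()
    model.eval()
    with torch.no_grad():  # no gradients are needed for measurement
        outputs = model(data)
        return criterion(outputs, labels).item()

# Toy check with a hypothetical linear model on random data.
torch.manual_seed(0)
toy_model = torch.nn.Linear(5, 3)
toy_data = torch.randn(8, 5)
toy_labels = torch.randint(0, 3, (8,))

loss = full_training_loss(toy_model, toy_data, toy_labels)
print(f"Full-dataset loss: {loss:.4f}")
```

With the tutorial's objects, the analogous call would be `full_training_loss(model, X_train, y_train)` after training.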