# Lab 6: Visualizing the training of neural networks. Handwritten digit classification.

In this lab, you will learn how to use [wandb](https://wandb.ai/) to visualize the training process of neural networks. We are going to build and train a feed-forward neural network for recognizing handwritten digits of the MNIST dataset. The training process will be visualized in the wandb dashboard, which will allow us to monitor the loss and accuracy of the model in real-time.

---

Feel free to create an account at [wandb.ai](https://wandb.ai/) before starting this lab.

### A simple example of how to use wandb in a typical training loop is shown below:

```python
import wandb

wandb.login() # Log in to your wandb account

some_config = {
    'learning_rate': 0.01,
    'layer_1_size': 128,
    'layer_2_size': 64,
    'batch_size': 32
} # This is just an example of a configuration
# dictionary, you can put anything you want here

# start a new run and log parameters
wandb.init(project='mnist-classifier', config=some_config)

# Here you would prepare your data,
# initialize the model, optimizer, etc.

# Training loop
for epoch in range(100):
    ...
    wandb.log({'loss': loss, 'accuracy': accuracy})
    # This will send the loss and accuracy to wandb
    # and you can visualize it in the dashboard

# End of the run
wandb.finish()
```

The most important part is the `wandb.log()` function, which sends the data to the wandb dashboard. You can log any metric you want, not just loss and accuracy. The value passed to the function must be a dictionary.


## Exercise 1: Prepare data for training an MNIST classifier (4 points)

Before you start training a neural network, you need to prepare the data. In this exercise, you will prepare the MNIST dataset of handwritten digits for training a classifier. Load the MNIST dataset from the `data/mnist_train.csv` and `data/mnist_test.csv` files. You should then:

1. Normalize the features to the range [0, 1].
2. Encode the labels using one-hot encoding.
3. Create a PyTorch `Dataset` object for the training and test sets.
4. Create a PyTorch `DataLoader` object for the training and test sets.

In [6]:
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn.functional as F


train_df = pd.read_csv("data/mnist-train.csv")
test_df  = pd.read_csv("data/mnist-test.csv")


X_train = train_df.iloc[:, 1:].values.astype('float32')
y_train = train_df.iloc[:, 0].values.astype('int64')

X_test = test_df.iloc[:, 1:].values.astype('float32')
y_test = test_df.iloc[:, 0].values.astype('int64')

# Normalize pixel values to the range [0, 1]
X_train /= 255.0
X_test  /= 255.0


num_classes = 10
y_train_onehot = F.one_hot(torch.tensor(y_train), num_classes=num_classes).float()
y_test_onehot  = F.one_hot(torch.tensor(y_test), num_classes=num_classes).float()



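class MNISTDataset(Dataset):
    """Wraps feature and label arrays as tensors for use with a DataLoader."""

    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]


# Integer class labels are passed here (not the one-hot tensors above);
# nn.CrossEntropyLoss accepts class indices as targets.
train_dataset = MNISTDataset(X_train, y_train)
test_dataset  = MNISTDataset(X_test, y_test)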


batch_size = 64

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader  = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)


## Exercise 2: Prepare the architecture of the neural network (4 points)

In this exercise, you will prepare the architecture of the neural network. You should:

1. Create a neural network class that inherits from `torch.nn.Module`.
2. The neural network should have at least one hidden layer.
3. Use ReLU activation functions after every layer except the output layer.
4. Use a softmax activation function in the output layer to get the probabilities of each class.

**Feel free to experiment with the architecture of your network** - try adding more hidden layers, changing the number of neurons in each layer, etc. You can also add a dropout layer or some other regularization technique and see if it improves the performance of your model.
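
When comparing architectures, it can help to know how many trainable parameters each variant has. The following library-free sketch counts the weights and biases of a fully connected network. The helper name `count_mlp_parameters` is hypothetical, and the 784 inputs and 10 outputs assume flattened 28 x 28 MNIST images and ten digit classes.

```python
def count_mlp_parameters(input_size, hidden_sizes, output_size):
    """Count weights and biases of a fully connected network.

    Hypothetical helper for illustration: each linear layer mapping
    n_in -> n_out contributes n_in * n_out weights plus n_out biases.
    """
    sizes = [input_size, *hidden_sizes, output_size]
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out
    return total


small = count_mlp_parameters(28 * 28, [128, 64], 10)
large = count_mlp_parameters(28 * 28, [256, 128], 10)
print(f"[128, 64] hidden layers:  {small:,} parameters")
print(f"[256, 128] hidden layers: {large:,} parameters")
```

ReLU and dropout layers add no trainable parameters, so only the linear layers are counted.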

In [7]:
import torch.nn as nn


class Classifier(nn.Module):
    def __init__(self, hidden_sizes=(128, 64), dropout_prob=0.2):
        super().__init__()

        # Each MNIST image is 28 x 28 pixels, flattened into one vector
        self.input_size = 28 * 28
        self.hidden_sizes = hidden_sizes

        # Hidden block: Linear -> ReLU -> Dropout for every hidden size
        layers = []
        in_features = self.input_size
        for hidden_size in hidden_sizes:
            layers.append(nn.Linear(in_features, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_prob))
            in_features = hidden_size
        self.hidden_layers = nn.Sequential(*layers)

        # Output layer: one unit per digit class
        self.output_layer = nn.Linear(in_features, 10)

    def forward(self, x):
        h = self.hidden_layers(x)
        logits = self.output_layer(h)
        # Softmax turns the logits into class probabilities (each row sums to 1)
        probs = F.softmax(logits, dim=1)
        return probs

## *Training PyTorch models on GPU

**GPUs are optimized for performing matrix operations in parallel.** Although we call them "graphics processing units", they are actually very powerful processors that can be used for any kind of parallel computation, including training deep neural networks. In fact, data science is one of the most common applications of GPUs today, as can be seen from the revenue of companies like NVIDIA over the past few years. NVIDIA is a monopolist in the GPU market: in 2023, the company held 92% of the data center GPU market. As of 31 July, NVIDIA's 2024 revenue was $60.92 billion, while its total revenue for 2020 was $10.92 billion. If someone benefits from the deep learning hype, it is certainly NVIDIA.

If you happen to have an NVIDIA GPU in your computer, you can use it to train your deep learning models, as PyTorch has support for CUDA, which is NVIDIA's parallel computing API. To train a model on a GPU, you need to explicitly tell PyTorch to move the model and the data to the GPU.

Here is an example training loop that uses the GPU:

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Check if a GPU is available

# Initialize the model and move it to the GPU
model = SomeNeuralNetwork().to(device)   # Move the model to the GPU

for epoch in range(100):
    for batch in data_loader:
        X, y = batch
        X, y = X.to(device), y.to(device)   # Move the tensors to GPU
        
        y_pred = model(X)   # Perform a forward pass (on the GPU)
        loss = criterion(y_pred, y)   # Compute the loss (still on the GPU)
        
        ...  # The rest of the training loop
        
        y_pred = y_pred.detach().cpu()   # Move the predictions back to the CPU
        # to do anything else with them
```

Note that **the model and all the tensors it uses for computation should be moved to the GPU**. You can do this by calling the `.to(device)` method on the model and the data tensors. If you want to move the data back to the CPU (to process it further, calculate metrics, visualize), you call the `.cpu()` method on the tensor.

**When doing calculations on the GPU, you should be wary of a few things:**

* **The GPU has a limited amount of memory**, so you should be careful not to run out of memory. A typical graphics card has a few gigabytes of memory, so you should be fine with most models and datasets. However, moving very large tensors to the GPU can cause out-of-memory errors. That's one of the reasons why we use a dataloader and process the data in batches.
* While the GPU is much faster than the CPU for large matrix operations, **transferring data between the CPU and the GPU is slow**. Therefore, it is best to minimize the number of data transfers between the CPU and the GPU.
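
To get a feel for the memory budget, the raw size of a tensor can be estimated as the number of elements times the bytes per element. The sketch below assumes the standard MNIST training split of 60,000 images (the CSV files used in this lab may differ) stored as `float32`; the function name `tensor_size_bytes` is hypothetical.

```python
def tensor_size_bytes(shape, bytes_per_element=4):
    """Estimate raw storage for a dense tensor (4 bytes per float32 element)."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element


# Assumption: 60,000 training images of 28 x 28 = 784 pixels each
full_dataset = tensor_size_bytes((60_000, 784))
one_batch = tensor_size_bytes((64, 784))  # batch size of 64, as in Exercise 1

print(f"Full training set: {full_dataset / 2**20:.1f} MiB")
print(f"One batch of 64:   {one_batch / 2**10:.1f} KiB")
```

This counts only the input data. During training, the GPU also has to hold the model weights, gradients, optimizer state, and intermediate activations.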

## Exercise 3: Prepare the training loop (2 points)

In this exercise, you will prepare the training loop. You should:

1. Initialize the neural network.
2. Define the loss function (CrossEntropyLoss) and the optimizer (Adam).
3. Pass a dictionary with the configuration to wandb. This dictionary should contain all the hyperparameters of our model, including the learning rate, the size of the hidden layers, batch size, etc.
4. Train the neural network. Each epoch should consist of a training and validation phase. You should log the loss and accuracy of the training and validation sets using wandb. Track the training progress by opening your project at [wandb.ai](https://wandb.ai/) and see how cool of a tool it is!
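
One point worth checking for step 2: PyTorch's `nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and applies log-softmax internally. If the network already ends in a softmax, as Exercise 2 asks, the loss receives values in [0, 1] and softmax is effectively applied a second time. The plain-Python sketch below mimics the loss for a single sample to show the effect; the function name `cross_entropy_from_scores` is made up for illustration and is not part of PyTorch.

```python
import math


def cross_entropy_from_scores(scores, target):
    """Cross-entropy for one sample, computed the way nn.CrossEntropyLoss
    treats its input: log-softmax over the scores, then the negative
    log-likelihood of the target class. Illustrative re-implementation."""
    max_score = max(scores)  # subtract the max for numerical stability
    log_sum_exp = max_score + math.log(
        sum(math.exp(s - max_score) for s in scores)
    )
    return log_sum_exp - scores[target]


# Best case when the model outputs probabilities: all mass on the true class
perfect_probs = [1.0] + [0.0] * 9
loss_on_probs = cross_entropy_from_scores(perfect_probs, target=0)

# The same confident prediction expressed as raw logits
confident_logits = [20.0] + [0.0] * 9
loss_on_logits = cross_entropy_from_scores(confident_logits, target=0)

print(f"loss on softmax output: {loss_on_probs:.4f}")
print(f"loss on raw logits:     {loss_on_logits:.8f}")
```

With ten classes, the loss on softmax outputs cannot fall below ln(1 + 9/e), roughly 1.46, even for a perfect prediction, so a loss curve that levels off just above that value is a typical symptom. A common arrangement is to return the logits from `forward` for the loss and apply softmax only when probabilities are needed for reporting.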

### Saving and loading the model
As training can take some time, it is a good idea to save the model's state dictionary (its learned weights) to a file after training. You can do this with the following code:

    torch.save(model.state_dict(), 'model.pth')
    
To load the model from the file, you can use the following code:

    model.load_state_dict(torch.load('model.pth'))

In [8]:
import torch.optim as optim
import wandb


epochs = 100
model = Classifier(hidden_sizes=[256, 128], dropout_prob=0.3)

optimizer = optim.Adam(model.parameters(), lr=0.001)
# Note: nn.CrossEntropyLoss expects raw logits; the model above returns
# softmax probabilities, so softmax is effectively applied twice here.
criterion = nn.CrossEntropyLoss()
train_loss_history = []
test_loss_history = []

for epoch in range(epochs):
    # Training phase (dropout active)
    model.train()
    train_loss = 0

    for X_batch, y_batch in train_loader:
        optimizer.zero_grad()                     # reset gradients from the previous step
        y_pred = model(X_batch)                   # forward pass
        batch_loss = criterion(y_pred, y_batch)
        batch_loss.backward()                     # backpropagation
        optimizer.step()                          # update the weights
        train_loss += batch_loss.item()

    # Average the loss over the number of batches
    train_loss = train_loss / len(train_loader)
    train_loss_history.append(train_loss)
    print(f'Epoch: {epoch}')
    print(f'Train loss: {train_loss}')

    # Validation phase: dropout disabled, no gradient tracking needed
    model.eval()
    test_loss = 0

    with torch.no_grad():
        for X_batch, y_batch in test_loader:
            y_pred = model(X_batch)
            test_loss += criterion(y_pred, y_batch).item()

    test_loss = test_loss / len(test_loader)
    test_loss_history.append(test_loss)
    print(f'Test loss: {test_loss}')

Epoch: 0
Train loss: 1.6046130692780907
Test loss: 1.5337799863450845
Epoch: 1
Train loss: 1.5274836795924822
Test loss: 1.5083029384066344
Epoch: 2
Train loss: 1.51433591903654
Test loss: 1.5015678413354667
Epoch: 3
Train loss: 1.5074531015302581
Test loss: 1.4966955496247407
Epoch: 4
Train loss: 1.5018805930101033
Test loss: 1.495457850444089
Epoch: 5
Train loss: 1.4989477295611204
Test loss: 1.4959122534770115
Epoch: 6
Train loss: 1.4975055942911584
Test loss: 1.4905199184539213
Epoch: 7
Train loss: 1.4941698936765382
Test loss: 1.4903107623385776
Epoch: 8
Train loss: 1.4918757164910403
Test loss: 1.4882256096335733
Epoch: 9
Train loss: 1.491950422080595
Test loss: 1.4875932240941723
Epoch: 10
Train loss: 1.4898885258479413
Test loss: 1.489066959186724
Epoch: 11
Train loss: 1.4896259156625662
Test loss: 1.4859998803229848
Epoch: 12
Train loss: 1.4881699197073734
Test loss: 1.4865967179559598
Epoch: 13
Train loss: 1.4877652186574712
Test loss: 1.488666857883429
Epoch: 14
Train loss: 

## *Exercise: Easy hyperparameter tuning with wandb

Wandb allows you to perform hyperparameter tuning by automatically creating multiple runs with different hyperparameters and logging the performance of each run. Below is a brief introduction to `wandb` hyperparameter tuning, but you are more than welcome to find more information in the [official wandb guide](https://docs.wandb.ai/guides/sweeps/).

Your task is to use wandb to perform hyperparameter tuning of the neural network, trying different values of the learning rate, batch size, and the size of the hidden layers.

First, we need to define a dictionary with the hyperparameters that we want to tune. For example:

```python
parameters = {
    'learning_rate': {'values': [0.01, 0.001, 0.0001]},
    'batch_size': {'values': [32, 64, 128]},
    'layer_1_size': {'values': [64, 128, 256]},
    'layer_2_size': {'values': [32, 64, 128]}
}
```

Then we need to create a dictionary with the configuration of the sweep:

```python
sweep_config = {
    'name': 'mnist-sweep',
    'method': 'grid',   # grid search, you can also try 'random' or 'bayes'
    'metric': {'goal': 'minimize', 'name': 'val_loss'},
    'parameters': parameters,   # that's the dictionary with the hyperparameters
}
```
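
With `'method': 'grid'`, the sweep iterates over every combination of the listed values, so the number of runs grows multiplicatively. As a rough sketch, the combinations for the example search space can be counted with `itertools.product` before launching anything (the dictionary is repeated here only to keep the snippet self-contained):

```python
from itertools import product

# Same example search space as above
parameters = {
    'learning_rate': {'values': [0.01, 0.001, 0.0001]},
    'batch_size': {'values': [32, 64, 128]},
    'layer_1_size': {'values': [64, 128, 256]},
    'layer_2_size': {'values': [32, 64, 128]}
}

# Build every combination a grid search would visit
names = list(parameters)
combinations = [
    dict(zip(names, values))
    for values in product(*(parameters[name]['values'] for name in names))
]

print(len(combinations))   # 3 * 3 * 3 * 3 = 81 runs
print(combinations[0])
```

Since each combination is a full training run, `'random'` or `'bayes'` may be more practical when the search space is large.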

Finally, we can use the `wandb.sweep` function to perform hyperparameter tuning:

```python
sweep_id = wandb.sweep(sweep_config, project='mnist-classifier')
```

After that, we can finally run the sweep:

```python
wandb.agent(sweep_id, function=train)
```
where `train` is a function that trains the model and logs the metrics to wandb. `wandb.agent` calls this function without any arguments, so the function should start a run with `wandb.init()` and then read the hyperparameters chosen for that run from `wandb.config`. That is how each run receives the values selected by the sweep.

1. Rewrite the training loop into a function that takes an optional dictionary `parameters` (used as the default configuration), initializes the model, optimizer, and criterion, and trains the model for a fixed number of epochs. The function should log the loss and accuracy of the training and validation sets to wandb.
2. Create a dictionary with the hyperparameters that you want to tune.
3. Create a sweep configuration dictionary.
4. Run the sweep and monitor the results in the wandb dashboard.

In [None]:
def train(parameters=None):
    # wandb.agent calls train() with no arguments; after wandb.init(config=parameters),
    # read the hyperparameters selected for this run from wandb.config
    # your code goes here
    ...

In [None]:
parameters = {...}

sweep_config = {
    'name': 'mnist-sweep',
    'method': 'bayes',
    'metric': {'goal': 'maximize', 'name': 'accuracy'}, # if we want to maximize the accuracy
    # remember to log the metric that you want to maximize or minimize!
    'parameters': parameters,
}

sweep_id = wandb.sweep(sweep_config, project='mnist-classifier')    # This will create a new sweep
wandb.agent(sweep_id, function=train)   # This will start the hyperparameter tuning process