In [1]:
import torch
import seaborn as sns
import sys
from pathlib import Path
sys.path.insert(0, "../..")

In [2]:
from src.data import make_dataset
from src.models import imagemodels
from src.models import train_model
import gin


In [3]:
gin.parse_config_file("model.gin")

ParsedConfigFileIncludesAndImports(filename='model.gin', imports=['gin.torch.external_configurables'], includes=[])

We use `gin-config` to keep track of our experiments and to save the settings used in each run.

The `model.gin` file is a plain-text file that sets parameters for functions that have already been imported.

So, if you had not imported `train_model`, gin would not be able to parse the settings for `train_model.trainloop` and would raise an error.

We can print all the settings that are actually in use with `gin.operative_config_str()`. Note that this only lists parameters of configured functions that have already been called.

So, while `get_MNIST()` has two parameters that need to be set (a batch size and a data directory), we can now call the function without passing them: gin has already done that for us.

In [7]:
train_dataloader, test_dataloader = make_dataset.get_MNIST()

In [8]:
print(gin.operative_config_str())

import gin.torch.external_configurables

# Parameters for CrossEntropyLoss:
CrossEntropyLoss.ignore_index = -100
CrossEntropyLoss.label_smoothing = 0.0
CrossEntropyLoss.reduce = None
CrossEntropyLoss.reduction = 'mean'
CrossEntropyLoss.size_average = None
CrossEntropyLoss.weight = None

# Parameters for get_MNIST:
get_MNIST.batch_size = 32
get_MNIST.data_dir = '../../data/raw'

# Parameters for NeuralNetwork:
NeuralNetwork.num_classes = 10
NeuralNetwork.units1 = 512
NeuralNetwork.units2 = 512

# Parameters for trainloop:
trainloop.epochs = 10
trainloop.factor = 0.9
trainloop.learning_rate = 0.001
trainloop.log_dir = '../../models/sgd/'
trainloop.loss_fn = @CrossEntropyLoss()
trainloop.optimizer = @SGD
trainloop.patience = 10
trainloop.tunewriter = False



A big advantage is that we can save this config as a file; that way it is easy to track what you changed during your experiments.

In [10]:
from src.models import metrics
accuracy = metrics.Accuracy()

In [11]:
gin.parse_config_file("model.gin")

model = imagemodels.NeuralNetwork()

model = train_model.trainloop(
    model=model,
    train_dataloader=train_dataloader,
    test_dataloader=test_dataloader,
    metrics=[accuracy],
    train_steps=100,
    eval_steps=100,
)

2022-11-22 19:28:54.039 | INFO     | src.data.data_tools:dir_add_timestamp:114 - Logging to ../../models/sgd/20221122-1928
100%|██████████| 100/100 [00:00<00:00, 315.03it/s]
2022-11-22 19:28:54.605 | INFO     | src.models.train_model:trainloop:171 - Epoch 0 train 2.2957 test 2.2870 metric ['0.1266']
100%|██████████| 100/100 [00:00<00:00, 314.77it/s]
2022-11-22 19:28:55.094 | INFO     | src.models.train_model:trainloop:171 - Epoch 1 train 2.2801 test 2.2713 metric ['0.2009']
100%|██████████| 100/100 [00:00<00:00, 306.41it/s]
2022-11-22 19:28:55.579 | INFO     | src.models.train_model:trainloop:171 - Epoch 2 train 2.2656 test 2.2560 metric ['0.2828']
100%|██████████| 100/100 [00:00<00:00, 323.18it/s]
2022-11-22 19:28:56.049 | INFO     | src.models.train_model:trainloop:171 - Epoch 3 train 2.2468 test 2.2376 metric ['0.3759']
100%|██████████| 100/100 [00:00<00:00, 314.09it/s]
2022-11-22 19:28:56.529 | INFO     | src.models.train_model:trainloop:171 - Epoch 4 train 2.2311 test 2.2208 metri

Run the experiment, and study the result with TensorBoard.

Locally, it is easy to do that from within VS Code itself. On the server, you have to take these steps:

- in the terminal, navigate to `~/code/ML22`
- activate the Python environment for the shell with `poetry shell`. Note how the correct environment is being activated.
- run `tensorboard --logdir=models` in the terminal
- TensorBoard will launch at `localhost:6006` and VS Code will notify you that the port is forwarded
- you can either press the `launch` button in VS Code or open your local browser at `localhost:6006`

Experiment with things like:

- changing the number of units in `units1` and `units2` to values between 16 and 1024. Use powers of 2: 16, 32, 64, etc.
- changing the batch size to values between 4 and 128. Again, use powers of 2.
- all your experiments are saved in the `models` directory, with a timestamp. Inside you find a `saved_config.gin` file that contains all the settings for that experiment. The `events` file is what TensorBoard will show.
- plotting the result in a heatmap: units vs batch size.
- changing the learning rate to values between 1e-2 and 1e-5
- changing the optimizer from SGD to one of the other available algorithms at [torch](https://pytorch.org/docs/stable/optim.html) (scroll down for the algorithms)
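One way to organise the units and batch size experiments above is to generate the grid of settings up front. The sketch below only builds gin binding strings; it does not train anything. The helper names `powers_of_two` and `bindings_for` are made up for this example, and setting `units1` and `units2` to the same value is a simplification you may want to drop.

```python
from itertools import product


def powers_of_two(low, high):
    """Return low, 2*low, 4*low, ... up to and including high.

    Assumes low is itself a power of two.
    """
    values = []
    value = low
    while value <= high:
        values.append(value)
        value *= 2
    return values


units = powers_of_two(16, 1024)      # 16, 32, ..., 1024
batch_sizes = powers_of_two(4, 128)  # 4, 8, ..., 128


def bindings_for(n_units, batch_size):
    """Build gin binding strings for one experiment (hypothetical helper)."""
    return [
        f"NeuralNetwork.units1 = {n_units}",
        f"NeuralNetwork.units2 = {n_units}",
        f"get_MNIST.batch_size = {batch_size}",
    ]


# One list of bindings per (units, batch size) combination.
grid = [bindings_for(u, b) for u, b in product(units, batch_sizes)]

print(len(grid), "experiments")
print(grid[0])
```

Each entry of `grid` should be usable as input to `gin.parse_config` before you recreate the dataloaders and the model, after which one run produces one cell of the heatmap.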

A note on `train_steps`: this setting determines how many batches are processed before you get an update.
With a batch size of 64, the complete dataset is 938 batches long (60000 / 64, rounded up), so you need 938 train steps to cover all 60,000 images. With the batch size of 32 shown in the config above, that becomes 1875 batches.

This can be a bit confusing, because every value below the full number of batches changes the meaning of `epoch` slightly: one epoch is no longer
the full dataset, but simply `train_steps` batches. Setting `train_steps` to 100 means you need to wait twice as long before you get feedback on the performance,
compared to `train_steps=50`. You will also see that setting `train_steps` to 100 improves the learning per epoch, but that is simply because the model has seen twice as
many examples as with `train_steps=50`.

This implies that it is not useful to compare `train_steps=50` and `train_steps=100` at the same number of epochs, because 100 will nearly always look better.
Just pick a value, and adjust your number of epochs accordingly.
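The arithmetic behind this note can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the last, partially filled batch is kept, which is why the division is rounded up.

```python
import math

DATASET_SIZE = 60_000  # number of MNIST training images


def batches_per_full_pass(batch_size, dataset_size=DATASET_SIZE):
    """Number of batches needed to see every image once (last batch may be partial)."""
    return math.ceil(dataset_size / batch_size)


def images_seen(train_steps, batch_size, epochs=1):
    """Upper bound on the images processed after `epochs` rounds of `train_steps` batches."""
    return train_steps * batch_size * epochs


print(batches_per_full_pass(64))        # 938
print(batches_per_full_pass(32))        # 1875

# Halving train_steps and doubling the epochs gives the same amount of training data.
print(images_seen(50, 64, epochs=20))   # 64000
print(images_seen(100, 64, epochs=10))  # 64000
```

The last two lines show why a fair comparison keeps `train_steps * epochs` constant instead of comparing runs with the same number of epochs.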