# 2022 Flatiron Machine Learning x Science Summer School

## Step 3: Train plain MLP

In this step, we train plain multilayer perceptrons (MLPs) to approximate the generic data of the various functions $f \circ g$ created in Step 1.

We set up the training pipeline and explore hyperparameters.

The current status of chaos is that training on `F01(X01)` and `G01(X01)` did not work well ad hoc, so we created `F04(X04)`.

Furthermore, training on a Google Colab GPU was not successful, so training on CPUs it is.

Ideally, we would train either locally with a script or on Colab with a pure training notebook. This notebook should be for analysis only.

Also, let's send the results to Weights & Biases.

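```
!pip install wandb --upgrade

import torch
import wandb
wandb.login()

# wrapper name is a placeholder; the body is still a sketch with elided parts
def model_pipeline(hyperparameters):
    # start a run; the hyperparameters are stored in wandb.config
    with wandb.init(project="XY", config=hyperparameters):
        config = wandb.config

        ... = make(config)

        train(...)

        test(...)

    return model

# call once before training: log gradients and parameters every 10 batches
wandb.watch(model, criterion, log="all", log_freq=10)

def train_log(loss, example_ct, epoch):
    # log the loss against the number of examples seen so far
    loss = float(loss)
    wandb.log({"epoch": epoch, "loss": loss}, step=example_ct)

def test(...):
    with torch.no_grad():
        wandb.log({"test_accuracy": correct / total})
    # export the model to ONNX and upload the file to the run
    torch.onnx.export(model, images, "name")
    wandb.save("name")
```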

Sweeps:
```
sweep_config = {
    "method": "random"  # or "grid" or "bayes"
}

metric = {
    "name": "loss",
    "goal": "minimize",
}

sweep_config["metric"] = metric

parameters_dict = {

    "layer_size": {
        "values": [128, 256, 512],
    },

    "learning_rate": {
        "distribution": "uniform",      # or q_log_uniform
        "min": 0,
        "max": 0.1,
    },
}

sweep_config["parameters"] = parameters_dict

sweep_id = wandb.sweep(sweep_config, project="name")

# run 5 trials, each one calling train()
wandb.agent(sweep_id, train, count=5)
```
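To make the `random` method concrete, here is a minimal sketch of how one configuration could be drawn from a parameter dictionary like the one above. It is not W&B's implementation: `sample_config` is a hypothetical helper, and it only handles the `values` and `uniform` cases used here.

```python
import random

# Same structure as the sweep parameters above.
parameters_dict = {
    "layer_size": {"values": [128, 256, 512]},
    "learning_rate": {"distribution": "uniform", "min": 0, "max": 0.1},
}

def sample_config(parameters, rng):
    """Draw one hyperparameter configuration (hypothetical helper)."""
    config = {}
    for name, spec in parameters.items():
        if "values" in spec:
            # discrete choice among the listed values
            config[name] = rng.choice(spec["values"])
        elif spec.get("distribution") == "uniform":
            # continuous draw between min and max
            config[name] = rng.uniform(spec["min"], spec["max"])
        else:
            raise ValueError(f"unsupported parameter spec for {name!r}")
    return config

rng = random.Random(0)
configs = [sample_config(parameters_dict, rng) for _ in range(5)]
for c in configs:
    print(c)
```

Each printed dictionary plays the role of `wandb.config` for one trial. With `count=5`, the agent would run five such trials.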

Are the results on `F04(X04)` accurate enough for reconstruction?

In [1]:
%matplotlib widget
%load_ext autoreload
%autoreload 2

import os
import numpy as np
import joblib
import matplotlib.pyplot as plt

import torch
from srnet import SRNet, SRData, run_training

In [2]:
# set device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# load data
data_path = "data"

in_var = "X04"
lat_var = None
target_var = "F04"

mask_ext = ".mask"
masks = joblib.load(os.path.join(data_path, in_var + mask_ext))     # TODO: create mask if file does not exist

train_data = SRData(data_path, in_var, lat_var, target_var, masks["train"], device=device)
val_data = SRData(data_path, in_var, lat_var, target_var, masks["val"], device=device)

# define hyperparameters
hyperparams = {
    "arch": {
        "in_size": train_data.in_data.shape[1],
        "out_size": train_data.target_data.shape[1],
        "hid_num": 3,
        "hid_size": 25, 
        "hid_type": "MLP",
        "lat_size": 10,
        },
    "epochs": 5000,
    "runtime": None,
    "batch_size": 50,
    "lr": 1e-4,                                                     # TODO: adaptive learning rate?
    "wd": 1e-4,
    # "l1": 1e-4,
    "shuffle": True,
}

res = run_training(SRNet, hyperparams, train_data, val_data, device=device)

Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Total training loss: 9.532e+01
Total validation loss: 1.012e+02
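The `# TODO: create mask if file does not exist` note in the cell above could be handled roughly as follows. This is a sketch under assumptions: the notebook only shows that the mask file holds entries for `"train"` and `"val"`, so the boolean-array format, the `make_masks` helper and the 80/20 split are all hypothetical.

```python
import numpy as np

def make_masks(n_samples, val_frac=0.2, seed=0):
    """Split samples into disjoint boolean train/val masks.

    Assumption: masks are boolean arrays of length n_samples; the format
    actually expected by SRData is not shown in this notebook.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_val = int(round(val_frac * n_samples))

    # mark a random subset of samples as validation data
    val_mask = np.zeros(n_samples, dtype=bool)
    val_mask[perm[:n_val]] = True

    # everything that is not validation data is training data
    return {"train": ~val_mask, "val": val_mask}

masks = make_masks(1000)
print(masks["train"].sum(), masks["val"].sum())
```

If the mask file is missing, such a dictionary could be written with `joblib.dump(masks, os.path.join(data_path, in_var + mask_ext))`, so that the existing `joblib.load` call finds it on the next run.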


In [None]:
_, ax = plt.subplots()

# plot training loss
lines = ax.plot(res['train_loss'])

# plot validation loss
ax.plot(res['val_loss'], '--', color=lines[-1].get_color())

ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
plt.show()
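The validation loss plotted above is also the signal an adaptive learning rate would react to (see the `# TODO: adaptive learning rate?` note in the hyperparameters). One possible option is PyTorch's `ReduceLROnPlateau` scheduler. The sketch below uses a toy model and made-up validation losses. Whether `run_training` uses Adam, and where a scheduler would hook into it, are assumptions, since its internals are not shown here.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for the network; SRNet itself is not used here.
model = torch.nn.Sequential(
    torch.nn.Linear(2, 25), torch.nn.ReLU(), torch.nn.Linear(25, 1)
)
# Assumption: Adam with the lr and wd values from the hyperparameters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)

# Halve the learning rate once the monitored loss has stopped
# improving for more than `patience` consecutive epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2
)

# Made-up validation losses that stop improving after the third epoch.
val_losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
lrs = []
for val_loss in val_losses:
    # one epoch of training would go here
    scheduler.step(val_loss)
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs)
```

The learning rate stays at its initial value while the loss improves and drops only after the plateau has lasted longer than `patience` epochs.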

To run this notebook on Google Colab, the following commands are required:

`!git clone https://github.com/fabxy/symrep.git`

`%cd symrep`

However, running the training on a GPU is actually slower than on a CPU (19.26it/s vs. 31.04it/s).

`wandb.watch` slows down the training process.

`num_workers` also slows down the training process (7.38it/s vs. 17.88it/s for `num_workers=2`)

`torch.backends.cudnn.benchmark` (for constant batch size)

`accelerate`

`TensorDataset`

`DataLoader`: `num_workers`, `shuffle`, `batch_size`

Lightning
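As a starting point for the `TensorDataset` / `DataLoader` notes, the following sketch wraps in-memory tensors in a `DataLoader` and times one pass over the data. The tensors are synthetic stand-ins with arbitrary shapes, and only `batch_size=50` and `shuffle=True` are taken from the hyperparameters above. For small datasets that already sit in memory, worker processes can add more overhead than they save, which is one plausible reason for the slowdown observed with `num_workers`.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data; the shapes are arbitrary assumptions.
x = torch.randn(1000, 2)
y = torch.randn(1000, 1)
dataset = TensorDataset(x, y)

# num_workers=0 loads batches in the main process (no worker processes).
loader = DataLoader(dataset, batch_size=50, shuffle=True, num_workers=0)

start = time.perf_counter()
n_batches = 0
for xb, yb in loader:
    # a training step on (xb, yb) would go here
    n_batches += 1
elapsed = time.perf_counter() - start

print(f"{n_batches} batches in {elapsed:.4f} s")
```

Repeating the timing with a different `num_workers` value would show whether the extra processes pay off for a given dataset size.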