
# Hyperparameter Sweeps

In this project, we run hyperparameter sweeps with PyTorch on Weights & Biases (W&B). For further details, check out this [Colab](http://wandb.me/sweeps-colab).

Inspired by https://github.com/SheezaShabbir/Time-series-Analysis-using-LSTM-RNN-and-GRU

## Setup

Start out by installing the experiment tracking library and setting up your free W&B account:

1. Install with `!pip install`
2. `import` the library into Python
3. `.login()` so you can log metrics to your projects

If you've never used Weights & Biases before,
the call to `login` will give you a link to sign up for an account.
W&B is free to use for personal and academic projects!

In [None]:
!pip install wandb -Uq

In [None]:
import wandb

In [None]:
wandb.login()

## Defining the sweep config

We define the sweep config as a Python dict in our Jupyter notebook. You can find more information on sweeps in the [documentation](https://docs.wandb.com/sweeps/configuration).

You can find a list of all configuration options [here](https://docs.wandb.com/library/sweeps/configuration) and a big collection of examples in YAML format [here](https://github.com/wandb/examples/tree/master/examples/keras/keras-cnn-fashion).

In [None]:
# See also https://towardsdatascience.com/choosing-the-right-hyperparameters-for-a-simple-lstm-using-keras-f8e9ed76f046

sweep_config = {
    'method': 'bayes',
    'metric': {
        'goal': 'minimize',
        'name': 'loss'
    },
    'parameters': {
        'model': {
            'values': ['lstm', 'rnn']
        },
        'scaler': {
            'values': ['standard', 'minmax', 'robust', 'maxabs']
        },
        'batch_size': {
            'distribution': 'q_log_uniform_values',
            'max': 256,
            'min': 32,
            'q': 8
        },
        'train_val_ratio': {
            'value': 0.8
        },
        'dropout': {
            'values': [0, 0.1, 0.2, 0.5]
        },
        'num_layers': {
            'values': [2, 4, 8, 16, 32]
        },
        'epochs': {
            'values': [5, 10, 20, 40]
        },
        'fc_layer_size': {
            'values': [50, 100, 200, 400, 1000]
        },
        'learning_rate': {
            'distribution': 'uniform',
            'max': 0.1,
            'min': 0.00001
        },
        'optimizer': {
            'values': ['adam', 'sgd']
        }
    }
}
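
To make the `batch_size` entry above more concrete, the sketch below mimics how a quantized log-uniform distribution could be sampled. It is an illustration based on the behavior described in the W&B documentation (draw log-uniformly between `min` and `max`, then round to the nearest multiple of `q`), not W&B's actual implementation; `sample_q_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_q_log_uniform(low, high, q, rng):
    """Draw one value log-uniformly from [low, high], rounded to a multiple of q."""
    # Sample uniformly in log space, then map back with exp
    x = math.exp(rng.uniform(math.log(low), math.log(high)))
    # Quantize to the nearest multiple of q
    return int(round(x / q) * q)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch_sizes = [sample_q_log_uniform(32, 256, 8, rng) for _ in range(1000)]

print(sorted(set(batch_sizes))[:5])          # a few of the distinct values drawn
print(min(batch_sizes), max(batch_sizes))    # stays within the configured bounds
```

Because sampling happens in log space, small batch sizes are drawn more often than large ones.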

## Initialize the sweep

In [None]:
sweep_id = wandb.sweep(sweep_config, project="pp-sg-lstm")

## Run the sweep agent

### Define Your Training Procedure

Before we can actually execute the sweep, we need to define the training procedure that uses those values.

In the functions below, we build a recurrent neural network (RNN, LSTM, or GRU) in PyTorch and use the following `wandb` tools to log model metrics, visualize performance and outputs, and track our experiments:
* [**`wandb.init()`**](https://docs.wandb.com/library/init) – Initialize a new W&B Run. Each Run is a single execution of the training function.
* [**`wandb.config`**](https://docs.wandb.com/library/config) – Save all your hyperparameters in a configuration object so they can be logged. Read more about how to use `wandb.config` [here](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb-config/Configs_in_W%26B.ipynb).
* [**`wandb.log()`**](https://docs.wandb.com/library/log) – log model behavior to W&B. Here, we just log the performance; see [this Colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb-log/Log_(Almost)_Anything_with_W%26B_Media.ipynb) for all the other rich media that can be logged with `wandb.log`.

For more details on instrumenting W&B with PyTorch, see [this Colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/pytorch/Simple_PyTorch_Integration.ipynb).

In [None]:
import os
import torch
import torch.optim as optim
import torch.nn.functional as F
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import TensorDataset, DataLoader
from sklearn.preprocessing import StandardScaler, MaxAbsScaler, MinMaxScaler, RobustScaler
from models import LSTMModel, RNNModel, GRUModel
from scaler import Scaler
from data.metadata.metadata import feature_columns, parking_data_labels
from data.preprocessing.preprocess_features import PreprocessFeatures

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(f"Training on {device}")

train_data_path = "../data/preprocessing/01_pp_sg_train_cleaned.csv"
test_data_path = "../data/preprocessing/01_pp_sg_test_cleaned.csv"


def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by Sweep Controller
        config = wandb.config

        X, y, input_dim, output_dim = load_features_labels(train_data_path)
        X_train, X_val, y_train, y_val = split_train_val(X, y, config.train_val_ratio)
        X_test, y_test, _, _ = load_features_labels(test_data_path)
        scaler = apply_scaler(config.scaler)
        (X_train_scaled, X_val_scaled, X_test_scaled,
         y_train_scaled, y_val_scaled, y_test_scaled) = scaler.scale(
            X_train, X_val, X_test, y_train, y_val, y_test)
        train_loader = build_dataset(config.batch_size, X_train_scaled, y_train_scaled)
        val_loader = build_dataset(config.batch_size, X_val_scaled, y_val_scaled)
        test_loader = build_dataset(config.batch_size, X_test_scaled, y_test_scaled)
        network = build_network(config.fc_layer_size, config.dropout, config.num_layers, input_dim, output_dim,
                                config.model)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, train_loader, optimizer, config.batch_size, input_dim)
            avg_val_loss = val_epoch(network, val_loader, config.batch_size, input_dim)
            # Log both losses in a single call so they share the same W&B step
            wandb.log({"loss": avg_loss, "loss (validation)": avg_val_loss, "epoch": epoch})

        avg_test_loss, test_outputs, test_targets = test_network(network, test_loader, config.batch_size, input_dim)
        wandb.log({"loss (test)": avg_test_loss})
        plot_test_prediction(scaler, test_outputs, test_targets)
        save_model_scaler(network, scaler)

In [None]:
def load_features_labels(csv_path):
    df = pd.read_csv(csv_path, sep=";")

    preprocess_features = PreprocessFeatures(df)

    # TODO unify this
    # df['datetime'] = pd.to_datetime(df['datetime'], format='%d.%m.%Y %H:%M')

    y = df[parking_data_labels]
    X, input_dim = preprocess_features.get_features_for_model()

    output_dim = len(y.columns)

    print(f"Input dimension: {input_dim}, columns: {X.columns}")
    print(f"Output dimension: {output_dim}, columns: {y.columns}")

    return X, y, input_dim, output_dim


def split_train_val(X, y, train_val_ratio):
    train_size = int(len(X) * train_val_ratio)
    X_train, X_val = X[:train_size], X[train_size:]
    y_train, y_val = y[:train_size], y[train_size:]
    return X_train, X_val, y_train, y_val
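
Note that `split_train_val` slices the data instead of shuffling it, so every validation row comes after every training row. The toy example below (with made-up data standing in for the real features) shows the effect of a ratio of `0.8` on ten consecutive time steps; it reuses the same slicing logic on plain lists.

```python
def split_train_val(X, y, train_val_ratio):
    # Same logic as above: the first part is for training, the rest for validation
    train_size = int(len(X) * train_val_ratio)
    return X[:train_size], X[train_size:], y[:train_size], y[train_size:]


# Toy "time series": ten consecutive time steps and one label per step
X = list(range(10))
y = [10 * t for t in X]

X_train, X_val, y_train, y_val = split_train_val(X, y, 0.8)
print(X_train)  # the earliest 80% of the time steps
print(X_val)    # the most recent 20%
```

Keeping the order intact avoids training on samples that lie in the future relative to the validation set.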

In [None]:
def apply_scaler(scaler):
    if scaler == "standard":
        return Scaler(StandardScaler())
    elif scaler == "minmax":
        return Scaler(MinMaxScaler())
    elif scaler == "robust":
        return Scaler(RobustScaler())
    elif scaler == "maxabs":
        return Scaler(MaxAbsScaler())
    else:
        raise ValueError(f"Invalid scaler value: {scaler}")

The next cell defines the building blocks of our training procedure:
`build_dataset`, `build_network`, `build_optimizer`, `train_epoch`, `val_epoch`, and `test_network`.

All of these are a standard part of a basic PyTorch pipeline,
and their implementation is unaffected by the use of W&B,
so we won't comment on them.

In [None]:
def build_dataset(batch_size, X, y):
    features = torch.Tensor(X)
    targets = torch.Tensor(y)

    dataset = TensorDataset(features, targets)

    return DataLoader(dataset, batch_size=batch_size, shuffle=False, drop_last=True)


def build_network(fc_layer_size, dropout, num_layers, input_dim, output_dim, model):
    if model == "rnn":
        network = RNNModel(input_dim=input_dim, hidden_dim=fc_layer_size, layer_dim=num_layers, output_dim=output_dim,
                           dropout_prob=dropout)
    elif model == "lstm":
        network = LSTMModel(input_dim=input_dim, hidden_dim=fc_layer_size, layer_dim=num_layers, output_dim=output_dim,
                            dropout_prob=dropout)
    elif model == "gru":
        network = GRUModel(input_dim=input_dim, hidden_dim=fc_layer_size, layer_dim=num_layers, output_dim=output_dim,
                           dropout_prob=dropout)
    else:
        raise ValueError(f"Invalid model value: {model}")

    return network.to(device)


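def build_optimizer(network, optimizer, learning_rate):
    if optimizer == "sgd":
        optimizer = optim.SGD(network.parameters(),
                              lr=learning_rate, momentum=0.9)
    elif optimizer == "adam":
        optimizer = optim.Adam(network.parameters(),
                               lr=learning_rate)
    else:
        # Fail loudly instead of silently returning the unrecognized string
        raise ValueError(f"Invalid optimizer value: {optimizer}")
    return optimizer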


def train_epoch(network, loader, optimizer, batch_size, input_dim):
    losses = []
    network.train()
    for _, (data, target) in enumerate(loader):
        data, target = data.view([batch_size, -1, input_dim]).to(device), target.to(device)
        optimizer.zero_grad()

        # output = network(data.unsqueeze(0)).squeeze() # See https://medium.com/@mike.roweprediger/using-pytorch-to-train-an-lstm-forecasting-model-e5a04b6e0e67

        # ➡ Forward pass
        loss = F.mse_loss(network(data), target)
        losses.append(loss.item())

        # ⬅ Backward pass + weight update
        loss.backward()
        optimizer.step()

        wandb.log({"batch loss": loss.item()})

    return np.mean(losses)


def val_epoch(network, loader, batch_size, input_dim):
    losses = []
    with torch.no_grad():
        network.eval()
        for _, (data, target) in enumerate(loader):
            data, target = data.view([batch_size, -1, input_dim]).to(device), target.to(device)
            loss = F.mse_loss(network(data), target)
            losses.append(loss.item())

            wandb.log({"batch loss (validation)": loss.item()})

    return np.mean(losses)


def test_network(network, loader, batch_size, input_dim):
    outputs = []
    targets = []
    losses = []
    with torch.no_grad():
        network.eval()
        for _, (data, target) in enumerate(loader):
            data, target = data.view([batch_size, -1, input_dim]).to(device), target.to(device)
            output = network(data)
            loss = F.mse_loss(output, target)
            outputs.append(output.detach().cpu().numpy())
            targets.append(target.detach().cpu().numpy())
            losses.append(loss.item())

    return np.mean(losses), outputs, targets
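
The `view([batch_size, -1, input_dim])` calls above assume that every batch holds exactly `batch_size` rows, which is presumably why `build_dataset` passes `drop_last=True`. The plain-Python sketch below uses made-up dataset sizes to show how many samples that setting discards; `batch_stats` is a hypothetical helper, not part of PyTorch.

```python
def batch_stats(num_samples, batch_size):
    """Return (full batches, dropped samples) when the last partial batch is dropped."""
    full_batches = num_samples // batch_size
    dropped = num_samples % batch_size
    return full_batches, dropped


for batch_size in (32, 48, 256):
    full, dropped = batch_stats(1000, batch_size)
    print(f"batch_size={batch_size}: {full} full batches, {dropped} samples dropped")
```

With a small validation or test set, a large batch size can drop a noticeable share of the data, which is worth keeping in mind when comparing losses across runs.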

In [None]:
def inverse_transform(scaler, df, columns):
    for col in columns:
        df[col] = scaler.inverse_transform(df[col])
    return df

def plot_test_prediction(scaler, outputs, targets):
    # Label the columns first, since inverse_transform looks them up by name
    outputs = inverse_transform(scaler, pd.DataFrame(np.concatenate(outputs), columns=parking_data_labels),
                                parking_data_labels)
    targets = inverse_transform(scaler, pd.DataFrame(np.concatenate(targets), columns=parking_data_labels),
                                parking_data_labels)

    # Plot every 10th test sample
    for i in range(0, len(outputs), 10):
        # One-row frames holding the i-th prediction and its target
        df_output = outputs.iloc[[i]]
        df_target = targets.iloc[[i]]

        n_features = len(df_output.columns)

        # Plotting
        fig, ax = plt.subplots(figsize=(10, 6))

        # Setting the positions of the bars
        ind = np.arange(n_features)  # the x locations for the groups
        width = 0.35  # the width of the bars

        # Plotting bars for each row
        bars1 = ax.bar(ind - width / 2, df_output.iloc[0], width, label='Prediction (from model)')
        bars2 = ax.bar(ind + width / 2, df_target.iloc[0], width, label='Target (from dataset)')

        # Adding some text for labels, title, and custom x-axis tick labels
        ax.set_xlabel('Parking garages')
        ax.set_ylabel('Free parking spots')
        ax.set_title(f'Comparison of Two Rows in a Bar Chart {i}')
        ax.set_xticks(ind)
        ax.set_xticklabels(df_output.columns)
        ax.legend()

        plt.show()


In [None]:
def save_model_scaler(network, scaler):
    model_scripted = torch.jit.script(network)
    model_path = os.path.join(wandb.run.dir, "model_scripted.pt")
    print(f"Saving model to {model_path}")
    model_scripted.save(model_path)
    scaler.save(os.path.join(wandb.run.dir, "scaler.pkl"))

The cell below launches an `agent` that runs `train` once (`count=1`),
using hyperparameter values proposed by the Sweep Controller. Increase `count` to try more configurations; execution time depends on the sampled `epochs` and model size.

Now, we're ready to start sweeping! 🧹🧹🧹

Sweep Controllers, like the one we made by running `wandb.sweep`, sit waiting for someone to ask them for a `config` to try out.

That someone is an `agent`, and they are created with `wandb.agent`.
To get going, the agent just needs to know
1. which Sweep it's a part of (`sweep_id`)
2. which function it's supposed to run (here, `train`)
3. (optionally) how many configs to ask the Controller for (`count`)

In [None]:
wandb.agent(sweep_id, train, count=1)
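
The hand-off between controller and agent can be pictured with a toy, W&B-free sketch. Everything here (`ToyController`, `toy_agent`, `fake_train`, and the random sampling) is hypothetical and only mirrors the roles described above; the real Sweep Controller runs on the W&B side and, with `'method': 'bayes'`, takes earlier results into account when proposing values.

```python
import random


class ToyController:
    """Hands out one config per request, sampled from lists of candidate values."""

    def __init__(self, parameters, seed=0):
        self.parameters = parameters
        self.rng = random.Random(seed)

    def next_config(self):
        # Pick one candidate value per hyperparameter
        return {name: self.rng.choice(values) for name, values in self.parameters.items()}


def toy_agent(controller, function, count):
    """Ask the controller for `count` configs and run `function` on each."""
    return [function(controller.next_config()) for _ in range(count)]


def fake_train(config):
    # Stand-in for the real train(); just echoes the config it was given
    print(f"training with {config}")
    return config


controller = ToyController({"optimizer": ["adam", "sgd"], "dropout": [0, 0.1, 0.2, 0.5]})
runs = toy_agent(controller, fake_train, count=3)
```

As with `wandb.agent`, the agent only needs to know which controller to ask, which function to run, and how many configs to request.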