# Benchmark tutorial

This notebook is a basic tutorial on benchmarking bearing fault diagnosis models together with their optimizer hyperparameters. The core implementation lives in the `fdob` module, which provides data download, data preprocessing, model implementations, quasi-random hyperparameter sampling, and model training.

```python
import fdob
import fdob.processing as processing
import fdob.model as model
import info
import benchmark

import torch
from torchvision import transforms
```

# Data download

The CWRU and MFPT datasets can be downloaded with `download_cwru` and `download_mfpt`, respectively. These functions download each dataset from its source URLs and return a pandas `DataFrame`. `split_dataframe` splits a `DataFrame` into training, validation, and test `DataFrame`s, and `build_from_dataframe` builds `numpy.ndarray` datasets by cutting the signals into overlapping segments. In this tutorial we train on the CWRU dataset, using a sample length of 4,096 points and a shift size of 2,048 points, so consecutive samples overlap by half.

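```python
# Download the CWRU dataset (returned as a pandas DataFrame).
df = fdob.download_cwru("./data/cwru")

# Exclude samples labeled 999 and the 0 HP motor load condition.
df = df[(df["label"] != 999) & (df["load"] != 0)]

# Split into training, validation, and test DataFrames (presumably 60% / 20% / 20%).
train_df, val_df, test_df = fdob.split_dataframe(df, 0.6, 0.2)

# Segment each split into samples of length 4,096 with a shift of 2,048.
X_train, y_train = fdob.build_from_dataframe(train_df, 4096, 2048, False)
X_val, y_val = fdob.build_from_dataframe(val_df, 4096, 2048, False)
X_test, y_test = fdob.build_from_dataframe(test_df, 4096, 2048, False)
```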
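The overlapping segmentation that `build_from_dataframe` performs can be illustrated on a single signal. The sketch below is not the `fdob` implementation: `sliding_windows` is a hypothetical helper, and it ignores labels and whatever the fourth argument (`False`) controls. It only shows how a sample length of 4,096 and a shift of 2,048 turn one long signal into half-overlapping windows.

```python
import numpy as np

def sliding_windows(signal: np.ndarray, sample_length: int, shift: int) -> np.ndarray:
    """Cut a 1-D signal into windows of `sample_length` points, starting a new
    window every `shift` points (hypothetical helper, not part of fdob)."""
    n_windows = (len(signal) - sample_length) // shift + 1
    if n_windows <= 0:
        # Signal shorter than one window: return an empty (0, sample_length) array.
        return np.empty((0, sample_length), dtype=signal.dtype)
    starts = np.arange(n_windows) * shift
    # Row i of `idx` holds the indices starts[i] .. starts[i] + sample_length - 1.
    idx = starts[:, None] + np.arange(sample_length)[None, :]
    return signal[idx]

# A 10,240-point signal yields (10240 - 4096) // 2048 + 1 = 4 windows.
x = np.arange(10_240, dtype=np.float32)
windows = sliding_windows(x, 4096, 2048)
print(windows.shape)  # (4, 4096)
```

Because the shift is half the sample length, the second half of each window is identical to the first half of the next one.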

# Getting the model and preparing the `DataLoader`s

The models and the initial hyperparameter search spaces for each optimizer used in the paper are defined in `info.py`. For each model, `info.model` stores the model class, its input length, and the data `transform`s. `info.hparam` stores the search spaces for four optimizers: SGD, momentum, RMSProp, and Adam. Users can employ the models and search spaces in `info.py` or supply their own custom models and search spaces.
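The tutorial reads the fields `optimizer`, `n_params`, `param_names`, `lb`, `ub`, and `reversed` from `info.hparam["adam"]`, so a custom search space would presumably need the same fields. The sketch below shows one such dictionary with a narrower learning-rate range, plus a hypothetical `check_search_space` helper that catches mismatched list lengths and empty ranges. That `lb`/`ub` are log10 exponents and that `reversed` marks parameters sampled as one minus the drawn value are assumptions inferred from the distributions listed under "Hyperparameter sampling", not confirmed from `info.py`.

```python
# Hypothetical custom Adam search space using the field names the tutorial reads
# from info.hparam["adam"]. Only the learning-rate range differs from the ranges
# listed in the "Hyperparameter sampling" section.
custom_adam_space = {
    "optimizer": None,  # in real use: a torch.optim class such as torch.optim.Adam
    "n_params": 4,
    "param_names": ["lr", "beta1", "beta2", "eps"],
    "lb": [-3, -3, -4, -10],  # assumed log10 lower bounds (lr narrowed from -4 to -3)
    "ub": [-2, 0, -1, 0],     # assumed log10 upper bounds (lr narrowed from -1 to -2)
    "reversed": [False, True, True, False],  # assumed: True means "sample 1 - value"
}

def check_search_space(space: dict) -> None:
    """Raise ValueError if the per-parameter lists are inconsistent (hypothetical helper)."""
    n = space["n_params"]
    for key in ("param_names", "lb", "ub", "reversed"):
        if len(space[key]) != n:
            raise ValueError(f"{key!r} has {len(space[key])} entries, expected {n}")
    for name, lo, hi in zip(space["param_names"], space["lb"], space["ub"]):
        if not lo < hi:
            raise ValueError(f"empty range for {name!r}: [{lo}, {hi}]")

check_search_space(custom_adam_space)  # passes silently
```

Such a dictionary could then be passed to `fdob.log_qsample` field by field, in the same way as `info.hparam["adam"]`.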

Training with PyTorch Lightning requires `DataLoader`s. `DatasetHandler` holds a collection of datasets from multiple domains: its `assign` method builds the training, validation, and test `DataLoader`s for one dataset and stores them under a key, and `dmodule.dataloaders[key][split]` retrieves them. This tutorial uses two keys: `cwru` holds the noise-free CWRU data, and `cwru0` holds CWRU data corrupted by additive white Gaussian noise at an SNR of 0 dB.

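```python
# Select WDCNN and look up its model class, input length, and data transforms in info.py.
# Note: this rebinds the name `model`, which previously referred to the fdob.model module.
model_name = "wdcnn"

model = info.model[model_name]["model"]
sample_length = info.model[model_name]["sample_length"]
tf_data = info.model[model_name]["tf"]
tf_label = [processing.NpToTensor()]
batch_size = 32
num_workers = 1

dmodule = fdob.DatasetHandler()

# Noise-free CWRU data, stored under the key "cwru".
dmodule.assign(
    X_train,
    y_train,
    X_val,
    y_val,
    X_test,
    y_test,
    sample_length,
    "cwru",
    transforms.Compose(tf_data),
    transforms.Compose(tf_label),
    batch_size,
    num_workers
)

# CWRU data with additive white Gaussian noise at 0 dB SNR, stored under the key "cwru0".
dmodule.assign(
    X_train,
    y_train,
    X_val,
    y_val,
    X_test,
    y_test,
    sample_length,
    "cwru0",
    transforms.Compose([processing.AWGN(0)] + tf_data),
    transforms.Compose(tf_label),
    batch_size,
    num_workers
)
```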
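The `cwru0` data is produced by the `processing.AWGN(0)` transform. A common definition of additive white Gaussian noise at a target SNR scales the noise power to the signal power divided by $10^{\mathrm{SNR}/10}$, so at 0 dB the noise power equals the signal power. The sketch below uses that definition with a hypothetical `add_awgn` helper; we have not checked how `fdob` measures signal power (for example, per sample or over the whole dataset), and this version measures it per input array.

```python
import numpy as np

def add_awgn(x: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise so that signal power / noise power = 10**(snr_db / 10)
    (hypothetical helper, not the fdob implementation)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100_000, endpoint=False)
clean = np.sin(2 * np.pi * 50 * t)
noisy = add_awgn(clean, snr_db=0, rng=rng)

# Empirical SNR of the result; it should be close to the requested 0 dB.
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(float(measured_snr), 2))
```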

```python
# Access the DataLoader for the training split of the noise-free CWRU dataset.
dmodule.dataloaders["cwru"]["train"]

# Access the Dataset behind the training DataLoader of the noisy CWRU dataset.
dmodule.dataloaders["cwru0"]["train"].dataset
```

# Hyperparameter sampling

`log_qsample` samples hyperparameters from a log-scale quasi-random distribution. For example, with the Adam optimizer and the search space defined in `info.py`, the hyperparameters are sampled from the following distributions:

* $\eta \sim 10^{U[-4, -1]}$
* $1 - \beta_{1} \sim 10^{U[-3, 0]}$
* $1 - \beta_{2} \sim 10^{U[-4, -1]}$
* $\epsilon \sim 10^{U[-10, 0]}$
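The distributions above can be reproduced with a low-discrepancy sequence: draw a quasi-random $u \in [0, 1)$ per parameter, map it to an exponent $lb + u\,(ub - lb)$, and take $10$ to that power, returning one minus the result for $\beta_1$ and $\beta_2$. The sketch below uses a Halton sequence as a stand-in; `log_qsample` may well use a different sequence (such as Sobol), and the function names `radical_inverse` and `log_halton_sample` are hypothetical.

```python
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # one Halton base per parameter

def radical_inverse(i: int, base: int) -> float:
    """Van der Corput radical inverse of i in the given base, a value in [0, 1)."""
    result, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        result += digit * f
        f /= base
    return result

def log_halton_sample(param_names, lb, ub, reversed_flags, n_samples, skip=1):
    """Return {name: [values]} with log10(value) spread over [lb, ub] by a Halton
    sequence; parameters flagged in reversed_flags are returned as 1 - value."""
    if len(param_names) > len(PRIMES):
        raise ValueError("add more prime bases for this many parameters")
    samples = {name: [] for name in param_names}
    for i in range(skip, skip + n_samples):  # skip=1 avoids the all-zero first point
        for d, name in enumerate(param_names):
            u = radical_inverse(i, PRIMES[d])
            value = 10.0 ** (lb[d] + u * (ub[d] - lb[d]))
            samples[name].append(1.0 - value if reversed_flags[d] else value)
    return samples

# Adam ranges from the list above: exponents of eta, 1 - beta1, 1 - beta2, epsilon.
hparams_sketch = log_halton_sample(
    ["lr", "beta1", "beta2", "eps"],
    lb=[-4, -3, -4, -10],
    ub=[-1, 0, -1, 0],
    reversed_flags=[False, True, True, False],
    n_samples=4,
)
for name, values in hparams_sketch.items():
    print(name, [f"{v:.3g}" for v in values])
```

Compared with independent uniform draws, a quasi-random sequence spreads even a handful of configurations evenly over the search space, which matters when only a few experiments (here four) are run.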

```python
# Number of hyperparameter configurations to sample.
n_exps = 4

# Adam search space defined in info.py.
hparam_info = info.hparam["adam"]

hparams = fdob.log_qsample(
    hparam_info["n_params"],
    hparam_info["param_names"],
    hparam_info["lb"],
    hparam_info["ub"],
    hparam_info["reversed"],
    n_exps
)

hparams
```

# Model training and evaluation

To run a benchmark, prepare the following:

* training `DataLoader`
* validation `DataLoader`
* PyTorch model
* the model's keyword arguments (pass `None` if there are none)
* PyTorch optimizer from `torch.optim`
* the optimizer's keyword arguments (pass `None` if there are none)
* PyTorch loss function from `torch.nn`
* the loss function's keyword arguments (pass `None` if there are none)
* number of epochs
* random seed (if `None` is passed, no seed is set)
* number of GPUs (only CUDA GPUs are supported)
* result directory for the experiment
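Several of the items above are a class paired with a keyword-argument dictionary, where `None` means "no keyword arguments". We have not inspected how `benchmark.train` consumes these pairs; one plausible pattern is sketched below with a hypothetical `build_component` helper and a stand-in `DummyOptimizer` class so that it runs without PyTorch.

```python
from dataclasses import dataclass

def build_component(cls, kwargs, *args):
    """Instantiate `cls` with positional `args`; kwargs=None means "no keyword arguments"
    (hypothetical helper)."""
    return cls(*args, **(kwargs or {}))

# Stand-in for a torch.optim class so the sketch runs without PyTorch.
@dataclass
class DummyOptimizer:
    params: list
    lr: float = 1e-3
    eps: float = 1e-8

opt_a = build_component(DummyOptimizer, None, [0.0, 1.0])           # defaults kept
opt_b = build_component(DummyOptimizer, {"lr": 0.01}, [0.0, 1.0])  # lr overridden
print(opt_a.lr, opt_b.lr)  # 0.001 0.01
```

With PyTorch, the same helper would be called as `build_component(torch.optim.Adam, opt_kwargs, net.parameters())` for a model instance `net`, or `build_component(torch.nn.CrossEntropyLoss, loss_kwargs)`. Treating `None` as an empty dictionary keeps the call site identical whether or not keyword arguments are supplied.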

The following code trains WDCNN with the first sampled hyperparameter configuration and saves the results to `./logs/mytest`. It then calls `benchmark.test` on the noise-free (`cwru`) and noisy (`cwru0`) test sets.

```python
# WDCNN is built for the 10 classes used in this tutorial.
model_kwargs = {
    "n_classes": 10
}

# Optimizer class from the Adam search space, configured with the first sampled configuration.
opt = hparam_info["optimizer"]
opt_kwargs = {
    "lr": hparams["lr"][0],
    "betas": (hparams["beta1"][0], hparams["beta2"][0]),
    "eps": hparams["eps"][0]
}

# Loss function class; None means no keyword arguments.
loss = torch.nn.CrossEntropyLoss
loss_kwargs = None

seed = 6464
n_gpu = 0  # no GPU
n_epochs = 5

result_dir = "./logs/mytest"

# Train on the noise-free training set, validating on the noise-free validation set.
benchmark.train(
    dmodule.dataloaders["cwru"]["train"],
    dmodule.dataloaders["cwru"]["val"],
    model,
    model_kwargs,
    opt,
    opt_kwargs,
    loss,
    loss_kwargs,
    n_epochs,
    seed,
    n_gpu,
    result_dir
)
# Evaluate on the noise-free test set.
benchmark.test(
    dmodule.dataloaders["cwru"]["test"],
    model,
    model_kwargs,
    opt,
    opt_kwargs,
    loss,
    loss_kwargs,
    n_epochs,
    seed,
    n_gpu,
    result_dir,
    "noise-free"
)
# Evaluate on the noisy (0 dB SNR) test set.
benchmark.test(
    dmodule.dataloaders["cwru0"]["test"],
    model,
    model_kwargs,
    opt,
    opt_kwargs,
    loss,
    loss_kwargs,
    n_epochs,
    seed,
    n_gpu,
    result_dir,
    "noise"
)
```
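The cell above trains only the first of the `n_exps` sampled configurations. One way to cover all of them is to loop over the sample indices and give each run its own result directory, so that runs are kept apart (we have not checked whether `benchmark.train` overwrites an existing directory). The helpers `adam_kwargs` and `plan_experiments` and the `exp{i}` directory naming are hypothetical, and the values in `example_hparams` are made up; only the `lr`/`beta1`/`beta2`/`eps` layout mirrors `hparams`.

```python
import os

def adam_kwargs(hparams, i: int) -> dict:
    """Build torch.optim.Adam keyword arguments from the i-th sampled configuration."""
    return {
        "lr": hparams["lr"][i],
        "betas": (hparams["beta1"][i], hparams["beta2"][i]),
        "eps": hparams["eps"][i],
    }

def plan_experiments(hparams, n_exps: int, base_dir: str) -> list:
    """Pair each sampled configuration with its own result directory."""
    return [
        (adam_kwargs(hparams, i), os.path.join(base_dir, f"exp{i}"))
        for i in range(n_exps)
    ]

# Made-up values laid out like `hparams`: one list per hyperparameter, one entry per experiment.
example_hparams = {
    "lr": [3e-3, 5e-4],
    "beta1": [0.99, 0.9],
    "beta2": [0.9996, 0.999],
    "eps": [1e-8, 1e-6],
}

for kwargs, run_dir in plan_experiments(example_hparams, 2, "./logs/mytest"):
    print(run_dir, kwargs)
    # In the notebook, benchmark.train(...) and benchmark.test(...) would be called here,
    # passing `kwargs` as opt_kwargs and `run_dir` as result_dir.
```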