# Training

This notebook contains code for training a neural network on the ECG dataset at the aggregated superclass level ("NORM", "MI", "STTC", "CD", "HYP").

This notebook uses the following scripts:
- *train.py*, which defines the training function (simple enough that it could be defined in this notebook)
- *trainutils.py*, which defines training-related utilities
- *models.py*, which defines the 1D ResNet model used

## Mount repository
Clone the repository so that all of its files are available.
This is needed to import *train.py*, *trainutils.py* and *models.py*.

In [None]:
!git clone https://[PERSONAL ACCESS TOKEN REMOVED HERE!!!]@github.com/sergi-andreu/Idoven-challenge.git idoven

fatal: destination path 'idoven' already exists and is not an empty directory.


## Mount drive and import some required packages

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import os
from scipy.fft import rfft, rfftfreq
import scipy.signal as sp

import seaborn as sb
import pandas as pd

import torch
from torch.utils.data import TensorDataset

from sklearn.preprocessing import StandardScaler

### Set the seed for reproducibility

In [None]:
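def set_all_seeds(seed):
  # Seed every random number generator used in this notebook
  np.random.seed(seed)
  torch.manual_seed(seed)
  torch.cuda.manual_seed_all(seed)  # covers all visible GPUs
  # Prefer deterministic cuDNN kernels and disable autotuning between runs
  torch.backends.cudnn.deterministic = True
  torch.backends.cudnn.benchmark = False

set_all_seeds(0)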

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### Import the training and test data

Folds 1-8 are used for training and fold 9 for testing. Fold 10 is held out for evaluation.

In [None]:
drivepath = "drive/MyDrive/idoven"
fold_idx = 1
X = np.load(f"{drivepath}/nparrays/{str(fold_idx).zfill(2)}.npy")
Y = np.load(f"{drivepath}/nparrays/labels/{str(fold_idx).zfill(2)}.npy")

for fold_idx in range(2, 9):
  X = np.append(X, np.load(f"{drivepath}/nparrays/{str(fold_idx).zfill(2)}.npy"), axis=0)
  Y = np.append(Y, np.load(f"{drivepath}/nparrays/labels/{str(fold_idx).zfill(2)}.npy"), axis=0)


def apply_scaler(X, scaler):
    X_tmp = []
    for x in X:
        x_shape = x.shape
        X_tmp.append(scaler.transform(x.flatten()[:,np.newaxis]).reshape(x_shape))
    X_tmp = np.array(X_tmp)
    return X_tmp

# Scale the data using a standard scaler
scaler = StandardScaler()
scaler.fit(np.vstack(X).flatten()[:,np.newaxis].astype(float))
X = apply_scaler(X, scaler)

X, Y = torch.from_numpy(X).float(), torch.from_numpy(Y).float()
train_dataset = TensorDataset(X,Y)
del X, Y

X_test = np.load(f"{drivepath}/nparrays/09.npy")
Y_test = np.load(f"{drivepath}/nparrays/labels/09.npy")

X_test = apply_scaler(X_test, scaler)

X_test, Y_test = torch.from_numpy(X_test).float(), torch.from_numpy(Y_test).float()
test_dataset = TensorDataset(X_test, Y_test)
del X_test, Y_test
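
Because `apply_scaler` flattens every recording before calling `scaler.transform`, the `StandardScaler` above holds a single global mean and standard deviation shared by all leads and time steps. The following is a minimal NumPy sketch of the equivalent operation. It uses a small random array in place of the real data; the array shape and values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the training array: (recordings, time steps, leads)
X_demo = rng.normal(loc=2.0, scale=3.0, size=(8, 100, 12))

# One mean and one standard deviation computed over every value in the array
global_mean = X_demo.mean()
global_std = X_demo.std()  # population std (ddof=0), which StandardScaler also uses

# Broadcasting applies the same shift and scale to every element
X_scaled = (X_demo - global_mean) / global_std

print(X_scaled.shape, round(float(X_scaled.mean()), 6), round(float(X_scaled.std()), 6))
```

The statistics are computed on the training folds only and then reused for the test fold, which is what fitting the scaler once and calling `apply_scaler` on `X_test` achieves.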

Save the scaler (for later evaluation)

In [None]:
import pickle as pck
dict_scaler = {"scale_" : scaler.scale_, "mean_" : scaler.mean_, "var_" : scaler.var_}

with open("drive/MyDrive/idoven/scalers/standardscaler.txt","wb") as filehandler:
  pck.dump(dict_scaler,filehandler)
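
For later evaluation, the saved dictionary can be read back and applied without refitting the scaler. The sketch below round-trips a dictionary with the same keys through pickle. It uses an in-memory buffer and placeholder statistics, and `scale_with_stats` is a hypothetical helper; in an evaluation notebook the file written above and the real values would be used instead.

```python
import io
import pickle

import numpy as np

# Placeholder statistics with the same keys as dict_scaler (values are made up)
saved = {"scale_": np.array([2.0]), "mean_": np.array([1.0]), "var_": np.array([4.0])}

buffer = io.BytesIO()  # stands in for the file on Drive
pickle.dump(saved, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

def scale_with_stats(x, stats):
    """Apply the stored global standardization to an array of any shape."""
    return (x - stats["mean_"][0]) / stats["scale_"][0]

x_new = np.array([[1.0, 3.0], [5.0, -1.0]])
x_scaled = scale_with_stats(x_new, loaded)
print(x_scaled)
```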

### Define the dataloaders

In [None]:
from torch.utils.data import DataLoader, BatchSampler

train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=64, shuffle=True)

### Import utils

In [None]:
from idoven.models import *
from idoven.trainutils import *
from idoven.train import *

In [None]:
from sklearn.metrics import roc_auc_score, average_precision_score

loss_meter = AverageMeter()
CEloss = torch.nn.CrossEntropyLoss()
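
The two scikit-learn metrics imported above can be computed separately for each superclass. The sketch below shows one way to do that on toy data. The helper `per_class_metrics`, the class order, and the key naming are assumptions for illustration, and the code in *train.py* may compute its metrics differently.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

CLASSES = ["NORM", "MI", "STTC", "CD", "HYP"]  # assumed column order of the labels

def per_class_metrics(y_true, y_score, class_names=CLASSES):
    """Return AUC and AUPRC per class for multi-label targets of shape (n, classes)."""
    metrics = {}
    for i, name in enumerate(class_names):
        metrics[f"{name} auc"] = roc_auc_score(y_true[:, i], y_score[:, i])
        metrics[f"{name} auprc"] = average_precision_score(y_true[:, i], y_score[:, i])
    return metrics

# Toy labels: every column contains both positives and negatives
y_true = np.array([[1, 0, 0, 0, 1],
                   [0, 1, 0, 1, 0],
                   [1, 0, 1, 0, 0],
                   [0, 1, 1, 1, 1]])
# Toy scores that rank every positive above every negative
y_score = np.where(y_true == 1, 0.9, 0.1)

toy_metrics = per_class_metrics(y_true, y_score)
print(toy_metrics)
```

Both metrics need at least one positive and one negative example per class in the evaluated data, so they are better computed over a whole evaluation set than over a single small batch.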

### Create model

In [None]:
num_outputs = 5
model = resnet18(num_outputs = num_outputs).to(device)

### Import Weights & Biases
This is used for experiment tracking. Weights & Biases is especially useful here because the Colab notebook sometimes crashes, and it logs the data to the cloud so that it can be accessed later.

In [None]:
!pip install wandb


### Create a wandb sweep for hyperparameter tuning
I have also tried other parameters, though without an exhaustive grid search. Here the grid search covers only the learning rate.

In [None]:
import wandb

sweep_config = {
  "name": "Learning rate sweep 2",
  "method": "grid",
  "parameters": {
        "lr": {
            "values": [1e-5, 5e-5, 1e-4, 5e-4, 1e-3]
        },
        "epochs" : {
            "values" : [5]
        },
        "log_every" : {
            "values" : [10]
        },
        "model_used" : {
            "values" : ["ResNet18"]
        }
    }
}

# Define the sweep ID
sweep_id = wandb.sweep(sweep_config, entity="sergi-andreu", project="idoven")

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Create sweep with ID: g0nr5xgv
Sweep URL: https://wandb.ai/sergi-andreu/idoven/sweeps/g0nr5xgv
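
With `"method": "grid"`, the sweep runs one trial per combination of the listed values. Since only `lr` has more than one value, this configuration yields five runs. The sketch below enumerates the combinations with `itertools`. It is independent of wandb and only illustrates how the grid expands.

```python
from itertools import product

# Same parameter values as in sweep_config above
parameters = {
    "lr": [1e-5, 5e-5, 1e-4, 5e-4, 1e-3],
    "epochs": [5],
    "log_every": [10],
    "model_used": ["ResNet18"],
}

names = list(parameters)
# Cartesian product of all value lists, one dict per run configuration
grid = [dict(zip(names, combo)) for combo in product(*(parameters[n] for n in names))]

for run_config in grid:
    print(run_config)
```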


### Train the model

The model is trained by the sweep, with one training run per parameter combination.

To do this, we define a TRAIN() function that wraps train() and instantiates the model, since the model is not created inside train().

In [None]:
save_loc = "drive/MyDrive/idoven/models/"

def TRAIN():
  model = resnet18(num_outputs = num_outputs).to(device)
  #model = resnet34(num_outputs = num_outputs).to(device)
  train(model, train_dataloader, test_dataloader, verbose=False, 
        save_model=True, save_loc=save_loc)



Run the sweep

In [None]:
wandb.agent(sweep_id, function=TRAIN)

[34m[1mwandb[0m: Agent Starting Run: hzcofnd0 with config:
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	log_every: 10
[34m[1mwandb[0m: 	lr: 1e-05
[34m[1mwandb[0m: 	model_used: ResNet18
ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33msergi-andreu[0m. Use [1m`wandb login --relogin`[0m to force relogin


VBox(children=(Label(value='0.151 MB of 0.151 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
CD auc,▂▂▁▃▄▅▆▆▇▆▇▇▇▇▇▇▇▇█▇▇▇▇▇▇▇▇▇▇▇▇█▇▇█▇▇██▇
CD auprc,▂▂▁▂▃▄▅▆▇▆▇▇▇▇███▇█▇▇▇█▇▇████▇▇████▇▇███
HYP auc,▁▁▃▄▅▅▅▅▅▆▆▆▆▆▇▇▆▇▇▇▇▇▇▇▇▇▇██▇▇▇█████▇██
HYP auprc,▁▁▂▃▄▃▄▅▅▅▆▇▇▇▇▇▆▇▇▇▇▇▇▇▇▇███▆▇▇█████▆██
MI auc,▁▁▁▃▄▅▆▇▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█▇▇█▇▇█▇▇██▇██▇
MI auprc,▁▁▁▂▃▄▅▆▆▆▆▆▆▇▆▇▇▇▇▇▇▇▇▇▇█▇▇█▇▇████████▇
NORM auc,▁▁▃▅▇▇▇▇▇███████████████████████████████
NORM auprc,▁▁▃▅▇▇▇▇████████████████████████████████
STTC auc,▁▂▃▂▅▅▆▆▆▇▇▇▇█▇▇██▇▇████████████████████
STTC auprc,▁▁▂▂▄▄▄▅▅▅▆▆▇▇▇▇▇▇▇▇▇▇▇▇█████▇██████████

Run summary:
CD auc,0.79827
CD auprc,0.61988
HYP auc,0.87087
HYP auprc,0.56876
MI auc,0.80671
MI auprc,0.59702
NORM auc,0.91807
NORM auprc,0.88408
STTC auc,0.88822
STTC auprc,0.70352


[34m[1mwandb[0m: Agent Starting Run: xhnl3t0t with config:
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	log_every: 10
[34m[1mwandb[0m: 	lr: 5e-05
[34m[1mwandb[0m: 	model_used: ResNet18
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.001 MB of 0.151 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=0.004765…

0,1
CD auc,▂▄▄▅▄▃▅▆▅▆▆▇▅▃▆▆▇▇▅▄▅▆▄▇█▇▇▆▇▅▄▄▁▇▅▅█▇▆▆
CD auprc,▂▃▄▅▄▂▅▆▅▆▇▇▅▃▆▆▇▇▆▄▅▇▄▇█▇▇▆▇▄▄▂▁▆▄▅█▆▅▅
HYP auc,▁▃▅▅▆▆▇▆▆▇▇▆▆▇▇█▇▆▇▅▆█▇▇▇▇▆▇▇▇▆▇▅▆▆▆▇▇▆▇
HYP auprc,▁▃▃▄▄▅▅▆▅▆▆▆▄▇▇▇▇▅▇▄▆█▆▆▇▇▆▆▇▇▆▆▅▇▇▇▇▇▆▇
MI auc,▁▅▅▇▆█▇▇▆▇▆▆▃▇▇▆████▄▆▆▇▇█▇▇▆▆▇█▇▇▆▇▆▇█▇
MI auprc,▁▄▄▆▇▇▆▆▆▆▅▆▂▆▆▅█▇▇█▃▅▅▆▇▇▇▇▅▅▇▇▇▆▆▆▅▇█▇
NORM auc,▁▆▇▇▇█▇█████████████████████████████████
NORM auprc,▁▆▇█▇███████████████▇███████▇██████▇████
STTC auc,▁▄▆▆██▇████████▇▇███████▇███████▇███▇▇▇█
STTC auprc,▁▃▅▅▇▆▇█▇▇▇▇▇█▇▇▇█▇█▇███▆▇█▇▇█▇▇▇▇▇▇▇▆▇▇

Run summary:
CD auc,0.7822
CD auprc,0.51158
HYP auc,0.85608
HYP auprc,0.53442
MI auc,0.80497
MI auprc,0.57707
NORM auc,0.9109
NORM auprc,0.87002
STTC auc,0.88321
STTC auprc,0.66022


[34m[1mwandb[0m: Agent Starting Run: aj6halk1 with config:
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	log_every: 10
[34m[1mwandb[0m: 	lr: 0.0001
[34m[1mwandb[0m: 	model_used: ResNet18
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.151 MB of 0.151 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
CD auc,▁▄▅▅▇▄▆▆█▆▇▅▇▅▃▄▅▇█▇▆▆█▃▆▆█▇▇▆▆▄█▆▇▇▆▆▇█
CD auprc,▁▃▄▅▇▃▇▇█▅█▅▇▆▃▂▆██▇▇▅█▂▅▆▇▇▇▅▄▃█▅▇▇▅▅▆█
HYP auc,▁▃▅▆▆▇▆▇▇▇▇▇▇▇▆▆█▆▇██▇▇▆▇▅▆▆▆▆▇▄▆▆▄▆▅▆▅▆
HYP auprc,▁▂▅▇▆▇▆▇██▇▇▇█▆▆█▆██▇▇█▆▇▆▇▅▇██▇▇█▆▇▆▇▇▆
MI auc,▁▅▆▆▇▇▇▇▇█▅▇▇▆▇█▇███▇▅█▇▇▆▇▇█▇▇▇▆▇▇▇▇▇▇▇
MI auprc,▁▅▆▆▇▇▇▇▆▇▃▆▇▅▆█▇█▇▇▆▃▇▆▆▆▆▆▇▇▆▇▅▆▆▇▇▆▆▅
NORM auc,▁▆▇▇▇▇▇██▇███████▇▇██████▇▇▇█▇█▇█▇▇▇▇▇▇█
NORM auprc,▁▆▇▇▇▇▇██▇▇██████▇▇▇██▇█▇▇▇▇▇▇█▇▇▇▇▇▇▇▇█
STTC auc,▁▅▆█▇▇▇████▇███████▇██▇█▇█▇▇▇▇▇█▆▇▇▇▇▇▇▇
STTC auprc,▁▄▆▇▇▇▆▇███▇█████▇█▇▇▇▇███▇▇▇▇▇█▅▇▇▆▆▇▆▆

Run summary:
CD auc,0.84543
CD auprc,0.64712
HYP auc,0.85769
HYP auprc,0.54309
MI auc,0.81896
MI auprc,0.56232
NORM auc,0.92376
NORM auprc,0.89602
STTC auc,0.87255
STTC auprc,0.61926


[34m[1mwandb[0m: Agent Starting Run: e2uu0vhg with config:
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	log_every: 10
[34m[1mwandb[0m: 	lr: 0.0005
[34m[1mwandb[0m: 	model_used: ResNet18
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.001 MB of 0.153 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=0.004752…

0,1
CD auc,▁▄▅▅▅▅▆▄▆▆▅▇▆▇▆▇▄▇▃▆▇▆▅█▇███▇█▇█▇███▇▇█▇
CD auprc,▁▄▄▅▆▆▅▅▆▆▆▆▆█▆▇▆▇▃▆█▅▅█▇▇▇█▇█▇█▆▇█▇▇▇▇▇
HYP auc,▁▁▃▄▅▆▃▅▅▅▅▆▅▅▅▄▆▆▇▆▆▆▅▆▇▇▇▇▅▄▆▇▇▆▆█▇▇▇▇
HYP auprc,▁▁▂▄▅▄▂▃▅▆▄▄▃▃▄▂▅▅▇▅▆▄▄▅▇▆▆▅▂▃▅█▅▃▃▇▆▄▄▅
MI auc,▁▃▃▃▅▄▅▅▄▄▇▆▃▅▄▆▆▆▇▆▅▇▇▅▆▇▆▇█▇▆▇▆▆▇▅██▇▅
MI auprc,▁▂▃▃▄▄▃▃▃▅▇▄▂▅▄▄▆▅▅▅▅▇▅▅▅▆▆██▅▅▇▄▅▆▄█▇▆▄
NORM auc,▁▅▆▆▇▆▇▇▆█▇▇▇██▇███▇██▇▇████▇██▇████████
NORM auprc,▁▄▆▆▇▆▇▇▆█▇▇▇▇█▇▇▇▇▇▇█▇▇██▇▇▇▇█▇▇███▇▇██
STTC auc,▁▄▅▅▇▇▇█▆█▇▇██▇▆▇▇▇▇▇█▇▇▇▇█▆▇██▆▇▆█▇▇██▇
STTC auprc,▁▃▄▅▆▇▇▇▅█▇▅█▇▇▅▇▆▇█▇█▇▇▇▆▇▆▆▇▇▅▇▅█▇▇█▇▆

Run summary:
CD auc,0.87074
CD auprc,0.66942
HYP auc,0.8796
HYP auprc,0.49371
MI auc,0.83594
MI auprc,0.57119
NORM auc,0.93877
NORM auprc,0.91576
STTC auc,0.90278
STTC auprc,0.68248


[34m[1mwandb[0m: Agent Starting Run: sf6b0136 with config:
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	log_every: 10
[34m[1mwandb[0m: 	lr: 0.001
[34m[1mwandb[0m: 	model_used: ResNet18
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.153 MB of 0.153 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
CD auc,▄▂▃▅▅▅▄▁▆▅▅▆▄▇▅▇▇▅▆▅▇▇▅▅▄▇▇▆▇▅▇▅▅█▆▅▇▇▆▆
CD auprc,▃▂▃▅▆▅▅▁▆▅▆▆▄▇▅▇▇▅▅▄▆▇▆▃▄▆▇▆▆▅▆▄▄█▆▄▇▆▅▅
HYP auc,▁▅▅▆▆▇▆▆▇▅▇▇▇▆▇▆▇▆▇▇▇▇▇▇▇▇▇▇▇█▇█▇▇██▇▇▇█
HYP auprc,▁▃▄▅▄▇▄▅▄▃▆▄▄▄▅▄▅▄▆▄▅▅▅▆▅▄▄▅▄█▅▅▅▆▆▅▇▄▄▆
MI auc,▃▁▆▆▄▄▇▇▂▇▆▇▆▆▇▄▅▇▆▇▇▇▆▇█▆▆▇▅▇▇██▇▇▇█▆█▇
MI auprc,▁▁▅▄▃▃▅▅▁▄▄▄▄▄▄▃▄▇▄▅▅▆▄▅▆▅▄▅▃▆▅█▇▅▄▅█▄▆▅
NORM auc,▁▅▇▇▇▇▇▇▇▇████▇▇████████████████████████
NORM auprc,▁▄▇▆▇▇▇▆▇▆▇▇██▇▇███████████▇██▇████████▇
STTC auc,▁▆▇█▇█▇▇█▇▇██▇█▇█▇▇██▇▇█████▇████▇▇█████
STTC auprc,▁▄▆▇▇█▆▇█▆▆█▇▆█▆▇▆▅▇▇▅▆▇▇▇▇█▅▇▇▇▇▆▆█▇█▇▇

Run summary:
CD auc,0.82463
CD auprc,0.59373
HYP auc,0.87835
HYP auprc,0.48256
MI auc,0.85309
MI auprc,0.57481
NORM auc,0.93424
NORM auprc,0.90727
STTC auc,0.89602
STTC auprc,0.70232


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Sweep Agent: Exiting.


The results are now stored in my Weights & Biases project. Some plots may be added to the README file of the repository.
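
To compare the five learning rates at a glance, the final "auc" values printed in the run summaries above can be averaged over the superclasses. The sketch below copies those values by hand and computes an unweighted (macro) mean per learning rate. Choosing the macro mean as the comparison criterion is an assumption made here for illustration, not something the sweep itself does.

```python
# Final "auc" values copied from the run summaries above, keyed by learning rate
final_auc = {
    1e-5: {"CD": 0.79827, "HYP": 0.87087, "MI": 0.80671, "NORM": 0.91807, "STTC": 0.88822},
    5e-5: {"CD": 0.7822, "HYP": 0.85608, "MI": 0.80497, "NORM": 0.9109, "STTC": 0.88321},
    1e-4: {"CD": 0.84543, "HYP": 0.85769, "MI": 0.81896, "NORM": 0.92376, "STTC": 0.87255},
    5e-4: {"CD": 0.87074, "HYP": 0.8796, "MI": 0.83594, "NORM": 0.93877, "STTC": 0.90278},
    1e-3: {"CD": 0.82463, "HYP": 0.87835, "MI": 0.85309, "NORM": 0.93424, "STTC": 0.89602},
}

# Unweighted mean over the five superclasses (macro average)
macro_auc = {
    lr: sum(per_class.values()) / len(per_class)
    for lr, per_class in final_auc.items()
}

for lr, value in sorted(macro_auc.items()):
    print(f"lr={lr:g}: macro AUC = {value:.4f}")

# Learning rate with the highest macro AUC
best_lr = max(macro_auc, key=macro_auc.get)
print("best lr:", best_lr)
```

On these numbers the macro AUC is highest for a learning rate of 5e-4 (about 0.886), followed by 1e-3 (about 0.877). Each configuration was run only once for 5 epochs, so small differences between learning rates should be read with caution.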