<a href="https://colab.research.google.com/github/palkop11/soh-ml/blob/master/interaction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Preparations

## Mount
Mount Google Drive so that the dataset stored there can be copied into the Colab environment.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Clone, change directory
Clone the repository (this works while the repo is public), then change into the repo directory.

In [None]:
# clone repo
!git clone https://github.com/palkop11/soh-ml.git

# jump to repo directory
%cd soh-ml

fatal: destination path 'soh-ml' already exists and is not an empty directory.
/content/soh-ml


## Upload dataset
Before running the cell below, make sure that 'dataset_v5_ts_npz.zip' has been uploaded somewhere in your Drive and that Drive is mounted. \
The cell finds the archive and extracts it into /content/soh-ml/DATA:

In [None]:
# find "dataset_v5_ts_npz.zip" and then extract it into temporary workspace
!find /content/drive -name "dataset_v5_ts_npz.zip" | xargs -I {} unzip -qo {} -d "./DATA/"
# check if dataset directory was created
!find /content/soh-ml/DATA -maxdepth 1

/content/soh-ml/DATA
/content/soh-ml/DATA/dataset_v5_ts_npz


## Installs, imports, fix seed

Run the cell below to install the required packages.

In [None]:
"""
POTENTIAL FUTURE PROBLEMS WITH REPRODUCIBILITY
for reproducibility, packages should be installed from requirements_colab.txt,
but for now that results in errors:
!pip install -r requirements_colab.txt -q
"""

!pip install tensorboard tbparse pytorch-lightning lightning -q

Make sure that you are in the project directory /content/soh-ml; otherwise some imports will fail.

In [None]:
%cd /content/soh-ml/

/content/soh-ml


Import the project modules and fix the seed:

In [None]:
import learning
import cross_validation

from lightning import seed_everything
seed_everything(42)

INFO: Seed set to 42
INFO:lightning.fabric.utilities.seed:Seed set to 42


42

# Running single experiments

## How to

You can use the run_experiment function from learning.py to run an experiment. \
The following call runs an experiment with the default test configuration:
```python
learning.run_experiment(learning.test_config)
```
You can also run it from the command line. Without arguments it runs with the test_config configuration:
```bash
!python learning.py
```

To run an experiment with your own config, define it as a Python dictionary and pass that dictionary to the run_experiment function:
```python
# all keys (except ['test'] in ['data'])
# should be present in this single-experiment config dictionary:
my_config = {
    'experiment_name': 'my_experiment', # also used for TensorBoard logging
    'seed': 42,

    'data': {
        'datadir':'./DATA/dataset_v5_ts_npz/',
        # supported subsets as string arguments:
        # 'blacklist', 'small', 'train', 'val', 'test'
        # or you can specify a subset as a list of battery IDs
        'train': 'train',
        'val': 'val',
        #'test': None # normally, you don't need to specify a test dataset
        # supported normalization types:
        # 'minmax_zero_one', 'minmax_symmetric', 'meanimax', 'meanstd'
        'normalization': {'x': None, 'y': 'minmax_zero_one'},
        'n_diff': 0,
    },

    'model': {
        'input_size': 2,
        'cnn_hidden_dim': 32,
        'cnn_channels': [4, 8, 16],
        'lstm_hidden_size': 32,
        'num_layers': 1,
        'output_size': 1, # do not change that
        'dropout': 0.,
        'regressor_hidden_dim': 1024,
        'output_activation': 'sigmoid', # 'tanh', 'sigmoid', 'relu' are supported
    },

    'training': {
        # resume_ckpt supported options:
        # None, 'auto', 'from_best', 'from_last' or a path to a .ckpt file
        # beware of resuming from a checkpoint when it is not applicable,
        # e.g. when the model has changed
        'resume_ckpt': 'from_last',
        'batch_size': 32,
        'learning_rate': 1e-3,
        'loss_type': 'huber1.0', # 'mse', 'mae', 'huberX.X', 'bce' are supported
        'epochs': 1,
        'accelerator': 'auto',
        'devices': 1,
    },

    'metrics': 'all', # 'all', 'mse', 'mae', 'mape', 'r2', 'pcc' are supported

    'logging': {
        # change the logging directory
        # so you don't lose logs when the runtime is killed, e.g.
        # /content/drive/MyDrive/for-soh-ml/LOGS
        'log_dir': './LOGS',
        'progress_bar': True,
        'plot': True,
        'savefig': True,
    }
}
```
Then run the experiment:
```python
learning.run_experiment(my_config)
```
Such a config can also be stored in .yaml format; in that case, pass the file path to run_experiment:
```python
learning.run_experiment('path/to/my/config/my_config.yaml')
```
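
For the .yaml route, here is a minimal sketch of writing a config dictionary to disk and reading it back with PyYAML. `save_config` and `load_config` are hypothetical helper names, and it is an assumption that run_experiment expects a file with the same nesting as the dictionary above.

```python
import yaml  # PyYAML, assumed to be available in the runtime


def save_config(config, path):
    """Write a config dictionary to a YAML file, preserving key order."""
    with open(path, "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)


def load_config(path):
    """Read a YAML file back into a plain dictionary."""
    with open(path) as f:
        return yaml.safe_load(f)
```

When editing such a file by hand, note that PyYAML reads `1e-3` as a string; writing `1.0e-3` or `0.001` for the learning rate avoids that.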
You can also specify the config when running learning.py from the command line:
```bash
!python learning.py --config path/to/my/config.yaml
```
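
The comment at the top of the example config says that every key (except ['test'] in ['data']) must be present. Below is a minimal sketch of a pre-flight check that could be run before calling run_experiment. `REQUIRED_KEYS` and `missing_keys` are hypothetical names; the key list simply mirrors the example config and is not read from learning.py.

```python
# Hypothetical pre-flight check. The required keys mirror the example config
# above; this is an assumption, not something taken from learning.py.
REQUIRED_KEYS = {
    "experiment_name": None,
    "seed": None,
    "data": ["datadir", "train", "val", "normalization", "n_diff"],
    "model": ["input_size", "cnn_hidden_dim", "cnn_channels", "lstm_hidden_size",
              "num_layers", "output_size", "dropout", "regressor_hidden_dim",
              "output_activation"],
    "training": ["resume_ckpt", "batch_size", "learning_rate", "loss_type",
                 "epochs", "accelerator", "devices"],
    "metrics": None,
    "logging": ["log_dir", "progress_bar", "plot", "savefig"],
}


def missing_keys(config):
    """Return dotted names of required keys that are absent from config."""
    missing = []
    for section, subkeys in REQUIRED_KEYS.items():
        if section not in config:
            missing.append(section)
            continue
        # Sections mapped to None have no required sub-keys.
        for key in subkeys or []:
            if key not in config[section]:
                missing.append(f"{section}.{key}")
    return missing
```

An empty list means the dictionary has every key that appears in the example; anything else names what to add before starting a long run.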

During an experiment, a directory for each experiment and version is created inside the LOGS directory (or whichever 'log_dir' you specify in the config), \
for example: \
my_experiment/version_0
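
Given the `experiment_name/version_N` layout, a small helper can locate the most recent run. This is a sketch under that naming assumption only; `latest_version_dir` is a hypothetical name and is not part of the repository.

```python
import re
from pathlib import Path


def latest_version_dir(log_dir, experiment_name):
    """Return the highest-numbered version_N directory, or None if there is none."""
    root = Path(log_dir) / experiment_name
    versions = []
    if root.is_dir():
        for child in root.iterdir():
            match = re.fullmatch(r"version_(\d+)", child.name)
            if child.is_dir() and match:
                versions.append((int(match.group(1)), child))
    if not versions:
        return None
    # Compare by the numeric suffix so that version_10 sorts after version_2.
    return max(versions, key=lambda pair: pair[0])[1]
```

For example, `latest_version_dir('./LOGS', 'my_experiment')` would return the path of the newest version directory, if any exists.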

## Experiments

In [None]:
%load_ext tensorboard
%tensorboard --logdir /content/drive/MyDrive/for-soh-ml/LOGS --port=5005

### Train on small, validate on train

In [None]:
# all keys (except ['test'] in ['data'])
# should be present in this single-experiment config dictionary:
tr_small_val_train_config_128 = {
    'experiment_name': 'tr_small_val_train_128', # also used for TensorBoard logging
    'seed': 42,

    'data': {
        'datadir':'./DATA/dataset_v5_ts_npz/',
        # supported subsets as string arguments:
        # 'blacklist', 'small', 'train', 'val', 'test'
        # or you can specify a subset as a list of battery IDs
        'train': 'small',
        'val': 'train',
        #'test': None # normally, you don't need to specify a test dataset
        # supported normalization types:
        # 'minmax_zero_one', 'minmax_symmetric', 'meanimax', 'meanstd'
        'normalization': {'x': None, 'y': 'minmax_zero_one'},
        'n_diff': 0,
    },

    'model': {
        'input_size': 2,
        'cnn_hidden_dim': 32,
        'cnn_channels': [4, 8, 16],
        'lstm_hidden_size': 32,
        'num_layers': 1,
        'output_size': 1, # do not change that
        'dropout': 0.,
        'regressor_hidden_dim': 128,
        'output_activation': 'sigmoid', # 'tanh', 'sigmoid', 'relu' are supported
    },

    'training': {
        # resume_ckpt supported options:
        # None, 'auto', 'from_best', 'from_last' or a path to a .ckpt file
        # beware of resuming from a checkpoint when it is not applicable,
        # e.g. when the model has changed
        'resume_ckpt': 'from_last',
        'batch_size': 32,
        'learning_rate': 1e-3,
        'loss_type': 'huber1.0', # 'mse', 'mae', 'huberX.X', 'bce' are supported
        'epochs': 250,
        'accelerator': 'auto',
        'devices': 1,
    },

    'metrics': 'all', # 'all', 'mse', 'mae', 'mape', 'r2', 'pcc' are supported

    'logging': {
        # change the logging directory
        # so you don't lose logs when the runtime is killed:
        # /content/drive/MyDrive/for-soh-ml/LOGS
        'log_dir': '/content/drive/MyDrive/for-soh-ml/LOGS',
        'progress_bar': True,
        'plot': True,
        'savefig': True,
    }
}

learning.run_experiment(tr_small_val_train_config_128)

In [None]:
# all keys (except ['test'] in ['data'])
# should be present in this single-experiment config dictionary:
tr_small_val_train_config_256 = {
    'experiment_name': 'tr_small_val_train_256', # also used for TensorBoard logging
    'seed': 42,

    'data': {
        'datadir':'./DATA/dataset_v5_ts_npz/',
        # supported subsets as string arguments:
        # 'blacklist', 'small', 'train', 'val', 'test'
        # or you can specify a subset as a list of battery IDs
        'train': 'small',
        'val': 'train',
        #'test': None # normally, you don't need to specify a test dataset
        # supported normalization types:
        # 'minmax_zero_one', 'minmax_symmetric', 'meanimax', 'meanstd'
        'normalization': {'x': None, 'y': 'minmax_zero_one'},
        'n_diff': 0,
    },

    'model': {
        'input_size': 2,
        'cnn_hidden_dim': 32,
        'cnn_channels': [4, 8, 16],
        'lstm_hidden_size': 32,
        'num_layers': 1,
        'output_size': 1, # do not change that
        'dropout': 0.,
        'regressor_hidden_dim': 256,
        'output_activation': 'sigmoid', # 'tanh', 'sigmoid', 'relu' are supported
    },

    'training': {
        # resume_ckpt supported options:
        # None, 'auto', 'from_best', 'from_last' or a path to a .ckpt file
        # beware of resuming from a checkpoint when it is not applicable,
        # e.g. when the model has changed
        'resume_ckpt': 'from_last',
        'batch_size': 32,
        'learning_rate': 1e-3,
        'loss_type': 'huber1.0', # 'mse', 'mae', 'huberX.X', 'bce' are supported
        'epochs': 250,
        'accelerator': 'auto',
        'devices': 1,
    },

    'metrics': 'all', # 'all', 'mse', 'mae', 'mape', 'r2', 'pcc' are supported

    'logging': {
        # change the logging directory
        # so you don't lose logs when the runtime is killed:
        # /content/drive/MyDrive/for-soh-ml/LOGS
        'log_dir': '/content/drive/MyDrive/for-soh-ml/LOGS',
        'progress_bar': True,
        'plot': True,
        'savefig': True,
    }
}

learning.run_experiment(tr_small_val_train_config_256)

# Running cross-validation

## How to

Example of a master cross-validation config:

```python
cv_experiment_config = {
    'master_name': 'cv_experiment_config',
    'base_config': {
        'experiment_name': None,
        'seed': 42,
        'data': {
            'datadir': './DATA/dataset_v5_ts_npz/',
            'normalization': {'x': None, 'y': 'minmax_zero_one'},
            'n_diff': 0,
        },
        'model': {
            'input_size': 2,
            'cnn_hidden_dim': 32,
            'cnn_channels': [4, 8, 16],
            'lstm_hidden_size': 32,
            'num_layers': 1,
            'output_size': 1,
            'dropout': 0.,
            'regressor_hidden_dim': 1024,
            'output_activation': 'sigmoid',
        },
        'training': {
            'resume_ckpt': None,
            'batch_size': 32,
            'learning_rate': 1e-3,
            'loss_type': 'huber1.0',
            'epochs': 2,
            'accelerator': 'auto',
            'devices': 1,
        },
        'metrics': 'all',
        'logging': {
            # to MyDrive:
            'log_dir': '/content/drive/MyDrive/for-soh-ml/LOGS/cross-validation/',
            'progress_bar': True,
            'plot': False,
            'savefig': True,
        }
    },
    'hyperparam_grid': {
        'model': {
            'cnn_hidden_dim': [16, 32],
        },
    },
    'crossval_settings': {
        'n_splits': 2,
        'method': 'stratified', # 'regular' or 'stratified'
        'strat_label': 'chem', # should be None if 'method' == 'regular'
        'dataset_subset': 'small',
    }
}
```

You can run cross-validation from the command line \
(without any arguments it runs with the test config):
```bash
!python cross_validation.py
```
Or specify a master cross-validation config in a .yaml file:
```bash
!python cross_validation.py --config path/to/master/cv/config.yaml
```

You can, of course, also run it from Python code. \
Pass the config as a Python dictionary:
```python
validator = cross_validation.CrossValidator(cv_experiment_config)
```

Or pass the path to a .yaml config:
```python
validator = cross_validation.CrossValidator('path/to/master/cv/config.yaml')
```

Then start cross-validation with the .run() method:
```python
validator.run()
```
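
The 'crossval_settings' block in the example config asks for a 'stratified' split on the 'chem' label. As a rough illustration of what stratification means here, the sketch below deals the members of each label group round-robin into folds so that every fold gets a similar mix. It is not the repository's implementation (which may well use a library routine); `stratified_folds`, the battery IDs, and the chemistry labels are all hypothetical.

```python
from collections import defaultdict


def stratified_folds(ids, labels, n_splits):
    """Assign each id to one of n_splits folds, spreading every label group evenly."""
    groups = defaultdict(list)
    for item, label in zip(ids, labels):
        groups[label].append(item)

    folds = [[] for _ in range(n_splits)]
    offset = 0
    for label in sorted(groups):
        # Continue dealing where the previous group stopped to keep fold sizes balanced.
        for i, item in enumerate(groups[label]):
            folds[(offset + i) % n_splits].append(item)
        offset += len(groups[label])
    return folds


# Hypothetical battery IDs and chemistry labels, for illustration only.
ids = ["b01", "b02", "b03", "b04", "b05", "b06"]
chem = ["LFP", "LFP", "LFP", "LFP", "NMC", "NMC"]
folds = stratified_folds(ids, chem, n_splits=2)
print(folds)
# [['b01', 'b03', 'b05'], ['b02', 'b04', 'b06']]
```

Each fold ends up with two 'LFP' and one 'NMC' item; in cross-validation, one fold at a time would serve as validation data while the remaining folds are used for training.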

## Experiments

### test

In [None]:
%load_ext tensorboard
%tensorboard --logdir /content/soh-ml/LOGS/cross-validation/cv_test --port=7007

In [None]:
cv_test_config = {
    'master_name': 'cv_test',
    'base_config': {
        'experiment_name': None,
        'seed': 42,
        'data': {
            'datadir': './DATA/dataset_v5_ts_npz/',
            'normalization': {'x': None, 'y': 'minmax_zero_one'},
            'n_diff': 0,
        },
        'model': {
            'input_size': 2,
            'cnn_hidden_dim': 32,
            'cnn_channels': [4, 8, 16],
            'lstm_hidden_size': 32,
            'num_layers': 1,
            'output_size': 1,
            'dropout': 0.,
            'regressor_hidden_dim': 1024,
            'output_activation': 'sigmoid',
        },
        'training': {
            'resume_ckpt': None,
            'batch_size': 32,
            'learning_rate': 1e-3,
            'loss_type': 'huber1.0',
            'epochs': 2,
            'accelerator': 'auto',
            'devices': 1,
        },
        'metrics': 'all',
        'logging': {
            # to MyDrive:
            #'log_dir': '/content/drive/MyDrive/for-soh-ml/LOGS/cross-validation/',
            'log_dir': '/content/soh-ml/LOGS/cross-validation/',
            'progress_bar': True,
            'plot': False,
            'savefig': True,
        }
    },
    'hyperparam_grid': {
        'model': {
            'cnn_hidden_dim': [16, 32],
        },
    },
    'crossval_settings': {
        'n_splits': 2,
        'method': 'stratified', # 'regular' or 'stratified'
        'strat_label': 'chem', # should be None if 'method' == 'regular'
        'dataset_subset': 'small',
    }
}

In [None]:
validator = cross_validation.CrossValidator(cv_test_config)

In [None]:
validator.run()

Skipping completed experiment: 29b413e0_fold_0
Skipping completed experiment: 29b413e0_fold_1
Skipping completed experiment: 0f9db9ac_fold_0
Skipping completed experiment: 0f9db9ac_fold_1


In [None]:
validator._generate_param_combinations()

[{'model': {'cnn_hidden_dim': 16}}, {'model': {'cnn_hidden_dim': 32}}]
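
The output above suggests that the 'hyperparam_grid' is expanded into one override dictionary per combination of values. A minimal sketch of such an expansion with `itertools.product` follows; `expand_grid` is a hypothetical name, and the actual `_generate_param_combinations` may be implemented differently.

```python
from itertools import product


def expand_grid(grid):
    """Expand {'section': {'param': [values]}} into a list of nested override dicts."""
    entries = [
        (section, param, values)
        for section, params in grid.items()
        for param, values in params.items()
    ]
    combos = []
    # One combination per element of the Cartesian product of all value lists.
    for choice in product(*(values for _, _, values in entries)):
        combo = {}
        for (section, param, _), value in zip(entries, choice):
            combo.setdefault(section, {})[param] = value
        combos.append(combo)
    return combos


grid = {"model": {"cnn_hidden_dim": [16, 32]}}
print(expand_grid(grid))
# [{'model': {'cnn_hidden_dim': 16}}, {'model': {'cnn_hidden_dim': 32}}]
```

With 'n_splits': 2, two combinations times two folds gives four runs, which is consistent with the four "Skipping completed experiment" lines printed by validator.run() above.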

### cv1_small

In [None]:
%load_ext tensorboard
%tensorboard --logdir /content/soh-ml/LOGS/cross-validation/cv1_small --port=2003

In [None]:
cv1_small_config = {
    'master_name': 'cv1_small',
    'base_config': {
        'experiment_name': None,
        'seed': 42,
        'data': {
            'datadir': './DATA/dataset_v5_ts_npz/',
            'normalization': {'x': None, 'y': 'minmax_zero_one'},
            'n_diff': 0,
        },
        'model': {
            'input_size': 2,
            'cnn_hidden_dim': 32,
            'cnn_channels': [4, 8, 16],
            'lstm_hidden_size': 32,
            'num_layers': 1,
            'output_size': 1,
            'dropout': 0.,
            'regressor_hidden_dim': 128,
            'output_activation': 'sigmoid',
        },
        'training': {
            'resume_ckpt': None,
            'batch_size': 32,
            'learning_rate': 1e-3,
            'loss_type': 'huber1.0',
            'epochs': 50,
            'accelerator': 'auto',
            'devices': 1,
        },
        'metrics': 'all',
        'logging': {
            # to MyDrive:
            #'log_dir': '/content/drive/MyDrive/for-soh-ml/LOGS/cross-validation/',
            'log_dir': '/content/soh-ml/LOGS/cross-validation/',
            'progress_bar': True,
            'plot': False,
            'savefig': True,
        }
    },
    'hyperparam_grid': {
        'model': {
            'dropout': [0.0, 0.25],
        },
    },
    'crossval_settings': {
        'n_splits': 4,
        'method': 'stratified', # 'regular' or 'stratified'
        'strat_label': 'chem', # should be None if 'method' == 'regular'
        'dataset_subset': 'small',
    }
}

In [None]:
validator_1_small = cross_validation.CrossValidator(cv1_small_config)

In [None]:
validator_1_small.run()

In [None]:
# !cp -r "{temp_dir}/" "{drive_dir}/"
!cp -r /content/soh-ml/LOGS/cross-validation/cv1_small /content/drive/MyDrive/for-soh-ml/LOGS/cross-validation/cv1_small

### cv2_small

In [None]:
%load_ext tensorboard
%tensorboard --logdir /content/soh-ml/LOGS/cross-validation/cv2_small --port=3002

In [None]:
cv2_small_config = {
    'master_name': 'cv2_small',
    'base_config': {
        'experiment_name': None,
        'seed': 42,
        'data': {
            'datadir': './DATA/dataset_v5_ts_npz/',
            'normalization': {'x': None, 'y': 'minmax_zero_one'},
            'n_diff': 0,
        },
        'model': {
            'input_size': 2,
            'cnn_hidden_dim': 32,
            'cnn_channels': [4, 8, 16],
            'lstm_hidden_size': 32,
            'num_layers': 1,
            'output_size': 1,
            'dropout': 0.,
            'regressor_hidden_dim': 128,
            'output_activation': 'sigmoid',
        },
        'training': {
            'resume_ckpt': None,
            'batch_size': 32,
            'learning_rate': 1e-3,
            'loss_type': 'huber1.0',
            'epochs': 150,
            'accelerator': 'auto',
            'devices': 1,
        },
        'metrics': 'all',
        'logging': {
            # to MyDrive:
            #'log_dir': '/content/drive/MyDrive/for-soh-ml/LOGS/cross-validation/',
            'log_dir': '/content/soh-ml/LOGS/cross-validation/',
            'progress_bar': True,
            'plot': False,
            'savefig': True,
        }
    },
    'hyperparam_grid': {
        'model': {
            'dropout': [0.0, 0.25],
        },
    },
    'crossval_settings': {
        'n_splits': 4,
        'method': 'stratified', # 'regular' or 'stratified'
        'strat_label': 'chem', # should be None if 'method' == 'regular'
        'dataset_subset': 'small',
    }
}

In [None]:
validator_2_small = cross_validation.CrossValidator(cv2_small_config)

In [None]:
validator_2_small.run()

In [None]:
# !cp -r "{temp_dir}/" "{drive_dir}/"
!cp -r /content/soh-ml/LOGS/cross-validation/cv2_small /content/drive/MyDrive/for-soh-ml/LOGS/cross-validation/cv2_small

### cv3_small

In [None]:
%load_ext tensorboard
%tensorboard --logdir /content/soh-ml/LOGS/cross-validation/cv3_small --port=4003

In [None]:
cv3_small_config = {
    'master_name': 'cv3_small',
    'base_config': {
        'experiment_name': None,
        'seed': 42,
        'data': {
            'datadir': './DATA/dataset_v5_ts_npz/',
            'normalization': {'x': None, 'y': 'minmax_zero_one'},
            'n_diff': 0,
        },
        'model': {
            'input_size': 2,
            'cnn_hidden_dim': 32,
            'cnn_channels': [4, 8, 16],
            'lstm_hidden_size': 32,
            'num_layers': 1,
            'output_size': 1,
            'dropout': 0.,
            'regressor_hidden_dim': 128,
            'output_activation': 'sigmoid',
        },
        'training': {
            'resume_ckpt': None,
            'batch_size': 32,
            'learning_rate': 1e-3,
            'loss_type': 'huber1.0',
            'epochs': 150,
            'accelerator': 'auto',
            'devices': 1,
        },
        'metrics': 'all',
        'logging': {
            # to MyDrive:
            #'log_dir': '/content/drive/MyDrive/for-soh-ml/LOGS/cross-validation/',
            'log_dir': '/content/soh-ml/LOGS/cross-validation/',
            'progress_bar': True,
            'plot': False,
            'savefig': True,
        }
    },
    'hyperparam_grid': {
        'data':{
            'normalization': [
                {'x': None, 'y': 'minmax_zero_one'},
                {'x': 'meanimax', 'y': 'minmax_zero_one'},
            ],
            'n_diff': [0, 1],
        },
    },
    'crossval_settings': {
        'n_splits': 4,
        'method': 'stratified', # 'regular' or 'stratified'
        'strat_label': 'chem', # should be None if 'method' == 'regular'
        'dataset_subset': 'small',
    }
}