# Testing different hyperparameters

Let's say we're curious about __how different learning rates and different batch sizes affect our model's accuracy when restricted to 5 epochs__, and we want to build an experiment to test out these hyperparameters.

In this notebook, we'll walk through the following:

- use Python to perform this experiment
- use the CLI to perform this experiment
- evaluate the results using Pandas

---

In [7]:
import sys
sys.path.append("../../")
import pandas as pd
import os

from utils_ic import ic_root_path
from utils_ic.datasets import unzip_url, Urls, data_path
from utils_ic.benchmark import (
    prettify_df, benchmark, Architecture, TrainingSchedule, get_parameters
)
from utils_ic.datasets import unzip_urls

Let's download some data that we want to test on.

In [2]:
input_data = unzip_url(Urls.fridge_objects_path, exist_ok=True)

## Using Python

Before we start testing, it's a good idea to see what the default parameters are. We can use the helper function `get_parameters` to easily get those default values.

Use `get_parameters??` to see what the function signature looks like, and also what the defaults are.

In [None]:
get_parameters??
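The `??` syntax only works inside IPython or Jupyter. As a rough sketch of how the same information could be read in plain Python, the standard `inspect` module can list a function's arguments and defaults. The function `get_parameters_stub` below is a hypothetical stand-in, not the real `get_parameters`; only the default of 15 epochs comes from this notebook, and the other defaults are placeholders.

```python
import inspect


def get_parameters_stub(lrs=(1e-4,), epochs=(15,), batch_sizes=(16,)):
    """Hypothetical stand-in for get_parameters; the defaults are placeholders."""
    return {"lrs": lrs, "epochs": epochs, "batch_sizes": batch_sizes}


# inspect.signature exposes each argument name together with its default value.
sig = inspect.signature(get_parameters_stub)
defaults = {name: param.default for name, param in sig.parameters.items()}

print(sig)
print(defaults)
```

Replacing `get_parameters_stub` with the imported `get_parameters` should print the real signature in the same way.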

Now that we know the defaults, we can pass in the list of the parameters we want to benchmark. 

In this notebook, we want to see the effect of different learning rates across different batch sizes using only 5 epochs (the default number of epochs is 15). To do so, we call the `get_parameters` function as follows:

```python
params = get_parameters(lrs=[1e-3, 1e-4], batch_sizes=[8, 16, 32], epochs=[5])
```

Notice that all parameters must be passed in as a list, including single values such as `epochs=[5]`.

These parameters determine the number of permutations to run. In this case, we've passed in `lrs=[1e-3, 1e-4]`, `batch_sizes=[8, 16, 32]`, and `epochs=[5]`. This results in 2 × 3 × 1 = 6 permutations in total.
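The permutation count can be checked with a short sketch that uses `itertools.product`. This only illustrates the arithmetic; it does not claim to show how `get_parameters` builds its combinations internally.

```python
from itertools import product

lrs = [1e-3, 1e-4]
batch_sizes = [8, 16, 32]
epochs = [5]

# Every combination of one learning rate, one batch size, and one epoch count.
permutations = list(product(lrs, batch_sizes, epochs))

print(len(permutations))  # 2 * 3 * 1 = 6
for lr, bs, ep in permutations:
    print(f"lr={lr}, batch_size={bs}, epochs={ep}")
```

Adding a second value to `epochs` would double the count to 12, which is why each extra list entry can make a benchmark noticeably longer.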

In [11]:
params = get_parameters(lrs=[1e-3, 1e-4], batch_sizes=[8, 16, 32], epochs=[5])

Now that we have our parameters defined, we call the function `benchmark()` which takes in those params. 

We also need to pass in:
- the number of repetitions to run each permutation
- whether or not we want the training to stop early if the metric (accuracy) doesn't improve by 0.01 (1%) over 3 epochs
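To make the early-stopping rule concrete, here is a minimal sketch of one way such a check could be written. The function name `should_stop_early` and its arguments are hypothetical, and the library's actual implementation may differ; the sketch only encodes the rule described above (no improvement of at least 0.01 over 3 epochs).

```python
def should_stop_early(accuracies, min_delta=0.01, patience=3):
    """Return True if the last `patience` epochs did not beat the
    earlier best accuracy by at least `min_delta`."""
    if len(accuracies) <= patience:
        # Not enough history yet to judge.
        return False
    best_before = max(accuracies[:-patience])
    best_recent = max(accuracies[-patience:])
    return best_recent < best_before + min_delta


# Accuracy plateaus around 0.50, so training would stop.
print(should_stop_early([0.50, 0.505, 0.503, 0.506]))
# Accuracy keeps improving, so training would continue.
print(should_stop_early([0.50, 0.60, 0.70, 0.80]))
```

With only 5 epochs per run, a rule like this can cut a run short quite easily, which is one reason to keep `early_stopping = False` for this experiment.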

In [12]:
reps = 3
early_stopping = False

The `benchmark()` function returns a multi-index dataframe which we can work with right away.

In [13]:
df = benchmark([input_data], params, reps, early_stopping); df

Unnamed: 0,Unnamed: 1,Unnamed: 2,duration,accuracy
0,"lr: 0.0001, epochs: 5, batch_size: 16, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: True, one_cycle_policy: True",fridgeObjects,0.836295,0.136364
0,"lr: 0.0001, epochs: 5, batch_size: 32, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: True, one_cycle_policy: True",fridgeObjects,1.108188,0.409091
0,"lr: 0.0001, epochs: 5, batch_size: 8, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: True, one_cycle_policy: True",fridgeObjects,0.644506,0.272727
0,"lr: 0.001, epochs: 5, batch_size: 16, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: True, one_cycle_policy: True",fridgeObjects,0.825757,0.159091
0,"lr: 0.001, epochs: 5, batch_size: 32, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: True, one_cycle_policy: True",fridgeObjects,1.113043,0.159091
0,"lr: 0.001, epochs: 5, batch_size: 8, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: True, one_cycle_policy: True",fridgeObjects,4.986475,0.272727
1,"lr: 0.0001, epochs: 5, batch_size: 16, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: True, one_cycle_policy: True",fridgeObjects,0.78972,0.204545
1,"lr: 0.0001, epochs: 5, batch_size: 32, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: True, one_cycle_policy: True",fridgeObjects,1.286736,0.136364
1,"lr: 0.0001, epochs: 5, batch_size: 8, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: True, one_cycle_policy: True",fridgeObjects,0.710519,0.204545
1,"lr: 0.001, epochs: 5, batch_size: 16, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: True, one_cycle_policy: True",fridgeObjects,0.871194,0.272727


## Using the CLI tool to benchmark

Instead of using Python to run this experiment, we may want to test from the CLI. We can do so by using the `scripts/benchmark.py` file.

First we move up to the `/image_classification` directory.

In [3]:
os.chdir(ic_root_path())

To reproduce the same test (different learning rates across different batch sizes using only 5 epochs), and the same settings (datasets in 'tmp_data', 3 repetitions, and no early_stopping) we can run the following:

```sh
python scripts/benchmark.py \
    --learning-rates 1e-3 1e-4 \
    --batch-sizes 8 16 32 \
    --epochs 5 \
    --repeat 3 \
    --no-early-stopping \
    --inputs <my-data-dir> \
    --output lr_bs_test.csv
```

Additionally, we've added an output parameter, which will automatically dump our dataframe into a csv file.

To simplify the command, we can use the short forms of the parameters. We can also remove `--no-early-stopping` as that is the default behavior.

```sh
python scripts/benchmark.py -lr 1e-3 1e-4 -bs 8 16 32 -e 5 -r 3 -i <my-data-dir> -o lr_bs_test.csv
```
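For readers curious how long and short flags can map to the same list-valued setting, the following is an illustrative `argparse` sketch. It is not the source of `scripts/benchmark.py`: the flag names are copied from the commands above, while the defaults and types are assumptions made for the example.

```python
import argparse

# Illustrative parser only; the real script may define its options differently.
parser = argparse.ArgumentParser(description="Sketch of a benchmark-style CLI")
parser.add_argument("--learning-rates", "-lr", nargs="+", type=float, default=[1e-4])
parser.add_argument("--batch-sizes", "-bs", nargs="+", type=int, default=[16])
parser.add_argument("--epochs", "-e", nargs="+", type=int, default=[15])
parser.add_argument("--repeat", "-r", type=int, default=1)

# nargs="+" collects one or more values into a list for each flag.
long_form = parser.parse_args(
    ["--learning-rates", "1e-3", "1e-4", "--batch-sizes", "8", "16", "32",
     "--epochs", "5", "--repeat", "3"]
)
short_form = parser.parse_args(
    ["-lr", "1e-3", "1e-4", "-bs", "8", "16", "32", "-e", "5", "-r", "3"]
)

print(vars(short_form))
print(vars(long_form) == vars(short_form))
```

Both invocations parse to the same values, which is the property the shortened command relies on.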

HINT: You can learn more about how to use the script by running `python scripts/benchmark.py --help`.

In [4]:
# use {sys.executable} instead of just running `python` to ensure the command is executed using the environment cvbp
!{sys.executable} scripts/benchmark.py -l 1e-3 1e-4 -bs 8 16 32 -e 5 -r 3 -i {input_data} -o data/lr_bs_test.csv

Running 1 of 6 permutations. Repeat 1 of 3.
Running 2 of 6 permutations. Repeat 1 of 3.      
Running 3 of 6 permutations. Repeat 1 of 3.      
Running 4 of 6 permutations. Repeat 1 of 3.      
Running 5 of 6 permutations. Repeat 1 of 3.      
Running 6 of 6 permutations. Repeat 1 of 3.      
Running 1 of 6 permutations. Repeat 2 of 3.      
Running 2 of 6 permutations. Repeat 2 of 3.      
Running 3 of 6 permutations. Repeat 2 of 3.      
Running 4 of 6 permutations. Repeat 2 of 3.      
Running 5 of 6 permutations. Repeat 2 of 3.      
Running 6 of 6 permutations. Repeat 2 of 3.      
Running 1 of 6 permutations. Repeat 3 of 3.      
Running 2 of 6 permutations. Repeat 3 of 3.      
Running 3 of 6 permutations. Repeat 3 of 3.      
Running 4 of 6 permutations. Repeat 3 of 3.      
Running 5 of 6 permutations. Repeat 3 of 3.      
Running 6 of 6 permutations. Repeat 3 of 3.      
Total Time elapsed: 32.6 seconds.                
Output has been saved to '/home/jiata/code/cvbp/image_cl

Once the script completes, load the csv into a dataframe to explore its contents. We'll want to specify `index_col=[0, 1, 2]` since it is a multi-index dataframe.

In [10]:
df = pd.read_csv("data/lr_bs_test.csv", index_col=[0, 1, 2]); df

Unnamed: 0,Unnamed: 1,Unnamed: 2,duration,accuracy
0,"lr: 0.0001, epochs: 5, batch_size: 16, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: False, one_cycle_policy: False",fridgeObjects,0.838662,0.363636
0,"lr: 0.0001, epochs: 5, batch_size: 32, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: False, one_cycle_policy: False",fridgeObjects,1.092153,0.431818
0,"lr: 0.0001, epochs: 5, batch_size: 8, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: False, one_cycle_policy: False",fridgeObjects,0.647182,0.295455
0,"lr: 0.001, epochs: 5, batch_size: 16, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: False, one_cycle_policy: False",fridgeObjects,0.792494,0.25
0,"lr: 0.001, epochs: 5, batch_size: 32, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: False, one_cycle_policy: False",fridgeObjects,1.198691,0.181818
0,"lr: 0.001, epochs: 5, batch_size: 8, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: False, one_cycle_policy: False",fridgeObjects,4.770589,0.272727
1,"lr: 0.0001, epochs: 5, batch_size: 16, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: False, one_cycle_policy: False",fridgeObjects,0.786056,0.181818
1,"lr: 0.0001, epochs: 5, batch_size: 32, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: False, one_cycle_policy: False",fridgeObjects,1.113328,0.318182
1,"lr: 0.0001, epochs: 5, batch_size: 8, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: False, one_cycle_policy: False",fridgeObjects,0.656206,0.227273
1,"lr: 0.001, epochs: 5, batch_size: 16, im_size: 299, arch: resnet18, transforms: True, dropout: 0.5, weight_decay: 0.01, training_schedule: head_first_then_body, discriminative_lr: False, one_cycle_policy: False",fridgeObjects,0.786065,0.25
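To see why `index_col=[0, 1, 2]` matters, the sketch below writes a small multi-index dataframe to CSV text and reads it back twice, once without and once with `index_col`. The values are made-up toy numbers, not benchmark results.

```python
import io

import pandas as pd

# Toy frame with three unnamed index levels: run, parameters, dataset.
index = pd.MultiIndex.from_tuples(
    [
        (0, "lr: 0.001, batch_size: 8", "fridgeObjects"),
        (0, "lr: 0.0001, batch_size: 8", "fridgeObjects"),
        (1, "lr: 0.001, batch_size: 8", "fridgeObjects"),
        (1, "lr: 0.0001, batch_size: 8", "fridgeObjects"),
    ]
)
toy = pd.DataFrame(
    {"duration": [1.0, 2.0, 3.0, 4.0], "accuracy": [0.5, 0.6, 0.7, 0.8]},
    index=index,
)

csv_text = toy.to_csv()

# Without index_col, the three index levels come back as ordinary columns.
flat = pd.read_csv(io.StringIO(csv_text))
# With index_col, the first three columns are rebuilt into a MultiIndex.
restored = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1, 2])

print(flat.columns.tolist())
print(restored.index.nlevels)
```

The flat version has five columns, including the `Unnamed: ...` ones, while the restored version keeps only `duration` and `accuracy` as data columns.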


---

## Visualizing our results

When we read in our multi-index dataframe, index 0 represents the run number, index 1 represents a single permutation of parameters, and index 2 represents the dataset.
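The three index levels can also be used to slice the results. As a small sketch on made-up toy data (the level names `run`, `params`, and `dataset` are added here for readability and are not present in the real dataframe), `DataFrame.xs` selects all rows for one value of a given level:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [[0, 1], ["lr: 0.001", "lr: 0.0001"], ["fridgeObjects"]],
    names=["run", "params", "dataset"],  # hypothetical names for illustration
)
toy = pd.DataFrame({"accuracy": [0.5, 0.6, 0.7, 0.8]}, index=index)

# All results from the first repetition (level 0 is the run number).
first_run = toy.xs(0, level=0)
# All repetitions of one parameter permutation (level 1).
one_setting = toy.xs("lr: 0.001", level=1)

print(first_run)
print(one_setting)
```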

To see the results, show the df using the `prettify_df` function. This will display all the hyperparameters in a nice, readable way.

In [17]:
prettify_df(df.T)

Unnamed: 0_level_0,0,0,0,0,0,0,1,1,1,1,1,1,2,2,2,2,2,2
Unnamed: 0_level_1,lr: 0.0001  epochs: 5  batch_size: 16  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 32  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 8  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 16  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 32  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 8  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 16  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 32  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 8  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 16  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 32  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 8  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 16  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 32  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 8  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 16  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 32  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 8  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True
Unnamed: 0_level_2,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects
duration,0.836295,1.108188,0.644506,0.825757,1.113043,4.986475,0.78972,1.286736,0.710519,0.871194,1.064035,0.638611,0.776679,1.080423,0.66222,0.784717,1.103894,0.639533
accuracy,0.136364,0.409091,0.272727,0.159091,0.159091,0.272727,0.204545,0.136364,0.204545,0.272727,0.159091,0.159091,0.204545,0.181818,0.386364,0.340909,0.25,0.340909


Since we've run our benchmarking over 3 repetitions, we may want to just look at the averages across the different __run numbers__.

In [19]:
# groupby(level=...) replaces the mean(level=...) form, which pandas 2.0 removed
prettify_df(df.groupby(level=[1, 2]).mean().T)

Unnamed: 0_level_0,lr: 0.0001  epochs: 5  batch_size: 16  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 32  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 8  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 16  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 32  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 8  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True
Unnamed: 0_level_1,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects,fridgeObjects
duration,0.800898,1.158449,0.672415,0.827223,1.093657,2.088206
accuracy,0.181818,0.242424,0.287879,0.257576,0.189394,0.257576
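The averaging step can be tried on toy numbers. The sketch below uses `groupby(level=...).mean()`, which groups rows by the parameter and dataset levels and averages over the run level; it also works on pandas versions where `mean(level=...)` is unavailable. The accuracy values are invented for the example.

```python
import pandas as pd

# Three runs (level 0) of two parameter settings (level 1) on one dataset (level 2).
index = pd.MultiIndex.from_product(
    [[0, 1, 2], ["lr: 0.001", "lr: 0.0001"], ["fridgeObjects"]]
)
toy = pd.DataFrame({"accuracy": [0.25, 0.5, 0.5, 0.75, 0.75, 1.0]}, index=index)

# Grouping on levels 1 and 2 collapses level 0, i.e. averages across runs.
averaged = toy.groupby(level=[1, 2]).mean()

print(averaged)
```

Each parameter setting now has a single row holding its mean accuracy over the three runs.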


Additionally, we may simply want to see which set of hyperparameters performs best across the different __datasets__. We can do that by averaging the results over the datasets. (The results of this step will look the same as the above since we're only passing in one dataset.)

In [20]:
# groupby(level=1) averages over both the run and dataset levels
prettify_df(df.groupby(level=1).mean().T)

Unnamed: 0,lr: 0.0001  epochs: 5  batch_size: 16  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 32  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.0001  epochs: 5  batch_size: 8  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 16  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 32  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True,lr: 0.001  epochs: 5  batch_size: 8  im_size: 299  arch: resnet18  transforms: True  dropout: 0.5  weight_decay: 0.01  training_schedule: head_first_then_body  discriminative_lr: True  one_cycle_policy: True
duration,0.800898,1.158449,0.672415,0.827223,1.093657,2.088206
accuracy,0.181818,0.242424,0.287879,0.257576,0.189394,0.257576
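Once the results are averaged, picking the best permutation is a one-liner. This is a minimal sketch on made-up mean accuracies, assuming the parameter description strings are the index, as in the table above.

```python
import pandas as pd

# Toy mean accuracies keyed by parameter permutation (values are invented).
mean_accuracy = pd.Series(
    {
        "lr: 0.001, batch_size: 8": 0.6,
        "lr: 0.0001, batch_size: 8": 0.8,
        "lr: 0.001, batch_size: 16": 0.7,
    },
    name="accuracy",
)

# idxmax returns the index label of the highest value.
best_params = mean_accuracy.idxmax()
# Sorting gives the full ranking, best first.
ranked = mean_accuracy.sort_values(ascending=False)

print(best_params)
print(ranked)
```

With only 3 repetitions and 5 epochs, small differences between permutations may be noise, so the full ranking is often more informative than the single top entry.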


---

## Clean up

Clean up the data directory we created since we've finished with our benchmarks.

In [None]:
import shutil

shutil.rmtree("tmp_data")