## Optuna

To optimize our hyperparameters we use Optuna, an automatic hyperparameter optimization framework. Its default sampler, the Tree-structured Parzen Estimator (TPE), is a form of Bayesian optimization.

Each parameter to search over is given a value of the form `'parameter_name': ('type', lower_bound, upper_bound)`. Parameters given a plain value (e.g. `"weight_decay": 0`) are kept fixed, and search specifications may also appear inside nested dictionaries such as `optimizer_kwargs`.

`'type'` can take one of 4 values:
- `'int'`: search over integers in the interval **[lower_bound, upper_bound]** (uniform distribution)
- `'int-log'`: search over powers of 2 in the interval **[2^lower_bound, 2^upper_bound]**: an integer exponent k is sampled uniformly from **[lower_bound, upper_bound]** and the value is set to 2^k
- `'float'`: search over floating-point numbers in the interval **[lower_bound, upper_bound]**, with every value equally likely to be sampled (uniform distribution)
- `'float-log'`: search over floating-point numbers in the interval **[lower_bound, upper_bound]**, with smaller values more likely to be sampled (uniform distribution in the log domain)
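One way the `('type', lower_bound, upper_bound)` convention could be translated into Optuna calls is sketched below. This is an assumption-based illustration, not the actual implementation of `src.optuna.create_optuna_objective` (which is not shown here); `sample_params`, `is_search_spec` and `LowerBoundTrial` are hypothetical names. The only Optuna methods it relies on are `trial.suggest_int(name, low, high)` and `trial.suggest_float(name, low, high, log=...)`, and nested dictionaries are named with dotted paths such as `optimizer_kwargs.lr`.

```python
from typing import Any, Dict

# Hypothetical set of recognised search types, mirroring the list above.
SEARCH_TYPES = {"int", "int-log", "float", "float-log"}


def is_search_spec(value: Any) -> bool:
    """True if value looks like ('type', lower_bound, upper_bound)."""
    return isinstance(value, tuple) and len(value) == 3 and value[0] in SEARCH_TYPES


def sample_params(trial: Any, params: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Return a copy of params with every search spec replaced by a sampled value.

    `trial` is any object exposing Optuna-style suggest_int / suggest_float.
    Nested dicts are walked recursively; their entries get dotted names.
    """
    sampled: Dict[str, Any] = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            sampled[key] = sample_params(trial, value, prefix=f"{name}.")
        elif is_search_spec(value):
            kind, low, high = value
            if kind == "int":
                sampled[key] = trial.suggest_int(name, low, high)
            elif kind == "int-log":
                # sample the exponent k uniformly, then use 2**k
                sampled[key] = 2 ** trial.suggest_int(name, low, high)
            elif kind == "float":
                sampled[key] = trial.suggest_float(name, low, high)
            else:  # "float-log": uniform in the log domain
                sampled[key] = trial.suggest_float(name, low, high, log=True)
        else:
            sampled[key] = value  # fixed value, passed through unchanged
    return sampled


class LowerBoundTrial:
    """Hypothetical stand-in for an Optuna trial that always returns the lower bound."""

    def suggest_int(self, name, low, high):
        return low

    def suggest_float(self, name, low, high, log=False):
        return low


print(sample_params(LowerBoundTrial(), {
    "use_aug_transform": ("int", 0, 1),
    "optimizer_kwargs": {"lr": ("float-log", 1e-5, 1e-3), "weight_decay": 0},
    "batch_size": ("int-log", 5, 8),
}))
# → {'use_aug_transform': 0, 'optimizer_kwargs': {'lr': 1e-05, 'weight_decay': 0}, 'batch_size': 32}
```

With Optuna installed, `optuna.trial.FixedTrial({...})` could replace the stand-in class to check a particular parameter assignment without running a study.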

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from src.optuna import create_optuna_objective
from src.dataset import get_imagenet_info, get_tiny_imagenet_info

### Pretext

This is an example of the optuna_pretext_script running on a small subset of the ImageNet dataset (`imagenet_info[:100]`) for only 2 epochs, for demonstration purposes.
In this example we search over the following parameter:
- `optimizer_kwargs.lr`: logarithmically sample in the interval [1e-5, 1e-3]
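To make the difference between `'float'` and `'float-log'` concrete for the learning-rate interval [1e-5, 1e-3], the sketch below draws values with Python's standard `random` module rather than Optuna. It only illustrates the shape of the search space (roughly what a purely random sampler would draw); Optuna's default TPE sampler additionally concentrates later trials around promising values. `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose natural log is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the run is reproducible
LOW, HIGH, N = 1e-5, 1e-3, 10_000

log_samples = [sample_log_uniform(LOW, HIGH, rng) for _ in range(N)]
uniform_samples = [rng.uniform(LOW, HIGH) for _ in range(N)]

# Empirical share of draws landing in the lower decade [1e-5, 1e-4).
frac_log = sum(x < 1e-4 for x in log_samples) / N
frac_uniform = sum(x < 1e-4 for x in uniform_samples) / N

# Analytical shares for comparison.
expected_log = (math.log(1e-4) - math.log(LOW)) / (math.log(HIGH) - math.log(LOW))
expected_uniform = (1e-4 - LOW) / (HIGH - LOW)

print(f"'float-log': {frac_log:.3f} below 1e-4 (expected {expected_log:.3f})")
print(f"'float':     {frac_uniform:.3f} below 1e-4 (expected {expected_uniform:.3f})")
```

Because the interval spans two decades, log-domain sampling gives each decade about half of the draws, whereas uniform sampling puts only about 9% of them below 1e-4. This is why `'float-log'` is the usual choice for learning rates and weight decay.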

In [3]:
imagenet_info = get_imagenet_info()

In [6]:
import optuna
from torchvision.transforms import Compose, RandomResizedCrop, RandomGrayscale, GaussianBlur, ColorJitter, RandomSolarize

from src.optuna import create_optuna_objective

aug_transform = Compose([
    RandomResizedCrop(size=224, scale=(0.32, 1.0), ratio=(0.75, 1.3333333333333333)),
    ColorJitter(brightness=0.8, contrast=0.8, saturation=0.8, hue=0.2),
    RandomGrayscale(p=0.05),
    GaussianBlur(kernel_size=23, sigma=(1e-10, 0.2)),
    RandomSolarize(0.7, p=0.2),
])

RUN_PRETEXT_PARAMS = {
    "experiment_id": "optuna_test",
    "pretext_type": "our",
    "aug_transform": aug_transform,
    "loss_alpha": 1,
    "loss_symmetric": True,
    "optimizer_kwargs": {
        "lr": ("float-log", 1e-5, 1e-3),
        "weight_decay": 0,
    },
    "batch_size": 64,
    "num_workers": 0,
    "log_frequency": 100,
    "cache_images": True,
    "resume_from_checkpoint": False,
    "imagenet_info": imagenet_info[:100],
    "n_train": 90,
    "num_epochs": 2,
}

# maximal number of trials to perform
N_TRIALS = 10
# no new trial is started once TIMEOUT seconds have passed since the study started (a running trial is not interrupted)
TIMEOUT = 300

# create objective function
objective = create_optuna_objective(RUN_PRETEXT_PARAMS, save_models=False)

# create study
study = optuna.create_study(direction="maximize")

# run study
study.optimize(objective, n_trials=N_TRIALS, timeout=TIMEOUT)

[I 2022-12-13 12:34:20,454] A new study created in memory with name: no-name-099c443d-b9e7-4400-93d4-6ae7565c0ac6
| parameter              | value                                                                                                                                               |
|:-----------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------|
| experiment_id          | optuna_test_2022-12-13_12-34-20                                                                                                                     |
| aug_transform          | Compose(                                                                                                                                            |
|                        |     RandomResizedCrop(size=(224, 224), scale=(0.32, 1.0), ratio=(0.75, 1.3333), interpolation=bilinear, antialias=None)
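The `N_TRIALS` / `TIMEOUT` pair in the cell above decides when the search ends. As a hedged illustration, the pure-Python sketch below models the stopping rule as we understand it from the `timeout` description of Optuna's `study.optimize`, assuming a single worker (the default `n_jobs=1`): no new trial starts once `n_trials` trials have run or `timeout` seconds have elapsed, but a trial that is already running is allowed to finish. `simulate_stopping` is a hypothetical helper, not part of Optuna or `src.optuna`.

```python
from typing import Iterable, Tuple


def simulate_stopping(durations: Iterable[float], n_trials: int, timeout: float) -> Tuple[int, float]:
    """Return (trials run, total wall time) for sequential trials of the given durations.

    Assumed rule: before each trial, stop if n_trials have already run or
    at least `timeout` seconds have elapsed; a started trial always finishes.
    """
    elapsed = 0.0
    completed = 0
    for duration in durations:
        if completed >= n_trials or elapsed >= timeout:
            break
        elapsed += duration
        completed += 1
    return completed, elapsed


# Ten trials of 120 s each with N_TRIALS = 10 and TIMEOUT = 300:
print(simulate_stopping([120.0] * 10, n_trials=10, timeout=300))
# → (3, 360.0)
```

Under this model only three 120 s trials are started and the search ends after 360 s, so the total wall time can exceed `TIMEOUT` by up to one trial's duration.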

### Downstream

This is an example of the optuna_downstream_script running on a small subset of the Tiny ImageNet dataset for only 2 epochs, for demonstration purposes.
In this example we search over the following parameters:
- `use_aug_transform`: uniformly sample either 0 (False) or 1 (True)
- `optimizer_kwargs.lr`: logarithmically sample in the interval [1e-5, 1e-3]
- `optimizer_kwargs.weight_decay`: logarithmically sample in the interval [1e-7, 1e-3]
- `batch_size`: uniformly sample an integer k in the interval [5, 8] and set the batch size to 2^k
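Enumerating the discrete part of this search space makes the `'int-log'` rule concrete. The sketch below uses plain Python and follows the interpretation given in the list above, not code taken from `src.optuna`; the learning rate and weight decay are continuous and cannot be enumerated this way.

```python
from itertools import product

# ('int', 0, 1): use_aug_transform is 0 (False) or 1 (True)
use_aug_values = list(range(0, 1 + 1))
# ('int-log', 5, 8): exponent k in [5, 8], batch_size = 2**k
batch_sizes = [2 ** k for k in range(5, 8 + 1)]

# Every combination of the two discrete parameters.
discrete_grid = list(product(use_aug_values, batch_sizes))

print(use_aug_values)      # → [0, 1]
print(batch_sizes)         # → [32, 64, 128, 256]
print(len(discrete_grid))  # → 8
```

For example, the `batch_size` of 128 reported in the trial output further down corresponds to k = 7.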

In [5]:
tiny_imagenet_info = get_tiny_imagenet_info()

In [7]:
import optuna
from src.utils import load_best_model
from src.models import OurPretextNetwork
from src.optuna import create_optuna_objective

# specify experiment id to load pretext model from
PRETEXT_EXPERIMENT_ID = "dustin_lr_5e5"

# load pretext model
pretext_model = load_best_model(PRETEXT_EXPERIMENT_ID, OurPretextNetwork(backbone="resnet18"))

RUN_DOWNSTREAM_PARAMS = {
    "experiment_id": "optuna_downstream_test",
    "pretext_model": pretext_model,
    "use_aug_transform": ("int", 0, 1),
    "optimizer_kwargs": {
        "lr": ("float-log", 1e-5, 1e-3),
        "weight_decay": ("float-log", 1e-7, 1e-3),
    },
    "batch_size": ("int-log", 5, 8),
    "cache_images": True,
    "num_workers": 0,
    "tiny_imagenet_info": tiny_imagenet_info[:100],
    "n_train": 90,
    "num_epochs": 2,
}

# maximal number of trials to perform
N_TRIALS = 10
# no new trial is started once TIMEOUT seconds have passed since the study started (a running trial is not interrupted)
TIMEOUT = 300

# create objective function
objective = create_optuna_objective(RUN_DOWNSTREAM_PARAMS, save_models=False)

# create study
study = optuna.create_study(direction="maximize")

# run study
study.optimize(objective, n_trials=N_TRIALS, timeout=TIMEOUT)

[I 2022-12-13 12:39:47,470] A new study created in memory with name: no-name-fd4d1b30-03ab-4554-a1dc-6b62f52bb511
| parameter              | value                                                                 |
|:-----------------------|:----------------------------------------------------------------------|
| experiment_id          | optuna_downstream_test_2022-12-13_12-39-47                            |
| use_aug_transform      | 1                                                                     |
| n_train                | 90                                                                    |
| optimizer_kwargs       | {'lr': 0.00033736263398492994, 'weight_decay': 1.760927885804667e-07} |
| num_epochs             | 2                                                                     |
| batch_size             | 128                                                                   |
| num_workers            | 0                                                     