# Ray Tune

## How it works

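Requirements
- an objective function (the "trainable")
- a search space as a dict
    - key: hyperparameter name
    - value: a fixed value or a sampling range/distribution (e.g. `tune.uniform(0, 1)`)
- a `tune.Tuner` that runs the trials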

Basic usage
```python
tuner = tune.Tuner(trainable, param_space=search_space)
results = tuner.fit()
results.get_best_result(metric="score", mode="min").config
```

## Key concepts
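6 components
- search space: the hyperparameter distributions; each sampled `config` is passed to the trainable
- trainable: the objective function to optimize
- search algorithm: how to pick the configurations to try
- scheduler: when to stop (or pause) running trials early
- trials: individual runs of the experiment, created by the `Tuner` (which receives the trainable, search algorithm, and scheduler)
- analyses: the `ResultGrid` returned by `Tuner.fit()`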

### trainable
- use the Function API
    - create a function (here called `trainable`) that takes in a dictionary of hyperparameters.
    - This function computes a score in a “training loop” and reports this score back to Tune.
```python
from ray import train

def objective(x, a, b):  # Define an objective function.
    return a * (x**0.5) + b

def trainable(config):  # Pass a "config" dictionary into your trainable.

    for x in range(20):  # "Train" for 20 iterations and compute intermediate scores.
        score = objective(x, config["a"], config["b"])
        train.report({"score": score})  # Send the score to Tune.

```
- `train.report`: reports the intermediate score to Tune from inside the training loop
- each `train.report` call counts as one training iteration, so a `for` loop in the trainable maps onto iterations

### search space
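- a dict passed to the `Tuner` as `param_space`; each trial receives one sampled `config` dict
- values are fixed values or sampling distributions (e.g. `tune.uniform`, `tune.choice`, `tune.grid_search`)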

### trials
- first arg: the trainable
- `param_space`: the search space
- `Tuner.fit()` generates the trial objects
- *Tune automatically determines how many trials run in parallel* (based on the available resources)
    - set the total number of trials with `num_samples`: `tune_config=tune.TuneConfig(num_samples=10)`
    - or cap the total run time with `time_budget_s` (also in `TuneConfig`)

### search algo
- random/grid search (`BasicVariantGenerator`) by default
- Tune has search algorithms that integrate with many popular optimization libraries, such as HyperOpt or Optuna.
- Bayesian optimization via `BayesOptSearch`
    - run `pip install bayesian-optimization` first
- the search algorithm is passed as `search_alg` in `tune_config`
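```python
from ray import train, tune
from ray.tune.search.bayesopt import BayesOptSearch

# BayesOptSearch only handles continuous (float) parameters
search_space = {"a": tune.uniform(0.001, 1.0), "b": tune.uniform(1.0, 3.0)}

algo = BayesOptSearch(random_search_steps=4)  # 4 random trials before the model takes over
tuner = tune.Tuner(
    trainable,  # the function trainable defined above
    tune_config=tune.TuneConfig(
        metric="score",
        mode="min",
        search_alg=algo,
    ),
    run_config=train.RunConfig(stop={"training_iteration": 20}),
    param_space=search_space,
)
results = tuner.fit()
```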

### scheduler
- more efficient training/tuning
    - schedulers can stop, pause, or tweak the hyperparameters of running trials, potentially making your hyperparameter tuning process much faster
- allows early stopping
    - examples: Median Stopping Rule, HyperBand, and ASHA (`ASHAScheduler`)
- Tune uses a first-in-first-out (FIFO) scheduler by default
    - simply passes through the trials selected by your search algorithm in the order they were picked and **does not perform any early stopping**
- note: Certain schedulers cannot be used with search algorithms, and certain schedulers require that you implement checkpointing.

### ResultGrid
- `Tuner.fit()` returns a `ResultGrid` object, which has methods for analyzing the tuning run.
```python
# results = tuner.fit()
# Without arguments, get_best_result() uses the metric/mode set in TuneConfig
best_result = results.get_best_result()  # Get best result object
best_config = best_result.config  # Get best trial's hyperparameters
best_logdir = best_result.path  # Get best trial's result directory
best_checkpoint = best_result.checkpoint  # Get best trial's latest checkpoint (None if none saved)
best_metrics = best_result.metrics  # Get best trial's last results
best_result_df = best_result.metrics_dataframe  # Get all of best trial's results as a pandas DataFrame
```

### others
- [passing additional params](https://docs.ray.io/en/latest/tune/faq.html#id10)
    - `tuner = tune.Tuner(tune.with_parameters(f, data=data))`
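- set random seeds: seed both Python's `random` module and NumPy, so sampling is reproducible whichever generator the code draws from
```python
import random

import numpy as np

random.seed(1234)
np.random.seed(5678)
```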
- `ray.tune.TuneConfig`: metric, mode (objective goal: `min` or `max`), search_alg, scheduler, num_samples, time_budget_s
- `ray.tune.RunConfig`: runtime configuration for training and tuning runs (also available as `ray.train.RunConfig`, which the examples here use)
    - storage_path, storage_filesystem, failure_config, checkpoint_config, sync_config, verbose (1=default), stop, callbacks (DeveloperAPI)

### callbacks
- https://docs.ray.io/en/latest/tune/tutorials/tune-metrics.html
- callbacks are called at various points of the tuning process (e.g. each time a trial reports a result)
- pass them to `RunConfig` via `callbacks=[...]`
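```python
from ray import train, tune
from ray.tune import Callback


class MyCallback(Callback):
    # Runs on the driver every time any trial reports a result
    def on_trial_result(self, iteration, trials, trial, result, **info):
        print(f"Got result: {result['metric']}")


def train_func(config):
    for i in range(3):
        train.report({"metric": i})  # the key read by MyCallback


tuner = tune.Tuner(
    train_func,
    run_config=train.RunConfig(
        callbacks=[MyCallback()]
    ),
)
results = tuner.fit()
```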

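## Defining a trainable (LightGBM example)
- the trainable below trains a LightGBM classifier on the breast-cancer dataset and reports its metrics to Tune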

```python
import lightgbm as lgb
import numpy as np
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split

from ray import train, tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.integration.lightgbm import TuneReportCheckpointCallback


def train_breast_cancer(config):
    # Load the data and hold out 25% for validation
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.25)
    train_set = lgb.Dataset(train_x, label=train_y)
    test_set = lgb.Dataset(test_x, label=test_y)
    # `config` holds the LightGBM params sampled by Tune.
    # `verbose_eval` was removed from lgb.train in LightGBM 4.0;
    # use "verbose": -1 in the params to silence logging instead.
    gbm = lgb.train(
        config,
        train_set,
        valid_sets=[test_set],
        valid_names=["eval"],
        callbacks=[
            # Reports validation metrics to Tune (and checkpoints) after each boosting round
            TuneReportCheckpointCallback(
                {
                    "binary_error": "eval-binary_error",
                    "binary_logloss": "eval-binary_logloss",
                }
            )
        ],
    )
    preds = gbm.predict(test_x)
    pred_labels = np.rint(preds)  # round probabilities to 0/1 labels
    train.report(
        {
            "mean_accuracy": sklearn.metrics.accuracy_score(test_y, pred_labels),
            "done": True,
        }
    )
```