# Neptune + Optuna
## Before you start
### Install dependencies

In [9]:
! pip install --quiet optuna==2.7.0 lightgbm==3.2.1 plotly==4.14.3 "neptune-client[optuna]==0.9.12"

### Create a Neptune project and get your API token (optional)

To log metadata to the Neptune project you need the `project` name and the `api_token`.

To make this example easy to follow, we have created a public project 'common/optuna-integration' and a public user 'neptuner' who has the API token 'ANONYMOUS'.

```python
run = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration')
```

To log to your Neptune project:
* [Create a Neptune account](https://neptune.ai/register)
* [Find your API token](https://docs.neptune.ai/getting-started/installation#authentication-neptune-api-token)
* [Find your project name](https://docs.neptune.ai/getting-started/installation#setting-the-project-name)
* Pass your credentials to the `project` and `api_token` arguments of `neptune.init()`


```python
run = neptune.init(api_token='<YOUR_API_TOKEN>', project='<YOUR_WORKSPACE/YOUR_PROJECT>') # pass your credentials
```
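
If you would rather not hard-code the token in a notebook, one option is to read the credentials from environment variables. The sketch below assumes the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` variable names and falls back to the public example project from this guide. `resolve_credentials` is a hypothetical helper written for this example, not part of the Neptune API.

```python
import os

# Public demo values from this example; used only when nothing else is configured.
DEFAULT_TOKEN = 'ANONYMOUS'
DEFAULT_PROJECT = 'common/optuna-integration'


def resolve_credentials(env=None):
    """Return keyword arguments for neptune.init(), preferring environment variables."""
    env = os.environ if env is None else env
    return {
        'api_token': env.get('NEPTUNE_API_TOKEN', DEFAULT_TOKEN),
        'project': env.get('NEPTUNE_PROJECT', DEFAULT_PROJECT),
    }


credentials = resolve_credentials()
# run = neptune.init(**credentials)  # uncomment once neptune is imported
```

This keeps the same cell usable for both the anonymous demo and your own project, depending on what is set in the environment.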

### Import libraries

In [10]:
import lightgbm as lgb
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

### Create a sample `objective` function for Optuna

In [11]:
def objective(trial):
    data, target = load_breast_cancer(return_X_y=True)
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)

    param = {
        'verbose': -1,
        'objective': 'binary',
        'metric': 'binary_logloss',
        'num_leaves': trial.suggest_int('num_leaves', 2, 256),
        'feature_fraction': trial.suggest_uniform('feature_fraction', 0.2, 1.0),
        'bagging_fraction': trial.suggest_uniform('bagging_fraction', 0.2, 1.0),
        'min_child_samples': trial.suggest_int('min_child_samples', 3, 100),
    }

    gbm = lgb.train(param, dtrain)
    preds = gbm.predict(test_x)
    accuracy = roc_auc_score(test_y, preds)

    return accuracy

## Quickstart
### Step 1: Create a Neptune Run

Add a snippet at the top of your script:

In [13]:
import neptune.new as neptune

run = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration') # you can pass your credentials here

https://app.neptune.ai/common/optuna-integration/e/NEP1-106


This creates a Run in Neptune and you can log model building metadata to it. 

**Click on the link above** to open the Run in Neptune UI.
For now it is empty but keep the tab open to see what happens next. 

### Step 2: Initialize the NeptuneCallback

In [14]:
import neptune.new.integrations.optuna as optuna_utils

neptune_callback = optuna_utils.NeptuneCallback(run)

### Step 3: Pass the NeptuneCallback to Optuna Study `.optimize()` method

In [15]:
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100, callbacks=[neptune_callback])

[I 2021-05-21 08:35:37,505] A new study created in memory with name: no-name-ecbf6748-9d47-410a-aeda-f3714a9f4468
[I 2021-05-21 08:35:37,642] Trial 0 finished with value: 0.9939759036144578 and parameters: {'num_leaves': 65, 'feature_fraction': 0.8660861789274954, 'bagging_fraction': 0.815429197613698, 'min_child_samples': 52}. Best is trial 0 with value: 0.9939759036144578.
[W 2021-05-21 08:35:37,875] Param bagging_fraction unique value length is less than 2.
[W 2021-05-21 08:35:37,888] Param bagging_fraction unique value length is less than 2.
[W 2021-05-21 08:35:37,899] Param bagging_fraction unique value length is less than 2.
[W 2021-05-21 08:35:37,915] Param feature_fraction unique value length is less than 2.
[W 2021-05-21 08:35:37,922] Param feature_fraction unique value length is less than 2.
[W 2021-05-21 08:35:37,926] Param feature_fraction unique value length is less than 2.

MetadataInconsistency: Attribute or namespace best/trials/0/datetime_start is already defined

While the Optuna study is running, you can watch the metadata being logged live in the Neptune tab you opened earlier.
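
`NeptuneCallback` plugs into Optuna's generic callback mechanism: `study.optimize()` calls every object in `callbacks` with the study and the trial that has just finished. As a rough illustration of that protocol (not of how `NeptuneCallback` is implemented), here is a hypothetical `BestValueTracker` callback. The stub objects at the bottom only stand in for Optuna's `Study` and `FrozenTrial` so the snippet runs without a study.

```python
from types import SimpleNamespace


class BestValueTracker:
    """Hypothetical callback: remembers the best value after every finished trial."""

    def __init__(self):
        self.history = []  # (trial number, best value so far)

    def __call__(self, study, trial):
        # Optuna invokes callbacks as callback(study, frozen_trial).
        self.history.append((trial.number, study.best_value))


tracker = BestValueTracker()

# Stand-ins for optuna.study.Study and optuna.trial.FrozenTrial (illustration only).
fake_study = SimpleNamespace(best_value=0.99)
fake_trial = SimpleNamespace(number=0, value=0.99)
tracker(fake_study, fake_trial)

# With a real study, it could be passed next to the Neptune callback:
# study.optimize(objective, n_trials=100, callbacks=[neptune_callback, tracker])
```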

### Step 4: Stop logging

When you track your ML runs with Neptune in Jupyter notebooks, you need to explicitly stop the Run by calling `run.stop()`.

If you are running Neptune in regular `.py` scripts, it will stop automatically when your code finishes.

In [16]:
run.stop()

Shutting down background jobs, please wait a moment...
Done!


Waiting for the remaining 1 operations to synchronize with Neptune. Do not kill this process.


All 1 operations synced, thanks for waiting!


## More Options

### Customize which plots you want to log and how often

By default, `NeptuneCallback` creates and logs all of the plots from `optuna.visualization`, which adds overhead to your Optuna sweep.

You can decide which plots to create and log, and how often, with:
* `plots_update_freq` argument: pass an integer k to update plots every k trials, or 'never' to not log any plots
* `log_plot_contour`, `log_plot_slice`, and other `log_{OPTUNA_PLOT_FUNCTION}` arguments: pass `False` and the plots will not be created or logged
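
To make the "every k trials" semantics concrete, here is a small sketch of how such a frequency setting could be interpreted. `should_update_plots` is a hypothetical function written for illustration. It is an assumption about the behaviour, not Neptune's actual implementation, and the exact trial on which plots are refreshed may differ.

```python
def should_update_plots(trial_number, update_freq):
    """Illustrative only: decide whether plots are refreshed after this trial."""
    if update_freq == 'never':
        return False
    # Refresh on every k-th finished trial (trial numbers start at 0).
    return (trial_number + 1) % update_freq == 0


# With a frequency of 10 and 30 trials, plots would be rebuilt three times.
refreshed = [n for n in range(30) if should_update_plots(n, 10)]
```

Under this reading, a sweep of 30 trials with a frequency of 10 rebuilds the plots 3 times instead of 30, which is where the overhead saving comes from.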

In [22]:
# Create a Neptune Run
run = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration')  # you can pass your credentials here

# Create a NeptuneCallback for Optuna
neptune_callback = optuna_utils.NeptuneCallback(run,
                                                plots_update_freq=10, # create/log plots every 10 trials
                                                log_plot_slice=False, # do not create/log plot_slice
                                                log_plot_contour=False, # do not create/log plot_contour
                                                )

# Pass NeptuneCallback to Optuna Study .optimize()
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50, callbacks=[neptune_callback])

# Stop logging to a Neptune Run
run.stop()

https://app.neptune.ai/common/optuna-integration/e/NEP1-109


[I 2021-05-21 10:26:51,862] A new study created in memory with name: no-name-63e8ab26-bd8a-4965-83ec-5e380fe9f999
[I 2021-05-21 10:26:52,018] Trial 0 finished with value: 0.9922920892494929 and parameters: {'num_leaves': 35, 'feature_fraction': 0.9583818518915357, 'bagging_fraction': 0.5334911778535456, 'min_child_samples': 38}. Best is trial 0 with value: 0.9922920892494929.
[I 2021-05-21 10:26:52,186] Trial 1 finished with value: 0.9987603305785124 and parameters: {'num_leaves': 148, 'feature_fraction': 0.5956982136436998, 'bagging_fraction': 0.986086607980867, 'min_child_samples': 20}. Best is trial 1 with value: 0.9987603305785124.
[I 2021-05-21 10:26:52,283] Trial 2 finished with value: 0.9915289256198346 and parameters: {'num_leaves': 157, 'feature_fraction': 0.4184317266747109, 'bagging_fraction': 0.9200950216295636, 'min_child_samples': 34}. Best is trial 1 with value: 0.9987603305785124.


MetadataInconsistency: Attribute or namespace best/trials/1/datetime_start is already defined

### Log charts and study object after the sweep

If you want to log study metadata after the study has finished, you can use the `log_study_metadata()` function.

`log_study_metadata()` logs the same things that `NeptuneCallback` logs, and you can customize what is logged with similar flags.

In [24]:
# Create a Run
run = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration') # you can pass your credentials here

# Run Optuna without a callback
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

# Log Optuna charts and study object after the sweep is complete
optuna_utils.log_study_metadata(study,
                                run,  # the Run to log to is a required argument
                                log_plot_contour=False)

# Stop logging
run.stop()

https://app.neptune.ai/common/optuna-integration/e/NEP1-110


[I 2021-05-21 10:40:59,643] A new study created in memory with name: no-name-90401725-86d0-4495-a95d-c1095f2e1a62
[I 2021-05-21 10:40:59,798] Trial 0 finished with value: 0.9829614604462474 and parameters: {'num_leaves': 59, 'feature_fraction': 0.8469508681977751, 'bagging_fraction': 0.5124880900926294, 'min_child_samples': 21}. Best is trial 0 with value: 0.9829614604462474.
[I 2021-05-21 10:40:59,858] Trial 1 finished with value: 0.991860465116279 and parameters: {'num_leaves': 49, 'feature_fraction': 0.930201852813815, 'bagging_fraction': 0.27941364238065014, 'min_child_samples': 96}. Best is trial 1 with value: 0.991860465116279.
[I 2021-05-21 10:40:59,922] Trial 2 finished with value: 0.9913832199546485 and parameters: {'num_leaves': 28, 'feature_fraction': 0.20698658998844552, 'bagging_fraction': 0.7953617478717263, 'min_child_samples': 78}. Best is trial 1 with value: 0.991860465116279.
[I 2021-05-21 10:40:59,982] Tria

### Load an Optuna Study

If you logged the Optuna Study to Neptune, you can load the Study directly from the Run with the `load_study_from_run()` function and continue working with it.

This works for Optuna studies that use `InMemoryStorage` as well as for studies that use database storage.

In [None]:
# Fetch an existing Neptune Run
run = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration', run='') # set `run` to the ID of an existing Run; you can pass your credentials here

# Load the Optuna Study that was logged to that Run
study = optuna_utils.load_study_from_run(run)

# Create callback to log advanced options during the sweep
neptune_callback = optuna_utils.NeptuneCallback(run)

# Continue logging to the same run
study.optimize(objective, n_trials=30, callbacks=[neptune_callback])

# Stop logging 
run.stop()

### Keep track of both study-level and trial-level Runs

If you want to log every trial as a separate Run, in addition to the study-level Run, follow the steps below.

#### Step 1: Create a unique sweep ID

In [None]:
import uuid
sweep_id = str(uuid.uuid1())  # store the ID as a string so it can be logged as a text field
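
The sweep ID only needs to be unique and loggable as text. As a side note, `uuid.uuid1()` builds the ID from the host identifier and a timestamp, while `uuid.uuid4()` uses random bits. The sketch below shows both; which one to use is a design choice, and the variable names are illustrative.

```python
import uuid

# uuid1: built from the host ID and a timestamp; uuid4: built from random bits.
time_based_id = str(uuid.uuid1())
random_id = str(uuid.uuid4())

# Both render as 36-character strings that can be logged as plain text fields.
print(time_based_id, random_id)
```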

#### Step 2: Create a study-level Neptune Run

In [None]:
run_study_level = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration')  # you can pass your credentials here

#### Step 3: Attach the sweep ID to the study-level Run

You can also add the tag 'study-level' to distinguish between the study-level and trial-level Runs of the sweep.

In [None]:
run_study_level['sys/tags'].add('study-level')  # tags are a set, so add to it rather than assign
run_study_level['sweep-id'] = sweep_id

#### Step 4: Create an objective function that logs each trial to Neptune as a Run

Inside of the objective function you need to:
* create a trial-level Neptune Run
* log the sweep ID and a tag 'trial-level' to distinguish between study-level and trial-level Runs
* log parameters and scores to the trial-level Run
* stop the trial-level Run

In [None]:
def objective_with_logging(trial):
    data, target = load_breast_cancer(return_X_y=True)
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)

    param = {
        'verbose': -1,
        'objective': 'binary',
        'metric': 'binary_logloss',
        'num_leaves': trial.suggest_int('num_leaves', 2, 256),
        'feature_fraction': trial.suggest_uniform('feature_fraction', 0.2, 1.0),
        'bagging_fraction': trial.suggest_uniform('bagging_fraction', 0.2, 1.0),
        'min_child_samples': trial.suggest_int('min_child_samples', 3, 100),
    }

    # create a trial-level Run
    run_trial_level = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration')

    # log a tag and the sweep ID to the trial-level Run
    run_trial_level['sys/tags'].add('trial-level')  # tags are a set, so add to it rather than assign
    run_trial_level['sweep-id'] = sweep_id

    # log parameters of a trial-level Run
    run_trial_level['parameters'] = param

    # run model training
    gbm = lgb.train(param, dtrain)
    preds = gbm.predict(test_x)
    accuracy = roc_auc_score(test_y, preds)

    # log score of a trial-level Run
    run_trial_level['score'] = accuracy

    # stop trial-level Run
    run_trial_level.stop()

    return accuracy
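
One caveat with an objective like this: if training raises an exception, the final `stop()` call is never reached and the trial-level Run stays open. A possible way to guard against that is a small context manager that always stops the Run. `stopping` and `FakeRun` below are hypothetical names introduced for this sketch; `FakeRun` merely stands in for a Neptune Run so the example runs without a connection.

```python
from contextlib import contextmanager


@contextmanager
def stopping(run):
    """Hypothetical helper: yield the run and always call run.stop() afterwards."""
    try:
        yield run
    finally:
        run.stop()


class FakeRun(dict):
    """Stand-in for a Neptune Run so the sketch runs without a connection."""

    stopped = False

    def stop(self):
        self.stopped = True


fake = FakeRun()
try:
    with stopping(fake) as run_trial_level:
        run_trial_level['score'] = 0.5
        raise RuntimeError('training failed')  # simulate a crash mid-trial
except RuntimeError:
    pass

print(fake.stopped)  # True: stop() ran even though the body raised
```

Inside a real objective function, the body of the `with` block would hold the logging and training code, and the explicit `stop()` call at the end would no longer be needed.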

#### Step 5: Create a study-level NeptuneCallback

In [None]:
neptune_callback = optuna_utils.NeptuneCallback(run_study_level)

#### Step 6: Pass the NeptuneCallback to the `study.optimize()` method and run the parameter sweep

In [None]:
study = optuna.create_study(direction='maximize')
study.optimize(objective_with_logging, n_trials=100, callbacks=[neptune_callback])

#### Step 7: Stop logging to the Neptune Run

In [20]:
run_study_level.stop()

https://app.neptune.ai/common/optuna-integration/e/NEP1-108
d3b29b80-ba01-11eb-ab7c-0f3c3f5a9153


# Go to the Neptune UI to see your parameter sweep

Now when you go to the Neptune UI you have:
* all the trial-level Runs logged with `'sys/tags'='trial-level'`
* study-level Run logged with `'sys/tags'='study-level'`

You can use filters to find all the Runs that belong to the 'sweep-id' of the parameter sweep and compare them.
You can also look only at the 'study-level' Run to see the high-level picture of the sweep.
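
The same filtering logic can be sketched in plain Python. The example below assumes you already have the Runs' metadata as a list of dictionaries, for instance exported from the runs table; the rows, IDs, and the `runs_in_sweep` helper are made up for illustration, and only the `sweep-id`, `sys/tags`, and `score` field names come from the steps above.

```python
def runs_in_sweep(rows, sweep_id, tag=None):
    """Return rows that belong to sweep_id, optionally restricted to one tag."""
    selected = []
    for row in rows:
        if row.get('sweep-id') != sweep_id:
            continue
        if tag is not None and tag not in row.get('sys/tags', ()):
            continue
        selected.append(row)
    return selected


# Made-up rows for illustration; real values come from your own project.
rows = [
    {'id': 'A-1', 'sweep-id': 's1', 'sys/tags': ['study-level']},
    {'id': 'A-2', 'sweep-id': 's1', 'sys/tags': ['trial-level'], 'score': 0.98},
    {'id': 'A-3', 'sweep-id': 's1', 'sys/tags': ['trial-level'], 'score': 0.99},
    {'id': 'A-4', 'sweep-id': 's2', 'sys/tags': ['trial-level'], 'score': 0.97},
]

# Keep only the trial-level Runs of one sweep, then pick the best-scoring one.
trials = runs_in_sweep(rows, 's1', tag='trial-level')
best = max(trials, key=lambda row: row['score'])
```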