# Neptune + Optuna
## Before you start
### Install dependencies

In [None]:
! pip install --quiet optuna==2.7.0 lightgbm==3.2.1 plotly==4.14.3 neptune-client[optuna]==0.9.16

### Create a Neptune project and get your API token (optional)

To log metadata to the Neptune project, you need the `project` name and the `api_token`.

To make this example easy to follow, we have created a public project 'common/optuna-integration' and a shared user 'neptuner' with the API token 'ANONYMOUS'.

```python
run = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration')
```

To log to your Neptune project:
* [Create a Neptune account](https://neptune.ai/register)
* [Find your API token](https://docs.neptune.ai/getting-started/installation#authentication-neptune-api-token)
* [Find your project name](https://docs.neptune.ai/getting-started/installation#setting-the-project-name)
* Pass your credentials to the `project` and `api_token` arguments of `neptune.init()`


```python
run = neptune.init(api_token='<YOUR_API_TOKEN>', project='<YOUR_WORKSPACE/YOUR_PROJECT>') # pass your credentials
```
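
If you would rather not hard-code credentials in a notebook, one option is to read them from environment variables. The sketch below assumes the variable names `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT`, which the Neptune client is documented to recognize. The helper `neptune_credentials` is a hypothetical name introduced only for this illustration, and it falls back to the public demo credentials described above.

```python
import os

# Fallbacks are the public demo credentials mentioned above.
DEFAULT_TOKEN = "ANONYMOUS"
DEFAULT_PROJECT = "common/optuna-integration"


def neptune_credentials(env=None):
    """Return keyword arguments for neptune.init() (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "api_token": env.get("NEPTUNE_API_TOKEN", DEFAULT_TOKEN),
        "project": env.get("NEPTUNE_PROJECT", DEFAULT_PROJECT),
    }


# With no variables set, the public demo project is used.
print(neptune_credentials(env={}))
```

The result could then be unpacked into the call, for example `run = neptune.init(**neptune_credentials())`.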

### Import libraries

In [None]:
import lightgbm as lgb
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

### Create a sample `objective` function for Optuna

In [None]:
def objective(trial):
    data, target = load_breast_cancer(return_X_y=True)
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)

    param = {
        'verbose': -1,
        'objective': 'binary',
        'metric': 'binary_logloss',
        'num_leaves': trial.suggest_int('num_leaves', 2, 256),
        'feature_fraction': trial.suggest_uniform('feature_fraction', 0.2, 1.0),
        'bagging_fraction': trial.suggest_uniform('bagging_fraction', 0.2, 1.0),
        'min_child_samples': trial.suggest_int('min_child_samples', 3, 100),
    }

    gbm = lgb.train(param, dtrain)
    preds = gbm.predict(test_x)
    # roc_auc_score returns the area under the ROC curve, not accuracy
    auc = roc_auc_score(test_y, preds)

    return auc
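
The objective returns the value of `roc_auc_score`, that is, the area under the ROC curve rather than classification accuracy. As a rough illustration of what that number means, the sketch below computes it by hand as the fraction of (positive, negative) pairs in which the positive example gets the higher score. The function `roc_auc` is a hypothetical stand-in written for this example, not the scikit-learn implementation.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75 (3 of the 4 pairs are ordered correctly)
```

Because a higher value is better, the study is created with `direction='maximize'`.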

## Quickstart
### Step 1: Create a Neptune Run

Add a snippet at the top of your script:

In [None]:
import neptune.new as neptune

run = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration') # you can pass your credentials here

Running this cell creates a Run in Neptune, and you can log model-building metadata to it.

**Click the link printed in the cell output to open the Run in the Neptune UI**. The Run is empty for now, but keep the tab open to see what happens next.

### Step 2: Initialize the NeptuneCallback

In [None]:
import neptune.new.integrations.optuna as optuna_utils

neptune_callback = optuna_utils.NeptuneCallback(run)

### Step 3: Pass the NeptuneCallback to the Optuna Study `.optimize()` method

In [None]:
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20, callbacks=[neptune_callback])

While the Optuna study is running, you can watch the metadata being logged live in the Neptune tab.

### Step 4: Stop logging

When you track your ML runs with Neptune in Jupyter notebooks, you need to stop the Run explicitly with `run.stop()`.

If you are running Neptune in regular `.py` scripts, the Run stops automatically when your code finishes.

In [None]:
run.stop()

## More Options

### Customize which plots you want to log and how often

By default, `NeptuneCallback` creates and logs all of the plots from `optuna.visualization`, which adds overhead to your Optuna sweep.
You can choose which plots to create and log, and how often, with:
* the `plots_update_freq` argument: pass an integer k to update plots every k trials, or 'never' to not log any plots
* `log_plot_contour`, `log_plot_slice`, and other `log_{OPTUNA_PLOT_FUNCTION}` arguments: pass `False`, and the corresponding plots will not be created or logged

In [None]:
# Create a Neptune Run
run = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration')  # you can pass your credentials here

# Create a NeptuneCallback for Optuna
neptune_callback = optuna_utils.NeptuneCallback(run,
                                                plots_update_freq=10, # create/log plots every 10 trials
                                                log_plot_slice=False, # do not create/log plot_slice
                                                log_plot_contour=False, # do not create/log plot_contour
                                                )

# Pass NeptuneCallback to Optuna Study .optimize()
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50, callbacks=[neptune_callback])

# Stop logging to a Neptune Run
run.stop()
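
To get a feel for how much plotting work the update frequency can save, the sketch below counts plot refreshes for a sweep of a given length. It assumes a simple "refresh after every k-th completed trial" schedule; this is an illustration of the arithmetic only, and the exact trials on which `NeptuneCallback` regenerates plots may differ. The function `plot_refreshes` is a hypothetical helper.

```python
def plot_refreshes(n_trials, plots_update_freq):
    """Count plot refreshes under an assumed 'every k-th trial' schedule."""
    if plots_update_freq == "never":
        return 0
    if plots_update_freq < 1:
        raise ValueError("plots_update_freq must be a positive integer or 'never'")
    return n_trials // plots_update_freq


# Compare a few settings for a 50-trial sweep.
for freq in (1, 10, "never"):
    print(freq, plot_refreshes(50, freq))
```

Under this assumption, updating every 10 trials in a 50-trial sweep regenerates each enabled plot 5 times instead of 50.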

### Log charts and study object after the sweep

If you want to log study metadata after the Study has finished, you can use the `log_study_metadata()` function.
It logs the same things that `NeptuneCallback` logs, and you can customize what is logged with similar flags.

In [None]:
# Create a Run
run = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration') # you can pass your credentials here

# Run Optuna without a NeptuneCallback
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)

# Log Optuna charts and study object after the sweep is complete
optuna_utils.log_study_metadata(study,
                                run,  
                                log_plot_contour=False)

# Stop logging 
run.stop()

### Load the Optuna Study from an existing Neptune Run

If you logged the Optuna Study to Neptune, you can load the Study directly from the Run with the `load_study_from_run()` function and continue working with it.

It works for both Optuna `InMemoryStorage` and database storage.

First, let's get the ID of the Neptune Run that we have just created in this notebook. You can also use the ID of another Run.

In [None]:
existing_run_id = run['sys/id'].fetch()
print(existing_run_id)

In [None]:
# Fetch an existing Neptune Run
run = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration', # you can pass your credentials here
                   run=existing_run_id) # You can pass Run ID for some other Run

# Load the Optuna Study from the Neptune Run
study = optuna_utils.load_study_from_run(run)

# Create callback to log advanced options during the sweep
neptune_callback = optuna_utils.NeptuneCallback(run)

# Continue logging to the same run
study.optimize(objective, n_trials=10, callbacks=[neptune_callback])

# Stop logging 
run.stop()

### Keep track of both study-level and trial-level Runs

You can log trial-level information to separate Neptune Runs and have a main Run for the study-level information.

**Warning**
The sweep will take longer as each trial-level Run needs to synchronize with Neptune. 

#### Step 1: Create a unique sweep ID

In [None]:
import uuid
sweep_id = str(uuid.uuid1())  # keep the ID as a plain string so it can be logged as a Run field
print('sweep-id: ', sweep_id)
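
Note that `uuid.uuid1()` returns a `uuid.UUID` object rather than a string. Whether such an object can be assigned directly to a Run field may depend on the client version, so converting it to text is the conservative choice. The short sketch below shows the conversion and that it is lossless; the variable names are chosen only for this illustration.

```python
import uuid

sweep_uuid = uuid.uuid1()        # a uuid.UUID object
sweep_id_text = str(sweep_uuid)  # canonical 36-character text form

print(type(sweep_uuid).__name__)
print(len(sweep_id_text))

# The text form can be parsed back into the same UUID.
assert uuid.UUID(sweep_id_text) == sweep_uuid
```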

#### Step 2: Create a study-level Neptune Run

In [None]:
run_study_level = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration')  # you can pass your credentials here

#### Step 3: Log the sweep ID to the study-level Run 

You can also add a tag 'study-level' to distinguish between the study-level and trial-level Runs for the sweep.

In [None]:
run_study_level['sys/tags'].add('study-level')
run_study_level['sweep-id'] = sweep_id

#### Step 4: Create an objective function that logs each trial to Neptune as a Run

Inside the objective function, you need to:
* create a trial-level Neptune Run
* log the sweep ID and a tag 'trial-level' to distinguish between study-level and trial-level Runs
* log parameters and scores to the trial-level Run
* stop the trial-level Run

In [None]:
def objective_with_logging(trial):
    data, target = load_breast_cancer(return_X_y=True)
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)

    param = {
        'verbose': -1,
        'objective': 'binary',
        'metric': 'binary_logloss',
        'num_leaves': trial.suggest_int('num_leaves', 2, 256),
        'feature_fraction': trial.suggest_uniform('feature_fraction', 0.2, 1.0),
        'bagging_fraction': trial.suggest_uniform('bagging_fraction', 0.2, 1.0),
        'min_child_samples': trial.suggest_int('min_child_samples', 3, 100),
    }

    # create a trial-level Run
    run_trial_level = neptune.init(api_token='ANONYMOUS', project='common/optuna-integration')

    # log sweep id to trial-level Run
    run_trial_level['sys/tags'].add('trial-level')
    run_trial_level['sweep-id'] = sweep_id

    # log parameters of a trial-level Run
    run_trial_level['parameters'] = param

    # run model training
    gbm = lgb.train(param, dtrain)
    preds = gbm.predict(test_x)
    # roc_auc_score returns the area under the ROC curve, not accuracy
    auc = roc_auc_score(test_y, preds)

    # log score of a trial-level Run
    run_trial_level['score'] = auc

    # stop trial-level Run
    run_trial_level.stop()

    return auc
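
In the objective above, the trial-level Run is stopped only if training finishes without an exception. One possible way to make sure it is always stopped is to wrap its lifetime in a context manager. The sketch below uses a hypothetical `FakeRun` class as a stand-in for a Neptune Run so that it executes without a Neptune connection; `trial_run`, `make_run`, and `failing_trial` are likewise names invented for this example.

```python
from contextlib import contextmanager


class FakeRun(dict):
    """Stand-in for a Neptune Run (hypothetical): stores fields, tracks stop()."""

    stopped = False

    def stop(self):
        self.stopped = True


@contextmanager
def trial_run(init_run):
    """Create a Run via init_run() and stop it even if the body raises."""
    run = init_run()
    try:
        yield run
    finally:
        run.stop()


created = []


def make_run():
    # Record every Run that is created so it can be inspected afterwards.
    run = FakeRun()
    created.append(run)
    return run


def failing_trial(run):
    # Log one parameter, then simulate a training failure.
    run["parameters/num_leaves"] = 31
    raise RuntimeError("training failed")


try:
    with trial_run(make_run) as run:
        failing_trial(run)
except RuntimeError:
    pass

print(created[0].stopped)  # True: the Run was stopped despite the error
```

With the real client, `init_run` could be something like `lambda: neptune.init(api_token='ANONYMOUS', project='common/optuna-integration')`.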

#### Step 5: Create a study-level NeptuneCallback

In [None]:
neptune_callback = optuna_utils.NeptuneCallback(run_study_level)

#### Step 6: Pass the NeptuneCallback to the `study.optimize()` method and run the parameter sweep

In [None]:
study = optuna.create_study(direction='maximize')
study.optimize(objective_with_logging, n_trials=20, callbacks=[neptune_callback])

#### Step 7: Stop logging to the Neptune Run

In [None]:
run_study_level.stop()

#### Go to the Neptune UI to see your parameter sweep

Now when you go to the Neptune UI, you have:
* all the trial-level Runs logged with `'sys/tags'='trial-level'`
* study-level Run logged with `'sys/tags'='study-level'`

You can use filters to find all the Runs that belong to the 'sweep-id' of the parameter sweep and compare them. You can also look only at the 'study-level' Run to see the high-level picture of the sweep.

To compare sweeps with each other or to find your current sweep, use **Group by**:
* Go to the Runs Table
* Click **+ Group by** in the top right
* Type 'sweep-id' and click on it
* Click **Show all** to see your trials in a separate Table View