# Neptune + LightGBM

## Install dependencies

In [None]:
! pip install neptune-client lightgbm==3.2.1 neptune-lightgbm psutil==5.8.0 graphviz==0.16

## Import libraries

In [None]:
import lightgbm as lgb
import neptune.new as neptune
import numpy as np
from neptune.new.integrations.lightgbm import NeptuneCallback, create_booster_summary
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

## Create run

In [None]:
run = neptune.init(
    project="common/lightgbm-integration",
    api_token="ANONYMOUS",
    name="train-cls",
    tags=["lgbm-integration", "train", "cls", "notebook"]
)

The output above includes a link to the run. Open it and keep the tab open; you will come back to it once model training starts.

A few notes:
1. Pass your project name to the `project` parameter to tell Neptune where to log metadata. The project name is a string of the form `my_workspace/my_project`.
1. More parameters are available to customize Neptune's behavior; see the [neptune.init() docs](https://docs.neptune.ai/api-reference/neptune#init) for details.

----

**Note**

Instead of logging data to the public project `"common/lightgbm-integration"` as the anonymous user `"neptuner"`, you can log it to your own project.

To do that:
1. Follow the [installation and setup](https://docs.neptune.ai/getting-started/installation) guide, which shows how to use your own private `api_token`.
1. Create a new [private project](https://docs.neptune.ai/administration/workspace-project-and-user-management/projects).
1. Pass that project name to the `project` parameter instead of `"common/lightgbm-integration"`.

At this point you will be ready to log LightGBM runs to your own project :)
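A minimal sketch of switching to your own project without hard-coding credentials in the notebook. It assumes you have exported the `NEPTUNE_PROJECT` and `NEPTUNE_API_TOKEN` environment variables; `resolve_neptune_settings` is a hypothetical helper introduced here for illustration, and it falls back to the public project and anonymous token used in this notebook.

```python
import os


def resolve_neptune_settings(env=None):
    """Build keyword arguments for neptune.init() (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # Fall back to the public project and anonymous token from this notebook.
        "project": env.get("NEPTUNE_PROJECT", "common/lightgbm-integration"),
        "api_token": env.get("NEPTUNE_API_TOKEN", "ANONYMOUS"),
    }


# Placeholder values; replace them with your own workspace, project, and token.
settings = resolve_neptune_settings(
    {"NEPTUNE_PROJECT": "my_workspace/my_project", "NEPTUNE_API_TOKEN": "<your-api-token>"}
)
# run = neptune.init(**settings)  # uncomment in the notebook
```

Keeping the token in an environment variable means it does not end up in a shared or committed notebook.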

## Create Neptune callback

In [None]:
neptune_callback = NeptuneCallback(run=run)

This callback logs metadata during training. You pass it to the LightGBM `train()` function.

It also works with the `cv()` function and with LightGBM's scikit-learn API (in that case, pass it to the `fit()` method).

## Prepare data and define parameters

In [None]:
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# Define parameters
params = {
    "boosting_type": "gbdt",
    "objective": "multiclass",
    "num_class": 10,
    "metric": ["multi_logloss", "multi_error"],
    "num_leaves": 21,
    "learning_rate": 0.05,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "max_depth": 12,
}

## Train the model

In [None]:
gbm = lgb.train(
    params,
    lgb_train,
    num_boost_round=200,
    valid_sets=[lgb_train, lgb_eval],
    valid_names=["training", "validation"],
    callbacks=[neptune_callback],
)

This cell trains the model and logs metadata to Neptune.

## Log summary metadata to the same run under the "lgbm_summary" namespace

In [None]:
y_pred = np.argmax(gbm.predict(X_test), axis=1)

# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_trees=True,
    list_trees=[0, 1, 2, 3, 4],
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test
)
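The `np.argmax(..., axis=1)` step in the cell above is needed because, with the `multiclass` objective, `predict()` returns one probability per class for each sample, while the confusion matrix needs a single label per sample. A small sketch with made-up probabilities (the arrays below are illustrative, not model output):

```python
import numpy as np

# Made-up class probabilities for 3 samples and 3 classes (illustrative only).
proba = np.array(
    [
        [0.7, 0.2, 0.1],
        [0.1, 0.3, 0.6],
        [0.2, 0.5, 0.3],
    ]
)

# Pick the most probable class in each row.
y_pred_demo = np.argmax(proba, axis=1)

# Compare against made-up true labels.
y_true_demo = np.array([0, 2, 2])
accuracy = float(np.mean(y_pred_demo == y_true_demo))
```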

## Analyze logged metadata in the Neptune App

Go to the run link and explore the metadata (parameters, metrics, visualizations, pickled model) that was logged to the run in Neptune.

The link should look like this:

https://app.neptune.ai/o/common/org/lightgbm-integration/e/LGBM-86/all
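If you want to reconstruct such a link yourself, the sketch below follows the pattern of the example link above. The function `run_url` is a hypothetical helper, and the URL layout is an assumption inferred from that single example, so treat the link printed by `neptune.init()` as the authoritative one.

```python
def run_url(project, run_id, base="https://app.neptune.ai"):
    """Build a run link from 'workspace/project' and a run ID (hypothetical helper)."""
    workspace, project_name = project.split("/", 1)
    return f"{base}/o/{workspace}/org/{project_name}/e/{run_id}/all"


url = run_url("common/lightgbm-integration", "LGBM-86")
```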

## Stop logging

<font color=red>**Warning:**</font><br>
Once you are done logging, stop tracking the run using the `stop()` method.
This is needed only when logging from a notebook environment. When logging from a script, Neptune automatically stops tracking once the script finishes executing.

In [None]:
run.stop()
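If the logging code may raise an error before you reach the `stop()` cell, one option is to wrap it in `try`/`finally` so the run is always stopped. The sketch below uses `FakeRun`, a hypothetical stand-in that only records whether `stop()` was called, so it runs without a Neptune connection; `log_with_cleanup` is likewise an illustrative name.

```python
class FakeRun:
    """Hypothetical stand-in for a Neptune run; only tracks stop()."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def log_with_cleanup(run, work):
    """Call work(run) and stop the run afterwards, even if work raises."""
    try:
        return work(run)
    finally:
        run.stop()


demo_run = FakeRun()
try:
    # Simulate a failure in the middle of training or logging.
    log_with_cleanup(demo_run, lambda r: 1 / 0)
except ZeroDivisionError:
    pass
```

With a real run, you would pass the object returned by `neptune.init()` in place of `FakeRun()`.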