# Neptune + XGBoost

## Introduction

This guide will show you how to:

* Initialize Neptune and create a `run`,
* Create a `NeptuneCallback()`,
* Log model training metrics to Neptune using `NeptuneCallback()`.

## Before you start

This notebook example lets you try out Neptune as an anonymous user, with zero setup.

* If you are running the notebook on your local machine, you need to have [Python](https://www.python.org/downloads/) and [pip](https://pypi.org/project/pip/) installed.
* If you want to see the example recorded to your own workspace instead:
    * Create a Neptune account → [Take me to registration](https://neptune.ai/register)
    * Create a Neptune project that you will use for tracking metadata → [Tell me more about projects](https://docs.neptune.ai/administration/projects)

## Install Neptune and dependencies

In [None]:
! pip install graphviz==0.10.1 scikit-learn==1.0.2 neptune-client neptune-xgboost xgboost==1.4.0

## Import libraries

In [None]:
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

## Create run

In [None]:
import neptune.new as neptune

run = neptune.init(
    project="common/xgboost-integration",
    api_token="ANONYMOUS",
    name="xgb-train",
    tags=["xgb-integration", "train"],
)

The link printed above points to the run. Open it and keep the run tab open; you will come back to it when you start model training.

A few explanations:
1. Pass the project name in the `project` parameter to tell Neptune where to log metadata. The project name is a string of the form `my_workspace/my_project`.
1. More parameters are available to customize Neptune's behavior; see the [neptune.init() docs](https://docs.neptune.ai/api-reference/neptune#init) for details.

----

**Note**

Instead of logging data to the public project `"common/xgboost-integration"` as the anonymous user `"neptuner"`, you can log it to your own project.

To do that:
1. Follow the [installation and setup](https://docs.neptune.ai/getting-started/installation) guide, which shows you how to use your individual, private `api_token`.
1. Create a new [private project](https://docs.neptune.ai/administration/workspace-project-and-user-management/projects).
1. Pass this project's name to the `project` parameter instead of `"common/xgboost-integration"`.

At this point you will be ready to log XGBoost runs to your own project :)
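The switch to your own project can be sketched as follows. This is a minimal sketch, not the official setup: the helper `neptune_init_kwargs()` is a hypothetical name introduced here for illustration, and it assumes you export your credentials in the `NEPTUNE_PROJECT` and `NEPTUNE_API_TOKEN` environment variables. When they are not set, it falls back to the public project and the anonymous token used in this notebook.

```python
import os


def neptune_init_kwargs(default_project="common/xgboost-integration"):
    """Hypothetical helper: build the project/api_token arguments for neptune.init()."""
    project = os.getenv("NEPTUNE_PROJECT", default_project)

    # The project name must have the form "my_workspace/my_project".
    workspace, separator, name = project.partition("/")
    if not (workspace and separator and name):
        raise ValueError(f"Expected 'my_workspace/my_project', got {project!r}")

    # Fall back to the anonymous token when no private token is configured.
    api_token = os.getenv("NEPTUNE_API_TOKEN", "ANONYMOUS")
    return {"project": project, "api_token": api_token}


init_kwargs = neptune_init_kwargs()
print("Logging to project:", init_kwargs["project"])

# With Neptune installed, the run would then be created like this:
# run = neptune.init(**init_kwargs, name="xgb-train", tags=["xgb-integration", "train"])
```

Reading the token from the environment keeps it out of the notebook, which matters if you share or commit the file.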

## Create NeptuneCallback()

In [None]:
from neptune.new.integrations.xgboost import NeptuneCallback

neptune_callback = NeptuneCallback(run=run, log_tree=[0, 1, 2, 3])

This callback logs metadata during training. You pass it to the XGBoost `train()` function.

It also works with the `cv()` function and with the scikit-learn-like API of XGBoost (in that case, you pass it to the `model.fit()` function).
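For the scikit-learn-like API mentioned above, a minimal sketch could look like the function below. The name `fit_with_sklearn_api()` is hypothetical, and the hyperparameters simply mirror the ones used later in this notebook. It assumes the `xgboost==1.4.0` version pinned in the install cell, where `fit()` accepts `eval_metric` and `callbacks`; newer XGBoost releases expect these in the estimator constructor instead.

```python
def fit_with_sklearn_api(X_train, y_train, X_test, y_test, callback):
    """Hypothetical helper: train an XGBRegressor and pass the Neptune callback to fit()."""
    # Imported here so that defining the function does not require xgboost yet.
    import xgboost as xgb

    model = xgb.XGBRegressor(
        learning_rate=0.7,
        gamma=0.001,
        max_depth=9,
        objective="reg:squarederror",
        n_estimators=57,
    )
    model.fit(
        X_train,
        y_train,
        eval_metric=["mae", "rmse"],
        eval_set=[(X_train, y_train), (X_test, y_test)],
        callbacks=[callback],  # assumes xgboost 1.4.x, where fit() takes callbacks
    )
    return model
```

You would call it as `fit_with_sklearn_api(X_train, y_train, X_test, y_test, neptune_callback)` once the data split and the callback exist.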

----

Note that with `log_tree=[0, 1, 2, 3]`, the callback logs the trees with indices 0, 1, 2, and 3.

## Prepare data and define parameters

In [None]:
# Prepare data
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_test, label=y_test)

# Define parameters
model_params = {
    "eta": 0.7,
    "gamma": 0.001,
    "max_depth": 9,
    "objective": "reg:squarederror",
    "eval_metric": ["mae", "rmse"],
}
evals = [(dtrain, "train"), (dval, "valid")]
num_round = 57

## Train the model and log metadata to the run in Neptune

In [None]:
xgb.train(
    params=model_params,
    dtrain=dtrain,
    num_boost_round=num_round,
    evals=evals,
    callbacks=[
        neptune_callback,
        xgb.callback.LearningRateScheduler(lambda epoch: 0.99**epoch),
        xgb.callback.EarlyStopping(rounds=30),
    ],
)

This cell trains the model and logs metadata to Neptune.
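To see which learning rates the training cell uses, the schedule passed to `LearningRateScheduler` can be evaluated on its own. This is a plain-Python sketch that does not need XGBoost; it assumes the scheduler sets the learning rate to the function's return value in every round, which would mean the `"eta": 0.7` entry in `model_params` is overridden.

```python
def learning_rate(epoch: int) -> float:
    """Same schedule as the lambda passed to LearningRateScheduler."""
    return 0.99**epoch


num_round = 57  # same value as in the parameters cell

# One learning rate per boosting round, starting at round 0.
schedule = [learning_rate(epoch) for epoch in range(num_round)]

print(f"round 0: {schedule[0]:.4f}")
print(f"round 1: {schedule[1]:.4f}")
print(f"round {num_round - 1}: {schedule[-1]:.4f}")
```

Because `EarlyStopping(rounds=30)` is also active, training may end before the last round of this schedule is reached.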

## Stop logging

<font color=red>**Warning:**</font><br>
Once you are done logging, stop tracking the run using the `stop()` method.
This is only needed when logging from a notebook environment. When logging from a script, Neptune automatically stops tracking once the script finishes executing.


In [None]:
run.stop()
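If training can fail partway through, one way to make sure `stop()` is still called is a `try`/`finally` block. The sketch below illustrates the pattern only: `StandInRun` is a hypothetical stand-in for a Neptune run so the example works without a connection, and `train_and_always_stop()` is an illustrative helper, not part of Neptune.

```python
class StandInRun:
    """Hypothetical stand-in for a Neptune run; it only mimics stop()."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def train_and_always_stop(run, train_fn):
    """Call train_fn() and stop the run afterwards, even if training raises."""
    try:
        return train_fn()
    finally:
        run.stop()


demo_run = StandInRun()
result = train_and_always_stop(demo_run, lambda: "trained")
print("run stopped:", demo_run.stopped)
```

In the notebook, you would pass the real `run` and a function that wraps the `xgb.train(...)` call.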

## Analyze logged metadata in the Neptune app

Go to the run link and explore the metadata (metrics, all parameters, learning rate, pickled model, visualizations) that was logged to the run in Neptune.

The link should look like this:

https://app.neptune.ai/o/common/org/xgboost-integration/e/XGBOOST-84/all?path=training
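If you want to pull the workspace, project, and run ID out of such a link, for example to note them next to your results, a small sketch with the standard library could look like this. The URL layout is inferred from the example link above only and may differ in other versions of the Neptune app; `parse_run_url()` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def parse_run_url(url):
    """Hypothetical helper: split a run link shaped like the example above."""
    parsed = urlparse(url)
    parts = [part for part in parsed.path.split("/") if part]

    # Layout assumed from the example: o/<workspace>/org/<project>/e/<run_id>/all
    return {
        "workspace": parts[parts.index("o") + 1],
        "project": parts[parts.index("org") + 1],
        "run_id": parts[parts.index("e") + 1],
        "path": parse_qs(parsed.query).get("path", [None])[0],
    }


info = parse_run_url(
    "https://app.neptune.ai/o/common/org/xgboost-integration/e/XGBOOST-84/all?path=training"
)
print(info)
```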