# How to track models end-to-end in Neptune

<a target="_blank" href="https://colab.research.google.com/github/neptune-ai/examples/blob/main/how-to-guides/e2e-tracking/notebooks/e2e_tracking.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>
<a target="_blank" href="https://github.com/neptune-ai/examples/blob/main/how-to-guides/e2e-tracking">
  <img alt="Open in GitHub" src="https://img.shields.io/badge/Open_in_GitHub-blue?logo=github&labelColor=black">
</a>
<a target="_blank" href="https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/table?viewId=9cb5bc7c-3bce-4c69-8f5c-90d3d9cc682c"> 
  <img alt="Explore in Neptune" src="https://neptune.ai/wp-content/uploads/2024/01/neptune-badge.svg">
</a>
<a target="_blank" href="https://docs.neptune.ai/tutorials/e2e_tracking/">
  <img alt="View tutorial in docs" src="https://neptune.ai/wp-content/uploads/2024/01/docs-badge-2.svg">
</a>

## Introduction

This notebook shows how you can use Neptune to track a model across all stages of its lifecycle by
* **Logging model and run metadata to a central project**
* **Grouping models by their stage**
* **Comparing models to select the best performing model**
* **Monitoring a model once in production**

You will learn how to use the Neptune web app to explore run metadata, compare runs, and manually promote a model. You can also use this notebook as a template for an automated end-to-end pipeline that covers a model's entire lifecycle without any manual intervention.

This example uses Optuna hyperparameter optimization to simulate training and evaluating multiple XGBoost models, and Evidently to monitor models in production. However, given Neptune's flexibility and [multiple integrations](https://docs.neptune.ai/integrations/), you can use any library and framework of your choice.

## Before you start

This notebook example lets you try out Neptune anonymously, with zero setup.

If you want to see the example logged to your own workspace instead:

  1. Create a Neptune account. [Register &rarr;](https://neptune.ai/register)
  1. Create a Neptune project that you will use for tracking metadata. For instructions, see [Creating a project](https://docs.neptune.ai/setup/creating_project) in the Neptune docs.

## Install Neptune and dependencies

In [None]:
! pip install -qU "neptune[xgboost,optuna]" xgboost scikit-learn optuna evidently matplotlib

In [None]:
# To fix the `RuntimeError: main thread is not in main loop` error on Windows
import matplotlib.pyplot as plt

plt.switch_backend("agg")

## Track model training

In this section, we'll use the following:
- [Optuna](https://optuna.org/) to train multiple [XGBoost](https://xgboost.readthedocs.io/en/stable/) regression models,
- Neptune's [XGBoost](https://docs.neptune.ai/integrations/xgboost/) and [Optuna](https://docs.neptune.ai/integrations/optuna/) integrations to automatically log metadata and metrics to Neptune for easy run visualization and comparison.

### Prepare the dataset

In [None]:
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

data, target = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.25)
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_test, label=y_test)

evals = [(dtrain, "train"), (dval, "valid")]
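As a quick sanity check on the split, here is a sketch of the size arithmetic. It assumes scikit-learn's documented rule that a float `test_size` is rounded up (`ceil(test_size * n_samples)`) and the remainder goes to training. The California housing dataset has 20,640 samples.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a float test_size, rounding the test set up."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# California housing has 20,640 rows; test_size=0.25 as in the cell above
n_train, n_test = split_sizes(20640, 0.25)
print(n_train, n_test)
```

This means roughly 15k rows feed `dtrain` and about 5k feed `dval`, which is the `valid` set that early stopping monitors.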

### Set up Neptune environment variables

To connect to the Neptune app, you need to tell Neptune who you are (`api_token`) and where to send the data (`project`).

You can use the default code cell below to create an anonymous run in the public project [common/e2e-tracking](https://app.neptune.ai/common/e2e-tracking). **Note**: Public projects are cleaned regularly, so anonymous runs are only stored temporarily.

#### Log to your own project instead

Replace the code below with the following:

```python
import os
import neptune
from getpass import getpass

os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"  # replace with your own
```

In [None]:
import os
import neptune

os.environ["NEPTUNE_API_TOKEN"] = neptune.ANONYMOUS_API_TOKEN
os.environ["NEPTUNE_PROJECT"] = "common/e2e-tracking"
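Optionally, you can fail fast on a missing or malformed configuration before creating any runs. `check_neptune_env` below is a hypothetical helper, not part of the Neptune client. It only checks that a token is present and that the project has the `workspace-name/project-name` shape. It reads from any mapping, so you can try it on a plain dict without touching your real environment.

```python
import os
from collections.abc import Mapping


def check_neptune_env(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: validate Neptune settings and return the project name."""
    token = env.get("NEPTUNE_API_TOKEN")
    project = env.get("NEPTUNE_PROJECT")
    if not token:
        raise RuntimeError("NEPTUNE_API_TOKEN is not set")
    # Expect exactly one "/" with non-empty workspace and project parts
    if not project or project.count("/") != 1 or "" in project.split("/"):
        raise RuntimeError(
            f"NEPTUNE_PROJECT must look like 'workspace-name/project-name', got {project!r}"
        )
    return project


# Example on a plain dict, so the real environment stays untouched
print(check_neptune_env({"NEPTUNE_API_TOKEN": "dummy", "NEPTUNE_PROJECT": "common/e2e-tracking"}))
```

Calling `check_neptune_env()` with no argument validates the variables set in the cell above.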

### Create the Optuna objective function
We'll create trial-level runs inside the objective function to capture trial-level metadata using Neptune's XGBoost integration.

In [None]:
def objective(trial):
    from neptune.integrations.xgboost import NeptuneCallback

    # Define model parameters
    model_params = {
        "max_depth": trial.suggest_int("max_depth", 0, 10),
        "min_child_weight": trial.suggest_int("min_child_weight", 2, 9),
        "learning_rate": trial.suggest_float("learning_rate", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 0.75),
        "subsample": trial.suggest_float("subsample", 0.4, 1.0),
        "objective": "reg:squarederror",
        "eval_metric": ["mae", "rmse"],
    }

    # Define training parameters
    train_params = {
        "num_boost_round": trial.suggest_int("num_boost_round", 10, 50),
    }

    # Create a trial-level run
    run_trial_level = neptune.init_run(
        capture_hardware_metrics=True,
        capture_stderr=True,
        capture_stdout=True,
        capture_traceback=True,
        tags=["notebook", "trial-level"],
    )

    # Log study name and trial number to trial-level run
    run_trial_level["study-name"] = str(study.study_name)
    run_trial_level["trial-number"] = trial.number

    # Log training parameters of a trial-level run
    run_trial_level["training/parameters"] = train_params

    # Model parameters are logged automatically by the NeptuneCallback

    # Create NeptuneCallback to log trial-level metadata
    neptune_xgb_callback = NeptuneCallback(run=run_trial_level)

    # Train the model and log trial-level metadata to the trial-level run
    model = xgb.train(
        params=model_params,
        dtrain=dtrain,
        num_boost_round=train_params["num_boost_round"],
        evals=evals,
        verbose_eval=False,
        callbacks=[
            neptune_xgb_callback,
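            # Decay the Optuna-suggested learning rate by 1% per boosting round
            xgb.callback.LearningRateScheduler(
                lambda epoch: model_params["learning_rate"] * 0.99**epoch
            ),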
            xgb.callback.EarlyStopping(rounds=10, save_best=True, maximize=False),
        ],
    )

    # Use group tags to identify the stage of the model
    run_trial_level["sys/group_tags"].add(["development"])

    # Stop trial-level run
    run_trial_level.stop()

    return model.best_score
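`xgb.callback.LearningRateScheduler` accepts a callable that receives the index of the current boosting round and returns the learning rate to use for that round. Whatever it returns replaces the `learning_rate` value set in the parameters. The sketch below just tabulates an exponential-decay schedule in plain Python; `exponential_decay` is a hypothetical helper and `lr0` stands for an assumed starting rate.

```python
def exponential_decay(lr0: float, decay: float = 0.99):
    """Return a callable mapping a boosting round (epoch) to a learning rate."""
    return lambda epoch: lr0 * decay**epoch


schedule = exponential_decay(lr0=0.7)
for epoch in (0, 1, 10, 49):
    print(f"round {epoch:>2}: learning_rate = {schedule(epoch):.4f}")
```

Passing such a callable to `LearningRateScheduler` applies it during training. Because the returned value overrides the parameter, building the schedule from the tuned starting rate keeps Optuna's `learning_rate` suggestion meaningful.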

### Create the Optuna study and a Neptune study-level run
This run stores the study-level metadata from Optuna and can be used to group and compare runs across multiple hyperparameter-optimization (HPO) sweeps or studies.

In [None]:
import optuna

study = optuna.create_study(direction="minimize")

In [None]:
run_study_level = neptune.init_run(
    capture_hardware_metrics=True,
    capture_stderr=True,
    capture_stdout=True,
    capture_traceback=True,
    tags=["notebook", "study-level"],
    dependencies="infer",
)

run_study_level["study-name"] = study.study_name

### Initialize Neptune's Optuna callback
This will log the HPO sweeps and trials to the study-level run.

In [None]:
from neptune.integrations.optuna import NeptuneCallback as NeptuneOptunaCallback

neptune_optuna_callback = NeptuneOptunaCallback(run_study_level)

### Run the hyperparameter sweep with Neptune's Optuna callback

In [None]:
study.optimize(
    objective,
    n_trials=5,
    show_progress_bar=True,
    callbacks=[neptune_optuna_callback],
)
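Conceptually, `study.optimize` runs the objective once per trial. After each trial finishes, it calls every callback as `callback(study, trial)`, which is the hook Neptune's Optuna callback uses to log to the study-level run. The toy loop below mimics only that contract with random search in plain Python. `ToyStudy` and `toy_optimize` are hypothetical stand-ins, not Optuna classes, and a trial number stands in for Optuna's `FrozenTrial`.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyStudy:
    """Hypothetical stand-in for an Optuna study that minimizes its objective."""

    values: list[float] = field(default_factory=list)

    @property
    def best_value(self) -> float:
        return min(self.values)


def toy_optimize(study_obj, objective_fn, n_trials, callbacks=(), seed=0):
    """Run n_trials of random search, calling each callback after every trial."""
    rng = random.Random(seed)
    for trial_number in range(n_trials):
        # Stand-in for Optuna sampling a hyperparameter from the search space
        x = rng.uniform(-5.0, 5.0)
        study_obj.values.append(objective_fn(x))
        # Like Optuna, invoke callbacks only after the trial has finished
        for callback in callbacks:
            callback(study_obj, trial_number)


callback_log = []
toy_study = ToyStudy()
toy_optimize(
    toy_study,
    lambda x: (x - 2.0) ** 2,
    n_trials=5,
    callbacks=[lambda s, t: callback_log.append((t, s.best_value))],
)
print(callback_log)
```

Because callbacks see the study after every trial, a logging callback can record both per-trial values and the running best.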

### Stop the study-level run

In [None]:
run_study_level.stop()

## Compare the runs, and choose the best model to move to production

You can compare, choose, and promote models either manually, using the Neptune web app, or programmatically, using the Neptune Python client.

### Manually through the Neptune web app

In this section, we'll:
* Explore logged study- and trial-level metadata using [custom dashboards](https://docs.neptune.ai/app/custom_dashboard/)
* Compare trials and sweeps using [custom table views](https://docs.neptune.ai/app/custom_views/)
* Select the best model version and update its stage to *production*

#### 🔍 Explore the runs 
* Browse through the logged metadata, images, and charts in the study and trial-level runs.
* To view all important metadata in one place, create custom dashboards.

You can also browse these example custom dashboards:
* [Example study-level custom dashboard](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/details?viewId=9b71ae0c-8946-4a21-9f1a-4062cad659a4&detailsTab=dashboard&dashboardId=9b71b35b-13fe-4b7d-9d24-a0922d3b07d3&shortId=EET-34&type=run)
* [Example trial-level custom dashboard](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/details?viewId=9cb5bc7c-3bce-4c69-8f5c-90d3d9cc682c&detailsTab=dashboard&dashboardId=9cb5c2b5-d0ab-4760-8c96-2e3116db1089&shortId=EET-36&type=run) 

#### ⚖️ Compare models

* To sort models, add important metrics to the **Experiments** table
* Select models within or across studies for more granular comparison

You can also browse an example custom table view with [models grouped by stage and sorted by score](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/table?viewId=9cb5bc7c-3bce-4c69-8f5c-90d3d9cc682c) and custom dashboard [comparing models across stages](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/compare?viewId=9cb5bc7c-3bce-4c69-8f5c-90d3d9cc682c&dash=dashboard&dashboardId=Compare-models-9cb5bf5a-32c9-4def-addc-0fb8bb1a9ce3&compare=EwTgNAjJ1cP3KTJA).

#### 🥇 Promote the best model to "production"

* Browse through all the models
* Update the stage of the best model to "production"

### Select the best model programmatically

All the comparisons and selections done manually above can also be done programmatically.

#### Download the runs table as a pandas DataFrame

In [None]:
score_namespace = (
    "training/early_stopping/best_score"  # This is where the best score is logged in Neptune
)

project = neptune.init_project(mode="read-only")
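The score read from `training/early_stopping/best_score` comes from XGBoost's `EarlyStopping` callback. By default, it tracks the last metric in `eval_metric` (here `rmse`) on the last dataset in `evals` (here `valid`). It stops training once that metric has not improved for `rounds` consecutive rounds. Below is a simplified plain-Python sketch of that rule for a metric to minimize, assuming a list of per-round validation scores and ignoring `min_delta`. `early_stopping` is a hypothetical helper.

```python
def early_stopping(scores: list[float], rounds: int = 10) -> tuple[int, float, int]:
    """Return (best_iteration, best_score, last_round_run) for a metric to minimize."""
    best_iteration, best_score = 0, scores[0]
    for i, score in enumerate(scores):
        if score < best_score:
            best_iteration, best_score = i, score
        elif i - best_iteration >= rounds:
            # No improvement for `rounds` consecutive rounds: stop here
            return best_iteration, best_score, i
    return best_iteration, best_score, len(scores) - 1


valid_rmse = [0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.63, 0.64, 0.66, 0.67, 0.68, 0.69, 0.7]
print(early_stopping(valid_rmse, rounds=10))  # (2, 0.6, 12)
```

Since lower RMSE is better, the queries below sort this column in ascending order.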

Fetch the *champion* model, that is, the best-scoring model currently in production:

In [None]:
champion_model_df = project.fetch_runs_table(
    query='`sys/group_tags`:stringSet CONTAINS "production"',
    columns=[score_namespace],
    sort_by=score_namespace,
    ascending=True,
    limit=1,
).to_pandas()

champion_model_df if not champion_model_df.empty else print("No champion model found")

Fetch the *challenger* model, that is, the best-scoring trial-level model in *development*:

In [None]:
challenger_model_df = project.fetch_runs_table(
    query='(`sys/group_tags`:stringSet CONTAINS "development") AND (`sys/tags`:stringSet CONTAINS "trial-level")',
    columns=[score_namespace],
    sort_by=score_namespace,
    ascending=True,
    limit=1,
).to_pandas()
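Both queries select runs by the contents of their `sys/group_tags` (and, for the challenger, `sys/tags`) string sets. They sort by the logged best score in ascending order, because lower RMSE is better, and keep only the top row. The following plain-Python sketch performs the same selection over an in-memory list. `select_best` is a hypothetical helper, and the run IDs and scores are made up for illustration.

```python
def select_best(runs, group_tag, tag=None):
    """Return the lowest-scoring run whose group tags contain group_tag
    (and whose tags contain tag, if given), or None if nothing matches."""
    matches = [
        r for r in runs
        if group_tag in r["group_tags"] and (tag is None or tag in r["tags"])
    ]
    return min(matches, key=lambda r: r["score"], default=None)


# Hypothetical run records mirroring the fields queried above
example_runs = [
    {"id": "EET-1", "group_tags": {"development"}, "tags": {"trial-level"}, "score": 0.52},
    {"id": "EET-2", "group_tags": {"development"}, "tags": {"trial-level"}, "score": 0.48},
    {"id": "EET-3", "group_tags": {"production"}, "tags": {"trial-level"}, "score": 0.50},
    {"id": "EET-4", "group_tags": set(), "tags": {"study-level"}},
]
print(select_best(example_runs, "development", tag="trial-level")["id"])
print(select_best(example_runs, "production")["id"])
```

Returning `None` when nothing matches mirrors the empty DataFrame you get when no model is in production yet.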

challenger_model_df

### Get scores and IDs of challenger and champion models

In [None]:
if champion_model_df.empty:
    print("❌ No model found in production")
    NO_CHAMPION = True
else:
    champion_model_id = champion_model_df["sys/id"].values[0]
    champion_model_score = champion_model_df[score_namespace].values[0]
    print(f"Champion model ID: {champion_model_id} and score: {champion_model_score}")
    NO_CHAMPION = False

In [None]:
challenger_model_id = challenger_model_df["sys/id"].values[0]
challenger_model_score = challenger_model_df[score_namespace].values[0]
print(f"Challenger model ID: {challenger_model_id} and score: {challenger_model_score}")

### Promote challenger to champion if score is better

In [None]:
if NO_CHAMPION:
    print(f"Promoting {challenger_model_id} to Production")
    with neptune.init_run(with_id=challenger_model_id) as challenger_model:
        challenger_model["sys/group_tags"].add("production")
        challenger_model["sys/group_tags"].remove("development")

elif challenger_model_score < champion_model_score:
    print("Challenger is better than champion")

    print(f"Archiving champion model {champion_model_id}")
    with neptune.init_run(with_id=champion_model_id) as champion_model:
        champion_model["sys/group_tags"].remove("production")
        champion_model["sys/group_tags"].add("archived")

    print(f"Promoting {challenger_model_id} to Production")
    with neptune.init_run(with_id=challenger_model_id) as challenger_model:
        challenger_model["sys/group_tags"].add("production")
        challenger_model["sys/group_tags"].remove("development")

else:
    print("Champion model is better than challenger")
    print(f"Archiving challenger model {challenger_model_id}")
    with neptune.init_run(with_id=challenger_model_id) as challenger_model:
        challenger_model["sys/group_tags"].add("archived")
        challenger_model["sys/group_tags"].remove("development")
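The branching above can be expressed as a pure function over group-tag sets, which is easier to unit-test before you wire it to `neptune.init_run(with_id=...)`. This is a sketch: `promote` is a hypothetical helper that returns the new tags instead of writing them to Neptune.

```python
def promote(champion, challenger):
    """Return updated (champion_tags, challenger_tags) without touching Neptune.

    champion is None when no model is in production; otherwise both arguments
    are (score, group_tags) pairs, and a lower score is better.
    """
    challenger_score, challenger_tags = challenger
    if champion is None:
        return None, (challenger_tags - {"development"}) | {"production"}

    champion_score, champion_tags = champion
    if challenger_score < champion_score:
        # Challenger wins: archive the old champion and promote the challenger
        return (
            (champion_tags - {"production"}) | {"archived"},
            (challenger_tags - {"development"}) | {"production"},
        )
    # Champion stays in production; the challenger is archived
    return champion_tags, (challenger_tags - {"development"}) | {"archived"}


print(promote(None, (0.48, {"development"})))
print(promote((0.50, {"production"}), (0.48, {"development"})))
```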

## Monitor model in production
In this section, we'll:
1. Download the model binary from the run to make predictions in production.
2. Use [EvidentlyAI](https://www.evidentlyai.com/) to monitor the model in production.

We'll use a modified version of a tutorial from the [Evidently documentation](https://docs.evidentlyai.com/get-started/tutorial).

### Setup

We'll sample the California housing dataset twice: once as reference data with mocked historical predictions, and once as the current (production) data that the model will score.

In [None]:
import numpy as np
from evidently.report import Report
from evidently.metric_preset import RegressionPreset

In [None]:
data = fetch_california_housing(as_frame=True)
housing_data = data.frame

housing_data.rename(columns={"MedHouseVal": "target"}, inplace=True)

reference = housing_data.sample(n=10000, replace=False)
reference["prediction"] = reference["target"].values + np.random.normal(
    0, 0.1, reference.shape[0]
)  # Mocking historical predictions

current = housing_data.sample(n=10000, replace=False)
dcurrent = xgb.DMatrix(current.drop("target", axis=1), label=current["target"])
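Because the mocked reference predictions are the true target plus Gaussian noise with standard deviation 0.1, their RMSE against the target should come out close to 0.1. This is a useful baseline when reading the reference side of the report later. Here is a stdlib-only sketch of that expectation. The synthetic targets are arbitrary, and only the noise level matters.

```python
import math
import random

rng = random.Random(42)
targets = [rng.uniform(0.15, 5.0) for _ in range(10_000)]
# Mock predictions: target plus N(0, 0.1) noise, as in the cell above
noisy_predictions = [t + rng.gauss(0.0, 0.1) for t in targets]

mock_rmse = math.sqrt(
    sum((p - t) ** 2 for p, t in zip(noisy_predictions, targets)) / len(targets)
)
print(f"RMSE of mock predictions: {mock_rmse:.4f}")
```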

### Download saved model from Neptune

In [None]:
production_model_id = (
    project.fetch_runs_table(
        query='`sys/group_tags`:stringSet CONTAINS "production"',
        columns=[],
        sort_by=score_namespace,
        ascending=True,
        limit=1,
    )
    .to_pandas()["sys/id"]
    .values[0]
)

production_model = neptune.init_run(with_id=production_model_id)
production_model["training/pickled_model"].download()

### Make predictions from downloaded model on current test data

In [None]:
import pickle as pkl

with open("pickled_model.pkl", "rb") as f:
    model = pkl.load(f)

current["prediction"] = model.predict(dcurrent)

current

### Generate regression report

In [None]:
reg_performance_report = Report(metrics=[RegressionPreset()])

reg_performance_report.run(reference_data=reference, current_data=current)

reg_performance_report.show()
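For reference, two of the headline errors in the regression report can be computed by hand. Mean absolute error is the average of |y - ŷ|, and RMSE is the square root of the mean squared error. The small stdlib sketch below uses toy numbers; it applies the textbook definitions and is not Evidently's implementation.

```python
import math


def manual_mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def manual_rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
print(manual_mae(y_true, y_pred), manual_rmse(y_true, y_pred))
```

Because RMSE squares each error, the single miss of 2.0 weighs more heavily in RMSE (about 1.15) than in MAE (about 0.67).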

### Upload report to the model

In [None]:
reg_performance_report.save_html("report.html")
production_model["production/report"].upload("report.html")

### Upload metrics to the model

In [None]:
from neptune.utils import stringify_unsupported

production_model["production/metrics"] = stringify_unsupported(
    reg_performance_report.as_dict()["metrics"][0]
)
production_model.wait()
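Assigning a nested dictionary to a Neptune field creates one field per leaf, with the dictionary keys joined by `/` into the field path. That is why a later cell can fetch `production/metrics/result/current/rmse`. The sketch below reproduces only that path construction in plain Python, so you can list which paths a given dictionary would produce. `field_paths` is a hypothetical helper, and the toy dictionary is made up and much smaller than Evidently's real output.

```python
def field_paths(d, prefix):
    """Yield (path, value) for every leaf of a nested dict, joining keys with '/'."""
    for key, value in d.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            yield from field_paths(value, path)
        else:
            yield path, value


# Made-up, heavily trimmed stand-in for reg_performance_report.as_dict()["metrics"][0]
toy_metrics = {
    "metric": "ExampleMetric",
    "result": {"current": {"rmse": 0.51}, "reference": {"rmse": 0.1}},
}
for path, value in field_paths(toy_metrics, "production/metrics"):
    print(path, value)
```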

You can see an example of a production monitoring custom dashboard [here](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/details?viewId=9cb5bc7c-3bce-4c69-8f5c-90d3d9cc682c&detailsTab=dashboard&dashboardId=9cb5c04f-e405-497f-a028-d80d0a82c55a&shortId=EET-36&type=run&compare=EwTgNAjJ1cP3KTJA&lbViewUnpacked=true&sortBy=%5B%22training%2Fearly_stopping%2Fbest_score%22%5D&sortFieldType=%5B%22string%22%5D&sortFieldAggregationMode=%5B%22auto%22%5D&sortDirection=%5B%22ascending%22%5D&groupBy=%5B%22sys%2Fgroup_tags%22%5D&groupByFieldType=%5B%22stringSet%22%5D&groupByFieldAggregationMode=%5B%22auto%22%5D&suggestionsEnabled=false&query=((%60sys%2Ftags%60%3AstringSet%20CONTAINS%20%22trial-level%22))).

These metrics can then be fetched downstream to trigger a model refresh or retraining, if needed.

In [None]:
retraining_threshold = 0.5  # example threshold

if production_model["production/metrics/result/current/rmse"].fetch() > retraining_threshold:
    print("Model degradation detected. Retraining model...")
    ...
else:
    print("Model performance within expectations")
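A fixed threshold depends on the scale of the target. One alternative is to flag degradation when the current RMSE exceeds the reference RMSE by more than a chosen ratio. The sketch below assumes you can read a reference RMSE as well as the current one, since the regression report above compares both datasets. `needs_retraining` and the default ratio of 1.2 are illustrative, not recommendations.

```python
def needs_retraining(current_rmse: float, reference_rmse: float, max_ratio: float = 1.2) -> bool:
    """Flag degradation when current RMSE exceeds max_ratio times the reference RMSE."""
    if reference_rmse <= 0:
        raise ValueError("reference_rmse must be positive")
    return current_rmse / reference_rmse > max_ratio


print(needs_retraining(current_rmse=0.65, reference_rmse=0.5))
print(needs_retraining(current_rmse=0.55, reference_rmse=0.5))
```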

### (Optional) Maintain a history of production metrics

The approach above keeps only the latest production metrics, because each monitoring run overwrites the previous report and metrics. To maintain a history of production metrics:

* Log each report under a different folder, as shown below:

  ```py
  from datetime import datetime
  production_model[f"production/{datetime.now().date()}/report"].upload("report.html")
  ```
  This will create a new folder for each day and upload the report to that folder.

* [Log metrics as a series](https://docs.neptune.ai/logging/series/#numerical-series-floatseries), as shown below:

  ```py
  rmse = reg_performance_report.as_dict()["metrics"][0]["result"]["current"]["rmse"]
  production_model["production/rmse"].append(rmse)
  ```
  This will let you visualize the production metrics over time as a chart.
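You can prototype both options without a Neptune connection. In this sketch, a plain dict stands in for the run object. This is hypothetical: a real run would receive `upload()` and `append()` calls instead. The sketch shows how date-stamped namespaces keep one report per day while a list, playing the role of a float series, accumulates one RMSE value per monitoring job.

```python
from datetime import date


def record_monitoring(store: dict, day: date, report_path: str, rmse: float) -> None:
    """Hypothetical stand-in for the Neptune calls above, writing into a dict."""
    # One namespace per day, e.g. "production/2024-05-01/report"
    store[f"production/{day.isoformat()}/report"] = report_path
    # A growing list plays the role of a float series
    store.setdefault("production/rmse", []).append(rmse)


history = {}
record_monitoring(history, date(2024, 5, 1), "report.html", 0.51)
record_monitoring(history, date(2024, 5, 2), "report.html", 0.55)
print(history)
```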

## Stop Neptune objects

In [None]:
project.stop()
production_model.stop()