# How to track models end-to-end in Neptune

<a target="_blank" href="https://colab.research.google.com/github/neptune-ai/examples/blob/main/how-to-guides/e2e-tracking/notebooks/e2e_tracking.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>
<a target="_blank" href="https://github.com/neptune-ai/examples/blob/main/how-to-guides/e2e-tracking">
  <img alt="Open in GitHub" src="https://img.shields.io/badge/Open_in_GitHub-blue?logo=github&labelColor=black">
</a>
<a target="_blank" href="https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/table?viewId=9b71afba-648f-40b7-9c70-98dc99bebc66"> 
  <img alt="Explore in Neptune" src="https://neptune.ai/wp-content/uploads/2024/01/neptune-badge.svg">
</a>
<a target="_blank" href="https://docs.neptune.ai/tutorials/e2e_tracking/">
  <img alt="View tutorial in docs" src="https://neptune.ai/wp-content/uploads/2024/01/docs-badge-2.svg">
</a>

## Introduction

This notebook shows how you can use Neptune to track a model across all stages of its lifecycle by:
* **Logging model and run metadata to a central project**
* **Comparing runs to select the best performing model**
* **Monitoring a model once in production**

You will learn how to use the Neptune web app to explore run metadata, compare runs, and manually promote a model. However, you can also use this notebook as a template for an automated end-to-end pipeline that covers the entire lifecycle of a model without any manual intervention.

This example uses Optuna hyperparameter-optimization to simulate training and evaluating multiple scikit-learn models, and Evidently to monitor models in production. However, given Neptune's flexibility and [multiple integrations](https://docs.neptune.ai/integrations/), you can use any library and framework of your choice. 

## Before you start

This notebook example lets you try out Neptune anonymously, with zero setup.

If you want to see the example logged to your own workspace instead:

  1. Create a Neptune account. [Register &rarr;](https://neptune.ai/register)
  1. Create a Neptune project that you will use for tracking metadata. For instructions, see [Creating a project](https://docs.neptune.ai/setup/creating_project) in the Neptune docs.

## Install Neptune and dependencies

In [None]:
%pip install -q -U "neptune[sklearn,optuna]" scikit-learn optuna evidently matplotlib "scipy<1.12"
%pip install -q --user scikit-learn matplotlib

In [None]:
# Run this if you get a `RuntimeError: main thread is not in main loop` error
import matplotlib

matplotlib.use("Agg")

## Track model training

In this section, we'll use the following:
1. Optuna to train multiple scikit-learn regression models,
2. Neptune's [scikit-learn](https://docs.neptune.ai/integrations/sklearn/) and [Optuna](https://docs.neptune.ai/integrations/optuna/) integrations to automatically log metadata and metrics to Neptune for easy run comparison,
3. Neptune's [model registry](https://docs.neptune.ai/model_registry/overview/) to track models.

### Prepare the dataset

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

data, target = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.25)

### Initialize the Neptune project

To connect to the Neptune app, you need to tell Neptune who you are (`api_token`) and where to send the data (`project`).

You can use the default code cell below to create an anonymous run in the public project [common/e2e-tracking](https://app.neptune.ai/common/e2e-tracking). **Note**: Public projects are cleaned regularly, so anonymous runs are only stored temporarily.
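
Both settings are read from the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables in the cells that follow. As a minimal sketch, a check like the one below could catch a missing token or a malformed project name before any Neptune object is created. The helper `check_neptune_env` is hypothetical and not part of the Neptune client.

```python
import os
import re


# Hypothetical helper for illustration; not part of the Neptune client.
def check_neptune_env(env=None):
    """Return a list of human-readable problems with the Neptune settings."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("NEPTUNE_API_TOKEN"):
        problems.append("NEPTUNE_API_TOKEN is not set")
    project = env.get("NEPTUNE_PROJECT", "")
    # Project names take the form "workspace-name/project-name".
    if not re.fullmatch(r"[^/\s]+/[^/\s]+", project):
        problems.append("NEPTUNE_PROJECT should look like 'workspace-name/project-name'")
    return problems


# A complete configuration yields no problems.
print(check_neptune_env({"NEPTUNE_API_TOKEN": "xyz", "NEPTUNE_PROJECT": "common/e2e-tracking"}))
# A missing token and a project without a workspace yield two problems.
print(check_neptune_env({"NEPTUNE_PROJECT": "e2e-tracking"}))
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to try out without touching the real environment.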

#### Log to your own project instead

Replace the code below with the following:

```python
import os
import neptune
from getpass import getpass

os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"  # replace with your own

project = neptune.init_project()
```

In [None]:
import os
import neptune

os.environ["NEPTUNE_API_TOKEN"] = neptune.ANONYMOUS_API_TOKEN
os.environ["NEPTUNE_PROJECT"] = "common/e2e-tracking"

project = neptune.init_project(mode="read-only")

### Create a new model in the model registry
This model will serve as a placeholder for all the model versions created in different Optuna trials.

In [None]:
from neptune.exceptions import NeptuneModelKeyAlreadyExistsError

model_key = "RFR"

try:
    # Create a new model if it does not already exist
    npt_model = neptune.init_model(key=model_key)
except NeptuneModelKeyAlreadyExistsError:
    # Initialize the model if it already exists
    npt_model = neptune.init_model(with_id=f"{project['sys/id'].fetch()}-{model_key}")
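
The `with_id` argument above is built by joining the project key and the model key with a hyphen. A small sketch of that string composition, using the project key `EET` that appears in this notebook's example links (the helper name `model_id` is an assumption made for illustration):

```python
# Hypothetical helper for illustration; it only mirrors the f-string used above.
def model_id(project_key: str, model_key: str) -> str:
    """Compose a model identifier of the form '<project key>-<model key>'."""
    return f"{project_key}-{model_key}"


# "EET" is the project key visible in the example links of this notebook.
print(model_id("EET", "RFR"))  # EET-RFR
```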

### Create the Optuna objective function
We will create trial-level runs and model versions within the objective function to capture trial-level metadata using Neptune's scikit-learn integration.

In [None]:
from sklearn.ensemble import RandomForestRegressor
from neptune.integrations.sklearn import create_regressor_summary, get_pickled_model


def objective(trial):
    param = {
        "n_estimators": trial.suggest_int("n_estimators", 2, 64),
        "max_depth": trial.suggest_int("max_depth", 2, 5),
        "min_samples_split": trial.suggest_int("min_samples_split", 3, 10),
    }

    # Create a trial-level run
    run_trial_level = neptune.init_run(
        capture_hardware_metrics=True,
        capture_stderr=True,
        capture_stdout=True,
        tags=["notebook", "trial-level"],
    )

    # Log study name and trial number to trial-level run
    run_trial_level["study-name"] = str(study.study_name)
    run_trial_level["trial-number"] = trial.number

    # Log parameters of a trial-level run
    run_trial_level["parameters"] = param

    # Train the model
    model = RandomForestRegressor(**param)
    model.fit(X_train, y_train)

    # Log model metadata to the trial level run
    run_trial_level["model_summary"] = create_regressor_summary(
        model, X_train, X_test, y_train, y_test
    )

    # Fetch objective score from the run
    run_trial_level.wait()
    score = run_trial_level["model_summary/test/scores/mean_absolute_error"].fetch()

    # Create a new model version
    model_version = neptune.init_model_version(model=f"{project['sys/id'].fetch()}-{model_key}")

    # Link model-version to the trial-level run
    model_version["training/run/id"] = run_trial_level["sys/id"].fetch()
    model_version["training/run/url"] = run_trial_level.get_url()

    run_trial_level["model_summary/id"] = model_version["sys/id"].fetch()
    run_trial_level["model_summary/url"] = model_version.get_url()

    # Log score to model version
    model_version["training/score"] = score

    # Upload model binary to model version
    model_version["saved_model"].upload(get_pickled_model(model))

    # Update model stage to "staging"
    model_version.change_stage("staging")

    # Stop model version and trial-level run
    model_version.stop()
    run_trial_level.stop()

    return score
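
The objective returns the mean absolute error on the test set, which is why the study is later created with a "minimize" direction. As a reminder of what that score measures, here is a plain-Python sketch with made-up numbers (the function name `mae` and the sample values are illustrative only):

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average of |actual - predicted|."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up targets and predictions for illustration.
y_true = [2.0, 1.5, 3.0, 0.5]
y_pred = [2.5, 1.0, 2.0, 0.5]
print(mae(y_true, y_pred))  # 0.5
```

A lower value means predictions are closer to the targets on average, so a smaller score indicates a better trial.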

### Create the Optuna study and a Neptune study-level run
This run will have all the study-level metadata from Optuna, and can be used to group and compare runs across multiple HPO sweeps/studies.

In [None]:
import optuna

study = optuna.create_study(direction="minimize")

In [None]:
run_study_level = neptune.init_run(
    capture_hardware_metrics=True,
    capture_stderr=True,
    capture_stdout=True,
    tags=["notebook", "study-level"],
    dependencies="infer",
)

run_study_level["study-name"] = study.study_name

### Initialize Neptune's Optuna callback
This will log the HPO sweeps and trials to the study-level run.

In [None]:
from neptune.integrations.optuna import NeptuneCallback

neptune_optuna_callback = NeptuneCallback(run_study_level)

### Run the hyperparameter-sweep with Neptune's Optuna callback

In [None]:
study.optimize(objective, n_trials=5, callbacks=[neptune_optuna_callback])

### Stop the study-level run

In [None]:
run_study_level.stop()

## Compare the runs, and choose the best model to move to production

You can compare, choose, and promote models either manually using the Neptune web app, or programmatically using the Neptune Python client.

### Manually through the Neptune web app

In this section, we'll:
* Explore logged study and trial level metadata using [custom dashboards](https://docs.neptune.ai/app/custom_dashboard/)
* Compare trials and sweeps using [custom table views](https://docs.neptune.ai/app/custom_views/)
* Select the best model version and [update its stage](https://docs.neptune.ai/model_registry/managing_stage/) to "production"

#### 🔍 Explore the runs 
* Browse through the logged metadata, images, and charts in the study and trial-level runs.
* To view all important metadata in one place, create custom dashboards.

You can also browse these example custom dashboards:
* [Example study-level custom dashboard](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/details?viewId=9b71ae0c-8946-4a21-9f1a-4062cad659a4&detailsTab=dashboard&dashboardId=9b71b35b-13fe-4b7d-9d24-a0922d3b07d3&shortId=EET-15&type=run)
* [Example trial-level custom dashboard](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/details?viewId=9b71afba-648f-40b7-9c70-98dc99bebc66&detailsTab=dashboard&dashboardId=9b71b14d-9a28-42a0-ac9e-3fcb0f5985e6&shortId=EET-18&type=run)

#### ⚖️ Compare trials and sweeps

* To sort trials, add metrics of importance to the **Experiments** table
* Select trials within or across sweeps for more granular comparison

You can also browse example custom views:
* [Studies sorted by score](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/table?viewId=9b71ae0c-8946-4a21-9f1a-4062cad659a4&detailsTab=dashboard&dashboardId=Trial-level-9b71b14d-9a28-42a0-ac9e-3fcb0f5985e6&shortId=EET-4&dash=charts&compare=MwGgjOlkA)
* [Trials grouped by study](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/table?viewId=9b71afba-648f-40b7-9c70-98dc99bebc66&detailsTab=dashboard&dashboardId=Trial-level-9b71b14d-9a28-42a0-ac9e-3fcb0f5985e6&shortId=EET-4&dash=charts)

and custom compare dashboards:
* [Compare studies](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/compare?viewId=9b71ae0c-8946-4a21-9f1a-4062cad659a4&detailsTab=dashboard&dashboardId=Trial-level-9b71b14d-9a28-42a0-ac9e-3fcb0f5985e6&shortId=EET-4&dash=Compare-sweeps-9b71c0a2-b240-49bb-aa5d-2f0ee85289fa&compare=EwGgbCDsQ)
* [Compare trials](https://app.neptune.ai/o/showcase/org/e2e-tracking/runs/compare?viewId=9b71afba-648f-40b7-9c70-98dc99bebc66&detailsTab=dashboard&dashboardId=Trial-level-9b71b14d-9a28-42a0-ac9e-3fcb0f5985e6&shortId=EET-4&dash=Compare-trials-9b71bf85-329a-4855-91dc-c78d96e35079&compare=MwGgjOkQTFehR4g)

#### 🥇 Promote the best model to "production"

* Browse through all the model versions in the **Models** section
* Update the stage of the best model to "production"

You can also browse an [example model registry](https://app.neptune.ai/o/showcase/org/e2e-tracking/models?shortId=EET-RFR&type=model) *(read-only)*.

### Select the best model programmatically

All the comparisons and selections done manually above can also be automated using the Neptune Python client.

#### Download the model versions table as a pandas dataframe

In [None]:
model_versions_df = npt_model.fetch_model_versions_table(
    columns=["sys/stage", "training/score"],
    sort_by="training/score",
    ascending=True,
    progress_bar=False,
).to_pandas()

model_versions_df

#### Get scores and IDs of challenger and champion models

In [None]:
try:
    champion_model = model_versions_df[model_versions_df["sys/stage"] == "production"][
        "sys/id"
    ].values[0]
    champion_model_score = model_versions_df[model_versions_df["sys/stage"] == "production"][
        "training/score"
    ].values[0]
    print(f"Champion model ID: {champion_model} and score: {champion_model_score}")
    NO_CHAMPION = False
except IndexError:
    print("❌ No model found in production")
    NO_CHAMPION = True

In [None]:
staged_models = model_versions_df[model_versions_df["sys/stage"] == "staging"]
challenger_model_score = min(staged_models["training/score"])
challenger_model_id = staged_models[staged_models["training/score"] == challenger_model_score][
    "sys/id"
].values[0]

print(f"Challenger model ID: {challenger_model_id} and score: {challenger_model_score}")
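
The selection above assumes that at least one model version is in "staging"; with an empty stage, `min()` raises an error. A minimal sketch of the same selection on plain dictionaries, returning `None` for an empty stage, might look like this. The rows and version IDs are made up, and `best_in_stage` is a hypothetical helper.

```python
# Made-up rows standing in for the model versions table.
rows = [
    {"sys/id": "EET-RFR-1", "sys/stage": "production", "training/score": 0.58},
    {"sys/id": "EET-RFR-2", "sys/stage": "staging", "training/score": 0.61},
    {"sys/id": "EET-RFR-3", "sys/stage": "staging", "training/score": 0.55},
]


# Hypothetical helper for illustration.
def best_in_stage(rows, stage, score_field="training/score"):
    """Return the lowest-scoring row in the given stage, or None if the stage is empty."""
    candidates = [row for row in rows if row["sys/stage"] == stage]
    return min(candidates, key=lambda row: row[score_field], default=None)


champion = best_in_stage(rows, "production")
challenger = best_in_stage(rows, "staging")
print(champion["sys/id"], challenger["sys/id"])  # EET-RFR-1 EET-RFR-3
print(best_in_stage(rows, "archived"))  # None
```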

#### Promote challenger to champion if score is better

In [None]:
if NO_CHAMPION:
    print(f"Promoting {challenger_model_id} to Production")
    with neptune.init_model_version(with_id=challenger_model_id) as challenger_model:
        challenger_model.change_stage("production")

elif challenger_model_score < champion_model_score:
    print("Challenger is better than champion")

    print(f"Archiving champion model {champion_model}")
    with neptune.init_model_version(with_id=champion_model) as champion_model:
        champion_model.change_stage("archived")

    print(f"Promoting {challenger_model_id} to Production")
    with neptune.init_model_version(with_id=challenger_model_id) as challenger_model:
        challenger_model.change_stage("production")

else:
    print("Champion model is better than challenger")
    print(f"Archiving challenger model {challenger_model_id}")
    with neptune.init_model_version(with_id=challenger_model_id) as challenger_model:
        challenger_model.change_stage("archived")
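
The three branches above can be separated from the Neptune calls, which makes the promotion rule easy to test on its own. The sketch below returns the intended stage changes as a dictionary instead of applying them; `plan_stage_changes` and the version IDs are hypothetical.

```python
# Hypothetical helper for illustration; it decides, but does not apply, stage changes.
def plan_stage_changes(champion, challenger):
    """Return {version_id: new_stage}.

    `challenger` is an (id, score) tuple; `champion` is an (id, score) tuple
    or None when nothing is in production. Lower scores are better.
    """
    challenger_id, challenger_score = challenger
    if champion is None:
        return {challenger_id: "production"}
    champion_id, champion_score = champion
    if challenger_score < champion_score:
        return {champion_id: "archived", challenger_id: "production"}
    # On a tie or a worse score, the champion stays in production.
    return {challenger_id: "archived"}


print(plan_stage_changes(None, ("EET-RFR-3", 0.55)))
print(plan_stage_changes(("EET-RFR-1", 0.58), ("EET-RFR-3", 0.55)))
print(plan_stage_changes(("EET-RFR-1", 0.50), ("EET-RFR-3", 0.55)))
```

Each entry in the returned dictionary corresponds to one `change_stage()` call in the cell above.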

## Monitor model in production
In this section, we'll:
1. Download the model binary from the model registry to make predictions in production
2. Use [EvidentlyAI](https://www.evidentlyai.com/) to monitor the model in production.

We'll use a modified version of a tutorial from the [Evidently documentation](https://docs.evidentlyai.com/get-started/tutorial).

### Setup

We'll use an example dataset and generate mock historical predictions to serve as a reference.

In [None]:
import numpy as np

from evidently.report import Report
from evidently.metric_preset import RegressionPreset

In [None]:
data = fetch_california_housing(as_frame=True)
housing_data = data.frame

housing_data.rename(columns={"MedHouseVal": "target"}, inplace=True)

reference = housing_data.sample(n=10000, replace=False)
reference["prediction"] = reference["target"].values + np.random.normal(0, 5, reference.shape[0])

current = housing_data.sample(n=10000, replace=False)
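
The mock reference predictions are the true target plus zero-mean Gaussian noise with a standard deviation of 5, so the reference error is, by construction, the noise itself. As a rough standard-library sketch (the seed and variable names are illustrative), the expected mean absolute error of such noise is σ·√(2/π):

```python
import math
import random

random.seed(0)  # illustrative seed, only to make the sketch repeatable

sigma = 5.0
n = 10_000

# Simulate the noise that is added to the target to form the mock predictions.
noise = [random.gauss(0.0, sigma) for _ in range(n)]
empirical_mae = sum(abs(e) for e in noise) / n

# Expected absolute value of a zero-mean Gaussian with standard deviation sigma.
theoretical_mae = sigma * math.sqrt(2 / math.pi)

print(round(theoretical_mae, 2))  # 3.99
print(abs(empirical_mae - theoretical_mae) < 0.2)  # True
```

A large reference error in the regression report will therefore typically reflect how the mock data was generated, not model drift.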

### Download saved model from model registry

In [None]:
model_versions_df = npt_model.fetch_model_versions_table(
    columns=["sys/stage"], progress_bar=False
).to_pandas()

production_model = model_versions_df[model_versions_df["sys/stage"] == "production"][
    "sys/id"
].values[0]

npt_model_version = neptune.init_model_version(with_id=production_model)
npt_model_version["saved_model"].download()

### Make predictions from downloaded model on current test data

In [None]:
import pickle as pkl

with open("saved_model.pkl", "rb") as f:
    model = pkl.load(f)

current["prediction"] = model.predict(current.drop(columns=["target"]))

current

### Generate regression report

In [None]:
reg_performance_report = Report(metrics=[RegressionPreset()])

reg_performance_report.run(reference_data=reference, current_data=current)

reg_performance_report.show()

### Upload report to the model

In [None]:
reg_performance_report.save_html("report.html")
npt_model_version["production/report"].upload("report.html")

### Upload metrics to the model

In [None]:
from neptune.utils import stringify_unsupported

npt_model_version["production/metrics"] = stringify_unsupported(
    reg_performance_report.as_dict()["metrics"][0]
)
npt_model_version.wait()

These metrics can then be fetched downstream to trigger a model refresh or retraining, if needed.

In [None]:
retraining_threshold = 0.5

if (
    npt_model_version["production/metrics/result/current/mean_abs_error"].fetch()
    > retraining_threshold
):
    print("Model degradation detected. Retraining model...")
    ...
else:
    print("Model performance within expectations")
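
The field path fetched above mirrors the nesting of the dictionary that was assigned to `production/metrics`: each nested key becomes one path segment. The sketch below illustrates that mapping on a heavily trimmed, made-up stand-in for the report dictionary; `flatten` is a hypothetical helper and the numbers are not real results.

```python
# Hypothetical helper for illustration.
def flatten(nested, prefix=""):
    """Flatten a nested dict into {'a/b/c': value} with slash-separated paths."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Made-up, trimmed stand-in for one entry of the report's metrics list.
metric = {
    "result": {
        "current": {"mean_abs_error": 0.62},
        "reference": {"mean_abs_error": 3.99},
    }
}

fields = flatten(metric, prefix="production/metrics")
print(sorted(fields))

retraining_threshold = 0.5
needs_retraining = fields["production/metrics/result/current/mean_abs_error"] > retraining_threshold
print(needs_retraining)  # True
```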

## Stop Neptune objects

In [None]:
project.stop()
npt_model.stop()
npt_model_version.stop()