![Neptune + Evidently](https://neptune.ai/wp-content/uploads/2023/09/evidently.svg)

# Neptune + Evidently

<a target="_blank" href="https://colab.research.google.com/github/neptune-ai/examples/blob/main/integrations-and-supported-tools/evidently/notebooks/Neptune_Evidently.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a><a target="_blank" href="https://github.com/neptune-ai/examples/blob/main/integrations-and-supported-tools/evidently/notebooks/Neptune_Evidently.ipynb">
  <img alt="Open in GitHub" src="https://img.shields.io/badge/Open_in_GitHub-blue?logo=github&labelColor=black">
</a><a target="_blank" href="https://app.neptune.ai/o/common/org/evidently-support/runs/table?viewId=9b014afd-cdc8-4f08-9d0f-70b343e7f4d2&detailsTab=dashboard&dashboardId=9917f940-757a-424d-879e-7781a00bf0c3&shortId=EV-7&type=run"> 
  <img alt="Explore in Neptune" src="https://neptune.ai/wp-content/uploads/2024/01/neptune-badge.svg">
</a><a target="_blank" href="https://docs.neptune.ai/integrations/evidently/">
  <img alt="View tutorial in docs" src="https://neptune.ai/wp-content/uploads/2024/01/docs-badge-2.svg">
</a>

## Introduction

[Evidently](https://www.evidentlyai.com/) is an open-source tool to evaluate, test, and monitor machine learning models.
This guide will show you how to:

* Upload Evidently's interactive reports to Neptune,
* Log report values as key-value pairs in Neptune, 
* Log and visualize production data drift using Evidently and Neptune.

## Before you start

This notebook example lets you try out Neptune as an anonymous user, with zero setup.

If you want to see the example logged to your own workspace instead:

  1. Create a Neptune account. [Register &rarr;](https://neptune.ai/register)
  1. Create a Neptune project that you will use for tracking metadata. For instructions, see [Creating a project](https://docs.neptune.ai/setup/creating_project) in the Neptune docs.

## Install Neptune and dependencies

In [None]:
%pip install -U evidently neptune pandas scikit-learn
%pip install -U --user scikit-learn

## Import libraries

In [None]:
from sklearn import datasets

from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

## Log reports

This section shows how you can log Evidently test suites and reports to Neptune.  
You can find the entire list of presets in the [Evidently documentation](https://docs.evidentlyai.com/presets/all-presets).

### Load data

In [None]:
iris_frame = datasets.load_iris(as_frame=True).frame

### Run Evidently test suites and reports

In [None]:
data_stability = TestSuite(
    tests=[
        DataStabilityTestPreset(),
    ]
)
data_stability.run(
    current_data=iris_frame.iloc[:60], reference_data=iris_frame.iloc[60:], column_mapping=None
)
data_stability

In [None]:
data_drift_report = Report(
    metrics=[
        DataDriftPreset(),
    ]
)

data_drift_report.run(
    current_data=iris_frame.iloc[:60], reference_data=iris_frame.iloc[60:], column_mapping=None
)
data_drift_report

### (Neptune) Start a run

To create a new run for tracking the metadata, you tell Neptune who you are (`api_token`) and where to send the data (`project`).

You can use the default code cell below to create an anonymous run in the public project [common/evidently-support](https://app.neptune.ai/o/common/org/evidently-support). **Note**: Public projects are cleaned regularly, so anonymous runs are only stored temporarily.

### Log to your own project instead

Replace the code in the cell below (which creates the anonymous run) with the following:

```python
import neptune
from getpass import getpass

run = neptune.init_run(
    project="workspace-name/project-name",  # replace with your own (see instructions below)
    api_token=getpass("Enter your Neptune API token: "),
    tags=["reports"],  # (optional) replace with your own
)
```

To find your API token and full project name:

1. [Log in to Neptune](https://app.neptune.ai/).
1. In the bottom-left corner, expand your user menu and select **Get your API token**.
1. The workspace name is displayed in the top-left corner of the app.

    To copy the project path, in the top-right corner, open the settings menu and select **Properties**.

For more help, see [Setting Neptune credentials](https://docs.neptune.ai/setup/setting_credentials) in the Neptune docs.
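
If you'd rather not enter the token each time, the Neptune client can also read credentials from the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables, in which case `neptune.init_run()` can be called without the `api_token` and `project` arguments. The minimal sketch below only checks whether those variables are set. The helper name `missing_neptune_env_vars` is hypothetical and not part of the Neptune API.

```python
import os

# Environment variables that the Neptune client reads when `api_token` and
# `project` are not passed to `neptune.init_run()`.
NEPTUNE_ENV_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_env_vars(environ=None):
    """Return the names of the Neptune variables that are unset or empty.

    Hypothetical helper for illustration; not part of the Neptune API.
    """
    environ = os.environ if environ is None else environ
    return [name for name in NEPTUNE_ENV_VARS if not environ.get(name)]


missing = missing_neptune_env_vars()
if missing:
    print("Set these variables before calling neptune.init_run():", ", ".join(missing))
else:
    print("Neptune credentials found in the environment.")
```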

In [None]:
import neptune

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/evidently-support",
    tags=["reports"],  # (optional) replace with your own
)

**To open the run in the Neptune web app, click the link that appeared in the cell output.**

We'll use the `run` object we just created to log metadata. You'll see the metadata appear in the app.

### (Neptune) Save reports as HTML

Using Neptune's HTML previewer, you can view and interact with Evidently's rich HTML reports on Neptune.

In [None]:
data_stability.save_html("data_stability.html")
data_drift_report.save_html("data_drift_report.html")

In [None]:
run["data_stability/report"].upload("data_stability.html")
run["data_drift/report"].upload("data_drift_report.html")

### (Neptune) Save reports as dict
Saving Evidently's results to Neptune as a dictionary gives you programmatic access to them, for example from your CI/CD pipelines.

In [None]:
from neptune.utils import stringify_unsupported

run["data_stability"] = stringify_unsupported(data_stability.as_dict())
run["data_drift"] = stringify_unsupported(data_drift_report.as_dict())
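
As one way to use such a dictionary in a CI/CD step, the sketch below collects the names of failed tests from a test-suite dictionary. It assumes the layout that `TestSuite.as_dict()` appears to produce in recent Evidently versions (a top-level `"tests"` list whose items have `"name"` and `"status"` keys), so verify this against your installed version. The helper `failed_tests` and the sample dictionary are hypothetical.

```python
def failed_tests(suite_dict):
    """Return the names of tests whose status is not "SUCCESS".

    Assumes `suite_dict` has a "tests" list of {"name": ..., "status": ...} items.
    """
    return [
        test.get("name", "<unnamed>")
        for test in suite_dict.get("tests", [])
        if test.get("status") != "SUCCESS"
    ]


# Hand-written sample that mimics the assumed layout (not real Evidently output).
sample = {
    "tests": [
        {"name": "Share of Missing Values", "status": "SUCCESS"},
        {"name": "Share of Out-of-Range Values", "status": "FAIL"},
    ]
}

failures = failed_tests(sample)
if failures:
    print("Failed tests:", ", ".join(failures))
```

In the notebook, you would pass `data_stability.as_dict()` instead of `sample`, and a CI job could exit with a non-zero status when the list is not empty.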

### Stop logging

Once you are done logging, stop tracking the run.

In [None]:
run.stop()

### Analyze logged reports in the Neptune app

Explore the run (reports, dictionaries) in the Neptune app, or check this [example dashboard](https://app.neptune.ai/o/common/org/evidently-support/runs/details?viewId=standard-view&detailsTab=dashboard&dashboardId=9917f940-757a-424d-879e-7781a00bf0c3&shortId=EV-7&type=run).

## Log production data drift
This section shows how you can use Evidently to evaluate production data drift and log the results to Neptune.

### Load sample dataset

In [None]:
import pandas as pd

! curl https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip --create-dirs -o data/Bike-Sharing-Dataset.zip
! unzip -o data/Bike-Sharing-Dataset.zip -d data

bike_df = pd.read_csv("data/hour.csv")
bike_df["datetime"] = pd.to_datetime(bike_df["dteday"])
bike_df["datetime"] += pd.to_timedelta(bike_df.hr, unit="h")
bike_df.set_index("datetime", inplace=True)
bike_df = bike_df[
    [
        "season",
        "holiday",
        "workingday",
        "weathersit",
        "temp",
        "atemp",
        "hum",
        "windspeed",
        "casual",
        "registered",
        "cnt",
    ]
]
bike_df

For demonstration purposes, we treat this data as the input data for a live model. To do the same with a production model, you need access to its prediction logs.

### Define column mapping for Evidently

In [None]:
from evidently import ColumnMapping

data_columns = ColumnMapping()
data_columns.numerical_features = ["weathersit", "temp", "atemp", "hum", "windspeed"]
data_columns.categorical_features = ["holiday", "workingday"]

### Define what to log
Specify which metrics you want to calculate. In this case, we generate the Data Drift report and extract the drift score for each feature.

In [None]:
def eval_drift(reference, production, column_mapping):
    data_drift_report = Report(metrics=[DataDriftPreset()])
    data_drift_report.run(
        reference_data=reference, current_data=production, column_mapping=column_mapping
    )
    report = data_drift_report.as_dict()

    drifts = []

    for feature in column_mapping.numerical_features + column_mapping.categorical_features:
        drifts.append(
            (feature, report["metrics"][1]["result"]["drift_by_columns"][feature]["drift_score"])
        )

    return drifts
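
The lookup `report["metrics"][1]` relies on the position of the drift table within the preset's output. A slightly more defensive variant, sketched below, searches for the first metric whose result contains `drift_by_columns`. The helper name `drift_scores` and the sample dictionary are hypothetical; the key names are the ones used in `eval_drift` above.

```python
def drift_scores(report_dict, features):
    """Return (feature, drift_score) pairs from a report dictionary.

    Looks for the first metric whose result has a "drift_by_columns" entry,
    instead of relying on a fixed position in the "metrics" list.
    """
    for metric in report_dict.get("metrics", []):
        by_column = metric.get("result", {}).get("drift_by_columns")
        if by_column is not None:
            return [(name, by_column[name]["drift_score"]) for name in features]
    raise KeyError("No metric with 'drift_by_columns' found in the report")


# Hand-written sample that mimics the keys used above (not real Evidently output).
sample_report = {
    "metrics": [
        {"metric": "DatasetDriftMetric", "result": {"dataset_drift": False}},
        {
            "metric": "DataDriftTable",
            "result": {
                "drift_by_columns": {
                    "temp": {"drift_score": 0.12},
                    "hum": {"drift_score": 0.48},
                }
            },
        },
    ]
}

print(drift_scores(sample_report, ["temp", "hum"]))
```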

### Define the comparison windows

Specify the period to use as the reference: Evidently uses it as the baseline for the comparison. Then choose the periods to treat as experiments. These emulate production model runs.

In [None]:
# Set reference dates
reference_dates = ("2011-01-01 00:00:00", "2011-06-30 23:00:00")

# Set experiment batches dates
experiment_batches = [
    ("2011-07-01 00:00:00", "2011-07-31 00:00:00"),
    ("2011-08-01 00:00:00", "2011-08-31 00:00:00"),
    ("2011-09-01 00:00:00", "2011-09-30 00:00:00"),
    ("2011-10-01 00:00:00", "2011-10-31 00:00:00"),
    ("2011-11-01 00:00:00", "2011-11-30 00:00:00"),
    ("2011-12-01 00:00:00", "2011-12-31 00:00:00"),
    ("2012-01-01 00:00:00", "2012-01-31 00:00:00"),
    ("2012-02-01 00:00:00", "2012-02-29 00:00:00"),
    ("2012-03-01 00:00:00", "2012-03-31 00:00:00"),
    ("2012-04-01 00:00:00", "2012-04-30 00:00:00"),
    ("2012-05-01 00:00:00", "2012-05-31 00:00:00"),
    ("2012-06-01 00:00:00", "2012-06-30 00:00:00"),
    ("2012-07-01 00:00:00", "2012-07-31 00:00:00"),
    ("2012-08-01 00:00:00", "2012-08-31 00:00:00"),
    ("2012-09-01 00:00:00", "2012-09-30 00:00:00"),
    ("2012-10-01 00:00:00", "2012-10-31 00:00:00"),
    ("2012-11-01 00:00:00", "2012-11-30 00:00:00"),
    ("2012-12-01 00:00:00", "2012-12-31 00:00:00"),
]
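
The monthly windows above can also be generated rather than typed by hand. The following is a minimal sketch using only the standard library; the helper name `monthly_batches` is hypothetical, and it reproduces the same convention as the list above (each window runs from midnight on the first day to midnight on the last day of the month).

```python
import calendar
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def monthly_batches(year, month, n_months):
    """Return (start, end) strings for `n_months` consecutive calendar months.

    Each window starts at midnight on the first day of the month and ends at
    midnight on the last day, matching the hand-written list above.
    """
    batches = []
    for _ in range(n_months):
        last_day = calendar.monthrange(year, month)[1]  # number of days in the month
        start = datetime(year, month, 1)
        end = datetime(year, month, last_day)
        batches.append((start.strftime(DATE_FORMAT), end.strftime(DATE_FORMAT)))
        # Advance to the next month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return batches


generated_batches = monthly_batches(2011, 7, 18)
print(generated_batches[0])
print(generated_batches[-1])
```

Because each window ends at midnight, rows from later hours on the last day of a month are likely excluded by the label-based slice. If you want to include them, an end time of `23:00:00` (as used for `reference_dates`) may be a better fit.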

### (Neptune) Run and log drifts to Neptune

In [None]:
import uuid
from datetime import datetime

custom_run_id = str(uuid.uuid4())

for date in experiment_batches:
    with neptune.init_run(
        api_token=neptune.ANONYMOUS_API_TOKEN,
        project="common/evidently-support",
        custom_run_id=custom_run_id,  # Passing a custom run ID ensures that the metrics are logged to the same run.
        tags=["prod monitoring"],  # (optional) replace with your own
    ) as run:
        metrics = eval_drift(
            bike_df.loc[reference_dates[0] : reference_dates[1]],
            bike_df.loc[date[0] : date[1]],
            column_mapping=data_columns,
        )

        for feature in metrics:
            run["drift"][feature[0]].append(
                round(feature[1], 3),
                timestamp=datetime.strptime(date[0], "%Y-%m-%d %H:%M:%S").timestamp(),
            )
            # Passing a timestamp to append() lets you plot the values against dates on the x-axis of the charts
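
The `timestamp` argument takes Unix time, that is, seconds since the epoch as a float. A minimal standard-library sketch of the conversion used above is shown below; the helper name `batch_start_timestamp` is hypothetical. Note that `strptime` returns a naive datetime, so `.timestamp()` interprets it in the local time zone of the machine running the code.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def batch_start_timestamp(date_string):
    """Convert a batch start string into a Unix timestamp (seconds, as a float)."""
    return datetime.strptime(date_string, DATE_FORMAT).timestamp()


ts = batch_start_timestamp("2011-07-01 00:00:00")
# Converting back in the same (local) time zone recovers the original date.
print(ts, datetime.fromtimestamp(ts))
```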

### Analyze logged drifts in the Neptune app
Go to the run link and explore the drifts in the **Charts** dashboard. You might have to change the x-axis from **Step** to **Time (absolute)**.
You can also explore this [example Drifts dashboard](https://app.neptune.ai/o/common/org/evidently-support/runs/details?viewId=standard-view&detailsTab=dashboard&dashboardId=9918072b-90f2-4963-a3a1-e857acd6e65c&shortId=EV-8&type=run).