# Model monitoring and drift detection

This tutorial illustrates the basic model monitoring capabilities of MLRun: deploying a model to a live endpoint and 
calculating data drift.

See the overview of model monitoring in {ref}`model-monitoring`.

Make sure you have reviewed the basics in MLRun [**Quick Start Tutorial**](../01-mlrun-basics.html).

## MLRun installation and configuration

Before running this notebook, make sure that `mlrun` is installed and that you have configured access to the MLRun service.

## Set up the project

First, import the dependencies and create an [MLRun project](https://docs.mlrun.org/en/latest/projects/project.html). The project contains all of the models, functions, datasets, and other artifacts.

```python
%config Completer.use_jedi = False
import os
import pandas as pd
from sklearn.datasets import load_iris
import mlrun
from mlrun import import_function, get_dataitem, get_or_create_project
import uuid

project_name = "mm-app-project"
project = get_or_create_project(project_name, context="./")
```

> 2024-03-27 11:22:46,181 [info] Loading project from path: {'project_name': 'mm-app-project', 'path': './'}
> 2024-03-27 11:23:01,621 [info] Project loaded successfully: {'project_name': 'mm-app-project', 'path': './', 'stored_in_db': True}


```{admonition} Note
This tutorial does not focus on training a model. Instead, it starts with a trained model and its corresponding training dataset.
```

## Enable model monitoring

Model monitoring is enabled per project. {py:meth}`~mlrun.projects.MlrunProject.enable_model_monitoring` deploys the controller, schedules it according to `base_period`, and deploys the writer.

By default, the controller runs every 10 minutes; `base_period` is specified in minutes. This tutorial sets `base_period=1` so that results are available sooner. To change the `base_period` of a project that already has monitoring enabled, first run {py:meth}`~mlrun.projects.MlrunProject.disable_model_monitoring`, then run `enable_model_monitoring` with the new `base_period` value.
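To make the `base_period` arithmetic concrete, the following sketch buckets a request timestamp into a monitoring window that is `base_period` minutes long. `window_bounds` is a hypothetical helper, not part of MLRun, and it assumes that windows are aligned to the Unix epoch in UTC; the controller's actual alignment may differ.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper, not an MLRun API. Assumes windows are aligned to the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_bounds(ts: datetime, base_period_minutes: int) -> tuple[datetime, datetime]:
    """Return the [start, end) window of length base_period_minutes that contains ts."""
    if base_period_minutes <= 0:
        raise ValueError("base_period_minutes must be positive")
    period = timedelta(minutes=base_period_minutes)
    # Number of whole periods elapsed since the epoch (timedelta // timedelta -> int)
    n_periods = (ts - EPOCH) // period
    start = EPOCH + n_periods * period
    return start, start + period


request_time = datetime(2024, 3, 27, 11, 24, 57, tzinfo=timezone.utc)
print(window_bounds(request_time, 10))  # default base_period
print(window_bounds(request_time, 1))   # base_period=1, as in this tutorial
```

Under this assumption, a request at 11:24:57 falls into the 11:20 to 11:30 window with the default period, and into the 11:24 to 11:25 window with `base_period=1`.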

```python
project.enable_model_monitoring(base_period=1)
```


## Log the model artifacts

See full parameter details in {py:meth}`~mlrun.projects.MlrunProject.log_model`.

```python
iris = load_iris()
train_set = pd.DataFrame(
    iris["data"],
    columns=["sepal_length_cm", "sepal_width_cm", "petal_length_cm", "petal_width_cm"],
)

model_name = "RandomForestClassifier"
project.log_model(
    model_name,
    model_file="model.pkl",
    training_set=train_set,
    framework="sklearn",
)
```

<mlrun.artifacts.model.ModelArtifact at 0x7f2f1f7cadf0>

## Import, enable monitoring, and deploy the serving function

Import the [model server](https://github.com/mlrun/functions/tree/master/v2_model_server) function from the [MLRun Function Hub](https://www.mlrun.org/hub/), add the model that was logged via experiment tracking, and enable drift detection.

The core line here is `serving_fn.set_tracking()`, which creates the required infrastructure behind the scenes to perform drift detection. See {ref}`model-monitoring` for more details on what is deployed.

Then deploy the serving function, with drift detection enabled, using a single line of code.

As a result of this step, the model-monitoring stream pod writes the data to Parquet, organized by model endpoint. Every base period, the controller checks for new data and, if it finds any, sends it to the relevant monitoring applications.
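The dispatch logic described above can be sketched in plain Python: records are filtered to one window, grouped by model endpoint, and each group is passed to every registered application. The record layout (`endpoint_id`, `timestamp`, `inputs`) and the `dispatch_window` and `count_app` functions are hypothetical and only approximate the idea; the real controller reads the Parquet data and streams it to the applications.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Callable

# Hypothetical record layout: {"endpoint_id": str, "timestamp": datetime, "inputs": list}
Record = dict[str, Any]
App = Callable[[str, list[Record]], Any]


def dispatch_window(
    records: list[Record], window_start: datetime, window_end: datetime, apps: list[App]
) -> list[Any]:
    """Send the records of [window_start, window_end) to every app, one call per endpoint."""
    by_endpoint: dict[str, list[Record]] = defaultdict(list)
    for rec in records:
        if window_start <= rec["timestamp"] < window_end:
            by_endpoint[rec["endpoint_id"]].append(rec)
    results = []
    # Endpoints without new data in this window are skipped entirely
    for endpoint_id in sorted(by_endpoint):
        for app in apps:
            results.append(app(endpoint_id, by_endpoint[endpoint_id]))
    return results


def count_app(endpoint_id: str, batch: list[Record]) -> tuple[str, int]:
    # Toy application: report how many requests the endpoint received in the window
    return endpoint_id, len(batch)


def at(minute: int) -> datetime:
    # Convenience for building example timestamps on a fixed date and hour (UTC)
    return datetime(2024, 3, 27, 11, minute, tzinfo=timezone.utc)


records = [
    {"endpoint_id": "ep-a", "timestamp": at(21), "inputs": [0.5] * 4},
    {"endpoint_id": "ep-a", "timestamp": at(25), "inputs": [0.5] * 4},
    {"endpoint_id": "ep-b", "timestamp": at(31), "inputs": [0.5] * 4},
]
print(dispatch_window(records, at(20), at(30), [count_app]))
```

Only `ep-a` has data in the 11:20 to 11:30 window, so `ep-b` is not sent to any application until its own window is processed.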

```python
# Import the serving function
serving_fn = import_function(
    "hub://v2_model_server", project=project_name, new_name="serving"
)

# Add the model to the serving function's routing spec
serving_fn.add_model(
    model_name, model_path=f"store://models/{project_name}/{model_name}:latest"
)

# Enable monitoring on this serving function
serving_fn.set_tracking()

serving_fn.spec.build.requirements = ["scikit-learn"]

# Deploy the serving function
project.deploy_function(serving_fn)
```

> 2024-03-27 11:23:03,268 [info] Starting remote function deploy
2024-03-27 11:23:05  (info) Deploying function
2024-03-27 11:23:05  (info) Building
2024-03-27 11:23:06  (info) Staging files and preparing base images
2024-03-27 11:23:06  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-03-27 11:23:06  (info) Building processor image
2024-03-27 11:24:41  (info) Build complete
2024-03-27 11:24:50  (info) Function deploy complete
> 2024-03-27 11:24:57,598 [info] Successfully deployed function: {'internal_invocation_urls': ['nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['mm-app-project-serving.default-tenant.app.dev13.lab.iguazeng.com/']}


DeployStatus(state=ready, outputs={'endpoint': 'http://mm-app-project-serving.default-tenant.app.dev13.lab.iguazeng.com/', 'name': 'mm-app-project-serving'})

## View deployed resources

At this point, you should see the controller and the model-monitoring-batch jobs in the UI under **Projects | Jobs and Workflows**.

## Invoke the model

See full parameter details in {py:meth}`~mlrun.runtimes.RemoteRuntime.invoke`.
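`invoke` sends an HTTP POST with a JSON body to the model's `infer` path. As a rough equivalent outside the MLRun SDK, the following sketch builds the same kind of request with the Python standard library. `build_infer_request` is a hypothetical helper, and the base URL is a placeholder: substitute the external invocation URL printed when the serving function was deployed. The body mirrors the `{"inputs": [...]}` payload used in this tutorial.

```python
import json
import urllib.request


def build_infer_request(
    base_url: str, model_name: str, rows: list[list[float]]
) -> urllib.request.Request:
    """Build (but do not send) a POST request to <base_url>/v2/models/<model_name>/infer."""
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    body = json.dumps({"inputs": rows}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


# Placeholder URL: replace with your serving function's external invocation URL
req = build_infer_request(
    "http://localhost:8080/", "RandomForestClassifier", [[0.5, 0.5, 0.5, 0.5]]
)
print(req.get_method(), req.full_url)

# To actually send the request against a live endpoint:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```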

```python
import json
from time import sleep
from random import choice

iris = load_iris()
iris_data = iris["data"].tolist()

model_name = "RandomForestClassifier"
serving_1 = project.get_function("serving")

for i in range(150):
    # Send a constant data point; uncomment the next line to send random Iris rows instead
    # data_point = choice(iris_data)
    data_point = [0.5, 0.5, 0.5, 0.5]
    serving_1.invoke(
        f"v2/models/{model_name}/infer", json.dumps({"inputs": [data_point]})
    )
    # Short random pause between requests
    sleep(choice([0.01, 0.04]))
```

> 2024-03-27 11:24:57,744 [info] Invoking function: {'method': 'POST', 'path': 'http://nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer'}
> 2024-03-27 11:24:58,156 [info] Invoking function: {'method': 'POST', 'path': 'http://nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer'}
> 2024-03-27 11:24:58,184 [info] Invoking function: {'method': 'POST', 'path': 'http://nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer'}
> 2024-03-27 11:24:58,241 [info] Invoking function: {'method': 'POST', 'path': 'http://nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer'}
> 2024-03-27 11:24:58,300 [info] Invoking function: {'method': 'POST', 'path': 'http://nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer'}

At this stage you can see the model endpoints and minimal metadata (for example, last prediction and average latency) in the **Models | Model Endpoints** page.

<img src="./_static/images/model_endpoint_1.png" width="1000" >

You can also see the basic statistics in Grafana.

## Register and deploy the model-monitoring jobs

The next step is to deploy the model-monitoring applications, which generate the full metadata. Add each monitoring function to the project using {py:meth}`~mlrun.projects.MlrunProject.set_model_monitoring_function`, 
then deploy it using {py:meth}`~mlrun.projects.MlrunProject.deploy_function`.

This example illustrates two monitoring applications:
- The first is a simple demo application, `DemoMonitoringApp`, defined in `./assets/demo_app.py`.
- The second integrates [Evidently](https://github.com/evidentlyai/evidently) as an MLRun function to create MLRun artifacts.

After deployment, the monitoring functions appear in the UI under **Real-time functions (Nuclio)**.
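Monitoring applications typically compare the distribution of live inputs against the training set. As an illustration of that kind of computation (it is not the code of `DemoMonitoringApp` or of the Evidently app), the following sketch bins one feature into a histogram and computes two common histogram distances: total variation distance and Hellinger distance. The reference values are made up, in the range of Iris sepal lengths, and the bin range and bin count are arbitrary assumptions.

```python
import math


def histogram(values: list[float], lo: float, hi: float, bins: int) -> list[int]:
    """Count values into equal-width bins over [lo, hi]; out-of-range values go to the edge bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return counts


def normalize(counts: list[int]) -> list[float]:
    total = sum(counts)
    return [c / total for c in counts]


def total_variation_distance(p: list[float], q: list[float]) -> float:
    # Half the L1 distance between two probability vectors: 0 = identical, 1 = disjoint
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger_distance(p: list[float], q: list[float]) -> float:
    # sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2)); also ranges from 0 to 1
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Made-up reference values vs. the constant 0.5 sent in the first batch of requests
reference = [5.1, 4.9, 6.3, 5.8, 7.1, 6.5, 5.0, 6.7]
live = [0.5] * 8
p = normalize(histogram(reference, 0.0, 8.0, 8))
q = normalize(histogram(live, 0.0, 8.0, 8))
print(total_variation_distance(p, q), hellinger_distance(p, q))
```

Because the live sample is a single repeated point that lies outside the reference range, the two histograms do not overlap at all, and both distances reach their maximum of 1.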

```python
# Register the first app, named "myApp"
my_app = project.set_model_monitoring_function(
    func="./assets/demo_app.py",
    application_class="DemoMonitoringApp",
    name="myApp",
)

project.deploy_function(my_app)
```

> 2024-03-27 11:25:05,047 [info] Starting remote function deploy
2024-03-27 11:25:05  (info) Deploying function
2024-03-27 11:25:05  (info) Building
2024-03-27 11:25:06  (info) Staging files and preparing base images
2024-03-27 11:25:06  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-03-27 11:25:06  (info) Building processor image
2024-03-27 11:26:31  (info) Build complete
2024-03-27 11:27:03  (info) Function deploy complete
> 2024-03-27 11:27:07,628 [info] Successfully deployed function: {'internal_invocation_urls': ['nuclio-mm-app-project-myapp.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['mm-app-project-myapp.default-tenant.app.dev13.lab.iguazeng.com/']}


DeployStatus(state=ready, outputs={'endpoint': 'http://mm-app-project-myapp.default-tenant.app.dev13.lab.iguazeng.com/', 'name': 'mm-app-project-myapp'})

```python
# Register the second app, named "MyEvidentlyApp"
my_evidently_app = project.set_model_monitoring_function(
    func="./assets/evidently_app.py",
    image="mlrun/mlrun",
    requirements=[
        "evidently~=0.4.3",
    ],
    name="MyEvidentlyApp",
    application_class="DemoEvidentlyMonitoringApp",
    evidently_workspace_path=os.path.abspath(
        f"/v3io/projects/{project_name}/artifacts/evidently_workspace"
    ),
    evidently_project_id=str(uuid.uuid4()),
)

project.deploy_function(my_evidently_app)
```

> 2024-03-27 11:27:07,787 [info] Starting remote function deploy
2024-03-27 11:27:08  (info) Deploying function
2024-03-27 11:27:08  (info) Building
2024-03-27 11:27:08  (info) Staging files and preparing base images
2024-03-27 11:27:08  (warn) Using user provided base image, runtime interpreter version is provided by the base image
2024-03-27 11:27:08  (info) Building processor image
2024-03-27 11:28:33  (info) Build complete
2024-03-27 11:29:09  (info) Function deploy complete
> 2024-03-27 11:29:10,021 [info] Successfully deployed function: {'internal_invocation_urls': ['nuclio-mm-app-project-myevidentlyapp.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['mm-app-project-myevidentlyapp.default-tenant.app.dev13.lab.iguazeng.com/']}


DeployStatus(state=ready, outputs={'endpoint': 'http://mm-app-project-myevidentlyapp.default-tenant.app.dev13.lab.iguazeng.com/', 'name': 'mm-app-project-myevidentlyapp'})

## Invoke the model again

The controller checks for new data every `base_period` and sends it to the applications. Invoking the model a second time ensures that the previous window has closed, so the data covers a full monitoring window. From this point on, the applications are triggered by the controller: every 10 minutes (or the non-default `base_period`), the controller checks the Parquet data and streams any new data to the applications.
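The idea of a closed window can be sketched as follows: starting from the end of the last processed window, only windows that have fully elapsed by the current time are ready to be analyzed. `closed_windows` is a hypothetical helper for illustration, not the controller's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def closed_windows(
    last_processed_end: datetime, now: datetime, base_period_minutes: int
) -> list[tuple[datetime, datetime]]:
    """Return consecutive [start, end) windows after last_processed_end whose end is <= now."""
    if base_period_minutes <= 0:
        raise ValueError("base_period_minutes must be positive")
    period = timedelta(minutes=base_period_minutes)
    windows = []
    start = last_processed_end
    # A window whose end is still in the future is open and is left for a later run
    while start + period <= now:
        windows.append((start, start + period))
        start += period
    return windows


last_end = datetime(2024, 3, 27, 11, 20, tzinfo=timezone.utc)
now = datetime(2024, 3, 27, 11, 34, 53, tzinfo=timezone.utc)
print(closed_windows(last_end, now, 10))
print(len(closed_windows(last_end, now, 1)))
```

At 11:34:53 with a 10-minute period, only the 11:20 to 11:30 window is closed; the 11:30 to 11:40 window is picked up on a later run.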

```python
import json
from time import sleep
from random import choice

iris = load_iris()
iris_data = iris["data"].tolist()

model_name = "RandomForestClassifier"
serving_1 = project.get_function("serving")

for i in range(150):
    # This time, send random rows from the Iris dataset
    data_point = choice(iris_data)
    # data_point = [0.5,0.5,0.5,0.5]
    serving_1.invoke(
        f"v2/models/{model_name}/infer", json.dumps({"inputs": [data_point]})
    )
    sleep(choice([0.01, 0.04]))
```

> 2024-03-27 11:34:53,930 [info] Invoking function: {'method': 'POST', 'path': 'http://nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer'}
> 2024-03-27 11:34:53,964 [info] Invoking function: {'method': 'POST', 'path': 'http://nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer'}
> 2024-03-27 11:34:54,022 [info] Invoking function: {'method': 'POST', 'path': 'http://nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer'}
> 2024-03-27 11:34:54,082 [info] Invoking function: {'method': 'POST', 'path': 'http://nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer'}
> 2024-03-27 11:34:54,141 [info] Invoking function: {'method': 'POST', 'path': 'http://nuclio-mm-app-project-serving.default-tenant.svc.cluster.local:8080/v2/models/RandomForestClassifier/infer'}

Now you can view the application results. 

<img src="./_static/images/mm-myapp.png" >

And if you've used Evidently:

<img src="./_static/images/mm-logger-dashb-evidently.png" >

<img src="./_static/images/mm-evidently.png" >



## View the status of the model monitoring jobs 

View the model monitoring jobs in **Jobs and Workflows**. Model monitoring jobs run continuously, so they should 
have a blue dot indicating that the function is running. (A green dot indicates that the job completed.)

For more information on the UI, see [Model monitoring using the platform UI](../monitoring/model-monitoring-deployment.html#model-monitoring-in-the-platform-ui).

<img src="./_static/images/mm-monitor-jobs.png" >

<a id="view-dashboards"></a>
## View detailed drift dashboards

Grafana has detailed dashboards that show additional information on each model in the project. For more information on the dashboards, see [Model monitoring using Grafana dashboards](../monitoring/model-monitoring-deployment.html#model-monitoring-using-grafana-dashboards).

![grafana_dashboard_1](./_static/images/grafana_dashboard_1.png)

Graphs of individual features over time:

![grafana_dashboard_2](./_static/images/grafana_dashboard_2.png)

As well as drift and operational metrics over time:

![grafana_dashboard_3](./_static/images/grafana_dashboard_3.png)

## Done!
Congratulations! You’ve completed Part 5 of the MLRun getting-started tutorial. To continue, proceed to [Part 6: Batch inference and drift detection](./07-batch-infer).