# Azure ML Model Monitoring Demo - Offline Monitoring (IN PROGRESS)

This series of sample notebooks showcases [AML's continuous model monitoring capabilities](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-cli). The notebooks in this repo cover core operations including model training, deployment, simulated production data scoring, and inference data collection. They are designed to be run in order and include the following steps:

- 00. Data Upload - Load time-series weather data from a local CSV into an AML datastore and register it as training and evaluation datasets.
- 01. Model Training - Train a custom temperature prediction regression model using MLflow and scikit-learn, and register it in your AML workspace.
- 02. Model Deployment - Deploy your newly trained model to a Managed Online Endpoint with production data collection configured.
- 03. Production Data Simulation - Send time-series data to your endpoint at a slow rate to simulate production inferencing. All submitted data will be collected automatically.
- 04. Monitoring Configuration - Configure a production model data monitor that looks for drift in inferencing data and scored results, which can indicate that retraining should be performed.
- <b>05. Offline Monitoring - Sample notebook showcasing how to identify drift in data from datasets scored outside of Azure ML.</b>

<b>This notebook showcases data drift monitoring in an offline mode. Model monitoring can also analyze production data that was scored outside of AML. In this scenario, you register your dataset in AML and then configure your monitor to analyze continuously updating production datasets. This functionality and the sample below are based on [this example in Microsoft's documentation](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=python#set-up-model-monitoring-by-bringing-your-own-production-data-to-azure-machine-learning).</b>
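
The monitor inputs later in this notebook reference data assets of type `mltable`, so data scored outside of AML needs an `MLTable` definition before it can be registered. The following is a minimal sketch of writing such a definition next to a folder of CSV files. The folder layout, the helper name `write_mltable_definition`, and the choice of delimiter and encoding are assumptions for illustration, not the setup used to produce the datasets in this repo.

```python
from pathlib import Path

# MLTable definition: read every CSV file in the folder as one delimited table.
# Assumption: all files are comma-delimited, UTF-8 encoded, and share a header row.
MLTABLE_YAML = """\
$schema: https://azuremlschemas.azureedge.net/latest/MLTable.schema.json
type: mltable
paths:
  - pattern: ./*.csv
transformations:
  - read_delimited:
      delimiter: ","
      encoding: utf8
      header: all_files_same_headers
"""


def write_mltable_definition(data_dir: Path) -> Path:
    """Write an `MLTable` file into data_dir (hypothetical helper) and return its path."""
    data_dir.mkdir(parents=True, exist_ok=True)
    mltable_path = data_dir / "MLTable"
    mltable_path.write_text(MLTABLE_YAML, encoding="utf-8")
    return mltable_path
```

Once the folder contains both the CSV files and the `MLTable` file, it can be registered as a data asset of type `mltable`, for example with `ml_client.data.create_or_update` and a `Data` entity from `azure.ai.ml.entities`.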

### Import required packages

In [None]:
# Azure ML client, credentials, and MLflow tracking
from azure.ai.ml import Input, MLClient
from azure.identity import DefaultAzureCredential
from mlflow import set_tracking_uri

# Monitoring constants and entities
from azure.ai.ml.constants import (
    MonitorFeatureType,
    MonitorMetricName,
    MonitorDatasetContext
)
from azure.ai.ml.entities import (
    AlertNotification,
    DataDriftSignal,
    DataQualitySignal,
    DataDriftMetricThreshold,
    DataQualityMetricThreshold,
    MonitorFeatureFilter,
    MonitorInputData,
    MonitoringTarget,
    MonitorDefinition,
    MonitorSchedule,
    RecurrencePattern,
    RecurrenceTrigger,
    SparkResourceConfiguration,
    TargetDataset
)

### Establish connection to Azure ML workspace

In [None]:
subscription_id = "<your_subscription_id>"
resource_group = "<your_resource_group>"
workspace_name = "<your_workspace_name>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace_name)
workspace = ml_client.workspaces.get(workspace_name)
tracking_uri = workspace.mlflow_tracking_uri

set_tracking_uri(tracking_uri)

### Configure data monitor

Drift analysis here uses two datasets. The baseline is the `weather-training-data` dataset we used in Notebook 01 to train our model. The target is the `scored-weather-evaluation-data` dataset, which we scored with our trained model and subsequently registered.
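
The drift signal configured below compares numerical features with the Jensen-Shannon distance and alerts when the value exceeds a threshold. To give a feel for what that number measures, here is a minimal sketch that computes the distance between two histograms over the same bins. It assumes a base-2 logarithm (so the result lies between 0 and 1) and pre-binned counts; it is not Azure ML's implementation, and the binning used by the service may differ.

```python
import math
from typing import Sequence


def normalize(counts: Sequence[float]) -> list[float]:
    """Turn raw bin counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; bins with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(
    baseline_counts: Sequence[float], target_counts: Sequence[float]
) -> float:
    """Square root of the Jensen-Shannon divergence between two histograms."""
    if len(baseline_counts) != len(target_counts):
        raise ValueError("both histograms must use the same bins")
    p = normalize(baseline_counts)
    q = normalize(target_counts)
    # Mixture distribution; it is non-zero wherever p or q is non-zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))
```

Under these assumptions, identical histograms give a distance of 0 and histograms with no overlapping bins give 1, so a threshold such as 0.01 flags even small shifts between the baseline and target distributions.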

In [None]:
# Spark compute configuration used to run the monitoring job
spark_configuration = SparkResourceConfiguration(
    instance_type="standard_e4s_v3",
    runtime_version="3.2"
)

# define target dataset (production dataset)
input_data = MonitorInputData(
    input_dataset=Input(
        type="mltable",
        path="azureml:scored-weather-evaluation-data:1"
    ),
    dataset_context=MonitorDatasetContext.MODEL_INPUTS
)

input_data_target = TargetDataset(dataset=input_data)

# training data to be used as baseline dataset
input_data_baseline = MonitorInputData(
    input_dataset=Input(
        type="mltable",
        path="azureml:weather-training-data:4"
    ),
    dataset_context=MonitorDatasetContext.TRAINING
)

# create an advanced data drift signal
features = MonitorFeatureFilter(top_n_feature_importance=20)
numerical_metric_threshold = DataDriftMetricThreshold(
    applicable_feature_type=MonitorFeatureType.NUMERICAL,
    metric_name=MonitorMetricName.JENSEN_SHANNON_DISTANCE,
    threshold=0.01
)
categorical_metric_threshold = DataDriftMetricThreshold(
    applicable_feature_type=MonitorFeatureType.CATEGORICAL,
    metric_name=MonitorMetricName.PEARSONS_CHI_SQUARED_TEST,
    threshold=0.02
)
metric_thresholds = [numerical_metric_threshold, categorical_metric_threshold]

advanced_data_drift = DataDriftSignal(
    target_dataset=input_data_target,
    baseline_dataset=input_data_baseline,
    features=features,
    metric_thresholds=metric_thresholds
)

# create an advanced data quality signal
features = MonitorFeatureFilter(top_n_feature_importance=20)
numerical_metric_threshold = DataQualityMetricThreshold(
    applicable_feature_type=MonitorFeatureType.NUMERICAL,
    metric_name=MonitorMetricName.NULL_VALUE_RATE,
    threshold=0.01
)
categorical_metric_threshold = DataQualityMetricThreshold(
    applicable_feature_type=MonitorFeatureType.CATEGORICAL,
    metric_name=MonitorMetricName.OUT_OF_BOUND_RATE,
    threshold=0.02
)
metric_thresholds = [numerical_metric_threshold, categorical_metric_threshold]

advanced_data_quality = DataQualitySignal(
    target_dataset=input_data_target,
    baseline_dataset=input_data_baseline,
    features=features,
    metric_thresholds=metric_thresholds,
    alert_enabled=False  # boolean, not the string "False" (a non-empty string is truthy)
)

# put all monitoring signals in a dictionary
monitoring_signals = {
    'data_drift_advanced': advanced_data_drift,
    'data_quality_advanced': advanced_data_quality
}

# create alert notification object
alert_notification = AlertNotification(
    emails=["<your_email_address>"]
)

# finally, create the monitor definition
monitor_definition = MonitorDefinition(
    compute=spark_configuration,
    monitoring_signals=monitoring_signals,
    alert_notification=alert_notification
)

# run the monitor once per day at 03:15
recurrence_trigger = RecurrenceTrigger(
    frequency="day",
    interval=1,
    schedule=RecurrencePattern(hours=3, minutes=15)
)

model_monitor = MonitorSchedule(
    name="offline-weather-data-monitoring",
    trigger=recurrence_trigger,
    create_monitor=monitor_definition
)

# submit the monitoring schedule and wait for the operation to finish
poller = ml_client.schedules.begin_create_or_update(model_monitor)
created_monitor = poller.result()
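
The data quality signal above applies a `NULL_VALUE_RATE` threshold of 0.01 to numerical features. As a rough local sanity check before relying on the monitor, the sketch below computes the fraction of missing values per column in a CSV and lists the columns that exceed a threshold. The set of tokens treated as null, the helper names, and the sample column names are assumptions for illustration; the service's own definition of a null value may differ.

```python
import csv
import io

# Assumption: these strings (case-insensitive) count as missing values.
NULL_TOKENS = {"", "na", "nan", "null", "none"}


def null_value_rates(csv_text: str) -> dict[str, float]:
    """Return the fraction of missing values for each column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    null_counts = {column: 0 for column in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for column in columns:
            value = row.get(column)
            # Short rows yield None for the missing trailing fields.
            if value is None or value.strip().lower() in NULL_TOKENS:
                null_counts[column] += 1
    if row_count == 0:
        return {column: 0.0 for column in columns}
    return {column: null_counts[column] / row_count for column in columns}


def columns_over_threshold(rates: dict[str, float], threshold: float) -> list[str]:
    """List the columns whose null value rate is strictly above the threshold."""
    return sorted(column for column, rate in rates.items() if rate > threshold)


# Hypothetical sample: one of four temperature readings is missing.
sample = "temperature,humidity\n20.5,0.4\n,0.5\n19.0,0.7\n18.2,0.6\n"
rates = null_value_rates(sample)
flagged = columns_over_threshold(rates, threshold=0.01)
```

With this sample, `temperature` has a null value rate of 0.25 and is flagged, while `humidity` has a rate of 0.0 and is not.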