# Azure ML Model Monitoring Demo - Data Upload

A series of sample notebooks that showcase [AML's continuous model monitoring capabilities](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-cli). The notebooks in this repo cover the core operations: model training, deployment, simulated production data scoring, and inference data collection. They are designed to be run in order and include the following steps:

- <b>00. Data Upload - Load time-series weather data from a local CSV into an AML datastore, and register as training & evaluation datasets</b>
- 01. Model Training - Train a custom temperature prediction regression model using MLflow & scikit-learn and register it in your AML workspace
- 02. Model Deployment - Deploy your newly trained model to a Managed Online Endpoint with production data collection configured.
- 03. Production Data Simulation - Send time-series data to your endpoint at a slow rate to simulate production inferencing. All submitted data will be collected automatically.
- 04. Monitoring Configuration - Configure a production model data monitor that looks for drift in inferencing data and scored results, which can indicate that retraining should be performed.
- 05. Offline Monitoring - Sample notebook showcasing how to identify drift in data from datasets scored outside of Azure ML.

<b>This notebook uploads a CSV of time-series weather data collected between January and October 2019, which is used as the basis for our demonstration. This dataset was chosen because observed weather patterns naturally drift between months, which should register in our downstream analysis.</b>

### Install azureml-fsspec and mltable packages if not previously installed in environment

In [None]:
# ! pip install -U azureml-fsspec mltable

### Import required packages

In [None]:
import pandas as pd
import mltable
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data
from azure.identity import DefaultAzureCredential
from azureml.fsspec import AzureMachineLearningFileSystem

### Establish connection to Azure ML workspace using the v2 SDK

In [None]:
# Replace the placeholders with your own workspace details
subscription_id = "<your_subscription_id>"
resource_group = "<your_resource_group>"
workspace_name = "<your_workspace_name>"

# Authenticate with the default credential chain and connect to the workspace
ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace_name)
workspace = ml_client.workspaces.get(workspace_name)

### Load weather data from CSV and partition across months

Read the CSV data into a pandas DataFrame and split it into training (Jan-Mar) and evaluation (Apr-Oct) subsets. These months were chosen because the weather patterns between them are characteristically different. Save both subsets, along with the full dataset, to CSV files.

In [None]:
df = pd.read_csv('./CleanedWeatherData.csv')

df['month'] = pd.to_numeric(df['month'])

df_jan_mar_2019 = df[df['month'] < 4]
df_apr_oct_2019 = df[df['month'] >= 4]
df_jan_oct_2019 = df

df_jan_mar_2019.to_csv('./weather_training_data.csv', index=False)
df_apr_oct_2019.to_csv('./weather_eval_data.csv', index=False)
df_jan_oct_2019.to_csv('./weather_full_data.csv', index=False)
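
A quick sanity check can confirm that the month filter partitions the data cleanly. The sketch below uses a small made-up DataFrame in place of `CleanedWeatherData.csv`. The `temperature` column and all values are assumptions for illustration; only the `month` column and the split thresholds are taken from the notebook.

```python
import pandas as pd

# Toy stand-in for the weather data; values are invented for illustration
toy = pd.DataFrame({
    "month": ["1", "3", "4", "7", "10"],  # months may arrive as strings from a CSV
    "temperature": [30.5, 45.0, 55.2, 80.1, 60.3],
})
toy["month"] = pd.to_numeric(toy["month"])

train = toy[toy["month"] < 4]    # Jan-Mar
evals = toy[toy["month"] >= 4]   # Apr-Oct

# The two subsets should be disjoint and together cover every row
assert len(train) + len(evals) == len(toy)
assert train.index.intersection(evals.index).empty
print(len(train), len(evals))
```

The same two assertions could be run against `df_jan_mar_2019`, `df_apr_oct_2019`, and `df` before writing the CSV files.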

### Upload subsetted weather data to Azure ML datastore

In [None]:
datastore_name = 'workspaceblobstore'  # default workspace datastore
path_on_datastore = 'weather_data'

# Long-form datastore URI format
uri = f'azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}/workspaces/{workspace_name}/datastores/{datastore_name}/paths/'

# Instantiate the file system from the datastore URI
fs = AzureMachineLearningFileSystem(uri)

# Upload each CSV into the target folder; recursive=False because each lpath is a single file
for local_file in ['./weather_training_data.csv', './weather_eval_data.csv', './weather_full_data.csv']:
    fs.upload(lpath=local_file, rpath=path_on_datastore, recursive=False, **{'overwrite': 'MERGE_WITH_OVERWRITE'})
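
The long-form datastore URI above follows a fixed pattern, so it can be assembled by a small helper. The sketch below is illustrative only: `build_datastore_uri` is a hypothetical name, not part of any Azure SDK, and the subscription, resource group, and workspace values are placeholders.

```python
def build_datastore_uri(subscription_id, resource_group, workspace_name,
                        datastore_name="workspaceblobstore", path=""):
    """Assemble a long-form azureml:// datastore URI (hypothetical helper)."""
    base = (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace_name}"
        f"/datastores/{datastore_name}/paths/"
    )
    # Strip any leading slash so the result never contains 'paths//'
    return base + path.lstrip("/")


# Placeholder values for illustration only
example_uri = build_datastore_uri(
    "sub-id", "my-rg", "my-ws", path="weather_data/weather_training_data.csv"
)
print(example_uri)
```

Calling the helper with an empty `path` yields the same root URI that is passed to `AzureMachineLearningFileSystem` in this notebook.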

### Register uploaded datasets as reusable data assets within the AML workspace

In [None]:
tbl = mltable.from_delimited_files([{'pattern': uri + 'weather_data/weather_training_data.csv'}])
tbl.save('./training_data')

training_data = Data(
    path = './training_data',
    type = AssetTypes.MLTABLE,
    description = 'January to March 2019 Weather Data',
    name='weather-training-data',
    version="5"
)
ml_client.data.create_or_update(training_data)

tbl = mltable.from_delimited_files([{'pattern': uri + 'weather_data/weather_eval_data.csv'}])
tbl.save('./eval_data')

eval_data = Data(
    path = './eval_data',
    type = AssetTypes.MLTABLE,
    description = 'April to October 2019 Weather Data',
    name='weather-evaluation-data',
    version="5"
)
ml_client.data.create_or_update(eval_data)

tbl = mltable.from_delimited_files([{'pattern': uri + 'weather_data/weather_full_data.csv'}])
tbl.save('./full_data')

full_data = Data(
    path = './full_data',
    type = AssetTypes.MLTABLE,
    description = 'January to October 2019 Weather Data',
    name='weather-full-data',
    version="5"
)
ml_client.data.create_or_update(full_data)
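
The three registration blocks above differ only in file name, local folder, asset name, and description, so they could be driven from a small table of specs. The following is a sketch under that assumption: `asset_specs` and `register_assets` are hypothetical helpers, the Azure packages are imported inside `register_assets` so the spec-building part can run without them, and the version defaults to the `"5"` used above.

```python
def asset_specs(uri, folder="weather_data"):
    """Return one spec per data asset registered in this notebook (hypothetical helper)."""
    rows = [
        ("weather_training_data.csv", "./training_data",
         "weather-training-data", "January to March 2019 Weather Data"),
        ("weather_eval_data.csv", "./eval_data",
         "weather-evaluation-data", "April to October 2019 Weather Data"),
        ("weather_full_data.csv", "./full_data",
         "weather-full-data", "January to October 2019 Weather Data"),
    ]
    return [
        {
            "pattern": f"{uri}{folder}/{file_name}",
            "local_dir": local_dir,
            "name": name,
            "description": description,
        }
        for file_name, local_dir, name, description in rows
    ]


def register_assets(ml_client, uri, version="5"):
    """Save an MLTable per spec and register it, mirroring the cells above."""
    # Imported lazily so asset_specs can be used without the Azure packages installed
    import mltable
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import Data

    for spec in asset_specs(uri):
        tbl = mltable.from_delimited_files([{"pattern": spec["pattern"]}])
        tbl.save(spec["local_dir"])
        ml_client.data.create_or_update(Data(
            path=spec["local_dir"],
            type=AssetTypes.MLTABLE,
            description=spec["description"],
            name=spec["name"],
            version=version,
        ))


# Placeholder URI for illustration only
specs = asset_specs("azureml://example/paths/")
print([s["name"] for s in specs])
```

In the notebook itself, a call such as `register_assets(ml_client, uri)` would stand in for the three separate blocks.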