# Train and deploy a model

> [!NOTE]
> This demo requires Python 3.10 and the Azure ML Python SDK v2.

## Train a model

Initiate a connection to the Azure ML workspace and set up MLflow for tracking.

```python
# Get a handle to the workspace
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
import mlflow

# Read the workspace details from the workspace's config.json file
ml_client = MLClient.from_config(
    DefaultAzureCredential()
)

# Get the MLflow tracking URI from the workspace and point MLflow at it
azureml_mlflow_uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
mlflow.set_tracking_uri(azureml_mlflow_uri)
```

Import necessary libraries and set up the experiment in MLflow.

```python
# Import Python packages
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
import numpy as np
import os

experiment_name = "Monitoring-Models-Experiment"
mlflow.set_experiment(experiment_name)
```

Load the registered data asset, convert it to a pandas DataFrame, and create the directory for saving the model.

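```python
import mltable

# Get the latest version of the registered data asset
data_asset = ml_client.data.get("diabetes-mltable-dev", label="latest")

# Load the MLTable and convert it to a pandas DataFrame
tbl = mltable.load(data_asset.path)
df = tbl.to_pandas_dataframe()

# Preview the data (a bare `df` mid-cell would not be displayed)
print(df.head())

# Create the local directory for saving the model
model_path = "./models/monitoring"
os.makedirs(model_path, exist_ok=True)
```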

Start logging the training process in MLflow, train a Decision Tree model, and log the model performance metrics.

```python
# Delete the model directory if it exists, because mlflow.sklearn.save_model
# can fail when the target directory already exists
import shutil
if os.path.exists(model_path):
    shutil.rmtree(model_path)

# Start logging; the run is ended in the next cell with mlflow.end_run()
mlflow.start_run()

# Enable autologging (optional)
# mlflow.sklearn.autolog()

diabetes = df

# Split the data into input features (X) and the target column (y)
feature_columns = ['Pregnancies', 'PlasmaGlucose', 'DiastolicBloodPressure', 'TricepsThickness',
                   'SerumInsulin', 'BMI', 'DiabetesPedigree', 'Age']
X, y = diabetes[feature_columns].values, diabetes['Diabetic'].values

# Split the data into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a decision tree classifier
model = DecisionTreeClassifier().fit(X_train, y_train)

# Evaluate on the test set and log the metrics to MLflow
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
mlflow.log_metric('Accuracy', float(acc))

# AUC uses the predicted probability of the positive class (column 1)
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:, 1])
mlflow.log_metric('AUC', float(auc))
```
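Accuracy is simply the fraction of test rows where the prediction matches the label. AUC is less obvious: it equals the probability that a randomly chosen positive row gets a higher score than a randomly chosen negative row, with ties counting as half. The plain-Python sketch below computes both metrics from those definitions to show what the logged numbers mean; the function names `accuracy` and `pairwise_auc` are hypothetical, and for real data `roc_auc_score` is the efficient choice.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pairwise_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n_pos * n_neg), so it is
    meant for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


y_true = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.8, 0.85]  # predicted probability of the positive class
print(accuracy(y_true, [1, 0, 1, 1]))  # 0.75
print(pairwise_auc(y_true, scores))    # 0.75: 3 of 4 positive/negative pairs ranked correctly
```

Note that accuracy depends on the hard predictions from `model.predict`, while AUC only depends on how the scores from `model.predict_proba` rank the rows, which is why the cell computes them from different outputs.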

Infer the model signature, register the model to the workspace, and save the model to a file.

```python
import mlflow.sklearn
from mlflow.models import infer_signature

# Infer the model's input/output schema from the test inputs and predictions
signature = infer_signature(X_test, y_hat)

# Log the model to the active run and register it in the workspace
print("Registering the model via MLflow")
mlflow.sklearn.log_model(
    sk_model=model,
    registered_model_name="monitoring-diabetes",
    artifact_path="model",
    signature=signature,
)

# Save a local copy of the model to model_path
mlflow.sklearn.save_model(
    sk_model=model,
    path=model_path,
    signature=signature
)

# Stop logging
mlflow.end_run()
```


## Deploy a Model

After training and registering the model, it's time to deploy it. The easiest and most illustrative way is to use the AzureML Studio UI.

Under `Models`, select the model you just trained (`monitoring-diabetes`), then select `Deploy` > `Real time endpoint`.

- This creates a new deployment on a new or existing endpoint (you can deploy multiple models behind one endpoint).
- Keep the instance count at 3 so that autoscaling can be added later.
- Make sure to enable data collection (in preview as of November 2023).

The deployment takes a few minutes, so grab a cup of coffee.
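If you prefer code over the Studio UI, the same deployment can be sketched with the Azure ML Python SDK v2. This is a minimal sketch under assumptions: the endpoint name, the deployment name (`blue`), and the VM size (`Standard_DS3_v2`) are hypothetical placeholders, the helper name `deploy_with_data_collection` is ours, and the `DataCollector` settings approximate the Studio's "enable data collection" option rather than reproduce it exactly. Because the model was logged with MLflow, no scoring script or environment is passed.

```python
def deploy_with_data_collection(ml_client,
                                model_name="monitoring-diabetes",
                                endpoint_name="diabetes-monitoring-endpoint",  # hypothetical
                                deployment_name="blue",                        # hypothetical
                                instance_type="Standard_DS3_v2",               # hypothetical VM size
                                instance_count=3):
    """Create an endpoint plus an MLflow model deployment with data collection enabled.

    Endpoint names must be unique within an Azure region, so pick your own.
    """
    # SDK imports kept local so the helper can be defined without azure-ai-ml installed
    from azure.ai.ml.entities import (
        DataCollector,
        DeploymentCollection,
        ManagedOnlineDeployment,
        ManagedOnlineEndpoint,
    )

    # 1. Create (or update) the endpoint with key-based authentication
    endpoint = ManagedOnlineEndpoint(name=endpoint_name, auth_mode="key")
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()

    # 2. Collect the request inputs and the model outputs of this deployment
    data_collector = DataCollector(
        collections={
            "model_inputs": DeploymentCollection(enabled="true"),
            "model_outputs": DeploymentCollection(enabled="true"),
        }
    )

    # 3. Deploy the latest registered version of the MLflow model
    model = ml_client.models.get(model_name, label="latest")
    deployment = ManagedOnlineDeployment(
        name=deployment_name,
        endpoint_name=endpoint_name,
        model=model,
        instance_type=instance_type,
        instance_count=instance_count,
        data_collector=data_collector,
    )
    ml_client.online_deployments.begin_create_or_update(deployment).result()

    # 4. Route all traffic to the new deployment
    endpoint.traffic = {deployment_name: 100}
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()
    return endpoint
```

Call it with the `ml_client` created in the first cell, for example `deploy_with_data_collection(ml_client)`. Each `begin_create_or_update(...).result()` call blocks until the operation finishes, which for the deployment takes several minutes.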

## Explore the endpoint

Once the deployment is complete, go to `Endpoints`, select the newly deployed endpoint, and then select `Test`.

The input data should be pre-populated. If it is not, use the following sample rows (each row holds the eight feature values in the order used for training):
```json
[
  [11.0, 97.0, 89.0, 11.0, 23.0, 46.47006691, 1.476670289, 39.0],
  [3.0, 108.0, 63.0, 45.0, 297.0, 49.37516891, 0.100979095, 46.0],
  [9, 103, 78, 25, 304, 29.58219193, 1.282869847, 43]
]
```
or the payload in `notebooks/test.json` for scoring.
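To score the endpoint from code instead of the `Test` tab, send an HTTP POST to the endpoint's scoring URI with the endpoint key as a bearer token. The sketch below uses only the Python standard library. The scoring URI and key are placeholders to copy from the endpoint's `Consume` tab, the function names are hypothetical, and the `{"input_data": ...}` payload shape is an assumption for MLflow deployments; compare it with `notebooks/test.json` before relying on it.

```python
import json
import urllib.request

# Placeholders: copy the real values from the endpoint's "Consume" tab
SCORING_URI = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<endpoint-key>"

# The sample rows: eight feature values per row, in training-column order
ROWS = [
    [11.0, 97.0, 89.0, 11.0, 23.0, 46.47006691, 1.476670289, 39.0],
    [3.0, 108.0, 63.0, 45.0, 297.0, 49.37516891, 0.100979095, 46.0],
    [9, 103, 78, 25, 304, 29.58219193, 1.282869847, 43],
]


def build_scoring_request(scoring_uri, api_key, rows, deployment=None):
    """Build (but do not send) a POST request for the online endpoint.

    The {"input_data": rows} body is an assumed payload shape for MLflow models.
    """
    body = json.dumps({"input_data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment:
        # Optional: target one deployment directly instead of the traffic split
        headers["azureml-model-deployment"] = deployment
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def score(scoring_uri, api_key, rows, deployment=None):
    """Send the scoring request and return the decoded JSON response."""
    request = build_scoring_request(scoring_uri, api_key, rows, deployment)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Usage (requires a deployed endpoint and a valid key):
# print(score(SCORING_URI, API_KEY, ROWS))
```

Every request scored this way also passes through the deployment's data collection, so it shows up in the collected inputs and outputs described next.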

## Data Collection

After scoring a few samples, go to `Data` in the Studio. You'll find additional data assets such as `<endpointname>-inputs` and `<endpointname>-outputs`. Explore the data that was collected automatically.

## Monitoring the Endpoint

Finally, go back to the endpoint under `Endpoints`, scroll to the bottom, and select `View metrics`. This takes you to the `Azure Application Insights` instance that logs your workspace.

Explore the metrics logged in your `Application Insights`.

Note that some metrics and logs are collected only after you enable `Diagnostic settings`.

## Monitoring your dataset and model drift

Azure ML recently added model and data monitoring as a preview feature. You can use it via `Monitoring` in the AzureML Studio.

Add a new monitor via `+ Add` and follow the wizard:
- Select the model we trained and the deployment we just created.
- Under `Configure data assets`, add the training data we registered in the first notebook.
- Under `Select monitoring signals`, edit the signal that requires a target column and select `Diabetic` as the target column.

Create the monitor, grab another coffee, and discuss what you've learned.
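Conceptually, a data drift signal compares the distribution of each feature in production data against a baseline such as the training data. As an illustration only (not a description of how Azure ML computes its signals), the sketch below computes one common drift statistic, the Jensen-Shannon distance, between two numeric samples binned on shared histogram edges. The function names and the default of 10 bins are hypothetical choices.

```python
import math


def histogram(values, edges):
    """Normalized counts of values in the bins defined by edges (last bin is closed)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def js_distance(baseline, production, n_bins=10):
    """Jensen-Shannon distance (log base 2, range 0 to 1) between two numeric samples."""
    lo = min(min(baseline), min(production))
    hi = max(max(baseline), max(production))
    if hi == lo:
        return 0.0  # both samples hold the same constant value
    # Shared, equal-width bin edges so both histograms are comparable
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    p = histogram(baseline, edges)
    q = histogram(production, edges)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence; bins with a_i == 0 contribute nothing
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))
```

A distance of 0 means the binned distributions are identical and 1 means they do not overlap at all. For example, you could compare the `BMI` column of the training DataFrame with the `BMI` values in the collected production inputs; where to set an alert threshold is a judgment call per feature.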

## (Optional) Monitoring Drift: Bring your own data

If you want to explore **how to bring your own production data** (data that is not collected from an online endpoint), now is the time to see how that works.

Remember, we registered a `diabetes-urifolder-production` dataset in the first notebook. This repository also contains a `preprocess component` (in the `components/preprocess_production_data` directory). 

You can use both to investigate how you could bring your own data, and (if you know how to register components) set up a monitor.

## Next steps

In the next notebook, we'll learn more about data drift. Continue with it to **Create Synthetic Data** and then gather inference data that triggers data drift detection.