# Deployment and Scoring

DataRobot does not apply thresholds to multiclass predictions. Instead, the prediction is simply the class with the highest probability. In some use cases this is problematic: customers may want a system that withholds a class prediction unless its probability exceeds some threshold. This can be supported with a "sidecar" external deployment used for monitoring. The end result is two deployments:

1. The original DataRobot model, which is used for scoring.
2. An "external" deployment, which is used for monitoring predictions with thresholds applied.
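
The difference between the default behavior and a withheld prediction can be sketched with a few made-up probabilities. This is a minimal illustration, not DataRobot's implementation: the function name `predict_with_threshold`, the toy class list, and the `DR_OTHER` fallback label (borrowed from the class list used later in this notebook) are assumptions for the example.

```python
import numpy as np

def predict_with_threshold(probabilities, class_names, threshold, fallback="DR_OTHER"):
    """Return the top class per row, or `fallback` unless its probability exceeds `threshold`."""
    probabilities = np.asarray(probabilities, dtype=float)
    top_index = probabilities.argmax(axis=1)        # what plain argmax would predict
    top_probability = probabilities.max(axis=1)
    labels = np.array(class_names, dtype=object)[top_index]
    labels[top_probability <= threshold] = fallback  # withhold low-confidence predictions
    return labels.tolist()

example_classes = ["Glass", "Metal", "Paper"]
example_probabilities = [
    [0.05, 0.90, 0.05],  # confident: stays Metal
    [0.40, 0.35, 0.25],  # argmax is Glass, but it does not exceed the threshold
]
example_labels = predict_with_threshold(example_probabilities, example_classes, threshold=0.75)
print(example_labels)
```

The second row is the case the sidecar deployment is meant to monitor: the model still produces a top class, but the reported label is the fallback.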

This example uses the [datarobot-mlops-connected-client](https://pypi.org/project/datarobot-mlops-connected-client/8.2.14/) library, which is available from PyPI and can also be found in the MLOps package under Developer Tools in the DataRobot application.

For this example, set the environment variables `DATAROBOT_API_TOKEN` and `DATAROBOT_ENDPOINT`. The clients read these values from the environment, which makes authentication easier.
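
Because the later cells call `os.getenv` directly, a missing variable surfaces as an unhelpful `AttributeError` on `None`. A small fail-fast check along these lines can help; `read_datarobot_env` is a hypothetical helper written for this notebook, and the values in the demo mapping are placeholders rather than real credentials.

```python
import os

REQUIRED_VARIABLES = ("DATAROBOT_API_TOKEN", "DATAROBOT_ENDPOINT")

def read_datarobot_env(environ=os.environ):
    """Return the required settings, raising a clear error if any are missing or empty."""
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARIABLES}

# Demo with a placeholder mapping so the example does not depend on the real environment.
placeholder_env = {
    "DATAROBOT_API_TOKEN": "placeholder-token",
    "DATAROBOT_ENDPOINT": "https://app.datarobot.com/api/v2",
}
settings = read_datarobot_env(placeholder_env)
print(sorted(settings))
```

In the notebook itself, calling `read_datarobot_env()` with no argument checks the real environment.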
## Deploy the DataRobot Model

In [1]:
import datarobot as dr
TRAINING_DATASET_ID = "6759ceeae340977868ae9eda" # FROM PREVIOUS NOTEBOOKS
PROJECT_ID = "6759e044304dbe533a7fe6a8" # FROM PREVIOUS NOTEBOOKS 

best_model = dr.Project.get(PROJECT_ID).get_models()[0]

print(f'''Deploying M{best_model.model_number}: {best_model.blueprint.model_type}''')

Deploying M4: Keras Slim Residual Neural Network Classifier using Training Schedule (1 Layer: 64 Units)




In [4]:
original_model_deployment = dr.Deployment.create_from_learning_model(
    best_model.id,
    label="Multiclass Example",
    default_prediction_server_id="67521300fe4b98000d28270f",  # not needed for bare metal installs
)

## Create the External Deployment 


An external deployment is a way to report metrics about models that run outside of DataRobot. In this case, we can leverage an external deployment to track metrics for our predictions after we apply thresholds and modify the output of the original model.



In [10]:
from datarobot.mlops.connected.client import MLOpsClient
from datarobot.mlops.constants import Constants
import os


# Note: the service URL should not include 'api/v2'; the client derives it automatically
mlops_client = MLOpsClient(
    service_url=os.getenv("DATAROBOT_ENDPOINT").replace("api/v2", ""),
    api_key=os.getenv("DATAROBOT_API_TOKEN"),
)
multi_class_params = {
                "name": "Multiclass External Threshold Monitoring",
                "modelDescription": {
                    "description": "Applies thresholding to the prediction",
                    "location": "/tmp/myModel"
                },
                "target": {
                    "type": "Multiclass",
                    "name": "main_class",
                    "classNames": [
                         "DR_OTHER", 
                         "Paper",
                         "Glass",
                         "Food Organics",
                         "Metal",
                         "Plastic"
                        ]
                }
            }
mlpkg = mlops_client.create_model_package(multi_class_params)

In [12]:
deployment = mlops_client.deploy_model_package(model_package_id=mlpkg, label="Multiclass External Monitoring", wait_for_result=True, timeout=600)

For data drift, we need to associate a known baseline with the deployment. For now, I am just going to use the training data, but that may not work for all use cases. Just remember to pick a dataset that has the same features and target and that represents the baseline you want to measure drift away from.

To associate a baseline/training dataset, it must contain at least one row for each class that will appear in the predictions data. 
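
One way to check the one-row-per-class requirement before uploading a baseline is sketched below. The helper `missing_classes` and the toy frame are made up for illustration; in this notebook the real inputs would be the baseline DataFrame and the `classNames` list from the model package.

```python
import pandas as pd

def missing_classes(baseline, target, expected_classes):
    """Return the expected classes that have no rows in the baseline's target column."""
    present = set(baseline[target].dropna().unique())
    return [name for name in expected_classes if name not in present]

# Toy baseline that lacks two of the six classes.
toy_baseline = pd.DataFrame({"main_class": ["Paper", "Glass", "Metal", "DR_OTHER"]})
expected = ["DR_OTHER", "Paper", "Glass", "Food Organics", "Metal", "Plastic"]

absent = missing_classes(toy_baseline, "main_class", expected)
print(absent)
```

An empty list means every class is represented at least once.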

**A Note about Versions**
Associating data with `associate_deployment_dataset` is deprecated in DataRobot 9.0 and later. Instead, training data is associated with registered models, which were introduced in 9.0. The code below works in 8.x but raises an error in 9.0 and later.

In [25]:
import datarobot as dr 
import pandas as pd 
import tempfile 

with tempfile.NamedTemporaryFile() as tmpfile:
    # get_file writes the dataset to the given path; read it back with pandas
    dr.Dataset.get("675c98d9f2e4b4189bff25f5").get_file(tmpfile.name)
    tmpfile.seek(0)
    rdf = pd.read_csv(tmpfile)

classes_in_monitoring = ["Paper",
                         "Glass",
                         "Food Organics",
                         "Metal",
                         "Plastic"
                        ]
rdf['main_class'] = rdf['main_class'].apply(lambda class_value: class_value if class_value in classes_in_monitoring else "DR_OTHER")

new_baseline_dataset = dr.Dataset.create_from_in_memory_data(rdf)

In [23]:
from datarobot.mlops.connected.enums import DatasetSourceType
mlops_client.associate_deployment_dataset(deployment_id=deployment, 
                                          dataset_id=new_baseline_dataset.id,
                                          data_source_type = DatasetSourceType.TRAINING) 
                                          

DRMLOpsConnectedException: (('Associating training data with deployments is not allowed. Instead associate training data with the model package.',), {})

Now we will enable tracking features on the external deployment.

In [64]:
mlops_client.update_deployment_settings(target_drift=True, feature_drift=True, deployment_id=deployment)

## Scoring 
For scoring, we will use the original model, apply our thresholding, and then report the results to the external deployment. We are taking a sample from one specific class in order to simulate drift.
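
To see why a single-class sample simulates drift, compare class shares in a baseline with those in the sample. This is a rough sketch on made-up labels, not how DataRobot computes its drift metrics; `class_share` is a hypothetical helper.

```python
import pandas as pd

def class_share(labels, classes):
    """Fraction of rows per class, with 0.0 for classes that do not appear."""
    shares = pd.Series(labels).value_counts(normalize=True)
    return shares.reindex(classes, fill_value=0.0)

classes = ["Glass", "Metal", "Paper"]
baseline_labels = ["Metal", "Paper", "Glass", "Paper"]  # mixed baseline
skewed_labels = ["Metal"] * 4                           # sample drawn from one class only

# Positive values mean the class is over-represented in the sample.
shift = class_share(skewed_labels, classes) - class_share(baseline_labels, classes)
print(shift)
```

A large positive shift for one class and negative shifts for the rest is the pattern the target drift tracking should pick up.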

In [41]:
sample_to_score = rdf[rdf.main_class == 'Metal'].sample(50)


In [42]:
original_model_deployment = dr.Deployment.get("6759e8aebd38a7fca6ba234a")

In [50]:
from time import time
start_time = time()

_ , predictions = dr.BatchPredictionJob.score_pandas(original_model_deployment, df=sample_to_score)
end_time = time()
predictions.head()

Streaming DataFrame as CSV data to DataRobot
Created Batch Prediction job ID 675c9fa18ca34350edaf3cc0
Waiting for DataRobot to start processing
Job has started processing at DataRobot. Streaming results.


Unnamed: 0,main_class,image,main_class_DR_OTHER_PREDICTION,main_class_Food Organics_PREDICTION,main_class_Glass_PREDICTION,main_class_Metal_PREDICTION,main_class_Paper_PREDICTION,main_class_Plastic_PREDICTION,main_class_PREDICTION,DEPLOYMENT_APPROVAL_STATUS
150,Metal,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBA...,0.002047,0.000518,0.001296,0.992474,0.001351,0.002314,Metal,APPROVED
169,Metal,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBA...,0.001926,0.000113,0.000936,0.972959,0.021576,0.00249,Metal,APPROVED
171,Metal,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBA...,0.001013,0.000128,0.001456,0.995682,0.000295,0.001426,Metal,APPROVED
159,Metal,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBA...,0.00386,0.000425,0.000548,0.982924,0.006876,0.005367,Metal,APPROVED
199,Metal,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBA...,0.009969,0.000901,0.003317,0.975838,0.002484,0.00749,Metal,APPROVED


In [45]:
predictions.dtypes

main_class                              object
image                                   object
main_class_DR_OTHER_PREDICTION         float64
main_class_Food Organics_PREDICTION    float64
main_class_Glass_PREDICTION            float64
main_class_Metal_PREDICTION            float64
main_class_Paper_PREDICTION            float64
main_class_Plastic_PREDICTION          float64
main_class_PREDICTION                   object
DEPLOYMENT_APPROVAL_STATUS              object
dtype: object

In [62]:
THRESHOLD = 0.75
TARGET_NAME = "main_class"

# Class probability columns are named <target>_<class>_PREDICTION
class_columns = [col for col in predictions.columns if col.startswith(TARGET_NAME + "_") and col != TARGET_NAME + "_PREDICTION"]
class_names = [col.replace(TARGET_NAME + "_", "").replace("_PREDICTION", "") for col in class_columns]

predictions['threshold_prediction'] = predictions[class_columns].apply(
    lambda row: next(
        (col for col, val in row.items() if val > THRESHOLD), 
        'DR_OTHER'
    ),
    axis=1
)

predictions['threshold_prediction'].value_counts()

main_class_Metal_PREDICTION    47
DR_OTHER                        3
Name: threshold_prediction, dtype: int64
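
Note that the cell above stores the probability column name (for example `main_class_Metal_PREDICTION`) rather than the class label. If the label itself is wanted, one possible vectorized variant is sketched here on toy data. The helper `threshold_labels` is an assumption for illustration; it relies on `class_columns` and `class_names` being in the same order, as they are when built by the cell above.

```python
import pandas as pd

def threshold_labels(frame, class_columns, class_names, threshold, fallback="DR_OTHER"):
    """Label each row with its top class, or `fallback` if the top probability is not above `threshold`."""
    column_to_class = dict(zip(class_columns, class_names))
    probabilities = frame[class_columns]
    top_column = probabilities.idxmax(axis=1)    # column holding the highest probability
    top_probability = probabilities.max(axis=1)
    labels = top_column.map(column_to_class)     # translate column name to class label
    return labels.where(top_probability > threshold, fallback)

toy = pd.DataFrame({
    "main_class_Metal_PREDICTION": [0.97, 0.60],
    "main_class_Paper_PREDICTION": [0.03, 0.40],
})
toy_labels = threshold_labels(toy, list(toy.columns), ["Metal", "Paper"], threshold=0.75)
print(toy_labels.tolist())
```

For thresholds of 0.5 or higher, at most one class can exceed the threshold, so taking the top class gives the same rows as scanning for the first column above it.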

### Report to DataRobot 

The format of the call that reports the data can be a little tricky:

1. `data`: the predictions DataFrame.
2. `deployment_id`: the external deployment used for monitoring.
3. `model_id`: the model package ID created above.
4. `association_ids`: a unique identifier for each prediction, which is later used to report actuals. More information is [in the docs](https://docs.datarobot.com/en/docs/get-started/glossary/index.html#association-id).
5. `prediction_cols`: the columns in the data that hold the probability of each class.
6. `class_names`: the names of the classes, in the same order as the columns.
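
Since these arguments must line up with each other, a quick local check before reporting can catch mistakes early. The sketch below is not part of the MLOps client. `check_report_inputs` is a hypothetical helper, and the one-ID-per-row rule is an assumption added to the uniqueness and string requirements described above.

```python
import pandas as pd

def check_report_inputs(data, association_ids, prediction_cols, class_names):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    if len(association_ids) != len(data):
        problems.append("association_ids must have one entry per row")
    if len(set(association_ids)) != len(association_ids):
        problems.append("association_ids must be unique")
    if not all(isinstance(value, str) for value in association_ids):
        problems.append("association_ids must be strings")
    if len(prediction_cols) != len(class_names):
        problems.append("prediction_cols and class_names must have the same length")
    for column in prediction_cols:
        if column not in data.columns:
            problems.append(f"missing probability column: {column}")
    return problems

toy_predictions = pd.DataFrame(
    {"main_class_Metal_PREDICTION": [0.9, 0.2], "main_class_Paper_PREDICTION": [0.1, 0.8]},
    index=[150, 169],
)
toy_ids = [str(i) for i in toy_predictions.index.tolist()]
columns = list(toy_predictions.columns)

ok = check_report_inputs(toy_predictions, toy_ids, columns, ["Metal", "Paper"])
bad = check_report_inputs(toy_predictions, [150, 150], columns, ["Metal"])
print(ok)
print(bad)
```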



In [63]:
# First, report that the predictions occurred
await mlops_client.report_deployment_stats(
    deployment_id=deployment,
    model_id=mlpkg,
    num_predictions=len(predictions),
    execution_time_ms=(end_time - start_time) * 1000,  # time() returns seconds; convert to milliseconds
)

# Then report the predictions data
await mlops_client.report_prediction_data(
    data=predictions,
    deployment_id=deployment,
    model_id=mlpkg,
    # Unique identifier per prediction, used later to report actuals; must be a string
    association_ids=[str(r) for r in predictions.index.tolist()],
    prediction_cols=class_columns,
    class_names=class_names,
    # There are also inputs for feature data.
)


({'message': 'ok'}, 848856)