<a href="https://colab.research.google.com/github/jmalbornoz/MLOps-II/blob/main/12Nov2020_MLOps_II_Laboratory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MLOps II Laboratory

**Pre-requisites:**
1. You need a DataRobot account and API key.  
2. Add your API key and DataRobot URL to the first code cell in the notebook. The API key is under Developer Tools, which you reach from the profile icon in the DataRobot GUI.


**Documentation:**

The MLOps Agent tarball includes documentation in the docs folder.




# 1.- Add your API_KEY and the location of the DataRobot instance you are using.  


In [None]:
import yaml
import requests
import re
API_KEY = ""
DR_URL = "https://app.datarobot.com"

The following two commands show \
a) the current working directory within the Colab runtime and \
b) what that directory contains.

In [None]:
%pwd

In [None]:
%ls -al

# 2.- Download the MLOps Agent tarball to the local Colab directory.

In [None]:
url = DR_URL + "/api/v2/mlopsInstaller"

headers = {'Authorization': 'Bearer {}'.format(API_KEY)}
response = requests.get(url, headers=headers)
if response.status_code == 401:
    # HTTP 401 (Unauthorized) means the API key is missing or invalid
    print('Put your real API key in')
else:
    # Only save the response body when the request was authorized
    with open("mlops-agents.tar.gz", "wb") as f:
        f.write(response.content)

In [None]:
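# Let's grab the filename, which contains the latest version number
d = response.headers['content-disposition']
# Escape the dots so they match literal '.' characters only
fname = re.findall(r"filename=(.+)\.tar\.gz", d)[0]
# Drop the part after the last hyphen; the rest is the name of the extracted directory
n = fname.rfind("-")
filename = fname[:n]
filename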

In [None]:
%ls

As shown by the output of the previous shell command, we now have the MLOps Agent tarball within the runtime.
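
The directory name derived from the `content-disposition` header earlier in this step can be sketched as a standalone function. This is a minimal illustration only: the header value below is hypothetical, and the real header returned by your DataRobot instance may be formatted differently (for example, with quotes around the filename).

```python
import re

def agent_dir_from_header(content_disposition):
    """Return the tarball name without '.tar.gz' and without its last '-suffix'."""
    match = re.search(r"filename=(.+)\.tar\.gz", content_disposition)
    if match is None:
        raise ValueError("no .tar.gz filename found in header")
    name = match.group(1)
    # Drop everything from the last hyphen onward, as the notebook cell does
    return name[:name.rfind("-")]

# Hypothetical header value, for illustration only
sample = "attachment; filename=datarobot-mlops-agent-6.2.4-399.tar.gz"
print(agent_dir_from_header(sample))
```

Raising an error when the pattern is missing makes a failed download (for instance, an error page instead of a tarball) visible immediately rather than at the later `%cd` step.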

# 3.- Untar the MLOps Agent tarball, and then create a temporary directory to spool the predictions
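
Before extracting, the archive contents can also be listed from Python with the standard `tarfile` module. The sketch below builds a tiny in-memory archive so that it runs without the real agent tarball; the member path inside it is made up and does not reflect the actual layout of the MLOps Agent tarball.

```python
import io
import tarfile

def list_tarball(fileobj):
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        return archive.getnames()

def make_sample_tarball():
    """Build a tiny in-memory .tar.gz so the sketch runs on its own."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        payload = b"example contents\n"
        info = tarfile.TarInfo(name="agent/conf/example.yaml")  # hypothetical path
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)  # rewind so the archive can be read from the start
    return buffer

print(list_tarball(make_sample_tarball()))
```

With the downloaded file, the same function could be called as `list_tarball(open("mlops-agents.tar.gz", "rb"))`.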


In [None]:
!tar -xvf /content/mlops-agents.tar.gz

In [None]:
# Here we create the directory where the spool file will be located
%cd $filename
!mkdir -p /tmp/ta
%ls -al

# 4.-  Install the MLOps library.

### The tarball contains a wheel file that will be used to install the MLOps Agent libraries:
### **lib/datarobot_mlops-6.2.4-py2.py3-none-any.whl**


In [None]:
!pip install lib/datarobot_mlops-6.2.4-py2.py3-none-any.whl   # If you have a newer version of the agent, the filename may differ

# 5.- Edit mlops.agent.conf.yaml

This file contains the properties used in the configuration of the MLOps service.  For this notebook, you will only need to set the DR host and your API token.

In [None]:
with open('conf/mlops.agent.conf.yaml') as file:       # read the yaml file as a dictionary
    documents = yaml.safe_load(file)                   # recent PyYAML versions require an explicit loader

# Set your DR host:
documents['mlopsUrl'] = DR_URL                         # set the required values in this dictionary
# Set your API token
documents['apiToken'] = API_KEY

with open('conf/mlops.agent.conf.yaml', "w") as f:     # write back the dictionary to the yaml file
    yaml.dump(documents, f)

In this notebook we will use FS_SPOOL (a spool directory on the local filesystem) as the messaging channel. Other setups may use one of the other channel types listed below.

channelConfigs:
   - type: "FS_SPOOL"
     details: {name: "bench", spoolDirectoryPath: "/tmp/ta"}
   - type: "SQS_SPOOL"
     details: {name: "sqsSpool", queueUrl: "https://SQS_URL"}
   - type: "PUBSUB_SPOOL"
     details: {name: "pubsubSpool", projectId: "yourprojectId", topicName: "yourtopicName"}
   - type: "RABBITMQ_SPOOL"
     details: {name: "rabbit", queueName: "rabbitmq", queueUrl: "https://SQS_URL"}

Verify the changes in the mlops.agent.conf.yaml.  You should see the correct MLOps URL and API token.
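
Printing the whole file also prints your API token into the notebook output. If the notebook will be shared, one option is to mask the token before printing. The helper below is a hypothetical sketch, not part of the MLOps library; it assumes the token appears on a single line of the form `apiToken: <value>`, and the sample text is made up.

```python
import re

def mask_token(config_text):
    """Replace the apiToken value with asterisks, keeping the last 4 characters."""
    def _mask(match):
        token = match.group(2)
        return match.group(1) + "*" * max(len(token) - 4, 0) + token[-4:]
    # (?m) lets ^ match at the start of every line in the file
    return re.sub(r"(?m)^(apiToken:[ \t]*)(\S+)", _mask, config_text)

# Made-up configuration text, for illustration only
sample = "apiToken: abcd1234efgh5678\nmlopsUrl: https://app.datarobot.com\n"
print(mask_token(sample))
```

For the real file, this could be used as `print(mask_token(open('conf/mlops.agent.conf.yaml').read()))`.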


In [None]:
print(open('conf/mlops.agent.conf.yaml').read())

# 6.- Start the agent and get its status

The following shell commands are required to \
a) start the MLOps Agent service. \
b) get the status of the MLOps Agent service. 

In [None]:
# Start the agent
!./bin/___

In [None]:
# Get agent status
!./bin/___

# 7.- Load sample data and split it into training and testing sets 

* The training data is the same dataset that was used to train the model pipeline in this laboratory
* The test data will play the role of the scoring data
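
The idea behind the split can be sketched with the standard library alone: shuffle whole rows first, then cut, so that every feature row stays paired with its own label. The rows below are made up, and the function name is hypothetical; the notebook cell itself uses NumPy and pandas on the surgical dataset.

```python
import random

def shuffle_and_split(rows, split_ratio=0.8, seed=42):
    """Shuffle whole rows, then split them into training and test parts."""
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * split_ratio)
    train, test = shuffled[:cut], shuffled[cut:]
    # Features are all columns but the last; the label is the last column
    train_x = [row[:-1] for row in train]
    train_y = [row[-1] for row in train]
    test_x = [row[:-1] for row in test]
    test_y = [row[-1] for row in test]
    return train_x, train_y, test_x, test_y

# Made-up rows: two features followed by a binary label
rows = [(i, i * 2, i % 2) for i in range(10)]
train_x, train_y, test_x, test_y = shuffle_and_split(rows)
print(len(train_x), len(test_x))
```

Because features and labels are sliced from the same shuffled rows, the labels later used as actuals refer to exactly the rows that were scored.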

In [None]:
import pandas as pd
import numpy as np
import time
import csv
import pytz
import json
import yaml
import datetime
import joblib

TRAINING_DATA = './examples/data/surgical-dataset.csv'

df = pd.read_csv(TRAINING_DATA)

columns = list(df.columns)
arr = df.to_numpy()

np.random.shuffle(arr)

split_ratio = 0.8
prediction_threshold = 0.5

train_data_len = int(arr.shape[0] * split_ratio)

train_data = arr[:train_data_len, :-1]
label = arr[:train_data_len, -1]
test_data = arr[train_data_len:, :-1]
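# Build the test DataFrame from the shuffled array so its labels line up with test_data
test_df = pd.DataFrame(arr[train_data_len:], columns=columns)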

# 8.- Create and deploy an external model package


First we will inspect the JSON file that contains the model configuration for this example. This is necessary as we will create an external model package via code.

In [None]:
print(open('./examples/model_config/surgical_binary_classification.json').read())

The name specified in this JSON file is the name that the external model package will have in the MLOps GUI.
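
As a rough illustration of how such a configuration is handled in code, the sketch below parses a JSON document, reads its name, and attaches a training dataset reference in the same shape the deployment cell uses. The JSON text, the `description` key, and the catalog ID are all hypothetical; the real file may contain different keys.

```python
import json

# Hypothetical configuration; the real file may contain different keys
config_text = '{"name": "Example external model", "description": "made-up entry"}'

model_info = json.loads(config_text)
print("Model package name:", model_info["name"])

# Attach the training dataset reference, mirroring the deployment cell
model_info["datasets"] = {"trainingDataCatalogId": "hypothetical-catalog-id"}
print(json.dumps(model_info, indent=2))
```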

We will now \
a) create and deploy the external model package. \
b) upload training data to MLOps in order to monitor data drift.

In [None]:
# Here we set the name of the deployment that will be seen in the MLOps GUI
DEPLOYMENT_NAME = "Google Colab MLOps II Lab - Python binary classification"

# MLOps Agent Library imports
from datarobot.mlops.mlops import MLOps
from datarobot.mlops.common.enums import OutputType
from datarobot.mlops.connected.client import MLOpsClient
from datarobot.mlops.common.exception import DRConnectedException
from datarobot.mlops.constants import Constants

# Read the model configuration info from the JSON file. This is used to create the remote model package.
with open('./examples/model_config/surgical_binary_classification.json', "r") as f:
    model_info = json.load(f)

# Read the MLOps connection info from the YAML file
with open('conf/mlops.agent.conf.yaml') as file:
    # safe_load converts the YAML document into a Python dictionary
    agent_yaml_dict = yaml.safe_load(file)

# Here are the values of the API key and the MLOps URL
MLOPS_URL = agent_yaml_dict['mlopsUrl']
API_TOKEN = agent_yaml_dict['apiToken']

# Create connected client object
mlops_connected_client = ___(___,___)

# Upload training data to MLOps
print("Uploading training data - {}. This may take some time...".format(TRAINING_DATA))
dataset_id = mlops_connected_client.___(___)

print("Training dataset uploaded. Catalog ID {}.".format(dataset_id))
model_info["datasets"] = {"trainingDataCatalogId": dataset_id}

# Create the model package
print('Create model package')
model_pkg_id = mlops_connected_client.___(___)

# Get model package info 
model_pkg = mlops_connected_client.___(___)

# get model id from model package info
model_id = model_pkg["modelId"]

# Deploy the model package & get deployment id
print('Deploy model package')
deployment_id = mlops_connected_client.___(___,___)

# Enable data drift tracking
print('Enable feature drift monitoring')
enable_feature_drift = TRAINING_DATA is not None
mlops_connected_client.___(___,___,___)

# Get deployment settings
_ = mlops_connected_client.___(___)

print("\nDone.")
print("DEPLOYMENT_ID=%s, MODEL_ID=%s" % (deployment_id, model_id))

DEPLOYMENT_ID = deployment_id
MODEL_ID = model_id

# 9.- Upload a pickle file with a pre-trained model pipeline to Google Colab

We will upload a pickle file named "pipeline.pkl", which is found in the zip file that contains the class material. When the file picker opens, navigate to the folder where the class material is and select "pipeline.pkl".

In [None]:
from google.colab import files
uploaded = files.upload()

# 10.- Run Model Predictions

We call the remote model's predict function and send prediction data to MLOps. Note that the model is supplied using the pickle file uploaded in the previous step.
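
Accuracy monitoring depends on pairing each prediction with the actual value that arrives later, which the cell below does through association IDs. The following standalone sketch shows that pairing in isolation. The labels are made up, and the CSV header names are placeholders; the notebook takes the real header names from `Constants` in the MLOps library.

```python
import csv
import io
import time

def generate_association_ids(num_samples):
    """Create one ID per prediction; the shared timestamp keeps runs distinct."""
    ts = time.time()
    return ["x_{}_{}".format(ts, i) for i in range(num_samples)]

def actuals_csv(association_ids, labels):
    """Write ID/label pairs to CSV text. The header names here are placeholders."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["associationId", "actualValue"])   # hypothetical column names
    for association_id, label in zip(association_ids, labels):
        writer.writerow([association_id, "1" if label else "0"])
    return buffer.getvalue()

labels = [1, 0, 1]                                      # made-up actuals
ids = generate_association_ids(len(labels))
print(actuals_csv(ids, labels))
```

The same list of IDs must be used both when reporting predictions and when writing the actuals, otherwise the two cannot be matched.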

In [None]:
#
# Code samples can be found:
# 1. under the Integration tab for your deployment in DataRobot MLOps, or
# 2. in the agent example code on your filesystem in ./examples/python/ and ./tools/
#    This example is from BinaryClassificationExample
#

CLASS_NAMES = ["0", "1"]
OUTPUT_TYPE = OutputType.OUTPUT_DIR

# Spool directory path must match the Monitoring Agent path configured by admin in the YAML configuration file.
SPOOL_DIR = "/tmp/ta"
SPOOL_MAX_FILE_SIZE = 104857600  # 100 MiB, in bytes
SPOOL_MAX_FILES = 5

# name of the file that contains actuals
ACTUALS_OUTPUT_FILE = "actuals.csv"
            
def process_predictions(deployment_id, model_id, output_type, spool_dir, spool_max_file_size, spool_max_files, class_names):
    """
    This is a binary classification algorithm example.
    User can call the DataRobot MLOps library functions to report statistics.
    """
    # load pickle file with model pipeline
    _model = joblib.load(filename="pipeline.pkl")

    # Get predictions
    start_time = time.time()
    predictions = _model.predict_proba(test_data).tolist()
    num_predictions = len(predictions)
    end_time = time.time()
    
    # Generate association IDs for the predictions so we can match them with actuals;
    # this is necessary for accuracy monitoring
    def _generate_unique_association_ids(num_samples):
        ts = time.time()
        return ["x_{}_{}".format(ts, i) for i in range(num_samples)]
    association_ids = _generate_unique_association_ids(len(test_data))

    # MLOPS: initialize the MLOps instance
    # These are the steps needed to initialize an MLOps Agent instance
    # The necessary parameters are:
    #    * deployment ID
    #    * model id
    #    * output type
    #    * spool directory
    #    * maximum spool file size
    #    * maximum number of spool files
    print("Get an MLOps instance")
    mlops = MLOps().___(___)\
                   .___(___)\
                   .___(___)\
                   .___(___)\
                   .___(___)\
                   .___(___)\
                   .___()

    # MLOPS: report the number of predictions in the request and the execution time.
    print("Send MLOps deployment stats")
    mlops.___(___,___)

    # MLOPS: report the predictions data: features, predictions, class_names
    print("Send MLOps prediction data")
    mlops.___(___,____,____,____)
    
    target_column_name = columns[-1]
    orig_labels = test_df[target_column_name].tolist()
    
    print("Writing actuals file: %s" % ACTUALS_OUTPUT_FILE)
    def write_actuals_file(out_filename, test_data_labels, association_ids):
        """
         Generate a CSV file with the association ids and labels, this example
         uses a dataset that has labels already.
         In a real use case actuals (labels) will show after prediction is done.

        :param out_filename:      name of csv file
        :param test_data_labels:  actual values (labels)
        :param association_ids:   association id list used for predictions
        """
        with open(out_filename, mode="w", newline="") as actuals_csv_file:
            writer = csv.writer(actuals_csv_file, delimiter=",")
            writer.writerow(
                [
                    Constants.ACTUALS_ASSOCIATION_ID_KEY,
                    Constants.ACTUALS_VALUE_KEY,
                    Constants.ACTUALS_TIMESTAMP_KEY
                ]
            )
            tz = pytz.timezone("America/Los_Angeles")
            for (association_id, label) in zip(association_ids, test_data_labels):
                # Pass the pytz zone to now(); replace(tzinfo=...) would apply a wrong (LMT) offset
                actual_timestamp = datetime.datetime.now(tz).isoformat()
                writer.writerow([association_id, "1" if label else "0", actual_timestamp])

        
    # Write a CSV file with the labels and association IDs
    write_actuals_file(ACTUALS_OUTPUT_FILE, orig_labels, association_ids)

    # MLOPS: release MLOps resources when finished.
    mlops.___()

    print("Done.")

process_predictions(DEPLOYMENT_ID, MODEL_ID, OUTPUT_TYPE, SPOOL_DIR, SPOOL_MAX_FILE_SIZE, SPOOL_MAX_FILES, CLASS_NAMES)

# 11.- Upload actuals back to MLOps
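
The upload cell below reads the actuals file and submits the rows in batches of 10000, followed by whatever is left over. The batching pattern on its own can be sketched as follows. The `submit` function is a stand-in that only prints; it does not reproduce the MLOps client call, which is left as an exercise in the cell.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items; never yields an empty batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def submit(batch):
    """Stand-in for the real upload call, which this sketch does not reproduce."""
    print("submitting {} actuals".format(len(batch)))

actuals = [{"id": i} for i in range(25)]   # made-up records
for batch in iter_batches(actuals, 10):
    submit(batch)
```

Slicing by index handles the final partial batch and the empty case in one loop, so no separate submission is needed after the loop ends.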

In [None]:
def _get_correct_actual_value(deployment_type, value):
    if deployment_type == "Regression":
        return float(value)
    return str(value)

def _get_correct_flag_value(value_str):
    if value_str == "True":
        return True
    return False
    
def upload_actuals():
    print("Connect MLOps client")           # create connected client object
    mlops_connected_client = ___(___,___)

    # get deployment type
    deployment_type = mlops_connected_client.___(___)

    # read actuals file
    actuals = []
    with open(ACTUALS_OUTPUT_FILE, mode="r") as actuals_csv_file:
        reader = csv.DictReader(actuals_csv_file)
        for row in reader:
            actual = {}
            for key, value in row.items():
                if key == Constants.ACTUALS_WAS_ACTED_ON_KEY:
                    value = _get_correct_flag_value(value)
                if key == Constants.ACTUALS_VALUE_KEY:
                    value = _get_correct_actual_value(deployment_type, value)
                actual[key] = value
            actuals.append(actual)

            # Submit the actuals in batches of 10000 rows
            if len(actuals) == 10000:
                mlops_connected_client.___(___,___)
                actuals = []

    # Upload the remaining actuals to MLOps
    print("Submit actuals")
    mlops_connected_client.___(___, ___)
    
    print("Done.")    

upload_actuals()

# 12.- Stop the MLOps service

In [None]:
!bin/stop-agent.sh

# 13.- Inspect the MLOps agent logs

In [None]:
!cat /content/$filename/logs/mlops.agent.log   # $filename is the agent directory name found in step 2