# Building, Testing, and Deploying a Custom Model

This notebook walks through the general workflow for building, testing, and deploying a custom inference model on a custom environment. 

## Agenda
In this tutorial, we'll learn:
<br/>
1. How to use the client to create an environment<br/>
2. How to check the status of an environment build<br/>
3. How to create a custom model<br/>
4. How to iteratively test and debug a custom model on a custom environment<br/>
5. How to deploy a tested custom model and run predictions on it<br/>

## Setup and Requirements
This tutorial assumes a particular folder layout and some prior setup.

**Firstly, you need two different feature flags enabled:**
1. Enable Custom Inference Models
2. Enable Experimental API Access

Secondly, you should have a folder at the path `~/custom-model-templates/`. If you put the folder in a different location, make sure you update the `TESTING_PATH` variable. This folder should contain 4 things:
<br/>

1. A folder containing your properly configured custom environment.     
    In this example, it's named `custom_environment_templates/python_3/py3_sklearn_base/`
    
    
2. A folder containing your properly configured custom model.     
    In this example, it's named `custom_model_templates/python_model/`
    
    
3. The current version of the custom model API client, found at the following link: [API client](https://github.com/datarobot/py-dse). 
    - The client can also be installed via pip using DataRobot's artifactory. [DR Artifactory tutorial](https://datarobot.atlassian.net/wiki/spaces/DEVINFRA/pages/620265476/Artifactory+User+Guide).
    - Full documentation for the client can be found here: [DSE API Client Docs](http://py-dse.docs.hq.datarobot.com/index.html)


4. A test dataset that you can use to test predictions from your custom model.     
    In this example, it's stored at `custom_model_templates/python_model/extras/training_data/10k_diabetes_no_null_text.csv`

This tutorial also assumes that you have access to staging.datarobot.com, which means that **your VPN must be on**.
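Before running the rest of the notebook, it can help to check that the folder layout described above is in place. The following is a minimal sketch, not part of the client: the helper name `missing_paths` is hypothetical, and the relative paths are simply the example names used in this section, so adjust them to your own layout.

```python
from pathlib import Path

# Example layout from this section; change these if your folders are named differently.
EXPECTED_RELATIVE_PATHS = (
    "custom_environment_templates/python_3/py3_sklearn_base",
    "custom_model_templates/python_model",
    "custom_model_templates/python_model/extras/training_data/10k_diabetes_no_null_text.csv",
)


def missing_paths(root, relative_paths=EXPECTED_RELATIVE_PATHS):
    """Return the entries of relative_paths that do not exist under root."""
    root = Path(root).expanduser()
    return [rel for rel in relative_paths if not (root / rel).exists()]


# Report anything that is missing under the default location used in this tutorial.
for rel in missing_paths("~/custom-model-templates"):
    print("Missing:", rel)
```

An empty result means every expected folder and file was found.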

## Configuring Models and Environments
For more information on how to properly configure custom models and environments, read the README of our [custom model templates repository](https://github.com/datarobot/custom-model-templates).


## Setup Environment
First, we set up a Python environment that contains the correct libraries and versions. Follow these steps:

- Go to the project directory: 
```
cd ~/Documents/GIT-Projects/data-science-scripts/eujinlok/MLOps-Demo-Certification
```
- Initialise environment: 
```
python3 -m venv myenv
```
- Activate environment:
```
source myenv/bin/activate
```

- Start Jupyter Notebook from the terminal:
```
jupyter notebook
```



In [1]:
from platform import python_version
print(python_version())

import os 
os.getcwd()

3.7.0


'/Users/eu.jin.lok/Documents/GIT-Projects/DataRobot_Workflows/MLOps_certification/development'
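To confirm that the notebook kernel is really running inside the virtual environment created above, and not the system interpreter, a small check like the following can be run in a cell. This is a sketch that relies on the standard-library behaviour of `venv`; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside an environment created by venv."""
    # Inside a venv, sys.prefix points at the environment, while sys.base_prefix
    # points at the Python installation the environment was created from.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If this prints `False`, the Jupyter kernel was probably started from a different Python installation than `myenv`.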

## Imports
First, we need to make the proper imports. Make sure the `TESTING_PATH` is correct and pointing to the right folder:

In [2]:
# NOTE: Install these libraries via: pip install -r requirements.txt
# It is highly recommended to install them before installing the datarobot-DSE client.
# Also install any datarobot-DSE dependencies outside of Artifactory; otherwise the install can take hours.
%load_ext autoreload
%autoreload 2
import sys
import os
import requests
import logging
from io import open
from pprint import pprint
import json

Next, set `TESTING_PATH` to the folder that contains the relevant folders.

In [3]:
# Save the path to the custom model testing folder, and add it to the PYTHONPATH so we can import the client
# NOTE: Installing the datarobot DSE client goes through Artifactory and therefore requires the VPN.
# Remember to switch the Artifactory pip install path in the .pip folder on or off as needed.
TESTING_PATH = os.getcwd() + '/'
sys.path.append(TESTING_PATH)

from datarobot.dse import DSEClient 
from datarobot.dse.enums import CustomModelType, DeploymentType, DatasetCategory

## Configuring User Credentials
Make sure to fill in your username and API token from staging.datarobot.com. 

Also ensure that all the paths are correct!

In [4]:
## Save user credentials ##
with open('/Users/eu.jin.lok/Documents/GIT-Projects/creds.json') as f:
    param = json.load(f)

TOKEN = param['token']
USERNAME = param['username']

## Save path to environment ##
environment_folder = TESTING_PATH + 'Custom-Environment/python_3/py3_sklearn_base/'

## Save path to custom model ##
custom_model_folder = TESTING_PATH + 'Custom-Model/python_model/'

## Save test dataset path ##
test_dataset = TESTING_PATH + 'Custom-Model/python_model/extras/training_data/10k_diabetes_no_null_text.csv'
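A missing or empty key in the credentials file otherwise only shows up later as an authentication failure, so it can be worth validating the file when it is loaded. The sketch below assumes the same JSON keys used in this notebook (`token` and `username`); the helper names `validate_credentials` and `load_credentials` are hypothetical.

```python
import json

REQUIRED_KEYS = ("username", "token")


def validate_credentials(creds):
    """Raise KeyError if any required key is absent or empty; return (username, token)."""
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise KeyError("credentials are missing: " + ", ".join(missing))
    return creds["username"], creds["token"]


def load_credentials(path):
    """Read a JSON credentials file and validate its contents."""
    with open(path) as f:
        return validate_credentials(json.load(f))
```

For example, `USERNAME, TOKEN = load_credentials('/path/to/creds.json')` would replace the manual dictionary lookups, where the path shown is a placeholder for your own file.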

## Loading the API clients
This cell creates the API client used for custom model testing and predictions. **You shouldn't need to change anything in this block if you configured your credentials properly!**

In [5]:
# Configure client
client = DSEClient(
    base_url='https://app.datarobot.com/api/v2',
    username=USERNAME,
    token=TOKEN,
)

## Creating a Custom Environment
This command creates a custom environment. When you run it, it uploads your Docker context and we attempt to build the Docker image (the container that your model will eventually run in).

Depending on the environment and the libraries you want to install, this process can take a while (10-30 minutes). This command sets the wait time to 1 hour; if it fails with a RuntimeError, the environment may still be processing and could still succeed.
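A blocking call with a timeout is conceptually a polling loop, which is why a timeout does not necessarily mean the build failed. The following generic sketch illustrates that pattern; it is not the client's implementation. The function `wait_for_build`, the `get_status` callable, and the `failed` status string are assumptions made for illustration (only `success` appears in this notebook's output).

```python
import time


def wait_for_build(get_status, timeout=3600, poll_interval=10,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports success, failure, or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "success":
            return status
        if status == "failed":  # hypothetical failure status
            raise RuntimeError("environment build failed")
        if clock() >= deadline:
            # The build may still finish later; only the waiting has stopped.
            raise TimeoutError(
                "build still in progress after {} seconds".format(timeout)
            )
        sleep(poll_interval)
```

Raising a distinct `TimeoutError` keeps "still building" separate from "build failed", so the caller can decide whether to check the status again later.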

### Custom Environment Templates
We have a repository for custom environment templates here: [environment templates](https://github.com/datarobot/custom-model-templates/tree/master/custom_environment_templates)

You'll find templates for Python 2 and Python 3 environments. Over time, we will continue to add more templates to this repository for more languages: R, Scala, and others.

In [0]:
## Create the environment, which will eventually contain versions  ##
execution_environment = client.execution_environments.create(
    name="Python3 Sklearn Environment",
    description="This environment contains a set of Python3 libraries, including pandas, sklearn, and keras.",
)

## Create the environment version ##
environment_version = execution_environment.versions.sync_create(
    environment_path=environment_folder,
    timeout=3600,  # 1 hour timeout
)



## Creating a Custom Model
Once the custom environment has been built successfully, it's time to create the custom model. You define the details of your custom model in this command, and the fields you need depend on the type of model.

### Required fields:
`model_path` : string containing the path to the model folder

`name` : string that defines the name of the model

`target_name` : string that defines the name of the target column that the model was trained on

`supports_binary_classification` : boolean that describes the target type. True if the model is trained for Binary Classification

`supports_regression` : boolean that describes the target type. True if the model is trained for Regression. 
Exactly one of `supports_binary_classification` and `supports_regression` should be set to True.

`positive_class_label` : string that defines the "positive class". Only required for Binary Classification models

`negative_class_label` : string that defines the "negative class". Only required for Binary Classification models

### Optional Fields:
`prediction_threshold` : a float that defines the prediction threshold for binary classification. This value is used for features and charts in MMM.

`description` : a string that describes the model. Any free-form text is accepted.

`language` : a string that names the language the model uses. Any free-form text is accepted.
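The field rules above can be checked locally before calling the client. The sketch below is a plain-Python illustration, not part of the client: the helper name `validate_model_settings` is hypothetical, and the check that `prediction_threshold` lies between 0 and 1 is an assumption rather than a documented constraint.

```python
def validate_model_settings(settings):
    """Return a list of problems found in settings; an empty list means it looks valid."""
    problems = []
    for field in ("model_path", "name", "target_name"):
        if not settings.get(field):
            problems.append("missing required field: " + field)

    is_binary = bool(settings.get("supports_binary_classification"))
    is_regression = bool(settings.get("supports_regression"))
    if is_binary == is_regression:
        problems.append(
            "set exactly one of supports_binary_classification "
            "and supports_regression to True"
        )

    if is_binary:
        # Class labels are only needed for binary classification models.
        for field in ("positive_class_label", "negative_class_label"):
            if not settings.get(field):
                problems.append("missing required field: " + field)

    threshold = settings.get("prediction_threshold")
    if threshold is not None and not (0.0 <= threshold <= 1.0):
        # Assumption: the threshold is a probability cutoff.
        problems.append("prediction_threshold should be between 0 and 1")
    return problems
```

Passing the same keyword arguments used in the next cell as a dictionary should return an empty list.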

In [0]:
## Create the custom model ##
custom_model = client.custom_models.create(
    model_path=custom_model_folder,
    custom_model_type=CustomModelType.INFERENCE,
    name='Python 3 Sklearn Custom Model',
    supports_binary_classification=True,
    prediction_threshold=0.5,
    target_name='readmitted',
    positive_class_label='Yes',
    negative_class_label='No',
    description='This is a Python3-based custom model. It has a simple sklearn Pipeline built on 10k diabetes',
    language='Python 3'
)

## The Model Testing Workflow
Just because you created an environment and a model doesn't mean that it will actually work in production! There are all sorts of things that can go wrong, whether on the engineering side or the data science side. Bad code, an environment with the wrong versions of libraries, or even a model that can't handle missing values in the inference data can all lead to a model that will break in production.

With this in mind, we created an easy way to ensure that a custom inference model will work in production: You can actually test your model with a specific environment using sample inference data before deploying the model. 

### Step 1: Save IDs
To make testing quicker and easier, save the environment, model, and version IDs.

In [0]:
# Ensure you've saved the environment and model version
model_version = custom_model.latest_version

print(model_version)
# CustomModelVersion(description='', created='2019-11-25T22:31:40.673776Z', file_name='model.tgz', meta={}, label='v1', id='245d4e2a2408858625024fb3248a2a372245cf3a')

print("\n")
print(environment_version)
# ExecutionEnvironmentVersion(environment_id='5ddc55ee7fe7aa3d43b6cc0e', description='', created='2019-11-25T22:30:07.033503Z', label='Version 1', id='5ddc55efe6858f065c10534c', build_status='success')


### Step 2: Run the Test
To run a custom model test, you upload and save a test dataset from the sample inference data. Then, you simply select the appropriate model and environment (as well as version) IDs, and test it on that dataset.

Depending on the Kubernetes (k8s) cluster and the model itself, it may take a few minutes to test the model. Once the test is finished, it will have a status property to let you know whether the test passed. If it failed, it will contain an `error` property that contains the relevant error!

An important note: As of right now, the only available test is an error check, where we simply ensure the model can return predictions. In the future, we will add more tests to that suite: prediction consistency, missing value handling, and more.

In [0]:
dataset = client.datasets.sync_create(
    dataset_path=test_dataset,
    categories=DatasetCategory.CUSTOM_MODEL_TESTING,
)


In [0]:
# Perform custom model test
custom_model_test = client.custom_model_tests.sync_create(
    timeout=3600,  # 1 hour timeout
    custom_model_id=custom_model.id, 
    custom_model_version_id=model_version.id,
    environment_id=execution_environment.id, 
    environment_version_id=environment_version.id,
    dataset_id=dataset.dataset_id,
)

In [0]:
print("Test statuses: {}".format(custom_model_test.testing_status))
# Test statuses: {'errorCheck': {'status': 'succeeded', 'message': ''}, 'sideEffects': {'status': 'not_tested', 'message': ''}, 'longRunningService': {'status': 'succeeded', 'message': ''}}

if any(test['status'] == 'failed' for test in custom_model_test.testing_status.values()):
    print('Test log:\n')
    print(custom_model_test.log)
else:
    print('Testing succeeded!')
# Testing succeeded!
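When several checks run, it is handy to see exactly which ones failed and why. The sketch below works on a dictionary shaped like the `testing_status` output shown above; the helper name `failed_checks` is hypothetical and it does not call the client.

```python
def failed_checks(testing_status):
    """Map the name of each failed check to its message."""
    return {
        name: result.get("message", "")
        for name, result in testing_status.items()
        if result.get("status") == "failed"
    }


# Example input copied from the output above.
example_status = {
    "errorCheck": {"status": "succeeded", "message": ""},
    "sideEffects": {"status": "not_tested", "message": ""},
    "longRunningService": {"status": "succeeded", "message": ""},
}
print(failed_checks(example_status))
```

An empty dictionary means no check reported `failed`; checks with the status `not_tested` are ignored rather than treated as failures.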

### Step 3: Iterate
If the test passed, then congratulations! You can skip this step; your model is ready to be deployed. If it failed the test, however, it's easy to iterate. 

First, check the error from the custom model test. Then, fix any errors in the code that you uploaded. Finally, upload a new version of the model using the updated code, and test it again!

In [0]:
# Add new version of custom model. Repeat these last two blocks until the model passes testing!
model_version = custom_model.versions.create(model_path=custom_model_folder,
                                             description='Fixing errors from testing')


In [0]:
# Perform custom model test... again
custom_model_test = client.custom_model_tests.sync_create(
    timeout=3600,  # 1 hour timeout
    custom_model_id=custom_model.id, 
    custom_model_version_id=model_version.id,
    environment_id=execution_environment.id, 
    environment_version_id=environment_version.id,
    dataset_id=dataset.dataset_id,
)

print("Test statuses: {}".format(custom_model_test.testing_status))

if any(test['status'] == 'failed' for test in custom_model_test.testing_status.values()):
    print('Test log:\n')
    print(custom_model_test.log)

else:
    print('Testing succeeded!')

In [0]:
# This command shows all tests that have been run on the model
model_tests = client.custom_model_tests.list(custom_model_id=custom_model.id)
print(model_tests)
# RESTList(objects=[CustomModelTest(custom_model_image_id='5ddc57207fe7aa3d70b6cc06', custom_model={'id': '5ddc564c3bcc11065acc148c', 'name': 'Python 3 Sklearn Custom Model'}, execution_environment={'id': '5ddc55ee7fe7aa3d43b6cc0e', 'name': 'Python3 Sklearn Environment'}, testing_status={'errorCheck': {'status': 'succeeded', 'message': ''}, 'longRunningService': {'status': 'succeeded', 'message': ''}, 'sideEffects': {'status': 'not_tested', 'message': ''}}, created='2019-11-25T22:35:12.771178Z', custom_model_version={'id': '245d4e2a2408858625024fb3248a2a372245cf3a', 'label': 'v1'}, dataset_version_id='5ddc56dde6858f057010547d', execution_environment_version={'id': '5ddc55efe6858f065c10534c', 'label': 'Version 1'}, created_by='eu.jin.lok@datarobot.com', completed_at='2019-11-25T22:36:14.659379Z', id='5ddc57207fe7aa3d70b6cc12', dataset_id='5ddc56dce6858f057010547c')], next_page_link=None, previous_page_link=None, count=1, total=None)

## Deploying the model
To deploy an inference model, you create something called a `custom_model_image`, which saves the custom model code with a _specific_ environment. This will make it easy to see which custom models have been tested or deployed on specific environments.

Once you have the desired custom model image, simply call the `client.deployments.sync_create()` method, inputting the model image's id, the prediction server's `instance_id`, the inference model deployment type, and the desired deployment label.

In [0]:
# Ensure that the client is using the correct prediction server to deploy the model. 
# This uses the prediction server for testing on staging.

available_prediction_server_urls = [
    "https://cfds-ccm-prod.orm.datarobot.com",
]

for prediction_server in client.prediction_servers:
    if prediction_server.url in available_prediction_server_urls:
        instance_id = prediction_server.id
        break
else:
    raise Exception("no suitable prediction server found")
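The lookup above can also be written as a small function whose error message lists the URLs that were actually available, which makes a mismatch easier to diagnose. This is a sketch under assumptions: `pick_prediction_server` is a hypothetical helper, and `PredictionServer` is a stand-in that models only the `id` and `url` attributes used in this notebook.

```python
from collections import namedtuple

# Stand-in for the objects yielded by client.prediction_servers (hypothetical).
PredictionServer = namedtuple("PredictionServer", ["id", "url"])


def pick_prediction_server(servers, allowed_urls):
    """Return the first server whose URL is allowed, or raise LookupError."""
    servers = list(servers)
    for server in servers:
        if server.url in allowed_urls:
            return server
    seen = ", ".join(s.url for s in servers) or "none"
    raise LookupError(
        "no suitable prediction server found; available URLs: " + seen
    )


# Usage with made-up servers; with the real client you would pass
# client.prediction_servers and available_prediction_server_urls instead.
demo_servers = [
    PredictionServer("1", "https://one.example.com"),
    PredictionServer("2", "https://two.example.com"),
]
print(pick_prediction_server(demo_servers, ["https://two.example.com"]).id)
```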


In [0]:
custom_model_image = client.custom_model_images.create(
    custom_model_id=custom_model.id,
    custom_model_version_id=model_version.id,
    environment_id=execution_environment.id,
    environment_version_id=environment_version.id,
)


deployment = client.deployments.sync_create(
    timeout=3600,  # 1 hour timeout
    model_id=custom_model_image.id,
    # instance id is only required for Cloud DataRobot App
    # ignore for on-premises Platform installations.
    instance_id=instance_id,
    deployment_type=DeploymentType.CUSTOM_INFERENCE_MODEL,
    label='Test client deployment',
)


### Making predictions on a deployed custom inference model
Predictions look exactly the same for a custom inference model and a native DataRobot model. If training data was assigned to the model, then we can also provide prediction explanations and all MMM features, deeply integrated with the custom model.

In [0]:
# Make predictions on the custom model deployment
predictions = deployment.predict(test_dataset)
pprint(predictions)