## DataRobot Python API Client Mini Demo

<pre>raul.arrabales@datarobot.com</pre>

<hr>

- See https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.1/setup/getting_started.html#installation

Additionally: 

- API Client Documentation: https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.26.0/ 
- API Reference: https://datarobot-public-api-client.readthedocs-hosted.com/en/v2.25.1/autodoc/api_reference.html

In [2]:
# Set interactive shell in Jupyter
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

### 1.- Install the Python API client with pip (used to access the managed cloud DR platform)

In [1]:
! pip install datarobot

Collecting datarobot
  Downloading datarobot-2.25.1-py3-none-any.whl (471 kB)
Collecting trafaret!=1.1.0,<2.0,>=0.7
  Downloading trafaret-1.2.0-py3-none-any.whl (27 kB)
Collecting requests-toolbelt>=0.6
  Downloading requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
Installing collected packages: trafaret, requests-toolbelt, datarobot
Successfully installed datarobot-2.25.1 requests-toolbelt-0.9.1 trafaret-1.2.0




### 2.- Create the API Key in DR (https://app.datarobot.com/account/developer-tools)

You now have a key that looks like this: "NjE0OTk1 [...]"

I will use the endpoint of the US Managed AI Cloud: https://app.datarobot.com/api/v2

Both the API key and the endpoint are stored in the file **drconfig.yaml**, in the same directory as this notebook.
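
As a rough sketch, the config file needs only two keys, `token` and `endpoint`. The snippet below writes a placeholder file and reads it back. The file name `drconfig.example.yaml` and both helper names are hypothetical (chosen so that a real `drconfig.yaml` is not overwritten), and `YOUR_API_KEY` is a placeholder, not a working key.

```python
from pathlib import Path

# Hypothetical file name, so an existing drconfig.yaml is left untouched.
EXAMPLE_CONFIG = Path("drconfig.example.yaml")

def write_example_config(path, token, endpoint):
    """Write a two-key YAML config: the API token and the API endpoint."""
    path.write_text(f"token: {token}\nendpoint: {endpoint}\n", encoding="utf-8")

def read_flat_config(path):
    """Parse simple 'key: value' lines (enough for this flat file, not general YAML)."""
    settings = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip() and not line.lstrip().startswith("#"):
            # Split on the first colon only, so the URL's own colons survive.
            key, _, value = line.partition(":")
            settings[key.strip()] = value.strip()
    return settings

write_example_config(EXAMPLE_CONFIG, "YOUR_API_KEY", "https://app.datarobot.com/api/v2")
config = read_flat_config(EXAMPLE_CONFIG)
print(config["endpoint"])
```

Keeping the key in a file rather than in the notebook means the notebook can be shared without exposing credentials.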

### 3.- Create and Configure the client object

In [8]:
from datetime import date
import pandas as pd

import datarobot as dr

In [4]:
# Create and configure the client
dr.Client(config_path = 'drconfig.yaml')

<datarobot.rest.RESTClientObject at 0x1ebe584f0b8>

### 4.- Grabbing a sample dataset

The wine data comes from the University of California, Irvine Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality.

Citation: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

In [30]:
# Raw strings (r"...") keep the Windows backslashes from being read as escape sequences
train_DS_Path = r"D:\Dropbox-Array2001\Dropbox\DataSets\DataRobot\WineQualityDataSet\winequality-white-training.csv"
score_DS_Path = r"D:\Dropbox-Array2001\Dropbox\DataSets\DataRobot\WineQualityDataSet\winequality-white-score.csv"
preds_DS_Path = r"D:\Dropbox-Array2001\Dropbox\DataSets\DataRobot\WineQualityDataSet\winequality-white-predictions.csv"
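
A possible alternative, assuming all three files share one folder: build the paths with `pathlib`, which avoids repeating the folder and sidesteps backslash-escape pitfalls in Windows path strings. `DATA_DIR` and `dataset_paths` are hypothetical names; the folder is the one used above.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally instead of being read as escapes.
DATA_DIR = PureWindowsPath(r"D:\Dropbox-Array2001\Dropbox\DataSets\DataRobot\WineQualityDataSet")

def dataset_paths(data_dir, stem="winequality-white"):
    """Return the training, scoring and predictions CSV paths as strings."""
    return {kind: str(data_dir / f"{stem}-{kind}.csv")
            for kind in ("training", "score", "predictions")}

paths = dataset_paths(DATA_DIR)
train_DS_Path = paths["training"]
score_DS_Path = paths["score"]
preds_DS_Path = paths["predictions"]
print(train_DS_Path)
```

`PureWindowsPath` only manipulates the path text, so the snippet behaves the same on any operating system; on the Windows machine itself, `pathlib.Path` would work equally well.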

In [11]:
trainingData = pd.read_csv(train_DS_Path)
trainingData.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |


### 5.- Start a Project

- Load Dataset
- Perform EDA1

In [9]:
projectName = 'Python wine quality ' + date.today().strftime(format = "%Y-%m-%d")
projectName

'Python wine quality 2021-09-21'

In [12]:
project = dr.Project.create(
    sourcedata = trainingData,
    project_name = projectName
)

print(project.id, project.project_name)

# Check https://app.datarobot.com/manage-projects to see this project in the GUI

6149a3419a7958e3e48b9e0d Python wine quality 2021-09-21


In [18]:
# Check project feats:
feats = project.get_features()
feats

[Feature(alcohol),
 Feature(chlorides),
 Feature(citric acid),
 Feature(density),
 Feature(fixed acidity),
 Feature(free sulfur dioxide),
 Feature(pH),
 Feature(quality),
 Feature(residual sugar),
 Feature(sulphates),
 Feature(total sulfur dioxide),
 Feature(volatile acidity)]

### 6.- Build Models

- Run DataRobot Autopilot
- Perform EDA2
- Build initial set of models.

In [21]:
# Set the target variable (y) to quality:
project.set_target(target = 'quality')

Project(Python wine quality 2021-09-21)

In [22]:
# Wait for Autopilot completion (you can check the GUI)
project.wait_for_autopilot()

In progress: 2, queued: 7 (waited: 0s)
In progress: 2, queued: 7 (waited: 2s)
In progress: 2, queued: 7 (waited: 3s)
In progress: 2, queued: 7 (waited: 5s)
In progress: 2, queued: 7 (waited: 7s)
In progress: 1, queued: 7 (waited: 9s)
In progress: 2, queued: 6 (waited: 14s)
In progress: 7, queued: 0 (waited: 22s)
In progress: 7, queued: 0 (waited: 36s)
In progress: 1, queued: 0 (waited: 57s)
In progress: 1, queued: 0 (waited: 78s)
In progress: 4, queued: 0 (waited: 99s)
In progress: 4, queued: 0 (waited: 121s)
In progress: 10, queued: 6 (waited: 142s)
In progress: 10, queued: 6 (waited: 164s)
In progress: 10, queued: 0 (waited: 186s)
In progress: 3, queued: 0 (waited: 207s)
In progress: 0, queued: 0 (waited: 228s)
In progress: 0, queued: 0 (waited: 249s)
In progress: 0, queued: 0 (waited: 270s)
In progress: 0, queued: 0 (waited: 291s)
In progress: 0, queued: 0 (waited: 312s)
In progress: 5, queued: 0 (waited: 333s)
In progress: 5, queued: 0 (waited: 354s)
In progress: 0, queued: 0 (wait
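
Conceptually, waiting for Autopilot is a polling loop: query the job counts, report them, sleep, and stop once the work is done or a timeout expires. The sketch below shows that pattern with a stand-in status function. It is not the client's implementation; `poll_until_done`, `get_status` and the fake status sequence are all hypothetical.

```python
import time

def poll_until_done(get_status, check_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call get_status() until it reports no running or queued jobs.

    get_status must return an (in_progress, queued) tuple.
    Returns the number of polls made; raises TimeoutError on timeout.
    """
    waited = 0.0
    polls = 0
    while True:
        in_progress, queued = get_status()
        polls += 1
        print(f"In progress: {in_progress}, queued: {queued} (waited: {waited:.0f}s)")
        if in_progress == 0 and queued == 0:
            return polls
        if waited >= timeout:
            raise TimeoutError(f"still running after {waited:.0f}s")
        sleep(check_interval)
        waited += check_interval

# Stand-in for a real status query: two busy polls, then idle.
fake_statuses = iter([(2, 7), (1, 3), (0, 0)])
polls = poll_until_done(lambda: next(fake_statuses), check_interval=2.0,
                        sleep=lambda seconds: None)
```

Note that the log above shows several idle readings (0 in progress, 0 queued) before more jobs appear, so the real wait must rely on more than the job counts. The sketch only illustrates the polling pattern.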

### 7.- View and Deploy Models

- Check projects. 
- Check models in a project. 
- Get the recommended model for deployment. 
- Deploy the model

In [23]:
# Check all projects
for p in dr.Project.list():
    print(p.id, p.project_name)

6149a3419a7958e3e48b9e0d Python wine quality 2021-09-21
6148b476f5322a919051135e DRU_Lab_Eval_Regression
6142f779b37d92fd04d16c45 DRU_AutoML_DataScientist
614207c0beee4a728d4c00ee DRU_AutoML_CitizenDS
5fff1dc1c50bef9d7aa46c4b Utah Housing Listings Demo
5fdb79f2597db9c2478cfcd1 AutoML 1
5fcfdf1cb39bd5561be4d7a7 Demo: Readmission AFD (Main, No ICD10)
5fce704970ebf7dd7de4d7f5 HR Hiring (Bias & Fairness)


In [24]:
# The project is already held in the variable `project`:
print(project.project_name)

# Otherwise, it can be fetched by its id:
# projectId = 'your-project-id'
# project = dr.Project.get(projectId)
# print(project)

Python wine quality 2021-09-21


In [25]:
# Get the list of models built by Autopilot:
models = project.get_models()
for m in models:
    print(m.id, m.model_type)

6149a76dfd6b8985128bc143 RandomForest Regressor
6149a61d8f6a02a07b594ce6 RandomForest Regressor
6149a79e7038fecafef17125 AVG Blender
6149a70801182efe44f17119 RandomForest Regressor
6149a61d8f6a02a07b594ce8 Light Gradient Boosted Trees Regressor with Early Stopping
6149a73d5a5be21f5f8bc145 RandomForest Regressor
6149a61d8f6a02a07b594ce9 eXtreme Gradient Boosted Trees Regressor
6149a61d8f6a02a07b594ce7 Light Gradient Boosting on ElasticNet Predictions 
6149a5ad8b854a085cdad5f4 RandomForest Regressor
6149a5ad8b854a085cdad5f7 Light Gradient Boosting on ElasticNet Predictions 
6149a5ad8b854a085cdad5f6 Light Gradient Boosted Trees Regressor with Early Stopping
6149a79b7038fecafef17120 RandomForest Regressor
6149a5ac8b854a085cdad5f1 eXtreme Gradient Boosted Trees Regressor
6149a5ac8b854a085cdad5f0 RuleFit Regressor
6149a5ac8b854a085cdad5ef Keras Slim Residual Neural Network Regressor using Training Schedule (1 Layer: 64 Units)
6149a5ad8b854a085cdad5f5 Generalized Additive2 Model
6149a5ac8b854

In [26]:
# Skipping exploration and evaluation of the models.
# Let's go directly to the model recommended by DR:

recommendedModel = dr.ModelRecommendation.get(project.id).get_model()
print(recommendedModel.id, recommendedModel.model_type)

6149a76dfd6b8985128bc143 RandomForest Regressor
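
To compare candidates instead of taking the recommendation directly, the models could be ranked by a validation metric. A minimal sketch, assuming each model exposes a `metrics` dictionary shaped like `{'RMSE': {'validation': ...}}`. The helper `rank_by_metric` is hypothetical, and the stand-in objects and their scores are made up purely to show the call shape.

```python
from types import SimpleNamespace

def rank_by_metric(models, metric="RMSE", partition="validation"):
    """Sort models by an error metric, lowest (best) first; skip models without a score."""
    scored = [m for m in models
              if m.metrics.get(metric, {}).get(partition) is not None]
    return sorted(scored, key=lambda m: m.metrics[metric][partition])

# Stand-in objects with made-up scores, only to show the call shape.
demo_models = [
    SimpleNamespace(model_type="Model A", metrics={"RMSE": {"validation": 0.72}}),
    SimpleNamespace(model_type="Model B", metrics={"RMSE": {"validation": 0.68}}),
    SimpleNamespace(model_type="Model C", metrics={"RMSE": {"validation": None}}),
]
ranked = rank_by_metric(demo_models)
print([m.model_type for m in ranked])
```

With the real project, the call would be along the lines of `rank_by_metric(project.get_models())`, assuming RMSE is among the metrics computed for the project.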


In [28]:
# Check my prediction servers
dr.PredictionServer.list()

[PredictionServer(https://mlops.dynamic.orm.datarobot.com),
 PredictionServer(https://datarobot-cfds.dynamic.orm.datarobot.com),
 PredictionServer(https://cfds-ccm-prod.orm.datarobot.com)]

In [29]:
# Deploy the model to a prediction server
predictionServer = dr.PredictionServer.list()[0]

deployment = dr.Deployment.create_from_learning_model(
    model_id=recommendedModel.id, 
    label='Wine Quality',
    description='Model for scoring wine quality',
    default_prediction_server_id=predictionServer.id
)
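
Taking element `[0]` depends on the order in which the servers happen to be listed. A hedged alternative is to select the server by its URL. `pick_server` is a hypothetical helper that only assumes each server object has a `url` attribute; the stand-in ids are placeholders.

```python
from types import SimpleNamespace

def pick_server(servers, url_fragment):
    """Return the first server whose URL contains url_fragment, else raise."""
    for server in servers:
        if url_fragment in server.url:
            return server
    raise LookupError(f"no prediction server matching {url_fragment!r}")

# Stand-ins mirroring the URLs listed above; the ids are placeholders.
demo_servers = [
    SimpleNamespace(id="server-1", url="https://mlops.dynamic.orm.datarobot.com"),
    SimpleNamespace(id="server-2", url="https://datarobot-cfds.dynamic.orm.datarobot.com"),
    SimpleNamespace(id="server-3", url="https://cfds-ccm-prod.orm.datarobot.com"),
]
chosen = pick_server(demo_servers, "cfds-ccm-prod")
print(chosen.url)
```

Raising an error when nothing matches makes a missing server visible immediately, rather than silently deploying to an unintended one.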

### 8.- Requesting Predictions

- Request batch predictions
- Request real-time predictions

In [31]:
# Check scoring data
scoringData = pd.read_csv(score_DS_Path)
scoringData.head()

| | wine_id | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 6.1 | 0.32 | 0.28 | 6.6 | 0.021 | 29 | 132 | 0.99188 | 3.15 | 0.36 | 11.45 |
| 1 | 101 | 5.0 | 0.2 | 0.4 | 1.9 | 0.015 | 20 | 98 | 0.9897 | 3.37 | 0.55 | 12.05 |
| 2 | 102 | 6.0 | 0.42 | 0.41 | 12.4 | 0.032 | 50 | 179 | 0.99622 | 3.14 | 0.6 | 9.7 |
| 3 | 103 | 5.7 | 0.21 | 0.32 | 1.6 | 0.03 | 33 | 122 | 0.99044 | 3.33 | 0.52 | 11.9 |
| 4 | 104 | 5.6 | 0.2 | 0.36 | 2.5 | 0.048 | 16 | 125 | 0.99282 | 3.49 | 0.49 | 10.0 |


In [32]:
# Batch prediction using wine_id as passthrough column
job = dr.BatchPredictionJob.score(
    deployment=deployment.id,
    passthrough_columns=['wine_id'],
    intake_settings={
        'type': 'localFile',
        'file': score_DS_Path
    },
    output_settings={
        'type': 'localFile',
        'path': preds_DS_Path
    }
)

In [33]:
# Check prediction results
predsData = pd.read_csv(preds_DS_Path)
predsData.head()

| | quality_PREDICTION | wine_id |
|---|---|---|
| 0 | 6.213348 | 100 |
| 1 | 6.510337 | 101 |
| 2 | 5.695313 | 102 |
| 3 | 6.468334 | 103 |
| 4 | 5.811963 | 104 |
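
Because `wine_id` is passed through, each prediction can be matched back to its input row. Below is a small sketch using only the standard library and the first two rows shown above, with the scoring data cut down to one feature column for brevity. With pandas, `DataFrame.merge` on `wine_id` would be the usual route. `join_on_key` is a hypothetical helper.

```python
import csv
import io

def join_on_key(left_rows, right_rows, key):
    """Inner-join two lists of dicts on a shared key column."""
    right_by_key = {row[key]: row for row in right_rows}
    return [{**row, **right_by_key[row[key]]}
            for row in left_rows if row[key] in right_by_key]

# First two rows of the scoring and prediction files, as displayed above
# (scoring data reduced to the alcohol column).
scoring_csv = "wine_id,alcohol\n100,11.45\n101,12.05\n"
preds_csv = "quality_PREDICTION,wine_id\n6.213348,100\n6.510337,101\n"

scoring_rows = list(csv.DictReader(io.StringIO(scoring_csv)))
pred_rows = list(csv.DictReader(io.StringIO(preds_csv)))
joined = join_on_key(scoring_rows, pred_rows, "wine_id")
print(joined[0])
```

`csv.DictReader` yields every value as a string, so numeric columns would need converting before any arithmetic.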


Now, let's do real-time predictions:

In [34]:
# Prediction server for real-time predictions through the REST API:
predictionServer = deployment.default_prediction_server['url']

predictionUrl = f'{predictionServer}/predApi/v1.0/deployments/{deployment.id}/predictions'
predictionUrl

'https://mlops.dynamic.orm.datarobot.com/predApi/v1.0/deployments/6149aaa11467df1e5a4bcf0e/predictions'
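
The URL is the prediction server address, the fixed `predApi/v1.0` path, and the deployment id. A small sketch that wraps this in a function and tolerates a trailing slash on the server address; `build_prediction_url` is a hypothetical helper, and the values are the ones shown above.

```python
def build_prediction_url(server_url, deployment_id):
    """Compose the real-time prediction URL for a deployment."""
    base = server_url.rstrip("/")  # avoid a double slash if the address ends with one
    return f"{base}/predApi/v1.0/deployments/{deployment_id}/predictions"

url = build_prediction_url("https://mlops.dynamic.orm.datarobot.com/",
                           "6149aaa11467df1e5a4bcf0e")
print(url)
```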

In [35]:
# Read the scoring data into a byte string (the file is closed automatically):
with open(score_DS_Path, 'rb') as scoreFile:
    dataToScore = scoreFile.read()

In [37]:
dataToScore[0:300]

b'wine_id,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol\n100,6.1,0.32,0.28,6.6,0.021,29,132,0.99188,3.15,0.36,11.45\n101,5,0.2,0.4,1.9,0.015,20,98,0.9897,3.37,0.55,12.05\n102,6,0.42,0.41,12.4,0.032,50,179,0.99622,'

In [38]:
# Set up credentials for the REST API
import yaml

# safe_load parses plain YAML without constructing arbitrary Python objects
with open('./drconfig.yaml', 'r') as stream:
    drconfig = yaml.safe_load(stream)
drtoken = drconfig['token']

In [39]:
# Build request
import requests

# Construct the request headers: the content type (plain text),
# the access token and, for managed cloud and self-managed installations,
# the prediction server key.
predictionRequestHeaders = {
    'Content-Type': 'text/plain; charset=UTF-8', 
    'Authorization': f'Bearer {drtoken}',
    'datarobot-key': deployment.default_prediction_server['datarobot-key']}

In [40]:
# Make the real-time prediction request

# construct and send the request using the prediction server URL, 
# the binary string containing the data to score, and the header
# values you set above.
predictionsResponse = requests.post(
        predictionUrl,
        data=dataToScore,
        headers=predictionRequestHeaders
)

In [41]:
# Check:
predictionsResponse.status_code

200
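
A 200 status means the rows were scored. The JSON body, displayed in the next cell, holds a `data` list with one entry per input row, each carrying a `rowId` and a `prediction`. Here is a minimal sketch for pulling those pairs out; `extract_predictions` is a hypothetical helper, and the sample payload copies the first two rows of that response.

```python
def extract_predictions(payload):
    """Return (rowId, prediction) pairs from a prediction response body."""
    return [(row["rowId"], row["prediction"]) for row in payload["data"]]

# First two entries of the response shown in this notebook.
sample_payload = {
    "data": [
        {"predictionValues": [{"value": 6.213347619, "label": "quality"}],
         "prediction": 6.213347619, "rowId": 0},
        {"predictionValues": [{"value": 6.5103365079, "label": "quality"}],
         "prediction": 6.5103365079, "rowId": 1},
    ]
}
pairs = extract_predictions(sample_payload)
print(pairs)
```

On the live response this would be `extract_predictions(predictionsResponse.json())`, ideally after calling `predictionsResponse.raise_for_status()` so that a failed request raises instead of being parsed.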

In [42]:
# See the response:
predictionsResponse.json()

{'data': [{'predictionValues': [{'value': 6.213347619, 'label': 'quality'}],
   'prediction': 6.213347619,
   'rowId': 0},
  {'predictionValues': [{'value': 6.5103365079, 'label': 'quality'}],
   'prediction': 6.5103365079,
   'rowId': 1},
  {'predictionValues': [{'value': 5.6953134921, 'label': 'quality'}],
   'prediction': 5.6953134921,
   'rowId': 2},
  {'predictionValues': [{'value': 6.468334127, 'label': 'quality'}],
   'prediction': 6.468334127,
   'rowId': 3},
  {'predictionValues': [{'value': 5.8119634921, 'label': 'quality'}],
   'prediction': 5.8119634921,
   'rowId': 4},
  {'predictionValues': [{'value': 5.5977793651, 'label': 'quality'}],
   'prediction': 5.5977793651,
   'rowId': 5},
  {'predictionValues': [{'value': 6.5352031746, 'label': 'quality'}],
   'prediction': 6.5352031746,
   'rowId': 6},
  {'predictionValues': [{'value': 4.8166571429, 'label': 'quality'}],
   'prediction': 4.8166571429,
   'rowId': 7},
  {'predictionValues': [{'value': 4.6671412698, 'label': 'qu