# Online and Batch Deployment/Scoring in CP4D with WML Python Client
<br>

* The purpose of this notebook is to demo how to <b>DEPLOY</b> and <b>SCORE</b> your ML Model using the WML Python Client.
* A <b>Random Forest</b> model trained on the <b>Kaggle Iris Dataset</b> is deployed and scored via <b>Batch and Online</b> methods.
* Whether you are in Watson Studio, Jupyter, or a local development environment, simply import the Watson ML Client library and bring your models to life!

<br> <b>*** Please Note:</b> There are several ways to deploy models on Watson ML. This notebook focuses on the 'Python Client' method; other methods are described in the watson-machine-learning-client documentation.

### Notebook Layout
* <b>Section 1: Packages and EDA </b> 
* <b>Section 2: Model Training and Building </b> 
* <b>Section 3: WML Client Instantiation </b>
<br>
&ensp; <b>3a:</b> Generate IBM Identity Access Management (IAM) Token for IBM Cloud Private (ICP) 
<br>
&ensp; <b>3b:</b> Authenticate and Create WML Python Client Object 
* <b>Section 4: Deployments </b>
<br>
&ensp; <b>4a:</b> Create and/or set Deployment Space
<br>
&ensp; <b>4b:</b> Online Deployment
<br>
&ensp; <b>4c:</b> Batch Deployment
* <b>Section 5: Scoring </b> 
<br>
&ensp; <b>5a:</b> Online Scoring - Using REST API Endpoint
<br>
&ensp; <b>5b:</b> Online Scoring - Using WML Python Client
<br>
&ensp; <b>5c:</b> Batch Scoring


### Sources
* <a href="https://www.kaggle.com/uciml/iris#Iris.csv">KAGGLE IRIS DATASET</a>  Includes three iris species with 50 samples each as well as some properties about each flower.
* <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-authentication.html">WML Auth INFO</a> The 'Authentication' overview section of the Watson Machine Learning info on IBM CLOUD Website.
* <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-deploy_new.html?audience=wdp">WML Deployment GEN INFO</a>  The 'Deployment' overview section of the Watson Machine Learning info on IBM CLOUD Website.
* <a href="https://matplotlib.org/">WML Deployment DOCS</a>  the watson-machine-learning-client documentation.
* <a href="https://matplotlib.org/">WML Deployment V4 DOCS</a>  The watson-machine-learning-client_v4 documentation. More detailed and developer-oriented documentation.


## Section 1: Packages and EDA

<br>
Here are some quick summary statistics of the Iris Dataset:

* <b>Columns</b>: Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species
* <b>Observations</b>: 150
* <b>Classes</b>: Iris-virginica (50), Iris-versicolor (50), Iris-setosa (50)



In [1]:
import pandas as pd 
import numpy as np 

#Modeling Packages
!pip install sklearn_pandas
import sklearn 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier

#Packages for IAM Access Token 
import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time
import warnings

#Packages for WML Client
from watson_machine_learning_client import WatsonMachineLearningAPIClient
import os

Collecting sklearn_pandas
  Downloading https://files.pythonhosted.org/packages/1f/48/4e1461d828baf41d609efaa720d20090ac6ec346b5daad3c88e243e2207e/sklearn_pandas-1.8.0-py2.py3-none-any.whl
Installing collected packages: sklearn-pandas
Successfully installed sklearn-pandas-1.8.0


In [2]:
df = pd.read_csv('/project_data/data_asset/Iris.csv')
df.describe() 

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm
count,150.0,150.0,150.0,150.0,150.0
mean,75.5,5.843333,3.054,3.758667,1.198667
std,43.445368,0.828066,0.433594,1.76442,0.763161
min,1.0,4.3,2.0,1.0,0.1
25%,38.25,5.1,2.8,1.6,0.3
50%,75.5,5.8,3.0,4.35,1.3
75%,112.75,6.4,3.3,5.1,1.8
max,150.0,7.9,4.4,6.9,2.5


In [3]:
df.Species.value_counts()

Iris-versicolor    50
Iris-setosa        50
Iris-virginica     50
Name: Species, dtype: int64

## Section 2: Model Training and Building

* <b>Data Transformations:</b> The dependent variable, Species, is transformed with <b>LabelEncoder</b>. Classes are 0, 1, 2 for Iris-setosa, Iris-versicolor, and Iris-virginica respectively (LabelEncoder assigns codes in sorted label order). 
* <b>Estimator:</b> <b>Random Forest</b> classifier (scikit-learn <b>RandomForestClassifier</b>). There is no parameter tuning. 
* <b>Results:</b> 90% overall accuracy on the held-out test set.

<br> <b>*** Please Note:</b> This notebook focuses on deployments, not model building/tuning. 
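To make the class encoding concrete: `LabelEncoder` assigns integer codes in the sorted order of the distinct labels. The plain-Python sketch below mimics that ordering without scikit-learn. `encode_labels` is a hypothetical helper written only for illustration and is not part of any library.

```python
def encode_labels(labels):
    """Map each distinct label to an integer code in sorted order,
    which is the ordering LabelEncoder uses for string labels."""
    classes = sorted(set(labels))
    mapping = {label: code for code, label in enumerate(classes)}
    return mapping, [mapping[label] for label in labels]


species = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]
mapping, codes = encode_labels(species)
print(mapping)
print(codes)
```

Keeping this mapping in mind helps when reading raw integer predictions, such as those returned by batch scoring.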

In [4]:
spec_encode = LabelEncoder().fit(df.Species)
df['Species'] = spec_encode.transform(df.Species)

In [5]:
X = df.drop(['Id','Species'], axis = 1)
y = df.Species
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

In [6]:
random_forest = RandomForestClassifier()
model= random_forest.fit( X_train, y_train )

In [7]:
# call model.predict() on your X_test data to make a set of test predictions
y_prediction = model.predict( X_test )
# test your predictions using sklearn.classification_report()
report = sklearn.metrics.classification_report( y_test, y_prediction )
# and print the report
print(report)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.82      0.90      0.86        10
           2       0.89      0.80      0.84        10

   micro avg       0.90      0.90      0.90        30
   macro avg       0.90      0.90      0.90        30
weighted avg       0.90      0.90      0.90        30



## Section 3: WML Client Instantiation


### 3a: Generate IBM Identity Access Management (IAM) Token for IBM Cloud Private (ICP)

* You need an IAM token to instantiate a Python Client Object
* <b>Inputs:</b> Username, password, and url **(or IP, port pair)** of your CP4D cluster <br> 
&emsp; If you are in the CP4D instance, calling **os.environ['RUNTIME_ENV_APSX_URL']** will return the URL <br>
&emsp; If you are not in a CP4D instance, the URL can be found on the **'Let's Get Started'** page <br>
&emsp; **OR** If you are not in a CP4D instance, the URL is also the IP and port pair. **Ex: https://< xyz-web-or-ip >:< port number >**
<br> 
<b>*** Please Note:</b> This generates an IAM token for <b>ICP.</b> The process differs for IBM Public Cloud, where you would need an API Key. Refer to the documentation for more info. 
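For readers curious about what is sent during this step, here is a minimal standard-library sketch of an HTTP Basic `Authorization` header (for ASCII credentials) and of the token URL. `basic_auth_header` and `token_url` are hypothetical helper names, the host is a placeholder, and the `/v1/preauth/validateAuth` path is the one used in this notebook.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header: base64("username:password")."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def token_url(base_url):
    """Append the token-validation path used in this notebook."""
    return base_url.rstrip("/") + "/v1/preauth/validateAuth"


headers = basic_auth_header("xyz", "abc")
headers["Accept"] = "application/json"
print(token_url("https://cp4d.example.com:31843/"))  # placeholder host
print(sorted(headers))
```

In practice `requests` with `HTTPBasicAuth` builds this header for you; the sketch only shows what is happening underneath.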

In [14]:
CREDENTIALS = {
                      "username": 'xyz',
                      "password": "abc",
                      # address should be replaced with ip, port pair to be used in scripts outside ICP
                      "url": 'https://<ip>:<port-number>'
                   }


def generate_access_token():
    headers={}
    headers["Accept"] = "application/json"
    auth = HTTPBasicAuth(CREDENTIALS["username"], CREDENTIALS["password"])
    
    ICP_TOKEN_URL= CREDENTIALS["url"] + "/v1/preauth/validateAuth"
    
    response = requests.get(ICP_TOKEN_URL, headers=headers, auth=auth, verify=False)
    json_data = response.json()
    icp_access_token = json_data['accessToken']
    return icp_access_token

token = generate_access_token()

### 3b: Authenticate and Create WML Python Client Object 

* Once you have your IAM token, you can create a WML Python Client Object. 
* <b>INPUTS:</b> 
<br>
&ensp; <b>Token:</b> IAM token obtained in step 3A
<br>
&ensp; <b>Instance Id:</b> Set to 'ICP' or 'Openshift' depending on what platform Watson Studio is running on.
<br>
&ensp; <b>Url:</b> IP, port pair of where Watson Studio is located.
<br>
&emsp; This can be found calling <b>os.environ['RUNTIME_ENV_APSX_URL']</b> if you are in ICP. 
<br>
&emsp; You can also use the URL of the Watson Studio instance if you are in ICP (this was done in 3a).  
<br>
&ensp; <b>Version:</b> In our case, it is '2.5.0'. 
<br>
&emsp; <b>IBM CP4D 3.0 is days away from being released.</b> In that case, version would be set to '3.0.0'.
<br><br>
<b>*** Please Note:</b> This generates a client object for <b>ICP.</b> The process differs for IBM Public Cloud, where you would need an API Key and WML Instance ID. Refer to the documentation for more info. 
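The inputs listed above can be gathered into the credentials dictionary with a small helper. This is only a sketch: `build_wml_credentials` is a hypothetical function, the default `instance_id` and `version` simply mirror the values used in this notebook, and the URL in the example is a placeholder.

```python
import os


def build_wml_credentials(token, url=None, instance_id="openshift", version="2.5.0"):
    """Assemble the credentials dict passed to the WML client.

    Falls back to the RUNTIME_ENV_APSX_URL environment variable when no
    URL is given, and fails early if a required value is missing.
    """
    url = url or os.environ.get("RUNTIME_ENV_APSX_URL")
    if not token:
        raise ValueError("an access token is required")
    if not url:
        raise ValueError("no URL given and RUNTIME_ENV_APSX_URL is not set")
    return {"token": token, "instance_id": instance_id, "url": url, "version": version}


creds = build_wml_credentials("dummy-token", url="https://cp4d.example.com:31843")
print(creds["instance_id"], creds["version"])
```

Failing early on a missing token or URL tends to produce a clearer error than letting the client constructor reject the dictionary later.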

In [16]:
#token = os.environ['USER_ACCESS_TOKEN']
url = os.environ['RUNTIME_ENV_APSX_URL']

wml_credentials = {
   "token": token,
   "instance_id" : "openshift",
   "url": url,
   "version": "2.5.0"
}

wml_client = WatsonMachineLearningAPIClient(wml_credentials)

## Section 4: Deployments

### 4a: Create and/or Set Deployment Space

* Setting a default Deployment Space or Project ID is <b>the first and mandatory step </b> in CP4D. This tells the client from where to push/pull information. 
* Because the focus is Deployments, a Deployment Space ID will be set. 
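The "reuse the space if it exists, otherwise create it" lookup can be sketched in isolation. The sample data below is hypothetical and only imitates the shape this notebook reads from `wml_client.spaces.get_details()['resources']`; `find_space_id` is an illustrative helper, not a client method.

```python
def find_space_id(spaces, name):
    """Return the guid of the first space whose name matches, else None."""
    for space in spaces:
        if space["entity"]["name"] == name:
            return space["metadata"]["guid"]
    return None


# Hypothetical sample data in the shape described above.
sample_spaces = [
    {"metadata": {"guid": "space-1"}, "entity": {"name": "OTHER_SPACE"}},
    {"metadata": {"guid": "space-2"}, "entity": {"name": "IRIS_MODEL_SPACE"}},
]
print(find_space_id(sample_spaces, "IRIS_MODEL_SPACE"))
print(find_space_id(sample_spaces, "MISSING"))
```

A `None` result is the signal to create a new space and use the ID it returns.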



In [17]:
SPACE_NAME = "IRIS_MODEL_SPACE"

In [18]:
# If a space with this name already exists, reuse its ID; otherwise create the space and use the new ID
space_name = SPACE_NAME
spaces = wml_client.spaces.get_details()['resources']
space_id = None
for space in spaces:
    if space['entity']['name'] == space_name:
        space_id = space["metadata"]["guid"]
if space_id is None:
    space_id = wml_client.spaces.store(
        meta_props={wml_client.spaces.ConfigurationMetaNames.NAME: space_name})["metadata"]["guid"]
wml_client.set.default_space(space_id)

'SUCCESS'

### 4b: Online Deployment

&emsp; <b>TRAIN/BUILD</b> MODEL --> <b>STORE MODEL</b> IN DEPLOYMENT SPACE (CREATE ID) --> <b>DEPLOY MODEL</b> FROM DEPLOYMENT SPACE (CREATE ID) 

* Online and Batch deployment cycles are identical. A trained model is stored (in the deployment space) and subsequently deployed.
* For Online Deployments: 
<br>
&emsp; <b>1.</b> Model and deployment names are set <br>
&emsp; <b>2.</b> The deployment space is checked for an existing deployment with the name set in Step 1. If one exists, the deployment and its associated model are deleted before new ones are created.<br>
&emsp; <b>3.</b> Model is pushed and stored in deployment space. Model ID created.<br> 
&emsp; <b>4.</b> Model is deployed from deployment space. Deployment ID created. <br>
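Step 2 depends on recovering the model ID from a deployment's asset href, which in this notebook's output has the form `/v4/models/<model id>?space_id=<space id>`. The sketch below parses such an href with the standard library; `model_id_from_href` is a hypothetical helper, and the href format is assumed from the output of this notebook rather than from a documented contract.

```python
from urllib.parse import urlsplit


def model_id_from_href(href):
    """Extract the model id from an asset href such as
    '/v4/models/<model id>?space_id=<space id>'."""
    path_parts = [p for p in urlsplit(href).path.split("/") if p]
    if len(path_parts) < 3 or path_parts[1] != "models":
        raise ValueError(f"unexpected asset href: {href!r}")
    return path_parts[2]


href = "/v4/models/b0b18f80-a2f4-4691-9eab-879779c868c5?space_id=8b9e1282-94fc-4f5d-9a81-af2992c4ec27"
print(model_id_from_href(href))
```

Compared with chained `split` calls, this version raises a clear error if the href does not look like a model path.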

In [19]:
MODEL_NAME = 'IRIS_RF_Online'
deployment_name = 'IRIS_RF_Online_Deployment'

In [20]:
# Remove any deployments and associated models with same name
deployment_details = wml_client.deployments.get_details()
for deployment in deployment_details['resources']:
    deployment_id = deployment['metadata']['guid']
    model_id = deployment['entity']['asset']['href'].split('/')[3].split('?')[0]
    if deployment['entity']['name'] == deployment_name:
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)

In [21]:
#Save Model to Space 

space_metadata = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.TYPE: "scikit-learn_0.20",
    wml_client.repository.ModelMetaNames.RUNTIME_UID: "scikit-learn_0.20-py3",
    wml_client.repository.ModelMetaNames.TAGS: [{'value' : 'iris_online_tag'}],
    wml_client.repository.ModelMetaNames.SPACE_UID: space_id
}

stored_model_details = wml_client.repository.store_model(model=model, meta_props=space_metadata)

In [22]:
# Deploy the model

meta_props = {
    wml_client.deployments.ConfigurationMetaNames.NAME: deployment_name,
    wml_client.deployments.ConfigurationMetaNames.TAGS : [{'value' : 'iris_online_deployment_tag'}],
    wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
}

model_uid = stored_model_details["metadata"]["guid"]
wml_client.deployments.create(artifact_uid=model_uid, meta_props=meta_props)



#######################################################################################

Synchronous deployment creation for uid: 'b0b18f80-a2f4-4691-9eab-879779c868c5' started

#######################################################################################


initializing....................................................
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='8bfb500c-06ce-4e41-9d83-b78478cb72ae'
------------------------------------------------------------------------------------------------




{'metadata': {'parent': {'href': ''},
  'guid': '8bfb500c-06ce-4e41-9d83-b78478cb72ae',
  'modified_at': '',
  'created_at': '2020-05-09T00:32:18.475Z',
  'href': '/v4/deployments/8bfb500c-06ce-4e41-9d83-b78478cb72ae'},
 'entity': {'name': 'IRIS_RF_Online_Deployment',
  'custom': {},
  'online': {},
  'description': '',
  'tags': [{'value': 'iris_online_deployment_tag', 'description': ''}],
  'space': {'href': '/v4/spaces/8b9e1282-94fc-4f5d-9a81-af2992c4ec27'},
  'status': {'state': 'ready',
   'online_url': {'url': 'https://zen-cpd-zen.apps.lb.development01.csplab.local/v4/deployments/8bfb500c-06ce-4e41-9d83-b78478cb72ae/predictions'}},
  'asset': {'href': '/v4/models/b0b18f80-a2f4-4691-9eab-879779c868c5?space_id=8b9e1282-94fc-4f5d-9a81-af2992c4ec27'},
  'auto_redeploy': False}}

### 4c: Batch Deployment

* Online and Batch deployment cycles are identical. A trained model is stored (in the deployment space) and subsequently deployed.
* For Batch Deployments: 
<br>
&emsp; <b>1.</b> Model and deployment names are set <br>
&emsp; <b>2.</b> The deployment space is checked for an existing deployment with the name set in Step 1. If one exists, the deployment and its associated model are deleted before new ones are created.<br>
&emsp; <b>3.</b> Model is pushed and stored in deployment space. Model ID created.<br> 
&emsp; <b>4.</b> Model is deployed from deployment space. Deployment ID created. <br>
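Because the online and batch cycles are identical, the only visible difference in the resulting deployment details is which key appears under `entity`. The sketch below tells the two apart on that basis. `deployment_kind` is a hypothetical helper, and the dictionaries are trimmed-down imitations of the deployment outputs in this notebook.

```python
def deployment_kind(details):
    """Classify a deployment details dict as 'online' or 'batch'
    based on which key is present under 'entity'."""
    entity = details.get("entity", {})
    if "online" in entity:
        return "online"
    if "batch" in entity:
        return "batch"
    return "unknown"


# Trimmed-down dicts modelled on the deployment outputs in this notebook.
online_details = {"entity": {"name": "IRIS_RF_Online_Deployment", "online": {}}}
batch_details = {
    "entity": {
        "name": "IRIS_RF_Batch_Deployment",
        "batch": {},
        "compute": {"name": "S", "nodes": 1},
    }
}
print(deployment_kind(online_details), deployment_kind(batch_details))
```

A check like this can guard against creating a scoring job on a deployment that is not a batch deployment.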

In [23]:
MODEL_NAME = 'IRIS_RF_Batch'
deployment_name = 'IRIS_RF_Batch_Deployment'

In [24]:
# Remove any deployments and associated models with same name

deployment_details = wml_client.deployments.get_details()
for deployment in deployment_details['resources']:
    deployment_id = deployment['metadata']['guid']
    model_id = deployment['entity']['asset']['href'].split('/')[3].split('?')[0]
    if deployment['entity']['name'] == deployment_name:
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)

In [25]:
# Save model to Space

space_metadata = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.TYPE: "scikit-learn_0.20",
    wml_client.repository.ModelMetaNames.RUNTIME_UID: "scikit-learn_0.20-py3",
    wml_client.repository.ModelMetaNames.TAGS: [{'value' : 'iris_batch_tag'}],
    wml_client.repository.ModelMetaNames.SPACE_UID: space_id
}

stored_model_details = wml_client.repository.store_model(model=model, meta_props=space_metadata)

In [26]:
# Deploy the model

meta_props = {
    wml_client.deployments.ConfigurationMetaNames.NAME: deployment_name,
    wml_client.deployments.ConfigurationMetaNames.TAGS : [{'value' : 'iris_batch_deployment_tag'}],
    wml_client.deployments.ConfigurationMetaNames.BATCH: {},
    wml_client.deployments.ConfigurationMetaNames.COMPUTE: {
        "name": "S",
         "nodes": 1
     }
 }

model_uid = stored_model_details["metadata"]["guid"]
wml_client.deployments.create(artifact_uid=model_uid, meta_props=meta_props)



#######################################################################################

Synchronous deployment creation for uid: '0f362e0e-8548-4ae7-9eb9-6986096ea881' started

#######################################################################################


ready.


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='a20e9cb3-46ad-4e13-a1fa-ba6162678c49'
------------------------------------------------------------------------------------------------




{'metadata': {'parent': {'href': ''},
  'guid': 'a20e9cb3-46ad-4e13-a1fa-ba6162678c49',
  'modified_at': '',
  'created_at': '2020-05-09T00:37:31.878Z',
  'href': '/v4/deployments/a20e9cb3-46ad-4e13-a1fa-ba6162678c49'},
 'entity': {'name': 'IRIS_RF_Batch_Deployment',
  'custom': {},
  'description': '',
  'tags': [{'value': 'iris_batch_deployment_tag', 'description': ''}],
  'compute': {'name': 'S', 'nodes': 1},
  'batch': {},
  'space': {'href': '/v4/spaces/8b9e1282-94fc-4f5d-9a81-af2992c4ec27'},
  'status': {'state': 'ready'},
  'asset': {'href': '/v4/models/0f362e0e-8548-4ae7-9eb9-6986096ea881?space_id=8b9e1282-94fc-4f5d-9a81-af2992c4ec27'},
  'auto_redeploy': False}}

## Section 5: Scoring

### 5a: Online Scoring - Using REST API Endpoint

* An Online Deployment can be accessed through the <b>Python Client</b>, <b>Command Line Interface (CLI)</b>, or <b>REST API.</b><br><br>
* For Online Scoring through <b>REST API: </b>
<br>
&emsp; <b>1.</b> Define online deployment name and retrieve ID (your online model should have already been deployed).  <br>
&emsp; <b>2.</b> Retrieve the Online URL by either constructing the Endpoint/URL or calling wml_client.deployments.get_details(< deployment id >). <br>
&emsp; &emsp; &emsp; The URL construction in our case is <b>'< url where model is deployed >/v4/deployments/< deployment id >/predictions'</b> <br>
&emsp; &emsp; &emsp; The scoring Endpoint/URL can also be found by <b>clicking</b> on the deployment <br>
&emsp; <b>3.</b> Construct authentication header (using IAM Token), scoring payload, and score results<br> 
&emsp; &emsp; &emsp; Boilerplate code is used for the authentication header, payload constructor, and scoring. This can be found in the documentation.<br> 
&emsp; &emsp; &emsp;<b>***</b> ML Token is the IAM token defined in <b>Section 3</b><br> 
&emsp; &emsp; &emsp;<b>***</b> WML is strict about the payload input. Valid payloads for scoring are <b>lists of values, pandas DataFrames, or NumPy arrays.</b><br>
&emsp; &emsp; &emsp;<b>***</b> Online score by running <b>requests.post(< scoring url > , json = < scoring payload > , headers = < header > , verify = False )</b><br>
&emsp; <b>4.</b> Compile output. Compiling output is at user discretion.
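Steps 2 and 3 can be sketched without any network access: one function builds the scoring URL and another wraps rows in the `input_data` structure. `scoring_url` and `build_payload` are hypothetical helper names, and the host and deployment ID below are placeholders.

```python
import json


def scoring_url(base_url, deployment_id):
    """Build the online scoring endpoint used in this notebook."""
    return f"{base_url.rstrip('/')}/v4/deployments/{deployment_id}/predictions"


def build_payload(fields, rows):
    """Wrap column names and row values in the 'input_data' structure."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("every row needs one value per field")
    return {"input_data": [{"fields": list(fields), "values": [list(r) for r in rows]}]}


fields = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
rows = [(5.1, 3.5, 1.4, 0.2), (6.7, 3.0, 5.2, 2.3)]
payload = build_payload(fields, rows)
url = scoring_url("https://cp4d.example.com:31843", "my-deployment-id")
print(url)
print(json.dumps(payload))
```

Validating row lengths before posting catches malformed payloads locally instead of relying on an error from the scoring service.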

In [27]:
#1. Setting and finding deployment name 
online_deployment_name = 'IRIS_RF_Online_Deployment'
online_deployment_id = None

for dep in wml_client.deployments.get_details()['resources']:
    if dep['entity']['name'] == online_deployment_name:
        print('found id!')
        online_deployment_id = dep['metadata']['guid']    ### HERE WE ARE FINDING CORRESPONDING DEPLOYMENT ID 
        break
if online_deployment_id is None: print('did not find id')

found id!


In [28]:
# Creating dummy score data
sep_length = (8 - .8) * np.random.random_sample((50,)) + .8
sep_width = (5 - .4) * np.random.random_sample((50,)) + .4
pet_length = (7 - 1.7) * np.random.random_sample((50,)) + 1.7
pet_width  = (3 - .7) * np.random.random_sample((50,)) + .7

score_data = pd.DataFrame({'SepalLengthCm':sep_length,'SepalWidthCm':sep_width,'PetalLengthCm':pet_length,'PetalWidthCm':pet_width})

In [29]:
#2. Constructing scoring URL 
def get_online_deployment_href(asset_id, url):
    DATA_ASSET = u'{}/v4/deployments/{}/predictions'
    return DATA_ASSET.format(url,asset_id)

iris_online_href = get_online_deployment_href(online_deployment_id,CREDENTIALS['url'])

In [30]:
#3. Construct authentication header, scoring payload, and score results 
mltoken = token
header = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + mltoken}
payload_scoring = {"input_data": [{"fields": score_data.columns.tolist(), "values": score_data.values.tolist()}]}
response_scoring = requests.post(iris_online_href , json=payload_scoring, headers= header,verify = False)
online_scoring_results = json.loads(response_scoring.text)

In [31]:
#4. Compile Results
score_result_columns = online_scoring_results['predictions'][0]['fields']
score_result_data =online_scoring_results['predictions'][0]['values']

online_result_df = score_data.copy()
online_result_df['Predictions'] ,online_result_df['Probability'] = [x[0] for x in score_result_data ], [x[1] for x in score_result_data ]
online_result_df['Predictions'] = spec_encode.inverse_transform(online_result_df['Predictions'])

In [32]:
online_result_df.sample(5)

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Predictions,Probability
19,2.795668,2.55023,2.076833,1.39332,Iris-versicolor,"[0.4, 0.5, 0.1]"
31,7.955788,1.422871,3.827253,2.380504,Iris-versicolor,"[0.0, 0.5, 0.5]"
0,7.67648,0.519356,6.64291,1.225585,Iris-virginica,"[0.0, 0.0, 1.0]"
9,6.826764,1.680009,1.842966,1.430981,Iris-setosa,"[0.5, 0.4, 0.1]"
33,7.456475,1.389737,6.75801,2.551476,Iris-virginica,"[0.0, 0.0, 1.0]"


### 5b: Online Scoring - Using WML Python Client

* An Online Deployment can be accessed through the <b>Python Client</b>, <b>Command Line Interface (CLI)</b>, or <b>REST API.</b><br><br>
* For Online Scoring through <b>PYTHON CLIENT: </b>
<br>
&emsp; <b>1.</b> Define online deployment name and retrieve ID (your online model should have already been deployed).  <br>
&emsp; <b>2.</b> Construct the scoring payload, and score results<br> 
&emsp; &emsp; &emsp; Boilerplate code is used for the scoring. This can be found in the documentation.<br> 
&emsp; &emsp; &emsp;<b>***</b> WML is strict about the payload input. Valid payloads for scoring are <b>lists of values, pandas DataFrames, or NumPy arrays.</b><br>
&emsp; &emsp; &emsp;<b>***</b> Online score by running <b> wml_client.deployments.score(< deployment id > , < scoring payload >)</b><br>
&emsp; <b>3.</b> Compile output. Compiling output is at user discretion.
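Compiling the output amounts to walking the `predictions[0]['values']` rows of the scoring result. The sketch below does this in plain Python and maps class indices back to species names. The sample response is hypothetical: its field names are assumptions for illustration, and `parse_predictions` is not a client method.

```python
def parse_predictions(response, class_names):
    """Turn a scoring response into (label, probabilities) pairs.

    Each row of 'values' is read as [predicted class index, probabilities],
    matching how this notebook unpacks x[0] and x[1].
    """
    rows = response["predictions"][0]["values"]
    return [(class_names[int(row[0])], row[1]) for row in rows]


# Hypothetical response; the field names are assumptions for illustration.
sample_response = {
    "predictions": [
        {
            "fields": ["prediction", "probability"],
            "values": [[2, [0.0, 0.1, 0.9]], [1, [0.0, 0.8, 0.2]]],
        }
    ]
}
class_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
results = parse_predictions(sample_response, class_names)
print(results)
```

The `class_names` list plays the role of the fitted encoder's inverse transform, with list position standing in for the integer code.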

In [33]:
online_deployment_name = 'IRIS_RF_Online_Deployment'
online_deployment_id = None

for dep in wml_client.deployments.get_details()['resources']:
    if dep['entity']['name'] == online_deployment_name:
        print('found id!')
        online_deployment_id = dep['metadata']['guid']    ### HERE WE ARE FINDING CORRESPONDING DEPLOYMENT ID 
        break
if online_deployment_id is None: print('did not find id')


found id!


In [34]:
scoring_payload = {wml_client.deployments.ScoringMetaNames.INPUT_DATA: [{'fields': score_data.columns.tolist(), 'values': score_data.values.tolist()  }]}


In [35]:
online_scoring_results = wml_client.deployments.score(online_deployment_id, scoring_payload)

In [36]:
score_result_columns = online_scoring_results['predictions'][0]['fields']
score_result_data =online_scoring_results['predictions'][0]['values']

online_result_df = score_data.copy()
online_result_df['Predictions'] ,online_result_df['Probability'] = [x[0] for x in score_result_data ], [x[1] for x in score_result_data ]
online_result_df['Predictions'] = spec_encode.inverse_transform(online_result_df['Predictions'])

online_result_df.sample(5)

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Predictions,Probability
0,7.67648,0.519356,6.64291,1.225585,Iris-virginica,"[0.0, 0.0, 1.0]"
1,7.641686,0.607364,3.046193,1.548297,Iris-versicolor,"[0.0, 0.8, 0.2]"
24,4.964359,0.671011,5.711646,2.245724,Iris-virginica,"[0.0, 0.1, 0.9]"
47,6.113861,0.884578,5.264106,1.901249,Iris-virginica,"[0.0, 0.0, 1.0]"
32,6.426823,1.716275,3.720906,0.973433,Iris-versicolor,"[0.0, 0.9, 0.1]"


### 5c: Batch Scoring

* Batch scoring is extremely useful when you are setting up a pipeline that needs to score large amounts of data, at time intervals, or pulls/pushes into databases.<br>
* Supported databases are Cloud Object Storage (COS) buckets, DB2, and PostgreSQL. 
* In this example, the scoring set is the same as the online dataset. This can be replaced by a database connection, local CSV files, etc. <br><br>
* For Batch Scoring through <b>PYTHON CLIENT: </b><br>
&emsp; <b>1.</b> Define batch deployment name and retrieve ID (your batch model should have already been deployed).<br>
&emsp; <b>2.</b> Construct the scoring payload, and score results<br> 
&emsp; &emsp; &emsp; Boilerplate code is used for the scoring. This can be found in the documentation.<br> 
&emsp; &emsp; &emsp;<b>***</b> WML is strict about the payload input. Valid payloads for scoring are <b>lists of values, pandas DataFrames, or NumPy arrays.</b>
<br>
&emsp; &emsp; &emsp;<b>***</b> Batch score by running <b>wml_client.deployments.create_job(< deployment id > , < scoring payload >)</b>
<br>
&emsp; &emsp; &emsp;<b>***</b> States of a job are 'queued'-->'running'-->'completed' or 'failed'
<br>
&emsp; <b>3.</b> Compile output. Compiling output is at user discretion.
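Since a job moves through 'queued' and 'running' before ending in 'completed' or 'failed', waiting for it is a polling problem. The sketch below shows one way to poll with a pause and a timeout. `wait_for_job` is a hypothetical helper; the status lookup is injected as a callable so the example runs without a cluster, and the fake state sequence stands in for real status calls.

```python
import time


def wait_for_job(get_state, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll get_state() until the job is 'completed' or 'failed'.

    Raises TimeoutError if neither terminal state is reached in time.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in ("completed", "failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{state}' after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Stand-in for wml_client.deployments.get_job_status(job_id)['state'].
fake_states = iter(["queued", "running", "running", "completed"])
final_state = wait_for_job(lambda: next(fake_states), poll_seconds=0.01)
print(final_state)
```

Treating 'failed' as a terminal state and adding a timeout keeps the cell from looping forever when a job never completes.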

In [37]:
#1. Get the batch Deployment ID - Will be used for creating batch job for scoring 
batch_deployment_name = 'IRIS_RF_Batch_Deployment'
batch_deployment_id = None

for dep in wml_client.deployments.get_details()['resources']:
    if dep['entity']['name'] == batch_deployment_name:
        print('found id!')
        batch_deployment_id = dep['metadata']['guid']    ### HERE WE ARE FINDING CORRESPONDING DEPLOYMENT ID 
        break
if batch_deployment_id is None: print('did not find id') 

found id!


In [38]:
#Create batch scoring job *NOTE*- Jobs can only be created for batch deployments 
batch_scoring_job = wml_client.deployments.create_job(batch_deployment_id, scoring_payload)
batch_scoring_id = batch_scoring_job['metadata']['guid']

In [39]:
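## Poll until the job reaches a terminal state ('completed' or 'failed')
state = wml_client.deployments.get_job_status(batch_scoring_id)['state']
while state not in ('completed', 'failed'):
    time.sleep(5)  # pause between status checks instead of busy-waiting
    state = wml_client.deployments.get_job_status(batch_scoring_id)['state']
print('model scored!' if state == 'completed' else 'batch job failed')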

model scored!


In [40]:
batch_scoring_results = wml_client.deployments.get_job_details(batch_scoring_id)

score_result_columns = batch_scoring_results['entity']['scoring']['predictions'][0]['fields']
score_result_data =batch_scoring_results['entity']['scoring']['predictions'][0]['values']

batch_result_df = score_data.copy()
batch_result_df['Predictions'] ,batch_result_df['Probability'] = [x[0] for x in score_result_data ], [x[1] for x in score_result_data ]

batch_result_df.sample(5)

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Predictions,Probability
23,1.746538,2.05138,4.656183,0.729098,0,"[0.5, 0.3, 0.2]"
48,4.045124,3.709616,4.739772,2.889168,2,"[0.2, 0.3, 0.5]"
24,4.964359,0.671011,5.711646,2.245724,2,"[0.0, 0.1, 0.9]"
19,2.795668,2.55023,2.076833,1.39332,1,"[0.4, 0.5, 0.1]"
30,2.975255,4.988337,6.514303,2.189696,2,"[0.2, 0.0, 0.8]"


## Developed by IBM CPAT team:

Emilio Fiallos - Data Scientist               