# Online, Batch, and Function Deployment/Scoring in IBM Cloud Pak for Data with WML Python Client
#### Updated for Cloud Pak for Data 3.0  <br>

* The purpose of this notebook is to demo how to <b>DEPLOY</b> and <b>SCORE</b> your ML Models using the WML Python Client.
* IBM Cloud Pak for Data (CP4D) is one of six IBM Cloud Paks: enterprise-grade, containerized software solutions designed to help move and manage workloads on the cloud. More info on this technology can be found [HERE.](https://www.ibm.com/blogs/cloud-computing/2019/06/10/what-are-ibm-cloud-paks/) 
* A <b>Random Forest</b> model trained on the <b>Kaggle Iris Dataset</b> is deployed and scored via <b>Batch, Online, and Function</b> methods.

<br> <b>*** Please Note:</b> There are several ways to deploy models on Watson ML. This notebook focuses on the 'Python Client' method. Other methods are described in the watson-machine-learning-client documentation.

### Notebook Layout
* <b>Section 1: Packages and EDA </b> 
* <b>Section 2: Model Training and Building </b> 
* <b>Section 3: WML Client Instantiation </b>
<br>
&ensp; <b>3a:</b> Generate IBM Identity Access Management (IAM) Token for IBM Cloud Pak for Data (CP4D)
<br>
&ensp; <b>3b:</b> Authenticate and Create WML Python Client Object
<br>
&ensp; <b>3c:</b> Persist the Trained Model
<br>
* <b>Section 4: Deployments </b>
<br>
&ensp; <b>4a:</b> Create and/or set Deployment Space
<br>
&ensp; <b>4b:</b> Online Deployment
<br>
&ensp; <b>4c:</b> Batch Deployment
* <b>Section 5: Scoring </b> 
<br>
&ensp; <b>5a:</b> Online Scoring - Using REST API Endpoint
<br>
&ensp; <b>5b:</b> Online Scoring - Using WML Python Client
<br>
&ensp; <b>5c:</b> Batch Scoring
* <b>Section 6: Function Deployments </b> 
<br>
&ensp; <b>6a:</b> Package and Model Creation
<br>
&ensp; <b>6b:</b> Scoring Function Creation
<br>
&ensp; <b>6c:</b> Function Deployment and Scoring



### Sources
* <a href="https://www.kaggle.com/uciml/iris#Iris.csv">KAGGLE IRIS DATASET</a>  Includes three iris species with 50 samples each as well as some properties about each flower.
* <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-authentication.html">WML Auth INFO</a> The 'Authentication' overview section of the Watson Machine Learning info on IBM CLOUD Website.
* <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-deploy_new.html?audience=wdp">WML Deployment GEN INFO</a>  The 'Deployment' overview section of the Watson Machine Learning info on IBM CLOUD Website.
* <a href="https://wml-api-pyclient.mybluemix.net/">WML Deployment DOCS</a>  The watson-machine-learning-client documentation.
* <a href="https://wml-api-pyclient-dev-v4.mybluemix.net/">WML Deployment V4 DOCS</a>  The watson-machine-learning-client_v4 documentation. More detailed and developer-oriented documentation.


## Section 1: Packages and EDA

<br>
Here are some quick summary statistics of the Iris Dataset:

* <b>Columns</b>: Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species
* <b>Observations</b>: 150
* <b>Classes</b>: Iris-virginica (50), Iris-versicolor (50), Iris-setosa (50)



In [1]:
import pandas as pd 
import numpy as np 

#Modeling Packages
!pip install sklearn_pandas | tail -n 1
import sklearn 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier

#Packages for IAM Access Token 
import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time
import warnings

#Packages for WML Client
from watson_machine_learning_client import WatsonMachineLearningAPIClient
import os

#Packages to work with Custom Transformer and Model Packaging
from sklearn.externals import joblib
import sys



In [2]:
df = pd.read_csv('/project_data/data_asset/Iris.csv')
df.describe() 

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm
count,150.0,150.0,150.0,150.0,150.0
mean,75.5,5.843333,3.054,3.758667,1.198667
std,43.445368,0.828066,0.433594,1.76442,0.763161
min,1.0,4.3,2.0,1.0,0.1
25%,38.25,5.1,2.8,1.6,0.3
50%,75.5,5.8,3.0,4.35,1.3
75%,112.75,6.4,3.3,5.1,1.8
max,150.0,7.9,4.4,6.9,2.5


In [3]:
df.Species.value_counts()

Iris-virginica     50
Iris-setosa        50
Iris-versicolor    50
Name: Species, dtype: int64

## Section 2: Model Training and Building

* <b>Data Transformations:</b> The dependent variable, Species, is transformed with <b>LabelEncoder</b>. Classes are 0, 1, 2 for Iris-setosa, Iris-versicolor, and Iris-virginica respectively (LabelEncoder sorts the labels alphabetically). 
* <b>Estimator:</b> <b>Random Forest</b> classifier. There is no parameter tuning. 
* <b>Results:</b> 97% overall accuracy on the held-out test set.

<br> <b>*** Please Note:</b> This notebook focuses on deployments, not model building/tuning. 

In [4]:
spec_encode = LabelEncoder().fit(df.Species)
df['Species'] = spec_encode.transform(df.Species)

In [5]:
df.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,0
1,2,4.9,3.0,1.4,0.2,0
2,3,4.7,3.2,1.3,0.2,0
3,4,4.6,3.1,1.5,0.2,0
4,5,5.0,3.6,1.4,0.2,0
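
`LabelEncoder` assigns integer codes in sorted order of the unique labels, which is why the Iris-setosa rows shown above carry the code 0. The plain-Python sketch below mimics that ordering without scikit-learn; `encode_labels` is a hypothetical helper written for illustration, not part of any library.

```python
# Plain-Python stand-in for the ordering LabelEncoder uses (sorted unique labels).
# encode_labels is a hypothetical helper written for illustration only.
def encode_labels(labels):
    classes = sorted(set(labels))  # alphabetical class order
    mapping = {name: code for code, name in enumerate(classes)}
    return classes, [mapping[name] for name in labels]

species = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]
classes, codes = encode_labels(species)
print(classes)
print(codes)
```

The position of a label in `classes` is its integer code, so decoding a prediction is a simple list lookup.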


In [6]:
X = df.drop(['Id','Species'], axis = 1)
y = df.Species
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

In [7]:
random_forest = RandomForestClassifier()
model= random_forest.fit( X_train, y_train )

In [8]:
# call model.predict() on your X_test data to make a set of test predictions
y_prediction = model.predict( X_test )
# test your predictions using sklearn.classification_report()
report = sklearn.metrics.classification_report( y_test, y_prediction )
# and print the report
print(report)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.90      0.95        10
           2       0.91      1.00      0.95        10

   micro avg       0.97      0.97      0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30
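
The overall accuracy can be recovered from the per-class recall and support columns of the report above. The short calculation below is only a worked check of that arithmetic, with the values typed in by hand from the report.

```python
# Per-class recall and support copied by hand from the classification report above.
recall = [1.00, 0.90, 1.00]
support = [10, 10, 10]

# recall * support gives the number of correctly classified samples per class.
correct = sum(round(r * s) for r, s in zip(recall, support))
accuracy = correct / sum(support)
print(f"{correct} of {sum(support)} correct, accuracy = {accuracy:.2f}")
```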



## Section 3: WML Client Instantiation


### 3a: Generate IBM Identity Access Management (IAM) Token for IBM Cloud Pak for Data (CP4D)

* You need an IAM token to instantiate a Python Client Object
* <b>Inputs:</b> Username, password, and url **(or IP, port pair)** of your CP4D cluster <br> 
&emsp; If you are in the CP4D instance, calling **os.environ['RUNTIME_ENV_APSX_URL']** will return the url <br>
&emsp; If you are not in a CP4D instance, the URL can be found on the **'Let's Get Started'** page <br>
&emsp; **OR** If you are not in a CP4D instance, the URL is also the ip, port pair combo. **Ex: https://< xyz-web-or-ip >:< port number >**
<br><br>  
<b>*** Please Note:</b> This generates an IAM token for CP4D <b>on private cloud.</b> The process is nuanced for IBM Public Cloud. You would need an API Key. Refer to documentation [for more info.](https://wml-api-pyclient-dev-v4.mybluemix.net/#requirements-applicable-only-for-ibm-cloud) 

In [9]:
CREDENTIALS = {
                      "username": '<username>',
                      "password": '<password>',
                      # address should be replaced with ip, port pair to be used in scripts outside CP4D
                      "url": 'https://<ip>:<port-number>'
                   }


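def generate_access_token():
    """Request a CP4D bearer token using basic authentication."""
    headers = {"Accept": "application/json"}
    auth = HTTPBasicAuth(CREDENTIALS["username"], CREDENTIALS["password"])
    
    CP4D_TOKEN_URL = CREDENTIALS["url"] + "/v1/preauth/validateAuth"
    
    # verify=False skips TLS certificate checks (common with self-signed cluster certificates)
    response = requests.get(CP4D_TOKEN_URL, headers=headers, auth=auth, verify=False)
    response.raise_for_status()  # fail early on bad credentials or a wrong URL
    json_data = response.json()
    cp4d_access_token = json_data['accessToken']
    return cp4d_access_token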

token = generate_access_token()
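
The cell above builds the token URL by string concatenation and reads `accessToken` directly from the response. A slightly more defensive sketch of those two steps is shown below; `build_token_url` and `extract_access_token` are hypothetical helper names, the host name is made up, and the endpoint path is the one used in this notebook.

```python
# Hypothetical helpers; the endpoint path is taken from the cell above.
def build_token_url(base_url):
    """Join the cluster URL and the token endpoint without doubling slashes."""
    return base_url.rstrip("/") + "/v1/preauth/validateAuth"

def extract_access_token(json_data):
    """Return the token, or raise a readable error if it is missing."""
    token = json_data.get("accessToken")
    if not token:
        raise ValueError("No 'accessToken' in response: %r" % (json_data,))
    return token

print(build_token_url("https://cp4d.example.com:31843/"))
print(extract_access_token({"accessToken": "abc123"}))
```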

### 3b: Authenticate and Create WML Python Client Object 

* Once you have your IAM token, you can create a WML Python Client Object. 
* <b>INPUTS:</b> 
<br>
&ensp; <b>Token:</b> IAM token obtained in step 3A
<br>
&ensp; <b>Instance Id:</b> Set to 'ICP' or 'Openshift' depending on what platform Watson Studio is running on.
<br>
&ensp; <b>Url:</b> IP, port pair of where Watson Studio is located.
<br>
&emsp; This can be found calling <b>os.environ['RUNTIME_ENV_APSX_URL']</b> if you are in ICP. 
<br>
&emsp; You can also use the URL of the Watson Studio instance if you are in ICP (this was done in 3a).  
<br>
&ensp; <b>Version:</b> In our case, it is '3.0.0'. 
<br>
&emsp; <b>If you are using CP4D 2.5</b>, the version would be set to '2.5.0'.
<br><br>
<b>*** Please Note:</b> This generates a client object for <b>ICP.</b> The process is nuanced for IBM Public Cloud. You would need an API Key and WML Instance ID. Refer to documentation <a href="https://wml-api-pyclient-dev-v4.mybluemix.net/#requirements-applicable-only-for-ibm-cloud">for more info.</a> 

In [10]:
url= os.environ['RUNTIME_ENV_APSX_URL']

wml_credentials = {
   "token": token,
   "instance_id" : "openshift",
   "url": url,
   "version": "3.0.0"
}

wml_client = WatsonMachineLearningAPIClient(wml_credentials)

### 3c: Persist the trained model

Sometimes you may want to save the model as a project asset before moving it into the deployment space. To use this function you need access to the WML service. If you do not have access to WML, you can create a model and save it to the "/project_data/data_asset/filename.joblib" file path. Creating a joblib file is shown later in the tutorial.

* Run the first cell to locate the Project Id and set the Project Space
* The second cell checks whether other models are saved under the same name. If there are, the existing models are deleted.
* The final cell stores the model, as well as the corresponding metadata to the project

<br>
**Some potential use cases for saving the model as a project asset are:**
1. Perform unit tests on a pre-deployment model
2. Have the ability to export the model
3. Save the state of a model to update later

In [11]:
project_uid = os.environ['PROJECT_ID']
wml_client.set.default_project(project_uid)

'SUCCESS'

In [12]:
MODEL_NAME = 'IRIS_RF_Model'
for m in wml_client.repository.get_model_details()['resources']:
    if m['entity']['name'] == MODEL_NAME:
        wml_client.repository.delete(m['metadata']['guid'])

In [13]:
model_metadata = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.DESCRIPTION: MODEL_NAME,
    wml_client.repository.ModelMetaNames.RUNTIME_UID: "scikit-learn_0.22-py3.6",
    wml_client.repository.ModelMetaNames.TYPE: "scikit-learn_0.22"
}

published_model_details = wml_client.repository.store_model(model=model, 
                                                            meta_props=model_metadata, 
                                                            training_data=X_train,
                                                            training_target=y_train, 
                                                            feature_names = list(X_train.columns))

## Section 4: Deployments

### 4a: Create and/or Set Deployment Space

* Setting a default Deployment Space or Project ID is <b>the first and mandatory step </b> in CP4D. This tells the client from where to push/pull information. 
* Because the focus is Deployments, a Deployment Space ID will be set. 



In [14]:
SPACE_NAME = "IRIS_MODEL_SPACE"

In [15]:
# Reuse the space if one with this name already exists; otherwise create a new space
space_name = SPACE_NAME
spaces = wml_client.spaces.get_details()['resources']
space_id = None
for space in spaces:
    if space['entity']['name'] == space_name:
        space_id = space["metadata"]["guid"]
if space_id is None:
    space_id = wml_client.spaces.store(
        meta_props={wml_client.spaces.ConfigurationMetaNames.NAME: space_name})["metadata"]["guid"]
wml_client.set.default_space(space_id)

Unsetting the project_id ...


'SUCCESS'
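
The lookup-by-name loop above recurs throughout this notebook for spaces and deployments. One way to factor it out is sketched below; `find_guid_by_name` is a hypothetical helper, and the sample `resources` list is made up to imitate the `entity`/`metadata` layout of the client responses used here. Unlike the loop above, it returns the first match rather than the last.

```python
# Hypothetical helper: return the guid of the first resource with a given name.
def find_guid_by_name(resources, name):
    for resource in resources:
        if resource["entity"]["name"] == name:
            return resource["metadata"]["guid"]
    return None  # the caller decides whether to create a new resource

# Made-up sample data shaped like the 'resources' list used above.
resources = [
    {"entity": {"name": "OTHER_SPACE"}, "metadata": {"guid": "guid-1"}},
    {"entity": {"name": "IRIS_MODEL_SPACE"}, "metadata": {"guid": "guid-2"}},
]
print(find_guid_by_name(resources, "IRIS_MODEL_SPACE"))
print(find_guid_by_name(resources, "MISSING_SPACE"))
```

With the client, the first argument would be `wml_client.spaces.get_details()['resources']` or `wml_client.deployments.get_details()['resources']`.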

### 4b: Online Deployment

&emsp; <b>TRAIN/BUILD</b> MODEL --> <b>STORE MODEL</b> IN DEPLOYMENT SPACE (CREATE ID) --> <b>DEPLOY MODEL</b> FROM DEPLOYMENT SPACE (CREATE ID) 

* Online and Batch deployment cycles are identical. A trained model is stored (in the deployment space) and subsequently deployed.
* For Online Deployments: 
<br>
&emsp; <b>1.</b> Model and deployment names are set <br>
&emsp; <b>2.</b> The deployment space is checked for existing deployments with the name set in Step 1. If one exists, the deployment and its associated model are deleted before new ones are created.<br>
&emsp; <b>3.</b> Model is pushed and stored in deployment space. Model ID created.<br> 
&emsp; &emsp; &emsp; For detailed steps on metadata for space storing, refer to the <a href="https://wml-api-pyclient.mybluemix.net/#repository"> metadata documentation.</a> <br>
&emsp; &emsp; &emsp; Accurate environment specifications are <b>essential.</b> For specification syntax, refer to <a href="https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_current/wsj/wmls/wmls-deploy-python-types.html">syntax documentation.</a> <br> 
&emsp; &emsp; &emsp; If using Scikit-Learn, use <b>sklearn.__version__</b> command to get scikit version and <b>! python --version</b> for python version <br>
&emsp; <b>4.</b> Model is deployed from deployment space. Deployment ID created. <br>
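
The model type and runtime strings used in the cells below follow the patterns `scikit-learn_<major.minor>` and `scikit-learn_<major.minor>-py<major.minor>`. The sketch below builds them from version strings. `runtime_strings` is a hypothetical helper, the pattern is inferred only from the values that appear in this notebook, and the resulting combination still has to be one that the platform supports.

```python
# Hypothetical helper; the naming pattern is inferred from this notebook only.
def runtime_strings(sklearn_version, python_version):
    sk = ".".join(sklearn_version.split(".")[:2])  # "0.22.1" -> "0.22"
    py = ".".join(python_version.split(".")[:2])   # "3.6.9"  -> "3.6"
    model_type = "scikit-learn_" + sk
    runtime_uid = model_type + "-py" + py
    return model_type, runtime_uid

print(runtime_strings("0.22.1", "3.6.9"))
```

In a notebook, the two inputs could come from `sklearn.__version__` and `platform.python_version()`.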

In [16]:
#1. Model and deployment names are set 
MODEL_NAME = 'IRIS_RF_Online'
deployment_name = 'IRIS_RF_Online_Deployment'

In [17]:
#2. Remove any deployments and associated models with same name
deployment_details = wml_client.deployments.get_details()
for deployment in deployment_details['resources']:
    deployment_id = deployment['metadata']['guid']
    model_id = deployment['entity']['asset']['href'].split('/')[3].split('?')[0]
    if deployment['entity']['name'] == deployment_name:
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)

Deleting deployment id e0c74ed8-257e-46c6-815a-5ccacba8cb26
Deleting model id a406ae1d-4eb6-4a3e-9beb-fd2d70824e07
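
The model id is extracted above by splitting the asset `href` at fixed positions. A sketch of the same extraction with `urllib.parse`, which does not depend on where the query string starts, is given below. `model_id_from_href` is a hypothetical helper, and the sample href is copied from the deployment output shown in this section.

```python
from urllib.parse import urlparse

# Hypothetical helper: take the last path segment of an asset href.
def model_id_from_href(href):
    path = urlparse(href).path  # drops the "?space_id=..." part
    return path.rstrip("/").split("/")[-1]

href = ("/v4/models/593fe312-e581-44bd-8ce9-82da8d076e6f"
        "?space_id=fbf2dfb2-b7f2-4bd2-ab89-5d604458de45")
print(model_id_from_href(href))
```

The deployment output in this section also shows an `id` field next to `href` under `entity.asset`, which may be a simpler source when it is present.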


In [18]:
#3. Save Model to Space 
space_metadata = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.TYPE: "scikit-learn_0.22",
    wml_client.repository.ModelMetaNames.RUNTIME_UID: "scikit-learn_0.22-py3.6",
    wml_client.repository.ModelMetaNames.TAGS: [{'value' : 'iris_online_tag'}],
    wml_client.repository.ModelMetaNames.SPACE_UID: space_id
}

stored_model_details = wml_client.repository.store_model(model=model, meta_props=space_metadata)

In [19]:
#4. Deploy the model
meta_props = {
    wml_client.deployments.ConfigurationMetaNames.NAME: deployment_name,
    wml_client.deployments.ConfigurationMetaNames.TAGS : [{'value' : 'iris_online_deployment_tag'}],
    wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
}

model_uid = stored_model_details["metadata"]["guid"]
wml_client.deployments.create(artifact_uid=model_uid, meta_props=meta_props)



#######################################################################################

Synchronous deployment creation for uid: '593fe312-e581-44bd-8ce9-82da8d076e6f' started

#######################################################################################


initializing
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='9630b891-8017-41fa-a652-418bf02cc60d'
------------------------------------------------------------------------------------------------




{'entity': {'asset': {'href': '/v4/models/593fe312-e581-44bd-8ce9-82da8d076e6f?space_id=fbf2dfb2-b7f2-4bd2-ab89-5d604458de45',
   'id': '593fe312-e581-44bd-8ce9-82da8d076e6f'},
  'custom': {},
  'description': '',
  'name': 'IRIS_RF_Online_Deployment',
  'online': {},
  'space': {'href': '/v4/spaces/fbf2dfb2-b7f2-4bd2-ab89-5d604458de45',
   'id': 'fbf2dfb2-b7f2-4bd2-ab89-5d604458de45'},
  'space_id': 'fbf2dfb2-b7f2-4bd2-ab89-5d604458de45',
  'status': {'online_url': {'url': 'https://internal-nginx-svc:12443/v4/deployments/9630b891-8017-41fa-a652-418bf02cc60d/predictions'},
   'state': 'ready'},
  'tags': [{'value': 'iris_online_deployment_tag'}]},
 'metadata': {'created_at': '2020-06-18T03:21:24.290Z',
  'description': '',
  'guid': '9630b891-8017-41fa-a652-418bf02cc60d',
  'href': '/v4/deployments/9630b891-8017-41fa-a652-418bf02cc60d',
  'id': '9630b891-8017-41fa-a652-418bf02cc60d',
  'modified_at': '2020-06-18T03:21:24.290Z',
  'name': 'IRIS_RF_Online_Deployment',
  'parent': {'href'

### 4c: Batch Deployment

* Online and Batch deployment cycles are identical. A trained model is stored (in the deployment space) and subsequently deployed.
* For Batch Deployments: 
<br>
&emsp; <b>1.</b> Model and deployment names are set <br>
&emsp; <b>2.</b> The deployment space is checked for existing deployments with the name set in Step 1. If one exists, the deployment and its associated model are deleted before new ones are created.<br>
&emsp; <b>3.</b> Model is pushed and stored in deployment space. Model ID created.<br> 
&emsp; &emsp; &emsp; For detailed steps on metadata for space storing, refer to the <a href="https://wml-api-pyclient.mybluemix.net/#repository"> metadata documentation.</a> <br>
&emsp; &emsp; &emsp; Accurate environment specifications are <b>essential.</b> For specification syntax, refer to <a href="https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_current/wsj/wmls/wmls-deploy-python-types.html">syntax documentation.</a> <br> 
&emsp; &emsp; &emsp; If using Scikit-Learn, use <b>sklearn.__version__</b> command to get scikit version and <b>! python --version</b> for python version <br>
&emsp; <b>4.</b> Model is deployed from deployment space. Deployment ID created. <br>

In [20]:
#1. Model and deployment names are set 
MODEL_NAME = 'IRIS_RF_Batch'
deployment_name = 'IRIS_RF_Batch_Deployment'

In [21]:
#2. Remove any deployments and associated models with same name
deployment_details = wml_client.deployments.get_details()
for deployment in deployment_details['resources']:
    deployment_id = deployment['metadata']['guid']
    model_id = deployment['entity']['asset']['href'].split('/')[3].split('?')[0]
    if deployment['entity']['name'] == deployment_name:
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)

Deleting deployment id a812c8bb-49eb-410b-8b08-ea73cf3f8bd3
Deleting model id c9fd30f6-6615-42c1-814b-f606d632722e


In [22]:
#3. Save Model to Space 
space_metadata = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.TYPE: "scikit-learn_0.22",
    wml_client.repository.ModelMetaNames.RUNTIME_UID: "scikit-learn_0.22-py3.6",
    wml_client.repository.ModelMetaNames.TAGS: [{'value' : 'iris_batch_tag'}],
    wml_client.repository.ModelMetaNames.SPACE_UID: space_id
}

stored_model_details = wml_client.repository.store_model(model=model, meta_props=space_metadata)

In [23]:
#4. Deploy the model
meta_props = {
    wml_client.deployments.ConfigurationMetaNames.NAME: deployment_name,
    wml_client.deployments.ConfigurationMetaNames.TAGS : [{'value' : 'iris_batch_deployment_tag'}],
    wml_client.deployments.ConfigurationMetaNames.BATCH: {},
    wml_client.deployments.ConfigurationMetaNames.COMPUTE: {
        "name": "S",
         "nodes": 1
     }
 }

model_uid = stored_model_details["metadata"]["guid"]
wml_client.deployments.create(artifact_uid=model_uid, meta_props=meta_props)



#######################################################################################

Synchronous deployment creation for uid: 'e1c4b291-a729-4792-8b19-ee60391e806a' started

#######################################################################################


ready.


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='e1dcfc5e-8c86-4721-87d6-e98392c0bd4f'
------------------------------------------------------------------------------------------------




{'entity': {'asset': {'href': '/v4/models/e1c4b291-a729-4792-8b19-ee60391e806a?space_id=fbf2dfb2-b7f2-4bd2-ab89-5d604458de45',
   'id': 'e1c4b291-a729-4792-8b19-ee60391e806a'},
  'batch': {},
  'compute': {'name': 'S', 'nodes': 1},
  'custom': {},
  'description': '',
  'name': 'IRIS_RF_Batch_Deployment',
  'space': {'href': '/v4/spaces/fbf2dfb2-b7f2-4bd2-ab89-5d604458de45',
   'id': 'fbf2dfb2-b7f2-4bd2-ab89-5d604458de45'},
  'space_id': 'fbf2dfb2-b7f2-4bd2-ab89-5d604458de45',
  'status': {'state': 'ready'},
  'tags': [{'value': 'iris_batch_deployment_tag'}]},
 'metadata': {'created_at': '2020-06-18T03:21:33.394Z',
  'description': '',
  'guid': 'e1dcfc5e-8c86-4721-87d6-e98392c0bd4f',
  'href': '/v4/deployments/e1dcfc5e-8c86-4721-87d6-e98392c0bd4f',
  'id': 'e1dcfc5e-8c86-4721-87d6-e98392c0bd4f',
  'modified_at': '2020-06-18T03:21:33.394Z',
  'name': 'IRIS_RF_Batch_Deployment',
  'parent': {'href': ''},
  'space_id': 'fbf2dfb2-b7f2-4bd2-ab89-5d604458de45',
  'tags': ['iris_batch_deploy

## Section 5: Scoring

### 5a: Online Scoring - Using REST API Endpoint

* An Online Deployment can be accessed through the <b>Python Client</b>, <b>Command Line Interface (CLI)</b>, or <b>REST API.</b><br><br>
* For Online Scoring through <b>REST API: </b>
<br>
&emsp; <b>1.</b> Define online deployment name and retrieve ID (your online model should have already been deployed).  <br>
&emsp; <b>2.</b> Retrieve the Online URL by either constructing the Endpoint/URL or calling wml_client.deployments.get_details(< deployment id >). <br>
&emsp; &emsp; &emsp; The URL construction in our case is <b>'< url where model is deployed >/v4/deployments/< deployment id >/predictions'</b> <br>
&emsp; &emsp; &emsp; The scoring Endpoint/URL can also be found by <b>clicking</b> on the deployment <br>
&emsp; <b>3.</b> Construct authentication header (using IAM Token), scoring payload, and score results<br> 
&emsp; &emsp; &emsp; Boilerplate code is used for the authentication header, payload constructor, and scoring. This can be found in the documentation.<br> 
&emsp; &emsp; &emsp;<b>***</b> ML Token is the IAM token defined in <b>Section 3</b><br> 
&emsp; &emsp; &emsp;<b>***</b> WML is strict about the payload format. Valid payloads for scoring are lists of <b>values, pandas DataFrames, or numpy arrays.</b><br>
&emsp; &emsp; &emsp;<b>***</b> Online score by running <b>requests.post(< scoring url >, json=< scoring payload >, headers=< header >, verify=False)</b><br>
&emsp; <b>4.</b> Compile output. Compiling output is at user discretion.
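
The request pieces described in step 3 can be sketched as one small function. `build_scoring_request` is a hypothetical name and the sample rows are made up; the payload layout (`input_data` with `fields` and `values`) and the bearer-token header are the ones used in the cells that follow.

```python
import json

# Hypothetical helper: assemble header and payload for an online scoring call.
def build_scoring_request(token, fields, rows):
    header = {"Content-Type": "application/json",
              "Authorization": "Bearer " + token}
    payload = {"input_data": [{"fields": list(fields),
                               "values": [list(row) for row in rows]}]}
    return header, payload

fields = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
rows = [(5.1, 3.5, 1.4, 0.2), (6.7, 3.0, 5.2, 2.3)]  # made-up sample rows
header, payload = build_scoring_request("<token>", fields, rows)
print(json.dumps(payload, indent=2))
```

Converting every row to a plain list keeps the payload JSON-serializable regardless of whether the rows started as tuples or lists.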

In [24]:
#1. Setting and finding deployment name 
online_deployment_name = 'IRIS_RF_Online_Deployment'
online_deployment_id = None

for dep in wml_client.deployments.get_details()['resources']:
    if dep['entity']['name'] == online_deployment_name:
        print('found id!')
        online_deployment_id = dep['metadata']['guid']    ### HERE WE ARE FINDING CORRESPONDING DEPLOYMENT ID 
        break
if online_deployment_id is None: print('did not find id')

found id!


In [25]:
# Creating dummy score data
sep_length = (8 - .8) * np.random.random_sample((50,)) + .8
sep_width = (5 - .4) * np.random.random_sample((50,)) + .4
pet_length = (7 - 1.7) * np.random.random_sample((50,)) + 1.7
pet_width  = (3 - .7) * np.random.random_sample((50,)) + .7

score_data = pd.DataFrame({'SepalLengthCm':sep_length,'SepalWidthCm':sep_width,'PetalLengthCm':pet_length,'PetalWidthCm':pet_width})

In [26]:
#2. Constructing scoring URL 
def get_online_deployment_href(asset_id, url):
    DATA_ASSET = u'{}/v4/deployments/{}/predictions'
    return DATA_ASSET.format(url,asset_id)

iris_online_href = get_online_deployment_href(online_deployment_id, CREDENTIALS['url'])

In [27]:
#3. Construct authentication header, scoring payload, and score results 
mltoken = token
header = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + mltoken}
payload_scoring = {"input_data": [{"fields": score_data.columns.tolist(), "values": score_data.values.tolist()}]}
response_scoring = requests.post(iris_online_href , json=payload_scoring, headers= header,verify = False)
online_scoring_results = json.loads(response_scoring.text)

In [28]:
#4. Compile Results
score_result_columns = online_scoring_results['predictions'][0]['fields']
score_result_data =online_scoring_results['predictions'][0]['values']

online_result_df = score_data.copy()
online_result_df['Predictions'] ,online_result_df['Probability'] = [x[0] for x in score_result_data ], [x[1] for x in score_result_data ]
online_result_df['Predictions'] = spec_encode.inverse_transform(online_result_df['Predictions'])

In [29]:
online_result_df.sample(5)

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Predictions,Probability
7,7.88901,3.340166,5.408251,2.975369,Iris-virginica,"[0.0, 0.0, 1.0]"
18,1.478927,4.230986,4.084285,1.771042,Iris-virginica,"[0.1, 0.3, 0.6]"
9,5.498212,4.495547,2.223028,2.014512,Iris-setosa,"[0.4, 0.2, 0.4]"
12,6.672595,3.266448,2.867779,2.689926,Iris-virginica,"[0.0, 0.4, 0.6]"
23,7.574369,4.684717,5.815662,2.181849,Iris-virginica,"[0.0, 0.0, 1.0]"
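
The response layout used in step 4 pairs each input row with a predicted class code and a list of class probabilities. The sketch below unpacks such a response without pandas. The `sample_response` dictionary is made up to match that layout (including its field names), and `CLASS_NAMES` assumes the alphabetical order produced by `LabelEncoder`.

```python
# Class order assumed to match LabelEncoder (sorted labels).
CLASS_NAMES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Made-up response shaped like online_scoring_results above.
sample_response = {
    "predictions": [{
        "fields": ["prediction", "probability"],
        "values": [[2, [0.0, 0.0, 1.0]], [0, [0.9, 0.1, 0.0]]],
    }]
}

def unpack_predictions(response):
    """Return (class name, probability of that class) for each scored row."""
    values = response["predictions"][0]["values"]
    return [(CLASS_NAMES[code], probs[code]) for code, probs in values]

for name, prob in unpack_predictions(sample_response):
    print(name, prob)
```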


### 5b: Online Scoring - Using WML Python Client

* An Online Deployment can be accessed through the <b>Python Client</b>, <b>Command Line Interface (CLI)</b>, or <b>REST API.</b><br><br>
* For Online Scoring through <b>PYTHON CLIENT: </b>
<br>
&emsp; <b>1.</b> Define online deployment name and retrieve ID (your online model should have already been deployed).  <br>
&emsp; <b>2.</b> Construct the scoring payload, and score results<br> 
&emsp; &emsp; &emsp; Boilerplate code is used for the scoring. This can be found in the documentation.<br> 
&emsp; &emsp; &emsp;<b>***</b> WML is strict about the payload format. Valid payloads for scoring are lists of <b>values, pandas DataFrames, or numpy arrays.</b><br>
&emsp; &emsp; &emsp;<b>***</b> Online score by running <b> wml_client.deployments.score(< deployment id > , < scoring payload >)</b><br>
&emsp; <b>3.</b> Compile output. Compiling output is at user discretion.

In [30]:
#1. Setting and finding deployment name 
online_deployment_name = 'IRIS_RF_Online_Deployment'
online_deployment_id = None

for dep in wml_client.deployments.get_details()['resources']:
    if dep['entity']['name'] == online_deployment_name:
        print('found id!')
        online_deployment_id = dep['metadata']['guid']    ### HERE WE ARE FINDING CORRESPONDING DEPLOYMENT ID 
        break
if online_deployment_id is None: print('did not find id')

found id!


In [31]:
#2. Construct scoring payload and score results (the client handles authentication)
scoring_payload = {wml_client.deployments.ScoringMetaNames.INPUT_DATA: [{'fields': score_data.columns.tolist(), 'values': score_data.values.tolist()  }]}
online_scoring_results = wml_client.deployments.score(online_deployment_id, scoring_payload)

In [32]:
#3. Compile Results
score_result_columns = online_scoring_results['predictions'][0]['fields']
score_result_data =online_scoring_results['predictions'][0]['values']

online_result_df = score_data.copy()
online_result_df['Predictions'] ,online_result_df['Probability'] = [x[0] for x in score_result_data ], [x[1] for x in score_result_data ]
online_result_df['Predictions'] = spec_encode.inverse_transform(online_result_df['Predictions'])

online_result_df.sample(5)

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Predictions,Probability
13,3.948591,1.394412,2.633925,2.481594,Iris-virginica,"[0.0, 0.4, 0.6]"
36,5.601699,2.9522,5.286558,1.076065,Iris-virginica,"[0.0, 0.3, 0.7]"
47,1.606185,1.487365,6.468035,2.497372,Iris-virginica,"[0.0, 0.3, 0.7]"
7,7.88901,3.340166,5.408251,2.975369,Iris-virginica,"[0.0, 0.0, 1.0]"
38,7.84157,0.957114,5.852993,1.80702,Iris-virginica,"[0.0, 0.0, 1.0]"


### 5c: Batch Scoring

* Batch scoring is extremely useful when you are setting up a pipeline that needs to score large amounts of data, at time intervals, or pulls/pushes into databases.<br>
* Supported databases are Cloud Object Storage (COS) buckets, DB2, and PostgreSQL. 
* In the example, the scoring set is the same as the online dataset. This can be replaced by a database connection, local csv files, etc. <br><br>
* For Batch Scoring through <b>PYTHON CLIENT: </b><br>
&emsp; <b>1.</b> Define batch deployment name and retrieve ID (your batch model should have already been deployed).<br>
&emsp; <b>2.</b> Construct the scoring payload, and score results<br> 
&emsp; &emsp; &emsp; Boilerplate code is used for the scoring. This can be found in the documentation.<br> 
&emsp; &emsp; &emsp;<b>***</b> WML is strict about the payload format. Valid payloads for scoring are lists of <b>values, pandas DataFrames, or numpy arrays.</b>
<br>
&emsp; &emsp; &emsp;<b>***</b> Batch score by running <b>wml_client.deployments.create_job(< deployment id > , < scoring payload >)</b>
<br>
&emsp; &emsp; &emsp;<b>***</b> States of a job are 'queued'-->'running'-->'completed' or 'failed'
<br>
&emsp; <b>3.</b> Compile output. Compiling output is at user discretion. 

In [33]:
#1. Get the batch Deployment ID - Will be used for creating batch job for scoring 
batch_deployment_name = 'IRIS_RF_Batch_Deployment'
batch_deployment_id = None

for dep in wml_client.deployments.get_details()['resources']:
    if dep['entity']['name'] == batch_deployment_name:
        print('found id!')
        batch_deployment_id = dep['metadata']['guid']    ### HERE WE ARE FINDING CORRESPONDING DEPLOYMENT ID 
        break
if batch_deployment_id is None: print('did not find id') 

found id!


In [34]:
#2. Create batch scoring job *NOTE*- Jobs can only be created for batch deployments 
batch_scoring_job = wml_client.deployments.create_job(batch_deployment_id, scoring_payload)
batch_scoring_id = batch_scoring_job['metadata']['guid']

In [35]:
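##Cell will stop running once the job completes or fails
state = wml_client.deployments.get_job_status(batch_scoring_id)['state']
while state not in ('completed', 'failed'):
    time.sleep(5)  # pause between status checks instead of polling continuously
    state = wml_client.deployments.get_job_status(batch_scoring_id)['state']
print('model scored!' if state == 'completed' else 'scoring job failed')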

model scored!
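
A polling loop of this kind is usually given a pause between checks, a timeout, and an exit on failure. The sketch below shows one way to do that. `wait_for_job` is a hypothetical helper, and `get_state` stands in for a call such as `wml_client.deployments.get_job_status(batch_scoring_id)['state']`.

```python
import time

# Hypothetical helper: poll until the job finishes, fails, or times out.
def wait_for_job(get_state, timeout_s=600, interval_s=5):
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in ("completed", "failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("job still '%s' after %s s" % (state, timeout_s))
        time.sleep(interval_s)

# Stand-in for the WML client call: yields a fixed sequence of states.
states = iter(["queued", "running", "completed"])
print(wait_for_job(lambda: next(states), timeout_s=5, interval_s=0))
```

Returning the final state instead of printing it lets the caller decide how to handle a failed job.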


In [36]:
#3. Compile Results
batch_scoring_results = wml_client.deployments.get_job_details(batch_scoring_id)

score_result_columns = batch_scoring_results['entity']['scoring']['predictions'][0]['fields']
score_result_data =batch_scoring_results['entity']['scoring']['predictions'][0]['values']

batch_result_df = score_data.copy()
batch_result_df['Predictions'] ,batch_result_df['Probability'] = [x[0] for x in score_result_data ], [x[1] for x in score_result_data ]

batch_result_df.sample(5)

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Predictions,Probability
36,5.601699,2.9522,5.286558,1.076065,2,"[0.0, 0.3, 0.7]"
39,2.700021,1.018216,2.989957,1.021377,1,"[0.0, 0.8, 0.2]"
0,6.490088,2.781614,2.957574,1.26022,1,"[0.0, 1.0, 0.0]"
23,7.574369,4.684717,5.815662,2.181849,2,"[0.0, 0.0, 1.0]"
33,6.831225,1.394038,6.053305,1.464198,2,"[0.0, 0.1, 0.9]"


## Section 6: Function Deployments

* There are often scenarios where extremely custom transformations, feature engineering, and processes need to be performed.
* WML allows you to deploy python scripts as 'Functions' to circumvent any possible software limitations.
* A function can be **anything** (model, script, process, etc). In this case, the function will be a **scoring pipeline.**
* The following script creates an **iris_dataset_scoring_pipeline** function that takes the payload data, squares the values of the **PetalWidthCm** column, and scores the data with a Random Forest model.
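
As a rough picture of what such a scoring pipeline looks like, the sketch below follows the closure pattern commonly used for deployable Python functions: an outer function returns an inner `score` function that receives the payload. Everything here is illustrative. The stub model, the function name `iris_scoring_pipeline_sketch`, and the output field name are assumptions, not the code used later in this notebook.

```python
# Illustrative only: a stub model replaces the trained Random Forest.
class StubModel:
    def predict(self, rows):
        # Pretend rule: a large squared petal width means class 2, otherwise 0.
        return [2 if row[-1] > 2.0 else 0 for row in rows]

def iris_scoring_pipeline_sketch(model=StubModel()):
    def score(payload):
        data = payload["input_data"][0]
        idx = data["fields"].index("PetalWidthCm")
        # Append the squared petal width as an extra feature, then predict.
        rows = [list(row) + [row[idx] ** 2] for row in data["values"]]
        preds = model.predict(rows)
        return {"predictions": [{"fields": ["prediction"],
                                 "values": [[p] for p in preds]}]}
    return score

score = iris_scoring_pipeline_sketch()
payload = {"input_data": [{
    "fields": ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
    "values": [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]],
}]}
print(score(payload))
```

The inner function sees only the payload, so anything it needs (model, transformer, credentials) has to be captured by the outer function.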

### 6a: Package and Model Creation 

**Steps for Custom Transformer and Model Creation :** <br> 
**1.** Create the custom_transformer.py script. The created class, **ValueSquared**, squares the values in an array. Validate that the package works correctly.  <br>
**2.** A Random Forest model is trained with the new feature **PetalWidthCm2.** <br> 
&emsp; This is the same algorithm as **Section 2** with the exception of the additional feature <br> 
**3.** Transformer and Model are saved as **.txt** files and pushed into project space. IDs are created. They will be called by the function later.  <br>
&emsp; **Note** that the RF model is saved using the Python **joblib** library.<br>
<br>
<b>*** Please Note:</b> The Scikit-Learn library is <b>downgraded</b> from 0.22 to 0.20 because of problems deploying the function on WML.

In [37]:
#--Downgrading Scikit-Learn version
!pip uninstall scikit-learn -y | tail -n 1
!pip install scikit-learn==0.20 | tail -n 1
import sklearn
from sklearn.externals import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

  Successfully uninstalled scikit-learn-0.20.0
Successfully installed scikit-learn-0.20.0


In [38]:
%%writefile  /project_data/data_asset/custom_transformer.py   
#make sure magic function appears in first line of cell! 

#Transformer Script 
class ValueSquared:
    """Takes a pandas series, squares the values, and outputs the transformed values"""
    
    def __init__(self, column):
        self.column = column
    
    def square(self):
        squared_col = self.column.map(lambda x : x**2)
        
        return squared_col

Overwriting /project_data/data_asset/custom_transformer.py


In [39]:
#Validates package was created correctly
import sys
sys.path.insert(0, '/project_data/data_asset/')

In [40]:
from custom_transformer import ValueSquared as vs

In [41]:
#Model Creation

X2 = df.drop(['Id','Species'], axis = 1)
X2['PetalWidthCm2'] = vs(df.PetalWidthCm).square() ##Note that we created this in STEP 6A
y2 = df.Species

X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y2, test_size=0.2, random_state=10, stratify=y2)

random_forest = RandomForestClassifier()

model2 = random_forest.fit( X_train2, y_train2 )

In [42]:
#Saving Custom Transformer Package and creating ID. This will be called by the scoring function 
create_script_asset_details = wml_client.data_assets.create("custom_transformer.py", "/project_data/data_asset/custom_transformer.py")
custom_transformer_uid = create_script_asset_details['metadata']['guid']


#Saving Random Forest Model and creating ID. This will be called by the scoring function 
joblib.dump(model2, '/project_data/data_asset/IRIS_RF_MODEL.txt')

random_forest_model_asset_details = wml_client.data_assets.create('RF_MODEL.joblib', "/project_data/data_asset/IRIS_RF_MODEL.txt")
random_forest_uid = random_forest_model_asset_details['metadata']['guid']

Creating data asset...
SUCCESS
Creating data asset...
SUCCESS


### 6b: Scoring Function Creation
- At a high level, this scoring function is a Python script that:
    1. Gains access to the project space
        - The same **CREDENTIALS** and **wml_credentials** parameters defined in **Section 3** are used. 
        - The **Model ID and Custom Transformer ID** defined in step **6a** are additional parameters used. 
    2. Downloads the Custom Transformer and RF Model
        - Methods **generate_access_token, get_data_asset_href, get_attachment_href, handle_response** build the token, URLs, and parsed responses needed to download the assets (model, custom transformer). 
        - Method **download_asset** downloads an asset, writes it to the working directory, and returns the local file path.  
    3. Munges and scores the payload data
        - Method **score** does the munging and scoring. Note that this leverages the downloaded assets.  
        - Returns model scores in the same format as the Batch/Online output. 
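
To make the download step more concrete, the sketch below shows the URLs the two href helpers produce. The host `https://cp4d.example.com` and the IDs `asset-123` and `att-456` are hypothetical placeholders; real values come from **wml_credentials** and the asset details returned by the service.

```python
# Hypothetical host and IDs, used only to show the URL shapes
wml_credentials_example = {"url": "https://cp4d.example.com"}


def get_data_asset_href(asset_id, wml_credentials):
    # URL for the data asset's metadata (which lists its attachments)
    return "{}/v2/data_assets/{}".format(wml_credentials["url"], asset_id)


def get_attachment_href(asset_id, attachment_id, wml_credentials):
    # URL for one attachment of the asset
    return "{}/v2/assets/{}/attachments/{}".format(
        wml_credentials["url"], asset_id, attachment_id)


asset_url = get_data_asset_href("asset-123", wml_credentials_example)
attachment_url = get_attachment_href("asset-123", "att-456", wml_credentials_example)
print(asset_url)       # https://cp4d.example.com/v2/data_assets/asset-123
print(attachment_url)  # https://cp4d.example.com/v2/assets/asset-123/attachments/att-456
```

In the scoring function, **download_asset** requests the first URL to find the attachment ID, requests the second to obtain a signed URL, and then fetches the file content from that signed URL.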

In [43]:
# Credentials to access assets in WML from function
ai_parms = {
    "wml_credentials": wml_credentials,  
    "credentials" : CREDENTIALS, 
    "project_uid": os.environ['PROJECT_ID'],  
    'header' :  {'Content-Type': 'application/json'},
    "custom_transformer_uid": custom_transformer_uid,  #defined in 6a
    "random_forest_uid": random_forest_uid,   #defined in 6a
    "parms" : {'space_id' : space_id}
}


def iris_dataset_scoring_pipeline(parms = ai_parms):
    
    import requests
    import json
    from sklearn.externals import joblib
    from watson_machine_learning_client import WatsonMachineLearningAPIClient
    import pandas as pd
    from requests.auth import HTTPBasicAuth
    import os
    
    global model
    
    
    # Access token expires so you need to generate one inside the function
    # For this reason you can't hardcode the token
    def generate_access_token():
        headers={}
        headers["Accept"] = "application/json"
        auth = HTTPBasicAuth(parms['credentials']["username"], parms['credentials']["password"])

        CP4D_TOKEN_URL= parms['credentials']["url"] + "/v1/preauth/validateAuth"

        response = requests.get(CP4D_TOKEN_URL, headers=headers, auth=auth, verify=False)
        json_data = response.json()
        cp4d_access_token = json_data['accessToken']
        return cp4d_access_token
    
    def get_data_asset_href(asset_id, wml_credentials):
        DATA_ASSET = u'{}/v2/data_assets/{}'
        return DATA_ASSET.format(wml_credentials['url'],asset_id)

    def get_attachment_href(asset_id,attachment_id, wml_credentials):
        ATTACHMENT = "{}/v2/assets/{}/attachments/{}"
        return ATTACHMENT.format(wml_credentials['url'],asset_id,attachment_id)

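    # Function for error handling
    def handle_response(expected_status_code, operationName, response, json_response=True):
        if response.status_code == expected_status_code:
            if json_response:
                try:
                    return response.json()
                except Exception as e:
                    raise Exception(u'Failure during parsing json response: \'{}\''.format(response.text)) from e
            else:
                return response.text
        # Fail loudly on an unexpected status code instead of silently returning None
        raise Exception(u'Failure during {}: status code {}'.format(operationName, response.status_code))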
    
    # Loads in the data from the information collected using the above functions        
    def download_asset(asset_id, filename):
        import requests
        
        asset_response = requests.get(get_data_asset_href(asset_id, parms['wml_credentials']), params=parms['parms'],
                                          headers=parms['header'], verify=False)

        asset_details = handle_response(200, u'get assets', asset_response)
        attachment_id = asset_details["attachments"][0]["id"]

        response = requests.get(get_attachment_href(asset_id, attachment_id, parms['wml_credentials']), params=parms['parms'],
                                      headers=parms['header'], verify=False)
        attachment_signed_url = response.json()["url"]

        att_response = requests.get(parms['wml_credentials']["url"]+attachment_signed_url,
                                        verify=False)

        downloaded_asset = att_response.content

        with open(filename, 'wb') as f:
            f.write(downloaded_asset)
        download_path = os.getcwd() + '/' + filename
        
        return download_path
    
    #Get token
    token = generate_access_token()
    parms['header']['Authorization'] =  'Bearer ' + token
    
    # call the function to download the prep script and return the path
    prep_script_path = download_asset(parms['custom_transformer_uid'], 'custom_transformer.py')
    
    # call the function to download the joblib file and return the path
    model_path = download_asset(parms['random_forest_uid'], 'IRIS_RF_MODEL.joblib')
    #return model_path
    model = joblib.load(model_path)
    
    # Import in your custom transformer
    from custom_transformer import ValueSquared
    
    
    def score( payload ):
        import json
        global client
        global model

        payload_data = pd.DataFrame(payload['input_data'][0]['values'], columns = payload['input_data'][0]['fields'] )
        payload_data['PetalWidthCm2'] = ValueSquared(payload_data.PetalWidthCm).square()
        
        pred = model.predict(payload_data)
        prob = [ [str(x[0]),str(x[1]),str(x[2])] for x in model.predict_proba(payload_data)] 
        score_ = {"predictions": [{"fields": ["prediction", "probability"], "values": [[ str(x), y ] for x,y in zip(pred ,prob)]}]}
        response_scoring = json.dumps(score_)
        return json.loads(response_scoring)

    return score

In [44]:
#Make sure function is loading correctly
iris_dataset_scoring_pipeline()(scoring_payload)

{'predictions': [{'fields': ['prediction', 'probability'],
   'values': [['1', ['0.0', '1.0', '0.0']],
    ['2', ['0.0', '0.0', '1.0']],
    ['2', ['0.0', '0.0', '1.0']],
    ['2', ['0.0', '0.4', '0.6']],
    ['2', ['0.0', '0.0', '1.0']],
    ['2', ['0.0', '0.0', '1.0']],
    ['1', ['0.1', '0.9', '0.0']],
    ['2', ['0.0', '0.0', '1.0']],
    ['2', ['0.0', '0.3', '0.7']],
    ['2', ['0.1', '0.4', '0.5']],
    ['2', ['0.1', '0.1', '0.8']],
    ['2', ['0.0', '0.0', '1.0']],
    ['2', ['0.0', '0.2', '0.8']],
    ['2', ['0.1', '0.0', '0.9']],
    ['1', ['0.0', '0.6', '0.4']],
    ['0', ['0.7', '0.3', '0.0']],
    ['2', ['0.0', '0.0', '1.0']],
    ['1', ['0.0', '1.0', '0.0']],
    ['2', ['0.0', '0.4', '0.6']],
    ['1', ['0.0', '1.0', '0.0']],
    ['2', ['0.0', '0.1', '0.9']],
    ['2', ['0.0', '0.0', '1.0']],
    ['2', ['0.1', '0.0', '0.9']],
    ['2', ['0.0', '0.0', '1.0']],
    ['1', ['0.0', '0.7', '0.3']],
    ['2', ['0.0', '0.0', '1.0']],
    ['1', ['0.1', '0.9', '0.0']],
    ['2', ['0

### 6c: Function Deployment and Scoring 

* The Function Deployment cycle is the same as Batch/Online Deployment.
* The Function Scoring is the same as Online Scoring with REST API Endpoint. 

**Steps for Function Deployments:** <br> 
**1.** Function and deployment names are set. <br>
**2.** The deployment space is checked for existing deployments with the name set in Step 1. Any match is deleted along with its associated function, so that new ones can be created. <br> 
**3.** Function is pushed and stored in deployment space. Function ID created. <br> 
&emsp; &emsp; &emsp; For detailed steps on metadata for space storing, refer to the <a href="https://wml-api-pyclient.mybluemix.net/#repository"> metadata documentation.</a> <br>&emsp; &emsp; &emsp; Accurate environment specifications are <b>essential.</b> For specification syntax, refer to <a href="https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_current/wsj/wmls/wmls-deploy-python-types.html">syntax documentation.</a> <br> 
&emsp; &emsp; &emsp; If using Scikit-Learn, use the <b>sklearn.__version__</b> command to get the scikit version and <b>! python --version</b> for the Python version. <br>
**4.** Function is deployed from deployment space. Deployment ID created. <br>  
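
Step 2 relies on pulling the stored asset's ID out of each deployment's href. The sketch below runs that lookup against a mocked listing. The IDs and the `/v4/...?space_id=...` href layout are assumptions made for illustration, and `find_stale` is a hypothetical helper, not part of the WML client.

```python
# Mocked listing; real entries come from wml_client.deployments.get_details().
# The href layout and all IDs here are illustrative assumptions.
mock_deployment_details = {
    "resources": [
        {"metadata": {"guid": "dep-1"},
         "entity": {"name": "IRIS_PY_Function_Deployment",
                    "asset": {"href": "/v4/functions/func-1?space_id=space-9"}}},
        {"metadata": {"guid": "dep-2"},
         "entity": {"name": "Some_Other_Deployment",
                    "asset": {"href": "/v4/models/model-2?space_id=space-9"}}},
    ]
}


def find_stale(details, deployment_name):
    """Return (deployment_id, asset_id) pairs whose deployment name matches."""
    matches = []
    for deployment in details["resources"]:
        if deployment["entity"]["name"] == deployment_name:
            href = deployment["entity"]["asset"]["href"]
            # Fourth path segment holds the asset ID; drop the query string
            asset_id = href.split("/")[3].split("?")[0]
            matches.append((deployment["metadata"]["guid"], asset_id))
    return matches


stale = find_stale(mock_deployment_details, "IRIS_PY_Function_Deployment")
print(stale)  # [('dep-1', 'func-1')]
```

Collecting the matches first, and deleting afterwards, makes it easy to inspect what would be removed before calling the delete methods.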

**Steps for Function Scoring (same as scoring Online with REST API)** <br> 
**1.** Define the function deployment name and retrieve its ID (the function should have already been deployed). <br>
**2.** Retrieve the Online URL by either constructing the URL or calling **wml_client.deployments.get_details(< deployment id >).** <br>
**3.** Construct the authentication header (using the IAM Token) and the scoring payload, then score the results. <br> 
**4.** Compile output. Compiling output is at user discretion. <br>  
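
As one possible way to compile the output in Step 4 without pandas, the sketch below unpacks a response shaped like the one returned by the scoring function in 6b. The response values are illustrative and `unpack_predictions` is a hypothetical helper. Because the scoring function returns predictions and probabilities as strings, they are cast back to numbers here.

```python
# Response in the shape returned by the 6b scoring function (values are illustrative)
example_response = {
    "predictions": [{
        "fields": ["prediction", "probability"],
        "values": [["1", ["0.0", "1.0", "0.0"]],
                   ["2", ["0.0", "0.3", "0.7"]]],
    }]
}


def unpack_predictions(response):
    """Turn the fields/values block into one dict per scored row."""
    block = response["predictions"][0]
    return [dict(zip(block["fields"], row)) for row in block["values"]]


rows = unpack_predictions(example_response)
summary = []
for row in rows:
    label = int(row["prediction"])                           # cast string label
    probabilities = [float(p) for p in row["probability"]]   # cast string probabilities
    summary.append((label, max(probabilities)))

print(summary)  # [(1, 1.0), (2, 0.7)]
```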

In [45]:
#1. Setting and finding deployment name 
FUNCTION_NAME = 'IRIS_PY_Scoring_Function'
FUNCTION_DEPLOYMENT_NAME = 'IRIS_PY_Function_Deployment'

In [46]:
#2. Remove any deployments and associated function with same name
deployment_details = wml_client.deployments.get_details()
for deployment in deployment_details['resources']:
    deployment_id = deployment['metadata']['guid']
    model_id = deployment['entity']['asset']['href'].split('/')[3].split('?')[0]
    if deployment['entity']['name'] == FUNCTION_DEPLOYMENT_NAME:
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)

In [47]:
#3. Save Function to Space 
space_metadata = {
    wml_client.repository.FunctionMetaNames.NAME: FUNCTION_NAME,
    wml_client.repository.FunctionMetaNames.DESCRIPTION: FUNCTION_NAME,
    wml_client.repository.FunctionMetaNames.RUNTIME_UID: "scikit-learn_0.20-py3.6",
    wml_client.repository.FunctionMetaNames.SPACE_UID: space_id
}

stored_function_details = wml_client.repository.store_function(meta_props = space_metadata, 
                                                                  function = iris_dataset_scoring_pipeline)


In [48]:
#4. Deploy the Function
meta_props_deployment = {
   wml_client.deployments.ConfigurationMetaNames.NAME: FUNCTION_DEPLOYMENT_NAME,
   wml_client.deployments.ConfigurationMetaNames.DESCRIPTION: FUNCTION_DEPLOYMENT_NAME,
   wml_client.deployments.ConfigurationMetaNames.TAGS : [{'value' : 'IRIS_PYTHON_FUNCTION_TAG'}],
   wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
}

function_uid = wml_client.repository.get_function_uid(stored_function_details)

deployment_details = wml_client.deployments.create(function_uid, meta_props=meta_props_deployment)

scoring_deployment_id = wml_client.deployments.get_uid(deployment_details)



#######################################################################################

Synchronous deployment creation for uid: '99e0614d-ab9d-485f-9510-58be78dae443' started

#######################################################################################


initializing...
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='31a56808-56be-41a3-8461-4be9ce946978'
------------------------------------------------------------------------------------------------




In [49]:
# NOTE: you must construct mltoken based on provided documentation
header = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + token}

# NOTE: manually define and pass the array(s) of values to be scored in the next line
payload = scoring_payload

scoring_url = wml_client.deployments.get_details(scoring_deployment_id)['entity']['status']['online_url']['url']
response_scoring = requests.post(scoring_url, json=payload, headers=header, verify=False)
function_scoring_results = json.loads(response_scoring.text)

In [50]:
#Compile Results
score_result_columns = function_scoring_results['predictions'][0]['fields']
score_result_data =function_scoring_results['predictions'][0]['values']

function_result_df = score_data.copy()
function_result_df['PetalWidthCm2'] = vs(function_result_df.PetalWidthCm).square() ##Note that this function was created in step 6A
# PetalWidthCm is kept next to PetalWidthCm2 (an unassigned DataFrame.drop call would have no effect)
function_result_df['Predictions'] ,function_result_df['Probability'] = [x[0] for x in score_result_data ], [x[1] for x in score_result_data ]

function_result_df.sample(5)

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,PetalWidthCm2,Predictions,Probability
46,3.04521,3.25774,3.10455,1.53486,2.355795,1,"[0.0, 1.0, 0.0]"
29,1.399083,2.028729,5.234036,1.956076,3.826233,2,"[0.0, 0.0, 1.0]"
25,5.173941,1.461047,6.018446,2.316418,5.365793,2,"[0.0, 0.0, 1.0]"
21,5.012175,0.724621,5.924128,2.995747,8.9745,2,"[0.0, 0.0, 1.0]"
43,3.346034,4.489802,5.230575,0.765772,0.586407,0,"[0.4, 0.4, 0.2]"


## Developed by IBM CPAT team:
Emilio Fiallos - Data Scientist      
Kevin Potter - Data Scientist