#### How to use WML v2 on Cloud Pak for Data as a Service (CPDaaS)

- Read a CSV file
- Create a model
- Store the model in WML

> As of May 2021 by **Michel Le** from IBM


In [21]:
import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
%matplotlib inline

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
import xgboost as xgb

pd.options.display.max_rows = 20
pd.options.display.max_colwidth = 100
pd.options.display.max_columns = 100

> Get the **project** credentials needed to read the CSV file stored as a data asset

> How:
- At the project level
    - Click **Settings** next to Access control
    - Add an access token in the lower part of the page
- At the notebook level
    - In the line showing Project/NotebookName
    - On the right, click the three vertical dots button
    - Choose **Insert project token**
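Rather than leaving the token pasted into the notebook, one option is to read it at run time. A minimal sketch, assuming you export the project ID and token in environment variables named `PROJECT_ID` and `PROJECT_TOKEN` (hypothetical names, not something CPDaaS sets for you); `get_project_credentials` is a hypothetical helper whose results would then be passed to `Project(...)` as in the next cell.

```python
import os


def get_project_credentials(id_var="PROJECT_ID", token_var="PROJECT_TOKEN"):
    """Return (project_id, project_token) from environment variables.

    The variable names are hypothetical defaults; a clear error is raised if unset.
    """
    missing = [name for name in (id_var, token_var) if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return os.environ[id_var], os.environ[token_var]


# Usage in the notebook:
# from project_lib import Project
# project_id, token = get_project_credentials()
# project = Project(project_id=project_id, project_access_token=token)
```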

In [22]:
# @hidden_cell
# The project token is an authorization token used to access project resources such as data sources and connections; platform APIs use it too.
# Replace the placeholder with the token added via "Insert project token"
from project_lib import Project
project = Project(project_id='3b7598be-4f47-4828-8b52-894de61196bf', project_access_token='<YOUR_PROJECT_TOKEN>')
pc = project.project_context


In [23]:
def asset(file):
    """Return the project data asset `file` as a file-like object, rewound to the start."""
    f = project.get_file(file)
    f.seek(0)
    return f
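`project.get_file` returns a file-like object, and `asset()` rewinds it with `seek(0)` before pandas reads it. The sketch below shows why the rewind matters, using an in-memory `io.BytesIO` and a hypothetical `FakeProject` stub in place of `project_lib.Project` so it runs outside CPDaaS. The stub assumes the same buffer is handed back on every call; `rewound_asset` mirrors `asset()` but takes the project as a parameter.

```python
import csv
import io


class FakeProject:
    """Hypothetical stand-in for project_lib.Project that serves files from memory."""

    def __init__(self, files):
        self._files = {name: io.BytesIO(data) for name, data in files.items()}

    def get_file(self, name):
        # Assumption: returns the same buffer on every call, so its read position persists
        return self._files[name]


def rewound_asset(project, name):
    """Fetch a data asset and rewind it so it is read from the start."""
    f = project.get_file(name)
    f.seek(0)
    return f


def count_rows(buffer):
    """Count CSV data rows (excluding the header) in a binary buffer."""
    text = io.TextIOWrapper(buffer, encoding="utf-8", newline="")
    rows = list(csv.reader(text))
    text.detach()  # release the wrapper without closing the underlying buffer
    return max(len(rows) - 1, 0)


fake_project = FakeProject({"sales.csv": b"AGE,PURCHASE_AMOUNT\n27,144.78\n39,144.83\n"})
print(count_rows(rewound_asset(fake_project, "sales.csv")))  # 2
print(count_rows(fake_project.get_file("sales.csv")))        # 0: no rewind, buffer already consumed
print(count_rows(rewound_asset(fake_project, "sales.csv")))  # 2
```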

In [24]:
def read():
    """Read the sales data set from the project assets into a DataFrame."""
    df = pd.read_csv(asset("sales-2000.csv"), sep=",")
    return df

In [25]:
df = read()
df.shape

(1999, 9)

In [26]:
df

| | CustomerID | GENDER | AGE | MARITAL_STATUS | PROFESSION | IS_TENT | PRODUCT_LINE | PURCHASE_AMOUNT | PURCHASE_AMOUNT_PREDICTION |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | M | 27 | Single | Professional | True | Camping Equipment | 144.78 | 0.0 |
| 1 | 1 | F | 39 | Married | Other | False | Outdoor Protection | 144.83 | 0.0 |
| 2 | 2 | F | 39 | Married | Other | False | Outdoor Protection | 137.37 | 0.0 |
| 3 | 3 | F | 56 | Unspecified | Hospitality | False | Personal Accessories | 92.61 | 0.0 |
| 4 | 4 | M | 45 | Married | Retired | False | Golf Equipment | 119.04 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1994 | 1994 | M | 23 | Single | Sales | True | Mountaineering Equipment | 124.76 | 0.0 |
| 1995 | 1995 | F | 43 | Married | Other | False | Outdoor Protection | 136.80 | 0.0 |
| 1996 | 1996 | F | 19 | Single | Student | False | Personal Accessories | 72.97 | 0.0 |
| 1997 | 1997 | M | 25 | Single | Other | False | Mountaineering Equipment | 127.99 | 0.0 |


In [27]:
use_cols = ['AGE']

In [28]:
features = use_cols
target = 'PURCHASE_AMOUNT'

In [29]:
data = df.copy()

In [30]:

train, test = data[:800], data[800:]
x_train, y_train = train[features], train[target]
x_test, y_test = test[features], test[target]
x_test.head()

| | AGE |
|---|---|
| 800 | 42 |
| 801 | 42 |
| 802 | 41 |
| 803 | 41 |
| 804 | 27 |
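The split above is positional: the first 800 rows train the model and the remaining 1,199 are held out (1,999 rows in total). If the CSV is ordered in some way, a positional split can give unrepresentative sets, and a shuffled split with a fixed seed is a common alternative. A minimal stdlib sketch over row indices; `positional_split` and `shuffled_split` are hypothetical helpers (in the notebook, `data.sample(frac=1, random_state=0)` before slicing would have the same effect).

```python
import random


def positional_split(n_rows, n_train):
    """Return (train_idx, test_idx) using the first n_train rows for training."""
    idx = list(range(n_rows))
    return idx[:n_train], idx[n_train:]


def shuffled_split(n_rows, n_train, seed=0):
    """Return (train_idx, test_idx) after a reproducible shuffle of the row indices."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]


train_idx, test_idx = positional_split(1999, 800)
print(len(train_idx), len(test_idx))  # 800 1199
```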


In [31]:
y_test.head()

800    140.45
801    139.12
802    161.92
803    161.61
804    122.29
Name: PURCHASE_AMOUNT, dtype: float64

In [32]:
dtrain = xgb.DMatrix(x_train.astype('float32'), label=y_train)
dtest = xgb.DMatrix(x_test.astype('float32'))


In [33]:
params = {
    'objective' : 'reg:squarederror',
    'tree_method':'hist',
    'min_child_weight' : 10,
    'eta' : 0.01,
    'seed' : 0,
    'gamma':1,
    'max_depth':6,
}
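With `eta` at 0.01, each tree adds only 1% of its fitted correction, so 50 rounds (`n_trees` in the next cell) move the prediction only part of the way from XGBoost's starting value toward the target. This is consistent with the predictions of roughly 40 to 60 returned further down, against test targets of roughly 120 to 160. A simplified sketch, assuming every tree is a single leaf that fits the current residual exactly (ignoring `lambda`, `gamma` and the splits on AGE) and a starting `base_score` of 0.5, XGBoost's default in the 0.90 line. Under those assumptions the prediction after n rounds is `mean - (mean - 0.5) * (1 - eta) ** n`.

```python
def boosted_constant(target_mean, eta, n_rounds, base_score=0.5):
    """Simulate boosting with single-leaf trees on squared error.

    Each round fits the current residual exactly and adds eta times it.
    Simplification: ignores lambda, gamma and any splits on features.
    """
    pred = base_score
    for _ in range(n_rounds):
        residual = target_mean - pred
        pred += eta * residual
    return pred


# A target around 140, in the range of the test values shown above
for n in (50, 500, 1000):
    print(n, round(boosted_constant(140.0, eta=0.01, n_rounds=n), 1))
# 50 55.6
# 500 139.1
# 1000 140.0
```

Under this simplification, raising `num_boost_round` (or `eta`) is what closes the gap.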

In [37]:
# train the model (predictions are made in the next cell)
n_trees = 50
bst = xgb.train(params, dtrain, num_boost_round=n_trees,
                evals=[(dtrain, 'train')],  # evaluate on the training set; 'train' is only a display label
                verbose_eval=False)


In [39]:
y_hat = bst.predict(dtest)
y_hat

array([59.651318, 59.651318, 59.651318, ..., 41.996384, 48.894978,
       48.894978], dtype=float32)
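The notebook imports `mean_squared_error` but never scores the model. A held-out error makes the gap between `y_hat` and `y_test` concrete; in the notebook this could be `np.sqrt(mean_squared_error(y_test, y_hat))`. The stdlib sketch below computes the same root mean squared error so the formula is explicit; `rmse` is a hypothetical helper name.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3), about 1.1547
```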

In [40]:
xgb.__version__

'0.90'
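The installed XGBoost version matters when the model is stored, because the WML model type names a framework and its version (e.g. `scikit-learn_0.23` in the metadata further down). A small sketch, assuming the `<framework>_<major>.<minor>` pattern seen there; `wml_model_type` is a hypothetical helper, and its result should still be checked against the supported frameworks list.

```python
def wml_model_type(framework, version):
    """Build '<framework>_<major>.<minor>' from a version such as '0.90' or '0.23.2'.

    Hypothetical helper: the naming pattern is inferred from 'scikit-learn_0.23'.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    return f"{framework}_{parts[0]}.{parts[1]}"


print(wml_model_type("xgboost", "0.90"))         # xgboost_0.90
print(wml_model_type("scikit-learn", "0.23.2"))  # scikit-learn_0.23
```

In the notebook, `wml_model_type("xgboost", xgb.__version__)` would derive the string from the installed package.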

In [41]:
import os
import json

In [42]:
from ibm_watson_machine_learning import APIClient

> Create a WML client by providing:
- apikey
- url

> How to get them:
- apikey
    - At the top, open the "hamburger menu"
    - Go to Administration > IAM > API keys
    - Click **Create an IBM Cloud API key**
- url
    - The regional WML endpoint, up to "cloud.ibm.com"
    - e.g. "https://eu-de.ml.cloud.ibm.com"
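The example URL follows the pattern `https://<region>.ml.cloud.ibm.com`. A sketch that builds the credentials dictionary from a region code and prompts for the key with `getpass` instead of storing it in the notebook; the URL pattern is an assumption generalised from the `eu-de` example, and `wml_credentials_for` is a hypothetical helper.

```python
import getpass


def wml_credentials_for(region, apikey=None):
    """Return a credentials dict for APIClient, prompting for the API key if not given.

    Assumption: the endpoint follows https://<region>.ml.cloud.ibm.com.
    """
    if not region or not region.replace("-", "").isalnum():
        raise ValueError(f"unexpected region code: {region!r}")
    if apikey is None:
        apikey = getpass.getpass("IBM Cloud API key: ")
    return {"apikey": apikey, "url": f"https://{region}.ml.cloud.ibm.com"}


# Usage in the notebook:
# wml_credentials = wml_credentials_for("eu-de")
# client = APIClient(wml_credentials)
```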

In [3]:

apikey = "n_s53M_KXxyy7KUbeKZ6uRpbQ9PIaEQc_BZSfDgyUNHZ"  # illustrative key: replace it with your own
url = "https://eu-de.ml.cloud.ibm.com"

In [48]:

wml_credentials = {
    "apikey": apikey,
    "url": url
}

client = APIClient(wml_credentials)

In [49]:
client

<ibm_watson_machine_learning.client.APIClient at 0x7f7d483d4f10>

> Have a deployment space, i.e. a cloud storage location where WML persists its data, such as stored models

> How:
- At the top, in the black banner, use the *hamburger menu* next to **IBM Cloud Pak for Data**
- Select Deployments
- Select "New deployment space" to create one
- Click Manage in the new deployment space page
- Record the **Space GUID**
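The Space GUID is a standard UUID string, so a copy-paste mistake can be caught before calling `client.set.default_space`. A minimal sketch using Python's `uuid` module; `is_valid_guid` is a hypothetical helper.

```python
import uuid


def is_valid_guid(value):
    """Return True if value parses as a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(is_valid_guid("d69c1588-adbe-4d3d-8bc1-a8bc9193543b"))  # True
print(is_valid_guid("d69c1588-adbe-4d3d-8bc1"))               # False
```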

In [56]:
space_id = "d69c1588-adbe-4d3d-8bc1-a8bc9193543b"

In [55]:
client.set.default_space(space_id)

'SUCCESS'

> Now specify the information needed to deploy the model
> What is the bare minimum?
- The name of your *model*
- The software stack needed to run the model, given as the *uid* of a predefined software spec
- The model type, i.e. the main package (and its version) used to build the model. *This is my understanding*
- The input/output schema: not mandatory, but useful

> How?
- Software specs are referenced by name and are documented; if in doubt, start with "default_py3.7"
- Type: also documented

> Documentation
- For software_spec: see https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/pm_service_supported_frameworks.html
- For the API: see http://ibm-wml-api-pyclient.mybluemix.net/?_ga=2.8950199.389008101.1620669525-228683183.1620669525#supported-machine-learning-frameworks
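The bare-minimum list above can be turned into a quick check before calling `client.repository.store_model`. A sketch under a stated assumption: plain-string keys (`name`, `type`, `software_spec_uid`, plus two optional schema keys) stand in for the real `client.repository.ModelMetaNames` constants, whose underlying values are not assumed here; `check_model_metadata` is hypothetical.

```python
# Hypothetical plain-string keys standing in for client.repository.ModelMetaNames.*
REQUIRED_KEYS = ("name", "type", "software_spec_uid")
OPTIONAL_KEYS = ("input_data_schema", "output_data_schema")


def check_model_metadata(meta):
    """Return (missing_required, unknown_keys) for a metadata dict."""
    missing = [key for key in REQUIRED_KEYS if not meta.get(key)]
    unknown = [key for key in meta if key not in REQUIRED_KEYS + OPTIONAL_KEYS]
    return missing, unknown


meta = {"name": "Price regression", "type": "scikit-learn_0.23"}
print(check_model_metadata(meta))  # (['software_spec_uid'], [])
```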


In [83]:
software_spec_uid = client.software_specifications.get_uid_by_name('default_py3.7')
software_spec_uid

'e4429883-c883-42b6-87a8-f419d64088cd'

In [147]:
input_schema = [{"id": "in",
                "type" : 'list',
                "fields" : [{'name' : 'age', 'type': 'int' }]
             }]

output_schema = {"id": "out",
                 "type" : 'dict',
                 "fields" : [{'name' : 'price-hat', 'type': 'float' }]
                 }

In [155]:
print(json.dumps(input_schema, indent=2))
print(json.dumps(output_schema, indent=2))

[
  {
    "id": "in",
    "type": "list",
    "fields": [
      {
        "name": "age",
        "type": "int"
      }
    ]
  }
]
{
  "id": "out",
  "type": "dict",
  "fields": [
    {
      "name": "price-hat",
      "type": "float"
    }
  ]
}
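The input schema declares a single integer field, `age`. A sketch that checks rows against such a schema before they are sent for scoring; the mapping from schema type names to Python types is an assumption covering only the two names used here (`int`, `float`), and `validate_rows` is a hypothetical helper.

```python
# Assumed mapping from schema type names to Python types; covers only 'int' and 'float'
TYPE_MAP = {"int": (int,), "float": (int, float)}


def validate_rows(schema, rows):
    """Return error messages for rows that do not match schema['fields']."""
    fields = schema["fields"]
    errors = []
    for i, row in enumerate(rows):
        if len(row) != len(fields):
            errors.append(f"row {i}: expected {len(fields)} values, got {len(row)}")
            continue
        for field, value in zip(fields, row):
            expected = TYPE_MAP[field["type"]]
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, expected):
                errors.append(f"row {i}: {field['name']}={value!r} is not {field['type']}")
    return errors


example_schema = {"id": "in", "type": "list", "fields": [{"name": "age", "type": "int"}]}
print(validate_rows(example_schema, [[42], [27]]))  # []
print(validate_rows(example_schema, [[42.5], []]))
```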


In [152]:
metadata = {
    client.repository.ModelMetaNames.NAME: "Price regression",
    # NOTE: bst is an XGBoost Booster (xgboost 0.90), not a scikit-learn model;
    # a type such as "xgboost_0.90" may match it better (check the supported frameworks page)
    client.repository.ModelMetaNames.TYPE: "scikit-learn_0.23",
    client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
    client.repository.ModelMetaNames.INPUT_DATA_SCHEMA: input_schema,
    client.repository.ModelMetaNames.OUTPUT_DATA_SCHEMA: output_schema
}

In [153]:
model_details = client.repository.store_model(bst, meta_props=metadata)

In [154]:
print(json.dumps(model_details, indent=3))

{
   "entity": {
      "schemas": {
         "input": [
            {
               "fields": [
                  {
                     "name": "age",
                     "type": "int"
                  }
               ],
               "id": "in",
               "type": "list"
            }
         ],
         "output": [
            {
               "fields": [
                  {
                     "name": "price-hat",
                     "type": "float"
                  }
               ],
               "id": "out",
               "type": "dict"
            }
         ]
      },
      "software_spec": {
         "id": "e4429883-c883-42b6-87a8-f419d64088cd",
         "name": "default_py3.7"
      },
      "type": "scikit-learn_0.23"
   },
   "metadata": {
      "created_at": "2021-05-11T13:07:25.386Z",
      "id": "e741479e-1e26-4595-8659-71b6f852a9c7",
      "modified_at": "2021-05-11T13:07:25.994Z",
      "name": "Price regression",
      "owner": "1000331020",
      "sp

> Deploy the model
- After the step above, the new model should appear in the **deployment space**.
- For the actual deployment, use the CPDaaS GUI for now; deploying with code will be covered in another notebook.
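To refer to the stored model later (for instance when deploying it by code), its ID is available in `model_details`; in the output above it sits under `metadata.id`. A sketch that reads it from a dictionary shaped like that output; `stored_model_id` is a hypothetical helper.

```python
def stored_model_id(model_details):
    """Return the model ID from a store_model() result shaped like the output above."""
    try:
        return model_details["metadata"]["id"]
    except (KeyError, TypeError) as exc:
        raise ValueError("model_details has no metadata.id entry") from exc


# Trimmed copy of fields shown in the stored-model output
details = {
    "entity": {"type": "scikit-learn_0.23"},
    "metadata": {"id": "e741479e-1e26-4595-8659-71b6f852a9c7", "name": "Price regression"},
}
print(stored_model_id(details))  # e741479e-1e26-4595-8659-71b6f852a9c7
```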

> How?
- Open the deployment space
- Next to the model asset, click **deploy**
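Once the model is deployed, it is scored by sending rows whose field names come from the input schema. A sketch of such a payload, assuming the WML v4 `input_data` format (a list of objects with `fields` and `values`); `build_scoring_payload` is a hypothetical helper, and sending it (e.g. through `client.deployments.score`) is left to the code-based deployment notebook mentioned above.

```python
import json


def build_scoring_payload(schema, rows):
    """Build an input_data payload whose field names are taken from the schema."""
    field_names = [field["name"] for field in schema["fields"]]
    return {"input_data": [{"fields": field_names, "values": [list(row) for row in rows]}]}


example_schema = {"id": "in", "type": "list", "fields": [{"name": "age", "type": "int"}]}
payload = build_scoring_payload(example_schema, [[42], [27]])
print(json.dumps(payload))
# {"input_data": [{"fields": ["age"], "values": [[42], [27]]}]}
```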
