<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Watson Machine Learning

This notebook should be run with a **Python 3.7** runtime environment. **If you are viewing this in Watson Studio and do not see Python 3.7 in the upper right corner of your screen, please update the runtime now.** It requires service credentials for the following services:
  * Watson OpenScale
  * Watson Machine Learning 
  * DB2

  
The notebook will train, create and deploy a model, configure OpenScale to monitor that deployment, and inject seven days' worth of historical records and measurements for viewing in the OpenScale Insights dashboard.

# Setup <a name="setup"></a>

## Package installation

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
!pip install --upgrade pyspark==2.4 --no-cache | tail -n 1

!pip install --upgrade pandas==0.25.3 --no-cache | tail -n 1
!pip install --upgrade requests==2.23 --no-cache | tail -n 1
!pip install numpy==1.16.4 --no-cache | tail -n 1
!pip install SciPy --no-cache | tail -n 1
!pip install lime --no-cache | tail -n 1
!pip install ibm-cloud-sdk-core --no-cache | tail -n 1

!pip install --upgrade ibm-watson-machine-learning --user | tail -n 1
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1

### Action: restart the kernel!

## Configure credentials

- WOS_CREDENTIALS (CP4D)
- WML_CREDENTIALS (CP4D)
- DATABASE_CREDENTIALS (DB2 on CP4D or Cloud Object Storage (COS))
- SCHEMA_NAME

In [1]:
#masked
WOS_CREDENTIALS = {
    "url": "https://namespace1-cpd-namespace1.apps.islnovXX.os.fyre.ibm.com",
    "username": "XX",
    "password": "XX",
    "version": "3.5"
}

In [2]:
#masked
WML_CREDENTIALS = {
                   "url": "https://namespace1-cpd-namespace1.apps.islnovXX.os.fyre.ibm.com",
                   "username": "XX",
                   "password" : "XX",
                   "instance_id": "wml_local",
                   "version" : "3.5"
                  }

In [3]:
#masked
#IBM DB2 database connection format example
DATABASE_CREDENTIALS = {
    "hostname":"9.999.999.99",
    "username":"XX",
    "password":"XX",
    "database":"SAMPLE",
    "port":"50000"
}

### Action: enter the name of the schema you created below.

In [4]:
SCHEMA_NAME = 'AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000'
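The credential cells above are masked with `XX` placeholders, so it can help to check that every value has been filled in before running the rest of the notebook. The following is a minimal sketch of such a check. The helper name `find_placeholders` and the example dictionaries are hypothetical and not part of any IBM SDK.

```python
def find_placeholders(name, credentials, placeholder="XX"):
    """Return 'name.key' for every value that is empty or still contains the placeholder."""
    problems = []
    for key, value in credentials.items():
        text = "" if value is None else str(value)
        if not text or placeholder in text:
            problems.append("{}.{}".format(name, key))
    return problems


# Illustrative values only; they mirror the shape of the masked cells above.
example_wos = {
    "url": "https://cpd.example.com",
    "username": "XX",
    "password": "XX",
    "version": "3.5",
}
example_db = {
    "hostname": "db.example.com",
    "username": "admin",
    "password": "secret",
    "database": "SAMPLE",
    "port": "50000",
}

print(find_placeholders("WOS_CREDENTIALS", example_wos))
print(find_placeholders("DATABASE_CREDENTIALS", example_db))
```

In the notebook itself, the same function could be called on `WOS_CREDENTIALS`, `WML_CREDENTIALS` and `DATABASE_CREDENTIALS` right after they are defined.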

## Save training data to Cloud Object Storage

### Cloud Object Storage details

In the next cells, you will need to paste your Cloud Object Storage (COS) credentials. If you haven't worked with COS yet, please visit the getting started with COS tutorial. You can find the COS_API_KEY_ID and COS_RESOURCE_CRN values under Service Credentials in the menu of your COS instance. The COS service credentials must be created with the Role parameter set to Writer. Later, the training data file will be loaded into a bucket of your instance and used as the training reference in the subscription. The COS_ENDPOINT value can be found in the Endpoint field of the menu.

In [5]:
IAM_URL="https://iam.ng.bluemix.net/oidc/token"

In [6]:
# masked
COS_API_KEY_ID = "xviwlu_XX"
COS_RESOURCE_CRN = "crn:v1:bluemix:public:cloud-object-storage:global:a/e40741b27da5881193d18b40e6a3078d:30030db1-808f-4a80-8f70-2a85ce8948b8::" # eg "crn:v1:bluemix:public:cloud-object-storage:global:a/3bf0d9003abfb5d29761c3e97696b71c:d6f04d83-6c4f-4a62-a165-696756d63903::"
COS_ENDPOINT = "https://s3.us.cloud-object-storage.appdomain.cloud" # Current list available at https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints
BUCKET_NAME = "testcasebucket"
FILE_NAME = "Indirect_bias_AdultCensusdata.csv"
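A mistyped CRN is an easy mistake to make when pasting COS credentials. The sketch below splits a CRN into its colon-separated segments and checks that it names a Cloud Object Storage instance. It assumes the standard ten-segment IBM Cloud CRN layout; the helper names `parse_crn` and `looks_like_cos_crn` are hypothetical, and the sample value is the example CRN from the comment in the cell above.

```python
# Segment names follow the IBM Cloud CRN layout:
# crn:version:cname:ctype:service-name:location:scope:service-instance:resource-type:resource
CRN_SEGMENTS = [
    "prefix", "version", "cname", "ctype", "service_name",
    "location", "scope", "service_instance", "resource_type", "resource",
]


def parse_crn(crn):
    """Split a CRN string into a dict of named segments, or raise ValueError."""
    parts = crn.split(":")
    if len(parts) != len(CRN_SEGMENTS) or parts[0] != "crn":
        raise ValueError("Not a well-formed CRN: {!r}".format(crn))
    return dict(zip(CRN_SEGMENTS, parts))


def looks_like_cos_crn(crn):
    """Return True when the CRN parses and points at a COS service instance."""
    try:
        fields = parse_crn(crn)
    except ValueError:
        return False
    return fields["service_name"] == "cloud-object-storage" and bool(fields["service_instance"])


sample_crn = ("crn:v1:bluemix:public:cloud-object-storage:global:"
              "a/3bf0d9003abfb5d29761c3e97696b71c:d6f04d83-6c4f-4a62-a165-696756d63903::")
print(parse_crn(sample_crn)["service_instance"])
print(looks_like_cos_crn(sample_crn))
```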

# Load and explore data

In [7]:
!rm adult.csv
!wget https://raw.githubusercontent.com/ravichamarthy/indirect-bias/master/adult.csv

rm: cannot remove 'adult.csv': No such file or directory
--2021-01-23 06:51:49--  https://raw.githubusercontent.com/ravichamarthy/indirect-bias/master/adult.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.192.133, 151.101.0.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.192.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3551145 (3.4M) [text/plain]
Saving to: ‘adult.csv’


2021-01-23 06:51:55 (5.01 MB/s) - ‘adult.csv’ saved [3551145/3551145]



## Explore data

In [8]:
from pyspark.sql import SparkSession
import json

spark = SparkSession.builder.getOrCreate()
df_data = spark.read.csv(path="adult.csv", sep=",", header=True, inferSchema=True) 
df_data.head()

Row(age=39, workclass='State-gov', fnlwgt=77516, education='Bachelors', education-num=13, Marital='Never-married', occupation='Adm-clerical', relationship='Not-in-family', race='White', sex='Male', capitalgain=2174, loss=0, hoursper=40, citizen_status='United-States', label='<=50K')

In [9]:
print("Number of records: " + str(df_data.count()))

Number of records: 32561


# Create a model

In [10]:
# spark_df = sqlCtx.createDataFrame(df_data)
spark_df = df_data
# Remove protected attributes from training data
protected_attributes = ["race", "age", "sex"]
for attr in protected_attributes:
    spark_df = spark_df.drop(attr)
columns = spark_df.columns
model_name = "Adult Census Income Classifier Model"
deployment_name = "Adult Census Income Classifier Deployment"

spark_df.printSchema()

root
 |-- workclass: string (nullable = true)
 |-- fnlwgt: integer (nullable = true)
 |-- education: string (nullable = true)
 |-- education-num: integer (nullable = true)
 |-- Marital: string (nullable = true)
 |-- occupation: string (nullable = true)
 |-- relationship: string (nullable = true)
 |-- capitalgain: integer (nullable = true)
 |-- loss: integer (nullable = true)
 |-- hoursper: integer (nullable = true)
 |-- citizen_status: string (nullable = true)
 |-- label: string (nullable = true)



In [11]:
from pyspark.ml.feature import OneHotEncoderEstimator, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml import Pipeline, Model

cat_features = ['workclass', 'education', 'Marital', 'occupation', 'relationship', 'citizen_status'] 
num_features = ["fnlwgt", "education-num", "capitalgain", "loss", "hoursper"]
stages=[]

for feature in cat_features:
    string_indexer = StringIndexer(inputCol = feature, outputCol = feature + '_IX').setHandleInvalid("keep")
    encoder = OneHotEncoderEstimator(inputCols=[string_indexer.getOutputCol()], outputCols=[feature + "classVec"])
    stages += [string_indexer, encoder]

si_Label = StringIndexer(inputCol="label", outputCol="encoded_label").fit(spark_df)
label_converter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=si_Label.labels)
stages.append(si_Label)

In [12]:
assembler_inputs = [c + "classVec" for c in cat_features] + num_features
va_features = VectorAssembler(inputCols=assembler_inputs, outputCol="features")
stages.append(va_features)
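To make the categorical stages above more concrete, here is a plain-Python sketch of what one `StringIndexer` plus one-hot encoder pair computes for a single column. It is an illustration, not Spark's implementation. It assumes frequency-descending label ordering (ties broken alphabetically here, which is an assumption), an extra "unseen" index as with `setHandleInvalid("keep")`, and the encoder's default of dropping the last category so that it is encoded as all zeros. The function names are hypothetical.

```python
from collections import Counter


def fit_string_index(values):
    """Map each label to an index, most frequent label first."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: position for position, label in enumerate(ordered)}


def one_hot(value, index):
    """Encode a value as a dense one-hot list.

    Known labels occupy indices 0..n-1 and unseen labels get index n.
    The last index is dropped, so unseen labels become an all-zero vector.
    """
    size = len(index)
    position = index.get(value, size)
    vector = [0.0] * size
    if position < size:
        vector[position] = 1.0
    return vector


training = ["Private", "Private", "Private", "State-gov", "State-gov", "Local-gov"]
index = fit_string_index(training)
print(index)
print(one_hot("State-gov", index))
print(one_hot("Never-worked", index))  # label not seen during fitting
```

Under these assumptions the encoded vector has one slot per label seen during fitting, which matches vector sizes such as `SparseVector(9, ...)` for `workclass` in the prediction output of this notebook.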

In [13]:
(train_data, test_data) = spark_df.randomSplit([0.8, 0.2], 24)
print("Number of records for training: " + str(train_data.count()))
print("Number of records for evaluation: " + str(test_data.count()))

Number of records for training: 26028
Number of records for evaluation: 6533


In [14]:
train_data.columns

['workclass',
 'fnlwgt',
 'education',
 'education-num',
 'Marital',
 'occupation',
 'relationship',
 'capitalgain',
 'loss',
 'hoursper',
 'citizen_status',
 'label']

In [15]:
from pyspark.ml.classification import GBTClassifier, DecisionTreeClassifier, RandomForestClassifier
classifier = RandomForestClassifier(labelCol="encoded_label", featuresCol="features")
stages.append(classifier)
stages.append(label_converter)
pipeline = Pipeline(stages=stages)
model = pipeline.fit(train_data)

In [16]:
predictions = model.transform(test_data)
predictions.printSchema()
predictions.head()

root
 |-- workclass: string (nullable = true)
 |-- fnlwgt: integer (nullable = true)
 |-- education: string (nullable = true)
 |-- education-num: integer (nullable = true)
 |-- Marital: string (nullable = true)
 |-- occupation: string (nullable = true)
 |-- relationship: string (nullable = true)
 |-- capitalgain: integer (nullable = true)
 |-- loss: integer (nullable = true)
 |-- hoursper: integer (nullable = true)
 |-- citizen_status: string (nullable = true)
 |-- label: string (nullable = true)
 |-- workclass_IX: double (nullable = false)
 |-- workclassclassVec: vector (nullable = true)
 |-- education_IX: double (nullable = false)
 |-- educationclassVec: vector (nullable = true)
 |-- Marital_IX: double (nullable = false)
 |-- MaritalclassVec: vector (nullable = true)
 |-- occupation_IX: double (nullable = false)
 |-- occupationclassVec: vector (nullable = true)
 |-- relationship_IX: double (nullable = false)
 |-- relationshipclassVec: vector (nullable = true)
 |-- citizen_status_IX: 

Row(workclass='?', fnlwgt=12285, education='Some-college', education-num=10, Marital='Never-married', occupation='?', relationship='Not-in-family', capitalgain=0, loss=0, hoursper=20, citizen_status='United-States', label='<=50K', workclass_IX=3.0, workclassclassVec=SparseVector(9, {3: 1.0}), education_IX=1.0, educationclassVec=SparseVector(16, {1: 1.0}), Marital_IX=1.0, MaritalclassVec=SparseVector(7, {1: 1.0}), occupation_IX=7.0, occupationclassVec=SparseVector(15, {7: 1.0}), relationship_IX=1.0, relationshipclassVec=SparseVector(6, {1: 1.0}), citizen_status_IX=0.0, citizen_statusclassVec=SparseVector(41, {0: 1.0}), encoded_label=0.0, features=SparseVector(99, {3: 1.0, 10: 1.0, 26: 1.0, 39: 1.0, 48: 1.0, 53: 1.0, 94: 12285.0, 95: 10.0, 98: 20.0}), rawPrediction=DenseVector([18.3788, 1.6212]), probability=DenseVector([0.9189, 0.0811]), prediction=0.0, predictedLabel='<=50K')

In [17]:
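from pyspark.ml.evaluation import BinaryClassificationEvaluator
# BinaryClassificationEvaluator defaults to metricName="areaUnderROC",
# so the value computed here is the area under the ROC curve, not accuracy.
evaluatorDT = BinaryClassificationEvaluator(labelCol="encoded_label", rawPredictionCol="rawPrediction")
area_under_roc = evaluatorDT.evaluate(predictions)

print("Area under ROC = %g" % area_under_roc)

Area under ROC = 0.881522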
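`BinaryClassificationEvaluator` uses area under the ROC curve as its default metric, so the figure printed above measures how well the model ranks positives ahead of negatives rather than the fraction of correct predictions. The following plain-Python sketch shows the difference on a tiny made-up example. It uses the pairwise formulation of AUC (the probability that a random positive outscores a random negative), which is not how Spark computes it internally; the function names and data are hypothetical.

```python
def area_under_roc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("Need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    correct = sum(1 for label, s in zip(labels, scores) if (s >= threshold) == (label == 1))
    return correct / len(labels)


labels = [0, 0, 1, 1]
scores = [0.2, 0.3, 0.4, 0.45]  # perfectly ranked, but all below the 0.5 threshold

print("AUC:", area_under_roc(labels, scores))
print("Accuracy:", accuracy(labels, scores))
```

Here every positive outscores every negative, so the ranking is perfect, yet half of the thresholded predictions are wrong. The two metrics answer different questions and should not be used interchangeably.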


# Save and deploy the model

In [18]:
import json
from ibm_watson_machine_learning import APIClient

wml_client = APIClient(WML_CREDENTIALS)
wml_client.version

'1.0.45'

In [19]:
wml_client.spaces.list(limit=10)

------------------------------------  -------------------------------------------------------------------  ------------------------
ID                                    NAME                                                                 CREATED
fe8fd396-0fa3-4c4d-ad47-703ef3728b60  tutorial-space                                                       2020-12-13T09:38:20.638Z
05e86c85-7b72-4723-9e9f-79edc3c1d54c  prod                                                                 2020-12-10T18:25:47.038Z
ac444b7b-5994-424a-8cfe-1ced879cdf23  wml_preprod                                                          2020-12-10T08:24:22.942Z
4d5ee450-5184-43a7-91bd-943995e8a31f  wml_preprod_123                                                      2020-12-10T08:23:07.169Z
a39c40af-7d6b-4926-81e2-3aa6e1aad1dd  wml_preprod_ppm                                                      2020-12-09T11:31:02.796Z
cc8e5691-ad71-47cc-814a-b3ae3e720861  wml_prod_ppm                                           

## Find the space in which the model should be created and deployed, and specify its ID in the next cell

In [20]:
WML_SPACE_ID='fe8fd396-0fa3-4c4d-ad47-703ef3728b60' # use space id here
wml_client.set.default_space(WML_SPACE_ID)

'SUCCESS'

In [21]:
deployments_list = wml_client.deployments.get_details()
for deployment in deployments_list["resources"]:
    model_id = deployment["entity"]["asset"]["id"]
    deployment_id = deployment["metadata"]["id"]
    if deployment["metadata"]["name"] == deployment_name:
        print("Deleting deployment id", deployment_id)
        wml_client.deployments.delete(deployment_id)
        print("Deleting model id", model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()

------------------------------------  -------------------------------  ------------------------  ---------
ID                                    NAME                             CREATED                   TYPE
0c7e99a4-107f-471c-8dc0-78de5a9373d7  Spark German Risk Model - Final  2020-12-13T09:38:57.002Z  mllib_2.4
------------------------------------  -------------------------------  ------------------------  ---------
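The cleanup cell above selects and deletes in a single loop. One way to make it easier to review is to separate the selection step from the deletion step, as in the sketch below. It works on a plain dictionary shaped like the `get_details()` result used above (`resources`, `metadata`, `entity.asset.id`); the function name `find_stale_assets` and the sample IDs are hypothetical.

```python
def find_stale_assets(deployments_details, deployment_name):
    """Return (deployment_id, model_id) pairs for deployments with the given name."""
    stale = []
    for deployment in deployments_details.get("resources", []):
        if deployment["metadata"]["name"] == deployment_name:
            stale.append((deployment["metadata"]["id"],
                          deployment["entity"]["asset"]["id"]))
    return stale


# Made-up details in the same shape as the WML response used above.
sample_details = {
    "resources": [
        {"metadata": {"id": "dep-1", "name": "Adult Census Income Classifier Deployment"},
         "entity": {"asset": {"id": "model-1"}}},
        {"metadata": {"id": "dep-2", "name": "Some other deployment"},
         "entity": {"asset": {"id": "model-2"}}},
    ]
}

for deployment_id, model_id in find_stale_assets(
        sample_details, "Adult Census Income Classifier Deployment"):
    print("Would delete deployment", deployment_id, "and model", model_id)
```

Printing the pairs first gives you a chance to confirm that only this tutorial's assets are affected before calling the delete methods.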


In [22]:
training_data_references = [
                {
                    "id": "product line",
                    "type": "s3",
                    "connection": {
                        "access_key_id": COS_API_KEY_ID,
                        "endpoint_url": COS_ENDPOINT,
                        "resource_instance_id":COS_RESOURCE_CRN
                    },
                    "location": {
                        "bucket": BUCKET_NAME,
                        "path": FILE_NAME,
                    }
                }
            ]

In [23]:
software_spec_uid = wml_client.software_specifications.get_id_by_name("spark-mllib_2.4")
print("Software Specification ID: {}".format(software_spec_uid))
model_props = {
        wml_client._models.ConfigurationMetaNames.NAME:"{}".format(model_name),
        wml_client._models.ConfigurationMetaNames.SPACE_UID: WML_SPACE_ID,
        wml_client._models.ConfigurationMetaNames.TYPE: "mllib_2.4",
        wml_client._models.ConfigurationMetaNames.SOFTWARE_SPEC_UID: software_spec_uid,
        wml_client._models.ConfigurationMetaNames.TRAINING_DATA_REFERENCES: training_data_references,
        wml_client._models.ConfigurationMetaNames.LABEL_FIELD: "label",
    }

Software Specification ID: 390d21f8-e58b-4fac-9c55-d7ceda621326


In [24]:
print("Storing model ...")
published_model_details = wml_client.repository.store_model(
    model=model, 
    meta_props=model_props, 
    training_data=train_data, 
    pipeline=pipeline)

model_uid = wml_client.repository.get_model_uid(published_model_details)
print("Done")
print("Model ID: {}".format(model_uid))

Storing model ...
Done
Model ID: 4a04e27a-f19e-498a-bd51-65751e61debb


In [25]:
published_model_details

{'entity': {'label_column': 'label',
  'pipeline': {'id': '77a3d1a1-f89d-40c9-8968-e6e85ca13405'},
  'software_spec': {'id': '390d21f8-e58b-4fac-9c55-d7ceda621326',
   'name': 'spark-mllib_2.4'},
  'training_data_references': [{'connection': {'access_key_id': 'xviwlu_o6K7qmB8Kbi3tz6GW25Lmbbw9lKy8GWYzgIXW',
     'endpoint_url': 'https://s3.us.cloud-object-storage.appdomain.cloud',
     'resource_instance_id': 'crn:v1:bluemix:public:cloud-object-storage:global:a/e40741b27da5881193d18b40e6a3078d:30030db1-808f-4a80-8f70-2a85ce8948b8::'},
    'id': 'product line',
    'location': {'bucket': 'testcasebucket',
     'path': 'Indirect_bias_AdultCensusdata.csv'},
    'schema': {'fields': [{'metadata': {},
       'name': 'workclass',
       'nullable': True,
       'type': 'string'},
      {'metadata': {}, 'name': 'fnlwgt', 'nullable': True, 'type': 'integer'},
      {'metadata': {},
       'name': 'education',
       'nullable': True,
       'type': 'string'},
      {'metadata': {},
       'name

## Create a model deployment

In [26]:
deployment_details = wml_client.deployments.create(
    model_uid, 
    meta_props={
        wml_client.deployments.ConfigurationMetaNames.NAME: "{}".format(deployment_name),
        wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
)
scoring_url = wml_client.deployments.get_scoring_href(deployment_details)
deployment_uid=wml_client.deployments.get_uid(deployment_details)

print("Scoring URL:" + scoring_url)
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))



#######################################################################################

Synchronous deployment creation for uid: '4a04e27a-f19e-498a-bd51-65751e61debb' started

#######################################################################################


initializing...
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='207d8487-0df2-491f-94a2-fd398e985c23'
------------------------------------------------------------------------------------------------


Scoring URL:https://namespace1-cpd-namespace1.apps.islnov15.os.fyre.ibm.com/ml/v4/deployments/207d8487-0df2-491f-94a2-fd398e985c23/predictions
Model id: 4a04e27a-f19e-498a-bd51-65751e61debb
Deployment id: 207d8487-0df2-491f-94a2-fd398e985c23


# Construct the scoring payload

In [27]:
import pandas as pd

df = pd.read_csv("adult.csv")
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,Marital,occupation,relationship,race,sex,capitalgain,loss,hoursper,citizen_status,label
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


## Remove the sensitive attributes

In [28]:
cols_to_remove = ['label']
cols_to_remove.extend(protected_attributes)
cols_to_remove

['label', 'race', 'age', 'sex']

## Create the meta data frame capturing the sensitive data

In [29]:
meta_df = df[protected_attributes].copy()
meta_fields = meta_df.columns.tolist()
meta_values = meta_df[meta_fields].values.tolist()

## Construct the scoring payload comprising the meta fields

In [30]:
def get_scoring_payload(no_of_records_to_score = 1):
    meta_payload = {
        "fields": meta_fields,
        "values": meta_values[:no_of_records_to_score]
    }

    for col in cols_to_remove:
        if col in df.columns:
            del df[col] 

    fields = df.columns.tolist()
    values = df[fields].values.tolist()

    payload_scoring = {"input_data": [{"fields": fields, "values": values[:no_of_records_to_score],"meta": meta_payload}]}  
    return payload_scoring
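The function above deletes columns from the global `df` as a side effect. As an alternative, the sketch below builds the same payload shape (model features under `fields`/`values`, protected attributes under `meta`) without modifying its input. It uses plain dictionaries instead of a DataFrame so that it is self-contained; the function name `build_scoring_payload` is hypothetical, and the sample row is the first record of the dataset shown earlier.

```python
def build_scoring_payload(rows, protected, label="label", limit=1):
    """Build a scoring payload with protected attributes carried as meta fields."""
    if not rows:
        raise ValueError("rows must not be empty")
    excluded = set(protected) | {label}
    # Feature order follows the column order of the first row.
    fields = [name for name in rows[0] if name not in excluded]
    selected = rows[:limit]
    return {
        "input_data": [{
            "fields": fields,
            "values": [[row[name] for name in fields] for row in selected],
            "meta": {
                "fields": list(protected),
                "values": [[row[name] for name in protected] for row in selected],
            },
        }]
    }


first_row = {
    "age": 39, "workclass": "State-gov", "fnlwgt": 77516, "education": "Bachelors",
    "education-num": 13, "Marital": "Never-married", "occupation": "Adm-clerical",
    "relationship": "Not-in-family", "race": "White", "sex": "Male",
    "capitalgain": 2174, "loss": 0, "hoursper": 40,
    "citizen_status": "United-States", "label": "<=50K",
}

payload = build_scoring_payload([first_row], protected=["race", "age", "sex"])
print(payload["input_data"][0]["fields"])
print(payload["input_data"][0]["meta"])
```

Because the protected attributes travel only in `meta`, the model never receives them as features, while they remain available alongside the scoring request.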

## Method to perform scoring

In [31]:
def sample_scoring(no_of_records_to_score = 1):
    records_list=[]
    payload_scoring = get_scoring_payload(no_of_records_to_score)
    scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
    print('Single record scoring result:', '\n fields:', scoring_response['predictions'][0]['fields'], '\n values: ', scoring_response['predictions'][0]['values'][0])
    print(json.dumps(scoring_response, indent=None))
    return payload_scoring, scoring_response

In [32]:
import time  # needed for time.sleep() in payload_logging

from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord
def payload_logging(no_of_records_to_score = 1):
    records_list=[]
    payload_scoring = get_scoring_payload(no_of_records_to_score)
    
    
    scoring_response = wml_client.deployments.score(deployment_uid, payload_scoring)
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))
    if pl_records_count == 0:
        print("Payload logging did not happen, performing explicit payload logging.")
    
        #manual PL logging if automated logging does not work
        score_input=payload_scoring['input_data'][0]
        score_response=scoring_response['predictions'][0]
        pl_record = PayloadRecord(request=score_input, response=score_response, response_time=int(460))
        records_list.append(pl_record)
        wos_client.data_sets.store_records(data_set_id = payload_data_set_id, request_body=records_list)
        
        
        time.sleep(5)
        pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
        print("Number of records in the payload logging table: {}".format(pl_records_count))
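The function above waits a fixed five seconds before checking the payload logging table. A fixed sleep can be too short on a busy cluster and needlessly long on a fast one. The sketch below polls until a minimum record count is reached or a timeout expires. The helper `wait_for_records` and its parameters are hypothetical, and the counter in the example is a stand-in for a real call to the service.

```python
import time


def wait_for_records(get_count, minimum=1, timeout=30.0, interval=1.0, sleep=time.sleep):
    """Poll get_count() until it returns at least `minimum` or `timeout` seconds pass.

    Returns the last observed count, which may be below `minimum` on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = get_count()
        if count >= minimum or time.monotonic() >= deadline:
            return count
        sleep(interval)


# Stand-in for the service call: the records show up on the third poll.
observed = iter([0, 0, 3])
final_count = wait_for_records(
    lambda: next(observed),
    minimum=1,
    timeout=5.0,
    interval=0.0,
    sleep=lambda seconds: None,  # skip real sleeping in this demonstration
)
print("Records found:", final_count)
```

In the notebook, `get_count` could be `lambda: wos_client.data_sets.get_records_count(payload_data_set_id)`, with explicit payload logging performed only if the returned count is still zero.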

## Score the model and print the scoring response

In [33]:
sample_scoring(no_of_records_to_score = 1)

Single record scoring result: 
 fields: ['workclass', 'fnlwgt', 'education', 'education-num', 'Marital', 'occupation', 'relationship', 'capitalgain', 'loss', 'hoursper', 'citizen_status', 'workclass_IX', 'workclassclassVec', 'education_IX', 'educationclassVec', 'Marital_IX', 'MaritalclassVec', 'occupation_IX', 'occupationclassVec', 'relationship_IX', 'relationshipclassVec', 'citizen_status_IX', 'citizen_statusclassVec', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'] 
 values:  ['State-gov', 77516, 'Bachelors', 13, 'Never-married', 'Adm-clerical', 'Not-in-family', 2174, 0, 40, 'United-States', 4.0, [9, [4], [1.0]], 2.0, [16, [2], [1.0]], 1.0, [7, [1], [1.0]], 3.0, [15, [3], [1.0]], 1.0, [6, [1], [1.0]], 0.0, [41, [0], [1.0]], [99, [4, 11, 26, 35, 48, 53, 94, 95, 96, 98], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 77516.0, 13.0, 2174.0, 40.0]], [17.62390057081348, 2.376099429186522], [0.881195028540674, 0.11880497145932609], 0.0, '<=50K']
{"predictions": [{"fields": ["wo

({'input_data': [{'fields': ['workclass',
     'fnlwgt',
     'education',
     'education-num',
     'Marital',
     'occupation',
     'relationship',
     'capitalgain',
     'loss',
     'hoursper',
     'citizen_status'],
    'values': [['State-gov',
      77516,
      'Bachelors',
      13,
      'Never-married',
      'Adm-clerical',
      'Not-in-family',
      2174,
      0,
      40,
      'United-States']],
    'meta': {'fields': ['race', 'age', 'sex'],
     'values': [['White', 39, 'Male']]}}]},
 {'predictions': [{'fields': ['workclass',
     'fnlwgt',
     'education',
     'education-num',
     'Marital',
     'occupation',
     'relationship',
     'capitalgain',
     'loss',
     'hoursper',
     'citizen_status',
     'workclass_IX',
     'workclassclassVec',
     'education_IX',
     'educationclassVec',
     'Marital_IX',
     'MaritalclassVec',
     'occupation_IX',
     'occupationclassVec',
     'relationship_IX',
     'relationshipclassVec',
     'citizen_status_
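The scoring response lists column names under `fields` and one row per scored record under `values`. Looking values up by position is error-prone, so the sketch below pairs the two lists by name. The helper `first_prediction` is hypothetical, and the sample response is a trimmed-down version of the output above that keeps only a few of the columns.

```python
def first_prediction(scoring_response):
    """Return (predicted label, class probabilities) for the first scored record."""
    prediction = scoring_response["predictions"][0]
    record = dict(zip(prediction["fields"], prediction["values"][0]))
    return record["predictedLabel"], record["probability"]


# Trimmed sample: the real response contains many more columns.
sample_response = {
    "predictions": [{
        "fields": ["workclass", "probability", "prediction", "predictedLabel"],
        "values": [["State-gov", [0.881195028540674, 0.11880497145932609], 0.0, "<=50K"]],
    }]
}

label, probability = first_prediction(sample_response)
print("Predicted label:", label)
print("Highest class probability:", max(probability))
```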

# Configure OpenScale 

The notebook will now import the necessary libraries and set up a Python OpenScale client.

In [34]:
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.utils import *
from ibm_watson_openscale.supporting_classes import *
from ibm_watson_openscale.supporting_classes.enums import *

import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time

## Get an instance of the OpenScale SDK client

In [74]:
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator

authenticator = CloudPakForDataAuthenticator(
        url=WOS_CREDENTIALS['url'],
        username=WOS_CREDENTIALS['username'],
        password=WOS_CREDENTIALS['password'],
        disable_ssl_verification=True
    )

wos_client = APIClient(service_url=WOS_CREDENTIALS['url'],authenticator=authenticator)
wos_client.version

'3.0.3'

## Create datamart

### Set up datamart

Watson OpenScale uses a database to store payload logs and calculated metrics. If database credentials were not supplied above, the notebook will use the free, internal lite database. If database credentials were supplied, the datamart will be created there unless there is an existing datamart and the KEEP_MY_INTERNAL_POSTGRES variable is set to True. If an OpenScale datamart exists in Db2 or PostgreSQL, the existing datamart will be used and no data will be overwritten.

Prior instances of the model will be removed from OpenScale monitoring.

In [36]:
wos_client.data_marts.show()

0,1,2,3,4,5
AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000,Data Mart created by OpenScale ExpressPath,False,active,2020-11-11 17:10:35.029000+00:00,00000000-0000-0000-0000-000000000000


In [37]:
data_marts = wos_client.data_marts.list().result.data_marts
if len(data_marts) == 0:
    if DATABASE_CREDENTIALS is not None:
        if SCHEMA_NAME is None: 
            print("Please specify the SCHEMA_NAME and rerun the cell")

        print('Setting up external datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook",
                database_configuration=DatabaseConfigurationRequest(
                  database_type=DatabaseType.DB2,
                    credentials=PrimaryStorageCredentialsLong(
                        hostname=DATABASE_CREDENTIALS['hostname'],
                        username=DATABASE_CREDENTIALS['username'],
                        password=DATABASE_CREDENTIALS['password'],
                        db=DATABASE_CREDENTIALS['database'],
                        port=DATABASE_CREDENTIALS['port'],
                        # Optional SSL settings: .get() yields None when a key is absent
                        ssl=DATABASE_CREDENTIALS.get('ssl'),
                        sslmode=DATABASE_CREDENTIALS.get('sslmode'),
                        certificate_base64=DATABASE_CREDENTIALS.get('certificate_base64')
                    ),
                    location=LocationSchemaName(
                        schema_name= SCHEMA_NAME
                    )
                )
             ).result
    else:
        print('Setting up internal datamart')
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook", 
                internal_database = True).result
        
    data_mart_id = added_data_mart_result.metadata.id
    
else:
    data_mart_id=data_marts[0].metadata.id
    print('Using existing datamart {}'.format(data_mart_id))

Using existing datamart 00000000-0000-0000-0000-000000000000


In [38]:
data_mart_details = wos_client.data_marts.list().result.data_marts[0]
data_mart_details.to_dict()

{'metadata': {'id': '00000000-0000-0000-0000-000000000000',
  'crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:data_mart:00000000-0000-0000-0000-000000000000',
  'url': '/v2/data_marts/00000000-0000-0000-0000-000000000000',
  'created_at': '2020-11-11T17:10:35.029000Z',
  'created_by': 'admin'},
 'entity': {'name': 'AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000',
  'description': 'Data Mart created by OpenScale ExpressPath',
  'service_instance_crn': 'N/A',
  'internal_database': False,
  'database_configuration': {'database_type': 'db2',
   'credentials': {'secret_id': '73d1317b-582e-443a-9a7c-2a341bd3a4e5'},
   'location': {'schema_name': 'AIOSFASTPATHICP-00000000-0000-0000-0000-000000000000'}},
  'status': {'state': 'active'}}}

In [39]:
wos_client.service_providers.show()

0,1,2,3,4,5
,active,Custom Batch Provider - SDK - PG,custom_machine_learning,2021-01-19 08:45:57.468000+00:00,b7812f63-355a-4ce3-af51-f6fb3028d242
,active,Custom Batch - PG,custom_machine_learning,2021-01-12 07:16:31.985000+00:00,29d3faa4-e5da-42c2-b243-a365835895a8
99999999-9999-9999-9999-999999999999,active,Watson Machine Learning V2,watson_machine_learning,2020-12-13 09:41:38.879000+00:00,ec2338e5-fc15-418f-8f78-58336a446ec6
99999999-9999-9999-9999-999999999999,active,WML Prod,watson_machine_learning,2020-12-11 09:42:32.318000+00:00,6899581b-29ee-447e-a826-a446bfd88165
99999999-9999-9999-9999-999999999999,active,WML Pre-Prod,watson_machine_learning,2020-12-11 09:32:49.108000+00:00,2d663259-283e-43f1-910c-6ff02e3663cd
99999999-9999-9999-9999-999999999999,active,wml_preprod_ppm,watson_machine_learning,2020-12-09 11:34:30.151000+00:00,1384cc62-e451-4caa-a31a-de556d9689bb
99999999-9999-9999-9999-999999999999,active,WMP Prod PPM,watson_machine_learning,2020-12-08 17:22:06.497000+00:00,4d0270d1-942f-406f-a86f-e929db2cbc3d
,active,Custom Batch Provider - SDK - Drift - PP2,custom_machine_learning,2020-11-24 11:53:50.954000+00:00,5a430a28-c27f-4c0c-9f35-3a52808df349
,active,Batch - prv,custom_machine_learning,2020-11-23 18:00:46.686000+00:00,c3d8a5ee-7470-48f0-a859-09757116b477
,active,Custom Batch Provider - SDK - Drift - HS,custom_machine_learning,2020-11-20 18:04:29.020000+00:00,6ba3ea28-5e19-4513-b3be-b49bbf0f8497


Note: First 10 records were displayed.


## Remove the existing service provider connected to the WML instance

Watson OpenScale allows multiple service providers for the same engine instance. To avoid accumulating duplicate service providers for the WML instance used in this tutorial, the following code deletes any existing service provider with the same name and then adds a new one.

In [40]:
SERVICE_PROVIDER_NAME = "Watson Machine Learning - Indirect Bias Demo"
SERVICE_PROVIDER_DESCRIPTION = "Added by tutorial WOS notebook to showcase Indirect Bias functionality."

In [41]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == SERVICE_PROVIDER_NAME:
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

## Add service provider

Watson OpenScale needs to be bound to the Watson Machine Learning instance to capture the payload data going into and out of the model.
Note: You can bind more than one engine instance if needed by calling the wos_client.service_providers.add method. You can then refer to a particular service provider by its service_provider_id.

In [42]:
added_service_provider_result = wos_client.service_providers.add(
        name=SERVICE_PROVIDER_NAME,
        description=SERVICE_PROVIDER_DESCRIPTION,
        service_type=ServiceTypes.WATSON_MACHINE_LEARNING,
        deployment_space_id = WML_SPACE_ID,
        operational_space_id = "production",
        credentials=WMLCredentialsCP4D(
            url=WML_CREDENTIALS["url"],
            username=WML_CREDENTIALS["username"],
            password=WML_CREDENTIALS["password"],
            instance_id=None
        ),
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id




 Waiting for end of adding service provider e5fcf249-96a2-43f6-9f88-31308f59e7b6 




active

-----------------------------------------------
 Successfully finished adding service provider 
-----------------------------------------------




In [43]:
print(wos_client.service_providers.get(service_provider_id).result)

{
  "metadata": {
    "id": "e5fcf249-96a2-43f6-9f88-31308f59e7b6",
    "crn": "crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:service_provider:e5fcf249-96a2-43f6-9f88-31308f59e7b6",
    "url": "/v2/service_providers/e5fcf249-96a2-43f6-9f88-31308f59e7b6",
    "created_at": "2021-01-23T06:55:08.824000Z",
    "created_by": "admin"
  },
  "entity": {
    "name": "Watson Machine Learning - Indirect Bias Demo",
    "service_type": "watson_machine_learning",
    "instance_id": "99999999-9999-9999-9999-999999999999",
    "credentials": {
      "secret_id": "5d511bc2-7fa5-4a35-94a5-fd03808c10dc"
    },
    "operational_space_id": "production",
    "deployment_space_id": "fe8fd396-0fa3-4c4d-ad47-703ef3728b60",
    "status": {
      "state": "active"
    }
  }
}


In [44]:
asset_deployment_details = wos_client.service_providers.list_assets(data_mart_id=data_mart_id, service_provider_id=service_provider_id, deployment_id=deployment_uid, deployment_space_id = WML_SPACE_ID).result['resources'][0]
asset_deployment_details

{'metadata': {'guid': '207d8487-0df2-491f-94a2-fd398e985c23',
  'created_at': '2021-01-23T06:54:18.560Z',
  'modified_at': '2021-01-23T06:54:18.560Z'},
 'entity': {'name': 'Adult Census Income Classifier Deployment',
  'type': 'online',
  'scoring_endpoint': {'url': 'https://ibm-nginx-svc.namespace1.svc.cluster.local/ml/v4/deployments/207d8487-0df2-491f-94a2-fd398e985c23/predictions'},
  'asset': {},
  'asset_properties': {}}}

In [45]:
model_asset_details_from_deployment=wos_client.service_providers.get_deployment_asset(data_mart_id=data_mart_id,service_provider_id=service_provider_id,deployment_id=deployment_uid,deployment_space_id=WML_SPACE_ID)
#model_asset_details_from_deployment

## Subscriptions

Remove existing subscriptions for this model

The cells below list the current subscriptions and then remove any previous subscriptions to the model, so that the monitors are recreated against the new model and new data.

In [46]:
wos_client.subscriptions.show()

0,1,2,3,4,5,6,7,8
fdcb69fb-6032-4fdc-880f-6f024e00433d,GoSales - multiclass - SDK - PG,00000000-0000-0000-0000-000000000000,20dc504a-c6ad-4646-98fb-51045097686e,GoSales - multiclass - SDK - PG,b7812f63-355a-4ce3-af51-f6fb3028d242,active,2021-01-19 08:48:47.407000+00:00,9ea31c07-0161-4087-9502-af597229355d
8b94a1ec-ded3-4487-97f3-0f941cc576dc,[asset] GoSales multiclass - PG,00000000-0000-0000-0000-000000000000,40ecf801-9996-430c-89f2-5c97eda52241,GoSales multiclass - PG,29d3faa4-e5da-42c2-b243-a365835895a8,active,2021-01-12 07:17:07.877000+00:00,63ff1fde-a1f3-4404-8259-971c10f03fe3
0c7e99a4-107f-471c-8dc0-78de5a9373d7,Spark German Risk Model - Final,00000000-0000-0000-0000-000000000000,c2758570-f946-409a-9328-03f93da30a09,Spark German Risk Deployment - Final,ec2338e5-fc15-418f-8f78-58336a446ec6,active,2020-12-13 09:43:44.868000+00:00,f76aa235-f5a9-48a2-8dab-83e34d862a7a
5ff591bb-ae38-40e4-80ef-7268967b3843,German Credit Risk Model - Prod,00000000-0000-0000-0000-000000000000,2d32757b-7b42-436d-9a73-7a3a92d6da59,German Credit Risk Model - Prod,6899581b-29ee-447e-a826-a446bfd88165,active,2020-12-11 11:42:49.352000+00:00,85fad40d-8827-4411-be28-638f6eacc417
d0686174-9d72-4720-acfe-2ab3e6a945e8,German Credit Risk Model - Challenger,00000000-0000-0000-0000-000000000000,946aada8-e49a-4cbd-ba26-37e4d2c2842e,German Credit Risk Model - Challenger,2d663259-283e-43f1-910c-6ff02e3663cd,preparing,2020-12-11 09:33:12.381000+00:00,f639a386-e78c-4ba1-9340-5c0de992d11c
018101d2-d88f-49bb-9c97-399781be13a9,German Credit Risk Model - PreProd,00000000-0000-0000-0000-000000000000,134cd694-84bb-4964-9992-800f5e5a76bd,German Credit Risk Model - PreProd,2d663259-283e-43f1-910c-6ff02e3663cd,active,2020-12-11 09:33:07.481000+00:00,4352df0c-50d6-483e-8b51-d0a490f82cc4
21fa5841-f832-4c46-b9f5-7b26e3d79b65,AdultCensusAutoAI-Model,00000000-0000-0000-0000-000000000000,e593b337-30f2-4b13-8d60-46b61b56bfb3,AdultCensusAutoAI-Deployment,1384cc62-e451-4caa-a31a-de556d9689bb,active,2020-12-09 11:35:11.374000+00:00,5cd97e05-a27c-41f3-a381-a39e8d40dd77
7c475da6-5659-4753-bebc-d66b27119018,Bank Marketing - SDK - Drift - PP,00000000-0000-0000-0000-000000000000,e17b2fa6-9b6c-4ad3-b7b4-8fbd7ccea242,Bank Marketing - SDK - Drift - PP,5a430a28-c27f-4c0c-9f35-3a52808df349,active,2020-11-24 11:54:13.706000+00:00,cec81a2d-b196-4f7c-9e2e-3fc41dc4523d
fb4d2605-d07c-40ff-a5cf-c4821a5303bf,[asset] Housing pred - prv,00000000-0000-0000-0000-000000000000,97862626-7c1e-4269-a8ec-2a85fdf4b19d,Housing pred - prv,c3d8a5ee-7470-48f0-a859-09757116b477,active,2020-11-23 18:03:02.448000+00:00,ed4408bd-c0b1-4628-a7fe-bbdb6b8fe28f
9e62727a-5e7d-4bae-a157-339afddf21ff,MushroomsBatch - SDK - Drift - HS,00000000-0000-0000-0000-000000000000,63cf05df-a0d5-4154-a8cf-df998f42c987,MushroomsBatch - SDK - Drift - HS,6ba3ea28-5e19-4513-b3be-b49bbf0f8497,active,2020-11-20 18:20:55.300000+00:00,2ea105f5-3600-41b1-99e1-7bb12714cfda


Note: First 10 records were displayed.


## Remove the existing subscription

In [47]:
subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    sub_model_id = subscription.entity.asset.asset_id
    if sub_model_id == model_uid:
        wos_client.subscriptions.delete(subscription.metadata.id)
        print('Deleted existing subscription for model', sub_model_id)
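Because deletion cannot be undone, it can help to separate the matching step from the delete call and inspect the result first. The sketch below is only an illustration: it works on plain dictionaries shaped like the subscription JSON shown in this notebook rather than on the SDK objects, and the helper name `find_subscriptions_for_model` and the sample IDs are hypothetical.

```python
from typing import Dict, List


def find_subscriptions_for_model(subscriptions: List[Dict], model_id: str) -> List[str]:
    """Return the IDs of all subscriptions whose asset_id equals model_id."""
    return [
        sub["metadata"]["id"]
        for sub in subscriptions
        if sub.get("entity", {}).get("asset", {}).get("asset_id") == model_id
    ]


# Hypothetical sample data; real entries would come from the subscriptions list call.
sample_subscriptions = [
    {"metadata": {"id": "sub-1"}, "entity": {"asset": {"asset_id": "model-a"}}},
    {"metadata": {"id": "sub-2"}, "entity": {"asset": {"asset_id": "model-b"}}},
    {"metadata": {"id": "sub-3"}, "entity": {"asset": {"asset_id": "model-a"}}},
]

matching_ids = find_subscriptions_for_model(sample_subscriptions, "model-a")
print(matching_ids)
```

Reviewing `matching_ids` before deleting makes it easier to confirm that only subscriptions for the intended model are affected.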

The following cells create the model subscription in OpenScale using the Python client API. Note that we need to provide the model's unique identifier and some information about the model itself.

In [48]:
feature_columns = cat_features + num_features
feature_columns

['workclass',
 'education',
 'Marital',
 'occupation',
 'relationship',
 'citizen_status',
 'fnlwgt',
 'education-num',
 'capitalgain',
 'loss',
 'hoursper']

In [49]:
subscription_details = wos_client.subscriptions.add(
        data_mart_id=data_mart_id,
        service_provider_id=service_provider_id,
        asset=Asset(
            asset_id=model_asset_details_from_deployment["entity"]["asset"]["asset_id"],
            name=model_asset_details_from_deployment["entity"]["asset"]["name"],
            url=model_asset_details_from_deployment["entity"]["asset"]["url"],
            asset_type=AssetTypes.MODEL,
            input_data_type=InputDataType.STRUCTURED,
            problem_type=ProblemType.BINARY_CLASSIFICATION
        ),
        deployment=AssetDeploymentRequest(
            deployment_id=asset_deployment_details['metadata']['guid'],
            name=asset_deployment_details['entity']['name'],
            deployment_type= DeploymentTypes.ONLINE,
            url=asset_deployment_details['entity']['scoring_endpoint']['url']
        ),
        asset_properties=AssetPropertiesRequest(
            label_column="label",
            probability_fields=["probability"],
            prediction_field="predictedLabel",
            feature_fields = feature_columns,
            categorical_fields = cat_features,
            training_data_reference=TrainingDataReference(type="cos",
                                                          location=COSTrainingDataReferenceLocation(bucket = BUCKET_NAME,
                                                                                                    file_name = FILE_NAME),
                                                          connection=COSTrainingDataReferenceConnection.from_dict({
                                                                        "resource_instance_id": COS_RESOURCE_CRN,
                                                                        "url": COS_ENDPOINT,
                                                                        "api_key": COS_API_KEY_ID,
                                                                        "iam_url": IAM_URL})),
            training_data_schema=SparkStruct.from_dict(model_asset_details_from_deployment["entity"]["asset_properties"]["training_data_schema"])
        )
    ).result
subscription_id = subscription_details.metadata.id
print('subscription_id: ' + subscription_id)

subscription_id: 6fce507f-9513-4309-a98f-e8049c8e8d57


In [50]:
import time

time.sleep(5)
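# Look up the payload logging data set that OpenScale created for this subscription.
payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING,
                                              target_target_id=subscription_id,
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
# Guard against an empty list; indexing [0] directly would raise an IndexError
# before the "not found" message below could ever be printed.
payload_data_set_id = payload_data_sets[0].metadata.id if payload_data_sets else None
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id:", payload_data_set_id)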

Payload data set id: 6b0ff0b8-e1e7-444b-8cad-bf3aa351c78b


In [51]:
wos_client.data_sets.show()

0,1,2,3,4,5,6
00000000-0000-0000-0000-000000000000,active,6fce507f-9513-4309-a98f-e8049c8e8d57,subscription,manual_labeling,2021-01-23 06:55:55.737000+00:00,7df49537-bf69-4466-b8a3-ade93eae4743
00000000-0000-0000-0000-000000000000,active,6fce507f-9513-4309-a98f-e8049c8e8d57,subscription,payload_logging,2021-01-23 06:55:55.374000+00:00,6b0ff0b8-e1e7-444b-8cad-bf3aa351c78b
00000000-0000-0000-0000-000000000000,active,9ea31c07-0161-4087-9502-af597229355d,subscription,training,2021-01-19 08:48:51.869000+00:00,161b6ca0-6b9a-4014-b187-ccd13d72440e
00000000-0000-0000-0000-000000000000,active,9ea31c07-0161-4087-9502-af597229355d,subscription,manual_labeling,2021-01-19 08:48:51.668000+00:00,721cac26-fc6e-4c91-8c6a-fee93cfc57fa
00000000-0000-0000-0000-000000000000,active,9ea31c07-0161-4087-9502-af597229355d,subscription,payload_logging,2021-01-19 08:48:51.438000+00:00,818fec36-8a7a-4a7c-85ce-21ace927ef0d
00000000-0000-0000-0000-000000000000,active,63ff1fde-a1f3-4404-8259-971c10f03fe3,subscription,manual_labeling,2021-01-12 07:17:11.715000+00:00,6907e3a8-9d28-4f32-8b01-2ee884c9c117
00000000-0000-0000-0000-000000000000,active,63ff1fde-a1f3-4404-8259-971c10f03fe3,subscription,payload_logging,2021-01-12 07:17:11.435000+00:00,b1ee96ac-171a-44b2-8cc6-77d808cec042
00000000-0000-0000-0000-000000000000,active,63ff1fde-a1f3-4404-8259-971c10f03fe3,subscription,training,2021-01-12 07:17:11.871000+00:00,932d6f62-db41-48e8-ac6a-f905e448c977
00000000-0000-0000-0000-000000000000,active,f76aa235-f5a9-48a2-8dab-83e34d862a7a,subscription,manual_labeling,2020-12-13 09:43:53.518000+00:00,645d871d-1f46-4f4d-9d97-00f0ec3b378d
00000000-0000-0000-0000-000000000000,active,f76aa235-f5a9-48a2-8dab-83e34d862a7a,subscription,payload_logging,2020-12-13 09:43:53.052000+00:00,f4beb6d0-d5cf-44e2-85d3-1d5059330d76


Note: First 10 records were displayed.


In [52]:
wos_client.subscriptions.get(subscription_id).result.to_dict()

{'metadata': {'id': '6fce507f-9513-4309-a98f-e8049c8e8d57',
  'crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:subscription:6fce507f-9513-4309-a98f-e8049c8e8d57',
  'url': '/v2/subscriptions/6fce507f-9513-4309-a98f-e8049c8e8d57',
  'created_at': '2021-01-23T06:55:51.077000Z',
  'created_by': 'admin'},
 'entity': {'data_mart_id': '00000000-0000-0000-0000-000000000000',
  'service_provider_id': 'e5fcf249-96a2-43f6-9f88-31308f59e7b6',
  'asset': {'asset_id': '4a04e27a-f19e-498a-bd51-65751e61debb',
   'url': 'https://ibm-nginx-svc.namespace1.svc.cluster.local/ml/v4/models/4a04e27a-f19e-498a-bd51-65751e61debb?space_id=fe8fd396-0fa3-4c4d-ad47-703ef3728b60&version=2020-06-12',
   'name': 'Adult Census Income Classifier Model',
   'asset_type': 'model',
   'problem_type': 'binary',
   'input_data_type': 'structured'},
  'asset_properties': {'training_data_reference': {'secret_id': 'd2008389-eed8-4dbe-89c5-69f6d057c0f0'},
   'training_data_schema': {'

# Score the model so we can configure monitors

Now that the WML service has been bound and the subscription has been created, we need to send a request to the model before we configure OpenScale. This allows OpenScale to create a payload log in the datamart with the correct schema, so it can capture data coming into and out of the model.
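The scoring requests in this notebook use a `fields`/`values` layout nested under `input_data`. As a rough sketch of how such a payload could be assembled from row dictionaries, the hypothetical helper `build_scoring_payload` below is for illustration only; the notebook's own scoring functions are defined elsewhere and may differ.

```python
from typing import Any, Dict, List


def build_scoring_payload(records: List[Dict[str, Any]], fields: List[str]) -> Dict[str, Any]:
    """Arrange row dictionaries into the fields/values layout used for scoring."""
    # Each row is emitted in the same column order as `fields`.
    values = [[record[name] for name in fields] for record in records]
    return {"input_data": [{"fields": fields, "values": values}]}


# A shortened, illustrative subset of the feature columns.
example_fields = ["workclass", "fnlwgt", "education"]
example_records = [
    {"workclass": "State-gov", "fnlwgt": 77516, "education": "Bachelors"},
    {"workclass": "Private", "fnlwgt": 215646, "education": "HS-grad"},
]

example_payload = build_scoring_payload(example_records, example_fields)
print(example_payload)
```

Keeping the column order fixed by `fields` matters because the payload log schema is derived from the first requests the deployment receives.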

In [54]:
payload_logging(no_of_records_to_score = 1000)

Number of records in the payload logging table: 2000


In [55]:
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    raise Exception("Payload logging did not happen!")

Number of records in the payload logging table: 2000


## Fairness configuration

The code below configures fairness monitoring for our model. It turns on monitoring for two features, sex and age. In each case, we must specify:

- Which model feature to monitor
- One or more majority groups, which are values of that feature that we expect to receive a higher percentage of favourable outcomes
- One or more minority groups, which are values of that feature that we expect to receive a higher percentage of unfavourable outcomes
- The threshold below which OpenScale should display an alert for the fairness measurement (in this case, 80%)

Additionally, we must specify which outcomes from the model are favourable and which are unfavourable. We must also provide the number of records OpenScale will use to calculate the fairness score. In this case, OpenScale's fairness monitor will run hourly, but will not calculate a new fairness rating until at least 1,000 records have been added (the `min_records` value below). Finally, to calculate fairness, OpenScale must perform some calculations on the training data, which is why the subscription includes a training data reference.
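To make the 80% threshold concrete, here is a small worked example. It assumes the fairness value is a disparate impact ratio, that is, the minority group's rate of favourable outcomes divided by the majority group's rate, expressed as a percentage. The function names and sample counts are hypothetical, and OpenScale's actual calculation (which can also involve data perturbation) may differ.

```python
from typing import Iterable, Set


def favourable_rate(outcomes: Iterable[str], favourable: Set[str]) -> float:
    """Fraction of outcomes that belong to the favourable class."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one outcome is required")
    return sum(1 for outcome in outcomes if outcome in favourable) / len(outcomes)


def fairness_value(minority_outcomes, majority_outcomes, favourable: Set[str]) -> float:
    """Disparate impact ratio as a percentage (assumed definition, see lead-in)."""
    majority_rate = favourable_rate(majority_outcomes, favourable)
    if majority_rate == 0:
        raise ValueError("majority group has no favourable outcomes")
    return 100.0 * favourable_rate(minority_outcomes, favourable) / majority_rate


favourable_class = {">50K"}
# Hypothetical counts: 40% of the majority and 10% of the minority get ">50K".
majority_outcomes = [">50K"] * 40 + ["<=50K"] * 60
minority_outcomes = [">50K"] * 10 + ["<=50K"] * 90

score = fairness_value(minority_outcomes, majority_outcomes, favourable_class)
print("fairness value:", round(score, 1), "| below 80% threshold:", score < 80)
```

Under this definition, a value above 100 means the minority group receives favourable outcomes at a higher rate than the majority group.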

### Create Fairness Monitor Instance

In [56]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
parameters = {
    "features": [
        {
            "feature": "sex",
            "majority": ["Male"],
            "minority": ["Female"]
        },
        {
            "feature": "age",
            "majority": [[41,75]],
            "minority": [[18,33]]
        }
    ],
    "favourable_class": [">50K"],
    "unfavourable_class": ["<=50K"],
    "min_records": 1000
}
thresholds = [
    {
        "metric_id": "fairness_value",
        "specific_values": [
            {
                "applies_to": [
                    {
                        "type": "tag",
                        "value": "sex",
                        "key": "feature"
                    }
                ],
                "value": 80
            },
            {
                "applies_to": [
                    {
                        "type": "tag",
                        "value": "age",
                        "key": "feature"
                    }
                ],
                "value": 80
            }
        ],
        "type": "lower_limit",
        "value": 80
    }
]
fairness_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.FAIRNESS.ID,
    target=target,
    parameters=parameters,
    thresholds=thresholds
).result
fairness_monitor_instance_id =fairness_monitor_details.metadata.id




 Waiting for end of monitor instance creation e6d59bdc-4624-4985-bcf3-3f8078f7c11b 




active

---------------------------------------
 Monitor instance successfully created 
---------------------------------------
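The `thresholds` structure above repeats the same block for each monitored feature. If more features are added later, it may be easier to generate it. The helper below is a hypothetical convenience, not part of the OpenScale SDK, and simply reproduces the dictionary layout used in the cell above.

```python
from typing import Any, Dict, List


def build_fairness_thresholds(features: List[str], lower_limit: float) -> List[Dict[str, Any]]:
    """Build a lower-limit fairness threshold with one tagged entry per feature."""
    specific_values = [
        {
            "applies_to": [{"type": "tag", "value": feature, "key": "feature"}],
            "value": lower_limit,
        }
        for feature in features
    ]
    return [
        {
            "metric_id": "fairness_value",
            "specific_values": specific_values,
            "type": "lower_limit",
            "value": lower_limit,
        }
    ]


generated_thresholds = build_fairness_thresholds(["sex", "age"], 80)
print(generated_thresholds)
```

The result has the same shape as the hand-written `thresholds` list, so it could be passed in its place.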




### Get Fairness Monitor Instance

In [57]:
wos_client.monitor_instances.show()

0,1,2,3,4,5,6
00000000-0000-0000-0000-000000000000,active,6fce507f-9513-4309-a98f-e8049c8e8d57,subscription,fairness,2021-01-23 06:56:55.383000+00:00,e6d59bdc-4624-4985-bcf3-3f8078f7c11b
00000000-0000-0000-0000-000000000000,active,f76aa235-f5a9-48a2-8dab-83e34d862a7a,subscription,drift,2020-12-13 09:46:07.540000+00:00,dfb4a22d-d751-4f42-a1c8-3d3026864841
00000000-0000-0000-0000-000000000000,active,f76aa235-f5a9-48a2-8dab-83e34d862a7a,subscription,fairness,2020-12-13 09:45:52.485000+00:00,ab69f86b-363f-4e47-9bdd-604baed41ca7
00000000-0000-0000-0000-000000000000,active,85fad40d-8827-4411-be28-638f6eacc417,subscription,fairness,2020-12-11 11:44:02.827000+00:00,0f32f755-7317-49b8-867f-be86d8bdc0a2
00000000-0000-0000-0000-000000000000,active,e7d2fc9b-fcc3-4166-8290-aed1e063624e,subscription,drift,2020-11-12 05:27:39.768000+00:00,9a58b197-8dfc-4f2e-9d15-6cfb4162fca9
00000000-0000-0000-0000-000000000000,active,2d8888b2-c9ff-48b9-bf5f-fe1f3d59f804,subscription,fairness,2020-11-11 17:20:19.510000+00:00,4293d892-c874-4b73-bf2b-52ef77bf41fd
00000000-0000-0000-0000-000000000000,active,cec81a2d-b196-4f7c-9e2e-3fc41dc4523d,subscription,drift,2020-11-24 11:55:38.824000+00:00,4568ae09-80ec-4eb0-a9d6-a4a898a0e12d
00000000-0000-0000-0000-000000000000,active,85fad40d-8827-4411-be28-638f6eacc417,subscription,drift,2020-12-11 11:44:01.346000+00:00,16ca2ce5-818c-432d-9d2b-077c3fa88e01
00000000-0000-0000-0000-000000000000,active,2d8888b2-c9ff-48b9-bf5f-fe1f3d59f804,subscription,drift,2020-11-11 17:20:47.414000+00:00,18b628c9-f549-43fb-a1ce-5161e369aeac
00000000-0000-0000-0000-000000000000,active,8bfd7c1c-acfb-44fe-a53f-fb58dc21c9de,subscription,drift,2020-11-12 13:10:02.627000+00:00,f188c3e3-7515-4f24-9326-acf31178df32


Note: First 10 records were displayed.


### Get run details
For a production subscription, the initial monitoring run is triggered internally. The cell below checks its status.

In [58]:
runs = wos_client.monitor_instances.list_runs(fairness_monitor_instance_id, limit=1).result.to_dict()
fairness_monitoring_run_id = runs["runs"][0]["metadata"]["id"]
run_status = None
while(run_status not in ["finished", "error"]):
    run_details = wos_client.monitor_instances.get_run_details(fairness_monitor_instance_id, fairness_monitoring_run_id).result.to_dict()
    run_status = run_details["entity"]["status"]["state"]
    print('run_status: ', run_status)
    if run_status in ["finished", "error"]:
        break
    time.sleep(10)

run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  running
run_status:  finished
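The polling loop above runs until the run reaches a terminal state, which could take a long time if a run stalls. One possible variation is a polling helper with a timeout. The sketch below is generic: `wait_for_state` and its parameters are hypothetical names, and the demonstration uses a fake state source instead of calling OpenScale.

```python
import time
from typing import Callable, Sequence


def wait_for_state(
    get_state: Callable[[], str],
    terminal_states: Sequence[str] = ("finished", "error"),
    interval_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("last observed state: {}".format(state))
        time.sleep(interval_seconds)


# Demonstration with a fake sequence of states instead of a real monitoring run.
fake_states = iter(["running", "running", "finished"])
final_state = wait_for_state(lambda: next(fake_states), interval_seconds=0.0)
print("final state:", final_state)
```

In the notebook, `get_state` would wrap the `get_run_details` call shown above and return `run_details["entity"]["status"]["state"]`.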


### Fairness run output

In [59]:
wos_client.monitor_instances.get_run_details(fairness_monitor_instance_id, fairness_monitoring_run_id).result.to_dict()

{'metadata': {'id': '7ec510bd-2751-4a42-92cd-c1d3cfb20285',
  'crn': 'crn:v1:bluemix:public:aiopenscale:us-south:a/na:00000000-0000-0000-0000-000000000000:run:7ec510bd-2751-4a42-92cd-c1d3cfb20285',
  'url': '/v2/monitor_instances/e6d59bdc-4624-4985-bcf3-3f8078f7c11b/runs/7ec510bd-2751-4a42-92cd-c1d3cfb20285',
  'created_at': '2021-01-23T06:56:59.826000Z',
  'created_by': 'internal-service'},
 'entity': {'triggered_by': 'user',
  'parameters': {'is_group_bias_completed': True},
  'status': {'state': 'finished',
   'queued_at': '2021-01-23T06:56:59.771000Z',
   'started_at': '2021-01-23T06:57:03.034000Z',
   'updated_at': '2021-01-23T06:59:28.435000Z',
   'completed_at': '2021-01-23T06:59:28.174000Z',
   'message': 'bias run is successful.',
   'operators': []}}}

In [60]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=fairness_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2021-01-23 06:57:13.579645+00:00,fairness_value,1b3b7cb1-0b3a-4ce7-8da4-d7cc095fd1d8,27.4,80.0,,"['feature:sex', 'fairness_metric_type:debiased_fairness', 'feature_value:Female']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,7ec510bd-2751-4a42-92cd-c1d3cfb20285,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
2021-01-23 06:57:13.579645+00:00,fairness_value,1b3b7cb1-0b3a-4ce7-8da4-d7cc095fd1d8,132.4,80.0,,"['feature:age', 'fairness_metric_type:debiased_fairness', 'feature_value:18-33']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,7ec510bd-2751-4a42-92cd-c1d3cfb20285,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
2021-01-23 06:57:13.579645+00:00,fairness_value,82c2fd69-d80c-42c9-af86-1a706881a699,21.9,80.0,,"['feature:sex', 'fairness_metric_type:fairness', 'feature_value:Female']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,7ec510bd-2751-4a42-92cd-c1d3cfb20285,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
2021-01-23 06:57:13.579645+00:00,fairness_value,82c2fd69-d80c-42c9-af86-1a706881a699,51.3,80.0,,"['feature:age', 'fairness_metric_type:fairness', 'feature_value:18-33']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,7ec510bd-2751-4a42-92cd-c1d3cfb20285,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
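The metrics table is easier to read once the rows are compared against their thresholds. The sketch below copies the four values from the output above into a small data structure and lists those that fall below the lower limit. It assumes the two unlabeled numeric columns are the metric value and its lower limit; the `FairnessMetric` type is hypothetical.

```python
from typing import List, NamedTuple


class FairnessMetric(NamedTuple):
    feature: str
    metric_type: str
    value: float
    lower_limit: float


def find_violations(metrics: List[FairnessMetric]) -> List[FairnessMetric]:
    """Return the metrics whose value is below the configured lower limit."""
    return [metric for metric in metrics if metric.value < metric.lower_limit]


# Values transcribed from the metrics output of the initial run.
initial_run_metrics = [
    FairnessMetric("sex", "debiased_fairness", 27.4, 80.0),
    FairnessMetric("age", "debiased_fairness", 132.4, 80.0),
    FairnessMetric("sex", "fairness", 21.9, 80.0),
    FairnessMetric("age", "fairness", 51.3, 80.0),
]

for metric in find_violations(initial_run_metrics):
    print("{} ({}): {} is below {}".format(
        metric.feature, metric.metric_type, metric.value, metric.lower_limit))
```

With these values, only the debiased fairness score for age clears the 80% limit.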


In [61]:
FAIRNESS_DASHBOARD_URL = WOS_CREDENTIALS["url"] + "/aiopenscale/insights/{0}/fairness/age?features=fairnessv2,indirect_bias,v2transaction".format(deployment_uid)

In [62]:
from IPython.display import Markdown as md
md("#### Link to IBM Watson OpenScale Fairness Dashboard: {}".format(FAIRNESS_DASHBOARD_URL))

#### Link to IBM Watson OpenScale Fairness Dashboard: https://namespace1-cpd-namespace1.apps.islnov15.os.fyre.ibm.com/aiopenscale/insights/207d8487-0df2-491f-94a2-fd398e985c23/fairness/age?features=fairnessv2,indirect_bias,v2transaction

### Run on-demand Fairness
If you would like to perform an on-demand fairness check, first score a fresh set of data that includes the meta fields, so that they can be used for indirect bias checking. The two cells below score the records and confirm that they have reached the payload logging table.

In [63]:
payload_logging(no_of_records_to_score = 1000)

Number of records in the payload logging table: 3000


In [64]:
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
if pl_records_count == 0:
    raise Exception("Payload logging did not happen!")

Number of records in the payload logging table: 3000


### Trigger fairness monitoring run

In [65]:
run_details = wos_client.monitor_instances.run(monitor_instance_id=fairness_monitor_instance_id, background_mode=False)




 Waiting for end of monitoring run 3a046258-085b-4caa-b222-1147d4417ace 




running..................
finished

---------------------------
 Successfully finished run 
---------------------------




### Check its status

In [66]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=fairness_monitor_instance_id)

0,1,2,3,4,5,6,7,8,9,10,11
2021-01-23 07:01:18.415500+00:00,fairness_value,f8f27e84-7a7b-4751-9034-30eb96340f01,21.9,80.0,,"['feature:sex', 'fairness_metric_type:fairness', 'feature_value:Female']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,3a046258-085b-4caa-b222-1147d4417ace,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
2021-01-23 07:01:18.415500+00:00,fairness_value,f8f27e84-7a7b-4751-9034-30eb96340f01,45.7,80.0,,"['feature:age', 'fairness_metric_type:fairness', 'feature_value:18-33']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,3a046258-085b-4caa-b222-1147d4417ace,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
2021-01-23 07:01:18.415500+00:00,fairness_value,6f346d72-310b-4aa9-b764-40233a57632c,36.7,80.0,,"['feature:sex', 'fairness_metric_type:debiased_fairness', 'feature_value:Female']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,3a046258-085b-4caa-b222-1147d4417ace,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
2021-01-23 07:01:18.415500+00:00,fairness_value,6f346d72-310b-4aa9-b764-40233a57632c,129.29999999999998,80.0,,"['feature:age', 'fairness_metric_type:debiased_fairness', 'feature_value:18-33']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,3a046258-085b-4caa-b222-1147d4417ace,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
2021-01-23 06:57:13.579645+00:00,fairness_value,1b3b7cb1-0b3a-4ce7-8da4-d7cc095fd1d8,27.4,80.0,,"['feature:sex', 'fairness_metric_type:debiased_fairness', 'feature_value:Female']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,7ec510bd-2751-4a42-92cd-c1d3cfb20285,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
2021-01-23 06:57:13.579645+00:00,fairness_value,1b3b7cb1-0b3a-4ce7-8da4-d7cc095fd1d8,132.4,80.0,,"['feature:age', 'fairness_metric_type:debiased_fairness', 'feature_value:18-33']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,7ec510bd-2751-4a42-92cd-c1d3cfb20285,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
2021-01-23 06:57:13.579645+00:00,fairness_value,82c2fd69-d80c-42c9-af86-1a706881a699,21.9,80.0,,"['feature:sex', 'fairness_metric_type:fairness', 'feature_value:Female']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,7ec510bd-2751-4a42-92cd-c1d3cfb20285,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57
2021-01-23 06:57:13.579645+00:00,fairness_value,82c2fd69-d80c-42c9-af86-1a706881a699,51.3,80.0,,"['feature:age', 'fairness_metric_type:fairness', 'feature_value:18-33']",fairness,e6d59bdc-4624-4985-bcf3-3f8078f7c11b,7ec510bd-2751-4a42-92cd-c1d3cfb20285,subscription,6fce507f-9513-4309-a98f-e8049c8e8d57


In [67]:
from IPython.display import Markdown as md
md("#### To view the latest evaluation of the fairness check, please visit IBM Watson OpenScale Fairness Dashboard: {}".format(FAIRNESS_DASHBOARD_URL))

#### To view the latest evaluation of the fairness check, please visit IBM Watson OpenScale Fairness Dashboard: https://namespace1-cpd-namespace1.apps.islnov15.os.fyre.ibm.com/aiopenscale/insights/207d8487-0df2-491f-94a2-fd398e985c23/fairness/age?features=fairnessv2,indirect_bias,v2transaction

# Active debiasing

In [68]:
no_of_records_to_score = 200
payload_scoring, scoring_response = sample_scoring(no_of_records_to_score)

Single record scoring result: 
 fields: ['workclass', 'fnlwgt', 'education', 'education-num', 'Marital', 'occupation', 'relationship', 'capitalgain', 'loss', 'hoursper', 'citizen_status', 'workclass_IX', 'workclassclassVec', 'education_IX', 'educationclassVec', 'Marital_IX', 'MaritalclassVec', 'occupation_IX', 'occupationclassVec', 'relationship_IX', 'relationshipclassVec', 'citizen_status_IX', 'citizen_statusclassVec', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'] 
 values:  ['State-gov', 77516, 'Bachelors', 13, 'Never-married', 'Adm-clerical', 'Not-in-family', 2174, 0, 40, 'United-States', 4.0, [9, [4], [1.0]], 2.0, [16, [2], [1.0]], 1.0, [7, [1], [1.0]], 3.0, [15, [3], [1.0]], 1.0, [6, [1], [1.0]], 0.0, [41, [0], [1.0]], [99, [4, 11, 26, 35, 48, 53, 94, 95, 96, 98], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 77516.0, 13.0, 2174.0, 40.0]], [17.62390057081348, 2.376099429186522], [0.881195028540674, 0.11880497145932609], 0.0, '<=50K']
{"predictions": [{"fields": ["wo

### List the original model predictions

In [69]:
# for i in range(no_of_records_to_score):
#     print(scoring_response['predictions'][0]['values'][i][-1:][0])

## Get the token for calling OpenScale API

In [70]:
import json
import requests
import base64
from requests.auth import HTTPBasicAuth
import time

token_url = WOS_CREDENTIALS['url'] + '/v1/preauth/validateAuth'
headers = {}
headers["Accept"] = "application/json"
auth = HTTPBasicAuth(WOS_CREDENTIALS['username'], WOS_CREDENTIALS['password'])
response = requests.get(token_url, headers=headers, auth=auth, verify=False)
json_data = response.json()
access_token = json_data['accessToken']
access_token

'eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6ImFkbWluIiwicm9sZSI6IkFkbWluIiwicGVybWlzc2lvbnMiOlsiYWRtaW5pc3RyYXRvciIsImNhbl9wcm92aXNpb24iLCJtYW5hZ2VfY2F0YWxvZyIsImFjY2Vzc19jYXRhbG9nIl0sImdyb3VwcyI6WzEwMDAwXSwic3ViIjoiYWRtaW4iLCJpc3MiOiJLTk9YU1NPIiwiYXVkIjoiRFNYIiwidWlkIjoiMTAwMDMzMDk5OSIsImF1dGhlbnRpY2F0b3IiOiJkZWZhdWx0IiwiaWF0IjoxNjExMzg1NDQ2LCJleHAiOjE2MTE0Mjg2MTB9.gzPW8_5rmUoXxS4EkLrM41GbfVEEUdcnHgq6Z2D4iFJPTXDI4T4wAL6Buwn0Bws9WAATLR40qLSBuaxudC9XFwg250xor-blV41wFP1TyztWH1_5NW0M9E00M3HGSQYiS1dclAKAJLQKzaR5c8z974SAvrEe2gNP8Y1HkZCap2QSNdMkF1y_tCOYBmr8TO8OzsUzfAenFA19qys3plnKNSzeexj7N54o7ZSRuBOJ38rRzMonkNzTPGdfzG3mSePcVTfEkuVtNsa-aFdxUmTviVHvmazt_-vigewLAxH2DsKqdL5C8xdDnkiKAis9D0dYI5B8yNIaS3JqcFIb1j0hgQ'
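The access token is a JSON Web Token, so its claims segment can be decoded to see when it expires, which is useful when a long-running notebook starts receiving authorization errors. The sketch below only decodes the claims and does not verify the signature; the helper names and the sample token are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_claims(token: str) -> dict:
    """Decode the claims segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token with three dot-separated segments")
    segment = parts[1]
    # JWT segments omit base64 padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Seconds remaining before the token's 'exp' claim; negative if expired."""
    current = time.time() if now is None else now
    return decode_jwt_claims(token)["exp"] - current


def _encode_segment(obj: dict) -> str:
    """Encode a dictionary the way JWT segments are encoded (demo only)."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Hypothetical unsigned token built purely for demonstration.
sample_token = ".".join([
    _encode_segment({"alg": "none"}),
    _encode_segment({"sub": "admin", "exp": 2000000000}),
    "signature",
])

sample_claims = decode_jwt_claims(sample_token)
print(sample_claims)
```

If `seconds_until_expiry(access_token)` is close to zero or negative, the token request above can be repeated to obtain a fresh token.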

In [71]:
DEBIASING_PREDICTIONS_URL = WOS_CREDENTIALS['url'] + "/openscale/{0}/v2/subscriptions/{1}/predictions".format(data_mart_id,subscription_id)
print(DEBIASING_PREDICTIONS_URL)

headers = {}
headers["Content-Type"] = "application/json"
headers["Accept"] = "application/json"
headers["Authorization"] = "Bearer {}".format(access_token)

debiased_scoring_payload = payload_scoring['input_data'][0]
print('\n>>>>>>>>>>>>>>>\n')
print(debiased_scoring_payload)
print('\n>>>>>>>>>>>>>>>\n')

response = requests.post(DEBIASING_PREDICTIONS_URL, data=json.dumps(debiased_scoring_payload), headers=headers, verify=False)

https://namespace1-cpd-namespace1.apps.islnov15.os.fyre.ibm.com/openscale/00000000-0000-0000-0000-000000000000/v2/subscriptions/6fce507f-9513-4309-a98f-e8049c8e8d57/predictions

>>>>>>>>>>>>>>>

{'fields': ['workclass', 'fnlwgt', 'education', 'education-num', 'Marital', 'occupation', 'relationship', 'capitalgain', 'loss', 'hoursper', 'citizen_status'], 'values': [['State-gov', 77516, 'Bachelors', 13, 'Never-married', 'Adm-clerical', 'Not-in-family', 2174, 0, 40, 'United-States'], ['Self-emp-not-inc', 83311, 'Bachelors', 13, 'Married-civ-spouse', 'Exec-managerial', 'Husband', 0, 0, 13, 'United-States'], ['Private', 215646, 'HS-grad', 9, 'Divorced', 'Handlers-cleaners', 'Not-in-family', 0, 0, 40, 'United-States'], ['Private', 234721, '11th', 7, 'Married-civ-spouse', 'Handlers-cleaners', 'Husband', 0, 0, 40, 'United-States'], ['Private', 338409, 'Bachelors', 13, 'Married-civ-spouse', 'Prof-specialty', 'Wife', 0, 0, 40, 'Cuba'], ['Private', 284582, 'Masters', 14, 'Married-civ-spouse', 'Exe

## Listing those predictions whose original model prediction is different from the debiased prediction

In [72]:
predictedLabel_index = response.json()['fields'].index('predictedLabel')
debiased_prediction_index = response.json()['fields'].index('debiased_prediction')

for j in range(no_of_records_to_score):
    scored_record = response.json()['values'][j]
    predictedLabel = scored_record[predictedLabel_index]
    debiased_prediction = scored_record[debiased_prediction_index]
    if predictedLabel != debiased_prediction:
        print('==========')
        print(scored_record)
        print('predictedLabel:' + str(predictedLabel) + ', debiased_prediction:' + str(debiased_prediction))
        print('==========')

['Private', 338409, 'Bachelors', 13, 'Married-civ-spouse', 'Prof-specialty', 'Wife', 0, 0, 40, 'Cuba', '>50K', 'Female', 28, [0.4472757833095359, 0.5527242166904641], [6, [4], [1.0]], 0.0, [15, [0], [1.0]], 0.0, 'Black', [7, [0], [1.0]], 4.0, '<=50K', [16, [2], [1.0]], 2.0, [10.677317313290049, 9.32268268670995], [99, [0, 11, 25, 32, 51, 62, 94, 95, 98], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 338409.0, 13.0, 40.0]], [9, [0], [1.0]], 0.0, [0.5338658656645024, 0.4661341343354975], 9.0, 0.0, [41, [9], [1.0]], '065a2605-a628-497c-8e69-2e3a9dcbb6b6-5']
predictedLabel:<=50K, debiased_prediction:>50K
['Private', 163003, 'Bachelors', 13, 'Never-married', 'Exec-managerial', 'Other-relative', 0, 0, 40, 'Philippines', '>50K', 'Female', 33, [0.47070161065285526, 0.5292983893471447], [6, [5], [1.0]], 1.0, [15, [2], [1.0]], 0.0, 'Asian-Pac-Islander', [7, [1], [1.0]], 5.0, '<=50K', [16, [2], [1.0]], 2.0, [17.421903304413373, 2.578096695586629], [99, [0, 11, 26, 34, 52, 56, 94, 95, 98], [1.0, 1.0, 1.0, 1.0, 1
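The comparison above can also be written as a function that parses the response once and iterates over whatever rows were returned, rather than relying on `no_of_records_to_score`. This is a sketch under the assumption that the response body has the `fields`/`values` layout used in the cell above; the function name and the sample body are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def changed_predictions(
    response_body: Dict[str, Any],
    original_field: str = "predictedLabel",
    debiased_field: str = "debiased_prediction",
) -> List[Tuple[Any, Any, List[Any]]]:
    """Return (original, debiased, row) for rows where the two predictions differ."""
    fields = response_body["fields"]
    original_index = fields.index(original_field)
    debiased_index = fields.index(debiased_field)
    return [
        (row[original_index], row[debiased_index], row)
        for row in response_body["values"]
        if row[original_index] != row[debiased_index]
    ]


# Hypothetical, heavily shortened response body for demonstration.
sample_body = {
    "fields": ["workclass", "predictedLabel", "debiased_prediction"],
    "values": [
        ["State-gov", "<=50K", "<=50K"],
        ["Private", "<=50K", ">50K"],
        ["Private", ">50K", ">50K"],
    ],
}

changed = changed_predictions(sample_body)
for original, debiased, _row in changed:
    print("predictedLabel: {}, debiased_prediction: {}".format(original, debiased))
```

In the notebook this could be called as `changed_predictions(response.json())`.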

## Additional data to help debugging

In [73]:
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))
print("OpenScale Datamart id: {}".format(data_mart_id))
# Use subscription_id from the subscription created above; the `subscription` loop
# variable still points at the last entry seen while removing old subscriptions.
print("OpenScale Subscription id: {}".format(subscription_id))
print("OpenScale Fairness Monitor Instance id: {}".format(fairness_monitor_instance_id))
print("OpenScale Fairness Monitoring Run id: {}".format(fairness_monitoring_run_id))

Model id: 4a04e27a-f19e-498a-bd51-65751e61debb
Deployment id: 207d8487-0df2-491f-94a2-fd398e985c23
OpenScale Datamart id: 00000000-0000-0000-0000-000000000000
OpenScale Subscription id: 6fce507f-9513-4309-a98f-e8049c8e8d57
OpenScale Fairness Monitor Instance id: e6d59bdc-4624-4985-bcf3-3f8078f7c11b
OpenScale Fairness Monitoring Run id: 7ec510bd-2751-4a42-92cd-c1d3cfb20285


## Conclusion

In this notebook we completed the following tasks:

- Created and trained an income classification model, making sure to remove the sensitive attributes (age, sex and race) before training.
- Identified a space to be associated with the model and its deployment.
- Deployed the model to the space and scored it with additional meta fields.
- Configured OpenScale and subscribed the deployment.
- Configured fairness monitoring on the meta fields (sensitive attributes) age and sex.
- Ran the fairness monitor.
- Observed that indirect bias exists against the age attribute, as can be seen in the OpenScale dashboard.
- Ran an on-demand evaluation of the fairness monitor as well.
- Called the active debiasing API, also known as the OpenScale predictions API, and observed that for some of the scored records the debiased prediction differs from the original prediction.
- The previous step shows that OpenScale can debias the model's predictions even with respect to meta (sensitive) attributes that the model was not trained on.

That's all for now. Thank You!

Author: Ravi Chamarthy (ravi.chamarthy@in.ibm.com)