<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# IBM Watson OpenScale & Watson Machine Learning

This notebook should be run in a Watson Studio project, using with **Python 3.5 with Spark** runtime environment. **If you are viewing this in Watson Studio and do not see Python 3.5 with Spark in the upper right corner of your screen, please update the runtime now.** It requires service credentials for the following Cloud services:
  * IBM Watson OpenScale
  * Db2 Warehouse
  * Watson Machine Learning
  
The notebook will train, create and deploy a German Credit Risk model, configure OpenScale to monitor that deployment, and inject seven days' worth of historical records and measurements for viewing in the OpenScale Insights dashboard.

Contents
- [1. Setup](#setup)
- [2. Building and deploying the model](#model)
- [3. Configuring OpenScale](#model)
- [4. Binding machine learning engine](#binding)
- [5. Subscriptions](#subscription)
- [6. Quality monitoring and feedback logging](#payload)
- [7. Monitoring](#monitor)
- [8. Data Mart](#datamart)

## 1. Setup

In [60]:
!rm -rf $PIP_BUILD
!pip install psycopg2-binary | tail -n 1
!pip install --upgrade watson-machine-learning-client --no-cache | tail -n 1
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1
!pip install --upgrade lime | tail -n 1

[31mtensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.[0m
[31mtensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.[0m
Requirement not upgraded as not directly required: jmespath<1.0.0,>=0.7.1 in /usr/local/src/conda3_runtime.v49/home/envs/DSX-Python35-Spark/lib/python3.5/site-packages (from ibm-cos-sdk-core==2.*,>=2.0.0->ibm-cos-sdk->watson-machine-learning-client) (0.9.3)
[31mtensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.[0m
Requirement not upgraded as not directly required: jmespath<1.0.0,>=0.7.1 in /usr/local/src/conda3_runtime.v49/home/envs/DSX-Python35-Spark/lib/python3.5/site-packages (from ibm-cos-sdk-core==2.*,>=2.0.0->ibm-cos-sdk->watson-machine-learning-client->ibm-ai-openscale) (0.9.3)
[31mtensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.[0m
Successfully installed lime-0.1.1.32


Restart the kernel to assure the new libraries are being used.

### Provision services and configure credentials

If you have not already, provision an instance of IBM Watson OpenScale using the [OpenScale link in the Cloud catalog](https://cloud.ibm.com/catalog/services/ai-openscale).

Your Cloud API key can be generated by going to the [**Users** section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the **API Keys** section, and click **Create an IBM Cloud API key**. Give your key a name and click **Create**, then copy the created key and paste it below.

In [2]:
CLOUD_API_KEY = "PASTE HERE"

In [3]:
# The code was removed by Watson Studio for sharing.

Next you will need credentials for Watson Machine Learning. If you already have a WML instance, you may use credentials for it. To provision a new Lite instance of WML, use the [Cloud catalog](https://cloud.ibm.com/catalog/services/machine-learning), give your service a name, and click **Create**. Once your instance is created, click the **Service Credentials** link on the left side of the screen. Click the **New credential** button, give your credentials a name, and click **Add**. Your new credentials can be accessed by clicking the **View credentials** button. Copy and paste your WML credentials into the cell below.

In [4]:
WML_CREDENTIALS = {
    "apikey": "key",
    "iam_apikey_description": "description",
    "iam_apikey_name": "auto-generated-apikey",
    "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
    "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::",
    "instance_id": "instance_id",
    "password": "password",
    "url": "https://us-south.ml.cloud.ibm.com",
    "username": "username"
}

In [5]:
# The code was removed by Watson Studio for sharing.

This lab uses Db2 Warehouse to store training data for the created model, and to create a datamart for AI OpenScale.

If you have previously configured AI OpenScale, it will use your existing datamart, and not interfere with any models you are currently monitoring. However, you will still need to provide Db2 Warehouse credentials to allow for storage of the training data.

To provision a new instance of Db2 Warehouse, locate [Db2 Warehouse in the Cloud catalog](https://cloud.ibm.com/catalog/services/db2-warehouse), give your service a name, and click **Create**. Once your instance is created, click the **Service Credentials** link on the left side of the screen. Click the **New credential** button, give your credentials a name, and click **Add**. Your new credentials can be accessed by clicking the **View credentials** button. Copy and paste your Db2 Warehouse credentials into the cell below.

In [6]:
DB2_CREDENTIALS = {
    "hostname": "dashdb.net",
    "password": "password",
    "https_url": "https://dashdb-entry.services.dal.bluemix.net:8443",
    "port": 50000,
    "ssldsn": "DATABASE=BLUDB;HOSTNAME=dashdb-entry.services.dal.bluemix.net;PORT=50001;PROTOCOL=TCPIP;UID=dash;PWD=password;Security=SSL;",
    "host": "dashdb-entry.services.dal.bluemix.net",
    "jdbcurl": "jdbc:db2://dashdb-entry.bluemix.net:50000/BLUDB",
    "uri": "db2://dash:password@dashdb.services.dal.bluemix.net:50000/BLUDB",
    "db": "BLUDB",
    "dsn": "DATABASE=BLUDB;HOSTNAME=dashdb-entry.services.dal.bluemix.net;PORT=50000;PROTOCOL=TCPIP;UID=dash;PWD=password;",
    "username": "dash",
    "ssljdbcurl": "jdbc:db2://dashdb-entry.services.dal.bluemix.net:50001/BLUDB:sslConnection=true;"
}

In [7]:
DB2_CREDENTIALS = {
                # name: DND-DB2-PythonClient-QA-xksj23s
                # account: 1722703 - IBM
                # org: aiostest@us.ibm.com
                "hostname": "dashdb-entry-yp-dal09-10.services.dal.bluemix.net",
                "password": "89TsmoAN_Sb_",
                "https_url": "https://dashdb-entry-yp-dal09-10.services.dal.bluemix.net:8443",
                "port": 50000,
                "ssldsn": "DATABASE=BLUDB;HOSTNAME=dashdb-entry-yp-dal09-10.services.dal.bluemix.net;PORT=50001;PROTOCOL=TCPIP;UID=dash14647;PWD=89TsmoAN_Sb_;Security=SSL;",
                "host": "dashdb-entry-yp-dal09-10.services.dal.bluemix.net",
                "jdbcurl": "jdbc:db2://dashdb-entry-yp-dal09-10.services.dal.bluemix.net:50000/BLUDB",
                "uri": "db2://dash14647:89TsmoAN_Sb_@dashdb-entry-yp-dal09-10.services.dal.bluemix.net:50000/BLUDB",
                "db": "BLUDB",
                "dsn": "DATABASE=BLUDB;HOSTNAME=dashdb-entry-yp-dal09-10.services.dal.bluemix.net;PORT=50000;PROTOCOL=TCPIP;UID=dash14647;PWD=89TsmoAN_Sb_;",
                "username": "dash14647",
                "ssljdbcurl": "jdbc:db2://dashdb-entry-yp-dal09-10.services.dal.bluemix.net:50001/BLUDB:sslConnection=true;"
              }

If you have an **already-existing** schema in your Db2 instance you would like to use for OpenScale data, specify it below. If you leave the variable set to None, OpenScale will use the default Db2 schema.

In [8]:
SCHEMA_NAME = None

__If you previously configured OpenScale to use the free internal version of PostgreSQL, you can switch to a new datamart using Db2 Warehouse.__ If you would like to delete the PostgreSQL configuration and create a new one using Db2 Warehouse, set the __KEEP_MY_INTERNAL_POSTGRES__ variable below to __False__ below. In this case, the notebook will remove your existing internal PostgreSQL datamart and create a new one with the supplied credentials. __*NO DATA MIGRATION WILL OCCUR.*__

In [9]:
KEEP_MY_INTERNAL_POSTGRES = True

## 2. Building and deploying the model

At this point, the notebook is ready to run. You can either run the cells one at a time, or click the **Kernel** option above and select **Restart and Run All** to run all the cells.

### 2.1 Load and explore data

#### Load the training data from github

In [10]:
!rm credit_risk_training.csv
!wget https://raw.githubusercontent.com/emartensibm/german-credit/master/german_credit_data_biased_training.csv

rm: cannot remove ‘credit_risk_training.csv’: No such file or directory
--2019-02-13 04:33:21--  https://raw.githubusercontent.com/emartensibm/german-credit/master/german_credit_data_biased_training.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 689622 (673K) [text/plain]
Saving to: ‘german_credit_data_biased_training.csv.1’


2019-02-13 04:33:21 (26.6 MB/s) - ‘german_credit_data_biased_training.csv.1’ saved [689622/689622]



In [11]:
from pyspark.sql import SparkSession
import json

spark = SparkSession.builder.getOrCreate()
df_data = spark.read.csv(path="german_credit_data_biased_training.csv", sep=",", header=True, inferSchema=True)
df_data.head()

Row(CheckingStatus='0_to_200', LoanDuration=31, CreditHistory='credits_paid_to_date', LoanPurpose='other', LoanAmount=1889, ExistingSavings='100_to_500', EmploymentDuration='less_1', InstallmentPercent=3, Sex='female', OthersOnLoan='none', CurrentResidenceDuration=3, OwnsProperty='savings_insurance', Age=32, InstallmentPlans='none', Housing='own', ExistingCreditsCount=1, Job='skilled', Dependents=1, Telephone='none', ForeignWorker='yes', Risk='No Risk')

In [58]:
import pandas as pd

pd_data = df_data.toPandas()

#### Explore data

In [12]:
df_data.printSchema()

root
 |-- CheckingStatus: string (nullable = true)
 |-- LoanDuration: integer (nullable = true)
 |-- CreditHistory: string (nullable = true)
 |-- LoanPurpose: string (nullable = true)
 |-- LoanAmount: integer (nullable = true)
 |-- ExistingSavings: string (nullable = true)
 |-- EmploymentDuration: string (nullable = true)
 |-- InstallmentPercent: integer (nullable = true)
 |-- Sex: string (nullable = true)
 |-- OthersOnLoan: string (nullable = true)
 |-- CurrentResidenceDuration: integer (nullable = true)
 |-- OwnsProperty: string (nullable = true)
 |-- Age: integer (nullable = true)
 |-- InstallmentPlans: string (nullable = true)
 |-- Housing: string (nullable = true)
 |-- ExistingCreditsCount: integer (nullable = true)
 |-- Job: string (nullable = true)
 |-- Dependents: integer (nullable = true)
 |-- Telephone: string (nullable = true)
 |-- ForeignWorker: string (nullable = true)
 |-- Risk: string (nullable = true)



In [13]:
print("Number of records: " + str(df_data.count()))

Number of records: 5000


### 2.2 Create a model

In [14]:
spark_df = df_data
(train_data, test_data) = spark_df.randomSplit([0.8, 0.2], 24)

MODEL_NAME = "AIOS Spark German Risk Model - Final"
DEPLOYMENT_NAME = "AIOS Spark German Risk Deployment - Final"

print("Number of records for training: " + str(train_data.count()))
print("Number of records for evaluation: " + str(test_data.count()))

Number of records for training: 4016
Number of records for evaluation: 984


In [15]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml import Pipeline, Model

si_CheckingStatus = StringIndexer(inputCol = 'CheckingStatus', outputCol = 'CheckingStatus_IX')
si_CreditHistory = StringIndexer(inputCol = 'CreditHistory', outputCol = 'CreditHistory_IX')
si_LoanPurpose = StringIndexer(inputCol = 'LoanPurpose', outputCol = 'LoanPurpose_IX')
si_ExistingSavings = StringIndexer(inputCol = 'ExistingSavings', outputCol = 'ExistingSavings_IX')
si_EmploymentDuration = StringIndexer(inputCol = 'EmploymentDuration', outputCol = 'EmploymentDuration_IX')
si_Sex = StringIndexer(inputCol = 'Sex', outputCol = 'Sex_IX')
si_OthersOnLoan = StringIndexer(inputCol = 'OthersOnLoan', outputCol = 'OthersOnLoan_IX')
si_OwnsProperty = StringIndexer(inputCol = 'OwnsProperty', outputCol = 'OwnsProperty_IX')
si_InstallmentPlans = StringIndexer(inputCol = 'InstallmentPlans', outputCol = 'InstallmentPlans_IX')
si_Housing = StringIndexer(inputCol = 'Housing', outputCol = 'Housing_IX')
si_Job = StringIndexer(inputCol = 'Job', outputCol = 'Job_IX')
si_Telephone = StringIndexer(inputCol = 'Telephone', outputCol = 'Telephone_IX')
si_ForeignWorker = StringIndexer(inputCol = 'ForeignWorker', outputCol = 'ForeignWorker_IX')

In [16]:
si_Label = StringIndexer(inputCol="Risk", outputCol="label").fit(spark_df)
label_converter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=si_Label.labels)

In [17]:
va_features = VectorAssembler(inputCols=["CheckingStatus_IX", "CreditHistory_IX", "LoanPurpose_IX", "ExistingSavings_IX", "EmploymentDuration_IX", "Sex_IX", \
                                         "OthersOnLoan_IX", "OwnsProperty_IX", "InstallmentPlans_IX", "Housing_IX", "Job_IX", "Telephone_IX", "ForeignWorker_IX", \
                                         "LoanDuration", "LoanAmount", "InstallmentPercent", "CurrentResidenceDuration", "LoanDuration", "Age", "ExistingCreditsCount", \
                                         "Dependents"], outputCol="features")

In [18]:
from pyspark.ml.classification import RandomForestClassifier
classifier = RandomForestClassifier(featuresCol="features")

pipeline = Pipeline(stages=[si_CheckingStatus, si_CreditHistory, si_EmploymentDuration, si_ExistingSavings, si_ForeignWorker, si_Housing, si_InstallmentPlans, si_Job, si_LoanPurpose, si_OthersOnLoan,\
                               si_OwnsProperty, si_Sex, si_Telephone, si_Label, va_features, classifier, label_converter])
model = pipeline.fit(train_data)

In [19]:
predictions = model.transform(test_data)
evaluatorDT = BinaryClassificationEvaluator(rawPredictionCol="prediction")
area_under_curve = evaluatorDT.evaluate(predictions)

#default evaluation is areaUnderROC
print("areaUnderROC = %g" % area_under_curve)

areaUnderROC = 0.704369


### 2.3 Save and deploy the model

In [20]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient
import json

wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)



#### Remove existing model and deployment

In [21]:
model_deployment_ids = wml_client.deployments.get_uids()
for deployment_id in model_deployment_ids:
    deployment = wml_client.deployments.get_details(deployment_id)
    model_id = deployment['entity']['deployable_asset']['guid']
    if deployment['entity']['name'] == DEPLOYMENT_NAME:
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()

Deleting deployment id 84bb9d1c-9d4c-4183-8622-b8381ba4d257
Deleting model id 3e57efc1-0315-4698-b1c7-3e058b4a676f
------------------------------------  -------------------------------  ------------------------  -----------------
GUID                                  NAME                             CREATED                   FRAMEWORK
9d08893f-0c2e-4a6f-b0aa-833388868c05  AIOS Xgboost Solar Model         2019-02-13T10:32:58.034Z  xgboost-0.80
8d77d6b2-9458-4105-9977-1390c65cb475  credit-risk                      2019-02-08T08:31:18.222Z  mllib-2.3
f8cef2be-3097-4b1b-8989-a952754eb705  AIOS Spark German Risk model     2019-02-07T16:04:50.142Z  mllib-2.1
263dedd0-254d-4eed-9c4f-16300e0db5b3  AIOS Spark Drugs feedback model  2019-02-07T15:06:55.275Z  mllib-2.1
de5c9e61-f806-40f1-8137-e034303a8dad  AIOS Xgboost Agaricus Model      2019-01-25T11:06:58.070Z  xgboost-0.6
404aae40-ad00-43c5-b4cf-70a85030d043  AIOS Spark Digits Model          2019-01-22T08:34:29.846Z  scikit-learn-0.19
1a1f6ee4

In [22]:
model_props = {
    wml_client.repository.ModelMetaNames.NAME: "{}".format(MODEL_NAME),
    wml_client.repository.ModelMetaNames.EVALUATION_METHOD: "binary",
    wml_client.repository.ModelMetaNames.EVALUATION_METRICS: [
        {
           "name": "areaUnderROC",
           "value": area_under_curve,
           "threshold": 0.7
        }
    ]
}

In [23]:
wml_models = wml_client.repository.get_details()
model_uid = None
for model_in in wml_models['models']['resources']:
    if MODEL_NAME == model_in['entity']['name']:
        model_uid = model_in['metadata']['guid']
        break

if model_uid is None:
    print("Storing model ...")

    published_model_details = wml_client.repository.store_model(model=model, meta_props=model_props, training_data=train_data, pipeline=pipeline)
    model_uid = wml_client.repository.get_model_uid(published_model_details)
    print("Done")

Storing model ...
Done


In [24]:
model_uid

'0f1d511c-b121-44d1-b3f7-c3ad258211f5'

In [25]:
wml_deployments = wml_client.deployments.get_details()
deployment_uid = None
for deployment in wml_deployments['resources']:
    if DEPLOYMENT_NAME == deployment['entity']['name']:
        deployment_uid = deployment['metadata']['guid']
        break

if deployment_uid is None:
    print("Deploying model...")

    deployment = wml_client.deployments.create(artifact_uid=model_uid, name=DEPLOYMENT_NAME, asynchronous=False)
    deployment_uid = wml_client.deployments.get_uid(deployment)
    
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))

Deploying model...


#######################################################################################

Synchronous deployment creation for uid: '0f1d511c-b121-44d1-b3f7-c3ad258211f5' started

#######################################################################################


INITIALIZING
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='66c2d7c6-3e9d-4bd2-95ce-61c883f429e0'
------------------------------------------------------------------------------------------------


Model id: 0f1d511c-b121-44d1-b3f7-c3ad258211f5
Deployment id: 66c2d7c6-3e9d-4bd2-95ce-61c883f429e0


## 3. Configure OpenScale

In [26]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *
from ibm_ai_openscale.supporting_classes import PayloadRecord, Feature
from ibm_ai_openscale.supporting_classes.enums import *

#### Get AI OpenScale GUID

In [27]:
import requests

AIOS_GUID = None
token_data = {
    'grant_type': 'urn:ibm:params:oauth:grant-type:apikey',
    'response_type': 'cloud_iam',
    'apikey': CLOUD_API_KEY
}

response = requests.post('https://iam.bluemix.net/identity/token', data=token_data)
iam_token = response.json()['access_token']
iam_headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer %s' % iam_token
}

resources = json.loads(requests.get('https://resource-controller.cloud.ibm.com/v2/resource_instances', headers=iam_headers).text)['resources']
for resource in resources:
    if "aiopenscale" in resource['id'].lower():
        AIOS_GUID = resource['guid']
        
AIOS_CREDENTIALS = {
    "instance_guid": AIOS_GUID,
    "apikey": CLOUD_API_KEY,
    "url": "https://api.aiopenscale.cloud.ibm.com"
}

if AIOS_GUID is None:
    print('AI OpenScale GUID NOT FOUND')
else:
    print(AIOS_GUID)

e3a38ab0-3884-454c-abc0-5e535eec36e6


#### Create schema and datamart

In [28]:
ai_client = APIClient(aios_credentials=AIOS_CREDENTIALS)
ai_client.version

'2.0.48'

#### Set up datamart

In [29]:
try:
    data_mart_details = ai_client.data_mart.get_details()
    if 'internal_database' in data_mart_details and data_mart_details['internal_database']:
        if KEEP_MY_INTERNAL_POSTGRES:
            print('Using existing internal datamart. YOU WILL NOT BE ABLE TO COMPLETE THE CONTINUOUS LEARNING PORTION OF THE NOTEBOOK.')
        else:
            print('Switching to external datamart')
            ai_client.data_mart.delete(force=True)
            ai_client.data_mart.setup(db_credentials=DB2_CREDENTIALS, schema=SCHEMA_NAME)
    else:
        print('Using existing external datamart')
except:
    print('Setting up external datamart')
    ai_client.data_mart.setup(db_credentials=DB2_CREDENTIALS, schema=SCHEMA_NAME)

Using existing external datamart


In [32]:
data_mart_details = ai_client.data_mart.get_details()

## 4. Bind machine learning engines

In [33]:
binding_uid = ai_client.data_mart.bindings.add('WML instance', WatsonMachineLearningInstance(WML_CREDENTIALS))
if binding_uid is None:
    binding_uid = ai_client.data_mart.bindings.get_details()['service_bindings'][0]['metadata']['guid']
bindings_details = ai_client.data_mart.bindings.get_details()
ai_client.data_mart.bindings.list()

0,1,2,3
a73bf76d-a663-448a-b771-4f651f73d54e,WML instance,watson_machine_learning,2019-02-13T10:34:55.751Z
91d749e3-8551-45da-84fc-385e7ed2ba26,My Azure ML Studio engine,azure_machine_learning,2019-02-07T12:44:20.125Z


In [34]:
print(binding_uid)

a73bf76d-a663-448a-b771-4f651f73d54e


In [36]:
ai_client.data_mart.bindings.list_assets(binding_uid=binding_uid)

0,1,2,3,4,5,6
0f1d511c-b121-44d1-b3f7-c3ad258211f5,AIOS Spark German Risk Model - Final,2019-02-13T10:34:05.152Z,model,mllib-2.1,a73bf76d-a663-448a-b771-4f651f73d54e,False
8d77d6b2-9458-4105-9977-1390c65cb475,credit-risk,2019-02-08T08:31:18.222Z,model,mllib-2.3,a73bf76d-a663-448a-b771-4f651f73d54e,False
f8cef2be-3097-4b1b-8989-a952754eb705,AIOS Spark German Risk model,2019-02-07T16:04:50.142Z,model,mllib-2.1,a73bf76d-a663-448a-b771-4f651f73d54e,False
263dedd0-254d-4eed-9c4f-16300e0db5b3,AIOS Spark Drugs feedback model,2019-02-07T15:06:55.275Z,model,mllib-2.1,a73bf76d-a663-448a-b771-4f651f73d54e,False
de5c9e61-f806-40f1-8137-e034303a8dad,AIOS Xgboost Agaricus Model,2019-01-25T11:06:58.070Z,model,xgboost-0.6,a73bf76d-a663-448a-b771-4f651f73d54e,False
404aae40-ad00-43c5-b4cf-70a85030d043,AIOS Spark Digits Model,2019-01-22T08:34:29.846Z,model,scikit-learn,a73bf76d-a663-448a-b771-4f651f73d54e,False
1a1f6ee4-1a5d-4af6-83ea-9de89cfdf9cd,AIOS Spark German Risk model,2019-01-18T12:38:38.440Z,model,mllib-2.1,a73bf76d-a663-448a-b771-4f651f73d54e,False
096310f1-c01c-496d-9647-34b964f29e06,AIOS Spark German Risk model,2019-01-18T12:20:09.545Z,model,mllib-2.1,a73bf76d-a663-448a-b771-4f651f73d54e,False
d023d796-73bd-496d-9ef6-0d35eb4e769a,AIOS Spark German Risk model,2019-01-18T12:12:56.978Z,model,mllib-2.1,a73bf76d-a663-448a-b771-4f651f73d54e,False
a68c3eb5-e804-45c6-bd1c-8a5ac127be3a,AIOS Spark German Risk model,2019-01-18T10:16:40.630Z,model,mllib-2.1,a73bf76d-a663-448a-b771-4f651f73d54e,False


## 5. Subscriptions

### Remove existing credit risk subscriptions

In [37]:
subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
for subscription in subscriptions_uids:
    sub_name = ai_client.data_mart.subscriptions.get_details(subscription)['entity']['asset']['name']
    if sub_name == MODEL_NAME:
        ai_client.data_mart.subscriptions.delete(subscription)
        print('Deleted existing subscription for', MODEL_NAME)

In [38]:
subscription = ai_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    model_uid,
    problem_type=ProblemType.BINARY_CLASSIFICATION,
    input_data_type=InputDataType.STRUCTURED,
    label_column='Risk',
    prediction_column='predictedLabel',
    probability_column='probability',
    feature_columns = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"],
    categorical_columns = ["CheckingStatus","CreditHistory","LoanPurpose","ExistingSavings","EmploymentDuration","Sex","OthersOnLoan","OwnsProperty","InstallmentPlans","Housing","Job","Telephone","ForeignWorker"]
))

if subscription is None:
    print('Subscription already exists; get the existing one')
    subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
    for sub in subscriptions_uids:
        if ai_client.data_mart.subscriptions.get_details(sub)['entity']['asset']['name'] == MODEL_NAME:
            subscription = ai_client.data_mart.subscriptions.get(sub)

Get subscription list

In [39]:
subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
ai_client.data_mart.subscriptions.list()

0,1,2,3,4
0f1d511c-b121-44d1-b3f7-c3ad258211f5,AIOS Spark German Risk Model - Final,model,a73bf76d-a663-448a-b771-4f651f73d54e,2019-02-13T10:36:38.050Z
085460ef94636166aea5800e9ea26168,GermanCreditRisk.2019.1.9.10.41.58.611,model,91d749e3-8551-45da-84fc-385e7ed2ba26,2019-02-07T12:46:42.998Z


In [42]:
subscription_details = subscription.get_details()

### Score the model so we can configure monitors

In [43]:
credit_risk_scoring_endpoint = None
print(deployment_uid)

for deployment in wml_client.deployments.get_details()['resources']:
    if deployment_uid in deployment['metadata']['guid']:
        credit_risk_scoring_endpoint = deployment['entity']['scoring_url']
        
print(credit_risk_scoring_endpoint)

66c2d7c6-3e9d-4bd2-95ce-61c883f429e0
https://us-south.ml.cloud.ibm.com/v3/wml_instances/a73bf76d-a663-448a-b771-4f651f73d54e/deployments/66c2d7c6-3e9d-4bd2-95ce-61c883f429e0/online


In [44]:
fields = ["CheckingStatus","LoanDuration","CreditHistory","LoanPurpose","LoanAmount","ExistingSavings","EmploymentDuration","InstallmentPercent","Sex","OthersOnLoan","CurrentResidenceDuration","OwnsProperty","Age","InstallmentPlans","Housing","ExistingCreditsCount","Job","Dependents","Telephone","ForeignWorker"]
values = [
  ["no_checking",13,"credits_paid_to_date","car_new",1343,"100_to_500","1_to_4",2,"female","none",3,"savings_insurance",46,"none","own",2,"skilled",1,"none","yes"],
  ["no_checking",24,"prior_payments_delayed","furniture",4567,"500_to_1000","1_to_4",4,"male","none",4,"savings_insurance",36,"none","free",2,"management_self-employed",1,"none","yes"],
  ["0_to_200",26,"all_credits_paid_back","car_new",863,"less_100","less_1",2,"female","co-applicant",2,"real_estate",38,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",14,"no_credits","car_new",2368,"less_100","1_to_4",3,"female","none",3,"real_estate",29,"none","own",1,"skilled",1,"none","yes"],
  ["0_to_200",4,"no_credits","car_new",250,"less_100","unemployed",2,"female","none",3,"real_estate",23,"none","rent",1,"management_self-employed",1,"none","yes"],
  ["no_checking",17,"credits_paid_to_date","car_new",832,"100_to_500","1_to_4",2,"male","none",2,"real_estate",42,"none","own",1,"skilled",1,"none","yes"],
  ["no_checking",33,"outstanding_credit","appliances",5696,"unknown","greater_7",4,"male","co-applicant",4,"unknown",54,"none","free",2,"skilled",1,"yes","yes"],
  ["0_to_200",13,"prior_payments_delayed","retraining",1375,"100_to_500","4_to_7",3,"male","none",3,"real_estate",37,"none","own",2,"management_self-employed",1,"none","yes"]
]

payload_scoring = {"fields": fields,"values": values}
scoring_response = wml_client.deployments.score(credit_risk_scoring_endpoint, payload_scoring)

print(scoring_response)

{'fields': ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'CheckingStatus_IX', 'CreditHistory_IX', 'EmploymentDuration_IX', 'ExistingSavings_IX', 'ForeignWorker_IX', 'Housing_IX', 'InstallmentPlans_IX', 'Job_IX', 'LoanPurpose_IX', 'OthersOnLoan_IX', 'OwnsProperty_IX', 'Sex_IX', 'Telephone_IX', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedLabel'], 'values': [['no_checking', 13, 'credits_paid_to_date', 'car_new', 1343, '100_to_500', '1_to_4', 2, 'female', 'none', 3, 'savings_insurance', 46, 'none', 'own', 2, 'skilled', 1, 'none', 'yes', 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, [21, [1, 3, 5, 13, 14, 15, 16, 17, 18, 19, 20], [1.0, 1.0, 1.0, 13.0, 1343.0, 2.0, 3.0, 13.0, 

## 6. Quality monitoring and feedback logging

### Enable quality monitoring

Wait ten seconds to allow the payload logging table to be set up before we begin enabling monitors.

In [45]:
time.sleep(10)
subscription.quality_monitoring.enable(threshold=0.7, min_records=50)

### Feedback logging

In [46]:
!rm additional_feedback_data.json
!wget https://raw.githubusercontent.com/emartensibm/german-credit/master/additional_feedback_data.json

rm: cannot remove ‘additional_feedback_data.json’: No such file or directory
--2019-02-13 04:37:46--  https://raw.githubusercontent.com/emartensibm/german-credit/master/additional_feedback_data.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16506 (16K) [text/plain]
Saving to: ‘additional_feedback_data.json’


2019-02-13 04:37:46 (30.0 MB/s) - ‘additional_feedback_data.json’ saved [16506/16506]



In [47]:
with open('additional_feedback_data.json') as feedback_file:
    additional_feedback_data = json.load(feedback_file)
subscription.feedback_logging.store(additional_feedback_data['data'])

In [48]:
subscription.feedback_logging.show_table()

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21
less_0,10,all_credits_paid_back,car_new,250,500_to_1000,4_to_7,3,male,none,2,real_estate,23,none,rent,1,skilled,1,none,yes,No Risk,2019-02-13 10:37:47.848000+00:00
no_checking,23,prior_payments_delayed,appliances,6964,100_to_500,4_to_7,4,female,none,3,car_other,39,none,own,1,skilled,1,none,yes,Risk,2019-02-13 10:37:47.848000+00:00
0_to_200,30,outstanding_credit,appliances,3464,100_to_500,greater_7,3,male,guarantor,4,savings_insurance,51,stores,free,1,skilled,1,yes,yes,Risk,2019-02-13 10:37:47.848000+00:00
no_checking,23,outstanding_credit,car_used,2681,500_to_1000,greater_7,4,male,none,3,car_other,33,stores,free,1,unskilled,1,yes,yes,No Risk,2019-02-13 10:37:47.848000+00:00
0_to_200,18,prior_payments_delayed,furniture,1673,less_100,1_to_4,2,male,none,3,car_other,30,none,own,2,skilled,1,none,yes,Risk,2019-02-13 10:37:47.848000+00:00
no_checking,44,outstanding_credit,radio_tv,3476,unknown,greater_7,4,male,co-applicant,4,unknown,60,none,free,2,skilled,2,yes,yes,Risk,2019-02-13 10:37:47.848000+00:00
less_0,8,no_credits,education,803,less_100,unemployed,1,male,none,1,savings_insurance,19,stores,rent,1,skilled,1,none,yes,No Risk,2019-02-13 10:37:47.848000+00:00
0_to_200,7,all_credits_paid_back,car_new,250,less_100,unemployed,1,male,none,1,real_estate,19,stores,rent,1,skilled,1,none,yes,No Risk,2019-02-13 10:37:47.848000+00:00
0_to_200,33,credits_paid_to_date,radio_tv,3548,100_to_500,1_to_4,3,male,none,4,car_other,28,none,own,2,skilled,1,yes,yes,Risk,2019-02-13 10:37:47.848000+00:00
no_checking,24,prior_payments_delayed,retraining,4158,100_to_500,greater_7,3,female,none,2,savings_insurance,35,stores,own,1,unskilled,2,none,yes,Risk,2019-02-13 10:37:47.848000+00:00


In [50]:
run_details = subscription.quality_monitoring.run(background_mode=False)




 Waiting for end of quality monitoring run 697b6e2d-79fe-49ab-b11e-0ff997973ca0 




initializing.
running
completed

---------------------------
 Successfully finished run 
---------------------------




In [55]:
quality_monitoring_details = subscription.quality_monitoring.get_run_details()

In [56]:
subscription.quality_monitoring.show_table()

0,1,2,3,4,5,6,7
2019-02-13 10:37:48.641000+00:00,0.7469512195121951,0.7,a73bf76d-a663-448a-b771-4f651f73d54e,0f1d511c-b121-44d1-b3f7-c3ad258211f5,66c2d7c6-3e9d-4bd2-95ce-61c883f429e0,Accuracy_evaluation_57e1a685-9b68-44fd-b9d5-c2d10f4e2f82,
2019-02-13 10:39:05.035000+00:00,0.7469512195121951,0.7,a73bf76d-a663-448a-b771-4f651f73d54e,0f1d511c-b121-44d1-b3f7-c3ad258211f5,66c2d7c6-3e9d-4bd2-95ce-61c883f429e0,Accuracy_evaluation_697b6e2d-79fe-49ab-b11e-0ff997973ca0,


In [57]:
ai_client.data_mart.get_deployment_metrics()

{'deployment_metrics': [{'asset': {'asset_id': '085460ef94636166aea5800e9ea26168',
    'asset_type': 'model',
    'created_at': '2019-01-09T10:42:59.7933412Z',
    'name': 'GermanCreditRisk.2019.1.9.10.41.58.611',
    'url': 'https://ussouthcentral.services.azureml.net/subscriptions/744bca722299451cb682ed6fb75fb671/services/e13d36b1c48f4080a49e5ae675d816ec/swagger.json'},
   'deployment': {'created_at': '2019-01-09T10:42:59.7933412Z',
    'deployment_id': '563f01d37f720857b95c557dc76176ad',
    'deployment_rn': '/subscriptions/744bca72-2299-451c-b682-ed6fb75fb671/resourceGroups/ai-ops-squad/providers/Microsoft.MachineLearning/webServices/GermanCreditRisk.2019.1.9.10.41.58.611',
    'deployment_type': 'online',
    'name': 'GermanCreditRisk.2019.1.9.10.41.58.611',
    'scoring_endpoint': {'credentials': {'token': 'X/YNeesRdD95FDyXoBm7wvP2LlHD4pZGlMARZv6rh8AfV0Ol0Zb1KiftUJiNmisml7NmhURSpzeGn2UsKzcqZw=='},
     'request_headers': {'Content-Type': 'application/json'},
     'url': 'https://

## 7. Fairness monitoring and explainability

## Fairness monitoring

In [61]:
subscription.fairness_monitoring.enable(
            features=[
                Feature("Sex", majority=['male'], minority=['female'], threshold=0.95),
                Feature("Age", majority=[[26,75]], minority=[[18,25]], threshold=0.95)
            ],
            favourable_classes=['No Risk'],
            unfavourable_classes=['Risk'],
            min_records=200,
            training_data=pd_data
        )

### Score the model again now that monitoring is configured

In [62]:
!rm german_credit_feed.json
!wget https://raw.githubusercontent.com/emartensibm/german-credit/master/german_credit_feed.json

rm: cannot remove ‘german_credit_feed.json’: No such file or directory
--2019-02-13 04:52:19--  https://raw.githubusercontent.com/emartensibm/german-credit/master/german_credit_feed.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3076547 (2.9M) [text/plain]
Saving to: ‘german_credit_feed.json’


2019-02-13 04:52:19 (47.5 MB/s) - ‘german_credit_feed.json’ saved [3076547/3076547]



Score 200 randomly chosen records

In [65]:
import random

with open('german_credit_feed.json', 'r') as scoring_file:
    scoring_data = json.load(scoring_file)

fields = scoring_data['fields']
values = []
for _ in range(200):
    values.append(random.choice(scoring_data['values']))
payload_scoring = {"fields": fields, "values": values}

scoring_response = wml_client.deployments.score(credit_risk_scoring_endpoint, payload_scoring)

### Run fairness monitor

Kick off a fairness monitor run on current data. Depending on how fast the monitor runs, the table may not contain the most recent results.

In [70]:
run_details = subscription.fairness_monitoring.run(background_mode=False)




 Counting bias for deployment_uid=66c2d7c6-3e9d-4bd2-95ce-61c883f429e0 




RUNNING
FINISHED

---------------------------
 Successfully finished run 
---------------------------




In [75]:
subscription.fairness_monitoring.show_table()

0,1,2,3,4,5,6,7,8,9,10
2019-02-13 10:53:35.322795+00:00,Sex,female,False,0.951,77.0,a73bf76d-a663-448a-b771-4f651f73d54e,0f1d511c-b121-44d1-b3f7-c3ad258211f5,0f1d511c-b121-44d1-b3f7-c3ad258211f5,66c2d7c6-3e9d-4bd2-95ce-61c883f429e0,
2019-02-13 10:53:35.322795+00:00,Age,"[18, 25]",False,0.986,82.33618233618233,a73bf76d-a663-448a-b771-4f651f73d54e,0f1d511c-b121-44d1-b3f7-c3ad258211f5,0f1d511c-b121-44d1-b3f7-c3ad258211f5,66c2d7c6-3e9d-4bd2-95ce-61c883f429e0,
2019-02-13 11:00:15.791532+00:00,Sex,female,False,0.951,77.0,a73bf76d-a663-448a-b771-4f651f73d54e,0f1d511c-b121-44d1-b3f7-c3ad258211f5,0f1d511c-b121-44d1-b3f7-c3ad258211f5,66c2d7c6-3e9d-4bd2-95ce-61c883f429e0,
2019-02-13 11:00:15.791532+00:00,Age,"[18, 25]",False,0.986,82.33618233618233,a73bf76d-a663-448a-b771-4f651f73d54e,0f1d511c-b121-44d1-b3f7-c3ad258211f5,0f1d511c-b121-44d1-b3f7-c3ad258211f5,66c2d7c6-3e9d-4bd2-95ce-61c883f429e0,


## Explainability

### Configuration

In [None]:
subscription.explainability.enable(training_data=pd_data)

In [74]:
explainability_details = subscription.explainability.get_details()

### Identify transactions for Explainability

Transaction IDs (`scoring_id` in payload logging table) identified by the cells below can be copied and pasted into the Explainability tab of the OpenScale dashboard.
You can use method `subscription.payload_logging_show_table()` to preview the content and find scoring_id you are interested in. You can also get the content of payload logging table in form of pandas dataframe.

In [80]:
sample_transaction_id=subscription.payload_logging.get_table_content(limit=1)['scoring_id'][0]

### Run Explainability

In [82]:
run_details = subscription.explainability.run(sample_transaction_id)




 Looking for explanation for d383f1e2466601636c210ea98937a929-1 




in_progress
finished

---------------------------
 Successfully finished run 
---------------------------




In [84]:
run_details

{'entity': {'asset': {'deployment': {'id': '66c2d7c6-3e9d-4bd2-95ce-61c883f429e0',
    'name': 'AIOS Spark German Risk Deployment - Final'},
   'id': '0f1d511c-b121-44d1-b3f7-c3ad258211f5',
   'name': 'AIOS Spark German Risk Model - Final',
   'type': 'numeric_categorical'},
  'contrastive_explanation': {'pertinent_positive_features': [{'feature_name': 'CurrentResidenceDuration',
     'feature_value': '-7.16667',
     'importance': '9.113266364760246'},
    {'feature_name': 'Sex',
     'feature_value': 'male',
     'importance': '3.796711274823613'},
    {'feature_name': 'InstallmentPlans',
     'feature_value': 'none',
     'importance': '2.8014340688507895'},
    {'feature_name': 'Dependents',
     'feature_value': '1.76291',
     'importance': '2.057369135154447'},
    {'feature_name': 'OthersOnLoan',
     'feature_value': 'none',
     'importance': '1.7079714425356278'},
    {'feature_name': 'ForeignWorker',
     'feature_value': 'yes',
     'importance': '1.3084195873798594'},
   

## Congratulations!

You have finished the hands-on lab for IBM Watson OpenScale. You can now view the [OpenScale Dashboard](https://aiopenscale.cloud.ibm.com/aiopenscale/). Click on the tile for the AIOS German Credit model to see fairness, accuracy, and performance monitors. Click on the timeseries graph to get detailed information on transactions during a specific time window.

## Next steps

OpenScale shows model performance over time. You have two options to keep data flowing to your OpenScale graphs:
  * Download, configure and schedule the [model feed notebook](https://raw.githubusercontent.com/emartensibm/german-credit/master/german_credit_scoring_feed.ipynb). This notebook can be set up with your WML credentials, and scheduled to provide a consistent flow of scoring requests to your model, which will appear in your OpenScale monitors.
  * Re-run this notebook. Running this notebook from the beginning will delete and re-create the model and deployment, and re-create the historical data. Please note that the payload and measurement logs for the previous deployment will continue to be stored in your datamart, and cal be deleted if necessary.