# Infuse Applications with AI Using IBM Watson OpenScale

The following notebook is intended for use with the Watson OpenScale hands-on lab found [here](https://dtelink). It contains instructions and data for training and deploying an insurance fraud prediction model, and configuring Watson OpenScale to monitor and provide detailed explanations for that model's predictions.

This notebook should be run in a Watson Studio project, using a Python 3.6 or above runtime environment. If you are viewing this in Watson Studio and do not see Python 3.6 or above in the upper right corner of your screen, please update the runtime now. It requires the following Cloud services:

* __IBM Watson OpenScale__
* __Watson Machine Learning__

If you have a paid Cloud account, you may also provision a __Databases for PostgreSQL__ or __Db2 Warehouse__ service to take full advantage of integration with Watson Studio and continuous learning services. If you choose not to provision one of these paid services, you can use the free internal PostgreSQL storage with OpenScale, but you will not be able to configure continuous learning for your model.

## Install packages

In [None]:
!pip install --upgrade ibm-watson-openscale --no-cache | tail -n 1

In [None]:
import sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import numpy as np
from sklearn.model_selection import train_test_split
from scipy.io import arff

## Provision services and configure credentials

If you have not already, provision instances of [IBM Watson OpenScale](https://cloud.ibm.com/catalog/services/watson-openscale) and [Watson Machine Learning](https://cloud.ibm.com/catalog/services/machine-learning). The free lite versions of each plan will work for this tutorial.

Your Cloud API key can be generated by going to the [__Users__ section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the __API Keys__ section, and click __Create an IBM Cloud API key__. Give your key a name and click __Create__, then copy the created key and paste it between the single quotes in the cell below.

In [None]:
CLOUD_API_KEY = '___PASTE_API_KEY_HERE____'

In [None]:
WML_CREDENTIALS = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": CLOUD_API_KEY
}

## Create a deployment space

All deployed models require a deployment space. Go to the [Deployment Spaces Dashboard](https://dataplatform.cloud.ibm.com/ml-runtime/spaces?context=cpdaas) to create a new space, or choose an existing one. Click on the name of the space, then go to the __Settings__ tab. Locate the __Space ID__ and then click the icon to copy the ID to your clipboard. Paste your space ID between the quotation marks below.

In [None]:
SPACE_ID = '___PASTE_SPACE_ID_HERE___'

## Database Credentials

This tutorial can use Databases for PostgreSQL, Db2 Warehouse, or a free internal version of PostgreSQL to create a datamart for OpenScale. The free internal version can be accessed via the OpenScale APIs, but you will be unable to access it using direct database queries.

If you have previously configured OpenScale, it will use your existing datamart, and not interfere with any models you are currently monitoring. Do not update the cell below.

If you do not have a paid Cloud account or would prefer not to provision this paid service, you may use the free internal PostgreSQL service with OpenScale. Do not update the cell below.

To provision a new instance of Db2 Warehouse, locate [Db2 Warehouse in the Cloud catalog](https://cloud.ibm.com/catalog/services/db2-warehouse), give your service a name, and click __Create__. Once your instance is created, click the __Service Credentials__ link on the left side of the screen. Click the __New credential__ button, give your credentials a name, and click __Add__. Your new credentials can be accessed by clicking the __View credentials__ button. Copy and paste your Db2 Warehouse credentials into the cell below.

To provision a new instance of Databases for PostgreSQL, locate [Databases for PostgreSQL](https://cloud.ibm.com/catalog/services/databases-for-postgresql) in the Cloud catalog, give your service a name, and click __Create__. Once your instance is created, click the __Service Credentials__ link on the left side of the screen. Click the __New credential__ button, give your credentials a name, and click __Add__. Your new credentials can be accessed by clicking the __View credentials__ button. Copy and paste your Databases for PostgreSQL credentials into the cell below.

In [None]:
DB_CREDENTIALS = None
# Schema for the external datamart; only needed when DB_CREDENTIALS is set.
SCHEMA_NAME = None

## Restart the kernel and run the notebook

At this point, the notebook is ready to run. _You must restart the kernel via the kernel menu above_. You can either restart the kernel and run the cells one at a time, starting from the package installation, or click the __Kernel__ option above and select __Restart and Run All__ to run all the cells.

In [None]:
MODEL_NAME = "SKLearn Fraud Prediction"
DEPLOYMENT_NAME = "SKLearn Fraud Deployment"

### Connect to OpenScale

In [None]:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient

service_credentials = {
    "apikey": CLOUD_API_KEY,
    "url": "https://api.aiopenscale.cloud.ibm.com"
}

authenticator = IAMAuthenticator(apikey=service_credentials['apikey'])

wos_client = APIClient(authenticator=authenticator)
wos_client.version

### Delete any existing subscriptions to this model in OpenScale

In [None]:
subscriptions = wos_client.subscriptions.list().result.subscriptions
for subscription in subscriptions:
    if subscription.entity.asset.name == MODEL_NAME:
        print("Deleting existing subscription for model", subscription.entity.asset.name)
        wos_client.subscriptions.delete(subscription.metadata.id)

### Get the training data from github

In [None]:
!rm -f training_data.csv
!wget https://raw.githubusercontent.com/emartensibm/openscale_insurance/master/data/training_data.csv

### Explore the data

The training data describes auto insurance claims, with a set of indicators that may signal a higher likelihood of fraud. The features are the following binary variables:
* __SUSPICIOUS\_CLAIM\_TIME__: The claim was filed after too much time had elapsed following the incident
* __EXPIRED\_LICENSE__: The person filing the claim did not have a valid driver's license at the time of the incident
* __LOW\_MILES\_AT\_LOSS__: The vehicle's mileage at the time of loss was lower than expected
* __EXCESSIVE\_CLAIM\_AMOUNT__: The dollar amount claimed was higher than expected given the value of the vehicle
* __TOO\_MANY\_CLAIMS__: The person filing the claim has multiple claims outstanding
* __NO\_POLICE__: No police report was filed for the loss incident

In [None]:
features = ["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE", "FLAG_FOR_FRAUD_INV"]
df_model = pd.read_csv('training_data.csv')

df_model.drop(["DRIVER_ID", "POLICY_ID", "CLAIM_ID", "HOUSEHOLD_ID", "ZIPCODE"], axis=1, inplace=True)

df_model["SUSPICIOUS_CLAIM_TIME"] = df_model["SUSPICIOUS_CLAIM_TIME"].astype(int)
df_model["EXPIRED_LICENSE"] = df_model["EXPIRED_LICENSE"].astype(int)
df_model["LOW_MILES_AT_LOSS"] = df_model["LOW_MILES_AT_LOSS"].astype(int)
df_model["EXCESSIVE_CLAIM_AMOUNT"] = df_model["EXCESSIVE_CLAIM_AMOUNT"].astype(int)
df_model["TOO_MANY_CLAIMS"] = df_model["TOO_MANY_CLAIMS"].astype(int)
df_model["NO_POLICE"] = df_model["NO_POLICE"].astype(int)
df_model["FLAG_FOR_FRAUD_INV"] = df_model["FLAG_FOR_FRAUD_INV"].astype(int)

df_model.head()
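
The per-column casts above can also be written as a single dictionary-driven cast. The following is a minimal sketch on a small hypothetical DataFrame; the two rows are made up for illustration and it assumes the flags arrive as booleans, which may not match the actual contents of `training_data.csv`.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from training_data.csv.
sample = pd.DataFrame({
    "SUSPICIOUS_CLAIM_TIME": [True, False],
    "EXPIRED_LICENSE": [False, False],
    "FLAG_FOR_FRAUD_INV": [True, False],
})

# Cast every binary column to int in one call instead of one line per column.
binary_columns = list(sample.columns)
sample = sample.astype({column: int for column in binary_columns})

print(sample.dtypes)
```

The same pattern would apply to `df_model` by listing its seven binary columns in `binary_columns`.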

Identify the training data columns and label columns, and set up a train/test split of 80/20.

In [None]:
xVar = df_model[["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE"]]
yVar = df_model["FLAG_FOR_FRAUD_INV"]

x_train, x_test, y_train, y_test = train_test_split(xVar, yVar, test_size=0.2)

Create a scikit-learn Random Forest Classifier and fit the training data.

In [None]:
model = RandomForestClassifier(n_jobs=2, random_state=0)
model.fit(x_train, y_train)

Check the test data using the model. For this model, an output of 1 indicates likely fraud; an output of 0 indicates unlikely fraud.

In [None]:
predict_result = model.predict(x_test)
pd.crosstab(y_test, predict_result, rownames = ["Actual Result"], colnames = ["Predicted Result"])
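
The cross-tabulation above is a confusion matrix, and precision and recall for the fraud class can be read from it. The sketch below shows the calculation with scikit-learn on hypothetical label arrays (not the notebook's `y_test` and `predict_result`), so the numbers are illustrative only.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 1 = likely fraud, 0 = unlikely fraud.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1, 0, 1]

# Rows are actual classes, columns are predicted classes.
matrix = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = matrix.ravel()

# Precision: share of flagged claims that were actually fraud.
precision = precision_score(y_true, y_pred)
# Recall: share of actual fraud cases that were flagged.
recall = recall_score(y_true, y_pred)

print("TN={} FP={} FN={} TP={}".format(tn, fp, fn, tp))
print("precision={:.2f} recall={:.2f}".format(precision, recall))
```

To apply this to the model, pass `y_test` and `predict_result` in place of the hypothetical lists.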

## Store the model in Watson Machine Learning

In this section, the notebook uses the supplied Watson Machine Learning credentials to save the model to the WML instance. Previous versions of the model are removed so that the notebook can be run again, resetting all data for another demo.

In [None]:
from ibm_watson_machine_learning import APIClient
wml_client = APIClient(WML_CREDENTIALS)

In [None]:
space_details = wml_client.spaces.list()

In [None]:
wml_client.set.default_space(SPACE_ID)

In [None]:
wml_client.repository.list_models()

In [None]:
deployment_details = wml_client.deployments.get_details()
for deployment in deployment_details['resources']:
    if deployment['entity']['name'] == DEPLOYMENT_NAME:
        # Look up the IDs only for deployments created by this notebook.
        deployment_id = deployment['metadata']['id']
        model_id = deployment['entity']['asset']['id']
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()

In [None]:
sw_spec_id = wml_client.software_specifications.get_id_by_name('scikit-learn_0.20-py3.6')
metadata = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.TYPE: 'scikit-learn_0.20',
    wml_client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: sw_spec_id,
}

df_train = df_model.copy()
df_train.drop("FLAG_FOR_FRAUD_INV", axis=1, inplace=True)

In [None]:
df_train.head()

In [None]:
# Name the columns
cols = ["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE"]

# Store the trained model along with its training data, target, and column names.
saved_model = wml_client.repository.store_model(model=model, meta_props=metadata, training_data=df_train,
                                                training_target=df_model['FLAG_FOR_FRAUD_INV'], feature_names=cols,
                                                label_column_names=["FLAG_FOR_FRAUD_INV"])

saved_model

## Deploy the model

In this section, the model is deployed as a web service.

In [None]:
model_uid = saved_model['metadata']['id']
print("Deploying model", model_uid)

meta_props = {
    wml_client.deployments.ConfigurationMetaNames.NAME: DEPLOYMENT_NAME,
    wml_client.deployments.ConfigurationMetaNames.ONLINE: {}
}

deployment = wml_client.deployments.create(artifact_uid=model_uid, meta_props=meta_props)
deployment_uid = wml_client.deployments.get_uid(deployment)

The deployed model is available as a web service, and can be called via the scoring endpoint. Values are passed and predictions are returned as JSON objects.

In [None]:
scoring_endpoint = deployment['entity']['status']['online_url']['url']
scoring_endpoint

In [None]:
scoring_payload = {wml_client.deployments.ScoringMetaNames.INPUT_DATA:
                   [
                       {
                           "fields": ["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE"],
                           "values": [[0,1,0,1,0,1]]
                       }
                    ]
                  }

In [None]:
predictions = wml_client.deployments.score(deployment_uid, scoring_payload)
print(predictions)
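
The scoring response is a JSON object that pairs a list of field names with rows of values. The helper below is a minimal sketch of turning such a response into one dictionary per scored row. The `predictions_to_records` function is a hypothetical helper, and the sample response is an assumption modeled on the `prediction` and `probability` fields used later in this notebook; the actual response from your deployment may contain different fields.

```python
def predictions_to_records(response):
    """Flatten a fields/values scoring response into a list of dicts."""
    parsed = []
    for block in response.get("predictions", []):
        fields = block["fields"]
        for row in block["values"]:
            # Pair each field name with the value in the same position.
            parsed.append(dict(zip(fields, row)))
    return parsed


# Assumed response shape, for illustration only.
sample_response = {
    "predictions": [
        {
            "fields": ["prediction", "probability"],
            "values": [[1, [0.2, 0.8]]],
        }
    ]
}

parsed_rows = predictions_to_records(sample_response)
print(parsed_rows)
```

With a live deployment, `predictions_to_records(predictions)` would be called on the value returned by `wml_client.deployments.score`.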

## Configure OpenScale

We will now configure Watson OpenScale to monitor the deployed model. When this step is finished, all data into and out of the model will be logged, and can be made available to our applications via the Python API. Additionally, we will have the ability to generate explanations for individual predictions.

The code below creates the OpenScale datamart, a database in which OpenScale will store its data. If you have already set up OpenScale, it will use your existing datamart and not remove any previous data. If you specified Db2 Warehouse or Databases for PostgreSQL credentials above, it will use those credentials to create a datamart with that paid service. Finally, if you have not previously used OpenScale and did not supply credentials for a paid database service, it will create the datamart in a free, internal database. This internal database still allows access via the OpenScale APIs, but you cannot access it directly via database queries.

In [None]:
wos_client.data_marts.show()

In [None]:
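# Import the OpenScale helper classes here as well, because the external
# datamart branch below uses them before the later import cell runs.
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *

data_marts = wos_client.data_marts.list().result.data_marts
if len(data_marts) == 0:
    if DB_CREDENTIALS is not None:
        if SCHEMA_NAME is None:
            # Stop here rather than attempting to create a datamart without a schema.
            raise ValueError("Please specify the SCHEMA_NAME and rerun the cell")
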
        print("Setting up external datamart")
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook",
                database_configuration=DatabaseConfigurationRequest(
                  database_type=DatabaseType.POSTGRESQL,
                    credentials=PrimaryStorageCredentialsLong(
                        hostname=DB_CREDENTIALS["connection"]["postgres"]["hosts"][0]["hostname"],
                        username=DB_CREDENTIALS["connection"]["postgres"]["authentication"]["username"],
                        password=DB_CREDENTIALS["connection"]["postgres"]["authentication"]["password"],
                        db=DB_CREDENTIALS["connection"]["postgres"]["database"],
                        port=DB_CREDENTIALS["connection"]["postgres"]["hosts"][0]["port"],
                        ssl=True,
                        sslmode=DB_CREDENTIALS["connection"]["postgres"]["query_options"]["sslmode"],
                        certificate_base64=DB_CREDENTIALS["connection"]["postgres"]["certificate"]["certificate_base64"]
                    ),
                    location=LocationSchemaName(
                        schema_name= SCHEMA_NAME
                    )
                )
             ).result
    else:
        print("Setting up internal datamart")
        added_data_mart_result = wos_client.data_marts.add(
                background_mode=False,
                name="WOS Data Mart",
                description="Data Mart created by WOS tutorial notebook", 
                internal_database = True).result
        
    data_mart_id = added_data_mart_result.metadata.id
    
else:
    data_mart_id=data_marts[0].metadata.id
    print("Using existing datamart {}".format(data_mart_id))

Watson OpenScale allows multiple service providers for the same engine instance. To avoid accumulating duplicate service providers for the WML instance used in this tutorial, the following code deletes any existing service provider with the tutorial's name and then adds a new one.

In [None]:
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *

In [None]:
wos_client.service_providers.show()

In [None]:
service_providers = wos_client.service_providers.list().result.service_providers
for service_provider in service_providers:
    service_instance_name = service_provider.entity.name
    if service_instance_name == "WML instance for OpenScale":
        service_provider_id = service_provider.metadata.id
        wos_client.service_providers.delete(service_provider_id)
        print("Deleted existing service_provider for WML instance: {}".format(service_provider_id))

In [None]:
added_service_provider_result = wos_client.service_providers.add(
        name="WML instance for OpenScale",
        description="Created for OpenScale Insurance tutorial",
        service_type=ServiceTypes.WATSON_MACHINE_LEARNING,
        deployment_space_id = SPACE_ID,
        operational_space_id = "production",
        credentials=WMLCredentialsCloud(
            apikey=CLOUD_API_KEY,
            url='https://us-south.ml.cloud.ibm.com',
            instance_id=None
        ),
        background_mode=False
    ).result
service_provider_id = added_service_provider_result.metadata.id

In [None]:
asset = Asset(
    name = "SKLearn Fraud Prediction",
    asset_id=model_uid,
    url=scoring_endpoint,
    asset_type=AssetTypes.MODEL,
    input_data_type=InputDataType.STRUCTURED,
    problem_type=ProblemType.BINARY_CLASSIFICATION
)
asset_deployment = AssetDeploymentRequest(
    deployment_id=deployment_uid,
    name=DEPLOYMENT_NAME,
    deployment_type=DeploymentTypes.ONLINE,
    url=scoring_endpoint
)
training_data_reference = TrainingDataReference(
    type="cos",
    location=COSTrainingDataReferenceLocation(
        bucket='faststartlab-donotdelete-pr-nhfd4jnhlxgpc7',
        file_name='insurance_fraud_training_data.csv'
    ),
    connection=COSTrainingDataReferenceConnection.from_dict(
        {
            "resource_instance_id": "crn:v1:bluemix:public:cloud-object-storage:global:a/7d8b3c34272c0980d973d3e40be9e9d2:2883ef10-23f1-4592-8582-2f2ef4973639::",
            "url": "https://s3.us.cloud-object-storage.appdomain.cloud",
            "api_key": "yqcPbWZ0AQPHleHVerrR4Wx5e9pymBdMgydbEra5zCif",
            "iam_url": "https://iam.bluemix.net/oidc/token"
        }
    )
)
asset_properties_request = AssetPropertiesRequest(
    label_column="FLAG_FOR_FRAUD_INV",
    probability_fields=["probability"],
    prediction_field="prediction",
    feature_fields=cols,
    categorical_fields=[],
    training_data_reference=training_data_reference
)

In [None]:
subscription_details = wos_client.subscriptions.add(
        data_mart_id=data_mart_id,
        service_provider_id=service_provider_id,
        asset=asset,
        deployment=asset_deployment,
        asset_properties=asset_properties_request).result
subscription_id = subscription_details.metadata.id
print(subscription_details)

In [None]:
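import time

# Give OpenScale a moment to create the payload logging data set.
time.sleep(5)
payload_data_set_id = None
payload_data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING,
                                              target_target_id=subscription_id,
                                              target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
# Only index into the list when it is non-empty, so the check below can report a missing data set.
if payload_data_sets:
    payload_data_set_id = payload_data_sets[0].metadata.id
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id:", payload_data_set_id)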

In [None]:
wos_client.data_sets.show()

In [None]:
wos_client.subscriptions.show()

### Score the model so we can configure monitors
Now that the datamart and subscription have been created, we need to send some sample data to the model for scoring so that OpenScale can create the correct schema for the payload logging table that will store our prediction history. We will also use these two records for explanations.

Note that we specify a customer ID as metadata in the scoring request; this will allow us to tie the prediction and explanation to a particular customer, so we can retrieve the explanation data easily.

In [None]:
fields = ["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE"]
values = [[0,1,0,0,0,1]]
meta = {
    "fields": ["customer_id"],
    "values": [['A2018MV533']]
}

payload_scoring = {"input_data": [{"fields": fields, "values": values, "meta": meta}]}
predictions = wml_client.deployments.score(deployment_uid, payload_scoring)
print(predictions)

In [None]:
payload_scoring

In [None]:
fields = ["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE"]
values = [[0,0,0,1,1,0]]
meta = {
    "fields": ["customer_id"],
    "values": [['A2016CA740']]
}

payload_scoring = {"input_data": [{"fields": fields, "values": values, "meta": meta}]}
predictions = wml_client.deployments.score(deployment_uid, payload_scoring)
print(predictions)

In [None]:
import uuid
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord
time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))

### Enable quality monitoring

Set the minimum feedback data size to 50, and the alert threshold to 70%.

In [None]:
target = Target(
        target_type=TargetTypes.SUBSCRIPTION,
        target_id=subscription_id
)
parameters = {
    "min_feedback_data_size": 50,
    "threshold": 0.7
}
quality_monitor_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=target,
    parameters=parameters
).result

The next cell enables the explanation service in OpenScale.

In [None]:
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
parameters = {
    "enabled": True
}
explainability_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.EXPLAINABILITY.ID,
    target=target,
    parameters=parameters
).result

explainability_monitor_id = explainability_details.metadata.id

The next cells look up the scoring IDs of our two transactions by the customer IDs we provided, then call the explanation service on them. Each explanation should take 30 to 60 seconds to run. Explanations can be run in background mode, but in this case we choose not to so the results can be displayed in the notebook.

Once the explanation service has evaluated a prediction, the data is saved in the OpenScale datamart and can be accessed without re-running the service.

In [None]:
records = wos_client.data_sets.get_list_of_records(data_set_id=payload_data_set_id, offset=0).result
scoring_ids = []
customer_ids = ['A2016CA740','A2018MV533']
for customer in customer_ids:
    for record in records['records']:
        if record["entity"]["values"].get("customer_id") == customer:
            scoring_ids.append(record["entity"]["values"]["scoring_id"])
            break
print(scoring_ids)
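
The nested loop above scans the records once per customer. An alternative is to build a lookup table in a single pass. The sketch below assumes the record layout used in the cell above (`entity` → `values` → `customer_id` / `scoring_id`); the `index_scoring_ids` helper and the scoring IDs in the sample records are hypothetical.

```python
def index_scoring_ids(payload_records):
    """Map each customer_id to the scoring_id of its first payload record."""
    index = {}
    for record in payload_records:
        values = record["entity"]["values"]
        customer_id = values.get("customer_id")
        # Skip records scored without customer metadata; keep the first match.
        if customer_id is not None and customer_id not in index:
            index[customer_id] = values["scoring_id"]
    return index


# Hypothetical payload records, for illustration only.
sample_records = [
    {"entity": {"values": {"customer_id": "A2018MV533", "scoring_id": "hypothetical-id-1"}}},
    {"entity": {"values": {"customer_id": "A2016CA740", "scoring_id": "hypothetical-id-2"}}},
    {"entity": {"values": {"scoring_id": "hypothetical-id-3"}}},
]

lookup = index_scoring_ids(sample_records)
wanted_customers = ["A2016CA740", "A2018MV533"]
matched_scoring_ids = [lookup[c] for c in wanted_customers if c in lookup]
print(matched_scoring_ids)
```

With live data, `index_scoring_ids(records['records'])` would take the place of the sample list.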

In [None]:
print("Running explanations on scoring IDs: {}".format(scoring_ids))
explanation_types = ["lime", "contrastive"]
result = wos_client.monitor_instances.explanation_tasks(scoring_ids=scoring_ids, explanation_types=explanation_types).result
print(result)

In [None]:
task_id = result.metadata.explanation_task_ids[0]
task_id

In [None]:
task_state = 'in_progress'
while task_state == 'in_progress':
    explanation = wos_client.monitor_instances.get_explanation_tasks(task_id).result.to_dict()
    task_state = explanation['entity']['status']['state']
    if task_state == 'finished':
        break
    print(task_state)
    time.sleep(8)
explanation
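
The polling loop above waits indefinitely while the task reports `in_progress`. A variant with a timeout is sketched below. The `wait_until_done` helper is hypothetical and not part of the OpenScale client; it takes any zero-argument function that returns the current state, and the demo uses a canned sequence of states instead of a real explanation task.

```python
import time


def wait_until_done(fetch_state, pending_states=("in_progress",),
                    interval=8.0, timeout=300.0):
    """Poll fetch_state() until it leaves pending_states or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state not in pending_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("Task still pending after {} seconds".format(timeout))
        time.sleep(interval)


# Demo with a canned sequence of states standing in for the service.
canned_states = iter(["in_progress", "in_progress", "finished"])
final_state = wait_until_done(lambda: next(canned_states), interval=0.0, timeout=5.0)
print(final_state)
```

In the notebook, `fetch_state` could wrap the `get_explanation_tasks` call from the cell above and return `explanation['entity']['status']['state']`.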

## Next Steps

Congratulations, you have successfully run the notebook. Please return to the tutorial for instructions on setting up the Flask web application that accesses the data created here and makes it available to users.