# Infuse Applications with AI Using IBM Watson OpenScale

The following notebook is intended for use with the Watson OpenScale hands-on lab found [here](https://dtelink). It contains instructions and data for training and deploying an insurance fraud prediction model, and configuring Watson OpenScale to monitor and provide detailed explanations for that model's predictions.

This notebook should be run in a Watson Studio project, using a Python 3.5 or above runtime environment. If you are viewing this in Watson Studio and do not see Python 3.5 or above in the upper right corner of your screen, please update the runtime now. The notebook requires service credentials for the following Cloud services:

* __IBM Watson OpenScale__
* __Watson Machine Learning__

If you have a paid Cloud account, you may also provision a __Databases for PostgreSQL__ or __Db2 Warehouse__ service to take full advantage of integration with Watson Studio and continuous learning services. If you choose not to provision this paid service, you can use the free internal PostgreSQL storage with OpenScale, but will not be able to configure continuous learning for your model.

## Install packages

In [None]:
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1
!pip install --upgrade watson-machine-learning-client | tail -n 1
!pip install --upgrade numpy --no-cache | tail -n 1
!pip install --upgrade SciPy --no-cache | tail -n 1
!pip install lime --no-cache | tail -n 1
!pip install 'scikit-learn==0.19.1' --force-reinstall

In [None]:
import sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import numpy as np
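# sklearn.cross_validation is deprecated in 0.19 and removed in 0.20; model_selection is the supported module
from sklearn.model_selection import train_test_split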
from scipy.io import arff
from watson_machine_learning_client import WatsonMachineLearningAPIClient

## Provision services and configure credentials

In this section, you will add your credentials for Watson Machine Learning and OpenScale. If you have not already, provision an instance of IBM Watson OpenScale using the [OpenScale link in the Cloud catalog](https://cloud.ibm.com/catalog/services/watson-openscale).

Your Cloud API key can be generated by going to the [__Users__ section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the __API Keys__ section, and click __Create an IBM Cloud API key__. Give your key a name and click __Create__, then copy the created key and paste it between the single quotes in the cell below.

In [None]:
CLOUD_API_KEY = '__PASTE_HERE___'
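Pasting the key directly into a cell saves it in the notebook file, where it is easy to share by accident. One alternative, sketched below under the assumption that you have exported the key yourself in an environment variable named `CLOUD_API_KEY` (an illustrative name that neither Watson Studio nor the IBM clients set or read automatically), is to read it at runtime and fall back to any pasted value:

```python
import os

def load_api_key(var_name="CLOUD_API_KEY", fallback=None):
    """Return the API key from the environment, or `fallback` if it is unset or blank.

    `var_name` is an illustrative assumption; nothing sets this variable for you.
    """
    value = os.environ.get(var_name, "").strip()
    return value if value else fallback

# Prefer the environment variable; otherwise keep whatever was pasted in the cell above.
CLOUD_API_KEY = load_api_key(fallback=globals().get("CLOUD_API_KEY"))
```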

Next you will need credentials for Watson Machine Learning. If you already have a WML instance, you may use credentials for it. To provision a new Lite instance of WML, use the [Cloud catalog](https://cloud.ibm.com/catalog/services/machine-learning), give your service a name, and click __Create__. Once your instance is created, click the __Service Credentials__ link on the left side of the screen. Click the __New credential__ button, give your credentials a name, and click __Add__. Your new credentials can be accessed by clicking the __View credentials__ button. Copy and paste your WML credentials into the cell below.

In [None]:
WML_CREDENTIALS = {
    "apikey": "key",
    "iam_apikey_description": "description",
    "iam_apikey_name": "auto-generated-apikey",
    "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
    "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::",
    "instance_id": "instance_id",
    "password": "password",
    "url": "https://us-south.ml.cloud.ibm.com",
    "username": "username"
}
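Before creating any clients, it can help to catch an unedited template. The sketch below assumes that `apikey`, `instance_id`, and `url` are the fields that matter most (an assumption based on the template above, not a documented requirement of the WML client) and flags empty values or leftover placeholders. `check_wml_credentials` is a hypothetical helper:

```python
# Assumed minimum set of keys, taken from the template cell above.
REQUIRED_WML_KEYS = ("apikey", "instance_id", "url")
# Placeholder values that appear in the template cell above.
TEMPLATE_PLACEHOLDERS = {"key", "instance_id", "password", "username"}

def check_wml_credentials(creds, required=REQUIRED_WML_KEYS):
    """Return a list of problems found in a WML credentials dict (empty if none)."""
    problems = []
    for key in required:
        value = creds.get(key)
        if not value:
            problems.append("missing value for '{}'".format(key))
        elif value in TEMPLATE_PLACEHOLDERS:
            problems.append("'{}' still holds the template placeholder".format(key))
    url = creds.get("url", "")
    if url and not url.startswith("https://"):
        problems.append("'url' should start with https://")
    return problems
```

Running `check_wml_credentials(WML_CREDENTIALS)` should return an empty list once real credentials have been pasted in.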

This tutorial can use Databases for PostgreSQL, Db2 Warehouse, or a free internal version of PostgreSQL to create a datamart for OpenScale. The free internal version can be accessed via the OpenScale APIs, but you will be unable to access it using direct database queries.

If you have previously configured OpenScale, it will use your existing datamart, and not interfere with any models you are currently monitoring. Do not update the cell below.

If you do not have a paid Cloud account or would prefer not to provision this paid service, you may use the free internal PostgreSQL service with OpenScale. Do not update the cell below.

To provision a new instance of Db2 Warehouse, locate [Db2 Warehouse in the Cloud catalog](https://cloud.ibm.com/catalog/services/db2-warehouse), give your service a name, and click __Create__. Once your instance is created, click the __Service Credentials__ link on the left side of the screen. Click the __New credential__ button, give your credentials a name, and click __Add__. Your new credentials can be accessed by clicking the __View credentials__ button. Copy your Db2 Warehouse credentials and paste them into the cell below, replacing `None`.

To provision a new instance of Databases for PostgreSQL, locate [Databases for PostgreSQL](https://cloud.ibm.com/catalog/services/databases-for-postgresql) in the Cloud catalog, give your service a name, and click __Create__. Once your instance is created, click the __Service Credentials__ link on the left side of the screen. Click the __New credential__ button, give your credentials a name, and click __Add__. Your new credentials can be accessed by clicking the __View credentials__ button. Copy your Databases for PostgreSQL credentials and paste them into the cell below, replacing `None`.

In [None]:
DB_CREDENTIALS = None

## Restart the kernel and run the notebook

At this point, the notebook is ready to run. _You must restart the kernel via the kernel menu above_. You can either restart the kernel and run the cells one at a time, starting from the package installation, or click the __Kernel__ option above and select __Restart and Run All__ to run all the cells.

### Get the training data from GitHub

In [None]:
# -f avoids an error on the first run, when the file does not exist yet
!rm -f training_data.csv
!wget https://raw.githubusercontent.com/emartensibm/openscale_insurance/master/data/training_data.csv

### Explore the data

The training data contains information on auto insurance claims that may indicate a higher likelihood of fraudulent claims. In this case, we have a set of binary variables for the following:
* __SUSPICIOUS\_CLAIM\_TIME__: The claim was filed after too much time had elapsed following the incident
* __EXPIRED\_LICENSE__: The person filing the claim did not have a valid driver's license at the time of the incident
* __LOW\_MILES\_AT\_LOSS__: The vehicle's mileage at the time of loss was lower than expected
* __EXCESSIVE\_CLAIM\_AMOUNT__: The dollar amount claimed was higher than expected given the value of the vehicle
* __TOO\_MANY\_CLAIMS__: The person filing the claim has multiple claims outstanding
* __NO\_POLICE__: No police report was filed for the loss incident

In [None]:
features = ["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE", "FLAG_FOR_FRAUD_INV"]
df_model = pd.read_csv('training_data.csv')

# Drop identifier columns, which carry no predictive signal
df_model.drop(["DRIVER_ID", "POLICY_ID", "CLAIM_ID", "HOUSEHOLD_ID", "ZIPCODE"], axis=1, inplace=True)

# Convert the boolean flags and the label to 0/1 integers in one step
df_model[features] = df_model[features].astype(int)

df_model.head()

Identify the training data columns and label columns, and set up a train/test split of 80/20.

In [None]:
xVar = df_model[["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE"]]
yVar = df_model["FLAG_FOR_FRAUD_INV"]

x_train, x_test, y_train, y_test = train_test_split(xVar, yVar, test_size=0.2)
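For intuition, an 80/20 split amounts to shuffling the row positions and holding out the first fifth. A stdlib sketch of that idea follows. `split_indices` is a hypothetical helper; it rounds the test size up the way scikit-learn's `train_test_split` does for a fractional `test_size`, but it does not reproduce scikit-learn's exact shuffle.

```python
import math
import random

def split_indices(n_rows, test_size=0.2, seed=None):
    """Shuffle row indices and split them into (train, test) lists.

    The test set gets ceil(test_size * n_rows) rows.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * n_rows)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(10, test_size=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 8 2
```

Because the cell above passes no `random_state`, its split (and the confusion matrix below) changes on every run; a fixed seed, like `seed` here, makes a split repeatable.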

Create a scikit-learn Random Forest Classifier and fit the training data.

In [None]:
model = RandomForestClassifier(n_jobs=2, random_state=0)
model.fit(x_train, y_train)
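A scikit-learn random forest combines its trees by averaging their predicted class probabilities and returning the class with the highest average. The combination step alone is sketched below with made-up per-tree probabilities; `forest_predict` is a hypothetical helper, and training the trees is out of scope.

```python
def forest_predict(tree_probabilities):
    """Combine per-tree class probabilities the way a random forest does.

    tree_probabilities: one [p(class 0), p(class 1)] list per tree.
    Returns (predicted_class, averaged_probabilities).
    """
    n_trees = len(tree_probabilities)
    n_classes = len(tree_probabilities[0])
    averaged = [sum(tree[c] for tree in tree_probabilities) / n_trees
                for c in range(n_classes)]
    # Highest average wins; on a tie the lower class index is returned.
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged

# Three hypothetical trees: two lean towards fraud (1), one does not.
label, probs = forest_predict([[0.2, 0.8], [0.3, 0.7], [0.9, 0.1]])
print(label)  # 1
```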

Evaluate the model on the test data. For this model, an output of 1 indicates likely fraud, and an output of 0 indicates that fraud is unlikely.

In [None]:
predict_result = model.predict(x_test)
pd.crosstab(y_test, predict_result, rownames = ["Actual Result"], colnames = ["Predicted Result"])
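The crosstab above is a confusion matrix: rows are actual outcomes and columns are predictions. The sketch below derives accuracy, precision, and recall for the fraud class from the same counts. The helper names are hypothetical; scikit-learn's `sklearn.metrics` module offers equivalent functions.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count (actual, predicted) pairs for a binary 0/1 classifier."""
    pairs = Counter(zip(actual, predicted))
    return {"tn": pairs[(0, 0)], "fp": pairs[(0, 1)],
            "fn": pairs[(1, 0)], "tp": pairs[(1, 1)]}

def summary_metrics(c):
    """Accuracy, plus precision and recall for the fraud (1) class."""
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / (c["tp"] + c["fp"]) if (c["tp"] + c["fp"]) else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

counts = confusion_counts([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(counts["tp"], counts["fp"])  # 2 1
```

In the notebook, `summary_metrics(confusion_counts(y_test, predict_result))` applies the same calculation to the test split.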

## Store the model in Watson Machine Learning

In this section, the notebook uses the supplied Watson Machine Learning credentials to save the model to the WML instance. Previous versions of the model are removed so that the notebook can be run again, resetting all data for another demo.

In [None]:
wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)

In [None]:
wml_client.repository.list_models()

In [None]:
MODEL_NAME = "SKLearn Fraud Prediction"
DEPLOYMENT_NAME = "SKLearn Fraud Deployment"

In [None]:
model_deployment_ids = wml_client.deployments.get_uids()
for deployment_id in model_deployment_ids:
    deployment = wml_client.deployments.get_details(deployment_id)
    model_id = deployment['entity']['deployable_asset']['guid']
    if deployment['entity']['name'] == DEPLOYMENT_NAME:
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()
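The cleanup cell above finds and deletes matching deployments in one pass. If you want to preview what would be removed first, the matching step can be separated out as a pure function. `stale_artifacts` is a hypothetical helper; the dictionary paths it reads are the same ones used elsewhere in this notebook, and the example input is made up.

```python
def stale_artifacts(deployments, deployment_name):
    """Return (deployment_id, model_id) pairs for deployments with the given name.

    Each item is a deployment details dict with
    ['metadata']['guid'], ['entity']['name'] and
    ['entity']['deployable_asset']['guid'].
    """
    matches = []
    for d in deployments:
        if d["entity"]["name"] == deployment_name:
            matches.append((d["metadata"]["guid"],
                            d["entity"]["deployable_asset"]["guid"]))
    return matches

# Made-up details for illustration only.
example = [
    {"metadata": {"guid": "d1"},
     "entity": {"name": "SKLearn Fraud Deployment", "deployable_asset": {"guid": "m1"}}},
    {"metadata": {"guid": "d2"},
     "entity": {"name": "Other Deployment", "deployable_asset": {"guid": "m2"}}},
]
print(stale_artifacts(example, "SKLearn Fraud Deployment"))  # [('d1', 'm1')]
```

Against the live service, `stale_artifacts(wml_client.deployments.get_details()['resources'], DEPLOYMENT_NAME)` lists candidates without deleting anything.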

In [None]:
model_props = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.EVALUATION_METHOD: "binary",
    wml_client.repository.ModelMetaNames.FRAMEWORK_NAME: "scikit-learn",
    wml_client.repository.ModelMetaNames.FRAMEWORK_VERSION: "0.19",
    wml_client.repository.ModelMetaNames.RUNTIME_NAME: "python",
    wml_client.repository.ModelMetaNames.RUNTIME_VERSION: "3.5"
}

df_train = df_model.copy()
df_train.drop("FLAG_FOR_FRAUD_INV", axis=1, inplace=True)

In [None]:
df_train.head()

In [None]:
wml_models = wml_client.repository.get_details()
model_uid = None
for model_in in wml_models['models']['resources']:
    if MODEL_NAME == model_in['entity']['name']:
        model_uid = model_in['metadata']['guid']
        break

if model_uid is None:
    print("Storing model ...")

    published_model_details = wml_client.repository.store_model(model=model, meta_props=model_props, training_data=df_train, training_target=df_model["FLAG_FOR_FRAUD_INV"])
    model_uid = wml_client.repository.get_model_uid(published_model_details)
    print("Done")

## Deploy the model

In this section, the model is deployed as a web service.

In [None]:
wml_deployments = wml_client.deployments.get_details()
deployment_uid = None
for deployment in wml_deployments['resources']:
    if DEPLOYMENT_NAME == deployment['entity']['name']:
        deployment_uid = deployment['metadata']['guid']
        break

if deployment_uid is None:
    print("Deploying model...")

    deployment = wml_client.deployments.create(artifact_uid=model_uid, name=DEPLOYMENT_NAME, asynchronous=False)
    deployment_uid = wml_client.deployments.get_uid(deployment)
    
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))

The deployed model is available as a web service, and can be called via the scoring endpoint. Values are passed and predictions are returned as JSON objects.

In [None]:
scoring_endpoint = None

for deployment in wml_client.deployments.get_details()['resources']:
    # Match the exact deployment ID rather than a substring of it
    if deployment['metadata']['guid'] == deployment_uid:
        scoring_endpoint = deployment['entity']['scoring_url']
        break

print(scoring_endpoint)

In [None]:
fields = ["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE"]
values = [[0,1,0,1,0,1]]
payload_scoring = {"fields": fields,"values": values}
scoring_response = wml_client.deployments.score(scoring_endpoint, payload_scoring)
print(scoring_response)
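The scoring response pairs a list of `fields` with rows of `values`, the same shape as the request payload, so zipping the two gives one dictionary per scored record. The sketch assumes the response includes `prediction` and `probability` fields (the names this notebook later gives OpenScale as `prediction_column` and `probability_column`); check the printed response above for the exact fields your deployment returns. `response_to_records` is a hypothetical helper.

```python
def response_to_records(response):
    """Zip a {'fields': [...], 'values': [[...], ...]} response into one dict per row."""
    fields = response["fields"]
    return [dict(zip(fields, row)) for row in response["values"]]

# Made-up, shortened response for illustration only.
example = {
    "fields": ["NO_POLICE", "prediction", "probability"],
    "values": [[1, 1, [0.3, 0.7]]],
}
records = response_to_records(example)
print(records[0]["prediction"], records[0]["probability"][1])  # 1 0.7
```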

## Configure OpenScale

We will now configure Watson OpenScale to monitor the deployed model. When this step is finished, all data into and out of the model will be logged, and can be made available to our applications via the Python API. Additionally, we will have the ability to generate explanations for individual predictions.

In [None]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *
from ibm_ai_openscale.supporting_classes import PayloadRecord, Feature
from ibm_ai_openscale.supporting_classes.enums import *

Get the unique identifier for the OpenScale instance.

In [None]:
import requests
from ibm_ai_openscale.utils import get_instance_guid

WOS_GUID = get_instance_guid(api_key=CLOUD_API_KEY)
WOS_CREDENTIALS = {
    "instance_guid": WOS_GUID,
    "apikey": CLOUD_API_KEY,
    "url": "https://api.aiopenscale.cloud.ibm.com"
}

if WOS_GUID is None:
    print('Watson OpenScale GUID NOT FOUND')
else:
    print(WOS_GUID)

Create the OpenScale client.

In [None]:
ai_client = APIClient(aios_credentials=WOS_CREDENTIALS)
ai_client.version

The code below creates the OpenScale datamart, a database in which OpenScale will store its data. If you have already set up OpenScale, it will use your existing datamart and not remove any previous data. If you specified Db2 Warehouse or Databases for PostgreSQL credentials above, it will use those credentials to create a datamart with that paid service. Finally, if you have not previously used OpenScale and did not supply credentials for a paid database service, it will create the datamart in a free, internal database. This internal database still allows access via the OpenScale APIs, but you cannot access it directly via database queries.

In [None]:
try:
    data_mart_details = ai_client.data_mart.get_details()
    if 'internal_database' in data_mart_details and data_mart_details['internal_database']:
        print('Using existing internal datamart.') 
    else:
        print('Using existing external datamart')
except Exception:
    if DB_CREDENTIALS is None:
        print('Setting up internal datamart')
        ai_client.data_mart.setup(internal_db=True)
    else:
        print('Setting up external datamart')
        try:
            ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS)
        except Exception:
            print('Setup failed, trying Db2 setup')
            ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS, schema=DB_CREDENTIALS['username'])

Bind the OpenScale instance to the Watson Machine Learning instance. If you have already set up OpenScale, this cell will generate a warning that the binding already exists.

In [None]:
binding_uid = ai_client.data_mart.bindings.add('WML instance', WatsonMachineLearningInstance(WML_CREDENTIALS))
if binding_uid is None:
    binding_uid = ai_client.data_mart.bindings.get_details()['service_bindings'][0]['metadata']['guid']
bindings_details = ai_client.data_mart.bindings.get_details()
ai_client.data_mart.bindings.list()

In [None]:
ai_client.data_mart.bindings.list_assets()

The cells below will delete any existing OpenScale subscriptions for this particular model, ensuring that we have the most up-to-date model version. They will then create a new subscription for monitoring the new model.

In [None]:
subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
for subscription in subscriptions_uids:
    sub_name = ai_client.data_mart.subscriptions.get_details(subscription)['entity']['asset']['name']
    if sub_name == MODEL_NAME:
        ai_client.data_mart.subscriptions.delete(subscription)
        print('Deleted existing subscription for', MODEL_NAME)

In [None]:
subscription = ai_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    model_uid,
    problem_type=ProblemType.BINARY_CLASSIFICATION,
    input_data_type=InputDataType.STRUCTURED,
    label_column='FLAG_FOR_FRAUD_INV',
    prediction_column='prediction',
    probability_column='probability',
    feature_columns = ["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE"],
    categorical_columns = []
))

In [None]:
ai_client.data_mart.subscriptions.list()

Now that the datamart and subscription have been created, we need to send some sample data to the model for scoring so that OpenScale can create the correct schema for the payload logging table that will store our prediction history. We will also use these two records for the explanations at the end of the notebook.

Note that we specify a transaction ID for the scoring request; this simulates a request coming from a user app, where the transaction ID matches the unique ID for the insurance claim, and will allow us to tie the prediction and explanation with a particular claim.

In [None]:
fields = ["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE"]
values = [[0,1,0,0,0,1]]
payload_scoring = {"fields": fields,"values": values}
scoring_response = wml_client.deployments.score(scoring_endpoint, payload_scoring, transaction_id='A2018MV533')
print(scoring_response)

In [None]:
fields = ["SUSPICIOUS_CLAIM_TIME", "EXPIRED_LICENSE", "LOW_MILES_AT_LOSS", "EXCESSIVE_CLAIM_AMOUNT", "TOO_MANY_CLAIMS", "NO_POLICE"]
values = [[0,0,0,1,1,0]]
payload_scoring = {"fields": fields,"values": values}
scoring_response = wml_client.deployments.score(scoring_endpoint, payload_scoring, transaction_id='A2016CA740')
print(scoring_response)
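The explanation requests at the end of this notebook refer to these records as `A2018MV533-1` and `A2016CA740-1`, that is, the transaction ID followed by the record's position in the request. The hypothetical helper below builds IDs under that convention, which is inferred from those calls rather than taken from OpenScale documentation:

```python
def expected_scoring_ids(transaction_id, n_records):
    """Scoring IDs assumed for the records of one request: '<transaction_id>-<n>'.

    Numbering from 1 matches the '-1' IDs passed to the explanation
    service in this notebook; treat the convention as an assumption.
    """
    return ["{}-{}".format(transaction_id, i) for i in range(1, n_records + 1)]

print(expected_scoring_ids("A2018MV533", 1))  # ['A2018MV533-1']
```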

Before we begin enabling OpenScale monitors, we pause for ten seconds to allow OpenScale time to create the schema in the prediction logging table in the datamart. We then enable quality monitoring, specifying an alert threshold of 0.7 and a minimum records threshold of 50. These settings mean that OpenScale will use a minimum of 50 records to calculate model quality, and alert us if the quality value falls below 70%.

In [None]:
import time

# Give OpenScale time to create the payload logging schema
time.sleep(10)
subscription.quality_monitoring.enable(threshold=0.7, min_records=50)
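The two settings interact as sketched below, assuming, per the description above, that no quality value is evaluated until 50 records are available and that an alert fires only when the value is strictly below 0.7. `quality_status` is a hypothetical illustration and is not part of the OpenScale client:

```python
def quality_status(metric_value, n_records, threshold=0.7, min_records=50):
    """Classify one quality evaluation under the settings enabled above."""
    if n_records < min_records:
        return "waiting for data"  # too few records to compute quality
    if metric_value < threshold:
        return "alert"             # quality fell below the 70% threshold
    return "ok"

print(quality_status(0.90, 20))  # waiting for data
print(quality_status(0.65, 80))  # alert
print(quality_status(0.82, 80))  # ok
```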

The next cell enables the explanation service in OpenScale, passing in the training data so that OpenScale can compute the training data statistics it needs to generate explanations.

In [None]:
from ibm_ai_openscale.supporting_classes import *

subscription.explainability.enable(training_data=df_model)

With everything now configured, we can use the OpenScale Python client to retrieve the contents of the payload logging table as a pandas DataFrame.

In [None]:
pandas_table_content = subscription.payload_logging.get_table_content()

In [None]:
pandas_table_content

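The next two cells call the explanation service on our transactions. The IDs passed to it are the transaction IDs we supplied when scoring, with a `-1` suffix that identifies the first (and here, only) record of each scoring request. Each explanation should take between 30 and 60 seconds to run. Explanations can also be run in background mode, but here we run them in the foreground so that the results can be displayed in the notebook.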

Once the explanation service has evaluated a prediction, the data is saved in the OpenScale datamart and can be accessed without re-running the service.

In [None]:
explain_run = subscription.explainability.run(transaction_id='A2018MV533-1', background_mode=False)

In [None]:
pd.DataFrame.from_dict(explain_run['entity']['predictions'][0]['explanation_features'])

In [None]:
explain_run = subscription.explainability.run(transaction_id='A2016CA740-1', background_mode=False)

In [None]:
pd.DataFrame.from_dict(explain_run['entity']['predictions'][0]['explanation_features'])
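Explanation weights are easier to read when sorted by magnitude, since a large negative weight pushes the prediction as strongly in the opposite direction as a large positive one. The sketch below ranks the entries of `explanation_features`. The `feature_name` and `weight` keys are assumptions, so compare them with the DataFrame columns displayed above; `rank_features` is hypothetical and the example values are made up.

```python
def rank_features(explanation_features, name_key="feature_name", weight_key="weight"):
    """Sort explanation entries by absolute weight, largest first.

    Key names are assumptions; adjust them to match your OpenScale output.
    """
    ranked = sorted(explanation_features,
                    key=lambda item: abs(item[weight_key]),
                    reverse=True)
    return [(item[name_key], item[weight_key]) for item in ranked]

# Made-up entries for illustration only; real values come from explain_run.
example = [
    {"feature_name": "NO_POLICE", "weight": 0.41},
    {"feature_name": "EXPIRED_LICENSE", "weight": -0.12},
    {"feature_name": "TOO_MANY_CLAIMS", "weight": 0.27},
]
for name, weight in rank_features(example):
    print(name, weight)
```

If the key names match, `rank_features(explain_run['entity']['predictions'][0]['explanation_features'])` applies it to the most recent explanation.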

## Next Steps

Congratulations, you have successfully run the notebook. Please return to the tutorial for instructions on setting up the Flask web application that accesses the data created here and makes it available to users.