# Watson OpenScale Mortgage Default Demo

This notebook should be run in a [Watson Studio](https://dataplatform.ibm.com/) project with Python 3.6 or greater. It requires a free lite version of [Watson Machine Learning](https://cloud.ibm.com/catalog/services/machine-learning).

This notebook will train, save and deploy a machine learning model to predict mortgage defaults. Then, it will configure OpenScale to monitor the model.

## Provision services and create credentials

You will need credentials for Watson Machine Learning. If you already have a WML instance, you may use credentials for it. To provision a new Lite instance of WML, use the [Cloud catalog](https://cloud.ibm.com/catalog/services/machine-learning), give your service a name, and click **Create**. Once your instance is created, click the **Service Credentials** link on the left side of the screen. Click the **New credential** button, give your credentials a name, and click **Add**. Your new credentials can be accessed by clicking the **View credentials** button. Copy and paste your WML credentials into the cell below.

In [None]:
WML_CREDENTIALS = {
    "apikey": "key",
    "iam_apikey_description": "description",
    "iam_apikey_name": "auto-generated-apikey",
    "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
    "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::",
    "instance_id": "instance_id",
    "url": "https://us-south.ml.cloud.ibm.com",
}

You can generate a Cloud API key [here](https://cloud.ibm.com/iam/apikeys).

In [None]:
CLOUD_API_KEY = "xxxxxxxxxxxxxxxxx"
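
Hard-coding a key in a shared notebook makes it easy to leak. As an alternative, the following minimal sketch reads the key from an environment variable instead. The helper `read_secret` and the environment variable name `CLOUD_API_KEY` are assumptions for illustration and are not part of the lab.

```python
import os


def read_secret(name, default=None):
    """Return the environment variable `name`, or `default` if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    return value if value else default


# Hypothetical usage: fall back to the placeholder when the variable is not set.
CLOUD_API_KEY = read_secret("CLOUD_API_KEY", default="xxxxxxxxxxxxxxxxx")
```

If the variable is not defined, the placeholder is kept and the later OpenScale cells will fail to authenticate, which is the same behavior as leaving the cell above unedited.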

If you have already set up an OpenScale datamart, or if you would like to use the free internal PostgreSQL datamart, you can skip the following cell. If you are setting up a new instance of OpenScale and would like to use a paid database service, paste your [Db2](https://cloud.ibm.com/catalog/services/db2-warehouse) or [PostgreSQL](https://cloud.ibm.com/catalog/services/databases-for-postgresql) credentials below.

In [None]:
DB_CREDENTIALS = None

## Name your model

You may give your model and deployment a custom name below; however, if you change the values below, be sure to use the same names in all subsequent notebooks in this lab.

In [None]:
MODEL_NAME = 'Mortgage Default Demo'
DEPLOYMENT_NAME = 'Mortgage Default - Demo'

In [None]:
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1

## Run the notebook

At this point, you can run all cells in this notebook using the menus above.

Import the scikit-learn framework and check the version. This notebook was developed using sklearn version 0.20.3.

In [None]:
import sklearn
sklearn.__version__

Use the credentials provided above to create a new Watson Machine Learning client.

In [None]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)

List all models for this instance of Watson Machine Learning.

In [None]:
wml_client.repository.list_models()

Import the pandas library, then download and examine the training data. The data contains an 'ID' field for the loan ID, which is not used in training the model and is therefore dropped.

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/emartensibm/openscale-demos/master/mortgage-default/data/Mortgage_Full_Records.csv'
df_raw = pd.read_csv(url)
df = df_raw.drop('ID', axis=1)
df.head()

Import the sklearn libraries we need, including encoders, transformers, scalers, and our random forest classifier.

In [None]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

Identify the categorical features, and create a one-hot encoder pipeline for them.

Next, identify the numerical features and use the min-max scaler to rescale each one to the range 0 to 1, so that features measured on very different scales (for example, income and number of cards) become comparable.

Finally, organize the categorical encoder and the scaler into a pipeline so the deployed model can work with our data.

In [None]:
categorical_features = ['AppliedOnline','Residence','Location']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

scaled_features = ['Income','Yrs_at_Current_Address','Yrs_with_Current_Employer',\
                   'Number_of_Cards','Creditcard_Debt','Loan_Amount','SalePrice']
scale_transformer = Pipeline(steps=[('scale', MinMaxScaler())])

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features),
        ('scaler', scale_transformer, scaled_features)
    ]
)

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', RandomForestClassifier())])
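
To make the two transformations above concrete, here is a minimal plain-Python sketch of what min-max scaling and one-hot encoding compute. The helpers `min_max_scale` and `one_hot` and the sample values are made up for illustration; this is not scikit-learn's implementation, only the same arithmetic on a tiny example.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values, categories):
    """Encode each value as a 0/1 vector over `categories`.

    A value that is not in `categories` becomes an all-zero vector, which
    mirrors the effect of handle_unknown='ignore' in the pipeline above.
    """
    return [[1 if v == c else 0 for c in categories] for v in values]


incomes = [30000, 45000, 60000]
print(min_max_scale(incomes))                   # [0.0, 0.5, 1.0]

known = ["NO", "YES"]                           # categories seen during training
print(one_hot(["YES", "NO", "MAYBE"], known))   # [[0, 1], [1, 0], [0, 0]]
```

Note that the scaling bounds and the category list are learned from the training data, which is why the transformers are kept inside the pipeline that gets deployed.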

Perform the train/test split, train the model, and score the model quality.

In [None]:
X = df.drop('MortgageDefault', axis=1)
y = df['MortgageDefault']

X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=4)

model = clf.fit(X_train, y_train)
res_predict = model.predict(X_test)
print("model score: %.3f" % clf.score(X_test, y_test))
print(classification_report(y_test, res_predict, target_names=["False", "True"]))
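
The score printed above is the accuracy, and the classification report adds precision and recall per class. As a rough illustration of how those numbers relate to the raw predictions, the sketch below computes them by hand for one class. The helper `binary_metrics` and the sample labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=True):
    """Compute accuracy, precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels: True means the mortgage defaulted.
actual = [True, False, True, True, False, False]
predicted = [True, False, False, True, True, False]
print(binary_metrics(actual, predicted))
```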

## Save the model to Watson Machine Learning

Check the list of models in the WML instance, and remove pre-existing versions of this model. This allows the notebook to be re-run to reset all data if necessary.

In [None]:
model_deployment_ids = wml_client.deployments.get_uids()
for deployment_id in model_deployment_ids:
    deployment = wml_client.deployments.get_details(deployment_id)
    model_id = deployment['entity']['deployable_asset']['guid']
    if deployment['entity']['name'] == DEPLOYMENT_NAME:
        print('Deleting deployment id', deployment_id)
        wml_client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        wml_client.repository.delete(model_id)
wml_client.repository.list_models()
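
Because the cell above deletes assets, it can help to separate choosing what to delete from the deletion itself, so the list can be reviewed first. The sketch below does only the selection step on plain dictionaries shaped like the deployment details used above (`entity.name` and `entity.deployable_asset.guid`). The helper `find_stale` and the sample IDs are hypothetical.

```python
def find_stale(deployments, deployment_name):
    """Return (deployment_id, model_id) pairs whose deployment name matches.

    `deployments` maps a deployment ID to its details dictionary.
    """
    stale = []
    for deployment_id, details in deployments.items():
        entity = details["entity"]
        if entity["name"] == deployment_name:
            stale.append((deployment_id, entity["deployable_asset"]["guid"]))
    return stale


# Made-up details for two deployments.
sample = {
    "dep-1": {"entity": {"name": "Mortgage Default - Demo",
                         "deployable_asset": {"guid": "model-1"}}},
    "dep-2": {"entity": {"name": "Some Other Deployment",
                         "deployable_asset": {"guid": "model-2"}}},
}
print(find_stale(sample, "Mortgage Default - Demo"))   # [('dep-1', 'model-1')]
```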

Create the metadata and save the model.

In [None]:
metadata = {
    wml_client.repository.ModelMetaNames.NAME: MODEL_NAME,
    wml_client.repository.ModelMetaNames.EVALUATION_METHOD: "binary",
    wml_client.repository.ModelMetaNames.EVALUATION_METRICS: [
        {
            "name": "areaUnderROC",
            "value": 0.7,
            "threshold": 0.7
        }
    ]
}

# Name the columns
cols=["Income","AppliedOnline","Residence","Yrs_at_Current_Address","Yrs_with_Current_Employer",\
      "Number_of_Cards","Creditcard_Debt","Loans","Loan_Amount","SalePrice","Location"]
      
saved_model = wml_client.repository.store_model(model=model, meta_props=metadata, 
                                            training_data=X_train, training_target=y_train, 
                                            feature_names=cols, label_column_names=["MortgageDefault"] )
saved_model

Get the unique ID for the model so we can deploy it.

In [None]:
model_uid = saved_model['metadata']['guid']
model_uid

Deploy the model as a web service with Watson Machine Learning.

In [None]:
print("Deploying model...")

deployment = wml_client.deployments.create(artifact_uid=model_uid, name=DEPLOYMENT_NAME, asynchronous=False)

In [None]:
deployment_uid = wml_client.deployments.get_uid(deployment)

print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))

# OpenScale Mortgage Default Configuration

This portion of the notebook will configure OpenScale monitoring for the mortgage default model using the Python client, as opposed to the graphical user interface.

In [None]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *
from ibm_ai_openscale.supporting_classes import PayloadRecord, Feature
from ibm_ai_openscale.supporting_classes.enums import *

Get the instance ID for Watson OpenScale.

In [None]:
import requests
from ibm_ai_openscale.utils import get_instance_guid

WOS_GUID = get_instance_guid(api_key=CLOUD_API_KEY)
WOS_CREDENTIALS = {
    "instance_guid": WOS_GUID,
    "apikey": CLOUD_API_KEY,
    "url": "https://api.aiopenscale.cloud.ibm.com"
}

if WOS_GUID is None:
    print('Watson OpenScale GUID NOT FOUND')
else:
    print(WOS_GUID)

Use the Cloud API key and WOS instance ID to create a new OpenScale client.

In [None]:
ai_client = APIClient(aios_credentials=WOS_CREDENTIALS)
ai_client.version

Set up the OpenScale datamart. First check for an existing datamart. If none is found, create one using the DB_CREDENTIALS if provided. If no credentials were provided, use the free internal datamart.

In [None]:
try:
    data_mart_details = ai_client.data_mart.get_details()
    if 'internal_database' in data_mart_details and data_mart_details['internal_database']:
        print('Using existing internal datamart')
    else:
        print('Using existing external datamart')
except Exception:
    if DB_CREDENTIALS is None:
        print('Setting up internal datamart')
        ai_client.data_mart.setup(internal_db=True)
    else:
        print('Setting up external datamart')
        try:
            ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS)
        except Exception:
            print('Setup failed, trying Db2 setup')
            ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS, schema=DB_CREDENTIALS['username'])

In [None]:
data_mart_details = ai_client.data_mart.get_details()

Bind the OpenScale datamart to the WML instance. If the binding already exists, this will generate an error message, but will not affect the remainder of the notebook.

In [None]:
binding_uid = ai_client.data_mart.bindings.add('WML Binding', WatsonMachineLearningInstance(WML_CREDENTIALS))
bindings_details = ai_client.data_mart.bindings.get_details()

ai_client.data_mart.bindings.list()

In [None]:
print(binding_uid)

Get the scoring endpoint for the deployed model.

In [None]:
deployment_details = wml_client.deployments.get_details(deployment_uid)
scoring_endpoint = deployment_details['entity']['scoring_url']

print('Model UID:', model_uid)
print('Scoring URL:', scoring_endpoint)

List all the subscribed models.

In [None]:
subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
ai_client.data_mart.subscriptions.list()

The credentials below point to the training data for the model, in CSV format. OpenScale uses the training data to train the drift model, and generate distribution statistics for the explainability service and the fairness monitor. If you don't want to provide this information to OpenScale, it is possible to run a custom notebook to create this data.

In [None]:
cos_credentials = {
    "apikey": "yqcPbWZ0AQPHleHVerrR4Wx5e9pymBdMgydbEra5zCif",
    "api_key": "yqcPbWZ0AQPHleHVerrR4Wx5e9pymBdMgydbEra5zCif",
    "url": "https://s3.us.cloud-object-storage.appdomain.cloud",
    "iam_url": 'https://iam.bluemix.net/oidc/token',
    "cos_hmac_keys": {
        "access_key_id": "2d1be760f19241d695a534960da6eb80",
        "secret_access_key": "e1252b952f47a6b3f42305b8ffe6f9bd7d10e45f966b9a62"
    },
    "endpoints": "https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints",
    "iam_apikey_description": "Auto-generated for key 2d1be760-f192-41d6-95a5-34960da6eb80",
    "iam_apikey_name": "FastStartLab",
    "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Reader",
    "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/7d8b3c34272c0980d973d3e40be9e9d2::serviceid:ServiceId-568ba191-a3bf-48f2-a30c-f3a4af7ec61d",
    "resource_instance_id": "crn:v1:bluemix:public:cloud-object-storage:global:a/7d8b3c34272c0980d973d3e40be9e9d2:2883ef10-23f1-4592-8582-2f2ef4973639::"
}

In [None]:
training_data_reference = {
    'type': 'cos',
    'location': {
        'bucket': 'faststartlab-donotdelete-pr-nhfd4jnhlxgpc7',
        'file_name': 'Mortgage_Full_Records.csv',
        'firstlineheader': True,
        'file_format': 'csv'
    },
    'connection': cos_credentials,
    'name': 'training data reference'
}

Create the subscription in OpenScale so we can monitor the model. Required information includes feature columns, categorical columns, problem types, input types, and output types.

In [None]:
subscription = ai_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    model_uid,
    problem_type=ProblemType.BINARY_CLASSIFICATION,
    input_data_type=InputDataType.STRUCTURED,
    label_column='MortgageDefault',
    prediction_column='prediction',
    probability_column='probability',
    transaction_id_column='ID',
    feature_columns = ['AppliedOnline','Residence','Location','Income','Yrs_at_Current_Address','Yrs_with_Current_Employer',\
                   'Number_of_Cards','Creditcard_Debt','Loan_Amount','Loans','SalePrice'],
    categorical_columns = ['AppliedOnline','Residence','Location'],
    training_data_reference = training_data_reference
))

if subscription is None:
    print('Subscription already exists; get the existing one')
    subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
    for sub in subscriptions_uids:
        if ai_client.data_mart.subscriptions.get_details(sub)['entity']['asset']['name'] == MODEL_NAME:
            subscription = ai_client.data_mart.subscriptions.get(sub)

In [None]:
subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
ai_client.data_mart.subscriptions.list()

In [None]:
subscription_details = subscription.get_details()

Score the model so we can begin configuring OpenScale monitors.

In [None]:
!rm -f mortgage_feed.json
!wget https://raw.githubusercontent.com/emartensibm/openscale-demos/master/mortgage-default/data/mortgage_feed.json

In [None]:
import json

with open('mortgage_feed.json', 'r') as scoring_file:
    data = json.load(scoring_file)

scoring_payload = {
    "fields": data['fields'][1:],
    "values": [],
    "meta":{
        "fields": ["ID"],
        "values": []
    }
}

In [None]:
import random
import string

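digits = string.digits

# Sample 201 records (with replacement) and tag each with a random numeric transaction ID.
for _ in range(0, 201):
    value_to_score = random.choice(data['values'])
    # Drop the first column (the ID), to match the "fields" list in the payload.
    scoring_payload['values'].append(value_to_score[1:])
    scoring_payload['meta']['values'].append([int(''.join(random.choices(digits, k=8)))])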
print(len(scoring_payload['values']))
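
The IDs generated above are random digit strings converted to integers, so an ID can end up shorter than eight digits when it starts with zeros, and two records can receive the same ID. If unique IDs matter, one option is to sample them without replacement. The sketch below assumes the same payload layout as above; the helper `build_payload` and the sample rows are hypothetical.

```python
import random


def build_payload(fields, rows, n, seed=None):
    """Build a scoring payload of `n` sampled rows with unique 8-digit IDs.

    The first column of `fields` and of each row is treated as the ID and dropped.
    """
    rng = random.Random(seed)
    # Sampling without replacement guarantees that no ID is repeated.
    ids = rng.sample(range(10_000_000, 100_000_000), n)
    payload = {
        "fields": fields[1:],
        "values": [],
        "meta": {"fields": ["ID"], "values": []},
    }
    for record_id in ids:
        row = rng.choice(rows)  # rows themselves may repeat
        payload["values"].append(row[1:])
        payload["meta"]["values"].append([record_id])
    return payload


# Made-up data with the ID in the first column.
fields = ["ID", "Income", "AppliedOnline"]
rows = [[1, 30000, "YES"], [2, 45000, "NO"]]
payload = build_payload(fields, rows, n=5, seed=4)
print(len(payload["values"]), payload["fields"])
```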

In [None]:
predictions = wml_client.deployments.score(scoring_endpoint, scoring_payload)
print(predictions['values'][0])

Pause for ten seconds to give OpenScale time to create the datamart schema, then show the number of records in the datamart.

In [None]:
import time

time.sleep(10)
subscription.payload_logging.get_records_count()

Enable quality monitoring. Set the alert threshold at 70%, and the minimum records for scoring at 100.

In [None]:
subscription.quality_monitoring.enable(threshold=0.7, min_records=100)
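
The model metadata saved earlier names `areaUnderROC` as the evaluation metric with a threshold of 0.7. Assuming the quality threshold refers to that same metric, the sketch below shows one common way to compute an area under the ROC curve by hand and compare it with the threshold. The helper `roc_auc` and the sample scores are made up, and OpenScale's own computation may differ.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive is scored higher.
    Ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]              # 1 means the mortgage defaulted
scores = [0.1, 0.4, 0.35, 0.8]     # predicted probability of default
auc = roc_auc(labels, scores)
print(auc)                          # 0.75
print("alert" if auc < 0.7 else "ok")
```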

Enable fairness monitoring. In this case, we'll monitor online applications, since they are under-represented in our training data. The fairness alert threshold will be set at 90%, and we will require at least 200 records for scoring. Note that we have to supply a favourable and unfavourable model outcome, as well as a majority (reference) and minority (monitored) group.

In [None]:
subscription.fairness_monitoring.enable(
    features=[
        Feature("AppliedOnline", majority=['NO'], minority=['YES'], threshold=0.90)
    ],
    favourable_classes=['NO'],
    unfavourable_classes=['YES'],
    min_records=200
)
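
To give a feel for what the 90% threshold means, the sketch below assumes the fairness score is a ratio of favourable-outcome rates: the rate for the monitored (minority) group divided by the rate for the reference (majority) group. This is a simplified illustration with made-up records and hypothetical helper names, and OpenScale's actual calculation may differ.

```python
def favourable_rate(records, group_value, favourable):
    """Share of records in the given AppliedOnline group with a favourable prediction."""
    group = [r for r in records if r["AppliedOnline"] == group_value]
    if not group:
        raise ValueError("no records for group %r" % group_value)
    return sum(1 for r in group if r["prediction"] in favourable) / len(group)


def fairness_ratio(records, majority, minority, favourable):
    """Favourable rate of the minority group relative to the majority group."""
    majority_rate = favourable_rate(records, majority, favourable)
    if majority_rate == 0:
        raise ValueError("majority group has no favourable outcomes")
    return favourable_rate(records, minority, favourable) / majority_rate


# Made-up predictions: 'NO' (no default) is the favourable outcome.
records = (
    [{"AppliedOnline": "NO", "prediction": "NO"}] * 4
    + [{"AppliedOnline": "NO", "prediction": "YES"}] * 1
    + [{"AppliedOnline": "YES", "prediction": "NO"}] * 3
    + [{"AppliedOnline": "YES", "prediction": "YES"}] * 2
)
ratio = fairness_ratio(records, majority="NO", minority="YES", favourable=["NO"])
print("%.2f" % ratio, "alert" if ratio < 0.90 else "ok")
```

In this example the online applicants receive the favourable outcome 60% of the time against 80% for the reference group, so the ratio falls below 0.90 and would raise an alert.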

Configure drift monitoring. Set the alert threshold at 5% predicted drop in accuracy, and the minimum records required for scoring at 100. This will begin training the drift monitor model in OpenScale.

In [None]:
subscription.drift_monitoring.enable(threshold=0.05, min_records=100)

Monitor the creation of the drift monitor. Check every 30 seconds to see if it has been successfully created.

In [None]:
drift_status = None
while drift_status != 'finished':
    drift_details = subscription.drift_monitoring.get_details()
    drift_status = drift_details['parameters']['config_status']['state']
    if drift_status != 'finished':
        print(drift_status)
        time.sleep(30)
print(drift_status)
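
The loop above keeps polling until the state is `finished`, so it would never end if the configuration failed. The sketch below shows one way to add a timeout and a failure check to such a loop. The helper `poll_until` is hypothetical, and the failure state names are assumptions, since only `finished` is confirmed by the cell above.

```python
import time


def poll_until(get_state, done_states=("finished",),
               failed_states=("failed", "error"),   # assumed names
               timeout_s=900, interval_s=30,
               sleep=time.sleep, clock=time.monotonic):
    """Call `get_state` repeatedly until it returns a done state.

    Raises RuntimeError on a failed state and TimeoutError after `timeout_s` seconds.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError("configuration ended in state %r" % state)
        if clock() >= deadline:
            raise TimeoutError("still %r after %s seconds" % (state, timeout_s))
        print(state)
        sleep(interval_s)


# Demonstration with a fake sequence of states and no real waiting.
fake_states = iter(["new", "in_progress", "finished"])
print(poll_until(lambda: next(fake_states), sleep=lambda _seconds: None))
```

With the real client, `get_state` could be a small function that reads the state from `subscription.drift_monitoring.get_details()` in the same way as the loop above.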

In [None]:
print(drift_details['parameters'])

Manually run the fairness monitor.

In [None]:
fairness_run_details = subscription.fairness_monitoring.run(background_mode=False)

In [None]:
fairness_run_details

In [None]:
subscription.fairness_monitoring.show_table()

Manually run the drift monitor.

In [None]:
drift_run_details = subscription.drift_monitoring.run(background_mode=False)

In [None]:
drift_run_details

In [None]:
subscription.drift_monitoring.show_table()

Upload feedback data to run the quality monitor.

In [None]:
feedback_payload_set = df.values.tolist()

feedback_payload = []
for _ in range(0, 101):
    feedback_payload.append(random.choice(feedback_payload_set))
print(feedback_payload[0])

In [None]:
subscription.feedback_logging.store(feedback_payload)

In [None]:
time.sleep(10)
subscription.feedback_logging.show_table()

In [None]:
run_details = subscription.quality_monitoring.run(background_mode=False)

In [None]:
time.sleep(10)
subscription.quality_monitoring.show_table()

Enable explainability.

In [None]:
from ibm_ai_openscale.supporting_classes import *

subscription.explainability.enable()

Get a transaction to explain.

In [None]:
transaction_id = subscription.payload_logging.get_table_content(limit=1)['scoring_id'].values[0]

print(transaction_id)

In [None]:
explain_run = subscription.explainability.run(transaction_id=transaction_id, background_mode=False, cem=False)

In [None]:
explain_run

## Congratulations!

You have successfully created the mortgage default model and configured it for monitoring with Watson OpenScale. Next, set up the model feed notebook to run at regular intervals to pump data into your model.