# Historical Data and Reference OpenScale Datamart Queries

This notebook should be run in a Watson Studio project using the Default Python 3.6 runtime environment. It requires a Cloud API key to access the following Cloud services:
* Watson OpenScale
* Watson Machine Learning

The notebook assumes the model has been deployed to Watson Machine Learning and Watson OpenScale has been configured with various monitors. It will load historical data into OpenScale to simulate a model that has been in production for some time.

# 1. Setup

#### 1.1 Dependencies

In [None]:
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1
!pip install --upgrade watson-machine-learning-client | tail -n 1

#### 1.2 Configure Service Credentials

Update the two cells below with your Cloud API Key and your Watson Machine Learning service credentials.

In [None]:
CLOUD_API_KEY = "PASTE HERE"

In [None]:
WML_CREDENTIALS = {
    "apikey": "key",
    "iam_apikey_description": "description",
    "iam_apikey_name": "auto-generated-apikey",
    "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
    "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::",
    "instance_id": "instance_id",
    "password": "password",
    "url": "https://us-south.ml.cloud.ibm.com",
    "username": "username"
}
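
As an alternative to pasting the key into the notebook, the sketch below reads it from an environment variable. The helper name `read_api_key` and the variable name `CLOUD_API_KEY` in the environment are assumptions made for illustration; they are not part of the OpenScale or Watson Machine Learning clients.

```python
import os

def read_api_key(env_var="CLOUD_API_KEY", default="PASTE HERE"):
    """Return the API key from the environment, or the placeholder if it is unset."""
    return os.environ.get(env_var, default)

api_key = read_api_key()
if api_key == "PASTE HERE":
    print("No API key found in the environment; paste it into the cell instead.")
else:
    print("API key loaded from the environment.")
```

Keeping the key out of the notebook avoids saving it with the project, at the cost of having to set the variable before the kernel starts.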

#### 1.3 Model Parameters

__Ensure that these two parameters match the model and deployment you previously subscribed to Watson OpenScale.__

In [None]:
MODEL_NAME = "Spark German Risk Model"
DEPLOYMENT_NAME = "Spark German Risk Deployment"

#### 1.4 Gather Model Information

In [None]:
import time
from watson_machine_learning_client import WatsonMachineLearningAPIClient

wml_client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)
wml_client.repository.list_models()

model_uid = None
wml_models = wml_client.repository.get_details()
for model_in in wml_models['models']['resources']:
    if MODEL_NAME == model_in['entity']['name']:
        model_uid = model_in['metadata']['guid']
        break

deployment_uid = None
deployment = None
scoring_url = None
wml_deployments = wml_client.deployments.get_details()
for deployment_in in wml_deployments['resources']:
    if DEPLOYMENT_NAME == deployment_in['entity']['name']:
        deployment_uid = deployment_in['metadata']['guid']
        scoring_url = deployment_in['entity']['scoring_url']
        deployment = deployment_in
        break

if model_uid is None:
    print("No model ...")
    
if deployment_uid is None:
    print("No Model deployment...")
    
print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))
print("Scoring URL: {}".format(scoring_url))
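
The two lookup loops above follow the same pattern: scan a list of resources and stop at the first one whose `entity.name` matches. A minimal sketch of that pattern as a reusable helper is shown here. The `find_by_name` function and the sample data are hypothetical and only mimic the dictionary shape the loops rely on.

```python
def find_by_name(resources, name):
    """Return the first resource whose entity name matches, or None."""
    for resource in resources:
        if resource.get("entity", {}).get("name") == name:
            return resource
    return None

# Hypothetical sample with the same shape as the entries iterated over above.
sample = [
    {"metadata": {"guid": "guid-1"}, "entity": {"name": "Other Model"}},
    {"metadata": {"guid": "guid-2"}, "entity": {"name": "Spark German Risk Model"}},
]

match = find_by_name(sample, "Spark German Risk Model")
model_guid = match["metadata"]["guid"] if match else None
print("Model id: {}".format(model_guid))
```

Returning `None` when nothing matches keeps the later "not found" checks unchanged.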

#### 1.5 Get Watson OpenScale GUID

Each instance of OpenScale has a unique ID. We can get this value using the Cloud API key specified at the beginning of the notebook.

In [None]:
from ibm_ai_openscale.utils import get_instance_guid
from ibm_ai_openscale import APIClient

wos_client = None
WOS_GUID = get_instance_guid(api_key=CLOUD_API_KEY)
WOS_CREDENTIALS = {
    "instance_guid": WOS_GUID,
    "apikey": CLOUD_API_KEY,
    "url": "https://api.aiopenscale.cloud.ibm.com"
}

if WOS_GUID is None:
    print('Watson OpenScale GUID NOT FOUND')
else:
    print("Watson OpenScale GUID: {}".format(WOS_GUID))

wos_client = APIClient(aios_credentials=WOS_CREDENTIALS)
print("Watson OpenScale Python Client Version: {}".format(wos_client.version))

#### 1.6 Get deployment subscription

We have previously subscribed Watson OpenScale to our machine learning model. Here we get that subscription.

In [None]:
wos_client.data_mart.subscriptions.list()

subscriptions_uids = wos_client.data_mart.subscriptions.get_uids()
subscription_id = None
subscription = None  # initialise so the "not found" check below cannot raise a NameError
for sub in subscriptions_uids:
    if wos_client.data_mart.subscriptions.get_details(sub)['entity']['asset']['name'] == MODEL_NAME:
        subscription = wos_client.data_mart.subscriptions.get(sub)
        subscription_id = sub
        break
            
if subscription is None:
    print('Subscription not found.')
    
data_mart_id = subscription.get_details()['metadata']['url'].split('/service_bindings')[0].split('marts/')[1]
print("Data Mart ID: {}".format(data_mart_id))

business_application_url = "/".join((WOS_CREDENTIALS['url'], data_mart_id,"v2", "business_applications" ))
print("Business Application URL: {}".format(business_application_url))

performance_metrics_url = WOS_CREDENTIALS['url'] + subscription.get_details()['metadata']['url'].split('/service_bindings')[0] + '/metrics'
print("Performance Metrics URL: {}".format(performance_metrics_url))

measurements_url = WOS_CREDENTIALS['url'] + subscription.get_details()['metadata']['url'].split('/service_bindings')[0] + '/measurements'
print("Measurements URL: {}".format(measurements_url))

manual_labeling_url = WOS_CREDENTIALS['url'] + subscription.get_details()['metadata']['url'].split('/service_bindings')[0] + '/manual_labelings'
print("Manual Labeling URL: {}".format(manual_labeling_url))

monitor_instances_url = "/".join((WOS_CREDENTIALS['url'], data_mart_id,"v2", "monitor_instances" ))
print("Monitor Instances URL: {}".format(monitor_instances_url))
      
binding_uid = subscription.get_details()['entity']['service_binding_id']
print("Binding ID: {}".format(binding_uid))
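
The URLs above are all derived from the subscription's metadata URL by string splitting. The sketch below repeats that derivation on a made-up path so the intermediate values are visible. The example path and the `parse_data_mart` helper are assumptions for illustration; the real value comes from `subscription.get_details()`.

```python
def parse_data_mart(metadata_url):
    """Split a subscription metadata URL into its data mart path and data mart ID."""
    base_path = metadata_url.split('/service_bindings')[0]
    data_mart_id = base_path.split('marts/')[1]
    return base_path, data_mart_id

base_url = "https://api.aiopenscale.cloud.ibm.com"
# Hypothetical path in the shape the splitting logic above expects.
example = "/v1/data_marts/1234-abcd/service_bindings/binding-1/subscriptions/sub-1"

base_path, data_mart_id = parse_data_mart(example)
metrics_url = base_url + base_path + "/metrics"
monitor_instances_url = "/".join((base_url, data_mart_id, "v2", "monitor_instances"))

print(data_mart_id)
print(metrics_url)
print(monitor_instances_url)
```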

***

# 2. Historical Data

This section of the notebook downloads historical data and writes it to the payload and measurement tables, simulating a production model that has been monitored and has received regular traffic for the last seven days. The historical data can be viewed in the Watson OpenScale user interface. The code uses both the Python and REST APIs to write this data.

In [None]:
import os
from IPython.utils import io
from ibm_ai_openscale.utils.inject_demo_data import DemoData

historicalData = DemoData(aios_credentials=WOS_CREDENTIALS)
historical_data_path=os.getcwd()

#### 2.1 Insert historical payloads

In [None]:
with io.capture_output() as captured:
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_payloads_0.json
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_payloads_1.json
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_payloads_2.json
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_payloads_3.json
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_payloads_4.json
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_payloads_5.json
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_payloads_6.json
!ls -lh history_payloads_*.json

historyDays = 7

print('Starting payload loading')
historicalData.load_historical_scoring_payload(subscription, deployment_uid,file_path=historical_data_path, day_template="history_payloads_{}.json" )
print('Finished')
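
The seven download commands above differ only in the day index. As a sketch, the same list of URLs can be generated from the file name template passed to `load_historical_scoring_payload`. The `payload_urls` helper is hypothetical, and the block only builds the URLs without downloading anything.

```python
BASE_URL = ("https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/"
            "master/assets/historical_data/german_credit_risk/wml")

def payload_urls(days=7, template="history_payloads_{}.json"):
    """Build one download URL per history day, numbered from 0."""
    return ["{}/{}".format(BASE_URL, template.format(day)) for day in range(days)]

urls = payload_urls()
for url in urls:
    print(url)
```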

#### 2.2 Insert historical fairness metrics

In [None]:
import json
import datetime
import requests

with io.capture_output() as captured:
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_fairness.json -O history_fairness.json
!ls -lh history_fairness.json

with open('history_fairness.json', 'r') as history_file:
    payloads = json.load(history_file)

for day in range(historyDays):
    print('Loading day', day + 1)
    metrics = []
    
    for hour in range(24):
        score_time = (datetime.datetime.utcnow() + datetime.timedelta(hours=(-(24*day + hour + 1)))).strftime('%Y-%m-%dT%H:%M:%SZ')
        index = (day * 24 + hour) % len(payloads) # wrap around and reuse values if needed
        
        metric = {
            'metric_type': 'fairness',
            'binding_id': binding_uid,
            'timestamp': score_time,
            'subscription_id': model_uid,
            'asset_revision': model_uid,
            'deployment_id': deployment_uid,
            'value': payloads[index]
        }
        metrics.append(metric)
    response = requests.post(performance_metrics_url, json=metrics, headers=wos_client._get_headers())
print('Finished')
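
The loop above assigns each metric a timestamp one hour further into the past and picks a value with a wrap-around index. The sketch below isolates that arithmetic with a fixed reference time so the results are reproducible. The `hourly_slots` helper and the payload count of 100 are assumptions for illustration.

```python
import datetime

def hourly_slots(history_days, payload_count, now):
    """Return (timestamp, payload index) pairs, one per hour, newest first."""
    slots = []
    for day in range(history_days):
        for hour in range(24):
            offset_hours = 24 * day + hour + 1
            timestamp = (now - datetime.timedelta(hours=offset_hours)).strftime('%Y-%m-%dT%H:%M:%SZ')
            index = (day * 24 + hour) % payload_count  # reuse values when the file is shorter
            slots.append((timestamp, index))
    return slots

# Fixed reference time instead of utcnow(), so the output is deterministic.
now = datetime.datetime(2020, 1, 8, 12, 0, 0)
slots = hourly_slots(7, 100, now)

print(len(slots))   # 168 hourly slots for seven days
print(slots[0])     # most recent slot, one hour before `now`
print(slots[-1])    # oldest slot, exactly seven days before `now`
```

With 100 values and 168 slots, the index wraps once, so the oldest slot reuses value 67.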

#### 2.3 Insert historical debias metrics

In [None]:
with io.capture_output() as captured:
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_debias.json -O history_debias.json
!ls -lh history_debias.json

with open('history_debias.json', 'r') as history_file:
    payloads = json.load(history_file)

for day in range(historyDays):
    print('Loading day', day + 1)
    debias_metrics = []
    for hour in range(24):
        score_time = (datetime.datetime.utcnow() + datetime.timedelta(hours=(-(24*day + hour + 1)))).strftime('%Y-%m-%dT%H:%M:%SZ')
        index = (day * 24 + hour) % len(payloads) # wrap around and reuse values if needed

        debiasMetric = {
            'metric_type': 'debiased_fairness',
            'binding_id': binding_uid,
            'timestamp': score_time,
            'subscription_id': model_uid,
            'asset_revision': model_uid,
            'deployment_id': deployment_uid,
            'value': payloads[index]
        }

        debias_metrics.append(debiasMetric)
    response = requests.post(performance_metrics_url, json=debias_metrics, headers=wos_client._get_headers())
print('Finished')

#### 2.4 Insert historical quality metrics

In [None]:
measurements = [0.76, 0.78, 0.68, 0.72, 0.73, 0.77, 0.80]
for day in range(historyDays):
    quality_metrics = []
    print('Day', day + 1)
    for hour in range(24):
        score_time = (datetime.datetime.utcnow() + datetime.timedelta(hours=(-(24*day + hour + 1)))).strftime('%Y-%m-%dT%H:%M:%SZ')
        qualityMetric = {
            'metric_type': 'quality',
            'binding_id': binding_uid,
            'timestamp': score_time,
            'subscription_id': model_uid,
            'asset_revision': model_uid,
            'deployment_id': deployment_uid,
            'value': {
                'quality': measurements[day],
                'threshold': 0.7,
                'metrics': [
                    {
                        'name': 'auroc',
                        'value': measurements[day],
                        'threshold': 0.7
                    }
                ]
            }
        }
        
        quality_metrics.append(qualityMetric)
    
    response = requests.post(performance_metrics_url, json=quality_metrics, headers=wos_client._get_headers())

print('Finished')
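
The fairness, debias, quality and performance cells all build records with the same seven keys and differ only in `metric_type` and `value`. A minimal sketch of a shared builder follows. The `build_metric` helper and the placeholder IDs are hypothetical; the key names are the ones used in the cells of this section.

```python
def build_metric(metric_type, value, timestamp, binding_id, model_id, deployment_id):
    """Assemble one metric record in the shape posted to the metrics URL."""
    return {
        'metric_type': metric_type,
        'binding_id': binding_id,
        'timestamp': timestamp,
        'subscription_id': model_id,  # the cells in this section reuse the model ID here
        'asset_revision': model_id,
        'deployment_id': deployment_id,
        'value': value,
    }

quality_value = {
    'quality': 0.76,
    'threshold': 0.7,
    'metrics': [{'name': 'auroc', 'value': 0.76, 'threshold': 0.7}],
}
# Placeholder IDs; in the notebook these come from binding_uid, model_uid and deployment_uid.
record = build_metric('quality', quality_value, '2020-01-08T11:00:00Z',
                      'binding-1', 'model-1', 'deployment-1')
print(sorted(record.keys()))
```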

#### 2.5 Insert historical confusion matrices

In [None]:
with io.capture_output() as captured:
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_quality_metrics.json -O history_quality_metrics.json
!ls -lh history_quality_metrics.json

with open('history_quality_metrics.json') as json_file:
    records = json.load(json_file)
    
for day in range(historyDays):
    index = 0
    measurements = []
    print('Day', day + 1)
    
    for hour in range(24):
        score_time = (datetime.datetime.utcnow() + datetime.timedelta(hours=(-(24*day + hour + 1)))).strftime('%Y-%m-%dT%H:%M:%SZ')

        measurement = {
            "monitor_definition_id": 'quality',
            "binding_id": subscription.binding_uid,
            "subscription_id": subscription.uid,
            "asset_id": subscription.source_uid,
            'metrics': [records[index]['metrics']],
            'sources': [records[index]['sources']],
            'timestamp': score_time
        }

        measurements.append(measurement)
        index+=1

    response = requests.post(measurements_url, json=measurements, headers=wos_client._get_headers())

print('Finished')

#### 2.6 Insert historical performance metrics

In [None]:
import random

for day in range(historyDays):
    performance_metrics = []
    print('Day', day + 1)
    for hour in range(24):
        score_time = (datetime.datetime.utcnow() + datetime.timedelta(hours=(-(24*day + hour + 1)))).strftime('%Y-%m-%dT%H:%M:%SZ')
        score_count = random.randint(60, 600)
        score_resp = random.uniform(60, 300)

        performanceMetric = {
            'metric_type': 'performance',
            'binding_id': binding_uid,
            'timestamp': score_time,
            'subscription_id': model_uid,
            'asset_revision': model_uid,
            'deployment_id': deployment_uid,
            'value': {
                'response_time': score_resp,
                'records': score_count
            }
        }
        performance_metrics.append(performanceMetric)

    response = requests.post(performance_metrics_url, json=performance_metrics, headers=wos_client._get_headers())

print('Finished')
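
The cells in this section store each `requests.post` result in `response` but never inspect it, so a rejected request goes unnoticed. The sketch below shows one way to check the status code. `report_post` is a hypothetical helper, and `FakeResponse` is a stand-in that exposes only the `status_code` and `text` attributes of a real response so the example runs without network access.

```python
class FakeResponse:
    """Hypothetical stand-in for a requests.Response, used only for this demo."""
    def __init__(self, status_code, text=""):
        self.status_code = status_code
        self.text = text

def report_post(response, label):
    """Print a one-line summary and return True for a 2xx status code."""
    ok = 200 <= response.status_code < 300
    if ok:
        print("{}: stored (HTTP {})".format(label, response.status_code))
    else:
        print("{}: failed (HTTP {}) {}".format(label, response.status_code, response.text[:200]))
    return ok

results = [
    report_post(FakeResponse(202), "Day 1"),
    report_post(FakeResponse(401, "Unauthorized"), "Day 2"),
]
```

In the notebook, the same helper could be called with the real `response` right after each `requests.post`.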

#### 2.7 Insert historical manual labeling

In [None]:
with io.capture_output() as captured:
    !wget https://raw.githubusercontent.com/pmservice/ai-openscale-tutorials/master/assets/historical_data/german_credit_risk/wml/history_manual_labeling.json -O history_manual_labeling.json
!ls -lh history_manual_labeling.json

with open('history_manual_labeling.json', 'r') as history_file:
    records = json.load(history_file)

for day in range(historyDays):
    print('Loading day', day + 1)
    record_json = []
    for hour in range(24):
        for record in records:
            if record['fastpath_history_day'] == day and record['fastpath_history_hour'] == hour:
                record['binding_id'] = binding_uid
                record['subscription_id'] = model_uid
                record['asset_revision'] = model_uid
                record['deployment_id'] = deployment_uid
                record['scoring_timestamp'] = (datetime.datetime.utcnow() + datetime.timedelta(hours=(-(24*day + hour + 1)))).strftime('%Y-%m-%dT%H:%M:%SZ')
                record_json.append(record)
    response = requests.post(manual_labeling_url, json=record_json, headers=wos_client._get_headers())

print('Finished')
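
The cell above rescans the full record list for every day and hour. As a sketch of an alternative, the records can be grouped once by their `fastpath_history_day` and `fastpath_history_hour` fields and then looked up per slot. The `group_by_slot` helper and the sample records are hypothetical; only the two field names come from the cell above.

```python
from collections import defaultdict

def group_by_slot(records):
    """Index records by their (day, hour) pair."""
    slots = defaultdict(list)
    for record in records:
        key = (record['fastpath_history_day'], record['fastpath_history_hour'])
        slots[key].append(record)
    return slots

# Made-up records carrying only the two fields used for grouping, plus an ID.
sample_records = [
    {'fastpath_history_day': 0, 'fastpath_history_hour': 0, 'id': 'a'},
    {'fastpath_history_day': 0, 'fastpath_history_hour': 1, 'id': 'b'},
    {'fastpath_history_day': 0, 'fastpath_history_hour': 0, 'id': 'c'},
    {'fastpath_history_day': 1, 'fastpath_history_hour': 5, 'id': 'd'},
]

slots = group_by_slot(sample_records)
day_zero = [r['id'] for hour in range(24) for r in slots.get((0, hour), [])]
print(day_zero)
```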

***

# 3. OpenScale Monitor Triggers

The cells below trigger on-demand evaluations of the fairness, drift, and quality monitors.

In [None]:
run_details = subscription.fairness_monitoring.run(background_mode=False)
time.sleep(10)
subscription.fairness_monitoring.show_table()

In [None]:
drift_run_details = subscription.drift_monitoring.run(background_mode=False)
time.sleep(10)
subscription.drift_monitoring.get_table_content()

In [None]:
run_details = subscription.quality_monitoring.run(background_mode=False)
time.sleep(10)
subscription.quality_monitoring.show_table()

***

# 4. Datamart Queries

The cells below run various queries against the subscription and monitor tables.

In [None]:
wos_client.data_mart.get_deployment_metrics()

In [None]:
wos_client.data_mart.bindings.list_assets()

In [None]:
wos_client.data_mart.subscriptions.list()

In [None]:
subscription.quality_monitoring.show_table()

In [None]:
#subscription.feedback_logging.show_table()
subscription.feedback_logging.print_table_schema()

In [None]:
subscription.fairness_monitoring.show_table()

In [None]:
subscription.drift_monitoring.show_table()

In [None]:
subscription.payload_logging.show_table()

In [None]:
subscription.payload_logging.print_table_schema()

## Next steps

__Return to the workshop instruction book.__


## Credits

This notebook was adapted from the following sources:

* [Monitor Models Code Pattern](https://github.com/IBM/monitor-wml-model-with-watson-openscale)
* [OpenScale Labs](https://github.com/pmservice/OpenScale-Labs)
* [OpenScale Tutorials](https://github.com/pmservice/ai-openscale-tutorials)

#### Original Authors
* Eric Martens is a technical specialist with expertise in analyzing and describing business processes and translating them into functional and non-functional IT requirements. He acts as the interpreter between the worlds of IT and business.
* Lukasz Cmielowski, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.
