<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Custom Machine Learning engine

This notebook shows how to log payloads for a model deployed on a custom model serving engine using the AI OpenScale Python SDK.

Contents
- [1. Setup](#setup)
- [2. Binding machine learning engine](#binding)
- [3. Subscriptions](#subscription)
- [4. Scoring and payload logging](#scoring)
- [5. Feedback logging](#feedback)
- [6. Data Mart](#datamart)

<a id="setup"></a>
## 1. Setup

### 1.0 Sample custom machine learning engine

A sample machine learning engine, packaged as a Docker image, is available together with deployment instructions [here](https://github.com/pmservice/ai-openscale-tutorials/tree/master/applications/custom-ml-engine).

**NOTE:** A `CUSTOM` machine learning engine must follow this [API specification](https://aiopenscale-custom-deployement-spec.mybluemix.net/) to be supported.

### 1.1 Installation and authentication

In [1]:
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1

Requirement not upgraded as not directly required: jmespath<1.0.0,>=0.7.1 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from ibm-cos-sdk-core==2.*,>=2.0.0->ibm-cos-sdk->watson-machine-learning-client->ibm-ai-openscale)


Import the required classes and helper functions.

In [2]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.supporting_classes import PayloadRecord
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *

#### ACTION: Get AI OpenScale `instance_guid` and `apikey`

How to install the IBM Cloud (Bluemix) CLI: [instructions](https://console.bluemix.net/docs/cli/reference/ibmcloud/download_cli.html#install_use)

How to get an API key using the Bluemix CLI:
```
bx login --sso
bx iam api-key-create 'my_key'
```

How to get your AI OpenScale instance GUID:

- if your resource group is not `default`, switch to the resource group containing the AI OpenScale instance
```
bx target -g <myResourceGroup>
```
- get details of the instance
```
bx resource service-instance 'AI-OpenScale-instance_name'
```

#### Let's define some constants required to set up the data mart:

- AIOS_CREDENTIALS
- POSTGRES_CREDENTIALS
- SCHEMA_NAME

In [4]:
AIOS_CREDENTIALS = {
  "url": "https://api.aiopenscale.cloud.ibm.com",
  "instance_guid": "***",
  "apikey": "***"
}

In [5]:
# The code was removed by Watson Studio for sharing.

In [6]:
POSTGRES_CREDENTIALS = {
    "db_type": "postgresql",
    "uri_cli_1": "xxx",
    "maps": [],
    "instance_administration_api": {
        "instance_id": "xxx",
        "root": "xxx",
        "deployment_id": "xxx"
    },
    "name": "xxx",
    "uri_cli": "xxx",
    "uri_direct_1": "xxx",
    "ca_certificate_base64": "xxx",
    "deployment_id": "xxx",
    "uri": "xxx"
}

In [7]:
# The code was removed by Watson Studio for sharing.

In [28]:
SCHEMA_NAME = 'data_mart_for_custom'

Create the schema for the data mart.

In [29]:
create_postgres_schema(postgres_credentials=POSTGRES_CREDENTIALS, schema_name=SCHEMA_NAME)

In [30]:
client = APIClient(AIOS_CREDENTIALS)

In [31]:
client.version

'1.0.287'

### 1.2 DataMart setup

In [32]:
client.data_mart.setup(db_credentials=POSTGRES_CREDENTIALS, schema=SCHEMA_NAME)

In [33]:
data_mart_details = client.data_mart.get_details()

<a id="binding"></a>
## 2. Bind machine learning engines

### 2.1 Bind `CUSTOM` machine learning engine
**NOTE:** A `CUSTOM` machine learning engine must follow this [API specification](https://aiopenscale-custom-deployement-spec.mybluemix.net/) to be supported.

The credentials support the following fields:
- `url` - hostname and port (required)
- `username` - part of BasicAuth (optional)
- `password` - part of BasicAuth (optional)
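
The field rules above can be checked before the binding is created. The helper below is a minimal sketch and not part of the AI OpenScale SDK: the function name is hypothetical, and the rule that `username` and `password` must be supplied together is an assumption based on how BasicAuth works.

```python
def validate_custom_engine_credentials(credentials):
    """Return the credentials unchanged if they look usable, else raise ValueError.

    Hypothetical helper: 'url' is required, 'username' and 'password' are optional.
    """
    if not credentials.get("url"):
        raise ValueError("'url' (hostname and port) is required")

    has_username = bool(credentials.get("username"))
    has_password = bool(credentials.get("password"))
    # Assumption: BasicAuth needs both halves, so a lone value is treated as a mistake.
    if has_username != has_password:
        raise ValueError("'username' and 'password' must be provided together")

    return credentials
```

Calling it on `CUSTOM_ENGINE_CREDENTIALS` before `client.data_mart.bindings.add(...)` would surface a missing `url` early instead of at binding time.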

In [34]:
CUSTOM_ENGINE_CREDENTIALS = {
    "url": "***",
    "username": "***",
    "password": "***"
}

In [35]:
# The code was removed by Watson Studio for sharing.

In [36]:
binding_uid = client.data_mart.bindings.add('My custom engine', CustomMachineLearningInstance(CUSTOM_ENGINE_CREDENTIALS))

In [37]:
bindings_details = client.data_mart.bindings.get_details()

In [38]:
client.data_mart.bindings.list()

0,1,2,3
8eca2a14-9ac4-4de5-bf10-f308e84f2591,My custom engine,custom_machine_learning,2018-12-13T13:20:36.726Z


<a id="subscription"></a>
## 3. Subscriptions

### 3.1 Add subscriptions

List available deployments.

In [39]:
client.data_mart.bindings.list_assets()

0,1,2,3,4,5,6
resnet50,resnet50,2016-12-01T10:11:12Z,model,,8eca2a14-9ac4-4de5-bf10-f308e84f2591,False
action,area and action prediction,2016-12-01T10:11:12Z,model,,8eca2a14-9ac4-4de5-bf10-f308e84f2591,False


In [40]:
subscription = client.data_mart.subscriptions.add(
    CustomMachineLearningAsset(source_uid='action', 
                               binding_uid=binding_uid, 
                               prediction_column='predictedActionLabel'))

#### Get subscriptions list

In [41]:
subscriptions = client.data_mart.subscriptions.get_details()

In [42]:
subscriptions_uids = client.data_mart.subscriptions.get_uids()
print(subscriptions_uids)

['action']


#### List subscriptions

In [43]:
client.data_mart.subscriptions.list()

0,1,2,3,4
action,area and action prediction,model,8eca2a14-9ac4-4de5-bf10-f308e84f2591,2018-12-13T13:20:39.296Z


<a id="scoring"></a>
## 4. Scoring and payload logging

### 4.1 Score the action model

In [44]:
import requests
import time


request_data = {
    'fields': ['ID', 'Gender', 'Status', 'Children', 'Age', 'Customer_Status',
               'Car_Owner', 'Customer_Service', 'Business_Area', 'Satisfaction'],
    'values': [[3785, 'Male', 'S', 1, 17, 'Inactive', 'Yes',
                'The car should have been brought to us instead of us trying to find it in the lot.',
                'Product: Information', 0]]
}

header = {'Content-Type': 'application/json'}
scoring_url = subscription.get_details()['entity']['deployments'][0]['scoring_endpoint']['url']

start_time = time.time()
response = requests.post(scoring_url, json=request_data, headers=header)
response_time = int((time.time() - start_time)*1000)

response_data = response.json()
print('Response: ' + str(response_data))

Response: {'labels': ['NA', 'Free Upgrade', 'On-demand pickup location', 'Voucher', 'Premium features'], 'fields': ['ID', 'Gender', 'Status', 'Children', 'Age', 'Customer_Status', 'Car_Owner', 'Customer_Service', 'Business_Area', 'Satisfaction', 'words', 'hash', 'area_features', 'area_label', 'rawPrediction_area', 'probability_area', 'prediction_area', 'predictedAreaLabel', 'gender_ix', 'customer_status_ix', 'status_ix', 'owner_ix', 'features', 'rawPrediction', 'probability', 'prediction', 'predictedActionLabel'], 'values': [[3785, 'Male', 'S', 1, 17, 'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, ['the', 'car', 'should', 'have', 'been', 'brought', 'to', 'us', 'instead', 'of', 'us', 'trying', 'to', 'find', 'it', 'in', 'the', 'lot.'], [262144.0, [9639.0, 21872.0, 74079.0, 86175.0, 91878.0, 99585.0, 103838.0, 175817.0, 205044.0, 218965.0, 222453.0, 227152.0, 227431.0, 229772.0, 253475.0], [1.0, 2.0, 1.0,
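
The response uses the same `fields`/`values` layout as the request, so a value is read by position. The sketch below shows one way to turn that layout into per-row dictionaries and to read a single column such as `predictedActionLabel`. The helper names are hypothetical, and the sample response is a shortened, made-up stand-in for the real one.

```python
def rows_as_dicts(payload):
    """Convert {'fields': [...], 'values': [[...], ...]} into a list of dicts."""
    fields = payload["fields"]
    return [dict(zip(fields, row)) for row in payload["values"]]


def column(payload, name):
    """Return the values of one named field for every row."""
    index = payload["fields"].index(name)
    return [row[index] for row in payload["values"]]


# Shortened, illustrative response (not the full output of the deployment).
sample_response = {
    "fields": ["ID", "Gender", "predictedActionLabel"],
    "values": [[3785, "Male", "On-demand pickup location"]],
}

print(rows_as_dicts(sample_response))
print(column(sample_response, "predictedActionLabel"))
```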

### 4.2 Store the request and response in payload logging table

#### Using Python SDK

**Hint:** You can embed payload logging code into your custom deployment so it is logged automatically each time you score the model.

In [73]:
records_list = [PayloadRecord(request=request_data, response=response_data, response_time=response_time), 
                PayloadRecord(request=request_data, response=response_data, response_time=response_time)]

for i in range(1, 10):
    records_list.append(PayloadRecord(request=request_data, response=response_data, response_time=response_time))

subscription.payload_logging.store(records=records_list)
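
The hint about embedding payload logging into the deployment could look roughly like the wrapper below. It is a sketch under assumptions: `score_and_log`, `score_fn` and `store_fn` are hypothetical names, and the logging step is passed in as a callback so the example stays independent of the SDK.

```python
import time


def score_and_log(score_fn, request, store_fn):
    """Score a request, time the call, and hand the result to a logging callback."""
    start = time.time()
    response = score_fn(request)
    response_time_ms = int((time.time() - start) * 1000)

    # Design choice: a logging failure is reported but never breaks scoring.
    try:
        store_fn(request, response, response_time_ms)
    except Exception as error:
        print("Payload logging failed: {}".format(error))

    return response
```

In this notebook, `store_fn` could wrap `subscription.payload_logging.store(records=[PayloadRecord(request=..., response=..., response_time=...)])`, and `score_fn` could wrap the `requests.post` call to the scoring endpoint.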

#### Using REST API

Get the token first.

In [49]:
token_endpoint = "https://iam.bluemix.net/identity/token"
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "application/json"
}

data = {
    "grant_type":"urn:ibm:params:oauth:grant-type:apikey",
    "apikey":AIOS_CREDENTIALS["apikey"]
}

req = requests.post(token_endpoint, data=data, headers=headers)
token = req.json()['access_token']

Store the payload.

In [51]:
import requests, uuid

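PAYLOAD_STORING_HREF_PATTERN = '{}/v1/data_marts/{}/scoring_payloads'
# Assumption: the data mart ID is the AI OpenScale instance GUID from AIOS_CREDENTIALS,
# which defines no 'data_mart_id' key.
endpoint = PAYLOAD_STORING_HREF_PATTERN.format(AIOS_CREDENTIALS['url'], AIOS_CREDENTIALS['instance_guid'])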

payload = [{
    'binding_id': binding_uid, 
    'deployment_id': subscription.get_details()['entity']['deployments'][0]['deployment_id'], 
    'subscription_id': subscription.uid, 
    'scoring_id': str(uuid.uuid4()), 
    'response': response_data,
    'request': request_data
}]


headers = {"Authorization": "Bearer " + token}
      
req_response = requests.post(endpoint, json=payload, headers = headers)

print("Request OK: " + str(req_response.ok))

Request OK: True
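
The REST payload is a list, so it can presumably carry one entry per scoring transaction. The sketch below builds such a list with a fresh `scoring_id` for each request/response pair. The function name is hypothetical; the keys are the ones used in the cell above.

```python
import uuid


def build_payload_records(binding_id, deployment_id, subscription_id, pairs):
    """Build one payload entry per (request, response) pair, each with its own scoring_id."""
    return [
        {
            "binding_id": binding_id,
            "deployment_id": deployment_id,
            "subscription_id": subscription_id,
            "scoring_id": str(uuid.uuid4()),  # unique per scoring transaction
            "request": request,
            "response": response,
        }
        for request, response in pairs
    ]
```

The result can be passed as the `json` argument of the same `requests.post` call.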


<a id="feedback"></a>
## 5. Feedback logging & quality (accuracy) monitoring

### Enable quality monitoring

You need to provide the monitoring `threshold` and `min_records` (the minimum number of feedback records).

In [52]:
subscription.quality_monitoring.enable(threshold=0.7, min_records=10)

### Feedback records logging

Feedback records are used to evaluate your model. The predicted values are compared to real values (feedback records).
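
To make the comparison concrete, the sketch below computes plain accuracy from predicted and real labels and applies a `threshold` and `min_records` in the way those parameters suggest. It is a conceptual illustration only: the function is hypothetical and is not how the AI OpenScale service computes its metrics.

```python
def evaluate_quality(predicted, actual, threshold, min_records):
    """Return accuracy details, or None when there are too few feedback records."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) < min_records:
        return None  # not enough feedback to evaluate

    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    accuracy = correct / float(len(actual))
    return {
        "accuracy": accuracy,
        "threshold": threshold,
        "below_threshold": accuracy < threshold,
    }
```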

You can check the schema of the feedback table using the method below.

In [53]:
subscription.feedback_logging.print_table_schema()

0,1,2
ID,integer,True
Gender,string,True
Status,string,True
Children,integer,True
Age,integer,True
Customer_Status,string,True
Car_Owner,string,True
Customer_Service,string,True
Business_Area,string,True
Satisfaction,integer,True


The feedback records can be sent to the feedback table using the code below.

In [54]:
fields = ['ID', 'Gender', 'Status','Children', 'Age', 'Customer_Status', 'Car_Owner', 'Customer_Service', 'Business_Area', 'Satisfaction', 'label']

records = [
    [3785, 'Male', 'S', 1, 17,'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, 'On-demand pickup location'],
    [3785, 'Male', 'S', 1, 17,'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, 'On-demand pickup location']]

for i in range(1,10):
    records.append([3785, 'Male', 'S', 1, 17,'Inactive', 'Yes', 'The car should have been brought to us instead of us trying to find it in the lot.', 'Product: Information', 0, 'On-demand pickup location'])

subscription.feedback_logging.store(feedback_data=records, fields=fields)
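
Feedback records are positional, so a row with a missing or extra value silently shifts every column after it. A small pre-check such as the one below can catch that before the rows are stored. The helper is a hypothetical sketch and not part of the SDK.

```python
def find_malformed_records(fields, records):
    """Return the indexes of records whose length differs from the field list."""
    expected = len(fields)
    return [index for index, row in enumerate(records) if len(row) != expected]


# Illustrative data: the second row is missing its label.
sample_fields = ["ID", "Gender", "label"]
sample_records = [[3785, "Male", "On-demand pickup location"], [3785, "Male"]]
print(find_malformed_records(sample_fields, sample_records))
```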

### Run quality monitoring on demand

By default, quality monitoring runs on an hourly schedule. You can also trigger it on demand using the code below.

In [55]:
run_details = subscription.quality_monitoring.run()

Since the monitoring runs in the background, you can use the method below to check the status of the job.

In [56]:
status = run_details['status']
run_uid = run_details['id']  # avoid shadowing the built-in id()

print("Run status: {}".format(status))

start_time = time.time()
elapsed_time = 0

# Poll every 10 seconds until the run completes or 60 seconds have passed.
while status != 'completed' and elapsed_time < 60:
    time.sleep(10)
    run_details = subscription.quality_monitoring.get_run_details(run_uid=run_uid)
    status = run_details['status']
    elapsed_time = time.time() - start_time
    print("Run status: {}".format(status))

Run status: running
Run status: completed
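
The same polling pattern can be wrapped in a reusable helper. The version below is a sketch: `wait_for_status` and its parameters are hypothetical names, and the status lookup is passed in as a callable so the helper does not depend on the SDK.

```python
import time


def wait_for_status(get_status, target="completed", timeout=60, interval=10, sleep=time.sleep):
    """Poll get_status() until it returns target or the timeout (in seconds) elapses."""
    start = time.time()
    status = get_status()
    while status != target and time.time() - start < timeout:
        sleep(interval)
        status = get_status()
    return status  # the last status seen, which may differ from target on timeout
```

With the objects from this notebook, the callable could be `lambda: subscription.quality_monitoring.get_run_details(run_uid=run_details['id'])['status']`. Passing `sleep` as a parameter makes the helper easy to exercise without real waiting.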


### Show the quality metrics

In [57]:
subscription.quality_monitoring.show_table()

0,1,2,3,4,5,6,7
2018-12-13 13:21:26.264000+00:00,1.0,0.7,8eca2a14-9ac4-4de5-bf10-f308e84f2591,action,action,Accuracy_evaluation_a41197a7-f766-4e3d-985c-ec674c7bdb93,


Get all calculated metrics.

In [58]:
subscription.quality_monitoring.get_metrics(deployment_uid='action')

{'end': '2018-12-13T13:21:37.564236Z',
 'metrics': [{'process': 'Accuracy_evaluation_a41197a7-f766-4e3d-985c-ec674c7bdb93',
   'timestamp': '2018-12-13T13:21:26.264Z',
   'value': {'metrics': [{'name': 'weightedTruePositiveRate', 'value': 1.0},
     {'name': 'accuracy', 'value': 1.0},
     {'name': 'weightedFMeasure', 'value': 1.0},
     {'name': 'weightedRecall', 'value': 1.0},
     {'name': 'weightedFalsePositiveRate', 'value': None},
     {'name': 'weightedPrecision', 'value': 1.0}],
    'quality': 1.0,
    'threshold': 0.7}}],
 'start': '2018-12-13T12:21:19.030Z'}
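
The nested structure returned above can be flattened into a simple name-to-value mapping. The helper below is a hypothetical sketch that assumes the response always has the shape shown in this output and that timestamps share one ISO 8601 format, so they sort correctly as strings.

```python
def latest_quality_metrics(metrics_response):
    """Map metric name to value for the most recent evaluation, or {} if there is none."""
    evaluations = metrics_response.get("metrics", [])
    if not evaluations:
        return {}
    # Assumption: timestamps use one ISO 8601 format, so string order is time order.
    latest = max(evaluations, key=lambda item: item["timestamp"])
    return {metric["name"]: metric["value"] for metric in latest["value"]["metrics"]}


# The response shown above, copied from the cell output.
sample = {
    "end": "2018-12-13T13:21:37.564236Z",
    "metrics": [{
        "process": "Accuracy_evaluation_a41197a7-f766-4e3d-985c-ec674c7bdb93",
        "timestamp": "2018-12-13T13:21:26.264Z",
        "value": {
            "metrics": [
                {"name": "weightedTruePositiveRate", "value": 1.0},
                {"name": "accuracy", "value": 1.0},
                {"name": "weightedFMeasure", "value": 1.0},
                {"name": "weightedRecall", "value": 1.0},
                {"name": "weightedFalsePositiveRate", "value": None},
                {"name": "weightedPrecision", "value": 1.0},
            ],
            "quality": 1.0,
            "threshold": 0.7,
        },
    }],
    "start": "2018-12-13T12:21:19.030Z",
}

print(latest_quality_metrics(sample))
```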

<a id="datamart"></a>
## 6. Get the logged data

### 6.1 Payload logging

#### Print schema of payload_logging table

In [59]:
subscription.payload_logging.print_table_schema()

0,1,2
scoring_id,string,False
scoring_timestamp,timestamp,False
deployment_id,string,False
asset_revision,string,True
ID,integer,True
Gender,string,True
Status,string,True
Children,integer,True
Age,integer,True
Customer_Status,string,True


#### Show (preview) the table

In [60]:
subscription.payload_logging.describe_table()

Unnamed: 0,ID,Children,Age,Satisfaction,area_label,prediction_area,gender_ix,customer_status_ix,status_ix,owner_ix,prediction
count,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0
mean,3785.0,1.0,17.0,0.0,7.0,1.0,0.0,1.0,1.0,1.0,2.0
std,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
min,3785.0,1.0,17.0,0.0,7.0,1.0,0.0,1.0,1.0,1.0,2.0
25%,3785.0,1.0,17.0,0.0,7.0,1.0,0.0,1.0,1.0,1.0,2.0
50%,3785.0,1.0,17.0,0.0,7.0,1.0,0.0,1.0,1.0,1.0,2.0
75%,3785.0,1.0,17.0,0.0,7.0,1.0,0.0,1.0,1.0,1.0,2.0
max,3785.0,1.0,17.0,0.0,7.0,1.0,0.0,1.0,1.0,1.0,2.0


#### Return the table content as pandas dataframe

In [61]:
pandas_df = subscription.payload_logging.get_table_content(format='pandas')

### 6.2 Feedback logging

Check the schema of the table.

In [62]:
subscription.feedback_logging.print_table_schema()

0,1,2
ID,integer,True
Gender,string,True
Status,string,True
Children,integer,True
Age,integer,True
Customer_Status,string,True
Car_Owner,string,True
Customer_Service,string,True
Business_Area,string,True
Satisfaction,integer,True


Preview table content.

In [63]:
subscription.feedback_logging.show_table()

0,1,2,3,4,5,6,7,8,9,10,11
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2018-12-13 13:21:25.668000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2018-12-13 13:21:25.668000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2018-12-13 13:21:25.668000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2018-12-13 13:21:25.668000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2018-12-13 13:21:25.668000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2018-12-13 13:21:25.668000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2018-12-13 13:21:25.668000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2018-12-13 13:21:25.668000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2018-12-13 13:21:25.668000+00:00
3785,Male,S,1,17,Inactive,Yes,The car should have been brought to us instead of us trying to find it in the lot.,Product: Information,0,On-demand pickup location,2018-12-13 13:21:25.668000+00:00


Note: First 10 records were displayed.


Describe the table (calculate basic statistics).

In [64]:
subscription.feedback_logging.describe_table()

           ID  Children   Age  Satisfaction
count    11.0      11.0  11.0          11.0
mean   3785.0       1.0  17.0           0.0
std       0.0       0.0   0.0           0.0
min    3785.0       1.0  17.0           0.0
25%    3785.0       1.0  17.0           0.0
50%    3785.0       1.0  17.0           0.0
75%    3785.0       1.0  17.0           0.0
max    3785.0       1.0  17.0           0.0


Get table content.

In [65]:
feedback_pd = subscription.feedback_logging.get_table_content(format='pandas')

### 6.3 Quality metrics table

In [66]:
subscription.quality_monitoring.print_table_schema()

0,1,2
ts,timestamp,False
quality,float,False
quality_threshold,float,False
binding_id,string,False
subscription_id,string,False
deployment_id,string,True
process,string,False
asset_revision,string,True


In [67]:
subscription.quality_monitoring.show_table()

0,1,2,3,4,5,6,7
2018-12-13 13:21:26.264000+00:00,1.0,0.7,8eca2a14-9ac4-4de5-bf10-f308e84f2591,action,action,Accuracy_evaluation_a41197a7-f766-4e3d-985c-ec674c7bdb93,


### 6.4 Performance metrics table

In [68]:
subscription.performance_monitoring.print_table_schema()

0,1,2
ts,timestamp,False
scoring_time,float,False
scoring_records,object,False
binding_id,string,False
subscription_id,string,False
deployment_id,string,True
process,string,False
asset_revision,string,True


In [69]:
subscription.performance_monitoring.show_table()

0,1,2,3,4,5,6,7
2018-12-13 13:20:41.975803+00:00,611.0,1,8eca2a14-9ac4-4de5-bf10-f308e84f2591,action,action,,
2018-12-13 13:20:41.975875+00:00,611.0,1,8eca2a14-9ac4-4de5-bf10-f308e84f2591,action,action,,


### 6.5 Data Mart measurement facts table

In [70]:
client.data_mart.get_deployment_metrics()

{'deployment_metrics': [{'asset': {'asset_id': 'action',
    'asset_type': 'model',
    'created_at': '2016-12-01T10:11:12Z',
    'name': 'area and action prediction',
    'url': 'http://173.193.75.3:31520/v1/deployments/action/online'},
   'deployment': {'created_at': '2016-12-01T10:11:12Z',
    'deployment_id': 'action',
    'deployment_type': 'online',
    'name': 'action deployment',
    'url': ''},
   'metrics': [{'issues': 0,
     'metric_type': 'performance',
     'timestamp': '2018-12-13T13:20:41.975875Z',
     'value': {'records': 1, 'response_time': 611.0}},
    {'issues': 0,
     'metric_type': 'quality',
     'timestamp': '2018-12-13T13:21:26.264Z',
     'value': {'metrics': [{'name': 'weightedTruePositiveRate', 'value': 1.0},
       {'name': 'accuracy', 'value': 1.0},
       {'name': 'weightedFMeasure', 'value': 1.0},
       {'name': 'weightedRecall', 'value': 1.0},
       {'name': 'weightedFalsePositiveRate', 'value': None},
       {'name': 'weightedPrecision', 'value': 1

---

### Authors
Lukasz Cmielowski, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.