<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/banner.png" align="left" alt="banner">

# Working with Azure Machine Learning Studio engine

This notebook shows how to log the payload for the model deployed on custom model serving engine using AI OpenScale python sdk.

Contents
- [1. Setup](#setup)
- [2. Binding machine learning engine](#binding)
- [3. Subscriptions](#subscription)
- [4. Scoring and payload logging](#scoring)
- [5. Feedback logging](#feedback)
- [6. Data Mart](#datamart)

<a id="setup"></a>
## 1. Setup

### 1.0 Sample model creation using [Azure Machine Learning Studio](https://studio.azureml.net)

- Download training data set from [here](https://github.com/pmservice/wml-sample-models/raw/master/spark/product-line-prediction/data/GoSales_Tx.csv)
- Create an expierment in Azure ML Studio (select `Product Line` column as label)
- Run experiment to train a model
- Create (deploy) web service (new)

<img src="https://github.com/pmservice/ai-openscale-tutorials/raw/master/notebooks/images/azure_product_line_model.png" align="left" alt="experiment">

**NOTE:** Classic web services are not supported.

### 1.1 Installation and authentication

In [1]:
!pip install --upgrade ibm-ai-openscale --no-cache | tail -n 1

Successfully installed ibm-ai-openscale-1.0.287


Import and initiate.

In [24]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.supporting_classes import PayloadRecord
from ibm_ai_openscale.supporting_classes.enums import InputDataType, ProblemType
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *

#### ACTION: Get AI OpenScale `instance_guid` and `apikey`

How to install IBM Cloud (bluemix) console: [instruction](https://console.bluemix.net/docs/cli/reference/ibmcloud/download_cli.html#install_use)

How to get api key using bluemix console:
```
bx login --sso
bx iam api-key-create 'my_key'
```

How to get your AI OpenScale instance GUID

- if your resource group is different than `default` switch to resource group containing AI OpenScale instance
```
bx target -g <myResourceGroup>
```
- get details of the instance
```
bx resource service-instance 'AI-OpenScale-instance_name'
```

#### Let's define some constants required to set up data mart:

- AIOS_CREDENTIALS
- POSTGRES_CREDENTIALS
- SCHEMA_NAME

In [3]:
AIOS_CREDENTIALS = {
  "url": "https://api.aiopenscale.cloud.ibm.com",
  "instance_guid": "***",
  "apikey": "***"
}

In [4]:
# The code was removed by Watson Studio for sharing.

In [5]:
POSTGRES_CREDENTIALS = {
    "db_type": "postgresql",
    "uri_cli_1": "xxx",
    "maps": [],
    "instance_administration_api": {
        "instance_id": "xxx",
        "root": "xxx",
        "deployment_id": "xxx"
    },
    "name": "xxx",
    "uri_cli": "xxx",
    "uri_direct_1": "xxx",
    "ca_certificate_base64": "xxx",
    "deployment_id": "xxx",
    "uri": "xxx"
}

In [6]:
# The code was removed by Watson Studio for sharing.

In [8]:
SCHEMA_NAME = 'data_mart_for_azure'

Create schema for data mart.

In [9]:
create_postgres_schema(postgres_credentials=POSTGRES_CREDENTIALS, schema_name=SCHEMA_NAME)

In [10]:
client = APIClient(AIOS_CREDENTIALS)

In [11]:
client.version

'1.0.287'

### 1.2 DataMart setup

In [13]:
client.data_mart.setup(db_credentials=POSTGRES_CREDENTIALS, schema=SCHEMA_NAME)

In [14]:
data_mart_details = client.data_mart.get_details()

<a id="binding"></a>
## 2. Bind machine learning engines

### 2.1 Bind  `Azure` machine learning engine

Provide credentials using following fields:
- `client_id`
- `client_secret`
- `subscription_id`
- `tenant`

In [15]:
AZURE_ENGINE_CREDENTIALS = {
    "client_id": "***",
    "client_secret": "***",
    "subscription_id": "***",
    "tenant": "***"
}

In [16]:
# The code was removed by Watson Studio for sharing.

In [17]:
binding_uid = client.data_mart.bindings.add('My Azure ML Studio engine', AzureMachineLearningInstance(AZURE_ENGINE_CREDENTIALS))

In [18]:
bindings_details = client.data_mart.bindings.get_details()

In [19]:
client.data_mart.bindings.list()

0,1,2,3
073ccc69-bb6f-424a-b887-b3c0bdb2c77f,My Azure ML Studio engine,azure_machine_learning,2018-12-14T13:56:04.456Z


<a id="subsciption"></a>
## 3. Subscriptions

### 3.1 Add subscriptions

List available deployments.

**Note:** Depending on number of assets it may take some time.

In [20]:
client.data_mart.bindings.list_assets()

0,1,2,3,4,5,6
08ba1404374d7635f8eb8638c105f7eb,AzureNew-Yuki-ProductLine-LinearRegression(Age)-20181213,2018-12-13T09:23:36.1493809Z,model,,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,False
efa7c594c5a0deac76bc6866ffc18dc7,ClaimInsuranceRegression.2018.12.12.5.12.15.584,2018-12-12T05:13:51.021202Z,model,,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,False
2f1c3b2552bb9d96a134ef671c716059,AzureNew-Yuki-ProductLine-LinearRegression-20181212,2018-12-12T03:01:58.8467204Z,model,,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,False
48c6ebc2fa6cfd359923e3fe59bdefcc,AzureNew-Yuki-ProductLine-TwoClassLogisticRegression-20181212,2018-12-12T02:59:33.7864639Z,model,,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,False
f49e4115dbc90378d3c5cb4bbc27d2d8,AzureNew-Yuki-ProductLine-MulticlassLogisticRegression-20181212,2018-12-12T02:58:25.3151546Z,model,,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,False
68efeda5fa73738bd028e1a290807367,AzureNew-Yuki-DrugSelection-TwoClassLogisticRegression-20181212,2018-12-12T02:49:34.4638846Z,model,,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,False
5d14ed9ef7fa12fc86a96254e4d5061f,AzureNew-Yuki-DrugSelection-LinearRegression-20181212,2018-12-12T02:49:04.2670654Z,model,,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,False
920b3ea9eb29a41e4fa93bfa5c96ac63,AzureNew-Yuki-DrugSelection-MulticlassLogisticRegression-20181212,2018-12-12T02:47:43.887378Z,model,,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,False
3d07cae29805365453230e8beb638088,MultiClassProductline,2018-12-11T19:21:01.9441141Z,model,,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,False
f39da17aaccec60ee9bd67207ae272f6,neelimaregressio.2018.12.11.18.31.12.560,2018-12-11T18:31:22.7972314Z,model,,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,False


**Action:** Assign your source_uid to `source_uid` variable below.

In [21]:
source_uid = '72519e05bd423560d4a919153e6dc029'

In [25]:
subscription = client.data_mart.subscriptions.add(
            AzureMachineLearningAsset(source_uid=source_uid,
                                      binding_uid=binding_uid,
                                      input_data_type=InputDataType.STRUCTURED,
                                      problem_type=ProblemType.MULTICLASS_CLASSIFICATION,
                                      label_column='PRODUCT_LINE',
                                      prediction_column='Scored Labels'))

#### Get subscriptions list

In [26]:
subscriptions = client.data_mart.subscriptions.get_details()

In [27]:
subscriptions_uids = client.data_mart.subscriptions.get_uids()
print(subscriptions_uids)

['72519e05bd423560d4a919153e6dc029']


#### List subscriptions

In [28]:
client.data_mart.subscriptions.list()

0,1,2,3,4
72519e05bd423560d4a919153e6dc029,ProductLineClass.2018.11.2.11.40.22.845,model,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,2018-12-14T14:02:27.849Z


<a id="scoring"></a>
## 4. Scoring and payload logging

### 4.1 Score the product line model and measure response time

In [37]:
import requests
import time
import json

subscription_details = subscription.get_details()
scoring_url = subscription_details['entity']['deployments'][0]['scoring_endpoint']['url']

data = {
    "Inputs": {
        "input1":
            [
                {
                    'GENDER': "F",
                    'AGE': 27,
                    'MARITAL_STATUS': "Single",
                    'PROFESSION': "Professional",
                    'PRODUCT_LINE': "Personal Accessories",
                }
            ],
    },
    "GlobalParameters": {
    }
}

body = str.encode(json.dumps(data))

token = subscription_details['entity']['deployments'][0]['scoring_endpoint']['credentials']['token']
headers = subscription_details['entity']['deployments'][0]['scoring_endpoint']['request_headers']
headers['Authorization'] = ('Bearer ' + token)

start_time = time.time()
response = requests.post(url=scoring_url, data=body, headers=headers)
response_time = int(time.time() - start_time)*1000
result = response.json()

print(json.dumps(result, indent=2))

{
  "Results": {
    "output1": [
      {
        "Scored Probabilities for Class \"Personal Accessories\"": "0.942931283576809",
        "Scored Labels": "Personal Accessories",
        "MARITAL_STATUS": "Single",
        "GENDER": "F",
        "Scored Probabilities for Class \"Golf Equipment\"": "0",
        "PRODUCT_LINE": "Personal Accessories",
        "Scored Probabilities for Class \"Camping Equipment\"": "0",
        "PROFESSION": "Professional",
        "Scored Probabilities for Class \"Mountaineering Equipment\"": "0.0570687164231906",
        "AGE": "27",
        "Scored Probabilities for Class \"Outdoor Protection\"": "0"
      }
    ]
  }
}


### 4.2 Store the request and response in payload logging table

#### Transform the model's input and output to the format compatible with AI OpenScale standard.

In [38]:
request_data = {'fields': list(data['Inputs']['input1'][0]),
           'values': [list(x.values()) for x in data['Inputs']['input1']]}

response_data = {'fields': list(result['Results']['output1'][0]),
            'values': [list(x.values()) for x in result['Results']['output1']]}

#### Store the payload using Python SDK

**Hint:** You can embed payload logging code into your custom deployment so it is logged automatically each time you score the model.

In [39]:
records_list = [PayloadRecord(request=request_data, response=response_data, response_time=response_time), 
                PayloadRecord(request=request_data, response=response_data, response_time=response_time)]

for i in range(1, 10):
    records_list.append(PayloadRecord(request=request_data, response=response_data, response_time=response_time))

subscription.payload_logging.store(records=records_list)

#### Store the payload using REST API

Get the token first.

In [41]:
token_endpoint = "https://iam.bluemix.net/identity/token"
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "application/json"
}

data = {
    "grant_type":"urn:ibm:params:oauth:grant-type:apikey",
    "apikey":AIOS_CREDENTIALS["apikey"]
}

req = requests.post(token_endpoint, data=data, headers=headers)
token = req.json()['access_token']

Store the payload.

In [42]:
import requests, uuid

PAYLOAD_STORING_HREF_PATTERN = '{}/v1/data_marts/{}/scoring_payloads'
endpoint = PAYLOAD_STORING_HREF_PATTERN.format(AIOS_CREDENTIALS['url'], AIOS_CREDENTIALS['data_mart_id'])

payload = [{
    'binding_id': binding_uid, 
    'deployment_id': subscription.get_details()['entity']['deployments'][0]['deployment_id'], 
    'subscription_id': subscription.uid, 
    'scoring_id': str(uuid.uuid4()), 
    'response': response_data,
    'request': request_data
}]


headers = {"Authorization": "Bearer " + token}
      
req_response = requests.post(endpoint, json=payload, headers = headers)

print("Request OK: " + str(req_response.ok))

Request OK: True


<a id="feedback"></a>
## 5. Feedback logging & quality (accuracy) monitoring

### Enable quality monitoring

You need to provide the monitoring `threshold` and `min_records` (minimal number of feedback records).

In [43]:
subscription.quality_monitoring.enable(threshold=0.7, min_records=10)

### Feedback records logging

Feedback records are used to evaluate your model. The predicted values are compared to real values (feedback records).

You can check the schema of feedback table using below method.

In [44]:
subscription.feedback_logging.print_table_schema()

0,1,2
PROFESSION,string,True
MARITAL_STATUS,string,True
GENDER,string,True
AGE,integer,True
PRODUCT_LINE,string,True
_training,timestamp,False


The feedback records can be send to feedback table using below code.

In [45]:
fields = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION", "PRODUCT_LINE"]

records = [
    ["F", "27", "Single", "Professional", "Personal Accessories"],
    ["M", "27", "Single", "Professional", "Personal Accessories"]]

for i in range(1,10):
    records.append(["F", "27", "Single", "Professional", "Personal Accessories"])

subscription.feedback_logging.store(feedback_data=records, fields=fields)

### Run quality monitoring on demand

By default, quality monitoring is run on hourly schedule. You can also trigger it on demand using below code.

In [46]:
run_details = subscription.quality_monitoring.run()

Since the monitoring runs in the background you can use below method to check the status of the job.

In [47]:
status = run_details['status']
id = run_details['id']

print("Run status: {}".format(status))

start_time = time.time()
elapsed_time = 0

while status != 'completed' and elapsed_time < 60:
    time.sleep(10)
    run_details = subscription.quality_monitoring.get_run_details(run_uid=id)
    status = run_details['status']
    elapsed_time = time.time() - start_time
    print("Run status: {}".format(status))

Run status: initializing
Run status: completed


### Show the quality metrics

In [48]:
subscription.quality_monitoring.show_table()

0,1,2,3,4,5,6,7
2018-12-14 14:13:23.700000+00:00,0.9090909090909092,0.7,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,Accuracy_evaluation_456f6344-e16a-4f5b-8f3b-db84ac6f8b40,


Get all calculated metrics.

In [50]:
deployment_uids = subscription.get_deployment_uids()

In [51]:
subscription.quality_monitoring.get_metrics(deployment_uid=deployment_uids[0])

{'end': '2018-12-14T14:14:32.690343Z',
 'metrics': [{'process': 'Accuracy_evaluation_456f6344-e16a-4f5b-8f3b-db84ac6f8b40',
   'timestamp': '2018-12-14T14:13:23.700Z',
   'value': {'metrics': [{'name': 'weightedTruePositiveRate',
      'value': 0.9090909090909091},
     {'name': 'accuracy', 'value': 0.9090909090909091},
     {'name': 'weightedFMeasure', 'value': 0.8658008658008658},
     {'name': 'weightedRecall', 'value': 0.9090909090909091},
     {'name': 'weightedFalsePositiveRate', 'value': 0.9090909090909091},
     {'name': 'weightedPrecision', 'value': 0.8264462809917354}],
    'quality': 0.9090909090909091,
    'threshold': 0.7}}],
 'start': '2018-12-14T13:11:19.945Z'}

<a id="datamart"></a>
## 6. Get the logged data

### 6.1 Payload logging

#### Print schema of payload_logging table

In [52]:
subscription.payload_logging.print_table_schema()

0,1,2
scoring_id,string,False
scoring_timestamp,timestamp,False
deployment_id,string,False
asset_revision,string,True
PROFESSION,string,True
MARITAL_STATUS,string,True
GENDER,string,True
AGE,integer,True
PRODUCT_LINE,string,True
"Scored Probabilities for Class ""Personal Accessories""",string,True


#### Show (preview) the table

In [53]:
subscription.payload_logging.describe_table()

        AGE
count  12.0
mean   27.0
std     0.0
min    27.0
25%    27.0
50%    27.0
75%    27.0
max    27.0


Unnamed: 0,AGE
count,12.0
mean,27.0
std,0.0
min,27.0
25%,27.0
50%,27.0
75%,27.0
max,27.0


#### Return the table content as pandas dataframe

In [54]:
pandas_df = subscription.payload_logging.get_table_content(format='pandas')

### 6.2 Feddback logging

Check the schema of table.

In [55]:
subscription.feedback_logging.print_table_schema()

0,1,2
PROFESSION,string,True
MARITAL_STATUS,string,True
GENDER,string,True
AGE,integer,True
PRODUCT_LINE,string,True
_training,timestamp,False


Preview table content.

In [56]:
subscription.feedback_logging.show_table()

0,1,2,3,4,5
Professional,Single,F,27,Personal Accessories,2018-12-14 14:13:19.780000+00:00
Professional,Single,M,27,Personal Accessories,2018-12-14 14:13:19.780000+00:00
Professional,Single,F,27,Personal Accessories,2018-12-14 14:13:19.780000+00:00
Professional,Single,F,27,Personal Accessories,2018-12-14 14:13:19.780000+00:00
Professional,Single,F,27,Personal Accessories,2018-12-14 14:13:19.780000+00:00
Professional,Single,F,27,Personal Accessories,2018-12-14 14:13:19.780000+00:00
Professional,Single,F,27,Personal Accessories,2018-12-14 14:13:19.780000+00:00
Professional,Single,F,27,Personal Accessories,2018-12-14 14:13:19.780000+00:00
Professional,Single,F,27,Personal Accessories,2018-12-14 14:13:19.780000+00:00
Professional,Single,F,27,Personal Accessories,2018-12-14 14:13:19.780000+00:00


Note: First 10 records were displayed.


Describe table (calulcate basic statistics).

In [57]:
subscription.feedback_logging.describe_table()

        AGE
count  11.0
mean   27.0
std     0.0
min    27.0
25%    27.0
50%    27.0
75%    27.0
max    27.0


Get table content.

In [58]:
feedback_pd = subscription.feedback_logging.get_table_content(format='pandas')

### 6.3 Quality metrics table

In [59]:
subscription.quality_monitoring.print_table_schema()

0,1,2
ts,timestamp,False
quality,float,False
quality_threshold,float,False
binding_id,string,False
subscription_id,string,False
deployment_id,string,True
process,string,False
asset_revision,string,True


In [60]:
subscription.quality_monitoring.show_table()

0,1,2,3,4,5,6,7
2018-12-14 14:13:23.700000+00:00,0.9090909090909092,0.7,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,Accuracy_evaluation_456f6344-e16a-4f5b-8f3b-db84ac6f8b40,


### 6.4 Performance metrics table

In [61]:
subscription.performance_monitoring.print_table_schema()

0,1,2
ts,timestamp,False
scoring_time,float,False
scoring_records,object,False
binding_id,string,False
subscription_id,string,False
deployment_id,string,True
process,string,False
asset_revision,string,True


In [62]:
subscription.performance_monitoring.show_table()

0,1,2,3,4,5,6,7
2018-12-14 14:10:50.310809+00:00,0.0,1,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,,
2018-12-14 14:10:50.310915+00:00,0.0,1,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,,
2018-12-14 14:10:50.310848+00:00,0.0,1,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,,
2018-12-14 14:10:50.310947+00:00,0.0,1,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,,
2018-12-14 14:10:50.310756+00:00,0.0,1,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,,
2018-12-14 14:10:50.310899+00:00,0.0,1,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,,
2018-12-14 14:10:50.310883+00:00,0.0,1,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,,
2018-12-14 14:10:50.310867+00:00,0.0,1,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,,
2018-12-14 14:10:50.310962+00:00,0.0,1,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,,
2018-12-14 14:10:50.310830+00:00,0.0,1,073ccc69-bb6f-424a-b887-b3c0bdb2c77f,72519e05bd423560d4a919153e6dc029,59ae603e9261ceda7d66953909de0096,,


Note: First 10 records were displayed.


### 6.5 Data Mart measurement facts table

In [63]:
client.data_mart.get_deployment_metrics()

{'deployment_metrics': [{'asset': {'asset_id': '72519e05bd423560d4a919153e6dc029',
    'asset_type': 'model',
    'created_at': '2018-11-02T11:40:59.563349Z',
    'name': 'ProductLineClass.2018.11.2.11.40.22.845',
    'url': 'https://ussouthcentral.services.azureml.net/subscriptions/744bca722299451cb682ed6fb75fb671/services/ae62976ad690472eaf4f9797075ed831/swagger.json'},
   'deployment': {'created_at': '2018-11-02T11:40:59.563349Z',
    'deployment_id': '59ae603e9261ceda7d66953909de0096',
    'deployment_type': 'online',
    'name': 'ProductLineClass.2018.11.2.11.40.22.845',
    'url': 'https://ussouthcentral.services.azureml.net:443/subscriptions/744bca722299451cb682ed6fb75fb671/services/ae62976ad690472eaf4f9797075ed831/execute?api-version=2.0&format=swagger'},
   'metrics': [{'issues': 0,
     'metric_type': 'performance',
     'timestamp': '2018-12-14T14:10:50.310962Z',
     'value': {'records': 1, 'response_time': 0.0}},
    {'issues': 0,
     'metric_type': 'quality',
     'times

---

### Authors
Lukasz Cmielowski, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increases clients' ability to turn data into actionable knowledge.