# Test Deployed Webservice
In this notebook we test the deployed webservice by scoring data and saving all records scored as defaults to an output file.

***Two methods are shown:***
1. Using a POST request
2. Using the Webservice SDK

### Initialize an existing Workspace

Initialize a workspace object using Azure CLI authentication.

In [1]:
import azureml.core  # needed for azureml.core.VERSION below
from azureml.core.authentication import AzureCliAuthentication
from azureml.core.workspace import Workspace
import pandas as pd

cli_auth = AzureCliAuthentication()
ws = Workspace.from_config(auth=cli_auth)
ws.get_details()

output = {}
output['SDK version'] = azureml.core.VERSION
output['Subscription ID'] = ws.subscription_id
output['Workspace'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
pd.set_option('display.max_colwidth', None)  # None disables truncation; -1 is deprecated since pandas 1.0
outputDf = pd.DataFrame(data = output, index = [''])
outputDf.T

| | |
|---|---|
| SDK version | 1.2.0 |
| Subscription ID | b9f1c816-0637-419c-b0f6-a8392690c7aa |
| Workspace | paydmlws91 |
| Resource Group | paydmlrg91 |
| Location | westeurope |
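The cell above signs in through the Azure CLI, which assumes someone has already run `az login`. For unattended runs a service principal is a common alternative. The following is a minimal sketch, assuming the credentials are supplied through environment variables; the variable names (`AML_TENANT_ID`, `AML_SP_ID`, `AML_SP_SECRET`) and both helper functions are hypothetical and not part of the original notebook.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
ENV_TO_ARG = {
    "AML_TENANT_ID": "tenant_id",
    "AML_SP_ID": "service_principal_id",
    "AML_SP_SECRET": "service_principal_password",
}

def service_principal_args(env=None):
    """Map environment variables to ServicePrincipalAuthentication keyword arguments."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_ARG if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: env[name] for name, arg in ENV_TO_ARG.items()}

def workspace_from_service_principal():
    """Build a Workspace without an interactive login (requires azureml-core)."""
    # Imported lazily so the helper above also works where the SDK is not installed.
    from azureml.core.authentication import ServicePrincipalAuthentication
    from azureml.core.workspace import Workspace

    sp_auth = ServicePrincipalAuthentication(**service_principal_args())
    return Workspace.from_config(auth=sp_auth)
```

With the three variables set, `ws = workspace_from_service_principal()` could stand in for the `AzureCliAuthentication` lines; the rest of the notebook would stay the same.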


### Connect to Data
Connect to the target data storage account. We look up a datastore that is already registered with Azure ML; a storage account that is not yet registered would have to be registered first. Next, we can read and process the data.

In [2]:
from azureml.core.datastore import Datastore
from azureml.core.dataset import Dataset
from azureml.data.data_reference import DataReference

keyvault = ws.get_default_keyvault()

blobstore_name = "payments_data"

blob_datastore = Datastore.get(ws, blobstore_name)
print(f"Found Blob Datastore with name: {blobstore_name}")

Found Blob Datastore with name: payments_data


### Load the Data
We load the payments dataset we want to score from CSV files in a referenced blob storage account. The data has been processed to contain only the features needed for scoring. This approach allows the data to be spread across several files, which are collected into one dataset and then scored.

In [3]:
from azureml.core import Dataset
from azureml.data.datapath import DataPath

scoring_input_path='ml_scoring_input'

datastore_path = [
   DataPath(blob_datastore, scoring_input_path + '/*.csv')
]
dataset = Dataset.Tabular.from_delimited_files(path=datastore_path)

# drop the default_payment column (not a model input)
dataset = dataset.drop_columns(columns=["default_payment"])

# preview the first 3 rows of the dataset
df = dataset.to_pandas_dataframe()
df.head(3)

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.


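| | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 260000.0 | 2 | 1 | 2 | 51 | -1 | -1 | -1 | -1 | -1 | ... | 9966 | 8517 | 22287 | 13668 | 21818 | 9966 | 8583 | 22301 | 0 | 3640 |
| 1 | 630000.0 | 2 | 2 | 2 | 41 | -1 | 0 | -1 | -1 | -1 | ... | 6500 | 6500 | 6500 | 2870 | 1000 | 6500 | 6500 | 6500 | 2870 | 0 |
| 2 | 250000.0 | 1 | 1 | 2 | 29 | 0 | 0 | 0 | 0 | 0 | ... | 63561 | 59696 | 56875 | 55512 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 |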
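To make the "several files, one dataset" idea concrete, here is a local analogue using only the standard library. It is a sketch, not what `Dataset.Tabular.from_delimited_files` does internally: the helper `load_records` and the demo file names are made up for illustration, and values stay as strings because no type inference is attempted.

```python
import csv
import glob
import os
import tempfile

def load_records(pattern, drop=()):
    """Read every CSV matching `pattern` into one list of rows, omitting `drop` columns."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                records.append({k: v for k, v in row.items() if k not in drop})
    return records

# Demo with two small files in a temporary folder (file names are made up).
with tempfile.TemporaryDirectory() as folder:
    for name, age in (("part1.csv", 51), ("part2.csv", 41)):
        with open(os.path.join(folder, name), "w", newline="") as handle:
            handle.write(f"AGE,default_payment\n{age},0\n")
    records = load_records(os.path.join(folder, "*.csv"), drop=("default_payment",))

print(records)  # [{'AGE': '51'}, {'AGE': '41'}]
```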


### Retrieve the WebService
We use the keyvault to retrieve the name of the model and look up the first webservice that is connected to that model.

In [4]:
from azureml.core.webservice import Webservice
from azureml.core.model import Model
import json

model_info = json.loads(keyvault.get_secret(name="PAYMENTSMODEL"))
model = Model(ws, name=model_info['NAME'])
name = model_info['NAME']
web = Webservice.list(ws, model_name=name)[0]     # Retrieve the 1st Webservice with [0]
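Indexing with `[0]` raises a bare `IndexError` when no service is deployed for the model, and a malformed secret only fails later with a `KeyError`. A small sketch of more explicit checks follows; the helpers `parse_model_info` and `first_service` are hypothetical, and the only assumption about the secret is what the notebook itself reads from it, namely the `NAME` and `URI` fields.

```python
import json

def parse_model_info(secret_value):
    """Parse the JSON secret and check the fields this notebook reads."""
    info = json.loads(secret_value)
    for key in ("NAME", "URI"):
        if key not in info:
            raise KeyError(f"Secret is missing the '{key}' field")
    return info

def first_service(services, model_name):
    """Return the first service in the list, or fail with a readable message."""
    if not services:
        raise LookupError(f"No webservice found for model '{model_name}'")
    return services[0]
```

In the cell above this would read `web = first_service(Webservice.list(ws, model_name=name), name)`.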

### Score using a POST Request
This method loops through the dataframe one row at a time as an example.
It is slower, but it shows how to call the service from environments where the SDK is not installed.

In [5]:
import requests
import json

# URL for the web service
scoring_uri = model_info['URI']

df = dataset.to_pandas_dataframe()  # refresh the dataframe

# If the service is authenticated, set the key or token
key = '<your key or token>'
# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {key}'

def predict(x):
    input_data = x.to_json()
    input_data = f'{{"data" : [ {input_data}]}}'
    resp = json.loads(requests.post(scoring_uri, input_data, headers=headers).json())
    print(f"{resp} | ", end='') # Include this if you want to print out progress
    # The POST returns an array, just return the first item with [0]
    score = resp.get("result")[0]
    return score

post_scores = df.apply(predict, axis=1)

# Add the scores to the dataframe
output = df
output['score'] = post_scores

{'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [1]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [0]} | {'result': [1]} | {'result': [0]} | {'result': [0]} | {'result': [1]} | {'result': [0]} | {'result': [0]} | {'result': [1]} | 
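The request body and the response handling can be separated from the HTTP call, which makes them easier to check on their own. The sketch below assumes, as the `json.loads(... .json())` call in the cell suggests, that the service may return a JSON string that itself contains JSON; the helper names `build_payload` and `decode_response` are hypothetical.

```python
import json

def build_payload(row):
    """Wrap one record (a dict of feature values) in the {"data": [...]} envelope."""
    return json.dumps({"data": [row]})

def decode_response(body):
    """Decode a response body that may be JSON-encoded once or twice."""
    parsed = json.loads(body)
    if isinstance(parsed, str):  # a JSON string that itself contains JSON
        parsed = json.loads(parsed)
    return parsed["result"][0]

print(build_payload({"LIMIT_BAL": 260000.0, "SEX": 2}))
# {"data": [{"LIMIT_BAL": 260000.0, "SEX": 2}]}
print(decode_response('"{\\"result\\": [1]}"'))  # 1
```

Building the payload with `json.dumps` instead of an f-string avoids hand-escaping braces and keeps the body valid JSON even when a value contains quotes.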

### Score the Dataframe using SDK
Loop through the dataframe scoring chunks of data at a time - much faster.

In [6]:
scores = []

df = dataset.to_pandas_dataframe()  # refresh the dataframe

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

for i in chunker(df, 10):
    input_data = i.to_json(orient='records')
    input_data = f'{{"data" : {input_data}}}'
    print(f"Scoring:\n{input_data}")
    score_input = bytes(input_data, encoding='utf-8')
    resp = web.run(input_data=score_input)
    score_output = json.loads(resp).get("result")
    print(f"Received:\n{score_output}\n----------------------------")
    scores.extend(score_output)

df['score'] = scores
print(f"All Scores:\n{scores}")

Scoring:
{"data" : [{"LIMIT_BAL":260000.0,"SEX":2,"EDUCATION":1,"MARRIAGE":2,"AGE":51,"PAY_0":-1,"PAY_2":-1,"PAY_3":-1,"PAY_4":-1,"PAY_5":-1,"PAY_6":2,"BILL_AMT1":12261,"BILL_AMT2":21670,"BILL_AMT3":9966,"BILL_AMT4":8517,"BILL_AMT5":22287,"BILL_AMT6":13668,"PAY_AMT1":21818,"PAY_AMT2":9966,"PAY_AMT3":8583,"PAY_AMT4":22301,"PAY_AMT5":0,"PAY_AMT6":3640},{"LIMIT_BAL":630000.0,"SEX":2,"EDUCATION":2,"MARRIAGE":2,"AGE":41,"PAY_0":-1,"PAY_2":0,"PAY_3":-1,"PAY_4":-1,"PAY_5":-1,"PAY_6":-1,"BILL_AMT1":12137,"BILL_AMT2":6500,"BILL_AMT3":6500,"BILL_AMT4":6500,"BILL_AMT5":6500,"BILL_AMT6":2870,"PAY_AMT1":1000,"PAY_AMT2":6500,"PAY_AMT3":6500,"PAY_AMT4":6500,"PAY_AMT5":2870,"PAY_AMT6":0},{"LIMIT_BAL":250000.0,"SEX":1,"EDUCATION":1,"MARRIAGE":2,"AGE":29,"PAY_0":0,"PAY_2":0,"PAY_3":0,"PAY_4":0,"PAY_5":0,"PAY_6":0,"BILL_AMT1":70887,"BILL_AMT2":67060,"BILL_AMT3":63561,"BILL_AMT4":59696,"BILL_AMT5":56875,"BILL_AMT6":55512,"PAY_AMT1":3000,"PAY_AMT2":3000,"PAY_AMT3":3000,"PAY_AMT4":3000,"PAY_AMT5":3000,"PAY_

Received:
[0, 1, 0, 0, 1]
----------------------------
All Scores:
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
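The chunking logic can be tried without a deployed service. In this sketch `fake_run` is a hypothetical stand-in for `web.run` with a made-up rule, and `score_in_chunks` adds one check the cell above does not have: it fails early if the service returns a different number of scores than records sent, which would otherwise misalign `df['score']`.

```python
import json

def chunker(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items each."""
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def score_in_chunks(records, size, run):
    """Send records to `run` chunk by chunk and collect the results in order."""
    scores = []
    for chunk in chunker(records, size):
        payload = json.dumps({"data": chunk})
        result = json.loads(run(payload))["result"]
        if len(result) != len(chunk):
            raise ValueError("service returned a different number of scores than records sent")
        scores.extend(result)
    return scores

# Hypothetical stand-in for web.run: flags a record when PAY_0 >= 2.
def fake_run(payload):
    rows = json.loads(payload)["data"]
    return json.dumps({"result": [int(r["PAY_0"] >= 2) for r in rows]})

records = [{"PAY_0": p} for p in (-1, 0, 2, 3, 0)]
print(score_in_chunks(records, 2, fake_run))  # [0, 0, 1, 1, 0]
```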


In [7]:
df.head(3)

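| | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 260000.0 | 2 | 1 | 2 | 51 | -1 | -1 | -1 | -1 | -1 | ... | 8517 | 22287 | 13668 | 21818 | 9966 | 8583 | 22301 | 0 | 3640 | 0 |
| 1 | 630000.0 | 2 | 2 | 2 | 41 | -1 | 0 | -1 | -1 | -1 | ... | 6500 | 6500 | 2870 | 1000 | 6500 | 6500 | 6500 | 2870 | 0 | 0 |
| 2 | 250000.0 | 1 | 1 | 2 | 29 | 0 | 0 | 0 | 0 | 0 | ... | 59696 | 56875 | 55512 | 3000 | 3000 | 3000 | 3000 | 3000 | 3000 | 0 |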


### Filter to see only payments with score = 1

In [8]:
scored = df.loc[df['score'] == 1]
scored

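| | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 70000.0 | 2 | 2 | 2 | 26 | 2 | 0 | 0 | 2 | 2 | ... | 44006 | 46905 | 46012 | 2007 | 3582 | 0 | 3601 | 0 | 1820 | 1 |
| 18 | 210000.0 | 1 | 2 | 1 | 34 | 3 | 2 | 2 | 2 | 2 | ... | 2500 | 2500 | 2500 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 21 | 80000.0 | 1 | 2 | 2 | 34 | 2 | 2 | 2 | 2 | 2 | ... | 77519 | 82607 | 81158 | 7000 | 3500 | 0 | 7000 | 0 | 4000 | 1 |
| 24 | 30000.0 | 1 | 2 | 2 | 37 | 4 | 3 | 2 | -1 | 0 | ... | 20878 | 20582 | 19357 | 0 | 0 | 22000 | 4200 | 2000 | 3100 | 1 |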


### Save and Upload the Scored Data

In [9]:
import time
filename = time.strftime("scored-%Y%m%d-%H%M%S.csv")
scored.to_csv(filename)

In [10]:
import os
scoring_output_path='ml_scoring_output'

print(f"Uploading {filename}")
dref = blob_datastore.upload_files([filename], target_path=scoring_output_path, overwrite=False, show_progress=False) 
print("Done.")
os.remove(filename)

Uploading scored-20200408-172824.csv
Done.
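In the two cells above, the local file is left behind if the upload raises, because `os.remove` is never reached. One way to tidy this up is sketched below using only the standard library; `save_and_upload` is a hypothetical helper, and `upload` is any callable that takes a file path.

```python
import csv
import os
import time

def save_and_upload(rows, fieldnames, upload):
    """Write rows to a timestamped CSV, pass it to `upload`, then delete the local copy."""
    filename = time.strftime("scored-%Y%m%d-%H%M%S.csv")
    with open(filename, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    try:
        upload(filename)
    finally:
        os.remove(filename)  # runs whether or not the upload succeeded
    return filename
```

With the datastore from this notebook, the callable could be `lambda f: blob_datastore.upload_files([f], target_path=scoring_output_path, overwrite=False, show_progress=False)`.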
