Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Develop Scoring Script

In this notebook, we will develop the scoring script and test it locally. The web service will use the scoring script to call the model.

In [1]:
import sys
import pandas as pd
from utilities import text_to_json
import logging
from dotenv import set_key, get_key, find_dotenv
from azureml.core.workspace import Workspace
from azureml.core.model import Model

In [2]:
sys.path.append('./scripts/')

In [11]:
env_path = find_dotenv(raise_error_if_not_found=True)

Let's load the workspace.

In [4]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep="\n")

Found the config file in: /datadrive/MachineLearningNotebooks/mlaksdeployment/aml_config/config.json
fboyluamlsdkws
fboyluamlsdkrg
eastus2
edf507a2-6235-46c5-b560-fd463ba2e771


Let's retrieve the model registered earlier and download it.

In [12]:
model_name = 'question_match_model'
model_version = int(get_key(env_path, 'model_version'))
model = Model(ws, name=model_name, version=model_version)
print(model.name, model.version, model.url, sep="\n")

question_match_model
4
aml://asset/3f6b7fb2d9334c41a5a359568a861a77


In [6]:
model.download(target_dir=".", exist_ok=True)

'model.pkl'

## Create Scoring Script

We use the `%%writefile` magic to write the contents of the cell below to `score.py`, which defines the `init` and `run` functions required by AML.
- `init()` typically loads the model into a global object.
- `run(input_data)` uses the model to predict a value based on `input_data`.
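
As a rough illustration of this contract, here is a minimal sketch that uses a stand-in model. `StubModel` is hypothetical and is not part of this project; the `"input"` key mirrors the one read by the scoring script in this notebook.

```python
import json


class StubModel:
    """Hypothetical stand-in for the real model; scores text by its length."""

    def score(self, text):
        return [len(text)]


def init():
    # Load the model once and keep it in a module-level global,
    # so that every call to run() reuses the same object.
    global model
    model = StubModel()


def run(body):
    # body is a JSON string; the text to score sits under the "input" key.
    payload = json.loads(body)
    return json.dumps(model.score(payload["input"]))


init()
result = run(json.dumps({"input": "hello"}))
print(result)
```

The real script below follows the same shape, with model loading and scoring delegated to `DuplicateModel`.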

In [5]:
%%writefile score.py

import pandas as pd
import json
from duplicate_model import DuplicateModel
import logging
import timeit as t

def init():
    """Load the model once into a module-level global and log the load time."""
    logger = logging.getLogger("scoring_script")
    global model
    model_path = "model.pkl"
    questions_path = "./data_folder/questions.tsv"
    start = t.default_timer()
    model = DuplicateModel(model_path, questions_path)
    end = t.default_timer()
    load_time_msg = "Model loading time: {0} ms".format(round((end-start)*1000, 2))
    logger.info(load_time_msg)


def run(body):
    """Score the text under the "input" key of the JSON body and return JSON."""
    logger = logging.getLogger("scoring_script")
    json_load_text = json.loads(body)
    text_to_score = json_load_text["input"]
    start = t.default_timer()
    resp = model.score(text_to_score)
    end = t.default_timer()
    logger.info("Prediction took {0} ms".format(round((end-start)*1000, 2)))
    return json.dumps(resp)


Overwriting score.py


Let's test the script by running `score.py`, which brings its imports and functions into the notebook's context.

In [6]:
logging.basicConfig(level=logging.DEBUG)

In [7]:
%run score.py

Now, let's use one of the duplicate questions to test our driver.

In [8]:
dupes_test_path = './data_folder/dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
text_to_score = dupes_test.iloc[0,4]
text_to_score

"javascript arrays as objects.  possible duplicate: length of javascript object (ie. associative array) loop through javascript object    i'm trying to make an array, where each item has some name and value. the code above doesn't work. tryed to make an object, but it doesn't have a length property - no for loop.  is it possible to use arrays in this context?"

Now, call `init()` to initialize the model.

In [9]:
init()

INFO:scoring_script:Model loading time: 569.45 ms


We convert the question text to JSON format and make predictions.
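
The `text_to_json` helper comes from the `utilities` module, which is not shown in this notebook. Because `run` reads the `"input"` key from the request body, the helper presumably wraps the text in a JSON object under that key. The sketch below rests on that assumption; `text_to_json_sketch` is a hypothetical name, not the actual helper.

```python
import json


def text_to_json_sketch(text):
    # Assumed behavior: wrap the raw text under the "input" key that run() reads.
    return json.dumps({"input": text})


payload = text_to_json_sketch("javascript arrays as objects")
print(payload)
```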

In [10]:
jsontext = text_to_json(text_to_score)
r = run(jsontext)
r

INFO:scoring_script:Prediction took 44.67 ms


'[[5223, 6700, 0.9404882121467983], [11922383, 11922384, 0.6376984742439135], [750486, 750506, 0.0022961800568457157], [684672, 684692, 0.00027900736310522206], [171251, 171256, 0.00019333518502312696], [1584370, 1584377, 0.00016380270604214681], [4057440, 4060176, 0.0001498665973955501], [5187530, 5187652, 8.410158019842015e-05], [2241875, 2241883, 5.764723610306164e-05], [5117127, 5117172, 2.1329214222194568e-05], [126100, 4889658, 2.0901140965526787e-05], [12953704, 12953750, 1.777154932479563e-05], [1885557, 1885660, 1.4987021596611661e-05], [8495687, 8495740, 9.791005034068572e-06], [1129216, 1129270, 8.664625464258679e-06], [4255472, 4255480, 4.568289027003899e-06], [7364150, 7364307, 4.000594074848036e-06], [7837456, 14853974, 3.860920440981213e-06], [5891840, 5891929, 3.854145618824322e-06], [3583724, 3583740, 3.634922709192637e-06], [1451009, 1451043, 2.813801717908391e-06], [6487366, 6487376, 2.454908335371808e-06], [2274242, 2274327, 1.5704565243474771e-06], [85992, 86014, 1
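
The response is a JSON string, so a client has to decode it before use. The sketch below decodes the first three entries of the output above and keeps those above a cutoff. It assumes the last element of each triple is a probability-like score; the meaning of the first two elements is not stated in this notebook, and both the `top_matches` name and the 0.5 threshold are illustrative choices.

```python
import json

# First three entries copied from the response above.
sample_response = (
    "[[5223, 6700, 0.9404882121467983], "
    "[11922383, 11922384, 0.6376984742439135], "
    "[750486, 750506, 0.0022961800568457157]]"
)


def top_matches(response, threshold=0.5):
    # Assumption: each row is [id, id, score], with the score in the last slot.
    rows = json.loads(response)
    return [row for row in rows if row[2] >= threshold]


matches = top_matches(sample_response)
for first_id, second_id, score in matches:
    print(first_id, second_id, round(score, 3))
```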

Next, we move on to [creating the Docker image and deploying on AKS](04_Create_Image_Deploy_On_AKS.ipynb).