Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Develop Scoring Script

In this notebook, we will develop the scoring script and test it locally. We will use the scoring script to create the web service that will call the model for scoring.

In [1]:
import sys
import pandas as pd
from utilities import text_to_json, get_auth
import logging
from dotenv import set_key, get_key, find_dotenv
from azureml.core.workspace import Workspace
from azureml.core.model import Model

In [2]:
sys.path.append('./scripts/')

In [3]:
env_path = find_dotenv(raise_error_if_not_found=True)

Let's load the workspace.

In [4]:
ws = Workspace.from_config(auth=get_auth(env_path))
print(ws.name, ws.resource_group, ws.location, sep="\n")

Found the config file in: /mnt/MLAKSDeployAML/aml_config/config.json
fboyluamlsdkws
fboyluamlsdkrg
eastus2


Let's retrieve the model registered earlier and download it.

In [5]:
model_name = 'question_match_model'
model_version = int(get_key(env_path, 'model_version'))
model = Model(ws, name=model_name, version=model_version)
print(model.name, model.version, model.url, sep="\n")

question_match_model
9
aml://asset/32e91ee38bcd43768c1945dec915bcef


In [6]:
model.download(target_dir=".", exist_ok=True)

'model.pkl'

## Create Scoring Script

We use the `%%writefile` magic to write the contents of the cell below to `score.py`, which includes the `init` and `run` functions required by AML.
- The init() function typically loads the model into a global object.
- The run(input_data) function uses the model to predict a value based on the input_data.
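
The contract in the two bullets above can be sketched as follows. This is a minimal illustration, not the actual scoring script: `StubModel` is a hypothetical stand-in for a real model, and the request layout (a JSON object with an `input` key) is assumed for the example. The point it shows is the calling sequence: the host calls `init()` once at startup and then `run()` once per request.

```python
import json


class StubModel:
    """Hypothetical stand-in for a real model; it only reports text length."""

    def score(self, text):
        return [len(text)]


model = None  # populated by init()


def init():
    # Load the (stub) model once and keep it in a module-level global.
    global model
    model = StubModel()


def run(input_data):
    # Parse the JSON request, score it, and return the result as a JSON string.
    payload = json.loads(input_data)
    return json.dumps(model.score(payload["input"]))


# The hosting service calls init() once, then run() for every request.
init()
first = run(json.dumps({"input": "hello"}))
second = run(json.dumps({"input": "hello world"}))
print(first, second)
```

Keeping the model in a global means the loading cost is paid once rather than on every request.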

In [7]:
%%writefile score.py

import pandas as pd
import json
from duplicate_model import DuplicateModel
import logging
import timeit as t

def init():
    """Load the model once into a global so that run() can reuse it."""
    logger = logging.getLogger("scoring_script")
    global model
    model_path = "model.pkl"
    questions_path = "./data_folder/questions.tsv"
    # Time the model load and log the duration in milliseconds.
    start = t.default_timer()
    model = DuplicateModel(model_path, questions_path)
    end = t.default_timer()
    loadTimeMsg = "Model loading time: {0} ms".format(round((end-start)*1000, 2))
    logger.info(loadTimeMsg)


def run(body):
    """Score the text under the "input" key of the JSON request body."""
    logger = logging.getLogger("scoring_script")
    json_load_text = json.loads(body)
    text_to_score = json_load_text["input"]
    # Time the prediction and log the duration in milliseconds.
    start = t.default_timer()
    resp = model.score(text_to_score)
    end = t.default_timer()
    logger.info("Prediction took {0} ms".format(round((end-start)*1000, 2)))
    return json.dumps(resp)


Writing score.py


Let's test the script by running `score.py`, which brings its imports and functions into the notebook's namespace.

In [8]:
logging.basicConfig(level=logging.DEBUG)

In [9]:
%run score.py

Now, let's use one of the duplicate questions to test our driver.

In [10]:
dupes_test_path = './data_folder/dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
text_to_score = dupes_test.iloc[0,4]
text_to_score

"how can i print the length of this json object?. i'm trying to print the length of this json object to the console but i keep receiving 'undefined'. this is being json encoded in php and then returned via ajax to an anonymous function as 'msg'  printing to console:  i'm able to print the value of msg[1].item_id but i'm not able to get the length via msg.length thanks for your help."

Now, call `init()` to initialize the model.

In [11]:
init()

INFO:scoring_script:Model loading time: 538.18 ms


We convert the question text to JSON format and make predictions.
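
Since `run` reads the question from the `input` key of a JSON document, the request body presumably looks like the output of the sketch below. The helper name `to_request_body` is hypothetical; the actual `text_to_json` in `utilities` may be implemented differently.

```python
import json


def to_request_body(text):
    # Hypothetical helper: wrap the text under the "input" key that run() reads.
    return json.dumps({"input": text})


question = "how can i print the length of this json object?"
body = to_request_body(question)

# Round-trip check: parsing the body recovers the original text.
assert json.loads(body)["input"] == question
print(body)
```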

In [12]:
jsontext = text_to_json(text_to_score)
r = run(jsontext)
r

INFO:scoring_script:Prediction took 37.4 ms


'[[750486, 750506, 0.788610734888794], [14220321, 14220323, 0.788610734888794], [203198, 1207393, 0.7744806758593752], [14028959, 8716680, 0.7311754600005094], [11922383, 11922384, 0.7311754600005094], [23667086, 23667087, 0.7070711224820937], [20279484, 20279485, 0.7070711224820937], [13840429, 13840431, 0.647781605794779], [3127429, 3127440, 0.616515842327386], [23740548, 23740549, 0.5927238975963133], [901115, 901144, 0.5808804369565093], [6847697, 6847754, 0.5629631823473857], [1225667, 1225683, 0.5628036088927046], [1451009, 1451043, 0.5282633063964524], [111102, 111111, 0.4875722029962708], [4616202, 4616273, 0.4814500612962727], [5316697, 5316755, 0.47118219613078227], [166221, 8758614, 0.4383863027975902], [1634268, 1634321, 0.4266541854491646], [10693845, 10693852, 0.4254479315108979], [500431, 500459, 0.42000193740743125], [950087, 950146, 0.42000193740743125], [25111831, 25111942, 0.41296635376781304], [6487366, 6487376, 0.4098846417117424], [2631001, 2631198, 0.396706204176
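
The scoring function returns a JSON string, so a caller has to decode it before use. The sketch below parses a shortened sample copied from the start of the output above. Reading each triple as two question IDs followed by a similarity score is an assumption based on the values shown, and the variable names are illustrative only.

```python
import json

# First three entries of the response shown above, as a JSON string.
sample = (
    "[[750486, 750506, 0.788610734888794], "
    "[14220321, 14220323, 0.788610734888794], "
    "[203198, 1207393, 0.7744806758593752]]"
)

matches = json.loads(sample)

# Assumed layout of each entry: [id_a, id_b, score].
for id_a, id_b, score in matches:
    print(f"{id_a}\t{id_b}\t{score:.3f}")

# Pick the entry with the highest score (the first one wins on ties).
best = max(matches, key=lambda m: m[2])
```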

Next, we move on to [creating the Docker image and deploying on AKS](04_Create_Image_Deploy_On_AKS.ipynb).