Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Load test the deployed web application

In this notebook, we test the latency of the deployed web application by sending a number of duplicate questions as asynchronous requests.

In [1]:
import asyncio
import json
import random
import urllib.request
from timeit import default_timer
import aiohttp
from tqdm import tqdm
import requests
import pandas as pd
from utilities import text_to_json
from azureml.core.workspace import Workspace
from azureml.core.image import Image
from azureml.core.webservice import AksWebservice
from dotenv import set_key, get_key, find_dotenv

In [2]:
print(aiohttp.__version__) #3.3.2

3.3.2


In [3]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep="\n")

Found the config file in: /datadrive/MachineLearningNotebooks/mlaksdeployment/aml_config/config.json
fboyluamlsdkws
fboyluamlsdkrg
eastus2
edf507a2-6235-46c5-b560-fd463ba2e771


In [4]:
env_path = find_dotenv(raise_error_if_not_found=True)

Let's retrieve the web service.

In [5]:
aks_service_name = get_key(env_path, 'aks_service_name')
aks_service = AksWebservice(ws, name=aks_service_name)
aks_service.name

'lgbmservice'

We will test our deployed service with 100 calls, allowing at most 4 requests to be in flight at any time. Feel free to try different values and see how the service responds.

In [6]:
NUMBER_OF_REQUESTS = 100  # Total number of requests
CONCURRENT_REQUESTS = 4   # Number of requests at a time
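To make the concurrency limit concrete, here is a minimal, self-contained sketch of how an `asyncio.Semaphore` caps the number of coroutines running at once. It does not call the service: `fake_request`, `simulate`, and `peak_concurrency` are hypothetical names used only for illustration, and `asyncio.sleep` stands in for the network round trip.

```python
import asyncio


async def fake_request(sem, state, delay=0.01):
    # Hypothetical stand-in for one POST request; the sleep simulates latency.
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(delay)
        state["active"] -= 1


async def simulate(total, limit):
    # At most `limit` coroutines can hold the semaphore at the same time.
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(fake_request(sem, state) for _ in range(total)))
    return state["peak"]


def peak_concurrency(total=20, limit=4):
    # Run the simulation on a fresh event loop and report the highest
    # number of simulated requests that were in flight together.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(simulate(total, limit))
    finally:
        loop.close()


print(peak_concurrency(total=20, limit=4))
```

The reported peak never exceeds `limit`, no matter how many tasks are scheduled. If an event loop is already running in your environment, awaiting `simulate(20, 4)` directly is likely the simpler option.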

Get the scoring URL and API key of the service.

In [7]:
scoring_url = aks_service.scoring_uri
api_key = aks_service.get_keys()[0]

In [8]:
dupes_test_path = './data_folder/dupes_test.tsv'
dupes_test = pd.read_csv(dupes_test_path, sep='\t', encoding='latin1')
dupes_to_score = dupes_test.iloc[:NUMBER_OF_REQUESTS,4]

In [9]:
url_list = [[scoring_url, jsontext] for jsontext in dupes_to_score.apply(text_to_json)]

In [10]:
def decode(result):
    return json.loads(result.decode("utf-8"))

In [11]:
async def fetch(url, session, data, headers):
    start_time = default_timer()
    async with session.request("post", url, data=data, headers=headers) as response:
        resp = await response.read()
        elapsed = default_timer() - start_time
        return resp, elapsed

In [12]:
async def bound_fetch(sem, url, session, data, headers):
    # Getter function with semaphore.
    async with sem:
        return await fetch(url, session, data, headers)


In [13]:
async def await_with_progress(coros):
    results = []
    for f in tqdm(asyncio.as_completed(coros), total=len(coros)):
        result = await f
        results.append((decode(result[0]), result[1]))
    return results


In [14]:
async def run(url_list, num_concurrent=CONCURRENT_REQUESTS):
    headers = {
        "content-type": "application/json",
        "Authorization": ("Bearer " + api_key),
    }
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(num_concurrent)

    # Create a single client session so that connections are reused
    # instead of opening a new one for each request.
    async with aiohttp.ClientSession() as session:
        for url, data in url_list:
            # pass Semaphore and session to every POST request
            task = asyncio.ensure_future(bound_fetch(sem, url, session, data, headers))
            tasks.append(task)
        return await await_with_progress(tasks)


Below we run the 100 requests against our deployed service.

In [15]:
loop = asyncio.get_event_loop()
start_time = default_timer()
complete_responses = loop.run_until_complete(
    asyncio.ensure_future(run(url_list, num_concurrent=CONCURRENT_REQUESTS))
)
elapsed = default_timer() - start_time
print("Total Elapsed {}".format(elapsed))
print("Avg time taken {0:4.2f} ms".format(1000 * elapsed / len(url_list)))


100%|██████████| 100/100 [00:03<00:00, 31.44it/s]

Total Elapsed 3.1862132069654763
Avg time taken 31.86 ms
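The average above is total wall-clock time divided by the number of requests, so with several requests in flight it is lower than the latency of an individual call. Since `fetch` also returns the elapsed time of each request, a per-request summary can be computed as well. The following is a minimal sketch; `summarize_latencies` is a hypothetical helper and the sample values are made up for illustration, not measurements from the service.

```python
import math
import statistics


def summarize_latencies(elapsed_seconds):
    """Summarize per-request latencies (given in seconds) in milliseconds."""
    ms = sorted(1000 * s for s in elapsed_seconds)
    # Nearest-rank 95th percentile: the smallest sample such that at least
    # 95% of all samples are less than or equal to it.
    rank = max(1, math.ceil(0.95 * len(ms)))
    return {
        "mean_ms": statistics.mean(ms),
        "median_ms": statistics.median(ms),
        "p95_ms": ms[rank - 1],
        "max_ms": ms[-1],
    }


# Made-up sample latencies in seconds, for illustration only.
sample = [0.10, 0.12, 0.11, 0.30, 0.13]
summary = summarize_latencies(sample)
for name, value in summary.items():
    print("{}: {:.1f}".format(name, value))
```

With the results of this notebook, the input would be `[elapsed for _, elapsed in complete_responses]`.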





In [16]:
# Example response
complete_responses[0]

('[[5223, 6700, 0.9404882121467983], [11922383, 11922384, 0.6376984742439135], [750486, 750506, 0.0022961800568457157], [684672, 684692, 0.00027900736310522206], [171251, 171256, 0.00019333518502312696], [1584370, 1584377, 0.00016380270604214681], [4057440, 4060176, 0.0001498665973955501], [5187530, 5187652, 8.410158019842015e-05], [2241875, 2241883, 5.764723610306164e-05], [5117127, 5117172, 2.1329214222194568e-05], [126100, 4889658, 2.0901140965526787e-05], [12953704, 12953750, 1.777154932479563e-05], [1885557, 1885660, 1.4987021596611661e-05], [8495687, 8495740, 9.791005034068572e-06], [1129216, 1129270, 8.664625464258679e-06], [4255472, 4255480, 4.568289027003899e-06], [7364150, 7364307, 4.000594074848036e-06], [7837456, 14853974, 3.860920440981213e-06], [5891840, 5891929, 3.854145618824322e-06], [3583724, 3583740, 3.634922709192637e-06], [1451009, 1451043, 2.813801717908391e-06], [6487366, 6487376, 2.454908335371808e-06], [2274242, 2274327, 1.5704565243474771e-06], [85992, 86014, 

Let's use the number of original questions to count the successful responses.

In [17]:
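# Each decoded response is a JSON-formatted string; parse it with json.loads
# rather than eval, which would execute arbitrary code returned by the service.
no_questions = len(json.loads(complete_responses[0][0]))

In [18]:
num_successful = [len(json.loads(i[0])) for i in complete_responses].count(no_questions)
print("Successful {} out of {}".format(num_successful, len(url_list)))

Successful 100 out of 100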
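The count above assumes that every response body parses; a body that does not (for example an error message) would raise an exception instead of being counted as a failure. A more defensive variant could look like the sketch below, where `count_successful` is a hypothetical helper and `sample_payloads` is made-up data for illustration.

```python
import json


def count_successful(payloads, expected_len):
    """Count payloads that parse as a JSON list with expected_len entries."""
    ok = 0
    for payload in payloads:
        try:
            parsed = json.loads(payload)
        except (TypeError, ValueError):
            # Not valid JSON (or not a string at all): treat as a failure.
            continue
        if isinstance(parsed, list) and len(parsed) == expected_len:
            ok += 1
    return ok


# Made-up payloads: two well-formed, one error message, one too short.
sample_payloads = [
    "[[1, 2, 0.9], [3, 4, 0.1]]",
    "[[5, 6, 0.5], [7, 8, 0.2]]",
    "Internal Server Error",
    "[[1, 2, 0.9]]",
]
print(count_successful(sample_payloads, expected_len=2))
```

Applied to this notebook, the call would be `count_successful([r[0] for r in complete_responses], no_questions)`.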


Next, we will explore the real-time scoring in an [iPyWidget app](06_Real_Time_Scoring.ipynb).