Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Real-time Forecasting Webservice Deployment - Custom Script
---

In this notebook we deploy multiple webservices to forecast sales in real time using the models we trained in the previous step.

Models are grouped based on their tags, and each group is deployed to its own webservice. You can customize the grouping strategy by choosing which model tags are used for splitting.

### Prerequisites
At this point, you should have already:

1. Created your AML Workspace using the [00_Setup_AML_Workspace notebook](../00_Setup_AML_Workspace.ipynb)
2. Run [01_Data_Preparation.ipynb](../01_Data_Preparation.ipynb) to set up your compute and create the dataset
3. Run [02_CustomScript_Training_Pipeline.ipynb](02_CustomScript_Training_Pipeline.ipynb) to train the models

## 1.0 Connect to workspace

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()

print('Workspace Name: ' + ws.name, 
      'Azure Region: ' + ws.location, 
      'Subscription Id: ' + ws.subscription_id, 
      'Resource Group: ' + ws.resource_group, sep='\n')

## 2.0 Get models to be deployed

### 2.1 Get all models registered in the workspace

In [None]:
from azureml.core import Model

models = Model.list(ws, latest=True, expand=False, page_count=100)

### 2.2 Group models by store

We will group the models by store, so each group contains three models (one for each orange juice brand), all corresponding to the same store.

You can change the grouping strategy by modifying the `splitting_tags` variable below and specifying the names of the tags you want to use for splitting. If you leave it empty, all the models will be deployed into a single webservice.

To create custom tags, include them in the dataset, add their names to the `tags_columns` setting in the settings file of the [training script](../../scripts/customscript/train.py), and run the training again.
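To see how `splitting_tags` drives the grouping without touching the workspace, here is a minimal local sketch of the same logic as the grouping cell below. `FakeModel` is a hypothetical stand-in for `azureml.core.Model` that carries only `name` and `tags`, and the model names are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for azureml.core.Model: only the attributes the grouping uses
FakeModel = namedtuple('FakeModel', ['name', 'tags'])

def group_models(models, splitting_tags):
    """Group model names by the values of `splitting_tags`, skipping '_meta_' models."""
    groups = {}
    for m in models:
        if m.tags.get('ModelType') == '_meta_':
            continue
        key = '/'.join(m.tags[t] for t in splitting_tags) if splitting_tags else 'allmodels'
        groups.setdefault(key, []).append(m.name)
    return groups

models = [
    FakeModel('lr_1000_tropicana', {'ModelType': 'lr', 'Store': '1000', 'Brand': 'tropicana'}),
    FakeModel('lr_1000_minute.maid', {'ModelType': 'lr', 'Store': '1000', 'Brand': 'minute.maid'}),
    FakeModel('lr_1002_tropicana', {'ModelType': 'lr', 'Store': '1002', 'Brand': 'tropicana'}),
    FakeModel('deployed_models_info', {'ModelType': '_meta_'}),
]

print(group_models(models, ['Store']))  # one group per store
print(group_models(models, ['Brand']))  # one group per brand
print(group_models(models, []))         # everything in a single 'allmodels' group
```

Swapping `['Store']` for `['Brand']` changes the number of webservices without retraining, as long as the tag was recorded at training time.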

In [None]:
splitting_tags = ['Store']

In [None]:
grouped_models = {}
for m in models:
    
    # Skip metadata models, such as the routing info registered in section 5.1
    if m.tags.get('ModelType') == '_meta_':
        continue
    
    group_name = '/'.join([m.tags[t] for t in splitting_tags]) if splitting_tags else 'allmodels'
    group = grouped_models.setdefault(group_name, [])
    group.append(m)
    
print(f'{len(grouped_models)} group(s) created. Names: {list(grouped_models.keys())}')

## 3.0 Configure deployment

### 3.1 Define inference environment

In [None]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

forecast_env = Environment(name="many_models_environment")
# 'scikit-learn' is the package's PyPI name; the 'sklearn' alias is deprecated
forecast_conda_deps = CondaDependencies.create(pip_packages=['azureml-defaults', 'scikit-learn'])
forecast_env.python.conda_dependencies = forecast_conda_deps

### 3.2 Define inference configuration

In [None]:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(
    source_directory='../../scripts/customscript/',
    entry_script='model_webservice.py',
    environment=forecast_env
)

### 3.3 [Option A] Define deploy configuration using ACI (dev/test)

Use this option to deploy the models to Azure Container Instances (ACI), which is suited to dev/test environments.

In [None]:
from azureml.core.webservice import AciWebservice

deployment_type = 'aci'
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
deployment_target = None

### 3.3 [Option B] Define deploy configuration using AKS (production)

Use this option to deploy the models to Azure Kubernetes Service (AKS), which is suited to production environments.

In [None]:
aks_target_name = 'manymodels-aks'

In [None]:
from azureml.core.compute import AksCompute
from azureml.core.compute_target import ComputeTargetException

try:
    aks_target = AksCompute(ws, aks_target_name)
    print('AKS cluster already attached. Skip the optional step below and jump to "Configure AKS"')
except ComputeTargetException:
    print('AKS cluster not attached yet. Run the optional step below to do so')

#### [Optional] Attach AKS cluster

Attach an existing AKS cluster as a compute target in Azure Machine Learning. This only needs to be run the first time.

In [None]:
aks_resource_name = '<my-aks-name>'
aks_resource_group = '<my-aks-resource-group>'

In [None]:
from azureml.core.compute import ComputeTarget

attach_config = AksCompute.attach_configuration(
    resource_group=aks_resource_group,
    cluster_name=aks_resource_name
)

aks_target = ComputeTarget.attach(ws, aks_target_name, attach_config)
aks_target.wait_for_completion(show_output=True)

#### Configure AKS

In [None]:
from azureml.core.webservice import AksWebservice

deployment_type = 'aks'
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
deployment_target = aks_target

## 4.0 Deploy the models

We will now deploy one webservice for each group of models. Each deployment takes several minutes to complete, so we launch all of them first and then wait for them to finish.
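`Model.deploy` returns a webservice object while the image build and container start-up continue in Azure, and `wait_for_deployment` blocks until that finishes. The benefit of launching everything before waiting can be illustrated locally with a thread pool. This is only an analogy: `simulated_deploy` is a hypothetical function that sleeps instead of deploying anything.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_deploy(group_name, seconds=0.3):
    """Hypothetical stand-in for a slow deployment: sleeps, then reports success."""
    time.sleep(seconds)
    return group_name, 'Healthy'

groups = ['1000', '1001', '1002']

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    # Launch every deployment first...
    futures = [pool.submit(simulated_deploy, g) for g in groups]
    # ...then wait for each one; the waits overlap instead of adding up
    results = dict(f.result() for f in futures)
elapsed = time.perf_counter() - start

print(results)
print(f'{elapsed:.1f}s elapsed (sequential would take about {0.3 * len(groups):.1f}s)')
```

Total wall time tracks the slowest deployment rather than the sum of all of them, which is why the notebook separates the launch loop from the wait loop.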

We will store the deployment information in a Python dictionary, `models_deployed`, which we'll use later to find the webservice that serves a given model.
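The cell below derives each service name from the group name, and Azure ML restricts which webservice names it accepts. A stricter helper could also enforce the length limit. This is a sketch assuming the rules are 3 to 32 characters of lowercase letters, digits and dashes, starting with a letter and not ending with a dash; check the current Azure ML documentation before relying on them. `make_service_name` is a hypothetical helper, not part of the SDK.

```python
import re

def make_service_name(group_name, deployment_type):
    """Build a webservice name from a group name under the assumed naming rules."""
    prefix = 'test-' if deployment_type == 'aci' else ''
    name = f'{prefix}manymodels-{group_name}'.lower()
    name = re.sub(r'[^a-z0-9-]+', '-', name)  # e.g. '/' or '.' become '-'
    name = name[:32].rstrip('-')               # enforce max length, no trailing dash
    if len(name) < 3 or not name[0].isalpha():
        raise ValueError(f'Cannot build a valid service name from {group_name!r}')
    return name

print(make_service_name('1000', 'aci'))              # test-manymodels-1000
print(make_service_name('1000/minute.maid', 'aks'))  # manymodels-1000-minute-maid
```

Truncation can make two long group names collide, so with many tags it may be safer to append a short hash of the full group name instead of cutting it.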

In [None]:
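import re

deployments = []
for group_name, group_models in grouped_models.items():
    
    # Webservice names cannot contain characters such as '/' or '.', so replace them with dashes
    service_name = '{prefix}manymodels-{group}'.format(
        prefix='test-' if deployment_type == 'aci' else '',
        group=re.sub(r'[^a-z0-9-]', '-', group_name.lower())
    ).lower()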
    
    print('Launching deployment of {}...'.format(service_name))
    service = Model.deploy(
        workspace=ws,
        name=service_name,
        models=group_models,
        inference_config=inference_config,
        deployment_config=deployment_config,
        deployment_target=deployment_target,
        overwrite=True
    )
    print('Deployment of {} started'.format(service_name))
    
    deployments.append({ 'service': service, 'group': group_name, 'models': group_models })
    

In [None]:
models_deployed = {}
for deployment in deployments:
    
    service = deployment['service']
    print('Waiting for deployment of {} to finish...'.format(service.name))
    service.wait_for_deployment(show_output=True)
    if service.state != 'Healthy':
        print('DEPLOYMENT FAILED FOR SERVICE {}'.format(service.name))
    
    service_info = {
        'webservice': service.name,
        'state': service.state,
        'endpoint': service.scoring_uri if service.state == 'Healthy' else None,
        'key': service.get_keys()[0] if service.auth_enabled and service.state == 'Healthy' else None
    }

    # Store deployment info for each deployed model
    for m in deployment['models']:
        models_deployed[m.name] = {
            'version': m.version,
            'group': deployment['group'],
            **service_info
        }


### 4.1 Test the webservices

We can query multiple models in the same request, but all of them must belong to the same store, because each endpoint only contains the models for one particular store.
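If a client holds forecast requests for several stores, it has to split them before calling these per-store endpoints. A minimal sketch, assuming request items shaped like the `test_data` entries below (trimmed here to `id` and `model_type`); `batch_by_store` is a hypothetical helper:

```python
from collections import defaultdict

def batch_by_store(items):
    """Split request items into one batch per store, preserving their order."""
    batches = defaultdict(list)
    for item in items:
        batches[item['id']['Store']].append(item)
    return dict(batches)

items = [
    {'id': {'Store': '1000', 'Brand': 'minute.maid'}, 'model_type': 'lr'},
    {'id': {'Store': '1002', 'Brand': 'tropicana'}, 'model_type': 'lr'},
    {'id': {'Store': '1000', 'Brand': 'tropicana'}, 'model_type': 'lr'},
]
for store, batch in batch_by_store(items).items():
    print(store, [i['id']['Brand'] for i in batch])
```

Each batch would then be posted to the endpoint of its own store; section 5.0 moves this bookkeeping to the server side.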

The deployed webservice needs the following data to generate a forecast:
- Data identifying the model (store, brand, model type)
- The date when forecasting should start, the forecast horizon (number of periods to predict) and the frequency of the forecasts
- Past values of the target variable (Quantity), used to generate the lags
- Future values of the external regressors (Price, Advert)
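Malformed payloads are easier to debug on the client than from a server error. The sketch below checks one request item against the fields listed above. It assumes, based only on the examples in this notebook, that each future regressor provides exactly `forecast_horizon` values and that `forecast_start` is an ISO date. `validate_request_item` is hypothetical and is not the validation performed by `model_webservice.py`.

```python
from datetime import date

REQUIRED_KEYS = ('id', 'model_type', 'forecast_start', 'forecast_freq', 'forecast_horizon', 'data')

def validate_request_item(item):
    """Return a list of problems found in one request item (empty if none)."""
    problems = [f'missing key: {k}' for k in REQUIRED_KEYS if k not in item]
    if problems:
        return problems
    try:
        date.fromisoformat(item['forecast_start'])
    except (TypeError, ValueError):
        problems.append('forecast_start is not an ISO date')
    if not item['data'].get('historical', {}).get('Quantity'):
        problems.append('no historical Quantity values')
    # Assumption: one future value per forecast period, as in the examples
    horizon = item['forecast_horizon']
    for name, values in item['data'].get('future', {}).items():
        if len(values) != horizon:
            problems.append(f'{name} has {len(values)} values, expected {horizon}')
    return problems

good_item = {
    'id': {'Store': '1000', 'Brand': 'minute.maid'}, 'model_type': 'lr',
    'forecast_start': '2020-05-21', 'forecast_freq': 'W-THU', 'forecast_horizon': 4,
    'data': {'historical': {'Quantity': [11450, 12235, 14713]},
             'future': {'Price': [2.4, 2.5, 3, 3], 'Advert': [0, 1, 1, 1]}},
}
bad_item = {**good_item, 'forecast_horizon': 5}

print(validate_request_item(good_item))  # []
print(validate_request_item(bad_item))   # Price and Advert are one value short
```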

In [None]:
store1, brand1 = ('1000', 'minute.maid')
store2, brand2 = ('1000', 'tropicana')

In [None]:
test_data = [
    {
        "id": {
            "Store": store1,
            "Brand": brand1
        },
        "model_type": "lr",
        "forecast_start": "2020-05-21", "forecast_freq": "W-THU", "forecast_horizon": 4,
        "data": {
            "historical": {
                "Quantity": [11450, 12235, 14713]
            },
            "future": {
                "Price": [2.4, 2.5, 3, 3],
                "Advert": [0, 1, 1, 1]
            }
        }
    },
    {
        "id": {
            "Store": store2,
            "Brand": brand2
        },
        "model_type": "lr",
        "forecast_start": "2020-05-21", "forecast_freq": "W-THU", "forecast_horizon": 5,
        "data": {
            "historical": {
                "Quantity": [25692, 32976, 28610]
            },
            "future": {
                "Price": [1.5, 1.5, 3.1, 3.2, 3.5],
                "Advert": [0, 0, 1, 1, 1]
            }
        }
    }
]

Get webservice endpoint and key:

In [None]:
import sys
sys.path.append('../../scripts/customscript')
from utils.models import get_model_name

model_name = get_model_name('lr', {'Store': store1, 'Brand': brand1})

try:
    url = models_deployed[model_name]['endpoint']
    key = models_deployed[model_name]['key']
except KeyError as e:
    raise ValueError(f'Model for store {store1} and brand {brand1} has not been deployed') from e

Send request to model webservice to get forecasts:

In [None]:
import requests

request_headers = {'Content-Type': 'application/json'}
if key:
    request_headers['Authorization'] = f'Bearer {key}'

response = requests.post(url, json=test_data, headers=request_headers)
print(response.status_code)

if response.ok:
    print(response.json())
else:
    print(response.text)

## 5.0 Group all models into a single routing endpoint

We can now group all the services behind a single entry point, so that we don't have to handle each endpoint separately.
For that, we'll register the `models_deployed` object as a model and deploy it as a webservice. This webservice will receive incoming requests and route them to the appropriate model service, acting as the single entry point for outside requests.
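`routing_webservice.py` is not shown in this notebook, so the following is only a sketch of how such routing could work, not its actual implementation. Each request item is mapped to a model name, looked up in a dictionary shaped like `models_deployed` (trimmed to the `endpoint` field), and grouped by endpoint; the per-endpoint results are then put back in request order. `hypothetical_model_name` stands in for `get_model_name`, whose naming scheme is not shown here, and the URLs and responses are made up. The merge step assumes each endpoint returns one result per item, in the order sent.

```python
def hypothetical_model_name(item):
    # Assumed naming scheme, for illustration only; the notebook uses get_model_name()
    return f"{item['model_type']}_{item['id']['Store']}_{item['id']['Brand']}"

def route_requests(items, models_deployed, model_name_for):
    """Group request items by endpoint, remembering each item's original position."""
    routes = {}
    for idx, item in enumerate(items):
        info = models_deployed.get(model_name_for(item))
        if info is None or info.get('endpoint') is None:
            raise KeyError(f"No healthy endpoint for item {idx}: {item['id']}")
        routes.setdefault(info['endpoint'], []).append((idx, item))
    return routes

def merge_responses(routes, responses_by_endpoint, n_items):
    """Put per-endpoint results back in the order of the original request."""
    results = [None] * n_items
    for endpoint, routed in routes.items():
        for (idx, _), result in zip(routed, responses_by_endpoint[endpoint]):
            results[idx] = result
    return results

# Hypothetical deployment info, trimmed to the endpoint field
models_deployed = {
    'lr_1000_tropicana': {'endpoint': 'http://svc-1000/score'},
    'lr_1002_minute.maid': {'endpoint': 'http://svc-1002/score'},
    'lr_1002_tropicana': {'endpoint': 'http://svc-1002/score'},
}
items = [
    {'id': {'Store': '1002', 'Brand': 'minute.maid'}, 'model_type': 'lr'},
    {'id': {'Store': '1000', 'Brand': 'tropicana'}, 'model_type': 'lr'},
    {'id': {'Store': '1002', 'Brand': 'tropicana'}, 'model_type': 'lr'},
]

routes = route_requests(items, models_deployed, hypothetical_model_name)
for endpoint, routed in routes.items():
    print(endpoint, [idx for idx, _ in routed])

# Fake per-endpoint answers standing in for real HTTP responses
fake_responses = {
    'http://svc-1002/score': ['forecast for item 0', 'forecast for item 2'],
    'http://svc-1000/score': ['forecast for item 1'],
}
print(merge_responses(routes, fake_responses, len(items)))
```

Keeping the original index with each item lets the router answer in the same order the client sent, even though the items were split across several services.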

### 5.1 Register endpoints dict as an AML model

In [None]:
import json

artifact_path = 'models_deployed.json'
with open(artifact_path, 'w') as f:
    json.dump(models_deployed, f, indent=4)

dep_model = Model.register(
    workspace=ws, 
    model_path=artifact_path,
    model_name='deployed_models_info',
    tags={'ModelType': '_meta_'},
    description='Dictionary of the service endpoint where each model is deployed'
)

### 5.2 Deploy routing webservice

In [None]:
routing_env = Environment(name="many_models_routing_environment")
routing_env_deps = CondaDependencies.create(pip_packages=['azureml-defaults'])
routing_env.python.conda_dependencies = routing_env_deps

routing_infconfig = InferenceConfig(
    source_directory='../../scripts/customscript/',
    entry_script='routing_webservice.py',
    environment=routing_env
)

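# Reuse the deployment config with lower capacity, copying it so the
# per-group configuration used in section 4.0 is left unchanged
import copy
routing_deployment_config = copy.deepcopy(deployment_config)
routing_deployment_config.cpu_cores = 0.1
routing_deployment_config.memory_gb = 0.5

routing_service = Model.deploy(
    workspace=ws,
    name='routing-manymodels',
    models=[dep_model],
    inference_config=routing_infconfig,
    deployment_config=routing_deployment_config,
    deployment_target=deployment_target,
    overwrite=True
)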
routing_service.wait_for_deployment(show_output=True)

assert routing_service.state == 'Healthy'

print('Routing endpoint deployed with URL: {}'.format(routing_service.scoring_uri))

### 5.3 Test the webservice

This new endpoint can be called with data from different stores and brands, and it will automatically route the request to the appropriate model endpoints.

In [None]:
store1, brand1 = ('1002', 'minute.maid')
store2, brand2 = ('1000', 'tropicana')

In [None]:
test_data = [
    {
        "id": {
            "Store": store1,
            "Brand": brand1
        },
        "model_type": "lr",
        "forecast_start": "2020-05-21", "forecast_freq": "W-THU", "forecast_horizon": 5,
        "data": {
            "historical": {
                "Quantity": [11450, 12235, 14713]
            },
            "future": {
                "Price": [2.4, 2.5, 3, 3, 3],
                "Advert": [0, 1, 1, 1, 1]
            }
        }
    },
    {
        "id": {
            "Store": store2,
            "Brand": brand2
        },
        "model_type": "lr",
        "forecast_start": "2020-05-21", "forecast_freq": "W-THU", "forecast_horizon": 3,
        "data": {
            "historical": {
                "Quantity": [21450, 25291, 24910]
            },
            "future": {
                "Price": [5.2, 4.3, 5],
                "Advert": [0, 0, 0]
            }
        }
    },
    {
        "id": {
            "Store": store1,
            "Brand": brand2
        },
        "model_type": "lr",
        "forecast_start": "2020-05-21", "forecast_freq": "W-THU", "forecast_horizon": 10,
        "data": {
            "historical": {
                "Quantity": [13710, 11641, 9701]
            },
            "future": {
                "Price": [1.5, 2, 2, 2, 2, 1.5, 1.5, 1.5, 2, 2],
                "Advert": [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
            }
        }
    },
    {
        "id": {
            "Store": store2,
            "Brand": brand1
        },
        "model_type": "lr",
        "forecast_start": "2020-05-21", "forecast_freq": "W-THU", "forecast_horizon": 4,
        "data": {
            "historical": {
                "Quantity": [8192, 7103, 11710]
            },
            "future": {
                "Price": [3.5, 3.5, 4, 4],
                "Advert": [0, 0, 1, 1]
            }
        }
    }
]

In [None]:
url = routing_service.scoring_uri

request_headers = {'Content-Type': 'application/json'}
if routing_service.auth_enabled:
    keys = routing_service.get_keys()
    request_headers['Authorization'] = 'Bearer {}'.format(keys[0])

response = requests.post(url, json=test_data, headers=request_headers)
print(response.status_code)

if response.ok:
    print(response.json())
else:
    print(response.text)