Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

## Goal

This notebook creates a scalable real-time scoring service for Spark-based models such as the Content Based Personalization model trained in the [MMLSpark-LightGBM-Criteo notebook](../02_model/mmlspark_lightgbm_criteo.ipynb).

## Assumptions
In order to execute this notebook the following items are assumed:

1. A model has previously been trained as shown in the [mmlspark_lightgbm_criteo](../02_model/mmlspark_lightgbm_criteo.ipynb) notebook
2. This notebook is running in the same Azure Databricks workspace used to train and save the model
3. The Databricks cluster used has been prepared for operationalization (MML Spark and reco_utils are both installed)
  - See [Setup](https://github.com/Microsoft/Recommenders/blob/master/SETUP.md) instructions for details
4. An Azure Machine Learning Service workspace has been set up in the same region as the Azure Databricks workspace used for model training
  - See [Create A Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace) for more details
5. The Azure ML Workspace config.json has been uploaded to Databricks at dbfs:/aml_config/config.json
  - See [Configure Environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment) and [Databricks CLI](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html#access-dbfs-with-the-databricks-cli)
6. An Azure Container Instance (ACI) has been registered for use in your Azure subscription
  - See [Supported Services](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-supported-services#portal) for more details

### Setup libraries and variables

The next few cells initialize the environment: we import the relevant libraries and set variables.

In [5]:
import os
import json
import shutil

import requests

from reco_utils.dataset.criteo import get_spark_schema, load_spark_df

from azureml.core import Workspace
from azureml.core import VERSION as azureml_version

from azureml.core.model import Model
from azureml.core.conda_dependencies import CondaDependencies 
from azureml.core.webservice import Webservice, AksWebservice
from azureml.core.image import ContainerImage
from azureml.core.compute import AksCompute, ComputeTarget


# Check core SDK version number
print("Azure ML SDK version: {}".format(azureml_version))

## Prepare Assets for the Scoring Service

Before walking through the steps taken to set up the service, it is useful to establish some context. In our example, a "scoring service" is a function that is executed by a Docker container. It takes in a POST request with a JSON-formatted payload and produces a score based on a previously estimated model. In our case, we use the model estimated earlier, which predicts the probability of a user-item interaction based on a set of numeric and categorical features.

In order to create a scoring service, we will do the following steps:

1. Set up and authorize the Azure Machine Learning Workspace
2. Serialize the previously trained model and add it to the Azure Model Registry
3. Define the 'scoring service' script to execute the model
4. Define all the prerequisites that the script requires
5. Use the model, the driver script, and the prerequisites to create an Azure Container Image
6. Deploy the container image on a scalable platform, Azure Kubernetes Service
7. Test the service

In [7]:
MODEL_NAME = 'lightgbm_criteo.mml'  # this name must exactly match the name used to save the pipeline model in the estimation notebook
MODEL_DESCRIPTION = 'LightGBM Criteo Model'

# Setup AzureML assets (names must be lower case alphanumeric without spaces and between 3 and 32 characters)
# Azure ML Webservice
SERVICE_NAME = 'lightgbm-criteo'
# Azure ML Container Image
CONTAINER_NAME = 'lightgbm-criteo'
CONTAINER_RUN_TIME = 'spark-py'
# Azure AKS Service
AKS_NAME = 'predict-aks'

# Names of other files that are used below
CONDA_FILE = "deploy_conda.yaml"
DRIVER_FILE = "mmlspark_serving.py"
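
The naming constraint mentioned in the comment above can be checked before any resources are created. The following is a minimal sketch, assuming that hyphens are also permitted (the names used here contain them) and that a name must start and end with a letter or digit. The exact rule enforced by the service is an assumption, and `is_valid_asset_name` is a hypothetical helper rather than part of the Azure ML SDK.

```python
import re

# Assumed rule: 3-32 characters made of lowercase letters, digits and hyphens,
# starting and ending with a letter or digit. This is not taken from the SDK.
_ASSET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,30}[a-z0-9]")


def is_valid_asset_name(name):
    """Return True if `name` satisfies the assumed naming rule."""
    return _ASSET_NAME.fullmatch(name) is not None


for candidate in ("lightgbm-criteo", "predict-aks", "LightGBM Criteo"):
    print(candidate, is_valid_asset_name(candidate))
```

Running such a check up front gives a quicker failure than waiting for a deployment call to reject a name.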

## Setup AzureML Workspace
The workspace configuration can be retrieved from the portal and uploaded to Databricks.<br>
See [AzureML on Databricks](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#azure-databricks) for details.

In [9]:
ws = Workspace.from_config('/dbfs/aml_config/config.json')
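
`Workspace.from_config` reads the workspace coordinates from the JSON file. As a rough pre-flight check, the sketch below verifies that a file contains the three keys a downloaded config.json normally carries (`subscription_id`, `resource_group`, `workspace_name`). `missing_config_keys` is a hypothetical helper, and the values written to the demonstration file are placeholders.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(path):
    """Return the required keys that are absent or empty in the config file."""
    with open(path) as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Demonstration with a placeholder file; on Databricks the path would be
# '/dbfs/aml_config/config.json'.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config.json")
    with open(demo_path, "w") as f:
        json.dump({"subscription_id": "<subscription-id>",
                   "resource_group": "<resource-group>"}, f)
    missing = missing_config_keys(demo_path)

print(missing)
```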

## Prepare the Serialized Model
Spark Serving needs the schema of the raw input data, so an additional file containing that schema is added to the model directory.

In [11]:
# Save the raw input schema (default header) alongside the model
raw_schema = get_spark_schema()
with open(os.path.join('/dbfs', MODEL_NAME, 'schema.json'), 'w') as f:
  f.write(raw_schema.json())
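
The file written above holds the JSON form of a Spark `StructType`: an object with `"type": "struct"` and a `fields` list whose entries carry `name`, `type`, `nullable` and `metadata`. The sketch below inspects such a document without Spark. The two-field schema is a hypothetical miniature, not the actual Criteo schema, and the helper assumes every field has a primitive type (complex types are serialized as nested objects).

```python
import json

# Hypothetical miniature schema in the StructType JSON layout.
schema_json = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "int00", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "cat00", "type": "string", "nullable": True, "metadata": {}},
    ],
})


def fields_by_type(schema_text):
    """Group field names by their Spark type name (primitive types only)."""
    grouped = {}
    for field in json.loads(schema_text)["fields"]:
        grouped.setdefault(field["type"], []).append(field["name"])
    return grouped


print(fields_by_type(schema_json))
```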

### Copy from DBFS to local

While you can access files on DBFS with local file APIs, it is safer to explicitly copy saved models to and from DBFS, because the local file APIs can only access files smaller than 2 GB (see details [here](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html#access-dbfs-using-local-file-apis)).

In [13]:
model_local = os.path.join(os.getcwd(), MODEL_NAME)
dbutils.fs.cp('dbfs:/' + MODEL_NAME, 'file:' + model_local, recurse=True)

### Register the Model

Next, we need to register the model in the Azure Machine Learning Workspace.

In [15]:
# First the model directory is compressed to minimize data transfer
zip_file = shutil.make_archive(base_name=MODEL_NAME, format='zip', root_dir=model_local)

# Register the model
model = Model.register(model_path=zip_file,  # this points to a local file
                       model_name=MODEL_NAME,  # this is the name the model is registered as
                       description=MODEL_DESCRIPTION,
                       workspace=ws)

print(model.name, model.description, model.version)
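
The archive registered here is unpacked again inside the container by the scoring script. That round trip can be tried locally with the standard library alone. In the sketch below, the directory and file names are placeholders standing in for a saved pipeline model.

```python
import os
import shutil
import tempfile
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the saved pipeline model directory.
    model_dir = os.path.join(tmp, "demo_model")
    os.makedirs(os.path.join(model_dir, "metadata"))
    with open(os.path.join(model_dir, "schema.json"), "w") as f:
        f.write("{}")
    with open(os.path.join(model_dir, "metadata", "part-00000"), "w") as f:
        f.write("placeholder")

    # root_dir makes paths inside the archive relative to the model directory.
    zip_path = shutil.make_archive(
        base_name=os.path.join(tmp, "demo_model_archive"),
        format="zip",
        root_dir=model_dir,
    )

    with ZipFile(zip_path, "r") as archive:
        # Skip directory entries, which end with a slash.
        archived_files = sorted(
            name for name in archive.namelist() if not name.endswith("/")
        )
        archive.extractall(os.path.join(tmp, "extracted"))

    restored = os.path.exists(os.path.join(tmp, "extracted", "schema.json"))

print(archived_files, restored)
```

Because `root_dir` points at the model directory itself, the extracted folder can be passed directly to a loader without an extra path component.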

## Define the Scoring Script

Next, we need to create the driver script that will be executed when the service is called. The functions that need to be defined for scoring are `init()` and `run()`. The `init()` function is run when the service is created, and the `run()` function is run each time the service is called.

In our example, we use the `init()` function to load all the libraries, initialize the spark session, start the spark streaming service and load the model pipeline. We use the `run()` method to route the input to the spark streaming service to generate predictions (in this case the probability of an interaction) then return the output.
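
The `init()`/`run()` contract described above can be sketched without Spark. In the example below, the model is a hypothetical stand-in function. Only the overall shape mirrors the description: state is prepared once in `init()`, and `run()` accepts a JSON string, returns a JSON string, and reports failures in the payload.

```python
import json

_model = None  # populated once by init()


def init():
    """One-time setup; a stand-in scorer replaces the Spark pipeline here."""
    global _model

    def average(features):
        # Hypothetical model: averages the numeric values it is given.
        return sum(features.values()) / len(features)

    _model = average


def run(input_json):
    """Score one JSON request and return a JSON string."""
    try:
        result = _model(json.loads(input_json))
    except Exception as e:
        # Report the failure in the payload instead of raising.
        result = str(e)
    return json.dumps({"result": result})


init()
print(run('{"a": 1.0, "b": 3.0}'))
print(run("not json"))
```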

In [17]:
driver_file = '''
import os
import json
from time import sleep
from uuid import uuid4
from zipfile import ZipFile

from azureml.core.model import Model
from pyspark.ml import PipelineModel
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
import requests


def init():
    """One time initialization of pyspark and model server"""

    spark = SparkSession.builder.appName("Model Server").getOrCreate()
    import mmlspark

    # extract and load model
    model_path = Model.get_model_path('{model_name}')
    with ZipFile(model_path, 'r') as f:
        f.extractall('model')
    model = PipelineModel.load('model')

    # load data schema saved with model
    with open(os.path.join('model', 'schema.json'), 'r') as f:
        schema = StructType.fromJson(json.load(f))

    input_df = (
        spark.readStream.continuousServer()
        .address("localhost", 8089, "predict")
        .load()
        .parseRequest(schema)
    )

    output_df = (
        model.transform(input_df)
        .makeReply("probability")
    )

    checkpoint = os.path.join('/tmp', 'checkpoints', uuid4().hex)
    server = (
        output_df.writeStream.continuousServer()
        .trigger(continuous="1 second")
        .replyTo("predict")
        .queryName("prediction")
        .option("checkpointLocation", checkpoint)
        .start()
    )

    # let the server finish starting
    sleep(1)


def run(input_json):
    try:
        response = requests.post(data=input_json, url='http://localhost:8089/predict')
        result = response.json()['probability']['values'][1]
    except Exception as e:
        result = str(e)
    
    return json.dumps({{"result": result}})
    
'''.format(model_name=MODEL_NAME)

# check syntax
exec(driver_file)

with open(DRIVER_FILE, "w") as f:
    f.write(driver_file)
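
Two details of the cell above are easy to miss: doubled braces in the template survive `str.format` as literal braces, and a syntax check does not have to execute the script. The sketch below illustrates both with a shortened, hypothetical template. It uses the built-in `compile` as an alternative way to check syntax, which avoids running the script's imports in the notebook.

```python
import json

template = '''
import json

def run(input_json):
    return json.dumps({{"model": "{model_name}", "echo": json.loads(input_json)}})
'''

# {model_name} is substituted; {{ and }} become single literal braces.
source = template.format(model_name="demo_model")

# compile() raises SyntaxError on invalid code but executes nothing.
code = compile(source, "driver_sketch.py", "exec")

# Executing in a separate namespace keeps the notebook globals untouched.
namespace = {}
exec(code, namespace)
reply = json.loads(namespace["run"]('{"x": 1}'))
print(reply)
```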

## 4. Define Dependencies

Next, we define the dependencies that are required by the driver script.

In [19]:
# azureml-sdk is required to load the registered model
conda_file = CondaDependencies.create(pip_packages=['azureml-sdk', 'requests']).serialize_to_string()

with open(CONDA_FILE, "w") as f:
    f.write(conda_file)
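
The file written above is a conda environment specification with a pip section. For orientation, the sketch below renders a file of that general shape by hand. It is not the output of `CondaDependencies`; the environment name and Python version are assumptions, and `conda_env_yaml` is a hypothetical helper.

```python
def conda_env_yaml(name, python_version, pip_packages):
    """Render a minimal conda environment file with a pip section."""
    lines = [
        "name: {}".format(name),
        "dependencies:",
        "  - python={}".format(python_version),
        "  - pip:",
    ]
    lines.extend("    - {}".format(package) for package in pip_packages)
    return "\n".join(lines) + "\n"


env_text = conda_env_yaml("scoring_env", "3.6", ["azureml-sdk", "requests"])
print(env_text)
```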

## 5. Create the Image

We use the `ContainerImage` class first to configure the image with the defined driver and dependencies, and then to create the image for later use.<br>
Building the image allows it to be downloaded and debugged locally using Docker; see the [troubleshooting instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-troubleshoot-deployment).

In [21]:
image_config = ContainerImage.image_configuration(execution_script=DRIVER_FILE, 
                                                  runtime=CONTAINER_RUN_TIME,
                                                  conda_file=CONDA_FILE,
                                                  tags={"runtime":CONTAINER_RUN_TIME, "model": MODEL_NAME})

image = ContainerImage.create(name=CONTAINER_NAME,
                              models=[model],
                              image_config=image_config,
                              workspace=ws)

image.wait_for_creation(show_output=True)

## 6. Create the Service

Once we have created an image, we configure an AKS cluster and deploy the image as an AKS web service.

**NOTE** You *can* create a service directly from the registered model and image_configuration with the `Webservice.deploy_from_model()` function. We create the image here explicitly and use `deploy_from_image()` for two reasons:

1. It provides more transparency in terms of the actual steps that are taking place
2. It has potential for faster iteration and for more portability. Once we have an image, we can create a new deployment with the exact same code.

In [23]:
# Create AKS compute first

# Use the default configuration (can also provide parameters to customize)
prov_config = AksCompute.provisioning_configuration()

# Create the cluster
aks_target = ComputeTarget.create(
  workspace=ws, 
  name=AKS_NAME, 
  provisioning_configuration=prov_config
)

aks_target.wait_for_completion(show_output=True)

print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

In [24]:
# Set the web service configuration (using default here)
aks_config = AksWebservice.deploy_configuration()

# Deploy service using created image
aks_service = Webservice.deploy_from_image(
  workspace=ws, 
  name=SERVICE_NAME,
  deployment_config=aks_config,
  image=image,
  deployment_target=aks_target
)

aks_service.wait_for_deployment(show_output=True)

## 7. Test the Service

Next, we use a row from the sample data to test the service.

The service expects JSON as its payload, so we take the data, fill missing values, convert it to JSON, and then submit it to the service endpoint.

Missing values must be filled in before the request is sent, because the web service expects well-formed input.

In [26]:
# View the URI
url = aks_service.scoring_uri
print('AKS URI: {}'.format(url))

# Setup authentication using one of the keys from aks_service
headers = dict(Authorization='Bearer {}'.format(aks_service.get_keys()[0]))

In [27]:
# Grab some sample data
df = load_spark_df(size='sample', spark=spark, dbutils=dbutils)
data = df.head().asDict()
print(data)
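
The fill-then-serialize step described above can be sketched in plain Python. The row below is hypothetical, and so are the field-name prefixes and the default values (0 for numeric fields, an empty string for categorical ones). In practice the defaults should mirror whatever imputation was applied when the model was trained.

```python
import json

# Hypothetical row: integer features and string categorical features,
# some of them missing.
row = {"label": 0, "int00": 1, "int01": None, "cat00": "68fd1e64", "cat01": None}


def fill_missing(record, numeric_default=0, string_default=""):
    """Replace None values, choosing the default from the field-name prefix."""
    filled = {}
    for key, value in record.items():
        if value is None:
            value = string_default if key.startswith("cat") else numeric_default
        filled[key] = value
    return filled


payload = json.dumps(fill_missing(row))
print(payload)
```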

In [28]:
# Send a request to the AKS cluster
response = requests.post(url=url, json=data, headers=headers)
print(response.json())
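
To see what the request above carries, it can be assembled with the standard library and inspected without being sent. This is a minimal sketch; the endpoint and key are placeholders, and `build_scoring_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_scoring_request(url, key, record):
    """Build (but do not send) an authenticated JSON POST request."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer {}".format(key),
        },
        method="POST",
    )


# Placeholder endpoint and key; real values come from the deployed service.
request = build_scoring_request("http://localhost/score", "<service-key>", {"int00": 1})
print(request.get_method(), request.get_header("Authorization"))
```

Passing the object to `urllib.request.urlopen` would send it; this is left out here so the sketch runs without a live endpoint.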

### Delete the Service

When you are done, you can delete the service to minimize costs. You can always redeploy from the image using the same deployment command shown above.

In [30]:
# Uncomment the following line to delete the web service
# aks_service.delete()

In [31]:
aks_service.state