
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px; height: 163px">
</div>

# Real Time Deployment

While real time deployment accounts for a smaller share of the deployment landscape, many of these deployments are high value tasks. This lesson surveys real time deployment options, from proofs of concept to custom and managed solutions, with a focus on RESTful services.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lesson you:<br>
 - Survey the landscape of real time deployment options
 - Prototype a RESTful service using MLflow
 - Walk through the deployment of a REST endpoint using SageMaker
 - Query a REST endpoint for inference using individual records and batch requests

### The Why and How of Real Time Deployment

Real time inference is...<br><br>

* Generating predictions for a small number of records with fast results (e.g. results in milliseconds)
* The first question to ask when considering real time deployment is: do I need it?  
  - It represents a minority of machine learning inference use cases
  - It is one of the more complicated ways of deploying models
  - That said, the domains that do need real time deployment are often of great business value
  
Domains needing real time deployment include...<br><br>

 - Financial services (especially with fraud detection)
 - Mobile
 - Adtech

There are a number of ways of deploying models...<br><br>

* Many use REST
* For basic prototypes, MLflow can act as a development deployment server
  - The MLflow implementation is backed by the Python library Flask
  - *This is not intended for production environments*

For production RESTful deployment, there are two main options...<br><br>

* A managed solution 
  - Azure ML
  - SageMaker
* A custom solution  
  - Involves deployments using a range of tools
  - Often uses Docker, Kubernetes, and Elastic Beanstalk
* One of the crucial elements of deployment is containerization
  - Software is packaged and isolated with its own application, tools, and libraries
  - Containers are a more lightweight alternative to virtual machines

Finally, embedded solutions are another way of deploying machine learning models, such as storing a model on IoT devices for inference.

Run the following cell to set up our environment.

In [6]:
%run "./Includes/Classroom-Setup"

### Prototyping with MLflow

MLflow offers <a href="https://www.mlflow.org/docs/latest/models.html#pyfunc-deployment" target="_blank">a Flask-backed deployment server for development.</a>

Build a basic model. This model will always predict 5.

In [9]:
import mlflow
import mlflow.pyfunc

class TestModel(mlflow.pyfunc.PythonModel):
  
  def predict(self, context, input_df):
    return 5
  
artifact_path="pyfunc-model"

with mlflow.start_run():
  mlflow.pyfunc.log_model(artifact_path=artifact_path, python_model=TestModel())
  
  run_id = mlflow.active_run().info.run_uuid

In its current development, the server is only accessible through the CLI using `mlflow pyfunc serve`.  This will change in future development.  In the meantime, we can work around this using `click`.

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> Models can be served in this way in other languages as well.

In [11]:
from multiprocessing import Process

server_port_number = 6501

def run_server():
  try:
    import mlflow.pyfunc.cli
    from click.testing import CliRunner
    
    CliRunner().invoke(mlflow.pyfunc.cli.commands, 
                       ['serve', 
                        "--model-path", artifact_path, 
                        "--run-id", run_id, 
                        "--port", server_port_number, 
                        "--host", "127.0.0.1", 
                        "--no-conda"])
  except Exception as e:
    print(e)

p = Process(target=run_server) # Run as a background process
p.start()
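The background process needs a moment to start before the endpoint accepts requests. As a minimal sketch, the hypothetical helper `wait_for_port` below (it is not part of MLflow) polls the port until a TCP connection succeeds. The default timeout and polling interval are assumptions chosen for illustration.

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=0.5):
  """Hypothetical helper: return True once host:port accepts TCP connections."""
  deadline = time.monotonic() + timeout
  while time.monotonic() < deadline:
    try:
      # A successful connection means something is listening on the port
      with socket.create_connection((host, port), timeout=interval):
        return True
    except OSError:
      time.sleep(interval)  # Not listening yet; wait and retry
  return False
```

Calling `wait_for_port("127.0.0.1", server_port_number)` before the first request would avoid a connection error if the server is still starting.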

Create an input for our REST endpoint.

In [13]:
import json
import pandas as pd

input_df = pd.DataFrame([0])
input_json = input_df.to_json(orient='split')

input_json
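The `split` orientation serializes a DataFrame into three top-level keys: `columns`, `index`, and `data`. The sketch below uses a hypothetical two-record frame (the column names are illustrative and not taken from the lesson data) to show the payload shape for more than one column.

```python
import json
import pandas as pd

# Hypothetical two-record frame; the column names are illustrative only
example_df = pd.DataFrame({"bedrooms": [1, 2], "bathrooms": [1.0, 1.5]})
example_json = example_df.to_json(orient="split")

# Parse the string back to inspect the three top-level keys
payload = json.loads(example_json)
print(sorted(payload.keys()))
print(payload["data"])
```

Each inner list of `data` is one record, with values in the same order as `columns`.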

Perform a POST request against the endpoint.

In [15]:
import requests

headers = {'Content-type': 'application/json'}
url = "http://localhost:{port_number}/invocations".format(port_number=server_port_number)

response = requests.post(url=url, headers=headers, data=input_json)

print(response)
print(response.text)

Do the same in bash.

In [17]:
%sh (echo -n '{"columns":[0],"index":[0],"data":[[0]]}') | curl -H "Content-Type: application/json" -d @- http://127.0.0.1:6501/invocations

### Managed Service Walk-through

Choose one of the following:<br><br>

* [A walk-through of deployment to Azure ML]($./Extras/AzureML-Deployment ) and the corresponding <a href="https://www.mlflow.org/docs/latest/models.html#deploy-a-python-function-model-on-microsoft-azure-ml" target="_blank">MLflow docs</a>
* [A walk-through of deployment to AWS SageMaker]($./Extras/SageMaker-Deployment ) and the corresponding <a href="https://www.mlflow.org/docs/latest/models.html#deploy-a-python-function-model-on-amazon-sagemaker" target="_blank">MLflow docs</a>

### SageMaker

This example assumes that the model was already deployed to SageMaker.  See the walk-through above in case you missed it.  Now let's look at how we'll query that REST endpoint.

First set AWS keys as environment variables.  **This is not a best practice since it is not the most secure way of handling credentials.**  It works in our case since the keys have a very limited policy associated with them.

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> See the <a href="https://docs.azuredatabricks.net/api/latest/secrets.html#id1" target="_blank">Secrets API</a> and <a href="https://docs.databricks.com/administration-guide/cloud-configurations/aws/iam-roles.html" target="_blank">IAM roles</a> for more secure ways of storing keys.

In [21]:
import os

# Set AWS credentials as environment variables
os.environ["AWS_ACCESS_KEY_ID"] = 'AKIAI4T2MLVBUB372FAA'
os.environ["AWS_SECRET_ACCESS_KEY"] = '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' # READ ONLY ACCESS KEYS
os.environ["AWS_DEFAULT_REGION"] = 'us-west-2'

Use `boto3`, the library for interacting with AWS in Python, to check the application status.

In [23]:
import boto3

def check_status(appName):
  sage_client = boto3.client('sagemaker', region_name="us-west-2")
  endpoint_description = sage_client.describe_endpoint(EndpointName=appName)
  endpoint_status = endpoint_description["EndpointStatus"]
  return endpoint_status

print("Application status is: {}".format(check_status(appName="airbnb-latest-0001")))
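An endpoint that was just deployed may not be ready right away, so a single status check is often wrapped in a polling loop. The sketch below assumes a hypothetical helper, `wait_until_in_service`, that takes any zero-argument function returning the status string. The timeout and interval defaults are illustrative assumptions.

```python
import time

def wait_until_in_service(get_status, timeout=600, interval=15):
  """Hypothetical helper: poll get_status() until it returns "InService"."""
  deadline = time.monotonic() + timeout
  while True:
    status = get_status()
    if status == "InService":
      return status
    if status == "Failed":
      # No point in waiting further once the endpoint has failed
      raise RuntimeError("Endpoint entered the Failed state")
    if time.monotonic() >= deadline:
      raise TimeoutError("Endpoint still {} after {} seconds".format(status, timeout))
    time.sleep(interval)
```

With the function defined above, this could be called as `wait_until_in_service(lambda: check_status("airbnb-latest-0001"))`.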

Import the Airbnb dataset and pull out the first record.

In [25]:
import pandas as pd
import random
from sklearn.model_selection import train_test_split

df = pd.read_csv("/dbfs/mnt/training/airbnb/sf-listings/airbnb-cleaned-mlflow.csv")
X_train, X_test, y_train, y_test = train_test_split(df.drop(["price"], axis=1), df[["price"]].values.ravel(), random_state=42)
query_input = X_train.iloc[[0]].values.flatten().tolist()

print("Using input vector: {}".format(query_input))

Define a helper function that connects to the `sagemaker-runtime` client and sends the record in the appropriate JSON format.

In [27]:
import json

def query_endpoint_example(inputs, appName="airbnb-latest-0001", verbose=True):
  if verbose:
    print("Sending batch prediction request with inputs: {}".format(inputs))
  client = boto3.session.Session().client("sagemaker-runtime", "us-west-2")
  
  response = client.invoke_endpoint(
      EndpointName=appName,
      Body=json.dumps(inputs),
      ContentType='application/json',
  )
  preds = response['Body'].read().decode("ascii")
  preds = json.loads(preds)
  
  if verbose:
    print("Received response: {}".format(preds))
  return preds

Query the endpoint.

In [29]:
prediction = query_endpoint_example(inputs=[query_input])

Now try the same with more than one record.  Create a helper function to query the endpoint with a number of random samples.

In [31]:
def random_n_samples(n, df=X_train, verbose=False):
  n_rows = df.shape[0]
  samples = []
  
  for _ in range(n):
    # Draw one row at random (with replacement) from the supplied DataFrame
    sample = df.iloc[[random.randint(0, n_rows-1)]].values
    samples.append(sample.flatten().tolist())
  
  return query_endpoint_example(samples, appName="airbnb-latest-0001", verbose=verbose)

Test this using 10 samples.  The payload for SageMaker can be 1 or more samples.

In [33]:
random_n_samples(10, verbose=True)
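When there are many records to score, they are usually split into fixed-size batches rather than sent in one very large request. The sketch below assumes a hypothetical helper, `chunk_records`, and uses made-up records in place of real feature vectors.

```python
def chunk_records(records, batch_size):
  """Hypothetical helper: yield successive batches of at most batch_size records."""
  if batch_size < 1:
    raise ValueError("batch_size must be at least 1")
  for start in range(0, len(records), batch_size):
    yield records[start:start + batch_size]

# Illustrative records standing in for feature vectors
records = [[i, i * 2] for i in range(7)]
batches = list(chunk_records(records, batch_size=3))
print([len(batch) for batch in batches])  # prints [3, 3, 1]
```

Each batch could then be passed to `query_endpoint_example(batch, verbose=False)` in turn.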

Compare the times between payload sizes.  **Notice how sending more records at a time reduces the time to prediction for each individual record.**

In [35]:
%timeit -n5 random_n_samples(100)

In [36]:
%timeit -n5 random_n_samples(1)
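One way to reason about this result is a simple cost model: every request pays a fixed overhead (connection, serialization, network round trip) plus a small cost per record. The sketch below uses hypothetical numbers, not measurements from this endpoint, to show how the fixed overhead is spread across the records in a request.

```python
def per_record_latency_ms(n_records, overhead_ms, compute_ms_per_record):
  """Average latency per record when n_records share one request's fixed overhead."""
  total_ms = overhead_ms + n_records * compute_ms_per_record
  return total_ms / n_records

# Hypothetical numbers for illustration only, not measurements
for n in (1, 10, 100):
  print(n, per_record_latency_ms(n, overhead_ms=50.0, compute_ms_per_record=0.5))
```

Under these assumed numbers the total time per request grows with the batch size, while the average time per record shrinks.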

## Review

 - Survey the landscape of real time deployment options
 - Prototype a RESTful service using MLflow
 - Walk through the deployment of a REST endpoint using SageMaker
 - Query a REST endpoint for inference using individual records and batch requests

**Question:** What are the best tools for real time deployment?  
**Answer:** This depends largely on the desired features.  The main tools to consider are a way to containerize code and either a REST endpoint or an embedded model.  This covers the vast majority of real time deployment options.

**Question:** What are the best options for RESTful services?  
**Answer:** The major cloud providers all have their respective deployment options.  In the Azure environment, Azure ML manages deployments using Docker images.  AWS SageMaker does the same.  This provides a REST endpoint that can be queried by various elements of your infrastructure.

**Question:** What factors influence REST deployment latency?  
**Answer:** Response time is a function of a few factors.  Batch predictions should be used where possible since they improve throughput by spreading the overhead of the REST connection across many records.  Geographic distance between client and server also matters, as does server load.  These can be addressed with geo-located deployments and with load balancing across more resources.

## Next Steps

Start the next lesson, [Drift Monitoring]($./09-Drift-Monitoring ).

## Additional Topics & Resources

**Q:** Where can I find out more information on MLflow's `pyfunc`?  
**A:** Check out <a href="https://www.mlflow.org/docs/latest/models.html#pyfunc-deployment" target="_blank">the MLflow documentation</a>.

&copy; 2019 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>