
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

![](https://files.training.databricks.com/images/301/deployment_options_mllib.png)

There are **four main deployment options**:
* **Batch pre-compute**
* **Structured streaming**
* **Low-latency model serving**
* **Mobile/embedded (outside the scope of this class)**

We have already seen how to do batch predictions using Spark. Now let's look at how to make predictions on streaming data.
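
To make the first option concrete, here is a minimal pure-Python sketch of the batch pre-compute pattern. It assumes a toy model (a plain function) and hypothetical listing IDs. Predictions are computed offline into a lookup table, and serving is only a lookup. The names `precompute_predictions`, `serve_prediction` and `toy_model` are illustrative and are not part of Spark or MLflow.

```python
from typing import Callable, Dict, Iterable, Mapping, Tuple

# Any function from a feature dict to a predicted price counts as a "model" here.
Model = Callable[[Mapping[str, float]], float]


def precompute_predictions(
    model: Model, rows: Iterable[Tuple[str, Mapping[str, float]]]
) -> Dict[str, float]:
    """Offline step: score every (id, features) row and store the results."""
    return {row_id: model(features) for row_id, features in rows}


def serve_prediction(table: Mapping[str, float], row_id: str) -> float:
    """Online step: no model runs at request time, only a lookup."""
    if row_id not in table:
        raise KeyError(f"no pre-computed prediction for {row_id!r}; rerun the batch job")
    return table[row_id]


# Hypothetical toy model and listing IDs, for illustration only.
def toy_model(features: Mapping[str, float]) -> float:
    return 50.0 + 40.0 * features["bedrooms"]


table = precompute_predictions(
    toy_model, [("listing-1", {"bedrooms": 1.0}), ("listing-2", {"bedrooms": 2.0})]
)
print(serve_prediction(table, "listing-2"))  # 130.0
```

The trade-off is freshness. A listing whose features change is not re-scored until the next batch run.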

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lesson you:<br>
 - **Apply a SparkML model on a simulated stream of data**

In [0]:
%run "../Includes/Classroom-Setup"

## Load in Model & Data

We load a repartitioned version of the dataset (100 partitions instead of 4) so that the streaming predictions progress in smaller, more visible increments.
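
Here is a toy model of why the file count matters. It assumes each partition was written as one Parquet file and that the stream is read with `maxFilesPerTrigger` set to 1, as in the streaming cell below. Each trigger picks up at most that many new files, so the number of files sets the number of micro-batches. `plan_micro_batches` is a hypothetical helper, not a Spark API.

```python
from typing import List


def plan_micro_batches(files: List[str], max_files_per_trigger: int) -> List[List[str]]:
    """Toy file stream source: each trigger picks up at most N not-yet-seen files."""
    if max_files_per_trigger < 1:
        raise ValueError("max_files_per_trigger must be >= 1")
    return [
        files[i : i + max_files_per_trigger]
        for i in range(0, len(files), max_files_per_trigger)
    ]


# Hypothetical file names standing in for the Parquet part-files on disk.
four_files = [f"part-{i:05d}.parquet" for i in range(4)]
hundred_files = [f"part-{i:05d}.parquet" for i in range(100)]

print(len(plan_micro_batches(four_files, 1)))     # 4 micro-batches
print(len(plan_micro_batches(hundred_files, 1)))  # 100 micro-batches
```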

In [0]:
from pyspark.ml.pipeline import PipelineModel

pipeline_path = f"{datasets_dir}/airbnb/sf-listings/models/sf-listings-2019-03-06/pipeline_model"
pipeline_model = PipelineModel.load(pipeline_path)

repartitioned_path = f"{datasets_dir}/airbnb/sf-listings/sf-listings-2019-03-06-clean-100p.parquet/"
schema = spark.read.parquet(repartitioned_path).schema

## Simulate streaming data

**NOTE**: You must specify a schema when creating a streaming source DataFrame.

In [0]:
streaming_data = (spark
                  .readStream
                  .schema(schema)                   # File-based streaming sources need an explicit schema
                  .option("maxFilesPerTrigger", 1)  # Read one file per micro-batch to simulate a stream
                  .parquet(repartitioned_path))

## Make Predictions

In [0]:
stream_pred = pipeline_model.transform(streaming_data)

**Let's save our results.**

In [0]:
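checkpoint_dir = working_dir + "/stream_checkpoint"
# Clear out the checkpoint directory so the query starts from scratch
dbutils.fs.rm(checkpoint_dir, True)

(stream_pred
 .writeStream
 .format("memory")                              # In-memory table; intended for debugging and demos
 .option("checkpointLocation", checkpoint_dir)
 .outputMode("append")                          # Emit only newly scored rows
 .queryName("pred_stream")                      # Also the name of the in-memory table queried below
 .start())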
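
Why clear the checkpoint first? A file stream source records the files it has already committed in the checkpoint. A restarted query resumes from that point instead of reprocessing those files. The memory sink keeps its table only in driver memory and, in append mode, does not rebuild it on restart. Re-running this cell against an old checkpoint would therefore likely leave `pred_stream` missing earlier rows. The sketch below is a hypothetical toy model of that bookkeeping. It does not reproduce Spark's actual checkpoint format.

```python
from typing import List, Set


class ToyCheckpoint:
    """Hypothetical stand-in for a checkpoint: remembers which files were committed."""

    def __init__(self) -> None:
        self.committed: Set[str] = set()


def run_query(files: List[str], checkpoint: ToyCheckpoint) -> List[str]:
    """Process only files the checkpoint has not seen, then record them as committed."""
    new_files = [f for f in files if f not in checkpoint.committed]
    checkpoint.committed.update(new_files)
    return new_files


files = ["part-00000.parquet", "part-00001.parquet"]
ckpt = ToyCheckpoint()
print(run_query(files, ckpt))             # first run: both files are processed
print(run_query(files, ckpt))             # restart with the same checkpoint: []
print(run_query(files, ToyCheckpoint()))  # fresh checkpoint: everything is reprocessed
```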

In [0]:
untilStreamIsReady("pred_stream")

#### While this is running, take a look at the Structured Streaming tab in the Spark UI.

In [0]:
display(
  sql("select * from pred_stream")
)

host_is_superhost,cancellation_policy,instant_bookable,host_total_listings_count,neighbourhood_cleansed,zipcode,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,minimum_nights,number_of_reviews,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,price,bedrooms_na,bathrooms_na,beds_na,review_scores_rating_na,review_scores_accuracy_na,review_scores_cleanliness_na,review_scores_checkin_na,review_scores_communication_na,review_scores_location_na,review_scores_value_na,host_is_superhostIndex,host_is_superhostOHE,cancellation_policyIndex,cancellation_policyOHE,instant_bookableIndex,instant_bookableOHE,neighbourhood_cleansedIndex,neighbourhood_cleansedOHE,zipcodeIndex,zipcodeOHE,property_typeIndex,property_typeOHE,room_typeIndex,room_typeOHE,bed_typeIndex,bed_typeOHE,features,prediction
t,moderate,t,1.0,Western Addition,94117,37.77149,-122.42872,Apartment,Private room,2.0,1.0,1.0,1.0,Real Bed,1.0,44.0,95.0,10.0,9.0,10.0,10.0,10.0,10.0,90.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,"Map(vectorType -> sparse, length -> 1, indices -> List(), values -> List())",1.0,"Map(vectorType -> sparse, length -> 5, indices -> List(1), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 1, indices -> List(), values -> List())",2.0,"Map(vectorType -> sparse, length -> 34, indices -> List(2), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 28, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 25, indices -> List(0), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 2, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 4, indices -> List(0), values -> List(1.0))","Map(vectorType -> sparse, length -> 126, indices -> List(2, 9, 42, 69, 95, 96, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 37.77149, -122.42872, 2.0, 1.0, 1.0, 1.0, 1.0, 44.0, 95.0, 10.0, 9.0, 10.0, 10.0, 10.0, 10.0))",110.6106637326011
t,strict_14_with_grace_period,t,1.0,Outer Mission,94131,37.73285,-122.44235,House,Private room,3.0,1.0,1.0,0.0,Real Bed,2.0,240.0,97.0,10.0,9.0,10.0,10.0,10.0,9.0,119.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,"Map(vectorType -> sparse, length -> 1, indices -> List(), values -> List())",0.0,"Map(vectorType -> sparse, length -> 5, indices -> List(0), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 1, indices -> List(), values -> List())",16.0,"Map(vectorType -> sparse, length -> 34, indices -> List(16), values -> List(1.0))",9.0,"Map(vectorType -> sparse, length -> 28, indices -> List(9), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 25, indices -> List(1), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 2, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 4, indices -> List(0), values -> List(1.0))","Map(vectorType -> sparse, length -> 126, indices -> List(1, 23, 50, 70, 95, 96, 100, 101, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112, 113, 114, 115), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 37.73285, -122.44235, 3.0, 1.0, 1.0, 2.0, 240.0, 97.0, 10.0, 9.0, 10.0, 10.0, 10.0, 9.0))",67.58495702355867
f,strict_14_with_grace_period,f,2.0,Mission,94110,37.75139,-122.41194,House,Entire home/apt,2.0,1.0,1.0,1.0,Real Bed,31.0,85.0,99.0,10.0,10.0,10.0,10.0,10.0,10.0,200.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 5, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 34, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 28, indices -> List(0), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 25, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 2, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 4, indices -> List(0), values -> List(1.0))","Map(vectorType -> sparse, length -> 126, indices -> List(0, 1, 6, 7, 41, 70, 94, 96, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 37.75139, -122.41194, 2.0, 1.0, 1.0, 1.0, 31.0, 85.0, 99.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0))",168.0648859992616
f,moderate,f,1.0,South of Market,94103,37.77086,-122.40459,Apartment,Entire home/apt,2.0,1.0,1.0,1.0,Real Bed,1.0,6.0,100.0,10.0,10.0,10.0,10.0,10.0,10.0,275.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 5, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 34, indices -> List(1), values -> List(1.0))",4.0,"Map(vectorType -> sparse, length -> 28, indices -> List(4), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 25, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 2, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 4, indices -> List(0), values -> List(1.0))","Map(vectorType -> sparse, length -> 126, indices -> List(0, 2, 6, 8, 45, 69, 94, 96, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 37.77086, -122.40459, 2.0, 1.0, 1.0, 1.0, 1.0, 6.0, 100.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0))",168.72583024587584
f,flexible,t,439.0,Western Addition,94115,37.7864,-122.4365,Apartment,Entire home/apt,2.0,1.0,0.0,1.0,Real Bed,30.0,0.0,98.0,10.0,10.0,10.0,10.0,10.0,10.0,196.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",2.0,"Map(vectorType -> sparse, length -> 5, indices -> List(2), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 1, indices -> List(), values -> List())",2.0,"Map(vectorType -> sparse, length -> 34, indices -> List(2), values -> List(1.0))",10.0,"Map(vectorType -> sparse, length -> 28, indices -> List(10), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 25, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 2, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 4, indices -> List(0), values -> List(1.0))","Map(vectorType -> sparse, length -> 126, indices -> List(0, 3, 9, 51, 69, 94, 96, 100, 101, 102, 103, 104, 106, 107, 109, 110, 111, 112, 113, 114, 115, 119, 120, 121, 122, 123, 124, 125), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 439.0, 37.7864, -122.4365, 2.0, 1.0, 1.0, 30.0, 98.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))",169.40790726757405
t,moderate,f,4.0,Mission,94110,37.75057,-122.41043,Apartment,Private room,2.0,1.0,1.0,1.0,Real Bed,2.0,102.0,98.0,10.0,10.0,10.0,10.0,10.0,9.0,116.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,"Map(vectorType -> sparse, length -> 1, indices -> List(), values -> List())",1.0,"Map(vectorType -> sparse, length -> 5, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 34, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 28, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 25, indices -> List(0), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 2, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 4, indices -> List(0), values -> List(1.0))","Map(vectorType -> sparse, length -> 126, indices -> List(2, 6, 7, 41, 69, 95, 96, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 37.75057, -122.41043, 2.0, 1.0, 1.0, 1.0, 2.0, 102.0, 98.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0))",116.6779662590052
t,strict_14_with_grace_period,f,1.0,Bayview,94124,37.73821,-122.39146,Apartment,Private room,2.0,1.5,1.0,1.0,Real Bed,30.0,10.0,100.0,10.0,10.0,10.0,10.0,9.0,10.0,60.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,"Map(vectorType -> sparse, length -> 1, indices -> List(), values -> List())",0.0,"Map(vectorType -> sparse, length -> 5, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",17.0,"Map(vectorType -> sparse, length -> 34, indices -> List(17), values -> List(1.0))",17.0,"Map(vectorType -> sparse, length -> 28, indices -> List(17), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 25, indices -> List(0), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 2, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 4, indices -> List(0), values -> List(1.0))","Map(vectorType -> sparse, length -> 126, indices -> List(1, 6, 24, 58, 69, 95, 96, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 37.73821, -122.39146, 2.0, 1.5, 1.0, 1.0, 30.0, 10.0, 100.0, 10.0, 10.0, 10.0, 10.0, 9.0, 10.0))",29.32134988472899
f,strict_14_with_grace_period,t,24.0,Financial District,94108,37.78952,-122.40664,Boutique hotel,Private room,2.0,1.0,0.0,1.0,Real Bed,5.0,1.0,100.0,10.0,10.0,10.0,10.0,10.0,10.0,200.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 5, indices -> List(0), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 1, indices -> List(), values -> List())",21.0,"Map(vectorType -> sparse, length -> 34, indices -> List(21), values -> List(1.0))",12.0,"Map(vectorType -> sparse, length -> 28, indices -> List(12), values -> List(1.0))",4.0,"Map(vectorType -> sparse, length -> 25, indices -> List(4), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 2, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 4, indices -> List(0), values -> List(1.0))","Map(vectorType -> sparse, length -> 126, indices -> List(0, 1, 28, 53, 73, 95, 96, 100, 101, 102, 103, 104, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 24.0, 37.78952, -122.40664, 2.0, 1.0, 1.0, 5.0, 1.0, 100.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0))",139.72477195190731
f,strict_14_with_grace_period,f,1.0,Outer Sunset,94122,37.7551,-122.50936,House,Entire home/apt,4.0,1.0,2.0,3.0,Real Bed,3.0,23.0,98.0,10.0,10.0,10.0,10.0,10.0,9.0,285.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 5, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",8.0,"Map(vectorType -> sparse, length -> 34, indices -> List(8), values -> List(1.0))",8.0,"Map(vectorType -> sparse, length -> 28, indices -> List(8), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 25, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 2, indices -> List(0), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 4, indices -> List(0), values -> List(1.0))","Map(vectorType -> sparse, length -> 126, indices -> List(0, 1, 6, 15, 49, 70, 94, 96, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 37.7551, -122.50936, 4.0, 1.0, 2.0, 3.0, 3.0, 23.0, 98.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0))",249.36392371191687
t,moderate,f,4.0,Castro/Upper Market,94114,37.7586,-122.43332,Apartment,Private room,2.0,1.5,1.0,1.0,Real Bed,2.0,61.0,94.0,9.0,9.0,10.0,10.0,10.0,9.0,95.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,"Map(vectorType -> sparse, length -> 1, indices -> List(), values -> List())",1.0,"Map(vectorType -> sparse, length -> 5, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 1, indices -> List(0), values -> List(1.0))",4.0,"Map(vectorType -> sparse, length -> 34, indices -> List(4), values -> List(1.0))",2.0,"Map(vectorType -> sparse, length -> 28, indices -> List(2), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 25, indices -> List(0), values -> List(1.0))",1.0,"Map(vectorType -> sparse, length -> 2, indices -> List(1), values -> List(1.0))",0.0,"Map(vectorType -> sparse, length -> 4, indices -> List(0), values -> List(1.0))","Map(vectorType -> sparse, length -> 126, indices -> List(2, 6, 11, 43, 69, 95, 96, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 37.7586, -122.43332, 2.0, 1.5, 1.0, 1.0, 2.0, 61.0, 94.0, 9.0, 9.0, 10.0, 10.0, 10.0, 9.0))",103.21780879473954
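
The vector columns above are sparse vectors printed as `Map(vectorType -> sparse, length -> n, indices -> ..., values -> ...)`. The sketch below shows one way to read them, assuming plain Python lists. `sparse_to_dense` expands one vector. `one_hot` mimics a one-hot encoder that drops the last category, which Spark ML's `OneHotEncoder` does by default through its `dropLast` parameter. That default explains why `host_is_superhostOHE` has length 1 and is empty for `t` (index 1.0). `host_is_superhost` appears to have two categories, and the last one encodes to all zeros. Both helper names are hypothetical.

```python
from typing import List, Sequence


def sparse_to_dense(length: int, indices: Sequence[int], values: Sequence[float]) -> List[float]:
    """Expand a sparse vector given as (length, indices, values) into a dense list."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    dense = [0.0] * length
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


def one_hot(category_index: int, num_categories: int, drop_last: bool = True) -> List[float]:
    """One-hot encode an index; with drop_last, the last category becomes all zeros."""
    length = num_categories - 1 if drop_last else num_categories
    indices = [category_index] if category_index < length else []
    return sparse_to_dense(length, indices, [1.0] * len(indices))


# First row, room_typeOHE: length 2, indices [1], values [1.0]
print(sparse_to_dense(2, [1], [1.0]))  # [0.0, 1.0]
# host_is_superhost with two categories: index 1 encodes to [0.0], index 0 to [1.0]
print(one_hot(1, 2))  # [0.0]
print(one_hot(0, 2))  # [1.0]
```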


In [0]:
display(
  sql("select count(*) from pred_stream")
)

count(1)
6940


Now that we are done, make sure to stop the stream.

In [0]:
for stream in spark.streams.active:
    print(f"Stopping {stream.name}")
    stream.stop()                  # Stop the active stream
    try:
        stream.awaitTermination()  # Wait for it to actually stop
    except Exception:
        pass                       # Ignore errors raised while shutting down

### What about Model Export?

* <a href="https://onnx.ai/" target="_blank">ONNX</a>
  * ONNX is popular in the deep learning community because it lets developers switch between libraries and languages, but it has only experimental support for MLlib.
* DIY (Reimplement it yourself)
  * Error-prone, fragile
* 3rd party libraries
  * See the XGBoost notebook
  * <a href="https://www.h2o.ai/products/h2o-sparkling-water/" target="_blank">H2O</a>
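
To see why DIY reimplementation is fragile, here is a sketch of only the final scoring step. It assumes the pipeline's last stage were a linear regression exported as a coefficient list plus an intercept. The notebook does not say which estimator the saved pipeline uses, so the weights below are hypothetical. Every upstream stage (string indexing, one-hot encoding with dropped last categories, vector assembly order) would also have to be reproduced exactly. The length check is this sketch's only guard against a mismatched feature layout.

```python
from typing import Sequence


def predict_linear(
    coefficients: Sequence[float],
    intercept: float,
    length: int,
    indices: Sequence[int],
    values: Sequence[float],
) -> float:
    """Score one sparse feature vector with exported (hypothetical) linear-model weights."""
    # Exported weights only make sense if the feature layout matches training exactly.
    if length != len(coefficients):
        raise ValueError(
            f"feature vector has length {length}, model expects {len(coefficients)}"
        )
    return intercept + sum(coefficients[i] * v for i, v in zip(indices, values))


# Hypothetical 4-feature model, not the pipeline trained in this course.
coefs = [10.0, -2.0, 0.5, 3.0]
print(predict_linear(coefs, 100.0, 4, [0, 3], [1.0, 2.0]))  # 100 + 10*1 + 3*2 = 116.0
```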

### Low-Latency Serving Solutions

Low-latency serving can return predictions in as little as tens to hundreds of milliseconds. Custom solutions are usually built with Docker and/or Flask (though Flask generally isn't recommended in production unless significant precautions are taken). Managed options include:<br><br>

* <a href="https://docs.databricks.com/applications/mlflow/model-serving.html" target="_blank">MLflow Model Serving</a>
* <a href="https://azure.microsoft.com/en-us/services/machine-learning/" target="_blank">Azure Machine Learning</a>
* <a href="https://aws.amazon.com/sagemaker/" target="_blank">SageMaker</a>
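
For the custom route, the following is a minimal, dependency-free sketch of a JSON scoring endpoint. To stay self-contained it uses Python's standard-library `http.server` in place of Flask; the Python documentation itself advises against `http.server` in production. The `predict` function is a hypothetical stand-in for a real model. The request format (a JSON list of feature dicts) is also an assumption, not the format any managed service expects.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: dict) -> float:
    """Hypothetical stand-in model; a real service would load its model once at startup."""
    return 50.0 + 40.0 * float(features.get("bedrooms", 0.0))


def handle_payload(body: bytes) -> bytes:
    """Parse a JSON list of feature dicts and return JSON-encoded predictions."""
    records = json.loads(body)
    predictions = [predict(r) for r in records]
    return json.dumps({"predictions": predictions}).encode("utf-8")


class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            response = handle_payload(self.rfile.read(length))
            status = 200
        except (ValueError, TypeError, AttributeError) as exc:
            # Malformed JSON or unexpected payload shape: report a client error.
            response = json.dumps({"error": str(exc)}).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Development-only server: single-threaded, no authentication, no TLS."""
    HTTPServer((host, port), ScoringHandler).serve_forever()
```

Calling `serve()` blocks and listens on port 8080. A POST with the body `[{"bedrooms": 2}]` should then return `{"predictions": [130.0]}`. Keeping the parsing in `handle_payload` separates it from the HTTP plumbing, so you can test it without starting a server.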

&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>