## Model Deployment with Spark Serving
In this example, we build a movie recommender from the *Movie Ratings* dataset and then use Spark Serving to deploy it as a real-time web service.
First, we import the needed packages:

In [2]:
import sys
import os
import urllib.request  # needed for urlretrieve when downloading the dataset

import numpy as np
import pandas as pd
import mmlspark

from mmlspark.RecommendationIndexer import RecommendationIndexer
from mmlspark.SAR import SAR

from pyspark.ml import Pipeline, PipelineModel

Now let's read the data and split it into training and test sets:

In [4]:
#Columns
user_id='UserId'
item_id='MovieId'
rating_id='Rating'

user_id_index = user_id.replace("Id","Index")
item_id_index = item_id.replace("Id","Index")

# Download Movie Ratings
basedataurl = "http://aka.ms" 
datafile = "MovieRatings.csv"

datafile_dbfs = os.path.join("/dbfs", datafile)

if os.path.isfile(datafile_dbfs):
    print("found {} at {}".format(datafile, datafile_dbfs))
else:
    print("downloading {} to {}".format(datafile, datafile_dbfs))
    # Build the URL with "/" explicitly; os.path.join is meant for filesystem paths.
    urllib.request.urlretrieve("{}/{}".format(basedataurl, datafile), datafile_dbfs)
    
data_all = sqlContext.read.format('csv')\
                     .options(header='true', delimiter=',', inferSchema='true',
                              ignoreLeadingWhiteSpace='true', ignoreTrailingWhiteSpace='true')\
                     .load(datafile)

display(data_all)

train, test = data_all.randomSplit([0.75, 0.25], seed=123)

UserId,MovieId,Rating,Timestamp
1,68646,10,1381620027
1,113277,10,1379466669
2,454876,8,1394818630
2,790636,7,1389963947
2,816711,8,1379963769
2,1091191,7,1391173869
2,1322269,7,1391529691
2,1433811,8,1380453043
2,1454468,8,1387016442
2,1535109,8,1386350135


Next, we create an MMLSpark SAR model:

In [6]:
recommendation_indexer = RecommendationIndexer(userInputCol=user_id, userOutputCol=user_id_index,
                                               itemInputCol=item_id, itemOutputCol=item_id_index)\
                                              .fit(train)

sar = SAR(userCol=user_id_index, itemCol=item_id_index, ratingCol=rating_id)

train.cache()
model = Pipeline(stages=[recommendation_indexer, sar]).fit(train)
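
SAR-style recommenders score items by how often they occur together in users' histories. The following is a toy illustration of that idea in plain Python, using Jaccard similarity between the sets of users who rated each item. It is a sketch only, not the MMLSpark implementation; `jaccard_item_similarity` and the toy data are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_item_similarity(interactions):
    """Return Jaccard similarity for every pair of items.

    interactions: iterable of (user, item) pairs.
    """
    # Collect the set of users who interacted with each item.
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)

    similarity = {}
    for a, b in combinations(sorted(users_by_item), 2):
        both = len(users_by_item[a] & users_by_item[b])    # users who rated both items
        either = len(users_by_item[a] | users_by_item[b])  # users who rated at least one
        similarity[(a, b)] = both / either
    return similarity


# Hypothetical toy data: three users, three items.
toy = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (2, "C"), (3, "C")]
sims = jaccard_item_similarity(toy)
```

Items A and B were rated by exactly the same users, so they come out as the most similar pair; a recommender built on these scores would suggest B to a user who has only seen A.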

In [7]:
# Index the test set and keep only the user column; its schema defines the request format.
testInput = recommendation_indexer.transform(test).select("UserIndex")
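
The indexer replaces raw `UserId` and `MovieId` values with numeric `UserIndex` and `MovieIndex` values, which is why the service is queried by `UserIndex` rather than by the original id. As a rough mental model, the sketch below assigns indices in first-seen order using the first few rows of the dataset. This is an assumption made for illustration: the real indexer may order values differently (for example by frequency), and `build_index` is a hypothetical helper, not part of MMLSpark.

```python
def build_index(values):
    """Assign each distinct value a 0-based index in first-seen order."""
    index = {}
    for value in values:
        if value not in index:
            # Floats mirror the double-typed index values used in the requests.
            index[value] = float(len(index))
    return index


# First five rows of the Movie Ratings sample shown earlier.
user_ids = [1, 1, 2, 2, 2]
movie_ids = [68646, 113277, 454876, 790636, 816711]

user_index = build_index(user_ids)
movie_index = build_index(movie_ids)
```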

Now we define the web service input and output.
For more information, see the [documentation for Spark Serving](https://github.com/Azure/mmlspark/blob/master/docs/mmlspark-serving.md).

In [9]:
from pyspark.sql.functions import col, from_json, broadcast
from pyspark.sql.types import *
import uuid
from mmlspark import request_to_string, string_to_response

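# Input: listen on localhost:8898 under the API name "my_api" and parse each
# request body against the schema of testInput (a single UserIndex column).
serving_inputs = spark.readStream.server() \
    .address("localhost", 8898, "my_api") \
    .load()\
    .parseRequest(schema=testInput.schema, apiName="my_api")

# Precompute the top 10 recommendations for every user with the fitted SAR stage.
recommendations = model.stages[1].recommendForAllUsers(10).cache()

# Output: look up the requesting user's recommendations and use that column as the reply.
serving_outputs = serving_inputs \
  .join(broadcast(recommendations), 'UserIndex') \
  .makeReply("recommendations")

# Start the streaming query that answers requests; each run gets its own checkpoint directory.
server = serving_outputs.writeStream \
    .server() \
    .replyTo("my_api") \
    .queryName("my_query") \
    .option("checkpointLocation", "/checkpoints-{}".format(uuid.uuid1())) \
    .start()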
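
Because requests are parsed against `testInput.schema`, each request body is a JSON object with a single `UserIndex` field. A minimal sketch for building such a body with the standard `json` module instead of a hand-written string; `make_payload` is a hypothetical helper introduced here for illustration.

```python
import json


def make_payload(user_index):
    """Serialize one request body with the single field the service expects."""
    # Cast to float so the value matches the double-typed UserIndex column.
    return json.dumps({"UserIndex": float(user_index)})


payload = make_payload(13621)
```

The resulting string can be passed as the `data` argument of an HTTP POST to the service.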


Test the web service:

In [11]:
import requests
data = u'{"UserIndex":13621.0}'
r = requests.post(data=data, url="http://localhost:8898/my_api")
print("Response {}".format(r.text))

In [12]:
import requests
data = u'{"UserIndex":4247.0}'
r = requests.post(data=data, url="http://localhost:8898/my_api")
print("Response {}".format(r.text))
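
A request sent right after the streaming query starts may fail if the server is not yet accepting connections. One way to handle this is to retry a few times before giving up. The sketch below is a generic retry wrapper in plain Python; `call_with_retry` and its parameters are hypothetical names, and the wrapper is not part of MMLSpark or `requests`.

```python
import time


def call_with_retry(send, attempts=5, delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    send:  zero-argument callable that performs the request.
    sleep: injectable so the wait can be skipped in tests.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except Exception as err:  # e.g. connection refused while the server starts
            last_error = err
            if attempt < attempts - 1:
                sleep(delay)
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error


# Usage with the service defined earlier (not executed here):
# r = call_with_retry(
#     lambda: requests.post(data=data, url="http://localhost:8898/my_api"))
```

Passing the request as a callable keeps the retry logic independent of the HTTP library, so the same wrapper works with any client.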

In [13]:
import time
time.sleep(20) # wait for server to finish setting up (just to be safe)
server.stop()