## Model Deployment with Spark Serving 
In this example, we train a model to predict income from the *Adult Census* dataset. We then use Spark Serving to deploy the model as a real-time web service.
First, we import the packages we need:

In [3]:
import sys
import numpy as np
import pandas as pd
import mmlspark


Now let's read the data and split it into training and test sets:

In [5]:
dataFilePath = "AdultCensusIncome.csv"
import os, urllib.request  # urlretrieve lives in the urllib.request submodule
if not os.path.isfile(dataFilePath):
    urllib.request.urlretrieve("https://mmlspark.azureedge.net/datasets/" + dataFilePath, dataFilePath)
data = spark.createDataFrame(pd.read_csv(dataFilePath, dtype={" hours-per-week": np.float64}))
data = data.select([" education", " marital-status", " hours-per-week", " income"])
train, test = data.randomSplit([0.75, 0.25], seed=123)
train.limit(10).toPandas()

`TrainClassifier` wraps SparkML classifiers and can be used to initialize and fit a model.
You can use `help(mmlspark.TrainClassifier)` to view its parameters.

Note that it implicitly converts the data into the format expected by the algorithm. More specifically, it
tokenizes and hashes strings, one-hot encodes categorical variables, assembles the features into a vector,
and so on. The parameter `numFeatures` controls the number of hashed features.
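
To give a feel for what `numFeatures` controls, here is a minimal plain-Python sketch of the hashing trick. It is only an illustration, not mmlspark's implementation: the helper `hash_features` is hypothetical, and `md5` stands in for whatever hash function Spark uses internally.

```python
import hashlib

def hash_features(tokens, num_features):
    """Map tokens to a fixed-length count vector (hypothetical helper)."""
    vector = [0] * num_features
    for token in tokens:
        # md5 is an assumption made for reproducibility; it is a stand-in,
        # not the hash function Spark uses.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        index = int(digest, 16) % num_features
        vector[index] += 1
    return vector

# Sample string values in the style of the dataset (note the leading spaces).
tokens = [" 10th", " Divorced", " 10th"]
vec = hash_features(tokens, num_features=256)
print(len(vec), sum(vec))
```

A larger `num_features` lowers the chance that two different strings land in the same slot, at the cost of a wider feature vector.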

In [7]:
from mmlspark import TrainClassifier
from pyspark.ml.classification import LogisticRegression
model = TrainClassifier(model=LogisticRegression(), labelCol=" income", numFeatures=256).fit(train)

After the model is trained, we use it to score the test dataset and view the evaluation metrics.

In [9]:
from mmlspark import ComputeModelStatistics, TrainedClassifierModel
prediction = model.transform(test)
prediction.printSchema()

In [10]:
metrics = ComputeModelStatistics().transform(prediction)
metrics.limit(10).toPandas()
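
As a rough illustration of what classification metrics such as accuracy, precision, and recall measure, the sketch below computes them by hand in plain Python. The helper `binary_metrics` and the label lists are made up for this example. This is not how `ComputeModelStatistics` is implemented, and the exact metrics it reports may depend on the mmlspark version.

```python
def binary_metrics(labels, predictions, positive=1):
    """Compute accuracy, precision, and recall for a binary task (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for label, pred in zip(labels, predictions):
        if pred == positive and label == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif label == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Made-up labels and predictions, for illustration only.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
stats = binary_metrics(labels, predictions)
print(stats)
```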

Next, we define the web service input and output.
For more information, see the [documentation for Spark Serving](https://github.com/Azure/mmlspark/blob/master/docs/mmlspark-serving.md).

In [15]:
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import *
import uuid

serving_inputs = spark.readStream.server() \
    .address("localhost", 8888, "my_api") \
    .load()\
    .withColumn("variables", from_json(col("value"), test.schema))\
    .select("id","variables.*")

serving_outputs = model.transform(serving_inputs) \
  .withColumn("scored_labels", col("scored_labels").cast("string"))

server = serving_outputs.writeStream \
    .server() \
    .option("name", "my_api") \
    .queryName("my_query") \
    .option("replyCol", "scored_labels") \
    .option("checkpointLocation", "checkpoints-{}".format(uuid.uuid1())) \
    .start()


Now test the web service by posting a JSON record to it:

In [17]:
import requests
data = u'{" education":" 10th"," marital-status":" Divorced"," hours-per-week":40.0," income":" <=50K"}'
r = requests.post(data=data, url="http://localhost:8888/my_api")
print("Response {}".format(r.text))
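
Hand-written JSON strings are easy to get wrong, especially since the column names in this dataset start with a space. One option is to build the request body from a dictionary with `json.dumps`. This is a sketch; the names `record` and `payload` are chosen here for illustration.

```python
import json

# Keys must match the column names exactly, including the leading space.
record = {
    " education": " 10th",
    " marital-status": " Divorced",
    " hours-per-week": 40.0,
    " income": " <=50K",
}
payload = json.dumps(record)
print(payload)
```

The resulting `payload` string could then be passed as the `data` argument of `requests.post`, in place of the literal string used above.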

In [17]:
import time
time.sleep(20) # wait for server to finish setting up (just to be safe)
server.stop()