# Model Deployment 

---
Recall that the goal of this project was to use machine learning to predict the wine variety based on a wine review text.
So the model we deploy should do exactly that.
It should take a review text as input and output a wine variety.

Since our models were implemented as pyspark [Pipeline](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.Pipeline.html)s, preparing them for deployment is straightforward.

We can save a [PipelineModel](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.PipelineModel.html),
which will allow others to load and reuse it.

However, before saving our models, we have to make some minor adaptations.
So far, our models were designed to handle labeled data, which is not what the deployed models will have to work with.
Also, our models produce label indices, i.e. numbers, as predictions. Such predictions are of little use when deployed.
What we need is for our models to produce actual wine variety names, so the prediction should be `'pinot noir'` instead of `0.0`.
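
The mapping between indices and names can be illustrated without Spark. The following is a minimal plain-Python sketch, assuming the label indexer uses the default ordering of pyspark's `StringIndexer` (the most frequent label gets index `0.0`). The training labels and the helper names `fit_labels` and `index_to_string` are made up for illustration and are not part of the project code.

```python
from collections import Counter


def fit_labels(varieties):
    """Return the distinct labels ordered by descending frequency.

    This mimics the default ordering of pyspark's StringIndexer:
    the most frequent label is assigned index 0.
    """
    counts = Counter(varieties)
    # sort by descending count; ties are broken alphabetically here
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ordered]


def index_to_string(prediction, labels):
    """Map a predicted index such as 0.0 back to its label."""
    return labels[int(prediction)]


# hypothetical training labels, for illustration only
training_varieties = ['pinot noir', 'chardonnay', 'pinot noir',
                      'cabernet sauvignon', 'pinot noir', 'chardonnay']
labels = fit_labels(training_varieties)
print(labels)
print(index_to_string(0.0, labels))
```

The list of labels is all that is needed to invert the indexing, which is why the labels of the fitted indexer have to be kept when the indexer itself is removed from the pipeline.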

In the end, what we want to be able to do is write something like:

```python
>>> from reviewed_grapes import CommonWordsModel

>>> sentence_df = spark.createDataFrame(
        [("A superbe red wine with blackberry and stuff.",),
         ("Acid dark too strong for me.",),
         ("Tart and snappy, supple plum aroma.",)],
        ["review"])  

>>> cmw = CommonWordsModel(inputCol='review', outputCol='predicted variety')

>>> cmw.transform(sentence_df).select('review', 'predicted variety').show()
+--------------------+------------------+
|              review| predicted variety|
+--------------------+------------------+
|A superbe red win...|cabernet sauvignon|
|Acid dark too str...|        pinot noir|
|Tart and snappy, ...|        pinot noir|
+--------------------+------------------+
```
So we also need to define a little wrapper class, basically the `CommonWordsModel` above, that will allow for a smooth integration of our
fitted model into any pyspark pipeline.

With all of this done, we then create a minimal installable Python package so that our fitted models can be uploaded to GitHub and easily deployed with [pip](https://pypi.org/project/pip/).

---


In [10]:
from pyspark.ml.feature import IndexToString
from pyspark.ml.pipeline import PipelineModel
from pyspark.sql import SparkSession, SQLContext

In [2]:
from IPython.display import Markdown, display

In [3]:
%matplotlib inline
def warn(string):
    display(Markdown('<span style="color:red">'+string+'</span>'))
def info(string):
    display(Markdown('<span style="color:blue">'+string+'</span>'))

In [12]:
spark = SparkSession.builder \
    .master("local[*]") \
    .config("spark.driver.maxResultSize", "1g") \
    .config("spark.driver.memory", "20g") \
    .appName("jojoSparkSession") \
    .getOrCreate()
    # .config("spark.driver.memory", "20g") \
    # .config("spark.default.parallelism", "16") \
    # .config("spark.executor.cores", "16") \
sc = spark.sparkContext
sqlContext = SQLContext(sc) 

Let's implement the changes our models need in order to be ready for deployment:

In [13]:
def ready_for_deployment(model):
    """
    Make a fitted pipeline model deployable (modifies it in place).

    Removes the StringIndexer for the labels and uses its labels
    to construct an IndexToString transformer that maps the
    predicted label index to a human-readable wine variety.
    """
    stages = model.stages
    # the first stage is the label StringIndexer, which is no longer needed
    s_first = stages.pop(0)
    # keep its labels to build the inverse (index -> string) mapping
    labels = s_first.labels
    # the last stage is the classifier that produces the prediction column
    s_last = stages[-1]
    # translate the predicted index back to the variety name
    its = IndexToString(inputCol=s_last.getPredictionCol(),
                        labels=labels)
    stages.append(its)
    model.stages = stages

Now we load our models, prepare them, and save the deployable versions:

In [14]:
model_names = ['CommonWordsModel',
               'SimilarWordsModel',
               'DissimilarWordsModel',
               'ExtremesWordsModel',
               'LowentropyWordsModel']
interim_path = 'data/interim/'
models = {name: PipelineModel.load(interim_path+name) for name in model_names}

In [15]:
m = models['DissimilarWordsModel']
stages = m.stages
print(stages)
print([s.__module__ for s in stages])

[StringIndexerModel: uid=StringIndexer_5150e5fc8a13, handleInvalid=error, NLTKLemmatizer_3634256b87d1, WordSetTrackerModel_5befa5362707, LogisticRegressionModel: uid=LogisticRegression_846b6e77627d, numClasses=57, numFeatures=798]
['pyspark.ml.feature', 'reviewed_grapes.transformers', 'reviewed_grapes.models', 'pyspark.ml.classification']


In [16]:
deployable_models_location = 'reviewed_grapes/fitted_models/{name}'
for name, model in models.items():
    print(f'Ready {name} for deployment.')
    ready_for_deployment(model)
    # save it
    model.write().overwrite().save(
        deployable_models_location.format(name=name)
    )

Ready CommonWordsModel for deployment.
Ready SimilarWordsModel for deployment.
Ready DissimilarWordsModel for deployment.
Ready ExtremesWordsModel for deployment.
Ready LowentropyWordsModel for deployment.


Finally, we need to define the wrapper we are going to ship along with the models.

Below is the helper class that we put in [reviewed_grapes/models.py](ReviewedGrapes/reviewed_grapes/models.py) and ship along with the package of deployable models.

```python

# partial content of reviewed_grapes/models.py

from pyspark.ml.pipeline import PipelineModel


class ReviewedGrapesModel(PipelineModel):
    """Wrapper that loads a fitted pipeline and renames its columns."""

    def __new__(cls, inputCol='review', outputCol='prediction',
                modelPath=None):
        if modelPath is not None:
            # load the fitted pipeline instead of creating a new object
            self = cls.load(modelPath)
            # overwrite the input and output cols
            stages = self.stages
            s_first = stages[0]
            s_first.setInputCol(inputCol)
            s_last = stages[-1]
            s_last.setOutputCol(outputCol)
            self.stages = stages
        else:
            raise NotImplementedError(
                'For now only pretrained models are allowed, thus'
                ' the `modelPath` argument must be set when instantiating'
                ' this class.'
            )

        return self
```
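
The use of `__new__` may be unfamiliar: because the fitted pipeline already exists on disk, the constructor returns a loaded object rather than creating a fresh one. Below is a minimal Spark-free sketch of the same pattern. `FakeStage`, `FakePipeline`, `Wrapper`, and the in-memory `_STORE` are hypothetical stand-ins for the real stages, `PipelineModel`, the wrapper class, and the saved model directory.

```python
class FakeStage:
    """Hypothetical stand-in for a pipeline stage with input/output columns."""

    def __init__(self, inputCol, outputCol):
        self.inputCol = inputCol
        self.outputCol = outputCol


class FakePipeline:
    """Hypothetical stand-in for PipelineModel."""

    def __init__(self, stages):
        self.stages = stages

    @classmethod
    def load(cls, path):
        # stand-in for reading a saved model from `path`
        return FakePipeline(_STORE[path]())


# stand-in for models saved on disk: path -> factory for the stages
_STORE = {
    'some/model': lambda: [FakeStage('text', 'tokens'),
                           FakeStage('tokens', 'label')],
}


class Wrapper(FakePipeline):
    def __new__(cls, inputCol='review', outputCol='prediction',
                modelPath=None):
        if modelPath is None:
            raise NotImplementedError('modelPath must be set')
        loaded = cls.load(modelPath)
        # rename the columns at both ends of the pipeline
        loaded.stages[0].inputCol = inputCol
        loaded.stages[-1].outputCol = outputCol
        return loaded


model = Wrapper(outputCol='predicted variety', modelPath='some/model')
print(type(model).__name__, model.stages[0].inputCol,
      model.stages[-1].outputCol)
```

In this sketch `load` returns a plain `FakePipeline`, which is not an instance of `Wrapper`, so Python does not call `__init__` on the result. Whether the real `load` behaves the same way depends on the pyspark version and is an assumption here.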

We use [partial](https://docs.python.org/3/library/functools.html#functools.partial) to define, for each of our models, a wrapper pointing to the appropriate pipeline model (_see [reviewed_grapes/\_\_init\_\_.py](reviewed_grapes/__init__.py)_).

This looks something like:
```python

# see reviewed_grapes/__init__.py

from functools import partial

CommonWordsModel = partial(ReviewedGrapesModel, modelPath='<path/to/commonwordmodel>')

```
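
To see what `partial` does here, consider the following minimal sketch. `DemoModel` is a hypothetical stand-in for `ReviewedGrapesModel`, and `'some/model/path'` is a placeholder rather than the actual location of a fitted model.

```python
from functools import partial


class DemoModel:
    """Hypothetical stand-in for ReviewedGrapesModel."""

    def __init__(self, inputCol='review', outputCol='prediction',
                 modelPath=None):
        self.inputCol = inputCol
        self.outputCol = outputCol
        self.modelPath = modelPath


# pre-bind modelPath; the remaining keyword arguments stay free
DemoCommonWordsModel = partial(DemoModel, modelPath='some/model/path')

model = DemoCommonWordsModel(inputCol='review',
                             outputCol='predicted variety')
print(model.modelPath, model.outputCol)
# the partial object is a callable, not a class
print(isinstance(DemoCommonWordsModel, type))
```

One consequence of this design is that the resulting names can be called like constructors, but they are `partial` objects rather than classes, so they cannot be subclassed or used as the second argument of `isinstance`.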


---