# Model Deployment

![screenshot01](Screenshot_01.png)

* Until now, we have been working on the first (Design) and the second (Training) steps.
* When it comes to model deployment, we have multiple options.
* Do we need the predictions immediately? Or can they wait a bit?
    * If we can wait, we apply the model regularly; this is the so-called *Batch Mode* or *Offline Mode*
    * If we need the predictions immediately, we run the model in the so-called *Online Mode*. In this case, we have two options to make the model available:
       1. Through a webservice
       2. Through streaming

## Batch Mode

* Apply the model regularly, e.g. every 10 minutes, every day, etc.
* Often used for marketing-related tasks
![batch_mode](batch_mode.png)

## Web Services
* Common way of deploying models
![webservice](webservice.png)

## Streaming
* In contrast to a web service, multiple consumers can use the streaming output
![streaming](streaming.png)
* Example:
    * A video is uploaded
    * The video is checked by different services (e.g. violence, copyright, ...)
    * These services send their predictions to a decision service, which decides whether the video can be uploaded
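
The video example can be sketched in plain Python. This is only an in-memory illustration, not a real streaming setup: `EventStream`, `build_pipeline`, the topic names and the `violent` / `copyrighted` flags (stand-ins for actual model predictions) are all hypothetical. The point is that one uploaded event is consumed by several services, and their outputs are consumed again by a decision service.

```python
from collections import defaultdict


class EventStream:
    """Tiny in-memory stand-in for a streaming platform (hypothetical helper)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        # Every subscriber of the topic receives the same event.
        for handler in self._handlers[topic]:
            handler(event)


def build_pipeline(stream, required_checks=("violence", "copyright")):
    """Wire two checking services and one decision service to the stream."""
    results = defaultdict(dict)  # video_id -> {check name: passed?}
    decisions = {}               # video_id -> True (accept) / False (reject)

    def violence_service(video):
        # Assumption: a flag on the event replaces a real model prediction.
        passed = not video.get("violent", False)
        stream.publish("predictions",
                       {"video_id": video["id"], "check": "violence", "passed": passed})

    def copyright_service(video):
        passed = not video.get("copyrighted", False)
        stream.publish("predictions",
                       {"video_id": video["id"], "check": "copyright", "passed": passed})

    def decision_service(prediction):
        checks = results[prediction["video_id"]]
        checks[prediction["check"]] = prediction["passed"]
        # Decide only once every required check has reported.
        if all(name in checks for name in required_checks):
            decisions[prediction["video_id"]] = all(checks.values())

    stream.subscribe("uploads", violence_service)
    stream.subscribe("uploads", copyright_service)
    stream.subscribe("predictions", decision_service)
    return decisions
```

Publishing an event to the `uploads` topic triggers both checks, and the returned `decisions` dict fills in once both predictions have arrived. Adding another check only requires one more subscriber; the uploader does not need to change.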

# Web Services: Deploying models with Flask and Docker

* Use the model we created in the previous weeks and deploy it via a web service
    * Create a virtual environment
    * Create a script for predicting
    * Put the script into a Flask app
    * Package the app with Docker

* First, find the exact version of scikit-learn that was used to create the model, because loading the model with a different version might not work: run ```pip freeze | grep scikit-learn``` or, in a conda environment, ```conda list | grep scikit-learn```. Here this shows ```scikit-learn 1.0.2```
* Create a virtual environment: ```pipenv install scikit-learn==1.0.2 flask --python=3.9```
* Start the environment: ```pipenv shell```

* Now create the prediction script: ```predict.py```
* Test it with ```test.py```, where the model is applied to a specific example
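
A minimal sketch of what such a prediction script could look like. The file name `lin_reg.bin`, the assumption that it holds a pickled `(DictVectorizer, model)` pair, and the feature names `PU_DO` and `trip_distance` are not given in these notes and are used only for illustration.

```python
import pickle


def load_model(path="lin_reg.bin"):
    """Load the (vectorizer, model) pair; file name and layout are assumptions."""
    with open(path, "rb") as f_in:
        dv, model = pickle.load(f_in)
    return dv, model


def prepare_features(ride):
    """Turn a raw ride record into the feature dict the vectorizer expects."""
    return {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": ride["trip_distance"],
    }


def predict(features, dv, model):
    """Vectorize one feature dict and return the predicted duration as a float."""
    X = dv.transform([features])
    preds = model.predict(X)
    return float(preds[0])
```

A test script would then call `load_model()` once, build the features for one example ride with `prepare_features`, and print the result of `predict`.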

* Now turn it into a flask application: ```predict_flask.py```
* Test this using ```test_flask.py```
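
A possible shape for the Flask app, as a sketch under assumptions: the `/predict` route, the JSON field names, the `lin_reg.bin` file and the helpers `load_predictor` and `create_app` are hypothetical and not taken from the actual script. The app is built by a small factory so that the endpoint can be exercised without a model file.

```python
import pickle

from flask import Flask, jsonify, request


def prepare_features(ride):
    """Turn a raw ride record into model features (field names are assumptions)."""
    return {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": ride["trip_distance"],
    }


def load_predictor(path="lin_reg.bin"):
    """Return a function that maps a feature dict to a predicted duration."""
    with open(path, "rb") as f_in:
        dv, model = pickle.load(f_in)

    def predictor(features):
        return float(model.predict(dv.transform([features]))[0])

    return predictor


def create_app(predictor):
    """Build the Flask app around any callable that scores a feature dict."""
    app = Flask("duration-prediction")

    @app.route("/predict", methods=["POST"])
    def predict_endpoint():
        ride = request.get_json()
        features = prepare_features(ride)
        return jsonify({"duration": predictor(features)})

    return app
```

To serve it as `predict_flask:app`, the module would additionally need a module-level `app = create_app(load_predictor())`. A test script can then send one ride with `requests.post("http://localhost:9696/predict", json=ride)` and print the JSON response.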

* When we start the app using ```python3 predict_flask.py```, we get the following warning:

    ```
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
    ```

* To fix this, use ```gunicorn``` as the production WSGI server: ```pipenv install gunicorn```
* Flask's built-in server is only meant for running things locally
* Start the app with gunicorn: ```gunicorn --bind=0.0.0.0:9696 predict_flask:app```

* Note: The library ```requests``` may be available in the base Python environment, but it is not installed in our virtual environment. We need this library only for testing, so we can install it as a development dependency: ```pipenv install --dev requests```

* Now package everything into Docker: ```Dockerfile```
* Build the image: ```docker build -t ride-duration-prediction-service:v1 .```
* Run the image: ```docker run -it --rm -p 9696:9696 ride-duration-prediction-service:v1```
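
The contents of the ```Dockerfile``` are not shown in these notes. A minimal sketch could look like the following, assuming the base image `python:3.9-slim`, a model file called `lin_reg.bin` (hypothetical name), and the `Pipfile` / `Pipfile.lock` that pipenv created in the project directory.

```dockerfile
# Assumption: a slim Python 3.9 base image matching the pipenv environment
FROM python:3.9-slim

RUN pip install -U pip && pip install pipenv

WORKDIR /app

# Install the locked dependencies into the system Python of the image
COPY ["Pipfile", "Pipfile.lock", "./"]
RUN pipenv install --system --deploy

# Copy the app and the model file (lin_reg.bin is a hypothetical name)
COPY ["predict_flask.py", "lin_reg.bin", "./"]

EXPOSE 9696

ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict_flask:app"]
```

Installing with `--system --deploy` avoids creating a virtual environment inside the container and fails the build if `Pipfile.lock` is out of date.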

## Web Services: Getting the models from the model registry (MLflow)

* Use the model from ```random-forest.ipynb```
* Use the run ID from MLflow
* Adapt the ```predict_flask.py``` script
* Use the run ID and ```pyfunc``` to load the model
* The dict vectorizer is stored as an artifact; to load it, we need to use the MLflow client
* Better: define the dict vectorizer and the model as a pipeline and use ```pyfunc``` to load both together. Adapt ```test_flask.py``` accordingly.
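
A sketch of how loading by run ID could look with the pipeline variant. The artifact path `model` and the helper names are assumptions, and MLflow must be able to reach the tracking server (for example via the `MLFLOW_TRACKING_URI` environment variable). Whether a list of feature dicts can be passed straight through depends on how the pipeline was logged, so treat `predict` as illustrative.

```python
def model_uri(run_id, artifact_path="model"):
    """Build a runs:/ URI; the artifact path 'model' is an assumption."""
    return f"runs:/{run_id}/{artifact_path}"


def load_model(run_id):
    """Load the logged pipeline (vectorizer + model) as a generic pyfunc model."""
    # Imported here so the module can be imported without MLflow installed.
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(model_uri(run_id))


def predict(model, features):
    """Score one feature dict; the pipeline vectorizes the dict itself."""
    preds = model.predict([features])
    return float(preds[0])
```

In ```predict_flask.py``` the model would be loaded once at startup, for example with the run ID read from an environment variable, and `predict` would be called inside the endpoint. No separate download of the dict vectorizer is needed in this variant.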