# Model Deployment

In this module, we look at deploying the ride duration model that has been our working example in the previous modules. Deploying a model means making it available so that other applications can get predictions from it. We will look at three modes of deployment: **online** deployment, **offline** or batch deployment, and **streaming**.

In online mode, our service must be up all the time. To do this, we implement a web service that takes in HTTP requests and sends back predictions. In offline or batch mode, the service runs regularly, but not necessarily all the time: it makes predictions for a batch of examples on a schedule, using workflow orchestration. Finally, we look at how to implement a streaming service, i.e. a machine learning service that listens to a stream of events and reacts to them, using AWS Kinesis and AWS Lambda.

```{margin}
⚠️ **Attribution:** These are notes for [Module 4: Model Deployment](https://github.com/DataTalksClub/mlops-zoomcamp/blob/main/04-deployment) of the [MLOps Zoomcamp](https://github.com/DataTalksClub/mlops-zoomcamp). The MLOps Zoomcamp is a free course from [DataTalks.Club](https://github.com/DataTalksClub).
```


## Deploying the model as a web service

In this section, we develop a Flask prediction API that serves predictions using a trained model from our MLflow artifact store or model registry. The API takes in requests from the backend containing information about a ride, which the model uses to make a prediction. Finally, we containerize this application using Docker. The container can then be deployed anywhere Docker is supported, such as Kubernetes or Elastic Beanstalk.
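
As a rough illustration of the idea, here is a minimal sketch of such a prediction endpoint. It is not the actual `app/main.py` of this project: `ConstantModel` is a hypothetical stand-in for the trained model, and the `/predict` route and the JSON list-of-rides payload are assumptions modeled on the client requests that appear later in these notes.

```python
from flask import Flask, jsonify, request


class ConstantModel:
    """Hypothetical stand-in for the trained model: always predicts 10 minutes."""

    def predict(self, rides):
        return [10.0 for _ in rides]


model = ConstantModel()
app = Flask("ride-duration-prediction")


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Assumption: the request body is a JSON list of ride dictionaries.
    rides = request.get_json()
    preds = model.predict(rides)
    # Return the prediction for the first ride as a plain float.
    return jsonify({"duration": float(preds[0])})


# To serve locally with the development server (not for production):
# app.run(host="0.0.0.0", port=9696)
```

In a real service the stand-in would be replaced by the model loaded from MLflow, and the development server by a production WSGI server inside the Docker container.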

### Model package

We start by packaging the code for model prediction that will be used by the Flask application. This can also be used for offline model training. The directory structure of our project looks like this:

```
deployment/
├── app/
│   └── main.py
├── ride_duration/
│   ├── __init__.py
│   ├── predict.py
│   ├── utils.py
│   └── VERSION
├── Dockerfile
├── Pipfile
├── MANIFEST.in
├── Pipfile.lock
├── setup.py
├── test.py
├── train.py
└── pyproject.toml
```

First we create `setup.py` and `pyproject.toml` for packaging. Refer to the links to see the complete code. For `setup.py` you only have to change the package metadata (or just leave it blank) and set `install_requires` to `[]`. This list will be filled in later by `pipenv-setup`, a tool that integrates with Pipenv, which we use for package management.

```python
from pathlib import Path
from setuptools import find_packages, setup


# Package meta-data.
NAME = "ride-duration-prediction"
DESCRIPTION = ""
URL = ""
EMAIL = ""
AUTHOR = ""
REQUIRES_PYTHON = ">=3.9.0"


# The rest you shouldn't have to touch too much, except for install_requires=[].
# Perhaps also the license and Trove classifiers if publishing to PyPI (public).
...

setup(
    ...
    install_requires=[],             
    ...
    license="MIT",
    classifiers=[
        # Trove classifiers
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: Implementation :: CPython",
        "Programming Language :: Python :: Implementation :: PyPy",
    ],
)
```

Next, we install the dependencies with Pipenv and use `pipenv-setup` to sync them into `setup.py`:

```bash
pipenv install scikit-learn==1.0.2 flask pandas mlflow --python=3.9
pipenv install
pipenv install --dev requests
pipenv install --dev pipenv-setup
pipenv-setup sync
```


## Deploying batch predictions

Suppose we need predictions on a regular schedule, e.g. hourly, daily, or monthly, with a lot of idle time in between. Examples include making predictions on collected data for analytics or reporting, or predicting churn for our customer base.

These use cases do not require the responsiveness of a web service. Hence, we can implement an offline service that makes batch predictions. For example, we can orchestrate a workflow in Prefect that pulls some data from a database and makes batch predictions on it. We can then write the predictions to a database, upload a predictions file to S3, or push the predictions to an analytics dashboard.
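
The scoring step of such a batch job could look roughly like the sketch below. It uses plain Python dictionaries instead of a database or parquet file so that it runs on its own. `LinearDurationModel` and `score_batch` are hypothetical names, and the output fields are an assumption modeled on the batch output table shown later in these notes.

```python
import uuid


class LinearDurationModel:
    """Hypothetical stand-in for the trained model: 4 minutes per distance unit."""

    def predict(self, rides):
        return [4.0 * ride["trip_distance"] for ride in rides]


def score_batch(model, rides, model_version):
    """Score a batch of rides and return one result row per ride."""
    preds = model.predict(rides)
    results = []
    for ride, pred in zip(rides, preds):
        results.append({
            "ride_id": str(uuid.uuid4()),  # generated ID to track each prediction
            "PULocationID": ride["PULocationID"],
            "DOLocationID": ride["DOLocationID"],
            "actual_duration": ride["duration"],
            "predicted_duration": pred,
            "diff": ride["duration"] - pred,
            "model_version": model_version,
        })
    return results


# A tiny batch with made-up durations, standing in for rows pulled from storage.
rides = [
    {"PULocationID": 43, "DOLocationID": 151, "trip_distance": 2.0, "duration": 7.5},
    {"PULocationID": 166, "DOLocationID": 239, "trip_distance": 3.0, "duration": 13.0},
]
results = score_batch(LinearDurationModel(), rides, model_version="demo-version")
```

In an orchestrated workflow, reading the input, calling a function like `score_batch`, and writing the results would each become a step that the scheduler runs periodically.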

## Machine learning for streaming


- A streaming setup has **producers** and **consumers**. A producer pushes events to an event stream, and consumers read from this stream and react to the events.
- Recall that a web service has a one-to-one relationship: there is an explicit connection between the user and the service. Streaming is one-to-many or many-to-many.
- In our example, the user interacts with the backend, which acts as the producer: it sends an event containing all the information about the ride, and downstream services react to this event.
- For example, one consuming service can predict the tip and send a push notification asking the user for a tip.
- The web service can return an okay duration prediction right away, while a streaming service computes a better ride duration prediction later and updates it.
- The connection is only implicit: the producer does not know which consumers will react, or how many.
- Example: content moderation. A user uploads a video, which produces an event. Several consumers react to it:
    - C1 checks for copyright violations.
    - C2 checks for NSFW content.
    - C3 checks for violence.
    - The consumers write to a prediction stream, which is read by a decision service.
- In principle, this can be scaled to arbitrarily many services or models.
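
The producer and consumer relationship described in these notes can be sketched with a toy in-memory stream. This is only an illustration of the pattern, not of a real streaming system such as Kinesis: `EventStream`, the two consumers, and their prediction rules are all hypothetical.

```python
class EventStream:
    """Toy in-memory stream: every subscribed consumer sees every event."""

    def __init__(self):
        self.consumers = []

    def subscribe(self, consumer):
        self.consumers.append(consumer)

    def publish(self, event):
        # The producer does not know which consumers react, or how many.
        return [consumer(event) for consumer in self.consumers]


def tip_consumer(event):
    # Hypothetical rule: ask for a tip on rides longer than one distance unit.
    return {
        "service": "tip",
        "ride_id": event["ride_id"],
        "ask_for_tip": event["trip_distance"] > 1.0,
    }


def duration_consumer(event):
    # Hypothetical model: 3 minutes per distance unit.
    return {
        "service": "duration",
        "ride_id": event["ride_id"],
        "duration": 3.0 * event["trip_distance"],
    }


stream = EventStream()
stream.subscribe(tip_consumer)
stream.subscribe(duration_consumer)

# The backend (producer) pushes a single ride event; both consumers react to it.
reactions = stream.publish({"ride_id": "ride-1", "trip_distance": 2.0})
```

Adding another model only requires another `subscribe` call; the producer code does not change, which is what makes the one-to-many setup easy to extend.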

## Deploying a model as a web service

```
├── app/
├── data/
├── ride_duration/
│   ├── VERSION
│   └── __init__.py
├── Dockerfile
├── setup.py
├── pyproject.toml
└── MANIFEST.in
```

After editing the `Pipfile`, run `pipenv install` to update the environment.


```python
import requests
import json

ride = [{
    'VendorID': 2,
    'store_and_fwd_flag': 'N',
    'RatecodeID': 1.0,
    'PULocationID': 130,
    'DOLocationID': 205,
    'passenger_count': 5.0,
    'trip_distance': 3.66,
    'fare_amount': 14.0,
    'extra': 0.5,
    'mta_tax': 0.5,
    'tip_amount': 10.0,
    'tolls_amount': 0.0,
    'ehail_fee': None,
    'improvement_surcharge': 0.3,
    'total_amount': 25.3,
    'payment_type': 1.0,
    'trip_type': 1.0,
    'congestion_surcharge': 0.0
}]


host = 'http://192.168.254.180:9696'
url = f'{host}/predict'
response = requests.post(url, json=ride)
result = response.json()
print(json.dumps(result, indent=4))
```

```
{
    "duration": 12.265893072879651
}
```



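One option is to get the model from the MLflow registry using its run ID. This is problematic if the tracking server goes down, since the prediction service then depends on the tracking server. To avoid this, we can load the model directly from the artifact root.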
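
A minimal sketch of going directly to the artifact root: build the S3 location of the model from the run ID and load it from there, bypassing the tracking server. The helper `model_artifact_uri` is hypothetical, and the `bucket/experiment_id/run_id/artifacts/model` layout is an assumption taken from the `source` field of the registered model version printed in these notes.

```python
def model_artifact_uri(bucket, experiment_id, run_id):
    """Build the S3 location of a logged model from its run ID.

    Assumes the layout s3://<bucket>/<experiment_id>/<run_id>/artifacts/model.
    """
    return f"s3://{bucket}/{experiment_id}/{run_id}/artifacts/model"


uri = model_artifact_uri(
    bucket="mlflow-models-ron",
    experiment_id=1,
    run_id="f4e2242a53a3410d89c061d1958ae70a",
)

# With mlflow installed and AWS credentials configured, the model could then
# be loaded without contacting the tracking server:
# import mlflow
# model = mlflow.pyfunc.load_model(uri)
```

With this approach the service only depends on S3 being available, at the cost of hard-coding the storage layout.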


```python
import requests
import json

ride = [{
    'PULocationID': 130,
    'DOLocationID': 205,
    'trip_distance': 3.66,
}]


host = 'http://0.0.0.0:9696'
url = f'{host}/predict'
response = requests.post(url, json=ride)
result = response.json()

print(json.dumps(result, indent=4))
```

```
{
    "duration": 12.265893072879651
}
```


```python
import pandas as pd

df = pd.read_parquet('data/green_tripdata_2021-01.parquet')
df
```

| | VendorID | lpep_pickup_datetime | lpep_dropoff_datetime | store_and_fwd_flag | RatecodeID | PULocationID | DOLocationID | passenger_count | trip_distance | fare_amount | extra | mta_tax | tip_amount | tolls_amount | ehail_fee | improvement_surcharge | total_amount | payment_type | trip_type | congestion_surcharge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2021-01-01 00:15:56 | 2021-01-01 00:19:52 | N | 1.0 | 43 | 151 | 1.0 | 1.01 | 5.50 | 0.50 | 0.5 | 0.00 | 0.00 | | 0.3 | 6.80 | 2.0 | 1.0 | 0.00 |
| 1 | 2 | 2021-01-01 00:25:59 | 2021-01-01 00:34:44 | N | 1.0 | 166 | 239 | 1.0 | 2.53 | 10.00 | 0.50 | 0.5 | 2.81 | 0.00 | | 0.3 | 16.86 | 1.0 | 1.0 | 2.75 |
| 2 | 2 | 2021-01-01 00:45:57 | 2021-01-01 00:51:55 | N | 1.0 | 41 | 42 | 1.0 | 1.12 | 6.00 | 0.50 | 0.5 | 1.00 | 0.00 | | 0.3 | 8.30 | 1.0 | 1.0 | 0.00 |
| 3 | 2 | 2020-12-31 23:57:51 | 2021-01-01 00:04:56 | N | 1.0 | 168 | 75 | 1.0 | 1.99 | 8.00 | 0.50 | 0.5 | 0.00 | 0.00 | | 0.3 | 9.30 | 2.0 | 1.0 | 0.00 |
| 4 | 2 | 2021-01-01 00:16:36 | 2021-01-01 00:16:40 | N | 2.0 | 265 | 265 | 3.0 | 0.00 | -52.00 | 0.00 | -0.5 | 0.00 | 0.00 | | -0.3 | -52.80 | 3.0 | 1.0 | 0.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 76513 | 2 | 2021-01-31 21:38:00 | 2021-01-31 22:16:00 | | | 81 | 90 | | 17.63 | 56.23 | 2.75 | 0.0 | 0.00 | 6.12 | | 0.3 | 65.40 | | | |
| 76514 | 2 | 2021-01-31 22:43:00 | 2021-01-31 23:21:00 | | | 35 | 213 | | 18.36 | 46.66 | 0.00 | 0.0 | 12.20 | 6.12 | | 0.3 | 65.28 | | | |
| 76515 | 2 | 2021-01-31 22:16:00 | 2021-01-31 22:27:00 | | | 74 | 69 | | 2.50 | 18.95 | 2.75 | 0.0 | 0.00 | 0.00 | | 0.3 | 22.00 | | | |
| 76516 | 2 | 2021-01-31 23:10:00 | 2021-01-31 23:37:00 | | | 168 | 215 | | 14.48 | 48.87 | 2.75 | 0.0 | 0.00 | 6.12 | | 0.3 | 58.04 | | | |


```python
from ride_duration.predict import load_model
from ride_duration.predict import make_prediction

# Load the model logged under a specific MLflow run.
model = load_model(run_id='f4e2242a53a3410d89c061d1958ae70a')
make_prediction(model, df)
```

```
array([ 6.67431673, 13.79195043,  6.96578162, ..., 13.79195043,
       36.27351977, 10.71632294])
```

```python
# `client` is an MlflowClient pointed at the tracking server.
client.get_latest_versions(name='NYCRideDurationModel', stages=['Production'])[0]
```

```
<ModelVersion: creation_timestamp=1655460227751, current_stage='Production', description='', last_updated_timestamp=1655460239062, name='NYCRideDurationModel', run_id='f4e2242a53a3410d89c061d1958ae70a', run_link='', source='s3://mlflow-models-ron/1/f4e2242a53a3410d89c061d1958ae70a/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>
```

```python
import mlflow

# Load whichever model version is currently in the Production stage.
model = mlflow.pyfunc.load_model('models:/NYCRideDurationModel/Production')
make_prediction(model, df)
```

```
array([ 6.67431673, 13.79195043,  6.96578162, ..., 13.79195043,
       36.27351977, 10.71632294])
```

```python
pd.read_parquet('output/green/2021-01.parquet')
```

| | ride_id | lpep_pickup_datetime | PULocationID | DOLocationID | actual_duration | predicted_duration | diff | model_version |
|---|---|---|---|---|---|---|---|---|
| 0 | d685e8a5-3374-453d-acb2-20fb17445dad | 2021-01-01 00:15:56 | 43 | 151 | 3.933333 | 4.403834 | -0.470501 | e1efc53e9bd149078b0c12aeaa6365df |
| 1 | 488137c5-c8f9-44aa-9995-884d16800d0a | 2021-01-01 00:25:59 | 166 | 239 | 8.750000 | 8.830572 | -0.080572 | e1efc53e9bd149078b0c12aeaa6365df |
| 2 | c8a71eb8-dc60-4223-b81c-520d1e4726f6 | 2021-01-01 00:45:57 | 41 | 42 | 5.966667 | 6.819916 | -0.853250 | e1efc53e9bd149078b0c12aeaa6365df |
| 3 | 572f288d-f6d7-44e2-bca0-a17e0f920914 | 2020-12-31 23:57:51 | 168 | 75 | 7.083333 | 13.923927 | -6.840594 | e1efc53e9bd149078b0c12aeaa6365df |
| 4 | 15c5da8f-13fa-4047-8566-8c25865943ee | 2021-01-01 00:26:31 | 75 | 75 | 2.316667 | 6.735151 | -4.418484 | e1efc53e9bd149078b0c12aeaa6365df |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73903 | 23134189-8855-476f-8f66-ab3d6f93ee0c | 2021-01-31 21:38:00 | 81 | 90 | 38.000000 | 40.089000 | -2.089000 | e1efc53e9bd149078b0c12aeaa6365df |
| 73904 | 11b8fc14-744b-4d74-9048-39a2651dd7c9 | 2021-01-31 22:43:00 | 35 | 213 | 38.000000 | 31.554369 | 6.445631 | e1efc53e9bd149078b0c12aeaa6365df |
| 73905 | 494fc2dc-8b72-49f1-99df-7243c0bf0506 | 2021-01-31 22:16:00 | 74 | 69 | 11.000000 | 17.447926 | -6.447926 | e1efc53e9bd149078b0c12aeaa6365df |
| 73906 | 4b5220e3-005f-41ac-b4a3-503806f5e127 | 2021-01-31 23:10:00 | 168 | 215 | 27.000000 | 33.382096 | -6.382096 | e1efc53e9bd149078b0c12aeaa6365df |


```bash
mlflow server -h 0.0.0.0 -p 5000 \
    --backend-store-uri=sqlite:///mlflow.db \
    --default-artifact-root=s3://mlflow-models-ron/
```

```python
import mlflow
from mlflow.tracking import MlflowClient


TRACKING_SERVER_HOST = "ec2-3-93-179-24.compute-1.amazonaws.com"
TRACKING_URI = f"http://{TRACKING_SERVER_HOST}:5000"

mlflow.set_tracking_uri(TRACKING_URI)
```

```python
client = MlflowClient(tracking_uri=TRACKING_URI)
client.list_experiments()
```

```
[<Experiment: artifact_location='s3://mlflow-models-ron/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>]
```

- Run training.
- The tracking server can also be accessed locally, but make sure the environment is configured first.

```bash
# Run this in the shell that launches Jupyter or the training script.
# (`!export ...` inside a notebook runs in a subshell and does not persist.)
export TRACKING_SERVER_HOST=ec2-52-90-170-113.compute-1.amazonaws.com
```

```python
import os

TRACKING_SERVER_HOST = os.getenv('TRACKING_SERVER_HOST')
TRACKING_SERVER_HOST
```

```python
import requests

TRACKING_SERVER_HOST = "ec2-52-90-170-113.compute-1.amazonaws.com"
TRACKING_URI = f"http://{TRACKING_SERVER_HOST}:5000"

# Fail early if the tracking server is not reachable.
response = requests.head(TRACKING_URI)
if response.status_code != 200:
    raise Exception(f"Tracking server unavailable: HTTP response code {response.status_code}")
```

```python
response.status_code
```

```
200
```

## Streaming

* Scenario
* Creating the role
* Create a Lambda function, test it
* Create a Kinesis stream
* Connect the function to the stream
* Send the records

Links
* [Tutorial: Using Amazon Lambda with Amazon Kinesis](https://docs.amazonaws.cn/en_us/lambda/latest/dg/with-kinesis-example.html)
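
As a starting point for the Lambda function step, here is a minimal sketch of a handler that consumes Kinesis records. Kinesis delivers each record's payload base64-encoded under `record["kinesis"]["data"]`. Everything else is an assumption for illustration: the payload layout with `ride` and `ride_id` keys, the `predict` stub, and the shape of the returned dictionary.

```python
import base64
import json


def predict(ride):
    """Hypothetical stand-in for the trained model: always predicts 10 minutes."""
    return 10.0


def lambda_handler(event, context):
    predictions = []
    for record in event["Records"]:
        # Decode the base64-encoded payload of the Kinesis record.
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        ride_event = json.loads(payload)
        predictions.append({
            "ride_id": ride_event["ride_id"],
            "ride_duration": predict(ride_event["ride"]),
        })
    return {"predictions": predictions}


# Local test: build an event shaped like the one Lambda receives from Kinesis.
sample = {
    "ride": {"PULocationID": 130, "DOLocationID": 205, "trip_distance": 3.66},
    "ride_id": 156,
}
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("utf-8")
event = {"Records": [{"kinesis": {"data": encoded}}]}
result = lambda_handler(event, None)
```

Testing the handler locally with a hand-built event like this makes it easier to debug before connecting the function to the stream.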


