# Model Deployment

There are two broad ways to deploy a model:

**Batch (offline)**
- The model runs on a regular schedule.

**Online**
- The model is up and running all the time, either as:
  - a web service (receives HTTP requests and sends back predictions), or
  - a streaming service (model services listen for events on a stream and react to them).


⚠️ **Attribution:** These are notes for [Module 4: Model Deployment](https://github.com/DataTalksClub/mlops-zoomcamp/blob/main/04-deployment) of the [MLOps Zoomcamp](https://github.com/DataTalksClub/mlops-zoomcamp). The MLOps Zoomcamp is a free course from [DataTalks.Club](https://github.com/DataTalksClub).



**Batch mode**
- Runs regularly (hourly, daily, monthly, etc.).
- A database holds all the data; a scoring job pulls data from it and applies the model:
  - daily: get all data from yesterday
  - hourly: get all data from the previous hour
  - and so on
- The job writes its results to a predictions database.
- Something else can read from the predictions database and react to these predictions, e.g. a report.
- Suits marketing-related tasks, e.g. churn prediction:
  - there is no need to know immediately that a user is about to churn
  - (i.e. churn does not happen over a short time interval)
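
The daily and hourly windows above can be made concrete with a small helper. This is a minimal sketch, not the course's implementation: the function name `scoring_window` and the choice to pass the run time explicitly are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def scoring_window(run_time: datetime, frequency: str) -> tuple[datetime, datetime]:
    """Return the half-open interval [start, end) that a batch job should score.

    Hypothetical helper: a daily job scores all of yesterday, an hourly job
    scores the previous full hour.
    """
    if frequency == "daily":
        end = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
        start = end - timedelta(days=1)
    elif frequency == "hourly":
        end = run_time.replace(minute=0, second=0, microsecond=0)
        start = end - timedelta(hours=1)
    else:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    return start, end


# Example: a job that starts at 03:30 on 1 February 2021.
run_time = datetime(2021, 2, 1, 3, 30)
daily_start, daily_end = scoring_window(run_time, "daily")
hourly_start, hourly_end = scoring_window(run_time, "hourly")
print(daily_start, daily_end)
print(hourly_start, hourly_end)
```

The scoring job would then pull only the rows whose timestamp falls inside the returned interval, apply the model, and write the results to the predictions database.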

**Web service**
- Contains the model.
- Example: ride duration prediction.
  - The backend sends ride info to the service (pickup location ID, dropoff location ID, time of day, etc.).
  - The service sends back a prediction, which is then passed on to the user.
- Needs to be up all the time.
- The user opens the app, checks the ride duration, and decides whether or not to hire a taxi.
- For this decision we can't wait five minutes; the prediction is needed immediately.

**Streaming**
- Producers and consumers.
- A producer pushes events to an event stream; consumers read from this stream and react to the events.
- Recall the web service: a 1-to-1 relationship (explicit connection between user and service).
- Streaming is 1-to-many or many-to-many.
- Flow: user -> backend (the producer) -> sends an event containing all info about the ride -> services react to this event.
- E.g. one consuming service predicts the tip -> sends a push notification to the user asking for a tip.
- The duration prediction from the web service is an okay prediction; a streaming service can compute a better ride duration prediction -> update the prediction.
- Only an implicit connection: we don't know which consumers will react, or how many.
- Example: content moderation
  - user -> video -> event
  - the event is picked up by several consumers: C1 (copyright), C2 (NSFW), C3 (violence)
  - each consumer writes to a prediction stream -> decision service
- Can be scaled, in principle, to any number of services or models.
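
The content moderation example can be sketched with a toy in-memory stream. This is only an illustration under loud assumptions: `EventStream` is a hypothetical stand-in for a real message broker, it delivers events synchronously, and the "models" are stubs that just look for a tag in the event.

```python
from collections import defaultdict
from typing import Callable


class EventStream:
    """Toy in-memory stand-in for an event stream (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._consumers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, consumer: Callable[[dict], None]) -> None:
        self._consumers[topic].append(consumer)

    def publish(self, topic: str, event: dict) -> None:
        # The producer does not know which consumers are listening, or how many.
        for consumer in self._consumers[topic]:
            consumer(event)


stream = EventStream()
decisions = []


def make_moderation_consumer(check: str) -> Callable[[dict], None]:
    """Build a consumer whose 'model' is a stub that looks for `check` in the tags."""

    def consume(event: dict) -> None:
        flagged = check in event["tags"]  # stand-in for a real model prediction
        stream.publish(
            "predictions",
            {"video_id": event["video_id"], "check": check, "flagged": flagged},
        )

    return consume


def decision_service(prediction: dict) -> None:
    """Collect predictions from all consumers; a real service would act on them."""
    decisions.append(prediction)


# C1, C2, C3 from the example above, plus the decision service.
for check in ("copyright", "nsfw", "violence"):
    stream.subscribe("video_uploaded", make_moderation_consumer(check))
stream.subscribe("predictions", decision_service)

# The backend (producer) pushes a single event; three consumers react to it.
stream.publish("video_uploaded", {"video_id": "v1", "tags": ["violence"]})
flagged_checks = [d["check"] for d in decisions if d["flagged"]]
print(flagged_checks)
```

Adding another model means adding another `subscribe` call; the producer code does not change, which is the implicit connection described above.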

## Deploying a model as a web service

```
app
data
ride_duration
|- VERSION
|- __init__.py
Dockerfile
setup.py
project.toml
MANIFEST.in
```

```bash
pipenv install scikit-learn==1.0.2 flask pandas mlflow --python=3.9
pipenv install
pipenv install --dev requests
pipenv install --dev pipenv-setup
pipenv-setup sync
```

After editing the `Pipfile`, run `pipenv install` to apply the changes.


To test the running service, send a ride to the `/predict` endpoint:

```python
import requests
import json

ride = [{
    'VendorID': 2,
    'store_and_fwd_flag': 'N',
    'RatecodeID': 1.0,
    'PULocationID': 130,
    'DOLocationID': 205,
    'passenger_count': 5.0,
    'trip_distance': 3.66,
    'fare_amount': 14.0,
    'extra': 0.5,
    'mta_tax': 0.5,
    'tip_amount': 10.0,
    'tolls_amount': 0.0,
    'ehail_fee': None,
    'improvement_surcharge': 0.3,
    'total_amount': 25.3,
    'payment_type': 1.0,
    'trip_type': 1.0,
    'congestion_surcharge': 0.0
}]

# Address of the machine running the prediction service
host = 'http://192.168.254.180:9696'
url = f'{host}/predict'
response = requests.post(url, json=ride)
result = response.json()
print(json.dumps(result, indent=4))
```

Output:

```
{
    "duration": 12.265893072879651
}
```
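
For context, the server side of this request could look roughly like the following Flask app. This is a hedged sketch, not the service used in these notes: `DummyModel` is a hypothetical placeholder that returns a constant, and the app name and handler name are assumptions. Only the `/predict` route, the list-of-rides payload, the `duration` key, and port 9696 come from the client code above.

```python
from flask import Flask, jsonify, request


class DummyModel:
    """Hypothetical placeholder for the trained pipeline: predicts a constant."""

    def predict(self, rides):
        return [10.0 for _ in rides]


model = DummyModel()
app = Flask("duration-prediction")


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    rides = request.get_json()  # list of ride dicts, as sent by the client
    preds = model.predict(rides)
    # The client sends a single ride, so return the first prediction only.
    return jsonify({"duration": float(preds[0])})


# To serve locally (assumption: same port as the client uses):
#     app.run(host="0.0.0.0", port=9696)
```

In a real service, `DummyModel` would be replaced by the loaded model, and the development server would be swapped for a production WSGI server.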



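The service can fetch the model from the MLflow registry using the run ID. This is problematic if the tracking server goes down, because the service then depends on the tracking server. To avoid this dependency, load the model directly from the artifact root.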
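
The difference between the two approaches comes down to which URI is passed to MLflow. The sketch below only builds the URIs; the helper names and all example values are hypothetical, and the S3 layout (`<bucket>/<experiment_id>/<run_id>/artifacts/<artifact_path>`) is assumed from the artifact paths that appear later in these notes.

```python
def runs_uri(run_id: str, artifact_path: str = "model") -> str:
    """URI that MLflow resolves through the tracking server (server must be up)."""
    return f"runs:/{run_id}/{artifact_path}"


def artifact_root_uri(
    bucket: str, experiment_id: str, run_id: str, artifact_path: str = "model"
) -> str:
    """URI that points straight at the artifact store, bypassing the tracking server."""
    return f"s3://{bucket}/{experiment_id}/{run_id}/artifacts/{artifact_path}"


# Placeholder values, for illustration only.
via_server = runs_uri("abc123")
direct = artifact_root_uri("my-bucket", "1", "abc123")
print(via_server)
print(direct)

# Either URI could then be passed to MLflow, e.g.:
#     model = mlflow.pyfunc.load_model(direct)
```

With the direct URI, the service needs read access to the bucket but keeps working when the tracking server is unavailable.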


## Batch deployment

We want to look at how often drivers deviate from the predicted duration, which can be useful for analytics.

```python
import requests
import json

ride = [{
    'PULocationID': 130,
    'DOLocationID': 205,
    'trip_distance': 3.66,
}]

host = 'http://0.0.0.0:9696'
url = f'{host}/predict'
response = requests.post(url, json=ride)
result = response.json()

print(json.dumps(result, indent=4))
```

Output:

```
{
    "duration": 12.265893072879651
}
```


```python
import pandas as pd

df = pd.read_parquet('data/green_tripdata_2021-01.parquet')
df
```

| | VendorID | lpep_pickup_datetime | lpep_dropoff_datetime | store_and_fwd_flag | RatecodeID | PULocationID | DOLocationID | passenger_count | trip_distance | fare_amount | extra | mta_tax | tip_amount | tolls_amount | ehail_fee | improvement_surcharge | total_amount | payment_type | trip_type | congestion_surcharge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2021-01-01 00:15:56 | 2021-01-01 00:19:52 | N | 1.0 | 43 | 151 | 1.0 | 1.01 | 5.50 | 0.50 | 0.5 | 0.00 | 0.00 | | 0.3 | 6.80 | 2.0 | 1.0 | 0.00 |
| 1 | 2 | 2021-01-01 00:25:59 | 2021-01-01 00:34:44 | N | 1.0 | 166 | 239 | 1.0 | 2.53 | 10.00 | 0.50 | 0.5 | 2.81 | 0.00 | | 0.3 | 16.86 | 1.0 | 1.0 | 2.75 |
| 2 | 2 | 2021-01-01 00:45:57 | 2021-01-01 00:51:55 | N | 1.0 | 41 | 42 | 1.0 | 1.12 | 6.00 | 0.50 | 0.5 | 1.00 | 0.00 | | 0.3 | 8.30 | 1.0 | 1.0 | 0.00 |
| 3 | 2 | 2020-12-31 23:57:51 | 2021-01-01 00:04:56 | N | 1.0 | 168 | 75 | 1.0 | 1.99 | 8.00 | 0.50 | 0.5 | 0.00 | 0.00 | | 0.3 | 9.30 | 2.0 | 1.0 | 0.00 |
| 4 | 2 | 2021-01-01 00:16:36 | 2021-01-01 00:16:40 | N | 2.0 | 265 | 265 | 3.0 | 0.00 | -52.00 | 0.00 | -0.5 | 0.00 | 0.00 | | -0.3 | -52.80 | 3.0 | 1.0 | 0.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 76513 | 2 | 2021-01-31 21:38:00 | 2021-01-31 22:16:00 | | | 81 | 90 | | 17.63 | 56.23 | 2.75 | 0.0 | 0.00 | 6.12 | | 0.3 | 65.40 | | | |
| 76514 | 2 | 2021-01-31 22:43:00 | 2021-01-31 23:21:00 | | | 35 | 213 | | 18.36 | 46.66 | 0.00 | 0.0 | 12.20 | 6.12 | | 0.3 | 65.28 | | | |
| 76515 | 2 | 2021-01-31 22:16:00 | 2021-01-31 22:27:00 | | | 74 | 69 | | 2.50 | 18.95 | 2.75 | 0.0 | 0.00 | 0.00 | | 0.3 | 22.00 | | | |
| 76516 | 2 | 2021-01-31 23:10:00 | 2021-01-31 23:37:00 | | | 168 | 215 | | 14.48 | 48.87 | 2.75 | 0.0 | 0.00 | 6.12 | | 0.3 | 58.04 | | | |


```python
from ride_model.predict import load_model
from ride_model.predict import make_prediction

model = load_model()
make_prediction(model, df)
```

Output:

```
array([ 4.40383416,  8.83057186,  6.81991632, ..., 17.4479256 ,
       33.38209552, 13.17360371])
```

```python
pd.read_parquet('output/green/2021-01.parquet')
```

| | ride_id | lpep_pickup_datetime | PULocationID | DOLocationID | actual_duration | predicted_duration | diff | model_version |
|---|---|---|---|---|---|---|---|---|
| 0 | d685e8a5-3374-453d-acb2-20fb17445dad | 2021-01-01 00:15:56 | 43 | 151 | 3.933333 | 4.403834 | -0.470501 | e1efc53e9bd149078b0c12aeaa6365df |
| 1 | 488137c5-c8f9-44aa-9995-884d16800d0a | 2021-01-01 00:25:59 | 166 | 239 | 8.750000 | 8.830572 | -0.080572 | e1efc53e9bd149078b0c12aeaa6365df |
| 2 | c8a71eb8-dc60-4223-b81c-520d1e4726f6 | 2021-01-01 00:45:57 | 41 | 42 | 5.966667 | 6.819916 | -0.853250 | e1efc53e9bd149078b0c12aeaa6365df |
| 3 | 572f288d-f6d7-44e2-bca0-a17e0f920914 | 2020-12-31 23:57:51 | 168 | 75 | 7.083333 | 13.923927 | -6.840594 | e1efc53e9bd149078b0c12aeaa6365df |
| 4 | 15c5da8f-13fa-4047-8566-8c25865943ee | 2021-01-01 00:26:31 | 75 | 75 | 2.316667 | 6.735151 | -4.418484 | e1efc53e9bd149078b0c12aeaa6365df |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73903 | 23134189-8855-476f-8f66-ab3d6f93ee0c | 2021-01-31 21:38:00 | 81 | 90 | 38.000000 | 40.089000 | -2.089000 | e1efc53e9bd149078b0c12aeaa6365df |
| 73904 | 11b8fc14-744b-4d74-9048-39a2651dd7c9 | 2021-01-31 22:43:00 | 35 | 213 | 38.000000 | 31.554369 | 6.445631 | e1efc53e9bd149078b0c12aeaa6365df |
| 73905 | 494fc2dc-8b72-49f1-99df-7243c0bf0506 | 2021-01-31 22:16:00 | 74 | 69 | 11.000000 | 17.447926 | -6.447926 | e1efc53e9bd149078b0c12aeaa6365df |
| 73906 | 4b5220e3-005f-41ac-b4a3-503806f5e127 | 2021-01-31 23:10:00 | 168 | 215 | 27.000000 | 33.382096 | -6.382096 | e1efc53e9bd149078b0c12aeaa6365df |
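
The notes do not show how this output file is assembled, so the following is only a guess at the shape of that step, written in plain Python rather than pandas to keep it self-contained. The function name `build_results` is hypothetical. Two things are inferred from the table itself: the ride IDs look like UUIDs, and `diff` appears to be `actual_duration - predicted_duration` (e.g. 3.933333 - 4.403834 ≈ -0.470501 in the first row).

```python
import uuid


def build_results(rides, predictions, model_version):
    """Combine rides and predictions into output rows (hypothetical helper).

    `rides` is a list of dicts holding the pickup time, location IDs, and
    actual duration; `predictions` holds one predicted duration per ride.
    """
    results = []
    for ride, predicted in zip(rides, predictions):
        results.append({
            "ride_id": str(uuid.uuid4()),  # assumption: IDs are random UUIDs
            "lpep_pickup_datetime": ride["lpep_pickup_datetime"],
            "PULocationID": ride["PULocationID"],
            "DOLocationID": ride["DOLocationID"],
            "actual_duration": ride["actual_duration"],
            "predicted_duration": predicted,
            "diff": ride["actual_duration"] - predicted,
            "model_version": model_version,
        })
    return results


# First row of the table above, used as sample input.
rides = [{
    "lpep_pickup_datetime": "2021-01-01 00:15:56",
    "PULocationID": 43,
    "DOLocationID": 151,
    "actual_duration": 3.933333,
}]
results = build_results(rides, [4.403834], "e1efc53e9bd149078b0c12aeaa6365df")
print(round(results[0]["diff"], 6))
```

Storing `model_version` with every row makes it possible to tell later which model produced a given prediction.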


```python
import mlflow

from ride_duration.utils import load_training_dataframe, prepare_features

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline


def train_pipeline(params, X_train, y_train, X_valid, y_valid):
    """Fit the pipeline, then log its metrics and the fitted model to MLflow."""

    with mlflow.start_run(nested=True):

        # Vectorize the feature dicts, then fit a random forest on top
        pipe = make_pipeline(
            DictVectorizer(),
            RandomForestRegressor(**params)
        )

        pipe.fit(X_train, y_train)

        # MLflow logging
        y_pred_train = pipe.predict(X_train)
        y_pred_valid = pipe.predict(X_valid)

        # squared=False returns RMSE instead of MSE
        rmse_train = mean_squared_error(y_train, y_pred_train, squared=False)
        rmse_valid = mean_squared_error(y_valid, y_pred_valid, squared=False)

        mlflow.log_metric('rmse_train', rmse_train)
        mlflow.log_metric('rmse_valid', rmse_valid)
        mlflow.sklearn.log_model(pipe, artifact_path='models')


TRACKING_SERVER_HOST = "ec2-34-229-91-27.compute-1.amazonaws.com"
TRACKING_URI = f"http://{TRACKING_SERVER_HOST}:5000"
mlflow.set_tracking_uri(TRACKING_URI)
mlflow.set_experiment("nyc-taxi-experiment")

train_path = './data/green_tripdata_2021-01.parquet'
valid_path = './data/green_tripdata_2021-02.parquet'

train_data = load_training_dataframe(train_path)
valid_data = load_training_dataframe(valid_path)

X_train = train_data.drop(['duration'], axis=1)
X_valid = valid_data.drop(['duration'], axis=1)
y_train = train_data.duration.values
y_valid = valid_data.duration.values


# Small grid over the forest hyperparameters; each combination is one MLflow run
for n_estimators in [300]:
    for max_depth in [2, 8]:
        params = {
            'n_estimators': n_estimators,
            'max_depth': max_depth,
        }

        train_pipeline(
            params,
            prepare_features(X_train), y_train,
            prepare_features(X_valid), y_valid,
        )
```

Running this cell raises:

```
ParamValidationError: Parameter validation failed:
Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
```

```bash
aws s3 ls
```

Output:

```
2022-06-17 00:01:19 mlflow-models-ron
```


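```python
mlflow.get_artifact_uri()
```

Output:

```
's3:1/54285958c9f244e0af8615ee2caf09ea/artifacts'
```

Note that this URI has no bucket name (`s3:1/...` rather than `s3://<bucket>/1/...`), which is consistent with the `Invalid bucket name ""` error.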

```python
import mlflow
from mlflow.tracking import MlflowClient

# TRACKING_URI is the tracking server address defined earlier
client = MlflowClient(tracking_uri=TRACKING_URI)
print(client.list_experiments())
```

Output:

```
[<Experiment: artifact_location='s3://mlflow-models-ron/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>]
```


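```python
# NOTE: `{1}` in an f-string evaluates to the literal 1, so this path points at a
# run directory named "1" rather than at a real run ID.
logged_model = f's3://mlflow-models-ron/0/{1}/artifacts/model'
model = mlflow.pyfunc.load_model(logged_model)
```

Output:

```
MlflowException: The following failures occurred while downloading one or more artifacts from s3://mlflow-models-ron/0/1/artifacts: {'model': "ClientError('An error occurred (404) when calling the HeadObject operation: Not Found')"}
```

The 404 is consistent with the path: there is no run with ID `1` under this experiment. Note also that `train_pipeline` logs the model with `artifact_path='models'`, so the last path component would presumably need to be `models` rather than `model`.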