# Model Deployment

In this module, we will look into deploying the ride duration model, which has been our working example in previous modules. Deploying means that other applications can get predictions from our model. We will look at three modes of deployment: **online** deployment, **offline** or batch deployment, and **streaming**.

In online mode, our service must be up all the time. To do this, we implement a web service which takes in HTTP requests and sends out predictions. In offline or batch mode, we have a service that runs regularly, but not necessarily all the time. It makes predictions for a batch of examples and is run periodically using workflow orchestration. Finally, we look at how to implement a streaming service, i.e. a machine learning service that listens to a stream of events and reacts to them, using AWS Kinesis and AWS Lambda.

```{margin}
⚠️ **Attribution:** These are notes for [Module 4: Model Deployment](https://github.com/DataTalksClub/mlops-zoomcamp/blob/main/04-deployment) of the [MLOps Zoomcamp](https://github.com/DataTalksClub/mlops-zoomcamp). The MLOps Zoomcamp is a free course from [DataTalks.Club](https://github.com/DataTalksClub).
```


## Deploying models with Flask and Docker

In this section, we develop a web server using Flask for serving model predictions. The model is obtained from an S3 artifacts store and predicts on data sent to the service by the backend. We will containerize this application using Docker. This container can be deployed anywhere Docker is supported, such as Kubernetes and Elastic Beanstalk.

### Model package

Here we will package code for model prediction that will be used by the Flask application. This can also be used for offline model training or batch scoring. The directory structure of our project would look like:

```
deployment/
├── app/
│   └── main.py
├── ride_duration/
│   ├── __init__.py
│   ├── predict.py
│   ├── utils.py
│   └── VERSION
├── Dockerfile
├── Pipfile
├── MANIFEST.in
├── Pipfile.lock
├── setup.py
├── test.py
├── train.py
└── pyproject.toml
```

First, we create [`setup.py`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/setup.py) and [`pyproject.toml`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/pyproject.toml) for packaging. Refer to the links to see the complete code. For `setup.py` you only have to change the package metadata (or just leave them blank) and set `install_requires` to `[]`. This list will be later filled using a tool that integrates with Pipenv which we will use for package management.

```{margin}
[`setup.py`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/setup.py)
```
```python
from pathlib import Path
from setuptools import find_packages, setup


# Package meta-data.
NAME = "ride-duration-prediction"
DESCRIPTION = ""
URL = ""
EMAIL = ""
AUTHOR = ""
REQUIRES_PYTHON = ">=3.9.0"


# The rest you shouldn't have to touch too much. Except for install_requires=[]. 
# Perhaps also the License and Trove Classifiers if publishing to PyPI (public).
...

setup(
    ...
    install_requires=[],             
    ...
    license="MIT",
    classifiers=[
        # Trove classifiers
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: Implementation :: CPython",
        "Programming Language :: Python :: Implementation :: PyPy",
    ],
)
```

Additionally, we can include a [`MANIFEST.in`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/MANIFEST.in) file to specify the files included in the source distribution of the package. The full list can be viewed in the `SOURCES.txt` file of the generated `egg-info` folder after building the package.

```{margin}
[`MANIFEST.in`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/MANIFEST.in)
```
```
include ride_duration/*.py
include ride_duration/VERSION

recursive-exclude * __pycache__
recursive-exclude * *.py[co]

exclude Dockerfile
exclude Pipfile
exclude Pipfile.lock
exclude *.py
exclude app/*
exclude data/*
```

### Pipenv

To manage our project's package dependencies, we will use [Pipenv](https://pipenv.pypa.io/en/latest/). Running the commands below generates a `Pipfile`, which supersedes the usual requirements file, and a `Pipfile.lock` containing hashes of downloaded packages that ensure reproducible builds.

```bash
pipenv install scikit-learn==1.0.2 flask pandas mlflow boto3 --python=3.9
pipenv install --dev requests
pipenv install --dev pipenv-setup
```

Next, we install the model package. Here we use `pipenv-setup sync` to update `install_requires` in the `setup` script according to the packages installed using Pipenv. This makes sure there are no dependency conflicts when using the package.

```bash
pipenv-setup sync
pipenv install -e .
```

Our `Pipfile` should now look like the following. Note that `ride-duration-prediction` is installed in editable mode which makes sense since the underlying code is still in development. 

```{margin}
[`Pipfile`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/Pipfile)
```
```ini
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
scikit-learn = "==1.0.2"
flask = "*"
pandas = "*"
mlflow = "*"
boto3 = "*"
ride-duration-prediction = {editable = true, path = "."}

[dev-packages]
requests = "*"
pipenv-setup = "*"

[requires]
python_version = "3.9"
```

### Model package scripts

For the model package, we have `utils.py` where we define helper functions. These are the usual `load_training_dataframe` function, which creates the target column (ride duration in minutes) and filters it to a range (`[1, 60]` by default), and `prepare_features`, which creates the `PU_DO` feature by combining the pickup and dropoff location IDs.

```{margin}
[`utils.py`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/ride_duration/utils.py)
```
```python
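import uuid

import pandas as pd


def generate_uuids(n):
    """Generate n random ride IDs."""
    return [str(uuid.uuid4()) for _ in range(n)]


def load_training_dataframe(file_path, y_min=1, y_max=60):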
    """Load data from disk and preprocess for training."""
    
    # Load data from disk
    data = pd.read_parquet(file_path)

    # Create target column and filter outliers
    data['duration'] = data.lpep_dropoff_datetime - data.lpep_pickup_datetime
    data['duration'] = data.duration.dt.total_seconds() / 60
    data = data[(data.duration >= y_min) & (data.duration <= y_max)]

    # Create uuids
    data['ride_id'] = generate_uuids(len(data))

    return data


def prepare_features(input_data):
    """Prepare features for dict vectorizer."""

    X = pd.DataFrame(input_data)
    X['PU_DO'] = X['PULocationID'].astype(str) + '_' + X['DOLocationID'].astype(str)
    X = X[['PU_DO', 'trip_distance']].to_dict(orient='records')
    
    return X
```

Note that this package expects models that are pipelines such as:

```python
pipeline = make_pipeline(
    DictVectorizer(), 
    RandomForestRegressor(**params, n_jobs=-1)
)
```

This avoids having to load the preprocessor separately from the artifacts store. Thus, our models expect the output of `prepare_features(input_data)`, where `input_data` can be a dataframe whose rows contain ride features or a list of feature dictionaries, e.g. obtained from a JSON payload.
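As a quick check of the two accepted input types, the sketch below copies `prepare_features` from `utils.py` and applies it to a list of feature dictionaries and to the equivalent dataframe. The sample ride is illustrative; both calls should yield the same list of records.

```python
import pandas as pd


def prepare_features(input_data):
    """Prepare features for dict vectorizer (copied from utils.py)."""
    X = pd.DataFrame(input_data)
    X['PU_DO'] = X['PULocationID'].astype(str) + '_' + X['DOLocationID'].astype(str)
    return X[['PU_DO', 'trip_distance']].to_dict(orient='records')


# Illustrative ride; the values are made up for this example.
rides = [{'PULocationID': 130, 'DOLocationID': 205, 'trip_distance': 3.66}]

from_records = prepare_features(rides)              # list of feature dicts, e.g. a JSON payload
from_frame = prepare_features(pd.DataFrame(rides))  # dataframe of rides

print(from_records)
print(from_records == from_frame)
```

Because both paths go through `pd.DataFrame(...)` first, the rest of the pipeline does not need to know where the data came from.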

The `load_model()` function in `predict.py` is of interest. The model is loaded either using the MLflow client to get the latest production version, or directly from the S3 artifacts store whenever the request fails (e.g. the tracking server is down). This ensures that we always get a model, assuming the following environment variables are properly configured.

```{margin}
[`predict.py`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/ride_duration/predict.py)
```
```python
import joblib
import mlflow
import pandas as pd
import os 
import requests

from ride_duration.utils import package_dir, prepare_features
from typing import Union
from mlflow.tracking import MlflowClient


def load_model():
    """Get latest production model from tracking server 
    or specific model from S3 bucket if server is down."""

    try:
        TRACKING_SERVER_HOST = os.getenv('TRACKING_SERVER_HOST')
        TRACKING_URI = f"http://{TRACKING_SERVER_HOST}:5000"

        # Check availability of API
        response = requests.head(TRACKING_URI)
        if response.status_code != 200:
            raise Exception(f"Tracking server unavailable: HTTP response code {response.status_code}")
            
        # Fetch production model from client
        mlflow.set_tracking_uri(TRACKING_URI)
        client = MlflowClient(tracking_uri=TRACKING_URI)
        prod_model = client.get_latest_versions(name='NYCRideDurationModel', stages=['Production'])[0]

    except Exception:
        EXPERIMENT_ID = os.getenv('EXPERIMENT_ID')
        RUN_ID = os.getenv('MODEL_RUN_ID')
        source = f"s3://mlflow-models-ron/{EXPERIMENT_ID}/{RUN_ID}/artifacts/model"
        print(f"Downloading model {RUN_ID} from S3...")
        
    else:
        RUN_ID = prod_model.run_id
        source = prod_model.source
        print(f"Downloading model {RUN_ID} (latest, production)...")
    
    model = mlflow.pyfunc.load_model(source)
    return model, RUN_ID


def make_prediction(model, input_data: Union[list[dict], pd.DataFrame]):
    """Make prediction from features dict or DataFrame."""
    
    X = prepare_features(input_data)
    preds = model.predict(X)

    return preds
```

Let's try to load a model after setting up an environment. Note that the tracking server here is not available as the instance is currently stopped. We expect the model to download from S3 directly using the run ID.

```bash
❯ export TRACKING_SERVER_HOST="ec2-3-93-179-24.compute-1.amazonaws.com"
❯ export EXPERIMENT_ID="1"
❯ export MODEL_RUN_ID="f4e2242a53a3410d89c061d1958ae70a"
❯ python
Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:01:00)
[Clang 13.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ride_duration.predict import load_model
>>> model, run_id = load_model()
Downloading model f4e2242a53a3410d89c061d1958ae70a from S3...
```

<br>

```{figure} ../../../img/s3-artifacts-ss.png
---
---
Artifacts store for model runs of experiment 1.
```
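The fallback logic of `load_model()` can be illustrated in isolation. The following is a minimal sketch, not the package code: `resolve_model_source` and the `get_production_model` callable are hypothetical names standing in for the MLflow client lookup, and the run ID `abc123` is made up.

```python
def resolve_model_source(get_production_model, bucket, experiment_id, run_id):
    """Return (source, run_id), preferring the registry over the S3 fallback.

    `get_production_model` is a hypothetical callable that returns
    (source, run_id) or raises if the tracking server is unavailable.
    """
    try:
        source, prod_run_id = get_production_model()
    except Exception:
        # Registry unreachable: build the artifact path from known IDs.
        return f"s3://{bucket}/{experiment_id}/{run_id}/artifacts/model", run_id
    else:
        return source, prod_run_id


def registry_down():
    raise ConnectionError("tracking server unavailable")


def registry_up():
    return "s3://mlflow-models-ron/1/abc123/artifacts/model", "abc123"


fallback = resolve_model_source(
    registry_down, "mlflow-models-ron", "1", "f4e2242a53a3410d89c061d1958ae70a"
)
primary = resolve_model_source(
    registry_up, "mlflow-models-ron", "1", "f4e2242a53a3410d89c061d1958ae70a"
)

print(fallback)
print(primary)
```

Keeping the source resolution separate from the actual download makes both branches easy to test without a tracking server or S3 access.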

### Serving predictions using Flask

For our Flask application, we simply define an endpoint that serves the model predictions. Note that the model loads once when the server starts, as this takes a considerable amount of time. For versioning, the run ID of the model is returned along with the prediction.

```{margin}
[`app/main.py`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/app/main.py)
```
```python
from ride_duration.predict import load_model, make_prediction
from flask import Flask, request, jsonify


model, run_id = load_model()
app = Flask('duration-prediction')


@app.route('/predict', methods=['POST'])
def predict_endpoint():
    """Predict ride duration using NYCRideDurationModel."""
    
    ride = request.get_json()
    preds = make_prediction(model, ride)

    return jsonify({
        'duration': float(preds[0]),
        'model_version': run_id,
    })


if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=9696)
```

We also define a script for testing the endpoint. Note that this same script can be used without modification to test remote hosts using port forwarding.

```{margin}
[`test.py`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/test.py)
```
```python
import json
import requests


ride = [{
    'PULocationID': 130,
    'DOLocationID': 205,
    'trip_distance': 3.66,
}]


if __name__ == "__main__":
    
    host = 'http://0.0.0.0:9696'
    url = f'{host}/predict'
    response = requests.post(url, json=ride)
    result = response.json()
    
    print(result)
```
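The endpoint can also be exercised in-process with Flask's test client, without starting a server. The sketch below is an assumption-laden stand-in for `app/main.py`: `StubModel` and the run ID `dummy-run-id` are hypothetical, and the call to `make_prediction` is replaced by a direct `predict` call so that the example runs without MLflow or S3.

```python
from flask import Flask, jsonify, request

# Hypothetical stand-ins so the sketch runs without MLflow or S3.
run_id = "dummy-run-id"


class StubModel:
    def predict(self, X):
        # Constant prediction, one value per input row.
        return [10.0 for _ in X]


model = StubModel()
app = Flask('duration-prediction')


@app.route('/predict', methods=['POST'])
def predict_endpoint():
    ride = request.get_json()
    preds = model.predict(ride)
    return jsonify({'duration': float(preds[0]), 'model_version': run_id})


ride = [{'PULocationID': 130, 'DOLocationID': 205, 'trip_distance': 3.66}]

# The test client sends the request in-process; no server or port is needed.
with app.test_client() as client:
    response = client.post('/predict', json=ride)
    result = response.get_json()

print(response.status_code, result)
```

This kind of check is useful for verifying the request and response format before building the Docker image.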

### Dockerfile

For our `Dockerfile`, we start by installing Pipenv. Next we copy `Pipfile` and `Pipfile.lock` as well as files for installing the model package. We also copy the files for the web service. Finally, we install everything using Pipenv, expose port `9696`, and configure the entrypoint (i.e. serve the main app on `0.0.0.0:9696`).

```{margin}
[`Dockerfile`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/Dockerfile)
```
```Dockerfile
FROM python:3.9.13-slim

RUN pip install -U pip
RUN pip install pipenv

WORKDIR /app

COPY [ "Pipfile", "Pipfile.lock", "setup.py", "pyproject.toml",  "./"]

COPY [ "ride_duration",  "./ride_duration"]

COPY [ "app",  "./app"]

RUN pipenv install --system --deploy

EXPOSE 9696

# https://stackoverflow.com/a/71092624/1091950
ENTRYPOINT [ "gunicorn", "--bind=0.0.0.0:9696", "--timeout=600", "app.main:app" ]
```

The environment variables and AWS credentials are saved in a `.env` file in the same directory:

```bash
TRACKING_SERVER_HOST=ec2-3-93-179-24.compute-1.amazonaws.com
EXPERIMENT_ID=1
MODEL_RUN_ID=f4e2242a53a3410d89c061d1958ae70a
AWS_ACCESS_KEY_ID=******************
AWS_SECRET_ACCESS_KEY=******************
```
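Since a missing variable only surfaces when the model fails to load, it can help to check the configuration up front. The following is a minimal sketch; `missing_env_vars` is a hypothetical helper, and it assumes that AWS credentials are passed as environment variables as in the `.env` file above (other credential sources are not considered).

```python
import os

REQUIRED_VARS = [
    "TRACKING_SERVER_HOST",
    "EXPERIMENT_ID",
    "MODEL_RUN_ID",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
]


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Example with an incomplete configuration (values are placeholders).
example_env = {"TRACKING_SERVER_HOST": "localhost", "EXPERIMENT_ID": "1"}
missing = missing_env_vars(environ=example_env)
print(missing)
```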

Building the image:

```bash
docker build -t ride-duration-prediction-service:v1 .
```
```bash
[+] Building 72.3s (13/13) FINISHED
 => [internal] load build definition from Dockerfile               0.1s
 => => transferring dockerfile: 388B                               0.0s
 => [internal] load .dockerignore                                  0.1s
 => => transferring context: 2B                                    0.0s
 => [internal] load metadata for docker.io/library/python:3.9.13-  3.4s
 => [internal] load build context                                  0.1s
 => => transferring context: 80.02kB                               0.1s
 => [1/8] FROM docker.io/library/python:3.9.13-slim@sha256:451ccc  0.0s
 => CACHED [2/8] RUN pip install -U pip                            0.0s
 => CACHED [3/8] RUN pip install pipenv                            0.0s
 => CACHED [4/8] WORKDIR /app                                      0.0s
 => [5/8] COPY [ Pipfile, Pipfile.lock, setup.py, pyproject.toml,  0.1s
 => [6/8] COPY [ ride_duration,  ./ride_duration]                  0.1s
 => [7/8] COPY [ app,  ./app]                                      0.0s
 => [8/8] RUN pipenv install --system --deploy                    64.8s
 => exporting to image                                             3.6s
 => => exporting layers                                            3.6s
 => => writing image sha256:1b2da34ca1b3504d45df527049f788a485713  0.0s
 => => naming to docker.io/library/ride-duration-prediction-servi  0.0s
```

Running the container:

```bash
docker run --env-file .env -it --rm -p 9696:9696 ride-duration-prediction-service:v1
```
```
[2022-06-20 11:12:08 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2022-06-20 11:12:08 +0000] [1] [INFO] Listening at: http://0.0.0.0:9696 (1)
[2022-06-20 11:12:08 +0000] [1] [INFO] Using worker: sync
[2022-06-20 11:12:08 +0000] [9] [INFO] Booting worker with pid: 9
Downloading model f4e2242a53a3410d89c061d1958ae70a from S3...
2022/06/20 11:22:28 WARNING mlflow.pyfunc: Detected one or more mismatches between the model's dependencies and the current Python environment:
 - psutil (current: uninstalled, required: psutil==5.9.1)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.
```

The warning indicates a mismatch between the environment where the model was trained and the current environment where the model is loaded for inference (here, `psutil` is not installed). We run the test script in another terminal to see if it still works:

```bash
❯ python test.py
{'duration': 18.210770674183355, 'model_version': 'f4e2242a53a3410d89c061d1958ae70a'}
```

Note that after the initial loading time, subsequent predictions are returned almost instantly.

## Deploying batch predictions

For use cases that do not require the responsiveness of a web service, we can implement an offline service that makes batch predictions. Typically, offline services run at fixed intervals, e.g. daily, weekly, or monthly. A critical element of this is **workflow orchestration**, where we regularly pull data from a database, make predictions on that data, and then write the predictions to a database, to a file that is uploaded to S3, or to an analytics dashboard, thereby refreshing it.

### Scoring script

```{margin}
[`score.py`](https://github.com/particle1331/inefficient-networks/blob/8edae4a5c88618238550fa319203a7f3f7f690f4/docs/notebooks/mlops/04-deployment/score.py)
```
```python
import uuid

import pandas as pd

from ride_duration.utils import load_training_dataframe
from ride_duration.predict import load_model, make_prediction


def generate_uuids(n):
    ride_ids = []
    for i in range(n):
        ride_ids.append(str(uuid.uuid4()))
    return ride_ids


def apply_model(
    input_file: str, 
    run_id: str, 
    output_file: str
) -> None:
    
    print(f'Reading the data from {input_file}...')
    df = load_training_dataframe(input_file)
    df['ride_id'] = generate_uuids(len(df))

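    print(f'Loading the model with RUN_ID={run_id}...')
    # load_model() returns a (model, run_id) tuple. It selects the model from
    # the registry or environment variables, not from the `run_id` argument.
    model, _ = load_model()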

    print(f'Applying the model...')
    preds = make_prediction(model, df)

    print(f'Saving the result to {output_file}...')
    df_result = pd.DataFrame()
    df_result['ride_id'] = df['ride_id']
    df_result['lpep_pickup_datetime'] = df['lpep_pickup_datetime']
    df_result['PULocationID'] = df['PULocationID']
    df_result['DOLocationID'] = df['DOLocationID']
    df_result['actual_duration'] = df['duration']
    df_result['predicted_duration'] = preds
    df_result['diff'] = df_result['actual_duration'] - df_result['predicted_duration']
    df_result['model_version'] = run_id
    df_result.to_parquet(output_file, index=False)


def run(taxi_type: str, year: int, month: int, run_id: str) -> None:

    source_url = 'https://s3.amazonaws.com/nyc-tlc/trip+data'
    input_file = f'{source_url}/{taxi_type}_tripdata_{year:04d}-{month:02d}.parquet'
    output_file = f'output/{taxi_type}/{year:04d}-{month:02d}.parquet'

    apply_model(
        input_file=input_file,
        run_id=run_id,
        output_file=output_file
    )


if __name__ == '__main__':

    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--taxi_type", default='green', type=str)
    parser.add_argument("--year", default=2021, type=int)
    parser.add_argument("--month", default=1, type=int)
    parser.add_argument("--run_id", type=str)
    parser.add_argument("--experiment_id", type=int)
    args = parser.parse_args()
    
    run(
        taxi_type=args.taxi_type,
        year=args.year,
        month=args.month,
        run_id=args.run_id
    )
```

```bash
pipenv install --dev python-dotenv
python score.py
```
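When the scoring script is run on a schedule, the period to score is usually derived from the run date. The sketch below assumes a hypothetical policy of scoring the previous calendar month; `previous_month` and `build_paths` are illustrative helpers, with the path templates mirrored from `run()` above.

```python
from datetime import date


def previous_month(run_date):
    """Return (year, month) of the calendar month before `run_date`."""
    if run_date.month == 1:
        return run_date.year - 1, 12
    return run_date.year, run_date.month - 1


def build_paths(taxi_type, year, month):
    """Mirror the path templates used in run() of the scoring script."""
    source_url = 'https://s3.amazonaws.com/nyc-tlc/trip+data'
    input_file = f'{source_url}/{taxi_type}_tripdata_{year:04d}-{month:02d}.parquet'
    output_file = f'output/{taxi_type}/{year:04d}-{month:02d}.parquet'
    return input_file, output_file


# A run scheduled on 2021-02-01 scores the data for January 2021.
year, month = previous_month(date(2021, 2, 1))
input_file, output_file = build_paths('green', year, month)

print(input_file)
print(output_file)
```

The resulting `year` and `month` can then be passed to `run()` or as the `--year` and `--month` command line arguments.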

## Machine learning for streaming


In a streaming setup, we have **producers** and **consumers**. A producer pushes events to an event stream, and consumers read from this stream and react to these events. Recall that a web service has a one-to-one relationship, i.e. an explicit connection between the user and the service. In streaming, the relationship can be one-to-many or many-to-many, and the connection is only implicit: we do not know which consumers will react to an event, or how many.

For example, a user interacts with the backend, which acts as the producer and sends an event containing all information about the ride. Services then react to this event:

- One consuming service could predict the tip and send a push notification to the user asking for a tip.
- The web service returns an acceptable duration prediction, while a streaming service could compute a better ride duration prediction and update the earlier one.

Another example is content moderation. A user submits a video, which produces an event that is consumed by several models:

- C1 checks for copyright violations.
- C2 checks for NSFW content.
- C3 checks for violence.

The consumers write their outputs to a prediction stream, which is read by a decision service. In principle, this can be scaled to arbitrarily many services or models.
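The producer and consumer relationship in the content moderation example can be sketched with a toy in-memory stream. Everything here is hypothetical: `EventStream` stands in for a managed service such as Kinesis, the three checks use hard-coded rules in place of models, and results are returned synchronously instead of being written to a separate prediction stream.

```python
class EventStream:
    """Toy in-memory stream; a stand-in for a managed streaming service."""

    def __init__(self):
        self.consumers = []

    def subscribe(self, consumer):
        self.consumers.append(consumer)

    def publish(self, event):
        # The producer does not know who consumes the event, or how many.
        return [consumer(event) for consumer in self.consumers]


# Hypothetical moderation consumers with hard-coded rules in place of models.
def copyright_check(event):
    return {'check': 'copyright', 'flagged': 'song' in event['tags']}


def nsfw_check(event):
    return {'check': 'nsfw', 'flagged': 'nsfw' in event['tags']}


def violence_check(event):
    return {'check': 'violence', 'flagged': 'fight' in event['tags']}


def decision_service(predictions):
    """Reject the video if any consumer flagged it."""
    return 'reject' if any(p['flagged'] for p in predictions) else 'accept'


stream = EventStream()
for consumer in (copyright_check, nsfw_check, violence_check):
    stream.subscribe(consumer)

predictions = stream.publish({'video_id': 1, 'tags': ['cat', 'song']})
decision = decision_service(predictions)
print(decision)
```

Adding a fourth model only requires another `subscribe` call; the producer code stays unchanged, which is the main appeal of the implicit connection.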

```python
import pandas as pd

df = pd.read_parquet('data/green_tripdata_2021-01.parquet')
df
```

| | VendorID | lpep_pickup_datetime | lpep_dropoff_datetime | store_and_fwd_flag | RatecodeID | PULocationID | DOLocationID | passenger_count | trip_distance | fare_amount | extra | mta_tax | tip_amount | tolls_amount | ehail_fee | improvement_surcharge | total_amount | payment_type | trip_type | congestion_surcharge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2021-01-01 00:15:56 | 2021-01-01 00:19:52 | N | 1.0 | 43 | 151 | 1.0 | 1.01 | 5.50 | 0.50 | 0.5 | 0.00 | 0.00 | | 0.3 | 6.80 | 2.0 | 1.0 | 0.00 |
| 1 | 2 | 2021-01-01 00:25:59 | 2021-01-01 00:34:44 | N | 1.0 | 166 | 239 | 1.0 | 2.53 | 10.00 | 0.50 | 0.5 | 2.81 | 0.00 | | 0.3 | 16.86 | 1.0 | 1.0 | 2.75 |
| 2 | 2 | 2021-01-01 00:45:57 | 2021-01-01 00:51:55 | N | 1.0 | 41 | 42 | 1.0 | 1.12 | 6.00 | 0.50 | 0.5 | 1.00 | 0.00 | | 0.3 | 8.30 | 1.0 | 1.0 | 0.00 |
| 3 | 2 | 2020-12-31 23:57:51 | 2021-01-01 00:04:56 | N | 1.0 | 168 | 75 | 1.0 | 1.99 | 8.00 | 0.50 | 0.5 | 0.00 | 0.00 | | 0.3 | 9.30 | 2.0 | 1.0 | 0.00 |
| 4 | 2 | 2021-01-01 00:16:36 | 2021-01-01 00:16:40 | N | 2.0 | 265 | 265 | 3.0 | 0.00 | -52.00 | 0.00 | -0.5 | 0.00 | 0.00 | | -0.3 | -52.80 | 3.0 | 1.0 | 0.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 76513 | 2 | 2021-01-31 21:38:00 | 2021-01-31 22:16:00 | | | 81 | 90 | | 17.63 | 56.23 | 2.75 | 0.0 | 0.00 | 6.12 | | 0.3 | 65.40 | | | |
| 76514 | 2 | 2021-01-31 22:43:00 | 2021-01-31 23:21:00 | | | 35 | 213 | | 18.36 | 46.66 | 0.00 | 0.0 | 12.20 | 6.12 | | 0.3 | 65.28 | | | |
| 76515 | 2 | 2021-01-31 22:16:00 | 2021-01-31 22:27:00 | | | 74 | 69 | | 2.50 | 18.95 | 2.75 | 0.0 | 0.00 | 0.00 | | 0.3 | 22.00 | | | |
| 76516 | 2 | 2021-01-31 23:10:00 | 2021-01-31 23:37:00 | | | 168 | 215 | | 14.48 | 48.87 | 2.75 | 0.0 | 0.00 | 6.12 | | 0.3 | 58.04 | | | |


```python
from ride_duration.predict import load_model
from ride_duration.predict import make_prediction

model = load_model(run_id='f4e2242a53a3410d89c061d1958ae70a')
make_prediction(model, df)
```
```
array([ 6.67431673, 13.79195043,  6.96578162, ..., 13.79195043,
       36.27351977, 10.71632294])
```

```python
import pandas as pd
out = pd.read_parquet('output/green/2021-01.parquet')
out.head(5)
```

| | ride_id | lpep_pickup_datetime | PULocationID | DOLocationID | actual_duration | predicted_duration | diff | model_version |
|---|---|---|---|---|---|---|---|---|
| 0 | 711a6b32-40d9-4ebd-964d-aeac5ca4ee62 | 2021-01-01 00:15:56 | 43 | 151 | 3.933333 | 6.480039 | -2.546706 | 08fccca832b74a49995863d7ff8a7917 |
| 1 | 50ee8a65-a018-44e7-9424-5149ae0d5c56 | 2021-01-01 00:25:59 | 166 | 239 | 8.75 | 13.822634 | -5.072634 | 08fccca832b74a49995863d7ff8a7917 |
| 2 | 514ae81e-80bb-47ec-9ff7-8929eb478d75 | 2021-01-01 00:45:57 | 41 | 42 | 5.966667 | 6.945991 | -0.979325 | 08fccca832b74a49995863d7ff8a7917 |
| 3 | 46ee914c-0165-4b30-942a-d96b5c0650b2 | 2020-12-31 23:57:51 | 168 | 75 | 7.083333 | 11.516087 | -4.432754 | 08fccca832b74a49995863d7ff8a7917 |
| 4 | 41bf60ed-0a5c-4b84-8116-20c52f329097 | 2021-01-01 00:26:31 | 75 | 75 | 2.316667 | 3.507386 | -1.190719 | 08fccca832b74a49995863d7ff8a7917 |


```python
client.get_latest_versions(name='NYCRideDurationModel', stages=['Production'])[0]
```
```
<ModelVersion: creation_timestamp=1655460227751, current_stage='Production', description='', last_updated_timestamp=1655460239062, name='NYCRideDurationModel', run_id='f4e2242a53a3410d89c061d1958ae70a', run_link='', source='s3://mlflow-models-ron/1/f4e2242a53a3410d89c061d1958ae70a/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>
```

```python
model = mlflow.pyfunc.load_model('models:/NYCRideDurationModel/Production')
make_prediction(model, df)
```
```
array([ 6.67431673, 13.79195043,  6.96578162, ..., 13.79195043,
       36.27351977, 10.71632294])
```

```python
pd.read_parquet('output/green/2021-01.parquet')
```

| | ride_id | lpep_pickup_datetime | PULocationID | DOLocationID | actual_duration | predicted_duration | diff | model_version |
|---|---|---|---|---|---|---|---|---|
| 0 | d685e8a5-3374-453d-acb2-20fb17445dad | 2021-01-01 00:15:56 | 43 | 151 | 3.933333 | 4.403834 | -0.470501 | e1efc53e9bd149078b0c12aeaa6365df |
| 1 | 488137c5-c8f9-44aa-9995-884d16800d0a | 2021-01-01 00:25:59 | 166 | 239 | 8.750000 | 8.830572 | -0.080572 | e1efc53e9bd149078b0c12aeaa6365df |
| 2 | c8a71eb8-dc60-4223-b81c-520d1e4726f6 | 2021-01-01 00:45:57 | 41 | 42 | 5.966667 | 6.819916 | -0.853250 | e1efc53e9bd149078b0c12aeaa6365df |
| 3 | 572f288d-f6d7-44e2-bca0-a17e0f920914 | 2020-12-31 23:57:51 | 168 | 75 | 7.083333 | 13.923927 | -6.840594 | e1efc53e9bd149078b0c12aeaa6365df |
| 4 | 15c5da8f-13fa-4047-8566-8c25865943ee | 2021-01-01 00:26:31 | 75 | 75 | 2.316667 | 6.735151 | -4.418484 | e1efc53e9bd149078b0c12aeaa6365df |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73903 | 23134189-8855-476f-8f66-ab3d6f93ee0c | 2021-01-31 21:38:00 | 81 | 90 | 38.000000 | 40.089000 | -2.089000 | e1efc53e9bd149078b0c12aeaa6365df |
| 73904 | 11b8fc14-744b-4d74-9048-39a2651dd7c9 | 2021-01-31 22:43:00 | 35 | 213 | 38.000000 | 31.554369 | 6.445631 | e1efc53e9bd149078b0c12aeaa6365df |
| 73905 | 494fc2dc-8b72-49f1-99df-7243c0bf0506 | 2021-01-31 22:16:00 | 74 | 69 | 11.000000 | 17.447926 | -6.447926 | e1efc53e9bd149078b0c12aeaa6365df |
| 73906 | 4b5220e3-005f-41ac-b4a3-503806f5e127 | 2021-01-31 23:10:00 | 168 | 215 | 27.000000 | 33.382096 | -6.382096 | e1efc53e9bd149078b0c12aeaa6365df |


```
mlflow server -h 0.0.0.0 -p 5000 \
    --backend-store-uri=sqlite:///mlflow.db \
    --default-artifact-root=s3://mlflow-models-ron/
```

```python
import mlflow
from mlflow.tracking import MlflowClient


TRACKING_SERVER_HOST = "ec2-3-93-179-24.compute-1.amazonaws.com"
TRACKING_URI = f"http://{TRACKING_SERVER_HOST}:5000"

mlflow.set_tracking_uri(TRACKING_URI)
```

```python
client = MlflowClient(tracking_uri=TRACKING_URI)
client.list_experiments()
```
```
[<Experiment: artifact_location='s3://mlflow-models-ron/0', experiment_id='0', lifecycle_stage='active', name='Default', tags={}>]
```

- Run training.
- The tracking server can also be accessed locally, but make sure that the environment is configured.

Note that `!export` runs in a subshell, so it does not set the variable for the notebook kernel. Use the `%env` magic or set the variable before launching Jupyter instead.

```python
!export TRACKING_SERVER_HOST=ec2-52-90-170-113.compute-1.amazonaws.com
```

```python
TRACKING_SERVER_HOST = os.getenv('TRACKING_SERVER_HOST')
TRACKING_SERVER_HOST
```

```python
import requests
import os

TRACKING_SERVER_HOST = "ec2-52-90-170-113.compute-1.amazonaws.com"
TRACKING_URI = f"http://{TRACKING_SERVER_HOST}:5000"

response = requests.head(TRACKING_URI)
if response.status_code != 200:
    raise Exception(f"Tracking server unavailable: HTTP response code {response.status_code}")
```

```python
response.status_code
```
```
200
```

## Streaming

* Scenario
* Creating the role
* Create a Lambda function, test it
* Create a Kinesis stream
* Connect the function to the stream
* Send the records

Links
* [Tutorial: Using Amazon Lambda with Amazon Kinesis](https://docs.amazonaws.cn/en_us/lambda/latest/dg/with-kinesis-example.html)
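
As a starting point for the Lambda function step, the following is a minimal sketch of a handler that consumes Kinesis records. Kinesis delivers each payload base64-encoded under `record['kinesis']['data']`; the payload shape (`ride` and `ride_id` keys) and the constant `predict` function are assumptions made for illustration, not the final implementation.

```python
import base64
import json


def predict(features):
    """Hypothetical stand-in for the model; returns a constant duration."""
    return 10.0


def lambda_handler(event, context):
    """Decode Kinesis records and return one prediction per record."""
    predictions = []
    for record in event['Records']:
        # Kinesis delivers the payload base64-encoded.
        payload = base64.b64decode(record['kinesis']['data']).decode('utf-8')
        ride_event = json.loads(payload)

        # Assumed payload shape: {"ride": {...features...}, "ride_id": ...}
        predictions.append({
            'ride_id': ride_event['ride_id'],
            'ride_duration': predict(ride_event['ride']),
        })
    return {'predictions': predictions}


# Build a fake event locally to test the handler without AWS.
ride_event = {
    'ride': {'PULocationID': 130, 'DOLocationID': 205, 'trip_distance': 3.66},
    'ride_id': 156,
}
data = base64.b64encode(json.dumps(ride_event).encode('utf-8')).decode('utf-8')
test_event = {'Records': [{'kinesis': {'data': data}}]}

result = lambda_handler(test_event, None)
print(result)
```

Testing the handler with a locally constructed event like this is a cheap check before connecting the function to an actual stream.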




## Appendix: Train script

For training models that we use to serve predictions in our API, we use the following script. This trains a model using the `ride_duration` package, which ensures smooth integration in the API, and logs this model to a remote MLflow tracking server. The tracking server host is provided as a command line argument.

```python
import mlflow 
import joblib

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

from ride_duration.utils import load_training_dataframe, prepare_features


def setup(tracking_server_host):
    TRACKING_URI = f"http://{tracking_server_host}:5000"
    mlflow.set_tracking_uri(TRACKING_URI)
    mlflow.set_experiment("nyc-taxi-experiment")


def run_training(X_train, y_train, X_valid, y_valid):
    with mlflow.start_run():
        params = {
            'n_estimators': 100,
            'max_depth': 20
        }
        
        pipeline = make_pipeline(
            DictVectorizer(), 
            RandomForestRegressor(**params, n_jobs=-1)
        )
        
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_valid)
        rmse = mean_squared_error(y_valid, y_pred, squared=False)
        
        mlflow.log_params(params)
        mlflow.log_metric("rmse_valid", rmse)
        mlflow.sklearn.log_model(pipeline, artifact_path='model')


if __name__ == "__main__":

    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--tracking-server-host", type=str)
    parser.add_argument("--train_path", type=str)
    parser.add_argument("--valid_path", type=str)
    args = parser.parse_args()

    # Getting data from disk
    train_data = load_training_dataframe(args.train_path)
    valid_data = load_training_dataframe(args.valid_path)

    # Preprocessing dataset
    X_train = prepare_features(train_data.drop(['duration'], axis=1))
    X_valid = prepare_features(valid_data.drop(['duration'], axis=1))
    y_train = train_data.duration.values
    y_valid = valid_data.duration.values

    # Push training to server
    setup(args.tracking_server_host)
    run_training(X_train, y_train, X_valid, y_valid)
```