In [1]:
from e2e_taxi_ride_duration_prediction.mlflow_utils import setup_mlflow

# Evaluation Criteria
## First steps
This project uses `uv` as an environment manager and `just` as a command runner (instead of a Makefile).
To install them, run `pip install uv` and `pip install rust-just`.
After installing `just` and `uv`, run `just dev` to install the development dependencies, or run `just` to see a list of available commands with descriptions.

## Cloud
Prerequisites: 
 - awscli
 - configured credentials

The repository contains a `terraform` folder. Inside it, run
```bash
terraform init
terraform plan
terraform apply
```
to provision an EC2 instance that pulls the latest container image with the model from ghcr.io and runs it. The `terraform apply` command outputs the IP address of the EC2 instance.
You can then send payloads to the API, for example:
```bash
curl -X POST http://18.193.115.191:8000/predict \
     -H "Content-Type: application/json" \
     -d '{"PULocationID": 161, "DOLocationID": 236, "trip_distance": 2.5}'
```
and you will get a prediction of the ride duration back.
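
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_predict_request` and `predict` are hypothetical, and it assumes the API answers with a JSON body, which this document does not specify. Replace the host with the IP address printed by `terraform apply`, or use `localhost` for a locally running container.

```python
import json
import urllib.request


def build_predict_request(host: str, payload: dict, port: int = 8000) -> urllib.request.Request:
    """Build a POST request for the /predict endpoint without sending it."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(host: str, payload: dict, timeout: float = 10.0):
    """Send the payload and return the decoded response (assumed to be JSON)."""
    request = build_predict_request(host, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same payload as in the curl example.
ride = {"PULocationID": 161, "DOLocationID": 236, "trip_distance": 2.5}
request = build_predict_request("localhost", ride)
print(request.get_method(), request.full_url)
```

Calling `predict("localhost", ride)` performs the actual request and requires the API to be reachable.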

## Experiment Tracking and Model Registry
The module `e2e_taxi_ride_duration_prediction/mlflow_utils.py` sets up experiment tracking with MLflow.

In [2]:
setup_mlflow()

2025/08/04 21:20:24 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/08/04 21:20:24 INFO mlflow.store.db.utils: Updating database tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
2025-08-04 21:20:24.624 | SUCCESS | e2e_taxi_ride_duration_prediction.mlflow_utils:setup_mlflow:56 - MLflow tracking URI and experiment set up successfully.


True
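
For orientation, a setup function of this kind could look roughly like the sketch below. This is not the project's actual `setup_mlflow` implementation: the name `setup_mlflow_sketch` is hypothetical, and the tracking URI and experiment name are assumptions taken from the server command and experiment name mentioned in the Reproducibility section.

```python
from pathlib import Path

# Assumed values, taken from the Reproducibility section of this document.
TRACKING_URI = "sqlite:///mlruns/mlflow.db"
EXPERIMENT_NAME = "taxi_ride_duration_prediction"


def setup_mlflow_sketch(
    tracking_uri: str = TRACKING_URI,
    experiment_name: str = EXPERIMENT_NAME,
) -> bool:
    """Point MLflow at a tracking store and select an experiment."""
    import mlflow  # imported lazily so this module loads without MLflow installed

    # The default URI stores the SQLite file in mlruns/, so create that folder first.
    Path("mlruns").mkdir(exist_ok=True)
    mlflow.set_tracking_uri(tracking_uri)
    # set_experiment creates the experiment if it does not exist yet.
    mlflow.set_experiment(experiment_name)
    return True
```

Returning `True` mirrors the cell output above; how the real function handles errors is not shown here.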

## Workflow Orchestration
The workflow is split into Prefect flows, subflows, and tasks and can run on a schedule once deployed.
For example, the `just serve-prefect` command defined in the justfile serves the baseline training flow, which you can then trigger with `just train-prefect` or with `uv run prefect deployment run main/taxi-model-baseline-training`.

## Model deployment
The model image can be built locally from the Dockerfile (`just docker-build`). The baseline model is also published to ghcr.io via a GitHub Action that can be triggered manually.
You can run the model locally without the repository by calling
```bash
docker run --rm -p 8000:8000 ghcr.io/mircohoehne/taxi-api
```
or
```bash
podman run --rm -p 8000:8000 ghcr.io/mircohoehne/taxi-api
```
if you use Podman. If you have downloaded the repository, there are `just` commands for building (`just docker-build`) and running (`just docker-run`) the container.
To test payloads, use
```bash
curl -X POST "http://localhost:8000/predict" \
      -H "Content-Type: application/json" \
      -d '{"PULocationID": 161, "DOLocationID": 236, "trip_distance": 2.5}'
```
As stated previously, the API can also be deployed to AWS using the provided Terraform files.

## Model Monitoring
Monitoring is implemented in `e2e_taxi_ride_duration_prediction/monitoring.py`.
It generates reports from current data, reference data, and the model used. By default, a report contains the metrics from the `DataDriftPreset` and `RegressionPreset`, so you can monitor and evaluate model performance.
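
To illustrate the idea behind comparing reference and current data, the sketch below computes two common regression metrics (MAE and RMSE) for both datasets in plain Python. It is not the monitoring module's code and does not use the presets named above; the function name `regression_metrics` is hypothetical and the numbers are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return mean absolute error and root mean squared error."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Made-up ride durations in minutes, for illustration only.
reference = regression_metrics([10.0, 12.0, 15.0], [11.0, 12.0, 13.0])
current = regression_metrics([10.0, 12.0, 15.0], [13.0, 12.0, 11.0])

# A positive change means the model performs worse on the current data.
mae_change = current["mae"] - reference["mae"]
print(reference, current, mae_change)
```

A growing gap between the reference and current metrics is the kind of signal such a report is meant to surface.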

## Reproducibility
To reproduce the workflow, run the Prefect flow as explained above (or simply run `scripts/train_model.py`) and start an MLflow server with
```bash
mlflow server --backend-store-uri sqlite:///mlruns/mlflow.db --default-artifact-root mlruns --host 0.0.0.0
```
and open the UI at `localhost:5000`. There you will see the `taxi_ride_duration_prediction` experiment with one run.
If you want to reproduce the monitoring reports, refer to the [monitoring notebook](02_monitoring.ipynb) for an example.

## Best Practices
- The tests are in the `tests/` folder; run them with `just test`.
- Ruff is used as linter and formatter (also checked in the pre-commit hooks and GitHub Actions).
- This project uses a justfile as a Makefile replacement with the same functionality.
- Pre-commit hooks are defined in `.pre-commit-config.yaml`.
- An automatic CI pipeline (`.github/workflows/ci.yml`) is triggered on every pull request or push to the main branch, and a CD pipeline (`.github/workflows/cd.yml`) can be triggered to upload a new containerized API that includes the current model.