# Serving and Deploying Machine Learning Models with BentoML: Germany Car Price Prediction Case Study

In [this](https://medium.com/@mohsenim/tracking-machine-learning-experiments-with-mlflow-and-dockerizing-trained-models-germany-car-price-e539303b6f97) post and [this](https://github.com/mohsenim/MLflow-XGBoost-Docker) GitHub repository, I discussed [MLflow](https://mlflow.org/) and its capabilities for tracking machine learning (ML) experiments and for serving and containerizing ML models. In this post, I explore [BentoML](https://www.bentoml.com/), another open-source platform for packaging ML models. While BentoML may not offer the same level of experiment tracking and logging as MLflow, it simplifies putting ML models into production through easy-to-use built-in features. This post focuses solely on BentoML; I'll save a comparison of the two platforms for a future post.

BentoML provides a straightforward approach for packaging trained models and their associated code and dependencies into a unified distribution format called "Bento". This platform supports various deployment options, including serving models as REST APIs, Docker containers, and batch jobs. 

As in the [previous](https://medium.com/@mohsenim/tracking-machine-learning-experiments-with-mlflow-and-dockerizing-trained-models-germany-car-price-e539303b6f97) post, we use car price prediction in Germany as a case study. We work with the [Germany Cars Dataset](https://www.kaggle.com/datasets/ander289386/cars-germany), whose preprocessing and cleaning were discussed previously. We build a pipeline consisting of a preprocessing stage for data preparation followed by an XGBoost model. Other models can be swapped in easily; XGBoost was selected for its efficiency and its widespread popularity as an ensemble algorithm.


## Importing Libraries
First, we import the necessary libraries for reading the dataset, training, and serving a model.

```python
from pathlib import Path
from typing import Annotated

import bentoml
import joblib
import numpy as np
import pandas as pd
import xgboost as xgb
from bentoml.validators import DataframeSchema
from sklearn import preprocessing
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
```

## Train a Model

In this section, we define a pipeline that ordinal-encodes the categorical variables and feeds the result into an XGBoost regressor. XGBoost exposes a range of parameters; for detailed information about them, please refer to the [XGBoost documentation](https://xgboost.readthedocs.io).

```python
def get_xgb_model_pipeline(categorical_cols, params):
    """
    Build the model
    """
    ordinal_encoder = preprocessing.OrdinalEncoder()
    preprocess = ColumnTransformer(
        [("Ordinal-Encoder", ordinal_encoder, categorical_cols)],
        remainder="passthrough",
    )
    xgb_model = xgb.XGBRegressor(**params)
    pipeline = Pipeline([("preprocess", preprocess), ("xgb_model", xgb_model)])
    return pipeline
```
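
To make the preprocessing step concrete, the sketch below applies the same `ColumnTransformer` layout to a tiny made-up frame. The toy data is hypothetical, and the `handle_unknown` setting is a variation on the code above, which uses the default `OrdinalEncoder()`. It is shown because the default encoder raises an error when it meets a category that was not seen during fitting, which can happen once the model is served.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data, only to show what the preprocessing stage produces.
toy = pd.DataFrame(
    {
        "make": ["Skoda", "Audi", "Skoda"],
        "fuel": ["Gasoline", "Diesel", "Diesel"],
        "hp": [95.0, 150.0, 110.0],
    }
)

# Variation (not in the original pipeline): map unseen categories to -1
# instead of raising an error at prediction time.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
preprocess = ColumnTransformer(
    [("Ordinal-Encoder", encoder, ["make", "fuel"])],
    remainder="passthrough",
)

# Encoded categorical columns come first, passthrough columns follow.
encoded = preprocess.fit_transform(toy)
print(encoded)

# "Tesla" was not seen during fit, so it is encoded as -1.
new_row = pd.DataFrame({"make": ["Tesla"], "fuel": ["Diesel"], "hp": [300.0]})
encoded_new = preprocess.transform(new_row)
print(encoded_new)
```

Categories are numbered in sorted order, so `Audi` becomes 0 and `Skoda` becomes 1 here. Whether -1 is a sensible stand-in for unknown categories depends on the model and is a design choice to validate, not a recommendation from the original post.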

Now, we define a function to load the dataset and another function to train a model. Feel free to adjust the parameters of XGBoost as needed.

```python
def load_dataset(path):
    """
    Load the dataset (CSV file) from `path`
    """
    df = pd.read_csv(path)
    categorical_cols = ["make", "model", "fuel", "gear", "offerType"]
    numerical_cols = ["mileage_log", "hp", "age", "price_log"]

    cols = categorical_cols + numerical_cols
    data = df[cols]

    train, test = train_test_split(data, test_size=0.20, random_state=37)
    train_x = train.drop(["price_log"], axis=1)
    train_y = train[["price_log"]]

    test_x = test.drop(["price_log"], axis=1)
    test_y = test[["price_log"]]
    return (
        train_x,
        train_y,
        test_x,
        test_y,
        categorical_cols,
        numerical_cols,
    )


def train():
    """
    Train the model
    """
    dataset_path = Path("./data/autoscout24-germany-dataset-cleaned.csv")
    train_x, train_y, test_x, test_y, categorical_cols, _ = load_dataset(dataset_path)

    params = {"max_depth": 8, "subsample": 0.7}

    pipeline = get_xgb_model_pipeline(categorical_cols=categorical_cols, params=params)
    pipeline.fit(train_x, train_y)

    # Evaluation on the held-out test split (target is price_log)
    pred_y = pipeline.predict(test_x)
    eval_metric = mean_squared_error(test_y, pred_y)

    result = {"mse": eval_metric, "model": pipeline}
    return result
```


Everything is now ready to train a model and save it for later use. Note that the model predicts prices on a logarithmic scale, which should be taken into account when interpreting the evaluation metrics and predictions.

```python
artifacts_path = Path("./artifacts")
# make sure the output folder exists before saving
artifacts_path.mkdir(parents=True, exist_ok=True)

# train a model
result = train()
print(f"Trained! Mean squared error (MSE) of the model: {result['mse']}")

# save the model
model_name = "german_car_model.pkl"
joblib.dump(result["model"], artifacts_path / model_name)
print(f"Model {model_name} is saved in: '{artifacts_path}'.")
```

```
Trained! Mean squared error (MSE) of the model: 0.003412174771345764
Model german_car_model.pkl is saved in: 'artifacts'.
```
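
Because the MSE above is computed on log-scale prices, it can be easier to read as a relative error. The following is a minimal sketch, assuming `price_log` is a base-10 logarithm (consistent with the `10**prediction` conversion used later in this post); the variable names are introduced here for illustration.

```python
import math

# MSE reported by the training run above, measured on log10(price).
mse_log10 = 0.003412174771345764

# RMSE is in the same units as the target, i.e. log10(price).
rmse_log10 = math.sqrt(mse_log10)

# An error of r in log10 space corresponds to a multiplicative factor of 10**r.
typical_factor = 10 ** rmse_log10

print(f"RMSE (log10 scale): {rmse_log10:.4f}")
print(f"Typical multiplicative error: x{typical_factor:.3f}")
```

Under that assumption, a typical prediction is off by a factor of roughly 1.14, that is, about 14% above or below the true price.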


## Create a Service

The true power of BentoML lies in how simply it serves a model as a REST API. It supports various input and output data types, which lets it validate the data flowing into and out of a service. In the code below, we load the model saved in the `artifacts` folder and define the service endpoint `predict`.

```python
artifacts_path = Path("./artifacts")


@bentoml.service(
    resources={"cpu": "1"},
    traffic={"timeout": 10},
)
class CarPricePrediction:
    def __init__(self) -> None:
        # Load the pipeline saved during training (same file name as above)
        self.pipeline = joblib.load(artifacts_path / "german_car_model.pkl")

    @bentoml.api
    def predict(
        self,
        input_records: Annotated[
            pd.DataFrame,
            DataframeSchema(
                orient="records",
                columns=["make", "model", "fuel", "gear", "offerType", "mileage_log", "hp", "age"],
            ),
        ],
    ) -> np.ndarray:
        # Predictions are on the logarithmic price scale
        result = self.pipeline.predict(input_records)
        return result
```
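
With `orient='records'`, the endpoint expects a list of JSON objects, one per car. The sketch below only illustrates how such a payload maps onto the DataFrame the pipeline receives. It is not BentoML's internal conversion code, and `FEATURE_COLUMNS` is a name introduced here for illustration.

```python
import pandas as pd

# Same column list as in the DataframeSchema of the service.
FEATURE_COLUMNS = ["make", "model", "fuel", "gear", "offerType", "mileage_log", "hp", "age"]

# One record per car, as it would appear under "input_records" in a request.
records = [
    {
        "make": "Skoda",
        "model": "Fabia",
        "fuel": "Gasoline",
        "gear": "Automatic",
        "offerType": "Used",
        "mileage_log": 1.929419,
        "hp": 95.0,
        "age": 0.0,
    },
]

# Selecting the columns explicitly fixes their order; a missing column raises KeyError.
frame = pd.DataFrame(records)[FEATURE_COLUMNS]
print(frame.dtypes)
```

A frame built this way can also be passed directly to `pipeline.predict` for a quick local check before starting the service.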

## Serve the Model

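The following command, run from the directory that contains the service code (saved as `service.py`, which is what `service:` in the command refers to), makes the service accessible at `http://127.0.0.1:3000/predict`: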
```
bentoml serve service:CarPricePrediction
```

To test the model, requests can be sent to the REST API using the curl command:
```
curl -X 'POST' \
  'http://127.0.0.1:3000/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "input_records": [
    {"make":"Skoda", "model": "Fabia", "fuel": "Gasoline" ,"gear": "Automatic", "offerType": "Used","mileage_log": 1.929419,"hp": 95.0 ,"age": 0.0}
  ]
}'
```

which returns the following result:
```
[4.235708236694336]
```
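
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the service is running locally on port 3000 as above; `build_request` and `predict_remote` are hypothetical helper names, not part of BentoML.

```python
import json
import urllib.request

PREDICT_URL = "http://127.0.0.1:3000/predict"


def build_request(records, url=PREDICT_URL):
    """Build a POST request whose JSON body mirrors the curl example."""
    body = json.dumps({"input_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def predict_remote(records, url=PREDICT_URL):
    """Send the records to the running service and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(records, url), timeout=10) as response:
        return json.loads(response.read())


record = {
    "make": "Skoda", "model": "Fabia", "fuel": "Gasoline", "gear": "Automatic",
    "offerType": "Used", "mileage_log": 1.929419, "hp": 95.0, "age": 0.0,
}
request = build_request([record])

# Uncomment once `bentoml serve` is running:
# print(predict_remote([record]))
```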

Because the model is trained to predict the base-10 logarithm of the price, the output is on a logarithmic scale and should be converted back using `10**prediction`.
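
As a small worked example of that conversion, the snippet below applies it to the prediction returned above. The helper name `log10_to_price` is introduced here for illustration.

```python
def log10_to_price(log_price):
    """Convert a base-10 log-scale prediction back to the original price scale."""
    return 10 ** log_price


prediction = 4.235708236694336  # value returned by the service above
price = log10_to_price(prediction)
print(f"Predicted price: {price:,.0f}")
```

For this request, the result is a little over 17,000 in the dataset's price units.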

## Build the Bento and Deploy the Model

To build a Bento, the format that bundles all components required to run a BentoML service, we need to prepare a YAML configuration file. This file, named `bentofile.yaml` by default, declares the service entry point, the files to include, and the Python dependencies.
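
A minimal sketch of such a file is shown below. It is not the author's original configuration: it assumes the service code lives in `service.py` next to the `artifacts` folder, and the package list is an assumption based on the imports used in this post.

```yaml
# bentofile.yaml (illustrative sketch, not the original file)
service: "service:CarPricePrediction"  # module:class, same target as `bentoml serve`
include:
  - "service.py"
  - "artifacts/*.pkl"                  # the pickled pipeline saved during training
python:
  packages:                            # assumed dependencies; pin versions to match training
    - scikit-learn
    - xgboost
    - pandas
    - numpy
    - joblib
```

Pinning the package versions to those used for training helps avoid errors when the pickled pipeline is loaded inside the Bento.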

Now, we can build the Bento by running:
```
bentoml build
```


After building the Bento, a Docker image can be created from it by passing the Bento tag to `bentoml containerize`:
```
bentoml containerize car_price_prediction:latest
```


## Running the Docker Image

The above command creates a Docker image called `car_price_prediction:7tfbadqjpkt6coaa`; the part after the colon is a version tag generated at build time, so yours will differ. The image can then be run locally, on a server, or in the cloud. To run it locally, use:
```
docker run --rm -p 3000:3000 car_price_prediction:7tfbadqjpkt6coaa
```