# Predictive model: Business Use Case

## Goal

- Using the available [Road Safety Data](https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data), create a predictive modeling business use case
- Desired use case: **On-premise forecast of *dangerous* traffic situations**
- Train a model that predicts accident severity per vehicle on the following scale:
```
{
    0: no casualty, 
    1: (average) slight, 
    2: (average) serious,
    3: (average) fatal
}
```
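
Because the target is an average severity per vehicle, a regression model returns continuous values rather than the four levels listed above. The following is a minimal sketch of how a prediction could be mapped back to the nearest level. The function name `to_severity_label` and the round-to-nearest rule are assumptions for illustration; the notebook itself does not prescribe a discretization.

```python
# Severity scale from the goal definition above.
SEVERITY_LABELS = {
    0: "no casualty",
    1: "(average) slight",
    2: "(average) serious",
    3: "(average) fatal",
}


def to_severity_label(prediction: float) -> str:
    """Map a continuous prediction to the nearest severity level.

    Assumption: rounding to the nearest integer is an acceptable way to
    discretize; the notebook does not define one.
    """
    # Clip to the valid range first, since a regressor may overshoot it.
    clipped = min(max(prediction, 0.0), 3.0)
    # Round half up explicitly (Python's round() rounds halves to even).
    level = int(clipped + 0.5)
    return SEVERITY_LABELS[level]
```

For example, `to_severity_label(1.4)` would fall into the "(average) slight" level, while any prediction above 3 is clipped to "(average) fatal".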

## Considerations

- Take all available information into account, including spatiotemporal, environmental, and vehicle data
- Important when defining features:
    - Only features available *before* the accident can be used
    - Do not use features that cannot be used in practice, e.g.
        - due to GDPR (driver properties)
        - for business-strategic reasons (car model name)
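
The exclusion rules above can be kept in one place and applied before training. The sketch below shows one possible way to do this. All column names and the helper `select_allowed_features` are hypothetical and only illustrate the idea; the actual dataset columns may differ.

```python
# Hypothetical column names, grouped by the reason they are excluded.
BLOCKED_FEATURES = {
    "gdpr": {"age_of_driver", "sex_of_driver"},
    "business_strategic": {"car_model_name"},
    "post_accident": {"number_of_casualties"},
}


def select_allowed_features(columns, blocked=BLOCKED_FEATURES):
    """Return the columns that are not blocked, preserving their order."""
    excluded = set().union(*blocked.values())
    return [col for col in columns if col not in excluded]


columns = [
    "hour_of_day",
    "age_of_driver",
    "weather_conditions",
    "car_model_name",
    "number_of_casualties",
    "speed_limit",
]
allowed = select_allowed_features(columns)
print(allowed)
```

With a DataFrame, the same helper could be used as `X[select_allowed_features(X.columns)]`. Keeping the reason next to each blocked column makes the selection easier to audit later.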

In [None]:
import os
import pandas
import mlflow
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt

from helpers.utils import (
    load_yaml,
    infer_catboost_feature_types
)
from helpers.models import CasualtyRegressor

from sklearn import set_config
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.model_selection import KFold, ShuffleSplit

from catboost import CatBoostRegressor, Pool

set_config(transform_output="pandas")
assets_dir = Path(os.environ["DATA_DIR"]) / "assets"
config_dir = Path(os.environ["CONFIG_DIR"])

In [None]:
X = pandas.read_pickle(assets_dir / "accidents_vehicles_casualties_dataset_result.pkl")
y = X.pop("target")

## Training and logging

- Define the model parameters
- Run cross-validated training rounds and log parameters and metrics to the tracking server
- Log the trained model as a model artifact

In [None]:
feat = infer_catboost_feature_types(X)

catboost_init_params = {
    "cat_features": feat["categorical"], 
    "text_features": feat["text"], 
    "od_type": "Iter", 
    "iterations": 200
}

catboost_fit_params = {
    "early_stopping_rounds": 101,
    "verbose": 100
}

model = CatBoostRegressor(**catboost_init_params)

In [None]:
mlflow.set_tracking_uri(os.environ["MLFLOW_TRACKING_URI"])
mlflow.set_experiment("casualty-regression")

In [None]:
with mlflow.start_run() as run:
    
    run_id = run.info.run_id
    
    mlflow.log_params(catboost_init_params)
    mlflow.log_params(catboost_fit_params)
    mlflow.log_param("n_casualties", len(X))
    mlflow.log_figure(hist_figure, "casualty_distribution.png")
    mlflow.log_figure(date_figure, "date_distribution.png")
    
    for train, test in KFold(n_splits=3, shuffle=True, random_state=42).split(X):
        Xtrain = X.iloc[train]
        ytrain = y.iloc[train]

        data_test = Pool(
            X.iloc[test], 
            y.iloc[test], 
            text_features=feat["text"],
            cat_features=feat["categorical"]
        )

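        # Hold out 5% of the training fold for early stopping.
        # ShuffleSplit yields positions relative to Xtrain, so index Xtrain/ytrain, not X/y.
        train_idx, val_idx = next(
            ShuffleSplit(n_splits=1, test_size=0.05, random_state=42)
            .split(Xtrain)
        )

        data_val = Pool(
            Xtrain.iloc[val_idx],
            ytrain.iloc[val_idx],
            text_features=feat["text"],
            cat_features=feat["categorical"]
        )

        data_train = Pool(
            Xtrain.iloc[train_idx],
            ytrain.iloc[train_idx],
            text_features=feat["text"],
            cat_features=feat["categorical"]
        )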

        model.fit(data_train, eval_set=data_val, **catboost_fit_params)

        ypred = model.predict(data_test)
        
        mae = mean_absolute_error(data_test.get_label(), ypred)
        mse = mean_squared_error(data_test.get_label(), ypred)
        
        mlflow.log_metrics(
            {
                "mae": mae,
                "mse": mse
            }
        )
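
The loop above logs `mae` and `mse` once per fold under the same keys. A cross-fold summary is often easier to compare between runs. The following is a minimal sketch of such an aggregation; the helper `summarize_folds` is hypothetical and the numbers are placeholders, not results from this notebook.

```python
from statistics import mean, stdev


def summarize_folds(fold_metrics):
    """Aggregate a list of per-fold metric dicts into mean and std per metric."""
    summary = {}
    for name in fold_metrics[0]:
        values = [fold[name] for fold in fold_metrics]
        summary[f"{name}_mean"] = mean(values)
        # The sample standard deviation needs at least two folds.
        summary[f"{name}_std"] = stdev(values) if len(values) > 1 else 0.0
    return summary


# Placeholder values for illustration only.
fold_metrics = [
    {"mae": 0.5, "mse": 0.4},
    {"mae": 0.7, "mse": 0.6},
    {"mae": 0.6, "mse": 0.5},
]
summary = summarize_folds(fold_metrics)
```

Inside the run, the resulting dictionary could be passed to `mlflow.log_metrics(summary)` after the loop. Passing the `step` argument of `mlflow.log_metrics` with the fold number is another option for keeping the per-fold values distinguishable.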

In [None]:
with mlflow.start_run(run_id=run_id):
    # Log model artifact
    mlflow.pyfunc.log_model(
        "model",
        python_model=CasualtyRegressor(preprocessor, model),
        code_path=["/home/jovyan/work/helpers"],
    )

Open the [MLflow](http://localhost:5000) UI to view the experiment output. Then verify that the model can be loaded and used for predictions.

In [None]:
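m = mlflow.pyfunc.load_model(f"runs:/{run_id}/model")  # assumed URI: artifact path "model" of the run above
m.predict(example)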