# STM Transit Delay Data Modeling

## Overview

This notebook explores tree-based machine learning models to find the one that predicts STM transit delays most accurately. The featured models are XGBoost, LightGBM and CatBoost, because they handle large datasets with mixed feature types and high-cardinality features well.

## Imports

In [None]:
from catboost import CatBoostRegressor
import joblib
import lightgbm as lgb
import matplotlib.pyplot as plt
import pandas as pd
import random
import seaborn as sns
import shap
from sklearn.feature_selection import mutual_info_regression
from sklearn.metrics import mean_absolute_error, root_mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import PolynomialFeatures
import sys
import xgboost as xgb

In [3]:
# Import custom code
sys.path.insert(0, '..')
from src.helper_functions import get_top_abs_correlations

In [4]:
# Load data
df = pd.read_parquet('../data/preprocessed.parquet')
print(f'Shape of dataset: {df.shape}')

Shape of dataset: (1500000, 21)


## Split the data

In [5]:
# Separate features from target variable
X = df.drop('delay', axis=1)
y = df['delay']

All three models are evaluated repeatedly against the validation set during training and tuning. Therefore, a separate hold-out test set is kept to evaluate the final model.

In [6]:
# Train-validation-test split (60-20-20)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

del X_temp
del y_temp
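The two-step call produces a 60-20-20 split because the second call halves the 40% that was held back (0.4 × 0.5 = 0.2). As a rough sketch of the same proportions, the snippet below shuffles row positions with plain NumPy instead of calling `train_test_split`; the function name `three_way_split_indices` and its parameters are hypothetical.

```python
import numpy as np

def three_way_split_indices(n_rows, temp_frac=0.4, test_share_of_temp=0.5, seed=42):
    """Shuffle row positions, then carve out train / validation / test index arrays."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_temp = int(n_rows * temp_frac)            # rows held back from training
    n_test = int(n_temp * test_share_of_temp)   # half of the held-back rows
    test_idx = shuffled[:n_test]
    val_idx = shuffled[n_test:n_temp]
    train_idx = shuffled[n_temp:]
    return train_idx, val_idx, test_idx

# Same row count as the dataset loaded above
train_idx, val_idx, test_idx = three_way_split_indices(1_500_000)
split_sizes = (len(train_idx), len(val_idx), len(test_idx))
print(split_sizes)
```

The three index arrays are disjoint by construction, which is the property that keeps the test set untouched until the final evaluation.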

**Scaling**

Since only tree-based models are explored in this project, **scaling is not needed** because the models are not sensitive to the absolute scale or distribution of the features.
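To illustrate why scaling does not matter here, the sketch below fits a single variance-minimizing split (a decision stump) on a raw feature and on its standardized copy, then checks that both send the same rows to the left branch. It is a toy stand-in for the split search in a boosted tree, not the code used by XGBoost, LightGBM or CatBoost; the data and the function name `best_stump_partition` are made up for the example.

```python
import numpy as np

def best_stump_partition(x, y):
    """Return the boolean mask of rows sent left by the best squared-error split on x."""
    order = np.argsort(x, kind="stable")
    x_sorted, y_sorted = x[order], y[order]
    best_sse, best_threshold = np.inf, None
    for i in range(1, len(x)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold fits between two equal values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2
    return x <= best_threshold

# Synthetic feature with a jump in the target around x = 6
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = np.where(x > 6, 120.0, 30.0) + rng.normal(0, 5, size=200)

mask_raw = best_stump_partition(x, y)
mask_scaled = best_stump_partition((x - x.mean()) / x.std(), y)
same_partition = bool(np.array_equal(mask_raw, mask_scaled))
print(same_partition)
```

Standardization changes the threshold value but not the ordering of the rows, so the partition of the data is unchanged.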

## Fit Base Models

All three models let you set a number of boosting rounds and an early stopping patience. To start, every model runs for up to 100 rounds and stops early if the validation metric has not improved for 3 consecutive rounds.
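The early stopping rule can be sketched in a few lines. This is a simplified illustration of the general patience logic, not the internals of any of the three libraries; the function name `early_stopping_round` and the validation scores are invented for the example.

```python
def early_stopping_round(val_scores, patience):
    """Return (round where training stops, round with the best score).

    Lower scores are better (e.g. RMSE). Training stops once `patience`
    rounds have passed without beating the best score seen so far.
    """
    best_score = float("inf")
    best_round = 0
    for round_idx, score in enumerate(val_scores):
        if score < best_score:
            best_score, best_round = score, round_idx
        elif round_idx - best_round >= patience:
            return round_idx, best_round
    return len(val_scores) - 1, best_round

# Hypothetical validation RMSE per boosting round
val_rmse = [155.6, 150.2, 147.6, 147.9, 148.1, 147.7, 146.0]
stopped_at, best_at = early_stopping_round(val_rmse, patience=3)
print(stopped_at, best_at)
```

With a patience of 3, training stops at round 5 and never reaches the better score at round 6, which shows how a small patience can end training prematurely.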

In [7]:
# Create dataframe to track metrics
metrics_df = pd.DataFrame(columns=['model', 'MAE', 'RMSE', 'R²'])

In [8]:
def add_reg_metrics(metrics_df:pd.DataFrame, y_pred:pd.Series, y_true:pd.Series, model_name:str) -> pd.DataFrame:
	"""Append MAE, RMSE and R² for one model as a new row of metrics_df (modified in place) and return it."""
	mae = mean_absolute_error(y_true, y_pred)
	rmse = root_mean_squared_error(y_true, y_pred)
	r2 = r2_score(y_true, y_pred)

	metrics_df.loc[len(metrics_df)] = [model_name, mae, rmse, r2]
	return metrics_df

### XGBoost

In [9]:
# Create regression matrices
xg_train_data = xgb.DMatrix(X_train, y_train, enable_categorical=False)
xg_val_data = xgb.DMatrix(X_val, y_val, enable_categorical=False)
xg_eval_set = [(xg_train_data, 'train'), (xg_val_data, 'validation')]

In [10]:
# Train model
xg_reg_base = xgb.train(
  params= {'objective': 'reg:squarederror', 'tree_method': 'hist'},
  dtrain=xg_train_data,
  num_boost_round=100,
  evals=xg_eval_set,
  verbose_eval=10,
  early_stopping_rounds=3
)

[0]	train-rmse:156.39455	validation-rmse:155.63622
[10]	train-rmse:147.63804	validation-rmse:147.64934
[20]	train-rmse:144.79119	validation-rmse:145.32511
[30]	train-rmse:142.92251	validation-rmse:143.65306
[40]	train-rmse:141.10499	validation-rmse:142.15731
[50]	train-rmse:139.14415	validation-rmse:140.41181
[60]	train-rmse:137.65209	validation-rmse:139.10703
[70]	train-rmse:136.82802	validation-rmse:138.48722
[80]	train-rmse:135.72298	validation-rmse:137.51687
[90]	train-rmse:134.60838	validation-rmse:136.60422
[99]	train-rmse:133.59100	validation-rmse:135.83129


In [11]:
# Evaluate model
y_pred = xg_reg_base.predict(xg_val_data)

metrics_df = add_reg_metrics(metrics_df, y_pred, y_val, 'xg_reg_base')
metrics_df

| | model | MAE | RMSE | R² |
|---|---|---|---|---|
| 0 | xg_reg_base | 69.204192 | 135.83129 | 0.282591 |


**MAE**<br>
On average, the predictions are off by 69 seconds, which is reasonable, knowing that [STM](https://www.stm.info/en/info/networks/bus-network-and-schedules-enlightened) considers a bus arriving 3 minutes after the planned schedule as being on time.

**RMSE**<br>
The higher RMSE compared to MAE suggests that there are some significant prediction errors that influence the overall error metric.

**R²**<br>
The model explains 28.26% of the variance, which is low but understandable given how unpredictable transit delays can be (bad weather, vehicle breakdowns, accidents, etc.).
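The gap between MAE and RMSE can be reproduced on a tiny, made-up example in which three predictions are close and one is far off. The numbers below are illustrative only and are not taken from the dataset; the formulas are the standard definitions of the three metrics.

```python
import numpy as np

# Hypothetical delays in seconds: three small errors and one large miss
y_true = np.array([0.0, 60.0, 120.0, 600.0])
y_pred = np.array([10.0, 50.0, 130.0, 300.0])

errors = y_true - y_pred                      # [-10, 10, -10, 300]
mae = np.abs(errors).mean()                   # average absolute error
rmse = np.sqrt((errors ** 2).mean())          # squaring weights the large miss more
r2 = 1 - (errors ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

Here the MAE is 82.5 seconds while the RMSE is roughly 150 seconds: the single 300-second miss dominates the squared error, which is the same pattern seen in the metrics above.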

### LightGBM

In [None]:
# Create regression datasets
lgb_train_data = lgb.Dataset(X_train, label=y_train)
lgb_val_data = lgb.Dataset(X_val, label=y_val, reference=lgb_train_data)

In [None]:
# Train model
lgb_reg_base = lgb.train(
    params={
        'objective': 'regression',
        'metric': 'rmse',
        'learning_rate': 0.05,
        'max_depth': -1
    },
    train_set=lgb_train_data,
    valid_sets=[lgb_val_data],
    num_boost_round=100,
    callbacks=[lgb.early_stopping(stopping_rounds=3)]
)

In [None]:
# Evaluate model
y_pred = lgb_reg_base.predict(X_val)

metrics_df = add_reg_metrics(metrics_df, y_pred, y_val, 'lgb_reg_base')
metrics_df

The LightGBM model performs worse than XGBoost.

### CatBoost

In [None]:
# Fit model
cat_reg_base = CatBoostRegressor(
    iterations=100,
    learning_rate=0.05,
    depth=10,
    random_seed=42,
    verbose=10
)

cat_reg_base.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=3)

In [None]:
# Evaluate model
y_pred = cat_reg_base.predict(X_val)

metrics_df = add_reg_metrics(metrics_df, y_pred, y_val, 'cat_reg_base')
metrics_df

CatBoost performs about as well as LightGBM but takes longer to fit. XGBoost seems to capture more of the underlying patterns than the other two models, so it is the model that will be used for analysis and tuning.

## Residual Analysis

In [None]:
def plot_residuals(y_true, y_pred, model_name:str) -> None:
	fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(15, 7))

	# Predicted vs. actual values
	ax1.scatter(x=y_pred, y=y_true)
	ax1.set_title('Predicted vs. Actual values')
	ax1.set_xlabel('Predicted Delay (seconds)')
	ax1.set_ylabel('Actual Delay (seconds)')
	ax1.grid(True)

	# Residuals
	residuals = y_true - y_pred
	ax2.scatter(x=y_pred, y=residuals)
	ax2.set_title('Residual Plot')
	ax2.set_xlabel('Predicted Delay (seconds)')
	ax2.set_ylabel('Residuals (seconds)')
	ax2.axhline(0, linestyle='--', color='orange')
	ax2.grid(True)

	fig.suptitle('Residual Analysis', fontsize=18)
	fig.tight_layout()
	fig.savefig(f'../images/residual_analysis_{model_name}.png', bbox_inches='tight')
	plt.show()

In [None]:
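# Plot residuals of the XGBoost base model
# (y_pred currently holds the CatBoost predictions, so recompute it first)
y_pred = xg_reg_base.predict(xg_val_data)
plot_residuals(y_val, y_pred, 'xg_reg_base')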

**Predicted vs. Actual Plot**

There's a dense cluster around 0 for both predicted and actual values, indicating that many predictions and actual delays are centered near 0. However, there is substantial spread both above and below the diagonal line, which suggests both underprediction and overprediction. There are clear outliers that are far from the main cluster.


**Residual Plot**

The residuals show a visible funnel shape, which indicates a systematic error in prediction. The spread of residuals increases as the predicted delay increases. This is a sign of heteroscedasticity (the variance of errors is not constant across all predictions).
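One way to put a number on the funnel shape is to bin the residuals by predicted value and compare the spread per bin. The sketch below does this on synthetic data whose noise grows with the prediction; the helper name `residual_spread_by_bin` and the data are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

def residual_spread_by_bin(y_pred, residuals, n_bins=5):
    """Standard deviation of residuals within equal-count bins of the predicted value."""
    edges = np.quantile(y_pred, np.linspace(0, 1, n_bins + 1))
    # Use interior edges only, so every prediction maps to a bin id in [0, n_bins - 1]
    bin_ids = np.digitize(y_pred, edges[1:-1])
    return np.array([residuals[bin_ids == b].std() for b in range(n_bins)])

# Synthetic heteroscedastic data: the noise scale grows with the prediction
rng = np.random.default_rng(42)
y_pred = rng.uniform(0, 300, size=5000)
residuals = rng.normal(0, 10 + 0.5 * y_pred)

spread = residual_spread_by_bin(y_pred, residuals)
print(spread.round(1))
```

A spread that climbs from the first bin to the last is the numeric counterpart of the funnel in the plot; a roughly flat spread would indicate constant variance.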

## Hyperparameter Tuning

In [None]:
param_dist = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'alpha': [0, 1, 2, 3, 4, 5],
    'lambda': [0, 1, 2, 3, 4, 5]
}

xgb_model = xgb.XGBRegressor(objective='reg:squarederror', random_state=42, n_estimators=100)

random_search = RandomizedSearchCV(
    estimator=xgb_model,
    param_distributions=param_dist,
    scoring='neg_root_mean_squared_error',
    cv=2,
    verbose=1,
    n_iter=25,
    random_state=42
)

random_search.fit(X_train, y_train)

In [None]:
# Best model
xg_reg_tuned = random_search.best_estimator_
xg_best_params = random_search.best_params_

In [None]:
# Evaluate model
y_pred = xg_reg_tuned.predict(X_val)

metrics_df = add_reg_metrics(metrics_df, y_pred, y_val, 'xg_reg_tuned')
metrics_df

Tuning did not improve the performance much over the base model.
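For context, it helps to see how much of the grid the random search actually visits. The sketch below counts the combinations in the `param_dist` defined above and compares them with the `n_iter` and `cv` settings; it is plain arithmetic and does not run any model.

```python
import math

# Same grid as the search above
param_dist = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'alpha': [0, 1, 2, 3, 4, 5],
    'lambda': [0, 1, 2, 3, 4, 5]
}
n_iter, cv = 25, 2

total_combinations = math.prod(len(values) for values in param_dist.values())
cv_fits = n_iter * cv                      # model fits during cross-validation
sampled_fraction = n_iter / total_combinations

print(total_combinations, cv_fits, f"{sampled_fraction:.2%}")
```

The search samples 25 of 2,916 combinations, which is under 1% of the grid, so a larger `n_iter` or a narrower grid may be worth trying before concluding that tuning has little to offer.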

## Feature Importances

In [None]:
# Get top 5 most important features
importances = xg_reg_tuned.get_booster().get_score()
importances_df = pd.DataFrame.from_dict(importances, orient='index') \
	.rename(columns={0: 'importance'}).reset_index() \
	.rename(columns={'index': 'feature'})
importances_df.sort_values('importance', ascending=False).head()

In [None]:
# Plot the feature importances
ax = xgb.plot_importance(xg_reg_tuned)
ax.figure.tight_layout()
ax.figure.savefig('../images/feature_importances_xg_reg_tuned.png')
plt.show()

**Most Important Features:**
- `exp_trip_duration` This is the most important feature in the model. It seems like the expected trip duration is highly predictive of the actual delay. This makes sense as longer expected trips are more prone to disruptions and variations.
- `hist_avg_delay` Historical average delay is the second most important predictor. This aligns well with time series predictability since past delays often indicate patterns or bottlenecks that repeat over time.
- `route_bearing` The direction of a vehicle might indicate if it's in the direction of traffic or not.
- `arrivals_per_hour` The bus frequency is contributing to the prediction. Less frequent buses might be more susceptible to delays since missed connections or unexpected traffic issues tend to accumulate.
- `trip_progress` Delays accumulate when the vehicle is further along the trip.


**Least Important Features:**
- `time_of_day_evening`, `time_of_day_morning`, `time_of_day_night` Evening seems to be a bit more influential than morning or night, which could indicate evening rush hour impacts.
- `schedule_relationship_Scheduled` This has low impact, which might indicate that deviations from scheduled times are not systematically captured by the model.
- `wheelchair_boarding` Very low importance, indicating it has minimal influence on delays.


## SHAP Plots

In [None]:
def shap_plot(shap_values, X_true, model_name:str, barplot:bool=True) -> None:
	if barplot:
		shap.summary_plot(shap_values, X_true, plot_type='bar', show=False)
		plt.title('SHAP Summary Barplot')
		plot_type = 'barplot' 
	else: # beeswarm
		shap.summary_plot(shap_values, X_true, show=False)
		plt.title('SHAP Summary Beeswarm Plot')
		plot_type = 'beeswarm_plot' 
	plt.tight_layout()
	plt.savefig(f'../images/shap_{plot_type}_{model_name}.png', bbox_inches='tight')
	plt.show()

In [None]:
def shap_single_pred(X_true, explainer, shap_values, model_name:str) -> None:
	random.seed(42)
	index = random.randrange(len(X_true))
	shap.force_plot(
		explainer.expected_value,
		shap_values[index, :],
		X_true.iloc[index, :],
		figsize=(30, 4),
		matplotlib=True,
		show=False)
	plt.tight_layout()
	plt.savefig(f'../images/shap_force_plot_{model_name}.png', bbox_inches='tight')
	plt.show()

In [None]:
# Initialize SHAP
X_val_sample = X_val.sample(5000, random_state=42) # sample validation set to prevent memory overload
explainer = shap.TreeExplainer(xg_reg_tuned)
shap_values = explainer.shap_values(X_val_sample)

In [None]:
# Summary barplot
shap_plot(shap_values, X_val_sample, 'xg_reg_tuned', barplot=True)

**Comparison with XGBoost feature importances**

- Interestingly, `hist_avg_delay` comes out as the most impactful in this plot, while it was second in the XGBoost importance plot.
- `arrivals_per_hour` is elevated to the second position, while in the default importance, it is in fourth place.
- This suggests that in terms of predictive influence, `hist_avg_delay` and `arrivals_per_hour` are actually more significant than what the XGBoost default metric captured.
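The SHAP bar plot ranks features by the mean absolute SHAP value across rows. A minimal sketch of that ranking, using a small hand-written matrix and hypothetical feature names rather than the notebook's `shap_values`:

```python
import numpy as np

# Hypothetical SHAP matrix: one row per prediction, one column per feature
feature_names = ["feature_a", "feature_b", "feature_c"]
shap_matrix = np.array([
    [ 4.0, -1.0,  0.5],
    [-6.0,  2.0,  0.5],
    [ 5.0, -3.0, -0.5],
])

mean_abs = np.abs(shap_matrix).mean(axis=0)   # magnitude of impact, sign ignored
signed_mean = shap_matrix.mean(axis=0)        # positive and negative pushes cancel out
ranking = [feature_names[i] for i in np.argsort(mean_abs)[::-1]]

print(ranking, mean_abs, signed_mean)
```

`feature_a` has a signed mean of only 1.0 but a mean absolute value of 5.0, which is why the ranking uses absolute values. XGBoost's default importance counts how often a feature is used in splits, so the two rankings measure different things and can legitimately disagree.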

In [None]:
# Summary beeswarm plot
shap_plot(shap_values, X_val_sample, 'xg_reg_tuned', barplot=False)

**Interpretation**

- High `hist_avg_delay` (red) tends to push predictions higher, and low values (blue) push it lower. The high influence of `hist_avg_delay` confirms that delay is highly dependent on past performance. This could be useful for forecasting in specific segments or optimizing bus routes during peak times.
- Some features like `route_bearing` and `exp_avg_delay` can have either a positive or a negative influence, suggesting more complex feature interactions.

In [None]:
# Force plot a single prediction
shap_single_pred(X_val_sample, explainer, shap_values, 'xg_reg_tuned')

This plot is a breakdown of the specific prediction (`42.36`) for one instance.

- Features that increase the prediction (red):
	- `time_of_day_evening`: The trip being in the evening increases the delay.
	- `pressure_msl`: A high atmospheric air pressure (`1010.8`) can have an impact on the vehicle and passenger behaviour.
	- `temperature_2m`: For this instance, a temperature of `14.2` degrees Celsius increased the delay.
	- `trip_progress`: A trip that's near its end (`0.92`) tends to accumulate delays.
	- `wind_speed_10m`: A wind speed of `9.1` kilometers per hour caused the delay to increase in that case.

- Features that decrease the prediction (blue):
	- `exp_trip_duration`: A `3060.0` (51 minute) trip reduced the delay for this instance.
	- `route_bearing`: A value of `112.44` (South-East) decreased the delay.
	- `arrivals_per_hour`: A bus that arrives less frequently (`2.0` times per hour) is less susceptible to delays.
	- `hist_avg_delay`: A value of `47.64` decreased the delay.
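The force plot relies on SHAP's additivity: the base value plus all feature contributions equals the prediction for that row. The sketch below checks this on a toy linear model, where the contributions have a closed form when features are treated as independent. It is an illustration under that assumption with made-up weights and data, not a reproduction of `TreeExplainer`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))           # hypothetical feature matrix
weights = np.array([2.0, -1.0, 0.5])     # hypothetical linear model
bias = 40.0

def predict(rows):
    return rows @ weights + bias

base_value = predict(X).mean()           # plays the role of explainer.expected_value
row = X[0]
# Contribution of each feature: weight times the deviation from the feature mean
contributions = weights * (row - X.mean(axis=0))
reconstructed = base_value + contributions.sum()

print(predict(row), reconstructed)
```

Positive contributions correspond to the red segments of the force plot and negative ones to the blue segments; together they move the output from the base value to the final prediction.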

## Feature Optimization

### Add feature interactions

Some features had a surprisingly low impact, and the SHAP plots suggest that some features interact with each other.

In [None]:
# Generate second degree polynomial features
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)

X_poly = poly.fit_transform(X)
X_train_poly = poly.transform(X_train)
X_val_poly = poly.transform(X_val)
X_test_poly = poly.transform(X_test)

feature_names = poly.get_feature_names_out(X.columns)
X_poly = pd.DataFrame(X_poly, columns=feature_names)
X_train_poly = pd.DataFrame(X_train_poly, columns=feature_names)
X_val_poly = pd.DataFrame(X_val_poly, columns=feature_names)
X_test_poly = pd.DataFrame(X_test_poly, columns=feature_names)

X_poly.shape
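With `degree=2` and `interaction_only=True`, the transformer keeps the original columns and adds one product column per pair of features, named with a space between the two source names. The sketch below reproduces that naming and the resulting column count with the standard library only; the helper `interaction_feature_names` is hypothetical, and the count of 20 input features assumes the 21-column dataset minus the `delay` target.

```python
from itertools import combinations
from math import comb

def interaction_feature_names(columns):
    """Original columns followed by one 'a b' name per unordered pair."""
    pairs = [f"{a} {b}" for a, b in combinations(columns, 2)]
    return list(columns) + pairs

names = interaction_feature_names(["arrivals_per_hour", "hist_avg_delay", "trip_progress"])
print(names)

n_original = 20                                 # 21 dataset columns minus the target
n_expanded = n_original + comb(n_original, 2)   # originals + pairwise products
print(n_expanded)
```

Going from 20 to 210 columns multiplies the memory footprint of the 1.5-million-row matrices by more than ten, which is one reason to prune the interactions afterwards.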

In [None]:
# Refit model
xg_reg_poly = xgb.XGBRegressor(
    objective='reg:squarederror',
    random_state=42,
    n_estimators=100,
    max_depth=xg_best_params['max_depth'],
    learning_rate=xg_best_params['learning_rate'],
    subsample=xg_best_params['subsample'],
    colsample_bytree=xg_best_params['colsample_bytree'],
    alpha=xg_best_params['alpha'],
    reg_lambda=xg_best_params['lambda']
)

xg_reg_poly.fit(X_train_poly, y_train)

In [14]:
# Evaluate model
y_pred = xg_reg_poly.predict(X_val_poly)

metrics_df = add_reg_metrics(metrics_df, y_pred, y_val, 'xg_reg_poly')
metrics_df

| | model | MAE | RMSE | R² |
|---|---|---|---|---|
| 0 | xg_reg_base | 69.204192 | 135.83129 | 0.282591 |
| 1 | xg_reg_poly | 66.992521 | 132.338056 | 0.319016 |


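There is a slight improvement over the model without the interactions: compared with the base model, MAE drops from 69.20 to 66.99 seconds, RMSE from 135.83 to 132.34, and R² rises from 0.283 to 0.319.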

### Keep best interactions

In [None]:
# Apply Mutual Information to rank feature importance
mi_scores = mutual_info_regression(X_val_poly, y_val) # use validation set to save computation time

In [None]:
# Create dataframe
mi_df = pd.DataFrame({
    'feature': feature_names,
    'mi_score': mi_scores
}).sort_values(by='mi_score', ascending=False)
mi_df.head()
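Mutual information is used here because it can detect non-linear dependence that a correlation coefficient misses. The sketch below swaps in a simple histogram-based estimate, not the nearest-neighbor estimator that `mutual_info_regression` uses, and applies it to synthetic data; the function name `binned_mutual_info` is hypothetical.

```python
import numpy as np

def binned_mutual_info(x, y, bins=20):
    """Histogram estimate of the mutual information (in nats) between x and y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    independent = p_x @ p_y                  # joint distribution if x and y were independent
    nonzero = p_xy > 0
    return float((p_xy[nonzero] * np.log(p_xy[nonzero] / independent[nonzero])).sum())

rng = np.random.default_rng(42)
x = rng.normal(size=5000)
y_dependent = x ** 2 + rng.normal(0, 0.1, size=5000)   # non-linear function of x
y_independent = rng.normal(size=5000)                  # unrelated to x

mi_dependent = binned_mutual_info(x, y_dependent)
mi_independent = binned_mutual_info(x, y_independent)
print(mi_dependent, mi_independent)
```

The quadratic relationship scores far higher than the unrelated pair, even though a linear correlation between `x` and `x ** 2` would be close to zero for symmetric data.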

In [None]:
# Display the top 10 interactions
top_10_features = mi_df.head(10)['feature'].tolist()
X_top_10 = X_poly[top_10_features]
X_top_10.head()

In [None]:
# Plot a heatmap of the correlations between these top features
plt.figure(figsize=(10, 8))
sns.heatmap(X_top_10.corr(), annot=True, cmap='coolwarm', fmt=".2f", vmin=-1, vmax=1)
plt.title('Correlation Heatmap of Top 10 Interaction Features')
plt.show()

In [None]:
# Get features with a score greater than 0.1
best_features = mi_df[mi_df['mi_score'] > 0.1]['feature'].sort_values().tolist()

In [None]:
# Keep best features
X_best = X_poly[best_features]
X_train_best = X_train_poly[best_features]
X_val_best = X_val_poly[best_features]
X_test_best = X_test_poly[best_features]

In [None]:
# Get most correlated interactions
correlations = get_top_abs_correlations(X_best)
correlations

In [None]:
# Remove highly correlated features
features_to_drop = list(set(correlations.index.get_level_values(level=1)))

X_best = X_best.drop(features_to_drop, axis=1)
X_train_best = X_train_best.drop(features_to_drop, axis=1)
X_val_best = X_val_best.drop(features_to_drop, axis=1)
X_test_best = X_test_best.drop(features_to_drop, axis=1)
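The implementation of `get_top_abs_correlations` is not shown in this notebook. As a rough illustration of the idea, the sketch below flags the second member of every highly correlated pair for removal; the function name `correlated_columns_to_drop`, the 0.9 threshold and the data are all assumptions, not the helper's actual behavior.

```python
import numpy as np

def correlated_columns_to_drop(X, names, threshold=0.9):
    """For each pair with |correlation| above the threshold, flag the later column."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    to_drop = set()
    for i in range(len(names)):
        if names[i] in to_drop:
            continue  # already removed, so it cannot justify dropping another column
        for j in range(i + 1, len(names)):
            if corr[i, j] > threshold:
                to_drop.add(names[j])
    return sorted(to_drop)

# Hypothetical columns: b is almost a copy of a, c is unrelated
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = 2 * a + rng.normal(0, 0.01, size=500)
c = rng.normal(size=500)

dropped = correlated_columns_to_drop(np.column_stack([a, b, c]), ["a", "b", "c"])
print(dropped)
```

Keeping one column from each correlated pair preserves most of the signal while reducing redundancy among the interaction features.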

## Retrain Model with Best Features

### Fit Model

In [None]:
# Refit model
xg_reg_pruned = xgb.XGBRegressor(
    objective='reg:squarederror',
    random_state=42,
    n_estimators=100,
    max_depth=xg_best_params['max_depth'],
    learning_rate=xg_best_params['learning_rate'],
    subsample=xg_best_params['subsample'],
    colsample_bytree=xg_best_params['colsample_bytree'],
    alpha=xg_best_params['alpha'],
    reg_lambda=xg_best_params['lambda']
)

xg_reg_pruned.fit(X_train_best, y_train)

### Evaluate Model

In [None]:
y_pred = xg_reg_pruned.predict(X_val_best)

metrics_df = add_reg_metrics(metrics_df, y_pred, y_val, 'xg_reg_pruned')
metrics_df

The RMSE and R-squared improved after removing interactions, which is good.

### Hyper-tune model

In [None]:
param_dist = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'alpha': [0, 1, 2, 3, 4, 5],
    'lambda': [0, 1, 2, 3, 4, 5]
}

xgb_model = xgb.XGBRegressor(objective='reg:squarederror', random_state=42, n_estimators=100)

random_search = RandomizedSearchCV(
    estimator=xgb_model,
    param_distributions=param_dist,
    scoring='neg_root_mean_squared_error',
    cv=2,
    verbose=1,
    n_iter=25,
    random_state=42
)

random_search.fit(X_train_best, y_train)

In [None]:
# Best model
xg_reg_pruned_tuned = random_search.best_estimator_
xg_best_params = random_search.best_params_

In [None]:
# Evaluate model
y_pred = xg_reg_pruned_tuned.predict(X_val_best)

metrics_df = add_reg_metrics(metrics_df, y_pred, y_val, 'xg_reg_pruned_tuned')
metrics_df

### Feature Importances

In [None]:
# Get top 5 most important features
importances = xg_reg_pruned_tuned.get_booster().get_score()
importances_df = pd.DataFrame.from_dict(importances, orient='index') \
	.rename(columns={0: 'importance'}).reset_index() \
	.rename(columns={'index': 'feature'})
importances_df.sort_values('importance', ascending=False).head()

In [None]:
# Plot the feature importances
ax = xgb.plot_importance(xg_reg_pruned_tuned, max_num_features=20)
ax.figure.tight_layout()
ax.figure.savefig('../images/feature_importances_xg_reg_pruned_tuned.png')
plt.show()

**Observations**

- `arrivals_per_hour exp_trip_duration` is by far the most important. This suggests that the number of arrivals combined with the expected trip duration captures strong predictive patterns.
- The interaction `arrivals_per_hour hist_avg_delay` is second, indicating that congestion or frequent arrivals play a big role when paired with historical average delays.
- Weather (`temperature_2m`) is also heavily impactful when combined with `exp_trip_duration` and `route_bearing`, showing that environmental conditions are critical when predicting delays.

### SHAP Plots

In [None]:
# Initialize SHAP
sample_size = 5000 # sample validation set to prevent memory overload
X_val_sample = X_val_best.sample(n=sample_size, random_state=42)
explainer = shap.TreeExplainer(xg_reg_pruned_tuned) # explain the tuned model that the plots are named after
shap_values = explainer.shap_values(X_val_sample)

In [None]:
# Summary barplot
shap_plot(shap_values, X_val_sample, 'xg_reg_pruned_tuned', barplot=True)

- Compared to XGBoost's default importance, the top interaction shifts: `cloud_cover hist_avg_delay` is the most impactful in SHAP, even though it was not the highest in XGBoost's internal ranking.
- `route_bearing stop_cluster` and `route_bearing temperature_2m` also surface as strong influences.
- XGBoost splits more often on `arrivals_per_hour exp_trip_duration`, but SHAP indicates that cloud cover and historical delays contribute more to the predictions. This suggests that XGBoost's default ranking rewards interactions that are split on frequently during tree building, whereas SHAP measures the impact on the predicted values.

In [None]:
# Summary beeswarm plot
shap_plot(shap_values, X_val_sample, 'xg_reg_pruned_tuned', barplot=False)

- For `cloud_cover hist_avg_delay`: High values of both features (red) seem to push the prediction higher, which is visible with strong skew to the right.
- For `route_bearing stop_cluster`: Certain clusters, when combined with specific route bearings, consistently push predictions higher or lower.
- `route_bearing temperature_2m`: High temperatures combined with specific route bearings have a strong influence.
- `exp_trip_duration route_bearing`: Longer trip durations on specific bearings heavily impact the prediction, possibly capturing peak-hour traffic or route-specific congestion.
- `arrivals_per_hour hist_avg_delay`: This aligns with intuition: more buses arriving at historically delayed spots lead to higher expected delays.

In [None]:
# Force plot a single prediction
shap_single_pred(X_val_sample, explainer, shap_values, 'xg_reg_pruned_tuned')

## Final Model

### Retrain Model

In [None]:
# Create regression matrices
xg_train_data = xgb.DMatrix(X_train_best, y_train, enable_categorical=False)
xg_val_data = xgb.DMatrix(X_val_best, y_val, enable_categorical=False)
xg_test_data = xgb.DMatrix(X_test_best, y_test, enable_categorical=False)
xg_eval_set = [(xg_train_data, 'train'), (xg_val_data, 'validation')]

In [None]:
# Train final model with more boosting rounds
final_model = xgb.train(
  params={
    'objective': 'reg:squarederror',
    'tree_method': 'hist',
    'max_depth': xg_best_params['max_depth'],
    'learning_rate': xg_best_params['learning_rate'],
    'subsample': xg_best_params['subsample'],
    'colsample_bytree': xg_best_params['colsample_bytree'],
    'alpha': xg_best_params['alpha'],
    'lambda': xg_best_params['lambda'],
  },
  dtrain=xg_train_data,
  num_boost_round=10000,
  evals=xg_eval_set,
  verbose_eval=50,
  early_stopping_rounds=50
)

### Evaluate with Test Set

In [None]:
# Evaluate model
y_pred = final_model.predict(xg_test_data)

metrics_df = add_reg_metrics(metrics_df, y_pred, y_test, 'xg_reg_final')
metrics_df

In [None]:
# Plot residuals
plot_residuals(y_test, y_pred, 'xg_reg_final')

### Make Prediction

In [None]:
# Display features
best_features = X_best.columns.tolist()
best_features

In [None]:
# Create feature matrix
# Every entry below is commented out, so test_input is empty as written.
# Before predicting, supply one value for each column listed in best_features.
test_input = {
#   'cloud_cover': [0],
#   'exp_trip_duration': [3600],
#   'frequency_normal': [1],
#   'relative_humidity_2m': [60],
#   'wind_direction_10m': [140],
#   'precipitation': [0],
#   'time_of_day_morning': [0],
#   'hist_avg_delay': [300],
#   'route_direction_South': [0],
#   'wind_speed_10m': [10],
#   'time_of_day_evening': [0],
#   'stop_location_group': [2],
#   'is_peak_hour': [1],
#   'trip_phase_middle': [0],
#   'frequency_very_rare': [0],
#   'route_direction_North': [0],
#   'route_direction_West': [1],
#   'frequency_rare': [0],
#   'temperature_2m': [24.3],
#   'stop_distance': [400],
#   'trip_phase_start': [0]
}

x_test = pd.DataFrame(test_input)

In [None]:
# Predict delay (a Booster returned by xgb.train expects a DMatrix, not a DataFrame)
prediction = final_model.predict(xgb.DMatrix(x_test))
print(f'Predicted delay: {prediction[0]:.2f} seconds')
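Because the final model was trained on interaction columns named like `arrivals_per_hour hist_avg_delay`, a raw observation has to be expanded into those products before prediction. The sketch below shows one way to do that with plain dictionaries; the helper name `build_model_row`, the feature list and the input values are hypothetical and only illustrate the shape of the transformation.

```python
def build_model_row(raw_values, model_features):
    """Compute each model feature as the product of the raw features in its name."""
    row = {}
    for feature in model_features:
        value = 1.0
        for part in feature.split(" "):   # 'a b' means raw feature a times raw feature b
            value *= raw_values[part]      # raises KeyError if a raw feature is missing
        row[feature] = value
    return row

# Made-up raw observation and a made-up subset of model features
raw_values = {"arrivals_per_hour": 2.0, "hist_avg_delay": 50.0, "trip_progress": 0.5}
model_features = ["arrivals_per_hour hist_avg_delay", "trip_progress"]

model_row = build_model_row(raw_values, model_features)
print(model_row)
```

Iterating over the saved feature list keeps the columns in the order used at training time, and a missing raw feature fails loudly instead of silently producing a wrong input.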

### Export Data

In [None]:
# Save model, hyperparameters and predictors
joblib.dump(final_model, '../models/regression_model.pkl')
joblib.dump(xg_best_params, '../models/best_hyperparams.pkl')
joblib.dump(best_features, '../models/best_features.pkl')

## End