# Testing the models with Euro 24 results

In this notebook, we test the two models on Euro 24 data, which neither model saw during training. The goal is to see whether training specifically on big tournaments makes a difference.

Dependencies:

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime
import joblib
from sklearn.metrics import classification_report

## Model trained with qualification phase performance metrics

Import processed data and slice the Euro 24 games:

In [18]:
df_euro24_qual = pd.read_pickle("../data/processed/qualification_performance.pkl")
df_euro24_qual = df_euro24_qual[df_euro24_qual["period"] == "Euro24"]

Feature names:

In [19]:
qual_feature_names = []

for ab in ["A", "B"]:
    for metric in ["win_ratio", "draw_ratio", "avg_goals_scored", "avg_goals_conceded"]:
        qual_feature_names.append(f"{ab}_{metric}")

qual_feature_names.append("host_advantage")

Feature and target sets:

In [21]:
X_test_qual = df_euro24_qual[qual_feature_names].copy()
y_test_qual = df_euro24_qual["result"]  # 1-D Series of labels rather than a one-column DataFrame

Load the model and predict:

In [30]:
qual_model = joblib.load("../models/qual_perf_model.pkl")
y_pred_qual = qual_model.predict(X_test_qual)
print(classification_report(y_test_qual, y_pred_qual))

              precision    recall  f1-score   support

       A_win       0.52      0.55      0.53        22
       B_win       0.42      0.83      0.56        12
        Draw       0.75      0.18      0.29        17

    accuracy                           0.49        51
   macro avg       0.56      0.52      0.46        51
weighted avg       0.57      0.49      0.46        51



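Performance seems better than what was observed during training, with an accuracy of 0.49 and a macro F1 of 0.46 on the 51 games. Draws are the weak spot: precision is 0.75, but recall is only 0.18.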
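One way to put the accuracy in context is to compare it with a trivial baseline that always predicts the most frequent class. The sketch below rebuilds the label distribution from the support column of the report above. `majority_baseline_accuracy` is a hypothetical helper, not part of the original notebook.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


# Label counts taken from the "support" column of the report above.
y_true = ["A_win"] * 22 + ["B_win"] * 12 + ["Draw"] * 17

baseline = majority_baseline_accuracy(y_true)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

On these counts the baseline is 22/51, about 0.43, so the model's 0.49 is a modest improvement over always predicting `A_win`.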

## Model trained on rolling window performance metrics

Import processed data and slice the Euro 24 games:

In [16]:
df_euro24_rolling = pd.read_pickle("../data/processed/rolling_performance.pkl")
# Keep only UEFA Euro games played from 14 June 2024, the opening day of Euro 24
df_euro24_rolling = df_euro24_rolling[
    (df_euro24_rolling["tournament"] == "UEFA Euro")
    & (df_euro24_rolling["date"] >= datetime(2024, 6, 14))
].copy()

Feature names:

In [22]:
rolling_feature_names = []

for ha in ["home", "away"]:
    for metric in ["win_ratio", "draw_ratio", "avg_goals_scored", "avg_goals_conceded"]:
        rolling_feature_names.append(f"{ha}_{metric}_roll730D")

rolling_feature_names.append("host_advantage")

Feature and target sets:

In [24]:
X_test_rolling = df_euro24_rolling[rolling_feature_names].copy()
y_test_rolling = df_euro24_rolling["result"]  # 1-D Series of labels rather than a one-column DataFrame

Load the model and predict:

In [32]:
rolling_model = joblib.load("../models/rolling_model.pkl")
y_pred_rolling = rolling_model.predict(X_test_rolling)
print(classification_report(y_test_rolling, y_pred_rolling))

              precision    recall  f1-score   support

    away_win       0.33      0.85      0.48        13
        draw       0.25      0.18      0.21        17
    home_win       0.50      0.14      0.22        21

    accuracy                           0.33        51
   macro avg       0.36      0.39      0.30        51
weighted avg       0.37      0.33      0.28        51



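Performance is poor: an accuracy of 0.33 is what random guessing among three classes would yield, and it falls below the 0.41 obtained by always predicting `home_win` (21 of 51 games).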
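The precision and recall columns can be turned back into approximate prediction counts, which shows where the model's predictions went. Recall is true positives over support, and precision is true positives over the number of times the class was predicted. The sketch below does this arithmetic with the rounded values from the report above, so the counts are estimates and not the model's actual output.

```python
# (precision, recall, support) copied from the report above.
report = {
    "away_win": (0.33, 0.85, 13),
    "draw": (0.25, 0.18, 17),
    "home_win": (0.50, 0.14, 21),
}

estimated = {}
for label, (precision, recall, support) in report.items():
    # recall = TP / support, so TP is roughly recall * support
    true_positives = round(recall * support)
    # precision = TP / predicted, so predicted is roughly TP / precision
    predicted = round(true_positives / precision)
    estimated[label] = (true_positives, predicted)
    print(f"{label}: ~{true_positives} correct out of ~{predicted} predictions")

total_correct = sum(tp for tp, _ in estimated.values())
total_predicted = sum(pred for _, pred in estimated.values())
print(f"Estimated accuracy: {total_correct}/{total_predicted}")
```

Under these estimates, roughly 33 of the 51 predictions are `away_win`, which would explain the high recall and low precision for that class.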

## Conclusion

Although the rolling-window model (trained to predict any match result) seemed to display better performance overall, the qualification-phase model (trained to predict only big-tournament results) performed better on the unseen Euro 24 data: 0.49 accuracy and 0.46 macro F1, against 0.33 and 0.30.
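Since both models are evaluated on only 51 games, it may help to get a rough sense of how much the measured accuracies could vary. The sketch below uses the normal approximation to a binomial proportion. `accuracy_interval` is a hypothetical helper, and the approximation is coarse for a sample this small.

```python
import math


def accuracy_interval(accuracy, n_games, z=1.96):
    """Approximate 95% interval for an accuracy measured on n_games samples."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n_games)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


# Accuracies taken from the two classification reports above.
for name, accuracy in [("qualification", 0.49), ("rolling", 0.33)]:
    low, high = accuracy_interval(accuracy, n_games=51)
    print(f"{name}: {accuracy:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

Under this approximation the two intervals overlap, so the gap between the models is suggestive but rests on a small sample.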