# Modelling for predicting match results from past performance with a rolling window

Dependencies

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime

Load the data prepared in the earlier notebooks and saved as a pickle file.

In [2]:
df_main = pd.read_pickle("../data/processed/rolling_performance.pkl")
df_main.info() ; df_main.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33907 entries, 0 to 33906
Data columns (total 44 columns):
 #   Column                            Non-Null Count  Dtype         
---  ------                            --------------  -----         
 0   date                              33907 non-null  datetime64[ns]
 1   home_team                         33907 non-null  object        
 2   away_team                         33907 non-null  object        
 3   home_score                        33907 non-null  int64         
 4   away_score                        33907 non-null  int64         
 5   tournament                        33907 non-null  object        
 6   city                              33907 non-null  object        
 7   country                           33907 non-null  object        
 8   neutral                           33907 non-null  bool          
 9   match_id                          33907 non-null  int64         
 10  home_win_ratio_roll183D           33907 non-nu

Unnamed: 0,date,home_team,away_team,home_score,away_score,tournament,city,country,neutral,match_id,...,away_win_ratio_roll548D,away_draw_ratio_roll548D,away_avg_goals_scored_roll548D,away_avg_goals_conceded_roll548D,away_win_ratio_roll730D,away_draw_ratio_roll730D,away_avg_goals_scored_roll730D,away_avg_goals_conceded_roll730D,host_advantage,result
0,1975-01-01,Cameroon,Guinea,1,0,Friendly,Yaoundé,Cameroon,False,870,...,0.111111,0.444444,1.222222,2.111111,0.277778,0.388889,1.611111,1.777778,1,home_win
1,1975-01-01,Iraq,Tunisia,0,0,Friendly,Baghdad,Iraq,False,871,...,0.647059,0.058824,1.647059,0.941176,0.52381,0.095238,1.47619,1.333333,1,draw
2,1975-01-03,Bermuda,Suriname,2,5,Friendly,Hamilton,Bermuda,False,872,...,0.444444,0.333333,1.666667,1.111111,0.444444,0.333333,1.666667,1.111111,1,away_win
3,1975-01-09,Iraq,Libya,3,1,Friendly,Baghdad,Iraq,False,874,...,0.466667,0.266667,2.066667,1.0,0.466667,0.266667,2.066667,1.0,1,home_win
4,1975-01-14,Kuwait,Libya,1,0,Friendly,Kuwait City,Kuwait,False,875,...,0.4375,0.25,2.0,1.125,0.4375,0.25,2.0,1.125,1,home_win


## Preparing training and test sets

We will build different models for different rolling windows. The dictionary below maps each window to its feature column names.

In [3]:
train_col_names = {}

for window in ["roll183D", "roll365D", "roll548D", "roll730D"]:
    col_names = []
    for ha in ["home", "away"]:
        for metric in ["win_ratio", "draw_ratio", "avg_goals_scored", "avg_goals_conceded"]:
            col_names.append(f"{ha}_{metric}_{window}")
    # Store the finished list once per window
    train_col_names[window] = col_names

# Flatten the per-window lists into one list covering all windows
train_col_names_all = []

for window in ["roll183D", "roll365D", "roll548D", "roll730D"]:
    train_col_names_all += train_col_names[window]
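As a quick sanity check, the same mapping can be written as a dictionary comprehension and its size verified: 8 features per window and 32 in total. This is a sketch of an equivalent construction, not the code the notebook ran; the names `windows`, `sides`, `metrics`, `feature_columns`, and `all_feature_columns` are introduced here for illustration only.

```python
# Illustrative rewrite of the column-name dictionary (variable names are assumptions)
windows = ["roll183D", "roll365D", "roll548D", "roll730D"]
sides = ["home", "away"]
metrics = ["win_ratio", "draw_ratio", "avg_goals_scored", "avg_goals_conceded"]

feature_columns = {
    window: [f"{side}_{metric}_{window}" for side in sides for metric in metrics]
    for window in windows
}
all_feature_columns = [col for cols in feature_columns.values() for col in cols]

# 2 sides x 4 metrics = 8 features per window; 4 windows = 32 features overall
print({window: len(cols) for window, cols in feature_columns.items()})
print(len(all_feature_columns))
```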

We want to optimise the models for predicting new results from historical information. So the train-test split will be made at a certain point in time, rather than randomly. To define a cut-off, let's check the number of matches per year.

In [4]:
df_main["year"] = df_main["date"].dt.year
df_main.value_counts("year", sort = False).sort_index(ascending = False).cumsum()

year
2024      590
2023     1571
2022     2481
2021     3541
2020     3804
2019     4894
2018     5743
2017     6605
2016     7473
2015     8435
2014     9200
2013    10115
2012    11066
2011    12111
2010    12884
2009    13744
2008    14774
2007    15700
2006    16453
2005    17196
2004    18216
2003    19087
2002    19753
2001    20714
2000    21664
1999    22362
1998    23031
1997    23879
1996    24647
1995    25243
1994    25769
1993    26491
1992    26993
1991    27428
1990    27835
1989    28391
1988    28847
1987    29167
1986    29538
1985    30082
1984    30536
1983    30923
1982    31283
1981    31737
1980    32173
1979    32528
1978    32848
1977    33254
1976    33574
1975    33907
Name: count, dtype: int64

2020 was an unusual year with relatively few games. Starting the test set in 2021 gives 3,541 observations, a little over 10% of the data.
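The arithmetic behind this choice can be checked directly from the cumulative counts in the table above. The sketch below only restates those numbers; the variable names are illustrative.

```python
# Cumulative match counts copied from the table above
total_matches = 33907   # all matches, 1975 to 2024
from_2021 = 3541        # matches played in 2021 or later
from_2020 = 3804        # matches played in 2020 or later

matches_in_2020 = from_2020 - from_2021
test_share = from_2021 / total_matches

print(f"matches in 2020: {matches_in_2020}")
print(f"test share with a 2021 cut-off: {test_share:.1%}")
```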

In [5]:
df_test = df_main[df_main["year"] > 2020].copy()
df_train = df_main[df_main["year"] <= 2020].copy()
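A time-based split should leave every row in exactly one set, with all training matches dated before all test matches. The sketch below checks both properties on a small made-up frame; `toy`, `toy_train`, and `toy_test` are hypothetical stand-ins, and the same assertions could be run on `df_train` and `df_test`.

```python
import pandas as pd

# Toy frame standing in for df_main (dates and results are made up)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2019-06-01", "2020-11-15", "2021-03-30", "2023-09-10"]),
    "result": ["home_win", "draw", "away_win", "home_win"],
})
toy["year"] = toy["date"].dt.year

toy_train = toy[toy["year"] <= 2020]
toy_test = toy[toy["year"] > 2020]

# Every row lands in exactly one set, and training ends before testing begins
assert len(toy_train) + len(toy_test) == len(toy)
assert toy_train["date"].max() < toy_test["date"].min()
```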

Separate the target from the features, once for each window and once for all windows combined.

In [6]:
y_train = df_train[["result"]] ; y_test = df_test[["result"]]
X_train = df_train[train_col_names_all] ; X_test = df_test[train_col_names_all]
X_train_183 = df_train[train_col_names["roll183D"]] ; X_test_183 = df_test[train_col_names["roll183D"]]
X_train_365 = df_train[train_col_names["roll365D"]] ; X_test_365 = df_test[train_col_names["roll365D"]]
X_train_548 = df_train[train_col_names["roll548D"]] ; X_test_548 = df_test[train_col_names["roll548D"]]
X_train_730 = df_train[train_col_names["roll730D"]] ; X_test_730 = df_test[train_col_names["roll730D"]]

## Comparing potential models with all windows

We set up a PyCaret workflow to compare candidate classification models. The setup uses time-series logic for cross-validation, in line with the train-test split. For initial model selection, we use features from all windows.
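To illustrate what a time-series fold strategy means, the sketch below applies scikit-learn's `TimeSeriesSplit` to eight time-ordered toy rows. It is an assumption here that PyCaret's `fold_strategy = "timeseries"` behaves like this splitter; the toy data and the choice of three splits are purely illustrative.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

rows = np.arange(8)  # eight toy observations, already sorted by date

# Each split yields (training indices, validation indices)
folds = list(TimeSeriesSplit(n_splits=3).split(rows))
for train_idx, valid_idx in folds:
    print("train:", train_idx.tolist(), "validate:", valid_idx.tolist())
```

Each fold trains on an expanding prefix of the data and validates on the rows that immediately follow, so no fold is validated on matches older than its training data.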

In [7]:
from pycaret.classification import *

In [8]:
clf = setup(
    data = pd.concat([X_train, y_train], axis = 1),
    target = "result",
    session_id = 42, 
    test_data = pd.concat([X_test, y_test], axis = 1),
    experiment_name = "rolling_model_all_windows",
    n_jobs = -1, 
    train_size = 0.8, 
    fix_imbalance = True,
    normalize = True, 
    data_split_shuffle = False, 
    fold_shuffle = False, 
    fold_strategy = "timeseries"
)

Unnamed: 0,Description,Value
0,Session id,42
1,Target,result
2,Target type,Multiclass
3,Target mapping,"away_win: 0, draw: 1, home_win: 2"
4,Original data shape,"(33907, 33)"
5,Transformed data shape,"(46453, 33)"
6,Transformed train set shape,"(42912, 33)"
7,Transformed test set shape,"(3541, 33)"
8,Numeric features,32
9,Preprocess,True


Let's base the model selection on F1 to achieve a balance between precision and recall.
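For a three-class target, a single F1 value has to be an average over classes. The sketch below assumes the reported figure is the support-weighted mean of the per-class F1 scores, which is consistent with Recall equalling Accuracy in the result tables; the helper `weighted_f1` and the toy labels are hypothetical and only show how such a score is computed.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of the per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # F1 is defined as 0 when the class is never predicted correctly
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += n_true * f1
    return total / len(y_true)

# Made-up labels: the single draw is never predicted correctly
y_true = ["home_win", "home_win", "home_win", "draw", "away_win", "away_win"]
y_pred = ["home_win", "home_win", "away_win", "home_win", "away_win", "draw"]

score = weighted_f1(y_true, y_pred)
print(round(score, 3))
```

Because each class is weighted by its frequency, a model can reach a reasonable weighted F1 while still doing poorly on the rarest class.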

In [9]:
model_selection = compare_models(sort = "f1", exclude = ["lightgbm", "catboost"])

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
ridge,Ridge Classifier,0.4838,0.0,0.4838,0.4894,0.4842,0.1986,0.1997,0.138
lr,Logistic Regression,0.4766,0.0,0.4766,0.4976,0.4839,0.198,0.1994,0.59
rf,Random Forest Classifier,0.4892,0.6502,0.4892,0.4773,0.4818,0.1881,0.1889,2.727
gbc,Gradient Boosting Classifier,0.486,0.0,0.486,0.4796,0.4814,0.189,0.1897,15.892
lda,Linear Discriminant Analysis,0.47,0.0,0.47,0.5013,0.48,0.1942,0.1968,0.173
et,Extra Trees Classifier,0.492,0.6489,0.492,0.4733,0.4797,0.184,0.1855,1.621
ada,Ada Boost Classifier,0.4659,0.0,0.4659,0.4818,0.4715,0.1767,0.1777,1.115
svm,SVM - Linear Kernel,0.4821,0.0,0.4821,0.4737,0.4694,0.1837,0.1874,0.359
xgboost,Extreme Gradient Boosting,0.4755,0.6358,0.4755,0.4604,0.4651,0.1594,0.1609,1.419
nb,Naive Bayes,0.4059,0.6533,0.4059,0.5353,0.4117,0.1476,0.174,0.142


Let's proceed with the top three models by F1: ridge, LR, and RF.

### Hyperparameter tuning for ridge classifier

In [10]:
model_ridge = create_model("ridge")

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4286,0.0,0.4286,0.4572,0.4374,0.1374,0.1392
1,0.458,0.0,0.458,0.4749,0.4637,0.1775,0.1786
2,0.5069,0.0,0.5069,0.5084,0.5036,0.2237,0.2258
3,0.5011,0.0,0.5011,0.4992,0.4978,0.2167,0.2179
4,0.4884,0.0,0.4884,0.4906,0.4889,0.2037,0.2039
5,0.4964,0.0,0.4964,0.4955,0.494,0.2115,0.2124
6,0.4935,0.0,0.4935,0.4999,0.4955,0.212,0.2126
7,0.4808,0.0,0.4808,0.486,0.4809,0.1946,0.1957
8,0.492,0.0,0.492,0.4934,0.4914,0.2026,0.2032
9,0.4928,0.0,0.4928,0.4885,0.489,0.2064,0.2072


In [11]:
model_ridge_tuned = tune_model(
    model_ridge,
    optimize = "f1", 
    search_library = "scikit-learn",
    search_algorithm = "grid", 
    custom_grid = {
        "alpha": np.logspace(-2, 2, 21).tolist()
    }, 
    fold = 10
)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4322,0.0,0.4322,0.4608,0.4412,0.1424,0.1443
1,0.4562,0.0,0.4562,0.4739,0.4621,0.1751,0.1764
2,0.5062,0.0,0.5062,0.5068,0.5023,0.2225,0.2246
3,0.5029,0.0,0.5029,0.5006,0.4993,0.2191,0.2203
4,0.4899,0.0,0.4899,0.4921,0.4904,0.206,0.2063
5,0.4967,0.0,0.4967,0.4959,0.4944,0.2121,0.213
6,0.4938,0.0,0.4938,0.5002,0.4958,0.2125,0.213
7,0.4812,0.0,0.4812,0.4862,0.4812,0.195,0.1961
8,0.4924,0.0,0.4924,0.494,0.4919,0.2033,0.2039
9,0.4928,0.0,0.4928,0.4887,0.4891,0.2065,0.2073


Fitting 10 folds for each of 21 candidates, totalling 210 fits


### Hyperparameter tuning for logistic regression

In [12]:
model_lr = create_model("lr")

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.429,0.0,0.429,0.4621,0.4387,0.1395,0.142
1,0.4518,0.0,0.4518,0.4778,0.4599,0.1727,0.1749
2,0.4967,0.0,0.4967,0.5199,0.5033,0.2243,0.2267
3,0.4953,0.0,0.4953,0.5087,0.5005,0.2187,0.2194
4,0.4851,0.0,0.4851,0.5024,0.4919,0.2093,0.2102
5,0.4993,0.0,0.4993,0.5127,0.5044,0.2252,0.226
6,0.4808,0.0,0.4808,0.5117,0.4925,0.2073,0.2093
7,0.4674,0.0,0.4674,0.4904,0.4752,0.1869,0.1885
8,0.4772,0.0,0.4772,0.4968,0.485,0.1923,0.1932
9,0.4837,0.0,0.4837,0.4935,0.4875,0.2036,0.2041


In [13]:
model_lr_tuned = tune_model(
    model_lr,
    optimize = "f1", 
    search_library = "scikit-learn",
    search_algorithm = "grid", 
    custom_grid = {
        "penalty": ["l1", "l2"],
        "C": np.logspace(-2, 2, 21).tolist(),
        "solver" : ["liblinear"]
    }, 
    fold = 10
)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4471,0.0,0.4471,0.4564,0.4497,0.1535,0.1542
1,0.4638,0.0,0.4638,0.4751,0.4672,0.1835,0.1844
2,0.508,0.0,0.508,0.5115,0.5056,0.227,0.2292
3,0.5036,0.0,0.5036,0.5073,0.5034,0.2249,0.2259
4,0.492,0.0,0.492,0.4989,0.4947,0.2131,0.2134
5,0.5051,0.0,0.5051,0.5081,0.505,0.2271,0.2279
6,0.4935,0.0,0.4935,0.5062,0.4984,0.2162,0.2169
7,0.4779,0.0,0.4779,0.4872,0.4802,0.1935,0.1946
8,0.4873,0.0,0.4873,0.4941,0.4895,0.1993,0.1998
9,0.4848,0.0,0.4848,0.4855,0.4839,0.1992,0.1997


Fitting 10 folds for each of 42 candidates, totalling 420 fits
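The fit counts reported in the grid-search logs above follow from the grid sizes: the number of parameter combinations times the number of folds. The sketch below reproduces that arithmetic for the two grids used so far; the helper `n_candidates` and the variable names are illustrative.

```python
from itertools import product
import numpy as np

# Grids copied from the tune_model calls above
ridge_grid = {"alpha": np.logspace(-2, 2, 21).tolist()}
lr_grid = {
    "penalty": ["l1", "l2"],
    "C": np.logspace(-2, 2, 21).tolist(),
    "solver": ["liblinear"],
}

def n_candidates(grid):
    """Number of parameter combinations in an exhaustive grid."""
    return len(list(product(*grid.values())))

n_folds = 10
for name, grid in [("ridge", ridge_grid), ("lr", lr_grid)]:
    print(name, n_candidates(grid), "candidates,", n_candidates(grid) * n_folds, "fits")
```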


### Hyperparameter tuning for random forest

In [17]:
model_rf = create_model("rf")

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4529,0.6124,0.4529,0.4434,0.447,0.1364,0.1368
1,0.4678,0.6361,0.4678,0.4652,0.4663,0.1722,0.1723
2,0.5033,0.6593,0.5033,0.4941,0.4972,0.2063,0.207
3,0.5029,0.6646,0.5029,0.4904,0.4954,0.2065,0.2072
4,0.4862,0.6542,0.4862,0.4781,0.4818,0.1881,0.1883
5,0.5018,0.6607,0.5018,0.4883,0.4938,0.2046,0.2053
6,0.4924,0.6549,0.4924,0.4877,0.4899,0.1977,0.1977
7,0.4837,0.6487,0.4837,0.4735,0.4776,0.1837,0.1842
8,0.5062,0.6637,0.5062,0.4846,0.4925,0.1989,0.2004
9,0.4949,0.6471,0.4949,0.4678,0.4766,0.1871,0.1896


In [18]:
model_rf_tuned = tune_model(
    model_rf, 
    optimize = "f1",
    search_library = "scikit-optimize",
    search_algorithm = "bayesian", 
    custom_grid = {
        "n_estimators": [10, 50, 100, 250, 500, 1000],
        "min_samples_split": np.linspace(2, 66, 17).astype(int),
        "min_samples_leaf": np.linspace(1, 33, 17).astype(int),
        "max_depth": np.linspace(4, 40, 19).astype(int),
        "max_features": ["sqrt", "log2"]
    }, 
    fold = 10, 
    n_iter = 25
)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4402,0.6281,0.4402,0.4633,0.4482,0.1478,0.1492
1,0.4598,0.6521,0.4598,0.4776,0.4662,0.1788,0.1799
2,0.4928,0.6705,0.4928,0.5002,0.4927,0.2073,0.2091
3,0.4964,0.6742,0.4964,0.5026,0.4985,0.2151,0.2156
4,0.4855,0.6713,0.4855,0.5037,0.4927,0.2094,0.2104
5,0.4967,0.6736,0.4967,0.5107,0.5025,0.2205,0.2211
6,0.4743,0.6739,0.4743,0.5022,0.4853,0.1947,0.1962
7,0.4779,0.6612,0.4779,0.4963,0.4847,0.1985,0.1996
8,0.4931,0.6741,0.4931,0.4933,0.4932,0.2,0.2
9,0.4993,0.6652,0.4993,0.4915,0.4948,0.2121,0.2123


Fitting 10 folds for each of 1 candidates, totalling 10 fits

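Averaged over the folds shown above, the tuned logistic regression has the highest F1 (about 0.488, compared with about 0.486 for the tuned random forest and 0.485 for the tuned ridge classifier). The margins are small, but on this basis we will proceed with logistic regression.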

### Evaluating the selected model

In [None]:
evaluate_model(model_lr_tuned)

Based on the feature importances, the 730-day window appears to be the time frame in which past performance is most predictive.

## Comparing potential models with the window of 730 days

In [8]:
clf_w730 = setup(
    data = pd.concat([X_train_730, y_train], axis = 1),
    target = "result",
    session_id = 42, 
    test_data = pd.concat([X_test_730, y_test], axis = 1),
    experiment_name = "rolling_model_window730",
    n_jobs = -1, 
    train_size = 0.8, 
    fix_imbalance = True,
    normalize = True, 
    data_split_shuffle = False, 
    fold_shuffle = False, 
    fold_strategy = "timeseries"
)

Unnamed: 0,Description,Value
0,Session id,42
1,Target,result
2,Target type,Multiclass
3,Target mapping,"away_win: 0, draw: 1, home_win: 2"
4,Original data shape,"(33907, 9)"
5,Transformed data shape,"(46453, 9)"
6,Transformed train set shape,"(42912, 9)"
7,Transformed test set shape,"(3541, 9)"
8,Numeric features,8
9,Preprocess,True


In [9]:
model_selection_w730 = compare_models(sort = "f1", exclude = ["lightgbm", "catboost"])

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
gbc,Gradient Boosting Classifier,0.4819,0.0,0.4819,0.4941,0.4862,0.199,0.1999,5.016
ridge,Ridge Classifier,0.485,0.0,0.485,0.488,0.4839,0.1987,0.1999,0.069
lr,Logistic Regression,0.4766,0.0,0.4766,0.4967,0.4835,0.197,0.1984,0.444
lda,Linear Discriminant Analysis,0.4711,0.0,0.4711,0.5031,0.4811,0.196,0.1988,0.084
ada,Ada Boost Classifier,0.4727,0.0,0.4727,0.496,0.4805,0.194,0.1957,0.484
et,Extra Trees Classifier,0.4815,0.6415,0.4815,0.4679,0.4732,0.1737,0.1744,0.97
rf,Random Forest Classifier,0.474,0.6398,0.474,0.4682,0.4705,0.1704,0.1707,1.576
xgboost,Extreme Gradient Boosting,0.4658,0.6369,0.4658,0.4693,0.467,0.1647,0.165,0.569
svm,SVM - Linear Kernel,0.478,0.0,0.478,0.4668,0.4542,0.1731,0.1805,0.127
nb,Naive Bayes,0.4061,0.6584,0.4061,0.5425,0.411,0.1504,0.1792,0.077


### Hyperparameter tuning for gradient boosting classifier

In [10]:
model_gbc_w730 = create_model("gbc")

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4413,0.0,0.4413,0.4607,0.4486,0.1477,0.1487
1,0.4688,0.0,0.4688,0.4819,0.474,0.1863,0.187
2,0.4957,0.0,0.4957,0.5057,0.4956,0.2151,0.2176
3,0.483,0.0,0.483,0.4944,0.4872,0.1986,0.1993
4,0.4891,0.0,0.4891,0.5025,0.4948,0.2121,0.2127
5,0.4986,0.0,0.4986,0.5064,0.5019,0.2188,0.219
6,0.4757,0.0,0.4757,0.4984,0.4846,0.1953,0.1965
7,0.4815,0.0,0.4815,0.499,0.4877,0.2043,0.2055
8,0.4935,0.0,0.4935,0.501,0.4969,0.2068,0.2069
9,0.492,0.0,0.492,0.4906,0.4912,0.2053,0.2054


In [11]:
model_gbc_w730_tuned = tune_model(
    model_gbc_w730,
    optimize = "f1", 
    search_library = "scikit-optimize",
    search_algorithm = "bayesian", 
    custom_grid = {
        "n_estimators": [50, 100, 250, 500, 1000],
        "learning_rate" : [0.001, 0.01, 0.1, 1.0],
        "subsample" : [0.5, 0.8, 1.0], 
        "min_samples_split": np.linspace(2, 66, 17).astype(int),
        "min_samples_leaf": np.linspace(1, 33, 17).astype(int),
        "max_depth": np.linspace(4, 40, 19).astype(int), 
        "max_features": ["sqrt", "log2"]
    }, 
    fold = 5,
    n_iter = 10
)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4756,0.0,0.4756,0.4813,0.4781,0.1889,0.1891
1,0.4859,0.0,0.4859,0.4946,0.4889,0.2026,0.2032
2,0.4975,0.0,0.4975,0.4997,0.4981,0.2101,0.2103
3,0.4716,0.0,0.4716,0.4815,0.4759,0.183,0.1834
4,0.4928,0.0,0.4928,0.4854,0.4886,0.1982,0.1985
Mean,0.4847,0.0,0.4847,0.4885,0.4859,0.1966,0.1969
Std,0.0098,0.0,0.0098,0.0074,0.0081,0.0096,0.0097


Fitting 5 folds for each of 1 candidates, totalling 5 fits
Original model was better than the tuned model, hence it will be returned. NOTE: The display metrics are for the tuned model (not the original one).


### Hyperparameter tuning for logistic regression

In [12]:
model_lr_w730 = create_model("lr")

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4395,0.0,0.4395,0.4724,0.4494,0.1547,0.1572
1,0.4562,0.0,0.4562,0.4856,0.4648,0.1817,0.1844
2,0.5025,0.0,0.5025,0.5178,0.5059,0.2275,0.2296
3,0.4964,0.0,0.4964,0.51,0.5016,0.2206,0.2214
4,0.4801,0.0,0.4801,0.4956,0.4864,0.2,0.2007
5,0.4935,0.0,0.4935,0.5074,0.4989,0.2163,0.2171
6,0.4812,0.0,0.4812,0.5082,0.4917,0.2054,0.2069
7,0.4652,0.0,0.4652,0.487,0.473,0.1816,0.183
8,0.4804,0.0,0.4804,0.4988,0.4879,0.1961,0.1969
9,0.4707,0.0,0.4707,0.4841,0.4759,0.1861,0.1867


In [16]:
model_lr_w730_tuned = tune_model(
    model_lr_w730, 
    optimize = "f1",
    search_library = "scikit-optimize",
    search_algorithm = "bayesian", 
    custom_grid = {
        "penalty": ["elasticnet"],
        "C": np.logspace(-2, 2, 21).tolist(),
        "l1_ratio" : np.linspace(0, 1, 11).tolist(),
        "solver" : ["saga"]
    }, 
    fold = 10, 
    n_iter = 50
)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4326,0.0,0.4326,0.4632,0.4427,0.1421,0.144
1,0.4605,0.0,0.4605,0.4886,0.4688,0.1878,0.1903
2,0.5033,0.0,0.5033,0.52,0.5072,0.2297,0.2319
3,0.4993,0.0,0.4993,0.5116,0.5039,0.2243,0.2251
4,0.4841,0.0,0.4841,0.5002,0.4905,0.2067,0.2075
5,0.4964,0.0,0.4964,0.5095,0.5013,0.2207,0.2215
6,0.4841,0.0,0.4841,0.5111,0.4946,0.2098,0.2113
7,0.4667,0.0,0.4667,0.4877,0.474,0.1836,0.1849
8,0.4819,0.0,0.4819,0.499,0.4888,0.1975,0.1983
9,0.4725,0.0,0.4725,0.4858,0.4777,0.1888,0.1895


Fitting 10 folds for each of 1 candidates, totalling 10 fits

In [17]:
model_lr_w730_tuned_2 = tune_model(
    model_lr_w730, 
    optimize = "f1",
    search_library = "scikit-learn",
    search_algorithm = "grid", 
    custom_grid = {
        "penalty": ["l1", "l2"],
        "C": np.logspace(-2, 2, 21).tolist(),
        "solver" : ["liblinear"]
    }, 
    fold = 10
)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4569,0.0,0.4569,0.4653,0.4598,0.1659,0.1663
1,0.4678,0.0,0.4678,0.4816,0.472,0.1914,0.1926
2,0.5033,0.0,0.5033,0.5072,0.5001,0.2206,0.2233
3,0.5062,0.0,0.5062,0.5072,0.5047,0.2268,0.2278
4,0.4888,0.0,0.4888,0.4937,0.4905,0.2066,0.2069
5,0.4993,0.0,0.4993,0.5021,0.499,0.2186,0.2194
6,0.4877,0.0,0.4877,0.4989,0.492,0.2061,0.2067
7,0.4717,0.0,0.4717,0.4797,0.4736,0.1823,0.1831
8,0.4917,0.0,0.4917,0.4952,0.4925,0.2031,0.2035
9,0.4761,0.0,0.4761,0.4776,0.4757,0.1863,0.1868


Fitting 10 folds for each of 42 candidates, totalling 420 fits


### Hyperparameter tuning for ridge classifier

In [18]:
model_ridge_w730 = create_model("ridge")

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4373,0.0,0.4373,0.4587,0.4448,0.1452,0.1463
1,0.4641,0.0,0.4641,0.4846,0.4701,0.1902,0.192
2,0.5101,0.0,0.5101,0.5045,0.5012,0.2244,0.2277
3,0.4964,0.0,0.4964,0.4928,0.4917,0.2081,0.2095
4,0.4917,0.0,0.4917,0.4922,0.4912,0.2079,0.2082
5,0.5047,0.0,0.5047,0.5034,0.5023,0.2238,0.2247
6,0.4957,0.0,0.4957,0.4967,0.495,0.2115,0.2121
7,0.4783,0.0,0.4783,0.4792,0.476,0.1879,0.1891
8,0.4928,0.0,0.4928,0.4913,0.4907,0.2009,0.2015
9,0.479,0.0,0.479,0.4761,0.4758,0.1871,0.1878


In [19]:
model_ridge_w730_tuned = tune_model(
    model_ridge_w730,
    optimize = "f1", 
    search_library = "scikit-learn",
    search_algorithm = "grid", 
    custom_grid = {
        "alpha": np.logspace(-2, 2, 21)
    }, 
    fold = 10
)

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.4388,0.0,0.4388,0.4594,0.446,0.1468,0.1479
1,0.4638,0.0,0.4638,0.484,0.4697,0.1894,0.1912
2,0.5087,0.0,0.5087,0.503,0.4999,0.2221,0.2253
3,0.4967,0.0,0.4967,0.4931,0.4921,0.2085,0.2099
4,0.492,0.0,0.492,0.4927,0.4916,0.2085,0.2088
5,0.5043,0.0,0.5043,0.503,0.5018,0.2232,0.2241
6,0.4957,0.0,0.4957,0.4967,0.495,0.2115,0.2121
7,0.4786,0.0,0.4786,0.4796,0.4764,0.1886,0.1898
8,0.4928,0.0,0.4928,0.4913,0.4906,0.2008,0.2015
9,0.4793,0.0,0.4793,0.4764,0.4761,0.1876,0.1883


Fitting 10 folds for each of 21 candidates, totalling 210 fits
Original model was better than the tuned model, hence it will be returned. NOTE: The display metrics are for the tuned model (not the original one).


### Evaluating the selected model

The difference in F1 between LR and GBC is negligible, while GBC is far more expensive to train. So we proceed with LR.
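To put rough numbers on this trade-off, the sketch below uses the F1 and training-time (TT) values of the default models from the 730-day `compare_models` table. Whether these are the exact figures behind the decision is an assumption; the dictionaries are just a convenient way to hold the table values.

```python
# F1 and training time (TT, seconds) from the 730-day compare_models table
gbc = {"f1": 0.4862, "tt_sec": 5.016}
lr = {"f1": 0.4835, "tt_sec": 0.444}

f1_gain = gbc["f1"] - lr["f1"]
time_ratio = gbc["tt_sec"] / lr["tt_sec"]

print(f"F1 gain of GBC over LR: {f1_gain:.4f}")
print(f"GBC training time relative to LR: {time_ratio:.1f}x")
```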

In [20]:
evaluate_model(model_lr_w730_tuned)

The model is quite poor at predicting draws (class 1), the minority class, and rather good at predicting home wins (class 2), the majority class. Predicted probabilities should therefore be taken into account, not just the predicted class.
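One way to take probabilities into account is to report the winning class together with its probability and to flag predictions where no outcome is clearly favoured. The sketch below uses made-up probabilities and a hypothetical threshold of 0.5; only the class order is taken from the target mapping reported by `setup`.

```python
import numpy as np

# Class order follows the target mapping: away_win: 0, draw: 1, home_win: 2
classes = np.array(["away_win", "draw", "home_win"])

# Made-up predicted probabilities for three matches (each row sums to 1)
proba = np.array([
    [0.15, 0.20, 0.65],
    [0.36, 0.31, 0.33],
    [0.55, 0.25, 0.20],
])

predicted = classes[proba.argmax(axis=1)]
confidence = proba.max(axis=1)
threshold = 0.5  # hypothetical cut-off, not taken from the notebook

for label, conf in zip(predicted, confidence):
    note = "" if conf >= threshold else " (low confidence)"
    print(f"{label}: {conf:.2f}{note}")
```

In the second toy match the three outcomes are nearly equally likely, so the predicted class on its own would overstate how much the model knows.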

## Preparing the model for deployment

In [22]:
model_lr_final = finalize_model(model_lr_w730_tuned)
save_model(model_lr_final, "../models/rolling_model")

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=Memory(location=None),
          steps=[('label_encoding',
                  TransformerWrapperWithInverse(exclude=None, include=None,
                                                transformer=LabelEncoder())),
                 ('numerical_imputer',
                  TransformerWrapper(exclude=None,
                                     include=['home_win_ratio_roll730D',
                                              'home_draw_ratio_roll730D',
                                              'home_avg_goals_scored_roll730D',
                                              'home_avg_goals_conceded_roll730D',
                                              'away_win_ratio_rol...
                  TransformerWrapper(exclude=None, include=None,
                                     transformer=StandardScaler(copy=True,
                                                                with_mean=True,
                                                                with_std=True))),
