# Match Importance

The literature suggests that the importance of a match has a significant impact on the performance of the participating teams.

For example, suppose that winning a game significantly increases team A's probability of winning the Premier League, whereas the result has little effect on any outcome for team B. Team A is then likely to outperform.
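One plausible way to make this idea concrete, offered here as an assumption rather than as the definition used by the `importance` module, is to measure how much a team's probability of reaching an outcome changes between winning and losing the match. The sketch below estimates that difference from simulated seasons; `importance_estimate` and its arguments are hypothetical names.

```python
import numpy as np

def importance_estimate(won_match, achieved_outcome):
    """Estimate P(outcome | win) - P(outcome | no win) from simulated seasons.

    won_match:        one 0/1 flag per simulation, 1 if the team won the match
    achieved_outcome: one 0/1 flag per simulation, 1 if the team reached the outcome
    """
    won = np.asarray(won_match, dtype=bool)
    achieved = np.asarray(achieved_outcome, dtype=float)
    # Both conditional probabilities need at least one simulation each
    if won.all() or not won.any():
        return float("nan")
    return float(achieved[won].mean() - achieved[~won].mean())

# Eight toy simulations: the team reaches the outcome in 3 of the 4 seasons
# where it wins the match, and in 1 of the 4 seasons where it does not.
won = [1, 1, 1, 1, 0, 0, 0, 0]
achieved = [1, 1, 1, 0, 1, 0, 0, 0]
print(importance_estimate(won, achieved))  # 0.75 - 0.25 = 0.5
```

A value near zero means the match barely moves the team's chances, which is the situation described for team B.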

In this notebook, we demonstrate the functions provided for calculating match importance.

## Imports and Debugging

In [1]:
import importlib
import logging
import numpy as np
import os
import pandas as pd
import sys

sys.path.append(os.path.abspath(os.path.join('..', 'src')))

import utils
import importance
import elo

importlib.reload(utils)
importlib.reload(importance)
importlib.reload(elo)

<module 'elo' from '/Users/mattgc/code/match-importance/src/elo.py'>

**Recommended**: Leave the level at INFO unless an error is occurring.

In [2]:
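inp = input("Level of logging (DEBUG, INFO, WARNING, ERROR, CRITICAL): ")
# Normalise the input and fall back to INFO if the name is not a known level
level = getattr(logging, inp.strip().upper(), logging.INFO)
logging.basicConfig(level=level, force=True)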

In [3]:
# Global variables
SEASON_17_18 = os.path.abspath(os.path.join('..', 'data', 'seasons', '17_18.csv'))
LATEST_SEASON = os.path.abspath(os.path.join('..', 'data', 'seasons', '24_25.csv'))
REMAINING_SEASON = os.path.abspath(os.path.join('..', 'data', 'seasons', '24_25_remaining.csv'))
TEST_MATCHES = os.path.abspath(os.path.join('..', 'data', 'epl-test.csv'))
ELO = os.path.abspath(os.path.join('..', 'data', 'match_elo.csv'))

# Predict test matches

This section predicts the importance of matches scheduled for February 2025 (as of December 2024). The code repeatedly simulates the remaining matches of the season and estimates the probability that each participating team achieves a given outcome, conditional on winning or losing the match of interest.

Outcomes examined are:

- Qualifying for the UEFA Champions League
- Winning the league
- Being relegated from the league
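As a rough illustration of how these three outcomes could be read off a final table, the sketch below assumes that the top four places qualify for the Champions League and the bottom three are relegated. Those cut-offs, the helper name `outcome_indicators`, and the toy table are assumptions for this example; ties on points are not broken by goal difference here.

```python
import pandas as pd

def outcome_indicators(standings, team, cl_places=4, relegation_places=3):
    """Return (wins_league, relegated, qualifies_cl) as 0/1 flags for `team`.

    `standings` needs "Team" and "Points" columns. The place cut-offs are
    assumptions passed in as parameters, not values taken from the project.
    """
    ranked = standings.sort_values("Points", ascending=False).reset_index(drop=True)
    position = int(ranked.index[ranked["Team"] == team][0]) + 1  # 1-based rank
    n_teams = len(ranked)
    return (
        int(position == 1),
        int(position > n_teams - relegation_places),
        int(position <= cl_places),
    )

# Toy eight-team table with distinct points totals
table = pd.DataFrame({
    "Team": list("ABCDEFGH"),
    "Points": [80, 75, 70, 65, 50, 40, 30, 20],
})
print(outcome_indicators(table, "A"))  # champion, also inside the top four
print(outcome_indicators(table, "F"))  # inside the bottom three
```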

## Import data

In [4]:
matches = pd.read_csv(LATEST_SEASON)
matches["Date"] = pd.to_datetime(matches["Date"], dayfirst=True).dt.date
matches = matches.sort_values(by=["Date", "Time"]).reset_index(drop=True)
t = len(matches)
matches

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,...,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA,BFECAHH,BFECAHA
0,E0,2024-08-16,20:00,Man United,Fulham,1,0,H,0,0,...,1.86,2.07,1.83,2.11,1.88,2.11,1.82,2.05,1.90,2.08
1,E0,2024-08-17,12:30,Ipswich,Liverpool,0,2,A,0,0,...,2.05,1.88,2.04,1.90,2.20,2.00,1.99,1.88,2.04,1.93
2,E0,2024-08-17,15:00,Arsenal,Wolves,2,0,H,1,0,...,2.02,1.91,2.00,1.90,2.05,1.93,1.99,1.87,2.02,1.96
3,E0,2024-08-17,15:00,Everton,Brighton,0,3,A,0,1,...,1.87,2.06,1.86,2.07,1.92,2.10,1.83,2.04,1.88,2.11
4,E0,2024-08-17,15:00,Newcastle,Southampton,1,0,H,1,0,...,1.87,2.06,1.88,2.06,1.89,2.10,1.82,2.05,1.89,2.10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
153,E0,2024-12-14,17:30,Nott'm Forest,Aston Villa,2,1,H,0,0,...,1.94,1.99,1.93,1.99,2.00,2.00,1.93,1.95,1.95,2.04
154,E0,2024-12-15,14:00,Brighton,Crystal Palace,1,3,A,0,2,...,1.97,1.96,1.93,2.00,1.97,2.01,1.90,1.96,1.96,2.03
155,E0,2024-12-15,16:30,Man City,Man United,1,2,A,1,0,...,2.05,1.75,2.11,1.81,2.34,1.86,2.11,1.77,2.11,1.88
156,E0,2024-12-15,19:00,Chelsea,Brentford,2,1,H,1,0,...,1.91,2.02,1.88,2.04,1.99,2.04,1.93,1.94,2.01,1.98


In [5]:
remaining = pd.read_csv(REMAINING_SEASON).drop(columns=["Attendance"])
remaining["Date"] = pd.to_datetime(remaining["Date"]).dt.date
matches = pd.concat([matches, remaining], ignore_index=True)
matches

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,...,B365CAHH,B365CAHA,PCAHH,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA,BFECAHH,BFECAHA
0,E0,2024-08-16,20:00,Man United,Fulham,1.0,0.0,H,0.0,0.0,...,1.86,2.07,1.83,2.11,1.88,2.11,1.82,2.05,1.90,2.08
1,E0,2024-08-17,12:30,Ipswich,Liverpool,0.0,2.0,A,0.0,0.0,...,2.05,1.88,2.04,1.90,2.20,2.00,1.99,1.88,2.04,1.93
2,E0,2024-08-17,15:00,Arsenal,Wolves,2.0,0.0,H,1.0,0.0,...,2.02,1.91,2.00,1.90,2.05,1.93,1.99,1.87,2.02,1.96
3,E0,2024-08-17,15:00,Everton,Brighton,0.0,3.0,A,0.0,1.0,...,1.87,2.06,1.86,2.07,1.92,2.10,1.83,2.04,1.88,2.11
4,E0,2024-08-17,15:00,Newcastle,Southampton,1.0,0.0,H,1.0,0.0,...,1.87,2.06,1.88,2.06,1.89,2.10,1.82,2.05,1.89,2.10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,,2025-05-25,,Newcastle,Everton,,,,,,...,,,,,,,,,,
376,,2025-05-25,,Nott'm Forest,Chelsea,,,,,,...,,,,,,,,,,
377,,2025-05-25,,Southampton,Arsenal,,,,,,...,,,,,,,,,,
378,,2025-05-25,,Tottenham,Brighton,,,,,,...,,,,,,,,,,


In [6]:
to_predict = pd.read_csv(TEST_MATCHES)
to_predict

Unnamed: 0,Date,HomeTeam,AwayTeam
0,01-Feb-25,Bournemouth,Liverpool
1,01-Feb-25,Arsenal,Man City
2,01-Feb-25,Brentford,Tottenham
3,01-Feb-25,Chelsea,West Ham
4,01-Feb-25,Everton,Leicester
5,01-Feb-25,Ipswich,Southampton
6,01-Feb-25,Man United,Crystal Palace
7,01-Feb-25,Newcastle,Fulham
8,01-Feb-25,Nott'm Forest,Brighton
9,01-Feb-25,Wolves,Aston Villa


## Pre-processing

To predict match outcomes, the probabilities of a home win, a draw, and an away win are skewed according to a `TeamEloWinProb` feature (credit to Michal). In this section, we calculate this feature for the 2024/25 season.

For matches that have not been played yet, we carry the most recent value forward.
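The `TeamEloWinProb` values printed in this notebook appear consistent with the standard Elo expected-score formula without a home-advantage term. For the opening fixture, Man United (Elo 1779.04) against Fulham (Elo 1716.28), that formula gives roughly the 0.589 shown in the table. This is an inference from the numbers, not a statement about how `elo.get` is implemented, and `elo_win_probability` is a hypothetical helper.

```python
def elo_win_probability(home_elo, away_elo):
    """Standard Elo expected score for the home side (no home-advantage term)."""
    return 1.0 / (1.0 + 10.0 ** ((away_elo - home_elo) / 400.0))

# Ratings taken from the first row of the merged table in this notebook
man_united, fulham = 1779.043945, 1716.276367
print(round(elo_win_probability(man_united, fulham), 3))

# Evenly matched teams get exactly one half
print(elo_win_probability(1700.0, 1700.0))
```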

In [7]:
elos = elo.get(matches)
elos["Date"] = pd.to_datetime(elos["Date"]).dt.date
elos["HomeTeam"] = elos["HomeTeam"].replace("Forest", "Nott'm Forest")
elos["AwayTeam"] = elos["AwayTeam"].replace("Forest", "Nott'm Forest")
matches = matches.merge(
    elos,
    on=["Date", "HomeTeam", "AwayTeam"]
)
matches

INFO:root:Saving cached data to /Users/mattgc/soccerdata/data/ClubElo
100%|██████████| 380/380 [00:00<00:00, 545.94it/s]


Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,...,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA,BFECAHH,BFECAHA,HomeElo,AwayElo,TeamEloWinProb
0,E0,2024-08-16,20:00,Man United,Fulham,1.0,0.0,H,0.0,0.0,...,2.11,1.88,2.11,1.82,2.05,1.90,2.08,1779.043945,1716.276367,0.589360
1,E0,2024-08-17,12:30,Ipswich,Liverpool,0.0,2.0,A,0.0,0.0,...,1.90,2.20,2.00,1.99,1.88,2.04,1.93,1568.325562,1900.688354,0.128618
2,E0,2024-08-17,15:00,Arsenal,Wolves,2.0,0.0,H,1.0,0.0,...,1.90,2.05,1.93,1.99,1.87,2.02,1.96,1946.902832,1677.862305,0.824729
3,E0,2024-08-17,15:00,Everton,Brighton,0.0,3.0,A,0.0,1.0,...,2.07,1.92,2.10,1.83,2.04,1.88,2.11,1706.850830,1713.163208,0.490917
4,E0,2024-08-17,15:00,Newcastle,Southampton,1.0,0.0,H,1.0,0.0,...,2.06,1.89,2.10,1.82,2.05,1.89,2.10,1801.797119,1599.603394,0.762044
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,,2025-05-25,,Newcastle,Everton,,,,,,...,,,,,,,,1803.491943,1704.383423,0.638882
376,,2025-05-25,,Nott'm Forest,Chelsea,,,,,,...,,,,,,,,1722.681763,1896.247314,0.269114
377,,2025-05-25,,Southampton,Arsenal,,,,,,...,,,,,,,,1579.600464,1983.363647,0.089135
378,,2025-05-25,,Tottenham,Brighton,,,,,,...,,,,,,,,1812.923828,1765.461060,0.567883


## Run a single simulation

In [8]:
standings = utils.calculate_standings(matches, 0, t)
standings

Unnamed: 0,Team,Points
0,Liverpool,37
1,Chelsea,34
2,Arsenal,30
3,Nott'm Forest,28
4,Man City,27
5,Aston Villa,25
6,Bournemouth,24
7,Fulham,24
8,Brighton,24
9,Brentford,23
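The table above is a points tally. As a minimal sketch, assuming the usual league scoring of 3 points for a win, 1 for a draw and 0 for a loss, a table like this could be derived from the `FTR` column as shown below. `points_table` is a hypothetical helper, not the implementation of `utils.calculate_standings`, and it ignores tie-breakers such as goal difference.

```python
import pandas as pd

def points_table(results):
    """Build a Team/Points table from HomeTeam, AwayTeam and FTR columns."""
    played = results.dropna(subset=["FTR"])  # skip fixtures without a result
    home = pd.DataFrame({
        "Team": played["HomeTeam"],
        "Points": played["FTR"].map({"H": 3, "D": 1, "A": 0}),
    })
    away = pd.DataFrame({
        "Team": played["AwayTeam"],
        "Points": played["FTR"].map({"H": 0, "D": 1, "A": 3}),
    })
    return (
        pd.concat([home, away])
        .groupby("Team", as_index=False)["Points"].sum()
        .sort_values("Points", ascending=False)
        .reset_index(drop=True)
    )

# Toy fixtures: A beats B, B draws with C, and C vs A has not been played
toy = pd.DataFrame({
    "HomeTeam": ["A", "B", "C"],
    "AwayTeam": ["B", "C", "A"],
    "FTR": ["H", "D", None],
})
print(points_table(toy))
```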


In [9]:
simulation = (
    utils.generate_simulations(matches, nruns=1, current=t)[0]
)
simulation

Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,...,PCAHA,MaxCAHH,MaxCAHA,AvgCAHH,AvgCAHA,BFECAHH,BFECAHA,HomeElo,AwayElo,TeamEloWinProb
0,E0,2024-08-16,20:00,Man United,Fulham,1.0,0.0,H,0.0,0.0,...,2.11,1.88,2.11,1.82,2.05,1.90,2.08,1779.043945,1716.276367,0.589360
1,E0,2024-08-17,12:30,Ipswich,Liverpool,0.0,2.0,A,0.0,0.0,...,1.90,2.20,2.00,1.99,1.88,2.04,1.93,1568.325562,1900.688354,0.128618
2,E0,2024-08-17,15:00,Arsenal,Wolves,2.0,0.0,H,1.0,0.0,...,1.90,2.05,1.93,1.99,1.87,2.02,1.96,1946.902832,1677.862305,0.824729
3,E0,2024-08-17,15:00,Everton,Brighton,0.0,3.0,A,0.0,1.0,...,2.07,1.92,2.10,1.83,2.04,1.88,2.11,1706.850830,1713.163208,0.490917
4,E0,2024-08-17,15:00,Newcastle,Southampton,1.0,0.0,H,1.0,0.0,...,2.06,1.89,2.10,1.82,2.05,1.89,2.10,1801.797119,1599.603394,0.762044
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,,2025-05-25,,Newcastle,Everton,,,H,,,...,,,,,,,,1803.491943,1704.383423,0.638882
376,,2025-05-25,,Nott'm Forest,Chelsea,,,A,,,...,,,,,,,,1722.681763,1896.247314,0.269114
377,,2025-05-25,,Southampton,Arsenal,,,H,,,...,,,,,,,,1579.600464,1983.363647,0.089135
378,,2025-05-25,,Tottenham,Brighton,,,D,,,...,,,,,,,,1812.923828,1765.461060,0.567883
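In the simulated table above, the `FTR` column of the unplayed fixtures now holds sampled results. The notebook does not show how `utils.generate_simulations` draws them, so the following is only one simple possibility: reserve a fixed share of probability for the draw and split the rest according to `TeamEloWinProb`. The function name `sample_results` and the value `draw_prob=0.25` are illustrative assumptions.

```python
import numpy as np

def sample_results(elo_win_prob, draw_prob=0.25, seed=None):
    """Sample "H", "D" or "A" for each fixture.

    Assumed model: P(draw) is a fixed constant and the remaining mass is
    split between home and away in proportion to the Elo win probability.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(elo_win_prob, dtype=float)
    p_home = p * (1.0 - draw_prob)
    p_away = (1.0 - p) * (1.0 - draw_prob)
    u = rng.random(p.shape)  # one uniform draw in [0, 1) per fixture
    return np.where(u < p_home, "H", np.where(u < p_home + p_away, "A", "D"))

# Elo probabilities of the last four fixtures listed in the table
probs = [0.638882, 0.269114, 0.089135, 0.567883]
print(sample_results(probs, seed=0))
```

Passing a `seed` makes a run reproducible, which helps when comparing importance estimates across code changes.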


In [10]:
simulated_standings = utils.calculate_standings(simulation)
simulated_standings

Unnamed: 0,Team,Points
0,Chelsea,74
1,Man City,70
2,Liverpool,66
3,Arsenal,64
4,West Ham,59
5,Fulham,53
6,Tottenham,53
7,Bournemouth,53
8,Nott'm Forest,52
9,Newcastle,52


`utils.calculate_outcomes` generates a tuple of indicators describing whether the home team and the away team

1. win the league

2. are relegated from the league

3. qualify for the Champions League

given that the home team or the away team wins the match.

That gives 2 (home/away team) × 3 (outcomes) × 2 (home/away win) = 12 indicator variables. The tuple is organised as

(\<home outcomes given home win\>, \<away outcomes given home win\>, \<home outcomes given away win\>, \<away outcomes given away win\>)

In [11]:
utils.calculate_outcomes(simulated_standings, "Bournemouth", "Liverpool")

(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)
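A flat tuple of twelve flags is hard to read at a glance. The sketch below labels each flag using the layout described above, assuming that the three flags inside each group follow the order of the numbered list (win, relegated, Champions League). `label_outcomes` and the key names are hypothetical.

```python
OUTCOMES = ("wins_league", "relegated", "qualifies_cl")
GROUPS = (
    ("home_win", "home"),  # home outcomes given home win
    ("home_win", "away"),  # away outcomes given home win
    ("away_win", "home"),  # home outcomes given away win
    ("away_win", "away"),  # away outcomes given away win
)

def label_outcomes(flags):
    """Map the 12-flag tuple to {(scenario, side): {outcome: flag}}."""
    if len(flags) != len(GROUPS) * len(OUTCOMES):
        raise ValueError("expected 12 indicator values")
    labelled = {}
    for g, key in enumerate(GROUPS):
        chunk = flags[3 * g : 3 * g + 3]
        labelled[key] = dict(zip(OUTCOMES, chunk))
    return labelled

labelled = label_outcomes((0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1))
for key, values in labelled.items():
    print(key, values)
```

Under that assumed ordering, the only flags set for Bournemouth vs Liverpool are the away side's Champions League qualification, in both the home-win and away-win scenarios.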

## Predict match importance for matches of interest

In [12]:
targets = list(
    matches.reset_index().merge(to_predict, on=["HomeTeam", "AwayTeam"])["index"]
)
targets

[230, 231, 232, 233, 234, 235, 236, 237, 238, 239]
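The inner merge used to build `targets` silently drops any test fixture whose team names do not match the season data exactly (for example "Forest" versus "Nott'm Forest"). A small check along the following lines, using the `indicator` option of `DataFrame.merge`, can surface such rows before running the simulation. `find_unmatched` is a hypothetical helper written for this example.

```python
import pandas as pd

def find_unmatched(fixtures, to_predict):
    """Return rows of `to_predict` that have no matching fixture."""
    keys = ["HomeTeam", "AwayTeam"]
    merged = to_predict.merge(
        fixtures[keys].drop_duplicates(),
        on=keys,
        how="left",
        indicator=True,  # adds a "_merge" column: "both" or "left_only"
    )
    return merged.loc[merged["_merge"] == "left_only", keys]

# Toy data with one deliberately mismatched team name
fixtures = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Nott'm Forest"],
    "AwayTeam": ["Man City", "Brighton"],
})
wanted = pd.DataFrame({
    "HomeTeam": ["Arsenal", "Forest"],
    "AwayTeam": ["Man City", "Brighton"],
})
print(find_unmatched(fixtures, wanted))
```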

In [13]:
match_importance = importance.predict(
    matches,
    start=0,
    end=len(matches)-1,
    current=t-1,
    targets=targets,
    nruns=1000
)
match_importance.loc[targets]

Match 230 done: Arsenal vs Man City, 0.125, 0.173
Match 231 done: Bournemouth vs Liverpool, 0.075, 0.1479999999999999
Match 232 done: Brentford vs Tottenham, 0.038, 0.059000000000000004
Match 233 done: Chelsea vs West Ham, 0.08300000000000007, 0.013
Match 234 done: Everton vs Leicester, 0.003, 0.0
Match 235 done: Ipswich vs Southampton, 0.0, 0.0
Match 236 done: Man United vs Crystal Palace, 0.026999999999999996, 0.001
Match 237 done: Newcastle vs Fulham, 0.054000000000000006, 0.049
Match 238 done: Nott'm Forest vs Brighton, 0.076, 0.061000000000000006
Match 239 done: Wolves vs Aston Villa, 0.0, 0.09599999999999999


Unnamed: 0,Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,...,MaxCAHA,AvgCAHH,AvgCAHA,BFECAHH,BFECAHA,HomeElo,AwayElo,TeamEloWinProb,HI,AI
230,,2025-02-01,,Arsenal,Man City,,,,,,...,,,,,,1983.363647,1966.654907,0.524027,0.125,0.173
231,,2025-02-01,,Bournemouth,Liverpool,,,,,,...,,,,,,1757.698608,1998.375122,0.200136,0.075,0.148
232,,2025-02-01,,Brentford,Tottenham,,,,,,...,,,,,,1749.327759,1812.923828,0.409487,0.038,0.059
233,,2025-02-01,,Chelsea,West Ham,,,,,,...,,,,,,1896.247314,1732.55249,0.719565,0.083,0.013
234,,2025-02-01,,Everton,Leicester,,,,,,...,,,,,,1704.383423,1655.346558,0.570105,0.003,0.0
235,,2025-02-01,,Ipswich,Southampton,,,,,,...,,,,,,1583.811401,1579.600464,0.50606,0.0,0.0
236,,2025-02-01,,Man United,Crystal Palace,,,,,,...,,,,,,1785.758789,1745.683105,0.557419,0.027,0.001
237,,2025-02-01,,Newcastle,Fulham,,,,,,...,,,,,,1803.491943,1759.959473,0.562322,0.054,0.049
238,,2025-02-01,,Nott'm Forest,Brighton,,,,,,...,,,,,,1722.681763,1765.46106,0.438745,0.076,0.061
239,,2025-02-01,,Wolves,Aston Villa,,,,,,...,,,,,,1664.031616,1811.931274,0.299144,0.0,0.096


In [14]:
columns = ["Date", "Time", "HomeTeam", "AwayTeam", "HI", "AI", "HomeElo", "AwayElo"]
match_importance.loc[targets, columns].to_csv("../output/epl-test-importance.csv", index=False)

# Backfill season match importance in master file

When backfilling, the code estimates the importance of match $i$ by treating the outcomes of all matches before it as known. It then simulates the matches after $i$, as if their results were not yet known, to estimate the importance of match $i$.

This section demonstrates how to backfill the match importance for 2017/18.
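The known/simulated split described above can be pictured as a sliding window over the fixture list. The sketch below only enumerates those windows by row position; it assumes the fixtures are already in chronological order and says nothing about how `importance.backfill` performs the simulation itself. `backfill_windows` is a hypothetical name.

```python
def backfill_windows(n_matches):
    """Yield (known, target, simulated) index ranges for every match.

    known:     matches before the target, whose real results are kept
    target:    the match whose importance is being estimated
    simulated: matches after the target, whose results are re-drawn
    """
    for i in range(n_matches):
        yield range(0, i), i, range(i + 1, n_matches)

# Four fixtures are enough to see the pattern
for known, target, simulated in backfill_windows(4):
    print(list(known), target, list(simulated))
```

The first match has no known results behind it and the last match has nothing left to simulate after it.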

In [15]:
match_elos = pd.read_csv(ELO)[["Date", "HomeTeam", "AwayTeam", "TeamEloWinProb"]]
match_elos["Date"] = pd.to_datetime(match_elos["Date"], format="%Y-%m-%d")
match_elos

Unnamed: 0,Date,HomeTeam,AwayTeam,TeamEloWinProb
0,2010-01-05,Stoke,Fulham,0.335889
1,2010-01-09,Arsenal,Everton,0.730693
2,2010-01-09,Birmingham,Man United,0.170839
3,2010-01-11,Man City,Blackburn,0.644634
4,2010-01-16,Chelsea,Sunderland,0.875865
...,...,...,...,...
5799,2024-05-19,Crystal Palace,Aston Villa,0.435603
5800,2024-05-19,Liverpool,Wolves,0.777918
5801,2024-05-19,Luton,Fulham,0.320134
5802,2024-05-19,Man City,West Ham,0.860403


In [16]:
matches = pd.read_csv(SEASON_17_18)
matches["Date"] = pd.to_datetime(matches["Date"], format="%d/%m/%Y")
matches = matches.merge(
    match_elos,
    on=["Date", "HomeTeam", "AwayTeam"],
    how="left"
)
matches[["HI", "AI"]] = np.nan
matches

Unnamed: 0,Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,...,BbMxAHH,BbAvAHH,BbMxAHA,BbAvAHA,PSCH,PSCD,PSCA,TeamEloWinProb,HI,AI
0,E0,2017-08-11,Arsenal,Leicester,4,3,H,2,2,D,...,1.91,1.85,2.10,2.02,1.49,4.73,7.25,0.680436,,
1,E0,2017-08-12,Brighton,Man City,0,2,A,0,0,D,...,1.95,1.91,2.01,1.96,11.75,6.15,1.29,0.163951,,
2,E0,2017-08-12,Chelsea,Burnley,2,3,A,0,3,A,...,2.03,1.97,1.95,1.90,1.33,5.40,12.25,0.833734,,
3,E0,2017-08-12,Crystal Palace,Huddersfield,0,3,A,0,2,A,...,2.10,2.05,1.86,1.83,1.79,3.56,5.51,0.723831,,
4,E0,2017-08-12,Everton,Stoke,1,0,H,1,0,H,...,1.94,1.90,2.01,1.98,1.82,3.49,5.42,0.623467,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,E0,2018-05-13,Newcastle,Chelsea,3,0,H,1,0,H,...,1.90,1.83,2.11,2.03,4.85,3.72,1.80,0.224904,,
376,E0,2018-05-13,Southampton,Man City,0,1,A,0,0,D,...,2.01,1.95,1.97,1.91,6.32,4.78,1.51,0.139422,,
377,E0,2018-05-13,Swansea,Stoke,1,2,A,1,2,A,...,1.94,1.88,2.03,1.98,2.08,3.56,3.82,0.528203,,
378,E0,2018-05-13,Tottenham,Leicester,5,4,H,1,2,A,...,1.96,1.86,2.05,2.00,1.38,5.50,8.15,0.770980,,


In [17]:
match_importance = importance.backfill(
    data=matches,
    nruns=25
)
match_importance

Calculating match importance for the season, 0 to 379
Match 0 done: Arsenal vs Leicester, 0.12, 0.04
Match 1 done: Brighton vs Man City, 0.04, 0.16000000000000003
Match 2 done: Chelsea vs Burnley, 0.12, 0.04
Match 3 done: Crystal Palace vs Huddersfield, 0.0, 0.04
Match 4 done: Everton vs Stoke, 0.039999999999999994, 0.04
Match 5 done: Southampton vs Swansea, 0.12, 0.08
Match 6 done: Watford vs Liverpool, 0.0, 0.08000000000000002
Match 7 done: West Brom vs Bournemouth, 0.04, 0.0
Match 8 done: Man United vs West Ham, 0.040000000000000036, 0.04000000000000001
Match 9 done: Newcastle vs Tottenham, 0.0, 0.24
Match 10 done: Bournemouth vs Watford, 0.0, 0.0
Match 11 done: Burnley vs West Brom, 0.04, 0.08
Match 12 done: Leicester vs Brighton, 0.04, 0.0
Match 13 done: Liverpool vs Crystal Palace, 0.040000000000000036, 0.0
Match 14 done: Southampton vs West Ham, 0.04, 0.08
Match 15 done: Stoke vs Arsenal, 0.0, 0.2
Match 16 done: Swansea vs Man United, 0.04, 0.08000000000000002
Match 17 done: Hud

Unnamed: 0,Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,...,BbMxAHH,BbAvAHH,BbMxAHA,BbAvAHA,PSCH,PSCD,PSCA,TeamEloWinProb,HI,AI
0,E0,2017-08-11,Arsenal,Leicester,4,3,H,2,2,D,...,1.91,1.85,2.10,2.02,1.49,4.73,7.25,0.680436,0.12,0.04
1,E0,2017-08-12,Brighton,Man City,0,2,A,0,0,D,...,1.95,1.91,2.01,1.96,11.75,6.15,1.29,0.163951,0.04,0.16
2,E0,2017-08-12,Chelsea,Burnley,2,3,A,0,3,A,...,2.03,1.97,1.95,1.90,1.33,5.40,12.25,0.833734,0.12,0.04
3,E0,2017-08-12,Crystal Palace,Huddersfield,0,3,A,0,2,A,...,2.10,2.05,1.86,1.83,1.79,3.56,5.51,0.723831,0.00,0.04
4,E0,2017-08-12,Everton,Stoke,1,0,H,1,0,H,...,1.94,1.90,2.01,1.98,1.82,3.49,5.42,0.623467,0.04,0.04
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,E0,2018-05-13,Newcastle,Chelsea,3,0,H,1,0,H,...,1.90,1.83,2.11,2.03,4.85,3.72,1.80,0.224904,0.00,0.00
376,E0,2018-05-13,Southampton,Man City,0,1,A,0,0,D,...,2.01,1.95,1.97,1.91,6.32,4.78,1.51,0.139422,0.00,0.00
377,E0,2018-05-13,Swansea,Stoke,1,2,A,1,2,A,...,1.94,1.88,2.03,1.98,2.08,3.56,3.82,0.528203,0.00,0.00
378,E0,2018-05-13,Tottenham,Leicester,5,4,H,1,2,A,...,1.96,1.86,2.05,2.00,1.38,5.50,8.15,0.770980,0.00,0.00
