## Team Ratings

Build team ratings by updating an Elo-based system over 2+ seasons of Premier League data

In [1]:
from datetime import date
from typing import Tuple
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
from autoscout import preprocess
from autoscout.util import load_csv

from ratings.team import RatingSystem

Load the match summary data

Competition scorelines and expected goals data can be downloaded from [fbref](https://fbref.com) using [autoscout](https://github.com/olliestanley/autoscout/)

Some matches in the current season have not been played yet, so we filter them out by removing any match that does not have a `referee` listed (rows with missing values, or whose `referee` value contains `0`). We also split the `score` column on the en dash into two new columns, `home_goals` and `away_goals`

In [2]:
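# Load each season into its own DataFrame
dfs = [
    load_csv(f"data/raw/epl_{season}_matches.csv")
    for season in [2021, 2022, 2023]
]

# Combine the seasons and drop rows with any missing values
df = preprocess.combine_data(dfs).dropna(axis=0, how="any")
# Remove unplayed matches: rows whose referee value contains "0"
df = df[~df["referee"].str.contains("0")].reset_index(drop=True)
# Split the score on the en dash into home and away goals (kept as strings)
df[["home_goals", "away_goals"]] = df["score"].str.split("–", expand=True)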
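The filtering and splitting steps above can be tried on a toy table. This is a minimal sketch, not the notebook's actual data: the rows are made up, and it assumes unplayed matches appear with a missing `referee`, which may differ from how `load_csv` represents them. It also converts the goals to integers, which the cell above does not do.

```python
import pandas as pd

# Toy fixture list; column names mirror the notebook, rows are invented
toy = pd.DataFrame(
    {
        "home": ["A", "B", "C"],
        "away": ["B", "C", "A"],
        "score": ["2–1", "0–0", None],  # en dash separator; None = not played
        "referee": ["Ref One", "Ref Two", None],
    }
)

# Keep only played matches (assumption: unplayed rows have no referee)
played = toy.dropna(subset=["referee"]).reset_index(drop=True)

# Split on the en dash, then convert both halves to integers
goals = played["score"].str.split("–", expand=True).astype(int)
played["home_goals"] = goals[0]
played["away_goals"] = goals[1]

print(played[["home", "away", "home_goals", "away_goals"]])
```

Note that fbref scores use an en dash (`–`) rather than a hyphen (`-`), so splitting on a plain hyphen would leave the column unsplit.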

Develop a rating system using the code provided in `ratings.team`

This will update the ratings for all matches in the dataset, and fit a forecasting model to generate win, draw, and loss probabilities from a match prediction

In [3]:
def ratings(
    data: pd.DataFrame,
    k: int,
    baseline: float,
    home_advantage: float,
    default_rating: float,
) -> RatingSystem:
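    """Build a RatingSystem, update it over `data`, and fit its forecast model."""
    system = RatingSystem(k, baseline, home_advantage, default_rating)
    # Update the ratings for all matches in the dataset
    system.process_dataset(data)
    # Fit the win/draw/loss forecasting model, then forecast the dataset
    system.fit_forecast_model(data)
    system.forecast_dataset(data)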
    return system

Define the system's parameters and create it. These may not be optimal, but I have found that they produce reasonable results

Higher `k` values cause the ratings to place more weight on recent matches

Higher `baseline` values cause the system to predict higher-scoring games overall (more total goals per match)

Higher `home_advantage` values make predictions more generous to home teams. This can also shift the ratings, since they are updated from the difference between expected and actual performance

In [4]:
k = 64
baseline = 2.8
home_advantage = 0.06

system = ratings(df, k, baseline, home_advantage, default_rating=1000)
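To make the role of `k` concrete, here is a minimal sketch of the classic single-rating Elo update. It is not the update used by `RatingSystem`, which works with attack and defence ratings and predicted xG; the functions `expected_score` and `elo_update` are hypothetical helpers written only for this illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expectation for A against B on the usual 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating: float, expected: float, actual: float, k: float) -> float:
    """Move the rating towards the result; k scales the size of the step."""
    return rating + k * (actual - expected)


# Two equally rated teams, and the first one wins (actual = 1.0)
exp_score = expected_score(1000.0, 1000.0)  # 0.5 for equal ratings

low_k = elo_update(1000.0, exp_score, 1.0, k=16)
high_k = elo_update(1000.0, exp_score, 1.0, k=64)

print(low_k, high_k)  # 1008.0 1032.0
```

The same result moves the rating four times as far with `k=64` as with `k=16`, which is why higher `k` values weight recent matches more heavily.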

Define a function to predict a match using a given `DataFrame`, `RatingSystem`, teams, and date

In [5]:
def predict_match_as_of(
    df: pd.DataFrame,
    system: RatingSystem,
    home_team: str,
    away_team: str,
    date_str: str,
) -> Tuple[float, float]:
    """Predict (home xG, away xG) from each team's ratings before `date_str`."""
    home_att, home_def = system.get_team_ratings_before_date(
        df, home_team, date_str
    )

    away_att, away_def = system.get_team_ratings_before_date(
        df, away_team, date_str
    )

    return system.predict_match_from_ratings(
        home_att, home_def, away_att, away_def
    )

Specify the teams and date to predict - change this cell to predict a different match

If manually inputting dates, use the format "YYYY-MM-DD"

In [6]:
home_team = "Manchester Utd"
away_team = "Brighton"
date_str = str(date.today())
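As a quick check on the date format, `str()` on a `date` already produces "YYYY-MM-DD", and a manually typed string can be validated with `date.fromisoformat`. The helper `is_valid_date_str` below is a hypothetical convenience for this illustration, not part of `ratings.team`.

```python
from datetime import date

# str() on a date gives the ISO layout used for manual input
example_str = str(date(2023, 5, 4))  # "2023-05-04"


def is_valid_date_str(text: str) -> bool:
    """Return True if `text` parses as an ISO date such as "2023-05-04"."""
    try:
        date.fromisoformat(text)
    except ValueError:
        # Raised for strings like "04/05/2023" or impossible dates
        return False
    return True


print(example_str, is_valid_date_str("2023-05-04"), is_valid_date_str("04/05/2023"))
```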

Predict results based on the system created and turn the predictions into result probabilities

In [7]:
# Predict and output results
home_pred, away_pred = predict_match_as_of(df, system, home_team, away_team, date_str)
print(f"\nPredicted xG: {home_team} {round(home_pred, 2)} - {round(away_pred, 2)} {away_team}")
home_prob, draw_prob, away_prob = system.forecast_match_from_predictions(home_pred, away_pred)[0]
print(f"Win Probabilities: {home_team} {round(home_prob, 3)} - Draw {round(draw_prob, 3)} - {away_team} {round(away_prob, 3)}")


Predicted xG: Manchester Utd 1.36 - 1.43 Brighton
Win Probabilities: Manchester Utd 0.347 - Draw 0.252 - Brighton 0.401
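For intuition on how a pair of predicted xG values can become result probabilities, here is a minimal sketch that assumes each team's goals follow an independent Poisson distribution. This is a swapped-in textbook approach, not the forecasting model fitted by `RatingSystem`, so it will not reproduce the numbers above. The function names and the `max_goals` cutoff are assumptions made for this illustration.

```python
from math import exp, factorial
from typing import Tuple


def poisson_pmf(goals: int, mean: float) -> float:
    """Probability of scoring exactly `goals` when the expected count is `mean`."""
    return mean ** goals * exp(-mean) / factorial(goals)


def result_probabilities(
    home_xg: float, away_xg: float, max_goals: int = 10
) -> Tuple[float, float, float]:
    """Sum scoreline probabilities into (home win, draw, away win)."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    # Renormalise to correct for the scorelines cut off above max_goals
    total = home + draw + away
    return home / total, draw / total, away / total


home_p, draw_p, away_p = result_probabilities(1.36, 1.43)
print(round(home_p, 3), round(draw_p, 3), round(away_p, 3))
```

Under this assumption the team with the higher predicted xG is always the more likely winner. A fitted model can depart from that, for example by adjusting the draw probability.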


Done!