Code and supplementary materials for the paper "The Efficacy of League Formats in Ranking Teams".

The Efficacy of League Formats in Ranking Teams

This repository contains the code used to analyse football data in the paper "The efficacy of league formats in ranking teams" by Jan Lasek and Marek Gagolewski (submitted to the special issue on sports analytics of the journal Statistical Modelling). The efficacy of a league format is understood as its accuracy in reproducing the true ranking of the competing teams. The contents of this repository, the steps needed to reproduce the main results, and the supplementary materials are outlined below.

Contents

league-formats-efficacy
├── data
│    ├── download_data.sh # Script for downloading data from http://www.football-data.co.uk/
│    └── preprocessing_football_data_co_uk.R
├── scripts
│    ├── config.R # Must be created locally (see Step 1)
│    ├── setup_simulations.R
│    ├── run_simulations.R
│    ├── run_all.sh # Simulations for all settings
│    ├── schedule_functions.R
│    ├── simulation_functions.R
│    ├── evaluation_functions.R
│    ├── extra_functions.R
│    ├── evaluate_simulations.R
│    └── rating_systems
│         ├── rating_systems.R
│         ├── prediction_functions.R
│         └── optimise_models.R
└── appendix

Requirements

The project was developed in R using the following packages:

  • doMC_1.3.5
  • iterators_1.0.8
  • foreach_1.4.3
  • dplyr_0.5.0
  • stringi_1.1.7
  • skellam_0.2.0
  • MASS_7.3-45

Compatibility of the scripts with other versions of these packages is not guaranteed.

Reproducing the Main Results

Step 1

First, create the file scripts/config.R and set the paths for storing data and saving results; these depend on your local setup. The file should look like this:

results_save_folder <- "where/you/want/to/store/simulation/results"

All the results will be saved in results_save_folder.
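The file can also be created from the shell. The function write_config below is a hypothetical convenience helper, not part of the repository: it writes only the results_save_folder variable shown above and creates that folder if it does not yet exist. It assumes bash and a path without double quotes.

```shell
# Hypothetical helper (not in the repository): create scripts/config.R.
# Usage: write_config REPO_ROOT RESULTS_DIR
write_config() {
  local repo_root=$1 results_dir=$2
  # Create the results folder up front (harmless if it already exists).
  mkdir -p "$results_dir" || return 1
  # Write the single R assignment expected in config.R.
  printf 'results_save_folder <- "%s"\n' "$results_dir" > "$repo_root/scripts/config.R"
}
```

For example, running write_config . "$HOME/league-results" from the repository root points all results to a folder in your home directory.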

Step 2

Second, to install the required packages and set up the output folders, run:

$ Rscript setup_simulations.R

Additionally, you can edit the script and define the following:

  • the team rating model to use (model variable),
  • the league formats to analyse (league_formats),
  • the directory for storing the results of a specific experiment (specific_result_folder).

If old results are already stored there, the script raises a "directory not empty" error. Remove them first or move them to another location.
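If you want to keep old results rather than delete them, a small helper can move them aside. The function archive_results below is a hypothetical sketch (not part of the repository), assuming bash: it renames a non-empty results directory with a timestamp suffix, recreates it empty, and prints the backup path.

```shell
# Hypothetical helper (not in the repository): move old results aside.
# If DIR exists and is non-empty, rename it to DIR.old.<timestamp>,
# recreate DIR as an empty folder, and print the backup path.
archive_results() {
  local dir=${1%/} backup
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
    backup="$dir.old.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" || return 1
    mkdir -p "$dir" || return 1
    echo "$backup"
  fi
}
```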

Step 3

Third, to produce the simulation results for a grid of parameters, edit the run_all.sh script as needed and execute:

$ ./run_all.sh

Alternatively, to produce the results for a single parameter setting, run:

$ Rscript run_simulations.R --n=12 --model=poisson_correlated --n_sim=100 --shape=20 --sigma=0.3 --n_cores=3 --specific_result_folder=results_all --log2file=1

with the parameters adjusted as needed (see run_simulations.R for their meaning). The run_all.sh script performs this operation for every combination in a parameter grid. The results are saved in the folder specified in config.R.
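A minimal sketch of such a grid driver is shown below, assuming bash and that it is run from the folder containing run_simulations.R. The function run_grid and its grid values are hypothetical placeholders, not the settings used in the paper or in run_all.sh; the flags are the ones from the example command above. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical sketch of a parameter-grid driver in the spirit of run_all.sh.
# Grid values are illustrative placeholders, not the paper's settings.
run_grid() {
  local n shape sigma
  local -a cmd
  for n in 12 16 20; do
    for shape in 10 20; do
      for sigma in 0.1 0.3; do
        cmd=(Rscript run_simulations.R "--n=$n" --model=poisson_correlated
             --n_sim=100 "--shape=$shape" "--sigma=$sigma" --n_cores=3
             --specific_result_folder=results_all --log2file=1)
        if [ "${DRY_RUN:-0}" = 1 ]; then
          echo "${cmd[*]}"           # dry run: only show the command
        else
          "${cmd[@]}" || return 1    # stop at the first failing setting
        fi
      done
    done
  done
}
```

Running DRY_RUN=1 run_grid lists the 12 commands (3 × 2 × 2 settings) without starting any simulation, which is a cheap way to check a grid before launching it.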

Step 4

Finally, to measure the agreement between the final league standings and the latent team strength parameters, execute:

$ Rscript evaluate_simulations.R

This produces a CSV file with the results in the respective results folder.
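Which agreement measures evaluate_simulations.R computes is not listed here. As an illustration only, Kendall's tau is one standard measure of agreement between two rankings. The hypothetical function kendall_tau below (bash plus awk, not part of the repository) reads lines of the form `team position strength`, where position 1 is the top of the table and a larger strength means a stronger team, and prints Kendall's tau-a.

```shell
# Hypothetical helper (not in the repository): Kendall's tau-a between
# final standings and latent strengths. Input lines: "team position strength".
# Tied pairs count as neither concordant nor discordant.
kendall_tau() {
  awk '
    { pos[NR] = $2; str[NR] = $3 }
    END {
      n = NR
      if (n < 2) { print "kendall_tau: need at least two teams" > "/dev/stderr"; exit 1 }
      c = 0; d = 0
      for (i = 1; i < n; i++)
        for (j = i + 1; j <= n; j++) {
          # Positive when the better-placed team is also the stronger one.
          s = (pos[j] - pos[i]) * (str[i] - str[j])
          if (s > 0) c++
          else if (s < 0) d++
        }
      printf "%.4f\n", (c - d) / (n * (n - 1) / 2)
    }' "$@"
}
```

A value of 1 means the table orders the teams exactly by strength, and -1 means the order is fully reversed.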

Reproducing Intermediate Results

Rating Systems Performance

To generate and evaluate the predictions of different rating systems, first download the data from http://www.football-data.co.uk/; the script data/download_data.sh can help with this. The data then need to be preprocessed by running

$ Rscript preprocessing_football_data_co_uk.R

in the data folder.
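The download step can be sketched as follows. This assumes that football-data.co.uk serves one CSV file per season and division under URLs such as https://www.football-data.co.uk/mmz4281/1718/E0.csv (season 2017/2018, English Premier League). It is a hypothetical sketch, not a copy of data/download_data.sh: the function names, the choice of divisions (E0, SP1, D1, I1, F1) and the output file names are illustrative, and preprocessing_football_data_co_uk.R may expect a different layout. Setting DRY_RUN=1 prints the URLs instead of downloading.

```shell
# Hypothetical sketch; not the repository's download_data.sh.
# Convert the starting year of a season to the site's four-digit code,
# e.g. 2017 -> 1718 (season 2017/2018).
season_code() {
  local y=$1
  printf '%02d%02d\n' $(( y % 100 )) $(( (y + 1) % 100 ))
}

# Download (or, with DRY_RUN=1, print) one CSV per season and division.
# Usage: download_football_data OUT_DIR START_YEAR...
download_football_data() {
  local out_dir=$1 season div url
  shift
  for season in "$@"; do
    for div in E0 SP1 D1 I1 F1; do   # example divisions only
      url="https://www.football-data.co.uk/mmz4281/$(season_code "$season")/${div}.csv"
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$url"
      else
        curl -fsSL -o "$out_dir/${div}_${season}.csv" "$url" || return 1
      fi
    done
  done
}
```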

For the correlated Poisson model, the correlation parameter is set to ρ = 0.45 in scripts/rating_systems/prediction_functions.R; change it there if desired. To reproduce the results, go to scripts/rating_systems and run

$ Rscript optimise_models.R

The script runs a grid search over the regularization parameter λ for different models and a chosen league season (set directly in the script), and saves logs and results to a results folder in the same directory.
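To illustrate what a correlation parameter such as ρ can mean for goal counts, one standard construction of correlated Poisson scores is trivariate reduction (the bivariate Poisson model). Whether prediction_functions.R parameterizes ρ exactly this way is an assumption, not something stated in this README. With independent X₁ ~ Poisson(λ₁), X₂ ~ Poisson(λ₂) and X₃ ~ Poisson(λ₃), the goals X and Y of the two teams are:

```latex
\begin{aligned}
X &= X_1 + X_3, \qquad Y = X_2 + X_3, \\
\mathbb{E}X &= \operatorname{Var}X = \lambda_1 + \lambda_3, \qquad
\mathbb{E}Y = \operatorname{Var}Y = \lambda_2 + \lambda_3, \qquad
\operatorname{Cov}(X, Y) = \lambda_3, \\
\rho &= \frac{\lambda_3}{\sqrt{(\lambda_1 + \lambda_3)(\lambda_2 + \lambda_3)}}, \qquad
\lambda_3 = \rho \sqrt{\mathbb{E}X \, \mathbb{E}Y}.
\end{aligned}
```

For example, with expected goals 1.5 and 1.0 and ρ = 0.45, this gives λ₃ ≈ 0.551, λ₁ ≈ 0.949 and λ₂ ≈ 0.449; the construction is valid only when λ₃ is smaller than both expected goal counts. The goal difference X − Y = X₁ − X₂ does not depend on λ₃ and follows a Skellam distribution with parameters λ₁ and λ₂.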

Appendices

League Formats in UEFA

The first appendix lists the league formats used in UEFA member countries in the 2017/2018 season (or the 2018 season for leagues played within the calendar year).

Available in PDF and Markdown formats.

Tournament Metrics for Several Parameter Combinations

The second appendix gives detailed estimates of the tournament metrics considered in our study.

Available in PDF and Markdown formats.

Schedules in Two Stage Systems

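The third appendix lists two example schedules for the final round of the 2RR + (1RR/1RR) league format, in which the teams first play a double round robin (2RR) and are then split into a championship group and a relegation group, each playing a single round robin (1RR). The examples cover leagues of 12 and 16 teams.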

Available in PDF and Markdown formats.
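Each single round robin stage needs a schedule in which every pair of teams meets exactly once. One classic way to build it is the circle method: keep one team fixed, pair the team at position i with the one at position n − 1 − i (counting from 0), then rotate all other teams by one position for the next round. The function round_robin below is a hypothetical sketch of this method for an even number of teams (bash plus awk); it does not assign home and away sides, and whether schedule_functions.R uses the same construction is not stated here.

```shell
# Hypothetical sketch of the circle method for a single round robin (1RR).
# Prints one line per match: "round team_a team_b", teams numbered 1..n.
# Home/away assignment is not handled.
round_robin() {
  awk -v n="$1" 'BEGIN {
    if (n < 2 || n % 2) { print "round_robin: n must be an even number >= 2" > "/dev/stderr"; exit 1 }
    for (i = 0; i < n; i++) t[i] = i + 1          # current circle order
    for (r = 1; r < n; r++) {                      # n - 1 rounds
      for (m = 0; m < n / 2; m++)                  # pair opposite positions
        printf "%d %d %d\n", r, t[m], t[n - 1 - m]
      last = t[n - 1]                              # rotate all teams except t[0]
      for (i = n - 1; i > 1; i--) t[i] = t[i - 1]
      t[1] = last
    }
  }'
}
```

For example, round_robin 6 prints 15 matches over 5 rounds, the size of a 1RR stage in a group of six teams.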