Victorian Baseball Elo Ratings Project

Overview

This is a project to develop a reasonably sophisticated prediction model based on the Elo rating system. I've used data from the Victorian Summer Baseball League (VSBL) and the Melbourne Winter Baseball League (MWBL).

It is intended as a learning exercise.

Any advice on how to improve the model, or the code, is more than welcome.

Currently, the data used to generate this model is not provided here. I want to make sure I have permission to upload it before releasing it on GitHub.

Some notes on nomenclature: because the Elo system is being applied here to a game that has scores, it is important to use words precisely:

the Elo value is known as a rating
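the value is the game outcome, W/L/D, recorded as 1/0/0.5 respectively. Elo methods usually call this the score (and the model's prediction of it the expected score), but since baseball already has scores I think it makes more sense to call it a value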
the score is the 'baseball score' in the traditional sense i.e. 12 v. 5
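To make the terms concrete, here is a minimal sketch of the standard Elo calculation using the rating/value vocabulary above. It is not this project's code: the function names, the 400-point scale, the k factor of 20, and the way home field advantage is added as rating points are all illustrative assumptions.

```python
def expected_value(rating_home, rating_away, home_advantage=0.0, scale=400.0):
    """Expected value (between 0 and 1) for the home team.

    home_advantage and scale are assumed defaults, not tuned parameters.
    """
    diff = (rating_home + home_advantage) - rating_away
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


def update_ratings(rating_home, rating_away, value_home, k=20.0, home_advantage=0.0):
    """Return the new (home, away) ratings after one game.

    value_home is 1 for a home win, 0 for a home loss, 0.5 for a draw.
    k (assumed to be 20 here) controls how quickly ratings respond.
    """
    expected_home = expected_value(rating_home, rating_away, home_advantage)
    change = k * (value_home - expected_home)
    # What one team gains, the other loses, so total rating is conserved.
    return rating_home + change, rating_away - change


# Two evenly rated teams, no home advantage, home team wins.
new_home, new_away = update_ratings(1500.0, 1500.0, value_home=1.0)
print(new_home, new_away)
```

Note that the baseball score (e.g. 12 v. 5) does not appear anywhere in this basic form; only the value does.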

Navigating the directories

On GitHub, all of the directories with an underscore prefix are for Jekyll. All other directories relate directly to the project itself. I imagine this distinction will blur over time as better web visualisations are achieved.

Future improvements

make base data available
include some kind of user input
include further predictions for the league other than the next round:
- Monte Carlo method
- mathematical calculation method
learn to distinguish between finals and the regular season, using either the number of games per round, the dates, or more careful scraping of the results
graphs, and pretty visualisations - D3.js? SVG? Not sure...
interactivity within the graphs
broader design goals including more sophisticated implementations of my personal website and portfolio
optimise the prediction model:
- determine the ideal home field advantage
- determine the ideal k factor i.e. how quickly the model responds to new data
- determine the ideal season regression factor i.e. how alike teams are from one season to the next
include a margin of victory metric to improve the fidelity of the model and take full advantage of the limited data available to me
create, test, and upload multiple possible models to compare which is best and why:
- Home always wins
- Offensive and Defensive Elo ratings?
What's the difference in skill between the grades?
- Use relegation Elo etc. to determine?
- This would require multiple seasons of data. (maybe)
How would Elo corrections work? Linear conversion?
For one season of data - lowest Elo Grade A = highest Elo Grade B?
How similar are teams from one game to the next?
apply this model and more to other situations e.g. the Big Bash
add field data (location, orientation, size)
add weather data for each game
determine standard for washouts - using weather data and previous washout games
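As an illustration of the Monte Carlo item in the list above, the sketch below plays out the remaining fixtures many times and counts how often each team finishes with the most wins. Everything in it is assumed for the example: the team names, ratings, and fixtures are made up, ratings are held fixed during a run, draws and home field advantage are ignored, and ties for first place are split evenly.

```python
import random
from collections import Counter


def win_probability(rating_home, rating_away, scale=400.0):
    """Standard Elo win probability for the home team (no home advantage)."""
    return 1.0 / (1.0 + 10.0 ** (-(rating_home - rating_away) / scale))


def simulate_season(ratings, fixtures, current_wins, n_runs=10_000, seed=0):
    """Estimate each team's chance of finishing with the most wins.

    ratings:      dict of team -> Elo rating (held fixed during simulation)
    fixtures:     list of (home, away) games still to be played
    current_wins: dict of team -> wins so far
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    top_finishes = Counter()
    for _ in range(n_runs):
        wins = dict(current_wins)
        for home, away in fixtures:
            if rng.random() < win_probability(ratings[home], ratings[away]):
                wins[home] += 1
            else:
                wins[away] += 1
        best = max(wins.values())
        leaders = [team for team, w in wins.items() if w == best]
        # Split the credit evenly when teams tie for the most wins.
        for team in leaders:
            top_finishes[team] += 1.0 / len(leaders)
    return {team: top_finishes[team] / n_runs for team in ratings}


# Hypothetical three-team league with one round of games left.
ratings = {"Team A": 1600.0, "Team B": 1500.0, "Team C": 1400.0}
fixtures = [("Team A", "Team B"), ("Team B", "Team C"), ("Team C", "Team A")]
current_wins = {team: 0 for team in ratings}
chances = simulate_season(ratings, fixtures, current_wins, n_runs=2000)
print(chances)
```

The "mathematical calculation method" would aim for the same probabilities without random sampling, which is only practical while the number of remaining games is small.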

I also believe that the data I currently possess may be enough to determine a Park Factor for each of the parks used by the league. Ideally I would have information about the number of innings played. I will either have to ignore this, or assume it's five. The number could also well differ between competitions, which complicates things.
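One common basic form of Park Factor compares total runs per game in a team's home games with total runs per game in its road games. The sketch below uses that per-game form, which sidesteps the missing innings data by ignoring innings altogether. It is only an assumption about how the calculation might be done here; the function name and the scores are made up, and it does not handle parks shared by several teams.

```python
def park_factor(home_games, road_games):
    """Runs per game at a team's home park relative to its road games.

    Each argument is a list of (runs_scored, runs_allowed) tuples,
    one tuple per game. A result above 1 suggests a park that favours
    run scoring; below 1 suggests the opposite.
    """
    if not home_games or not road_games:
        raise ValueError("need at least one home game and one road game")
    home_rate = sum(rs + ra for rs, ra in home_games) / len(home_games)
    road_rate = sum(rs + ra for rs, ra in road_games) / len(road_games)
    if road_rate == 0:
        raise ValueError("no runs in road games; park factor is undefined")
    return home_rate / road_rate


# Made-up scores: 12 runs per game at home versus 6 per game on the road.
example = park_factor(home_games=[(12, 5), (3, 4)], road_games=[(2, 3), (4, 3)])
print(example)
```

With only a handful of games per park, a ratio like this will be noisy, so it would need several seasons of data before it could be trusted.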

This README could be improved by adding plain language explanations to a number of its elements, so that a layperson can understand, at least in broad strokes, what is going on and how accurate or inaccurate the model, data, and predictions are. Assumptions should also be listed and described.

Assumptions

Computers operate in binary, whereas humans typically work in decimal. At first glance, this distinction should be largely superficial; however, decimal numbers are represented on computers as binary fractions. In the same way that neither 0.3 (3/10) nor 0.33 (3/10 + 3/100) is precisely 1/3, many decimal values can only be approximated by a binary fraction, here to 53 bits of precision. Ultimately, this has little to no effect on the model, but I found it interesting nonetheless. The fractions module allows for exact rational arithmetic, but it's probably overkill to use it here.
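The effect is easy to see in a few lines. This sketch assumes Python, whose standard library provides the fractions module mentioned above; the variable names are illustrative only.

```python
from fractions import Fraction

# Binary floating point: 0.1, 0.2 and 0.3 are each stored as the nearest
# binary fraction, so the sum does not land exactly on the stored 0.3.
float_sum = 0.1 + 0.2
float_matches = float_sum == 0.3

# Exact rational arithmetic: tenths are represented exactly.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
exact_matches = exact_sum == Fraction(3, 10)

# Converting the float 0.1 to a Fraction exposes the value actually stored.
stored_tenth = Fraction(0.1)

print(float_matches)   # False
print(exact_matches)   # True
print(stored_tenth == Fraction(1, 10))  # False
```

For Elo ratings, which are usually reported rounded to whole points, a discrepancy of this size is far too small to matter.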

Learn more

Read more about Elo rating systems and predictive models in general:

Elo rating system - Wikipedia
FiveThirtyEight's MLB, NBA, and NFL Elo ratings
The Arc AFL Footy Elo ratings
Predicting Outcomes in Australian Rules Football PhD

gabrieldwyer/baseball_elo_ratings
