March Madness Bracket Maker
March Madness is notorious for upsets: games in which the expected winner loses, which makes outcomes almost impossible to predict. However, statistics about each team in the tournament can be used to predict the winner of a game and thus to build a bracket.
This project is an application to build a bracket for March Madness. It uses a Support Vector Machine to decide which team wins the matchup it is given.
The following statistics about each team are used to compute the features for the classifier:
- average points per game (ppg)
- offensive efficiency (oe)
- defensive efficiency (de)
- field goal efficiency (fge)
- offensive rebounds (or)
Since the classifier is meant to predict the outcome of a game, each game needs a numerical representation. If we arbitrarily assign one team to be Team One and the other to be Team Two, then a way to represent the game is the difference between corresponding statistics for each team. Thus, the classifier can be given two teams which are matching up and output either 1 or 2 depending on which team it believes will win. By entering each matchup and progressing up the bracket, we can construct a bracket for the Tournament.
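The game representation described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `matchup_features`, the dictionary layout, and the sample numbers are all hypothetical, and the direction of the subtraction (Team One minus Team Two) is an assumption.

```python
# Statistic keys follow the abbreviations listed above.
STATS = ["ppg", "oe", "de", "fge", "or"]


def matchup_features(team_one, team_two):
    """Return the per-statistic difference (Team One minus Team Two).

    Assumption: the subtraction order is Team One minus Team Two; the
    original project may use the opposite sign convention.
    """
    return [team_one[s] - team_two[s] for s in STATS]


# Made-up numbers, for illustration only.
team_a = {"ppg": 80.0, "oe": 1.10, "de": 0.95, "fge": 0.50, "or": 10.0}
team_b = {"ppg": 75.0, "oe": 1.05, "de": 1.00, "fge": 0.25, "or": 12.0}

features = matchup_features(team_a, team_b)
swapped = matchup_features(team_b, team_a)
print(features)
print(swapped)
```

Swapping which team is labeled Team One negates every feature, which is why the arbitrary assignment still gives the classifier a consistent representation: a label of 1 or 2 always refers to the side of the subtraction the team was on.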
All data is contained in the Data folder:
- gameData.csv contains historical game data for two years (2016 and 2017). It lists the teams in each game (arbitrarily assigned to Team One or Team Two), their statistics for that year, and which team won.
- de.csv contains the 2017-2018 defensive efficiency for each team in the NCAA
- fge.csv contains the 2017-2018 field goal efficiency for each team in the NCAA
- oe.csv contains the 2017-2018 offensive efficiency for each team in the NCAA
- or.csv contains the 2017-2018 offensive rebounds for each team in the NCAA
- ppg.csv contains the 2017-2018 average points per game for each team in the NCAA
- rpi.csv contains the 2017-2018 RPI (Rating Percentage Index) for each team in the NCAA
All data was taken from https://www.teamrankings.com/ncb/team-stats/
- data_extract.py contains helper methods for extracting data, training the classifier, and saving it to disk.
- execute.py begins the program for creating the bracket. It uses the command prompt to do so.
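The round-by-round progression that the program performs could look roughly like the sketch below. It is not taken from execute.py: `play_round`, `build_bracket`, and `toy_predictor` are hypothetical names, the predictor is a stand-in for the trained classifier, and the sketch assumes the number of teams is a power of two and that adjacent entries in the list play each other.

```python
def play_round(teams, predict_winner):
    """Pair adjacent teams and return the predicted winners, in order."""
    winners = []
    for i in range(0, len(teams), 2):
        one, two = teams[i], teams[i + 1]
        # predict_winner returns 1 or 2, matching the classifier's output.
        winners.append(one if predict_winner(one, two) == 1 else two)
    return winners


def build_bracket(teams, predict_winner):
    """Return a list of rounds, ending with a single champion."""
    rounds = [list(teams)]
    while len(rounds[-1]) > 1:
        rounds.append(play_round(rounds[-1], predict_winner))
    return rounds


def toy_predictor(one, two):
    """Stand-in for the classifier: the alphabetically earlier name wins."""
    return 1 if one < two else 2


bracket = build_bracket(["D", "A", "C", "B"], toy_predictor)
for round_number, teams in enumerate(bracket):
    print(round_number, teams)
```

In a real run, `toy_predictor` would be replaced by a function that looks up both teams' statistics, builds the difference features, and asks the trained classifier for a 1 or 2.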
Why Support Vector Machine
Support Vector Machines generally perform well on binary classification in a multi-dimensional feature space. During the development of this project, the SVM consistently performed better than Decision Trees, Naive Bayes classifiers, and Random Forests. The hyperparameters were selected using 10-fold cross-validation. The implementation comes from scikit-learn.
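A hyperparameter search with 10-fold cross-validation can be sketched with scikit-learn's `GridSearchCV`. This is only an illustration under stated assumptions: the data here is synthetic, the parameter grid and the feature scaling step are guesses rather than the settings used in this project, and the labels are shifted to 1 and 2 to match the classifier output described above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the game data: 5 difference features per game.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0,
    random_state=0,
)
y = y + 1  # relabel 0/1 as 1/2 (Team One wins / Team Two wins)

# Scaling before the SVM is an assumption, not a documented project step.
pipeline = make_pipeline(StandardScaler(), SVC())

# Hypothetical grid; the values actually searched are not documented.
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],
    "svc__kernel": ["linear", "rbf"],
}

# cv=10 requests 10-fold (stratified) cross-validation for each setting.
search = GridSearchCV(pipeline, param_grid, cv=10)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `search` behaves like the best estimator refit on all the data, so `search.predict` can be called directly on new matchup features.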
The classifier correctly predicted 27 of the 32 games in the First Round of the 2018 Tournament. The bracket scored in the 75th percentile of ESPN Bracket Challenge users.