The English Premier League is among the world's most popular sporting leagues, and I have followed it for almost a decade. The competition was founded in 1992, when the clubs of the old First Division broke away from the Football League with the backing of the Football Association, and it has blossomed into one of the most competitive soccer leagues in the world. As with any major sport, there is a large betting market around the league. As an avid fan who has played soccer his entire life, I want to know if I can beat the odds using machine learning.
For a given Premier League matchup, I want to use past results and match statistics to make an educated prediction as to whether a team will win. The aim is to build a machine learning model that can do this. Ideally, people could use this algorithm to gain the upper hand in whatever they like (betting, bragging rights, or anything else!).
Now that we have a high-level idea of what we want to do, let's break down the plan of action for building the best model for the task! The project roadmap begins with data cleaning and manipulation, followed by exploratory data analysis to gain insight into our predictor variables. Our objective is to use the other predictors to predict the class variable "Result," which records whether a team wins, loses, or draws a match. We will then perform a training/test split, run 10-fold cross-validation on the training set, and compare the performance of different models such as Logistic Regression, Decision Tree, and Random Forest. The model with the best cross-validated performance will be chosen, and its performance will then be assessed on the testing dataset.
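To make the roadmap concrete, here is a minimal sketch of the split, cross-validate, compare, and test workflow. It makes several assumptions that do not come from the project itself: it uses Python with scikit-learn, the data is synthetic placeholder data from `make_classification` (not real match statistics), the three classes merely stand in for win, draw, and loss, and accuracy is an arbitrary choice of metric. The names `models`, `cv_scores`, `best_name`, and `test_accuracy` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic placeholder data standing in for match statistics.
# The three classes stand in for win / draw / loss; this is NOT league data.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Hold out 20% as the test set, keeping class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate models named in the roadmap (hyperparameters are arbitrary here).
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# 10-fold cross-validation on the training set only; the test set stays untouched.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

# Pick the model with the best mean CV accuracy, refit it on the full
# training set, and evaluate it once on the held-out test set.
best_name = max(cv_scores, key=cv_scores.get)
best_model = models[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)

print("Mean CV accuracy per model:", cv_scores)
print("Selected model:", best_name)
print("Test accuracy:", test_accuracy)
```

The key design choice is that model selection uses only the cross-validation scores from the training set, so the test set gives an honest estimate of how the chosen model performs on unseen matches.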