# Introduction

This data science project aims to predict the outcome of Premier League matches for gameweek 30 by giving each team an attack rating (based on goals scored and xG) and a defence rating (based on goals conceded and xG against). The ratings also take form into account, giving recent matches more weight than older ones. We start with a very simple model and build towards more detailed models that better capture the factors affecting the score of a football match.

This model is still a preliminary idea, and several hyperparameters should be tuned, such as how fast the weights decay and the thresholds for predicting wins, draws and losses. Many more factors should also be considered, such as past results from the same fixture and player injuries. The approach could be expanded into a classification model that predicts every match as a home win, draw or away win.

In [27]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import poisson

## A first model

We assume that the number of goals a team scores in a match is given by a Poisson distribution with a given mean.

We start with a very simple Poisson model. The mean is simply the average number of goals the team has scored per match in the season so far (up to matchday 30 for us).
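To make the assumption concrete, the following minimal sketch evaluates the Poisson probability mass function, P(X = k) = exp(-lam) * lam^k / k!, using only the standard library. The helper name `poisson_pmf` and the mean of 1.5 goals per match are illustrative assumptions, not values taken from the data.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when goals follow Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical team averaging 1.5 goals per match (illustrative value)
lam = 1.5
for k in range(6):
    print(f"P({k} goals) = {poisson_pmf(k, lam):.3f}")
```

For a mean of this size, almost all of the probability sits on 0 to 4 goals, which is why the notebook only compares the counts 0 to 3 when looking for the most likely score.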

We then give four sets of results for the fixtures. The first reports the means themselves, to show what we would expect on average; it involves no randomness.
The second simulates the Poisson random variable 5 times and takes the average, which keeps some randomness while making the result more robust.
The third shows a single realization of the Poisson random variable, giving the result real randomness.
The last shows the most likely number of goals. This is not particularly interesting because it varies little between teams: a wide range of means gives the same most likely result.

The simulations depend on the random seed, so they will sometimes produce unlikely results, but over a season we expect the average of the simulated goals to be close to the mean fed into the model.
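The four summaries can be sketched for a single team as follows. This is a standard-library illustration under stated assumptions, not the notebook's own code: `poisson_sample` uses Knuth's multiplication method in place of `np.random.poisson`, and the means 0.7, 1.2 and 1.9 are made-up values.

```python
import math
import random

def poisson_pmf(k, lam):
    """Probability of exactly k goals under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def four_summaries(lam, rng, n_sims=5, max_goals=3):
    """Return (mean, average of n_sims draws, one draw, most likely count)."""
    avg_sim_goals = sum(poisson_sample(lam, rng) for _ in range(n_sims)) / n_sims
    sim_goals = poisson_sample(lam, rng)
    ml_goals = max(range(max_goals + 1), key=lambda k: poisson_pmf(k, lam))
    return lam, avg_sim_goals, sim_goals, ml_goals

rng = random.Random(0)  # fixed seed so the run is repeatable
for lam in (0.7, 1.2, 1.9):  # illustrative means, not taken from the data
    print(lam, four_summaries(lam, rng))
```

The most likely count is 1 for both 1.2 and 1.9, which illustrates why that column barely distinguishes teams.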

In [310]:
standings = pd.read_csv('soccer-standings.csv', header=1, index_col=0)
teams = standings.index
fixtures = pd.DataFrame([['Chelsea', 'West Bromwich Albion'], ['Leeds United', 'Sheffield United'], ['Leicester City', 'Manchester City'], ['Arsenal', 'Liverpool'], ['Southampton', 'Burnley'], ['Newcastle United', 'Tottenham Hotspur'], ['Aston Villa', 'Fulham'], ['Manchester United', 'Brighton & Hove Albion'], ['Everton', 'Crystal Palace'], ['Wolverhampton', 'West Ham United']])
fixtures.columns = ['Home', 'Away']

In [320]:
standings_1 = standings.copy()

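# Season-to-date goals scored per match: the Poisson mean for each team
standings_1['mean'] = standings_1['G']/standings_1['M']

# One Poisson realization per team
standings_1['score'] = np.random.poisson(lam=standings_1['mean'])

# Probability of scoring 0, 1, 2 or 3 goals; the most likely count is the column with the largest probability
for i in range(4):
    standings_1['{}'.format(i)] = poisson.pmf(i, standings_1['mean'])
standings_1['most_likely_scored'] = standings_1[['0', '1', '2', '3']].idxmax(axis=1)

# Average of 5 Poisson realizations per team
standings_1['match_goals_scored'] = np.mean(np.random.poisson(standings_1['mean'], (5, len(standings_1))), axis=0)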

In [321]:
fixtures_1 = fixtures.copy()
fixtures_1['home_mean_goals'] = 0
fixtures_1['away_mean_goals'] = 0

fixtures_1['home_avg_sim_goals'] = 0
fixtures_1['away_avg_sim_goals'] = 0

fixtures_1['home_sim_goals'] = 0
fixtures_1['away_sim_goals'] = 0

fixtures_1['home_ml_goals'] = 0
fixtures_1['away_ml_goals'] = 0




for i in range(len(standings)):
    for j in range(len(fixtures)):
        if fixtures_1.iloc[j,0] == teams[i]:
            fixtures_1.iloc[j,-8] = standings_1.iloc[i]['mean']
            fixtures_1.iloc[j,-6] = standings_1.iloc[i]['match_goals_scored']
            fixtures_1.iloc[j,-4] = standings_1.iloc[i]['score']
            fixtures_1.iloc[j,-2] = standings_1.iloc[i]['most_likely_scored']
        elif fixtures_1.iloc[j,1] == teams[i]:
            fixtures_1.iloc[j,-7] = standings_1.iloc[i]['mean']
            fixtures_1.iloc[j,-5] = standings_1.iloc[i]['match_goals_scored']
            fixtures_1.iloc[j,-3] = standings_1.iloc[i]['score']
            fixtures_1.iloc[j,-1] = standings_1.iloc[i]['most_likely_scored']

In [322]:
fixtures_1

match,Home,Away,home_mean_goals,away_mean_goals,home_avg_sim_goals,away_avg_sim_goals,home_sim_goals,away_sim_goals,home_ml_goals,away_ml_goals
0,Chelsea,West Bromwich Albion,1.517241,0.689655,2.6,1.0,1,1,1,0
1,Leeds United,Sheffield United,1.551724,0.551724,1.8,0.2,2,0,1,0
2,Leicester City,Manchester City,1.827586,2.133333,1.4,1.2,2,3,1,2
3,Arsenal,Liverpool,1.37931,1.655172,2.0,1.2,1,4,1,1
4,Southampton,Burnley,1.241379,0.758621,1.6,0.4,1,1,1,0
5,Newcastle United,Tottenham Hotspur,0.965517,1.689655,1.2,1.4,0,3,0,1
6,Aston Villa,Fulham,1.392857,0.766667,1.4,0.8,2,0,1,0
7,Manchester United,Brighton & Hove Albion,1.931034,1.103448,2.4,1.0,4,2,1,1
8,Everton,Crystal Palace,1.428571,1.068966,1.2,0.6,2,1,1,1
9,Wolverhampton,West Ham United,0.965517,1.551724,1.0,1.0,0,1,0,1


We see here that the most one-sided game is predicted to be Leeds United vs Sheffield United, in favour of Leeds, while Manchester City is set to take a narrow win over Leicester City. Arsenal vs Liverpool is also set to be a close game. Chelsea, Manchester United and Tottenham Hotspur are all set to win comfortably over their opponents (with a small chance of an upset).

Of course, this simple model is not going to be very good but it is a good starting point and we will improve it from here. One of the major drawbacks of this model is that it does not take into account any defensive ability of the teams whatsoever, and so we deal with this next.

## Taking into account defensive scores
We now include a defensive score in the analysis as well. The defensive score for a team is the average number of goals it concedes per game. We then take the mean of the Poisson distribution to be the average of the team's goals scored per game and the opposing team's goals conceded per game. So, for example, to predict the Chelsea vs West Brom game, we look at how many goals Chelsea scores per game on average and how many goals West Brom concedes per game on average to obtain the mean of the Poisson distribution for Chelsea's goals.
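As a worked illustration of this averaging, here is a minimal sketch. The team names and per-game figures are hypothetical, as is the helper `match_means`; only the formula follows the description above.

```python
# Hypothetical per-game averages (illustrative numbers, not from the data)
stats = {
    "Team A": {"scored": 1.6, "conceded": 0.9},
    "Team B": {"scored": 0.8, "conceded": 2.0},
}

def match_means(home, away, stats):
    """Poisson means for a fixture: the average of one side's scoring
    rate and the other side's conceding rate."""
    home_mean = (stats[home]["scored"] + stats[away]["conceded"]) / 2
    away_mean = (stats[away]["scored"] + stats[home]["conceded"]) / 2
    return home_mean, away_mean

home_mean, away_mean = match_means("Team A", "Team B", stats)
print(f"home mean: {home_mean:.2f}, away mean: {away_mean:.2f}")
```

A strong attack facing a leaky defence is pushed above its own scoring average, while a weak attack facing a solid defence is pulled below it.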


In [314]:
standings_2 = standings.copy()

standings_2['mean_f'] = standings_2['G']/standings_2['M']
standings_2['mean_a'] = standings_2['GA']/standings_2['M']

standings_2['opponents'] = ['Leicester City', 'Brighton & Hove Albion', 'Manchester City', 'West Bromwich Albion', 'Wolverhampton', 'Newcastle United', 'Arsenal', 'Crystal Palace', 'Liverpool', 'Fulham', 'Sheffield United', 'Everton', 'West Ham United', 'Burnley', 'Southampton', 'Manchester United', 'Tottenham Hotspur', 'Aston Villa', 'Chelsea', 'Leeds United']

standings_2['opp_mean_f'] = 0
standings_2['opp_mean_a'] = 0

for i in range(len(standings_2)):
    standings_2.loc[standings_2.index[i], 'opp_mean_f'] = standings_2.loc[standings_2.loc[standings_2.index[i], 'opponents'], 'mean_f']
    standings_2.loc[standings_2.index[i], 'opp_mean_a'] = standings_2.loc[standings_2.loc[standings_2.index[i], 'opponents'], 'mean_a']

standings_2['match_attack_score'] = (standings_2['mean_f'] + standings_2['opp_mean_a'])/2
standings_2['match_defence_score'] = (standings_2['mean_a'] + standings_2['opp_mean_f'])/2

standings_2['match_goals_scored'] = np.random.poisson(lam=standings_2['match_attack_score'])
for i in range(4):
    standings_2['{}'.format(i)] = poisson.pmf(i, standings_2['match_attack_score'])
standings_2['most_likely_scored'] = standings_2[['0', '1', '2', '3']].idxmax(axis=1)

standings_2['match_goals_scored_avg'] = np.mean(np.random.poisson(standings_2['match_attack_score'], (5,20)), axis=0)

In [315]:
fixtures_2 = fixtures.copy()
fixtures_2['home_mean_goals'] = 0
fixtures_2['away_mean_goals'] = 0

fixtures_2['home_avg_sim_goals'] = 0
fixtures_2['away_avg_sim_goals'] = 0

fixtures_2['home_sim_goals'] = 0
fixtures_2['away_sim_goals'] = 0

fixtures_2['home_ml_goals'] = 0
fixtures_2['away_ml_goals'] = 0




for i in range(len(standings)):
    for j in range(len(fixtures)):
        if fixtures_2.iloc[j,0] == teams[i]:
            fixtures_2.iloc[j,-8] = standings_2.iloc[i]['match_attack_score']
            fixtures_2.iloc[j,-6] = standings_2.iloc[i]['match_goals_scored_avg']
            fixtures_2.iloc[j,-4] = standings_2.iloc[i]['match_goals_scored']
            fixtures_2.iloc[j,-2] = standings_2.iloc[i]['most_likely_scored']
        elif fixtures_2.iloc[j,1] == teams[i]:
            fixtures_2.iloc[j,-7] = standings_2.iloc[i]['match_attack_score']
            fixtures_2.iloc[j,-5] = standings_2.iloc[i]['match_goals_scored_avg']
            fixtures_2.iloc[j,-3] = standings_2.iloc[i]['match_goals_scored']
            fixtures_2.iloc[j,-1] = standings_2.iloc[i]['most_likely_scored']

In [316]:
fixtures_2

match,Home,Away,home_mean_goals,away_mean_goals,home_avg_sim_goals,away_avg_sim_goals,home_sim_goals,away_sim_goals,home_ml_goals,away_ml_goals
0,Chelsea,West Bromwich Albion,1.741379,0.775862,1.2,1.0,3,1,1,0
1,Leeds United,Sheffield United,1.637931,1.086207,2.0,0.6,1,0,1,1
2,Leicester City,Manchester City,1.263793,1.618391,1.8,2.0,1,0,1,1
3,Arsenal,Liverpool,1.310345,1.37931,1.0,1.2,3,2,1,1
4,Southampton,Burnley,1.258621,1.258621,0.8,1.4,1,2,1,1
5,Newcastle United,Tottenham Hotspur,1.0,1.672414,1.4,1.6,1,1,0,1
6,Aston Villa,Fulham,1.329762,0.919048,2.0,1.2,3,0,1,0
7,Manchester United,Brighton & Hove Albion,1.586207,1.103448,1.6,2.4,1,1,1,1
8,Everton,Crystal Palace,1.524631,1.195197,1.0,1.2,1,2,1,1
9,Wolverhampton,West Ham United,1.086207,1.431034,1.0,1.6,1,1,1,1


We see that a lot has changed now that we include a defensive score for the teams. The most one-sided game is now predicted to be Chelsea vs West Bromwich Albion, with the Blues winning very comfortably. Leeds United is still set to win, but because of its poor defensive score the game is no longer predicted to be as one-sided as before. Arsenal vs Liverpool and Southampton vs Burnley are set to be the closest games and could go either way. Manchester City and Manchester United seem to have somewhat difficult fixtures that are not going to be entirely straightforward.

## Including form

Another thing we can add to this model is form, which gives recent matches more importance than older ones, such as those from last season. To do this, we use exponentially weighted averages to compute attacking and defensive scores for every team. As before, we then model the number of goals scored as a Poisson random variable whose mean is the average of the team's attacking score and the opposition's defensive score.
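The weighting can be written out by hand. The sketch below assumes the default `adjust=True` behaviour of pandas `ewm`, where `com=9` corresponds to a smoothing factor alpha = 1 / (1 + com) = 0.1 and a match that is `age` games old receives weight (1 - alpha)**age. The helper `ew_mean` and the goal sequence are illustrative, not part of the notebook.

```python
def ew_mean(values, com=9):
    """Exponentially weighted mean of `values` (oldest first).

    The most recent value has weight 1, the one before it (1 - alpha),
    then (1 - alpha)**2, and so on, with alpha = 1 / (1 + com).
    """
    alpha = 1.0 / (1.0 + com)
    n = len(values)
    weights = [(1 - alpha) ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

goals = [0, 0, 1, 3, 2]  # hypothetical goals per match, oldest first
print(f"plain mean:    {sum(goals) / len(goals):.3f}")
print(f"weighted mean: {ew_mean(goals):.3f}")
```

Because this hypothetical team scored more in its recent matches, the weighted mean comes out above the plain mean. With `com=9` the weights decay slowly, so the effect of form is modest; a smaller `com` would emphasise recent matches more strongly.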

In [169]:
everygame = pd.read_csv('everygame.csv')

In [236]:
team_results = {}
for i in range(len(teams)):
    team_results[i] = everygame[(everygame['Home']==teams[i]) | (everygame['Away']==teams[i])][['Home','xG_home', 'G_home', 'G_away', 'xG_away', 'Away']]

    team_results[i] = team_results[i].reset_index(drop=True)
    
    team_results[i]['xG_for'] = [team_results[i].iloc[x]['xG_home'] if team_results[i].iloc[x]['Home'] == teams[i] else team_results[i].iloc[x]['xG_away'] for x in range(len(team_results[i]))]
    team_results[i]['G_for'] = [team_results[i].iloc[x]['G_home'] if team_results[i].iloc[x]['Home'] == teams[i] else team_results[i].iloc[x]['G_away'] for x in range(len(team_results[i]))]
    team_results[i]['G_against'] = [team_results[i].iloc[x]['G_away'] if team_results[i].iloc[x]['Home'] == teams[i] else team_results[i].iloc[x]['G_home'] for x in range(len(team_results[i]))]
    team_results[i]['xG_against'] = [team_results[i].iloc[x]['xG_away'] if team_results[i].iloc[x]['Home'] == teams[i] else team_results[i].iloc[x]['xG_home'] for x in range(len(team_results[i]))]
    
    
    team_results[i]['ew_xG_for'] = team_results[i]['xG_for'].ewm(com=9).mean()
    team_results[i]['ew_G_for'] = team_results[i]['G_for'].ewm(com=9).mean()
    team_results[i]['ew_G_against'] = team_results[i]['G_against'].ewm(com=9).mean()
    team_results[i]['ew_xG_against'] = team_results[i]['xG_against'].ewm(com=9).mean()

In [331]:
standings_3 = standings.copy()

standings_3['mean_f'] = 0
standings_3['mean_a'] = 0

for i in range(len(standings)):
    
    standings_3.iloc[i , -2] = team_results[i].iloc[29]['ew_G_for']
    standings_3.iloc[i , -1] = team_results[i].iloc[29]['ew_G_against']

In [332]:
standings_3['opponents'] = ['Leicester City', 'Brighton & Hove Albion', 'Manchester City', 'West Bromwich Albion', 'Wolverhampton', 'Newcastle United', 'Arsenal', 'Crystal Palace', 'Liverpool', 'Fulham', 'Sheffield United', 'Everton', 'West Ham United', 'Burnley', 'Southampton', 'Manchester United', 'Tottenham Hotspur', 'Aston Villa', 'Chelsea', 'Leeds United']

standings_3['opp_mean_f'] = 0
standings_3['opp_mean_a'] = 0

for i in range(len(standings_3)):
    standings_3.loc[standings_3.index[i], 'opp_mean_f'] = standings_3.loc[standings_3.loc[standings_3.index[i], 'opponents'], 'mean_f']
    standings_3.loc[standings_3.index[i], 'opp_mean_a'] = standings_3.loc[standings_3.loc[standings_3.index[i], 'opponents'], 'mean_a']

standings_3['match_attack_score'] = (standings_3['mean_f'] + standings_3['opp_mean_a'])/2
standings_3['match_defence_score'] = (standings_3['mean_a'] + standings_3['opp_mean_f'])/2

standings_3['match_goals_scored'] = np.random.poisson(lam=standings_3['match_attack_score'])
for i in range(4):
    standings_3['{}'.format(i)] = poisson.pmf(i, standings_3['match_attack_score'])
standings_3['most_likely_scored'] = standings_3[['0', '1', '2', '3']].idxmax(axis=1)

standings_3['match_goals_scored_avg'] = np.mean(np.random.poisson(standings_3['match_attack_score'], (5,20)), axis=0)

In [333]:
fixtures_3 = fixtures.copy()
fixtures_3['home_mean_goals'] = 0
fixtures_3['away_mean_goals'] = 0

fixtures_3['home_avg_sim_goals'] = 0
fixtures_3['away_avg_sim_goals'] = 0

fixtures_3['home_sim_goals'] = 0
fixtures_3['away_sim_goals'] = 0

fixtures_3['home_ml_goals'] = 0
fixtures_3['away_ml_goals'] = 0




for i in range(len(standings)):
    for j in range(len(fixtures)):
        if fixtures_3.iloc[j,0] == teams[i]:
            fixtures_3.iloc[j,-8] = standings_3.iloc[i]['match_attack_score']
            fixtures_3.iloc[j,-6] = standings_3.iloc[i]['match_goals_scored_avg']
            fixtures_3.iloc[j,-4] = standings_3.iloc[i]['match_goals_scored']
            fixtures_3.iloc[j,-2] = standings_3.iloc[i]['most_likely_scored']
        elif fixtures_3.iloc[j,1] == teams[i]:
            fixtures_3.iloc[j,-7] = standings_3.iloc[i]['match_attack_score']
            fixtures_3.iloc[j,-5] = standings_3.iloc[i]['match_goals_scored_avg']
            fixtures_3.iloc[j,-3] = standings_3.iloc[i]['match_goals_scored']
            fixtures_3.iloc[j,-1] = standings_3.iloc[i]['most_likely_scored']

In [334]:
fixtures_3

match,Home,Away,home_mean_goals,away_mean_goals,home_avg_sim_goals,away_avg_sim_goals,home_sim_goals,away_sim_goals,home_ml_goals,away_ml_goals
0,Chelsea,West Bromwich Albion,1.332583,0.539666,1.4,0.6,1,2,1,0
1,Leeds United,Sheffield United,1.610309,0.912566,1.2,0.8,1,1,1,0
2,Leicester City,Manchester City,1.325656,1.771905,1.0,1.6,1,2,1,1
3,Arsenal,Liverpool,1.404392,1.163939,1.2,0.8,1,1,1,1
4,Southampton,Burnley,1.12114,1.524978,0.8,0.6,1,4,1,1
5,Newcastle United,Tottenham Hotspur,0.911202,1.66767,0.6,1.0,0,4,0,1
6,Aston Villa,Fulham,0.973783,0.848551,1.6,0.4,0,1,0,0
7,Manchester United,Brighton & Hove Albion,1.383741,0.917891,1.2,1.2,3,0,1,0
8,Everton,Crystal Palace,1.321123,1.090029,1.6,1.0,2,2,1,1
9,Wolverhampton,West Ham United,1.03127,1.420082,0.8,1.6,1,1,1,1


We see a few changes here. The most notable is that Arsenal vs Liverpool has now tipped in favour of the Gunners. This would be due to a combination of the Gunners' good form and the Reds' defensive woes in recent weeks. Southampton vs Burnley has also tipped the other way, with Burnley now the favourites.

Another feature we could include in the model is whether a team is playing at home or away. In normal circumstances this would be a major factor in the outcome of matches, as the home team has the support of thousands of fans in the stadium. Notable examples are Liverpool's home record over the last few seasons and Chelsea's run of 86 home games unbeaten. This season, however, with no fans in the stadiums, the difference between home and away games appears small: so far we have seen 118 home wins, 68 draws and 114 away wins. This shows no clear evidence of home advantage, so I do not include it in the model for now.
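The counts quoted above can be checked with a short calculation. The sketch below computes the share of each outcome and then, as one possible check that is not part of the original analysis, an exact two-sided binomial test on the decisive matches under the assumption that home and away wins are equally likely.

```python
from math import comb

home_wins, draws, away_wins = 118, 68, 114  # counts quoted in the text
total = home_wins + draws + away_wins

for label, count in [("home", home_wins), ("draw", draws), ("away", away_wins)]:
    print(f"{label}: {count / total:.1%}")

# Exact two-sided binomial test on decisive matches only, under the
# null hypothesis that a decisive match is a home win with probability 0.5.
n = home_wins + away_wins
observed_gap = abs(home_wins - n / 2)
p_value = sum(
    comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= observed_gap
) / 2 ** n
print(f"two-sided p-value: {p_value:.2f}")
```

A large p-value here is consistent with the decision to leave home advantage out for this season, although it does not prove that the effect is absent.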


## Including xG

So far we have used only goals to predict games. The use of xG has exploded over the last few years because it captures the number of goals a team would be expected to score from its chances, removing some of the good or bad luck the team may have had in a game. xG should not be treated as the be-all and end-all, however, because a team or player consistently scoring many more (or fewer) goals than their xG tells us something too. We therefore use both, taking the average of goals and xG.
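The blend can be sketched as a small function. The parameter `goal_weight` is a hypothetical addition for illustration; at its default of 0.5 the result is the plain average of the four quantities (goals for, xG for, opponent goals against, opponent xG against), and the input numbers are made up.

```python
def blended_rate(goals_for, xg_for, opp_goals_against, opp_xg_against,
                 goal_weight=0.5):
    """Blend goal-based and xG-based scoring rates for one side of a fixture.

    goal_weight=0.5 reproduces the plain average of the four inputs.
    """
    goal_based = (goals_for + opp_goals_against) / 2
    xg_based = (xg_for + opp_xg_against) / 2
    return goal_weight * goal_based + (1 - goal_weight) * xg_based

# Hypothetical exponentially weighted averages for one fixture
print(blended_rate(1.8, 1.4, 1.2, 1.6))                   # equal weighting
print(blended_rate(1.8, 1.4, 1.2, 1.6, goal_weight=0.3))  # leans on xG
```

Treating the weight as a tunable value is one way to express how much trust to place in xG relative to actual goals.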

In [373]:
matchday = 30

standings_4 = standings.copy()

standings_4['mean_f'] = 0
standings_4['x_mean_f'] = 0
standings_4['mean_a'] = 0
standings_4['x_mean_a'] = 0

for i in range(len(standings)):
    
    standings_4.iloc[i , -4] = team_results[i].iloc[matchday-1]['ew_G_for']
    standings_4.iloc[i , -3] = team_results[i].iloc[matchday-1]['ew_xG_for']
    standings_4.iloc[i , -2] = team_results[i].iloc[matchday-1]['ew_G_against']
    standings_4.iloc[i , -1] = team_results[i].iloc[matchday-1]['ew_xG_against']
    
standings_4['opponents'] = ['Leicester City', 'Brighton & Hove Albion', 'Manchester City', 'West Bromwich Albion', 'Wolverhampton', 'Newcastle United', 'Arsenal', 'Crystal Palace', 'Liverpool', 'Fulham', 'Sheffield United', 'Everton', 'West Ham United', 'Burnley', 'Southampton', 'Manchester United', 'Tottenham Hotspur', 'Aston Villa', 'Chelsea', 'Leeds United']

standings_4['opp_mean_f'] = 0
standings_4['opp_x_mean_f'] = 0
standings_4['opp_mean_a'] = 0
standings_4['opp_x_mean_a'] = 0

for i in range(len(standings_4)):
    standings_4.loc[standings_4.index[i], 'opp_mean_f'] = standings_4.loc[standings_4.loc[standings_4.index[i], 'opponents'], 'mean_f']
    standings_4.loc[standings_4.index[i], 'opp_x_mean_f'] = standings_4.loc[standings_4.loc[standings_4.index[i], 'opponents'], 'x_mean_f']
    standings_4.loc[standings_4.index[i], 'opp_mean_a'] = standings_4.loc[standings_4.loc[standings_4.index[i], 'opponents'], 'mean_a']
    standings_4.loc[standings_4.index[i], 'opp_x_mean_a'] = standings_4.loc[standings_4.loc[standings_4.index[i], 'opponents'], 'x_mean_a']

standings_4['match_attack_score'] = (standings_4['mean_f'] + standings_4['x_mean_f'] + standings_4['opp_mean_a'] + standings_4['opp_x_mean_a'])/4
standings_4['match_defence_score'] = (standings_4['mean_a'] + standings_4['x_mean_a'] + standings_4['opp_mean_f'] + standings_4['opp_x_mean_f'])/4

standings_4['match_goals_scored'] = np.random.poisson(lam=standings_4['match_attack_score'])
for i in range(4):
    standings_4['{}'.format(i)] = poisson.pmf(i, standings_4['match_attack_score'])
standings_4['most_likely_scored'] = standings_4[['0', '1', '2', '3']].idxmax(axis=1)

standings_4['match_goals_scored_avg'] = np.mean(np.random.poisson(standings_4['match_attack_score'], (5,20)), axis=0)

In [374]:
fixtures_4 = fixtures.copy()
fixtures_4['home_mean_goals'] = 0
fixtures_4['away_mean_goals'] = 0

fixtures_4['home_avg_sim_goals'] = 0
fixtures_4['away_avg_sim_goals'] = 0

fixtures_4['home_sim_goals'] = 0
fixtures_4['away_sim_goals'] = 0

fixtures_4['home_ml_goals'] = 0
fixtures_4['away_ml_goals'] = 0




for i in range(len(standings)):
    for j in range(len(fixtures)):
        if fixtures_4.iloc[j,0] == teams[i]:
            fixtures_4.iloc[j,-8] = standings_4.iloc[i]['match_attack_score']
            fixtures_4.iloc[j,-6] = standings_4.iloc[i]['match_goals_scored_avg']
            fixtures_4.iloc[j,-4] = standings_4.iloc[i]['match_goals_scored']
            fixtures_4.iloc[j,-2] = standings_4.iloc[i]['most_likely_scored']
        elif fixtures_4.iloc[j,1] == teams[i]:
            fixtures_4.iloc[j,-7] = standings_4.iloc[i]['match_attack_score']
            fixtures_4.iloc[j,-5] = standings_4.iloc[i]['match_goals_scored_avg']
            fixtures_4.iloc[j,-3] = standings_4.iloc[i]['match_goals_scored']
            fixtures_4.iloc[j,-1] = standings_4.iloc[i]['most_likely_scored']

In [375]:
fixtures_4

match,Home,Away,home_mean_goals,away_mean_goals,home_avg_sim_goals,away_avg_sim_goals,home_sim_goals,away_sim_goals,home_ml_goals,away_ml_goals
0,Chelsea,West Bromwich Albion,1.409424,0.628405,1.2,0.2,3,0,1,0
1,Leeds United,Sheffield United,1.582416,1.002662,1.0,1.0,2,0,1,1
2,Leicester City,Manchester City,1.268313,1.69625,0.6,1.0,0,2,1,1
3,Arsenal,Liverpool,1.395475,1.266028,2.0,1.2,0,0,1,1
4,Southampton,Burnley,1.200252,1.372793,2.6,1.0,0,0,1,1
5,Newcastle United,Tottenham Hotspur,0.945984,1.507728,0.6,0.6,0,4,0,1
6,Aston Villa,Fulham,1.064555,1.035897,1.2,1.2,1,1,1,1
7,Manchester United,Brighton & Hove Albion,1.26852,1.073446,0.4,1.4,5,1,1,1
8,Everton,Crystal Palace,1.321161,1.096714,0.8,1.8,5,2,1,1
9,Wolverhampton,West Ham United,1.067769,1.365233,1.8,1.6,1,3,1,1


We now discuss some drawbacks of assuming a Poisson distribution. Firstly, a Poisson distribution assumes that goals are independent of each other, which is debatable: when a goal is scored, the likelihood of further goals increases because the dynamic of the match changes and a more open game creates more chances. Another drawback is that the Poisson distribution takes every integer value from 0 to infinity. It assigns a non-zero probability to a team scoring 20 goals, for instance, but we know that such an event is effectively impossible and should have probability 0. It is unclear where the cut-off should be, though: we would not have expected Leicester City or Manchester United to score 9 goals against Southampton, but they did. Lastly, each simulated score is only one realization of the Poisson random variable and is therefore not at all robust. We can take more realizations and average them, but we still want to capture some randomness, as unlikely events do occur and averaging just moves us towards the mean.
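To get a feel for how much probability the Poisson model places on extreme scores, the sketch below sums the upper tail directly. The mean of 1.5 goals is an illustrative assumption, and `poisson_tail` is a hypothetical helper that truncates the infinite sum after 60 terms, which is more than enough for means of this size.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_tail(k, lam, n_terms=60):
    """Approximate P(X >= k) by summing n_terms of the upper tail."""
    return sum(poisson_pmf(i, lam) for i in range(k, k + n_terms))

lam = 1.5  # hypothetical mean goals per match
print(f"P(at least 9 goals)  = {poisson_tail(9, lam):.2e}")
print(f"P(at least 20 goals) = {poisson_tail(20, lam):.2e}")
```

Both probabilities are positive, but the second is many orders of magnitude smaller than the first, so in practice the unbounded support matters far less than the independence assumption.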

We also test this model on the last 5 matchdays.

In [384]:
matchday = 29

standings_5 = standings.copy()

standings_5['mean_f'] = 0
standings_5['x_mean_f'] = 0
standings_5['mean_a'] = 0
standings_5['x_mean_a'] = 0

for i in range(len(standings)):
    
    standings_5.iloc[i , -4] = team_results[i].iloc[matchday-1]['ew_G_for']
    standings_5.iloc[i , -3] = team_results[i].iloc[matchday-1]['ew_xG_for']
    standings_5.iloc[i , -2] = team_results[i].iloc[matchday-1]['ew_G_against']
    standings_5.iloc[i , -1] = team_results[i].iloc[matchday-1]['ew_xG_against']
    
standings_5['opponents'] = ['Wolverhampton', 'Crystal Palace', 'Burnley', 'Liverpool', 'Arsenal', 'Southampton', 'Chelsea', 'West Bromwich Albion', 'West Ham United', 'Sheffield United', 'Fulham', 'Manchester United', 'Manchester City', 'Tottenham Hotspur', 'Leicester City', 'Newcastle United', 'Brighton & Hove Albion', 'Leeds United', 'Everton', 'Aston Villa']

standings_5['opp_mean_f'] = 0
standings_5['opp_x_mean_f'] = 0
standings_5['opp_mean_a'] = 0
standings_5['opp_x_mean_a'] = 0

for i in range(len(standings_5)):
    standings_5.loc[standings_5.index[i], 'opp_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_f']
    standings_5.loc[standings_5.index[i], 'opp_x_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_f']
    standings_5.loc[standings_5.index[i], 'opp_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_a']
    standings_5.loc[standings_5.index[i], 'opp_x_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_a']

standings_5['match_attack_score'] = (standings_5['mean_f'] + standings_5['x_mean_f'] + standings_5['opp_mean_a'] + standings_5['opp_x_mean_a'])/4
standings_5['match_defence_score'] = (standings_5['mean_a'] + standings_5['x_mean_a'] + standings_5['opp_mean_f'] + standings_5['opp_x_mean_f'])/4

standings_5['match_goals_scored'] = np.random.poisson(lam=standings_5['match_attack_score'])
for i in range(4):
    standings_5['{}'.format(i)] = poisson.pmf(i, standings_5['match_attack_score'])
standings_5['most_likely_scored'] = standings_5[['0', '1', '2', '3']].idxmax(axis=1)

standings_5['match_goals_scored_avg'] = np.mean(np.random.poisson(standings_5['match_attack_score'], (5,20)), axis=0)

fixtures_matchday_29 = pd.DataFrame([['Crystal Palace', 'Manchester United'], ['West Bromwich Albion', 'Everton'], ['Liverpool', 'Chelsea'], ['Fulham', 'Leeds United'], ['Brighton & Hove Albion', 'Newcastle United'], ['Manchester City', 'Wolverhampton'], ['Sheffield United', 'Aston Villa'], ['Burnley', 'Leicester City'], ['Arsenal', 'West Ham United'], ['Southampton', 'Tottenham Hotspur']])
fixtures_matchday_29.columns = ['Home', 'Away']
fixtures_5 = fixtures_matchday_29.copy()
fixtures_5['home_mean_goals'] = 0
fixtures_5['away_mean_goals'] = 0

fixtures_5['home_avg_sim_goals'] = 0
fixtures_5['away_avg_sim_goals'] = 0

fixtures_5['home_sim_goals'] = 0
fixtures_5['away_sim_goals'] = 0

fixtures_5['home_ml_goals'] = 0
fixtures_5['away_ml_goals'] = 0




for i in range(len(standings)):
    for j in range(len(fixtures_5)):
        if fixtures_5.iloc[j,0] == teams[i]:
            fixtures_5.iloc[j,-8] = standings_5.iloc[i]['match_attack_score']
            fixtures_5.iloc[j,-6] = standings_5.iloc[i]['match_goals_scored_avg']
            fixtures_5.iloc[j,-4] = standings_5.iloc[i]['match_goals_scored']
            fixtures_5.iloc[j,-2] = standings_5.iloc[i]['most_likely_scored']
        elif fixtures_5.iloc[j,1] == teams[i]:
            fixtures_5.iloc[j,-7] = standings_5.iloc[i]['match_attack_score']
            fixtures_5.iloc[j,-5] = standings_5.iloc[i]['match_goals_scored_avg']
            fixtures_5.iloc[j,-3] = standings_5.iloc[i]['match_goals_scored']
            fixtures_5.iloc[j,-1] = standings_5.iloc[i]['most_likely_scored']

In [385]:
fixtures_5

match,Home,Away,home_mean_goals,away_mean_goals,home_avg_sim_goals,away_avg_sim_goals,home_sim_goals,away_sim_goals,home_ml_goals,away_ml_goals
0,Crystal Palace,Manchester United,0.816823,1.576085,0.6,1.2,2,1,0,1
1,West Bromwich Albion,Everton,1.048067,1.288804,1.4,2.0,1,2,1,1
2,Liverpool,Chelsea,0.94704,1.25931,0.6,2.2,2,4,0,1
3,Fulham,Leeds United,1.111489,1.204576,0.6,1.0,1,1,1,1
4,Brighton & Hove Albion,Newcastle United,1.384941,0.86557,2.0,1.2,3,0,1,0
5,Manchester City,Wolverhampton,1.728612,0.892844,2.0,0.4,2,0,1,0
6,Sheffield United,Aston Villa,0.907514,1.408015,0.6,1.2,0,1,0,1
7,Burnley,Leicester City,1.037881,1.563276,1.0,1.0,2,1,1,1
8,Arsenal,West Ham United,1.410923,1.375819,1.6,1.6,2,1,1,1
9,Southampton,Tottenham Hotspur,1.05612,1.647042,1.0,1.0,1,3,1,1


| match | predicted | result |
|---|---|---|
| 0 | away | draw |
| 1 | draw/away | away |
| 2 | away | away |
| 3 | away/draw | away |
| 4 | home | home |
| 5 | home | home |
| 6 | away | home |
| 7 | away | draw |
| 8 | home/draw | draw |
| 9 | | |

Accuracy: 6/9
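One hedged way to turn a pair of mean goals into a home/draw/away call is to compute the outcome probabilities directly. The sketch below assumes the two scores are independent Poisson variables and truncates each at `max_goals`; both are simplifying assumptions, and `outcome_probabilities` is an illustrative helper rather than the rule used to produce the predictions above.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def outcome_probabilities(home_mean, away_mean, max_goals=10):
    """Return (P(home win), P(draw), P(away win)) for independent Poisson scores."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away

# Means for Manchester City vs Wolverhampton from the matchday 29 output
p_home, p_draw, p_away = outcome_probabilities(1.728612, 0.892844)
print(f"home {p_home:.2f}, draw {p_draw:.2f}, away {p_away:.2f}")
```

Thresholds on these probabilities could then replace the informal labels such as "home/draw", and they would be natural hyperparameters to tune.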

In [395]:
matchday = 28

standings_5 = standings.copy()

standings_5['mean_f'] = 0
standings_5['x_mean_f'] = 0
standings_5['mean_a'] = 0
standings_5['x_mean_a'] = 0

for i in range(len(standings)):
    
    standings_5.iloc[i , -4] = team_results[i].iloc[matchday-1]['ew_G_for']
    standings_5.iloc[i , -3] = team_results[i].iloc[matchday-1]['ew_xG_for']
    standings_5.iloc[i , -2] = team_results[i].iloc[matchday-1]['ew_G_against']
    standings_5.iloc[i , -1] = team_results[i].iloc[matchday-1]['ew_xG_against']

fixtures_matchday_28 = pd.DataFrame([['Crystal Palace', 'West Bromwich Albion'], ['Everton', 'Burnley'], ['Leeds United', 'Chelsea'], ['Fulham', 'Manchester City'], ['Manchester United', 'West Ham United'], ['Wolverhampton', 'Liverpool'], ['Newcastle United', 'Aston Villa'], ['Leicester City', 'Sheffield United'], ['Arsenal', 'Tottenham Hotspur'], ['Southampton', 'Brighton & Hove Albion']])
fixtures_matchday_28.columns = ['Home', 'Away']

# Look up each team's matchday 28 opponent from the fixture list
standings_5['opponents'] = ''
for team in range(len(teams)):
    for i in range(len(fixtures_matchday_28)):
        if fixtures_matchday_28.iloc[i,0] == teams[team]:
            standings_5.iloc[team, -1] = fixtures_matchday_28.iloc[i,1]
        elif fixtures_matchday_28.iloc[i,1] == teams[team]:
            standings_5.iloc[team, -1] = fixtures_matchday_28.iloc[i,0]


standings_5['opp_mean_f'] = 0
standings_5['opp_x_mean_f'] = 0
standings_5['opp_mean_a'] = 0
standings_5['opp_x_mean_a'] = 0

for i in range(len(standings_5)):
    standings_5.loc[standings_5.index[i], 'opp_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_f']
    standings_5.loc[standings_5.index[i], 'opp_x_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_f']
    standings_5.loc[standings_5.index[i], 'opp_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_a']
    standings_5.loc[standings_5.index[i], 'opp_x_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_a']

standings_5['match_attack_score'] = (standings_5['mean_f'] + standings_5['x_mean_f'] + standings_5['opp_mean_a'] + standings_5['opp_x_mean_a'])/4
standings_5['match_defence_score'] = (standings_5['mean_a'] + standings_5['x_mean_a'] + standings_5['opp_mean_f'] + standings_5['opp_x_mean_f'])/4

standings_5['match_goals_scored'] = np.random.poisson(lam=standings_5['match_attack_score'])
for i in range(4):
    standings_5['{}'.format(i)] = poisson.pmf(i, standings_5['match_attack_score'])
standings_5['most_likely_scored'] = standings_5[['0', '1', '2', '3']].idxmax(axis=1)

standings_5['match_goals_scored_avg'] = np.mean(np.random.poisson(standings_5['match_attack_score'], (5,20)), axis=0)


fixtures_5 = fixtures_matchday_28.copy()
fixtures_5['home_mean_goals'] = 0
fixtures_5['away_mean_goals'] = 0

fixtures_5['home_avg_sim_goals'] = 0
fixtures_5['away_avg_sim_goals'] = 0

fixtures_5['home_sim_goals'] = 0
fixtures_5['away_sim_goals'] = 0

fixtures_5['home_ml_goals'] = 0
fixtures_5['away_ml_goals'] = 0




for i in range(len(standings)):
    for j in range(len(fixtures_5)):
        if fixtures_5.iloc[j,0] == teams[i]:
            fixtures_5.iloc[j,-8] = standings_5.iloc[i]['match_attack_score']
            fixtures_5.iloc[j,-6] = standings_5.iloc[i]['match_goals_scored_avg']
            fixtures_5.iloc[j,-4] = standings_5.iloc[i]['match_goals_scored']
            fixtures_5.iloc[j,-2] = standings_5.iloc[i]['most_likely_scored']
        elif fixtures_5.iloc[j,1] == teams[i]:
            fixtures_5.iloc[j,-7] = standings_5.iloc[i]['match_attack_score']
            fixtures_5.iloc[j,-5] = standings_5.iloc[i]['match_goals_scored_avg']
            fixtures_5.iloc[j,-3] = standings_5.iloc[i]['match_goals_scored']
            fixtures_5.iloc[j,-1] = standings_5.iloc[i]['most_likely_scored']
        


In [397]:
fixtures_5

match,Home,Away,home_mean_goals,away_mean_goals,home_avg_sim_goals,away_avg_sim_goals,home_sim_goals,away_sim_goals,home_ml_goals,away_ml_goals
0,Crystal Palace,West Bromwich Albion,1.11247,1.182207,0.8,1.0,0,1,1,1
1,Everton,Burnley,1.230482,1.142843,1.2,1.2,0,1,1,1
2,Leeds United,Chelsea,0.946049,1.428779,1.2,1.4,0,2,0,1
3,Fulham,Manchester City,0.81025,1.491116,1.4,1.0,1,2,0,1
4,Manchester United,West Ham United,1.372018,1.16396,2.0,1.8,1,3,1,1
5,Wolverhampton,Liverpool,1.128805,1.27953,1.4,1.0,3,0,1,1
6,Newcastle United,Aston Villa,1.006067,1.233222,1.2,1.6,2,5,1,1
7,Leicester City,Sheffield United,1.492268,0.943645,0.6,1.4,0,0,1,0
8,Arsenal,Tottenham Hotspur,1.356752,1.258254,2.6,1.6,1,1,1,1
9,Southampton,Brighton & Hove Albion,1.02855,1.494857,1.2,1.4,2,5,1,1


match - predicted - result
0 - draw - home
1 - draw - away
2 - away - draw
3 - away - away
4 - home - home
5 - away/draw - away
6 - away - draw
7 - home - home
8 - home/draw - home
9 - away - away

accuracy - 6/10

In [406]:
matchday = 27

standings_5 = standings.copy()

standings_5['mean_f'] = 0
standings_5['x_mean_f'] = 0
standings_5['mean_a'] = 0
standings_5['x_mean_a'] = 0

for i in range(len(standings)):
    
    standings_5.iloc[i , -4] = team_results[i].iloc[matchday-1]['ew_G_for']
    standings_5.iloc[i , -3] = team_results[i].iloc[matchday-1]['ew_xG_for']
    standings_5.iloc[i , -2] = team_results[i].iloc[matchday-1]['ew_G_against']
    standings_5.iloc[i , -1] = team_results[i].iloc[matchday-1]['ew_xG_against']

fixtures_matchday_27 = pd.DataFrame([['Burnley', 'Arsenal'], ['Sheffield United', 'Southampton'], ['Aston Villa', 'Wolverhampton'], ['Brighton & Hove Albion', 'Leicester City'], ['West Bromwich Albion', 'Newcastle United'], ['Liverpool', 'Fulham'], ['Manchester City', 'Manchester United'], ['Tottenham Hotspur', 'Crystal Palace'], ['Chelsea', 'Everton'], ['West Ham United', 'Leeds United']])
fixtures_matchday_27.columns = ['Home', 'Away']

standings_5['opponents'] = 0
for team in range(len(teams)):
    for i in range(len(fixtures_matchday_27)):
        if fixtures_matchday_27.iloc[i,0] == teams[team]:
            standings_5.iloc[team, -1] = fixtures_matchday_27.iloc[i,1]
        elif fixtures_matchday_27.iloc[i,1] == teams[team]:
            standings_5.iloc[team, -1] = fixtures_matchday_27.iloc[i,0]


standings_5['opp_mean_f'] = 0
standings_5['opp_x_mean_f'] = 0
standings_5['opp_mean_a'] = 0
standings_5['opp_x_mean_a'] = 0

for i in range(len(standings_5)):
    standings_5.loc[standings_5.index[i], 'opp_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_f']
    standings_5.loc[standings_5.index[i], 'opp_x_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_f']
    standings_5.loc[standings_5.index[i], 'opp_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_a']
    standings_5.loc[standings_5.index[i], 'opp_x_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_a']

standings_5['match_attack_score'] = (standings_5['mean_f'] + standings_5['x_mean_f'] + standings_5['opp_mean_a'] + standings_5['opp_x_mean_a'])/4
standings_5['match_defence_score'] = (standings_5['mean_a'] + standings_5['x_mean_a'] + standings_5['opp_mean_f'] + standings_5['opp_x_mean_f'])/4

standings_5['match_goals_scored'] = np.random.poisson(lam=standings_5['match_attack_score'])
for i in range(4):
    standings_5['{}'.format(i)] = poisson.pmf(i, standings_5['match_attack_score'])
standings_5['most_likely_scored'] = standings_5[['0', '1', '2', '3']].idxmax(axis=1)

standings_5['match_goals_scored_avg'] = np.mean(np.random.poisson(standings_5['match_attack_score'], (5,20)), axis=0)


fixtures_5 = fixtures_matchday_27.copy()
fixtures_5['home_mean_goals'] = 0
fixtures_5['away_mean_goals'] = 0

fixtures_5['home_avg_sim_goals'] = 0
fixtures_5['away_avg_sim_goals'] = 0

fixtures_5['home_sim_goals'] = 0
fixtures_5['away_sim_goals'] = 0

fixtures_5['home_ml_goals'] = 0
fixtures_5['away_ml_goals'] = 0




for i in range(len(standings)):
    for j in range(len(fixtures_5)):
        if fixtures_5.iloc[j,0] == teams[i]:
            fixtures_5.iloc[j,-8] = standings_5.iloc[i]['match_attack_score']
            fixtures_5.iloc[j,-6] = standings_5.iloc[i]['match_goals_scored_avg']
            fixtures_5.iloc[j,-4] = standings_5.iloc[i]['match_goals_scored']
            fixtures_5.iloc[j,-2] = standings_5.iloc[i]['most_likely_scored']
        elif fixtures_5.iloc[j,1] == teams[i]:
            fixtures_5.iloc[j,-7] = standings_5.iloc[i]['match_attack_score']
            fixtures_5.iloc[j,-5] = standings_5.iloc[i]['match_goals_scored_avg']
            fixtures_5.iloc[j,-3] = standings_5.iloc[i]['match_goals_scored']
            fixtures_5.iloc[j,-1] = standings_5.iloc[i]['most_likely_scored']
        


In [407]:
fixtures_5

Unnamed: 0,Home,Away,home_mean_goals,away_mean_goals,home_avg_sim_goals,away_avg_sim_goals,home_sim_goals,away_sim_goals,home_ml_goals,away_ml_goals
0,Burnley,Arsenal,0.969437,1.421571,0.6,1.2,0,0,0,1
1,Sheffield United,Southampton,1.133796,1.226019,0.6,0.8,2,1,1,1
2,Aston Villa,Wolverhampton,1.201662,1.066857,1.0,1.6,2,1,1,1
3,Brighton & Hove Albion,Leicester City,1.200125,1.223523,1.0,0.4,0,0,1,1
4,West Bromwich Albion,Newcastle United,1.105085,1.272509,0.6,1.6,0,1,1,1
5,Liverpool,Fulham,1.200537,1.079471,0.2,2.0,1,1,1,1
6,Manchester City,Manchester United,1.588027,1.132531,1.4,1.4,2,1,1,1
7,Tottenham Hotspur,Crystal Palace,1.500129,0.911925,1.6,2.4,1,1,1,0
8,Chelsea,Everton,1.286123,0.946935,1.0,0.4,0,0,1,0
9,West Ham United,Leeds United,1.518918,1.220639,1.8,1.0,1,1,1,1


match - predicted - result
0 - away - draw
1 - away/draw - away
2 - home/draw - draw
3 - draw - away
4 - away/draw - draw
5 - home/draw - away
6 - home - away
7 - home - home
8 - home - home
9 - home - home

accuracy - 6/10

In [408]:
matchday = 26

standings_5 = standings.copy()

standings_5['mean_f'] = 0
standings_5['x_mean_f'] = 0
standings_5['mean_a'] = 0
standings_5['x_mean_a'] = 0

for i in range(len(standings)):
    
    standings_5.iloc[i , -4] = team_results[i].iloc[matchday-1]['ew_G_for']
    standings_5.iloc[i , -3] = team_results[i].iloc[matchday-1]['ew_xG_for']
    standings_5.iloc[i , -2] = team_results[i].iloc[matchday-1]['ew_G_against']
    standings_5.iloc[i , -1] = team_results[i].iloc[matchday-1]['ew_xG_against']

fixtures_matchday_26 = pd.DataFrame([['Manchester City', 'West Ham United'], ['West Bromwich Albion', 'Brighton & Hove Albion'], ['Leeds United', 'Aston Villa'], ['Newcastle United', 'Wolverhampton'], ['Crystal Palace', 'Fulham'], ['Leicester City', 'Arsenal'], ['Tottenham Hotspur', 'Burnley'], ['Chelsea', 'Manchester United'], ['Sheffield United', 'Liverpool'], ['Everton', 'Southampton']])
fixtures_matchday_26.columns = ['Home', 'Away']

standings_5['opponents'] = 0
for team in range(len(teams)):
    for i in range(len(fixtures_matchday_26)):
        if fixtures_matchday_26.iloc[i,0] == teams[team]:
            standings_5.iloc[team, -1] = fixtures_matchday_26.iloc[i,1]
        elif fixtures_matchday_26.iloc[i,1] == teams[team]:
            standings_5.iloc[team, -1] = fixtures_matchday_26.iloc[i,0]


standings_5['opp_mean_f'] = 0
standings_5['opp_x_mean_f'] = 0
standings_5['opp_mean_a'] = 0
standings_5['opp_x_mean_a'] = 0

for i in range(len(standings_5)):
    standings_5.loc[standings_5.index[i], 'opp_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_f']
    standings_5.loc[standings_5.index[i], 'opp_x_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_f']
    standings_5.loc[standings_5.index[i], 'opp_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_a']
    standings_5.loc[standings_5.index[i], 'opp_x_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_a']

standings_5['match_attack_score'] = (standings_5['mean_f'] + standings_5['x_mean_f'] + standings_5['opp_mean_a'] + standings_5['opp_x_mean_a'])/4
standings_5['match_defence_score'] = (standings_5['mean_a'] + standings_5['x_mean_a'] + standings_5['opp_mean_f'] + standings_5['opp_x_mean_f'])/4

standings_5['match_goals_scored'] = np.random.poisson(lam=standings_5['match_attack_score'])
for i in range(4):
    standings_5['{}'.format(i)] = poisson.pmf(i, standings_5['match_attack_score'])
standings_5['most_likely_scored'] = standings_5[['0', '1', '2', '3']].idxmax(axis=1)

standings_5['match_goals_scored_avg'] = np.mean(np.random.poisson(standings_5['match_attack_score'], (5,20)), axis=0)


fixtures_5 = fixtures_matchday_26.copy()
fixtures_5['home_mean_goals'] = 0
fixtures_5['away_mean_goals'] = 0

fixtures_5['home_avg_sim_goals'] = 0
fixtures_5['away_avg_sim_goals'] = 0

fixtures_5['home_sim_goals'] = 0
fixtures_5['away_sim_goals'] = 0

fixtures_5['home_ml_goals'] = 0
fixtures_5['away_ml_goals'] = 0




for i in range(len(standings)):
    for j in range(len(fixtures_5)):
        if fixtures_5.iloc[j,0] == teams[i]:
            fixtures_5.iloc[j,-8] = standings_5.iloc[i]['match_attack_score']
            fixtures_5.iloc[j,-6] = standings_5.iloc[i]['match_goals_scored_avg']
            fixtures_5.iloc[j,-4] = standings_5.iloc[i]['match_goals_scored']
            fixtures_5.iloc[j,-2] = standings_5.iloc[i]['most_likely_scored']
        elif fixtures_5.iloc[j,1] == teams[i]:
            fixtures_5.iloc[j,-7] = standings_5.iloc[i]['match_attack_score']
            fixtures_5.iloc[j,-5] = standings_5.iloc[i]['match_goals_scored_avg']
            fixtures_5.iloc[j,-3] = standings_5.iloc[i]['match_goals_scored']
            fixtures_5.iloc[j,-1] = standings_5.iloc[i]['most_likely_scored']
        


In [409]:
fixtures_5

Unnamed: 0,Home,Away,home_mean_goals,away_mean_goals,home_avg_sim_goals,away_avg_sim_goals,home_sim_goals,away_sim_goals,home_ml_goals,away_ml_goals
0,Manchester City,West Ham United,1.596139,1.02857,1.2,1.0,2,2,1,1
1,West Bromwich Albion,Brighton & Hove Albion,0.853792,1.449754,0.4,1.4,1,5,0,1
2,Leeds United,Aston Villa,1.297044,1.315002,1.6,1.4,2,0,1,1
3,Newcastle United,Wolverhampton,1.042432,1.33018,1.0,2.2,1,2,1,1
4,Crystal Palace,Fulham,0.853597,1.260421,1.4,1.6,0,1,0,1
5,Leicester City,Arsenal,1.265246,1.36685,1.8,1.6,1,0,1,1
6,Tottenham Hotspur,Burnley,1.341236,0.974431,2.0,1.2,1,2,1,0
7,Chelsea,Manchester United,1.18326,1.283626,0.6,2.0,0,1,1,1
8,Sheffield United,Liverpool,0.990362,1.59328,0.6,1.6,0,1,0,1
9,Everton,Southampton,1.466197,1.097248,1.2,0.4,1,1,1,1


match - predicted - result
0 - home - home
1 - away - home
2 - draw - away
3 - away - draw
4 - away - draw
5 - away/draw - away
6 - home - home
7 - away/draw - draw
8 - away - away
9 - home - home

accuracy - 6/10

In [413]:
matchday = 25

standings_5 = standings.copy()

standings_5['mean_f'] = 0
standings_5['x_mean_f'] = 0
standings_5['mean_a'] = 0
standings_5['x_mean_a'] = 0

for i in range(len(standings)):
    
    standings_5.iloc[i , -4] = team_results[i].iloc[matchday-1]['ew_G_for']
    standings_5.iloc[i , -3] = team_results[i].iloc[matchday-1]['ew_xG_for']
    standings_5.iloc[i , -2] = team_results[i].iloc[matchday-1]['ew_G_against']
    standings_5.iloc[i , -1] = team_results[i].iloc[matchday-1]['ew_xG_against']

fixtures_matchday_25 = pd.DataFrame([['Wolverhampton', 'Leeds United'], ['Southampton', 'Chelsea'], ['Burnley', 'West Bromwich Albion'], ['Liverpool', 'Everton'], ['Fulham', 'Sheffield United'], ['West Ham United', 'Tottenham Hotspur'], ['Aston Villa', 'Leicester City'], ['Arsenal', 'Manchester City'], ['Manchester United', 'Newcastle United'], ['Brighton & Hove Albion', 'Crystal Palace']])
fixtures_matchday_25.columns = ['Home', 'Away']

standings_5['opponents'] = 0
for team in range(len(teams)):
    for i in range(len(fixtures_matchday_25)):
        if fixtures_matchday_25.iloc[i,0] == teams[team]:
            standings_5.iloc[team, -1] = fixtures_matchday_25.iloc[i,1]
        elif fixtures_matchday_25.iloc[i,1] == teams[team]:
            standings_5.iloc[team, -1] = fixtures_matchday_25.iloc[i,0]


standings_5['opp_mean_f'] = 0
standings_5['opp_x_mean_f'] = 0
standings_5['opp_mean_a'] = 0
standings_5['opp_x_mean_a'] = 0

for i in range(len(standings_5)):
    standings_5.loc[standings_5.index[i], 'opp_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_f']
    standings_5.loc[standings_5.index[i], 'opp_x_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_f']
    standings_5.loc[standings_5.index[i], 'opp_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_a']
    standings_5.loc[standings_5.index[i], 'opp_x_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_a']

standings_5['match_attack_score'] = (standings_5['mean_f'] + standings_5['x_mean_f'] + standings_5['opp_mean_a'] + standings_5['opp_x_mean_a'])/4
standings_5['match_defence_score'] = (standings_5['mean_a'] + standings_5['x_mean_a'] + standings_5['opp_mean_f'] + standings_5['opp_x_mean_f'])/4

standings_5['match_goals_scored'] = np.random.poisson(lam=standings_5['match_attack_score'])
for i in range(4):
    standings_5['{}'.format(i)] = poisson.pmf(i, standings_5['match_attack_score'])
standings_5['most_likely_scored'] = standings_5[['0', '1', '2', '3']].idxmax(axis=1)

standings_5['match_goals_scored_avg'] = np.mean(np.random.poisson(standings_5['match_attack_score'], (5,20)), axis=0)


fixtures_5 = fixtures_matchday_25.copy()
fixtures_5['home_mean_goals'] = 0
fixtures_5['away_mean_goals'] = 0

fixtures_5['home_avg_sim_goals'] = 0
fixtures_5['away_avg_sim_goals'] = 0

fixtures_5['home_sim_goals'] = 0
fixtures_5['away_sim_goals'] = 0

fixtures_5['home_ml_goals'] = 0
fixtures_5['away_ml_goals'] = 0




for i in range(len(standings)):
    for j in range(len(fixtures_5)):
        if fixtures_5.iloc[j,0] == teams[i]:
            fixtures_5.iloc[j,-8] = standings_5.iloc[i]['match_attack_score']
            fixtures_5.iloc[j,-6] = standings_5.iloc[i]['match_goals_scored_avg']
            fixtures_5.iloc[j,-4] = standings_5.iloc[i]['match_goals_scored']
            fixtures_5.iloc[j,-2] = standings_5.iloc[i]['most_likely_scored']
        elif fixtures_5.iloc[j,1] == teams[i]:
            fixtures_5.iloc[j,-7] = standings_5.iloc[i]['match_attack_score']
            fixtures_5.iloc[j,-5] = standings_5.iloc[i]['match_goals_scored_avg']
            fixtures_5.iloc[j,-3] = standings_5.iloc[i]['match_goals_scored']
            fixtures_5.iloc[j,-1] = standings_5.iloc[i]['most_likely_scored']
        


In [414]:
fixtures_5

Unnamed: 0,Home,Away,home_mean_goals,away_mean_goals,home_avg_sim_goals,away_avg_sim_goals,home_sim_goals,away_sim_goals,home_ml_goals,away_ml_goals
0,Wolverhampton,Leeds United,1.294276,1.338985,0.8,0.6,3,1,1,1
1,Southampton,Chelsea,0.845269,1.599399,0.6,1.6,1,1,0,1
2,Burnley,West Bromwich Albion,1.324525,0.952853,1.8,1.2,2,2,1,0
3,Liverpool,Everton,1.481839,1.3241,1.6,0.8,1,0,1,1
4,Fulham,Sheffield United,1.248234,0.87776,1.0,1.0,1,2,1,0
5,West Ham United,Tottenham Hotspur,1.36467,1.1951,1.4,1.2,1,1,1,1
6,Aston Villa,Leicester City,1.149617,1.355264,1.4,1.4,2,0,1,1
7,Arsenal,Manchester City,0.925618,1.657815,0.2,0.8,0,1,0,1
8,Manchester United,Newcastle United,1.837569,1.006833,1.2,1.6,3,2,1,1
9,Brighton & Hove Albion,Crystal Palace,1.434911,0.91822,0.8,0.8,3,1,1,0


match - predicted - result
0 - draw - home
1 - away - draw
2 - home - draw
3 - home/draw - away
4 - home - home
5 - home/draw - home
6 - away - away
7 - away - away
8 - home - home
9 - home - away

accuracy - 5/10

overall accuracy - 29/49 ≈ 59%
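The per-matchday tallies above were counted by hand. A minimal sketch of the same count in code follows, assuming that a split prediction such as "home/draw" counts as correct when the result matches either half. That rule is inferred from the tables and gives the same totals as the hand counts. The helper name `tally` is hypothetical, and the rows are the matchday 25 table.

```python
def tally(rows):
    """Count correct predictions; 'home/draw' is a hit if the result is either one."""
    hits = sum(result in predicted.split('/') for predicted, result in rows)
    return hits, len(rows)

# (predicted, result) pairs copied from the matchday 25 table
matchday_25 = [
    ('draw', 'home'), ('away', 'draw'), ('home', 'draw'), ('home/draw', 'away'),
    ('home', 'home'), ('home/draw', 'home'), ('away', 'away'), ('away', 'away'),
    ('home', 'home'), ('home', 'away'),
]

hits, total = tally(matchday_25)
print('accuracy - {}/{}'.format(hits, total))
```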

In [415]:
from scipy.stats import skellam

# Refining the model and finding probabilities

These predictions are interesting, but what we really want is probabilities. We would like the probability of a home win, a draw, and an away win so that we can compare them with the odds on betting websites. We treat the model probabilities as theoretical values and bet accordingly. For example, if the probability implied by the betting odds is much lower than the probability given by the model, we would want to bet on that result. We also find probabilities for the total number of goals in a match and for the win margin.
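As a rough illustration of the comparison with betting odds, the sketch below converts decimal odds into implied probabilities and subtracts them from model probabilities. The odds and model numbers are made up for illustration, the helper name `implied_probabilities` is hypothetical, and rescaling to remove the bookmaker's margin is only one simple convention.

```python
def implied_probabilities(decimal_odds):
    """Turn decimal odds for (home, draw, away) into probabilities summing to 1."""
    raw = [1.0 / o for o in decimal_odds]   # 1/odds includes the bookmaker margin
    overround = sum(raw)                    # usually slightly above 1
    return [p / overround for p in raw]     # rescale so the three sum to 1

# Hypothetical numbers, not real odds or real model output
odds = [2.10, 3.40, 3.60]    # home, draw, away
model = [0.52, 0.25, 0.23]   # model probabilities for the same match

market = implied_probabilities(odds)
edges = [m - k for m, k in zip(model, market)]
print(edges)
```

A positive entry in `edges` marks an outcome the model rates as more likely than the market does, which is the kind of result we would consider betting on.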

We look to the Skellam distribution, which is the distribution of the difference between two independent Poisson random variables. Applied to home goals minus away goals, a positive difference is a home win, zero is a draw, and a negative difference is an away win.
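A minimal sketch of the Skellam idea for a single match, using `scipy.stats.skellam` with made-up goal means (the values 1.4 and 1.1 are assumptions for illustration only):

```python
from scipy.stats import skellam

# Hypothetical expected goals for the home and away team
home_mean, away_mean = 1.4, 1.1

# The goal difference (home - away) follows Skellam(home_mean, away_mean)
p_home = 1 - skellam.cdf(0, home_mean, away_mean)   # difference >= 1
p_draw = skellam.pmf(0, home_mean, away_mean)       # difference == 0
p_away = skellam.cdf(-1, home_mean, away_mean)      # difference <= -1

print(p_home, p_draw, p_away)
```

The three probabilities sum to 1 by construction, since together they cover every possible goal difference.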

We now define functions to make this easier. We should have done this long ago, as it would have saved a lot of time and effort, but at least we have it now.

There are several hyperparameters here that should be tuned: correlation (between home and away goals), xG_importance (the weight given to xG relative to actual goals), and com (the centre of mass that sets how quickly the exponential weights decay). Ideally we would run a grid search over these parameters, but I don't currently have a good framework for quantifying the performance of the model.
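One possible way to quantify performance for such a grid search is the mean log loss of the predicted probabilities against the actual results. This is only a sketch: choosing log loss is an assumption rather than something settled in this project, the helper name `mean_log_loss` is hypothetical, and the probabilities below are made up.

```python
import numpy as np

def mean_log_loss(probabilities, outcomes, labels=('home_win', 'draw', 'away_win')):
    """Average negative log of the probability assigned to the actual outcome."""
    probs = np.asarray(probabilities, dtype=float)
    idx = np.array([labels.index(o) for o in outcomes])
    picked = probs[np.arange(len(idx)), idx]            # probability of what happened
    return float(-np.mean(np.log(np.clip(picked, 1e-12, 1.0))))

# Hypothetical (home_win, draw, away_win) probabilities for two matches
probs = [[0.56, 0.25, 0.19],
         [0.30, 0.22, 0.48]]
outcomes = ['home_win', 'draw']

score = mean_log_loss(probs, outcomes)
print(score)
```

Lower is better, and always predicting one third for each outcome scores log(3), about 1.10. A grid search could then loop over values of com, xG_importance and correlation and keep the combination with the lowest score on past matchdays.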



In [441]:
df = everygame[['G_home', 'G_away', 'xG_home', 'xG_away']]

In [443]:
df.corr()

Unnamed: 0,G_home,G_away,xG_home,xG_away
G_home,1.0,-0.054796,0.610008,-0.167906
G_away,-0.054796,1.0,-0.20493,0.620044
xG_home,0.610008,-0.20493,1.0,-0.283827
xG_away,-0.167906,0.620044,-0.283827,1.0


In [485]:
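def get_data(com=9):
    """Build per-team match tables with exponentially weighted goals and xG.

    Reads everygame.csv and relies on the global `teams` list.
    `com` is the centre of mass passed to pandas `ewm`.
    """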
    everygame = pd.read_csv('everygame.csv')
    team_results = {}
    for i in range(len(teams)):
        team_results[i] = everygame[(everygame['Home']==teams[i]) | (everygame['Away']==teams[i])][['Home','xG_home', 'G_home', 'G_away', 'xG_away', 'Away']]

        team_results[i] = team_results[i].reset_index(drop=True)

        team_results[i]['xG_for'] = [team_results[i].iloc[x]['xG_home'] if team_results[i].iloc[x]['Home'] == teams[i] else team_results[i].iloc[x]['xG_away'] for x in range(len(team_results[i]))]
        team_results[i]['G_for'] = [team_results[i].iloc[x]['G_home'] if team_results[i].iloc[x]['Home'] == teams[i] else team_results[i].iloc[x]['G_away'] for x in range(len(team_results[i]))]
        team_results[i]['G_against'] = [team_results[i].iloc[x]['G_away'] if team_results[i].iloc[x]['Home'] == teams[i] else team_results[i].iloc[x]['G_home'] for x in range(len(team_results[i]))]
        team_results[i]['xG_against'] = [team_results[i].iloc[x]['xG_away'] if team_results[i].iloc[x]['Home'] == teams[i] else team_results[i].iloc[x]['xG_home'] for x in range(len(team_results[i]))]


        team_results[i]['ew_xG_for'] = team_results[i]['xG_for'].ewm(com=com).mean()
        team_results[i]['ew_G_for'] = team_results[i]['G_for'].ewm(com=com).mean()
        team_results[i]['ew_G_against'] = team_results[i]['G_against'].ewm(com=com).mean()
        team_results[i]['ew_xG_against'] = team_results[i]['xG_against'].ewm(com=com).mean()
        
    
    return team_results


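def predict_matchday_results(standings, team_results, fixtures, matchday, correlation=0, xG_importance=1):
    """Return the fixtures with outcome, win-margin and total-goal probabilities.

    `correlation` shifts both Skellam means by correlation*sqrt(home_mean*away_mean);
    `xG_importance` is the weight of xG relative to actual goals.
    Relies on the global `teams` list.
    """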

    standings_5 = standings.copy()

    standings_5['mean_f'] = 0
    standings_5['x_mean_f'] = 0
    standings_5['mean_a'] = 0
    standings_5['x_mean_a'] = 0

    for i in range(len(standings)):

        standings_5.iloc[i , -4] = team_results[i].iloc[matchday-1]['ew_G_for']
        standings_5.iloc[i , -3] = team_results[i].iloc[matchday-1]['ew_xG_for']
        standings_5.iloc[i , -2] = team_results[i].iloc[matchday-1]['ew_G_against']
        standings_5.iloc[i , -1] = team_results[i].iloc[matchday-1]['ew_xG_against']


    standings_5['opponents'] = 0
    for team in range(len(teams)):
        for i in range(len(fixtures)):
            if fixtures.iloc[i,0] == teams[team]:
                standings_5.iloc[team, -1] = fixtures.iloc[i,1]
            elif fixtures.iloc[i,1] == teams[team]:
                standings_5.iloc[team, -1] = fixtures.iloc[i,0]


    standings_5['opp_mean_f'] = 0
    standings_5['opp_x_mean_f'] = 0
    standings_5['opp_mean_a'] = 0
    standings_5['opp_x_mean_a'] = 0

    for i in range(len(standings_5)):
        standings_5.loc[standings_5.index[i], 'opp_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_f']
        standings_5.loc[standings_5.index[i], 'opp_x_mean_f'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_f']
        standings_5.loc[standings_5.index[i], 'opp_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'mean_a']
        standings_5.loc[standings_5.index[i], 'opp_x_mean_a'] = standings_5.loc[standings_5.loc[standings_5.index[i], 'opponents'], 'x_mean_a']

    standings_5['match_attack_score'] = (standings_5['mean_f'] + xG_importance*standings_5['x_mean_f'] + standings_5['opp_mean_a'] + xG_importance*standings_5['opp_x_mean_a'])/ (2 + 2*xG_importance)
    standings_5['match_defence_score'] = (standings_5['mean_a'] + xG_importance*standings_5['x_mean_a'] + standings_5['opp_mean_f'] + xG_importance*standings_5['opp_x_mean_f'])/ (2 + 2*xG_importance)


    fixtures_5 = fixtures.copy()
    fixtures_5['home_mean_goals'] = 0
    fixtures_5['away_mean_goals'] = 0

    for i in range(len(standings)):
        for j in range(len(fixtures)):
            if fixtures_5.iloc[j,0] == teams[i]:
                fixtures_5.iloc[j,-2] = standings_5.iloc[i]['match_attack_score']
                
            elif fixtures_5.iloc[j,1] == teams[i]:
                fixtures_5.iloc[j,-1] = standings_5.iloc[i]['match_attack_score']

    fixtures_5['home_win'] = 1 - skellam.cdf(0, fixtures_5['home_mean_goals'] - correlation*np.sqrt(fixtures_5['home_mean_goals']*fixtures_5['away_mean_goals']), fixtures_5['away_mean_goals'] - correlation*np.sqrt(fixtures_5['home_mean_goals']*fixtures_5['away_mean_goals']))
    for i in range(1,4):
        fixtures_5['home_by_{}'.format(i)] = skellam.pmf(i, fixtures_5['home_mean_goals'] - correlation*np.sqrt(fixtures_5['home_mean_goals']*fixtures_5['away_mean_goals']), fixtures_5['away_mean_goals'] - correlation*np.sqrt(fixtures_5['home_mean_goals']*fixtures_5['away_mean_goals']))
    fixtures_5['draw'] = skellam.pmf(0, fixtures_5['home_mean_goals'] - correlation*np.sqrt(fixtures_5['home_mean_goals']*fixtures_5['away_mean_goals']), fixtures_5['away_mean_goals'] - correlation*np.sqrt(fixtures_5['home_mean_goals']*fixtures_5['away_mean_goals']))
    fixtures_5['away_win'] = skellam.cdf(-1, fixtures_5['home_mean_goals'] - correlation*np.sqrt(fixtures_5['home_mean_goals']*fixtures_5['away_mean_goals']), fixtures_5['away_mean_goals'] - correlation*np.sqrt(fixtures_5['home_mean_goals']*fixtures_5['away_mean_goals']))
    for i in range(1,4):
        fixtures_5['away_by_{}'.format(i)] = skellam.pmf(-i, fixtures_5['home_mean_goals'] - correlation*np.sqrt(fixtures_5['home_mean_goals']*fixtures_5['away_mean_goals']), fixtures_5['away_mean_goals'] - correlation*np.sqrt(fixtures_5['home_mean_goals']*fixtures_5['away_mean_goals']))
    
    for i in range(7):
        fixtures_5['num_goals_{}'.format(i)] = poisson.pmf(i, fixtures_5['home_mean_goals'] + fixtures_5['away_mean_goals'])
    
    fixtures_5['predicted_result'] = fixtures_5[['home_win', 'away_win', 'draw']].idxmax(axis="columns")
    return fixtures_5

    

In [466]:
team_results = get_data()

In [487]:
predict_matchday_results(standings, team_results, fixtures, 30, correlation=-0.15)

Unnamed: 0,Home,Away,home_mean_goals,away_mean_goals,home_win,home_by_1,home_by_2,home_by_3,draw,away_win,...,away_by_2,away_by_3,num_goals_0,num_goals_1,num_goals_2,num_goals_3,num_goals_4,num_goals_5,num_goals_6,predicted_result
0,Chelsea,West Bromwich Albion,1.409424,0.628405,0.559225,0.263249,0.172701,0.08159,0.255487,0.185289,...,0.04254,0.009975,0.130311,0.265552,0.270575,0.183795,0.093636,0.038163,0.012962,home_win
1,Leeds United,Sheffield United,1.582416,1.002662,0.509676,0.228737,0.155448,0.07912,0.233702,0.256622,...,0.070346,0.024086,0.07539,0.19489,0.251902,0.217062,0.140281,0.072527,0.031248,home_win
2,Leicester City,Manchester City,1.268313,1.69625,0.301089,0.16268,0.086919,0.035633,0.221215,0.477695,...,0.144089,0.076054,0.051583,0.152921,0.226672,0.223995,0.166011,0.09843,0.048634,away_win
3,Arsenal,Liverpool,1.395475,1.266028,0.408331,0.203614,0.12196,0.055149,0.23973,0.351938,...,0.102965,0.042781,0.069843,0.185888,0.24737,0.219459,0.146023,0.077728,0.034479,home_win
4,Southampton,Burnley,1.200252,1.372793,0.339879,0.185587,0.098513,0.039263,0.243964,0.416157,...,0.124432,0.055737,0.076303,0.196331,0.252584,0.216636,0.139354,0.071713,0.030753,away_win
5,Newcastle United,Tottenham Hotspur,0.945984,1.507728,0.253916,0.155584,0.068582,0.02246,0.241105,0.504979,...,0.15416,0.075694,0.085974,0.210955,0.258812,0.211683,0.129852,0.063724,0.02606,away_win
6,Aston Villa,Fulham,1.064555,1.035897,0.369163,0.210643,0.10565,0.038646,0.275537,0.3553,...,0.100753,0.035991,0.122401,0.257098,0.270011,0.189048,0.099272,0.041703,0.014599,home_win
7,Manchester United,Brighton & Hove Albion,1.26852,1.073446,0.416344,0.216738,0.123777,0.052319,0.257192,0.326464,...,0.092584,0.033846,0.096138,0.225153,0.26365,0.20582,0.120506,0.056444,0.022032,home_win
8,Everton,Crystal Palace,1.321161,1.096714,0.424919,0.216329,0.127018,0.055453,0.252088,0.322993,...,0.091887,0.03412,0.089111,0.215459,0.260476,0.209933,0.126898,0.061365,0.024729,home_win
9,Wolverhampton,West Ham United,1.067769,1.365233,0.30779,0.177917,0.086762,0.031475,0.24989,0.44232,...,0.133016,0.059749,0.087773,0.213552,0.259786,0.210687,0.12815,0.062358,0.025286,away_win


In [488]:
predict_matchday_results(standings, team_results, fixtures, 30, correlation=-0.15, xG_importance=2)

Unnamed: 0,Home,Away,home_mean_goals,away_mean_goals,home_win,home_by_1,home_by_2,home_by_3,draw,away_win,...,away_by_2,away_by_3,num_goals_0,num_goals_1,num_goals_2,num_goals_3,num_goals_4,num_goals_5,num_goals_6,predicted_result
0,Chelsea,West Bromwich Albion,1.435038,0.657984,0.55795,0.259968,0.172302,0.082553,0.252059,0.18999,...,0.044542,0.010851,0.123314,0.258099,0.270103,0.188444,0.098604,0.041276,0.014399,home_win
1,Leeds United,Sheffield United,1.573118,1.032694,0.500683,0.226318,0.15248,0.077077,0.23405,0.265267,...,0.073375,0.025729,0.073843,0.192421,0.250707,0.217765,0.141864,0.073934,0.03211,home_win
2,Leicester City,Manchester City,1.249198,1.671032,0.300761,0.163532,0.086675,0.03516,0.223173,0.476066,...,0.143736,0.075086,0.053921,0.157463,0.229913,0.2238,0.163387,0.095425,0.046444,away_win
3,Arsenal,Liverpool,1.392503,1.300058,0.400833,0.200626,0.119501,0.05384,0.238409,0.360758,...,0.106044,0.045007,0.067707,0.182306,0.245435,0.220283,0.148281,0.079851,0.035834,home_win
4,Southampton,Burnley,1.226622,1.322065,0.35582,0.191556,0.103843,0.042213,0.245957,0.398223,...,0.118296,0.051326,0.078184,0.199267,0.253935,0.215733,0.137459,0.070068,0.029764,away_win
5,Newcastle United,Tottenham Hotspur,0.957578,1.454414,0.265004,0.161275,0.07203,0.023857,0.245714,0.489282,...,0.148925,0.070926,0.089637,0.216203,0.26074,0.209634,0.126409,0.060979,0.024514,away_win
6,Aston Villa,Fulham,1.094812,1.098346,0.364835,0.20587,0.104878,0.039195,0.268652,0.366513,...,0.105467,0.039526,0.111564,0.244677,0.268308,0.196147,0.107545,0.047173,0.017243,away_win
7,Manchester United,Brighton & Hove Albion,1.230113,1.125297,0.395457,0.210145,0.116594,0.047935,0.257306,0.347237,...,0.099865,0.037998,0.094855,0.223421,0.263125,0.206589,0.12165,0.057307,0.022497,home_win
8,Everton,Crystal Palace,1.321174,1.098942,0.424447,0.21612,0.126863,0.055379,0.251988,0.323565,...,0.092098,0.034255,0.088911,0.215176,0.260375,0.210046,0.127084,0.061512,0.024811,home_win
9,Wolverhampton,West Ham United,1.079935,1.34695,0.314099,0.180458,0.088887,0.032565,0.250836,0.435066,...,0.130522,0.057945,0.088312,0.214322,0.260067,0.210384,0.127645,0.061956,0.02506,away_win


In [489]:
predict_matchday_results(standings, team_results, fixtures, 30, correlation=0, xG_importance=2)

Unnamed: 0,Home,Away,home_mean_goals,away_mean_goals,home_win,home_by_1,home_by_2,home_by_3,draw,away_win,...,away_by_2,away_by_3,num_goals_0,num_goals_1,num_goals_2,num_goals_3,num_goals_4,num_goals_5,num_goals_6,predicted_result
0,Chelsea,West Bromwich Albion,1.435038,0.657984,0.558879,0.274739,0.171962,0.076501,0.270297,0.170824,...,0.036152,0.007374,0.123314,0.258099,0.270103,0.188444,0.098604,0.041276,0.014399,home_win
1,Leeds United,Sheffield United,1.573118,1.032694,0.498794,0.239824,0.152061,0.070834,0.252274,0.248932,...,0.06553,0.020039,0.073843,0.192421,0.250707,0.217765,0.141864,0.073934,0.03211,home_win
2,Leicester City,Manchester City,1.249198,1.671032,0.28638,0.166853,0.080146,0.028808,0.240778,0.472842,...,0.143413,0.068957,0.053921,0.157463,0.229913,0.2238,0.163387,0.095425,0.046444,away_win
3,Arsenal,Liverpool,1.392503,1.300058,0.392185,0.209632,0.115389,0.047025,0.258272,0.349543,...,0.100577,0.038267,0.067707,0.182306,0.245435,0.220283,0.148281,0.079851,0.035834,home_win
4,Southampton,Burnley,1.226622,1.322065,0.344133,0.197765,0.097804,0.035531,0.266641,0.389226,...,0.113616,0.044487,0.078184,0.199267,0.253935,0.215733,0.137459,0.070068,0.029764,away_win
5,Newcastle United,Tottenham Hotspur,0.957578,1.454414,0.248417,0.161207,0.063823,0.018373,0.265286,0.486297,...,0.147234,0.064375,0.089637,0.216203,0.26074,0.209634,0.126409,0.060979,0.024514,away_win
6,Aston Villa,Fulham,1.094812,1.098346,0.353179,0.211863,0.098029,0.032679,0.29186,0.354961,...,0.098663,0.032996,0.111564,0.244677,0.268308,0.196147,0.107545,0.047173,0.017243,away_win
7,Manchester United,Brighton & Hove Albion,1.230113,1.125297,0.386004,0.218376,0.11117,0.041134,0.279223,0.334773,...,0.093032,0.031489,0.094855,0.223421,0.263125,0.206589,0.12165,0.057307,0.022497,home_win
8,Everton,Crystal Palace,1.321174,1.098942,0.41702,0.226064,0.122669,0.048531,0.273144,0.309836,...,0.084872,0.02793,0.088911,0.215176,0.260375,0.210046,0.127084,0.061512,0.024811,home_win
9,Wolverhampton,West Ham United,1.079935,1.34695,0.299866,0.183745,0.081475,0.026343,0.271764,0.42837,...,0.126745,0.051113,0.088312,0.214322,0.260067,0.210384,0.127645,0.061956,0.02506,away_win


In [478]:
predicted_results_matchday_29 = predict_matchday_results(standings, team_results, fixtures_matchday_29, 29, correlation=-0.15, xG_importance=2)
results_matchday_29 = ['draw', 'away_win', 'away_win', 'away_win', 'home_win', 'home_win', 'home_win', 'draw', 'draw', 'draw']
accuracy_29 = np.sum(predicted_results_matchday_29['predicted_result'] == results_matchday_29)/len(results_matchday_29)

In [479]:
predicted_results_matchday_28 = predict_matchday_results(standings, team_results, fixtures_matchday_28, 28, correlation=-0.15, xG_importance=2)
results_matchday_28 = ['home_win', 'away_win', 'draw', 'away_win', 'home_win', 'away_win', 'draw', 'home_win', 'home_win', 'away_win']
accuracy_28 = np.sum(predicted_results_matchday_28['predicted_result'] == results_matchday_28)/len(results_matchday_28)