# Forecasting The Final Premier League Table 2019/20

The Premier League suspended its 2019/20 season on March 13, 2020, with 92 games left unplayed, because of emergency measures required to deal with the worldwide COVID-19 pandemic.

At the time of writing (April 8, 2020) it is unclear if these games will ever be played. 

Given that our model can forecast the result of every game, we can also forecast what the final league table would be. 

In [None]:
# import the packages we need

import pandas as pd
import numpy as np

In [None]:
# load the Premier League table at the date of the suspension

Table = pd.read_excel("../Data/Premier League table March 13 2020.xlsx")
Table

In [None]:
# load the forecasts we produced in the last session

forecasts19_20 = pd.read_excel("../Data/forecasts19_20.xlsx")
forecasts19_20

We want to create a subset of the unplayed games. First we restrict the data to only those variables that we need: (1) our prediction for each game and (2) the identity of the home team and the away team. We also need the indicator of whether a game was played or not:

In [None]:
Unplayed = forecasts19_20[['Home team','away team','notplayed','logitpred']]
Unplayed

Now we create the subset of unplayed games:

In [None]:
Unplayed = Unplayed[Unplayed['notplayed']==1].copy()
Unplayed.describe()

We now assign the points for each result: 3 points for a win and zero for a loss. (As we established in the last session, our model doesn't forecast draws.) We therefore allocate the points to the home and away teams conditional on our forecast result:

In [None]:
Unplayed['Hpts'] = np.where(Unplayed['logitpred']=="H", 3, 0)
Unplayed['Apts'] = np.where(Unplayed['logitpred']=="H", 0, 3)
Unplayed
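
One caveat with this allocation is that any label other than "H" silently hands 3 points to the away team, including a missing or mistyped forecast. The following is a minimal sketch of a guard against that, run on toy data with hypothetical club names. It assumes the only valid labels are "H" and "A"; the notebook itself only confirms "H".

```python
import numpy as np
import pandas as pd

# Toy data: hypothetical fixtures, not taken from the real forecasts file.
toy = pd.DataFrame({
    "Home team": ["Alpha", "Beta", "Gamma"],
    "away team": ["Beta", "Gamma", "Alpha"],
    "logitpred": ["H", "A", "H"],
})

# Assumption: "H" (home win) and "A" (away win) are the only valid labels.
valid = toy["logitpred"].isin(["H", "A"])
assert valid.all(), "unexpected forecast label found"

# Same allocation rule as in the cell above.
toy["Hpts"] = np.where(toy["logitpred"] == "H", 3, 0)
toy["Apts"] = np.where(toy["logitpred"] == "H", 0, 3)

# With no draws, every forecast game should hand out exactly 3 points.
points_per_game = toy["Hpts"] + toy["Apts"]
print(toy)
```

The same two checks could be applied to `Unplayed` before moving on.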

Each row contains a result with two teams. We need to create a list of results with only one team in each row. To do this we will create two subsets, one for home teams and the other for away teams, and then stack (concatenate) the two subsets on top of each other. 

First, let's look at the home team results. We rename the columns, so that when we concatenate with the away teams the columns will have consistent names:

In [None]:
Results = Unplayed[['Home team','Hpts']].rename(columns={'Home team': 'club','Hpts':'XPoints'})
Results

Now we generate a subset of our forecast away team points:

In [None]:
AResults = Unplayed[['away team','Apts']].rename(columns={'away team': 'club','Apts':'XPoints'})
AResults

Now we concatenate the two dfs (Results and AResults) into a single df:

In [None]:
Results = pd.concat([Results, AResults])
Results

In [None]:
Results.describe()

Now we use .groupby to sum the forecast points won by each team:

In [None]:
PtsX = Results.groupby('club')['XPoints'].sum().reset_index()
PtsX
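
As a quick sanity check on the reshaping, the stacked points should sum to 3 times the number of games, since every forecast game awards exactly 3 points. The sketch below repeats the rename, concatenate and groupby steps on toy data (hypothetical clubs and results) and verifies that total.

```python
import pandas as pd

# Toy data: three hypothetical games with points already allocated.
toy = pd.DataFrame({
    "Home team": ["Alpha", "Beta", "Gamma"],
    "away team": ["Beta", "Gamma", "Alpha"],
    "Hpts": [3, 0, 3],
    "Apts": [0, 3, 0],
})

# One row per team per game: home rows first, then away rows.
home = toy[["Home team", "Hpts"]].rename(
    columns={"Home team": "club", "Hpts": "XPoints"})
away = toy[["away team", "Apts"]].rename(
    columns={"away team": "club", "Apts": "XPoints"})
long = pd.concat([home, away], ignore_index=True)

# Sum the forecast points for each club.
toy_pts = long.groupby("club")["XPoints"].sum().reset_index()
print(toy_pts)

# Total forecast points must equal 3 points per game.
assert toy_pts["XPoints"].sum() == 3 * len(toy)
```

Applied to the real data, the equivalent check would compare `PtsX['XPoints'].sum()` with `3 * len(Unplayed)`.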

We can now merge these points forecasts into the table which showed the points won up until March 13, 2020 when league play was suspended:

In [None]:
Table = pd.merge(Table, PtsX, on= 'club', how = 'left')
Table
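
A left merge keeps every row of the table, so a club whose name is spelled differently in the two files would end up with a missing `XPoints` value rather than raising an error. The sketch below shows one way to detect this, using the `indicator` option of `pd.merge` on toy data. The club names and the deliberate misspelling are hypothetical.

```python
import pandas as pd

# Toy league table and toy forecast points (hypothetical values).
toy_table = pd.DataFrame({
    "club": ["Alpha", "Beta", "Gamma"],
    "Points": [50, 40, 30],
})
toy_pts = pd.DataFrame({
    "club": ["Alpha", "Beta", "Gama"],  # "Gama" is deliberately misspelled
    "XPoints": [3, 0, 6],
})

# indicator=True adds a "_merge" column recording where each row was found.
checked = pd.merge(toy_table, toy_pts, on="club", how="left", indicator=True)

# Clubs found only in the table have no forecast points attached.
unmatched = checked.loc[checked["_merge"] == "left_only", "club"].tolist()
print(unmatched)
```

An empty list here would indicate that every club in the table found a match in the forecast points.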

Our forecast of end-of-season points is therefore the sum of Points (points actually won) and XPoints (our forecast for the remaining games):

In [None]:
Table['finalpoints']=Table['Points']+ Table['XPoints']
Table

Points determine league position, which is crucial not just for determining the champion, but also qualification for European competition (The Champions League and Europa League) and, perhaps more importantly, which teams get relegated to the Football League Championship in the following season (the bottom three teams).

The positions on March 13, 2020 are listed in the df. We now create a variable 'rank', which is the position of each team if XPoints are added:

In [None]:
Table.sort_values("finalpoints", inplace = True, ascending = False)
Table['rank'] = Table['finalpoints'].rank(ascending= False)
Table

## Conclusions

This exercise is a nice application of our forecasting model, and given that the 92 games may never be played, it is possible that an exercise of this kind might actually be required.

Note that the model does not resolve ties: if two teams finish on equal points, each is given the average of the two ranks they occupy. Thus Chelsea and Manchester United, tied for 3rd and 4th positions in this forecast, are each given a rank of 3.5. In practice, goal difference is used to separate teams on equal points, and this model could be developed to generate a forecast of goal difference as well.
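
To illustrate how a goal-difference tie-break might work, here is a minimal sketch on toy data. The `GD` column is hypothetical: the notebook does not forecast goal difference, so the values below are invented purely for illustration.

```python
import pandas as pd

# Toy table: "GD" (goal difference) is a hypothetical column with made-up values.
toy = pd.DataFrame({
    "club": ["Alpha", "Beta", "Gamma", "Delta"],
    "finalpoints": [80, 66, 66, 40],
    "GD": [45, 10, 18, -20],
})

# Default behaviour of .rank(): tied teams share the average of their ranks.
toy["rank"] = toy["finalpoints"].rank(ascending=False)

# Tie-break: sort by points, then by goal difference, both descending.
ordered = toy.sort_values(
    ["finalpoints", "GD"], ascending=[False, False]
).reset_index(drop=True)

# Position is then simply the row order, starting from 1.
ordered["position"] = ordered.index + 1
print(ordered)
```

Here Beta and Gamma share rank 2.5 on points alone, but Gamma takes 2nd position once goal difference is considered.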

There is little doubt that, had the full season been played, Liverpool would have won the title; it was almost a mathematical certainty as of March 13, 2020. Qualification for European competition and relegation from the Premier League were much less clear.

One very notable change arises from our model. When the league was suspended, Bournemouth was in 18th place and would have been relegated had it stayed there. Our model forecasts that the team would have won enough points to rise to 16th place, avoiding relegation, while Brighton would have sunk to 18th place and been relegated. The reason is that, on average, Bournemouth's TM value equaled 70% of that of its remaining opponents, whereas Brighton's TM value equaled only 50% of that of its remaining opponents. Clearly, fans of each team would have opposite opinions about a modeling exercise of this kind!

Assuming the season cannot be completed, the Premier League will have to decide how to treat unplayed games. At the least, this exercise shows that this decision, whatever it turns out to be, will not be without controversy.  