# Predicting EPL Result Using Poisson Distribution and Classification

## Notebook Contents:

[Objective](#Objective)  
[Data Prep/Summary](#DPS)  
[Model](#model)

<a id='Objective'></a>
### Objective:
Predict the probability of each matches results of English Premier League using past season and Pro Evolution Soccer data using Poisson distribution and classification. We train seasons from 13/14 to 17/18 (five seasons) to predict 18/19 season. New teams promoted on each seasons will be ignored.

In [1]:
import warnings
warnings.filterwarnings("ignore")

In [2]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier, BaggingClassifier, ExtraTreesClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from scipy.stats import poisson

<a id='DPS'></a>
### Data Prep/Summary:
- Pro Evloution Soccer Data (PES 14 - PES 19)
- English Premier League Stats (2013 - Present)
- English Premier League Fixtures (2013 - Present)

In [3]:
# PES Roster Power Index(RPI) from PES Database
# Use PES 19 to predict (train) 18/19 season
# Merge PES 15 data with EPL 13/14 results in order to predict EPL 14/15 season

pes_data = pd.read_csv('./Data/pes_data.csv')

In [4]:
# Read past seasons data which have been cleaned beforehands
# We want teamnames, total number of goals each team scored and allowed ('Team', 'HGF', 'HGA')
# We calculate each teams offensive and defensive rating ('H_Att', 'A_Att', 'H_Def', 'A_Def')
# Number of yellow cards, red cards, and the number of clean sheets each team for the last season

# IMPORTANT
# Season13 has the final stats of 12/13 season
# So we are going to merge with Season 18 and PES 19 to predict EPL 18/19
# Past season RPIs and Match Power Index(MPI)from past season

epl_data = pd.read_csv('./Data/epl_data.csv')[['Season', 'Team', 'HGF', 'HGA', 'H_Att', 'A_Att', 'H_Def', 'A_Def', 'CS', 'YC', 'RC']]

In [5]:
# Read all season fixtures from 13/14 to 18/19 (present)
# We need home/away team and the final result to compare with our prediction

# Load EPL fixtures from Season 
epl_fixture = pd.read_csv('./Data/epl_fixture.csv')[['Season', 'HomeTeam', 'AwayTeam', 'FTR']]

In [6]:
# Empty full fixture for 18/19
# Create 380 rows(games) between teams

teams = sorted(set(pd.read_csv('./Data/matchday.csv')['Team']))
fixture_1819 = pd.DataFrame(columns=['HomeTeam', 'AwayTeam'])
fixture_1819['HomeTeam'] = teams * 20
for i in range(0, 400):
     fixture_1819.loc[i, 'AwayTeam'] = teams[int(i/20)]
fixture_1819 = fixture_1819[fixture_1819['HomeTeam'] != fixture_1819['AwayTeam']].reset_index(drop=True)
fixture_1819['Season'] = 19

### Getting W/D/L Probability from Poisson Distribution
Add W/D/L probability to fixture dataframe using Poisson distribution.

In [7]:
# Draw W/D/L probabilities for all games played

def result_percentage(epl_data, hometeam, awayteam, season):
    '''
    returns array of size 2.
    list of percentage of hometeam win/away win/draw
    dataframe of percentage of how many goals each team scores in 90 minutes (min: 0, max: 5)
    '''
    score = []
    dataframe = epl_data[epl_data['Season'] == season - 1]
    if len(dataframe) > 0:
        home_avg = dataframe['HGF'].sum()/380
        away_avg = dataframe['HGA'].sum()/380

        home_score = float(dataframe[dataframe['Team'] == hometeam]['H_Att']) * float(dataframe[dataframe['Team'] == awayteam]['A_Def']) * home_avg
        away_score = float(dataframe[dataframe['Team'] == awayteam]['A_Att']) * float(dataframe[dataframe['Team'] == hometeam]['H_Def']) * away_avg

        # maximum score for a team is 5
        for goals in range(0, 6):
            scores = {}
            scores['Home'] = (poisson.pmf(goals, home_score)) # Hometeam score
            scores['Away'] = (poisson.pmf(goals, away_score)) # away score

            if len(scores) == 2:
                        score.append(scores)

        score = pd.DataFrame(score, columns=(['Home', 'Away']))

    '''
    % of home team winning
    home score > away score
    home[1] * away[0]
    home[2] * away[0] + home[2] * away[1]
    home[3] * away[0] + home[3] * away[1] + home[3] * away[2]
    home[4] * away[0] + home[4] * away[1] + home[4] * away[2] + home[4] * away[3]
    home[5] * away[0] + home[5] * away[1] + home[5] * away[2] + home[5] * away[3]+  home[5] * away[4]
    ''' 
    home_w = 0
    away_w = 0
    draw = 0
    result = []
    for home in range(1, len(score)):
        for away in range(0, home):
            home_w += (score['Home'][home] * score['Away'][away])
    result.append(home_w)

    for away in range(1, len(score)):
        for home in range(0, away):
            away_w += (score['Home'][home] * score['Away'][away])
    result.append(away_w)

    for home in range(0, len(score)):
        away = home
        draw += (score['Home'][home] * score['Away'][away])
    result.append(draw)

    return result, score

In [8]:
# Predict probability of W/D/L using Poisson

def predict(fixture, data):
    for i in range(len(fixture)):
        season = fixture.loc[i, 'Season']
        hometeam = fixture.loc[i, 'HomeTeam']
        awayteam = fixture.loc[i, 'AwayTeam']
        if (hometeam in list(data[data['Season'] == (season - 1)]['Team'])) & (awayteam in list(data[data['Season'] == (season - 1)]['Team'])):
            result = result_percentage(data, hometeam, awayteam, season)[0]
            fixture.loc[i, 'W'] = result[0]
            fixture.loc[i, 'D'] = result[2]
            fixture.loc[i, 'L'] = result[1]
        else:
            pass

In [9]:
# Put W/D/L probabilities in past season fixtures
predict(epl_fixture, epl_data)

In [10]:
# Drop matches that can't be predicted
# 3 teams are promoted every season, so we don't have data
# Reset index for future use of dataframe
epl_fixture.dropna(inplace=True)
epl_fixture.reset_index(inplace=True, drop=True)

In [11]:
epl_fixture.head()

Unnamed: 0,Season,HomeTeam,AwayTeam,FTR,W,D,L
0,14,Arsenal,Aston Villa,A,0.65322,0.117755,0.094719
1,14,Liverpool,Stoke City,H,0.587701,0.282341,0.12727
2,14,Norwich City,Everton,D,0.355604,0.307891,0.335295
3,14,Sunderland,Fulham,A,0.380148,0.307465,0.311142
4,14,Swansea City,Manchester Utd,A,0.176014,0.18561,0.602336


In [12]:
# Put W/D/L probabilities in current season fixtures
predict(fixture_1819, epl_data)

In [13]:
fixture_1819.dropna(inplace=True)
fixture_1819.reset_index(inplace=True, drop=True)

In [14]:
fixture_1819.head()

Unnamed: 0,HomeTeam,AwayTeam,Season,W,D,L
0,Bournemouth,Arsenal,19,0.374453,0.247433,0.370468
1,Brighton & Hove,Arsenal,19,0.397048,0.265983,0.332787
2,Burnley,Arsenal,19,0.351889,0.337976,0.309586
3,Chelsea,Arsenal,19,0.583343,0.239465,0.169422
4,Crystal Palace,Arsenal,19,0.44342,0.242809,0.305012


In [15]:
# Merge EPL data with PES data
epl_data['Season'] = epl_data['Season'] + 1

merged_data = pd.merge(pes_data, epl_data, on=['Team','Season'], how='right')

### Merge PES data to Fixture Dataframe

In [16]:
# Merge all features to fixture data
def mergedata(fixture, data):
    for i in range(len(fixture)):
        season = fixture.loc[i, 'Season']
        hometeam = fixture.loc[i, 'HomeTeam']
        awayteam = fixture.loc[i, 'AwayTeam']
        try:
            fixture.loc[i, 'HtOff'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['H_Att'].values[0]
            fixture.loc[i, 'HtDef'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['H_Def'].values[0]
            fixture.loc[i, 'HtPesOvr'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['Ovr'].values[0]
            fixture.loc[i, 'HtPesDef'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['Def'].values[0]
            fixture.loc[i, 'HtPesOff'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['Fwd'].values[0]
            fixture.loc[i, 'HtPesMid'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['Mid'].values[0]
            fixture.loc[i, 'HtPesPhy'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['Phy'].values[0]
            fixture.loc[i, 'HtPesSpd'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['Spd'].values[0]
            fixture.loc[i, 'HtYC'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['YC'].values[0]
            fixture.loc[i, 'HtRC'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['RC'].values[0]
            fixture.loc[i, 'HtCS'] = data[(data['Season'] == season) & (data['Team'] == hometeam)]['CS'].values[0]

            fixture.loc[i, 'AtOff'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['A_Att'].values[0]
            fixture.loc[i, 'AtDef'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['A_Def'].values[0]
            fixture.loc[i, 'AtPesOvr'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['Ovr'].values[0]
            fixture.loc[i, 'AtPesDef'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['Def'].values[0]
            fixture.loc[i, 'AtPesOff'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['Fwd'].values[0]
            fixture.loc[i, 'AtPesMid'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['Mid'].values[0]
            fixture.loc[i, 'AtPesPhy'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['Phy'].values[0]
            fixture.loc[i, 'AtPesSpd'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['Spd'].values[0]
            fixture.loc[i, 'AtYC'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['YC'].values[0]
            fixture.loc[i, 'AtRC'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['RC'].values[0]
            fixture.loc[i, 'AtCS'] = data[(data['Season'] == season) & (data['Team'] == awayteam)]['CS'].values[0]

        except IndexError:
            pass
    return fixture

In [17]:
epl_fixture = mergedata(epl_fixture, merged_data)

In [18]:
epl_fixture.head()

Unnamed: 0,Season,HomeTeam,AwayTeam,FTR,W,D,L,HtOff,HtDef,HtPesOvr,...,AtDef,AtPesOvr,AtPesDef,AtPesOff,AtPesMid,AtPesPhy,AtPesSpd,AtYC,AtRC,AtCS
0,14,Arsenal,Aston Villa,A,0.65322,0.117755,0.094719,1.587838,0.976645,84.0,...,1.385135,79.0,79.0,81.0,79.0,78.0,79.0,70.0,3.0,5.0
1,14,Liverpool,Stoke City,H,0.587701,0.282341,0.12727,1.114865,0.679406,82.0,...,0.777027,81.0,79.0,82.0,80.0,81.0,76.0,73.0,4.0,12.0
2,14,Norwich City,Everton,D,0.355604,0.307891,0.335295,0.844595,0.849257,79.0,...,0.777027,80.0,81.0,79.0,83.0,79.0,77.0,57.0,3.0,11.0
3,14,Sunderland,Fulham,A,0.380148,0.307465,0.311142,0.675676,0.806794,80.0,...,1.013514,80.0,80.0,84.0,80.0,78.0,76.0,48.0,3.0,8.0
4,14,Swansea City,Manchester Utd,A,0.176014,0.18561,0.602336,0.945946,1.104034,79.0,...,0.810811,85.0,83.0,89.0,85.0,79.0,78.0,57.0,1.0,13.0


In [19]:
fixture_1819 = mergedata(fixture_1819, merged_data).dropna()

In [20]:
fixture_1819.head()

Unnamed: 0,HomeTeam,AwayTeam,Season,W,D,L,HtOff,HtDef,HtPesOvr,HtPesDef,...,AtDef,AtPesOvr,AtPesDef,AtPesOff,AtPesMid,AtPesPhy,AtPesSpd,AtYC,AtRC,AtCS
0,Bournemouth,Arsenal,19,0.374453,0.247433,0.370468,0.893471,1.376147,77.0,77.0,...,1.065292,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0
1,Brighton & Hove,Arsenal,19,0.397048,0.265983,0.332787,0.824742,1.146789,77.0,76.0,...,1.065292,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0
2,Burnley,Arsenal,19,0.351889,0.337976,0.309586,0.549828,0.779817,76.0,76.0,...,1.065292,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0
3,Chelsea,Arsenal,19,0.583343,0.239465,0.169422,1.030928,0.733945,83.0,82.0,...,1.065292,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0
4,Crystal Palace,Arsenal,19,0.44342,0.242809,0.305012,0.996564,1.238532,78.0,78.0,...,1.065292,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0


<a id='model'></a>
### Model
- KNN
- Logistic Regression
- Random Forest
- Bagging Classifier
- ExtraTrees Classifier
- AdaBoost Classifier
- Gradient Boosting Classifier
- VC

#### GridSearchCV
- KNN
- Logistic Regression
- Gradient Boosting Classifier

In [21]:
# Divide EPL fixtures from Season 13 to 18 and Season 19 (Games played current season so far)
epl_past_fixture = epl_fixture[epl_fixture['Season'] < 19]
epl_current_fixture = epl_fixture[epl_fixture['Season'] == 19]

In [22]:
X = epl_past_fixture.drop(columns=(['HomeTeam', 'AwayTeam', 'FTR', 'Season']))
y = epl_past_fixture['FTR']

In [23]:
# Our predicting features
X.columns

Index(['W', 'D', 'L', 'HtOff', 'HtDef', 'HtPesOvr', 'HtPesDef', 'HtPesOff',
       'HtPesMid', 'HtPesPhy', 'HtPesSpd', 'HtYC', 'HtRC', 'HtCS', 'AtOff',
       'AtDef', 'AtPesOvr', 'AtPesDef', 'AtPesOff', 'AtPesMid', 'AtPesPhy',
       'AtPesSpd', 'AtYC', 'AtRC', 'AtCS'],
      dtype='object')

In [24]:
# Train test split our data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y, test_size=.33)

In [25]:
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.fit_transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)
lr = LogisticRegression(random_state=42)
rf = RandomForestClassifier(random_state=42)
bag = BaggingClassifier(random_state=42)
etc = ExtraTreesClassifier(random_state=42)
ada = AdaBoostClassifier(random_state=42)
gbc = GradientBoostingClassifier(random_state=42)
vc = VotingClassifier(estimators=[('lr', LogisticRegression()), ('knn', KNeighborsClassifier()),
                                 ('bag', BaggingClassifier(n_estimators=5)), ('etc', ExtraTreesClassifier()),
                                  ('ada', AdaBoostClassifier()), ('gbc', GradientBoostingClassifier())], voting='soft')

knn.fit(X_train, y_train)
lr.fit(X_train, y_train)
rf.fit(X_train, y_train)
bag.fit(X_train, y_train)
etc.fit(X_train, y_train)
ada.fit(X_train, y_train)
gbc.fit(X_train, y_train)
vc.fit(X_train, y_train);

In [26]:
print('---- Score on Train (Past Data) ----')
print('KNN:' + str(knn.score(X_train, y_train)))
print('LR:' + str(lr.score(X_train, y_train)))
print('RF:' + str(rf.score(X_train, y_train)))
print('BAG:' + str(bag.score(X_train, y_train)))
print('ETC:' + str(etc.score(X_train, y_train)))
print('ADA:' + str(ada.score(X_train, y_train)))
print('GBC:' + str(gbc.score(X_train, y_train)))
print('VC:' + str(vc.score(X_train, y_train)))

print('---- Score on Test (Past Data)  ----')
print('KNN:' + str(knn.score(X_test, y_test)))
print('LR:' + str(lr.score(X_test, y_test)))
print('RF:' + str(rf.score(X_test, y_test)))
print('BAG:' + str(bag.score(X_test, y_test)))
print('ETC:' + str(etc.score(X_test, y_test)))
print('ADA:' + str(ada.score(X_test, y_test)))
print('GBC:' + str(gbc.score(X_test, y_test)))
print('VC:' + str(vc.score(X_test, y_test)))

---- Score on Train (Past Data) ----
KNN:0.605927552140505
LR:0.531284302963776
RF:0.9901207464324918
BAG:0.9890230515916575
ETC:1.0
ADA:0.5784851811196488
GBC:0.823271130625686
VC:0.986827661909989
---- Score on Test (Past Data)  ----
KNN:0.42761692650334077
LR:0.5055679287305123
RF:0.4298440979955457
BAG:0.40757238307349664
ETC:0.4365256124721604
ADA:0.4521158129175947
GBC:0.46547884187082406
VC:0.4922048997772829


In [27]:
knn_grid = {
    'n_neighbors': [5, 10, 15],
    'leaf_size': [15, 30, 45],
    'weights': ['uniform', 'distance']
}
lr_grid = {
    'penalty': ['l1', 'l2'],
    'C': [0.1, 1.0, 10]
}
gbc_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [1, 3, 5],
}

knn_gs = GridSearchCV(knn, knn_grid, cv=5, scoring='accuracy')
lr_gs = GridSearchCV(lr, lr_grid, cv=5, scoring='accuracy')
gbc_gs = GridSearchCV(gbc, gbc_grid, cv=5, scoring='accuracy')

In [28]:
knn_gs.fit(X_train, y_train)
lr_gs.fit(X_train, y_train)
gbc_gs.fit(X_train, y_train);

In [29]:
print('---- Score on Test (Past Data)  ----')
print('KNN:' + str(knn_gs.score(X_test, y_test)))
print('LR:' + str(lr_gs.score(X_test, y_test)))
print('GBC:' + str(gbc_gs.score(X_test, y_test)))

---- Score on Test (Past Data)  ----
KNN:0.45879732739420936
LR:0.5189309576837416
GBC:0.49443207126948774


In [30]:
# Predicting current season games
cs_X = epl_current_fixture.drop(columns=(['HomeTeam', 'AwayTeam', 'FTR', 'Season']))
cs_y = epl_current_fixture['FTR']

In [31]:
ss = StandardScaler()
cs_X = ss.fit_transform(cs_X)

In [32]:
print('---- Score on Current Season ----')
print('KNN:' + str(knn_gs.score(cs_X, cs_y)))
print('LR:' + str(lr_gs.score(cs_X, cs_y)))
print('GBC:' + str(gbc_gs.score(cs_X, cs_y)))

---- Score on Current Season ----
KNN:0.5535714285714286
LR:0.5892857142857143
GBC:0.6428571428571429


### Predicting Season 18/19

In [33]:
fixture_1819.head()

Unnamed: 0,HomeTeam,AwayTeam,Season,W,D,L,HtOff,HtDef,HtPesOvr,HtPesDef,...,AtDef,AtPesOvr,AtPesDef,AtPesOff,AtPesMid,AtPesPhy,AtPesSpd,AtYC,AtRC,AtCS
0,Bournemouth,Arsenal,19,0.374453,0.247433,0.370468,0.893471,1.376147,77.0,77.0,...,1.065292,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0
1,Brighton & Hove,Arsenal,19,0.397048,0.265983,0.332787,0.824742,1.146789,77.0,76.0,...,1.065292,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0
2,Burnley,Arsenal,19,0.351889,0.337976,0.309586,0.549828,0.779817,76.0,76.0,...,1.065292,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0
3,Chelsea,Arsenal,19,0.583343,0.239465,0.169422,1.030928,0.733945,83.0,82.0,...,1.065292,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0
4,Crystal Palace,Arsenal,19,0.44342,0.242809,0.305012,0.996564,1.238532,78.0,78.0,...,1.065292,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0


In [34]:
fixture_X = fixture_1819.drop(columns=(['HomeTeam', 'AwayTeam', 'Season']))

In [35]:
fixture_1819.loc[:, 'Result'] = gbc_gs.predict(fixture_X)

In [36]:
fixture_1819.head()

Unnamed: 0,HomeTeam,AwayTeam,Season,W,D,L,HtOff,HtDef,HtPesOvr,HtPesDef,...,AtPesOvr,AtPesDef,AtPesOff,AtPesMid,AtPesPhy,AtPesSpd,AtYC,AtRC,AtCS,Result
0,Bournemouth,Arsenal,19,0.374453,0.247433,0.370468,0.893471,1.376147,77.0,77.0,...,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0,H
1,Brighton & Hove,Arsenal,19,0.397048,0.265983,0.332787,0.824742,1.146789,77.0,76.0,...,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0,A
2,Burnley,Arsenal,19,0.351889,0.337976,0.309586,0.549828,0.779817,76.0,76.0,...,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0,A
3,Chelsea,Arsenal,19,0.583343,0.239465,0.169422,1.030928,0.733945,83.0,82.0,...,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0,H
4,Crystal Palace,Arsenal,19,0.44342,0.242809,0.305012,0.996564,1.238532,78.0,78.0,...,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0,A


In [37]:
fixture_1819['Result'].value_counts()

H    153
A    119
Name: Result, dtype: int64

In [38]:
standing = pd.DataFrame(columns=(['Team', 'W', 'D', 'L', 'PTS', 'Rank']))
standing['Team'] = teams
standing[['PTS', 'W', 'L', 'D', 'Rank']]= 0

In [39]:
for i in range(len(fixture_1819)):
    if fixture_1819.loc[i, 'Result'] == 'H':
        standing.loc[standing['Team'] == fixture_1819.loc[i, 'HomeTeam'], 'PTS'] += 3
        standing.loc[standing['Team'] == fixture_1819.loc[i, 'HomeTeam'], 'W'] += 1
        standing.loc[standing['Team'] == fixture_1819.loc[i, 'AwayTeam'], 'L'] += 1

    if fixture_1819.loc[i, 'Result'] == 'A':
        standing.loc[standing['Team'] == fixture_1819.loc[i, 'AwayTeam'], 'PTS'] += 3
        standing.loc[standing['Team'] == fixture_1819.loc[i, 'AwayTeam'], 'W'] += 1
        standing.loc[standing['Team'] == fixture_1819.loc[i, 'HomeTeam'], 'L'] += 1
        
    if fixture_1819.loc[i, 'Result'] == 'D':
        standing.loc[standing['Team'] == fixture_1819.loc[i, 'HomeTeam'], 'PTS'] += 1
        standing.loc[standing['Team'] == fixture_1819.loc[i, 'AwayTeam'], 'PTS'] += 1
        standing.loc[standing['Team'] == fixture_1819.loc[i, 'AwayTeam'], 'D'] += 1
        standing.loc[standing['Team'] == fixture_1819.loc[i, 'HomeTeam'], 'D'] += 1


In [40]:
standing = standing.sort_values('PTS', ascending=False)
standing = standing.reset_index(drop=True)
for i in range(len(standing)):
    standing.loc[i, 'Rank'] = i + 1
standing = standing.set_index('Rank')

In [41]:
standing

Unnamed: 0_level_0,Team,W,D,L,PTS
Rank,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1,Manchester City,32,0,0,96
2,Manchester Utd,28,0,4,84
3,Liverpool,27,0,5,81
4,Tottenham,27,0,5,81
5,Chelsea,23,0,9,69
6,Arsenal,22,0,10,66
7,Leicester City,17,0,15,51
8,Bournemouth,16,0,16,48
9,Watford,14,0,18,42
10,Crystal Palace,13,0,19,39


In [42]:
E0 = pd.read_csv('./E0.csv')

In [43]:
E0.head()

Unnamed: 0,Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,...,BbAv<2.5,BbAH,BbAHh,BbMxAHH,BbAvAHH,BbMxAHA,BbAvAHA,PSCH,PSCD,PSCA
0,E0,10/08/2018,Man United,Leicester,2,1,H,1,0,H,...,1.79,17,-0.75,1.75,1.7,2.29,2.21,1.55,4.07,7.69
1,E0,11/08/2018,Bournemouth,Cardiff,2,0,H,1,0,H,...,1.83,20,-0.75,2.2,2.13,1.8,1.75,1.88,3.61,4.7
2,E0,11/08/2018,Fulham,Crystal Palace,0,2,A,0,1,A,...,1.87,22,-0.25,2.18,2.11,1.81,1.77,2.62,3.38,2.9
3,E0,11/08/2018,Huddersfield,Chelsea,0,3,A,0,2,A,...,1.84,23,1.0,1.84,1.8,2.13,2.06,7.24,3.95,1.58
4,E0,11/08/2018,Newcastle,Tottenham,1,2,A,1,2,A,...,1.81,20,0.25,2.2,2.12,1.8,1.76,4.74,3.53,1.89


In [45]:
fixture_1819.head()

Unnamed: 0,HomeTeam,AwayTeam,Season,W,D,L,HtOff,HtDef,HtPesOvr,HtPesDef,...,AtPesOvr,AtPesDef,AtPesOff,AtPesMid,AtPesPhy,AtPesSpd,AtYC,AtRC,AtCS,Result
0,Bournemouth,Arsenal,19,0.374453,0.247433,0.370468,0.893471,1.376147,77.0,77.0,...,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0,H
1,Brighton & Hove,Arsenal,19,0.397048,0.265983,0.332787,0.824742,1.146789,77.0,76.0,...,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0,A
2,Burnley,Arsenal,19,0.351889,0.337976,0.309586,0.549828,0.779817,76.0,76.0,...,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0,A
3,Chelsea,Arsenal,19,0.583343,0.239465,0.169422,1.030928,0.733945,83.0,82.0,...,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0,H
4,Crystal Palace,Arsenal,19,0.44342,0.242809,0.305012,0.996564,1.238532,78.0,78.0,...,82.0,81.0,84.0,82.0,76.0,77.0,57.0,2.0,13.0,A


In [50]:
prediction = fixture_1819[['HomeTeam','AwayTeam','Result']]

In [54]:
for i in range(len(prediction)):
    if E0['HomeTeam'] == prediction['HomeTeam'] and E0['AwayTeam'] == prediction['AwayTeam']:
        prediction['Real'] == E0['FTR']


ValueError: Can only compare identically-labeled Series objects

In [69]:
E0[(E0['HomeTeam'] == 'Arsenal') & (E0['AwayTeam'] == 'Burnley')]['FTR'].values[0]

'H'

In [70]:
for i in range(len(prediction)):
    try:
        prediction.loc[i, 'Real'] = E0[(E0['HomeTeam'] == prediction.loc[i, 'HomeTeam']) & (E0['AwayTeam'] == prediction.loc[i, 'AwayTeam'])]['FTR'].values[0]

    except IndexError:
        pass

In [77]:
prediction.dropna(inplace=True)

In [79]:
prediction.reset_index(inplace=True)

In [80]:
prediction

Unnamed: 0,index,HomeTeam,AwayTeam,Result,Real
0,0,Bournemouth,Arsenal,H,A
1,3,Chelsea,Arsenal,H,H
2,4,Crystal Palace,Arsenal,A,D
3,5,Everton,Arsenal,H,H
4,6,Huddersfield,Arsenal,A,A
5,8,Liverpool,Arsenal,H,H
6,12,Southampton,Arsenal,A,H
7,13,Tottenham,Arsenal,H,D
8,16,Arsenal,Bournemouth,H,H
9,18,Burnley,Bournemouth,A,H


In [83]:
(prediction['Result'] == prediction['Real']).mean()

0.6236559139784946