# Measuring the home advantage in the English Premier League

## 1. Aim

Test the hypothesis that the proportion of games won by the home team in the English Premier League is equal to 1/3 (approximately 0.33), using match results from the 2018-19 season.
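
Stated formally, writing $p$ for the population proportion of games won by the home team (the symbol is introduced here for illustration), the hypotheses can be sketched as below. The two-sided alternative is an assumption, chosen because it matches the default behaviour of `proportions_ztest`.

$$
H_0 : p = \tfrac{1}{3} \qquad \text{versus} \qquad H_1 : p \neq \tfrac{1}{3}
$$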

## 2. Set up the notebook

### 2.1 Import the modules

In [1]:
from __future__ import annotations
from statsmodels.stats.proportion import proportion_confint, proportions_ztest
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme()

### 2.2 Load the data

In [2]:
# Forward slashes keep the relative path portable across operating systems
epl1819 = pd.read_csv("../data/epl_1819.csv")

## 3. Prepare the data

### 3.1 Preview the data

In [3]:
epl1819.head()

Unnamed: 0,timestamp,date_GMT,status,attendance,home_team_name,away_team_name,referee,Game Week,Pre-Match PPG (Home),Pre-Match PPG (Away),...,odds_ft_home_team_win,odds_ft_draw,odds_ft_away_team_win,odds_ft_over15,odds_ft_over25,odds_ft_over35,odds_ft_over45,odds_btts_yes,odds_btts_no,stadium_name
0,1533927600,Aug 10 2018 - 7:00pm,complete,74439,Manchester United,Leicester City,Andre Marriner,1,0.0,0.0,...,1.37,4.98,9.81,1.33,2.0,3.6,7.5,2.05,1.69,Old Trafford (Manchester)
1,1533987000,Aug 11 2018 - 11:30am,complete,51749,Newcastle United,Tottenham Hotspur,Martin Atkinson,1,0.0,0.0,...,4.51,3.77,1.84,1.31,1.95,3.5,5.75,1.83,1.87,St. James' Park (Newcastle upon Tyne)
2,1533996000,Aug 11 2018 - 2:00pm,complete,10353,AFC Bournemouth,Cardiff City,Kevin Friend,1,0.0,0.0,...,2.03,3.51,3.96,1.31,1.95,3.45,6.7,1.83,1.83,Vitality Stadium (Bournemouth- Dorset)
3,1533996000,Aug 11 2018 - 2:00pm,complete,24821,Fulham,Crystal Palace,Mike Dean,1,0.0,0.0,...,2.31,3.44,3.26,1.28,1.87,3.2,7.0,1.71,2.0,Craven Cottage (London)
4,1533996000,Aug 11 2018 - 2:00pm,complete,24121,Huddersfield Town,Chelsea,Chris Kavanagh,1,0.0,0.0,...,7.47,4.27,1.51,1.29,1.91,3.3,6.95,2.0,1.71,John Smith's Stadium (Huddersfield- West Yorks...


### 3.2 Select interesting columns

#### 3.2.1 Rename the columns

In [4]:
# Map the verbose source column names to short working names
col_names: dict[str, str] = {
    "home_team_goal_count": "h_goals",
    "away_team_goal_count": "a_goals"}
epl1819.rename(columns=col_names, inplace=True)

#### 3.2.2 Select the columns

In [5]:
# Take an independent copy so that new columns do not touch epl1819
home_adv = epl1819[["h_goals", "a_goals"]].copy()

### 3.3 Identify the winner

#### 3.3.1 Calculate the difference

In [6]:
home_adv["diff"] = home_adv["h_goals"] - home_adv["a_goals"]

#### 3.3.2 Identify if home won

In [7]:
def is_home_win(x: int) -> int:
    """Return 1 if the goal difference x is positive (home win), otherwise 0."""
    return 1 if x > 0 else 0

In [8]:
home_adv["home_win"] = home_adv["diff"].apply(is_home_win)

In [9]:
home_adv.head()

Unnamed: 0,h_goals,a_goals,diff,home_win
0,2,1,1,1
1,1,2,-1,0
2,2,0,2,1
3,0,2,-2,0
4,0,3,-3,0
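
The same indicator can also be built without `apply`, by comparing the two goal columns directly. The sketch below is an alternative to the function above, not the notebook's own method; `demo` is a hypothetical stand-in for `home_adv` that holds only the five previewed matches.

```python
import pandas as pd

# Hypothetical stand-in for home_adv, holding the five previewed matches
demo = pd.DataFrame({"h_goals": [2, 1, 2, 0, 0], "a_goals": [1, 2, 0, 2, 3]})

# A home win means strictly more home goals; draws and away wins map to 0
demo["home_win"] = (demo["h_goals"] > demo["a_goals"]).astype(int)

print(demo["home_win"].tolist())  # [1, 0, 1, 0, 0]
```

The comparison runs on whole columns at once, which is usually faster than calling a Python function row by row.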


## 4. Test the data

### 4.1 Calculate parameters

In [10]:
home_wins = home_adv["home_win"].sum()
home_wins

181

In [11]:
sample_size = home_adv.index.size
sample_size

380

### 4.2 Calculate the $z$-interval

In [12]:
proportion_confint(
    count=home_wins,
    nobs=sample_size)

(0.4261002050234076, 0.5265313739239608)
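
With its default arguments (`alpha=0.05`, `method="normal"`), `proportion_confint` returns the normal-approximation interval $\hat{p} \pm z_{0.975}\sqrt{\hat{p}(1-\hat{p})/n}$. As a check, the interval can be reproduced by hand using only the standard library. This is a minimal sketch; the names `p_hat`, `std_err` and `z_crit` are illustrative and do not appear in the notebook.

```python
from math import sqrt
from statistics import NormalDist

home_wins, sample_size = 181, 380  # counts reported in section 4.1

p_hat = home_wins / sample_size                    # sample proportion
std_err = sqrt(p_hat * (1 - p_hat) / sample_size)  # estimated standard error
z_crit = NormalDist().inv_cdf(0.975)               # two-sided 95% critical value

lower = p_hat - z_crit * std_err
upper = p_hat + z_crit * std_err
print(round(lower, 4), round(upper, 4))  # 0.4261 0.5265
```

The interval lies entirely above 1/3.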

### 4.3 Perform the $z$-test

In [13]:
proportions_ztest(
    count=home_wins,
    nobs=sample_size,
    value=1/3)

(5.58074684431204, 2.3948800362434798e-08)
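
The returned pair is the test statistic and the two-sided $p$-value. The sketch below reproduces both by hand, assuming the `statsmodels` defaults: a two-sided alternative, and a standard error based on the sample proportion rather than the hypothesised value (`prop_var=False`). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

home_wins, sample_size, p_null = 181, 380, 1 / 3

p_hat = home_wins / sample_size
# Assumption: the standard error uses p_hat, not p_null (statsmodels default)
std_err = sqrt(p_hat * (1 - p_hat) / sample_size)

z_stat = (p_hat - p_null) / std_err
# Two-sided p-value, taken from the lower tail for numerical accuracy
p_value = 2 * NormalDist().cdf(-abs(z_stat))

print(round(z_stat, 2))  # 5.58
```

A $p$-value this small (about $2.4 \times 10^{-8}$) means the null hypothesis of a home-win proportion of 1/3 is rejected at any conventional significance level.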