# Analysis of home team advantage for Elo ratings in SHL

Good explanations of how to calculate home team advantage:

* https://docs.google.com/file/d/0Bxr6KEe4KY_OYnJuLUw1WF9GcGs/edit (from http://wsasu.blogspot.se/)
* http://clubelo.com/Articles/HomeFieldAdvantage.html

In [1]:
%pylab inline
import pandas as pd

Populating the interactive namespace from numpy and matplotlib


In [2]:
import glob
df = pd.concat([pd.read_csv(path, encoding='utf-8') for path in glob.glob('data/*.csv')])
# DataFrame.sort was removed from pandas; sort_values is its replacement
df.sort_values(['Season', 'Game type', 'Date', 'Home', 'Visitor'], inplace=True)

# Home team advantage

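If $y$ is the ratio between the points won by home teams and the total number of points available, and the home team (rating $A$) and the visitor (rating $B$) are assumed to be equally rated on average ($A = B$), then

$y = \frac{1}{1 + 10^{(B - (A + x)) / 400}} \implies x = \frac{-400 \ln\left(\frac{1}{y} - 1\right)}{\ln 10}$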

... where $x$ is the number of Elo points corresponding to home team advantage.
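
As a quick sanity check of the formula, the sketch below evaluates the Elo expected score and its inverse for equally rated teams. The function names and the default rating of 1500 are illustrative assumptions, not part of the notebook.

```python
import math

def expected_home_score(x, a=1500.0, b=1500.0):
    """Elo expected score for a home team rated a against a visitor rated b,
    where the home team gets x bonus points (1500 is an arbitrary default)."""
    return 1.0 / (1.0 + 10 ** ((b - (a + x)) / 400.0))

def advantage_from_score(y):
    """Invert expected_home_score for equally rated teams (a == b)."""
    return -400.0 * math.log(1.0 / y - 1.0) / math.log(10)

# A home team that takes 60% of the available points
x = advantage_from_score(0.6)
print(round(x, 1))             # Elo points of home advantage
print(expected_home_score(x))  # round trip back to the 0.6 share
```

A share of exactly 0.5 maps to zero Elo points, and shares above 0.5 map to a positive advantage.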

In [3]:
def home_team_points(x):
    # Home team win
    if x['HS'] > x['VS']:
        if x['Dec'] in ['SO', 'OT']:
            return 2
        
        return 3
    
    # Home team loss
    if x['HS'] < x['VS']:
        if x['Dec'] in ['SO', 'OT']:
            return 1
        
        return 0
    
    # Draw
    return 1
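
The same 3-2-1-0 scoring can also be expressed without a row-wise function. The following is a minimal sketch using `np.select`; the sample rows are made up, and the use of an empty string in `Dec` for a regulation-time result is an assumption about the data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names HS, VS and Dec
# come from the notebook.
games = pd.DataFrame({
    'HS':  [3, 2, 1, 2, 2],
    'VS':  [1, 1, 4, 3, 2],
    'Dec': ['', 'OT', '', 'SO', ''],
})

extra_time = games['Dec'].isin(['SO', 'OT'])
home_win = games['HS'] > games['VS']
home_loss = games['HS'] < games['VS']

# Conditions are checked in order; anything unmatched is a draw (1 point).
points = np.select(
    [home_win & ~extra_time, home_win & extra_time,
     home_loss & extra_time, home_loss & ~extra_time],
    [3, 2, 1, 0],
    default=1,
)
print(points.tolist())
```

On large data sets this avoids the per-row overhead of `df.apply(..., axis=1)`, at the cost of being slightly less readable.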

In [4]:
def home_team_advantage(df):
    y = 1. * df.apply(home_team_points, axis=1).sum() / (len(df) * 3)
    return (-400 * log(1. / y - 1)) / log(10)

## Home team advantage per season

In [5]:
pd.DataFrame(df[df['Game type'] == 'SHL'].groupby('Season').apply(home_team_advantage),
             columns=['Home team advantage'])

Season,Home team advantage
2006,59.534697
2007,47.317019
2008,70.436504
2009,75.569948
2010,66.788369
2011,48.747802
2012,26.018462
2013,16.155104
2014,75.569948


## Home team advantage, all time

In [6]:
home_team_advantage(df[df['Game type'] == 'SHL'])

53.84870751579318
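
To put the all-time figure in perspective, it can be converted back into an expected share of points. This is a minimal sketch assuming equally rated teams; the function name is illustrative.

```python
import math

def expected_home_share(x):
    """Expected share of points for the home team when both teams are
    equally rated and the home side gets x bonus Elo points."""
    return 1.0 / (1.0 + 10 ** (-x / 400.0))

all_time_advantage = 53.84870751579318  # value printed by the notebook above
share = expected_home_share(all_time_advantage)
print(round(share, 3))
```

In other words, an advantage of about 54 Elo points corresponds to home teams taking a little under 58% of the available points.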