# NFL Scores and Betting Data
## Grant Cloud
<p>Inspired by <i>Mathletics - Chapters 38 & 39</i>, this notebook aims to look at NFL spread cover rates depending on home/away and favorite/underdog since 2000.</p>

imports

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from helpers import vis_null

load the data

In [2]:
df = pd.read_csv('data/nfl_spread_covers.csv')
print('{} regular season games from 2000 through 2018'.format(df.shape[0]))

4848 regular season games from 2000 through 2018


In [3]:
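# helper function to calculate who covered in each game
# (spread_favorite is treated as a negative number, e.g. -3.5, added to the favorite's score;
# rows where the favorite matches neither team fall through and return None)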
def calc_cover(row):
    if row.favorite == row.team_home:
        diff = (row.score_home + row.spread_favorite) - row.score_away
        if diff > 0:
            return 'home_favorite'
        elif diff < 0:
            return 'away_underdog'
        else:
            return 'push'
    elif row.favorite == row.team_away:
        diff = row.score_home - (row.score_away + row.spread_favorite)
        if diff > 0:
            return 'home_underdog'
        elif diff < 0:
            return 'away_favorite'
        else:
            return 'push'

creating a new column with the results of the spread by using the helper function <i>calc_cover</i>

In [4]:
df['cover'] = df.apply(calc_cover, axis=1)
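The row-wise helper can also be expressed in vectorized form. The block below is only a sketch on a hypothetical three-game sample (the team names and scores are made up; the column names follow the dataset used in this notebook, and `spread_favorite` is assumed to be negative). It shows the sign convention: the home team's margin is adjusted by the spread from the home team's point of view, and the sign of the result decides which side covered.

```python
import numpy as np
import pandas as pd

# Hypothetical sample games; values are illustrative only.
games = pd.DataFrame({
    'team_home': ['A', 'C', 'E'],
    'team_away': ['B', 'D', 'F'],
    'favorite': ['A', 'D', 'E'],
    'spread_favorite': [-3.0, -7.0, -3.0],
    'score_home': [24, 20, 23],
    'score_away': [20, 24, 20],
})

home_fav = (games.favorite == games.team_home).to_numpy()
away_fav = (games.favorite == games.team_away).to_numpy()

# Spread from the home team's point of view: negative when the home team
# is the favorite, positive when the home team is the underdog.
home_line = np.where(home_fav, games.spread_favorite, -games.spread_favorite)
diff = (games.score_home - games.score_away).to_numpy() + home_line

conditions = [
    home_fav & (diff > 0),
    home_fav & (diff < 0),
    away_fav & (diff > 0),
    away_fav & (diff < 0),
    (home_fav | away_fav) & (diff == 0),
]
choices = ['home_favorite', 'away_underdog', 'home_underdog', 'away_favorite', 'push']

# Rows where the favorite matches neither team get the label 'unknown'.
games['cover'] = np.select(conditions, choices, default='unknown')
print(games[['favorite', 'spread_favorite', 'score_home', 'score_away', 'cover']])
```

In the sample, the first home favorite wins by 4 against a 3-point spread and covers, the second game's home underdog loses by 4 while getting 7 points and covers, and the third game lands exactly on the number.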

<p>viewing how many times each possible outcome has occurred</p>

In [5]:
covSer = df.cover.value_counts()
covSer

away_underdog    1512
home_favorite    1422
away_favorite     738
home_underdog     737
push              141
Name: cover, dtype: int64

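<p>calculating the cover rate for each side of the two matchup types (away underdog vs. home favorite, away favorite vs. home underdog), excluding pushes, plus the overall push rate</p>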

In [6]:
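# cover rates within each matchup type (pushes excluded), indexed by label rather than position
au = covSer['away_underdog'] / (covSer['away_underdog'] + covSer['home_favorite'])
hf = covSer['home_favorite'] / (covSer['away_underdog'] + covSer['home_favorite'])
af = covSer['away_favorite'] / (covSer['away_favorite'] + covSer['home_underdog'])
hu = covSer['home_underdog'] / (covSer['away_favorite'] + covSer['home_underdog'])
# share of all labeled games that ended in a push
push = covSer['push'] / covSer.sum()
pd.Series([au, hf, af, hu, push],
          index=['away_underdog', 'home_favorite', 'away_favorite', 'home_underdog', 'push'])

away_underdog    0.515337
home_favorite    0.484663
away_favorite    0.500339
home_underdog    0.499661
push             0.030989
dtype: float64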

### Analysis

<p>Since 2000, away underdogs have covered the spread 51.5% of the time against home favorites, who covered 48.5% of the time: 90 more covers for away underdogs (1,512 vs. 1,422). This suggests that bookies shift lines to take advantage of bettors' home favorite bias and improve their margins. Away favorites and home underdogs, by contrast, split almost evenly (738 vs. 737). These cover rates differ from those in Levitt's sample analysis of NFL covers during the 2001 season (where the conclusion was only that there is favorite bias, and that home underdogs hit 57.7% of the time).</p>
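As a rough check on how strong the away underdog edge is, the counts above can be compared with a 50/50 split. This is only a sketch using a normal approximation to the binomial (a swapped-in simplification, not something taken from <i>Mathletics</i>), and it assumes games are independent.

```python
import math

# Counts taken from the value_counts output above (pushes excluded).
away_underdog = 1512
home_favorite = 1422
n = away_underdog + home_favorite

# Normal approximation to a two-sided binomial test of
# H0: each side covers 50% of the time.
z = (away_underdog - n / 2) / math.sqrt(n / 4)
p_value = math.erfc(abs(z) / math.sqrt(2))

print('z = {:.2f}, two-sided p = {:.3f}'.format(z, p_value))
```

The z-score comes out at roughly 1.66, short of the conventional 1.96 threshold, so the gap points in the direction of home favorite bias without being conclusive on its own.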

looking only at more recent years (2015-2018)

In [7]:
df = df[df.schedule_season.astype('int') >= 2015]
print('{} regular season games from 2015 through 2018'.format(df.shape[0]))
covSer = df.cover.value_counts()
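# cover rates within each matchup type (pushes excluded), indexed by label rather than position
au = covSer['away_underdog'] / (covSer['away_underdog'] + covSer['home_favorite'])
hf = covSer['home_favorite'] / (covSer['away_underdog'] + covSer['home_favorite'])
af = covSer['away_favorite'] / (covSer['away_favorite'] + covSer['home_underdog'])
hu = covSer['home_underdog'] / (covSer['away_favorite'] + covSer['home_underdog'])
# share of all labeled games that ended in a push
push = covSer['push'] / covSer.sum()
pd.Series([au, hf, af, hu, push],
          index=['away_underdog', 'home_favorite', 'away_favorite', 'home_underdog', 'push'])

1024 regular season games from 2015 through 2018


away_underdog    0.516963
home_favorite    0.483037
away_favorite    0.501466
home_underdog    0.498534
push             0.037111
dtype: float64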

the trend holds: away underdogs still cover about 51.7% of the time against home favorites, which is consistent with bookies taking advantage of home favorite bias

### Should I bet away underdogs since they win > 50% of the time?
<p>No. Assuming a standard book's 11-10 odds on the spread (risk 11 to win 10), you need to win more than 11/21, or about 52.4%, of your bets to be profitable (see <i>Mathletics - Chapter 38</i> for the derivation of the 52.4%). Since away underdogs covered only about 51.7% of the time in 2015-2018 (51.5% since 2000), you would still lose money in the long run betting only away underdogs.</p>
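The break-even figure can be reproduced with a few lines of arithmetic. The sketch below assumes flat bets at 11-10 pricing and ignores pushes; the helper `expected_profit` is a hypothetical name introduced here for illustration.

```python
# Standard 11-10 pricing: risk 11 units to win 10.
risk, win = 11, 10

# The break-even cover rate p solves win * p - risk * (1 - p) = 0,
# which gives p = risk / (risk + win).
break_even = risk / (risk + win)

def expected_profit(p, risk=risk, win=win):
    """Expected profit per bet, in units, for cover probability p (pushes ignored)."""
    return win * p - risk * (1 - p)

away_underdog_rate = 0.516963  # 2015-2018 away underdog rate from the output above

print('break-even cover rate: {:.4f}'.format(break_even))
print('expected profit per 11 units risked: {:.3f}'.format(
    expected_profit(away_underdog_rate)))
```

A negative expected profit at the observed cover rate is what "lose money in the long run" means here: the edge over 50% is smaller than the book's margin.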