# Introduction

At the onset of the 2016 fantasy football season I decided that I was going to try and win my league with numbers and computer science. I was going to take the opportunity to learn modern data science tools and finally put some of that college statistics I'd learned to use. 

A combination of factors led me to move through various tools and approaches at the beginning of the season. Ultimately I ended up using Excel to quickly fire up a list of players in the order I wanted by hand. In the next sectoin I'm goign to reproduce this methdology using python, specifically the data science package `pandas`.

# Python Reproduction of Excel Methodology

## Imports

These are the packages used to reproduce the list I used for drafting my team for 2016.

* `pandas` Is a Python data analysis library and is available [here](http://pandas.pydata.org/)
* `numpy` Is a package for scientific computing in python, used below primarily for its mathmatical functions and constructs. It is available [here](http://www.numpy.org/).
* `matplotlib.pyplot` Is a package for plotting data and is available [here](http://matplotlib.org/).
* `pyffl` Is a package I've developed for scraping fantasy football data and will be including any pure python functions there. It is available [here](https://github.com/smkell/pyffl)

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.core.display import display

import pyffl

## Getting the data

My original intent was to use a wide variety of projection and ranking data to formulate my ranks. However, time, laziness and indecisiveness ulitmately led me to only using ESPN's standard projections. 

I've created a function in the `pyffl` package for retrieving projections from a variety of sources. Of course once again only `ESPN` is currently implemented.

In [2]:
rules = pyffl.LeagueRules()

In [3]:
projections = pyffl.scrape_projections(['espn'], 2016)


Scraping ESPN projections for week 0 of 2016 season
...Done


In [4]:
projections = pyffl.calculate_points(projections, rules)

## Building the dataframe

Pandas primarily operates on objects known as `Series` and `DataFrame` where a `DataFrame` is a table composed of several `Series` associated together by an `index`. In the below code segment we construct a `DataFram` for our projections. The `*_key` lists give the names of the columns in the desired order for display.

In [5]:
df = pd.DataFrame(projections)
df['source'] = 'espn'
df['projection'] = True
df['vor'] = np.nan
df['rank'] = df['pts'].rank(ascending=0)
df['positionRank'] = 0.0

info_keys = ['rank', 'positionRank', 'name', 'team', 'position', 'source', 'projection']
skill_keys = ['passCmp', 'passAtt', 'passYds', 'passTds', 'passInts',
              'rushAtt', 'rushYds', 'rushTds',
              'recsCmp', 'recsAtt', 'recsYds', 'recsTds']
dst_keys = ['dstTckls', 'dstSacks', 'dstFmblFrc', 'dstFmblRec', 'dstInts', 'dstIntTds', 'dstFmblTds']
k_keys = ['fg0139Cmp', 'fg0139Att', 'fg4049Cmp', 'fg4049Att', 'fg50Cmp', 'fg50Att',
          'fgCmp', 'fgAtt', 'xpCmp', 'xpAtt']
calc_keys = ['pts','vor']
all_keys = info_keys + skill_keys + dst_keys + calc_keys
df = df[all_keys]

## Calculating VOR 

`Value over replacement` is a measure of a player's value compared to other players in that position. The theory here is that we can measure how valuable a player by comparing how many more points he is projected to score than the next startable player in the position. Analysing this number shows descrete gaps in value where players are split in tiers. We can, in principle, use this information to pick the most valuable players at the right moment in the draft.

In [6]:
# Positions is a `dict` where the key is the position, and the value is the number of players in that position which could
# be started in any given week. I.e. if there are 8 teams in the league and one QB slot per team then 8 QBs could be started.
# Likewise if there are 2 RB slots and 1 RB/WR/TE Flex then at most 3*8(24) RBs could be started in a given week.
positions = {
    'QB': (rules.starting_qbs + rules.starting_superflex) * rules.num_teams,
    'RB': (rules.starting_rbs + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
    'WR': (rules.starting_wrs + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
    'TE': (rules.starting_tes + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
    'K': rules.starting_ks * rules.num_teams,
    'D/ST': rules.starting_dst * rules.num_teams
}

In [7]:
# Iterate through the positions, and calculate the vor for each player in the position.
for position, draftable in positions.iteritems():
    last_qb = df[df['position'] == position].sort_values(by='pts',ascending=False).iloc[draftable,:]
    df.ix[df['position'] == position, 'vor'] = df.ix[df['position'] == position, 'pts'] - last_qb['pts']
    df.ix[df['position'] == position, 'positionRank'] = df.ix[df['position'] == position, 'pts'].rank(ascending=0)

The following table are the top 16 players in all positions. It represents my rankings for the first two rounds of the 2016 draft.

In [8]:
round = 0
start = round * rules.num_teams
end = start + rules.num_teams
df[['rank', 'positionRank', 'name', 'position', 'pts', 'vor']].sort_values(by='vor',ascending=False).iloc[start:end,:]

Unnamed: 0,rank,positionRank,name,position,pts,vor
320,1.0,1.0,Antonio Brown,WR,383.0,185.42
322,2.0,2.0,Julio Jones,WR,346.15,148.57
520,47.0,1.0,Rob Gronkowski,TE,244.19,140.42
521,55.0,2.0,Jordan Reed,TE,234.27,130.5
123,13.0,1.0,David Johnson,RB,280.66,120.97
124,19.0,2.0,Devonta Freeman,RB,270.52,110.83
121,22.0,3.0,Todd Gurley,RB,269.01,109.32
321,6.0,3.0,Odell Beckham Jr.,WR,305.81,108.23


In [9]:
round = 1
start = 1 + (round * rules.num_teams)
end = start + rules.num_teams
df[['rank', 'name', 'position', 'pts', 'vor']].sort_values(by='vor',ascending=False).iloc[start:end,:]

Unnamed: 0,rank,name,position,pts,vor
120,26.0,Adrian Peterson,RB,264.5,104.81
524,79.0,Travis Kelce,TE,198.89,95.12
130,36.0,LeSean McCoy,RB,253.65,93.96
125,37.0,Le'Veon Bell,RB,252.79,93.1
523,85.0,Delanie Walker,TE,196.52,92.75
126,39.0,Lamar Miller,RB,251.8,92.11
122,40.0,Ezekiel Elliott,RB,251.62,91.93
323,10.0,DeAndre Hopkins,WR,284.82,87.24


## A quick interlude for analysis

So far we've actually come pretty far in terms of data collection and massaging. I think, then that it's time that we start taking a look at what it all means. 

In the above table I've listed the 16 most valuable players according to my model. There are problems here. First of all going into the draft I never had any intention of drafting either Rob Gronkowski or Jordan Reed, simply based off the fact that I *knew* that the tight-end position simply wasn't valuable enough to justify a first or second round pick. The fact that two tightends show up in the top 16 suggest that either my assumptions, or the model, are wrong.

## Comparing projected rankings to actual rankings

In [10]:
actuals = pyffl.scrape_actuals(2016)
actuals = pyffl.calculate_points(actuals, rules)

In [11]:
dfa = pd.DataFrame(actuals, columns=all_keys)
dfa['vor'] = np.nan
dfa['rank'] = dfa['pts'].rank(ascending=0)

In [12]:
positions = {
    'QB': (rules.starting_qbs + rules.starting_superflex) * rules.num_teams,
    'RB': (rules.starting_rbs + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
    'WR': (rules.starting_wrs + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
    'TE': (rules.starting_tes + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
}
# Iterate through the positions, and calculate the vor for each player in the position.
for position, draftable in positions.iteritems():
    last_qb = dfa[dfa['position'] == position].sort_values(by='pts',ascending=False).iloc[draftable,:]
    dfa.ix[dfa['position'] == position, 'vor'] = dfa.ix[dfa['position'] == position, 'pts'] - last_qb['pts']
    dfa.ix[dfa['position'] == position, 'positionRank'] = dfa.ix[dfa['position'] == position, 'pts'].rank(ascending=0)

### Round 1 Projected vs Actual

In [13]:
round = 0
start = round * rules.num_teams
end = start + rules.num_teams
display(df[['rank', 'positionRank', 'name', 'position', 'pts', 'vor']].sort_values(by='vor', ascending=False).iloc[start:end,:])
display(dfa[['rank', 'positionRank', 'name', 'position', 'pts', 'vor']].sort_values(by='vor',ascending=False).iloc[start:end,:])

Unnamed: 0,rank,positionRank,name,position,pts,vor
320,1.0,1.0,Antonio Brown,WR,383.0,185.42
322,2.0,2.0,Julio Jones,WR,346.15,148.57
520,47.0,1.0,Rob Gronkowski,TE,244.19,140.42
521,55.0,2.0,Jordan Reed,TE,234.27,130.5
123,13.0,1.0,David Johnson,RB,280.66,120.97
124,19.0,2.0,Devonta Freeman,RB,270.52,110.83
121,22.0,3.0,Todd Gurley,RB,269.01,109.32
321,6.0,3.0,Odell Beckham Jr.,WR,305.81,108.23


Unnamed: 0,rank,positionRank,name,position,pts,vor
267,1.0,1.0,David Johnson,RB,286.4,187.7
183,4.0,2.0,DeMarco Murray,RB,250.6,151.9
223,6.0,3.0,Ezekiel Elliott,RB,246.2,147.5
114,10.0,4.0,Melvin Gordon,RB,230.3,131.6
302,7.0,1.0,Antonio Brown,WR,242.7,108.7
33,9.0,2.0,Mike Evans,WR,235.0,101.0
422,22.0,5.0,Le'Veon Bell,RB,194.6,95.9
53,28.0,6.0,LeSean McCoy,RB,188.4,89.7


### Round 2 Projected vs. Actual

In [14]:
round = 1
start = 1 + (round * rules.num_teams)
end = start + rules.num_teams
display(df[['rank', 'positionRank', 'name', 'position', 'pts', 'vor']].sort_values(by='vor', ascending=False).iloc[start:end,:])
display(dfa[['rank', 'positionRank', 'name', 'position', 'pts', 'vor']].sort_values(by='vor',ascending=False).iloc[start:end,:])

Unnamed: 0,rank,positionRank,name,position,pts,vor
120,26.0,4.0,Adrian Peterson,RB,264.5,104.81
524,79.0,4.0,Travis Kelce,TE,198.89,95.12
130,36.0,5.0,LeSean McCoy,RB,253.65,93.96
125,37.0,6.0,Le'Veon Bell,RB,252.79,93.1
523,85.0,5.0,Delanie Walker,TE,196.52,92.75
126,39.0,7.0,Lamar Miller,RB,251.8,92.11
122,40.0,8.0,Ezekiel Elliott,RB,251.62,91.93
323,10.0,4.0,DeAndre Hopkins,WR,284.82,87.24


Unnamed: 0,rank,positionRank,name,position,pts,vor
92,2.0,1.0,Aaron Rodgers,QB,263.46,81.56
293,60.0,2.0,Jordan Reed,TE,152.0,81.4
184,65.0,3.0,Delanie Walker,TE,151.0,80.4
123,3.0,2.0,Drew Brees,QB,261.34,79.44
19,37.0,7.0,Devonta Freeman,RB,174.8,76.1
210,72.0,4.0,Jimmy Graham,TE,145.6,75.0
24,13.0,3.0,Julio Jones,WR,209.0,75.0
109,76.0,5.0,Travis Kelce,TE,142.0,71.4


### Average projected points per position vs Average actual points per position

In [15]:
num_rostered = (rules.starting_qbs + 
                rules.starting_rbs + 
                rules.starting_wrs + 
                rules.starting_tes + 
                rules.starting_ks + 
                rules.starting_dst + 
                rules.starting_flex +
                rules.starting_superflex +
                8)
grp_proj = df.sort_values(by='pts', ascending=False).iloc[0:num_rostered*8].groupby('position')
display(grp_proj.agg({'pts': np.mean}))

grp_actl = dfa.sort_values(by='pts', ascending=False).iloc[0:num_rostered*8].groupby('position')
display(grp_actl.agg({'pts': np.mean}))

Unnamed: 0_level_0,pts
position,Unnamed: 1_level_1
QB,262.582333
RB,201.517368
TE,180.324375
WR,207.957333


Unnamed: 0_level_0,pts
position,Unnamed: 1_level_1
QB,183.596667
RB,149.117143
TE,118.388889
WR,143.08918


### Identifying the biggest busts of the year.

The below table merges the actual and projected points, and calculates the difference between the two. The table is then sorted by the difference to identify the biggest "busts" or disappointments. Several of these we can attribute to injuries and we can discount them fairly easily. Others are less easy to explain.

In [16]:
def color_negative_red(val):
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color

mrg = df.merge(dfa, left_on=['name','position'], right_on=['name','position'], how='inner', suffixes=['_proj', '_actl'])
mrg['pts_diff'] = mrg['pts_actl'] - mrg['pts_proj']
mrg['positionRank_diff'] = mrg['positionRank_proj'] - mrg['positionRank_actl']
mrg_cols = ['name', 'position','positionRank_proj', 'positionRank_actl', 'pts_proj', 'pts_actl', 'pts_diff', 'positionRank_diff']
tbl = mrg[mrg_cols].sort_values(by='pts_diff').iloc[0:rules.num_teams*3,:]
display(tbl.style.applymap(color_negative_red))

Unnamed: 0,name,position,positionRank_proj,positionRank_actl,pts_proj,pts_actl,pts_diff,positionRank_diff
60,Adrian Peterson,RB,4,102,264.5,7.7,-256.8,-98
185,Keenan Allen,WR,8,142,264.36,12.3,-252.06,-134
24,Robert Griffin III,QB,20,45,253.734,9.3,-244.434,-25
181,Sammy Watkins,WR,17,128,237.17,23.3,-213.87,-111
192,Eric Decker,WR,12,103,250.94,40.4,-210.54,-91
74,Jamaal Charles,RB,12,94,221.65,13.4,-208.25,-82
84,Danny Woodhead,RB,13,75,211.5,27.1,-184.4,-62
23,Jay Cutler,QB,26,35,229.376,50.76,-178.616,-9
67,Doug Martin,RB,10,54,224.65,53.0,-171.65,-44
90,Ameer Abdullah,RB,20,76,190.47,26.8,-163.67,-56


In [17]:
dfa[dfa['name'] == 'Antonio Brown'].loc[:,['name','rank','recsCmp','recsYds','recsTds', 'pts']]

Unnamed: 0,name,rank,recsCmp,recsYds,recsTds,pts
302,Antonio Brown,7.0,82,998,10,242.7
