# Introduction

At the onset of the 2016 fantasy football season I decided that I was going to try to win my league with numbers and computer science. I was going to take the opportunity to learn modern data science tools and finally put some of the statistics I'd learned in college to use.

A combination of factors led me to move through various tools and approaches at the beginning of the season. Ultimately I ended up using Excel to quickly put together, by hand, a list of players in the order I wanted. In the next section I'm going to reproduce this methodology using Python, specifically the data science package `pandas`.

# Python Reproduction of Excel Methodology

## Imports

These are the packages used to reproduce the list I used for drafting my team for 2016.

* `pandas` is a Python data analysis library and is available [here](http://pandas.pydata.org/).
* `numpy` is a package for scientific computing in Python, used below primarily for its mathematical functions and constructs. It is available [here](http://www.numpy.org/).
* `matplotlib.pyplot` is a package for plotting data and is available [here](http://matplotlib.org/).
* `pyffl` is a package I've developed for scraping fantasy football data, and I'll be putting any pure Python functions from this analysis there. It is available [here](https://github.com/smkell/pyffl).

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import pyffl
```

## Getting the data

My original intent was to use a wide variety of projection and ranking data to formulate my rankings. However, time, laziness and indecisiveness ultimately led me to use only ESPN's standard projections.

I've created a function in the `pyffl` package for retrieving projections from a variety of sources. Of course, once again, only ESPN is currently implemented.
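One way a multi-source function like this can be organized is a registry that maps each source name to its own scraper, so adding a source means adding one entry. The sketch below is hypothetical: it is not `pyffl`'s actual code, and `scrape_espn`, `SCRAPERS` and the `'source'` tag are illustrative names. The placeholder scraper returns a fixed row instead of making network requests.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def scrape_espn(season: int) -> List[Row]:
    """Placeholder standing in for a real ESPN scraper (no network access)."""
    return [{'name': 'Sample Player', 'position': 'QB'}]


# Hypothetical registry: source name -> function that scrapes one season.
SCRAPERS: Dict[str, Callable[[int], List[Row]]] = {'espn': scrape_espn}


def scrape_projections(sources: List[str], season: int) -> List[Row]:
    """Collect rows from every requested source, tagging each row with its source."""
    rows: List[Row] = []
    for source in sources:
        try:
            scraper = SCRAPERS[source]
        except KeyError:
            # Sources without a registered scraper fail loudly instead of returning nothing.
            raise NotImplementedError(f"no scraper for source {source!r}") from None
        rows.extend({**row, 'source': source} for row in scraper(season))
    return rows


print(scrape_projections(['espn'], 2016))
```

Tagging rows with their source keeps them distinguishable if several sources are later combined into one table.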

```python
rules = pyffl.LeagueRules()
```

```python
projections = pyffl.scrape_projections(['espn'], 2016)
```

```
Scraping ESPN projections for week 0 of 2016 season
...Done
```

```python
projections = pyffl.calculate_points(projections, rules)
```
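The internals of `pyffl.calculate_points` aren't shown here. As a rough illustration of what a points calculation like this involves, here is a minimal sketch that multiplies each projected stat by a per-unit weight and sums the results. The `SCORING` weights are an assumption modeled on common standard scoring (they are not read from `pyffl.LeagueRules`), and only a subset of the stat keys used later is covered.

```python
# Assumed standard-scoring weights (points per unit of each stat).
# These are illustrative values, not pyffl.LeagueRules' actual settings.
SCORING = {
    'passYds': 1 / 25,   # 1 point per 25 passing yards
    'passTds': 4,
    'passInts': -2,
    'rushYds': 1 / 10,   # 1 point per 10 rushing yards
    'rushTds': 6,
    'recsYds': 1 / 10,   # 1 point per 10 receiving yards
    'recsTds': 6,
}


def calculate_points(players, scoring=SCORING):
    """Return copies of the player dicts with a 'pts' key added.

    Stats missing from a player's dict count as zero.
    """
    scored = []
    for player in players:
        pts = sum(player.get(stat, 0) * weight for stat, weight in scoring.items())
        scored.append({**player, 'pts': round(pts, 2)})
    return scored


# A made-up running back line: 120 + 60 + 30 + 12 points.
players = [{'name': 'Example RB', 'position': 'RB',
            'rushYds': 1200, 'rushTds': 10, 'recsYds': 300, 'recsTds': 2}]
print(calculate_points(players)[0]['pts'])  # 222.0
```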

## Building the dataframe

Pandas primarily operates on objects known as `Series` and `DataFrame`, where a `DataFrame` is a table composed of several `Series` associated together by an `index`. In the code below we construct a `DataFrame` for our projections. The `*_keys` lists give the names of the columns in the desired order for display.

```python
# Column groups, in the order they should appear in the table.
info_keys = ['name', 'team', 'position']
skill_keys = ['passCmp', 'passAtt', 'passYds', 'passTds', 'passInts',
              'rushAtt', 'rushYds', 'rushTds',
              'recsCmp', 'recsAtt', 'recsYds', 'recsTds']
dst_keys = ['dstTckls', 'dstSacks', 'dstFmblFrc', 'dstFmblRec', 'dstInts', 'dstIntTds', 'dstFmblTds']
k_keys = ['fg0139Cmp', 'fg0139Att', 'fg4049Cmp', 'fg4049Att', 'fg50Cmp', 'fg50Att',
          'fgCmp', 'fgAtt', 'xpCmp', 'xpAtt']
calc_keys = ['pts']
# k_keys is not part of all_keys, so kicker stat columns are omitted
# (kicker rows themselves are still kept).
all_keys = info_keys + skill_keys + dst_keys + calc_keys
# One row per player; `columns` selects and orders the keys to keep.
df = pd.DataFrame(projections, columns=all_keys)
```
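To make the `Series`/`DataFrame`/`index` relationship concrete, here is a small self-contained example using made-up rows shaped like the scraped projections (one dict per player). It shows what the `columns` argument does when building a `DataFrame` from a list of dicts: listed keys are kept in the given order, unlisted keys are dropped, and listed keys that no row has are filled with `NaN`.

```python
import pandas as pd

# Hypothetical rows, one dict per player.
rows = [
    {'name': 'Player A', 'position': 'RB', 'pts': 250.0, 'rushYds': 1100},
    {'name': 'Player B', 'position': 'WR', 'pts': 230.0},
]

# 'rushYds' is not listed, so it is dropped; 'recsYds' is listed but
# absent from every row, so that column is all NaN.
frame = pd.DataFrame(rows, columns=['name', 'position', 'pts', 'recsYds'])

print(list(frame.columns))            # ['name', 'position', 'pts', 'recsYds']
print(list(frame.index))              # [0, 1]: a default integer index
print(type(frame['pts']).__name__)    # Series: each column shares the frame's index
print(frame['recsYds'].isna().all())  # True
```

This is also why skill players in the projections table show `NaN` in the D/ST columns.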

## Calculating VOR 

`Value over replacement` (VOR) is a measure of a player's value compared to other players at the same position. The theory is that we can measure how valuable a player is by comparing how many more points he is projected to score than the replacement player at his position: the best player left over once every team's starting slots at that position have been filled. Analysing this number reveals discrete gaps in value that split players into tiers. In principle, we can use this information to pick the most valuable players at the right moment in the draft.
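Written as a formula: for a player $i$ at position $p$, let $N_p$ be the number of startable players at $p$ (the values of the `positions` dict) and $r_p(k)$ the $k$-th best player at $p$ by projected points. Then

$$
\mathrm{VOR}_i = \mathrm{pts}_i - \mathrm{pts}_{r_p(N_p + 1)}
$$

The replacement player is the $(N_p + 1)$-th best, which corresponds to `.iloc[draftable]` in the 0-based pandas code. For example, with 8 teams and one QB slot each, $N_{\mathrm{QB}} = 8$ and every quarterback is measured against the 9th-best quarterback.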

```python
# Add the value over replacement column, filled in below
df['vor'] = np.nan
```

```python
# `positions` is a dict where the key is the position and the value is the number of players
# at that position who could be started in any given week. E.g. if there are 8 teams in the
# league and one QB slot per team, then 8 QBs could be started. Likewise, if there are 2 RB
# slots and 1 RB/WR/TE flex slot, then at most 3 * 8 = 24 RBs could be started in a given week.
positions = {
    'QB': (rules.starting_qbs + rules.starting_superflex) * rules.num_teams,
    'RB': (rules.starting_rbs + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
    'WR': (rules.starting_wrs + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
    'TE': (rules.starting_tes + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
    'K': rules.starting_ks * rules.num_teams,
    'D/ST': rules.starting_dst * rules.num_teams
}
```
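To check the arithmetic with concrete numbers, here is a self-contained version of the same formula. `Rules` is a hypothetical stand-in for `pyffl.LeagueRules`; its attribute names follow the code above, but the values (8 teams, 1 QB, 2 RB, 2 WR, 1 TE, 1 flex, no superflex, 1 K, 1 D/ST) are assumptions chosen to match the example in the comments.

```python
from dataclasses import dataclass


@dataclass
class Rules:
    """Hypothetical stand-in for pyffl.LeagueRules; all values are assumed."""
    num_teams: int = 8
    starting_qbs: int = 1
    starting_rbs: int = 2
    starting_wrs: int = 2
    starting_tes: int = 1
    starting_flex: int = 1       # RB/WR/TE flex
    starting_superflex: int = 0  # QB/RB/WR/TE superflex
    starting_ks: int = 1
    starting_dst: int = 1


def startable_counts(rules):
    """Number of players per position who could start in a given week."""
    n = rules.num_teams
    return {
        'QB': (rules.starting_qbs + rules.starting_superflex) * n,
        'RB': (rules.starting_rbs + rules.starting_superflex + rules.starting_flex) * n,
        'WR': (rules.starting_wrs + rules.starting_superflex + rules.starting_flex) * n,
        'TE': (rules.starting_tes + rules.starting_superflex + rules.starting_flex) * n,
        'K': rules.starting_ks * n,
        'D/ST': rules.starting_dst * n,
    }


counts = startable_counts(Rules())
print(counts)  # {'QB': 8, 'RB': 24, 'WR': 24, 'TE': 16, 'K': 8, 'D/ST': 8}
```

With these assumed settings the counts add up to 88 even though the league has only 72 starting slots (9 per team across 8 teams), because each flex slot is credited in full to RB, WR and TE. The counts are therefore upper bounds per position; with them, the TE replacement level sits at the 17th-best tight end.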

```python
# Iterate through the positions and calculate the VOR for each player at that position.
for position, draftable in positions.items():
    in_position = df['position'] == position
    # Rank the position by projected points; with 0-based indexing, row `draftable` is the
    # first player beyond the startable pool, i.e. the replacement player.
    replacement = df[in_position].sort_values(by='pts', ascending=False).iloc[draftable]
    df.loc[in_position, 'vor'] = df.loc[in_position, 'pts'] - replacement['pts']
```
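To see the replacement-level logic in isolation, here is a minimal pure-Python sketch for a single position. The function name and the four quarterbacks' point totals are made up for illustration; the sketch mirrors the pandas loop's choice of replacement player (0-based rank `draftable`) but is not part of `pyffl`.

```python
def value_over_replacement(points, draftable):
    """Map each player to his points minus the replacement player's points.

    `points` maps player name -> projected points for one position.
    `draftable` is the number of startable players at that position, so the
    replacement player sits at 0-based rank `draftable` in descending order.
    """
    ranked = sorted(points.items(), key=lambda item: item[1], reverse=True)
    replacement_pts = ranked[draftable][1]
    return {name: round(pts - replacement_pts, 2) for name, pts in ranked}


# Hypothetical projections for a position with 2 startable players.
qbs = {'QB A': 300.0, 'QB B': 280.0, 'QB C': 265.0, 'QB D': 250.0}
print(value_over_replacement(qbs, draftable=2))
# {'QB A': 35.0, 'QB B': 15.0, 'QB C': 0.0, 'QB D': -15.0}
```

The replacement player always scores a VOR of exactly zero, and everyone below him goes negative, which is what makes VOR comparable across positions.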

The following two tables list the top 16 players across all positions, one round of picks per table. They represent my rankings for the first two rounds of the 2016 draft.

```python
# First round (0-based round 0): the top `num_teams` players by VOR
draft_round = 0
start = draft_round * rules.num_teams
end = start + rules.num_teams
df[['name', 'position', 'pts', 'vor']].sort_values(by='vor', ascending=False).iloc[start:end, :]
```

| index | name | position | pts | vor |
|---:|:---|:---|---:|---:|
| 320 | Antonio Brown | WR | 319.87 | 162.19 |
| 322 | Julio Jones | WR | 303.08 | 145.4 |
| 123 | David Johnson | RB | 265.32 | 113.89 |
| 121 | Todd Gurley | RB | 263.7 | 112.27 |
| 120 | Adrian Peterson | RB | 260.37 | 108.94 |
| 124 | Devonta Freeman | RB | 256.95 | 105.52 |
| 520 | Rob Gronkowski | TE | 189.32 | 102.49 |
| 521 | Jordan Reed | TE | 183.53 | 96.7 |
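This table shows the kind of gaps VOR is meant to expose: Antonio Brown and Julio Jones each sit well clear of the next player, and then the values bunch up. One simple way to turn such gaps into tiers is to start a new tier whenever consecutive VOR values drop by more than a threshold. The helper below is a hypothetical sketch, and the 10-point threshold is an arbitrary assumption; the VOR values are taken from the table above.

```python
def split_into_tiers(players, min_gap):
    """Group (name, vor) pairs into tiers, starting a new tier whenever the
    drop from one player to the next exceeds `min_gap` points."""
    ranked = sorted(players, key=lambda p: p[1], reverse=True)
    tiers = []
    previous_vor = None
    for name, vor in ranked:
        if previous_vor is None or previous_vor - vor > min_gap:
            tiers.append([])
        tiers[-1].append(name)
        previous_vor = vor
    return tiers


# VOR values from the first-round table above.
round_one = [
    ('Antonio Brown', 162.19), ('Julio Jones', 145.4), ('David Johnson', 113.89),
    ('Todd Gurley', 112.27), ('Adrian Peterson', 108.94), ('Devonta Freeman', 105.52),
    ('Rob Gronkowski', 102.49), ('Jordan Reed', 96.7),
]
for tier_number, tier in enumerate(split_into_tiers(round_one, min_gap=10), start=1):
    print(tier_number, tier)
```

With a 10-point threshold this yields three tiers: Brown alone, Jones alone, and the remaining six players together. A smaller threshold splits the third tier further, so the threshold is a tuning choice rather than something the data dictates.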


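```python
# Second round (0-based round 1): the next `num_teams` players by VOR.
# Note: the table below was generated with `start = 1 + draft_round * rules.num_teams`,
# which skips the ninth-ranked player.
draft_round = 1
start = draft_round * rules.num_teams
end = start + rules.num_teams
df[['name', 'position', 'pts', 'vor']].sort_values(by='vor', ascending=False).iloc[start:end, :]
```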

| index | name | position | pts | vor |
|---:|:---|:---|---:|---:|
| 130 | LeSean McCoy | RB | 244.21 | 92.78 |
| 126 | Lamar Miller | RB | 244.13 | 92.7 |
| 122 | Ezekiel Elliott | RB | 242.77 | 91.34 |
| 125 | Le'Veon Bell | RB | 238.04 | 86.61 |
| 522 | Greg Olsen | TE | 172.55 | 85.72 |
| 128 | Mark Ingram | RB | 235.96 | 84.53 |
| 524 | Travis Kelce | TE | 169.39 | 82.56 |
| 323 | DeAndre Hopkins | WR | 237.03 | 79.35 |
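Each round's slice of the VOR ranking should begin exactly where the previous round's slice ended; otherwise a player falls between rounds. A small helper makes those boundaries explicit. `round_bounds` is a hypothetical name, not part of `pyffl`, and it assumes one pick per team per round.

```python
def round_bounds(draft_round, num_teams):
    """Return the [start, end) positions in the overall ranking covered by a
    0-based draft round, assuming one pick per team per round."""
    start = draft_round * num_teams
    return start, start + num_teams


# With 8 teams, consecutive rounds tile the ranking without gaps or overlap.
print([round_bounds(r, 8) for r in range(3)])  # [(0, 8), (8, 16), (16, 24)]
```

With the DataFrame above, `.iloc[slice(*round_bounds(1, rules.num_teams))]` on the VOR-sorted table would select the second round.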


## A quick interlude for analysis

We've come a long way in terms of data collection and massaging, so I think it's time we start looking at what it all means.

In the tables above I've listed the 16 most valuable players according to my model, and there are problems here. First of all, going into the draft I never had any intention of drafting either Rob Gronkowski or Jordan Reed, simply because I *knew* that the tight-end position wasn't valuable enough to justify a first- or second-round pick. The fact that two tight ends show up in the first eight, and four (adding Greg Olsen and Travis Kelce) in the top 16, suggests that either my assumptions or the model is wrong.
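A quick tally of the positions of the 16 players shown in the two projected tables makes the tight-end concern concrete: tight ends take more of those spots than wide receivers do, and no quarterback appears at all. The position lists below are copied from the tables; in pandas the same count would come from calling `value_counts()` on the sliced `position` column.

```python
from collections import Counter

# Positions of the players in the two projected tables above, in table order.
round_one_positions = ['WR', 'WR', 'RB', 'RB', 'RB', 'RB', 'TE', 'TE']
round_two_positions = ['RB', 'RB', 'RB', 'RB', 'TE', 'RB', 'TE', 'WR']

print(Counter(round_one_positions + round_two_positions).most_common())
# [('RB', 9), ('TE', 4), ('WR', 3)]
```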

## Comparing projected rankings to actual rankings

```python
# Scrape the actual 2016 results and score them with the same league rules
actuals = pyffl.scrape_actuals(2016)
actuals = pyffl.calculate_points(actuals, rules)
```

```python
dfa = pd.DataFrame(actuals, columns=all_keys)
dfa['vor'] = np.nan
```

```python
# Only the skill positions are evaluated for the actual results.
positions = {
    'QB': (rules.starting_qbs + rules.starting_superflex) * rules.num_teams,
    'RB': (rules.starting_rbs + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
    'WR': (rules.starting_wrs + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
    'TE': (rules.starting_tes + rules.starting_superflex + rules.starting_flex) * rules.num_teams,
}
# Iterate through the positions and calculate the VOR for each player at that position.
for position, draftable in positions.items():
    in_position = dfa['position'] == position
    # Row `draftable` (0-based) is the first player beyond the startable pool.
    replacement = dfa[in_position].sort_values(by='pts', ascending=False).iloc[draftable]
    dfa.loc[in_position, 'vor'] = dfa.loc[in_position, 'pts'] - replacement['pts']
```

```python
# First round (0-based round 0): the top `num_teams` players by actual VOR
draft_round = 0
start = draft_round * rules.num_teams
end = start + rules.num_teams
dfa[['name', 'position', 'pts', 'vor']].sort_values(by='vor', ascending=False).iloc[start:end, :]
```

| index | name | position | pts | vor |
|---:|:---|:---|---:|---:|
| 267 | David Johnson | RB | 268.7 | 173.8 |
| 223 | Ezekiel Elliott | RB | 240.3 | 145.4 |
| 183 | DeMarco Murray | RB | 232.9 | 138.0 |
| 114 | Melvin Gordon | RB | 218.5 | 123.6 |
| 422 | Le'Veon Bell | RB | 188.7 | 93.8 |
| 53 | LeSean McCoy | RB | 182.5 | 87.6 |
| 92 | Aaron Rodgers | QB | 263.46 | 87.46 |
| 123 | Drew Brees | QB | 261.34 | 85.34 |


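```python
# Second round (0-based round 1): the next `num_teams` players by actual VOR.
# Note: the table below was generated with `start = 1 + draft_round * rules.num_teams`,
# which skips the ninth-ranked player.
draft_round = 1
start = draft_round * rules.num_teams
end = start + rules.num_teams
dfa[['name', 'position', 'pts', 'vor']].sort_values(by='vor', ascending=False).iloc[start:end, :]
```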

| index | name | position | pts | vor |
|---:|:---|:---|---:|---:|
| 280 | LeGarrette Blount | RB | 168.0 | 73.1 |
| 181 | Marcus Mariota | QB | 247.72 | 71.72 |
| 146 | Matt Forte | RB | 164.2 | 69.3 |
| 19 | Devonta Freeman | RB | 163.0 | 68.1 |
| 302 | Antonio Brown | WR | 183.7 | 68.0 |
| 18 | Matt Ryan | QB | 241.04 | 65.04 |
| 109 | Travis Kelce | TE | 124.3 | 64.8 |
| 24 | Julio Jones | WR | 179.5 | 63.8 |
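The projected and actual tables can be compared by eye, but a single number can summarize how well the projected order matched the actual order. A common choice is Spearman's rank correlation. The sketch below is a minimal pure-Python version that assumes no tied point totals; the players and points are made up for illustration rather than taken from the 2016 data.

```python
def ranks(points):
    """Map each name to its 1-based rank by descending points."""
    ordered = sorted(points, key=points.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def spearman_rho(projected, actual):
    """Spearman rank correlation between two name -> points dicts.

    Only names present in both dicts are compared. Ties are assumed not to
    occur, since the simple formula below is only exact without ties.
    """
    common = projected.keys() & actual.keys()
    n = len(common)
    if n < 2:
        raise ValueError("need at least two players in common")
    projected_rank = ranks({name: projected[name] for name in common})
    actual_rank = ranks({name: actual[name] for name in common})
    d_squared = sum((projected_rank[name] - actual_rank[name]) ** 2 for name in common)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical projected and actual points for four players.
projected = {'Player A': 300.0, 'Player B': 280.0, 'Player C': 260.0, 'Player D': 240.0}
actual = {'Player A': 250.0, 'Player B': 270.0, 'Player C': 200.0, 'Player D': 230.0}
print(round(spearman_rho(projected, actual), 3))  # 0.6
```

A value of 1 means the projections ranked players in exactly the actual order, and values near 0 mean the projected order carried little information. To apply it here, the inputs could be built from the DataFrames, e.g. `dict(zip(df['name'], df['pts']))` and `dict(zip(dfa['name'], dfa['pts']))`, provided player names match exactly between the two scrapes.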
