# Jupyter Notebook

In [75]:
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

In [76]:
%load_ext sql

%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

%sql duckdb:///:memory:

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


# Summary
ADP, or average draft position, is a metric that measures the average place of draft pick of the players out of both mock and real draft data. For instance, if a player has an ADP of 7.0, it means that on average, that player is the 7th pick out of the draft. ADP can be valuable in that it provides insights on how other people value the respective players. However, since ADP is a predictive metric that is published during the pre-season before games begin, there will most likely be “breakout players”, or players that will outperform their pre-season ranking. We aim to identify these breakout players, players with more than 150 ADP and below post-season 50th rank, and find if there are common characteristics among these breakout players.

# Data Limitation

There are confounding variables that can affect the statistic such as injury. The NFL has only incorporated advanced statistical measures, such as air yards (yards ran after catch), in 2012, so conducting a time-series data analysis is limited.  In addition, it is harder to analyze breakout rookie players’ characteristics as their data is limited to one particular year. Lastly, football, like most sports, is a team sport. The players that interact well with each other will have the greatest synergy, having a better performance throughout the season compared to players who do not interact well with the rest of the team. For instance, wide receivers will likely to perform better if the quarterback throws them the necessary passes to perform well.

Futhermore, the nature of breakout players is that there are few and far between, meaning that our 

# Wide Receiver Statistics Data Sets

Importing datasets with advanced wide receiver statistics (rushing yards, etc.) for all players between the years 2013 and 2022. This is public knowledge and is free to use. Advanced wide receiver statistics were not recorded prior to 2013 which is why are data is limited to this time. Adding a year column to all datasets with the respective year in order to compare players that appear multiple times throughout the years.

In [77]:
#wide receiver stats
wr_stats_2021 = pd.read_csv('data/post-season data/FantasyPros_Fantasy_Football_Advanced_Stats_Report_WR_2021.csv')
wr_stats_2021['Year'] = '2021'
wr_stats_2020 = pd.read_csv('data/post-season data/FantasyPros_Fantasy_Football_Advanced_Stats_Report_WR_2020.csv')
wr_stats_2020['Year'] = '2020'
wr_stats_2019 = pd.read_csv('data/post-season data/FantasyPros_Fantasy_Football_Advanced_Stats_Report_WR_2019.csv')
wr_stats_2021['Year'] = '2019'
wr_stats_2018 = pd.read_csv('data/post-season data/FantasyPros_Fantasy_Football_Advanced_Stats_Report_WR_2018.csv')
wr_stats_2018['Year'] = '2018'
wr_stats_2017 = pd.read_csv('data/post-season data/FantasyPros_Fantasy_Football_Advanced_Stats_Report_WR_2017.csv')
wr_stats_2017['Year'] = '2017'
wr_stats_2016 = pd.read_csv('data/post-season data/FantasyPros_Fantasy_Football_Advanced_Stats_Report_WR_2016.csv')
wr_stats_2016['Year'] = '2016'
wr_stats_2015 = pd.read_csv('data/post-season data/FantasyPros_Fantasy_Football_Advanced_Stats_Report_WR_2015.csv')
wr_stats_2015['Year'] = '2015'
wr_stats_2014 = pd.read_csv('data/post-season data/FantasyPros_Fantasy_Football_Advanced_Stats_Report_WR_2014.csv')
wr_stats_2014['Year'] = '2014'
wr_stats_2013 = pd.read_csv('data/post-season data/FantasyPros_Fantasy_Football_Advanced_Stats_Report_WR_2013.csv')
wr_stats_2013['Year'] = '2013'

Creating a large dataset with all the wide reciever statistics from 2013 to 2021 using the previously imported data sets. Selected for specific statistics we could analyze to provide insights into players- AIR (Air yards: receiving yards not including yards after catch), AIR/R (air yards per reception), YDS (total reception yards over entire season), Y/R (yards per reception), and number of plays over a certain number of yards (10, 20, 30, 40 , 50)

In [78]:
#wr large dataset
wr_table = pd.concat([wr_stats_2021,wr_stats_2020,wr_stats_2019,wr_stats_2018,wr_stats_2017,wr_stats_2016,wr_stats_2015,wr_stats_2014,wr_stats_2013])
wr_table = wr_table[["Player", "Year", "Y/R", "YBC/R", "AIR/R", "YAC/R", "YACON/R", "10+ YDS", "20+ YDS", "30+ YDS", "40+ YDS", "50+ YDS"]]
wr_table.head()

Unnamed: 0,Player,Year,Y/R,YBC/R,AIR/R,YAC/R,YACON/R,10+ YDS,20+ YDS,30+ YDS,40+ YDS,50+ YDS
0,Cooper Kupp (LAR),2019,13.4,7.6,7.6,5.8,1.8,66.0,30.0,15.0,9.0,3.0
1,Deebo Samuel (SF),2019,18.9,8.0,8.0,11.0,3.8,52.0,24.0,13.0,10.0,6.0
2,Ja'Marr Chase (CIN),2019,18.0,9.9,9.9,8.0,3.1,51.0,22.0,13.0,8.0,6.0
3,Justin Jefferson (MIN),2019,14.9,10.5,10.5,4.4,1.4,67.0,27.0,11.0,5.0,2.0
4,Davante Adams (LV),2019,12.6,7.8,7.8,4.8,1.2,66.0,19.0,12.0,4.0,2.0


# ADP Data Sets

Importing datasets with ADP (average draft position) values for all players across the years from 2013 to 2021. This is public knowledge and free to use. Adding a year column to all datasets with the respective year in order for comparison between players when they show up multiple times throughout the years. The purpose of this dataset is to find the ADP which is the metric we are using to determine whether a player was expected to be "good" or "bad" before the season began. The ADP will be compared with post-season ranking in order to determine whether a player falls under the parameters of a "breakout player". 

In [79]:
#overall adp files

overall_adp_2021 = pd.read_csv('data/pre-season data/FantasyPros_2021_Overall_ADP_Rankings.csv')
overall_adp_2021['Year'] = '2021'

overall_adp_2020 = pd.read_csv('data/pre-season data/FantasyPros_2020_Overall_ADP_Rankings.csv')
overall_adp_2020['Year'] = '2020'

overall_adp_2019 = pd.read_csv('data/pre-season data/FantasyPros_2019_Overall_ADP_Rankings.csv')
overall_adp_2019['Year'] = '2019'

overall_adp_2018 = pd.read_csv('data/pre-season data/FantasyPros_2018_Overall_ADP_Rankings.csv')
overall_adp_2018['Year'] = '2018'

overall_adp_2017 = pd.read_csv('data/pre-season data/FantasyPros_2017_Overall_ADP_Rankings.csv')
overall_adp_2017['Year'] = '2017'

overall_adp_2016 = pd.read_csv('data/pre-season data/FantasyPros_2016_Overall_ADP_Rankings.csv')
overall_adp_2016['Year'] = '2016'

overall_adp_2015 = pd.read_csv('data/pre-season data/FantasyPros_2015_Overall_ADP_Rankings.csv')
overall_adp_2015['Year'] = '2015'

overall_adp_2014 = pd.read_csv('data/pre-season data/FantasyPros_2014_Overall_ADP_Rankings.csv')
overall_adp_2014['Year'] = '2014'

overall_adp_2013 = pd.read_csv('data/pre-season data/FantasyPros_2013_Overall_ADP_Rankings.csv')
overall_adp_2013['Year'] = '2013'

# Season Ranking Data Sets

Importing datasets with the End of Season ranking for all players across the years from 2013 to 2021. Adding a year column to all datasets with the respective year in order for comparison when a player shows up multiple times between the years.

In [80]:
#end of season overall rankings 
overall_rankings_2021 = pd.read_csv('data/FantasyPros_Fantasy_Football_Points_PPR_2021.csv')
overall_rankings_2021['Year'] = '2021'

overall_rankings_2020 = pd.read_csv('data/FantasyPros_Fantasy_Football_Points_PPR_2020.csv')
overall_rankings_2020['Year'] = '2020'

overall_rankings_2019 = pd.read_csv('data/FantasyPros_Fantasy_Football_Points_PPR_2019.csv')
overall_rankings_2019['Year'] = '2019'

overall_rankings_2018 = pd.read_csv('data/FantasyPros_Fantasy_Football_Points_PPR_2018.csv')
overall_rankings_2018['Year'] = '2018'

overall_rankings_2017 = pd.read_csv('data/FantasyPros_Fantasy_Football_Points_PPR_2017.csv')
overall_rankings_2017['Year'] = '2017'

overall_rankings_2016 = pd.read_csv('data/FantasyPros_Fantasy_Football_Points_PPR_2016.csv')
overall_rankings_2016['Year'] = '2016'

overall_rankings_2015 = pd.read_csv('data/FantasyPros_Fantasy_Football_Points_PPR_2015.csv')
overall_rankings_2015['Year'] = '2015'

overall_rankings_2014 = pd.read_csv('data/FantasyPros_Fantasy_Football_Points_PPR_2014.csv')
overall_rankings_2014['Year'] = '2014'

overall_rankings_2013 = pd.read_csv('data/FantasyPros_Fantasy_Football_Points_PPR_2013.csv')
overall_rankings_2013['Year'] = '2013'

Merged all the ADP data sets into one large dataset comprehensive of years from 2013 to 2021. Merged all the End of Season Overall rankings into one large dataframe comprehensive of years from 2013 to 2021. Merged the ADP and End of Season Overall rankings tables into one overall table.

In [81]:
#giant datatable final
adp_table = pd.concat([overall_adp_2021, overall_adp_2020, overall_adp_2019, overall_adp_2018, overall_adp_2017, overall_adp_2016, overall_adp_2015, overall_adp_2014, overall_adp_2013], axis=0)
overall_rankings_table = pd.concat([overall_rankings_2021, overall_rankings_2020, overall_rankings_2019, overall_rankings_2018, overall_rankings_2017, overall_rankings_2016, overall_rankings_2015, overall_rankings_2014, overall_rankings_2013], axis=0)
overall_merged = adp_table.merge(overall_rankings_table, how= 'outer', on = ['Player', 'Year'])

overall_merged

Unnamed: 0,Rank_x,Player,Team_x,Bye,POS,MFL,Fantrax,RTSports,FFC,Sleeper,AVG,Year,ESPN,Rank_y,Team_y,Position,Points,Games,Avg
0,1.0,Christian McCaffrey,SF,6,RB1,1.0,1.0,,1.0,1.0,1.0,2021,,168.0,CAR,RB,127.5,7.0,18.2
1,2.0,Dalvin Cook,MIN,7,RB2,2.0,2.0,,2.0,2.0,2.0,2021,,68.0,MIN,RB,206.3,13.0,15.9
2,3.0,Derrick Henry,TEN,13,RB3,3.0,3.0,,3.0,3.0,3.0,2021,,79.0,TEN,RB,193.3,8.0,24.2
3,4.0,Alvin Kamara,NO,6,RB4,4.0,4.0,,4.0,4.0,4.0,2021,,48.0,NO,RB,234.7,13.0,18.1
4,5.0,Ezekiel Elliott,DAL,7,RB5,5.0,5.0,,5.0,5.0,5.0,2021,,35.0,DAL,RB,252.1,17.0,14.8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6839,,Tyler Clutts,,,,,,,,,,2013,,651.0,Multi,RB,-0.6,7.0,-0.1
6840,,Chase Daniel,,,,,,,,,,2013,,653.0,KC,QB,-0.8,4.0,-0.2
6841,,Devon Wylie,,,,,,,,,,2013,,655.0,TEN,WR,-2.0,2.0,-1.0
6842,,Greg Jenkins,,,,,,,,,,2013,,655.0,LV,WR,-2.0,5.0,-0.4


# Cleaned WR Data Set

Cleaned data set by dropping unnecessary columns- RTSports, Sleeper, MFL, Fantrax, FFC, POS, Team_y -due to being unneccary for analysis or having too many NaN values to significantly analyze. Sorted data for all those in a Wide Receiver position and dropped all players who had NaN values for their ADP, as this signifies they didn't have any pre-season predictions and thus cannot be analyzed.

In [82]:
#cleaned overall_merged_data df
# dropped 'RTSports' and 'Sleeper' column - Had a lot of NaN values 
# dropped 'MFL','Fantrax', and 'FFC'
# dropped 'POS', Team_y - duplicated from merging 
# sort for wide receiver
# take out people with NaN ADP

def clean_data(data_file):
    data_file2 = data_file.drop(columns=['RTSports', 'Sleeper', 'MFL', 'Fantrax', 'FFC', 'POS','Team_y', 'ESPN', 'Bye', 'Team_x'])
    data_file3 = data_file2.rename(columns = {"Rank_x": "Preseason_rank","Team_x":"Team", "Rank_y": "Postseason_rank", "AVG": "ADP", "Avg": "PPG"})
    data_file4 = data_file3.loc[data_file3.Position == 'WR']
    data_file5 = data_file4.dropna(subset=['ADP'])
    return data_file5

overall_merged_data_clean = clean_data(overall_merged)
overall_merged_data_clean

Unnamed: 0,Preseason_rank,Player,ADP,Year,Postseason_rank,Position,Points,Games,PPG
7,8.0,Davante Adams,8.3,2021,8.0,WR,344.3,16.0,21.5
10,11.0,Tyreek Hill,11.0,2021,21.0,WR,296.5,17.0,17.4
15,16.0,Stefon Diggs,16.3,2021,23.0,WR,285.5,17.0,16.8
17,18.0,DK Metcalf,18.5,2021,39.0,WR,244.3,17.0,14.4
19,20.0,DeAndre Hopkins,19.8,2021,132.0,WR,147.2,10.0,14.7
...,...,...,...,...,...,...,...,...,...
3892,454.0,Stedman Bailey,458.0,2013,345.0,WR,41.6,15.0,2.8
3910,472.0,Eddie Royal,476.0,2013,119.0,WR,144.1,14.0,10.3
3914,476.0,Marvin Jones Jr.,480.0,2013,82.0,WR,171.6,15.0,11.4
3917,479.0,Jerricho Cotchery,483.0,2013,104.0,WR,154.2,15.0,10.3


# Finding Breakout Players

Filtered through the cleaned Wide Receiver dataset to find those who started the season with an Average Draft Pick greather than 150 yet ended the season with a rank less than 50, signifiying a large rise in their rank as a "Breakout" player.

In [83]:
# find breakout players by finding players that have ADP Greater Than 150 AND Post-Season Rk of Less Than 50
cond_ = (overall_merged_data_clean["ADP"] > 150) & (overall_merged_data_clean["Postseason_rank"] < 50)
breakouts = overall_merged_data_clean.loc[cond_,:]
breakouts

Unnamed: 0,Preseason_rank,Player,ADP,Year,Postseason_rank,Position,Points,Games,PPG
331,332.0,Hunter Renfrow,296.0,2021,30.0,WR,259.1,17.0,15.2
1214,207.0,DeVante Parker,172.5,2019,40.0,WR,246.2,16.0,15.4
1563,148.0,Tyler Lockett,158.4,2018,49.0,WR,222.4,16.0,13.9
2402,146.0,Michael Thomas,155.0,2016,29.0,WR,255.7,15.0,17.1
2447,191.0,Davante Adams,181.2,2016,34.0,WR,246.7,16.0,15.4
2890,156.0,Michael Crabtree,161.2,2015,45.0,WR,231.2,16.0,14.5
2903,168.0,Doug Baldwin,161.2,2015,27.0,WR,268.9,16.0,16.8
3406,239.0,Odell Beckham Jr.,192.0,2014,17.0,WR,295.0,12.0,24.6
3659,225.0,Julian Edelman,224.5,2013,36.0,WR,234.2,15.0,15.6


# Previous Year Statistics for Breakout Players

Made a nested list with values of certain players and the year they broke out. Filtered through the wide reciever statistics data set to find the previous season before they broke out (some were only rookies and did not have a pre-existing season before their breakout, while others had their breakout season in 2013 and as such their statistics from the previous year were not available).

In [84]:
#nestedlist

# breakouts.loc[:, 'Year'] = breakouts['Year'].astype(int)
nameyear = breakouts[['Player', 'Year']].values.tolist()
print(nameyear)

columnnames = wr_table.columns.values
temp_table = pd.DataFrame(columns=columnnames)
empty_table = pd.DataFrame(columns= columnnames)

for name, year in nameyear:
    newrow = wr_table.loc[(wr_table['Player'] == name) & (wr_table['Year'] == str(int(year) - 1))]
    empty_table = pd.concat([empty_table, newrow])

display(empty_table)

[['Hunter Renfrow', '2021'], ['DeVante Parker', '2019'], ['Tyler Lockett', '2018'], ['Michael Thomas', '2016'], ['Davante Adams', '2016'], ['Michael Crabtree', '2015'], ['Doug Baldwin', '2015'], ['Odell Beckham Jr.', '2014'], ['Julian Edelman', '2013']]


Unnamed: 0,Player,Year,Y/R,YBC/R,AIR/R,YAC/R,YACON/R,10+ YDS,20+ YDS,30+ YDS,40+ YDS,50+ YDS
104,DeVante Parker,2018,12.9,9.0,9.0,3.9,1.5,11.0,3.0,3.0,1.0,0.0
55,Tyler Lockett,2017,12.3,8.2,8.2,4.2,0.4,20.0,5.0,3.0,3.0,2.0
81,Davante Adams,2015,9.7,6.9,6.9,2.7,1.1,16.0,6.0,2.0,1.0,0.0
50,Michael Crabtree,2014,10.3,6.5,6.5,3.8,0.0,25.0,10.0,3.0,2.0,1.0
43,Doug Baldwin,2014,12.5,7.4,7.4,5.1,0.0,34.0,15.0,4.0,2.0,0.0


# Summary Statistics for Breakout Players

The summary statistics for the year before a breakout player's breakout for number of plays over a respective number of yards (10, 20, 30, 40, 50).

In [85]:
breakouts_summary_bigplays = empty_table[["10+ YDS", "20+ YDS", "30+ YDS", "40+ YDS", "50+ YDS"]].describe()
breakouts_summary_bigplays

Unnamed: 0,10+ YDS,20+ YDS,30+ YDS,40+ YDS,50+ YDS
count,5.0,5.0,5.0,5.0,5.0
mean,21.2,7.8,3.0,1.8,0.6
std,8.81476,4.764452,0.707107,0.83666,0.894427
min,11.0,3.0,2.0,1.0,0.0
25%,16.0,5.0,3.0,1.0,0.0
50%,20.0,6.0,3.0,2.0,0.0
75%,25.0,10.0,3.0,2.0,1.0
max,34.0,15.0,4.0,3.0,2.0


In [86]:
breakouts_summary_r = empty_table[["Y/R", "YBC/R", "AIR/R", "YAC/R", "YACON/R"]].describe()
breakouts_summary_r

Unnamed: 0,Y/R,YBC/R,AIR/R,YAC/R,YACON/R
count,5.0,5.0,5.0,5.0,5.0
mean,11.54,7.6,7.6,3.94,0.6
std,1.438054,1.007472,1.007472,0.861974,0.674537
min,9.7,6.5,6.5,2.7,0.0
25%,10.3,6.9,6.9,3.8,0.0
50%,12.3,7.4,7.4,3.9,0.4
75%,12.5,8.2,8.2,4.2,1.1
max,12.9,9.0,9.0,5.1,1.5


The summary statistics compiled for all wide receivers for number of plays over a respective number of yards (10, 20, 30, 40, 50).

In [87]:
overall_summary_bigplays = wr_table[["10+ YDS", "20+ YDS", "30+ YDS", "40+ YDS", "50+ YDS"]].describe()
overall_summary_bigplays

Unnamed: 0,10+ YDS,20+ YDS,30+ YDS,40+ YDS,50+ YDS
count,1665.0,1665.0,1665.0,1665.0,1665.0
mean,17.833033,6.013213,2.495495,1.268468,0.577778
std,15.82012,5.959182,2.837995,1.732223,0.990998
min,0.0,0.0,0.0,0.0,0.0
25%,5.0,1.0,0.0,0.0,0.0
50%,13.0,4.0,2.0,1.0,0.0
75%,28.0,10.0,4.0,2.0,1.0
max,79.0,31.0,16.0,13.0,8.0


In [91]:
overall_summary_r = wr_table[["Y/R", "YBC/R", "AIR/R", "YAC/R", "YACON/R"]].describe()
overall_summary_r

Unnamed: 0,Y/R,YBC/R,AIR/R,YAC/R,YACON/R
count,1665.0,1665.0,1665.0,1665.0,1665.0
mean,12.786126,8.484925,8.484925,4.27976,0.987748
std,4.209088,3.858855,3.858855,2.192345,1.071694
min,-3.0,-5.3,-5.3,0.0,-0.2
25%,10.4,6.3,6.3,3.0,0.0
50%,12.5,8.1,8.1,4.1,0.9
75%,14.7,10.5,10.5,5.2,1.4
max,47.0,37.0,37.0,25.5,15.5


# Data Citation
Fantasy football ADP datasets from 2012 to 2022. This is used to determine whether a player fell under the parameters of a breakout player. This is public data that is free to use provided by fantasypros.
- https://www.fantasypros.com/nfl/adp/overall.php?year=2012
- https://www.fantasypros.com/nfl/adp/overall.php?year=2013
- https://www.fantasypros.com/nfl/adp/overall.php?year=2014
- https://www.fantasypros.com/nfl/adp/overall.php?year=2015
- https://www.fantasypros.com/nfl/adp/overall.php?year=2016
- https://www.fantasypros.com/nfl/adp/overall.php?year=2017
- https://www.fantasypros.com/nfl/adp/overall.php?year=2018
- https://www.fantasypros.com/nfl/adp/overall.php?year=2019 
- https://www.fantasypros.com/nfl/adp/overall.php?year=2020 
- https://www.fantasypros.com/nfl/adp/overall.php?year=2021
- https://www.fantasypros.com/nfl/adp/overall.php?year=2022

Player Ranking Data from 2012 to 2022. This is used to determine the post-season ranking based on a ppr style fantasy football. This is public data that is free to use provided by fantasypros. 
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2012
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2013
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2014
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2015
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2016
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2017
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2018
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2019
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2020
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2021
- https://www.fantasypros.com/nfl/reports/leaders/ppr.php?year=2022

Wide receiver advanced statistics from 2013 to 2022. This is used to compare the wide receiver advanced statistics between breakout players. This is public data that is free to use provided by fantasypros. 
- https://www.fantasypros.com/nfl/advanced-stats-wr.php?year=2022
- https://www.fantasypros.com/nfl/advanced-stats-wr.php?year=2021
- https://www.fantasypros.com/nfl/advanced-stats-wr.php?year=2020
- https://www.fantasypros.com/nfl/advanced-stats-wr.php?year=2019
- https://www.fantasypros.com/nfl/advanced-stats-wr.php?year=2018
- https://www.fantasypros.com/nfl/advanced-stats-wr.php?year=2017
- https://www.fantasypros.com/nfl/advanced-stats-wr.php?year=2016
- https://www.fantasypros.com/nfl/advanced-stats-wr.php?year=2015
- https://www.fantasypros.com/nfl/advanced-stats-wr.php?year=2014
- https://www.fantasypros.com/nfl/advanced-stats-wr.php?year=2013
