# Table of Contents
* [EDA on Tight End Performance over the last 10 years](#intro)
* [Importing Libraries and Data](#import)
* [Exploratory Data Analysis](#eda)
* [Normality Test](#stats)
* [Splitting Data](#train_split)
* [Preprocessing Data](#preprocessing)
* [Setting Model](#set)
* [Training Model](#training)
* [Best Results](#compare_results)
* [Finalizing Workflow](#workflow)
* [Fitting the final model](#fit)
* [API (FastAPI)](#api)
* [Interface (Streamlit)](#interface)
* [Automation (Docker)](#auto)
* [Saving Files](#store)
* [Conclusion](#conclusion)

## EDA on Tight End Performance over the last 10 years <a class="anchor" id="intro"></a>
Reports suggest that I. Likely's usage will go up, so I am going to research the likelihood that two tight ends from one team can both be fantasy relevant.

## Importing Libraries and Data<a id="import"></a>

In [79]:
import nfl_data_py as nfl
import numpy as np
import pandas as pd

years = list(range(2014,2024))
data = nfl.import_seasonal_data(years, 'REG')

In [80]:
data.columns

Index(['player_id', 'season', 'season_type', 'completions', 'attempts',
       'passing_yards', 'passing_tds', 'interceptions', 'sacks', 'sack_yards',
       'sack_fumbles', 'sack_fumbles_lost', 'passing_air_yards',
       'passing_yards_after_catch', 'passing_first_downs', 'passing_epa',
       'passing_2pt_conversions', 'pacr', 'dakota', 'carries', 'rushing_yards',
       'rushing_tds', 'rushing_fumbles', 'rushing_fumbles_lost',
       'rushing_first_downs', 'rushing_epa', 'rushing_2pt_conversions',
       'receptions', 'targets', 'receiving_yards', 'receiving_tds',
       'receiving_fumbles', 'receiving_fumbles_lost', 'receiving_air_yards',
       'receiving_yards_after_catch', 'receiving_first_downs', 'receiving_epa',
       'receiving_2pt_conversions', 'racr', 'target_share', 'air_yards_share',
       'wopr_x', 'special_teams_tds', 'fantasy_points', 'fantasy_points_ppr',
       'games', 'tgt_sh', 'ay_sh', 'yac_sh', 'wopr_y', 'ry_sh', 'rtd_sh',
       'rfd_sh', 'rtdfd_sh', 'dom', '

In [81]:
ids = nfl.import_ids()
data_name = pd.merge(data,ids,how='left',left_on='player_id',right_on='gsis_id')
data_name.drop(['mfl_id',
       'sportradar_id', 'fantasypros_id', 'gsis_id', 'pff_id', 'sleeper_id',
       'nfl_id', 'espn_id', 'yahoo_id', 'fleaflicker_id', 'cbs_id',
       'rotowire_id', 'rotoworld_id', 'ktc_id', 'pfr_id', 'cfbref_id',
       'stats_id', 'stats_global_id', 'fantasy_data_id', 'swish_id',
       'merge_name', 'birthdate', 'age', 'draft_year',
       'draft_round', 'draft_pick', 'draft_ovr', 'twitter_username', 'height',
       'weight', 'college', 'db_season'],axis=1,inplace=True)
data_name.drop(['season_type'],axis=1,inplace=True)

te_data = data_name[data_name['position']=='TE']

In [82]:
te_data.columns

Index(['player_id', 'season', 'completions', 'attempts', 'passing_yards',
       'passing_tds', 'interceptions', 'sacks', 'sack_yards', 'sack_fumbles',
       'sack_fumbles_lost', 'passing_air_yards', 'passing_yards_after_catch',
       'passing_first_downs', 'passing_epa', 'passing_2pt_conversions', 'pacr',
       'dakota', 'carries', 'rushing_yards', 'rushing_tds', 'rushing_fumbles',
       'rushing_fumbles_lost', 'rushing_first_downs', 'rushing_epa',
       'rushing_2pt_conversions', 'receptions', 'targets', 'receiving_yards',
       'receiving_tds', 'receiving_fumbles', 'receiving_fumbles_lost',
       'receiving_air_yards', 'receiving_yards_after_catch',
       'receiving_first_downs', 'receiving_epa', 'receiving_2pt_conversions',
       'racr', 'target_share', 'air_yards_share', 'wopr_x',
       'special_teams_tds', 'fantasy_points', 'fantasy_points_ppr', 'games',
       'tgt_sh', 'ay_sh', 'yac_sh', 'wopr_y', 'ry_sh', 'rtd_sh', 'rfd_sh',
       'rtdfd_sh', 'dom', 'w8dom', 'yptmpa

In [83]:
#te_data.to_csv('./../data/raw/te_data_2014_2024.csv')

In [84]:
columns = ['completions', 'attempts', 'passing_yards',
       'passing_tds', 'interceptions', 'sacks', 'sack_yards', 'sack_fumbles',
       'sack_fumbles_lost', 'passing_air_yards', 'passing_yards_after_catch',
       'passing_first_downs', 'passing_epa', 'passing_2pt_conversions', 'pacr',
       'dakota']
# Reassign instead of dropping in place on a slice, which triggers SettingWithCopyWarning
te_data = te_data.drop(columns=columns)

In [85]:
te_data.groupby('season')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7682c55cfd00>

In [86]:
df = te_data.copy()  # explicit copy, so adding a column does not trigger SettingWithCopyWarning
df["rank"] = df.groupby("season")["fantasy_points_ppr"].rank(method="dense", ascending=False)


In [87]:
top_12_df = df[df['rank'] <= 12].copy()  # copy, because the team column is edited later
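
The top-12 filter depends on how `rank` treats ties. A minimal sketch on made-up toy data (the player labels and point totals are hypothetical, not taken from the dataset): with `method="dense"`, tied players share a rank and the next rank is not skipped, so a `rank <= N` filter can return more than N rows per season, whereas `method="first"` always returns exactly N.

```python
import pandas as pd

# Hypothetical toy season with a tie for second place.
toy = pd.DataFrame({
    "season": [2023, 2023, 2023, 2023],
    "player": ["A", "B", "C", "D"],
    "fantasy_points_ppr": [200.0, 150.0, 150.0, 120.0],
})

grouped = toy.groupby("season")["fantasy_points_ppr"]
rank_dense = grouped.rank(method="dense", ascending=False)  # 1, 2, 2, 3
rank_first = grouped.rank(method="first", ascending=False)  # 1, 2, 3, 4

toy["rank_dense"] = rank_dense
toy["rank_first"] = rank_first

top_3_dense = toy[toy["rank_dense"] <= 3]  # includes the player ranked after the tie
top_3_first = toy[toy["rank_first"] <= 3]  # ties broken by row order
print(len(top_3_dense), len(top_3_first))
```

In this notebook the `rank == 12` filter returns exactly one row per season (count of 10 over 10 seasons), so ties at the cutoff do not appear to be an issue here.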

In [88]:
#On average for the last 10 years, TE #12 scores 143.73 points
num_12_df = df[df['rank'] == 12]
num_12_df.fantasy_points_ppr.describe()

count     10.00000
mean     143.73000
std        9.23015
min      130.60000
25%      137.22500
50%      141.55000
75%      151.02500
max      160.10000
Name: fantasy_points_ppr, dtype: float64

In [89]:
top_12_df.groupby('season')['team'].nunique()

season
2014     3
2015     3
2016     4
2017     5
2018     7
2019     9
2020    11
2021    13
2022    11
2023    12
Name: team, dtype: int64

In [None]:
top_12_df.groupby('season').get_group(2014)

In [None]:
rosters.columns

In [35]:
import numpy as np
# Workaround: NumPy 2.0 removed the np.float_ alias; restore it for code that still references it
np.float_ = np.float64

rosters=nfl.import_seasonal_rosters(years)

In [78]:
rosters.loc[(rosters['player_id'] == "00-0021547") & (rosters['season'] == 2014)]['team'].values[0]

'SD'

In [92]:
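# Replace the 'FA' placeholder with the team from that player's roster entry for the season
for index, row in top_12_df.iterrows():
    if row['team'] == 'FA':
        match = (rosters['player_id'] == row.player_id) & (rosters['season'] == row.season)
        top_12_df.at[index, 'team'] = rosters.loc[match, 'team'].values[0]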
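
The row-by-row lookup above could also be written as a single merge on `player_id` and `season`. The sketch below is an assumption-laden alternative, not the notebook's method: `attach_season_team` and its `only_fa` flag are hypothetical names, and the toy frames use made-up IDs. With `only_fa=True` it mirrors the loop; with `only_fa=False` it takes the roster team for every row that has a roster match. The second mode may be worth considering because the `team` column merged from `nfl.import_ids()` does not look season-specific: the filtered table later in this notebook lists Zach Ertz under WAS for 2015, a season he played for Philadelphia.

```python
import pandas as pd

def attach_season_team(stats, rosters, only_fa=True):
    """Return a copy of `stats` with `team` filled from the matching (player_id, season) roster row."""
    season_team = (
        rosters[["player_id", "season", "team"]]
        # Assumption: if a player has several roster rows in a season, keep the first one.
        .drop_duplicates(subset=["player_id", "season"], keep="first")
        .rename(columns={"team": "season_team"})
    )
    merged = stats.merge(season_team, on=["player_id", "season"], how="left")
    replace = merged["season_team"].notna()
    if only_fa:
        # Mirror the loop: only overwrite the 'FA' placeholder.
        replace &= merged["team"].eq("FA")
    merged.loc[replace, "team"] = merged.loc[replace, "season_team"]
    return merged.drop(columns="season_team")

# Hypothetical toy data.
toy_stats = pd.DataFrame({
    "player_id": ["P1", "P2", "P3"],
    "season": [2015, 2015, 2015],
    "team": ["FA", "WAS", "XYZ"],
})
toy_rosters = pd.DataFrame({
    "player_id": ["P1", "P2"],
    "season": [2015, 2015],
    "team": ["SD", "PHI"],
})

fa_only = attach_season_team(toy_stats, toy_rosters)
all_rows = attach_season_team(toy_stats, toy_rosters, only_fa=False)
print(fa_only["team"].tolist(), all_rows["team"].tolist())
```

Rows without a roster match keep their original `team`, so unlike `.values[0]` in the loop this does not raise when a lookup comes back empty. Note that the merge returns a frame with a fresh index.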

In [None]:
for name_of_group, contents_of_group in top_12_df.groupby('season'):
    print(name_of_group)
    print(contents_of_group.sort_values(by=['team']))
    print(contents_of_group['team'].value_counts())
#top_12_df.groupby('season')['team'].nunique()
# Several seasons have fewer than 12 unique teams in the top 12, which shows how often two tight ends from one team are both fantasy relevant. Fantasy relevant here means finishing as a top-12 tight end for the year.

In [143]:
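# Collect the seasons in which at least one team has more than one top-12 tight end
seasons_with_pairs = []
for name_of_group, contents_of_group in top_12_df.groupby('season'):
    if contents_of_group['team'].duplicated().any():
        seasons_with_pairs.append(contents_of_group)
# Stack rows (axis=0) and keep the result; the original discarded the concat output
processed = pd.concat(seasons_with_pairs, axis=0) if seasons_with_pairs else pd.DataFrame()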

## Exploratory Data Analysis<a id="eda"></a>

In [148]:
processed = top_12_df.loc[top_12_df['season'].isin([2015, 2016, 2019, 2020, 2022])]

In [165]:
# Step 1: Count how many top-12 tight ends each team has in each season
value_counts = processed.groupby('season')['team'].value_counts().reset_index(name='count')

# Step 2: Keep only the (season, team) pairs with exactly two top-12 tight ends
values_with_exactly_two = value_counts[value_counts['count'] == 2]

# Step 3: Merge back onto the original DataFrame to keep only those players' rows
filtered_df = processed.merge(values_with_exactly_two[['season', 'team']], on=['season', 'team'])

# Display the filtered DataFrame
print("\nFiltered DataFrame:")
print(filtered_df.sort_values(by='season'))


Filtered DataFrame:
     player_id  season  carries  rushing_yards  rushing_tds  rushing_fumbles  \
1   00-0030061    2015        0            0.0            0              0.0   
3   00-0030472    2015        0            0.0            0              0.0   
0   00-0025418    2016        0            0.0            0              0.0   
2   00-0030061    2016        0            0.0            0              0.0   
4   00-0030472    2016        0            0.0            0              0.0   
5   00-0031359    2016        0            0.0            0              0.0   
6   00-0032392    2019        0            0.0            0              0.0   
7   00-0033090    2019        0            0.0            0              0.0   
8   00-0033757    2020        0            0.0            0              0.0   
10  00-0035229    2020        1            0.0            0              1.0   
9   00-0033895    2022        1            0.0            0              0.0   
11  00-0036290    2
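
The count, filter, and merge steps above can also be collapsed into one `groupby(...).transform("size")` call. This is a sketch on hypothetical toy data (team and player labels are made up), offered as an alternative rather than as the notebook's own method.

```python
import pandas as pd

# Hypothetical toy version of the top-12 table.
toy_top = pd.DataFrame({
    "season": [2015, 2015, 2015, 2016, 2016],
    "team": ["AAA", "AAA", "BBB", "AAA", "CCC"],
    "name": ["p1", "p2", "p3", "p1", "p4"],
})

# For every row, the number of top-12 tight ends on the same team in the same season.
team_counts = toy_top.groupby(["season", "team"])["name"].transform("size")

# Keep only players whose team has exactly two top-12 tight ends that season.
pairs = toy_top[team_counts == 2]
print(pairs)
```

Because `transform` returns a result aligned to the original rows, no merge back is needed and the original index is preserved.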

In [167]:
filtered_df = filtered_df[filtered_df.team != 'CAR']

In [168]:
filtered_df.to_csv('./../data/processed/processed.csv', index=False)

In [169]:
filtered_df

| | player_id | season | carries | rushing_yards | rushing_tds | rushing_fumbles | rushing_fumbles_lost | rushing_first_downs | rushing_epa | rushing_2pt_conversions | ... | rfd_sh | rtdfd_sh | dom | w8dom | yptmpa | ppr_sh | name | position | team | rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 00-0030061 | 2015 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | ... | 0.211957 | 0.2 | 0.150589 | 0.183799 | 1.431208 | 0.122396 | Zach Ertz | TE | WAS | 9.0 |
| 2 | 00-0030061 | 2016 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | ... | 0.248521 | 0.252747 | 0.277221 | 0.258938 | 1.5 | 0.160802 | Zach Ertz | TE | WAS | 7.0 |
| 3 | 00-0030472 | 2015 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | ... | 0.295082 | 0.308057 | 0.319141 | 0.274911 | 1.987474 | 0.181002 | Jordan Reed | TE | WAS | 3.0 |
| 4 | 00-0030472 | 2016 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | ... | 0.225275 | 0.232673 | 0.239253 | 0.202805 | 1.465812 | 0.14072 | Jordan Reed | TE | WAS | 10.0 |
| 6 | 00-0032392 | 2019 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | ... | 0.194313 | 0.199153 | 0.216046 | 0.201673 | 1.438757 | 0.143094 | Austin Hooper | TE | NEP | 6.0 |
| 7 | 00-0033090 | 2019 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | ... | 0.225 | 0.227778 | 0.217904 | 0.198646 | 1.495413 | 0.135662 | Hunter Henry | TE | NEP | 9.0 |
| 8 | 00-0033757 | 2020 | 0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | ... | 0.165829 | 0.18107 | 0.19946 | 0.169136 | 1.215768 | 0.110175 | Robert Tonyan | TE | MIN | 4.0 |
| 9 | 00-0033895 | 2022 | 1 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | -0.566809 | 0 | ... | 0.138095 | 0.141026 | 0.14823 | 0.137169 | 0.853846 | 0.095585 | Gerald Everett | TE | CHI | 12.0 |
| 10 | 00-0035229 | 2020 | 1 | 0.0 | 0 | 1.0 | 1.0 | 0.0 | -4.992892 | 0 | ... | 0.181818 | 0.186235 | 0.193326 | 0.175989 | 1.242268 | 0.11785 | T.J. Hockenson | TE | MIN | 5.0 |
| 11 | 00-0036290 | 2022 | 2 | 9.0 | 0 | 0.0 | 0.0 | 1.0 | 1.138636 | 0 | ... | 0.247706 | 0.265625 | 0.288906 | 0.241198 | 1.442971 | 0.128745 | Cole Kmet | TE | CHI | 8.0 |


In [170]:
top_12_df.to_csv('./../data/processed/top_12_2014_2024.csv',index=False)

In [172]:
filtered_df.groupby('season')['receiving_tds'].describe()

| season | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 2015 | 2.0 | 6.5 | 6.363961 | 2.0 | 4.25 | 6.5 | 8.75 | 11.0 |
| 2016 | 2.0 | 5.0 | 1.414214 | 4.0 | 4.5 | 5.0 | 5.5 | 6.0 |
| 2019 | 2.0 | 5.5 | 0.707107 | 5.0 | 5.25 | 5.5 | 5.75 | 6.0 |
| 2020 | 2.0 | 8.5 | 3.535534 | 6.0 | 7.25 | 8.5 | 9.75 | 11.0 |
| 2022 | 2.0 | 5.5 | 2.12132 | 4.0 | 4.75 | 5.5 | 6.25 | 7.0 |


### Findings
The data shows that in 5 of the last 10 seasons, two tight ends from the same team both finished in the top 12. The likelihood is higher than expected.

### Next Steps
Analyze the top-12 tight ends for trends, including how the top-12 cutoff changes from season to season. Then research how big a jump over last year's production I. Likely will have to make.

Also clean the columns of the datasets: the rushing columns and most of the advanced stats are not needed.
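
The production jump mentioned in the next steps is simple arithmetic once a cutoff is chosen. A minimal sketch, assuming the 10-season TE12 mean of 143.73 PPR points from the `describe()` output above as the target; `required_jump` is a hypothetical helper, and the previous-season total of 100.0 is a placeholder, not I. Likely's actual production.

```python
def required_jump(previous_points, cutoff_points):
    """Return the absolute and percentage increase needed to reach the cutoff."""
    if previous_points <= 0:
        raise ValueError("previous_points must be positive to compute a percentage")
    absolute = cutoff_points - previous_points
    percent = 100.0 * absolute / previous_points
    return absolute, percent

TE12_MEAN_PPR = 143.73         # mean TE12 score, taken from the describe() output
hypothetical_previous = 100.0  # placeholder value, not real player data

abs_jump, pct_jump = required_jump(hypothetical_previous, TE12_MEAN_PPR)
print(f"Needs {abs_jump:.2f} more PPR points ({pct_jump:.1f}% increase)")
```

Using the mean as the target is one choice among several; the 75th percentile or maximum from the same output would give a more conservative bar.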

## Normality Test<a id="stats"></a>

## Splitting Data<a id="train_split"></a>

## Preprocessing Data<a id="preprocessing"></a>

## Setting Model<a id="set"></a>

## Training Model<a id="training"></a>

## Best Results<a id="compare_results"></a>

## Finalizing Workflow<a id="workflow"></a>

## Fitting the final model<a id="fit"></a>

## API (FastAPI)<a id="api"></a>

## Interface (Streamlit)<a id="interface"></a>

## Automation (Docker)<a id="auto"></a>

## Saving Files<a id="store"></a>

## Conclusion<a id="conclusion"></a>