# Part 4

## Question 1: Is there a relationship between quarterback fantasy points scored and team defense fantasy points scored on the same team?
In other words, does a quarterback who scores more fantasy points (and presumably more real NFL points for his team) have any effect on how their team's defense performs?
- **What is your hypothesis?**: I hypothesize that quarterback fantasy football scoring will have no effect on team defense fantasy point scoring. The quarterback plays on the offensive side of the ball, and so how the QB performs should not statistically influence how his defense performs, since they are not on the football field at the same time (teams take turns switching off playing offense/defense)
- **How does this relate to the researcher's question?**: This question is related to the researcher's question because they were trying to use a model to predict the number of fantasy points a quarterback/team defense might score. If quarterback fantasy point scoring was a strong predictor of team defensive scoring (or vice versa), the researchers could've added this to their model.
- **How does this relate to Part 1?**: This question is a more specific form of the research question I asked in part 1. I'm examining this question for quarterbacks/team defenses from the same team.
- **Why use this data?**: This data provides information about the team QBs/DSTs belonged to, as well as how many fantasy points they scored for each week of the NFL season.
- **Which features will you be using?**: I will be using `fantasy points scored`, `team`, and `player` to complete this assignment
- **How many observations are there for each feature?**: There are the same number of observations for each feature, which is 15 * the number of players at that position.
    - For QBs: 585
    - For DSTs: 480
    - Since the reason there are more quarterbacks is because some teams played multiple quarterbacks, I will simplify this analysis by combining the QBs that played for an individual team and treating them as if they were 1 QB

In [15]:
# Code from earlier sections to import and clean the dataset

# import packages and dataset
import pandas as pd

# These were imported for each fantasy football position type (QB or DST)
QB = pd.read_excel('../Data/Accuscore Evaluation.xlsx', sheet_name='QB Projections')
DST = pd.read_excel('../Data/Accuscore Evaluation.xlsx', sheet_name= 'DST Projections')

# Remove duplicate columns (PLAYERID/ESPNID is simply an alias for PLAYER)
QB = QB.drop(['PLAYERID', 1], axis=1)
# Since defenses are played per team, as long as we have the TEAM data, we know what the name of the player is
DST = DST.drop(['ESPNID', 'PLAYER'], axis=1)
# Rename ORDER column to WEEK for clarity (since order describes the week of the NFL season)
QB = QB.rename(columns={'ORDER':'WEEK'})
DST = DST.rename(columns={'ORDER':'WEEK'})


# Remove Ben Roethelisberger's and Andrew Luck's bye weeks (Week 4), since the authors forgot to
ben_bye = ((QB['WEEK'] == 4) & (QB['PLAYER'] == 'Ben Roethlisberger')).astype(int).idxmax()
luck_bye = ((QB['WEEK'] == 4) & (QB['PLAYER'] == 'Andrew Luck')).astype(int).idxmax()
QB = QB.drop([ben_bye, luck_bye], axis=0)
# Get a list of all of the NFL teams as a set
teams = set(QB['TEAM'])
# Get a list of the weeks of the NFL season as a set (we'll need these to figure out how many points the QBs for a team scored in a given week)
weeks = set(QB['WEEK'])
display(teams, weeks)
# Additional cleaning to be able to make this plot
import seaborn as sns
# Create a df where rows will be team-week pairs, and columns will be QB and DST scoring
team_scoring_df = pd.DataFrame()
# For each NFL team
for team in teams:
    # For each week of the NFL season from 1-16
    for week in weeks:
        # Figure out how many fantasy points that team's quarterback(s) scored that week, and add it to the df with the row as a team-week pair, and the column as 'QB Scoring'
        team_scoring_df.loc[team + '-' + str(week), 'QB Scoring'] = QB.loc[(QB['WEEK'] == week) & (QB['TEAM'] == team), 'Actuals'].sum()
        # Add however many points the team's defense unit scored to the df, and add it to the df with the row as a team-week pair, and the column as 'DST Scoring'
        team_scoring_df.loc[team + '-' + str(week), 'DST Scoring'] = DST.loc[(DST['WEEK'] == week) & (DST['TEAM'] == team), 'Actuals'].sum()
        # Add the week of the NFL season so we can use it in the weeks_and_dst_scoring model
        team_scoring_df.loc[team + '-' + str(week), 'Week'] = week

{'ARI',
 'ATL',
 'BAL',
 'BUF',
 'CAR',
 'CHI',
 'CIN',
 'CLE',
 'DAL',
 'DEN',
 'DET',
 'GB',
 'HOU',
 'IND',
 'JAC',
 'KC',
 'MIA',
 'MIN',
 'NE',
 'NO',
 'NYG',
 'NYJ',
 'OAK',
 'PHI',
 'PIT',
 'SD',
 'SEA',
 'SF',
 'STL',
 'TB',
 'TEN',
 'WAS'}

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}

In [51]:
# Model 1 - Same team, QB-defense pairs
import statsmodels.api as sm
# Add a constant to the data (need mx + b, not just mx)
to_model = sm.add_constant(team_scoring_df)
# Create input data (X), which is the constant and QB scoring
X = to_model.loc[:, ['const', 'DST Scoring']]
# Create output data (y), which is the DST scoring
y = to_model['QB Scoring']
# Fit a linear model to the DST vs QB data
qb_scoring_model = sm.OLS(y,X).fit()
# Print out a summary of the statistics of the model
qb_scoring_model.summary()

0,1,2,3
Dep. Variable:,QB Scoring,R-squared:,0.02
Model:,OLS,Adj. R-squared:,0.018
Method:,Least Squares,F-statistic:,10.38
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,0.00136
Time:,22:37:06,Log-Likelihood:,-1831.4
No. Observations:,512,AIC:,3667.0
Df Residuals:,510,BIC:,3675.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,12.9742,0.560,23.161,0.000,11.874,14.075
DST Scoring,0.1332,0.041,3.222,0.001,0.052,0.214

0,1,2,3
Omnibus:,6.533,Durbin-Watson:,1.725
Prob(Omnibus):,0.038,Jarque-Bera (JB):,6.073
Skew:,0.215,Prob(JB):,0.048
Kurtosis:,2.685,Cond. No.,19.9


### Model interpretation
- **What can we conclude about the parameters of the model?**: DST fantasy point scoring is positively correlated to QB fantasy point scoring.
- **How certain/uncertain are we about them?**: Based on the probability from the t-test (P > |t|), we are 99.9% confident that there is a significant, linear relationship between DST scoring and QB scoring.
- **What conclusions can we make about the research question based on the model results?**: DST scoring is predictive of QB scoring, and should be added to the authors' model to help improve their projections. One way this could be done is by keeping track of how many points a team defense scores, on average, and using this linear model to add a flat amount of points to the QB projection for that team.
- **Explain the model in everyday language**: If the DST scored 0 points, we can expect the QB to have scored, on average, 12.97 fantasy points (the equivalent of 325 passing yards and 0 touchdowns, or 175 yards and 1 touchdown). For every fantasy point that the team's DST scores, the QB's fantasy points scored increases by 0.133

In [52]:
# Model 2 - Is there a relationship between week of the NFL season and QB scoring?
import statsmodels.api as sm
# Add a constant to the data (need mx + b, not just mx)
to_model = sm.add_constant(QB)
# Create input data (X), which is the constant and QB scoring
X = to_model.loc[:, ['const', 'WEEK']]
# Create output data (y), which is the DST scoring
y = to_model['Actuals']
# Fit a linear model to the DST vs QB data
weeks_model = sm.OLS(y,X).fit()
# Print out a summary of the statistics of the model
weeks_model.summary()

0,1,2,3
Dep. Variable:,Actuals,R-squared:,0.004
Model:,OLS,Adj. R-squared:,0.002
Method:,Least Squares,F-statistic:,2.128
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,0.145
Time:,22:37:10,Log-Likelihood:,-2133.6
No. Observations:,585,AIC:,4271.0
Df Residuals:,583,BIC:,4280.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,13.5214,0.794,17.021,0.000,11.961,15.082
WEEK,-0.1186,0.081,-1.459,0.145,-0.278,0.041

0,1,2,3
Omnibus:,25.271,Durbin-Watson:,2.162
Prob(Omnibus):,0.0,Jarque-Bera (JB):,16.657
Skew:,0.285,Prob(JB):,0.000242
Kurtosis:,2.402,Cond. No.,20.3


### Model interpretation
- **What can we conclude about the parameters of the model?**: QB fantasy point scoring is not related to week of the NFL season.
- **How certain/uncertain are we about them?**: Based on the probability from the t-test (P > |t|), we are not confident that there is a significant, linear relationship between QB scoring and week of the NFL season.
- **What conclusions can we make about the research question based on the model results?**: Week of the NFL season is not predictive of QB scoring, and should be kept out of the authors' model.
- **Explain the model in everyday language**: During week 1 (the first week for which we would have data), QBs are predicted to score 13.4 fantasy points (335 yards, 0 TDs or 185 yards, 1 TD). For each subsequent week of the NFL season QBs will score, on average, 0.12 fantasy points less.

In [53]:
qb_scoring_model.summary()

0,1,2,3
Dep. Variable:,QB Scoring,R-squared:,0.02
Model:,OLS,Adj. R-squared:,0.018
Method:,Least Squares,F-statistic:,10.38
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,0.00136
Time:,22:37:16,Log-Likelihood:,-1831.4
No. Observations:,512,AIC:,3667.0
Df Residuals:,510,BIC:,3675.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,12.9742,0.560,23.161,0.000,11.874,14.075
DST Scoring,0.1332,0.041,3.222,0.001,0.052,0.214

0,1,2,3
Omnibus:,6.533,Durbin-Watson:,1.725
Prob(Omnibus):,0.038,Jarque-Bera (JB):,6.073
Skew:,0.215,Prob(JB):,0.048
Kurtosis:,2.685,Cond. No.,19.9


In [54]:
weeks_model.summary()

0,1,2,3
Dep. Variable:,Actuals,R-squared:,0.004
Model:,OLS,Adj. R-squared:,0.002
Method:,Least Squares,F-statistic:,2.128
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,0.145
Time:,22:37:25,Log-Likelihood:,-2133.6
No. Observations:,585,AIC:,4271.0
Df Residuals:,583,BIC:,4280.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,13.5214,0.794,17.021,0.000,11.961,15.082
WEEK,-0.1186,0.081,-1.459,0.145,-0.278,0.041

0,1,2,3
Omnibus:,25.271,Durbin-Watson:,2.162
Prob(Omnibus):,0.0,Jarque-Bera (JB):,16.657
Skew:,0.285,Prob(JB):,0.000242
Kurtosis:,2.402,Cond. No.,20.3


### Model comparison
- **Model Metric 1: Log-Likelihood**: Based on log-likelihood, the DST scoring model is more likely than the week of the NFL season model
- **Model Metric 2: R-squared**: Based on R-squared, the DST scoring model accounts for slightly more of the variation in QB scoring than the week of the NFL season model

In [55]:
# Model 3 - Modeling QB fpts as a function of week of the NFL season, and defensive fpts scored by the QB's team
import statsmodels.api as sm
# Add a constant to the data (need mx + b, not just mx)
to_model = sm.add_constant(team_scoring_df)
# Create input data (X), which is the constant and QB scoring
X = to_model.loc[:, ['const', 'DST Scoring', 'Week']]
# Create output data (y), which is the DST scoring
y = to_model['QB Scoring']
# Fit a linear model to the DST vs QB data
weeks_and_DST_model = sm.OLS(y,X).fit()
# Print out a summary of the statistics of the model
weeks_and_DST_model.summary()

0,1,2,3
Dep. Variable:,QB Scoring,R-squared:,0.024
Model:,OLS,Adj. R-squared:,0.02
Method:,Least Squares,F-statistic:,6.301
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,0.00198
Time:,22:38:08,Log-Likelihood:,-1830.3
No. Observations:,512,AIC:,3667.0
Df Residuals:,509,BIC:,3679.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,13.9941,0.887,15.779,0.000,12.252,15.737
DST Scoring,0.1359,0.041,3.289,0.001,0.055,0.217
Week,-0.1232,0.083,-1.482,0.139,-0.286,0.040

0,1,2,3
Omnibus:,6.713,Durbin-Watson:,1.732
Prob(Omnibus):,0.035,Jarque-Bera (JB):,6.522
Skew:,0.24,Prob(JB):,0.0383
Kurtosis:,2.724,Cond. No.,35.7


### Model interpretation
- **What can we conclude about the parameters of the model?**: DST scoring actually has to 'offset' some of the output that is being added by including 'week' in the model (the coefficient for DST Scoring is higher in this model than it is in the other model, where DST scoring was the only predictor).
- **How certain/uncertain are we about them?**: Based on the probability from the t-test (P > |t|), we are 99.9% confident that there is a significant, linear relationship between DST scoring and QB scoring. We are still not confident (though somehow we are closer to confident) that there is a significant, linear relationship between week of the NFL season and QB scoring.
- **What conclusions can we make about the research question based on the model results?**: Week of the NFL season is not predictive of QB scoring, and should be kept out of the authors' model. It will also skew other elements of the authors' model if it is included, so it is important that it be kept out.
- **Explain the model in everyday language**: For every point that the team's DST scores, the QB's fantasy points scored increases by 0.136. For every week of the NFL season (as it gets later in the season), team defensive scoring decreases by 0.123 fantasy points.

In [30]:
# Model 4 - Modeling QB fpts as a function of defensive points scored by the QB's team, and the QB's projection for that week
# Additional data cleaning needed to make this model (treating QBs from the same team as a single QB)
# We'll use the most up-to-date projection, and take advantage of the fact that the formula for finding the column where the week's projections are stored is (week + 9) i.e. week 1 projections are stored in column 10

# For each team
for team in teams:
    # For each week
    for week in weeks:
        # Add up the projected points of the QBs who play for a particular team and add them to the DF under 'QB Projections'
        team_scoring_df.loc[team + '-' + str(week), 'QB Projections'] = QB.loc[(QB['WEEK'] == week) & (QB['TEAM'] == team), QB.columns[(week + 9)]].sum()

In [56]:
# Create the model
import statsmodels.api as sm
# Add a constant to the data (need mx + b, not just mx)
to_model = sm.add_constant(team_scoring_df)
# Create input data (X), which is the constant and QB scoring
X = to_model.loc[:, ['const', 'DST Scoring', 'QB Projections']]
# Create output data (y), which is the DST scoring
y = to_model['QB Scoring']
# Fit a linear model to the DST vs QB data
qb_projection_DST_scoring_model = sm.OLS(y,X).fit()
# Print out a summary of the statistics of the model
qb_projection_DST_scoring_model.summary()

0,1,2,3
Dep. Variable:,QB Scoring,R-squared:,0.35
Model:,OLS,Adj. R-squared:,0.348
Method:,Least Squares,F-statistic:,137.3
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,2.11e-48
Time:,22:41:03,Log-Likelihood:,-1726.1
No. Observations:,512,AIC:,3458.0
Df Residuals:,509,BIC:,3471.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,1.4629,0.849,1.724,0.085,-0.204,3.130
DST Scoring,0.0302,0.034,0.880,0.379,-0.037,0.098
QB Projections,0.9096,0.057,16.090,0.000,0.799,1.021

0,1,2,3
Omnibus:,10.785,Durbin-Watson:,1.968
Prob(Omnibus):,0.005,Jarque-Bera (JB):,11.075
Skew:,0.314,Prob(JB):,0.00394
Kurtosis:,3.353,Cond. No.,51.0


### Model interpretation
- **What can we conclude about the parameters of the model?**: QB fantasy point projections are more predictive of QB fantasy point scoring than DST fantasy point scoring (and/or they co-vary).
- **How certain/uncertain are we about them?**: Based on the probability from the t-test (P > |t|), we are 100% confident that there is a significant, linear relationship between QB projections and QB scoring. We are not as confident that there is a significant, linear relationship between DST scoring and QB scoring.
- **What conclusions can we make about the research question based on the model results?**: The authors' projection model performs well, outperforming the single best predictor that we found in our own study. It is possible that DST scoring is already a part of their model. We also learn that the authors' projections, on average, overestimate the scoring output of the QB position.
- **Explain the model in everyday language**: For every point that the team's DST scores, the QB's fantasy points scored increases by 0.03 (the equivalent of 1 passing yard). In contrast, for every point a quarterback is projected to score, they typically score 0.91 of those points.

In [57]:
weeks_and_DST_model.summary()

0,1,2,3
Dep. Variable:,QB Scoring,R-squared:,0.024
Model:,OLS,Adj. R-squared:,0.02
Method:,Least Squares,F-statistic:,6.301
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,0.00198
Time:,22:41:19,Log-Likelihood:,-1830.3
No. Observations:,512,AIC:,3667.0
Df Residuals:,509,BIC:,3679.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,13.9941,0.887,15.779,0.000,12.252,15.737
DST Scoring,0.1359,0.041,3.289,0.001,0.055,0.217
Week,-0.1232,0.083,-1.482,0.139,-0.286,0.040

0,1,2,3
Omnibus:,6.713,Durbin-Watson:,1.732
Prob(Omnibus):,0.035,Jarque-Bera (JB):,6.522
Skew:,0.24,Prob(JB):,0.0383
Kurtosis:,2.724,Cond. No.,35.7


In [58]:
qb_projection_DST_scoring_model.summary()

0,1,2,3
Dep. Variable:,QB Scoring,R-squared:,0.35
Model:,OLS,Adj. R-squared:,0.348
Method:,Least Squares,F-statistic:,137.3
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,2.11e-48
Time:,22:42:18,Log-Likelihood:,-1726.1
No. Observations:,512,AIC:,3458.0
Df Residuals:,509,BIC:,3471.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,1.4629,0.849,1.724,0.085,-0.204,3.130
DST Scoring,0.0302,0.034,0.880,0.379,-0.037,0.098
QB Projections,0.9096,0.057,16.090,0.000,0.799,1.021

0,1,2,3
Omnibus:,10.785,Durbin-Watson:,1.968
Prob(Omnibus):,0.005,Jarque-Bera (JB):,11.075
Skew:,0.314,Prob(JB):,0.00394
Kurtosis:,3.353,Cond. No.,51.0


### Model comparison
- **Model Metric 1: Log-Likelihood**: Based on log-likelihood, the QB projections/DST scoring model is more likely than the DST scoring/weeks model
- **Model Metric 2: R-squared**: Based on R-squared, the QB projections/DST scoring model accounts for significantly more of the total variation in points than the DST scoring/weeks model

## Question 2: Do more top-8 quarterback/DST weekly finishes result in a higher average fantasy points scored?
- **What is your hypothesis?**: I hypothesize that there will be a linear relationship, but it will not be very strong. Week-to-week scoring at the quarterback position is highly variable, so I would expect that most quarterbacks will have high highs and low lows. In other words, most QBs will finish similarly in terms of how many top-8 finishes they have, but the rest of their weeks will be much lower-scoring for fantasy.
- **How does this relate to the researcher's question?**: This question is more relevant for fantasy football managers, because in a typical 12-team fantasy league there are only 12 quarterbacks in fantasy starting lineups each week. Knowing that a wide (or narrow) % of QBs will finish with at least one top-8 week (a fantasy-relevant week) allows players to make informed decisions about how to value the QB position (is it replaceable? will lots of quarterbacks score well at least once? Then it should be valued below more scarce positions. Otherwise, it should be highly valued.)
- **How does this relate to Part 1?**: This is asking a similar style of question, where the goal is to gain insight into what factors produce consistent, high-scoring fantasy football players
- **Why use this data?**: This data provides information about the how many fantasy points QBs scored every week
- **Which features will you be using?**: I will be using `fantasy points scored` and `player` to answer this question
- **How many observations are there for each feature?**: There are the same number of observations for each feature, which is 15 * the number of players at the QB position (585)

In [None]:
# Code from earlier sections to import and clean the dataset

# import packages and dataset
import pandas as pd

# These were imported for each fantasy football position type (QB or DST)
QB = pd.read_excel('../Data/Accuscore Evaluation.xlsx', sheet_name='QB Projections')

# Remove duplicate columns (PLAYERID/ESPNID is simply an alias for PLAYER)
QB = QB.drop(['PLAYERID', 1], axis=1)
# Rename ORDER column to WEEK for clarity (since order describes the week of the NFL season)
QB = QB.rename(columns={'ORDER':'WEEK'})


# Remove Ben Roethelisberger's and Andrew Luck's bye weeks (Week 4), since the authors forgot to
ben_bye = ((QB['WEEK'] == 4) & (QB['PLAYER'] == 'Ben Roethlisberger')).astype(int).idxmax()
luck_bye = ((QB['WEEK'] == 4) & (QB['PLAYER'] == 'Andrew Luck')).astype(int).idxmax()
QB = QB.drop([ben_bye, luck_bye], axis=0)

In [59]:
# Additional cleaning to be able to make this plot
# Create a df where rows will be players, and columns will be number of weeks in the top 8
qb_scoring_df = pd.DataFrame()
# For each week of the NFL season
for week in weeks:
    # Sort QBs for the week by how many fantasy points they scored (high to low), and take the top 8
    top8 = QB[QB['WEEK'] == week].sort_values('Actuals', ascending=False)[0:7]
    # for each player
    for player in top8['PLAYER']:
        # If they've already had a top 8 week
        if player in qb_scoring_df.index:
            # Add 1 to their total
            qb_scoring_df.loc[player, 'Occurences'] += 1
        else:
            # This is their first week in the top 8
            qb_scoring_df.loc[player, 'Occurences'] = 1

print('QBs with at least 1 top 8 week:', qb_scoring_df.shape[0])

# Add everyone without a week in the top 8 to the df (for visualization purposes)
for player in set(QB['PLAYER']):
    if player in qb_scoring_df.index:
        # Do nothing, they already made it in
        pass
    else:
        # This is their first week in the top 8
        qb_scoring_df.loc[player, 'Occurences'] = 0

# Add mean fpts scored for each player
qb_scoring_df = qb_scoring_df.sort_index(axis=0)
qb_scoring_df['Average'] = QB.groupby('PLAYER')['Actuals'].mean()

QBs with at least 1 top 8 week: 30


In [61]:
# Model
import statsmodels.api as sm

# Add a constant to the data (need mx + b, not just mx)
to_model = sm.add_constant(qb_scoring_df)
# Create input data (X), which is the constant and QB scoring
X = to_model.loc[:, ['const', 'Occurences']]
# Create output data (y), which is the DST scoring
y = to_model['Average']
# Fit a linear model to the DST vs QB data
qb_top_8_model = sm.OLS(y,X).fit()
# Print out a summary of the statistics of the model
qb_top_8_model.summary()

0,1,2,3
Dep. Variable:,Average,R-squared:,0.794
Model:,OLS,Adj. R-squared:,0.788
Method:,Least Squares,F-statistic:,142.4
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,3.02e-14
Time:,22:44:38,Log-Likelihood:,-93.941
No. Observations:,39,AIC:,191.9
Df Residuals:,37,BIC:,195.2
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,6.5008,0.670,9.701,0.000,5.143,7.859
Occurences,2.0916,0.175,11.932,0.000,1.736,2.447

0,1,2,3
Omnibus:,0.384,Durbin-Watson:,2.097
Prob(Omnibus):,0.825,Jarque-Bera (JB):,0.537
Skew:,-0.004,Prob(JB):,0.765
Kurtosis:,2.425,Cond. No.,6.02


### Model interpretation
- **What can we conclude about the parameters of the model?**: Number of top-8 weeks is predictive of fantasy point average for QBs. For each top-8 week, the QB's average points scored increases by 2 points
- **How certain/uncertain are we about them?**: Based on the probability from the t-test (P > |t|), we are nearly 100% confident that there is a significant, linear relationship between top-8 scoring weeks and QB scoring.
- **What conclusions can we make about the research question based on the model results?**: QBs with more top-8 weeks tend to have a higher average scoring. If a QB scores highly in 1 or 2 weeks, it's more likely that he will continue to score highly.
- **Explain the model in everyday language**: A QB with 0 finishes in the top-8 QBs will average 6.5 fantasy points (the equivalent of 162.5 passing yards and 0 touchdowns). For each top-8 QB peformance, the QB's average fantasy points scored increases by 2 fantasy points.

In [36]:
# Modeling defenses to see if the same relationship exists for DSTs
# Additional cleaning to be able to make this model
# Create a df where rows will be players, and columns will be number of weeks in the top 8
dst_scoring_df = pd.DataFrame()
# For each week of the NFL season
for week in weeks:
    # Sort QBs for the week by how many fantasy points they scored (high to low), and take the top 8
    top8 = DST[DST['WEEK'] == week].sort_values('Actuals', ascending=False)[0:7]
    # for each player
    for player in top8['TEAM']:
        # If they've already had a top 8 week
        if player in dst_scoring_df.index:
            # Add 1 to their total
            dst_scoring_df.loc[player, 'Occurences'] += 1
        else:
            # This is their first week in the top 8
            dst_scoring_df.loc[player, 'Occurences'] = 1

print('DSTs with at least 1 top 8 week:', dst_scoring_df.shape[0])

# Add everyone without a week in the top 8 to the df (for visualization purposes)
for player in set(DST['TEAM']):
    if player in dst_scoring_df.index:
        # Do nothing, they already made it in
        pass
    else:
        # They didn't make it in the top 8
        dst_scoring_df.loc[player, 'Occurences'] = 0

# Add mean fpts scored for each player
dst_scoring_df = dst_scoring_df.sort_index(axis=0)
dst_scoring_df['Average'] = DST.groupby('TEAM')['Actuals'].mean()

DSTs with at least 1 top 8 week: 29


In [63]:
import statsmodels.api as sm

# Add a constant to the data (need mx + b, not just mx)
to_model = sm.add_constant(dst_scoring_df)
# Create input data (X), which is the constant and QB scoring
X = to_model.loc[:, ['const', 'Occurences']]
# Create output data (y), which is the DST scoring
y = to_model['Average']
# Fit a linear model to the DST Average vs Occurences data
dst_top_8_model = sm.OLS(y,X).fit()
# Print out a summary of the statistics of the model
dst_top_8_model.summary()

0,1,2,3
Dep. Variable:,Average,R-squared:,0.64
Model:,OLS,Adj. R-squared:,0.628
Method:,Least Squares,F-statistic:,53.33
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,3.92e-08
Time:,22:44:53,Log-Likelihood:,-67.213
No. Observations:,32,AIC:,138.4
Df Residuals:,30,BIC:,141.4
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,6.0017,0.719,8.346,0.000,4.533,7.470
Occurences,1.2977,0.178,7.303,0.000,0.935,1.661

0,1,2,3
Omnibus:,1.35,Durbin-Watson:,2.109
Prob(Omnibus):,0.509,Jarque-Bera (JB):,1.278
Skew:,0.433,Prob(JB):,0.528
Kurtosis:,2.542,Cond. No.,8.44


### Model interpretation
- **What can we conclude about the parameters of the model?**: Number of top-8 weeks is predictive of fantasy point average for DSTs.
- **How certain/uncertain are we about them?**: Based on the probability from the t-test (P > |t|), we are nearly 100% confident that there is a significant, linear relationship between top-8 scoring weeks and DST scoring.
- **What conclusions can we make about the research question based on the model results?**: DSTs with more top-8 weeks tend to have a higher average scoring. If a DST scores highly in 1 or 2 weeks, it's more likely that the team will continue to score highly.
- **Explain the model in everyday language**: A DST with 0 finishes in the top-8 DSTs will average 6.0 fantasy points (the equivalent of allowing 14-20 points, sacking the QB 2 times, and forcing a fumble). For each top-8 DST peformance, the DST's average fantasy points scored increases by 1.3 fantasy points (the equivalent of allowing 3 less points, sacking the QB 1 more time, or catching an interception).

In [65]:
qb_top_8_model.summary()

0,1,2,3
Dep. Variable:,Average,R-squared:,0.794
Model:,OLS,Adj. R-squared:,0.788
Method:,Least Squares,F-statistic:,142.4
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,3.02e-14
Time:,22:45:40,Log-Likelihood:,-93.941
No. Observations:,39,AIC:,191.9
Df Residuals:,37,BIC:,195.2
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,6.5008,0.670,9.701,0.000,5.143,7.859
Occurences,2.0916,0.175,11.932,0.000,1.736,2.447

0,1,2,3
Omnibus:,0.384,Durbin-Watson:,2.097
Prob(Omnibus):,0.825,Jarque-Bera (JB):,0.537
Skew:,-0.004,Prob(JB):,0.765
Kurtosis:,2.425,Cond. No.,6.02


In [64]:
dst_top_8_model.summary()

0,1,2,3
Dep. Variable:,Average,R-squared:,0.64
Model:,OLS,Adj. R-squared:,0.628
Method:,Least Squares,F-statistic:,53.33
Date:,"Mon, 05 Dec 2022",Prob (F-statistic):,3.92e-08
Time:,22:45:38,Log-Likelihood:,-67.213
No. Observations:,32,AIC:,138.4
Df Residuals:,30,BIC:,141.4
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,6.0017,0.719,8.346,0.000,4.533,7.470
Occurences,1.2977,0.178,7.303,0.000,0.935,1.661

0,1,2,3
Omnibus:,1.35,Durbin-Watson:,2.109
Prob(Omnibus):,0.509,Jarque-Bera (JB):,1.278
Skew:,0.433,Prob(JB):,0.528
Kurtosis:,2.542,Cond. No.,8.44


### Model comparison
- **Model Metric 1: Log-Likelihood**: Based on log-likelihood, top-8 scoring outcomes are more likely to have an impact for DSTs than QBs
- **Model Metric 2: R-squared**: Based on R-squared, top-8 scoring outcomes account for more of the variation in average QB points scored than for DST points scored