NFL

Predicting overall fantasy points throughout the season based on pre-season performance

Predicting fantasy points at the end of a game based on performance at any given amount of minutes into the game

(both of these predictions should be done separetely for each position)

First I will try to just make these simple multiple regression problems and see how accurate I can get with that

Importing useful libraries

In [5]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from io import StringIO

Scrape individual player data

In [6]:
def scrape_player_data(year):
    url = f'https://www.pro-football-reference.com/years/{year}/fantasy.htm'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the table and convert it to a DataFrame
    table = soup.find('table', {'id': 'fantasy'})
    df = pd.read_html(StringIO(str(table)))[0]

    # Clean the DataFrame (remove multi-level headers, etc.)
    df.columns = df.columns.droplevel(0)  # Drop the first header level
    df = df.rename(columns={'Unnamed: 0_level_1': 'Player'})  # Rename player column
    df = df[df['Player'] != 'Player']  # Remove extra header rows
    return df


Get player data for the 2023 season

In [None]:
# player_df = scrape_player_data(2023)

In [None]:
# Display the first few rows of the DataFrame
print(player_df.head())

# Display the variable names of the DataFrame
print(player_df.columns)

  Rk                 Player   Tm FantPos Age   G  GS  Cmp  Att   Yds  ...  TD  \
0  1  Christian McCaffrey*+  SFO      RB  27  16  16    0    0     0  ...  21   
1  2          CeeDee Lamb*+  DAL      WR  24  17  17    0    0     0  ...  14   
2  3             Josh Allen  BUF      QB  27  17  17  385  579  4306  ...  15   
3  4          Tyreek Hill*+  MIA      WR  29  16  16    0    0     0  ...  13   
4  5           Jalen Hurts*  PHI      QB  25  17  17  352  538  3858  ...  15   

   2PM  2PP FantPt    PPR   DKPt   FDPt  VBD PosRank OvRank  
0  NaN  NaN    324  391.3  399.3  357.8  157       1      1  
1    1  NaN    268  403.2  411.2  335.7  131       1      2  
2  NaN    3    393  392.6  420.6  410.6  122       1      3  
3  NaN  NaN    257  376.4  380.4  316.9  120       2      4  
4  NaN  NaN    357  356.8  382.8  371.8   89       2      5  

[5 rows x 33 columns]
Index(['Rk', 'Player', 'Tm', 'FantPos', 'Age', 'G', 'GS', 'Cmp', 'Att', 'Yds',
       'TD', 'Int', 'Att', 'Yds', 'Y/A'

Preprocess the data to ensure it's suitable for regression analysis

In [7]:
# Clean the DataFrame (remove unnecessary columns, handle missing values, etc.)
# player_df = player_df[['Player', 'Tm', 'FantPos', 'G', 'Cmp', 'Att', 'Yds', 'TD', 'Int', 'Att', 'Yds', 'TD', 'Tgt', 'Rec', 'Yds', 'TD', 'FantPt']]

# Rename columns for clarity
new_names = ({'Tm': 'Team', 'FantPos': 'Position', 'G': 'Games', 
                                       'Cmp': 'PassingCompletions', 'Att': 'PassingAttempts', 'Yds': 'PassingYards', 
                                       'TD': 'PassingTD', 'Int': 'PassingInt', 'Att': 'RushingAttempts', 
                                       'Yds': 'RushingYards', 'TD': 'RushingTD', 'Tgt': 'ReceivingTargets', 
                                       'Rec': 'Receptions', 'Yds': 'ReceivingYards', 'TD': 'ReceivingTD', 
                                       'FantPt': 'FantasyPoints'
                                       })
# player_df = player_df.rename(columns = new_names)

In [8]:
# Function that passes in a dataframe to give each player a unique identifier
def create_player_id(df):
    # remove * and + from player names
    df['PlayerID'] = df['Player'].str.replace('*', '')
    df['PlayerID'] = df['PlayerID'].str.replace('+', '')
    df['PlayerID'] = df['PlayerID'].str.split('\\').str[0]  # Remove special characters
    df['PlayerID'] = df['PlayerID'].str.lower()  # Convert to lowercase
    df['PlayerID'] = df['PlayerID'] + df.groupby('PlayerID').cumcount().astype(str)  # Add a count to handle duplicates
    return df

In [None]:
# Apply the function to the player DataFrame
player_df = create_player_id(player_df)

# Display the new player ID column
print(player_df[['Player', 'PlayerID']])

# Convert columns to appropriate data types
player_df = player_df.apply(pd.to_numeric, errors='ignore')
player_df['FantasyPoints'] = player_df['FantasyPoints'].astype(float)

# Handle missing values (e.g., fill with 0 or use appropriate imputation method)
player_df = player_df.fillna(0)

# Display the cleaned DataFrame
print(player_df.head())

Index(['Player', 'Team', 'Position', 'Games', 'PassingCompletions',
       'RushingAttempts', 'RushingAttempts', 'ReceivingYards',
       'ReceivingYards', 'ReceivingYards', 'ReceivingTD', 'ReceivingTD',
       'ReceivingTD', 'ReceivingTD', 'PassingInt', 'RushingAttempts',
       'RushingAttempts', 'ReceivingYards', 'ReceivingYards', 'ReceivingYards',
       'ReceivingTD', 'ReceivingTD', 'ReceivingTD', 'ReceivingTD',
       'ReceivingTargets', 'Receptions', 'ReceivingYards', 'ReceivingYards',
       'ReceivingYards', 'ReceivingTD', 'ReceivingTD', 'ReceivingTD',
       'ReceivingTD', 'FantasyPoints'],
      dtype='object')
                    Player              PlayerID
0    Christian McCaffrey*+  christian mccaffrey0
1            CeeDee Lamb*+          ceedee lamb0
2               Josh Allen           josh allen0
3            Tyreek Hill*+          tyreek hill0
4             Jalen Hurts*          jalen hurts0
..                     ...                   ...
647             Kyle Allen 

  player_df = player_df.apply(pd.to_numeric, errors='ignore')


In [None]:
# Store the current DataFrame as a CSV file
# player_df.to_csv('player_data_2023.csv', index=False)

I attempted many times to find a good way to scrape lots of preseason data on individual player performance

Eventually I did find some good sources, however at that point I realized that even with good data the model to predict fantasy points througout the season using preseason performance just wouldn't be great because preseason games are so different from in season games, and the players that play in the preseason often get far less play in the actual season, therefore the predictions would likely underestimate performance for the players that do play a lot in both because it would be dragged down by players who play in the preseason but not in the regular season

So, instead I am going to shift my focus to creating visualizations of player fantasy points given their performance at a given time in the game, and their average fantasy points for this season and the previous seasons

My idea is to do this very simply by taking current fantasy points at x minutes into the game and then multiplying that by (total minutes in the game)/x then slightly altering that expectation by using the average fantasy points for this or last season to either drag up or down the prediction, this will effectively create a very simple time series forecast for the player's fantasy points

Time series forecasting for fantasy football points

In [None]:
# Read the PlayerID, FantasyPoints, Games, and Position columns from the CSV file
player_df = pd.read_csv('player_data_2023.csv', usecols=['PlayerID', 'FantasyPoints', 'Games', 'Position'])

# Create a new column for average fantasy points per game
player_df['AvgFPPG'] = player_df['FantasyPoints'] / player_df['Games']

# Sort the DataFrame by average fantasy points per game in descending order
player_df = player_df.sort_values(by='AvgFPPG', ascending=False)

# Display the first few rows of the DataFrame
print(player_df.head())

# Store the updated DataFrame as a CSV file
player_df.to_csv('AvgFPPG_2023.csv', index=False)

In [None]:
# Read the CSV file into a new DataFrame
player_df = pd.read_csv('AvgFPPG.csv')

# Create a DataFrame for each position
qb_df = player_df[player_df['Position'] == 'QB']
rb_df = player_df[player_df['Position'] == 'RB']
wr_df = player_df[player_df['Position'] == 'WR']
te_df = player_df[player_df['Position'] == 'TE']

# Display the first few rows of each position DataFrame
print(qb_df.head())
print(rb_df.head())
print(wr_df.head())
print(te_df.head())

  Position  Games  FantasyPoints        PlayerID    AvgFPPG
0       QB     17          393.0     josh allen0  23.117647
1       QB     17          357.0    jalen hurts0  21.000000
2       QB     16          331.0  lamar jackson0  20.687500
4       QB      5          101.0     joe flacco0  20.200000
5       QB     17          343.0   dak prescott0  20.176471
   Position  Games  FantasyPoints              PlayerID    AvgFPPG
3        RB     16          324.0  christian mccaffrey0  20.250000
8        RB     12          223.0       kyren williams0  18.583333
20       RB     15          243.0       raheem mostert0  16.200000
29       RB     11          164.0        de'von achane0  14.909091
34       RB     10          137.0      jonathan taylor0  13.700000
   Position  Games  FantasyPoints            PlayerID    AvgFPPG
22       WR     16          257.0        tyreek hill0  16.062500
25       WR     17          268.0        ceedee lamb0  15.764706
39       WR     10          134.0   justin 

Now repeating the same several steps for 2022

In [10]:
# Scraping data for 2022
player_df_2022 = scrape_player_data(2022)

# Clean the DataFrame (remove unnecessary columns, handle missing values, etc.)
player_df_2022 = player_df_2022[['Player', 'Tm', 'FantPos', 'G', 'Cmp', 'Att', 'Yds', 'TD', 'Int', 'Att', 'Yds', 'TD', 'Tgt', 'Rec', 'Yds', 'TD', 'FantPt']]

# Rename columns for clarity
player_df_2022 = player_df_2022.rename(columns = new_names)

# Apply the function to the player DataFrame
player_df_2022 = create_player_id(player_df_2022)

# Convert columns to appropriate data types
player_df_2022 = player_df_2022.apply(pd.to_numeric, errors='ignore')
player_df_2022['FantasyPoints'] = player_df_2022['FantasyPoints'].astype(float)

# Handle missing values (e.g., fill with 0 or use appropriate imputation method)
player_df_2022 = player_df_2022.fillna(0)

# Store the current DataFrame as a CSV file
player_df_2022.to_csv('player_data_2022.csv', index=False)

In [11]:
# Read the PlayerID, FantasyPoints, Games, and Position columns from the CSV file
player_df = pd.read_csv('player_data_2023.csv', usecols=['PlayerID', 'FantasyPoints', 'Games', 'Position'])

# Create a new column for average fantasy points per game
player_df['AvgFPPG'] = player_df['FantasyPoints'] / player_df['Games']

# Sort the DataFrame by average fantasy points per game in descending order
player_df = player_df.sort_values(by='AvgFPPG', ascending=False)

# Display the first few rows of the DataFrame
print(player_df.head())

# Store the updated DataFrame as a CSV file
player_df.to_csv('AvgFPPG_2022.csv', index=False)

    Position  Games  FantasyPoints              PlayerID    AvgFPPG
2         QB     17          393.0           josh allen0  23.117647
4         QB     17          357.0          jalen hurts0  21.000000
9         QB     16          331.0        lamar jackson0  20.687500
0         RB     16          324.0  christian mccaffrey0  20.250000
126       QB      5          101.0           joe flacco0  20.200000


In [12]:
# Read the CSV file into a new DataFrame
player_df = pd.read_csv('AvgFPPG_2022.csv')

# Create a DataFrame for each position
qb_df = player_df[player_df['Position'] == 'QB']
rb_df = player_df[player_df['Position'] == 'RB']
wr_df = player_df[player_df['Position'] == 'WR']
te_df = player_df[player_df['Position'] == 'TE']

# Display the first few rows of each position DataFrame
print(qb_df.head())
print(rb_df.head())
print(wr_df.head())
print(te_df.head())

  Position  Games  FantasyPoints        PlayerID    AvgFPPG
0       QB     17          393.0     josh allen0  23.117647
1       QB     17          357.0    jalen hurts0  21.000000
2       QB     16          331.0  lamar jackson0  20.687500
4       QB      5          101.0     joe flacco0  20.200000
5       QB     17          343.0   dak prescott0  20.176471
   Position  Games  FantasyPoints              PlayerID    AvgFPPG
3        RB     16          324.0  christian mccaffrey0  20.250000
8        RB     12          223.0       kyren williams0  18.583333
20       RB     15          243.0       raheem mostert0  16.200000
29       RB     11          164.0        de'von achane0  14.909091
34       RB     10          137.0      jonathan taylor0  13.700000
   Position  Games  FantasyPoints            PlayerID    AvgFPPG
22       WR     16          257.0        tyreek hill0  16.062500
25       WR     17          268.0        ceedee lamb0  15.764706
39       WR     10          134.0   justin 

Based on this data I think I will attempt the best draft possible for my league

The best order for drafting is
Round 1: RB
Round 2: WR
Round 3: RB or WR
Round 4: RB or WR
Round 5: RB or WR (or really good TE)
Round 6: RB or WR (or really good TE)
Round 7: RB, WR, or TE
Round 8: QB, RB, or WR
Round 9: QB, RB, or WR (or really good TE)
Round 10: QB, RB, WR or TE
Round 11: QB, RB, WR or TE
Round 12: QB, RB, WR or TE
Round 13: K
Round 14: K or D/ST
Round 15: K or D/ST
