NFL

Predicting total fantasy points over the season based on preseason performance

Predicting fantasy points at the end of a game based on performance at any given number of minutes into the game

(both of these predictions should be done separately for each position)

First I will try treating these as simple multiple regression problems and see how accurate I can get with that

Importing useful libraries

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from io import StringIO
# pd.read_html also needs an HTML parser backend such as lxml (pip install lxml)



Scrape individual player data

In [2]:
def scrape_player_data(year):
    """Scrape the season fantasy table for `year` from Pro Football Reference."""
    url = f'https://www.pro-football-reference.com/years/{year}/fantasy.htm'
    response = requests.get(url)
    response.raise_for_status()  # Fail early on HTTP errors
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the table and convert it to a DataFrame
    table = soup.find('table', {'id': 'fantasy'})
    if table is None:
        raise ValueError(f'No fantasy table found for {year}')
    df = pd.read_html(StringIO(str(table)))[0]

    # Clean the DataFrame (remove multi-level headers, etc.)
    df.columns = df.columns.droplevel(0)  # Drop the first header level
    df = df.rename(columns={'Unnamed: 0_level_1': 'Player'})  # Rename player column
    df = df[df['Player'] != 'Player']  # Remove repeated header rows
    return df.reset_index(drop=True)


Get player data for the 2023 season

In [3]:
player_df_2023 = scrape_player_data(2023)

ImportError: Missing optional dependency 'lxml'.  Use pip or conda to install lxml.

In [None]:
# Display the first few rows of the DataFrame
print(player_df_2023.head())

# Display the variable names of the DataFrame
print(player_df_2023.columns)


Preprocess the data to ensure it's suitable for regression analysis

In [None]:
# Clean the DataFrame (remove unnecessary columns, handle missing values, etc.)
# Column selection is left disabled here: 'Att', 'Yds', and 'TD' each appear several
# times in the table, so selecting them by label returns every copy.
# player_df_2023 = player_df_2023[['Player', 'Tm', 'FantPos', 'G', 'Cmp', 'Att', 'Yds', 'TD', 'Int', 'Tgt', 'Rec', 'FantPt']]

# Rename columns for clarity. Only uniquely labeled columns are listed, because a
# dict keeps just the last value for a repeated key ('Att', 'Yds', 'TD').
new_names = {'Tm': 'Team', 'FantPos': 'Position', 'G': 'Games',
             'Cmp': 'PassingCompletions', 'Int': 'PassingInt',
             'Tgt': 'ReceivingTargets', 'Rec': 'Receptions',
             'FantPt': 'FantasyPoints'}
player_df_2023 = player_df_2023.rename(columns=new_names)
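
Because `Att`, `Yds`, and `TD` each sit under several top-level headers, a plain rename dict cannot tell them apart once the top level is dropped. A minimal sketch of one alternative, joining both header levels instead of dropping one, is shown here on a small hand-built frame. The group names (`Passing`, `Rushing`, `Receiving`), the helper `flatten_header`, and all numbers are assumptions for illustration, not the scraped table's actual headers or data.

```python
import pandas as pd

# Toy two-level header mimicking a stats table; group names and values are made up.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Player"),
    ("Passing", "Yds"),
    ("Rushing", "Yds"),
    ("Receiving", "Yds"),
])
toy = pd.DataFrame([["Player A", 3000, 150, 0],
                    ["Player B", 0, 900, 400]], columns=columns)


def flatten_header(top, bottom):
    # Hypothetical helper: keep the bottom label alone when the top level
    # is only a placeholder, otherwise prefix it with the group name.
    return bottom if top.startswith("Unnamed") else f"{top}{bottom}"


toy.columns = [flatten_header(top, bottom) for top, bottom in toy.columns]
print(list(toy.columns))
```

With unique labels such as `PassingYds` and `RushingYds`, later column selection and renaming by name become unambiguous.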


In [None]:
# Function that takes a DataFrame and gives each player a unique identifier
def create_player_id(df):
    df = df.copy()  # Avoid modifying the caller's DataFrame in place
    # Remove the * and + markers from player names (literal match, not regex)
    df['PlayerID'] = df['Player'].str.replace('*', '', regex=False)
    df['PlayerID'] = df['PlayerID'].str.replace('+', '', regex=False)
    df['PlayerID'] = df['PlayerID'].str.split('\\').str[0]  # Drop anything after a backslash
    df['PlayerID'] = df['PlayerID'].str.strip().str.lower()  # Trim whitespace and convert to lowercase
    df['PlayerID'] = df['PlayerID'] + df.groupby('PlayerID').cumcount().astype(str)  # Add a count to handle duplicates
    return df


In [None]:
# Apply the function to the player DataFrame
player_df_2023 = create_player_id(player_df_2023)

# Display the new player ID column
print(player_df_2023[['Player', 'PlayerID']])

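# Convert columns to numeric where possible; text columns such as Player are left as-is
def to_numeric_if_possible(column):
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column

player_df_2023 = player_df_2023.apply(to_numeric_if_possible)
player_df_2023['FantasyPoints'] = player_df_2023['FantasyPoints'].astype(float)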

# Handle missing values (e.g., fill with 0 or use appropriate imputation method)
player_df_2023 = player_df_2023.fillna(0)

# Display the cleaned DataFrame
print(player_df_2023.head())


In [None]:
# Store the current DataFrame as a CSV file
player_df_2023.to_csv('player_data_2023.csv', index=False)


I tried many times to find a good way to scrape a large amount of preseason data on individual player performance.

Eventually I did find some good sources. By that point, however, I had realized that even with good data, a model predicting fantasy points throughout the season from preseason performance just wouldn't be great. Preseason games are very different from regular-season games, and many of the players who play in the preseason get far less playing time in the regular season. The predictions would therefore likely underestimate the players who do play a lot in both, because they would be dragged down by players who play in the preseason but not in the regular season.

So instead I am going to shift my focus to creating visualizations of a player's fantasy points given their performance at a given time in the game and their average fantasy points for this season and previous seasons.

My idea is to do this very simply: take the current fantasy points at x minutes into the game and multiply them by (total minutes in the game)/x, then adjust that expectation slightly using the player's average fantasy points for this season or last season to pull the prediction up or down. This effectively creates a very simple time series forecast of the player's fantasy points.
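
The projection idea can be written down in a few lines. This is a minimal sketch, not a fitted model: the function name `project_final_points`, the 60-minute game length, and the blending rule (weighting the pace-based estimate by the fraction of the game already played) are all assumptions chosen for illustration.

```python
def project_final_points(current_points, minutes_played, season_avg,
                         game_minutes=60.0):
    """Project end-of-game fantasy points from an in-game snapshot.

    Hypothetical helper: the pace estimate scales current points to a full
    game, and the season average pulls it up or down. The weighting scheme
    is an assumption, not something derived from real data.
    """
    if minutes_played <= 0:
        # Nothing has happened yet, so fall back on the season average.
        return season_avg
    minutes_played = min(minutes_played, game_minutes)
    pace_estimate = current_points * game_minutes / minutes_played
    # Trust the pace estimate more as more of the game has been played.
    weight = minutes_played / game_minutes
    return weight * pace_estimate + (1 - weight) * season_avg


# 8 points after 15 of 60 minutes, for a player averaging 20 per game.
print(project_final_points(8, 15, 20))  # → 23.0
```

With this particular weighting, the formula reduces to the points scored so far plus the season average prorated over the remaining minutes. A different weight would change how strongly the season average pulls on the estimate.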

Time series forecasting for fantasy football points

In [None]:
# Read the PlayerID, FantasyPoints, Games, and Position columns from the CSV file
player_df_2023 = pd.read_csv('player_data_2023.csv', usecols=['PlayerID', 'FantasyPoints', 'Games', 'Position'])

# Create a new column for average fantasy points per game
player_df_2023['AvgFPPG'] = player_df_2023['FantasyPoints'] / player_df_2023['Games']

# Sort the DataFrame by average fantasy points per game in descending order
player_df_2023 = player_df_2023.sort_values(by='AvgFPPG', ascending=False)

# Display the first few rows of the DataFrame
print(player_df_2023.head())

# Store the updated DataFrame as a CSV file
player_df_2023.to_csv('AvgFPPG_2023.csv', index=False)
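
Since missing values were filled with 0 before the CSV was written, a row can end up with `Games` equal to 0, and dividing by it yields `inf` or `NaN`, which can distort the descending sort. A minimal sketch of one way to guard against that, using a toy frame whose player IDs and numbers are made up:

```python
import pandas as pd

# Toy rows standing in for the CSV contents; IDs and numbers are made up.
toy = pd.DataFrame({
    "PlayerID": ["player a0", "player b0", "player c0"],
    "FantasyPoints": [250.0, 12.0, 0.0],
    "Games": [16, 0, 0],
})

# Treat zero games as missing so the division gives NaN instead of inf.
games = toy["Games"].where(toy["Games"] > 0)
toy["AvgFPPG"] = toy["FantasyPoints"] / games

# Sort descending with undefined averages pushed to the bottom.
toy = toy.sort_values(by="AvgFPPG", ascending=False, na_position="last")
print(toy)
```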


Now repeating the same steps for 2022

In [None]:
# Scraping data for 2022
player_df_2022 = scrape_player_data(2022)

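# Clean the DataFrame (remove unnecessary columns, handle missing values, etc.)
# Each label is listed once: 'Att', 'Yds', and 'TD' occur several times in the table,
# and selecting a repeated label already returns every column that carries it.
player_df_2022 = player_df_2022[['Player', 'Tm', 'FantPos', 'G', 'Cmp', 'Att', 'Yds', 'TD', 'Int', 'Tgt', 'Rec', 'FantPt']]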

# Rename columns for clarity
player_df_2022 = player_df_2022.rename(columns = new_names)

# Apply the function to the player DataFrame
player_df_2022 = create_player_id(player_df_2022)

# Convert columns to appropriate data types
player_df_2022 = player_df_2022.apply(pd.to_numeric, errors='ignore')
player_df_2022['FantasyPoints'] = player_df_2022['FantasyPoints'].astype(float)

# Handle missing values (e.g., fill with 0 or use appropriate imputation method)
player_df_2022 = player_df_2022.fillna(0)

# Store the current DataFrame as a CSV file
player_df_2022.to_csv('player_data_2022.csv', index=False)


In [None]:
# Read the PlayerID, FantasyPoints, Games, and Position columns from the CSV file
player_df_2022 = pd.read_csv('player_data_2022.csv', usecols=['PlayerID', 'FantasyPoints', 'Games', 'Position'])

# Create a new column for average fantasy points per game
player_df_2022['AvgFPPG'] = player_df_2022['FantasyPoints'] / player_df_2022['Games']

# Sort the DataFrame by average fantasy points per game in descending order
player_df_2022 = player_df_2022.sort_values(by='AvgFPPG', ascending=False)

# Display the first few rows of the DataFrame
print(player_df_2022.head())

# Store the updated DataFrame as a CSV file
player_df_2022.to_csv('AvgFPPG_2022.csv', index=False)


In [None]:
import pandas as pd

# Read the CSV file into a new DataFrame
player_df_2022 = pd.read_csv('AvgFPPG_2022.csv')

# Create a DataFrame for each position
qb_df = player_df_2022[player_df_2022['Position'] == 'QB']
rb_df = player_df_2022[player_df_2022['Position'] == 'RB']
wr_df = player_df_2022[player_df_2022['Position'] == 'WR']
te_df = player_df_2022[player_df_2022['Position'] == 'TE']

# Display the first 50 rows of each position DataFrame
print(qb_df.head(50))
print(rb_df.head(50))
print(wr_df.head(50))
print(te_df.head(50))


In [None]:
# Read the CSV file into a new DataFrame
player_df_2023 = pd.read_csv('AvgFPPG_2023.csv')

# Create a DataFrame for each position
qb_df = player_df_2023[player_df_2023['Position'] == 'QB']
rb_df = player_df_2023[player_df_2023['Position'] == 'RB']
wr_df = player_df_2023[player_df_2023['Position'] == 'WR']
te_df = player_df_2023[player_df_2023['Position'] == 'TE']

# Display the first 50 rows of each position DataFrame
print(qb_df.head(50))
print(rb_df.head(50))
print(wr_df.head(50))
print(te_df.head(50))
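
The four position filters can also be expressed as a single `groupby`, which avoids repeating the same line for each position. A minimal sketch on a toy frame, assuming the same `Position` and `AvgFPPG` column names; the player IDs and averages are made up, and `by_position` is a hypothetical name.

```python
import pandas as pd

# Toy data shaped like the AvgFPPG CSV; IDs and values are made up.
toy = pd.DataFrame({
    "PlayerID": ["qb one0", "qb two0", "rb one0", "wr one0", "te one0", "rb two0"],
    "Position": ["QB", "QB", "RB", "WR", "TE", "RB"],
    "AvgFPPG": [22.1, 18.4, 16.0, 15.2, 9.8, 12.3],
})

# Build one DataFrame per position in a dict keyed by position code,
# each sorted by average fantasy points per game.
by_position = {pos: group.sort_values("AvgFPPG", ascending=False)
               for pos, group in toy.groupby("Position")}

for pos in ["QB", "RB", "WR", "TE"]:
    print(pos)
    print(by_position[pos].head(50))
```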


Based on this data I think I will attempt the best draft possible for my league

The best order for drafting is<br>
Round 1: RB<br>RB<br>
Round 2: WR<br>WR<br>
Round 3: RB or WR<br>WR<br>
Round 4: RB or WR<br>RB<br>
Round 5: RB or WR (or really good TE)<br>QB<br>
Round 6: RB or WR (or really good TE)<br>WR(FLEX)<br>
Round 7: RB, WR, or TE<br>TE<br>
Round 8: QB, RB, or WR<br>WR<br>
Round 9: QB, RB, or WR (or really good TE)<br>RB<br>
Round 10: QB, RB, WR or TE<br>
Round 11: QB, RB, WR or TE<br>
Round 12: QB, RB, WR or TE<br>
Round 13: K<br>
Round 14: K or D/ST<br>
Round 15: K or D/ST<br>

