# Processing FanDuel and DraftKings CSVs
This notebook, the first step in the live deployment of our model, takes the FanDuel and DraftKings player lists and processes them into a format that can be merged with the data the model uses.<br>

This notebook can be rerun at various points during the week to update injury status. The FanDuel CSV includes injury designations, and we carry them over to the DraftKings data.

In [1]:
import numpy as np
import pandas as pd
#from datetime import datetime, timedelta
import os
from functions import get_current_weekday, calculate_nfl_week, get_next_sunday, get_current_year

# Automating datestring
The helper functions below build the date string we need to read the CSVs. `get_next_sunday` returns the date of the upcoming Sunday, and `calculate_nfl_week` then uses that date to get the week number of the season.

In [2]:
day = get_current_weekday()

In [3]:
date_string = get_next_sunday(day)

In [4]:
date_string

'2024-09-22'

In [5]:
week = calculate_nfl_week(date_string)

In [6]:
season = get_current_year()
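The helpers imported from the local `functions` module aren't shown in this notebook. As a rough illustration only, here is a minimal sketch of how next-Sunday and week-number logic of this kind can be written with the standard library. The names `next_sunday` and `nfl_week` are hypothetical (not the module's actual functions), and the 2024 Week 1 Sunday of 2024-09-08 is an assumption passed in explicitly rather than looked up.

```python
from datetime import date, timedelta

def next_sunday(today: date) -> date:
    """Return the upcoming Sunday (today itself if today is a Sunday)."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_ahead = (6 - today.weekday()) % 7
    return today + timedelta(days=days_ahead)

def nfl_week(sunday: date, week1_sunday: date) -> int:
    """Week number, counting the Sunday of Week 1 as week 1 (assumed season start)."""
    return (sunday - week1_sunday).days // 7 + 1

# Hypothetical example: the notebook is run on Wednesday, 2024-09-18
sunday = next_sunday(date(2024, 9, 18))
date_string = sunday.isoformat()
week = nfl_week(sunday, week1_sunday=date(2024, 9, 8))
print(date_string, week)
```

Passing the Week 1 date in as a parameter keeps the arithmetic easy to test and makes the one season-specific value explicit.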

# Renaming CSVs
So that we can just download the CSVs and move them into our directory without renaming them, the code below will rename the FanDuel and DraftKings player lists.<br>

**This requires a little bit of housekeeping**, as we first need to delete the previous week's FanDuel and DraftKings CSVs before downloading the new ones. Yes, it would be easier to automate deleting the old CSVs and downloading the new ones, but we don't want to do any scraping on the FanDuel and DraftKings sites: if we do too much of it, we risk being blocked from playing.

In [7]:
# Get the list of files in the current directory
files = os.listdir()

In [8]:
# Loop through each file in the directory
for file_name in files:
    if file_name.startswith("FanDuel"):
        # Read the FanDuel CSV file
        fd_df = pd.read_csv(file_name)
        
        # Create the new filename
        new_fd_filename = f"FD_{date_string}_list.csv"
        
        # Save the DataFrame with the new filename
        fd_df.to_csv(new_fd_filename, index=False)
        
        print(f"Renamed FanDuel CSV to: {new_fd_filename}")

        # Delete the original FanDuel CSV file
        os.remove(file_name)
        print(f"Deleted original FanDuel CSV: {file_name}")
    
    elif file_name.startswith("DKSalaries"):
        # Read the DK CSV file
        dk_df = pd.read_csv(file_name)
        
        # Create the new filename
        new_dk_filename = f"DK_{date_string}_list.csv"
        
        # Save the DataFrame with the new filename
        dk_df.to_csv(new_dk_filename, index=False)
        
        print(f"Renamed DK CSV to: {new_dk_filename}")

        # Delete the original DraftKings CSV file
        os.remove(file_name)
        print(f"Deleted original DraftKings CSV: {file_name}")

Renamed DK CSV to: DK_2024-09-22_list.csv
Renamed FanDuel CSV to: FD_2024-09-22_list.csv
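The loop above reads each CSV into pandas and writes it back out just to rename it, which re-serializes the data (and can, for example, change how numbers are formatted). A lighter-weight sketch moves the file instead. It assumes the same `FanDuel` and `DKSalaries` download prefixes used above; the `rename_site_csvs` helper and its `.csv` suffix check are our own additions.

```python
from pathlib import Path

def rename_site_csvs(directory: Path, date_string: str) -> list[tuple[str, str]]:
    """Rename raw FanDuel/DraftKings downloads to FD_/DK_<date>_list.csv."""
    # Prefix of the raw download -> short site code used in the new file name
    prefixes = {"FanDuel": "FD", "DKSalaries": "DK"}
    renamed = []
    # sorted() materializes the listing before any file is moved
    for path in sorted(directory.iterdir()):
        if not path.is_file() or path.suffix != ".csv":
            continue
        for prefix, code in prefixes.items():
            if path.name.startswith(prefix):
                target = directory / f"{code}_{date_string}_list.csv"
                # Path.replace moves the file, overwriting an existing target
                path.replace(target)
                renamed.append((path.name, target.name))
                break
    return renamed
```

Usage would be `rename_site_csvs(Path.cwd(), date_string)`. Because the file is moved rather than rewritten, the original download disappears automatically and no `os.remove` step is needed.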


# Reading and processing FanDuel csv

In [9]:
fd_df = pd.read_csv('FD_' + date_string + '_list.csv')

In [10]:
fd_df.rename(columns = {'Nickname': 'Name'}, inplace = True)

In [11]:
fd_df = fd_df[['Id', 'Position', 'First Name', 'Name', 'Last Name', 'Salary', 'Game', 'Team', 'Opponent', 'Injury Indicator']]

In [12]:
fd_df[['Away_Team', 'Home_Team']] = fd_df['Game'].str.split('@', expand=True)

In [13]:
fd_df.drop(columns = ['Game'], inplace = True)

In [14]:
fd_df['home_team'] = np.where(fd_df['Team'] == fd_df['Home_Team'], 1, 0)

In [15]:
fd_df.drop(columns = ['Home_Team', 'Away_Team'], inplace = True)
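To make the home/away logic concrete, here is a minimal sketch on a hypothetical two-row toy frame. The `Game` strings follow the `AWAY@HOME` pattern that the split above relies on (for instance, Dallas is at home against Baltimore in the output below).

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: one player from each side of the same game
toy = pd.DataFrame({
    "Team": ["DAL", "BAL"],
    "Game": ["BAL@DAL", "BAL@DAL"],
})
# "BAL@DAL" -> away team BAL, home team DAL
toy[["Away_Team", "Home_Team"]] = toy["Game"].str.split("@", expand=True)
# 1 if the player's team is the home team, else 0
toy["home_team"] = np.where(toy["Team"] == toy["Home_Team"], 1, 0)
toy = toy.drop(columns=["Game", "Home_Team", "Away_Team"])
print(toy)
```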

In [16]:
#We might have to add more teams to this mapping
#It was built from the Week 1 main slate, so there may be more team-code discrepancies between FanDuel and DraftKings
fd_df['Team'] = fd_df['Team'].replace({'JAC' : 'JAX'})
fd_df['Opponent'] = fd_df['Opponent'].replace({'JAC' : 'JAX'})
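As more discrepancies turn up, it may help to keep every FanDuel-to-DraftKings code fix in one mapping and apply it to both columns at once. A minimal sketch; only `JAC` → `JAX` comes from this notebook, and `FD_TO_DK_TEAM` and `normalize_team_codes` are hypothetical names.

```python
import pandas as pd

# Known FanDuel -> DraftKings team-code differences. Only JAC -> JAX is confirmed
# by this notebook; any further entries are assumptions to verify per slate.
FD_TO_DK_TEAM = {"JAC": "JAX"}

def normalize_team_codes(df: pd.DataFrame, columns=("Team", "Opponent")) -> pd.DataFrame:
    """Return a copy of df with FanDuel team codes mapped to DraftKings codes."""
    out = df.copy()
    cols = list(columns)
    # DataFrame.replace with a dict swaps exact value matches in every selected column
    out[cols] = out[cols].replace(FD_TO_DK_TEAM)
    return out

example = pd.DataFrame({"Team": ["JAC", "BUF"], "Opponent": ["BUF", "JAC"]})
print(normalize_team_codes(example))
```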

In [17]:
fd_df.head()

|   | Id | Position | First Name | Name | Last Name | Salary | Team | Opponent | Injury Indicator | home_team |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 107027-86631 | WR | CeeDee | CeeDee Lamb | Lamb | 9300 | DAL | BAL |  | 1 |
| 1 | 107027-85671 | WR | Justin | Justin Jefferson | Jefferson | 9200 | MIN | HOU |  | 1 |
| 2 | 107027-86997 | WR | Amon-Ra | Amon-Ra St. Brown | St. Brown | 9100 | DET | ARI |  | 0 |
| 3 | 107027-53681 | WR | Tyreek | Tyreek Hill | Hill | 9000 | MIA | SEA |  | 0 |
| 4 | 107027-63115 | QB | Lamar | Lamar Jackson | Jackson | 8800 | BAL | DAL |  | 0 |


# Reading and processing DraftKings csv

In [18]:
dk_df = pd.read_csv('DK_' + date_string + '_list.csv')

In [19]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 587 entries, 0 to 586
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Position          587 non-null    object 
 1   Name + ID         587 non-null    object 
 2   Name              587 non-null    object 
 3   ID                587 non-null    int64  
 4   Roster Position   587 non-null    object 
 5   Salary            587 non-null    int64  
 6   Game Info         587 non-null    object 
 7   TeamAbbrev        587 non-null    object 
 8   AvgPointsPerGame  587 non-null    float64
dtypes: float64(1), int64(2), object(6)
memory usage: 41.4+ KB


In [20]:
##We'll need to match FD and DK names so we can merge FD injury designations to DK.
#Then hopefully the names we have will match the names used for the model.
#Then we'll pare down the DK columns, create a mini df for home teams and merge that with DK
#The goal of this notebook should be to have FD and DK dataframes that include:
#full name, salary, position, team, opponent, home team (binary), injury designation, id (just in case)

The dictionary and function below will be used later to convert team nicknames (e.g., 'Ravens') to abbreviation strings (e.g., 'BAL').

In [21]:
nfl_teams = [
    'Bills', 'Dolphins', 'Patriots', 'Jets',
    'Bengals', 'Browns', 'Ravens', 'Steelers',
    'Texans', 'Colts', 'Jaguars', 'Titans',
    'Broncos', 'Chiefs', 'Raiders', 'Chargers',
    'Cowboys', 'Giants', 'Eagles', 'Commanders',
    'Bears', 'Lions', 'Packers', 'Vikings',
    'Falcons', 'Panthers', 'Saints', 'Buccaneers',
    'Cardinals', 'Rams', '49ers', 'Seahawks'
]

In [22]:
team_abbreviations = [
    'BUF', 'MIA', 'NE', 'NYJ',
    'CIN', 'CLE', 'BAL', 'PIT',
    'HOU', 'IND', 'JAX', 'TEN',
    'DEN', 'KC', 'LV', 'LAC',
    'DAL', 'NYG', 'PHI', 'WAS',
    'CHI', 'DET', 'GB', 'MIN',
    'ATL', 'CAR', 'NO', 'TB',
    'ARI', 'LAR', 'SF', 'SEA'
]

In [23]:
# Creating the dictionary
team_abbrev = dict(zip(nfl_teams, team_abbreviations))

In [24]:
team_abbrev

{'Bills': 'BUF',
 'Dolphins': 'MIA',
 'Patriots': 'NE',
 'Jets': 'NYJ',
 'Bengals': 'CIN',
 'Browns': 'CLE',
 'Ravens': 'BAL',
 'Steelers': 'PIT',
 'Texans': 'HOU',
 'Colts': 'IND',
 'Jaguars': 'JAX',
 'Titans': 'TEN',
 'Broncos': 'DEN',
 'Chiefs': 'KC',
 'Raiders': 'LV',
 'Chargers': 'LAC',
 'Cowboys': 'DAL',
 'Giants': 'NYG',
 'Eagles': 'PHI',
 'Commanders': 'WAS',
 'Bears': 'CHI',
 'Lions': 'DET',
 'Packers': 'GB',
 'Vikings': 'MIN',
 'Falcons': 'ATL',
 'Panthers': 'CAR',
 'Saints': 'NO',
 'Buccaneers': 'TB',
 'Cardinals': 'ARI',
 'Rams': 'LAR',
 '49ers': 'SF',
 'Seahawks': 'SEA'}

In [25]:
# Function to convert team names to abbreviations, leaving other values unchanged
def convert_team_names(opponent):
    return team_abbrev.get(opponent, opponent)
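Because `dict.get(key, key)` returns its input unchanged when it isn't a team nickname, applying `convert_team_names` to a whole `Name` column only rewrites defense rows. A sketch showing that `Series.replace` with the same dictionary gives the same result without a Python-level `apply`; the two-entry `team_abbrev` here is an excerpt for illustration.

```python
import pandas as pd

team_abbrev = {"Ravens": "BAL", "Vikings": "MIN"}  # excerpt of the full mapping
names = pd.Series(["Ravens", "Lamar Jackson", "Vikings"])

# Element-wise lookup, as convert_team_names does
via_apply = names.apply(lambda n: team_abbrev.get(n, n))
# Series.replace with a dict swaps exact matches and leaves other values alone
via_replace = names.replace(team_abbrev)
print(via_replace.tolist())
```

One caveat applies to both versions: a player whose full name happened to equal a team nickname would also be converted, so restricting the conversion to rows where `Position` is `D` is the stricter option.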

In [26]:
fd_def = fd_df[fd_df['Position'] == 'D']

# Name matching
Name matching will be a chore. Let's start with defenses: DK lists a defense by its team nickname alone, while FD uses the city and nickname. We'll change FD to use just the nickname.<br>

The following line of code strips the city from FD defense names by taking the `Last Name` field for rows where `Position` is `D`.

In [27]:
fd_df['Name'] = np.where(fd_df['Position'] == 'D', fd_df['Last Name'], fd_df['Name'])

In [28]:
dk_df['Name'] = dk_df['Name'].str.strip()

# Merging columns to DK
We'll merge Opponent, home_team (binary) and Status (injury) from FanDuel into the DK dataframe.

In [29]:
fd_df.rename(columns = {'Injury Indicator': 'Status', 'Id': 'ID'}, inplace = True)

In [30]:
dk_df.rename(columns = {'TeamAbbrev': 'Team'}, inplace = True)

In [31]:
fd_merge = fd_df[['Name', 'Team', 'Opponent', 'home_team', 'Status']]

In [32]:
dk_df = pd.merge(dk_df, fd_merge, on = ['Name', 'Team'], how = 'left')
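A left merge silently leaves NaN for DK players whose `Name`/`Team` pair has no FD match, and it silently duplicates DK rows if FD repeats a pair. A sketch for auditing both, using the real `indicator` and `validate` arguments of `pd.merge` (not used in the original cell); the toy frames and player names are hypothetical stand-ins for `dk_df` and `fd_merge`.

```python
import pandas as pd

# Hypothetical toy frames standing in for dk_df and fd_merge
dk = pd.DataFrame({"Name": ["Player A", "Player B"], "Team": ["DAL", "MIN"]})
fd = pd.DataFrame({"Name": ["Player A"], "Team": ["DAL"], "Status": ["Q"]})

merged = pd.merge(
    dk, fd, on=["Name", "Team"], how="left",
    indicator=True,          # adds a _merge column: 'both', 'left_only', 'right_only'
    validate="many_to_one",  # raises MergeError if FD repeats a Name/Team pair
)
# DK players that found no FanDuel match
unmatched = merged.loc[merged["_merge"] == "left_only", ["Name", "Team"]]
print(unmatched)
```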

Now we still have to fill missing Opponent and home_team values in DK for the players not in the FD list.

In [33]:
team_dict = {team: (group['Opponent'].values[0], group['home_team'].values[0]) 
             for team, group in fd_df.groupby('Team')}

In [34]:
dk_df['Opponent_filled'] = dk_df['Team'].map(lambda x: team_dict.get(x, (None, None))[0])
dk_df['home_team_filled'] = dk_df['Team'].map(lambda x: team_dict.get(x, (None, None))[1])

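# Fill missing values in the original columns with the new columns
# (assign the result back instead of calling fillna(inplace=True) on a column,
# which is chained assignment and won't update dk_df under pandas Copy-on-Write)
dk_df['Opponent'] = dk_df['Opponent'].fillna(dk_df['Opponent_filled'])
dk_df['home_team'] = dk_df['home_team'].fillna(dk_df['home_team_filled'])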

# Drop the temporary filled columns
dk_df.drop(columns=['Opponent_filled', 'home_team_filled'], inplace=True)
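The `team_dict` step assumes each team has exactly one opponent on the slate, which holds for a single-week slate. Under that same assumption, here is a sketch of the fill using a per-team lookup table and `Series.map`, with no temporary columns; the toy frames are hypothetical stand-ins for `fd_df` and `dk_df`.

```python
import pandas as pd

# Hypothetical toy frames standing in for fd_df and dk_df
fd = pd.DataFrame({"Team": ["DAL", "DAL", "BAL"],
                   "Opponent": ["BAL", "BAL", "DAL"],
                   "home_team": [1, 1, 0]})
dk = pd.DataFrame({"Team": ["DAL", "BAL"],
                   "Opponent": ["BAL", None],
                   "home_team": [1.0, None]})

# One row per team: its opponent and home flag for this slate
lookup = fd.drop_duplicates("Team").set_index("Team")[["Opponent", "home_team"]]
for col in ["Opponent", "home_team"]:
    # Only missing values are filled; existing merged values are kept
    dk[col] = dk[col].fillna(dk["Team"].map(lookup[col]))
print(dk)
```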

In [35]:
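# .copy() makes these standalone frames, so adding columns below doesn't raise SettingWithCopyWarning
dk_df = dk_df[['ID', 'Name', 'Position', 'Salary', 'Team', 'Opponent', 'home_team', 'Status']].copy()
fd_df = fd_df[['ID', 'Name', 'Position', 'Salary', 'Team', 'Opponent', 'home_team', 'Status']].copy()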

In [36]:
#Adding date and week indicator columns
fd_df['Date'] = pd.to_datetime(date_string).strftime('%m-%d-%Y')
dk_df['Date'] = pd.to_datetime(date_string).strftime('%m-%d-%Y')
fd_df['Week'] = week
dk_df['Week'] = week

In [37]:
fd_df.head()

|   | ID | Name | Position | Salary | Team | Opponent | home_team | Status | Date | Week |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 107027-86631 | CeeDee Lamb | WR | 9300 | DAL | BAL | 1 |  | 09-22-2024 | 3 |
| 1 | 107027-85671 | Justin Jefferson | WR | 9200 | MIN | HOU | 1 |  | 09-22-2024 | 3 |
| 2 | 107027-86997 | Amon-Ra St. Brown | WR | 9100 | DET | ARI | 0 |  | 09-22-2024 | 3 |
| 3 | 107027-53681 | Tyreek Hill | WR | 9000 | MIA | SEA | 0 |  | 09-22-2024 | 3 |
| 4 | 107027-63115 | Lamar Jackson | QB | 8800 | BAL | DAL | 0 |  | 09-22-2024 | 3 |


In [38]:
dk_df.head()

|   | ID | Name | Position | Salary | Team | Opponent | home_team | Status | Date | Week |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35940249 | CeeDee Lamb | WR | 8800 | DAL | BAL | 1.0 |  | 09-22-2024 | 3 |
| 1 | 35940251 | Justin Jefferson | WR | 8600 | MIN | HOU | 1.0 |  | 09-22-2024 | 3 |
| 2 | 35939981 | Christian McCaffrey | RB | 8500 | SF | LAR | 0.0 | IR | 09-22-2024 | 3 |
| 3 | 35940253 | Tyreek Hill | WR | 8400 | MIA | SEA | 0.0 |  | 09-22-2024 | 3 |
| 4 | 35940255 | Amon-Ra St. Brown | WR | 8200 | DET | ARI | 0.0 |  | 09-22-2024 | 3 |


In [39]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 587 entries, 0 to 586
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   ID         587 non-null    int64  
 1   Name       587 non-null    object 
 2   Position   587 non-null    object 
 3   Salary     587 non-null    int64  
 4   Team       587 non-null    object 
 5   Opponent   587 non-null    object 
 6   home_team  587 non-null    float64
 7   Status     80 non-null     object 
 8   Date       587 non-null    object 
 9   Week       587 non-null    int64  
dtypes: float64(1), int64(3), object(6)
memory usage: 46.0+ KB


In [40]:
fd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 880 entries, 0 to 879
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         880 non-null    object
 1   Name       880 non-null    object
 2   Position   880 non-null    object
 3   Salary     880 non-null    int64 
 4   Team       880 non-null    object
 5   Opponent   880 non-null    object
 6   home_team  880 non-null    int32 
 7   Status     147 non-null    object
 8   Date       880 non-null    object
 9   Week       880 non-null    int64 
dtypes: int32(1), int64(2), object(7)
memory usage: 65.4+ KB


In [41]:
dk_df[dk_df[['Opponent', 'home_team']].isnull().any(axis = 1)]

Unnamed: 0,ID,Name,Position,Salary,Team,Opponent,home_team,Status,Date,Week
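Checking for an empty result by eye works interactively, but an explicit check fails loudly when the notebook is rerun later in the week. A minimal sketch; `check_slate` and the choice of required columns are our own assumptions, not part of the original pipeline.

```python
import pandas as pd

def check_slate(df: pd.DataFrame,
                required=("ID", "Name", "Team", "Opponent", "home_team")) -> pd.DataFrame:
    """Raise if required columns contain nulls or player IDs repeat; return df unchanged."""
    cols = list(required)
    missing = df[cols].isna().sum()
    missing = missing[missing > 0]
    if not missing.empty:
        raise ValueError(f"Null values found: {missing.to_dict()}")
    if df["ID"].duplicated().any():
        raise ValueError("Duplicate player IDs found")
    return df
```

In the notebook this would be called as `check_slate(dk_df)` at this point in the pipeline.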


In [42]:
#Drops all free agents since they can't be rostered
#.copy() keeps later column assignments from triggering SettingWithCopyWarning
dk_df = dk_df[dk_df['Team'] != 'FA'].copy()
fd_df = fd_df[fd_df['Team'] != 'FA'].copy()

In [43]:
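#Fill missing Status values with 'Active': no injury designation is taken to mean the player is active.
#Assigning the result back avoids chained fillna(inplace=True), which may not update the DataFrame
dk_df['Status'] = dk_df['Status'].fillna('Active')
fd_df['Status'] = fd_df['Status'].fillna('Active')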

In [44]:
dk_df['Position'] = dk_df['Position'].replace({'DST':'D'})
dk_df['home_team'] = dk_df['home_team'].astype('int')
dk_df.reset_index(drop = True, inplace = True)

In [45]:
# Convert defense names (team nicknames such as 'Vikings') to abbreviations in the Name column for FD and DK
fd_df['Name'] = fd_df['Name'].apply(convert_team_names)

In [46]:
dk_df['Name'] = dk_df['Name'].apply(convert_team_names)

In [47]:
fd_df.rename(columns = {'Name': 'name', 'Position': 'position', 'Salary': 'salary', 'Team': 'team', 'Opponent': 'opponent',\
                       'Status': 'status', 'Date': 'date', 'Week': 'week'}, inplace = True)

In [48]:
dk_df.rename(columns = {'Name': 'name', 'Position': 'position', 'Salary': 'salary', 'Team': 'team', 'Opponent': 'opponent',\
                       'Status': 'status', 'Date': 'date', 'Week': 'week'}, inplace = True)
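The two `rename` calls spell out every column. Since the targets are just the lowercase originals with `ID` left alone (and `home_team` already lowercase), a callable can do the same for both frames. A sketch; `lowercase_columns` is a hypothetical helper.

```python
import pandas as pd

def lowercase_columns(df: pd.DataFrame, keep=("ID",)) -> pd.DataFrame:
    """Lowercase every column name except those listed in keep."""
    return df.rename(columns=lambda c: c if c in keep else c.lower())

cols = ["ID", "Name", "Position", "Salary", "Team", "Opponent",
        "home_team", "Status", "Date", "Week"]
example = pd.DataFrame(columns=cols)  # empty frame with the notebook's column names
print(list(lowercase_columns(example).columns))
```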

In [49]:
fd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 880 entries, 0 to 879
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         880 non-null    object
 1   name       880 non-null    object
 2   position   880 non-null    object
 3   salary     880 non-null    int64 
 4   team       880 non-null    object
 5   opponent   880 non-null    object
 6   home_team  880 non-null    int32 
 7   status     880 non-null    object
 8   date       880 non-null    object
 9   week       880 non-null    int64 
dtypes: int32(1), int64(2), object(7)
memory usage: 65.4+ KB


In [50]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 587 entries, 0 to 586
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         587 non-null    int64 
 1   name       587 non-null    object
 2   position   587 non-null    object
 3   salary     587 non-null    int64 
 4   team       587 non-null    object
 5   opponent   587 non-null    object
 6   home_team  587 non-null    int32 
 7   status     587 non-null    object
 8   date       587 non-null    object
 9   week       587 non-null    int64 
dtypes: int32(1), int64(3), object(6)
memory usage: 43.7+ KB


In [51]:
fd_df.head()

|   | ID | name | position | salary | team | opponent | home_team | status | date | week |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 107027-86631 | CeeDee Lamb | WR | 9300 | DAL | BAL | 1 | Active | 09-22-2024 | 3 |
| 1 | 107027-85671 | Justin Jefferson | WR | 9200 | MIN | HOU | 1 | Active | 09-22-2024 | 3 |
| 2 | 107027-86997 | Amon-Ra St. Brown | WR | 9100 | DET | ARI | 0 | Active | 09-22-2024 | 3 |
| 3 | 107027-53681 | Tyreek Hill | WR | 9000 | MIA | SEA | 0 | Active | 09-22-2024 | 3 |
| 4 | 107027-63115 | Lamar Jackson | QB | 8800 | BAL | DAL | 0 | Active | 09-22-2024 | 3 |


In [52]:
dk_df.head()

|   | ID | name | position | salary | team | opponent | home_team | status | date | week |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35940249 | CeeDee Lamb | WR | 8800 | DAL | BAL | 1 | Active | 09-22-2024 | 3 |
| 1 | 35940251 | Justin Jefferson | WR | 8600 | MIN | HOU | 1 | Active | 09-22-2024 | 3 |
| 2 | 35939981 | Christian McCaffrey | RB | 8500 | SF | LAR | 0 | IR | 09-22-2024 | 3 |
| 3 | 35940253 | Tyreek Hill | WR | 8400 | MIA | SEA | 0 | Active | 09-22-2024 | 3 |
| 4 | 35940255 | Amon-Ra St. Brown | WR | 8200 | DET | ARI | 0 | Active | 09-22-2024 | 3 |


In [53]:
dk_df.tail()

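|   | ID | name | position | salary | team | opponent | home_team | status | date | week |
|---|---|---|---|---|---|---|---|---|---|---|
| 582 | 35940805 | Miller Forristall | TE | 2500 | LAR | SF | 1 | Active | 09-22-2024 | 3 |
| 583 | 35940973 | MIN | D | 2400 | MIN | HOU | 1 | Active | 09-22-2024 | 3 |
| 584 | 35940974 | CAR | D | 2400 | CAR | LV | 0 | Active | 09-22-2024 | 3 |
| 585 | 35940975 | ARI | D | 2300 | ARI | DET | 1 | Active | 09-22-2024 | 3 |
| 586 | 35940976 | LAR | D | 2300 | LAR | SF | 1 | Active | 09-22-2024 | 3 |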


In [54]:
fd_names = set(fd_df['name'])

In [55]:
dk_names = set(dk_df['name'])

In [56]:
fd_not_in_dk = list(fd_names.difference(dk_names))

In [57]:
dk_not_in_fd = list(dk_names.difference(fd_names))

In [58]:
dk_names.difference(fd_names)

{'Evan Deckers', 'Sione Vaki', 'Zach Wood'}

In [59]:
#FanDuel appears to list more available players than DraftKings (880 vs. 587 rows here).
#FanDuel now includes Sunday night games in its main slate, but that alone doesn't account
#for all the names in the FanDuel list that aren't in the DraftKings list.
#What really matters is whether the names from both sites match up
#with the way the names are written in the nfl_data_py package
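Set differences show which names fail to match but not what they should match. As a swapped-in heuristic (not something this notebook does), the standard library's `difflib.get_close_matches` can propose candidates for manual review. A sketch; `suggest_matches`, the 0.8 cutoff, and the player names are hypothetical.

```python
from difflib import get_close_matches

def suggest_matches(unmatched, candidates, cutoff=0.8):
    """Map each unmatched name to its closest candidate, if one clears the cutoff."""
    suggestions = {}
    for name in sorted(unmatched):
        # n=1 keeps only the best-scoring candidate (SequenceMatcher ratio)
        hits = get_close_matches(name, candidates, n=1, cutoff=cutoff)
        if hits:
            suggestions[name] = hits[0]
    return suggestions

# Hypothetical names: a suffix mismatch and a name with no counterpart
dk_only = {"Marcus Smith", "Player X"}
other_site = ["Marcus Smith Jr.", "CeeDee Lamb"]
print(suggest_matches(dk_only, other_site))
```

Suggestions like these are a starting point for a manual name-mapping table, not something to merge on automatically.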

# Adding to database
This cell writes the data we've just wrangled to our SQLite database, where it will be joined with the features used to make predictions.

In [60]:
import sqlite3

# Establish the connection to the database
conn = sqlite3.connect('nfl_dfs.db')

# Define the table names for FanDuel and DraftKings, e.g. fd_table_3_24 (week 3, 2024 season)
season_suffix = str(season)[-2:]
fd_table_name = f'fd_table_{week}_{season_suffix}'
dk_table_name = f'dk_table_{week}_{season_suffix}'

# Specify data types for the FanDuel dataframe
# (keys must match the renamed, lowercase column names or they have no effect)
fd_dtype = {
    'ID': 'TEXT',
    'name': 'TEXT',
    'position': 'TEXT',
    'salary': 'INTEGER',
    'team': 'TEXT',
    'opponent': 'TEXT',
    'home_team': 'INTEGER',
    'status': 'TEXT',
    'date': 'TEXT',
    'week': 'INTEGER'
}

# Write the FanDuel dataframe to the SQLite table
fd_df.to_sql(fd_table_name, conn, if_exists='replace', index=False, dtype=fd_dtype)

# Confirm that the FanDuel data has been written
print(f"Data written to table {fd_table_name} in SQLite database nfl_dfs.db")

# Specify data types for the DraftKings dataframe
# (keys must match the renamed, lowercase column names or they have no effect)
dk_dtype = {
    'ID': 'INTEGER',
    'name': 'TEXT',
    'position': 'TEXT',
    'salary': 'INTEGER',
    'team': 'TEXT',
    'opponent': 'TEXT',
    'home_team': 'INTEGER',
    'status': 'TEXT',
    'date': 'TEXT',
    'week': 'INTEGER'
}

# Write the DraftKings dataframe to the SQLite table
dk_df.to_sql(dk_table_name, conn, if_exists='replace', index=False, dtype=dk_dtype)

# Confirm that the DraftKings data has been written
print(f"Data written to table {dk_table_name} in SQLite database nfl_dfs.db")

# Close the connection
conn.close()


Data written to table fd_table_3_24 in SQLite database nfl_dfs.db
Data written to table dk_table_3_24 in SQLite database nfl_dfs.db
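The `dtype` keys passed to `to_sql` have to match the DataFrame's column names exactly; in pandas' sqlite3 fallback, a key that matches no column appears to be skipped without an error. A sketch for confirming what actually landed in the database, reading the declared types back with SQLite's `PRAGMA table_info`. The `write_and_check` helper, the in-memory database, and the toy table name are assumptions for illustration.

```python
import sqlite3
import pandas as pd

def write_and_check(df, table, conn, dtype):
    """Write df with to_sql, then return the declared column types and row count."""
    df.to_sql(table, conn, if_exists="replace", index=False, dtype=dtype)
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
    declared = {row[1]: row[2] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    n_rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    return declared, n_rows

conn = sqlite3.connect(":memory:")  # throwaway database for the example
toy = pd.DataFrame({"ID": ["107027-86631"], "name": ["CeeDee Lamb"], "salary": [9300]})
declared, n_rows = write_and_check(
    toy, "fd_table_example", conn,
    dtype={"ID": "TEXT", "name": "TEXT", "salary": "INTEGER"},
)
print(declared, n_rows)
conn.close()
```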


In [61]:
# fd_df.to_csv('FD_' + date_string + '_prep.csv', index = False)
# dk_df.to_csv('DK_' + date_string + '_prep.csv', index = False)