# Processing FanDuel and DraftKings CSVs
This notebook, the first step in the live deployment of our model, takes the FanDuel and DraftKings player lists and processes them into a format that can be merged with the data used for the model.

This notebook can be rerun at various points during the week to update injury status, as that information is included in the FanDuel CSV and is transferred to the DraftKings data.

In [1]:
import numpy as np
import pandas as pd
#from datetime import datetime, timedelta
import os
from functions import get_current_weekday, calculate_nfl_week, get_next_sunday, get_current_year

In [2]:
import re

# Automating datestring
`get_current_weekday` and `get_next_sunday` produce the date string we need to read the CSVs: the date of the next Sunday. `calculate_nfl_week` then converts that date into the week number of the season.

In [3]:
day = get_current_weekday()

In [4]:
date_string = get_next_sunday(day)

In [5]:
date_string

'2024-10-20'

In [6]:
week = calculate_nfl_week(date_string)

In [7]:
season = get_current_year()
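
The helpers imported from `functions` are not shown in this notebook. As a rough sketch of the date logic described above, the snippet below computes the next Sunday and the week number with the standard library only. The names `next_sunday` and `nfl_week`, their signatures, and the `week1_sunday` default (the first Sunday of the 2024 regular season) are assumptions for illustration, not the contents of `functions.py`.

```python
from datetime import date, timedelta

def next_sunday(today: date) -> str:
    """Return the upcoming Sunday (or today, if today is Sunday) as YYYY-MM-DD."""
    days_ahead = (6 - today.weekday()) % 7  # Monday=0 ... Sunday=6
    return (today + timedelta(days=days_ahead)).isoformat()

def nfl_week(date_string: str, week1_sunday: date = date(2024, 9, 8)) -> int:
    """Count whole weeks since the assumed Week 1 Sunday, starting at 1."""
    game_day = date.fromisoformat(date_string)
    return (game_day - week1_sunday).days // 7 + 1

print(next_sunday(date(2024, 10, 16)))  # 2024-10-20
print(nfl_week("2024-10-20"))           # 7
```

Passing the Week 1 Sunday as a parameter keeps the sketch usable for other seasons without editing the function body.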

# Renaming CSVs
So that we can download the CSVs and move them into our directory without renaming them by hand, the code below renames the FanDuel and DraftKings player lists.

**This requires a little bit of housekeeping**, as we first need to delete the previous week's FanDuel and DraftKings CSVs before downloading the new ones. Yes, it would be easier to automate deleting the old CSVs and downloading the new ones, but we don't want to do any scraping on the FanDuel and DraftKings sites. If we do too much of it, we risk being blocked from playing.

In [8]:
# Get the list of files in the current directory
files = os.listdir()

In [9]:
# Loop through each file in the directory
for file_name in files:
    if file_name.startswith("FanDuel"):
        # Read the FanDuel CSV file
        fd_df = pd.read_csv(file_name)
        
        # Create the new filename
        new_fd_filename = f"FD_{date_string}_list.csv"
        
        # Save the DataFrame with the new filename
        fd_df.to_csv(new_fd_filename, index=False)
        
        print(f"Renamed FanDuel CSV to: {new_fd_filename}")

        # Delete the original FanDuel CSV file
        os.remove(file_name)
        print(f"Deleted original FanDuel CSV: {file_name}")
    
    elif file_name.startswith("DKSalaries"):
        # Read the DK CSV file
        dk_df = pd.read_csv(file_name)
        
        # Create the new filename
        new_dk_filename = f"DK_{date_string}_list.csv"
        
        # Save the DataFrame with the new filename
        dk_df.to_csv(new_dk_filename, index=False)
        
        print(f"Renamed DK CSV to: {new_dk_filename}")

        # Delete the original DraftKings CSV file
        os.remove(file_name)
        print(f"Deleted original DK CSV: {file_name}")

Renamed DK CSV to: DK_2024-10-20_list.csv
Deleted original DK CSV: DKSalaries (14).csv
Renamed FanDuel CSV to: FD_2024-10-20_list.csv
Deleted original FanDuel CSV: FanDuel-NFL-2024 ET-10 ET-20 ET-108151-players-list (1).csv
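
Since the cell above only changes the file name, the read-and-rewrite step could arguably be replaced with a plain rename. A minimal sketch, assuming the downloads always start with `FanDuel` or `DKSalaries`; the helper name `rename_player_lists` and the `PREFIXES` mapping are hypothetical.

```python
from pathlib import Path

# Download prefix -> short site code used in the new filename (assumed mapping)
PREFIXES = {"FanDuel": "FD", "DKSalaries": "DK"}

def rename_player_lists(directory, date_string):
    """Rename downloaded player lists in place and return {site: new filename}."""
    renamed = {}
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(directory).glob("*.csv")):
        for prefix, site in PREFIXES.items():
            if path.name.startswith(prefix):
                target = path.with_name(f"{site}_{date_string}_list.csv")
                path.replace(target)  # overwrites an existing target file
                renamed[site] = target.name
    return renamed
```

Calling `rename_player_lists(".", date_string)` would cover the current directory. Files already named `FD_...` or `DK_...` are left alone because they do not start with either download prefix.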


# Reading and processing FanDuel csv

In [10]:
fd_df = pd.read_csv('FD_' + date_string + '_list.csv')

In [11]:
fd_df.rename(columns = {'Nickname': 'Name'}, inplace = True)

In [12]:
fd_df = fd_df[['Id', 'Position', 'First Name', 'Name', 'Last Name', 'Salary', 'Game', 'Team', 'Opponent', 'Injury Indicator']].copy()

In [13]:
fd_df[['Away_Team', 'Home_Team']] = fd_df['Game'].str.split('@', expand=True)

In [14]:
fd_df.drop(columns = ['Game'], inplace = True)

In [15]:
fd_df['home_team'] = np.where(fd_df['Team'] == fd_df['Home_Team'], 1, 0)

In [16]:
fd_df.drop(columns = ['Home_Team', 'Away_Team'], inplace = True)
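
The home-team flag comes from splitting the `Game` string on `@`, where the team after the `@` is the host. A small sketch on toy rows, assuming `Game` values are formatted as `AWAY@HOME`; the `sample` frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows in the assumed "AWAY@HOME" format
sample = pd.DataFrame({
    "Team": ["MIN", "CIN"],
    "Game": ["DET@MIN", "CIN@CLE"],
})

# Split the matchup into two columns, then flag rows whose team is the host
sample[["Away_Team", "Home_Team"]] = sample["Game"].str.split("@", expand=True)
sample["home_team"] = np.where(sample["Team"] == sample["Home_Team"], 1, 0)

print(sample[["Team", "home_team"]])
```

Here `MIN` is flagged `1` and `CIN` is flagged `0`.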

In [17]:
#Replace JAC with JAX so that team abbreviations for Jaguars match
fd_df['Team'] = fd_df['Team'].replace({'JAC' : 'JAX'})
fd_df['Opponent'] = fd_df['Opponent'].replace({'JAC' : 'JAX'})

In [18]:
fd_df.head()

Unnamed: 0,Id,Position,First Name,Name,Last Name,Salary,Team,Opponent,Injury Indicator,home_team
0,108151-85671,WR,Justin,Justin Jefferson,Jefferson,9400,MIN,DET,,1
1,108151-85701,WR,Ja'Marr,Ja'Marr Chase,Chase,9300,CIN,CLE,,0
2,108151-62239,QB,Josh,Josh Allen,Allen,9200,BUF,TEN,,1
3,108151-102785,QB,Jayden,Jayden Daniels,Daniels,9100,WAS,CAR,,1
4,108151-64401,RB,Saquon,Saquon Barkley,Barkley,9000,PHI,NYG,,0


In [19]:
fd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 725 entries, 0 to 724
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Id                725 non-null    object
 1   Position          725 non-null    object
 2   First Name        725 non-null    object
 3   Name              725 non-null    object
 4   Last Name         725 non-null    object
 5   Salary            725 non-null    int64 
 6   Team              725 non-null    object
 7   Opponent          725 non-null    object
 8   Injury Indicator  134 non-null    object
 9   home_team         725 non-null    int32 
dtypes: int32(1), int64(1), object(8)
memory usage: 53.9+ KB


# Reading and processing DraftKings csv

In [20]:
dk_df = pd.read_csv('DK_' + date_string + '_list.csv')

In [21]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 498 entries, 0 to 497
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Position          498 non-null    object 
 1   Name + ID         498 non-null    object 
 2   Name              498 non-null    object 
 3   ID                498 non-null    int64  
 4   Roster Position   498 non-null    object 
 5   Salary            498 non-null    int64  
 6   Game Info         498 non-null    object 
 7   TeamAbbrev        498 non-null    object 
 8   AvgPointsPerGame  498 non-null    float64
dtypes: float64(1), int64(2), object(6)
memory usage: 35.1+ KB


In [22]:
##We'll need to match FD and DK names so we can merge FD injury designations to DK.
#Then hopefully the names we have will match the names used for the model.
#Then we'll pare down the DK columns, create a mini df for home teams and merge that with DK
#The goal of this notebook should be to have FD and DK dataframes that include:
#full name, salary, position, team, opponent, home team (binary), injury designation, id (just in case)

This dictionary and function will be used later to convert team names to abbreviation strings.

In [23]:
nfl_teams = [
    'Bills', 'Dolphins', 'Patriots', 'Jets',
    'Bengals', 'Browns', 'Ravens', 'Steelers',
    'Texans', 'Colts', 'Jaguars', 'Titans',
    'Broncos', 'Chiefs', 'Raiders', 'Chargers',
    'Cowboys', 'Giants', 'Eagles', 'Commanders',
    'Bears', 'Lions', 'Packers', 'Vikings',
    'Falcons', 'Panthers', 'Saints', 'Buccaneers',
    'Cardinals', 'Rams', '49ers', 'Seahawks'
]

In [24]:
team_abbreviations = [
    'BUF', 'MIA', 'NE', 'NYJ',
    'CIN', 'CLE', 'BAL', 'PIT',
    'HOU', 'IND', 'JAX', 'TEN',
    'DEN', 'KC', 'LV', 'LAC',
    'DAL', 'NYG', 'PHI', 'WAS',
    'CHI', 'DET', 'GB', 'MIN',
    'ATL', 'CAR', 'NO', 'TB',
    'ARI', 'LAR', 'SF', 'SEA'
]

In [25]:
# Creating the dictionary
team_abbrev = dict(zip(nfl_teams, team_abbreviations))

In [26]:
team_abbrev

{'Bills': 'BUF',
 'Dolphins': 'MIA',
 'Patriots': 'NE',
 'Jets': 'NYJ',
 'Bengals': 'CIN',
 'Browns': 'CLE',
 'Ravens': 'BAL',
 'Steelers': 'PIT',
 'Texans': 'HOU',
 'Colts': 'IND',
 'Jaguars': 'JAX',
 'Titans': 'TEN',
 'Broncos': 'DEN',
 'Chiefs': 'KC',
 'Raiders': 'LV',
 'Chargers': 'LAC',
 'Cowboys': 'DAL',
 'Giants': 'NYG',
 'Eagles': 'PHI',
 'Commanders': 'WAS',
 'Bears': 'CHI',
 'Lions': 'DET',
 'Packers': 'GB',
 'Vikings': 'MIN',
 'Falcons': 'ATL',
 'Panthers': 'CAR',
 'Saints': 'NO',
 'Buccaneers': 'TB',
 'Cardinals': 'ARI',
 'Rams': 'LAR',
 '49ers': 'SF',
 'Seahawks': 'SEA'}

In [27]:
# Function to convert team names to abbreviations, leaving other values unchanged
def convert_team_names(opponent):
    return team_abbrev.get(opponent, opponent)

In [28]:
fd_def = fd_df[fd_df['Position'] == 'D']

# Name matching
Name matching will be a chore. Let's start with defenses, where DK uses just the team name while FD uses city and team name. We'll reduce the FD names to just the team name.

The following line of code strips the city from the FD defense names.

In [29]:
fd_df['Name'] = np.where(fd_df['Position'] == 'D', fd_df['Last Name'], fd_df['Name'])
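
To make the defense handling concrete, here is a sketch on two toy rows. It assumes a FanDuel defense row looks like `Name = "Minnesota Vikings"` with `Last Name = "Vikings"`; it also applies the abbreviation lookup right away, whereas the notebook does that in a later cell.

```python
import numpy as np
import pandas as pd

# Toy rows; the FanDuel defense naming shown here is an assumption
toy = pd.DataFrame({
    "Position": ["D", "WR"],
    "Name": ["Minnesota Vikings", "Justin Jefferson"],
    "Last Name": ["Vikings", "Jefferson"],
})

# Defenses keep only the team name; every other position keeps the full name
toy["Name"] = np.where(toy["Position"] == "D", toy["Last Name"], toy["Name"])

# Team names become abbreviations, player names pass through unchanged
team_abbrev = {"Vikings": "MIN"}
toy["Name"] = toy["Name"].apply(lambda n: team_abbrev.get(n, n))

print(toy["Name"].tolist())  # ['MIN', 'Justin Jefferson']
```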

In [30]:
dk_df['Name'] = dk_df['Name'].str.strip()

# Merging columns to DK
We'll merge Opponent, home_team (binary) and Status (injury) to the DK dataframe. But first, we have to take suffixes off all names and then match the names.

In [31]:
fd_df.rename(columns = {'Injury Indicator': 'Status', 'Id': 'ID'}, inplace = True)

In [32]:
dk_df.rename(columns = {'TeamAbbrev': 'Team'}, inplace = True)

In [33]:
fd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 725 entries, 0 to 724
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   ID          725 non-null    object
 1   Position    725 non-null    object
 2   First Name  725 non-null    object
 3   Name        725 non-null    object
 4   Last Name   725 non-null    object
 5   Salary      725 non-null    int64 
 6   Team        725 non-null    object
 7   Opponent    725 non-null    object
 8   Status      134 non-null    object
 9   home_team   725 non-null    int32 
dtypes: int32(1), int64(1), object(8)
memory usage: 53.9+ KB


In [34]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 498 entries, 0 to 497
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Position          498 non-null    object 
 1   Name + ID         498 non-null    object 
 2   Name              498 non-null    object 
 3   ID                498 non-null    int64  
 4   Roster Position   498 non-null    object 
 5   Salary            498 non-null    int64  
 6   Game Info         498 non-null    object 
 7   Team              498 non-null    object 
 8   AvgPointsPerGame  498 non-null    float64
dtypes: float64(1), int64(2), object(6)
memory usage: 35.1+ KB


In [35]:
def clean_name(name):
    # Drop commas and periods, so "C.J." becomes "CJ" and "Jr." becomes "Jr"
    name = re.sub(r'[,.]', '', name)
    
    # Remove common suffixes (Sr, Jr, III, II, IV) only when they are whole words (case-sensitive)
    cleaned_name = re.sub(r'\b(Sr|Jr|III|II|IV)\b', '', name).strip()
    
    return cleaned_name

In [36]:
fd_df['Name'] = fd_df['Name'].apply(clean_name)
dk_df['Name'] = dk_df['Name'].apply(clean_name)

In [37]:
from rapidfuzz import process, fuzz

In [38]:
SIMILARITY_THRESHOLD = 80

In [39]:
def fuzzy_match(name, dk_names):
    match, score, _ = process.extractOne(name, dk_names, scorer=fuzz.token_sort_ratio)
    return match if score >= SIMILARITY_THRESHOLD else None

In [40]:
dk_names = dk_df['Name']

In [41]:
len(dk_names)

498

For each FanDuel name, this looks up the closest DraftKings name with a matching score of at least 80 and stores it in a new `matched_name` column on the FanDuel dataframe.

In [42]:
fd_df['matched_name'] = fd_df['Name'].apply(lambda x: fuzzy_match(x, dk_names))

Now we filter the dataframe for FanDuel names that don't have DraftKings matches.

In [43]:
unmatched_in_fd = fd_df[fd_df['matched_name'].isna()]

We filter out the Sunday night teams (NYJ and PIT this week), since DraftKings does not include the Sunday night game in the main slate.

In [44]:
unmatched_in_fd = unmatched_in_fd[~((unmatched_in_fd['Team'] == 'NYJ') | (unmatched_in_fd['Team'] == 'PIT'))]

In [45]:
unmatched_in_fd.reset_index(inplace = True, drop = True)

In [46]:
unmatched_in_fd.sort_values(by = ['Salary'], ascending = False)

Unnamed: 0,ID,Position,First Name,Name,Last Name,Salary,Team,Opponent,Status,home_team,matched_name
0,108151-89147,QB,John,John Paddock,Paddock,6000,ATL,SEA,,1,
10,108151-52929,QB,Jake,Jake Luton,Luton,6000,CAR,WAS,Q,0,
1,108151-114763,QB,Gavin,Gavin Hardison,Hardison,6000,MIA,IND,,0,
18,108151-63469,QB,Ben,Ben DiNucci,DiNucci,6000,BUF,TEN,,1,
16,108151-56286,QB,Trace,Trace McSorley,McSorley,6000,WAS,CAR,,1,
...,...,...,...,...,...,...,...,...,...,...,...
56,108151-102735,TE,Trey,Trey Knox,Knox,4000,MIN,DET,Q,1,
55,108151-90408,WR,Dax,Dax Milne,Milne,4000,LV,LAR,,0,
54,108151-131328,RB,Zach,Zach Evans,Evans,4000,LAR,LV,,1,
53,108151-61693,TE,Danny,Danny Pinter,Pinter,4000,IND,MIA,,1,


Since 4000 is the min FLEX salary on FanDuel and 6000 is the min QB salary, most unmatched names with those salaries are probably players who wouldn't be rostered anyway.

In [47]:
unmatched_in_fd['Salary'].value_counts()

Salary
4000    133
6000     19
Name: count, dtype: int64

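Now we look at names that got a fuzzy match scoring 80 or more without being identical, to see whether each pair is the same player. In this slate every pair is two different players, so it matters that the merge below joins on the exact `Name` and `Team` rather than on `matched_name`.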

In [48]:
fd_df[(fd_df['Name'] != fd_df['matched_name']) & (fd_df['matched_name'].notna())]

Unnamed: 0,ID,Position,First Name,Name,Last Name,Salary,Team,Opponent,Status,home_team,matched_name
146,108151-158412,RB,Braelon,Braelon Allen,Allen,6000,NYJ,PIT,,0,Brandon Allen
200,108151-39829,WR,Mike,Mike Williams,Williams,5400,NYJ,PIT,,0,Malik Willis
278,108151-133477,TE,Darnell,Darnell Washington,Washington,4500,PIT,NYJ,,1,Montrell Washington
468,108151-90480,TE,Joel,Joel Wilson,Wilson,4000,NYG,PHI,,1,Jeff Wilson
505,108151-55748,WR,James,James Washington,Washington,4000,ATL,SEA,,1,Casey Washington
516,108151-88547,WR,Jalen,Jalen Wayne,Wayne,4000,GB,HOU,,1,Jared Wayne
620,108151-26355,TE,Logan,Logan Thomas,Thomas,4000,SF,KC,,1,Ian Thomas
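
One way to screen fuzzy matches like these automatically is to check whether both sites list the player on the same team. This is a sketch of that idea on toy rows, not something the notebook does; the `dk_team` and `same_team` column names are hypothetical.

```python
import pandas as pd

# Toy rows: one genuine match and one false positive from the table above
fd = pd.DataFrame({
    "Name": ["Justin Jefferson", "Braelon Allen"],
    "Team": ["MIN", "NYJ"],
    "matched_name": ["Justin Jefferson", "Brandon Allen"],
})
dk = pd.DataFrame({
    "Name": ["Justin Jefferson", "Brandon Allen"],
    "Team": ["MIN", "SF"],
})

# Look up the DK team of each matched name (first row wins if a name repeats)
dk_team = dk.drop_duplicates("Name").set_index("Name")["Team"]
fd["dk_team"] = fd["matched_name"].map(dk_team)

# A fuzzy match is only trusted when both sites agree on the team
fd["same_team"] = fd["Team"] == fd["dk_team"]
print(fd[["Name", "matched_name", "same_team"]])
```

A recently traded player could fail this check even when the match is correct, so flagged rows would still need a manual look.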


In [49]:
fd_merge = fd_df[['Name', 'Team', 'Opponent', 'home_team', 'Status']]

In [50]:
dk_df = pd.merge(dk_df, fd_merge, on = ['Name', 'Team'], how = 'left')
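
To see how many DraftKings rows actually picked up FanDuel data, the same left merge can be run with pandas' `indicator` option. A minimal sketch on toy frames; the rows are made up, and `validate="many_to_one"` is an extra guard that assumes each `Name`/`Team` pair appears only once on the FanDuel side.

```python
import pandas as pd

dk = pd.DataFrame({"Name": ["Justin Jefferson", "Kyren Williams"],
                   "Team": ["MIN", "LAR"]})
fd = pd.DataFrame({"Name": ["Justin Jefferson"], "Team": ["MIN"],
                   "Opponent": ["DET"], "home_team": [1], "Status": [None]})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left merge
merged = dk.merge(fd, on=["Name", "Team"], how="left",
                  indicator=True, validate="many_to_one")

# DK players that found no FanDuel row and will need the team-level fill
unmatched = merged.loc[merged["_merge"] == "left_only", "Name"].tolist()
print(unmatched)  # ['Kyren Williams']
```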

Now we still have to fill missing Opponent and home_team values in DK for the players not in the FD list.

In [51]:
team_dict = {team: (group['Opponent'].values[0], group['home_team'].values[0]) 
             for team, group in fd_df.groupby('Team')}

In [52]:
dk_df['Opponent_filled'] = dk_df['Team'].map(lambda x: team_dict.get(x, (None, None))[0])
dk_df['home_team_filled'] = dk_df['Team'].map(lambda x: team_dict.get(x, (None, None))[1])

# Fill missing values in the original columns with the new columns
dk_df['Opponent'] = dk_df['Opponent'].fillna(dk_df['Opponent_filled'])
dk_df['home_team'] = dk_df['home_team'].fillna(dk_df['home_team_filled'])

# Drop the temporary filled columns
dk_df.drop(columns=['Opponent_filled', 'home_team_filled'], inplace=True)
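
The fill step works because every player on a team shares the same opponent and home flag, so one FanDuel row per team is enough. A compact sketch of the same idea on toy frames, using `groupby(...).first()` in place of the dictionary of tuples; the frames are made up for illustration.

```python
import pandas as pd

fd = pd.DataFrame({"Team": ["LAR", "LAR", "LV"],
                   "Opponent": ["LV", "LV", "LAR"],
                   "home_team": [1, 1, 0]})
dk = pd.DataFrame({"Team": ["LAR", "LV"],
                   "Opponent": [None, "LAR"],
                   "home_team": [None, 0.0]})

# One row per team holding that team's opponent and home flag
per_team = fd.groupby("Team")[["Opponent", "home_team"]].first()

# Fill only the gaps; values that came through the merge are left untouched
dk["Opponent"] = dk["Opponent"].fillna(dk["Team"].map(per_team["Opponent"]))
dk["home_team"] = dk["home_team"].fillna(dk["Team"].map(per_team["home_team"]))
print(dk)
```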

In [53]:
dk_df = dk_df[['ID', 'Name', 'Position', 'Salary', 'Team', 'Opponent', 'home_team', 'Status']].copy()
fd_df = fd_df[['ID', 'Name', 'Position', 'Salary', 'Team', 'Opponent', 'home_team', 'Status']].copy()

In [54]:
#Adding date and week indicator columns
fd_df['Date'] = pd.to_datetime(date_string).strftime('%m-%d-%Y')
dk_df['Date'] = pd.to_datetime(date_string).strftime('%m-%d-%Y')
fd_df['Week'] = week
dk_df['Week'] = week

In [55]:
fd_df.head()

Unnamed: 0,ID,Name,Position,Salary,Team,Opponent,home_team,Status,Date,Week
0,108151-85671,Justin Jefferson,WR,9400,MIN,DET,1,,10-20-2024,7
1,108151-85701,Ja'Marr Chase,WR,9300,CIN,CLE,0,,10-20-2024,7
2,108151-62239,Josh Allen,QB,9200,BUF,TEN,1,,10-20-2024,7
3,108151-102785,Jayden Daniels,QB,9100,WAS,CAR,1,,10-20-2024,7
4,108151-64401,Saquon Barkley,RB,9000,PHI,NYG,0,,10-20-2024,7


In [56]:
dk_df.head()

Unnamed: 0,ID,Name,Position,Salary,Team,Opponent,home_team,Status,Date,Week
0,36291011,Justin Jefferson,WR,8500,MIN,DET,1.0,,10-20-2024,7
1,36291013,Ja'Marr Chase,WR,8400,CIN,CLE,0.0,,10-20-2024,7
2,36291015,Amon-Ra St Brown,WR,8300,DET,MIN,0.0,,10-20-2024,7
3,36290791,Saquon Barkley,RB,8200,PHI,NYG,0.0,,10-20-2024,7
4,36290793,Kyren Williams,RB,8100,LAR,LV,1.0,,10-20-2024,7


In [57]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 498 entries, 0 to 497
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   ID         498 non-null    int64  
 1   Name       498 non-null    object 
 2   Position   498 non-null    object 
 3   Salary     498 non-null    int64  
 4   Team       498 non-null    object 
 5   Opponent   498 non-null    object 
 6   home_team  498 non-null    float64
 7   Status     86 non-null     object 
 8   Date       498 non-null    object 
 9   Week       498 non-null    int64  
dtypes: float64(1), int64(3), object(6)
memory usage: 39.0+ KB


In [58]:
fd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 725 entries, 0 to 724
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         725 non-null    object
 1   Name       725 non-null    object
 2   Position   725 non-null    object
 3   Salary     725 non-null    int64 
 4   Team       725 non-null    object
 5   Opponent   725 non-null    object
 6   home_team  725 non-null    int32 
 7   Status     134 non-null    object
 8   Date       725 non-null    object
 9   Week       725 non-null    int64 
dtypes: int32(1), int64(2), object(7)
memory usage: 53.9+ KB


In [59]:
dk_df[dk_df[['Opponent', 'home_team']].isnull().any(axis = 1)]

Unnamed: 0,ID,Name,Position,Salary,Team,Opponent,home_team,Status,Date,Week


In [60]:
#Drops all free agents since they can't be rostered
dk_df = dk_df[dk_df['Team'] != 'FA']
fd_df = fd_df[fd_df['Team'] != 'FA']

In [61]:
#Filling Status missing values with Active, which we'll assume since there is no injury designation.
dk_df['Status'] = dk_df['Status'].fillna('Active')
fd_df['Status'] = fd_df['Status'].fillna('Active')

In [62]:
dk_df['Position'] = dk_df['Position'].replace({'DST':'D'})
dk_df['home_team'] = dk_df['home_team'].astype('int')
dk_df.reset_index(drop = True, inplace = True)

In [63]:
# Converting defense team names in the Name column to abbreviations for FD and DK
fd_df['Name'] = fd_df['Name'].apply(convert_team_names)

In [64]:
dk_df['Name'] = dk_df['Name'].apply(convert_team_names)

In [65]:
fd_df.rename(columns = {'Name': 'name', 'Position': 'position', 'Salary': 'salary', 'Team': 'team', 'Opponent': 'opponent',\
                       'Status': 'status', 'Date': 'date', 'Week': 'week'}, inplace = True)

In [66]:
dk_df.rename(columns = {'Name': 'name', 'Position': 'position', 'Salary': 'salary', 'Team': 'team', 'Opponent': 'opponent',\
                       'Status': 'status', 'Date': 'date', 'Week': 'week'}, inplace = True)

In [67]:
fd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 725 entries, 0 to 724
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         725 non-null    object
 1   name       725 non-null    object
 2   position   725 non-null    object
 3   salary     725 non-null    int64 
 4   team       725 non-null    object
 5   opponent   725 non-null    object
 6   home_team  725 non-null    int32 
 7   status     725 non-null    object
 8   date       725 non-null    object
 9   week       725 non-null    int64 
dtypes: int32(1), int64(2), object(7)
memory usage: 53.9+ KB


In [68]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 498 entries, 0 to 497
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         498 non-null    int64 
 1   name       498 non-null    object
 2   position   498 non-null    object
 3   salary     498 non-null    int64 
 4   team       498 non-null    object
 5   opponent   498 non-null    object
 6   home_team  498 non-null    int32 
 7   status     498 non-null    object
 8   date       498 non-null    object
 9   week       498 non-null    int64 
dtypes: int32(1), int64(3), object(6)
memory usage: 37.1+ KB


In [69]:
fd_df.head()

Unnamed: 0,ID,name,position,salary,team,opponent,home_team,status,date,week
0,108151-85671,Justin Jefferson,WR,9400,MIN,DET,1,Active,10-20-2024,7
1,108151-85701,Ja'Marr Chase,WR,9300,CIN,CLE,0,Active,10-20-2024,7
2,108151-62239,Josh Allen,QB,9200,BUF,TEN,1,Active,10-20-2024,7
3,108151-102785,Jayden Daniels,QB,9100,WAS,CAR,1,Active,10-20-2024,7
4,108151-64401,Saquon Barkley,RB,9000,PHI,NYG,0,Active,10-20-2024,7


In [70]:
fd_df[fd_df['name'] == 'CJ Stroud']

Unnamed: 0,ID,name,position,salary,team,opponent,home_team,status,date,week
13,108151-129471,CJ Stroud,QB,8400,HOU,GB,0,Active,10-20-2024,7


In [71]:
fd_df[fd_df['name'].isin(['Davante Adams', 'Amari Cooper', 'Cam Akers'])]

Unnamed: 0,ID,name,position,salary,team,opponent,home_team,status,date,week
60,108151-45889,Davante Adams,WR,6900,NYJ,PIT,0,Q,10-20-2024,7
87,108151-31001,Amari Cooper,WR,6100,BUF,TEN,1,Active,10-20-2024,7
223,108151-89230,Cam Akers,RB,5000,MIN,DET,1,Active,10-20-2024,7


In [72]:
dk_df[dk_df['name'].isin(['Davante Adams', 'Amari Cooper', 'Cam Akers'])]

Unnamed: 0,ID,name,position,salary,team,opponent,home_team,status,date,week
22,36291031,Davante Adams,WR,7100,LV,LAR,0,Active,10-20-2024,7
69,36291065,Amari Cooper,WR,5800,BUF,TEN,1,Active,10-20-2024,7
144,36290891,Cam Akers,RB,4500,HOU,GB,0,Active,10-20-2024,7


In [73]:
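#One-offs for traded players: DK still lists Davante Adams with LV, while FD has him on NYJ,
#whose Sunday night game is not in the DK main slate, so we drop him from DK.
dk_df = dk_df[dk_df['name'] != 'Davante Adams']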

In [74]:
dk_df.loc[dk_df['name'] == 'Cam Akers', 'team'] = 'MIN'
dk_df.loc[dk_df['name'] == 'Cam Akers', 'opponent'] = 'DET'
dk_df.loc[dk_df['name'] == 'Cam Akers', 'home_team'] = 1
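
If these one-off corrections pile up over a season, they could be kept in a single lookup and applied in a loop. A sketch under that assumption; the `overrides` dictionary and the `apply_overrides` helper are hypothetical and not part of the notebook.

```python
import pandas as pd

# Hypothetical override table: player name -> corrected column values
overrides = {
    "Cam Akers": {"team": "MIN", "opponent": "DET", "home_team": 1},
}

def apply_overrides(df, overrides):
    """Return a copy of df with each listed player's columns overwritten."""
    df = df.copy()
    for name, values in overrides.items():
        mask = df["name"] == name
        for column, value in values.items():
            df.loc[mask, column] = value
    return df

# Toy DK rows before the correction
dk = pd.DataFrame({"name": ["Cam Akers", "CJ Stroud"],
                   "team": ["HOU", "HOU"],
                   "opponent": ["GB", "GB"],
                   "home_team": [0, 0]})
fixed = apply_overrides(dk, overrides)
print(fixed)
```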

In [75]:
dk_df[dk_df['position'] == 'QB']

Unnamed: 0,ID,name,position,salary,team,opponent,home_team,status,date,week
9,36290724,Jayden Daniels,QB,7600,WAS,CAR,1,Active,10-20-2024,7
11,36290725,Josh Allen,QB,7500,BUF,TEN,1,Active,10-20-2024,7
17,36290726,Jalen Hurts,QB,7300,PHI,NYG,0,Active,10-20-2024,7
21,36290727,CJ Stroud,QB,7100,HOU,GB,0,Active,10-20-2024,7
24,36290728,Patrick Mahomes,QB,7000,KC,SF,0,Active,10-20-2024,7
...,...,...,...,...,...,...,...,...,...,...
229,36290770,Sam Hartman,QB,4000,WAS,CAR,1,Active,10-20-2024,7
236,36290763,Joshua Dobbs,QB,4000,SF,KC,1,Active,10-20-2024,7
237,36290765,Brandon Allen,QB,4000,SF,KC,1,Active,10-20-2024,7
238,36290766,Tanner Mordecai,QB,4000,SF,KC,1,Active,10-20-2024,7


In [76]:
dk_df.tail()

Unnamed: 0,ID,name,position,salary,team,opponent,home_team,status,date,week
493,36291481,Jody Fortson,TE,2500,KC,SF,0,Active,10-20-2024,7
494,36291483,Peyton Hendershot,TE,2500,KC,SF,0,Active,10-20-2024,7
495,36291485,James Winchester,TE,2500,KC,SF,0,Active,10-20-2024,7
496,36291631,CAR,D,2400,CAR,WAS,0,Active,10-20-2024,7
497,36291632,TEN,D,2300,TEN,BUF,0,Active,10-20-2024,7


In [77]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 497 entries, 0 to 497
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         497 non-null    int64 
 1   name       497 non-null    object
 2   position   497 non-null    object
 3   salary     497 non-null    int64 
 4   team       497 non-null    object
 5   opponent   497 non-null    object
 6   home_team  497 non-null    int32 
 7   status     497 non-null    object
 8   date       497 non-null    object
 9   week       497 non-null    int64 
dtypes: int32(1), int64(3), object(6)
memory usage: 40.8+ KB


In [78]:
# fd_names = set(list(fd_df['name']))

In [79]:
# dk_names = set(list(dk_df['name']))

In [80]:
# fd_not_in_dk = list(fd_names.difference(dk_names))

In [81]:
# dk_not_in_fd = list(dk_names.difference(fd_names))

In [82]:
# dk_names.difference(fd_names)

In [83]:
# fd_names.difference(dk_names)

In [84]:
#It seems as though FanDuel just naturally lists more available players than DraftKings.
#FanDuel now includes Sunday night games in its main slate, but that alone doesn't account
#for all the names in the FanDuel list that aren't in the DraftKings list.
#What really matters is whether these names for both sites match up 
#with the way the names are written in the nfl_data_py package

# Adding to database
This will put the data we've just wrangled into our database, and it will be joined with the features used to make predictions.

In [85]:
import sqlite3

# Establish the connection to the database
conn = sqlite3.connect('nfl_dfs.db')

# Define the table names for FanDuel and DraftKings
fd_table_name = 'fd_table_' + str(week) + '_24'
dk_table_name = 'dk_table_' + str(week) + '_24'

# Specify data types for the FanDuel dataframe (keys must match the renamed, lowercase columns)
fd_dtype = {
    'ID': 'TEXT',
    'name': 'TEXT',
    'position': 'TEXT',
    'salary': 'INTEGER',
    'team': 'TEXT',
    'opponent': 'TEXT',
    'home_team': 'INTEGER',
    'status': 'TEXT',
    'date': 'TEXT',
    'week': 'INTEGER'
}

# Write the FanDuel dataframe to the SQLite table
fd_df.to_sql(fd_table_name, conn, if_exists='replace', index=False, dtype=fd_dtype)

# Confirm that the FanDuel data has been written
print(f"Data written to table {fd_table_name} in SQLite database nfl_dfs.db")

# Specify data types for the DraftKings dataframe (keys must match the renamed, lowercase columns)
dk_dtype = {
    'ID': 'INTEGER',
    'name': 'TEXT',
    'position': 'TEXT',
    'salary': 'INTEGER',
    'team': 'TEXT',
    'opponent': 'TEXT',
    'home_team': 'INTEGER',
    'status': 'TEXT',
    'date': 'TEXT',
    'week': 'INTEGER'
}

# Write the DraftKings dataframe to the SQLite table
dk_df.to_sql(dk_table_name, conn, if_exists='replace', index=False, dtype=dk_dtype)

# Confirm that the DraftKings data has been written
print(f"Data written to table {dk_table_name} in SQLite database nfl_dfs.db")

# Close the connection
conn.close()


Data written to table fd_table_7_24 in SQLite database nfl_dfs.db
Data written to table dk_table_7_24 in SQLite database nfl_dfs.db
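
A quick read-back is one way to confirm that a table landed with the expected columns and rows. The sketch below writes a one-row toy frame to an in-memory SQLite database (a stand-in for `nfl_dfs.db`) and queries it again; the toy row and the reduced column set are for illustration only.

```python
import sqlite3
import pandas as pd

toy = pd.DataFrame({"ID": ["108151-85671"], "name": ["Justin Jefferson"],
                    "salary": [9400], "week": [7]})

conn = sqlite3.connect(":memory:")  # in-memory stand-in for nfl_dfs.db
try:
    toy.to_sql("fd_table_7_24", conn, if_exists="replace", index=False,
               dtype={"ID": "TEXT", "name": "TEXT",
                      "salary": "INTEGER", "week": "INTEGER"})
    # Read the rows back, then inspect the schema SQLite recorded
    check = pd.read_sql("SELECT name, salary FROM fd_table_7_24 WHERE week = 7", conn)
    schema = pd.read_sql("PRAGMA table_info('fd_table_7_24')", conn)
finally:
    conn.close()  # close the connection even if a statement fails

print(check)
print(schema[["name", "type"]])
```

Wrapping the work in `try`/`finally` keeps the connection from being left open if a write fails partway through.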


In [86]:
# fd_df.to_csv('FD_' + date_string + '_prep.csv', index = False)
# dk_df.to_csv('DK_' + date_string + '_prep.csv', index = False)