# Processing FanDuel and DraftKings CSVs
This notebook, the first step in the live deployment of our model, takes the player lists and processes them into a format that can be merged with the data that will be used for the model.<br>

This notebook can be rerun at various points during the week to update injury status, since that information is included in the FanDuel CSV and is transferred to the DraftKings CSV.

In [79]:
import numpy as np
import pandas as pd
#from datetime import datetime, timedelta
import os
from functions import get_current_weekday, calculate_nfl_week, get_next_sunday, get_current_year

In [80]:
import re

# Automating datestring
`get_next_sunday` returns the date of the upcoming Sunday as a string, which is the date string we need to read the CSVs. `calculate_nfl_week` then uses that date to get the week number of the season.

In [81]:
day = get_current_weekday()

In [82]:
date_string = get_next_sunday(day)

In [83]:
date_string

'2024-10-06'

In [84]:
week = calculate_nfl_week(date_string)

In [85]:
season = get_current_year()
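
The helpers imported from `functions` are not shown in this notebook. As a rough sketch of the date logic described above, the block below computes the upcoming Sunday and a week number with only the standard library. The names `next_sunday` and `nfl_week`, the choice to return today's date when today is already a Sunday, and the Week 1 Sunday of 2024-09-08 are assumptions for illustration, not the actual implementation.

```python
from datetime import date, timedelta

def next_sunday(today: date) -> str:
    """Return the upcoming Sunday (or today, if today is Sunday) as YYYY-MM-DD."""
    # date.weekday(): Monday is 0 and Sunday is 6
    days_ahead = (6 - today.weekday()) % 7
    return (today + timedelta(days=days_ahead)).isoformat()

def nfl_week(date_string: str, week1_sunday: date = date(2024, 9, 8)) -> int:
    """Return the week number, counting whole weeks from an assumed Week 1 Sunday."""
    game_day = date.fromisoformat(date_string)
    return (game_day - week1_sunday).days // 7 + 1

print(next_sunday(date(2024, 10, 2)))  # a Wednesday
print(nfl_week("2024-10-06"))
```

Under these assumptions, a date string of `2024-10-06` maps to week 5, which agrees with the week value shown later in this notebook.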

# Renaming CSVs
So that we can just download the CSVs and move them into our directory without renaming them, the code below will rename the FanDuel and DraftKings player lists.<br>

**This requires a little bit of housekeeping** as we first need to delete the previous week's FanDuel and DraftKings CSVs before downloading the new ones. Yes, it would be easier to automate the process of deleting the CSVs and then downloading the new ones, but we don't want to do any scraping on the FanDuel and DraftKings sites. If we do too much of it, we're blocked from playing.

In [86]:
# Get the list of files in the current directory
files = os.listdir()

In [87]:
# Loop through each file in the directory
for file_name in files:
    if file_name.startswith("FanDuel"):
        # Read the FanDuel CSV file
        fd_df = pd.read_csv(file_name)
        
        # Create the new filename
        new_fd_filename = f"FD_{date_string}_list.csv"
        
        # Save the DataFrame with the new filename
        fd_df.to_csv(new_fd_filename, index=False)
        
        print(f"Renamed FanDuel CSV to: {new_fd_filename}")

        # Delete the original FanDuel CSV file
        os.remove(file_name)
        print(f"Deleted original FanDuel CSV: {file_name}")
    
    elif file_name.startswith("DKSalaries"):
        # Read the DK CSV file
        dk_df = pd.read_csv(file_name)
        
        # Create the new filename
        new_dk_filename = f"DK_{date_string}_list.csv"
        
        # Save the DataFrame with the new filename
        dk_df.to_csv(new_dk_filename, index=False)
        
        print(f"Renamed DK CSV to: {new_dk_filename}")

        # Delete the original DraftKings CSV file
        os.remove(file_name)
        print(f"Deleted original DK CSV: {file_name}")

Renamed DK CSV to: DK_2024-10-06_list.csv
Deleted original DK CSV: DKSalaries (14).csv
Renamed FanDuel CSV to: FD_2024-10-06_list.csv
Deleted original FanDuel CSV: FanDuel-NFL-2024 ET-10 ET-06 ET-107566-players-list.csv
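
Since the file contents are not changed at this step, the files could also be renamed in place rather than read and rewritten. The following is a minimal sketch of that alternative using `pathlib`. The function name `rename_player_lists` and the `PREFIX_TO_SITE` mapping are hypothetical, and the sketch assumes at most one download per site is present in the directory.

```python
from pathlib import Path

# Download prefix -> short site code used in the renamed file (hypothetical mapping)
PREFIX_TO_SITE = {"FanDuel": "FD", "DKSalaries": "DK"}

def rename_player_lists(directory, date_string):
    """Rename downloaded player lists in place; return {old_name: new_name}."""
    renamed = {}
    # sorted() materializes the listing so renames do not disturb the iteration
    for path in sorted(Path(directory).glob("*.csv")):
        for prefix, site in PREFIX_TO_SITE.items():
            if path.name.startswith(prefix):
                target = path.with_name(f"{site}_{date_string}_list.csv")
                # replace() overwrites an existing target file of the same name
                path.replace(target)
                renamed[path.name] = target.name
    return renamed
```

Calling `rename_player_lists(".", date_string)` would cover the same two files as the loop above without loading them into pandas.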


# Reading and processing FanDuel csv

In [88]:
fd_df = pd.read_csv('FD_' + date_string + '_list.csv')

In [89]:
fd_df.rename(columns = {'Nickname': 'Name'}, inplace = True)

In [90]:
fd_df = fd_df[['Id', 'Position', 'First Name', 'Name', 'Last Name', 'Salary', 'Game', 'Team', 'Opponent', 'Injury Indicator']].copy()

In [91]:
fd_df[['Away_Team', 'Home_Team']] = fd_df['Game'].str.split('@', expand=True)

In [92]:
fd_df.drop(columns = ['Game'], inplace = True)

In [93]:
fd_df['home_team'] = np.where(fd_df['Team'] == fd_df['Home_Team'], 1, 0)

In [94]:
fd_df.drop(columns = ['Home_Team', 'Away_Team'], inplace = True)
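
The home-team flag above relies on FanDuel's `Game` column listing the away team first and the home team second, separated by `@`. A plain-Python sketch of the same rule for a single row follows; the function name `is_home` is hypothetical.

```python
def is_home(game: str, team: str) -> int:
    """Return 1 if `team` is the home side of an 'AWAY@HOME' string, else 0."""
    away, home = game.split("@")
    return int(team == home)

print(is_home("DAL@PIT", "PIT"))  # 1: Pittsburgh is listed after the @
print(is_home("DAL@PIT", "DAL"))  # 0: Dallas is listed before the @
```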

In [95]:
#Replace JAC with JAX so that team abbreviations for Jaguars match
fd_df['Team'] = fd_df['Team'].replace({'JAC' : 'JAX'})
fd_df['Opponent'] = fd_df['Opponent'].replace({'JAC' : 'JAX'})

In [96]:
fd_df.head()

Unnamed: 0,Id,Position,First Name,Name,Last Name,Salary,Team,Opponent,Injury Indicator,home_team
0,107566-86631,WR,CeeDee,CeeDee Lamb,Lamb,9400,DAL,PIT,,0
1,107566-85701,WR,Ja'Marr,Ja'Marr Chase,Chase,9300,CIN,BAL,,1
2,107566-62239,QB,Josh,Josh Allen,Allen,9300,BUF,HOU,,0
3,107566-39280,RB,Derrick,Derrick Henry,Henry,9200,BAL,CIN,,0
4,107566-91419,WR,Nico,Nico Collins,Collins,8800,HOU,BUF,,1


# Reading and processing DraftKings csv

In [97]:
dk_df = pd.read_csv('DK_' + date_string + '_list.csv')

In [98]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 495 entries, 0 to 494
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Position          495 non-null    object 
 1   Name + ID         495 non-null    object 
 2   Name              495 non-null    object 
 3   ID                495 non-null    int64  
 4   Roster Position   495 non-null    object 
 5   Salary            495 non-null    int64  
 6   Game Info         495 non-null    object 
 7   TeamAbbrev        495 non-null    object 
 8   AvgPointsPerGame  495 non-null    float64
dtypes: float64(1), int64(2), object(6)
memory usage: 34.9+ KB


In [99]:
##We'll need to match FD and DK names so we can merge FD injury designations to DK.
#Then hopefully the names we have will match the names used for the model.
#Then we'll pare down the DK columns, create a mini df for home teams and merge that with DK
#The goal of this notebook should be to have FD and DK dataframes that include:
#full name, salary, position, team, opponent, home team (binary), injury designation, id (just in case)

This dictionary and function will be used later to convert team names to abbreviation strings.

In [100]:
nfl_teams = [
    'Bills', 'Dolphins', 'Patriots', 'Jets',
    'Bengals', 'Browns', 'Ravens', 'Steelers',
    'Texans', 'Colts', 'Jaguars', 'Titans',
    'Broncos', 'Chiefs', 'Raiders', 'Chargers',
    'Cowboys', 'Giants', 'Eagles', 'Commanders',
    'Bears', 'Lions', 'Packers', 'Vikings',
    'Falcons', 'Panthers', 'Saints', 'Buccaneers',
    'Cardinals', 'Rams', '49ers', 'Seahawks'
]

In [101]:
team_abbreviations = [
    'BUF', 'MIA', 'NE', 'NYJ',
    'CIN', 'CLE', 'BAL', 'PIT',
    'HOU', 'IND', 'JAX', 'TEN',
    'DEN', 'KC', 'LV', 'LAC',
    'DAL', 'NYG', 'PHI', 'WAS',
    'CHI', 'DET', 'GB', 'MIN',
    'ATL', 'CAR', 'NO', 'TB',
    'ARI', 'LAR', 'SF', 'SEA'
]

In [102]:
# Creating the dictionary
team_abbrev = dict(zip(nfl_teams, team_abbreviations))

In [103]:
team_abbrev

{'Bills': 'BUF',
 'Dolphins': 'MIA',
 'Patriots': 'NE',
 'Jets': 'NYJ',
 'Bengals': 'CIN',
 'Browns': 'CLE',
 'Ravens': 'BAL',
 'Steelers': 'PIT',
 'Texans': 'HOU',
 'Colts': 'IND',
 'Jaguars': 'JAX',
 'Titans': 'TEN',
 'Broncos': 'DEN',
 'Chiefs': 'KC',
 'Raiders': 'LV',
 'Chargers': 'LAC',
 'Cowboys': 'DAL',
 'Giants': 'NYG',
 'Eagles': 'PHI',
 'Commanders': 'WAS',
 'Bears': 'CHI',
 'Lions': 'DET',
 'Packers': 'GB',
 'Vikings': 'MIN',
 'Falcons': 'ATL',
 'Panthers': 'CAR',
 'Saints': 'NO',
 'Buccaneers': 'TB',
 'Cardinals': 'ARI',
 'Rams': 'LAR',
 '49ers': 'SF',
 'Seahawks': 'SEA'}

In [104]:
# Function to convert team names to abbreviations, leaving other values unchanged
def convert_team_names(opponent):
    return team_abbrev.get(opponent, opponent)
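
Using `dict.get` with the input as its own default is what lets this function pass player names through untouched. A small self-contained illustration, using a two-entry subset of the mapping (the name `team_abbrev_subset` is only for this example):

```python
team_abbrev_subset = {'Panthers': 'CAR', 'Giants': 'NYG'}

def convert_team_names(value):
    # Return the abbreviation when the value is a known team name, else the value itself
    return team_abbrev_subset.get(value, value)

names = ['Panthers', 'Josh Allen', 'Giants']
converted = [convert_team_names(n) for n in names]
print(converted)  # ['CAR', 'Josh Allen', 'NYG']
```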

In [105]:
fd_def = fd_df[fd_df['Position'] == 'D']

# Name matching
Name matching will be a chore. Let's start with defense, where DK uses just team name while FD uses city and name. We'll change it to just names in FD.<br>

The following line of code takes the city names off defense for FD.

In [106]:
fd_df['Name'] = np.where(fd_df['Position'] == 'D', fd_df['Last Name'], fd_df['Name'])

In [107]:
dk_df['Name'] = dk_df['Name'].str.strip()

# Merging columns to DK
We'll merge Opponent, home_team (binary) and Status (injury) to the DK dataframe. But first, we have to take suffixes off all names and then match the names.

In [108]:
fd_df.rename(columns = {'Injury Indicator': 'Status', 'Id': 'ID'}, inplace = True)

In [109]:
dk_df.rename(columns = {'TeamAbbrev': 'Team'}, inplace = True)

In [110]:
fd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 713 entries, 0 to 712
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   ID          713 non-null    object
 1   Position    713 non-null    object
 2   First Name  713 non-null    object
 3   Name        713 non-null    object
 4   Last Name   713 non-null    object
 5   Salary      713 non-null    int64 
 6   Team        713 non-null    object
 7   Opponent    713 non-null    object
 8   Status      128 non-null    object
 9   home_team   713 non-null    int32 
dtypes: int32(1), int64(1), object(8)
memory usage: 53.0+ KB


In [111]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 495 entries, 0 to 494
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Position          495 non-null    object 
 1   Name + ID         495 non-null    object 
 2   Name              495 non-null    object 
 3   ID                495 non-null    int64  
 4   Roster Position   495 non-null    object 
 5   Salary            495 non-null    int64  
 6   Game Info         495 non-null    object 
 7   Team              495 non-null    object 
 8   AvgPointsPerGame  495 non-null    float64
dtypes: float64(1), int64(2), object(6)
memory usage: 34.9+ KB


In [112]:
def clean_name(name):
    # Remove commas and periods, so initials like C.J. or D.J. become CJ, DJ
    name = re.sub(r'[.,]', '', name)
    
    # Remove suffixes like Jr, Sr, III, II, IV (case-sensitive, whole words only,
    # so the same letters inside a name are left alone)
    cleaned_name = re.sub(r'\b(Sr|Jr|III|II|IV)\b', '', name).strip()
    
    return cleaned_name

In [113]:
fd_df['Name'] = fd_df['Name'].apply(clean_name)
dk_df['Name'] = dk_df['Name'].apply(clean_name)

In [114]:
from rapidfuzz import process, fuzz

In [115]:
SIMILARITY_THRESHOLD = 80

In [116]:
def fuzzy_match(name, dk_names):
    match, score, _ = process.extractOne(name, dk_names, scorer=fuzz.token_sort_ratio)
    return match if score >= SIMILARITY_THRESHOLD else None

In [117]:
dk_names = dk_df['Name']

In [118]:
len(dk_names)

495

For each FanDuel name, this finds the closest DraftKings name and keeps it only if the similarity score is at least 80. The result is stored in a new `matched_name` column on the FanDuel dataframe.

In [119]:
fd_df['matched_name'] = fd_df['Name'].apply(lambda x: fuzzy_match(x, dk_names))

Now we filter the dataframe for FanDuel names that don't have DraftKings matches.

In [120]:
unmatched_in_fd = fd_df[fd_df['matched_name'].isna()]

We filter out the Sunday night teams (Dallas and Pittsburgh this week), since DraftKings does not include the Sunday night game in its main slate.

In [121]:
unmatched_in_fd = unmatched_in_fd[~((unmatched_in_fd['Team'] == 'PIT') | (unmatched_in_fd['Team'] == 'DAL'))]

In [122]:
unmatched_in_fd.reset_index(inplace = True, drop = True)

In [123]:
unmatched_in_fd.sort_values(by = ['Salary'], ascending = False)

Unnamed: 0,ID,Position,First Name,Name,Last Name,Salary,Team,Opponent,Status,home_team,matched_name
0,107566-86081,QB,Emory,Emory Jones,Jones,6000,BAL,CIN,,0,
1,107566-114763,QB,Gavin,Gavin Hardison,Hardison,6000,MIA,NE,,0,
2,107566-88147,QB,Chevan,Chevan Cordeiro,Cordeiro,6000,SEA,NYG,,1,
3,107566-54507,QB,Alex,Alex McGough,McGough,6000,GB,LAR,Q,0,
4,107566-185640,QB,Dresser,Dresser Winn,Winn,6000,LAR,GB,,1,
...,...,...,...,...,...,...,...,...,...,...,...
53,107566-114039,WR,Peter,Peter LeBlanc,LeBlanc,4000,CHI,CAR,,1,
52,107566-90483,TE,Bernhard,Bernhard Raimann,Raimann,4000,IND,JAX,,0,
51,107566-68975,TE,Josh,Josh Pederson,Pederson,4000,JAX,IND,Q,1,
50,107566-63851,TE,Tyree,Tyree Jackson,Jackson,4000,NYG,SEA,Q,0,


Since 4000 is the min FLEX salary on FanDuel and 6000 is the min QB salary, most unmatched names with those salaries are probably players who wouldn't be rostered anyway.

In [124]:
unmatched_in_fd['Salary'].value_counts()

Salary
4000    131
6000     16
4200      1
Name: count, dtype: int64

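Now we look at the names that cleared the 80 threshold without being an exact match, to see whether each pair is really the same player. In the output below none of them are (Marcus Jones and Mac Jones, for example, are different players), so a fuzzy score alone is not enough to treat two names as the same person.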

In [125]:
fd_df[(fd_df['Name'] != fd_df['matched_name']) & (fd_df['matched_name'].notna())]

Unnamed: 0,ID,Position,First Name,Name,Last Name,Salary,Team,Opponent,Status,home_team,matched_name
235,107566-133477,TE,Darnell,Darnell Washington,Washington,4800,PIT,DAL,,1,Parker Washington
341,107566-92166,WR,Marcus,Marcus Jones,Jones,4000,NE,MIA,Q,1,Mac Jones
365,107566-112737,WR,Isaiah,Isaiah Washington,Washington,4000,BAL,CIN,Q,0,Tahj Washington
431,107566-185639,WR,Keilahn,Keilahn Harris,Harris,4000,PIT,DAL,Q,1,Kevin Harris
441,107566-90456,WR,Ty,Ty Scott,Scott,4000,SEA,NYG,,1,Tyler Scott
494,107566-109008,WR,Jalen,Jalen Cropper,Cropper,4000,DAL,PIT,,0,Jalen Coker
614,107566-26355,TE,Logan,Logan Thomas,Thomas,4000,SF,ARI,,1,Ian Thomas
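
The table above shows why a similarity threshold alone cannot confirm identity: short names that share a surname can clear it. The sketch below reproduces the idea with the standard library's `difflib` instead of `rapidfuzz`, so its scores will not match `fuzz.token_sort_ratio` exactly. The helper names `similarity` and `best_match` and the 0 to 100 scaling are illustrative assumptions.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 80

def similarity(a: str, b: str) -> float:
    # SequenceMatcher.ratio() is in [0, 1]; scale to 0-100 to mirror the threshold above
    return 100 * SequenceMatcher(None, a, b).ratio()

def best_match(name, candidates):
    """Return the highest-scoring candidate, or None if it falls below the threshold."""
    best = max(candidates, key=lambda c: similarity(name, c))
    return best if similarity(name, best) >= SIMILARITY_THRESHOLD else None

dk_sample = ["Tyler Scott", "Mac Jones"]
print(best_match("Ty Scott", dk_sample))     # a different player can still clear the threshold
print(best_match("Emory Jones", dk_sample))  # no candidate is close enough
```

Because of pairs like these, the fuzzy score is best treated as a prompt for manual review. The merge that follows joins on exact `Name` and `Team`.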


In [126]:
fd_merge = fd_df[['Name', 'Team', 'Opponent', 'home_team', 'Status']]

In [127]:
dk_df = pd.merge(dk_df, fd_merge, on = ['Name', 'Team'], how = 'left')
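
A left join on exact `Name` and `Team` leaves `NaN` in the FanDuel columns for any DraftKings player without a counterpart, and it does so without warning. One way to see how many rows that affects is the `indicator=True` option of `pd.merge`, sketched here on toy frames. The sample rows and the names `dk_toy` and `fd_toy` are illustrative only.

```python
import pandas as pd

dk_toy = pd.DataFrame({
    'Name': ['Josh Allen', 'CJ Stroud', 'Joel Wilson'],
    'Team': ['BUF', 'HOU', 'NYG'],
})
fd_toy = pd.DataFrame({
    'Name': ['Josh Allen', 'CJ Stroud'],
    'Team': ['BUF', 'HOU'],
    'Opponent': ['HOU', 'BUF'],
    'home_team': [0, 1],
})

# indicator=True adds a _merge column: 'both' for matched rows, 'left_only' otherwise
merged = pd.merge(dk_toy, fd_toy, on=['Name', 'Team'], how='left', indicator=True)
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['Name', 'Team']])
```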

Now we still have to fill missing Opponent and home_team values in DK for the players not in the FD list.

In [128]:
team_dict = {team: (group['Opponent'].values[0], group['home_team'].values[0]) 
             for team, group in fd_df.groupby('Team')}

In [129]:
dk_df['Opponent_filled'] = dk_df['Team'].map(lambda x: team_dict.get(x, (None, None))[0])
dk_df['home_team_filled'] = dk_df['Team'].map(lambda x: team_dict.get(x, (None, None))[1])

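# Fill missing values in the original columns with the new columns
# (assign the result back; fillna(inplace=True) on a selected column is deprecated in recent pandas)
dk_df['Opponent'] = dk_df['Opponent'].fillna(dk_df['Opponent_filled'])
dk_df['home_team'] = dk_df['home_team'].fillna(dk_df['home_team_filled'])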

# Drop the temporary filled columns
dk_df.drop(columns=['Opponent_filled', 'home_team_filled'], inplace=True)
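
The same fill can be written as a per-team lookup built from FanDuel rows and mapped onto DraftKings teams. This is a sketch on toy data, under the assumption that every player on a team shares one opponent and one home/away flag for the slate. The names `fd_toy`, `dk_toy` and `team_info` are hypothetical.

```python
import pandas as pd

fd_toy = pd.DataFrame({
    'Team': ['BUF', 'BUF', 'HOU'],
    'Opponent': ['HOU', 'HOU', 'BUF'],
    'home_team': [0, 0, 1],
})
dk_toy = pd.DataFrame({
    'Name': ['Josh Allen', 'Nico Collins', 'Unmatched Player'],
    'Team': ['BUF', 'HOU', 'HOU'],
    'Opponent': ['HOU', 'BUF', None],
    'home_team': [0.0, 1.0, None],
})

# One row per team, indexed by team abbreviation
team_info = fd_toy.drop_duplicates('Team').set_index('Team')

# Map each DK team to its FD values and use them only where DK is missing
for col in ['Opponent', 'home_team']:
    dk_toy[col] = dk_toy[col].fillna(dk_toy['Team'].map(team_info[col]))

# The merge left NaN behind, so the flag is float until cast back to int
dk_toy['home_team'] = dk_toy['home_team'].astype(int)
print(dk_toy)
```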

In [130]:
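# .copy() so the column assignments below act on independent dataframes rather than slices
dk_df = dk_df[['ID', 'Name', 'Position', 'Salary', 'Team', 'Opponent', 'home_team', 'Status']].copy()
fd_df = fd_df[['ID', 'Name', 'Position', 'Salary', 'Team', 'Opponent', 'home_team', 'Status']].copy()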

In [131]:
#Adding date and week indicator columns
fd_df['Date'] = pd.to_datetime(date_string).strftime('%m-%d-%Y')
dk_df['Date'] = pd.to_datetime(date_string).strftime('%m-%d-%Y')
fd_df['Week'] = week
dk_df['Week'] = week

In [132]:
fd_df.head()

Unnamed: 0,ID,Name,Position,Salary,Team,Opponent,home_team,Status,Date,Week
0,107566-86631,CeeDee Lamb,WR,9400,DAL,PIT,0,,10-06-2024,5
1,107566-85701,Ja'Marr Chase,WR,9300,CIN,BAL,1,,10-06-2024,5
2,107566-62239,Josh Allen,QB,9300,BUF,HOU,0,,10-06-2024,5
3,107566-39280,Derrick Henry,RB,9200,BAL,CIN,0,,10-06-2024,5
4,107566-91419,Nico Collins,WR,8800,HOU,BUF,1,,10-06-2024,5


In [133]:
dk_df.head()

Unnamed: 0,ID,Name,Position,Salary,Team,Opponent,home_team,Status,Date,Week
0,36142122,Ja'Marr Chase,WR,8000,CIN,BAL,1,,10-06-2024,5
1,36142124,Cooper Kupp,WR,7900,LAR,GB,1,O,10-06-2024,5
2,36141894,Derrick Henry,RB,7800,BAL,CIN,0,,10-06-2024,5
3,36141829,Josh Allen,QB,7700,BUF,HOU,0,,10-06-2024,5
4,36142126,Nico Collins,WR,7700,HOU,BUF,1,,10-06-2024,5


In [134]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 495 entries, 0 to 494
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         495 non-null    int64 
 1   Name       495 non-null    object
 2   Position   495 non-null    object
 3   Salary     495 non-null    int64 
 4   Team       495 non-null    object
 5   Opponent   495 non-null    object
 6   home_team  495 non-null    int32 
 7   Status     77 non-null     object
 8   Date       495 non-null    object
 9   Week       495 non-null    int64 
dtypes: int32(1), int64(3), object(6)
memory usage: 36.9+ KB


In [135]:
fd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 713 entries, 0 to 712
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         713 non-null    object
 1   Name       713 non-null    object
 2   Position   713 non-null    object
 3   Salary     713 non-null    int64 
 4   Team       713 non-null    object
 5   Opponent   713 non-null    object
 6   home_team  713 non-null    int32 
 7   Status     128 non-null    object
 8   Date       713 non-null    object
 9   Week       713 non-null    int64 
dtypes: int32(1), int64(2), object(7)
memory usage: 53.0+ KB


In [136]:
dk_df[dk_df[['Opponent', 'home_team']].isnull().any(axis = 1)]

Unnamed: 0,ID,Name,Position,Salary,Team,Opponent,home_team,Status,Date,Week


In [137]:
#Drops all free agents since they can't be rostered
dk_df = dk_df[dk_df['Team'] != 'FA'].copy()
fd_df = fd_df[fd_df['Team'] != 'FA'].copy()

In [138]:
#Filling Status missing values with Active, which we'll assume since there is no injury designation.
dk_df['Status'] = dk_df['Status'].fillna('Active')
fd_df['Status'] = fd_df['Status'].fillna('Active')

In [139]:
dk_df['Position'] = dk_df['Position'].replace({'DST':'D'})
dk_df['home_team'] = dk_df['home_team'].astype('int')
dk_df.reset_index(drop = True, inplace = True)

In [140]:
# Converting defense team names in the Name column to abbreviations for FD and DK
fd_df['Name'] = fd_df['Name'].apply(convert_team_names)

In [141]:
dk_df['Name'] = dk_df['Name'].apply(convert_team_names)

In [142]:
fd_df.rename(columns = {'Name': 'name', 'Position': 'position', 'Salary': 'salary', 'Team': 'team', 'Opponent': 'opponent',\
                       'Status': 'status', 'Date': 'date', 'Week': 'week'}, inplace = True)

In [143]:
dk_df.rename(columns = {'Name': 'name', 'Position': 'position', 'Salary': 'salary', 'Team': 'team', 'Opponent': 'opponent',\
                       'Status': 'status', 'Date': 'date', 'Week': 'week'}, inplace = True)

In [144]:
fd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 713 entries, 0 to 712
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         713 non-null    object
 1   name       713 non-null    object
 2   position   713 non-null    object
 3   salary     713 non-null    int64 
 4   team       713 non-null    object
 5   opponent   713 non-null    object
 6   home_team  713 non-null    int32 
 7   status     713 non-null    object
 8   date       713 non-null    object
 9   week       713 non-null    int64 
dtypes: int32(1), int64(2), object(7)
memory usage: 53.0+ KB


In [145]:
dk_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 495 entries, 0 to 494
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   ID         495 non-null    int64 
 1   name       495 non-null    object
 2   position   495 non-null    object
 3   salary     495 non-null    int64 
 4   team       495 non-null    object
 5   opponent   495 non-null    object
 6   home_team  495 non-null    int32 
 7   status     495 non-null    object
 8   date       495 non-null    object
 9   week       495 non-null    int64 
dtypes: int32(1), int64(3), object(6)
memory usage: 36.9+ KB


In [146]:
fd_df.head()

Unnamed: 0,ID,name,position,salary,team,opponent,home_team,status,date,week
0,107566-86631,CeeDee Lamb,WR,9400,DAL,PIT,0,Active,10-06-2024,5
1,107566-85701,Ja'Marr Chase,WR,9300,CIN,BAL,1,Active,10-06-2024,5
2,107566-62239,Josh Allen,QB,9300,BUF,HOU,0,Active,10-06-2024,5
3,107566-39280,Derrick Henry,RB,9200,BAL,CIN,0,Active,10-06-2024,5
4,107566-91419,Nico Collins,WR,8800,HOU,BUF,1,Active,10-06-2024,5


In [158]:
dk_df[dk_df['position'] == 'QB']

Unnamed: 0,ID,name,position,salary,team,opponent,home_team,status,date,week
3,36141829,Josh Allen,QB,7700,BUF,HOU,0,Active,10-06-2024,5
8,36141830,Lamar Jackson,QB,7500,BAL,CIN,0,Active,10-06-2024,5
13,36141831,CJ Stroud,QB,7200,HOU,BUF,1,Active,10-06-2024,5
20,36141832,Jayden Daniels,QB,6800,WAS,CLE,1,Active,10-06-2024,5
24,36141833,Kyler Murray,QB,6700,ARI,SF,0,Active,10-06-2024,5
...,...,...,...,...,...,...,...,...,...,...
224,36141876,Clayton Tune,QB,4000,ARI,SF,0,Active,10-06-2024,5
232,36141871,Jimmy Garoppolo,QB,4000,LAR,GB,1,Active,10-06-2024,5
233,36141872,Stetson Bennett,QB,4000,LAR,GB,1,Active,10-06-2024,5
234,36141873,Sean Clifford,QB,4000,GB,LAR,0,Active,10-06-2024,5


In [148]:
dk_df.tail()

Unnamed: 0,ID,name,position,salary,team,opponent,home_team,status,date,week
490,36142600,Jakob Johnson,TE,2500,NYG,SEA,0,Active,10-06-2024,5
491,36142602,Joel Wilson,TE,2500,NYG,SEA,0,Active,10-06-2024,5
492,36142731,CAR,D,2400,CAR,CHI,0,Active,10-06-2024,5
493,36142732,NYG,D,2400,NYG,SEA,0,Active,10-06-2024,5
494,36142733,ARI,D,2300,ARI,SF,0,Active,10-06-2024,5


In [149]:
# fd_names = set(list(fd_df['name']))

In [150]:
# dk_names = set(list(dk_df['name']))

In [151]:
# fd_not_in_dk = list(fd_names.difference(dk_names))

In [152]:
# dk_not_in_fd = list(dk_names.difference(fd_names))

In [153]:
# dk_names.difference(fd_names)

In [154]:
# fd_names.difference(dk_names)

In [155]:
#It seems as though FanDuel just naturally lists more available players than DraftKings.
#FanDuel now includes Sunday night games in its main slate, but that alone doesn't account
#for all the names in the FanDuel list that aren't in the DraftKings list.
#What really matters is whether these names for both sites match up 
#with the way the names are written in the nfl_data_py package

# Adding to database
This will put the data we've just wrangled into our database, and it will be joined with the features used to make predictions.

In [156]:
import sqlite3

# Establish the connection to the database
conn = sqlite3.connect('nfl_dfs.db')

# Define the table names for FanDuel and DraftKings
fd_table_name = 'fd_table_' + str(week) + '_24'
dk_table_name = 'dk_table_' + str(week) + '_24'

# Specify data types for the FanDuel dataframe
# (keys must match the dataframe's column names, which were lowercased above)
fd_dtype = {
    'ID': 'TEXT',
    'name': 'TEXT',
    'position': 'TEXT',
    'salary': 'INTEGER',
    'team': 'TEXT',
    'opponent': 'TEXT',
    'home_team': 'INTEGER',
    'status': 'TEXT',
    'date': 'TEXT',
    'week': 'INTEGER'
}

# Write the FanDuel dataframe to the SQLite table
fd_df.to_sql(fd_table_name, conn, if_exists='replace', index=False, dtype=fd_dtype)

# Confirm that the FanDuel data has been written
print(f"Data written to table {fd_table_name} in SQLite database nfl_dfs.db")

# Specify data types for the DraftKings dataframe
# (keys must match the dataframe's column names, which were lowercased above)
dk_dtype = {
    'ID': 'INTEGER',
    'name': 'TEXT',
    'position': 'TEXT',
    'salary': 'INTEGER',
    'team': 'TEXT',
    'opponent': 'TEXT',
    'home_team': 'INTEGER',
    'status': 'TEXT',
    'date': 'TEXT',
    'week': 'INTEGER'
}

# Write the DraftKings dataframe to the SQLite table
dk_df.to_sql(dk_table_name, conn, if_exists='replace', index=False, dtype=dk_dtype)

# Confirm that the DraftKings data has been written
print(f"Data written to table {dk_table_name} in SQLite database nfl_dfs.db")

# Close the connection
conn.close()


Data written to table fd_table_5_24 in SQLite database nfl_dfs.db
Data written to table dk_table_5_24 in SQLite database nfl_dfs.db
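
After writing, it can be worth reading a table back to confirm the row count and the column types SQLite actually recorded. The following sketch does this against an in-memory database with the standard `sqlite3` module. The table name mirrors the pattern above and the two sample rows are copied from the DraftKings output earlier in this notebook, while the helper name `describe_table` and the reduced three-column schema are made up for illustration.

```python
import sqlite3

def describe_table(conn, table_name):
    """Return (row_count, {column_name: declared_type}) for a table."""
    # Table names cannot be bound as parameters, so quote the identifier instead
    quoted = '"' + table_name.replace('"', '""') + '"'
    row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
    columns = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({quoted})")}
    return row_count, columns

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "dk_table_5_24" (ID INTEGER, name TEXT, salary INTEGER)')
conn.executemany(
    'INSERT INTO "dk_table_5_24" VALUES (?, ?, ?)',
    [(36141829, "Josh Allen", 7700), (36141830, "Lamar Jackson", 7500)],
)
row_count, columns = describe_table(conn, "dk_table_5_24")
print(row_count, columns)
conn.close()
```

Pointing `describe_table` at a connection to `nfl_dfs.db` instead would show whether the declared types match what the `dtype` dictionaries asked for.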


In [157]:
# fd_df.to_csv('FD_' + date_string + '_prep.csv', index = False)
# dk_df.to_csv('DK_' + date_string + '_prep.csv', index = False)