# Training a Neural Network Algorithm for Fantasy Basketball Predictions

By: Matt Purvis

This project prepares historical fantasy basketball data and splits it into training, validation, and test sets. It then trains a neural network regressor with the Keras library, minimizing the error of its predicted fantasy basketball scores on the validation set. Finally, it evaluates the model's performance on the test set.

In [66]:
import pandas as pd
import numpy as np
import os 
from datetime import datetime, timedelta, date
from fuzzywuzzy import fuzz
import math
from collections import OrderedDict
import pickle
import time

# Read in the Data

In [67]:
start_time = time.time()

firstgame = pd.read_csv("C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\Original Files\\draftkings_NBA_2021-01-22_players.csv")
injuries =  pd.read_csv("C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\Historical Injuries\\master_injuries_twitter.csv")
common_lineups = pd.read_csv('C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\DepthCharts\\NBA_Fantasy_Common_Lineups.csv')
depth_charts = pd.read_csv('C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\DepthCharts\\NBA_Fantasy_Depth_Charts.csv')
Pace = pd.read_csv('C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\Pace\\pace.csv')
usage = pd.read_excel('C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\Usage\\BBM_PlayerRankings.xls')
usage = usage.sort_values(by = ['Team', 'USG'], axis = 0, ascending = False)
usage['rank'] = usage.groupby('Team')['USG'].rank(method='first', ascending = False)  # rank players by usage within each team
spreads = pd.read_csv('C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\Historical Spread\\Hist_Spread.csv')
opp_ppg = pd.read_csv('C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\Opponent Points per game\\Opp_Points_per_Game.csv')
df = pd.read_csv('C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\Original Files\\masterdf.csv')
pd.set_option('mode.chained_assignment', None)
df

Unnamed: 0,Player,Inj,Likes,Pos,Team,Opp,Def v Pos,VegasPts,Salary,Floor,...,Opp Pts/G,Avg Opp Pa,Injured St,Avg Pts La,Avg Mins L,Overall Av,PPM Proj,Is Starter,Is Injured,SS Project
0,Anthony Davis,,,PF/C,LAL,vs LAC,13th,110.75,10000,51.4,...,,,,,,,,,,
1,Kevin Durant,,,PF,BKN,vs GSW,8th,121.00,9800,49.7,...,,,,,,,,,,
2,LeBron James,,,PG/SF,LAL,vs LAC,13th,110.75,9600,51.5,...,,,,,,,,,,
3,Stephen Curry,,,PG,GSW,@ BKN,7th,114.00,9300,49.8,...,,,,,,,,,,
4,Kawhi Leonard,,,SG/SF,LAC,@ LAL,10th,108.75,9100,49.7,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32849,Zach Collins,,,C,POR,vs HOU,21st,126.75,3000,0.0,...,,,,,,,,,,
32850,Derrick Jones Jr.,,,SF,POR,vs HOU,19th,126.75,3000,0.0,...,,,,,,,,,,
32851,Nassir Little,,,SF/PF,POR,vs HOU,19th,126.75,3000,0.1,...,,,,,,,,,,
32852,D.J. Wilson,,,C,HOU,@ POR,26th,111.75,3000,0.0,...,,,,,,,,,,


# Defining Functions to prep the data

In [68]:
# Will take the defense vs position column and transform it into a numerical column
def removeth(th):
    if('st' in th):
        th = int(th.replace('st',''))
    elif('nd' in th):
        th = int(th.replace('nd',''))
    elif('rd' in th):
        th = int(th.replace('rd',''))
    elif('th' in th):
        th = int(th.replace('th',''))
    return th
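
The same conversion can also be sketched with a regular expression, which tolerates surrounding whitespace and returns non-ordinal input unchanged. The helper name `ordinal_to_int` is hypothetical and not part of the original notebook.

```python
import re

# Hypothetical helper: matches strings such as "1st", "22nd", "13th".
_ORDINAL = re.compile(r"^\s*(\d+)(?:st|nd|rd|th)\s*$")

def ordinal_to_int(rank):
    """Return the integer part of an ordinal string, or the input unchanged."""
    if isinstance(rank, str):
        match = _ORDINAL.match(rank)
        if match:
            return int(match.group(1))
    return rank

print([ordinal_to_int(r) for r in ["13th", "8th", "21st", "2nd", "3rd"]])
```

Returning the input unchanged for anything that is not an ordinal string means missing values pass through rather than raising.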

In [69]:
# Will create two lists, one for players and one for teams. Will also transform the Defense vs position column and create the
# homevaway column

def dostuff(df):
    players = []
    homevaway = []
    teams = []
    for index, row in df.iterrows():
        if(row['Player'] not in players):
            players.append(row['Player'])
        Opp = row['Opp']
        if('vs' in Opp):
            homevaway.append('home')
        else:
            homevaway.append('away')
        if(row['Team'] not in teams) :
            teams.append(row['Team'])
    df['home_or_away'] = homevaway
    df['DVP'] = df['Def v Pos'].apply(removeth)  
    return players, teams
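
The `Opp` values in the master df look like `vs LAC` or `@ BKN`. A minimal sketch of splitting such a value into a venue and an opponent abbreviation is shown below. The function name `parse_opp` is hypothetical, and it tests the leading marker for equality instead of using the substring check in `dostuff`.

```python
def parse_opp(opp):
    """Split an 'Opp' value such as 'vs LAC' or '@ BKN' into (venue, opponent)."""
    marker, opponent = opp.split(" ", 1)
    # 'vs' marks a home game; anything else (e.g. '@') is treated as away.
    venue = "home" if marker == "vs" else "away"
    return venue, opponent.strip()

print(parse_opp("vs LAC"))
print(parse_opp("@ BKN"))
```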

In [70]:
# Will fetch the opp points per game data from the opp_ppg dataset and create a column in our master df

def getoppgcol(df):
    oppg = []
    for index, row in df.iterrows(): 
        home_or_away = row['home_or_away']  
        opp = row['Opp'].split(' ')[1] 
        d = opp_ppg[opp_ppg['Team'] == opp].reset_index()
        if(home_or_away == 'home'):
            d = d.loc[0,'awayppg']
            oppg.append(d)
        else:
            d = d.loc[0,'homeppg']
            oppg.append(d)
    df['oppg'] = oppg
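
The lookup rule in `getoppgcol` can be sketched with a plain dictionary. When the player is at home the opponent is the visiting side, so the opponent's away average is used, and vice versa. The table values below are made up for illustration, and the sketch assumes `homeppg` and `awayppg` hold a team's averages in its home and away games.

```python
# Hypothetical per-team splits; the real values live in the opp_ppg dataset.
OPP_PPG = {
    "LAC": {"homeppg": 115.0, "awayppg": 112.5},
    "BKN": {"homeppg": 119.0, "awayppg": 117.5},
}

def opponent_ppg(opp, home_or_away, table=OPP_PPG):
    """Pick the opponent's split that matches where the opponent is playing."""
    opponent = opp.split(" ")[1]
    # If our player is at home, the opponent is the away side, and vice versa.
    column = "awayppg" if home_or_away == "home" else "homeppg"
    return table[opponent][column]

print(opponent_ppg("vs LAC", "home"))
print(opponent_ppg("@ BKN", "away"))
```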

In [71]:
# Uses the master file to create a dataframe for team schedules for all 30 teams in the league

def getteamscheduledf(teams):
    dfteamschedule = pd.DataFrame(columns = ['Team','Dates', 'Game'])
    for t in teams:
        tlist = []
        dates = []
        for index, row in df[df.Team==t].iterrows():
            if(row['Date'] not in dates):
                dates.append(row['Date'])
        games = list(range(1,len(dates) + 1))
        for i in dates:
            tlist.append(t)
        dff = pd.DataFrame()
        dff['Team'] = tlist
        dff['Dates'] = dates
        dff['Game'] = games
        dfteamschedule = pd.concat([dfteamschedule,dff])
    return dfteamschedule
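
The game-numbering idea can be illustrated without pandas: each team's distinct dates are numbered 1, 2, 3, ... in the order they first appear. This is only a sketch; `number_games` is a hypothetical name and the dates are illustrative.

```python
def number_games(rows):
    """Map each (team, date) pair to a 1-based game number in order of first appearance."""
    seen = {}  # team -> list of distinct dates, in input order
    for team, date in rows:
        dates = seen.setdefault(team, [])
        if date not in dates:
            dates.append(date)
    return {(team, date): i
            for team, dates in seen.items()
            for i, date in enumerate(dates, start=1)}

# Several rows share a (team, date) because the master df has one row per player.
rows = [("LAL", "12/22/2020"), ("LAC", "12/22/2020"), ("LAL", "12/22/2020"),
        ("LAL", "12/25/2020"), ("LAC", "12/25/2020")]
schedule = number_games(rows)
print(schedule[("LAL", "12/25/2020")])
```

As in `getteamscheduledf`, the numbering is only chronological if the input rows are already in date order.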

In [72]:
# Creates a fuzzy matching function that uses the fuzzywuzzy library and handles exceptions

def getfuzzymatching(playerlist,fuzzedlist,unfuzzedlist):
    list1 = ['Bogdan Bogdanovic','Bojan Bogdanovic','Cole Anthony','Carmelo Anthony','Cam Johnson','James Johnson','Jordan Nwora','Jordan Ford'
         ,'Zion Williamson', 'Lou Williams', 'Marvin Williams', 'Mikal Bridges','Miles Bridges', 'Jalen McDaniels','Jaden McDaniels', 'Derrick Jones Jr.'
         ,'Derrick Rose','Derrick White', 'Patrick Mills', 'Patrick Williams','James Ennis', 'James Harden','Justin James','Justin Patton'
         ,'Anthony Gill', 'Anthony Davis','Jordan Bone', 'Jordan Bell']
    for p in playerlist:
        for i in fuzzedlist:
            if(fuzz.partial_ratio(i, p) > 80):
                if( i not in unfuzzedlist and i not in list1 and p not in list1 and p not in unfuzzedlist):
                    unfuzzedlist.append(p)
                elif(i == 'Bogdan Bogdanovic' and p != 'Bojan Bogdanovic'):
                    unfuzzedlist.append(p)
                elif(i == 'Bojan Bogdanovic' and p != 'Bogdan Bogdanovic'):
                    unfuzzedlist.append(p)
                elif(i == 'Cole Anthony' and p != 'Carmelo Anthony'):
                    unfuzzedlist.append(p)
                elif(i == 'Carmelo Anthony' and p != 'Cole Anthony'):
                    unfuzzedlist.append(p)
                elif(i == 'Cam Johnson' and p != 'James Johnson'):
                    unfuzzedlist.append(p)
                elif(i == 'James Johnson' and p != 'Cam Johnson'):
                    unfuzzedlist.append(p)
                elif(i == 'Jordan Ford' and p != 'Jordan Nwora'):
                    unfuzzedlist.append(p)
                elif(i == 'Jordan Nwora' and p != 'Jordan Ford'):
                    unfuzzedlist.append(p)
                elif(i == 'Zion Williamson' and p != 'Lou Williams' and p != 'Marvin Williams'):
                    unfuzzedlist.append(p)
                elif(i == 'Lou Williams' and p != 'Zion Williamson' and p != 'Marvin Williams'):
                    unfuzzedlist.append(p)
                elif(i == 'Marvin Williams' and p != 'Lou Williams' and p != 'Zion Williamson'):
                    unfuzzedlist.append(p)
                elif(i == 'Miles Bridges' and p != 'Mikal Bridges'):
                    unfuzzedlist.append(p)
                elif(i == 'Mikal Bridges' and p != 'Miles Bridges'):
                    unfuzzedlist.append(p)
                elif(i == 'Jalen McDaniels' and p != 'Jaden McDaniels'):
                    unfuzzedlist.append(p)
                elif(i == 'Jaden McDaniels' and p != 'Jalen McDaniels'):
                    unfuzzedlist.append(p)
                elif(i == 'Derrick Jones Jr.' and p != 'Derrick Rose' and p != 'Derrick White'):
                    unfuzzedlist.append(p)
                elif(i == 'Derrick Rose' and p != 'Derrick Jones Jr.' and p != 'Derrick White'):
                    unfuzzedlist.append(p)
                elif(i == 'Derrick White' and p != 'Derrick Rose' and p != 'Derrick Jones Jr.'):
                    unfuzzedlist.append(p)
                elif(i == 'Patrick Mills' and p != 'Patrick Williams'):
                    unfuzzedlist.append(p)
                elif(i == 'Patrick Williams' and p != 'Patrick Mills'):
                    unfuzzedlist.append(p)
                elif(i == 'James Ennis' and p != 'James Harden'):
                    unfuzzedlist.append(p)
                elif(i == 'James Harden' and p != 'James Ennis'):
                    unfuzzedlist.append(p)
                elif(i == 'Justin Patton' and p != 'Justin James'):
                    unfuzzedlist.append(p)
                elif(i == 'Justin James' and p != 'Justin Patton'):
                    unfuzzedlist.append(p)
                elif(i == 'Anthony Gill' and p != 'Anthony Davis'):
                    unfuzzedlist.append(p)
                elif(i == 'Anthony Davis' and p != 'Anthony Gill'):
                    unfuzzedlist.append(p)
                elif(i == 'Jordan Bone' and p != 'Jordan Bell'):
                    unfuzzedlist.append(p)
                elif(i == 'Jordan Bell' and p != 'Jordan Bone'):
                    unfuzzedlist.append(p)
                    
    if('Cam Payne' in fuzzedlist and 'Cameron Payne' not in unfuzzedlist and 'Cameron Payne' in playerlist):
        unfuzzedlist.append('Cameron Payne')
    elif('Cam Johnson' in fuzzedlist and 'Cameron Johnson' not in unfuzzedlist and 'Cameron Johnson' in playerlist):
        unfuzzedlist.append('Cameron Johnson')
    elif('Ish Smith' in fuzzedlist and 'Ishmael Smith' not in unfuzzedlist and 'Ishmael Smith' in playerlist):
        unfuzzedlist.append('Ishmael Smith')
    elif('Raul Neto' in fuzzedlist and 'Raulzinho Neto' not in unfuzzedlist and 'Raulzinho Neto' in playerlist):
        unfuzzedlist.append('Raulzinho Neto')
    elif('Willy Hernangomez' in fuzzedlist  and 'Guillermo Hernangomez' not in unfuzzedlist and 'Guillermo Hernangomez' in playerlist):
        unfuzzedlist.append('Guillermo Hernangomez')
    elif('Patty Mills' in fuzzedlist  and 'Patrick Mills' not in unfuzzedlist and 'Patrick Mills' in playerlist):
        unfuzzedlist.append('Patrick Mills')
    elif('Lu Dort' in fuzzedlist  and 'Luguentz Dort' not in unfuzzedlist and 'Luguentz Dort' in playerlist):
        unfuzzedlist.append('Luguentz Dort')
    elif('Mo Bamba' in fuzzedlist  and 'Mohamed Bamba' not in unfuzzedlist and 'Mohamed Bamba' in playerlist):
        unfuzzedlist.append('Mohamed Bamba')
    elif('Nic Claxton' in fuzzedlist  and 'Nicolas Claxton' not in unfuzzedlist and 'Nicolas Claxton' in playerlist):
        unfuzzedlist.append('Nicolas Claxton')
    elif("De’Anthony Melton" in fuzzedlist  and "De'Anthony Melton" not in unfuzzedlist and "De'Anthony Melton" in playerlist):
        unfuzzedlist.append("De'Anthony Melton")
    elif("Wes Iwundu" in fuzzedlist  and "Wesley Iwundu" not in unfuzzedlist and "Wesley Iwundu" in playerlist):
        unfuzzedlist.append("Wesley Iwundu")
    elif("Moe Wagner" in fuzzedlist  and "Moritz Wagner" not in unfuzzedlist and "Moritz Wagner" in playerlist):
        unfuzzedlist.append("Moritz Wagner")
    elif("Shaq Harrison" in fuzzedlist  and "Shaquille Harrison" not in unfuzzedlist and "Shaquille Harrison" in playerlist):
        unfuzzedlist.append("Shaquille Harrison")
    elif('Cam Reynolds' in fuzzedlist and 'Cameron Reynolds' not in unfuzzedlist and 'Cameron Reynolds' in playerlist):
        unfuzzedlist.append('Cameron Reynolds')
    elif('Cristiano Felicio' in fuzzedlist and 'Cristiano Da Silva Felicio' not in unfuzzedlist and 'Cristiano Da Silva Felicio' in playerlist):
        unfuzzedlist.append('Cristiano Da Silva Felicio')
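
The long `elif` chain guards against look-alike names such as the two Bogdanovics. A simplified sketch of that guard as a lookup table follows. It is not equivalent to `getfuzzymatching`: it covers only a few of the pairs, and it uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer instead of `fuzz.partial_ratio`. The names `CONFUSABLE`, `similarity`, and `accept_match` are hypothetical.

```python
from difflib import SequenceMatcher

# Subset of the look-alike names listed in the function above.
CONFUSABLE = {
    "Bogdan Bogdanovic": {"Bojan Bogdanovic"},
    "Bojan Bogdanovic": {"Bogdan Bogdanovic"},
    "Cole Anthony": {"Carmelo Anthony"},
    "Carmelo Anthony": {"Cole Anthony"},
}

def similarity(a, b):
    """Stand-in scorer on a 0-100 scale (NOT fuzz.partial_ratio)."""
    return 100 * SequenceMatcher(None, a, b).ratio()

def accept_match(source_name, candidate, threshold=80, scorer=similarity):
    """Accept a candidate only if it scores high and is not a known look-alike."""
    if candidate in CONFUSABLE.get(source_name, set()):
        return False
    return scorer(source_name, candidate) > threshold

print(accept_match("Cole Anthony", "Carmelo Anthony"))
print(accept_match("Cole Anthony", "Cole Anthony"))
```

Keeping the pairs in a table means a new look-alike is one extra entry, not two extra branches.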

In [73]:
# Will be used to remove leading and trailing whitespaces

def strip(name):
    name = name.strip()
    return name

In [74]:
# Cleans the injuries dataframe by removing entries that are not actual player names and creates a list of all players 
# announced as injured on the Fantasy Labs Twitter account since 2/14/2021 (earliest date scraped from Twitter)

def cleaninjuries(injuries):
    listt = ['per coach',"won't be",'Tyler Johnson will start if James Harden',"@TheRealNiggaaaa Assuming he's going to be held out for rest","Cady Lalanne"
             ,"Lakers intend","Spurs to","Knicks to","Blazers plan","Edmond Sumner expected to start if Malcolm Brogdon","Knicks plan",'Magic plan','Cameron Oliver'
             ,'Thunder to','John Wall will','ruled out', 'Simi Shittu','Anderson Varejao','Kings to','Magic sign','Magic intend','Bucks plan']
    
    listt2 = ['Cam Payne','Cam Reynolds','Frank Mason','LaMarcus Aldridge working','Moe Harkless','Cam Johnson','Cam Reddish','Nic Claxton','Nicolas Batum'
              ,'PJ Dozier','PJ Tucker','Patrick Patterson','Patty Mills','JJ Redick','Moe Wagner','Otto Porter Jr.','Wendell Carter', 'Al Farouq Aminu', 
              'Glenn Robinson', 'Terrance Ferguson', 'Terrence Davis', 'Kira Lewis', 'Larry Nance','Norvelle Pelle','Wes Matthews',"DeAndre' Bembry",'Gary Trent',
              'Troy Brown Jr.','Luke Doncic', 'Zach Lavine' ]
    injuries = injuries[injuries['Player'].notna()]
    injuries['Player'] = injuries['Player'].apply(strip)
    injuries = injuries.query('Player not in @listt', engine = 'python')
    injurieslist = list(injuries['Player'].unique())
    # Drop names that are matched manually later; a comprehension avoids mutating the list while iterating over it
    injurieslist = [name for name in injurieslist if name not in listt2]
    return injurieslist, injuries
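
Removing items from a list while looping over that same list skips the element that follows each removal, because the remaining items shift left. The small sketch below shows the hazard next to a comprehension-based filter. The names come from the lists above, but the example itself is illustrative.

```python
names = ["Cam Payne", "Cam Reddish", "Kevin Durant", "PJ Tucker"]
blocked = {"Cam Payne", "Cam Reddish", "PJ Tucker"}

# Hazard: removing inside the loop shifts later items left, so one is skipped.
mutated = list(names)
for name in mutated:
    if name in blocked:
        mutated.remove(name)

# A comprehension builds a new list and visits every element exactly once.
filtered = [name for name in names if name not in blocked]

print(mutated)   # "Cam Reddish" survives the in-place removal
print(filtered)
```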

In [75]:
# Used to get the matched names of the injured players in our master df, since twitter and fantasy cruncher use different 
# naming conventions

def getmatchedinjuries(players, injurieslist):
    matchedinjuries = []
    for i in range(0,13):  
        getfuzzymatching(players,injurieslist,matchedinjuries)
    matchedinjuries = list(OrderedDict.fromkeys(matchedinjuries)) 
    return matchedinjuries

In [76]:
# Adds back players that were difficult to match using the fuzzymatching function to the injuries list while adding the actual
# match to the matchedinjuries list. Also creates a lookup dataframe for incorrect and correct names that will be used later on.

def addmatchestolists(injurieslist, matchedinjuries):
    injurieslist.sort()
    matchedinjuries.sort()
    injurieslist.append('Gary Trent')
    matchedinjuries.append('Gary Trent Jr.')
    injurieslist.append('Zach Lavine')
    matchedinjuries.append('Zach LaVine')
    injurieslist.append('Luke Doncic')
    matchedinjuries.append('Luka Doncic')
    injurieslist.append('Troy Brown Jr.')
    matchedinjuries.append('Troy Brown')
    injurieslist.append('Cam Payne')
    matchedinjuries.append('Cameron Payne')
    injurieslist.append('Cameron Payne')
    matchedinjuries.append('Cameron Payne')
    injurieslist.append("DeAndre' Bembry")
    matchedinjuries.append('DeAndre Bembry')
    injurieslist.append('Wes Matthews')
    matchedinjuries.append('Wesley Matthews')
    injurieslist.append('Norvelle Pelle')
    matchedinjuries.append('Norvel Pelle')
    injurieslist.append('Larry Nance')
    matchedinjuries.append('Larry Nance Jr.')
    injurieslist.append('Cam Reynolds')
    matchedinjuries.append('Cameron Reynolds')
    injurieslist.append('Kira Lewis')
    matchedinjuries.append('Kira Lewis Jr.')
    injurieslist.append('Moe Wagner')
    matchedinjuries.append('Moritz Wagner')
    injurieslist.append('Terrance Ferguson')
    matchedinjuries.append('Terrance Ferguson')
    injurieslist.append('Terrence Davis')
    matchedinjuries.append('Terence Davis')
    injurieslist.append('Otto Porter Jr.')
    matchedinjuries.append('Otto Porter')
    injurieslist.append('Wendell Carter')
    matchedinjuries.append('Wendell Carter Jr.')
    injurieslist.append('Glenn Robinson')
    matchedinjuries.append('Glenn Robinson III')
    injurieslist.append('Al Farouq Aminu')
    matchedinjuries.append('Al-Farouq Aminu')
    injurieslist.append('Cam Reddish')
    matchedinjuries.append('Cam Reddish')
    injurieslist.append('Cam Johnson')
    matchedinjuries.append('Cameron Johnson')
    injurieslist.append('Frank Mason')
    matchedinjuries.append('Frank Mason III')
    injurieslist.append('LaMarcus Aldridge working')
    matchedinjuries.append('LaMarcus Aldridge')
    injurieslist.append('Moe Harkless')
    matchedinjuries.append('Maurice Harkless')
    injurieslist.append('Nicolas Batum')
    matchedinjuries.append('Nicolas Batum')
    injurieslist.append('Nic Claxton')
    matchedinjuries.append('Nicolas Claxton')
    injurieslist.append('PJ Dozier')
    matchedinjuries.append('PJ Dozier')
    injurieslist.append('PJ Tucker')
    matchedinjuries.append('P.J. Tucker')
    injurieslist.append('Patrick Patterson')
    matchedinjuries.append('Patrick Patterson')
    injurieslist.append('Patty Mills')
    matchedinjuries.append('Patrick Mills')
    injurieslist.append('JJ Redick')
    matchedinjuries.append('J.J. Redick')
    dfinjur = pd.DataFrame()
    dfinjur['incorrect'] = injurieslist
    dfinjur['correct'] = matchedinjuries
    return dfinjur
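
The paired `append` calls can be read as a list of (Twitter spelling, master df spelling) tuples. A sketch of that reading is below, using a few of the pairs from the function. `MANUAL_PAIRS` and `extend_with_pairs` are hypothetical names, and the lookup is only valid if the two lists are the same length and aligned position by position.

```python
# A few of the (Twitter spelling, master-df spelling) pairs from the function above.
MANUAL_PAIRS = [
    ("Gary Trent", "Gary Trent Jr."),
    ("Zach Lavine", "Zach LaVine"),
    ("Luke Doncic", "Luka Doncic"),
    ("Patty Mills", "Patrick Mills"),
]

def extend_with_pairs(incorrect, correct, pairs=MANUAL_PAIRS):
    """Append each pair to the two parallel lists and return a lookup dict."""
    for wrong, right in pairs:
        incorrect.append(wrong)
        correct.append(right)
    return dict(zip(incorrect, correct))

lookup = extend_with_pairs([], [])
print(lookup["Luke Doncic"])
```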

In [77]:
# Adds a column indicating the game number and a column indicating the team of the player to the injuries dataframe using the 
# dfinjur dataframe above to match the player name from the injuries dataset to the player name in the master df so that the
# relevant data can be queried

def addgamecolinj(injuries, dfinjur, df):
    teams = []
    games = []
    for index, row in injuries.iterrows():
        datel = row['Date']
        p =  row["Player"]
        player = dfinjur[dfinjur['incorrect'] == p].reset_index(drop = True).loc[0, 'correct']
        teamdat = df[df['Player'] == player].reset_index(drop = True)
        try:
            team = teamdat[teamdat['Date'] >= datel].reset_index(drop = True).loc[0,'Team']
            gamedat = dfteamschedule[dfteamschedule['Team'] == team].reset_index(drop = True)
            game = gamedat[gamedat['Dates'] >= datel].reset_index(drop = True).loc[0,'Game']
            teams.append(team)
            games.append(game)
        except KeyError:
            team = teamdat.iloc[-1, 4]
            gamedat = dfteamschedule[dfteamschedule['Team'] == team].reset_index(drop = True)
            game = gamedat.iloc[-1, 2]
            teams.append(team)
            games.append(game)
    injuries['Team'] = teams
    injuries['Game'] = games

In [78]:
# Creates a list of stud basketball players (identified by their usage percentage on the team - has to be 1st or 2nd ranked
# for usage). Additional exceptional players are also added (identified by subject matter expert fantasy player)

def getstuds(usage):
    studs = list(usage[usage['rank'] <= 2].loc[:,'Name'])
    studs.append('James Harden')
    studs.append('Clint Capela')
    studs.append('Kemba Walker')
    studs.append('Domantas Sabonis')
    studs.append('Jusuf Nurkic')
    studs.append('Richaun Holmes')
    studs.append('Ben Simmons')
    studs.append('Jrue Holiday')
    return studs
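
The "top two by usage on each team" rule can be sketched in plain Python. The player names and usage figures below are placeholders, and `top_usage` is a hypothetical helper, not the notebook's implementation.

```python
from collections import defaultdict

def top_usage(rows, n=2):
    """Return the names of the n highest-usage players on each team."""
    by_team = defaultdict(list)
    for name, team, usg in rows:
        by_team[team].append((usg, name))
    studs = []
    for team in sorted(by_team):
        ranked = sorted(by_team[team], reverse=True)  # highest USG first
        studs.extend(name for _, name in ranked[:n])
    return studs

# Usage figures below are made up for illustration.
rows = [("Player A", "LAL", 31.5), ("Player B", "LAL", 27.4), ("Player C", "LAL", 18.0),
        ("Player D", "BKN", 30.1), ("Player E", "BKN", 29.0)]
print(top_usage(rows))
```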

In [79]:
# Creates two starter lists to identify starters

def getstarters(depth_charts):
    teams = ['HOU','DET','BKN']
    dethoubknstarters = list(depth_charts.query('Team in @teams', engine = 'python').loc[:,'Starter'])
    otherstarters = list(depth_charts.query('Team not in @teams', engine = 'python').loc[:,'Starter'])
    backups = list(depth_charts.query('Team not in @teams', engine = 'python').loc[:,'Second'])
    starters = []
    starters2 = []
    getfuzzymatching(players,dethoubknstarters,starters)
    getfuzzymatching(players,otherstarters,starters2)
    return starters, starters2

In [80]:
# Used to convert a datestring to a date

def getdate(datestr):
    datet = datetime.strptime(datestr, '%m/%d/%Y')
    return datet
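
Parsing matters because `MM/DD/YYYY` strings do not sort chronologically: a February 2021 date compares as smaller than a December 2020 date when both are strings. A short illustration, with a hypothetical `to_date` helper that mirrors `getdate`:

```python
from datetime import datetime

def to_date(datestr):
    """Parse an 'MM/DD/YYYY' string, the format getdate expects."""
    return datetime.strptime(datestr, "%m/%d/%Y")

earlier, later = "12/25/2020", "02/14/2021"

# String comparison looks at the first character: "0" sorts before "1".
string_order_says_later_first = later < earlier
# Comparing parsed dates gives the chronological answer.
parsed_order_is_correct = to_date(earlier) < to_date(later)

print(string_order_says_later_first, parsed_order_is_correct)
```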

In [81]:
# Adds the isstarter, Game, isstud and backtoback columns to the master df. Isstarter identifies if player is a starter,
# isstud identifies if player is a stud, game identifies the game of the record using the dfteamschedule dataset created earlier,
# Backtoback identifies games where the player played the previous day

def addmorecolsdf(df, starters, starters2, studs):
    isstarter = []
    games = []
    isstud = []
    backtoback = []
    for index, row in df.iterrows():
        p = row['Player']
        datel = datetime.strptime(row['Date'], '%m/%d/%Y')
        teamdat = df[df['Player'] == p].reset_index(drop = True)
        if(p in starters or row['Player'] in starters2):
            isstarter.append(True)
        else:
            isstarter.append(False)
        try:
            team = teamdat[teamdat['Date'].apply(getdate) >= datel].reset_index(drop = True).loc[0,'Team']
            gamedat = dfteamschedule[dfteamschedule['Team'] == team].reset_index(drop = True)
            game = gamedat[gamedat['Dates'].apply(getdate) >= datel].reset_index(drop = True).loc[0,'Game']
            games.append(game)
        except KeyError:
            team = teamdat.iloc[-1, 4]
            gamedat = dfteamschedule[dfteamschedule['Team'] == team].reset_index(drop = True)
            game = gamedat.iloc[-1, 2]
            games.append(0)
        if (row['Player'] in studs):
            isstud.append(True)
        else:
            isstud.append(False)
        yesterdate = datel - timedelta(days = 1)
        yesterdate = yesterdate.date()
        d = teamdat[teamdat['Date'].apply(getdate) == str(yesterdate)]
        if(len(d) > 0):
            backtoback.append(True)
        else:
            backtoback.append(False)
    df['isstarter'] = isstarter
    df['isstud'] = isstud
    df['Game'] = games
    df['BacktoBack'] = backtoback
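
The back-to-back check reduces to asking whether the previous calendar day appears among the player's game dates. A minimal standalone sketch follows; `is_back_to_back` is a hypothetical name and the dates are illustrative.

```python
from datetime import datetime, timedelta

def is_back_to_back(game_date, played_dates):
    """True if any date in played_dates falls on the day before game_date."""
    fmt = "%m/%d/%Y"
    today = datetime.strptime(game_date, fmt).date()
    played = {datetime.strptime(d, fmt).date() for d in played_dates}
    return today - timedelta(days=1) in played

print(is_back_to_back("03/02/2021", ["02/28/2021", "03/01/2021"]))
print(is_back_to_back("03/01/2021", ["02/27/2021"]))
```

Parsing the dates once into a set avoids re-parsing the whole column for every row.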

In [83]:
# uses the game and the player name to query the injuries dataset and creates an isinjured column for the master df

def getisinjured(df):
    isinjured = []
    for index, row in df.iterrows():
        player = row['Player']
        game = row['Game']
        try:
            injdata = injuries[injuries['Player'] == player]
            gamei = len(injdata[injdata['Game']==game].loc[:,'Game'])
            if(gamei > 0):
                isinjured.append(True)
            else:
                isinjured.append(False)
        except KeyError:
            isinjured.append(False)
    df['isinjured'] = isinjured
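
The same membership test can be sketched with a set of (player, game) pairs, which turns each row's check into a single lookup. The helper names and sample rows are hypothetical. The sketch also assumes the injuries data and the master df already use the same player spelling.

```python
def build_injury_index(injury_rows):
    """Collect (player, game) pairs so each membership check is one set lookup."""
    return {(player, game) for player, game in injury_rows}

def is_injured(player, game, index):
    """True if the player was reported injured for that game number."""
    return (player, game) in index

# Illustrative rows only.
index = build_injury_index([("Kawhi Leonard", 37), ("Anthony Davis", 38)])
print(is_injured("Kawhi Leonard", 37, index))
print(is_injured("Kawhi Leonard", 38, index))
```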

In [84]:
# Creates two datasets indicating the number of injured starters and the number of injured studs for a given date

def dfinjstartandstud(teams):
    injuredstuds = []
    injuredstarters = []
    dates = []
    teamss = []
    datestuds = []
    teamstuds = []
    for t in teams:
        d = df37injuredstarters[df37injuredstarters['Team'] == t]
        s = df37injuredstuds[df37injuredstuds['Team'] == t]
        for index, row in d.iterrows():
            dates.append(row['Date'])
            d2 = d[d['Date'] == row['Date']]
            injuredstarters.append(len(d2))
            teamss.append(row['Team'])
        for index, row in s.iterrows():
            datestuds.append(row['Date'])
            s2 = s[s['Date'] == row['Date']]
            injuredstuds.append(len(s2))
            teamstuds.append(row['Team'])
    dfinjstart = pd.DataFrame()
    dfinjstart['Team'] = teamss
    dfinjstart['InjuredStarters'] = injuredstarters
    dfinjstart['Dates'] = dates
    dfinjstart = dfinjstart.drop_duplicates()
    dfinjstud = pd.DataFrame()
    dfinjstud['Team'] = teamstuds
    dfinjstud['Injuredstuds'] = injuredstuds
    dfinjstud['Dates'] = datestuds
    dfinjstud = dfinjstud.drop_duplicates()
    return dfinjstud, dfinjstart
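
Counting injured starters or studs per team and date is a grouped count. A sketch with `collections.Counter` is shown below, where each input row stands for one player with zero projected minutes. The rows are illustrative and `count_out` is a hypothetical name.

```python
from collections import Counter

def count_out(rows):
    """Count rows per (team, date); each row is one player projected for 0 minutes."""
    return Counter((team, date) for team, date in rows)

rows = [("LAC", "02/14/2021"), ("LAC", "02/14/2021"), ("POR", "02/14/2021")]
counts = count_out(rows)
print(counts[("LAC", "02/14/2021")])
print(counts[("LAL", "02/14/2021")])  # a missing key counts as 0
```

A `Counter` returns 0 for team/date pairs that never appear, which matches the default of 0 used when the lookup finds no row.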

In [85]:
''' 
Uses the dfinjstud and dfinjstart datasets above to add columns to the filtered dataset (df37) with the number of 
injured starters and injured studs for each game. Also adds the player's median fantasy points and minutes over the 
previous 5 games (if the player has no games in that window, it falls back to their last 5 recorded values).
Adds the player's median fantasy points against the same opponent on other dates (if none, it falls back to the median 
of the previous 3 games, then to the last 3 recorded scores).
Adds the player's overall average minutes and fantasy points (for the season), derives a points-per-minute (PPM) 
column, and multiplies PPM by the projected minutes to get a PPM projection for the game. Finally, adds the average 
pace of the player's team and of the opponent from the Pace dataset. 
'''

def getinjstartsandstuds(df37):
    startinj = []
    studinj = []
    avgfps = []
    avgmins = []
    avgfpsopp = []
    ovravg = []
    ovrmin = []
    avgteampace = []
    avgopppace = []
    for index, row in df37.iterrows():
        t = row['Team']
        dat = row['Date']
        try:
            data = dfinjstart.loc[(dfinjstart['Team'] == t) & (dfinjstart['Dates'] == dat)].reset_index(drop = True).loc[0,'InjuredStarters']
            startinj.append(data)
        except KeyError:
            startinj.append(0)
        try:
            data2 = dfinjstud.loc[(dfinjstud['Team'] == t) & (dfinjstud['Dates'] == dat)].reset_index(drop = True).loc[0,'Injuredstuds']
            studinj.append(data2)
        except KeyError:
            studinj.append(0)
        p = row['Player']
        gamelower = row['Game'] - 5
        game = row['Game']
        data = df[df['Player'] == p]
        data2 = data.loc[(data['Game'] >= gamelower) & (data['Game'] < game)]
        avgfp = data2.loc[:,'Score'].median()
        avgmin = data2.loc[:,'Mins'].median()
        if(math.isnan(avgfp) == True):
            avgfp = data['Score'].dropna()[-5:].median()
            if(math.isnan(avgfp) == True):
                avgfp = 0
        if(math.isnan(avgmin) == True):
            avgmin = data['Mins'].dropna()[-5:].median()
            if(math.isnan(avgmin) == True):
                avgmin = 0
        avgfps.append(avgfp)
        avgmins.append(avgmin)
        opp = row['Opp'].split(' ')[1]
        oppdata = data.query('Opp.str.contains(@opp) and Date != @dat', engine = 'python')
        oppdata = oppdata.loc[:,'Score']
        avgagainstoppp = oppdata.median()
        if(math.isnan(avgagainstoppp) == True):
            gamelower = row['Game'] - 3
            data2 = data.loc[(data['Game'] >= gamelower) & (data['Game'] < game)]
            avgagainstoppp = data2.loc[:,'Score'].median()
        if(math.isnan(avgagainstoppp) == True):
            avgagainstoppp = data['Score'].dropna()[-3:].median()
            if(math.isnan(avgagainstoppp) == True):
               avgagainstoppp = 0
        avgfpsopp.append(avgagainstoppp)
        avg = data.loc[:,'Score'].mean()
        avg2 = data.loc[:,'Mins'].mean()
        if(math.isnan(avg)):
            ovravg.append(0)
        else:
            ovravg.append(avg)
        if(math.isnan(avg2)):
            ovrmin.append(0)
        else:
            ovrmin.append(avg2)
        teamd = Pace[Pace.ABR == t].reset_index(drop = True)
        teampace = teamd.loc[0,'Pace']
        oppd = Pace[Pace.ABR == opp].reset_index(drop = True)
        opppace = oppd.loc[0,'Pace']
        avgteampace.append(teampace)
        avgopppace.append(opppace)
    df37['InjuredStarters'] = startinj
    df37['InjuredStuds'] = studinj  
    df37['AveragePointsLast5Games'] = avgfps
    df37['AverageMinsLast5Games'] = avgmins
    df37['Averagefantasypointagainstopp'] = avgfpsopp
    df37['ovravg'] = ovravg
    df37['ovrmin'] = ovrmin
    df37['PPM'] = df37['ovravg'] / df37['ovrmin']
    df37['PPM proj'] = df37['PPM'] * df37['Proj Mins']
    df37['Averageteampace'] = avgteampace
    df37['Averageopppace'] = avgopppace
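
Two pieces of the function above are easy to check in isolation: the recent-form median with its fallback, and the PPM projection. The sketch below uses plain Python with hypothetical helper names. The zero-minute guard in `ppm_projection` is an added assumption and is not present in the notebook's column arithmetic.

```python
from statistics import median

def recent_median(history, game, window=5):
    """Median score over the `window` games before `game`, with a fallback.

    `history` is a list of (game_number, score) pairs; score may be None.
    """
    recent = [s for g, s in history if game - window <= g < game and s is not None]
    if recent:
        return median(recent)
    # Fallback: the last `window` recorded scores, or 0 if there are none.
    played = [s for _, s in history if s is not None]
    return median(played[-window:]) if played else 0

def ppm_projection(avg_points, avg_mins, proj_mins):
    """Points-per-minute projection; returns 0 when no minutes are on record."""
    if not avg_mins:
        return 0.0
    return avg_points / avg_mins * proj_mins

history = [(30, 40.0), (31, None), (37, 50.0), (38, 60.0), (39, 55.0)]
print(recent_median(history, 40))
print(recent_median(history, 50))
print(ppm_projection(54.808333, 34.783333, 36.5))
```

The last call uses the `ovravg`, `ovrmin`, and `Proj Mins` values shown for Luka Doncic in the output further down and lands on roughly the `PPM proj` value listed there.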

# Load lists and columns

In [86]:
players, teams = dostuff(df)                                       # Creates player and team list
getoppgcol(df)                                                     # Gets oppg column
dfteamschedule = getteamscheduledf(teams)                          # Loads team schedule dataset
injurieslist, injuries = cleaninjuries(injuries)                   # Gets injury list and cleans injuries dataframe
matchedinjuries = getmatchedinjuries(players, injurieslist)        # Creates a matched injuries list using fuzzy matching
dfinjur = addmatchestolists(injurieslist, matchedinjuries)         # Creates lookup dataframe for incorrectly formatted names
addgamecolinj(injuries, dfinjur, df)                               # Adds game column to injuries dataset
starters, starters2 = getstarters(depth_charts)                    # Creates two starter lists
studs = getstuds(usage)                                            # Creates studs list
addmorecolsdf(df, starters, starters2, studs)                      # Adds all columns specified in function to master df
getisinjured(df)                                                   # Adds isinjured column to master df

Filter the master df down to dates on or after Feb 14th, 2021 (the earliest injury data captured from Twitter), which is roughly game 37 for most teams.

In [87]:
#games 37 and above (earliest injury data)
df37 = df[df['Date'].apply(getdate) >= datetime.strptime('02/14/2021', '%m/%d/%Y')]    
df37starters = df37[df37['isstarter'] == True]                      # Creates starter df
df37injuredstarters = df37starters[df37starters['Proj Mins'] == 0]  # Creates injured starters df
df37studs = df37[df37['isstud'] == True]                            # Creates studs df
df37injuredstuds = df37studs[df37studs['Proj Mins'] == 0]           # Creates injured studs df
dfinjstud, dfinjstart = dfinjstartandstud(teams)                    # Loads the two dfs for inj starters and inj studs
getinjstartsandstuds(df37)                                          # Creates columns showing number of inj starters and studs and others

# Creating features and targets

In [88]:
new_df =  df37[['Player','Team','VegasPts','Salary','Floor','Ceiling','STDV', 'USG','Proj Mins','Con.','home_or_away','DVP','oppg','isstarter','isstud'
               ,'isinjured','BacktoBack','InjuredStarters','InjuredStuds','AveragePointsLast5Games','AverageMinsLast5Games','Averagefantasypointagainstopp'
               ,'ovravg','ovrmin','PPM proj','Averageteampace','Averageopppace', 'Score']] 
  
new_df = new_df.drop('Team', axis = 1)
final = pd.get_dummies(new_df, columns = ['home_or_away','isstarter','isstud','isinjured','BacktoBack'], drop_first = True)
Xdf = final.drop(['Player','Score'], axis = 1).fillna(0) #features
ydf = final[['Score']].fillna(0)                         #targets
Xdf.shape                                                #check shape of features

(20375, 25)
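
The effect of `drop_first = True` is easiest to see on a toy frame: each two-valued column collapses to a single indicator, which is why five encoded columns add no net columns to the feature matrix. The frame below is made up for illustration, and the example assumes pandas is installed.

```python
import pandas as pd

# Toy data with one numeric column and two two-valued columns.
toy = pd.DataFrame({
    "Salary": [10000, 3000, 5200],
    "home_or_away": ["home", "away", "home"],
    "isstarter": [True, False, True],
})

# drop_first=True drops the first category of each column ('away' and False).
encoded = pd.get_dummies(toy, columns=["home_or_away", "isstarter"], drop_first=True)
print(list(encoded.columns))
```

The resulting names follow the same `column_value` pattern as `home_or_away_home` and `isstarter_True` in the notebook output.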

In [89]:
final.iloc[:10, :13] # Look at first 13 columns of final

Unnamed: 0,Player,VegasPts,Salary,Floor,Ceiling,STDV,USG,Proj Mins,Con.,DVP,oppg,InjuredStarters,InjuredStuds
12479,Luka Doncic,120.25,11100,46.4,86.1,15.85,36.1,36.5,72,18,115.0,0,0
12480,Giannis Antetokounmpo,120.25,10900,45.8,76.1,12.13,33.2,33.75,78,10,118.4,1,1
12481,Nikola Jokic,108.0,10700,40.4,72.8,12.94,28.2,37.0,78,24,105.5,2,0
12482,LeBron James,111.0,10200,41.5,66.9,10.17,31.5,35.75,80,15,110.6,0,0
12483,Nikola Vucevic,100.5,9800,35.8,66.3,12.21,28.5,36.75,75,21,107.3,2,0
12484,Kawhi Leonard,114.25,9700,0.0,15.0,9.98,30.3,0.0,79,11,109.6,2,2
12485,Damian Lillard,115.25,9600,41.7,70.0,11.34,31.4,36.5,77,23,110.9,2,2
12486,Karl-Anthony Towns,110.75,9400,41.2,55.2,5.59,26.2,33.0,88,22,109.1,1,1
12487,Anthony Davis,111.0,9300,41.8,65.4,9.45,27.4,34.25,79,11,110.6,0,0
12488,De'Aaron Fox,117.0,9000,36.3,75.9,15.87,30.6,36.25,61,4,113.9,1,0


In [90]:
final.iloc[:10, 13:] # Look at the remaining 14 columns of final

Unnamed: 0,AveragePointsLast5Games,AverageMinsLast5Games,Averagefantasypointagainstopp,ovravg,ovrmin,PPM proj,Averageteampace,Averageopppace,Score,home_or_away_home,isstarter_True,isstud_True,isinjured_True,BacktoBack_True
12479,64.5,37.0,59.0,54.808333,34.783333,57.513297,99.4,100.9,66.25,1,1,1,0,0
12480,64.5,34.0,73.25,55.236364,32.690909,57.025862,104.3,103.2,74.75,0,1,1,0,0
12481,57.25,37.0,45.375,57.727273,35.045455,60.946822,99.3,100.7,63.5,1,1,1,0,0
12482,61.0,41.0,61.0,50.869048,34.095238,53.337902,100.7,99.3,51.0,0,1,1,0,0
12483,61.75,37.0,57.0,48.996094,33.484375,53.774528,100.8,99.1,33.5,0,1,1,0,0
12484,49.5,35.0,37.5,45.21875,34.145833,0.0,99.0,99.5,,1,1,1,0,0
12485,48.125,35.0,36.75,48.472222,35.714286,49.538611,100.9,99.4,57.75,0,1,1,0,0
12486,42.0,31.5,44.75,49.366667,33.688889,48.35719,104.2,101.6,45.25,0,1,1,0,0
12487,45.75,35.0,37.75,42.44697,31.848485,45.647657,100.7,99.3,21.5,0,1,0,0,0
12488,51.375,38.0,57.0,44.40625,35.107143,45.851825,102.2,102.7,43.0,1,1,1,0,0


# Train, Validation, and Test Split

In [91]:
import matplotlib.pyplot as plt
import seaborn as sns

X = Xdf.values
y = ydf.values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state =52)    # Keep 70% for training, hold out 30%

X_test, X_val, y_test, y_val = train_test_split(X_test, y_test, test_size=0.5, random_state=52) # Split the held-out 30% evenly into test and validation sets
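
The two calls above give roughly a 70/15/15 split. The sketch below works out the approximate set sizes for the 20,375 rows in `Xdf`. It assumes that scikit-learn rounds the second (held-out) part up when `test_size` is a fraction; `split_sizes` is a hypothetical helper used only for this arithmetic.

```python
import math

def split_sizes(n_samples, test_size):
    """Sizes of the (first, second) parts of a split.

    Assumption: the second part is rounded up, which is how
    train_test_split is understood to treat a fractional test_size.
    """
    n_second = math.ceil(test_size * n_samples)
    return n_samples - n_second, n_second

n_rows = 20375                                   # rows in Xdf, from Xdf.shape
n_train, n_holdout = split_sizes(n_rows, 0.3)    # first call: train vs. held-out
n_test, n_val = split_sizes(n_holdout, 0.5)      # second call: test vs. validation

print(n_train, n_val, n_test)
print(round(n_train / n_rows, 2), round(n_val / n_rows, 2), round(n_test / n_rows, 2))
```

Note that the second call returns the test set first and the validation set second, matching the order `X_test, X_val` in the cell above.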

# Create and train NN Regression Algorithm

In [92]:
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.metrics import mean_absolute_error, mean_squared_error, explained_variance_score
from tensorflow.keras.models import save_model, load_model
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

# Create a save point for model's best weights and create early stopping mechanism for training to prevent overfitting
filepath = "weights.best.hdf5"
keras_callbacks   = [
      EarlyStopping(monitor='val_loss', patience=10, mode='min', min_delta=0.0001),
      ModelCheckpoint(filepath, monitor='val_loss', verbose = 1, save_best_only=True, mode='min')
]


# Scale the features
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_val = scaler.transform(X_val)

# Create the Model
model = Sequential()
# there are 25 input features; the hidden layer uses 26 ReLU units (a design choice; Dense adds a bias term to each unit automatically)
model.add(Dense(26,activation = 'relu'))
# final output fantasy score prediction
model.add(Dense(1))
# use mean squared error and Adam to minimize the cost function
model.compile(optimizer='adam', loss='mse')
# fit the model on the training set and validate on the validation set
model.fit(x=X_train, y=y_train,
          validation_data=(X_val, y_val),
          batch_size=1,                # batch size of 1: weights are updated after every training sample (chosen because the training set is small)
          epochs=50,                   # maximum number of epochs
          callbacks=keras_callbacks,   # early stopping and best-weights checkpoint to prevent overfitting
          verbose=1)
# After each epoch Keras checks the loss on the validation data, so we can see how well the
# model does not only on the training data but also on data it was not trained on.

model.load_weights(filepath)    # Loads the best weights that were saved to the filepath
predictions = model.predict(X_test) # Creates predictions for the test set

print(mean_absolute_error(y_test,predictions)) # Prints mean absolute error for test set and predictions

print(np.sqrt(mean_squared_error(y_test,predictions))) # Prints root mean squared error (Penalizes larger errors)

print(explained_variance_score(y_test, predictions)) # Prints explained variance between actual scores and predictions (equals R2 when the errors average out to zero)

Epoch 1/50
Epoch 00001: val_loss improved from inf to 54.66850, saving model to weights.best.hdf5
Epoch 2/50
Epoch 00002: val_loss improved from 54.66850 to 53.72157, saving model to weights.best.hdf5
Epoch 3/50
Epoch 00003: val_loss improved from 53.72157 to 53.68359, saving model to weights.best.hdf5
Epoch 4/50
Epoch 00004: val_loss improved from 53.68359 to 52.82522, saving model to weights.best.hdf5
Epoch 5/50
Epoch 00005: val_loss did not improve from 52.82522
Epoch 6/50
Epoch 00006: val_loss did not improve from 52.82522
Epoch 7/50
Epoch 00007: val_loss improved from 52.82522 to 52.54840, saving model to weights.best.hdf5
Epoch 8/50
Epoch 00008: val_loss did not improve from 52.54840
Epoch 9/50
Epoch 00009: val_loss did not improve from 52.54840
Epoch 10/50
Epoch 00010: val_loss did not improve from 52.54840
Epoch 11/50
Epoch 00011: val_loss did not improve from 52.54840
Epoch 12/50
Epoch 00012: val_loss did not improve from 52.54840
Epoch 13/50
Epoch 00013: val_loss did not impr

Epoch 00035: val_loss did not improve from 52.23431
4.9701683993075845
7.361444719829239
0.7882382622704388
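
The three numbers printed above are the mean absolute error, the root mean squared error, and the explained variance on the test set. As a minimal sketch of how each is defined, the plain-Python functions below compute them on a tiny set of hypothetical scores. The function names `mae`, `rmse`, `explained_variance`, and `r2` are illustrative only and are not the scikit-learn implementations.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])

def mae(y_true, y_pred):
    # average size of the error, in fantasy points
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])

def rmse(y_true, y_pred):
    # squaring before averaging makes large misses count more
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))

def explained_variance(y_true, y_pred):
    # 1 - Var(errors) / Var(actual); ignores a constant bias in the errors
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - variance(residuals) / variance(y_true)

def r2(y_true, y_pred):
    # 1 - SS_res / SS_tot; a constant bias in the errors lowers this score
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical actual and predicted fantasy scores, not taken from the test set
actual = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 52.0, 70.0]

print(mae(actual, predicted))
print(rmse(actual, predicted))
print(explained_variance(actual, predicted))
print(r2(actual, predicted))
```

In this toy example the predictions run high on average, so the explained variance comes out above R2. The two scores coincide only when the errors average out to zero.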


The model had a mean absolute error of 4.97 on the test set, so on average the predictions were off by about 5 fantasy points. The root mean squared error, which penalizes larger errors more heavily, was 7.36 fantasy points. Explained variance was 0.79, or almost 80% (not bad!).
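
Training stopped at epoch 35 instead of running all 50 epochs because of the `EarlyStopping` callback with `patience=10`. The sketch below is a simplified version of that patience rule, not the Keras source: it assumes an epoch counts as an improvement only when the validation loss drops by more than `min_delta`, and that training stops after `patience` epochs in a row without one. The function `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0001):
    """Return (epoch at which training stops, best validation loss seen).

    Simplified sketch of a patience rule, assuming mode='min'.
    """
    best = float("inf")
    wait = 0                                   # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:            # improved by more than min_delta
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:               # patience used up: stop here
                return epoch, best
    return len(val_losses), best               # never triggered: ran every epoch

# Hypothetical validation losses: four improving epochs, then a plateau
losses = [54.67, 53.72, 53.68, 52.83] + [52.90] * 12
stop_epoch, best_loss = early_stopping_epoch(losses)
print(stop_epoch, best_loss)
```

Because `ModelCheckpoint` saves only the best weights, calling `model.load_weights(filepath)` after training restores the weights from the best epoch rather than from the last one.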

# Save the model and the scaler to use on new datasets

In [94]:
filepath = 'C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\Python Programs\\test model3\\'
save_model(model, filepath)

with open('C:\\Users\\v-mpurvis\\OneDrive\\Personal Files\\Fantasy Basketball\\Python Programs\\test model3\\scaler','wb') as file:
    pickle.dump(scaler, file)

Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: C:\Users\v-mpurvis\OneDrive\Personal Files\Fantasy Basketball\Python Programs\test model3\assets
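
The scaler is saved along with the model because new data has to be scaled with the minimum and maximum learned from the training set, not refit on the new data. The plain-Python sketch below illustrates that idea. `fit_min_max` and `transform_min_max` are hypothetical helpers that mimic min-max scaling, and the feature values are made up.

```python
def fit_min_max(rows):
    """Learn the per-column minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def transform_min_max(rows, mins, maxs):
    """Scale rows with previously learned minimums and maximums.

    Sketch only: assumes every column has max > min in the training data.
    """
    return [[(v - lo) / (hi - lo) for v, lo, hi in zip(row, mins, maxs)]
            for row in rows]

# Hypothetical training rows: [Salary, Proj Mins]
train_rows = [[3000, 10.0], [7000, 30.0], [11000, 40.0]]
mins, maxs = fit_min_max(train_rows)

# Hypothetical new rows, scaled with the training minimums and maximums
new_rows = [[9000, 25.0], [12000, 40.0]]
scaled = transform_min_max(new_rows, mins, maxs)
print(scaled)
```

A new value above the training maximum, such as the 12000 salary here, scales to more than 1. That is expected when the training statistics are reused.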
