DraftKings NFL Constraint Satisfaction
===

This is the companion code to a [blog post](https://zwlevonian.medium.com/integer-linear-programming-with-pulp-optimizing-a-draftkings-nfl-lineup-5e7524dd42d3) I wrote on Medium.

In [16]:
import pandas as pd

In [17]:
import pulp

### Load in the weekly data

In [18]:
df = pd.read_csv('DKSalaries.csv')
len(df)

430

In [19]:
df.sample(n=5)

| | Position | Name + ID | Name | ID | Roster Position | Salary | Game Info | TeamAbbrev | AvgPointsPerGame |
|---|---|---|---|---|---|---|---|---|---|
| 63 | RB | Melvin Gordon III (15713519) | Melvin Gordon III | 15713519 | RB/FLEX | 5300 | DEN@ATL 11/08/2020 01:00PM ET | DEN | 15.72 |
| 325 | WR | Tony Brown (15713863) | Tony Brown | 15713863 | WR/FLEX | 3000 | NYG@WAS 11/08/2020 01:00PM ET | WAS | 0.0 |
| 395 | TE | Josh Oliver (15714061) | Josh Oliver | 15714061 | TE/FLEX | 2500 | HOU@JAX 11/08/2020 01:00PM ET | JAX | 0.0 |
| 213 | RB | Senorise Perry (15713581) | Senorise Perry | 15713581 | RB/FLEX | 4000 | CHI@TEN 11/08/2020 01:00PM ET | TEN | 0.45 |
| 169 | RB | Qadree Ollison (15713661) | Qadree Ollison | 15713661 | RB/FLEX | 4000 | DEN@ATL 11/08/2020 01:00PM ET | ATL | 0.3 |


In [20]:
# trim any postponed games, since those can't be included in a lineup
df = df[df['Game Info'] != 'Postponed']
len(df)

430

In [21]:
# manually exclude specific players from consideration
exclude_list = ['Dak Prescott']
df = df[~df['Name'].isin(exclude_list)]
len(df)

429

In [22]:
# keep only players with a salary of at least 4000; this is equivalent to adding
# a minimum-salary constraint on every selected player
# DST is exempt, since defenses are priced differently from individual players
df = df[(df.Salary >= 4000) | (df['Roster Position'] == 'DST')]
len(df)

249
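Written as a constraint on a 0-1 program, with $x_i \in \{0, 1\}$ indicating whether player $i$ is selected and $s_i$ their Salary (notation introduced here for illustration), the pre-filter above amounts to fixing:

$$
x_i = 0 \quad \text{for every player } i \text{ with } s_i < 4000 \text{ and Roster Position} \neq \text{DST}.
$$

Dropping those rows up front, rather than adding the constraint to the model, keeps the problem smaller (249 variables instead of 429) without changing which lineups are feasible.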

### Create the constraint problem

Goal: maximize the total AvgPointsPerGame of the selected players, subject to:

 - TotalPlayers = 9
 - TotalSalary <= 50000
 - TotalPosition_WR = 3
 - TotalPosition_RB = 2
 - TotalPosition_TE = 1
 - TotalPosition_QB = 1
 - TotalPosition_FLEX = 1 (filled by an RB, WR, or TE)
 - Each player fills at most one roster slot (relevant only for FLEX, since RB, WR, and TE players are also FLEX-eligible)
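The list above can be written as a 0-1 integer linear program. In this sketch, $x_i \in \{0, 1\}$ indicates whether player $i$ is selected, $p_i$ is their AvgPointsPerGame, $s_i$ their Salary, and $P_{\text{pos}}$ the set of players whose Roster Position includes that position; these symbols are introduced here for illustration. The FLEX slot is folded into lower bounds on RB, WR, and TE, the same approach the position-constraint cell takes:

$$
\begin{aligned}
\max_{x}\quad & \sum_i p_i x_i \\
\text{s.t.}\quad & \sum_i x_i = 9, \qquad \sum_i s_i x_i \le 50000, \\
& \sum_{i \in P_{\text{QB}}} x_i = 1, \qquad \sum_{i \in P_{\text{DST}}} x_i = 1, \\
& \sum_{i \in P_{\text{RB}}} x_i \ge 2, \qquad \sum_{i \in P_{\text{WR}}} x_i \ge 3, \qquad \sum_{i \in P_{\text{TE}}} x_i \ge 1, \\
& x_i \in \{0, 1\} \quad \text{for all } i.
\end{aligned}
$$

Because each player has a single binary variable, no player can be selected twice, which covers the last bullet without a separate constraint.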
 

In [23]:
prob = pulp.LpProblem('DK_NFL_weekly', pulp.LpMaximize)

In [24]:
# one binary decision variable per player: 1 if selected, 0 otherwise
player_vars = [pulp.LpVariable(f'player_{row.ID}', cat='Binary') for row in df.itertuples()]

In [25]:
# total assigned players constraint
prob += pulp.lpSum(player_vars) == 9

In [26]:
# position constraints
# roster positions look like 'RB/FLEX' or 'QB', so check membership
# against the '/'-separated parts rather than with a raw substring test
def get_position_sum(player_vars, df, position):
    return pulp.lpSum(
        player_vars[i] * int(position in df['Roster Position'].iloc[i].split('/'))
        for i in range(len(df))
    )

prob += get_position_sum(player_vars, df, 'QB') == 1
prob += get_position_sum(player_vars, df, 'DST') == 1

# FLEX can be an RB, WR, or TE, so those three get lower bounds instead of equalities;
# combined with the 9-player total, this leaves room for exactly one extra RB/WR/TE
prob += get_position_sum(player_vars, df, 'RB') >= 2
prob += get_position_sum(player_vars, df, 'WR') >= 3
prob += get_position_sum(player_vars, df, 'TE') >= 1
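Why lower bounds are enough to encode the FLEX slot: with 9 players in total and exactly one QB and one DST, the RB, WR, and TE counts must sum to 7, and the minimums 2 + 3 + 1 = 6 leave exactly one extra player. The stdlib sketch below (not part of the original notebook) enumerates every (RB, WR, TE) count the constraints allow, assuming each non-QB, non-DST player counts toward exactly one of RB, WR, or TE.

```python
from itertools import product

TOTAL_PLAYERS = 9
FIXED = {"QB": 1, "DST": 1}             # equality constraints
MINIMUMS = {"RB": 2, "WR": 3, "TE": 1}  # lower-bound constraints


def feasible_flex_counts():
    """Return every (RB, WR, TE) count combination the constraints allow."""
    remaining = TOTAL_PLAYERS - sum(FIXED.values())  # slots left for RB/WR/TE
    combos = []
    for rb, wr, te in product(range(remaining + 1), repeat=3):
        if (rb + wr + te == remaining
                and rb >= MINIMUMS["RB"]
                and wr >= MINIMUMS["WR"]
                and te >= MINIMUMS["TE"]):
            combos.append((rb, wr, te))
    return combos


print(feasible_flex_counts())  # [(2, 3, 2), (2, 4, 1), (3, 3, 1)]
```

The three results correspond to a TE, WR, or RB in the FLEX slot. The lineup printed at the end of this notebook has 2 RBs, 4 WRs, and 1 TE, which is the (2, 4, 1) case.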

In [27]:
# total salary constraint
prob += pulp.lpSum(df.Salary.iloc[i] * player_vars[i] for i in range(len(df))) <= 50000

In [28]:
# finally, specify the goal
prob += pulp.lpSum([df.AvgPointsPerGame.iloc[i] * player_vars[i] for i in range(len(df))])

In [29]:
# solve and print the status
prob.solve()
print(pulp.LpStatus[prob.status])

Optimal
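To build intuition for what "Optimal" means, here is a brute-force sketch on a hypothetical seven-player pool with a scaled-down roster: exactly 1 QB, at least 1 RB and at least 1 WR, 4 players in total (so one FLEX that can be an RB or WR), and a cap of 24000. All names, salaries, and the roster shape are made up for illustration. Exhaustive enumeration is exponential, so this only works for toy sizes; the 249-player pool above has on the order of 10^16 nine-player subsets, which is why the notebook hands the problem to PuLP's solver instead.

```python
from itertools import combinations

# Hypothetical toy pool: (name, position, salary, projected points).
POOL = [
    ("QB_A", "QB", 7000, 20.0),
    ("QB_B", "QB", 5000, 15.0),
    ("RB_A", "RB", 8000, 22.0),
    ("RB_B", "RB", 4000, 10.0),
    ("WR_A", "WR", 7000, 18.0),
    ("WR_B", "WR", 4500, 12.0),
    ("WR_C", "WR", 4000, 9.0),
]

ROSTER_SIZE = 4                # 1 QB + 1 RB + 1 WR + 1 FLEX (RB or WR)
SALARY_CAP = 24000
EXACT = {"QB": 1}              # equality constraints
AT_LEAST = {"RB": 1, "WR": 1}  # lower bounds; the slack is the FLEX slot


def is_feasible(lineup):
    """Check a candidate lineup against the same kinds of constraints as the ILP."""
    if sum(player[2] for player in lineup) > SALARY_CAP:
        return False
    counts = {}
    for _, pos, _, _ in lineup:
        counts[pos] = counts.get(pos, 0) + 1
    if any(counts.get(pos, 0) != n for pos, n in EXACT.items()):
        return False
    return all(counts.get(pos, 0) >= n for pos, n in AT_LEAST.items())


def brute_force_best():
    """Enumerate every ROSTER_SIZE-subset and keep the highest-scoring feasible one."""
    feasible = [c for c in combinations(POOL, ROSTER_SIZE) if is_feasible(c)]
    best = max(feasible, key=lambda c: sum(player[3] for player in c))
    names = [player[0] for player in best]
    return names, sum(player[2] for player in best), sum(player[3] for player in best)


print(brute_force_best())  # (['QB_B', 'RB_A', 'RB_B', 'WR_A'], 24000, 65.0)
```

Note that the best lineup uses the cheaper QB so it can afford the top RB and WR, the kind of trade-off the solver makes across the full pool.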


In [30]:
# print each selected player, then total the salary used and the projected points
total_salary_used = 0
total_AvgPointsPerGame = 0
for i in range(len(df)):
    # solver values are floats, so compare against 0.5 rather than testing == 1
    if player_vars[i].value() > 0.5:
        row = df.iloc[i]
        print(row['Roster Position'], row.Name, row.TeamAbbrev, row.Salary, row.AvgPointsPerGame)
        total_salary_used += row.Salary
        total_AvgPointsPerGame += row.AvgPointsPerGame
# divide total_AvgPointsPerGame by 9 to get the per-player mean
total_salary_used, total_AvgPointsPerGame

RB/FLEX Dalvin Cook MIN 8200 28.65
QB Russell Wilson SEA 7600 32.01
WR/FLEX Tyler Lockett SEA 6800 22.07
WR/FLEX Corey Davis TEN 5900 17.98
RB/FLEX Melvin Gordon III DEN 5300 15.72
WR/FLEX CeeDee Lamb DAL 4900 14.21
TE/FLEX Hunter Henry LAC 4000 9.63
WR/FLEX Keelan Cole JAX 4000 12.37
DST Colts  IND 3300 11.71


(50000, 164.35)
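As an independent sanity check on the solver's answer, the lineup printed above can be run through a small rule validator. This is a stdlib sketch, not part of the original notebook; the `validate` helper and its rule tables are hypothetical and simply restate the constraints listed earlier.

```python
from collections import Counter

# The lineup printed above: (roster position, name, team, salary, AvgPointsPerGame).
LINEUP = [
    ("RB/FLEX", "Dalvin Cook", "MIN", 8200, 28.65),
    ("QB", "Russell Wilson", "SEA", 7600, 32.01),
    ("WR/FLEX", "Tyler Lockett", "SEA", 6800, 22.07),
    ("WR/FLEX", "Corey Davis", "TEN", 5900, 17.98),
    ("RB/FLEX", "Melvin Gordon III", "DEN", 5300, 15.72),
    ("WR/FLEX", "CeeDee Lamb", "DAL", 4900, 14.21),
    ("TE/FLEX", "Hunter Henry", "LAC", 4000, 9.63),
    ("WR/FLEX", "Keelan Cole", "JAX", 4000, 12.37),
    ("DST", "Colts", "IND", 3300, 11.71),
]

SALARY_CAP = 50000
ROSTER_SIZE = 9
EXACT = {"QB": 1, "DST": 1}
AT_LEAST = {"RB": 2, "WR": 3, "TE": 1}  # lower bounds; the extra player is the FLEX


def primary_position(roster_position):
    """Map 'WR/FLEX' -> 'WR' and 'QB' -> 'QB' by splitting on '/'."""
    return roster_position.split("/")[0]


def validate(lineup):
    """Return a list of rule violations; an empty list means the lineup is legal."""
    problems = []
    if len(lineup) != ROSTER_SIZE:
        problems.append(f"expected {ROSTER_SIZE} players, got {len(lineup)}")
    names = [player[1] for player in lineup]
    if len(set(names)) != len(names):
        problems.append("a player appears more than once")
    salary = sum(player[3] for player in lineup)
    if salary > SALARY_CAP:
        problems.append(f"salary {salary} exceeds cap {SALARY_CAP}")
    counts = Counter(primary_position(player[0]) for player in lineup)
    for pos, n in EXACT.items():
        if counts[pos] != n:
            problems.append(f"need exactly {n} {pos}, got {counts[pos]}")
    for pos, n in AT_LEAST.items():
        if counts[pos] < n:
            problems.append(f"need at least {n} {pos}, got {counts[pos]}")
    return problems


print(validate(LINEUP))  # []
print(sum(p[3] for p in LINEUP), round(sum(p[4] for p in LINEUP), 2))  # 50000 164.35
```

Dropping the DST (for example `validate(LINEUP[:-1])`) reports both the short roster and the missing defense, so the check is not trivially passing.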