In [1]:
import csv 
from pulp import *

\section{Fantasy Football}

In fantasy football, every participant assembles a team that consists of
    
\begin{itemize}
\item 1 $\times$ Quarterback
\item 1 $\times$ Tight end
\item 2 $\times$ Running backs
\item 3 $\times$ Wide receivers
\item 1 $\times$ Defense \& Special teams
\item 1 $\times$ Flex
\end{itemize}

Every position is awarded points through a pre-defined scoring system,
e.g. for rushing yards, touchdowns, etc.

Every player has a salary (cost).
When picking a team, the total salary must not exceed the salary cap.



\section{The Data}

In this project, we consider fantasy football hosted by https://www.draftkings.co.uk/.
The salary cap here is $50{,}000$, and the player data (such as salary and position) can be downloaded from the site.



In [2]:
with open('DKSalaries.csv', 'r', newline='') as f:
    reader = list(csv.reader(f))
reader

[['QB', 'RB', 'RB', 'WR', 'WR', 'WR', 'TE', 'FLEX', 'DST', '', 'Instructions'],
 ['',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '1. Locate the player you want to select in the list below '],
 ['',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '2. Copy the ID of your player (you can use the Name + ID column or the ID column) '],
 ['',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '3. Paste the ID into the roster position desired '],
 ['',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  "4. You must include an ID for each player; you cannot use just the player's name "],
 ['',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '5. You can create up to 500 lineups per file '],
 [' '],
 ['',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  'Position',
  'Name + ID',
  'Name',
  'ID',
  'Roster Position',
  'Salary',
  'Game Info',
  'TeamAbbrev',
  'AvgPointsPerGame'],
 ['',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  '',
  'RB',


The projections are downloaded from https://fantasyfootballanalytics.net/.

In [3]:
import pandas as pd
pd.read_csv('ffa_customrankings2020.csv').head()

Unnamed: 0,playerId,player,team,position,age,exp,bye,points,lower,upper,...,dropoff,tier,ptSpread,overallECR,positionECR,sdRank,risk,sleeper,actualPoints,salary
0,2564832,Jalen Hurts,PHI,QB,50.0,0.0,,25.076629,24.98,25.12647,...,1.756675,1.0,0.14647,,,,,,,
1,2558125,Patrick Mahomes,KC,QB,,3.0,,25.011871,23.5106,25.909615,...,3.731826,1.0,2.399015,,,,,,,
2,2506363,Aaron Rodgers,GB,QB,37.0,15.0,,21.628036,19.774,23.55919,...,0.850655,2.0,3.78519,,,,,,,
3,2560757,Lamar Jackson,BAL,QB,23.0,2.0,,20.932055,18.058083,22.349667,...,0.546536,2.0,4.291584,,,,,,,
4,2558063,Deshaun Watson,HOU,QB,25.0,3.0,,20.622707,18.72,21.73816,...,0.703974,2.0,3.01816,,,,,,,


The data processing is handled through the custom class Data.

In [4]:
from Data_processing import Data
data = Data()
data.Get_cost('DKSalaries.csv')
data.Get_proj('ffa_customrankings2020.csv',0.5)
data.match_data()
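The internals of Data are not shown here. As a rough sketch of the kind of parsing a method like Get_cost might perform, the snippet below assumes the layout printed above: a few instruction rows, followed by a header row that contains 'Position', 'Name' and 'Salary'. The function name read_salaries and the toy rows are hypothetical and only serve as illustration.

```python
import csv
import io

# Toy stand-in for the export: instruction rows precede the real header row.
raw = (
    "QB,RB,,Instructions\n"
    ",,,1. Locate the player\n"
    ",,,Position,Name,Salary\n"
    ",,,RB,Player A,9000\n"
    ",,,QB,Player B,7500\n"
)

def read_salaries(handle):
    """Return {position: {player name: salary}} from a DraftKings-style CSV."""
    rows = list(csv.reader(handle))
    # The header is the first row that contains both column titles.
    header_idx = next(i for i, row in enumerate(rows)
                      if "Position" in row and "Salary" in row)
    header = rows[header_idx]
    pos_col = header.index("Position")
    name_col = header.index("Name")
    sal_col = header.index("Salary")
    cost = {}
    for row in rows[header_idx + 1:]:
        if len(row) <= sal_col or not row[sal_col]:
            continue  # skip blank or truncated rows
        cost.setdefault(row[pos_col], {})[row[name_col]] = int(row[sal_col])
    return cost

costs = read_salaries(io.StringIO(raw))
print(costs)
```

Locating the header by its column titles rather than by a fixed row index keeps the sketch robust if the number of instruction rows changes.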

\section{Problem and Motivation}

Our goal is now to find the team with the best possible projected score whose total salary does not exceed the salary cap.

\begin{equation}
\begin{aligned}
\max_{x} \quad & c^\intercal x\\
\mathrm{s.t.} \quad & Ax \stackrel{\leq}{=} b.
\end{aligned}
\end{equation}

Here $N$ is the number of variables (determined by the number of players and teams in the NFL),

$c \in \mathbb{R}^N$ is the vector containing the projected points of each player,

$x \in \{0,1\}^N$ is the vector of binary decision variables ($x_i = 1$ if player $i$ is selected),

and $A$ and $b$ encode the necessary constraints (e.g. exactly one quarterback).
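To make the notation concrete, here is a minimal sketch that evaluates $c^\intercal x$ and checks $Ax \stackrel{\leq}{=} b$ for one candidate selection. The four players, their points and salaries, and the cap of 12000 are made-up numbers, not values from the data set.

```python
# Made-up instance with N = 4 players; the first two are quarterbacks.
c = [20.0, 18.5, 12.0, 9.0]          # projected points
salary = [8000, 7000, 5000, 3000]    # cost of each player
is_qb = [1, 1, 0, 0]                 # indicator row: 1 if the player is a QB

A = [is_qb, salary]                  # row 0: equality, row 1: inequality
b = [1, 12000]                       # exactly one QB, salary at most 12000

x = [1, 0, 0, 1]                     # candidate: first QB plus the last player

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

objective = dot(c, x)
feasible = dot(A[0], x) == b[0] and dot(A[1], x) <= b[1]
print(objective, feasible)  # 29.0 True
```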

Why do we need this? Could we not just brute-force the solution?

Let's see how many combinations $C$ we need to try:

\begin{equation}
C = \binom{\mathrm{\#WR}}{3} \binom{\mathrm{\#RB}}{2} \times (\mathrm{\#FLX}-5)\times \mathrm{\#QB}\times \mathrm{\#TE}\times\mathrm{\#DST}
\end{equation}

In [5]:
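import scipy.special

# Number of candidate lineups: choose 2 RBs and 3 WRs, then one flex from the
# remaining flex-eligible players, one QB and one DST.
# Note: the #TE factor from the formula is not included in this product.
(scipy.special.binom(len(data.Cost_RB), 2)
 * scipy.special.binom(len(data.Cost_WR), 3)
 * (len(data.Cost_FLX) - 5)
 * len(data.Cost_QB)
 * len(data.Cost_DST))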

21599372793600.0

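This is the number of possible teams as computed by the cell above. Note that the cell omits the factor $\mathrm{\#TE}$, so the full count $C$ is larger by that factor. Some of these teams are not admissible, i.e. they exceed the salary cap.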
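On a toy pool, the brute-force approach is easy to write down, which makes the combinatorial growth tangible. The sketch below enumerates every lineup of one QB and two RBs under a cap; all names, points, salaries and the cap of 18000 are invented for illustration.

```python
from itertools import combinations, product
from math import comb

# Made-up pool: name -> (projected points, salary)
qbs = {"QB_A": (20.0, 7000), "QB_B": (17.5, 5500)}
rbs = {"RB_A": (18.0, 8000), "RB_B": (15.0, 6000), "RB_C": (11.0, 4000)}
pool = {**qbs, **rbs}
cap = 18000

best = None
n_tried = 0
# One QB combined with every unordered pair of RBs.
for qb, pair in product(qbs, combinations(rbs, 2)):
    n_tried += 1
    names = (qb,) + pair
    points = sum(pool[n][0] for n in names)
    cost = sum(pool[n][1] for n in names)
    if cost <= cap and (best is None or points > best[0]):
        best = (points, names)

print(n_tried, len(qbs) * comb(len(rbs), 2))  # 6 6
print(best)  # (46.5, ('QB_B', 'RB_A', 'RB_C'))
```

Six candidates are trivial to check, but the same loop over the full player pool would have to visit the number of lineups computed above, which is why an integer programming solver is used instead.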

\section{Define the IP}
All variables and constraints are defined using pulp, an open-source library for mixed-integer optimisation that comes with a free solver.

We start by defining the problem variables, which are stored in dictionaries. In this way, we can index the projections, the costs and our variables with the same key (namely the name of the player).


In [6]:
QB = LpVariable.dicts('QB',{qb for qb in data.Cost_QB.keys()}, cat = LpBinary)
TE = LpVariable.dicts('TE',{te for te in data.Cost_TE.keys()}, cat = LpBinary)
RB = LpVariable.dicts('RB',{rb for rb in data.Cost_RB.keys()}, cat = LpBinary)
WR = LpVariable.dicts('WR',{wr for wr in data.Cost_WR.keys()}, cat = LpBinary)
DST = LpVariable.dicts('DST', {dst for dst in data.Cost_DST.keys()}, 
                       cat = LpBinary)
FLX = LpVariable.dicts('FLX', {flx for flx in data.Cost_FLX.keys()}, 
                       cat = LpBinary)

Next, we initialise the problem as a maximisation problem:

In [7]:
# pulp does not allow spaces in problem names, so use an underscore
prob = LpProblem('Fantasy_Football', LpMaximize)



In [8]:
print(DST)
type(DST['Broncos'])

{'Chiefs': DST_Chiefs, 'Cowboys': DST_Cowboys, 'Seahawks': DST_Seahawks, 'Bears': DST_Bears, 'Falcons': DST_Falcons, 'Eagles': DST_Eagles, 'Giants': DST_Giants, 'Texans': DST_Texans, 'Chargers': DST_Chargers, 'Panthers': DST_Panthers, 'Steelers': DST_Steelers, 'Broncos': DST_Broncos, 'Jaguars': DST_Jaguars, 'Browns': DST_Browns, 'Rams': DST_Rams, 'Ravens': DST_Ravens, 'Jets': DST_Jets, 'Bengals': DST_Bengals, 'Colts': DST_Colts}


pulp.pulp.LpVariable

\subsection{The cost function}
The cost function takes the following form:
\begin{equation}
\sum^N_{i=1} p_i x_i - r \sum^N_{i=1} r_i x_i,
\end{equation}
where $p_i$ denotes the projected score of player/special team $i$ and $x_i$ denotes the corresponding binary variable. We have also introduced the hyperparameter $r$, which penalises risky choices. The risk $r_i$ of a player/special team is essentially calculated from the standard deviation.

In [9]:
r = 0.1
prob += (lpSum(QB[qb]*data.Proj_QB[qb] for qb in data.Cost_QB.keys()) 
         +
         lpSum(TE[te]*data.Proj_TE[te] for te in data.Cost_TE.keys())
         + 
         lpSum(RB[rb]*data.Proj_RB[rb] for rb in data.Cost_RB.keys()) 
         + 
         lpSum(WR[wr]*data.Proj_WR[wr] for wr in data.Cost_WR.keys()) 
         +  
         lpSum([DST[dst]*data.Proj_DST[dst] for dst in data.Cost_DST.keys()])
         + 
         lpSum([FLX[flx]*data.Proj_FLX[flx] for flx in data.Cost_FLX.keys()]) 
         -
         r*(lpSum(QB[qb]*data.Risk_QB[qb] for qb in data.Cost_QB.keys())
            +
            lpSum(TE[te]*data.Risk_TE[te] for te in data.Cost_TE.keys())
            +
            lpSum(RB[rb]*data.Risk_RB[rb] for rb in data.Cost_RB.keys())
            + 
            lpSum(WR[wr]*data.Risk_WR[wr] for wr in data.Cost_WR.keys())
            + 
            lpSum([DST[dst]*data.Risk_DST[dst] for dst in data.Cost_DST.keys()])
            + 
            lpSum([FLX[flx]*data.Risk_FLX[flx] for flx 
                   in data.Cost_FLX.keys()])))
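Because the objective is linear, each variable effectively carries a single coefficient $p_i - r\, r_i$. The following sketch uses two made-up players to show how the choice of $r$ can change which of them looks more attractive; the numbers and the helper names risk_adjusted and best are purely illustrative.

```python
# Made-up projections and risk values for two players.
proj = {"Player A": 20.0, "Player B": 19.0}
risk = {"Player A": 15.0, "Player B": 2.0}

def risk_adjusted(r):
    """Objective coefficient p_i - r * r_i for every player."""
    return {name: proj[name] - r * risk[name] for name in proj}

def best(r):
    scores = risk_adjusted(r)
    return max(scores, key=scores.get)

print(best(0.0))  # Player A: higher raw projection
print(best(0.1))  # Player B: Player A's larger risk now outweighs its lead
```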


\subsection{The Constraints}
First, we have to make sure that exactly the required number of players is chosen for each position, e.g.
\begin{equation}
\sum_{i \,:\, i \ \mathrm{is} \ \mathrm{WR}} x_i = 3
\end{equation}

In [10]:
prob +=(lpSum(QB[qb] for qb in data.Cost_QB.keys()) == 1)
prob +=(lpSum(TE[te] for te in data.Cost_TE.keys()) == 1)
prob +=(lpSum(RB[rb] for rb in data.Cost_RB.keys()) == 2)
prob +=(lpSum(WR[wr] for wr in data.Cost_WR.keys()) == 3 )
prob +=(lpSum(DST[dst] for dst in data.Cost_DST.keys()) == 1 )
prob +=(lpSum(FLX[flx] for flx in data.Cost_FLX.keys()) == 1 )

The next constraint makes sure that we do not select the same player both as a wide receiver/running back and as the flex. This again shows why storing the variables in dictionaries keyed by player name is useful.

In [11]:
for wr in data.Cost_WR.keys():
    prob += (FLX[wr]+WR[wr] <= 1)
for rb in  data.Cost_RB.keys():
    prob += (RB[rb]+FLX[rb] <= 1)

We now make sure that the lineup does not exceed the salary cap:
\begin{equation}
\sum^N_{i=1} \mathrm{salary}_i x_i \leq 50{,}000
\end{equation}

In [12]:
salary_cap = 50000
prob += (lpSum(QB[qb]*data.Cost_QB[qb] for qb in data.Cost_QB.keys())
         + 
         lpSum(TE[te]*data.Cost_TE[te] for te in data.Cost_TE.keys()) 
         + 
         lpSum(RB[rb]*data.Cost_RB[rb] for rb in data.Cost_RB.keys()) 
         + 
         lpSum(WR[wr]*data.Cost_WR[wr] for wr in data.Cost_WR.keys()) 
         + 
         lpSum(DST[dst]*data.Cost_DST[dst] for dst in data.Cost_DST.keys())
         + 
         lpSum(FLX[flx]*data.Cost_FLX[flx] for flx in data.Cost_FLX.keys())
         <= salary_cap)


This constraint is optional and depends on the hyperparameter max_per_team.
It enforces that our lineup contains at most max_per_team players from any one team.

In [13]:
max_per_team = 2
for t in data.Teams:
    prob += (lpSum(QB[qb] for qb in data.Cost_QB.keys() 
                   if data.Player_Team[qb] == t)
             +
             lpSum(TE[te] for te in data.Cost_TE.keys() 
                   if data.Player_Team[te] == t)
             + 
             lpSum(RB[rb] for rb in data.Cost_RB.keys()
                   if data.Player_Team[rb] == t) 
             + 
             lpSum(WR[wr] for wr in data.Cost_WR.keys() 
                     if data.Player_Team[wr] == t) 
             + 
             lpSum(DST[dst] for dst in data.Cost_DST.keys() 
                   if data.Player_Team[dst] == t)
             +
             lpSum(FLX[flx] for flx in data.Cost_FLX.keys() 
                   if data.Player_Team[flx] == t)
        <= max_per_team)
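After solving, a constraint like this can be sanity-checked outside the solver by counting teams in the returned lineup. The sketch below is one way to do so; the helper teams_over_limit, the player names and the team labels are hypothetical.

```python
from collections import Counter

def teams_over_limit(lineup, player_team, max_per_team):
    """Return {team: count} for every team that exceeds the limit."""
    counts = Counter(player_team[p] for p in lineup)
    return {team: n for team, n in counts.items() if n > max_per_team}

# Made-up lineup: three players from team T1, one from T2.
player_team = {"Player A": "T1", "Player B": "T1",
               "Player C": "T1", "Player D": "T2"}
lineup = ["Player A", "Player B", "Player C", "Player D"]

over = teams_over_limit(lineup, player_team, max_per_team=2)
print(over)  # {'T1': 3}
```

An empty dictionary means the lineup respects the limit.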

We now call the solver provided by pulp. A return value of 1 indicates that an optimal solution was found.

In [14]:
prob.solve()

1

This is the optimal team for the given week:

In [15]:
for v in prob.variables():
    if v.varValue > 0:
        print( v.name, "=", v.varValue)
        

DST_Texans = 1.0
FLX_David_Johnson = 1.0
QB_Jalen_Hurts = 1.0
RB_Austin_Ekeler = 1.0
RB_Nick_Chubb = 1.0
TE_Dallas_Goedert = 1.0
WR_Amari_Cooper = 1.0
WR_Jerry_Jeudy = 1.0
WR_Marquise_Brown = 1.0


In [16]:
# Exclude every variable selected in the previous solution, then re-solve.
prob += (lpSum(v for v in prob.variables() if v.varValue > 0) <= 0)
prob.solve()

1

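It is possible to add new constraints and solve the optimisation problem again.
Here we chose to add the constraint that no variable selected in the previous lineup can be chosen again. Note that this acts on the (roster slot, player) variables rather than on the players themselves: Nick Chubb and David Johnson reappear in the second lineup below because they switch between the RB and FLX slots. In principle, the right-hand side can be set to any number, e.g. a bound of $n$ allows at most $n$ of the previous selections to be reused.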
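Written out, the added constraint can be read as follows. This is a sketch in the notation of the earlier sections, where $x^{*}$ (a symbol introduced here for illustration) denotes the previous optimal solution and $S$ the set of variables it selected:
\begin{equation}
\sum_{i \in S} x_i \leq n, \qquad S = \{\, i : x^{*}_i = 1 \,\}.
\end{equation}
Setting $n = 0$ gives the constraint used in the cell above, while $n = 8$ would only require the new lineup to differ from the previous one in at least one of its nine slots.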

In [17]:
for v in prob.variables():
    if v.varValue > 0:
        print( v.name, "=", v.varValue)

DST_Ravens = 1.0
FLX_Nick_Chubb = 1.0
QB_Patrick_Mahomes = 1.0
RB_David_Johnson = 1.0
RB_David_Montgomery = 1.0
TE_Hayden_Hurst = 1.0
WR_Brandin_Cooks = 1.0
WR_Cam_Sims = 1.0
WR_Marvin_Hall = 1.0
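To check how much two lineups overlap at the player level, the printed variable names can be split into roster slot and player name. A minimal sketch, assuming the names follow the SLOT_First_Last pattern seen in the two outputs above (the helper players is hypothetical):

```python
# Variable names copied from the two solver outputs above.
lineup_1 = ["DST_Texans", "FLX_David_Johnson", "QB_Jalen_Hurts",
            "RB_Austin_Ekeler", "RB_Nick_Chubb", "TE_Dallas_Goedert",
            "WR_Amari_Cooper", "WR_Jerry_Jeudy", "WR_Marquise_Brown"]
lineup_2 = ["DST_Ravens", "FLX_Nick_Chubb", "QB_Patrick_Mahomes",
            "RB_David_Johnson", "RB_David_Montgomery", "TE_Hayden_Hurst",
            "WR_Brandin_Cooks", "WR_Cam_Sims", "WR_Marvin_Hall"]

def players(var_names):
    """Drop the slot prefix (text before the first underscore)."""
    return {name.split("_", 1)[1].replace("_", " ") for name in var_names}

reused = players(lineup_1) & players(lineup_2)
print(sorted(reused))  # ['David Johnson', 'Nick Chubb']
```

If reuse of a player in a different slot is not wanted, the exclusion would have to be formulated per player rather than per variable.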
