# Optimizing Fantasy Football Squad
## Linear Programming with Python PuLP <br /><br />

## Problem Statement
To start playing [Fantasy Premier League](https://fantasy.premierleague.com/a/squad/selection), we need to select 15 Premier League football players within a budget of £100m. The squad must consist of 2 goalkeepers, 5 defenders, 5 midfielders, and 3 forwards. Each player has a total score that shows how well he has performed. How would you come up with the best combination of players to win the game?

**Model the problem** <br />
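This is an optimization problem, so let's model it. Each player gets a variable that is 1 if he is selected and 0 otherwise; we maximize the total score of the selected players, subject to the budget and the number of players required in each position.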
<code>
**maximize**: 
total_score_player1 × player1 + total_score_player2 × player2 + total_score_player3 × player3 + ... + total_score_playerN × playerN
**constraints**:
player1, player2, player3, ..., playerN are either 0 or 1
cost_player1 × player1 + cost_player2 × player2 + cost_player3 × player3 + ... + cost_playerN × playerN <= 100
is_gkp_player1 × player1 + is_gkp_player2 × player2 + is_gkp_player3 × player3 + ... + is_gkp_playerN × playerN = 2
is_def_player1 × player1 + is_def_player2 × player2 + is_def_player3 × player3 + ... + is_def_playerN × playerN = 5
is_mid_player1 × player1 + is_mid_player2 × player2 + is_mid_player3 × player3 + ... + is_mid_playerN × playerN = 5 
is_fwd_player1 × player1 + is_fwd_player2 × player2 + is_fwd_player3 × player3 + ... + is_fwd_playerN × playerN = 3 

</code>
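To make the 0/1 model concrete, here is a minimal sketch that solves a scaled-down version by brute force instead of linear programming. The player pool, the one-player-per-position requirement, and the £24m budget are all hypothetical values chosen for illustration; enumeration like this only works for tiny pools, which is why a solver is used for the real data.

```python
from itertools import combinations

# Hypothetical toy pool: (name, position, cost in £m, total score).
pool = [
    ("gk_a", "gkp", 5.0, 120),
    ("gk_b", "gkp", 4.5, 100),
    ("df_a", "def", 6.0, 140),
    ("df_b", "def", 4.5, 110),
    ("md_a", "mid", 9.0, 200),
    ("md_b", "mid", 5.5, 130),
    ("fw_a", "fwd", 8.0, 180),
    ("fw_b", "fwd", 6.0, 150),
]
required = {"gkp": 1, "def": 1, "mid": 1, "fwd": 1}  # scaled-down squad (assumption)
budget = 24.0  # hypothetical budget in £m


def is_feasible(squad):
    """Check the position-count constraints and the budget constraint."""
    counts = {pos: 0 for pos in required}
    for _, pos, _, _ in squad:
        counts[pos] += 1
    total_cost = sum(cost for _, _, cost, _ in squad)
    return counts == required and total_cost <= budget


# Choosing a subset of players is the same as setting their variables to 1.
squad_size = sum(required.values())
feasible = [s for s in combinations(pool, squad_size) if is_feasible(s)]
best = max(feasible, key=lambda s: sum(score for *_, score in s))

best_names = [name for name, *_ in best]
best_score = sum(score for *_, score in best)
best_cost = sum(cost for _, _, cost, _ in best)
print(best_names, best_score, best_cost)
```

Note that the best squad here is not simply the highest scorer in each position: that squad would cost £28m, so the budget constraint forces some cheaper picks.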

To create a winning squad, I don't want to spread the budget equally across all 15 players. I want to spend more money on the 11 main players and less on the 4 substitutes. First, pick the 4 substitutes by choosing the cheapest players, then see how much is left in the budget.
In my case, with a 4-5-1 formation, I chose 1 goalkeeper, 1 defender, and 2 forwards as substitutes. That left £83m in my budget to choose 1 goalkeeper, 4 defenders, 5 midfielders, and 1 forward.
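The bench-first budgeting can be sketched in a few lines. The price lists below are hypothetical (they are not taken from the real data) and are expressed in tenths of £1m, so 40 means £4.0m; only the bench plan of 1 goalkeeper, 1 defender, and 2 forwards comes from the text above.

```python
# Hypothetical prices per position, in tenths of £1m (40 means £4.0m).
prices = {
    "gkp": [55, 40, 45],
    "def": [60, 40, 50],
    "mid": [90, 45, 70],
    "fwd": [80, 45, 45, 65],
}
# Substitutes to pick: 1 goalkeeper, 1 defender, 2 forwards.
bench_plan = {"gkp": 1, "def": 1, "fwd": 2}
total_budget = 1000  # £100m

# Take the n cheapest players in each bench position.
bench_cost = sum(sum(sorted(prices[pos])[:n]) for pos, n in bench_plan.items())
remaining = total_budget - bench_cost

print(bench_cost / 10, remaining / 10)
```

With these made-up prices the bench costs £17m, which leaves £83m for the 11 main players.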

So the problem for the main players becomes: maximize the total score of 1 goalkeeper, 4 defenders, 5 midfielders, and 1 forward within a total of £83m.

><code>
**maximize**: 
total_score_player1 × player1 + total_score_player2 × player2 + total_score_player3 × player3 + ... + total_score_playerN × playerN
**constraints**:
player1, player2, player3, ..., playerN are either 0 or 1
cost_player1 × player1 + cost_player2 × player2 + cost_player3 × player3 + ... + cost_playerN × playerN <= 83
is_gkp_player1 × player1 + is_gkp_player2 × player2 + is_gkp_player3 × player3 + ... + is_gkp_playerN × playerN = 1
is_def_player1 × player1 + is_def_player2 × player2 + is_def_player3 × player3 + ... + is_def_playerN × playerN = 4
is_mid_player1 × player1 + is_mid_player2 × player2 + is_mid_player3 × player3 + ... + is_mid_playerN × playerN = 5
is_fwd_player1 × player1 + is_fwd_player2 × player2 + is_fwd_player3 × player3 + ... + is_fwd_playerN × playerN = 1
</code>

I'll use [Python PuLP](https://www.coin-or.org/PuLP/pulp.html) linear programming to solve the optimization problem and [Python Pandas](https://pandas.pydata.org/) to prepare the data.

In [1]:
import pandas as pd
from pulp import *
import json
import requests
import time

## Prepare the Data
We can find the JSON data for Fantasy Premier League [here](https://fantasy.premierleague.com/api/bootstrap-static/). <br />
Pay attention to some important attributes here: *now_cost* is the player's cost multiplied by ten (i.e. £4.5m is 45 here), *total_points* is the total score we are trying to maximize, and *element_type* is the player's position.
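Because *now_cost* is stored in tenths of £1m, budgets passed to the solver have to use the same unit. A minimal sketch of the conversion, with hypothetical helper names that are not part of the data source or any library:

```python
def cost_in_millions(now_cost):
    """Convert a now_cost value (tenths of £1m) to £m."""
    return now_cost / 10


def to_now_cost(millions):
    """Convert a budget in £m to now_cost units."""
    return round(millions * 10)


print(cost_in_millions(45))  # a now_cost of 45 is £4.5m
print(to_now_cost(83))       # an £83m budget in now_cost units
```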

The players' details are stored inside the JSON node *elements*; load these into a pandas DataFrame.

In [19]:
url = 'https://fantasy.premierleague.com/api/bootstrap-static/'
r = None
while r is None:
    try:
        r = requests.get(url)
    except requests.exceptions.RequestException:
        #wait and retry if the request fails
        time.sleep(5)
#load the players stored in the json node "elements"
df = pd.DataFrame(r.json()["elements"])

In [20]:
#we don't want players that possibly will not play next round.
#df = df[(df['chance_of_playing_next_round'] == 100.0)|pd.isnull(df['chance_of_playing_next_round'])].reset_index(drop=True)

In [21]:
#drop unnecessary attributes
df = df[['id','element_type', 'team_code', 'now_cost', 'first_name', 'second_name','total_points', 'form']]

In [22]:
#replace the values in element_type to its respective positions, find it from the original json
df['element_type'] = df['element_type'].replace({1: 'gkp', 2: 'def', 3: 'mid', 4: 'fwd'})

Create four new columns, one for each value in element_type, and set their values to 1 or 0. For example, element_type:gkp becomes element_type_gkp:1, element_type_def:0, element_type_mid:0, element_type_fwd:0. Pandas has a handy function for the job.

This is needed so that we can process it with our model.

In [23]:
#dtype=int gives 1/0 columns (recent pandas versions default to True/False)
df = pd.get_dummies(df, columns = ['element_type'], dtype = int)
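To see what this encoding produces, here is a small sketch on a made-up four-row table (the names are hypothetical). Each row ends up with exactly one 1 across the four new columns, so summing a column counts the players in that position.

```python
import pandas as pd

# Hypothetical toy table with one player per position.
toy = pd.DataFrame({
    "second_name": ["Keeper", "Back", "Winger", "Striker"],
    "element_type": ["gkp", "def", "mid", "fwd"],
})

# One indicator column per position; dtype=int forces 1/0 values.
encoded = pd.get_dummies(toy, columns=["element_type"], dtype=int)
position_cols = [c for c in encoded.columns if c.startswith("element_type_")]

# Summing an indicator column counts the players in that position.
gkp_count = int(encoded["element_type_gkp"].sum())
print(encoded)
```

This is why the position constraints in the model can be written as a sum of indicator × variable terms.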


In [24]:
#df = df[df.id!=114] #rule out Diego Costa , He's being transferred but officially still in his team, that's why still in our table

## Model & Solve the Problem
Each player (each row) will be a variable with a possible value of 1 or 0, where 1 means chosen. So we have hundreds of variables. We can use PuLP's *lpSum* to sum each column multiplied by these variables.

In [25]:
#input: dataframe, budget in now_cost units, # of players for each position. return a dataframe of chosen players
def doOptimizePlayers(df, total_cost, n_gkp, n_def, n_mid, n_fwd):
    df1 = df.copy()
    idplayers = df1.index.tolist()
    prob = LpProblem("fantasy_football_create_squad", LpMaximize)
    #one binary variable per player: 1 means chosen
    players_vars = LpVariable.dicts("plyr", idplayers, cat=LpBinary)

    #objective: maximize total points
    prob += lpSum([df1.total_points[i]*players_vars[i] for i in idplayers]), "total_points"

    prob += lpSum([df1.now_cost[i]*players_vars[i] for i in idplayers]) <= total_cost, "cost_constraint"

    prob += lpSum([df1.element_type_gkp[i]*players_vars[i] for i in idplayers]) == n_gkp, "n_goalkeepers"
    prob += lpSum([df1.element_type_def[i]*players_vars[i] for i in idplayers]) == n_def, "n_defenders"
    prob += lpSum([df1.element_type_mid[i]*players_vars[i] for i in idplayers]) == n_mid, "n_midfielders"
    prob += lpSum([df1.element_type_fwd[i]*players_vars[i] for i in idplayers]) == n_fwd, "n_forwards"

    #prob.writeLP("FantasyFootball.lp")
    prob.solve()

    #read each player's solved value (1 = chosen) in the same order as the rows
    df1['chosen'] = [players_vars[i].varValue for i in idplayers]

    df_chosen = df1[df1.chosen > 0].sort_values('total_points', ascending = False)
    return df_chosen
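One thing worth checking after solving is the solver status, since an infeasible budget or position count does not produce a usable squad. The sketch below is a self-contained toy version of the same kind of model, with made-up costs and points and a hypothetical helper `pick_two`; it assumes the CBC solver bundled with PuLP is available.

```python
import pulp

# Hypothetical toy data: cost in now_cost units and points per player.
cost = {"a": 50, "b": 60, "c": 45, "d": 70}
points = {"a": 100, "b": 130, "c": 90, "d": 150}


def pick_two(budget):
    """Pick exactly two players maximizing points within the budget."""
    prob = pulp.LpProblem("toy_squad", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("pick", list(cost), cat=pulp.LpBinary)
    prob += pulp.lpSum(points[p] * x[p] for p in cost)            # objective
    prob += pulp.lpSum(cost[p] * x[p] for p in cost) <= budget    # budget
    prob += pulp.lpSum(x[p] for p in cost) == 2                   # squad size
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    status = pulp.LpStatus[prob.status]
    # Only trust the variable values when the solver reports an optimum.
    chosen = sorted(p for p in cost if status == "Optimal" and x[p].varValue > 0.5)
    return status, chosen


ok_status, ok_players = pick_two(130)   # enough budget for the two best players
bad_status, bad_players = pick_two(80)  # cheapest pair costs 95, so no squad fits
print(ok_status, ok_players)
print(bad_status, bad_players)
```

The same `LpStatus[prob.status]` check could be added after `prob.solve()` in the function above if you want it to fail loudly rather than return an empty or meaningless table.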

Let's see the 11 main players! I allocated a now_cost of 830 (£83m) for them.

In [None]:
df_chosen1 = doOptimizePlayers(df,830,1,4,5,1)
df_chosen1

In [36]:
#See if we've used up all 830, we reuse the money if there's some left
df_chosen1.now_cost.sum()

830

In [37]:
df_chosen1.total_points.sum()

2063

In [None]:
#We don't want the players we've chosen in the table anymore
df = df[~df.index.isin(df_chosen1.index)]

Next, let's pick one more forward with a now_cost budget of 45 (£4.5m).

In [None]:
#df = df[df.id!=159]
#dx = df.loc[df.id!=468]
#dx = dx.loc[dx.id!=352]
df_chosen2  = doOptimizePlayers(df, 45, 0, 0, 0, 1)
df_chosen2

In [None]:
print(df_chosen2.now_cost.sum())
print(df_chosen2.total_points.sum())

In [None]:
df = df[~df.index.isin(df_chosen2.index)]

This time we optimize the four substitute players with the rest of the money.

In [None]:
df_chosen3 = doOptimizePlayers(df, 195, 1, 1, 1, 1)
print(df_chosen3)
df_chosen3.now_cost.sum()

## Final Result!

In [None]:
#combine all results
df_final = pd.concat([df_chosen1,df_chosen2,df_chosen3])[['now_cost', 'first_name', 'second_name','total_points','element_type_def','element_type_fwd','element_type_gkp','element_type_mid', 'team_code']]

In [None]:
#reverse pd.get_dummies element_type into a column position
x = df_final[['element_type_def','element_type_fwd','element_type_gkp','element_type_mid']]
df_final['position'] = x.idxmax(axis = 1).str.replace('element_type_', '', regex = False) #e.g. element_type_def -> def
df_final.drop(axis = 1, labels = ['element_type_def','element_type_fwd','element_type_gkp','element_type_mid']).sort_values(['position','total_points'], ascending = False)

The players look good! Woohoo.

Notes: <br />
* Yeah, playing fantasy football has a lot more to it than this. But it actually helps, lol! And I had fun doing the math & coding, that's the important part.
* Python PuLP is easy to read when we don't have that many variables, but it's not nice to write a for loop that iterates over the values in the table to feed the PuLP equations. I like [SciPy linprog's](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.optimize.linprog.html) syntax better; unfortunately the linked version (0.15.1) doesn't support integer linear programming.