# What does this notebook do?

Problem: I haven't picked my Fantasy Premier League team yet.  
Solution: Use the data from [vaastav](https://github.com/vaastav/Fantasy-Premier-League) and the [knapsack](https://medium.com/@kangeugine/fantasy-football-as-a-data-scientist-part-2-knapsack-problem-6b7083955e93) method to select the squad with integer programming.

# Load

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pulp import LpBinary, LpMaximize, LpProblem, LpStatus, LpVariable, lpSum

sns.set(style="white")
%matplotlib inline

# Data

In [2]:
cleaned_players_data = pd.read_csv("/Users/eugine_kang/Documents/hobby/Fantasy-Premier-League/data/2022-23/cleaned_players.csv")
players_raw_data = pd.read_csv("/Users/eugine_kang/Documents/hobby/Fantasy-Premier-League/data/2022-23/players_raw.csv")
teams_data = pd.read_csv("/Users/eugine_kang/Documents/hobby/Fantasy-Premier-League/data/2022-23/teams.csv")[['code', 'name']]
teams_data.columns = ['team_code', 'team_name']
_data = pd.merge(cleaned_players_data, 
    players_raw_data[['first_name', 'second_name', 'team_code']], 
    how="left", 
    on=['first_name', 'second_name']
)
data = pd.merge(_data, teams_data, how="left", on=["team_code"])
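The merge above joins on `first_name` and `second_name`, which assumes that each name pair appears only once in `players_raw.csv`. If two players shared a name, the left merge would silently duplicate rows. A minimal sketch of how pandas can guard against that with the `validate` argument is below; the two small frames and the player names in them are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical data: "Alex Smith" appears twice on the right-hand side.
players = pd.DataFrame({
    "first_name": ["Alex", "Sam"],
    "second_name": ["Smith", "Jones"],
    "total_points": [50, 60],
})
raw = pd.DataFrame({
    "first_name": ["Alex", "Alex", "Sam"],
    "second_name": ["Smith", "Smith", "Jones"],
    "team_code": [1, 2, 3],
})

# Without validation, the duplicated key inflates the result from 2 to 3 rows.
merged = pd.merge(players, raw, how="left", on=["first_name", "second_name"])
print(len(players), "->", len(merged))

# validate="many_to_one" requires the join keys to be unique on the right side
# and raises MergeError otherwise.
try:
    pd.merge(players, raw, how="left", on=["first_name", "second_name"],
             validate="many_to_one")
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True

print("Duplicate keys detected:", duplicate_keys_detected)
```

Passing `validate="many_to_one"` to the first `pd.merge` call in the cell above would make the notebook fail loudly instead of double-counting a player, if the data ever contained such a clash.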

# Squad Selection
- £100m budget (`now_cost` is in units of £0.1m, so the limit is 1000)
- 15 players
- 2 GK
- 5 DEF
- 5 MID
- 3 FWD
- Select a maximum of 3 players from a single team
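
The rules above can be written as a 0-1 integer program. The notation here is introduced only for this sketch: $x_i \in \{0, 1\}$ says whether player $i$ is selected, $p_i$ is the player's `total_points`, $c_i$ is `now_cost` (in units of £0.1m, so the £100m budget becomes 1000), and $T_j$ is the set of players belonging to team $j$.

$$
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{n} p_i x_i \\
\text{s.t.} \quad & \sum_{i=1}^{n} x_i = 15, \\
& \sum_{i=1}^{n} c_i x_i \le 1000, \\
& \sum_{i \in \mathrm{GK}} x_i = 2, \quad \sum_{i \in \mathrm{DEF}} x_i = 5, \quad \sum_{i \in \mathrm{MID}} x_i = 5, \quad \sum_{i \in \mathrm{FWD}} x_i = 3, \\
& \sum_{i \in T_j} x_i \le 3 \quad \text{for every team } j, \\
& x_i \in \{0, 1\} \quad \text{for } i = 1, \dots, n.
\end{aligned}
$$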

In [3]:
n = data.shape[0]
player = [str(i) for i in range(n)]
point = {str(i): float(data['total_points'][i]) for i in range(n)} 
cost = {str(i): float(data['now_cost'][i]) for i in range(n)}
gk = {str(i): 1 if data['element_type'][i] == 'GK' else 0 for i in range(n)}
defe = {str(i): 1 if data['element_type'][i] == 'DEF' else 0 for i in range(n)}
mid = {str(i): 1 if data['element_type'][i] == 'MID' else 0 for i in range(n)}
stri = {str(i): 1 if data['element_type'][i] == 'FWD' else 0 for i in range(n)}
xi = {str(i): 1 for i in range(n)}

In [4]:
prob = LpProblem("Fantasy_Football",LpMaximize)
player_vars = LpVariable.dicts("Players",player,0,1,LpBinary)

In [5]:
# objective function
prob += lpSum([point[i]*player_vars[i] for i in player]), "Total Points"

# constraint
prob += lpSum([player_vars[i] for i in player]) == 15, "Total 15 Players"
prob += lpSum([cost[i] * player_vars[i] for i in player]) <= 1000, "Total Cost"
prob += lpSum([gk[i] * player_vars[i] for i in player]) == 2, "2 GK"
prob += lpSum([defe[i] * player_vars[i] for i in player]) == 5, "5 DEF"
prob += lpSum([mid[i] * player_vars[i] for i in player]) == 5, "5 MID"
prob += lpSum([stri[i] * player_vars[i] for i in player]) == 3, "3 FWD"

# team constraint
for j in data['team_name'].unique():
    team = {str(i): 1 if data['team_name'][i] == j else 0 for i in range(n)}
    prob += lpSum([team[i] * player_vars[i] for i in player]) <= 3, f"Max 3 players from {j}"

In [6]:
# solve
status = prob.solve()

Welcome to the CBC MILP Solver 
Version: 2.10.8 
Build Date: May  6 2022 

command line - cbc /var/folders/xx/ndggzzd56zdfznymtxfg5q5w0000gn/T/d0e984c3cbf5489ca44d4659ea61bfb1-pulp.mps max timeMode elapsed branch printingOptions all solution /var/folders/xx/ndggzzd56zdfznymtxfg5q5w0000gn/T/d0e984c3cbf5489ca44d4659ea61bfb1-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 31 COLUMNS
At line 3546 RHS
At line 3573 BOUNDS
At line 4095 ENDATA
Problem MODEL has 26 rows, 521 columns and 2084 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 2491.88 - 0.00 seconds
Cgl0004I processed model has 26 rows, 471 columns (471 integer (452 of which binary)) and 1884 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 2 integers unsatisfied sum - 0.25
Cbc0038I Solution found of -2482
Cbc0038I Cleaned solution of -2482
Cbc0038I Before mini branch and bound, 469 integers a

In [7]:
# The status of the solution is printed to the screen
print("Status:", LpStatus[prob.status])

Status: Optimal


In [8]:
selection = {}
for v in prob.variables():
    index = int(v.name.split("_")[1])
    selection[index] = v.varValue
    # print(v.name, "=", v.varValue)
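
Recovering the row index by splitting `v.name` works because every variable is named `Players_<index>`, but it depends on that naming pattern. A possible alternative is to read the values straight from the dictionary returned by `LpVariable.dicts`. The toy model below is a sketch with invented players `a` to `d`, invented points and costs, and a budget of 10; it also passes `PULP_CBC_CMD(msg=False)` to silence the solver log.

```python
from pulp import (LpBinary, LpMaximize, LpProblem, LpStatus, LpVariable,
                  PULP_CBC_CMD, lpSum)

# Hypothetical toy data: pick exactly 2 of 4 players within a budget of 10.
points = {"a": 10, "b": 8, "c": 7, "d": 3}
costs = {"a": 6, "b": 5, "c": 4, "d": 2}

toy = LpProblem("Toy_Squad", LpMaximize)
pick = LpVariable.dicts("Pick", list(points), cat=LpBinary)

toy += lpSum(points[k] * pick[k] for k in points), "Total_Points"
toy += lpSum(pick[k] for k in points) == 2, "Exactly_2_Players"
toy += lpSum(costs[k] * pick[k] for k in points) <= 10, "Budget"

# msg=False suppresses the CBC log that prob.solve() prints by default.
toy.solve(PULP_CBC_CMD(msg=False))
toy_status = LpStatus[toy.status]

# Read each decision directly from the dictionary, keyed by the original label,
# so no parsing of variable names is needed.
chosen = sorted(k for k in points if pick[k].value() > 0.5)
print(toy_status, chosen)
```

In the notebook itself the equivalent lookup would be `player_vars[str(i)].value()` for each row `i`.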

In [9]:
data['integer_programming'] = 0.0
for i in selection:
    data.loc[i, 'integer_programming'] = selection[i]

In [10]:
XI = data[data['integer_programming'] == 1.0]
TOTAL_POINTS = XI['total_points'].sum()
TOTAL_COST = XI['now_cost'].sum() / 10  # now_cost is in units of £0.1m
TOTAL_PLAYERS = XI.shape[0]
print("Total points:{}, cost:£{}m, and with players:{}".format(TOTAL_POINTS, TOTAL_COST, TOTAL_PLAYERS))

Total points:2485, cost:£100.0m, and with players:15
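
The totals confirm the squad size and budget, but not the position quotas or the three-per-team limit. A small checker like the one below could verify every rule independently of the solver. It is a sketch: the function name `squad_violations` and the placeholder squads are invented for illustration, and the budget assumes `now_cost` is expressed in units of £0.1m.

```python
from collections import Counter

REQUIRED = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
BUDGET = 1000        # £100m, assuming now_cost is in units of £0.1m
MAX_PER_TEAM = 3


def squad_violations(squad):
    """Return a list of broken rules; an empty list means the squad is valid."""
    problems = []
    if len(squad) != 15:
        problems.append(f"squad size: expected 15, got {len(squad)}")
    positions = Counter(p["element_type"] for p in squad)
    for pos, need in REQUIRED.items():
        if positions[pos] != need:
            problems.append(f"{pos}: expected {need}, got {positions[pos]}")
    total_cost = sum(p["now_cost"] for p in squad)
    if total_cost > BUDGET:
        problems.append(f"cost: {total_cost} exceeds {BUDGET}")
    teams = Counter(p["team_name"] for p in squad)
    for team, count in teams.items():
        if count > MAX_PER_TEAM:
            problems.append(f"{team}: {count} players, max {MAX_PER_TEAM}")
    return problems


# Hypothetical squad: 15 placeholder players spread evenly over 5 teams.
slots = ["GK"] * 2 + ["DEF"] * 5 + ["MID"] * 5 + ["FWD"] * 3
valid_squad = [
    {"element_type": pos, "team_name": f"Team {i % 5}", "now_cost": 60}
    for i, pos in enumerate(slots)
]

# Moving one player to "Team 0" gives that team four players.
invalid_squad = [dict(p) for p in valid_squad]
invalid_squad[1]["team_name"] = "Team 0"

print(squad_violations(valid_squad))
print(squad_violations(invalid_squad))
```

For the notebook's own result, `squad_violations(XI.to_dict("records"))` should apply the same checks to the selected players.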


In [16]:
XI[['first_name', 'second_name', 'element_type', 'team_name', 'total_points', 'now_cost']] \
    .sort_values(by=['team_name'], ascending=False) \
    .reset_index(drop=True)

| | first_name | second_name | element_type | team_name | total_points | now_cost |
|---|---|---|---|---|---|---|
| 0 | José | Malheiro de Sá | GK | Wolves | 146 | 50 |
| 1 | Michail | Antonio | FWD | West Ham | 140 | 75 |
| 2 | Jarrod | Bowen | MID | West Ham | 206 | 85 |
| 3 | James | Ward-Prowse | MID | Southampton | 159 | 65 |
| 4 | João | Cancelo | DEF | Man City | 201 | 70 |
| 5 | Bernardo | Veiga de Carvalho e Silva | MID | Man City | 155 | 70 |
| 6 | Virgil | van Dijk | DEF | Liverpool | 183 | 65 |
| 7 | Alisson | Ramses Becker | GK | Liverpool | 176 | 55 |
| 8 | Trent | Alexander-Arnold | DEF | Liverpool | 208 | 75 |
| 9 | James | Maddison | MID | Leicester | 181 | 80 |
