# Building a Fantasy Football team with ML :-)

## 021_Pick_initial_team

### Acknowledgements

I found this Kaggle post by Gavin Ng incredibly useful for this part of the project. I followed an approach very similar to his to pick my initial team, and learned from his code how to formulate the optimisation problem with PuLP.
    https://www.kaggle.com/gavinjpng/fpl-prediction-and-selection

In [1]:
# Libraries
import pandas as pd
import numpy as np
import pulp

In [2]:
# Import clean csv from previous notebook (011)
players_df = pd.read_csv('../Data/Players/players_df.csv')

In [3]:
players_df.head()

Unnamed: 0,first_name,second_name,season,element_type,cost_change_start,now_cost,form,ict_index,chance_of_playing_next_round,points_per_game,total_points,player_id,team_id,start_cost,position
0,David,Ospina,1617,1,-3,47,0.0,2.9,100,1.0,2,david_ospina,Arsenal,50,GK
1,Petr,Cech,1617,1,-1,54,0.0,82.0,100,3.8,134,petr_cech,Arsenal,55,GK
2,Laurent,Koscielny,1617,2,1,61,0.0,112.7,0,3.7,121,laurent_koscielny,Arsenal,60,DEF
3,Per,Mertesacker,1617,2,-2,48,0.0,1.8,100,1.0,1,per_mertesacker,Arsenal,50,DEF
4,Gabriel Armando,de Abreu,1617,2,-2,48,0.0,50.0,75,2.4,45,gabriel_armando_de_abreu,Arsenal,50,DEF


The FPL website (https://fantasy.premierleague.com/) nicely details the rules for picking an initial team.

<b>Squad Size</b>\
To join the game select a fantasy football squad of 15 players, consisting of:\
2 Goalkeepers\
5 Defenders\
5 Midfielders\
3 Forwards

<b>Budget</b>\
The total value of your initial squad must not exceed £100 million.

<b>Players Per Team</b>\
You can select up to 3 players from a single Premier League team.

Although 15 players are picked, only the starting 11 can score points, and only if they play. If a starter doesn't play, a player from the bench is substituted on (bench prioritisation will be ignored for now). In the past, I've chosen 4 cheap players to fill the bench, leaving more money to buy a 'better' starting 11, meaning one that is likely to earn more points. An alternative is to treat the starting 11 and the subs equally and pick a well-rounded squad. Although I believe the former would produce a higher-scoring initial team, factoring a cheap bench into the problem complicates it: which teams should the subs be picked from, and which positions should they play?
Instead, I will ignore for now the fact that the bench cannot generate points, and build the 'best' squad of 15 that I can. I may come back to this in a future iteration.

I'll use the total points scored by each player in the previous season to pick my initial team. This means that players with no points record from last season, such as those at newly promoted clubs who did not play in the Premier League, cannot feature in the initial team, but they could be transferred in later.

This is an optimisation problem, where I'll want to maximise the number of points scored last season with the following constraints:
- A player either is or is not picked - a binary problem
- The total starting cost of the squad must not exceed £100m (costs in the data are in units of £0.1m, so the limit is 1000)
- The team must consist of 2 Goalkeepers, 5 Defenders, 5 Midfielders, 3 Forwards
- A maximum of 3 players from the same club can be selected
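
Written out, the problem looks roughly like this. The notation is my own rather than anything from the FPL rules: $x_i$ is 1 if player $i$ is picked and 0 otherwise, $p_i$ is their points last season, $c_i$ is their starting cost in units of £0.1m, $P_k$ is the set of players in position $k$, $n_k$ is the number required in that position (2, 5, 5 and 3 for GK, DEF, MID and FWD), and $T_t$ is the set of players at club $t$.

$$
\begin{aligned}
\max_{x} \quad & \sum_i p_i x_i \\
\text{subject to} \quad & \sum_i c_i x_i \le 1000 \\
& \sum_{i \in P_k} x_i = n_k \quad \text{for } k \in \{\text{GK}, \text{DEF}, \text{MID}, \text{FWD}\} \\
& \sum_{i \in T_t} x_i \le 3 \quad \text{for every club } t \\
& x_i \in \{0, 1\} \quad \text{for every player } i
\end{aligned}
$$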

In [4]:
# I will use data from the 1920 (2019/20) season to pick my 2021 (2020/21) team.

prev_season_df = players_df[players_df['season'] == 1920][['player_id', 'team_id', 'total_points']].reset_index(drop=True)
prev_season_df.columns=['player_id', 'team_last_season', 'points_last_season']

In [5]:
prev_season_df.head()

Unnamed: 0,player_id,team_last_season,points_last_season
0,shkodran_mustafi,Arsenal,43
1,hector_bellerin,Arsenal,44
2,sead_kolasinac,Arsenal,55
3,ainsley_maitland_niles,Arsenal,41
4,sokratis_papastathopoulos,Arsenal,57


In [6]:
# Now I'll reduce the data to the current season's players and join their points from last season
# The inner join drops players with no record in the 1920 season (e.g. those at newly promoted clubs
# who did not play in the Premier League last season), so they cannot feature in the initial team

curr_season_df = players_df[players_df['season'] == 2021].reset_index(drop=True)

possible_players_df = curr_season_df.merge(prev_season_df, how='inner', on=['player_id'])
possible_players_df.head()

Unnamed: 0,first_name,second_name,season,element_type,cost_change_start,now_cost,form,ict_index,chance_of_playing_next_round,points_per_game,total_points,player_id,team_id,start_cost,position,team_last_season,points_last_season
0,Mesut,Özil,2021,3,-3,67,0.0,0.0,0,0.0,0,mesut_ozil,Arsenal,70,MID,Arsenal,53
1,Sokratis,Papastathopoulos,2021,2,-2,48,0.0,0.0,0,0.0,0,sokratis_papastathopoulos,Arsenal,50,DEF,Arsenal,57
2,David,Luiz Moreira Marinho,2021,2,-1,54,2.8,41.0,100,2.1,40,david_luiz_moreira_marinho,Arsenal,55,DEF,Arsenal,94
3,Pierre-Emerick,Aubameyang,2021,3,-5,115,2.6,140.7,100,4.4,106,pierre_emerick_aubameyang,Arsenal,120,MID,Arsenal,205
4,Cédric,Soares,2021,2,-4,46,0.8,26.5,100,3.1,28,cedric_soares,Arsenal,50,DEF,Arsenal,61
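
To see which current-season players an inner join like this drops, one option is a left merge with pandas' `indicator=True`, which adds a `_merge` column saying where each row was found. This is a minimal sketch on made-up data: the frames `curr` and `prev` and the player and team names are placeholders, not rows from the real dataset.

```python
import pandas as pd

# Toy stand-ins for curr_season_df and prev_season_df (made-up rows)
curr = pd.DataFrame({
    "player_id": ["player_a", "player_b", "player_c"],
    "team_id": ["Team X", "Team Y", "Team Z"],
})
prev = pd.DataFrame({
    "player_id": ["player_a", "player_b"],
    "points_last_season": [120, 45],
})

# A left merge keeps every current player; _merge records where each row was found
merged = curr.merge(prev, how="left", on="player_id", indicator=True)

# Rows marked 'left_only' have no previous-season record,
# so an inner join would have dropped them
dropped = merged.loc[merged["_merge"] == "left_only", "player_id"].tolist()
print(dropped)
```

Run against the real frames, the same pattern would list the players who can never be selected by the optimisation below.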


In [7]:
possible_players_df.team_id.unique()

array(['Arsenal', 'Aston Villa', 'Brighton and Hove Albion', 'Burnley',
       'Chelsea', 'Crystal Palace', 'Everton', 'Fulham', 'Leicester City',
       'Liverpool', 'Manchester City', 'Manchester United',
       'Newcastle United', 'Sheffield United', 'Southampton',
       'Tottenham Hotspur', 'West Bromwich Albion', 'West Ham United',
       'Wolverhampton Wanderers'], dtype=object)

In [8]:
# Decision variables - one binary variable per player: selected (1) or not (0)
def create_decision_vars(df):

    return [pulp.LpVariable(i, cat="Binary") for i in df.player_id]


# Objective - total points scored last season by the selected players
def create_optimisation_function(df, decision_vars):
    # lpSum builds the linear expression directly, rather than adding terms to an empty string
    return pulp.lpSum(
        points * player for points, player in zip(df.points_last_season, decision_vars)
    )



# Budget - total start cost must not exceed 1000 (£100m, in units of £0.1m)
def create_constraint_money(df, decision_vars):
    # Pair each player's start cost with their decision variable in a single pass
    formula = pulp.lpSum(
        cost * player for cost, player in zip(df.start_cost, decision_vars)
    )

    return (formula <= 1000)



# Exactly `amt` players must be picked in the given position
def create_constraint_position(df, decision_vars, position, amt):

    formula = pulp.lpSum(
        player
        for player_position, player in zip(df.position, decision_vars)
        if player_position == position
    )

    return (formula == amt)



# At most 3 players may be picked from the given club
def create_constraint_team(df, decision_vars, team):

    formula = pulp.lpSum(
        player
        for player_team, player in zip(df.team_id, decision_vars)
        if player_team == team
    )

    return (formula <= 3)
        

# Initialise lp problem
lp = pulp.LpProblem('select_initial_team', pulp.LpMaximize)

# Create decision vars
decision_vars = create_decision_vars(possible_players_df)

# Add optimisation function to lp prob
lp += create_optimisation_function(possible_players_df, decision_vars)

# Add £100m spend constraint
lp += create_constraint_money(possible_players_df, decision_vars)

# Add position constraints
for position, amt in [("GK", 2), ("DEF", 5), ("MID", 5), ("FWD", 3)]:
    lp += create_constraint_position(possible_players_df, decision_vars, position, amt)

# Add max 3 per club constraints, one for every club present in the data
for team in possible_players_df.team_id.unique():
    lp += create_constraint_team(possible_players_df, decision_vars, team)
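
To build some intuition for what the solver is being asked to do, here is a tiny brute-force version of the same kind of problem. It is only an illustrative sketch, not the method used in this notebook: the player pool is made up, and the rules are scaled down (1 GK and 2 DEF, a budget of 160, at most 2 players per club) so that every possible squad can be enumerated.

```python
from collections import Counter
from itertools import combinations

# Made-up pool: (player_id, position, club, cost, points_last_season)
pool = [
    ("gk_a", "GK", "X", 50, 100),
    ("gk_b", "GK", "Y", 45, 75),
    ("def_a", "DEF", "X", 60, 150),
    ("def_b", "DEF", "X", 55, 140),
    ("def_c", "DEF", "Y", 50, 120),
    ("def_d", "DEF", "Z", 40, 60),
]

quota = {"GK": 1, "DEF": 2}  # scaled-down version of 2/5/5/3
budget = 160                 # scaled-down version of 1000
max_per_club = 2             # scaled-down version of 3


def is_valid(squad):
    """Check the position, budget and per-club constraints for one candidate squad."""
    positions = Counter(p[1] for p in squad)
    clubs = Counter(p[2] for p in squad)
    return (
        dict(positions) == quota
        and sum(p[3] for p in squad) <= budget
        and max(clubs.values()) <= max_per_club
    )


# Enumerate every squad of the right size and keep only the valid ones
squad_size = sum(quota.values())
valid_squads = [s for s in combinations(pool, squad_size) if is_valid(s)]

# The 'optimal' squad is the valid one with the most points last season
best = max(valid_squads, key=lambda s: sum(p[4] for p in s))
best_ids = sorted(p[0] for p in best)
best_points = sum(p[4] for p in best)
print(best_ids, best_points)
```

The three highest scorers in each position (`gk_a`, `def_a`, `def_b`) would break both the budget and the club limit here, so the best valid squad swaps one of them out. Enumeration like this is hopeless for the real pool, where 15 players are chosen from hundreds, which is why the problem is handed to an integer programming solver instead.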

In [9]:
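lp.writeLP('select_initial_team.lp')
sol = lp.solve()
# solve() returns a status code; stop here if no optimal solution was found
assert pulp.LpStatus[sol] == 'Optimal'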

In [10]:
varsdict = {}
for v in lp.variables():
    varsdict[v.name] = v.varValue

In [11]:
initial_team = pd.Series(varsdict, name = 'selected')
initial_team.index.name = 'player_id'
initial_team = initial_team.reset_index()

In [12]:
initial_team = initial_team[initial_team.selected==1].reset_index(drop=True)
initial_team = initial_team.merge(curr_season_df, how='inner', on='player_id')
initial_team

Unnamed: 0,player_id,selected,first_name,second_name,season,element_type,cost_change_start,now_cost,form,ict_index,chance_of_playing_next_round,points_per_game,total_points,team_id,start_cost,position
0,andrew_robertson,1.0,Andrew,Robertson,2021,2,1,71,3.6,165.5,100.0,3.9,113,Liverpool,70,DEF
1,danny_ings,1.0,Danny,Ings,2021,4,-1,84,1.0,131.9,50.0,4.3,95,Southampton,85,FWD
2,dean_henderson,1.0,Dean,Henderson,2021,1,-3,52,4.0,11.9,,4.2,25,Manchester United,55,GK
3,declan_rice,1.0,Declan,Rice,2021,3,-3,47,2.2,108.6,,2.6,76,West Ham United,50,MID
4,jack_grealish,1.0,Jack,Grealish,2021,3,5,75,0.0,249.1,75.0,5.9,129,Aston Villa,70,MID
5,john_egan,1.0,John,Egan,2021,2,-3,47,0.0,47.2,0.0,1.5,32,Sheffield United,50,DEF
6,john_lundstram,1.0,John,Lundstram,2021,3,-6,49,2.0,77.9,100.0,1.8,43,Sheffield United,55,MID
7,jordan_ayew,1.0,Jordan,Ayew,2021,4,-4,56,2.4,76.4,100.0,2.4,59,Crystal Palace,60,FWD
8,kevin_de_bruyne,1.0,Kevin,De Bruyne,2021,3,4,119,4.8,259.6,100.0,5.4,125,Manchester City,115,MID
9,mark_noble,1.0,Mark,Noble,2021,3,-5,45,0.8,7.9,100.0,1.1,18,West Ham United,50,MID


In [13]:
# Check

# start cost <= 1000 ?
print(initial_team.start_cost.sum()) # true

# 2 gk, 5 def, 5 mid, 3 fwd ?
print(initial_team.position.value_counts()) # true

# no more than 3 per club ?
print(initial_team.team_id.value_counts()) # true

1000
MID    5
DEF    5
FWD    3
GK     2
Name: position, dtype: int64
Liverpool                  3
West Ham United            2
Sheffield United           2
Wolverhampton Wanderers    1
Manchester City            1
Manchester United          1
Tottenham Hotspur          1
Southampton                1
Crystal Palace             1
Aston Villa                1
Burnley                    1
Name: team_id, dtype: int64
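
The three checks above could also be wrapped into one function that reports any broken rule, instead of reading the printed output by eye. This is a minimal sketch in plain Python: `validate_squad` is a hypothetical helper of my own, the squad below is made up, and I'm assuming the real team would be passed in as a list of dicts, for example via `initial_team.to_dict('records')`.

```python
from collections import Counter

REQUIRED = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}


def validate_squad(players, budget=1000, max_per_club=3):
    """Return a list of rule violations (empty if the squad is valid)."""
    problems = []

    # Budget: costs are in units of £0.1m, so 1000 means £100m
    total_cost = sum(p["start_cost"] for p in players)
    if total_cost > budget:
        problems.append(f"cost {total_cost} exceeds budget {budget}")

    # Positions: the counts must match the required squad shape exactly
    positions = Counter(p["position"] for p in players)
    if dict(positions) != REQUIRED:
        problems.append(f"position counts {dict(positions)} != {REQUIRED}")

    # Clubs: no club may supply more than max_per_club players
    clubs = Counter(p["team_id"] for p in players)
    for club, n in clubs.items():
        if n > max_per_club:
            problems.append(f"{n} players from {club}")

    return problems


# Made-up squad: 15 players costing 60 each, spread evenly over 5 clubs
positions = ["GK"] * 2 + ["DEF"] * 5 + ["MID"] * 5 + ["FWD"] * 3
squad = [
    {"position": pos, "team_id": f"Club {i % 5}", "start_cost": 60}
    for i, pos in enumerate(positions)
]
print(validate_squad(squad))
```

Returning the list of problems rather than a single boolean makes it easier to see which rule failed if the constraints are changed in a later iteration.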
