# Fantasy Premier League team selection

Let's find an optimal team for the [Fantasy Premier League](https://fantasy.premierleague.com/) using [mathematical programming](https://en.wikipedia.org/wiki/Mathematical_optimization) with [Opvious](https://www.opvious.io)!

<div class="alert alert-block alert-info">
    &#9432; The code in this notebook can be executed directly from your browser when accessed via <a href="https://www.opvious.io/notebooks/retro/notebooks/?path=examples/fantasy-premier-league.ipynb">opvious.io/notebooks</a>.
</div>

## Setup

We start by downloading player statistics (team, cost, total points, etc.). The data is available in table format [here](https://gist.github.com/mtth/f59bdde8694223c06f77089b71b48d17).

In [1]:
%pip install opvious

In [2]:
import opvious

_PLAYER_DATA_URL = "https://gist.githubusercontent.com/mtth/f59bdde8694223c06f77089b71b48d17/raw/6f1568cb2ff69450f06e3b8045d504af74bb701f/fpl-2023-07-26.csv"

async def _download_player_data():
    """Downloads a dataframe of player statistics"""
    df = await opvious.executors.fetch_csv(_PLAYER_DATA_URL)
    # Some player names are not unique, so we disambiguate them by appending the team's abbreviation
    df["id"] = df.apply(lambda r: f"{r['name']}-{r['team']}", axis=1)
    return df.set_index("id", verify_integrity=True)

player_data = await _download_player_data()
player_data

id,name,team,position,cost,status,minutes,total_points,bonus,points_per_game,selected_by_percent
Balogun-ARS,Balogun,ARS,FWD,4.5,Available,0,0,0,0.0,1.5
Cédric-ARS,Cédric,ARS,DEF,4.0,Available,223,10,0,1.2,0.4
M.Elneny-ARS,M.Elneny,ARS,MID,4.5,Available,111,6,0,1.2,0.2
Fábio Vieira-ARS,Fábio Vieira,ARS,MID,5.5,Available,500,40,2,1.8,0.1
Gabriel-ARS,Gabriel,ARS,DEF,5.0,Available,3409,146,15,3.8,19.2
...,...,...,...,...,...,...,...,...,...,...
N.Semedo-WOL,N.Semedo,WOL,DEF,4.5,Available,2633,75,5,2.1,0.3
Toti-WOL,Toti,WOL,DEF,4.5,Available,978,43,4,2.5,0.2
Boubacar Traore-WOL,Boubacar Traore,WOL,MID,4.5,Available,405,14,0,1.4,0.6
Cunha-WOL,Cunha,WOL,FWD,5.5,Available,961,39,6,2.3,0.1


## Formulation

The next step is to formulate team selection as an [integer program](https://en.wikipedia.org/wiki/Integer_programming) using `opvious`' [declarative modeling API](https://opvious.readthedocs.io/en/stable/modeling.html). For simplicity we omit transfers and use a multiplication factor to estimate the value of substitute players.
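
For readers who prefer mathematical notation, the model implemented in the next cell can be summarized as follows. The symbols are introduced here for illustration only and are not part of the notebook's code: $x_p$, $s_p$, $c_p$, $v_p$ stand for `is_picked`, `is_starter`, `is_captain`, `is_vice_captain`; $\kappa_p$ and $\nu_p$ for `player_cost` and `player_value`; $T_{p,t}$ and $Q_{p,q}$ for the team and position indicators; $F_q$, $m_q$, $M_q$ for the squad, minimum starter, and maximum starter formations; and $\alpha$ for `substitution_factor`.

```latex
\begin{aligned}
\max \quad & \sum_{p} \nu_p \left[ \alpha\,(x_p + v_p) + (1 - \alpha)\,s_p + c_p \right] \\
\text{s.t.} \quad
  & \sum_{p} \kappa_p\, x_p \le 100 \\
  & \sum_{p} T_{p,t}\, x_p \le 3 && \forall t \\
  & \sum_{p} s_p = 11, \quad \sum_{p} c_p = 1, \quad \sum_{p} v_p = 1 \\
  & s_p \le x_p, \quad c_p \le s_p, \quad v_p \le s_p, \quad c_p + v_p \le 1 && \forall p \\
  & \sum_{p} Q_{p,q}\, x_p = F_q && \forall q \\
  & m_q \le \sum_{p} Q_{p,q}\, s_p \le M_q && \forall q \\
  & x_p,\ s_p,\ c_p,\ v_p \in \{0, 1\} && \forall p
\end{aligned}
```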


<div class="alert alert-block alert-info">
    &#9432; You do not need to understand the code below to use it for selecting a team. Feel free to skip ahead to the next section to see it in action!
</div>

In [3]:
import opvious.modeling as om

class TeamSelection(om.Model):
    """Fantasy Premier League team selection integer program"""
    
    players = om.Dimension()
    positions = om.Dimension()
    teams = om.Dimension()
    
    # Player data
    player_cost = om.Parameter.non_negative(players)
    player_value = om.Parameter.non_negative(players)
    player_team = om.Parameter.indicator(players, teams)
    player_position = om.Parameter.indicator(players, positions)
    
    # Number of players per position
    squad_formation = om.Parameter.natural(positions)
    starter_min_formation = om.Parameter.natural(positions)
    starter_max_formation = om.Parameter.natural(positions)
    
    # Outputs
    is_picked = om.Variable.indicator(players)
    is_starter = om.Variable.indicator(players)
    is_captain = om.Variable.indicator(players)
    is_vice_captain = om.Variable.indicator(players)
    
    def __init__(self, substitution_factor=0.1):
        self.substitution_factor = substitution_factor
        
    @om.constraint
    def total_picked_cost_is_within_budget(self):
        yield om.total(self.is_picked(p) * self.player_cost(p) for p in self.players) <= 100
        
    @om.constraint
    def at_most_3_picked_per_team(self):
        for t in self.teams:
            yield om.total(self.is_picked(p) * self.player_team(p, t) for p in self.players) <= 3

    @om.constraint
    def exactly_11_starters(self):
        yield self.is_starter.total() == 11
        
    @om.constraint
    def starters_are_picked(self):
        for p in self.players:
            yield self.is_starter(p) <= self.is_picked(p)
            
    @om.constraint
    def captain_is_starter(self):
        for p in self.players:
            yield self.is_captain(p) <= self.is_starter(p)
            
    @om.constraint
    def vice_captain_is_starter(self):
        for p in self.players:
            yield self.is_vice_captain(p) <= self.is_starter(p)
        
    @om.constraint
    def exactly_one_captain(self):
        yield self.is_captain.total() == 1
        
    @om.constraint
    def exactly_one_vice_captain(self):
        yield self.is_vice_captain.total() == 1
        
    @om.constraint
    def captain_is_not_vice_captain(self):
        for p in self.players:
            yield self.is_captain(p) + self.is_vice_captain(p) <= 1
    
    @om.constraint
    def picked_positions_match_formation(self):
        for q in self.positions:
            count = om.total(self.is_picked(p) * self.player_position(p, q) for p in self.players)
            yield count == self.squad_formation(q)
            
    @om.constraint
    def starter_positions_match_min_formation(self):
        for q in self.positions:
            count = om.total(self.is_starter(p) * self.player_position(p, q) for p in self.players)
            yield count >= self.starter_min_formation(q)

    @om.constraint
    def starter_positions_match_max_formation(self):
        for q in self.positions:
            count = om.total(self.is_starter(p) * self.player_position(p, q) for p in self.players)
            yield count <= self.starter_max_formation(q)

    def picked_player_value(self, p):
        return (
            self.substitution_factor * (self.is_picked(p) + self.is_vice_captain(p)) +
            (1 - self.substitution_factor) * self.is_starter(p) + self.is_captain(p)
        ) * self.player_value(p)

    @om.objective
    def maximize_total_value_of_picked_players(self):
        return om.total(self.picked_player_value(p) for p in self.players)
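
The rules encoded by the constraints above can also be checked outside of the solver. Below is a minimal plain-Python sketch of such a check. It is not part of the Opvious API: `squad_violations` and the dictionary layout it expects are hypothetical, and the limits simply mirror the values used in this notebook.

```python
from collections import Counter

# Limits mirroring the constraints and parameters used in this notebook
BUDGET = 100
MAX_PER_TEAM = 3
SQUAD_FORMATION = {"GKP": 2, "DEF": 5, "MID": 5, "FWD": 3}
STARTER_MIN = {"GKP": 1, "DEF": 3, "FWD": 1}
STARTER_MAX = {"GKP": 1, "DEF": 5, "MID": 5, "FWD": 3}


def squad_violations(squad):
    """Returns the names of the rules broken by a squad (hypothetical helper).

    `squad` is a list of dicts with keys `team`, `position`, `cost`,
    `starter`, `captain`, and `vice_captain`.
    """
    violations = []
    if sum(p["cost"] for p in squad) > BUDGET:
        violations.append("budget")
    per_team = Counter(p["team"] for p in squad)
    if any(n > MAX_PER_TEAM for n in per_team.values()):
        violations.append("team limit")
    picked = Counter(p["position"] for p in squad)
    if any(picked[q] != n for q, n in SQUAD_FORMATION.items()):
        violations.append("squad formation")
    starters = [p for p in squad if p["starter"]]
    if len(starters) != 11:
        violations.append("starter count")
    started = Counter(p["position"] for p in starters)
    if any(
        started[q] < STARTER_MIN.get(q, 0) or started[q] > n
        for q, n in STARTER_MAX.items()
    ):
        violations.append("starter formation")
    captains = [p for p in squad if p["captain"]]
    vices = [p for p in squad if p["vice_captain"]]
    roles_ok = (
        len(captains) == 1
        and len(vices) == 1
        and captains[0] is not vices[0]
        and captains[0]["starter"]
        and vices[0]["starter"]
    )
    if not roles_ok:
        violations.append("captaincy")
    return violations
```

An empty list means the squad satisfies every rule modeled here. Transfers and other FPL rules omitted from the model are not checked.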

## Application

We are now ready to find an optimal squad!

_Optimal_ is defined as maximizing the team's value, computed as:

* the sum of its starter players' values, plus
* the sum of its substitute players' values multiplied by a `substitution_factor` (0.1 by default), plus
* the captain's value (counted a second time, since the captain is always a starter, which gives the captain bonus), plus
* the vice-captain's value multiplied by the `substitution_factor`.
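
As a small worked example of the definition above, the sketch below computes the value of a toy squad in plain Python. The `team_value` helper and the role labels are illustrative names, not part of the notebook's code; the arithmetic follows the four bullet points.

```python
def team_value(players, substitution_factor=0.1):
    """Computes the value of a squad (illustrative helper).

    `players` is a list of `(value, role)` tuples, where role is one of
    "captain", "vice_captain", "starter", or "substitute".
    """
    total = 0.0
    for value, role in players:
        if role == "substitute":
            total += substitution_factor * value
        else:
            total += value  # Captain and vice-captain are starters too
        if role == "captain":
            total += value  # The captain's value is counted a second time
        elif role == "vice_captain":
            total += substitution_factor * value
    return total
```

With a captain worth 200, a vice-captain worth 150, another starter worth 100, and a substitute worth 50, the default factor gives 2 × 200 + 1.1 × 150 + 100 + 0.1 × 50 = 670.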

Each individual player's value is computed as a weighted average of their total points and points per
game. The weight is controlled by `total_vs_per_game_ratio`: setting this to 1 will
only consider total points, setting it to 0 will only consider points per game, 0.5 will use the mean.

To allow for personal preferences and judgments, it's also possible to change players' values by specifying per-player multipliers. Setting a high multiplier for a player will make them more likely to be picked, while a multiplier of 0 reduces their value to 0, so the optimizer gains nothing from picking them.
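
The per-player value can be sketched as a standalone function. This mirrors the expression used in the next cell, including its floor of 1 for players without any points yet; `player_value` is a name chosen here for illustration.

```python
def player_value(
    total_points,
    points_per_game,
    total_vs_per_game_ratio=1,
    multiplier=1,
):
    """Blends the two statistics, floors the result at 1, then applies the multiplier."""
    blended = (
        total_vs_per_game_ratio * total_points
        + (1 - total_vs_per_game_ratio) * points_per_game
    )
    return max(1, blended) * multiplier
```

For instance, a player with 212 total points and 5.7 points per game is worth 212 with the default ratio, and 0.5 × 212 + 0.5 × 5.7 = 108.85 with a ratio of 0.5. Since the two statistics are on very different scales, intermediate ratios still mostly reflect total points.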

In [4]:
import opvious
import pandas as pd

_client = opvious.Client.default(token=None)  # Replace `None` with your API access token

async def find_optimal_squad(
    substitution_factor=0.1,
    total_vs_per_game_ratio=1,
    player_multipliers=None,
):
    """Returns a squad which maximizes team value (see above) while respecting FPL rules"""
    players = player_data[player_data['status'] == 'Available'].drop("status", axis=1)
    multipliers = player_multipliers or {}
    solution = await _client.solve(
        opvious.Problem(
            TeamSelection(substitution_factor).specification(),
            parameters={
                "playerCost": players["cost"],
                "playerValue": players.apply(
                    lambda r: max(1,  # Floor at 1 since new/transferred players have no points yet
                        total_vs_per_game_ratio * r["total_points"] +
                        (1 - total_vs_per_game_ratio) * r["points_per_game"]
                    ) * multipliers.get(r.name, 1),
                    axis=1,
                ),
                "playerTeam": players["team"],
                "playerPosition": players["position"],
                "squadFormation": {"GKP": 2, "DEF": 5, "MID": 5, "FWD": 3},
                "starterMinFormation": {"GKP": 1, "DEF": 3, "FWD": 1},
                "starterMaxFormation": {"GKP": 1, "DEF": 5, "MID": 5, "FWD": 3},
            }
        )
    )
    selected = pd.concat({
        key: solution.outputs.variable(key)["value"]
        for key in ["isPicked", "isStarter", "isCaptain", "isViceCaptain"]
    }, axis=1).fillna(0).astype(int)
    return pd.concat([players, selected], axis=1, join="inner").drop(["isPicked", "name", "team"], axis=1)

<div class="alert alert-block alert-warning">
    &#9888; You will need an Opvious API access token to run the function above since the data size exceeds the limit for guest solves. Once you've created one <a href="https://hub.cloud.opvious.io/authorizations">here</a> (signing up is free), simply edit the cell above and insert it where indicated.
</div>

Let's see what we get when solving with the default parameters (note the three columns on the right which indicate whether a player is on the starting roster, is captain, and is vice-captain).

In [5]:
await find_optimal_squad()

id,position,cost,minutes,total_points,bonus,points_per_game,selected_by_percent,isStarter,isCaptain,isViceCaptain
Martinelli-ARS,MID,8.0,2789,198,18,5.5,14.3,1,0,0
Ødegaard-ARS,MID,8.5,3132,212,30,5.7,20.2,1,0,0
White-ARS,DEF,5.5,3054,156,12,4.1,9.7,1,0,0
Douglas Luiz-AVL,MID,5.5,2922,142,17,3.8,2.9,1,0,0
Mings-AVL,DEF,4.5,3150,130,17,3.7,15.4,0,0,0
Semenyo-BOU,FWD,4.5,250,18,1,1.6,1.7,0,0,0
Mee-BRE,DEF,5.0,3269,143,11,3.9,7.5,1,0,0
Raya-BRE,GKP,5.0,3420,166,20,4.4,9.6,1,0,0
Gross-BHA,MID,6.5,3240,159,14,4.3,4.7,1,0,0
Leno-FUL,GKP,4.5,3240,142,17,3.9,9.0,0,0,0


We can also tweak the parameters to get a different team. For example, if we:

* think that Mohamed Salah is undervalued in the current statistics, and
* want to also consider points per game (instead of only total points),

we would run it with the following arguments:

In [6]:
await find_optimal_squad(
    player_multipliers={"Salah-LIV": 1.5}, # 50% value boost
    total_vs_per_game_ratio=0.5, # Evaluate players on average of total points and per-game points
)

id,position,cost,minutes,total_points,bonus,points_per_game,selected_by_percent,isStarter,isCaptain,isViceCaptain
Gabriel-ARS,DEF,5.0,3409,146,15,3.8,19.2,1,0,0
Martinelli-ARS,MID,8.0,2789,198,18,5.5,14.3,1,0,0
Ødegaard-ARS,MID,8.5,3132,212,30,5.7,20.2,1,0,0
Archer-AVL,FWD,4.5,43,6,0,1.0,6.9,0,0,0
Douglas Luiz-AVL,MID,5.5,2922,142,17,3.8,2.9,1,0,0
Mings-AVL,DEF,4.5,3150,130,17,3.7,15.4,1,0,0
Semenyo-BOU,FWD,4.5,250,18,1,1.6,1.7,0,0,0
Mee-BRE,DEF,5.0,3269,143,11,3.9,7.5,1,0,0
Raya-BRE,GKP,5.0,3420,166,20,4.4,9.6,1,0,0
Leno-FUL,GKP,4.5,3240,142,17,3.9,9.0,0,0,0


It's now up to you to try it out with different parameters and see if you can find your ideal squad!