# Sources of Fantasy Premier League Points 

#### Analysing which stats earned players the most points in Fantasy Premier League during the 2023-24 season

The focus of this analysis is to determine which stats brought the most points during the 2023-24 season of Fantasy Premier League, in order to give guidelines for building teams in the future.

Fantasy Premier League is a fantasy sports game centered around the English Premier League. It allows users to build teams consisting of real Premier League players, who then earn points by achieving various statistical outcomes. Some rules that are important for understanding this analysis are:
1. The goal is to accumulate the most points over the course of a season
2. Your team needs to have 2 goalkeepers, 5 defenders, 5 midfielders and 3 forwards
3. Only 11 players' points in a team can count for any given gameweek
4. You have a budget of 100 to build the team
5. Each player is given a price according to an estimation of his performance
6. Different stats are valued differently
For a detailed list of rules you can visit: https://fantasy.premierleague.com/help/rules
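
As a rough illustration of rules 2 and 4, the sketch below checks whether a squad meets the position quotas and the budget. The function name, the `(position, cost)` tuple layout and the prices are assumptions made for this example; it covers only the two constraints listed above, not the full rule set.

```python
from collections import Counter

# Squad rules quoted in the list above; the names used here are illustrative.
REQUIRED_POSITIONS = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
BUDGET = 100.0

def is_valid_squad(squad):
    """squad: list of (position, cost) tuples, one per player."""
    counts = Counter(position for position, _ in squad)
    total_cost = sum(cost for _, cost in squad)
    return dict(counts) == REQUIRED_POSITIONS and total_cost <= BUDGET

# Hypothetical 15-player squad with made-up prices.
example_squad = (
    [("GK", 4.5)] * 2 + [("DEF", 5.0)] * 5 + [("MID", 7.5)] * 5 + [("FWD", 8.0)] * 3
)
print(is_valid_squad(example_squad))
```

A squad that is one player short, or whose prices add up to more than 100, fails the check.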

From the rules above it's easy to see that, when building a team, a user has to pick the players who will bring in the most points while also managing the budget and deciding where to spend and where to save. So, this analysis focuses on two main points that should help when constructing a team:
1. Determine which stats earn the most points during the season for each position
2. Determine whether it's worth spending the budget on more expensive players, and in which situations

This notebook contains some data manipulation of the processed data gathered from: https://github.com/vaastav/Fantasy-Premier-League/blob/master/data/2023-24/cleaned_players.csv
The analysis to achieve the two mentioned goals can be found after the data is prepared.

### 0. Importing libraries and data

In [38]:
#Importing data manipulation and visualisation libraries

import pandas as pd
import numpy as np

from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.express as px

In [2]:
#Importing data
all_players = pd.read_csv('player_points.csv')

### 1. Preprocessing data

In the following 2 cells, player data is first split by position, and the top 10% of players are separated from the rest within each position. This will help with the analysis later. Finally, the percentage of total points coming from each stat is calculated for both the top players and the rest.

In [3]:
#Split players by position
goalkeepers = all_players[all_players['element_type']=='GK']
defenders = all_players[all_players['element_type']=='DEF']
midfielders = all_players[all_players['element_type']=='MID']
forwards = all_players[all_players['element_type']=='FWD']

#Find number of players in top 10% for each position
num_top_goalkeepers = len(goalkeepers)//10
num_top_defenders = len(defenders)//10
num_top_midfielders = len(midfielders)//10
num_top_forwards = len(forwards)//10

#Separate top 10% of players for each position
top_goalkeepers = goalkeepers.sort_values(by='total_points', ascending=False).iloc[:num_top_goalkeepers]
top_defenders = defenders.sort_values(by='total_points', ascending=False).iloc[:num_top_defenders]
top_midfielders = midfielders.sort_values(by='total_points', ascending=False).iloc[:num_top_midfielders]
top_forwards = forwards.sort_values(by='total_points', ascending=False).iloc[:num_top_forwards]

#Separate bottom 90% of players for each position
rest_goalkeepers = goalkeepers.sort_values(by='total_points', ascending=False).iloc[num_top_goalkeepers:]
rest_defenders = defenders.sort_values(by='total_points', ascending=False).iloc[num_top_defenders:]
rest_midfielders = midfielders.sort_values(by='total_points', ascending=False).iloc[num_top_midfielders:]
rest_forwards = forwards.sort_values(by='total_points', ascending=False).iloc[num_top_forwards:]
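
The same split can also be written once as a small helper, which sorts each position a single time. This is only a sketch of an alternative: `split_top_rest` is a hypothetical name, and the toy data frame stands in for one of the position frames above.

```python
import pandas as pd

def split_top_rest(players, by="total_points"):
    """Return (top 10%, rest) of `players`, ranked descending by `by`."""
    ranked = players.sort_values(by=by, ascending=False)
    n_top = len(ranked) // 10  # same integer division as in the cell above
    return ranked.iloc[:n_top], ranked.iloc[n_top:]

# Toy frame with 20 made-up players scoring 0..19 points.
toy_players = pd.DataFrame({"total_points": range(20)})
top, rest = split_top_rest(toy_players)
print(len(top), len(rest))
```

With this helper each pair such as `top_goalkeepers, rest_goalkeepers` could be produced by one call per position.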

In [4]:
#Calculate percentage of total points from each stat by position for top and bottom players
#All columns from index 4 onward are treated as per-stat points columns
stats_columns = all_players.columns[4:]

def points_percent(dataframe, stats_columns=stats_columns, total_column='total_points'):
    #Share of the group's total points contributed by each stat, in percent
    percents = dataframe[stats_columns].sum().div(dataframe[total_column].sum(), axis=0)*100
    #Readable labels for the charts; their order must match the order of stats_columns
    percents.index=['Minutes', 'Bonus', 'Goals Scored', 'Assists', 'Penalty Misses', 'Own Goals', 'Goals Conceded', 'Saves', 'Penalty Saves', 'Clean Sheets', 'Yellow Cards', 'Red Cards']

    return percents

#Apply percentage calculation function and sort by absolute value of percentage
top_gk_percent = points_percent(top_goalkeepers).sort_values(ascending=False, key=abs)
rest_gk_percent = points_percent(rest_goalkeepers).sort_values(ascending=False, key=abs)

top_def_percent = points_percent(top_defenders).sort_values(ascending=False, key=abs)
rest_def_percent = points_percent(rest_defenders).sort_values(ascending=False, key=abs)

top_mid_percent = points_percent(top_midfielders).sort_values(ascending=False, key=abs)
rest_mid_percent = points_percent(rest_midfielders).sort_values(ascending=False, key=abs)

top_fwd_percent = points_percent(top_forwards).sort_values(ascending=False, key=abs)
rest_fwd_percent = points_percent(rest_forwards).sort_values(ascending=False, key=abs)
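
To make the percentage calculation concrete, here is a small worked example on made-up numbers. The column names mirror the `<stat>_points` pattern used in this notebook but are assumptions for this toy frame, and only three stats are included.

```python
import pandas as pd

# Two made-up players; each stat column holds points earned from that stat.
toy = pd.DataFrame({
    "total_points": [100, 60],
    "minutes_points": [60, 50],
    "clean_sheet_points": [40, 20],
    "goals_conceded_points": [0, -10],
})

stat_cols = ["minutes_points", "clean_sheet_points", "goals_conceded_points"]

# Share of the group's total points that each stat contributed, in percent.
shares = toy[stat_cols].sum() / toy["total_points"].sum() * 100

# Sort by absolute value so that large deductions rank next to large gains.
shares = shares.sort_values(ascending=False, key=abs)
print(shares.round(1))
```

Because deductions are negative, the shares of all stats still add up to 100% of the group's total points.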

### 2. Most valuable stats by position

Since some stats are worth more points and some fewer, it's useful to know which stats actually bring the most points over the course of the season. At first glance, it would make sense that those are the stats that are valued more, but they might not happen often enough to truly bring more points than stats that are worth fewer points but happen more often during a season.

As some stats bring a different number of points depending on the position of the player who achieves them, the analysis will be done for each position separately.

In order to find the most important stats, they will be compared between the top 10% of players in each position and the rest of the players in the same position. This method shows what actually distinguishes the top players that should be picked, or in other words, which stats bring them the points that separate them from the rest.

The grouped bar charts for each position should make the differences easy to see. Each chart shows the percentage of total points that the top players and the rest of the players earned from each stat.
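
Besides reading the bars, the comparison can be reduced to a single number per stat: the difference in percentage points between the two groups. The sketch below uses made-up breakdowns rather than the real `top_*_percent` and `rest_*_percent` series.

```python
import pandas as pd

# Hypothetical percentage breakdowns for one position (made-up values).
top_percent = pd.Series({"Minutes": 38.0, "Clean Sheets": 31.0, "Saves": 19.0, "Bonus": 12.0})
rest_percent = pd.Series({"Minutes": 50.0, "Clean Sheets": 20.0, "Saves": 25.0, "Bonus": 5.0})

# Difference in percentage points; positive means the top group
# relies more on that stat than the rest of the players do.
gap = (top_percent - rest_percent).sort_values(ascending=False, key=abs)
print(gap)
```

Stats with the largest positive gap are the ones that separate the top group most clearly in this kind of comparison.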

#### 2.1 Goalkeepers

In [6]:
trace1 = go.Bar(
    x=top_gk_percent.index,
    y=top_gk_percent.values,
    name='Top GKs',
    marker_color='blue',
    text=top_gk_percent.values.round(1)
)

trace2 = go.Bar(
    x=rest_gk_percent.index,
    y=rest_gk_percent.values,
    name='Rest of GKs',
    marker_color='red',
    text=rest_gk_percent.values.round(1)
)

fig = go.Figure(data=[trace1, trace2])

fig.update_layout(
    barmode='group',
    title_text='Goalkeepers Points Breakdown Comparison',
    xaxis_title='Stats',
    yaxis_title='Percent of Points',
    legend={'orientation': 'h', 'yanchor': 'top', 'y': 10},
    template='plotly_white'
)

fig.show()

The chart above (as for the rest of the positions) shows the percentage of total points that players earned from each of the stats. For example, the top 10% of GKs got 29.7% of their total points from clean sheets, while the rest of the goalkeepers got 21% of their total points from clean sheets.

It's expected that top goalkeepers have a much smaller percentage of points coming from minutes played, since the other stats are worth more points.

What's interesting is that top goalkeepers earn a higher percentage from clean sheets than the rest, while also getting a smaller percentage of points from saves. This suggests that clean sheets are what separates the top goalkeepers, while saves are just an added bonus and not something to focus on. The medians of points from saves and clean sheets for the top 10% of goalkeepers and the next 10% can help confirm this conclusion.

In [106]:
print('Top 10% goalkeepers saves points median: ' + str(top_goalkeepers.saves_points.median()))
print('Next 10% goalkeepers saves points median: ' + str(rest_goalkeepers.head(num_top_goalkeepers).saves_points.median()))
print('')
print('Top 10% goalkeepers clean sheet points median: ' + str(top_goalkeepers.clean_sheet_points.median()))
print('Next 10% goalkeepers clean sheet points median: ' + str(rest_goalkeepers.head(num_top_goalkeepers).clean_sheet_points.median()))

Top 10% goalkeepers saves points median: 27.0
Next 10% goalkeepers saves points median: 20.5

Top 10% goalkeepers clean sheet points median: 34.0
Next 10% goalkeepers clean sheet points median: 16.0


The medians confirm that the main difference comes from clean sheets (34 vs 16), while the gap in points from saves is much smaller (27 vs 20.5). But, since saves can help achieve clean sheets, it would be useful to check whether this slightly higher number of saves for top goalkeepers is what actually allows them to get more clean sheets.

In [107]:
fig = px.scatter(top_goalkeepers,
            x='saves_points',
            y='clean_sheet_points',
            color='total_points',
            template='plotly_white',
            title='Saves vs Clean Sheets for Top Goalkeepers',
            labels={'saves_points': 'Points from Saves', 'clean_sheet_points': 'Points from Clean Sheets', 'total_points': 'Total Points'})

fig.show()

From the chart above it's noticeable that there are 2 types of goalkeepers.
First, there are two on the left side of the chart, with a low number of saves and a high number of clean sheets, which points to them playing for teams whose play styles don't allow many shots on their goal.
Second, the rest of the field has a slightly higher number of saves than the median for the next best 10%, while clean sheet numbers remain high. Those additional saves are probably, to some extent, what drives the higher clean sheet numbers. So, if we assume that this group of best keepers and the next best 10% face around the same number of shots, the difference in clean sheets is probably driven by some combination of the goalkeeper's ability and the team's overall defensive strength, which leads to shots that are easier to save.

So, to conclude the goalkeeper part of the analysis, the best options are goalkeepers with a high number of clean sheets, which could come from their team facing few shots in general, or from a combination of the keeper's ability and the team's defensive strength.
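
The visual impression from the scatter chart could be complemented with a correlation coefficient. The sketch below uses made-up values with the same column names as the notebook. With only a handful of goalkeepers in the top 10%, such a number is a rough indicator at best and says nothing about causation.

```python
import pandas as pd

# Made-up values for six goalkeepers; column names follow the notebook.
toy_gk = pd.DataFrame({
    "saves_points": [10, 12, 25, 28, 30, 33],
    "clean_sheet_points": [44, 40, 32, 36, 34, 38],
})

# Pearson correlation between points from saves and points from clean sheets.
corr = toy_gk["saves_points"].corr(toy_gk["clean_sheet_points"])
print(round(corr, 2))
```

On the real data the same call would be `top_goalkeepers['saves_points'].corr(top_goalkeepers['clean_sheet_points'])`.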

#### 2.2 Defenders

In [7]:
trace1 = go.Bar(
    x=top_def_percent.index,
    y=top_def_percent.values,
    name='Top Defenders',
    marker_color='blue',
    text=top_def_percent.values.round(1)
)

trace2 = go.Bar(
    x=rest_def_percent.index,
    y=rest_def_percent.values,
    name='Rest of Defenders',
    marker_color='red',
    text=rest_def_percent.values.round(1)
)

fig = go.Figure(data=[trace1, trace2])

fig.update_layout(
    barmode='group',
    title_text='Defenders Points Breakdown Comparison',
    xaxis_title='Stats',
    yaxis_title='Percent of Points',
    legend={'orientation': 'h', 'yanchor': 'top', 'y': 10},
    template='plotly_white'
)

fig.show()

Defenders get points for not conceding goals, but unlike goalkeepers they also score and assist more often.

And, from the chart above, even though they get more points for a goal than for a clean sheet (6 and 4 respectively), the top defenders create a much bigger difference through clean sheets than through goals scored or assists. When point deductions for goals conceded are included, it becomes clear that the main difference between the top and the rest is in conceding fewer goals, while goals are an additional benefit. This can be confirmed by looking at the medians for these two stats.

Focus on players from teams with good defence; goals are an added bonus (1. full backs on teams with good defence, 2. players on teams with good defence, 3. players that generate goal contributions).

In [112]:
print('Top 10% defenders goals scored points median: ' + str(top_defenders.goals_scored_points.median()))
print('Next 10% defenders goals scored points median: ' + str(rest_defenders.head(num_top_defenders).goals_scored_points.median()))
print('')
print('Top 10% defenders clean sheet points median: ' + str(top_defenders.clean_sheet_points.median()))
print('Next 10% defenders clean sheet points median: ' + str(rest_defenders.head(num_top_defenders).clean_sheet_points.median()))

Top 10% defenders goals scored points median: 12.0
Next 10% defenders goals scored points median: 6.0

Top 10% defenders clean sheet points median: 36.0
Next 10% defenders clean sheet points median: 20.0


So the best defenders are those on good defensive teams who can also chip in with goals (or assists). Since there are not many of those players, and they can be expensive, the next focus, especially when looking for value, is defenders on teams with good defences.

#### 2.3 Midfielders

In [8]:
trace1 = go.Bar(
    x=top_mid_percent.index,
    y=top_mid_percent.values,
    name='Top Midfielders',
    marker_color='blue',
    text=top_mid_percent.values.round(1)
)

trace2 = go.Bar(
    x=rest_mid_percent.index,
    y=rest_mid_percent.values,
    name='Rest of Midfielders',
    marker_color='red',
    text=rest_mid_percent.values.round(1)
)

fig = go.Figure(data=[trace1, trace2])

fig.update_layout(
    barmode='group',
    title_text='Midfielders Points Breakdown Comparison',
    xaxis_title='Stats',
    yaxis_title='Percent of Points',
    legend={'orientation': 'h', 'yanchor': 'top', 'y': 10},
    template='plotly_white'
)

fig.show()

With midfielders, the focus shifts to goals and assists, even though they can get some points for clean sheets.

Since midfielders have the most variety in their roles, it's important to distinguish those that are worth the most in fantasy.

By far the biggest difference that top players create is from the goals they score, while assists are secondary. Clean sheets don't create a meaningful difference either way, and the same goes for yellow cards, to a lesser extent. This effectively separates midfielders into three groups that can be ranked by how much they are worth in fantasy:
1. Goal scoring midfielders - Wingers who could often be considered forwards outside of fantasy
2. Creative midfielders - Those tasked with creating chances and assisting their teammates
3. Ball controlling or defensive midfielders - Usually provide balance to the team on the field

So, as is well known in the fantasy community, it pays off the most to go after goal scoring wingers as midfield options.

#### 2.4 Forwards

In [9]:
trace1 = go.Bar(
    x=top_fwd_percent.index,
    y=top_fwd_percent.values,
    name='Top Forwards',
    marker_color='blue',
    text=top_fwd_percent.values.round(1)
)

trace2 = go.Bar(
    x=rest_fwd_percent.index,
    y=rest_fwd_percent.values,
    name='Rest of Forwards',
    marker_color='red',
    text=rest_fwd_percent.values.round(1)
)

fig = go.Figure(data=[trace1, trace2])

fig.update_layout(
    barmode='group',
    title_text='Forwards Points Breakdown Comparison',
    xaxis_title='Stats',
    yaxis_title='Percent of Points',
    legend={'orientation': 'h', 'yanchor': 'top', 'y': 10},
    template='plotly_white'
)

fig.show()

With forwards, it's similar to midfielders. Looking for forwards that can provide goal contributions, and especially goals, is what will bring the most points.

Now that there is a breakdown of what type of players to look for at each position, the next step is to understand where the budget can be saved when looking for the desired types of players.

### 3. Cost Analysis

Since there are budgetary restrictions when assembling a team, it's important to find value in cheaper players who bring a large number of points. Looking at scatter charts of players' cost against their total points will help in determining where users could spend less without losing out on points.

In [39]:
fig = px.scatter(all_players,
            x='now_cost',
            y='total_points',
            color='element_type',
            template='plotly_white',
            title='Cost vs Points',
            labels={'now_cost': 'Cost', 'total_points': 'Total Points', 'element_type': 'Position'})

fig.update_layout(legend={'orientation': 'h', 
                        'yanchor': 'top', 
                        'y': 10}
                        )

fig.show()

On the chart above every dot represents a player: the further right the dot is, the higher the player's cost; the higher the dot is, the higher the number of points; and the color represents the player's position (see the legend above the chart).

It's obvious that midfielders and forwards cost more than players in the other two positions. Spending above 8 almost certainly gets you at least 100 points, while there is a big spread in points when spending between 5.5 and 8. Picking the right players in that range could be where the users with the best rankings make the difference.

The distribution of dots for every position shows that in the lower price ranges, spending additional budget should get you more points, but as prices rise, what you get is more certainty that the player will not fall below a certain threshold, not necessarily more points.
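
The idea of a points floor that rises with price can be checked numerically by grouping players into price bands. The sketch below runs on made-up players. The band edges are an assumption loosely based on the 5.5 and 8 thresholds mentioned above, and costs are assumed to be on the same scale as in the text.

```python
import pandas as pd

# Made-up players; column names follow the notebook.
toy = pd.DataFrame({
    "now_cost": [4.5, 5.0, 5.5, 6.0, 7.0, 7.5, 8.5, 9.0, 12.5],
    "total_points": [20, 95, 60, 130, 45, 150, 140, 175, 210],
})

# Illustrative price bands (right edge inclusive).
bands = pd.cut(toy["now_cost"], bins=[0, 5.5, 8, 15], labels=["<=5.5", "5.5-8", ">8"])

# Lowest, median and highest points total within each band.
summary = toy.groupby(bands, observed=True)["total_points"].agg(["min", "median", "max"])
print(summary)
```

A band whose `min` is high offers certainty, while a band with a wide gap between `min` and `max` is where picking the right player matters most.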

In order to determine where the budget can be saved in each position, it helps to look at a chart like the one above for each position separately.

In [116]:
fig = make_subplots(rows=2, cols=2,
                    subplot_titles=("Goalkeepers", "Defenders", "Midfielders", "Forwards"))

trace1 = go.Scatter(x=goalkeepers['now_cost'], 
                    y=goalkeepers['total_points'], 
                    mode='markers',
                    name='Goalkeepers')
fig.add_trace(trace1, row=1, col=1)

trace2 = go.Scatter(x=defenders['now_cost'], 
                    y=defenders['total_points'], 
                    mode='markers',
                    name='Defenders')
fig.add_trace(trace2, row=1, col=2)

trace3 = go.Scatter(x=midfielders['now_cost'], 
                    y=midfielders['total_points'], 
                    mode='markers',
                    name='Midfielders')
fig.add_trace(trace3, row=2, col=1)

trace4 = go.Scatter(x=forwards['now_cost'], 
                    y=forwards['total_points'], 
                    mode='markers',
                    name='Forwards')
fig.add_trace(trace4, row=2, col=2)

fig.update_layout(
    title_text="Cost vs Points by Position",
    showlegend=False,
    template='plotly_white',
    height=700
)

fig.show()

<b>Goalkeepers</b><br>
With goalkeepers, drawing the line at a cost of 4.5 clearly separates better and worse performing players. However, spending more than 5 doesn't bring additional benefits.<br>
Going for goalkeepers just above the mid range should pay off.

<b>Defenders</b><br>
Spending less than 4.5 will rarely bring a high number of points. Value can be found when spending between 4.5 and 5.5, but choices need to be made carefully. Spending between 5.5 and 6 brings the same number of points as spending more than 6.<br>
There is no need to spend above 6. Finding the right players in the 4.5 to 5.5 range could make the most difference, and some of the best ranked defenders can be had for less than 6.

<b>Midfielders</b><br>
Spending on premium midfielders (above 7.5, and especially above 8) pays off, as they bring more certainty of a high number of points. There is not much more risk with players worth between 5.5 and 6 than with those from 6 to 7.<br>
Spending the money on the most expensive midfielders pays off.

<b>Forwards</b><br>
The most expensive forwards bring certainty of a very large number of points. But there's also a lot of value to be found in the 5-6 range.<br>
Combining some of the most expensive forwards with those in the 5-6 range could give a forward line where all players are among the top scoring overall.
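
One simple way to put a number on "value" is points earned per unit of cost. The sketch below uses made-up players. The `name` column and the `points_per_cost` metric are assumptions for illustration and are not part of the analysis above. The metric tends to favour cheap players, so it works best alongside the charts, not as a replacement for them.

```python
import pandas as pd

# Made-up players; 'name' is a hypothetical column added for readability.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "element_type": ["FWD", "FWD", "MID", "MID"],
    "now_cost": [5.5, 12.0, 6.0, 10.0],
    "total_points": [110, 204, 90, 220],
})

# Points earned per unit of budget spent.
toy["points_per_cost"] = toy["total_points"] / toy["now_cost"]

# Row with the best value within each position.
best_rows = toy.groupby("element_type")["points_per_cost"].idxmax()
best_value = toy.loc[best_rows, ["element_type", "name", "points_per_cost"]]
print(best_value)
```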

### 4. Final Summary

The general guidelines for building an FPL team, supported by the above analysis, are:

<b>Goalkeepers</b><br>
1. Medium priced goalkeepers with a high number of clean sheets are the best value
2. Clean sheets can come from not conceding a lot of shots, or from a combination of shot stopping ability and conceding low quality shots

<b>Defenders</b><br>
1. There is big potential to save money for other positions by going for defenders on teams that don't concede goals
2. Goals and especially assists are secondary

<b>Midfielders</b><br>
1. Spending a large portion of the budget on midfielders who score a lot of goals pays off

<b>Forwards</b><br>
1. Finding low cost forwards who are in form could save money without losing out on points
2. The leftover money can go towards some of the most expensive forwards, as they guarantee a high number of points