# Fantasy Premier League 2022/2023 Season Analysis

Fantasy Premier League (FPL) is a game played by more than 10 million people worldwide and based on real matches in the Premier League. The goal is to build the team that scores the most points. Each FPL manager is given a starting budget of £100 million and must pick a squad of 15 players: 2 goalkeepers, 5 defenders, 5 midfielders and 3 forwards, with a maximum of three players from any one Premier League team. Players score points based on their real-life performances; the main contributors are goals and assists (attacking side) and clean sheets (CS) (defensive side). Players can also lose points for cards or own goals.
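
The squad rules above can be written down as a small check. This is only an illustrative sketch: the function name `squad_problems`, the tuple layout and the club letters are hypothetical, and it covers just the rules stated in the paragraph (squad size, position quotas, club limit, budget).

```python
from collections import Counter

# Squad rules as stated in the paragraph above
BUDGET = 100.0
QUOTA = {"Goalkeeper": 2, "Defender": 5, "Midfielder": 5, "Forward": 3}
MAX_PER_CLUB = 3


def squad_problems(squad):
    """Return a list of rule violations for a squad of (position, club, price) tuples."""
    problems = []
    if len(squad) != sum(QUOTA.values()):
        problems.append(f"squad has {len(squad)} players, expected 15")
    per_position = Counter(position for position, _, _ in squad)
    for position, needed in QUOTA.items():
        if per_position.get(position, 0) != needed:
            problems.append(f"{position}: {per_position.get(position, 0)} picked, {needed} required")
    per_club = Counter(club for _, club, _ in squad)
    for club, count in per_club.items():
        if count > MAX_PER_CLUB:
            problems.append(f"club {club}: {count} players, maximum is {MAX_PER_CLUB}")
    total = sum(price for _, _, price in squad)
    if total > BUDGET:
        problems.append(f"total price {total} exceeds budget {BUDGET}")
    return problems


# Hypothetical squads: five made-up clubs with three players each
positions = ["Goalkeeper"] * 2 + ["Defender"] * 5 + ["Midfielder"] * 5 + ["Forward"] * 3
clubs = ["A", "B", "C", "D", "E"] * 3
valid = [(position, club, 6.0) for position, club in zip(positions, clubs)]
too_pricey = [(position, club, 7.0) for position, club in zip(positions, clubs)]

print(squad_problems(valid))
print(squad_problems(too_pricey))
```

The first squad satisfies every rule, while the second one breaks only the budget limit.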

Data source:
Vaastav Anand's GitHub: https://github.com/vaastav/Fantasy-Premier-League/

The goal of this project is to analyse different statistics, measure their impact on points, and find patterns. The conclusions are then used to search for undervalued players. In my opinion, finding such players before everyone else is key to reaching the top ranks, that is, to scoring more points than your opponents.

Let's get started by importing the data and selecting the columns relevant to the analysis.


In [1]:
import numpy as np
import pandas as pd
import warnings

#Turn off warnings
warnings.filterwarnings('ignore')

df_raw = pd.read_csv(filepath_or_buffer='data/players_raw.csv',  sep=',') # raw data players - data frame
df_teams = pd.read_csv(filepath_or_buffer='data/teams.csv',  sep=',') # teams - data frame

#Getting relevant columns
cols = ['web_name', 'first_name', 'second_name', 'team', 'element_type', 'total_points', 'points_per_game', 'now_cost', 'goals_scored', 'assists',
        'ict_index', 'selected_by_percent', 'minutes', 'bonus', 'clean_sheets', 'clean_sheets_per_90', 'creativity', 'expected_assists',
        'expected_assists_per_90', 'expected_goal_involvements', 'expected_goal_involvements_per_90', 'expected_goals', 'expected_goals_conceded',
        'expected_goals_conceded_per_90', 'expected_goals_per_90' , 'goals_conceded', 'goals_conceded_per_90', 'influence', 'own_goals',
        'penalties_missed', 'penalties_order', 'penalties_saved', 'red_cards', 'saves', 'saves_per_90', 'selected_rank', 'starts', 'starts_per_90',
        'threat', 'value_form', 'value_season', 'yellow_cards']
# .copy() makes df independent of df_raw, so later column assignments modify df itself
df = df_raw[cols].copy()
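
Selecting a long list of columns fails with a `KeyError` if any name is missing from the file. A minimal sketch of checking the list first is shown here; the frame `raw`, the list `wanted` and all values are made up for illustration and are not taken from players_raw.csv.

```python
import pandas as pd

# Hypothetical miniature stand-in for players_raw.csv (values are made up)
raw = pd.DataFrame({
    "web_name": ["Player A", "Player B"],
    "team": [1, 2],
    "now_cost": [55, 120],
    "news": ["", ""],
})

wanted = ["web_name", "team", "now_cost", "total_points"]

# Columns that were requested but are absent from the file
missing = [col for col in wanted if col not in raw.columns]

# Keep only the columns that exist; .copy() gives an independent frame
subset = raw[[col for col in wanted if col in raw.columns]].copy()

print(missing)
print(list(subset.columns))
```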

Printing the data types of the relevant columns.

In [2]:
print(df.dtypes)

web_name                              object
first_name                            object
second_name                           object
team                                   int64
element_type                           int64
total_points                           int64
points_per_game                      float64
now_cost                               int64
goals_scored                           int64
assists                                int64
ict_index                            float64
selected_by_percent                  float64
minutes                                int64
bonus                                  int64
clean_sheets                           int64
clean_sheets_per_90                  float64
creativity                           float64
expected_assists                     float64
expected_assists_per_90              float64
expected_goal_involvements           float64
expected_goal_involvements_per_90    float64
expected_goals                       float64
expected_g

Converting the integer-typed count columns to float. The team and element_type columns stay as integers because they are recoded below.

In [3]:
#Change columns to float
cols = ['assists', 'bonus', 'clean_sheets', 'goals_conceded', 'goals_scored', 'minutes', 'now_cost','own_goals', 'penalties_missed', 'penalties_saved',
        'red_cards', 'saves', 'selected_rank', 'starts', 'total_points', 'yellow_cards']
df[cols] = df[cols].astype('float64')
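
Instead of listing the integer columns by hand, they can also be picked by dtype. This is a sketch on a made-up frame; `toy` and `id_like` are hypothetical names, and the identifier column is excluded on purpose so that it can still be recoded afterwards.

```python
import pandas as pd

# Hypothetical toy frame; values are invented for illustration
toy = pd.DataFrame({
    "web_name": ["Player A", "Player B"],
    "team": [1, 2],                    # identifier, should stay integer
    "goals_scored": [3, 0],
    "points_per_game": [4.2, 1.5],
})

# All integer columns, minus identifier-like columns that are recoded later
id_like = ["team"]
int_cols = toy.select_dtypes(include="integer").columns.difference(id_like)

toy[int_cols] = toy[int_cols].astype("float64")
print(toy.dtypes)
```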

Replacing the numeric team id in the team column with the club's short name. The id and short_name columns in teams.csv are zipped into a dictionary, which is then used to replace each id with the matching short name, e.g. 1 becomes ARS and 2 becomes AVL.

In [4]:
#Replacing team code with short name by connecting linked columns
team_codes = list(df_teams['id'])
team_names = list(df_teams['short_name'])
team_dict = dict(zip(team_codes, team_names)) # zip pairs each id with its short name; dict() turns the pairs into a lookup
df['team'] = df['team'].replace(team_dict)
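
One thing to watch with `replace` is that an id without an entry in the dictionary is left unchanged without any warning. A small sketch of the same lookup with `Series.map`, which returns NaN for unmatched ids, is shown here; the frames and the id 99 are made up for illustration.

```python
import pandas as pd

# Hypothetical excerpts of teams.csv and of the players table
teams = pd.DataFrame({"id": [1, 2], "short_name": ["ARS", "AVL"]})
players = pd.DataFrame({
    "web_name": ["Player A", "Player B", "Player C"],
    "team": [1, 2, 99],               # 99 has no match on purpose
})

team_lookup = dict(zip(teams["id"], teams["short_name"]))

# map() yields NaN for ids missing from the lookup, which makes gaps easy to spot
players["team_name"] = players["team"].map(team_lookup)
unmatched = players.loc[players["team_name"].isna(), "team"].tolist()

print(players)
print(unmatched)
```

If `unmatched` is empty, every team id was translated.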

Replacing NaN values with 0. The missing values are in the penalties_order column, although the call below is applied to the whole DataFrame.

In [5]:
#Replacing NaN values with 0 (targets penalties_order, but applies to every column)
df.replace(np.nan,0, inplace=True)
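
A frame-wide replacement would also turn a NaN in any other column into 0. The following sketch counts the missing values per column first and then fills only one column; the data is made up, and reading a 0 in penalties_order as "not a penalty taker" is an assumption rather than something stated in the data source.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are invented for illustration
toy = pd.DataFrame({
    "web_name": ["Player A", "Player B", "Player C"],
    "penalties_order": [1.0, np.nan, np.nan],
    "points_per_game": [4.2, np.nan, 1.5],   # a gap we may not want to hide
})

# Count missing values per column before changing anything
print(toy.isna().sum())

# Fill only penalties_order (assumption: 0 stands for "not a penalty taker")
targeted = toy.fillna({"penalties_order": 0})
print(targeted)
```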

Dividing the now_cost column by 10 and renaming it to price, so that prices are expressed in £ millions.

In [6]:
#Renaming now_cost to price and dividing it by 10
df.rename(columns = {'now_cost':'price'}, inplace = True)
df['price'] = df['price'] / 10
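
The same two steps can be chained without `inplace=True`. This is a sketch on made-up values; `toy` is a hypothetical frame.

```python
import pandas as pd

# Made-up raw values: now_cost is stored in tenths of a million pounds
toy = pd.DataFrame({"web_name": ["Player A", "Player B"], "now_cost": [45, 125]})

# rename() and assign() each return a new frame, so the steps can be chained
toy = (
    toy.rename(columns={"now_cost": "price"})
       .assign(price=lambda d: d["price"] / 10)
)
print(toy)
```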

Renaming the element_type column to position, mapping its codes to position names, and adding a new column goal_involvements, which is goals plus assists.

In [7]:
#Changing column element_type to position
df.rename(columns = {'element_type':'position'}, inplace = True)
position_dict = {1: 'Goalkeeper', 2: "Defender", 3: "Midfielder", 4: "Forward"}
df['position'] = df['position'].replace(position_dict)

#Adding new column goal_involvements
df['goal_involvements'] = df['goals_scored'] + df['assists']

We will now create price ranges for each position to group the data, and then move the price_level column to the right place in the DataFrame.

In [8]:
#Creating price intervals, with different bins for each position
bins = {'Goalkeeper': (3.5,4.0,4.5,5.0,5.5), 'Defender' : (3.5,4.5,5.0,5.5,8.0), 'Midfielder':(4.0,5.5,6.5,8.0,13.5),
        'Forward': (4.0,5.5,6.5,8.0, 12.5)}
labels = ["Budget", "Mid", "High", "Premium"]

def func_bins(row):
    # Bin this row's price using the edges defined for its position
    return pd.cut([row['price']], bins=bins[row['position']], labels=labels)[0]

df['price_level'] = df.apply(func_bins, axis=1)

#Moving the 'price_level' column to the right position in dataframe
price_level_col = df.pop('price_level')
df.insert(8, 'price_level', price_level_col)
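
A detail worth knowing about `pd.cut`: by default the bins are closed on the right and open on the left, so a price exactly equal to the lowest edge gets no label, and neither does a price above the highest edge. The sketch below uses the Goalkeeper edges from the cell above with made-up prices and compares the default behaviour with `include_lowest=True`.

```python
import pandas as pd

# Same Goalkeeper edges and labels as in the cell above
edges = (3.5, 4.0, 4.5, 5.0, 5.5)
labels = ["Budget", "Mid", "High", "Premium"]

# Made-up prices, including both outer edges and one value beyond the range
prices = pd.Series([3.5, 4.0, 4.1, 5.5, 6.0])

default = pd.cut(prices, bins=edges, labels=labels)
closed = pd.cut(prices, bins=edges, labels=labels, include_lowest=True)

print(pd.DataFrame({"price": prices, "default": default, "include_lowest": closed}))
```

Any NaN in the resulting column points to a price that fell outside the bins for its position.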

Checking for missing data and, with everything in order, exporting the DataFrame to the cleaned CSV file players_clean.csv.

In [9]:
#Checking for missing data
print(df.isnull().sum())

#Exporting cleaned data
df.to_csv('data/players_clean.csv', index=False)

web_name                             0
first_name                           0
second_name                          0
team                                 0
position                             0
total_points                         0
points_per_game                      0
price                                0
price_level                          0
goals_scored                         0
assists                              0
ict_index                            0
selected_by_percent                  0
minutes                              0
bonus                                0
clean_sheets                         0
clean_sheets_per_90                  0
creativity                           0
expected_assists                     0
expected_assists_per_90              0
expected_goal_involvements           0
expected_goal_involvements_per_90    0
expected_goals                       0
expected_goals_conceded              0
expected_goals_conceded_per_90       0
expected_goals_per_90    
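
The visual check of the printed counts can also be turned into an automatic one. The sketch below stops if any value is missing and then reads the exported data back to confirm that it survives the round trip; it uses a made-up frame and writes to an in-memory buffer rather than to data/players_clean.csv.

```python
import io

import pandas as pd

# Hypothetical toy frame; values are invented for illustration
toy = pd.DataFrame({
    "web_name": ["Player A", "Player B"],
    "price": [5.5, 12.0],
    "price_level": ["Mid", "Premium"],
})

# Stop here if anything is still missing
assert toy.isna().sum().sum() == 0, "unexpected missing values"

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# Read the export back and compare shape and columns
reloaded = pd.read_csv(buffer)
print(reloaded.shape == toy.shape)
print(list(reloaded.columns) == list(toy.columns))
```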

The data is now clean. We can get down to analysing it.