<center><h1> Using the FPL API to retrieve and sort historical data </h1></center>

The objective of this notebook is to use the FPL API to retrieve aggregate season data for seasons prior to 23/24.

I will also use `pandasql` to query and join information from other tables in the API.
I will then save the historical dataframe to CSV files, which will be stored locally and used in future notebooks.

FPL has already removed the previous season, 22/23, in the wake of the upcoming 23/24 season. Therefore I will use the gameweek-by-gameweek data from <a href="https://github.com/vaastav/Fantasy-Premier-League">vaastav's GitHub.</a>
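As a rough sketch of how that gameweek data could be loaded, the helper below builds a raw-file URL for one season. The folder layout (`data/<season>/gws/merged_gw.csv`) and the function names `merged_gw_url` and `load_merged_gw` are assumptions for illustration, not something this notebook confirms; check the repository before relying on the path.

```python
import pandas as pd

# Assumption: the repository keeps one merged gameweek file per season under this path.
RAW_BASE = "https://raw.githubusercontent.com/vaastav/Fantasy-Premier-League/master/data"


def merged_gw_url(season: str) -> str:
    """Build the raw-file URL for a season folder such as '2022-23' (hypothetical helper)."""
    return f"{RAW_BASE}/{season}/gws/merged_gw.csv"


def load_merged_gw(season: str) -> pd.DataFrame:
    """Download the merged gameweek CSV for one season (needs network access)."""
    return pd.read_csv(merged_gw_url(season))


url_2223 = merged_gw_url("2022-23")
print(url_2223)
```

Calling `load_merged_gw("2022-23")` would then return one row per player per gameweek, if the assumed path is correct.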

In [1]:
# importing the necessary libraries and making a request to the FPL API to retrieve data
# viewing the different types of data returned
import requests, json
import pandas as pd
pd.set_option('display.max_columns', None)
base_url = 'https://fantasy.premierleague.com/api/'
r = requests.get(base_url+'bootstrap-static/').json()
r.keys()

dict_keys(['events', 'game_settings', 'phases', 'teams', 'total_players', 'elements', 'element_stats', 'element_types'])
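The request above assumes the API always answers with valid JSON. A slightly more defensive variant might look like the sketch below; `fetch_json` and its `session` and `timeout` parameters are hypothetical names introduced here, not part of the notebook.

```python
import requests

BASE_URL = "https://fantasy.premierleague.com/api/"


def fetch_json(endpoint, session=None, timeout=10):
    """GET an FPL endpoint and return the decoded JSON, raising on HTTP errors."""
    # Fall back to the module-level requests API when no session is supplied.
    http = requests if session is None else session
    response = http.get(BASE_URL + endpoint, timeout=timeout)
    response.raise_for_status()  # surface 4xx/5xx responses instead of parsing an error page
    return response.json()
```

With this helper, `fetch_json('bootstrap-static/')` would stand in for the `requests.get(...).json()` call, and a failed request raises an exception rather than producing a confusing JSON error later.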

In [2]:
# keeping only the first five columns of element_types:
# some of the later columns hold lists, which cannot be queried with SQL, and I do not need them
type_df = pd.json_normalize(r['element_types'])
type_df = type_df.iloc[:,:5]
type_df

Unnamed: 0,id,plural_name,plural_name_short,singular_name,singular_name_short
0,1,Goalkeepers,GKP,Goalkeeper,GKP
1,2,Defenders,DEF,Defender,DEF
2,3,Midfielders,MID,Midfielder,MID
3,4,Forwards,FWD,Forward,FWD
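Slicing by position works as long as the column order never changes. As an alternative sketch, the list-valued columns could be detected and dropped by content instead. The helper name `drop_list_columns` and the toy rows below are illustrative assumptions, not API output.

```python
import pandas as pd


def drop_list_columns(df):
    """Return a copy of df without columns that hold list or dict values."""
    is_nested = df.apply(
        lambda col: col.map(lambda v: isinstance(v, (list, dict))).any()
    )
    return df.loc[:, ~is_nested].copy()


# Toy rows shaped like `element_types`; field names and values are illustrative only.
sample = pd.DataFrame({
    "id": [1, 2],
    "singular_name_short": ["GKP", "DEF"],
    "sub_positions_locked": [[12], []],
})
clean = drop_list_columns(sample)
print(list(clean.columns))
```

Only the scalar columns survive, so the result can be passed to a SQL query without depending on column positions.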


In [3]:
# viewing team information
teams_df = pd.json_normalize(r['teams'])
teams_df.head()

Unnamed: 0,code,draw,form,id,loss,name,played,points,position,short_name,strength,team_division,unavailable,win,strength_overall_home,strength_overall_away,strength_attack_home,strength_attack_away,strength_defence_home,strength_defence_away,pulse_id
0,3,0,,1,0,Arsenal,0,0,0,ARS,4,,False,0,1230,1285,1250,1250,1210,1320,1
1,7,0,,2,0,Aston Villa,0,0,0,AVL,3,,False,0,1115,1175,1130,1190,1100,1160,2
2,91,0,,3,0,Bournemouth,0,0,0,BOU,3,,False,0,1060,1095,1050,1100,1060,1090,127
3,94,0,,4,0,Brentford,0,0,0,BRE,3,,False,0,1125,1205,1120,1220,1130,1190,130
4,36,0,,5,0,Brighton,0,0,0,BHA,3,,False,0,1165,1210,1120,1200,1210,1240,131


In [4]:
# viewing player information
player_df = pd.json_normalize(r['elements'])
player_df.head()

Unnamed: 0,chance_of_playing_next_round,chance_of_playing_this_round,code,cost_change_event,cost_change_event_fall,cost_change_start,cost_change_start_fall,dreamteam_count,element_type,ep_next,ep_this,event_points,first_name,form,id,in_dreamteam,news,news_added,now_cost,photo,points_per_game,second_name,selected_by_percent,special,squad_number,status,team,team_code,total_points,transfers_in,transfers_in_event,transfers_out,transfers_out_event,value_form,value_season,web_name,minutes,goals_scored,assists,clean_sheets,goals_conceded,own_goals,penalties_saved,penalties_missed,yellow_cards,red_cards,saves,bonus,bps,influence,creativity,threat,ict_index,starts,expected_goals,expected_assists,expected_goal_involvements,expected_goals_conceded,influence_rank,influence_rank_type,creativity_rank,creativity_rank_type,threat_rank,threat_rank_type,ict_index_rank,ict_index_rank_type,corners_and_indirect_freekicks_order,corners_and_indirect_freekicks_text,direct_freekicks_order,direct_freekicks_text,penalties_order,penalties_text,expected_goals_per_90,saves_per_90,expected_assists_per_90,expected_goal_involvements_per_90,expected_goals_conceded_per_90,goals_conceded_per_90,now_cost_rank,now_cost_rank_type,form_rank,form_rank_type,points_per_game_rank,points_per_game_rank_type,selected_rank,selected_rank_type,starts_per_90,clean_sheets_per_90
0,,,232223,0,0,0,0,0,4,1.5,,0,Folarin,0.0,1,False,,,45,232223.jpg,0.0,Balogun,2.1,False,,a,1,3,0,0,0,0,0,0.0,0.0,Balogun,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,487,45,486,45,477,47,489,47,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,401,61,224,7,491,47,115,23,0.0,0.0
1,,,58822,0,0,0,0,0,2,1.5,,0,Cédric,0.0,2,False,,,40,58822.jpg,1.2,Alves Soares,0.4,False,,a,1,3,10,0,0,0,0,0.0,2.5,Cédric,223,0,0,0,3,0,0,0,0,0,0,0,56,56.4,51.4,10.0,11.9,2,0.06,0.19,0.25,2.83,327,125,281,100,322,124,328,121,,,,,,,0.02,0.0,0.08,0.1,1.14,1.21,490,140,9,3,342,123,266,90,0.81,0.0
2,,,153256,0,0,0,0,0,3,1.5,,0,Mohamed,0.0,3,False,,,45,153256.jpg,1.2,Elneny,0.2,False,,a,1,3,6,0,0,0,0,0.0,1.3,M.Elneny,111,0,0,0,2,0,0,0,0,0,0,0,27,4.6,5.4,0.0,1.1,1,0.0,0.04,0.04,1.29,370,170,340,166,453,179,369,169,,,,,,,0.0,0.0,0.03,0.03,1.05,1.62,330,188,153,9,344,153,386,115,0.81,0.0
3,,,438098,0,0,0,0,0,3,2.5,,0,Fábio,0.0,4,False,,,55,438098.jpg,1.8,Ferreira Vieira,0.1,False,,a,1,3,40,0,0,0,0,0.0,7.3,Fábio Vieira,500,1,2,2,5,0,0,0,0,0,0,2,134,116.0,180.6,123.0,41.5,3,0.86,1.39,2.25,5.28,287,125,160,102,195,111,256,120,,,,,,,0.15,0.0,0.25,0.4,0.95,0.9,130,84,495,196,285,122,460,143,0.54,0.36
4,,,226597,0,0,0,0,0,2,2.8,,0,Gabriel,0.0,5,False,,,50,226597.jpg,3.8,dos Santos Magalhães,14.2,False,,a,1,3,146,0,0,0,0,0.0,29.2,Gabriel,3409,3,0,14,43,0,0,0,5,0,0,15,723,743.8,131.4,401.0,127.7,38,5.04,0.66,5.7,41.84,31,6,193,53,78,3,75,9,,,,,,,0.13,0.0,0.02,0.15,1.1,1.14,236,30,403,180,65,14,30,11,1.0,0.37


In [5]:
player_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 579 entries, 0 to 578
Data columns (total 88 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   chance_of_playing_next_round          7 non-null      float64
 1   chance_of_playing_this_round          0 non-null      object 
 2   code                                  579 non-null    int64  
 3   cost_change_event                     579 non-null    int64  
 4   cost_change_event_fall                579 non-null    int64  
 5   cost_change_start                     579 non-null    int64  
 6   cost_change_start_fall                579 non-null    int64  
 7   dreamteam_count                       579 non-null    int64  
 8   element_type                          579 non-null    int64  
 9   ep_next                               579 non-null    object 
 10  ep_this                               0 non-null      object 
 11  event_points       

In [6]:
# Loops over the players dataframe, extracts each player id, and requests that player's element summary from the API
# Each summary contains `history_past` (one row per past season)
# and `history` (a list of the player's previous fixtures and their match stats); only `history_past` is kept here

frames = []
for x, element_id in player_df['id'].items():
    print(f'Index:{x}\nElement_id:{element_id}\n\n')
    summary = requests.get(f'{base_url}element-summary/{element_id}/').json()
    frames.append(pd.DataFrame(summary['history_past']))

# a single concat at the end avoids re-copying the growing dataframe on every iteration
history_past_df = pd.concat(frames, ignore_index=True)

Index:0
Element_id:1


Index:1
Element_id:2


...


Index:31
Element_id:578


Index:32
Element_id:32


...


Index:341
Element_id:575


Index:342
Element_id:341


...


Index:348
Element_id:347


...
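The loop above sends one request per player, several hundred in total. A gentler version could reuse a single connection and pause between calls, roughly as sketched below. `collect_history_past`, the `pause` parameter, and the 10-second timeout are assumptions made for illustration.

```python
import time

import pandas as pd
import requests

BASE_URL = "https://fantasy.premierleague.com/api/"


def collect_history_past(element_ids, session=None, pause=0.0):
    """Fetch `history_past` for each player id and stack the rows into one dataframe."""
    http = requests.Session() if session is None else session
    frames = []
    for element_id in element_ids:
        response = http.get(f"{BASE_URL}element-summary/{element_id}/", timeout=10)
        response.raise_for_status()
        frames.append(pd.DataFrame(response.json()["history_past"]))
        time.sleep(pause)  # optional delay between requests
    # Note: pd.concat raises ValueError if element_ids was empty.
    return pd.concat(frames, ignore_index=True)
```

A call such as `collect_history_past(player_df['id'], pause=0.2)` would build the same kind of `history_past_df`, at the cost of a longer total run time.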

Now that `history_past_df` holds the summary statistics for all seasons prior to 22/23, we need each player's `first_name` and `second_name`. We get them by joining `player_df`'s `code` column to `history_past_df`'s `element_code` column.

To do these joins we will use a SQL query via the `pandasql` library.

The same query also joins `teams_df` and `type_df` to obtain each player's team and position.

In [7]:
from pandasql import sqldf

query = """SELECT p.first_name ||' '|| p.second_name AS name, t.short_name AS team, td.plural_name_short AS position, hp.* 
        FROM history_past_df hp 
        JOIN player_df p ON hp.element_code = p.code 
        JOIN teams_df t ON p.team_code = t.code 
        JOIN type_df td ON p.element_type = td.id"""
final_history_df = sqldf(query)
final_history_df.head()

Unnamed: 0,name,team,position,season_name,element_code,start_cost,end_cost,total_points,minutes,goals_scored,assists,clean_sheets,goals_conceded,own_goals,penalties_saved,penalties_missed,yellow_cards,red_cards,saves,bonus,bps,influence,creativity,threat,ict_index,starts,expected_goals,expected_assists,expected_goal_involvements,expected_goals_conceded
0,Folarin Balogun,ARS,FWD,2021/22,232223,50,47,2,69,0,0,0,1,0,0,0,0,0,0,0,2,4.2,3.7,17.0,1.9,0,0.0,0.0,0.0,0.0
1,Cédric Alves Soares,ARS,DEF,2015/16,58822,50,47,86,1965,0,2,9,19,0,0,0,3,0,0,7,506,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0
2,Cédric Alves Soares,ARS,DEF,2016/17,58822,50,49,102,2515,0,3,11,36,0,0,0,7,0,0,11,584,591.0,648.9,155.0,139.6,0,0.0,0.0,0.0,0.0
3,Cédric Alves Soares,ARS,DEF,2017/18,58822,50,47,85,2794,0,3,7,44,0,0,0,3,0,0,2,481,507.0,455.6,159.0,112.3,0,0.0,0.0,0.0,0.0
4,Cédric Alves Soares,ARS,DEF,2018/19,58822,45,42,52,1493,1,2,4,31,0,0,0,4,0,0,7,273,309.0,226.5,103.0,63.9,0,0.0,0.0,0.0,0.0
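For readers who prefer to stay in pandas, the same three joins can be expressed with `DataFrame.merge`. The sketch below uses tiny hand-built frames with the notebook's column names; only a few columns are included, and the rows are copied from the outputs above.

```python
import pandas as pd

# Toy frames with the same column names as the notebook's dataframes.
history_past_df = pd.DataFrame({
    "element_code": [232223, 58822],
    "season_name": ["2021/22", "2018/19"],
    "total_points": [2, 52],
})
player_df = pd.DataFrame({
    "code": [232223, 58822],
    "first_name": ["Folarin", "Cédric"],
    "second_name": ["Balogun", "Alves Soares"],
    "team_code": [3, 3],
    "element_type": [4, 2],
})
teams_df = pd.DataFrame({"code": [3], "short_name": ["ARS"]})
type_df = pd.DataFrame({"id": [4, 2], "plural_name_short": ["FWD", "DEF"]})

# Rename the lookup keys so each merge joins on a shared column name.
teams_lookup = teams_df[["code", "short_name"]].rename(
    columns={"code": "team_code", "short_name": "team"}
)
type_lookup = type_df[["id", "plural_name_short"]].rename(
    columns={"id": "element_type", "plural_name_short": "position"}
)

merged = (
    history_past_df
    .merge(player_df, left_on="element_code", right_on="code")  # inner join, like SQL JOIN
    .merge(teams_lookup, on="team_code")
    .merge(type_lookup, on="element_type")
)
merged["name"] = merged["first_name"] + " " + merged["second_name"]
result = merged[["name", "team", "position", "season_name", "element_code", "total_points"]]
print(result)
```

`merge` defaults to an inner join, which matches the plain `JOIN` in the SQL query: history rows without a matching player, team, or position are dropped.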


Now that we have queried the FPL API and built an aggregation of previous seasons, we can save this dataframe to a CSV file and take a more detailed look in the next notebook.

In [8]:
final_history_df.to_csv('../data/final_history.csv')

The `player_df` dataframe holds important information on the current players available for selection in the upcoming season. We will save a reduced version of it with the relevant columns. The statistics in this dataframe are from the latest season, 22/23, so some newly promoted players and new transfers will not have any data (their values are filled with 0s).

In [9]:
query = """SELECT p.first_name || ' ' || p.second_name AS name, t.short_name AS team, td.plural_name_short AS position,
        p.ep_next AS xP, p.now_cost AS cost, p.total_points, p.minutes, p.goals_scored, p.assists, p.clean_sheets, p.goals_conceded, p.own_goals,
        p.penalties_saved, p.penalties_missed, p.yellow_cards, p.red_cards, p.saves, p.bonus, p.bps, p.influence, p.creativity,
        p.threat, p.ict_index, p.starts, p.expected_goals, p.expected_assists, p.expected_goal_involvements, p.expected_goals_conceded
        FROM player_df p
        JOIN teams_df t ON p.team_code = t.code 
        JOIN type_df td ON p.element_type = td.id"""

reduced_players = sqldf(query)
reduced_players.head()

Unnamed: 0,name,team,position,xP,cost,total_points,minutes,goals_scored,assists,clean_sheets,goals_conceded,own_goals,penalties_saved,penalties_missed,yellow_cards,red_cards,saves,bonus,bps,influence,creativity,threat,ict_index,starts,expected_goals,expected_assists,expected_goal_involvements,expected_goals_conceded
0,Folarin Balogun,ARS,FWD,1.5,45,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0
1,Cédric Alves Soares,ARS,DEF,1.5,40,10,223,0,0,0,3,0,0,0,0,0,0,0,56,56.4,51.4,10.0,11.9,2,0.06,0.19,0.25,2.83
2,Mohamed Elneny,ARS,MID,1.5,45,6,111,0,0,0,2,0,0,0,0,0,0,0,27,4.6,5.4,0.0,1.1,1,0.0,0.04,0.04,1.29
3,Fábio Ferreira Vieira,ARS,MID,2.5,55,40,500,1,2,2,5,0,0,0,0,0,0,2,134,116.0,180.6,123.0,41.5,3,0.86,1.39,2.25,5.28
4,Gabriel dos Santos Magalhães,ARS,DEF,2.8,50,146,3409,3,0,14,43,0,0,0,5,0,0,15,723,743.8,131.4,401.0,127.7,38,5.04,0.66,5.7,41.84
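Two columns in this reduced table may need a little care before analysis. The `info()` output earlier lists `ep_next` with dtype `object`, so `xP` probably arrives as text, and `cost` looks like it is stored in tenths of a million (45 for a 4.5m player). The sketch below shows one way to normalise both on toy rows; the `cost_m` column name and the string values for `xP` are assumptions.

```python
import pandas as pd

# Toy rows mirroring the reduced table; xP is given as strings to mimic the object dtype.
reduced = pd.DataFrame({
    "name": ["Folarin Balogun", "Gabriel dos Santos Magalhães"],
    "xP": ["1.5", "2.8"],
    "cost": [45, 50],
})

# Convert xP to floats; values that cannot be parsed become NaN instead of raising.
reduced["xP"] = pd.to_numeric(reduced["xP"], errors="coerce")

# Assumption: cost is in tenths of a million, so divide by 10 for a price in millions.
reduced["cost_m"] = reduced["cost"] / 10
print(reduced.dtypes)
```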


In [10]:
reduced_players.to_csv('../data/players.csv')
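By default `to_csv` also writes the dataframe index, which comes back as an `Unnamed: 0` column when the file is read again. If that extra column is unwanted in later notebooks, `index=False` avoids it. The round trip below demonstrates the difference using an in-memory buffer instead of a file.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Folarin Balogun"], "team": ["ARS"]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False writes only the data columns.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```

Whether to keep the index is a design choice: dropping it gives cleaner files, while keeping it preserves the original row order as an explicit column.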