# Storing Fantasy Premier League data in a database - part 1
- Author: Steffan Rees
- 17/04/2022

## Background
In this series of notebooks we will cover the following:
- Extracting data from an API (part 1)
- Transforming the data (part 1)
- Storing the data in a database (part 1)
- Querying the database (part 2)
- Comparing SQL to pandas (part 2)

### How do we extract data from https://fantasy.premierleague.com?
Before reaching for a web scraping library, it is worth checking whether the site exposes an API. Luckily one exists, which makes the task a lot easier. But how do we find out that an API exists in the first place? To check whether a website populates its pages through an API, we can inspect the network requests a page makes (for example in the Network tab of the browser's developer tools). On https://fantasy.premierleague.com/statistics, we see a GET request to https://fantasy.premierleague.com/api/bootstrap-static/, which returns data we can extract!

Once we have found the API call, we can use a tool such as Postman to generate the code needed to make the call. Postman also makes the structure of the response easier to inspect.

To save time finding all the individual end points, we can reference the excellent Medium post https://medium.com/@frenzelts/fantasy-premier-league-api-endpoints-a-detailed-guide-acbd5598eb19.

In this notebook we will focus on the constant end points - the variable end points will be left as an exercise.

In [1]:
# Import libraries
import requests
import pandas as pd
import sqlite3

In [2]:
# Connect to the database
def sql_connection(db):
    try:
        con = sqlite3.connect(db)
    except sqlite3.Error as error:
        # Report the problem and re-raise, since there is no connection to return
        print(error)
        raise
        
    cursor = con.cursor()
    return con, cursor
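
As a variation on the helper above, the connection can be wrapped in `contextlib.closing` so that it is always closed, even if a statement raises. The sketch below is only an illustration: it uses an in-memory database (`:memory:`) and a made-up `demo` table rather than `fpl.db`.

```python
import sqlite3
from contextlib import closing

# Hypothetical example: an in-memory database and a throwaway table,
# not the fpl.db file used in the rest of this notebook.
with closing(sqlite3.connect(":memory:")) as con:
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised inside the block.
    with con:
        con.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT)")
        con.execute("INSERT INTO demo (name) VALUES (?)", ("Arsenal",))
    rows = con.execute("SELECT id, name FROM demo").fetchall()

print(rows)
```

Note that `with con:` on its own only manages the transaction; it is `closing` that closes the connection.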

In [3]:
# Drop table if it exists
def drop_table_if_exists(cur, con, table):
    query = f'DROP TABLE IF EXISTS {table}'
    cur.execute(query)
    con.commit()
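
Table names cannot be bound as `?` parameters in SQLite, which is why the helper above builds the statement with an f-string. If a table name ever came from outside the notebook, one option would be to validate it first. This is a minimal sketch, assuming table names are plain Python-style identifiers; `safe_drop_table` is a hypothetical name and not part of the original notebook.

```python
import sqlite3

def safe_drop_table(con, table):
    """Drop `table` if it exists, rejecting names that are not simple identifiers."""
    # str.isidentifier() only accepts letters, digits and underscores (not
    # starting with a digit), which rules out quotes, spaces and semicolons.
    if not table.isidentifier():
        raise ValueError(f"Unsafe table name: {table!r}")
    con.execute(f'DROP TABLE IF EXISTS "{table}"')
    con.commit()

# Illustration on an in-memory database with a cut-down teams table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE teams (team_id INTEGER PRIMARY KEY)")
safe_drop_table(con, "teams")
remaining = con.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
con.close()
```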

In [4]:
# Constant end points
BOOTSRAP_STATIC_URL = "https://fantasy.premierleague.com/api/bootstrap-static/"
FIXTURES_URL = "https://fantasy.premierleague.com/api/fixtures/"
EVENT_STATUS_URL = "https://fantasy.premierleague.com/api/event-status/"
SET_PIECE_URL = "https://fantasy.premierleague.com/api/team/set-piece-notes/"

#### Variable end points
- https://fantasy.premierleague.com/api/element-summary/{element_id}/
- https://fantasy.premierleague.com/api/event/{event_id}/live/
- https://fantasy.premierleague.com/api/entry/{manager_id}/
- https://fantasy.premierleague.com/api/entry/{manager_id}/history/
- https://fantasy.premierleague.com/api/entry/{manager_id}/event/{gameweek}/picks/
- https://fantasy.premierleague.com/api/leagues-classic/{league_id}/standings
- https://fantasy.premierleague.com/api/dream-team/{event_id}/
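
For anyone attempting the exercise, the placeholders in the variable end points can be filled in with `str.format`. The sketch below uses two of the URLs listed above; `BASE_URL`, the constant names and the ids are illustrative choices, not values from the original notebook.

```python
BASE_URL = "https://fantasy.premierleague.com/api"

# URL templates copied from the list above, with the same placeholder names.
ELEMENT_SUMMARY_URL = BASE_URL + "/element-summary/{element_id}/"
MANAGER_PICKS_URL = BASE_URL + "/entry/{manager_id}/event/{gameweek}/picks/"

# Hypothetical ids, chosen only to show the formatting step.
element_url = ELEMENT_SUMMARY_URL.format(element_id=233)
picks_url = MANAGER_PICKS_URL.format(manager_id=1, gameweek=35)

print(element_url)
print(picks_url)
```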

## General information

In [5]:
# Extract the information available in the bootstrap-static end point
def get_bootstrap_static(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"Request failed. Return code is {response.status_code}.")

In [6]:
bootstrap_static = get_bootstrap_static(BOOTSRAP_STATIC_URL)

In [7]:
# What subsets of data exist?
bootstrap_static.keys()

dict_keys(['events', 'game_settings', 'phases', 'teams', 'total_players', 'elements', 'element_stats', 'element_types'])

### Events
Basic information about every gameweek, such as the average score, highest score, top-scoring player and most-captained player.

In [8]:
# First gameweek
bootstrap_static['events'][0]

{'id': 1,
 'name': 'Gameweek 1',
 'deadline_time': '2021-08-13T17:30:00Z',
 'average_entry_score': 69,
 'finished': True,
 'data_checked': True,
 'highest_scoring_entry': 5059647,
 'deadline_time_epoch': 1628875800,
 'deadline_time_game_offset': 0,
 'highest_score': 150,
 'is_previous': False,
 'is_current': False,
 'is_next': False,
 'cup_leagues_created': False,
 'h2h_ko_matches_created': False,
 'chip_plays': [{'chip_name': 'bboost', 'num_played': 145658},
  {'chip_name': '3xc', 'num_played': 225749}],
 'most_selected': 275,
 'most_transferred_in': 1,
 'top_element': 277,
 'top_element_info': {'id': 277, 'points': 20},
 'transfers_made': 0,
 'most_captained': 233,
 'most_vice_captained': 277}

In [9]:
# Last gameweek
bootstrap_static['events'][-1]

{'id': 38,
 'name': 'Gameweek 38',
 'deadline_time': '2022-05-22T12:30:00Z',
 'average_entry_score': 0,
 'finished': False,
 'data_checked': False,
 'highest_scoring_entry': None,
 'deadline_time_epoch': 1653222600,
 'deadline_time_game_offset': 0,
 'highest_score': None,
 'is_previous': False,
 'is_current': False,
 'is_next': False,
 'cup_leagues_created': False,
 'h2h_ko_matches_created': False,
 'chip_plays': [],
 'most_selected': None,
 'most_transferred_in': None,
 'top_element': None,
 'top_element_info': None,
 'transfers_made': 0,
 'most_captained': None,
 'most_vice_captained': None}

In [10]:
# Extract the current gameweek (the first gameweek that has not finished)
def get_current_gameweek():
    bootstrap_static = get_bootstrap_static(BOOTSRAP_STATIC_URL)
    for event in bootstrap_static['events']:
        if not event['finished']:
            return event['id']
    # Every gameweek has finished
    return None

In [11]:
# Current gameweek
current_gameweek = get_current_gameweek()
bootstrap_static['events'][current_gameweek-1]

{'id': 35,
 'name': 'Gameweek 35',
 'deadline_time': '2022-04-30T10:00:00Z',
 'average_entry_score': 35,
 'finished': False,
 'data_checked': False,
 'highest_scoring_entry': 425643,
 'deadline_time_epoch': 1651312800,
 'deadline_time_game_offset': 0,
 'highest_score': 125,
 'is_previous': False,
 'is_current': True,
 'is_next': False,
 'cup_leagues_created': True,
 'h2h_ko_matches_created': True,
 'chip_plays': [{'chip_name': 'bboost', 'num_played': 66736},
  {'chip_name': 'freehit', 'num_played': 36325},
  {'chip_name': 'wildcard', 'num_played': 127267},
  {'chip_name': '3xc', 'num_played': 37782}],
 'most_selected': 233,
 'most_transferred_in': 263,
 'top_element': 359,
 'top_element_info': {'id': 359, 'points': 19},
 'transfers_made': 6143352,
 'most_captained': 233,
 'most_vice_captained': 233}
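
The output above also shows an `is_current` flag on each event. As an alternative, here is a sketch of a lookup that relies on that flag and falls back to the first unfinished gameweek. It works on a list of events that has already been downloaded, so it makes no extra request. `find_current_gameweek` and the trimmed sample events are illustrative and not part of the original notebook.

```python
def find_current_gameweek(events):
    """Return the id of the current gameweek, or None if none can be found."""
    # Prefer the explicit flag provided by the API.
    for event in events:
        if event.get("is_current"):
            return event["id"]
    # Fall back to the first gameweek that has not finished yet.
    for event in events:
        if not event.get("finished"):
            return event["id"]
    return None

# Trimmed, hand-written sample with only the fields used above.
sample_events = [
    {"id": 34, "finished": True, "is_current": False},
    {"id": 35, "finished": False, "is_current": True},
    {"id": 36, "finished": False, "is_current": False},
]
current = find_current_gameweek(sample_events)
print(current)
```

The two rules may disagree between gameweeks, so which one to use depends on what "current" should mean for a given analysis.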

In [12]:
# Extract the number of times a given chip was played in a gameweek
def extract_wildcard_info(chips, chip_type):
    for chip in chips:
        if chip['chip_name'] == chip_type:
            return chip['num_played']
    # The chip was not played in this gameweek
    return None
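
A list such as `chip_plays` can also be turned into a dictionary once and then queried by chip name. The sketch below uses the `chip_plays` value from the Gameweek 35 output shown earlier; `chip_counts` is a hypothetical helper name.

```python
def chip_counts(chips):
    """Map each chip name to the number of times it was played."""
    return {chip["chip_name"]: chip["num_played"] for chip in chips}

# chip_plays value taken from the Gameweek 35 output shown earlier.
chip_plays = [
    {"chip_name": "bboost", "num_played": 66736},
    {"chip_name": "freehit", "num_played": 36325},
    {"chip_name": "wildcard", "num_played": 127267},
    {"chip_name": "3xc", "num_played": 37782},
]
counts = chip_counts(chip_plays)
wildcards = counts.get("wildcard")          # 127267
missing = chip_counts([]).get("wildcard")   # None when no chips were played
```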

In [13]:
# Store events data in a dataframe
def events_table():
    bootstrap_static = get_bootstrap_static(BOOTSRAP_STATIC_URL)
    events_df = pd.DataFrame(bootstrap_static['events'])
    return (events_df
            .assign(most_points_score = events_df.top_element_info.apply(lambda x: None if x is None else x['points']),
                    bboost_played = events_df.chip_plays.apply(lambda x: None if not x else extract_wildcard_info(x, 'bboost')),
                    freehit_played = events_df.chip_plays.apply(lambda x: None if not x else extract_wildcard_info(x, 'freehit')),
                    wildcard_played = events_df.chip_plays.apply(lambda x: None if not x else extract_wildcard_info(x, 'wildcard')),
                    triple_captain_played = events_df.chip_plays.apply(lambda x: None if not x else extract_wildcard_info(x, '3xc')),
                   )
            .astype({'id': 'int8',
                     'average_entry_score': 'int8',
                     'highest_scoring_entry': 'Int32',
                     'deadline_time_game_offset': 'int8',
                     'highest_score': 'Int32',
                     'most_selected': 'Int32',
                     'most_transferred_in': 'Int32',
                     'top_element': 'Int32',
                     'transfers_made': 'int32',
                     'most_captained': 'Int32',
                     'most_vice_captained': 'Int32',
                     'most_points_score': 'Int8',
                     'bboost_played': 'Int32',
                     'freehit_played': 'Int32',
                     'wildcard_played': 'Int32',
                     'triple_captain_played': 'Int32'
                    })
            .drop(columns=['top_element_info', 'chip_plays'])
            .rename(columns={'id': 'gameweek_id',
                             'name': 'gameweek_name',
                             'most_selected': 'most_selected_id',
                             'most_transferred_in': 'most_transferred_in_id',
                             'top_element': 'top_player_id',
                             'highest_scoring_entry': 'highest_scoring_entry_id',
                             'most_captained': 'most_captained_id',
                             'most_vice_captained': 'most_vice_captained_id'
                            })
           )

In [14]:
events = events_table()
print(events.shape)
events.head()

(38, 26)


| | gameweek_id | gameweek_name | deadline_time | average_entry_score | finished | data_checked | highest_scoring_entry_id | deadline_time_epoch | deadline_time_game_offset | highest_score | ... | most_transferred_in_id | top_player_id | transfers_made | most_captained_id | most_vice_captained_id | most_points_score | bboost_played | freehit_played | wildcard_played | triple_captain_played |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Gameweek 1 | 2021-08-13T17:30:00Z | 69 | True | True | 5059647 | 1628875800 | 0 | 150 | ... | 1 | 277 | 0 | 233 | 277 | 20 | 145658 | | | 225749 |
| 1 | 2 | Gameweek 2 | 2021-08-21T10:00:00Z | 56 | True | True | 6882931 | 1629540000 | 0 | 146 | ... | 272 | 142 | 12038724 | 233 | 277 | 18 | 95038 | 102410.0 | 277209.0 | 269514 |
| 2 | 3 | Gameweek 3 | 2021-08-28T10:00:00Z | 54 | True | True | 7516002 | 1630144800 | 0 | 119 | ... | 419 | 268 | 15553648 | 277 | 277 | 18 | 94049 | 117627.0 | 372083.0 | 138714 |
| 3 | 4 | Gameweek 4 | 2021-09-11T10:00:00Z | 57 | True | True | 7797969 | 1631354400 | 0 | 120 | ... | 579 | 432 | 28985870 | 233 | 233 | 13 | 77204 | 168976.0 | 1117718.0 | 157121 |
| 4 | 5 | Gameweek 5 | 2021-09-17T17:30:00Z | 55 | True | True | 3139954 | 1631899800 | 0 | 144 | ... | 579 | 44 | 18706283 | 233 | 579 | 15 | 49533 | 127510.0 | 708431.0 | 104192 |


In [15]:
events.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38 entries, 0 to 37
Data columns (total 26 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   gameweek_id                38 non-null     int8  
 1   gameweek_name              38 non-null     object
 2   deadline_time              38 non-null     object
 3   average_entry_score        38 non-null     int8  
 4   finished                   38 non-null     bool  
 5   data_checked               38 non-null     bool  
 6   highest_scoring_entry_id   35 non-null     Int32 
 7   deadline_time_epoch        38 non-null     int64 
 8   deadline_time_game_offset  38 non-null     int8  
 9   highest_score              35 non-null     Int32 
 10  is_previous                38 non-null     bool  
 11  is_current                 38 non-null     bool  
 12  is_next                    38 non-null     bool  
 13  cup_leagues_created        38 non-null     bool  
 14  h2h_ko_match

In [16]:
# Load dataframe into a SQLite db table
con, cursor = sql_connection('fpl.db')
drop_table_if_exists(cursor, con, 'gameweeks')
events.to_sql(name='gameweeks', con=con, if_exists='replace', index=False, dtype={'gameweek_id': 'INTEGER PRIMARY KEY'})
con.commit()
con.close()

### Teams
Basic information about the current Premier League clubs.

In [17]:
bootstrap_static['teams'][0]

{'code': 3,
 'draw': 0,
 'form': None,
 'id': 1,
 'loss': 0,
 'name': 'Arsenal',
 'played': 0,
 'points': 0,
 'position': 0,
 'short_name': 'ARS',
 'strength': 4,
 'team_division': None,
 'unavailable': False,
 'win': 0,
 'strength_overall_home': 1250,
 'strength_overall_away': 1270,
 'strength_attack_home': 1150,
 'strength_attack_away': 1210,
 'strength_defence_home': 1190,
 'strength_defence_away': 1220,
 'pulse_id': 1}

In [18]:
# Store teams data in a dataframe
def teams_table():
    bootstrap_static = get_bootstrap_static(BOOTSRAP_STATIC_URL)
    teams_df = pd.DataFrame(bootstrap_static['teams'])
    teams_df = (teams_df
                .astype({'code': 'int8',
                         'draw': 'int8',
                         'loss': 'int8',
                         'played': 'int8',
                         'points': 'int8',
                         'position': 'int8',
                         'strength': 'int8',
                         'win': 'int8',
                         'strength_overall_home': 'int16',
                         'strength_overall_away': 'int16',
                         'strength_attack_home': 'int16',
                         'strength_attack_away': 'int16',
                         'strength_defence_home': 'int16',
                         'strength_defence_away': 'int16',
                         'pulse_id': 'int16'
                        })
                .rename(columns={'code': 'team_code',
                                 'id': 'team_id',
                                 'name': 'team_name',
                                 'short_name': 'team_short_name'
                                })
               )
    cols_to_move = ['team_id', 'team_code', 'team_name', 'team_short_name']
    teams_df = teams_df[cols_to_move + [ col for col in teams_df.columns if col not in cols_to_move ]]
    return teams_df
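
The last two lines of `teams_table` move the identifying columns to the front while keeping the remaining columns in their existing order. The same idea on a plain Python list, as a small sketch (`move_to_front` is a hypothetical helper and the column list is shortened):

```python
def move_to_front(columns, cols_to_move):
    """Return `columns` reordered so that `cols_to_move` come first."""
    return cols_to_move + [col for col in columns if col not in cols_to_move]

# A shortened column list, in the order the renamed team fields appear.
columns = ["team_code", "draw", "team_id", "loss", "team_name", "team_short_name"]
reordered = move_to_front(columns, ["team_id", "team_code", "team_name", "team_short_name"])
print(reordered)
```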

In [19]:
teams = teams_table()
print(teams.shape)
teams.head()

(20, 21)


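| | team_id | team_code | team_name | team_short_name | draw | form | loss | played | points | position | ... | team_division | unavailable | win | strength_overall_home | strength_overall_away | strength_attack_home | strength_attack_away | strength_defence_home | strength_defence_away | pulse_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | Arsenal | ARS | 0 | | 0 | 0 | 0 | 0 | ... | | False | 0 | 1250 | 1270 | 1150 | 1210 | 1190 | 1220 | 1 |
| 1 | 2 | 7 | Aston Villa | AVL | 0 | | 0 | 0 | 0 | 0 | ... | | False | 0 | 1100 | 1100 | 1140 | 1110 | 1090 | 1090 | 2 |
| 2 | 3 | 94 | Brentford | BRE | 0 | | 0 | 0 | 0 | 0 | ... | | False | 0 | 1060 | 1070 | 1120 | 1150 | 1080 | 1120 | 130 |
| 3 | 4 | 36 | Brighton | BHA | 0 | | 0 | 0 | 0 | 0 | ... | | False | 0 | 1100 | 1090 | 1160 | 1160 | 1100 | 1120 | 131 |
| 4 | 5 | 90 | Burnley | BUR | 0 | | 0 | 0 | 0 | 0 | ... | | False | 0 | 1060 | 1060 | 1080 | 1130 | 1060 | 1100 | 43 |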


In [20]:
teams.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 21 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   team_id                20 non-null     int64 
 1   team_code              20 non-null     int8  
 2   team_name              20 non-null     object
 3   team_short_name        20 non-null     object
 4   draw                   20 non-null     int8  
 5   form                   0 non-null      object
 6   loss                   20 non-null     int8  
 7   played                 20 non-null     int8  
 8   points                 20 non-null     int8  
 9   position               20 non-null     int8  
 10  strength               20 non-null     int8  
 11  team_division          0 non-null      object
 12  unavailable            20 non-null     bool  
 13  win                    20 non-null     int8  
 14  strength_overall_home  20 non-null     int16 
 15  strength_overall_away  20

In [21]:
con, cursor = sql_connection('fpl.db')
drop_table_if_exists(cursor, con, 'teams')
teams.to_sql(name='teams', con=con, if_exists='replace', index=False, dtype={'team_id': 'INTEGER PRIMARY KEY'})
con.commit()
con.close()

### Element type (player position)

In [22]:
bootstrap_static['element_types'][0]

{'id': 1,
 'plural_name': 'Goalkeepers',
 'plural_name_short': 'GKP',
 'singular_name': 'Goalkeeper',
 'singular_name_short': 'GKP',
 'squad_select': 2,
 'squad_min_play': 1,
 'squad_max_play': 1,
 'ui_shirt_specific': True,
 'sub_positions_locked': [12],
 'element_count': 83}

In [23]:
def position_table():
    bootstrap_static = get_bootstrap_static(BOOTSRAP_STATIC_URL)
    positions_df = pd.DataFrame(bootstrap_static['element_types'])
    return (positions_df
            .assign(sub_positions_locked = positions_df.sub_positions_locked.apply(lambda x: None if x == [] else int(''.join([str(i) for i in x]))))
            .astype({'id': 'int8',
                     'squad_select': 'int8',
                     'squad_min_play': 'int8',
                     'squad_max_play': 'int8',
                     'sub_positions_locked': 'Int8',
                     'element_count': 'int16'
                    })
            .rename(columns={'id': 'position_id',
                             'plural_name_short': 'position_name_short',
                             'singular_name': 'position_name'
                            })
            .drop(columns=['plural_name', 'singular_name_short'])
           )
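
`position_table` joins the digits of `sub_positions_locked` into a single integer. That is unambiguous for the one-element list `[12]` seen above, but it would merge values if the list ever held more than one entry (for example, `[1, 2]` would also give 12). One alternative, sketched here as an assumption rather than something the API requires, is to store the list as JSON text. `encode_locked` is a hypothetical helper name.

```python
import json

def encode_locked(values):
    """Encode a list of locked sub positions as JSON text, or None if empty."""
    return json.dumps(values) if values else None

encoded = encode_locked([12])     # '[12]'
empty = encode_locked([])         # None
decoded = json.loads(encoded)     # [12]
```

With this encoding the column would hold text rather than `Int8`, so the `astype` mapping would need adjusting accordingly.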

In [24]:
positions = position_table()
print(positions.shape)
positions.head()

(4, 9)


| | position_id | position_name_short | position_name | squad_select | squad_min_play | squad_max_play | ui_shirt_specific | sub_positions_locked | element_count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | GKP | Goalkeeper | 2 | 1 | 1 | True | 12.0 | 83 |
| 1 | 2 | DEF | Defender | 5 | 3 | 5 | False | | 247 |
| 2 | 3 | MID | Midfielder | 5 | 2 | 5 | False | | 305 |
| 3 | 4 | FWD | Forward | 3 | 1 | 3 | False | | 95 |


In [25]:
positions.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   position_id           4 non-null      int8  
 1   position_name_short   4 non-null      object
 2   position_name         4 non-null      object
 3   squad_select          4 non-null      int8  
 4   squad_min_play        4 non-null      int8  
 5   squad_max_play        4 non-null      int8  
 6   ui_shirt_specific     4 non-null      bool  
 7   sub_positions_locked  1 non-null      Int8  
 8   element_count         4 non-null      int16 
dtypes: Int8(1), bool(1), int16(1), int8(4), object(2)
memory usage: 228.0+ bytes


In [26]:
con, cursor = sql_connection('fpl.db')
drop_table_if_exists(cursor, con, 'positions')
positions.to_sql(name='positions', con=con, if_exists='replace', index=False, dtype={'position_id': 'INTEGER PRIMARY KEY'})
con.commit()
con.close()

### Elements (players)
Information about all Premier League players, including points, status, value, match stats (goals, assists, etc.) and ICT index.

In [27]:
bootstrap_static['elements'][0]

{'chance_of_playing_next_round': 100,
 'chance_of_playing_this_round': 100,
 'code': 80201,
 'cost_change_event': 0,
 'cost_change_event_fall': 0,
 'cost_change_start': -5,
 'cost_change_start_fall': 5,
 'dreamteam_count': 1,
 'element_type': 1,
 'ep_next': '0.5',
 'ep_this': '0.5',
 'event_points': 0,
 'first_name': 'Bernd',
 'form': '0.0',
 'id': 1,
 'in_dreamteam': False,
 'news': '',
 'news_added': '2022-02-11T08:00:15.144286Z',
 'now_cost': 45,
 'photo': '80201.jpg',
 'points_per_game': '2.5',
 'second_name': 'Leno',
 'selected_by_percent': '0.9',
 'special': False,
 'squad_number': None,
 'status': 'a',
 'team': 1,
 'team_code': 3,
 'total_points': 10,
 'transfers_in': 80277,
 'transfers_in_event': 50,
 'transfers_out': 203364,
 'transfers_out_event': 279,
 'value_form': '0.0',
 'value_season': '2.2',
 'web_name': 'Leno',
 'minutes': 360,
 'goals_scored': 0,
 'assists': 0,
 'clean_sheets': 1,
 'goals_conceded': 9,
 'own_goals': 0,
 'penalties_saved': 0,
 'penalties_missed': 0,
 '

In [28]:
def players_table():
    bootstrap_static = get_bootstrap_static(BOOTSRAP_STATIC_URL)
    players_df = pd.DataFrame(bootstrap_static['elements'])
    players_df = (players_df
                  .assign(now_cost = players_df.now_cost.astype(float) / 10)
                  .astype({'chance_of_playing_next_round': 'Int8',
                           'chance_of_playing_this_round': 'Int8',
                           'code': 'int32',
                           'cost_change_event': 'int8',
                           'cost_change_event_fall': 'int8',
                           'cost_change_start': 'int8',
                           'cost_change_start_fall': 'int8',
                           'dreamteam_count': 'int8',
                           'element_type': 'int8',
                           'ep_next': 'float',
                           'ep_this': 'float',
                           'event_points': 'int8',
                           'form': 'float',
                           'id': 'int16',
                           'points_per_game': 'float',
                           'selected_by_percent': 'float',
                           'squad_number': 'Int8',
                           'team': 'int8',
                           'team_code': 'int8',
                           'total_points': 'int16',
                           'transfers_in': 'int32',
                           'transfers_in_event': 'int32',
                           'transfers_out': 'int32',
                           'transfers_out_event': 'int32',
                           'value_form': 'float',
                           'value_season': 'float',
                           'minutes': 'int16',
                           'goals_scored': 'int8',
                           'assists': 'int8',
                           'clean_sheets': 'int8',
                           'goals_conceded': 'int8',
                           'own_goals': 'int8',
                           'penalties_saved': 'int8',
                           'penalties_missed': 'int8',
                           'yellow_cards': 'int8',
                           'red_cards': 'int8',
                           'saves': 'int16',
                           'bonus': 'int8',
                           'bps': 'int16',
                           'influence': 'float',
                           'creativity': 'float',
                           'threat': 'float',
                           'ict_index': 'float',
                           'influence_rank': 'Int16',
                           'influence_rank_type': 'Int16',
                           'creativity_rank': 'Int16',
                           'creativity_rank_type': 'Int16',
                           'threat_rank': 'Int16',
                           'threat_rank_type': 'Int16',
                           'ict_index_rank': 'Int16',
                           'ict_index_rank_type': 'Int16',
                           'corners_and_indirect_freekicks_order': 'Int8',
                           'direct_freekicks_order': 'Int8',
                           'penalties_order': 'Int8'
                          })
                  .drop(columns=['photo'])
                  .rename(columns={'code': 'player_code',
                                   'id': 'player_id',
                                   'team': 'team_id',
                                   'element_type': 'position_id'
                                  })
                 )
    cols_to_move = ['player_id', 'player_code', 'first_name', 'second_name', 'web_name', 'team_id', 'team_code']
    players_df = players_df[cols_to_move + [ col for col in players_df.columns if col not in cols_to_move ]]
    return players_df

In [29]:
players = players_table()
print(players.shape)
players.head()

(730, 66)


| | player_id | player_code | first_name | second_name | web_name | team_id | team_code | chance_of_playing_next_round | chance_of_playing_this_round | cost_change_event | ... | threat_rank | threat_rank_type | ict_index_rank | ict_index_rank_type | corners_and_indirect_freekicks_order | corners_and_indirect_freekicks_text | direct_freekicks_order | direct_freekicks_text | penalties_order | penalties_text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 80201 | Bernd | Leno | Leno | 1 | 3 | 100 | 100 | 0 | ... | 594 | 60 | 438 | 30 | | | | | | |
| 1 | 2 | 115918 | Rúnar Alex | Rúnarsson | Rúnarsson | 1 | 3 | 0 | 0 | 0 | ... | 485 | 19 | 538 | 54 | | | | | | |
| 2 | 3 | 47431 | Willian | Borges Da Silva | Willian | 1 | 3 | 0 | 0 | 0 | ... | 726 | 303 | 726 | 303 | | | | | | |
| 3 | 4 | 54694 | Pierre-Emerick | Aubameyang | Aubameyang | 1 | 3 | 0 | 0 | 0 | ... | 51 | 20 | 154 | 29 | | | | | | |
| 4 | 5 | 58822 | Cédric | Soares | Cédric | 1 | 3 | 100 | 100 | 0 | ... | 319 | 104 | 269 | 89 | 2.0 | | 3.0 | | | |


In [30]:
players.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 730 entries, 0 to 729
Data columns (total 66 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   player_id                             730 non-null    int16  
 1   player_code                           730 non-null    int32  
 2   first_name                            730 non-null    object 
 3   second_name                           730 non-null    object 
 4   web_name                              730 non-null    object 
 5   team_id                               730 non-null    int8   
 6   team_code                             730 non-null    int8   
 7   chance_of_playing_next_round          565 non-null    Int8   
 8   chance_of_playing_this_round          565 non-null    Int8   
 9   cost_change_event                     730 non-null    int8   
 10  cost_change_event_fall                730 non-null    int8   
 11  cost_change_start  

In [31]:
con, cursor = sql_connection('fpl.db')
drop_table_if_exists(cursor, con, 'players')
players.to_sql(name='players', con=con, if_exists='replace', index=False, dtype={'player_id': 'INTEGER PRIMARY KEY'})
con.commit()
con.close()

### Player Stats

In [32]:
bootstrap_static['element_stats']

[{'label': 'Minutes played', 'name': 'minutes'},
 {'label': 'Goals scored', 'name': 'goals_scored'},
 {'label': 'Assists', 'name': 'assists'},
 {'label': 'Clean sheets', 'name': 'clean_sheets'},
 {'label': 'Goals conceded', 'name': 'goals_conceded'},
 {'label': 'Own goals', 'name': 'own_goals'},
 {'label': 'Penalties saved', 'name': 'penalties_saved'},
 {'label': 'Penalties missed', 'name': 'penalties_missed'},
 {'label': 'Yellow cards', 'name': 'yellow_cards'},
 {'label': 'Red cards', 'name': 'red_cards'},
 {'label': 'Saves', 'name': 'saves'},
 {'label': 'Bonus', 'name': 'bonus'},
 {'label': 'Bonus Points System', 'name': 'bps'},
 {'label': 'Influence', 'name': 'influence'},
 {'label': 'Creativity', 'name': 'creativity'},
 {'label': 'Threat', 'name': 'threat'},
 {'label': 'ICT Index', 'name': 'ict_index'}]

In [33]:
# Store player summary data in a dataframe
def players_stats_table():
    bootstrap_static = get_bootstrap_static(BOOTSRAP_STATIC_URL)
    player_stats_df = pd.DataFrame(bootstrap_static['element_stats'])
    player_stats_df = (player_stats_df
                       .assign(player_stat_id = player_stats_df.index + 1)
                       .astype({'player_stat_id': 'int8'})
                      )
    player_stats_df = player_stats_df[['player_stat_id', 'label', 'name']]
    return player_stats_df

In [34]:
player_stats = players_stats_table()
print(player_stats.shape)
player_stats.head()

(17, 3)


| | player_stat_id | label | name |
|---|---|---|---|
| 0 | 1 | Minutes played | minutes |
| 1 | 2 | Goals scored | goals_scored |
| 2 | 3 | Assists | assists |
| 3 | 4 | Clean sheets | clean_sheets |
| 4 | 5 | Goals conceded | goals_conceded |


In [35]:
player_stats.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17 entries, 0 to 16
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   player_stat_id  17 non-null     int8  
 1   label           17 non-null     object
 2   name            17 non-null     object
dtypes: int8(1), object(2)
memory usage: 417.0+ bytes


In [36]:
con, cursor = sql_connection('fpl.db')
drop_table_if_exists(cursor, con, 'player_stats')
player_stats.to_sql(name='player_stats', con=con, if_exists='replace', index=False, dtype={'player_stat_id': 'INTEGER PRIMARY KEY'})
con.commit()
con.close()

## Fixtures

In [37]:
# Extract information available from the fixtures end point
def get_fixtures(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"Request failed. Return code is {response.status_code}.")
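
`get_fixtures` repeats the body of `get_bootstrap_static`, so a single helper could serve both end points. In the sketch below the HTTP function is passed in as an argument (`http_get`, a hypothetical parameter) so that the helper can be exercised without network access. `FakeResponse` and `fake_get` are stand-ins written for this illustration only.

```python
def get_json(url, http_get):
    """Call `http_get(url)` and return the decoded JSON body on HTTP 200."""
    response = http_get(url)
    if response.status_code != 200:
        raise RuntimeError(f"Request failed. Return code is {response.status_code}.")
    return response.json()

class FakeResponse:
    """Stand-in for a requests.Response, used only for this illustration."""
    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload

def fake_get(url):
    # Pretend every URL returns an empty list of fixtures.
    return FakeResponse(200, [])

data = get_json("https://fantasy.premierleague.com/api/fixtures/", fake_get)
```

In the notebook itself the call would presumably look like `get_json(FIXTURES_URL, requests.get)`.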

In [38]:
fixtures = get_fixtures(FIXTURES_URL)

In [39]:
fixtures[0]

{'code': 2210271,
 'event': 1,
 'finished': True,
 'finished_provisional': True,
 'id': 1,
 'kickoff_time': '2021-08-13T19:00:00Z',
 'minutes': 90,
 'provisional_start_time': False,
 'started': True,
 'team_a': 1,
 'team_a_score': 0,
 'team_h': 3,
 'team_h_score': 2,
 'stats': [{'identifier': 'goals_scored',
   'a': [],
   'h': [{'value': 1, 'element': 77}, {'value': 1, 'element': 81}]},
  {'identifier': 'assists', 'a': [], 'h': [{'value': 1, 'element': 91}]},
  {'identifier': 'own_goals', 'a': [], 'h': []},
  {'identifier': 'penalties_saved', 'a': [], 'h': []},
  {'identifier': 'penalties_missed', 'a': [], 'h': []},
  {'identifier': 'yellow_cards', 'a': [], 'h': []},
  {'identifier': 'red_cards', 'a': [], 'h': []},
  {'identifier': 'saves',
   'a': [{'value': 1, 'element': 1}],
   'h': [{'value': 4, 'element': 80}]},
  {'identifier': 'bonus',
   'a': [],
   'h': [{'value': 3, 'element': 81},
    {'value': 2, 'element': 91},
    {'value': 1, 'element': 80}]},
  {'identifier': 'bps',
  

In [40]:
# Fixtures for current gameweek
[f for f in fixtures if f['event'] == current_gameweek]

[{'code': 2210615,
  'event': 35,
  'finished': True,
  'finished_provisional': True,
  'id': 345,
  'kickoff_time': '2022-04-30T11:30:00Z',
  'minutes': 90,
  'provisional_start_time': False,
  'started': True,
  'team_a': 11,
  'team_a_score': 1,
  'team_h': 14,
  'team_h_score': 0,
  'stats': [{'identifier': 'goals_scored',
    'a': [{'value': 1, 'element': 239}],
    'h': []},
   {'identifier': 'assists', 'a': [{'value': 1, 'element': 240}], 'h': []},
   {'identifier': 'own_goals', 'a': [], 'h': []},
   {'identifier': 'penalties_saved', 'a': [], 'h': []},
   {'identifier': 'penalties_missed', 'a': [], 'h': []},
   {'identifier': 'yellow_cards',
    'a': [{'value': 1, 'element': 230},
     {'value': 1, 'element': 238},
     {'value': 1, 'element': 240}],
    'h': [{'value': 1, 'element': 310}]},
   {'identifier': 'red_cards', 'a': [], 'h': []},
   {'identifier': 'saves',
    'a': [{'value': 2, 'element': 231}],
    'h': [{'value': 9, 'element': 295}]},
   {'identifier': 'bonus',
   

In [41]:
def fixtures_table():
    response = get_fixtures(FIXTURES_URL)
    fixtures_df = pd.DataFrame(response)
    fixtures_df = (fixtures_df
                   .drop(columns=['stats']) # extracting fixture stats left as an exercise
                   .astype({'code': 'int32',
                            'event': 'Int8',
                            'id': 'int16',
                            'minutes': 'int8',
                            'team_h': 'int8',
                            'team_a': 'int8',
                            'team_a_score': 'Int8',
                            'team_h_score': 'Int8',
                            'team_h_difficulty': 'int8',
                            'team_a_difficulty': 'int8',
                            'pulse_id': 'int32'
                           })
                   .rename(columns={'code': 'fixture_code',
                                    'event': 'gameweek_id',
                                    'id': 'fixture_id',
                                    'team_h': 'home_team_id',
                                    'team_a': 'away_team_id'
                                   })
                  )
    cols_to_move = ['fixture_id', 'fixture_code', 'gameweek_id']
    fixtures_df = fixtures_df[cols_to_move + [ col for col in fixtures_df.columns if col not in cols_to_move ]]
    return fixtures_df
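
The `stats` column is dropped above and left as an exercise. One possible approach, sketched here and certainly not the only one, is to flatten each fixture's stats into one row per stat, side and player. The sample is the first fixture from the earlier output, trimmed to two stats. `flatten_fixture_stats` and the row field names are illustrative choices.

```python
def flatten_fixture_stats(fixture):
    """Return one dict per (stat, side, player) entry in a fixture."""
    rows = []
    for stat in fixture["stats"]:
        for side in ("h", "a"):  # 'h' = home team, 'a' = away team
            for entry in stat[side]:
                rows.append({
                    "fixture_id": fixture["id"],
                    "stat": stat["identifier"],
                    "side": side,
                    "player_id": entry["element"],
                    "value": entry["value"],
                })
    return rows

# First fixture from the output above, trimmed to two stats.
sample_fixture = {
    "id": 1,
    "stats": [
        {"identifier": "goals_scored", "a": [],
         "h": [{"value": 1, "element": 77}, {"value": 1, "element": 81}]},
        {"identifier": "saves", "a": [{"value": 1, "element": 1}],
         "h": [{"value": 4, "element": 80}]},
    ],
}
stat_rows = flatten_fixture_stats(sample_fixture)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame` and loaded into its own table.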

In [42]:
fixtures = fixtures_table()
print(fixtures.shape)
fixtures.head()

(380, 16)


| | fixture_id | fixture_code | gameweek | finished | finished_provisional | kickoff_time | minutes | provisional_start_time | started | away_team_id | team_a_score | home_team_id | team_h_score | team_h_difficulty | team_a_difficulty | pulse_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2210271 | 1 | True | True | 2021-08-13T19:00:00Z | 90 | False | True | 1 | 0 | 3 | 2 | 4 | 2 | 66342 |
| 1 | 6 | 2210276 | 1 | True | True | 2021-08-14T11:30:00Z | 90 | False | True | 10 | 1 | 13 | 5 | 2 | 4 | 66347 |
| 2 | 2 | 2210272 | 1 | True | True | 2021-08-14T14:00:00Z | 90 | False | True | 4 | 2 | 5 | 1 | 2 | 2 | 66343 |
| 3 | 3 | 2210273 | 1 | True | True | 2021-08-14T14:00:00Z | 90 | False | True | 7 | 0 | 6 | 3 | 2 | 4 | 66344 |
| 4 | 4 | 2210274 | 1 | True | True | 2021-08-14T14:00:00Z | 90 | False | True | 16 | 1 | 8 | 3 | 2 | 2 | 66345 |


In [43]:
fixtures.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 380 entries, 0 to 379
Data columns (total 16 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   fixture_id              380 non-null    int16 
 1   fixture_code            380 non-null    int32 
 2   gameweek                380 non-null    Int8  
 3   finished                380 non-null    bool  
 4   finished_provisional    380 non-null    bool  
 5   kickoff_time            380 non-null    object
 6   minutes                 380 non-null    int8  
 7   provisional_start_time  380 non-null    bool  
 8   started                 380 non-null    bool  
 9   away_team_id            380 non-null    int8  
 10  team_a_score            341 non-null    Int8  
 11  home_team_id            380 non-null    int8  
 12  team_h_score            341 non-null    Int8  
 13  team_h_difficulty       380 non-null    int8  
 14  team_a_difficulty       380 non-null    int8  
 15  pulse_

In [44]:
con, cursor = sql_connection('fpl.db')
drop_table_if_exists(cursor, con, 'fixtures')
fixtures.to_sql(name='fixtures', con=con, if_exists='replace', index=False, dtype={'fixture_id': 'INTEGER PRIMARY KEY',
                                                                                   'finished': 'INT',
                                                                                   'finished_provisional': 'INT',
                                                                                   'started': 'INT'
                                                                                  })
con.commit()
con.close()
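
SQLite has no separate boolean storage class, which is presumably why the boolean columns are declared as `INT` above. The sketch below shows, on an in-memory database with a cut-down `fixtures` table, that Python `True` and `False` are stored as 1 and 0 by the `sqlite3` module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fixtures (fixture_id INTEGER PRIMARY KEY, finished INT)")
# Python booleans are bound as the integers 1 and 0.
con.executemany(
    "INSERT INTO fixtures (fixture_id, finished) VALUES (?, ?)",
    [(1, True), (2, False)],
)
stored = con.execute(
    "SELECT fixture_id, finished FROM fixtures ORDER BY fixture_id"
).fetchall()
finished_ids = [
    row[0] for row in con.execute("SELECT fixture_id FROM fixtures WHERE finished = 1")
]
con.close()
```

Queries therefore filter on `finished = 1` rather than on a boolean literal.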

## Summary
We have extracted Fantasy Premier League data using the API, transformed it using pandas, and loaded it into a SQLite database. In part 2, we will query the database and compare SQL queries with their pandas equivalents.