## Scraping data from the FPL website and saving it to disk for later use

### Understanding the FPL API

Full URL of the main endpoint: https://fantasy.premierleague.com/api/bootstrap-static/

The response includes:

* A summary of all 38 gameweeks
* The game’s settings
* Basic information on all 20 PL teams
* Total number of FPL users and overall chip usage
* Basic information on all Premier League players
* List of stats that FPL keeps track of
* The different FPL positions
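
To get a feel for how these pieces are laid out, a small helper can summarise the top-level keys of the response. The sketch below is illustrative only: `summarize_payload` and the hand-made `sample` dictionary are not part of the original notebook, and the sample merely mimics the shape of a response. The keys `events`, `teams` and `elements` are the ones used later in this notebook; `total_players` is an assumed key name that should be checked against a live response.

```python
def summarize_payload(payload):
    """Map each top-level key to its type name and, for collections, its length."""
    summary = {}
    for key, value in payload.items():
        size = len(value) if isinstance(value, (list, dict)) else None
        summary[key] = (type(value).__name__, size)
    return summary

# Hand-made stand-in for a bootstrap-static response (not real data)
sample = {
    "events": [{"id": 1, "name": "Gameweek 1"}],
    "teams": [{"id": 1, "name": "Arsenal"}, {"id": 2, "name": "Aston Villa"}],
    "elements": [{"id": 1, "first_name": "Folarin", "second_name": "Balogun"}],
    "total_players": 0,  # assumed key name, placeholder value
}

for key, (kind, size) in summarize_payload(sample).items():
    suffix = f" with {size} items" if size is not None else ""
    print(f"{key}: {kind}{suffix}")
```

With a live response, the same helper could be called on the parsed JSON returned by the endpoint above.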

### Scraping individual player data

I scrape individual player data gameweek by gameweek and build one DataFrame per gameweek that combines the data for all players.

In [11]:
import requests
import pandas as pd
from tqdm import tqdm
import os

In [3]:
# Base URL for the Fantasy Premier League API
base_url = 'https://fantasy.premierleague.com/api/'

# Function to get JSON data from a given URL
def get_json(url):
    try:
        response = requests.get(url, timeout=10)  # Sends a GET request; the timeout stops a stalled request from hanging the loop
        response.raise_for_status()  # Will raise an HTTPError if the request is unsuccessful
        return response.json()  # Returns the JSON content from the response
    except requests.exceptions.RequestException:
        return None  # Returns None if there is any exception (e.g., network error, invalid URL)

# Function to create a DataFrame for a specific player (ID) and gameweek
def create_gameweek_specific_player_df(player_id, gameweek):
    player_data = get_json(base_url + f'element-summary/{player_id}/')  # Fetches player data
    if player_data:
        history_df = pd.DataFrame(player_data['history'])  # Converts the 'history' part of player data to a DataFrame
        return history_df[history_df['round'] == gameweek].copy()  # Filters to the specified gameweek; .copy() lets a column be added later without a SettingWithCopyWarning
    else:
        return pd.DataFrame()  # Returns an empty DataFrame if no player data is found

# Save the data to disk so the scraping does not need to be re-run.
player_save_path = "/Users/evgenigeorgiev/Documents/Jupyter Projects/FPL/player_dataframes"
os.makedirs(player_save_path, exist_ok=True)  # Creates the directory if it does not exist

player_ids = range(1, 801)  # Player IDs start at 1; adjust the upper bound as necessary

for gw in range(1, 39):  # Iterates through gameweeks 1 to 38
    gw_data = []  # Initializes an empty list to store gameweek data
    with tqdm(total=len(player_ids), desc=f"Processing GW{gw}") as pbar:  # Initializes a progress bar
        for player_id in player_ids:  # Iterates through each player ID
            gw_specific_df = create_gameweek_specific_player_df(player_id, gw)  # Gets the gameweek-specific DataFrame
            if not gw_specific_df.empty:
                gw_specific_df['player_id'] = player_id  # Adds the player ID to the DataFrame
                gw_data.append(gw_specific_df)  # Appends the DataFrame to the list
            pbar.update(1)  # Updates the progress bar

    if gw_data:
        combined_gw_df = pd.concat(gw_data, ignore_index=True)  # Combines all DataFrames into one
        file_path = os.path.join(player_save_path, f'gw{gw}_players.csv')  # Defines the file path
        combined_gw_df.to_csv(file_path, index=False)  # Saves the DataFrame to a CSV file
    else:
        print(f"No data available for Gameweek {gw}.")  # Prints a message if there is no data for the gameweek


Processing GW1:   0%|                                   | 0/800 [00:00<?, ?it/s]Fatal Python error: config_get_locale_encoding: failed to get the locale encoding: nl_langinfo(CODESET) failed
Python runtime state: preinitialized

Processing GW1: 100%|█████████████████████████| 800/800 [01:05<00:00, 12.28it/s]
Processing GW2: 100%|█████████████████████████| 800/800 [01:03<00:00, 12.54it/s]
Processing GW3: 100%|█████████████████████████| 800/800 [01:05<00:00, 12.14it/s]
Processing GW4: 100%|█████████████████████████| 800/800 [01:04<00:00, 12.35it/s]
Processing GW5: 100%|█████████████████████████| 800/800 [01:07<00:00, 11.86it/s]
Processing GW6: 100%|█████████████████████████| 800/800 [01:04<00:00, 12.41it/s]
Processing GW7: 100%|█████████████████████████| 800/800 [01:06<00:00, 12.02it/s]
Processing GW8: 100%|█████████████████████████| 800/800 [01:06<00:00, 12.02it/s]
Processing GW9: 100%|█████████████████████████| 800/800 [01:07<00:00, 11.86it/s]
Processing GW10: 100%|████████████████████

No data available for Gameweek 21.
Processing GW22: 100%|████████████████████████| 800/800 [01:05<00:00, 12.20it/s]
No data available for Gameweek 22.
Processing GW23: 100%|████████████████████████| 800/800 [01:05<00:00, 12.21it/s]
No data available for Gameweek 23.
Processing GW24: 100%|████████████████████████| 800/800 [01:05<00:00, 12.20it/s]
No data available for Gameweek 24.
Processing GW25: 100%|████████████████████████| 800/800 [01:06<00:00, 12.09it/s]
No data available for Gameweek 25.
Processing GW26: 100%|████████████████████████| 800/800 [01:04<00:00, 12.38it/s]
No data available for Gameweek 26.
Processing GW27: 100%|████████████████████████| 800/800 [01:05<00:00, 12.16it/s]
No data available for Gameweek 27.
Processing GW28: 100%|████████████████████████| 800/800 [01:05<00:00, 12.19it/s]
No data available for Gameweek 28.
Processing GW29: 100%|████████████████████████| 800/800 [01:05<00:00, 12.22it/s]
No data available for Gameweek 29.
Processing GW30: 100%|████████████████████████| 800/800 [01:04<00:00, 12.34it/s]
No data available for Gameweek 30.
Processing GW31: 100%|████████████████████████| 800/800 [01:05<00:00, 12.23it/s]
No data available for Gameweek 31.
Processing GW32: 100%|████████████████████████| 800/800 [01:05<00:00, 12.28it/s]
No data available for Gameweek 32.
Processing GW33: 100%|████████████████████████| 800/800 [01:05<00:00, 12.20it/s]
No data available for Gameweek 33.
Processing GW34: 100%|████████████████████████| 800/800 [01:06<00:00, 12.01it/s]
No data available for Gameweek 34.
Processing GW35: 100%|████████████████████████| 800/800 [01:05<00:00, 12.22it/s]
No data available for Gameweek 35.
Processing GW36: 100%|████████████████████████| 800/800 [01:04<00:00, 12.42it/s]
No data available for Gameweek 36.
Processing GW37: 100%|████████████████████████| 800/800 [01:03<00:00, 12.59it/s]
No data available for Gameweek 37.
Processing GW38: 100%|████████████████████████| 800/800 [01:05<00:00, 12.29it/s]
No data available for Gameweek 38.
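
The log above shows roughly a minute per gameweek, because every player's `element-summary` endpoint is requested again on each pass. Since the `history` list already holds one record per played match with a `round` field (which is what the loop filters on), one possible alternative is to fetch each player once and split the combined records by gameweek afterwards. The sketch below shows only the splitting step on made-up records; `split_history_by_round` and `sample_histories` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

def split_history_by_round(histories):
    """Combine per-player history records and split them into one DataFrame per gameweek.

    `histories` maps a player ID to the list of records found under the
    'history' key of that player's element-summary response.
    """
    frames = []
    for player_id, records in histories.items():
        if not records:
            continue  # this player has no matches recorded yet
        df = pd.DataFrame(records)
        df["player_id"] = player_id
        frames.append(df)
    if not frames:
        return {}
    combined = pd.concat(frames, ignore_index=True)
    return {int(gw): gw_df.reset_index(drop=True)
            for gw, gw_df in combined.groupby("round")}

# Made-up records standing in for fetched history data
sample_histories = {
    1: [{"round": 1, "total_points": 0}, {"round": 2, "total_points": 1}],
    2: [{"round": 1, "total_points": 2}],
    3: [],
}

by_gameweek = split_history_by_round(sample_histories)
for gw, gw_df in by_gameweek.items():
    print(f"GW{gw}: {len(gw_df)} rows")
```

Each DataFrame in the returned dictionary could then be written with `to_csv` exactly as in the loop above, so the request count would drop from one per player per gameweek to one per player.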





In [9]:
# Specify the path to the file
file_path = '/Users/evgenigeorgiev/Documents/Jupyter Projects/FPL/player_dataframes/gw1_players.csv'

# Read the CSV file into a DataFrame
gameweek_1_df = pd.read_csv(file_path)

# Print the column names and display the DataFrame
print(list(gameweek_1_df))
gameweek_1_df

['element', 'fixture', 'opponent_team', 'total_points', 'was_home', 'kickoff_time', 'team_h_score', 'team_a_score', 'round', 'minutes', 'goals_scored', 'assists', 'clean_sheets', 'goals_conceded', 'own_goals', 'penalties_saved', 'penalties_missed', 'yellow_cards', 'red_cards', 'saves', 'bonus', 'bps', 'influence', 'creativity', 'threat', 'ict_index', 'starts', 'expected_goals', 'expected_assists', 'expected_goal_involvements', 'expected_goals_conceded', 'value', 'transfers_balance', 'selected', 'transfers_in', 'transfers_out', 'player_id']


Unnamed: 0,element,fixture,opponent_team,total_points,was_home,kickoff_time,team_h_score,team_a_score,round,minutes,...,expected_goals,expected_assists,expected_goal_involvements,expected_goals_conceded,value,transfers_balance,selected,transfers_in,transfers_out,player_id
0,1,2,16,0,True,2023-08-12T12:00:00Z,2.0,1.0,1,0,...,0.0,0.00,0.00,0.00,45,0,59090,0,0,1
1,2,2,16,0,True,2023-08-12T12:00:00Z,2.0,1.0,1,0,...,0.0,0.00,0.00,0.00,40,0,29866,0,0,2
2,3,2,16,0,True,2023-08-12T12:00:00Z,2.0,1.0,1,0,...,0.0,0.00,0.00,0.00,45,0,10880,0,0,3
3,4,2,16,0,True,2023-08-12T12:00:00Z,2.0,1.0,1,0,...,0.0,0.00,0.00,0.00,55,0,9548,0,0,4
4,5,2,16,1,True,2023-08-12T12:00:00Z,2.0,1.0,1,4,...,0.0,0.00,0.00,0.02,50,0,2743150,0,0,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
653,655,7,8,1,True,2023-08-12T14:00:00Z,0.0,1.0,1,3,...,0.0,0.01,0.01,0.00,45,0,0,0,0,655
654,656,7,8,1,True,2023-08-12T14:00:00Z,0.0,1.0,1,3,...,0.0,0.00,0.00,0.00,45,0,0,0,0,656
655,657,7,17,0,False,2023-08-12T14:00:00Z,0.0,1.0,1,0,...,0.0,0.00,0.00,0.00,45,0,0,0,0,657
656,658,7,17,0,False,2023-08-12T14:00:00Z,0.0,1.0,1,0,...,0.0,0.00,0.00,0.00,45,0,0,0,0,658


### Creating a DataFrame of player names and their IDs

In [21]:
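def get_json(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Fail loudly: the code below cannot proceed without this data
    return response.json()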

# Base URL for the Fantasy Premier League API
base_url = 'https://fantasy.premierleague.com/api/'

# Fetch overall player data
data = get_json(base_url + 'bootstrap-static/')
players = data['elements']

# Extracting player ID, first name, and second name
player_data = [{'id': player['id'], 'first_name': player['first_name'], 'second_name': player['second_name']} for player in players]

# Creating a DataFrame
players_df = pd.DataFrame(player_data)

# Sort the DataFrame by player ID and reset the index
players_df = players_df.sort_values(by='id').reset_index(drop=True)

# Specify the directory where you want to save the file
save_directory = "/Users/evgenigeorgiev/Documents/Jupyter Projects/FPL/player_dataframes/"

# Ensure that the directory exists; if not, create it
if not os.path.exists(save_directory):
    os.makedirs(save_directory)

# Specify the file path
file_path = os.path.join(save_directory, 'player_ids.csv')

# Save the DataFrame to a CSV file
players_df.to_csv(file_path, index=False)

# Optional: Print a message to confirm that the file has been saved
print(f"Player IDs data saved to {file_path}")
players_df

Player IDs data saved to /Users/evgenigeorgiev/Documents/Jupyter Projects/FPL/player_dataframes/player_ids.csv


Unnamed: 0,id,first_name,second_name
0,1,Folarin,Balogun
1,2,Cédric,Alves Soares
2,3,Mohamed,Elneny
3,4,Fábio,Ferreira Vieira
4,5,Gabriel,dos Santos Magalhães
...,...,...,...
766,768,Rhys,Bennett
767,769,Owen,Hesketh
768,770,Ty,Barnett
769,771,Micah,Hamilton
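
With this lookup table saved, names can be attached to the per-gameweek player data by joining the `element` column of a gameweek DataFrame against `id`. The sketch below is one way to do it; `add_player_names` is a hypothetical helper, and the small frames reuse a couple of rows from the tables above plus one deliberately unmatched ID (999) to show what a missing player looks like.

```python
import pandas as pd

def add_player_names(gameweek_df, players_df):
    """Left-join first and second names onto gameweek rows via the player ID."""
    names = players_df.rename(columns={"id": "element"})
    # validate= raises an error if an ID appears more than once in the lookup table
    return gameweek_df.merge(names, on="element", how="left", validate="many_to_one")

# Small stand-ins for gameweek_1_df and players_df
gw_sample = pd.DataFrame({"element": [1, 5, 999], "total_points": [0, 1, 0]})
players_sample = pd.DataFrame({
    "id": [1, 5],
    "first_name": ["Folarin", "Gabriel"],
    "second_name": ["Balogun", "dos Santos Magalhães"],
})

named = add_player_names(gw_sample, players_sample)
print(named)
```

A left join keeps every gameweek row, so IDs missing from the lookup table show up with empty name fields instead of silently disappearing.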


### Scraping data for fixture list

In [22]:
base_url = 'https://fantasy.premierleague.com/api/'

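def get_json(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Fail loudly: the code below cannot proceed without this data
    return response.json()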

# Fetch fixtures data
fixtures_data = get_json(base_url + 'fixtures/')
fixtures_df = pd.DataFrame(fixtures_data)

# Function to filter fixtures data for a specific gameweek
def create_gameweek_specific_fixtures_df(fixtures_df, gameweek):
    return fixtures_df[fixtures_df['event'] == gameweek]

# Directory for saving fixtures data
fixtures_save_path = "/Users/evgenigeorgiev/Documents/Jupyter Projects/FPL/fixtures/"

# Ensure the directory exists
if not os.path.exists(fixtures_save_path):
    os.makedirs(fixtures_save_path)

# Save each gameweek's fixtures data
for gw in tqdm(range(1, 39), desc="Saving fixtures data"):
    gw_specific_df = create_gameweek_specific_fixtures_df(fixtures_df, gw)
    gw_specific_df.to_csv(os.path.join(fixtures_save_path, f'fixtures_df_gw{gw}.csv'), index=False)

Saving fixtures data:   0%|                              | 0/38 [00:00<?, ?it/s]Fatal Python error: config_get_locale_encoding: failed to get the locale encoding: nl_langinfo(CODESET) failed
Python runtime state: preinitialized

Saving fixtures data: 100%|████████████████████| 38/38 [00:00<00:00, 899.75it/s]


In [24]:
# Specify the path to the GW1 fixtures file
gw1_file_path = "/Users/evgenigeorgiev/Documents/Jupyter Projects/FPL/fixtures/fixtures_df_gw1.csv"

# Read the CSV file into a DataFrame
gw1_fixtures_df = pd.read_csv(gw1_file_path)

# Display the DataFrame
gw1_fixtures_df


Unnamed: 0,code,event,finished,finished_provisional,id,kickoff_time,minutes,provisional_start_time,started,team_a,team_a_score,team_h,team_h_score,stats,team_h_difficulty,team_a_difficulty,pulse_id
0,2367538,1.0,True,True,1,2023-08-11T19:00:00Z,90,False,True,13,3.0,6,0.0,"[{'identifier': 'goals_scored', 'a': [{'value'...",5,2,93321
1,2367540,1.0,True,True,2,2023-08-12T12:00:00Z,90,False,True,16,1.0,1,2.0,"[{'identifier': 'goals_scored', 'a': [{'value'...",2,4,93322
2,2367539,1.0,True,True,3,2023-08-12T14:00:00Z,90,False,True,19,1.0,3,1.0,"[{'identifier': 'goals_scored', 'a': [{'value'...",2,2,93323
3,2367541,1.0,True,True,4,2023-08-12T14:00:00Z,90,False,True,12,1.0,5,4.0,"[{'identifier': 'goals_scored', 'a': [{'value'...",2,3,93324
4,2367542,1.0,True,True,5,2023-08-12T14:00:00Z,90,False,True,10,1.0,9,0.0,"[{'identifier': 'goals_scored', 'a': [{'value'...",2,2,93325
5,2367544,1.0,True,True,7,2023-08-12T14:00:00Z,90,False,True,8,1.0,17,0.0,"[{'identifier': 'goals_scored', 'a': [{'value'...",2,2,93327
6,2367543,1.0,True,True,6,2023-08-12T16:30:00Z,90,False,True,2,1.0,15,5.0,"[{'identifier': 'goals_scored', 'a': [{'value'...",3,4,93326
7,2367545,1.0,True,True,8,2023-08-13T13:00:00Z,90,False,True,18,2.0,4,2.0,"[{'identifier': 'goals_scored', 'a': [{'value'...",3,3,93328
8,2367546,1.0,True,True,9,2023-08-13T15:30:00Z,90,False,True,11,1.0,7,1.0,"[{'identifier': 'goals_scored', 'a': [{'value'...",4,3,93329
9,2367547,1.0,True,True,10,2023-08-14T19:00:00Z,90,False,True,20,0.0,14,1.0,"[{'identifier': 'goals_scored', 'a': [], 'h': ...",2,4,93330
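
Because the fixtures are written to CSV, the `stats` column comes back from `pd.read_csv` as text rather than as a list of dictionaries (the table above shows cells starting with `[{'identifier': ...`). A minimal sketch for turning such a cell back into Python objects follows. The helper names are hypothetical, the sample cell is made up in the same shape as the truncated cells above, and the `element` key inside each entry is an assumption about the full structure.

```python
import ast

def parse_stats(cell):
    """Turn the text form of a `stats` cell back into a list of dicts."""
    if isinstance(cell, str):
        return ast.literal_eval(cell)  # safely evaluates Python literals only
    return cell  # already parsed, or a missing value

def stat_totals(stats, identifier):
    """Sum the 'value' entries for one stat identifier as (away, home)."""
    for entry in stats:
        if entry["identifier"] == identifier:
            away = sum(item["value"] for item in entry["a"])
            home = sum(item["value"] for item in entry["h"])
            return away, home
    return 0, 0

# Made-up cell in the same shape as the cells shown above
sample_cell = "[{'identifier': 'goals_scored', 'a': [{'value': 1, 'element': 10}], 'h': []}]"

parsed = parse_stats(sample_cell)
print(stat_totals(parsed, "goals_scored"))
```

Applied to a whole column, this could look like `gw1_fixtures_df['stats'].apply(parse_stats)`.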


### Scraping Static Data (teams_df and gameweeks_df)

In [26]:
# Fetch general information
bootstrap_data = get_json(base_url + 'bootstrap-static/')
teams_df = pd.DataFrame(bootstrap_data['teams'])
gameweeks_df = pd.DataFrame(bootstrap_data['events'])

# Saving teams_df and gameweeks_df (create the directory first so to_csv does not fail)
static_save_path = '/Users/evgenigeorgiev/Documents/Jupyter Projects/FPL/static_data'
os.makedirs(static_save_path, exist_ok=True)
teams_df.to_csv(os.path.join(static_save_path, 'teams_df.csv'), index=False)
gameweeks_df.to_csv(os.path.join(static_save_path, 'gameweeks_df.csv'), index=False)


In [27]:
teams_df

Unnamed: 0,code,draw,form,id,loss,name,played,points,position,short_name,...,team_division,unavailable,win,strength_overall_home,strength_overall_away,strength_attack_home,strength_attack_away,strength_defence_home,strength_defence_away,pulse_id
0,3,0,,1,0,Arsenal,0,0,0,ARS,...,,False,0,1230,1285,1250,1250,1210,1320,1
1,7,0,,2,0,Aston Villa,0,0,0,AVL,...,,False,0,1115,1175,1130,1190,1100,1160,2
2,91,0,,3,0,Bournemouth,0,0,0,BOU,...,,False,0,1060,1095,1050,1100,1060,1090,127
3,94,0,,4,0,Brentford,0,0,0,BRE,...,,False,0,1125,1205,1120,1220,1130,1190,130
4,36,0,,5,0,Brighton,0,0,0,BHA,...,,False,0,1165,1210,1120,1200,1210,1240,131
5,90,0,,6,0,Burnley,0,0,0,BUR,...,,False,0,1060,1080,1060,1080,1060,1080,43
6,8,0,,7,0,Chelsea,0,0,0,CHE,...,,False,0,1115,1160,1130,1210,1100,1110,4
7,31,0,,8,0,Crystal Palace,0,0,0,CRY,...,,False,0,1100,1100,1140,1170,1080,1085,6
8,11,0,,9,0,Everton,0,0,0,EVE,...,,False,0,1075,1100,1070,1120,1080,1080,7
9,54,0,,10,0,Fulham,0,0,0,FUL,...,,False,0,1095,1100,1090,1090,1100,1140,34
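
The `id` column here is what the fixtures data refers to in `team_h` and `team_a`, so the two tables can be combined to make fixtures readable. The sketch below shows one way to do that; `add_team_names` is a hypothetical helper, and the sample frames contain only a few values copied from the tables above.

```python
import pandas as pd

def add_team_names(fixtures_df, teams_df):
    """Add home and away team names to fixtures by looking up team IDs."""
    id_to_name = teams_df.set_index("id")["name"]
    out = fixtures_df.copy()  # leave the caller's DataFrame untouched
    out["team_h_name"] = out["team_h"].map(id_to_name)
    out["team_a_name"] = out["team_a"].map(id_to_name)
    return out

# Small stand-ins for teams_df and fixtures_df
teams_sample = pd.DataFrame({"id": [1, 9, 10], "name": ["Arsenal", "Everton", "Fulham"]})
fixtures_sample = pd.DataFrame({
    "id": [5],
    "team_h": [9],
    "team_a": [10],
    "team_h_score": [0.0],
    "team_a_score": [1.0],
})

labelled = add_team_names(fixtures_sample, teams_sample)
print(labelled[["team_h_name", "team_h_score", "team_a_score", "team_a_name"]])
```

Team IDs that are absent from the lookup table map to missing values, which makes gaps in the teams data easy to spot.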


In [28]:
gameweeks_df

Unnamed: 0,id,name,deadline_time,average_entry_score,finished,data_checked,highest_scoring_entry,deadline_time_epoch,deadline_time_game_offset,highest_score,...,cup_leagues_created,h2h_ko_matches_created,chip_plays,most_selected,most_transferred_in,top_element,top_element_info,transfers_made,most_captained,most_vice_captained
0,1,Gameweek 1,2023-08-11T17:30:00Z,64,True,True,3383750.0,1691775000,0,127.0,...,False,False,"[{'chip_name': 'bboost', 'num_played': 163222}...",355.0,1.0,395.0,"{'id': 395, 'points': 14}",0,355.0,19.0
1,2,Gameweek 2,2023-08-18T17:15:00Z,44,True,True,3338487.0,1692378900,0,120.0,...,True,True,"[{'chip_name': 'bboost', 'num_played': 126778}...",355.0,195.0,108.0,"{'id': 108, 'points': 16}",13130353,355.0,19.0
2,3,Gameweek 3,2023-08-25T17:30:00Z,44,True,True,9368956.0,1692984600,0,128.0,...,True,True,"[{'chip_name': 'bboost', 'num_played': 124110}...",355.0,108.0,216.0,"{'id': 216, 'points': 19}",17619532,355.0,19.0
3,4,Gameweek 4,2023-09-01T17:30:00Z,72,True,True,4354697.0,1693589400,0,148.0,...,True,True,"[{'chip_name': 'bboost', 'num_played': 109196}...",355.0,216.0,516.0,"{'id': 516, 'points': 20}",16035365,355.0,19.0
4,5,Gameweek 5,2023-09-16T10:00:00Z,44,True,True,6211182.0,1694858400,0,102.0,...,True,True,"[{'chip_name': 'bboost', 'num_played': 96469},...",355.0,516.0,344.0,"{'id': 344, 'points': 13}",14363988,355.0,19.0
5,6,Gameweek 6,2023-09-23T12:30:00Z,68,True,True,4494759.0,1695472200,0,142.0,...,True,True,"[{'chip_name': 'bboost', 'num_played': 78528},...",355.0,343.0,430.0,"{'id': 430, 'points': 18}",12109066,355.0,308.0
6,7,Gameweek 7,2023-09-30T10:00:00Z,49,True,True,9335002.0,1696068000,0,156.0,...,True,True,"[{'chip_name': 'bboost', 'num_played': 79672},...",355.0,430.0,60.0,"{'id': 60, 'points': 23}",15579302,355.0,308.0
7,8,Gameweek 8,2023-10-07T10:00:00Z,44,True,True,10068163.0,1696672800,0,120.0,...,True,True,"[{'chip_name': 'bboost', 'num_played': 53401},...",355.0,516.0,216.0,"{'id': 216, 'points': 16}",19444885,355.0,355.0
8,9,Gameweek 9,2023-10-21T10:00:00Z,67,True,True,6931177.0,1697882400,0,152.0,...,True,True,"[{'chip_name': 'bboost', 'num_played': 50175},...",355.0,60.0,423.0,"{'id': 423, 'points': 17}",11431916,355.0,308.0
9,10,Gameweek 10,2023-10-27T17:30:00Z,66,True,True,9895223.0,1698427800,0,134.0,...,True,True,"[{'chip_name': 'bboost', 'num_played': 80903},...",355.0,60.0,13.0,"{'id': 13, 'points': 17}",12428517,355.0,355.0
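
The `finished` column offers a way to restrict the player-scraping loop to gameweeks that have actually been played, instead of iterating over all 38 and getting empty results for the later ones. A minimal sketch, assuming `finished` holds boolean values as in the table above; `finished_gameweeks` is a hypothetical helper and the unfinished row in the sample is made up for illustration.

```python
import pandas as pd

def finished_gameweeks(gameweeks_df):
    """Return the IDs of gameweeks whose `finished` flag is set, in ascending order."""
    finished = gameweeks_df[gameweeks_df["finished"].astype(bool)]
    return sorted(finished["id"].tolist())

# Small stand-in for gameweeks_df; the last row is an illustrative unfinished gameweek
gameweeks_sample = pd.DataFrame({
    "id": [1, 2, 38],
    "name": ["Gameweek 1", "Gameweek 2", "Gameweek 38"],
    "finished": [True, True, False],
})

print(finished_gameweeks(gameweeks_sample))
```

The returned list could replace `range(1, 39)` in the scraping loop.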
