# First Look at FPL Data

***

In this file, I will look through all the data available through the FPL website. Here is a list of all files to be investigated:

- bootstrap-static
- element-summary
- element-types
- elements
- events
- fixtures
- region
- teams

I am looking for any data that will be helpful or beneficial to include when attempting to predict the performance of selections.

***

## File Format
Before diving into anything, it's worth noting that all the data available from the FPL website is stored using the JSON (JavaScript Object Notation) format. JSON is a text-based method of storing data and data structures, and uses an attribute-value pair system. The types of data that can be stored in JSON files are numbers, strings, booleans, null, arrays and objects. Objects can store entire new data structures, and objects can be stored within other objects and arrays. This means data structures can be easily nested.
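
As a small illustration of the nesting described above, the sketch below parses a made-up JSON string (not taken from the FPL files) and records which Python type each top-level value becomes. The field names are purely hypothetical.

```python
import json

# A made-up JSON document (not from the FPL data) with one value of each kind
raw = """
{
  "name": "Example",
  "active": true,
  "count": 3,
  "tags": ["a", "b"],
  "nested": {"inner": [1, 2.5, null]}
}
"""

parsed = json.loads(raw)

# Map each top-level key to the name of the Python type it was parsed into
type_names = {key: type(value).__name__ for key, value in parsed.items()}
print(type_names)

# Objects and arrays nest freely: a dict holding a list holding a null
print(parsed["nested"]["inner"][2])
```

JSON objects become Python dictionaries, arrays become lists, and `null` becomes `None`, which is why the files below can be explored with ordinary dictionary and list operations.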

In order to read JSON files into Python objects that we can work with, we need to import the json module from the standard library.

>*__Note:__*  
>JSON can be viewed online using https://jsoneditoronline.org/, but the idea here is to learn how to use Python to probe these data structures without relying on other tools.

In [1]:
import json

Now we can begin investigating the files downloaded from the FPL website.

***

## bootstrap-static

In [2]:
with open('../../data/json/bootstrap-static/bootstrap-static.json', 'r') as f:
    bootstrap_static_json = json.load(f)

First we need to look at what format the data has been read in as.

In [3]:
type(bootstrap_static_json)

dict

As the data is currently a dictionary, it's worth looking at the keys and listing them.

In [4]:
bootstrap_keys = list(bootstrap_static_json.keys())
print(bootstrap_keys)

['phases', 'elements', 'stats', 'game-settings', 'current-event', 'total-players', 'teams', 'element_types', 'last-entry-event', 'stats_options', 'next_event_fixtures', 'events', 'next-event']
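
Beyond the key names, it can help to see what kind of value sits behind each key before opening them one by one. The helper below is a minimal sketch of that idea. `describe_top_level` is a hypothetical name, and `sample` is a small stand-in dictionary rather than the real file.

```python
def describe_top_level(data):
    """Return {key: (type name, length or None)} for a parsed JSON dict."""
    summary = {}
    for key, value in data.items():
        # Only containers have a meaningful length; scalars get None
        size = len(value) if isinstance(value, (list, dict)) else None
        summary[key] = (type(value).__name__, size)
    return summary


# Hypothetical stand-in for bootstrap_static_json, using a few of its key names
sample = {
    "phases": [{"id": 1}, {"id": 2}],
    "current-event": 38,
    "game-settings": {"game": {}, "element_type": {}},
}

for key, (kind, size) in describe_top_level(sample).items():
    print(f"{key}: {kind}, size={size}")
```

Calling `describe_top_level(bootstrap_static_json)` on the loaded file should give the same kind of overview for the real keys.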


### phases

To enhance readability for these data structures, it will be handy to convert them into Pandas DataFrames. This requires us to import the pandas package.

We will convert __phases__ to a DataFrame and use the head command to look at a snapshot of the values.

In [5]:
import pandas as pd
pd.set_option('display.max_columns', None)

pd.DataFrame(bootstrap_static_json['phases']).head()

| | id | name | num_winners | start_event | stop_event |
|---|---|---|---|---|---|
| 0 | 1 | Overall | 3 | 1 | 38 |
| 1 | 2 | August | 10 | 1 | 3 |
| 2 | 3 | September | 10 | 4 | 7 |
| 3 | 4 | October | 10 | 8 | 10 |
| 4 | 5 | November | 10 | 11 | 14 |


The __phases__ data appears to split the season by calendar month. The __num_winners__ describes the number of people eligible for prizes from each phase. The __start_event__ and __stop_event__ describe the range of gameweeks which cover the month. This particular set of data won't be useful for our model.
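
To make the meaning of __start_event__ and __stop_event__ concrete, here is a minimal sketch that finds which phases cover a given gameweek. It uses only the five rows shown above, typed in by hand, and `phases_for_gameweek` is a hypothetical helper name.

```python
import pandas as pd

# The five rows displayed above, entered by hand for illustration
phases = pd.DataFrame({
    "name": ["Overall", "August", "September", "October", "November"],
    "start_event": [1, 1, 4, 8, 11],
    "stop_event": [38, 3, 7, 10, 14],
})


def phases_for_gameweek(phases, gameweek):
    """Return the names of all phases whose gameweek range includes `gameweek`."""
    mask = (phases["start_event"] <= gameweek) & (gameweek <= phases["stop_event"])
    return phases.loc[mask, "name"].tolist()


print(phases_for_gameweek(phases, 5))
```

Every gameweek falls in the "Overall" phase as well as its monthly phase, which is consistent with reading the two columns as an inclusive gameweek range.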

### elements

In [6]:
elements = pd.DataFrame(bootstrap_static_json['elements'])
elements.shape

(624, 58)

The __elements__ structure contains summary information on all the selections. Each row belongs to a different selection. We can see using the _shape_ attribute that there are 624 selections and 58 columns describing different aspects of the selection.

In [7]:
elements.head()

Unnamed: 0,assists,bonus,bps,chance_of_playing_next_round,chance_of_playing_this_round,clean_sheets,code,cost_change_event,cost_change_event_fall,cost_change_start,cost_change_start_fall,creativity,dreamteam_count,ea_index,element_type,ep_next,ep_this,event_points,first_name,form,goals_conceded,goals_scored,ict_index,id,in_dreamteam,influence,loaned_in,loaned_out,loans_in,loans_out,minutes,news,news_added,now_cost,own_goals,penalties_missed,penalties_saved,photo,points_per_game,red_cards,saves,second_name,selected_by_percent,special,squad_number,status,team,team_code,threat,total_points,transfers_in,transfers_in_event,transfers_out,transfers_out_event,value_form,value_season,web_name,yellow_cards
0,0,3,130,100.0,100.0,1,11334,0,0,-3,3,0.0,0,0,1,0.5,0.5,0,Petr,0.0,9,0,20.4,1,False,205.0,0,0,0,0,585,,2018-09-29T17:31:14Z,47,0,0,0,11334.jpg,3.4,0,27,Cech,1.1,False,1.0,a,1,3,0.0,24,83497,0,136211,0,0.0,5.1,Cech,0
1,0,5,568,,,6,80201,0,0,-1,1,0.0,1,0,1,3.7,3.1,3,Bernd,2.6,42,0,80.5,2,False,807.2,0,0,0,0,2835,,,49,0,0,0,80201.jpg,3.3,0,105,Leno,4.0,False,19.0,a,1,3,0.0,106,339095,0,250834,0,0.5,21.6,Leno,0
2,0,8,319,100.0,100.0,3,51507,0,0,-1,1,29.5,1,0,2,2.7,1.1,1,Laurent,0.6,23,3,59.1,3,False,456.4,0,0,0,0,1329,,2019-05-03T08:31:19Z,54,0,0,0,51507.jpg,3.6,0,0,Koscielny,0.9,False,6.0,a,1,3,105.0,62,128478,0,92187,0,0.1,11.5,Koscielny,1
3,5,5,304,0.0,0.0,4,98745,0,0,-2,2,197.8,0,0,2,0.0,0.0,0,Héctor,0.0,21,0,73.7,4,False,261.6,0,0,0,0,1532,Knee injury - Unknown return date,2019-01-19T20:01:19Z,53,1,0,0,98745.jpg,3.2,0,0,Bellerín,4.5,False,2.0,i,1,3,280.0,60,567084,0,1143684,0,0.0,11.3,Bellerín,3
4,4,7,392,100.0,100.0,5,38411,0,0,-1,1,196.6,1,0,2,3.3,2.1,2,Nacho,1.6,24,1,83.4,5,False,413.2,0,0,0,0,1860,,2019-04-28T10:31:25Z,54,0,0,0,38411.jpg,3.5,0,0,Monreal,1.3,False,18.0,a,1,3,224.0,77,298216,0,290921,0,0.3,14.3,Monreal,5


This dataset has some constant values, such as name and ID, but most values are updated on a week-by-week basis, such as total points, goals, clean sheets, influence, creativity, etc. Therefore, all values here are representative of an entire season. We are more interested in the breakdown of stats week by week, so I don't believe this will be useful.

### stats

In [8]:
stats = pd.DataFrame(bootstrap_static_json['stats'])
stats.head()

| | headings | categories |
|---|---|---|
| 0 | {'category': None, 'field': 'minutes', 'abbr':... | None |
| 1 | {'category': None, 'field': 'goals_scored', 'a... | None |
| 2 | {'category': None, 'field': 'assists', 'abbr':... | None |
| 3 | {'category': None, 'field': 'clean_sheets', 'a... | None |
| 4 | {'category': None, 'field': 'goals_conceded', ... | None |


The __stats__ dataset contains another nested dataset called __headings__, and a column of None values named __categories__. The __categories__ are not of use, so we need to investigate the __headings__ dataset.

In [9]:
stats_headings = pd.DataFrame(bootstrap_static_json['stats']['headings'])
stats_headings.head()

| | abbr | category | field | label |
|---|---|---|---|---|
| 0 | | | minutes | Minutes played |
| 1 | | | goals_scored | Goals scored |
| 2 | | | assists | Assists |
| 3 | | | clean_sheets | Clean sheets |
| 4 | | | goals_conceded | Goals conceded |


These are the stats which can be used to sort players in the web API when performing transfers. These won't be useful.

### game-settings

The game settings field is a dictionary data type, and contains information regarding rules of the game.

In [10]:
gs = bootstrap_static_json['game-settings']
print(list(gs.keys()))

['game', 'element_type']


Looking at the keys, we can see one is for __game__, and the other is for __element_type__. The __game__ field contains all static game values independent of element_type (player position). The __element_type__ field contains game rules which are position dependent (e.g. points for a goal, clean sheet, etc.). Let us take a look at the data within __game__.

In [11]:
for key, value in gs['game'].items():
    print(f"{key} : {value}")

scoring_ea_index : 0
league_prefix_public : League
bps_tackles : 2
league_h2h_tiebreak : +goals_scored||-goals_conceded
scoring_long_play : 2
bps_recoveries_limit : 3
facebook_app_id : 337309029685327
bps_tackled : -1
bps_errors_leading_to_goal : -3
bps_yellow_cards : -3
ui_el_hide_currency_qi : True
scoring_bonus : 1
transfers_cost : 4
default_formation : [[0, 1, 0, 2, 0], [3, 4, 5, 6, 7], [8, 9, 10, 11, 12], [0, 13, 14, 15, 0]]
bps_long_play : 6
bps_long_play_limit : 60
scoring_assists : 3
scoring_long_play_limit : 60
ui_special_shirt_exclusions : []
fifa_league_id : 454539
league_size_classic_max : 20
scoring_red_cards : -3
scoring_creativity : 0
game_timezone : Europe/London
static_game_url : /static/compass/plfpl/desktop/
currency_symbol : £
bps_target_missed : -1
bps_penalties_saved : 15
ui_use_special_shirts : False
support_email_address : plfpl@mailout.fantasy.premierleague.com
cup_start_event_id : 17
scoring_penalties_saved : 5
scoring_threat : 0
scoring_saves : 1
league_join_

The __element_type__ field contains information on points which depend on the selection's position. The breakdown of points isn't necessarily important to know for the aims and objectives that have been set out, so we will ignore these for now.

In [12]:
pd.DataFrame(gs['element_type']).head()

| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| bps_clean_sheets | 12 | 12.0 | 0.0 | 0.0 |
| bps_goals_scored | 12 | 12.0 | 18.0 | 24.0 |
| scoring_clean_sheets | 4 | 4.0 | 1.0 | 0.0 |
| scoring_goals_conceded | -1 | -1.0 | 0.0 | 0.0 |
| scoring_goals_scored | 6 | 6.0 | 5.0 | 4.0 |
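
Although these rules are set aside for now, the table is easy to use as a look-up if it is ever needed. The sketch below rebuilds two of the rows shown above by hand and reads off the points for a given position. `points_for` is a hypothetical helper, and the string keys are an assumption based on the fact that `json.load` always returns JSON object keys as strings.

```python
import pandas as pd

# Two of the rules shown above, keyed by element_type ID.
# String keys are assumed, since json.load returns object keys as strings.
element_type_rules = {
    "1": {"scoring_goals_scored": 6, "scoring_clean_sheets": 4},
    "2": {"scoring_goals_scored": 6, "scoring_clean_sheets": 4},
    "3": {"scoring_goals_scored": 5, "scoring_clean_sheets": 1},
    "4": {"scoring_goals_scored": 4, "scoring_clean_sheets": 0},
}

# Outer keys become columns (positions), inner keys become the row index (rules)
rules = pd.DataFrame(element_type_rules)


def points_for(rules, stat, element_type):
    """Look up the points awarded for `stat` to a given element_type ID."""
    return int(rules.loc[stat, str(element_type)])


print(points_for(rules, "scoring_goals_scored", 3))
print(points_for(rules, "scoring_clean_sheets", 4))
```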


### current-event

This is a simple counter which keeps track of the current gameweek. As this data was downloaded at the end of the season, 38 is the answer we expect.

In [13]:
print(bootstrap_static_json['current-event'])

38


### total-players

This shows the number of players that have signed up to play this season. This isn't needed, but still mildly interesting to know.

In [14]:
print(bootstrap_static_json['total-players'])

6324237


### teams

The __teams__ field is a look-up table for team names, strength and upcoming fixtures. These were originally changed week-by-week to provide a quick glance into the following week. However, there appears to be a more in-depth team dataset which may prove more useful.

In [15]:
pd.DataFrame(bootstrap_static_json['teams']).head()

Unnamed: 0,code,current_event_fixture,draw,form,id,link_url,loss,name,next_event_fixture,played,points,position,short_name,strength,strength_attack_away,strength_attack_home,strength_defence_away,strength_defence_home,strength_overall_away,strength_overall_home,team_division,unavailable,win
0,3,"[{'is_home': False, 'day': 12, 'event_day': 1,...",0,,1,,0,Arsenal,[],0,0,0,ARS,4,1270,1240,1340,1310,1320,1260,1,False,0
1,91,"[{'is_home': False, 'day': 12, 'event_day': 1,...",0,,2,,0,Bournemouth,[],0,0,0,BOU,3,1100,1040,1130,1120,1130,1030,1,False,0
2,36,"[{'is_home': True, 'day': 12, 'event_day': 1, ...",0,,3,,0,Brighton,[],0,0,0,BHA,2,1140,1040,1070,1010,1050,1030,1,False,0
3,90,"[{'is_home': True, 'day': 12, 'event_day': 1, ...",0,,4,,0,Burnley,[],0,0,0,BUR,3,1030,990,1040,1000,1100,1070,1,False,0
4,97,"[{'is_home': False, 'day': 12, 'event_day': 1,...",0,,5,,0,Cardiff,[],0,0,0,CAR,2,1060,1030,1090,1020,1080,1030,1,False,0


### element_types

This simply lists the names of the different positions a selection can have, and a few abbreviations. Could be useful to keep handy when plotting figures.

In [16]:
pd.DataFrame(bootstrap_static_json['element_types']).head()

| | id | plural_name | plural_name_short | singular_name | singular_name_short |
|---|---|---|---|---|---|
| 0 | 1 | Goalkeepers | GKP | Goalkeeper | GKP |
| 1 | 2 | Defenders | DEF | Defender | DEF |
| 2 | 3 | Midfielders | MID | Midfielder | MID |
| 3 | 4 | Forwards | FWD | Forward | FWD |
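
One way to keep this handy for plotting is to turn it into a mapping from position ID to short name. The following is a minimal sketch using the four rows above, typed in by hand. The `element_type_column` values are made up to stand in for the `element_type` column of the elements data.

```python
import pandas as pd

# The id and short-name columns from the table above, entered by hand
element_types = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "singular_name_short": ["GKP", "DEF", "MID", "FWD"],
})

# Build an {id: short name} look-up
position_by_id = element_types.set_index("id")["singular_name_short"].to_dict()

# Made-up element_type values, standing in for a column of the elements data
element_type_column = pd.Series([1, 2, 2, 4])
labels = element_type_column.map(position_by_id)
print(labels.tolist())
```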


### last-entry-event

This is the final gameweek a player can play. This must be 38.

In [17]:
print(bootstrap_static_json['last-entry-event'])

38


### stats_options

This is a list of statistics that selections can be filtered by when making transfers. This isn't a full list of stats available for each selection, so this won't be useful.

In [18]:
pd.DataFrame(bootstrap_static_json['stats_options']).head()

| | key | name |
|---|---|---|
| 0 | total_points | Total score |
| 1 | event_points | Round score |
| 2 | now_cost | Price |
| 3 | selected_by_percent | Teams selected by % |
| 4 | minutes | Minutes played |


### next_event_fixtures

This is usually useful for displaying upcoming events at a glance, but as this data was downloaded at the end of the season, there are no upcoming events and the field is therefore empty.

### events

This provides summaries of each gameweek, including deadline datetime, number of players, average points, highest score, etc.

In [19]:
pd.DataFrame(bootstrap_static_json['events']).head()

Unnamed: 0,average_entry_score,data_checked,deadline_time,deadline_time_epoch,deadline_time_formatted,deadline_time_game_offset,finished,highest_score,highest_scoring_entry,id,is_current,is_next,is_previous,name
0,53,True,2018-08-10T18:00:00Z,1533924000,10 Aug 19:00,3600,True,137,890626,1,False,False,False,Gameweek 1
1,60,True,2018-08-18T10:30:00Z,1534588200,18 Aug 11:30,3600,True,153,639556,2,False,False,False,Gameweek 2
2,50,True,2018-08-25T10:30:00Z,1535193000,25 Aug 11:30,3600,True,122,5259490,3,False,False,False,Gameweek 3
3,44,True,2018-09-01T10:30:00Z,1535797800,01 Sep 11:30,3600,True,104,2344578,4,False,False,False,Gameweek 4
4,47,True,2018-09-15T10:30:00Z,1537007400,15 Sep 11:30,3600,True,138,923926,5,False,False,False,Gameweek 5


### next-event

This would usually be some metric describing the next gameweek, but the data was saved at the end of the season so there is no value.

***

## element-summary

There is a separate element summary file for every selection, and each one contains a lot of information.

In [20]:
with open('../../data/json/element-summary/23.json', 'r') as f:
    element_summary = json.load(f)

Let's start by looking at the datatype:

In [21]:
print(type(element_summary))

<class 'dict'>


As the datatype is a dictionary, we should look at the list of keys to get an idea of what we are working with.

In [22]:
print(list(element_summary.keys()))

['history_past', 'fixtures_summary', 'explain', 'history_summary', 'fixtures', 'history']


### history_past

Let's begin by looking at __history_past__...

In [23]:
pd.DataFrame(element_summary['history_past']).head()

Unnamed: 0,assists,bonus,bps,clean_sheets,creativity,ea_index,element_code,end_cost,goals_conceded,goals_scored,ict_index,id,influence,minutes,own_goals,penalties_missed,penalties_saved,red_cards,saves,season,season_name,start_cost,threat,total_points,yellow_cards
0,4,12,354,4,207.1,0,54694,108,15,10,127.1,8140,484.0,1056,0,1,0,0,0,12,2017/18,105,580.0,87,0


Only one past season is listed for this selection, 2017/18. The __season__ value of 12 looks like a season identifier rather than a count of seasons played, since there is only a single row. Summarised here are his performances from the previous year, which include total points, goals scored and conceded, penalties saved, etc. However, these are totals over the course of the season and cannot be broken down further. Although they may be useful for selecting an initial team, I'm not sure these stats are useful for predicting scores from upcoming fixtures.

### fixtures_summary

This field is empty.


### explain

This seems to contain information on this selection's last game of the season. It provides a breakdown of where their points came from, as well as the fixture information.

In [24]:
pd.DataFrame(element_summary['explain'][0]['explain']).head()

| | bonus | minutes | goals_scored |
|---|---|---|---|
| name | Bonus | Minutes played | Goals scored |
| points | 3 | 2 | 8 |
| value | 3 | 90 | 2 |


This lists the breakdown of the selection's points gained from the previous week. The number of columns is not fixed, and will change depending on where the selection's points come from. Unfortunately this isn't useful, as it only provides information on a single game and we need points from every game.
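
As a quick sanity check on how to read this breakdown, the points in each column should add up to the selection's score for that game. The sketch below re-enters the three columns shown above by hand, in the dict-of-dicts shape that the DataFrame output implies, and sums the `points` entries.

```python
# The breakdown displayed above, entered by hand in the shape implied by the output
explain_breakdown = {
    "bonus": {"name": "Bonus", "points": 3, "value": 3},
    "minutes": {"name": "Minutes played", "points": 2, "value": 90},
    "goals_scored": {"name": "Goals scored", "points": 8, "value": 2},
}

# Total points for the game is the sum of the points from each source
total = sum(item["points"] for item in explain_breakdown.values())
print(total)
```

The total of 13 agrees with the `total_points` value recorded for round 38 in the __history_summary__ data further down.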

In [25]:
pd.DataFrame(element_summary['explain'][0]['fixture']).drop(columns=['stats']).drop_duplicates().head()

Unnamed: 0,id,kickoff_time_formatted,started,event_day,deadline_time,deadline_time_formatted,code,kickoff_time,team_h_score,team_a_score,finished,minutes,provisional_start_time,finished_provisional,event,team_a,team_h
0,372,12 May 15:00,True,1,2019-05-12T13:00:00Z,12 May 14:00,987963,2019-05-12T14:00:00Z,1,3,True,90,False,True,38,1,4


This is a summary of the selection's most recent fixture. In isolation, this isn't too useful.

### history_summary

This contains a full breakdown of the selection's last three games. Contained here are stats such as attempted passes, big chances created, completed passes, goals scored, etc. This is almost what we wanted; we just need the selection's other games.

In [26]:
pd.DataFrame(element_summary['history_summary']).head()

Unnamed: 0,assists,attempted_passes,big_chances_created,big_chances_missed,bonus,bps,clean_sheets,clearances_blocks_interceptions,completed_passes,creativity,dribbles,ea_index,element,errors_leading_to_goal,errors_leading_to_goal_attempt,fixture,fouls,goals_conceded,goals_scored,ict_index,id,influence,key_passes,kickoff_time,kickoff_time_formatted,loaned_in,loaned_out,minutes,offside,open_play_crosses,opponent_team,own_goals,penalties_conceded,penalties_missed,penalties_saved,recoveries,red_cards,round,saves,selected,tackled,tackles,target_missed,team_a_score,team_h_score,threat,total_points,transfers_balance,transfers_in,transfers_out,value,was_home,winning_goals,yellow_cards
0,0,26,1,0,0,16,0,1,15,26.2,2,0,23,0,0,355,0,3,0,4.7,19959,15.0,2,2019-04-28T11:00:00Z,28 Apr 12:00,0,0,90,0,1,11,0,0,0,0,3,0,36,0,1149854,3,2,0,0,3,6.0,2,-154965,10220,165185,109,False,0,0
1,0,22,0,1,2,26,0,0,18,25.0,3,0,23,0,0,361,0,1,1,12.3,20574,43.0,1,2019-05-05T15:30:00Z,05 May 16:30,0,0,90,0,1,3,0,0,0,0,1,0,37,0,1141592,4,0,2,1,1,55.0,8,-12687,32488,45175,108,True,0,0
2,0,21,0,2,3,47,0,0,16,3.6,1,0,23,0,0,372,0,1,2,17.3,21190,64.6,0,2019-05-12T14:00:00Z,12 May 15:00,0,0,90,0,0,4,0,0,0,0,2,0,38,0,1196750,2,0,3,3,1,105.0,13,47790,72259,24469,108,False,1,0


### fixtures

This would usually contain a list of all upcoming fixtures. As this was saved at the end of the season, there are no more fixtures, so this is empty.

In [27]:
print(element_summary['fixtures'])

[]


### history

This contains a full breakdown of all the selection's statistics from the entire season, including information on the fixture in which the points were achieved. This is going to be the most useful data structure.

In [28]:
pd.DataFrame(element_summary['history']).head()

Unnamed: 0,assists,attempted_passes,big_chances_created,big_chances_missed,bonus,bps,clean_sheets,clearances_blocks_interceptions,completed_passes,creativity,dribbles,ea_index,element,errors_leading_to_goal,errors_leading_to_goal_attempt,fixture,fouls,goals_conceded,goals_scored,ict_index,id,influence,key_passes,kickoff_time,kickoff_time_formatted,loaned_in,loaned_out,minutes,offside,open_play_crosses,opponent_team,own_goals,penalties_conceded,penalties_missed,penalties_saved,recoveries,red_cards,round,saves,selected,tackled,tackles,target_missed,team_a_score,team_h_score,threat,total_points,transfers_balance,transfers_in,transfers_out,value,was_home,winning_goals,yellow_cards
0,0,20,0,0,0,5,0,0,14,12.2,0,0,23,0,0,1,0,2,0,1.4,23,1.0,1,2018-08-12T15:00:00Z,12 Aug 16:00,0,0,90,1,0,13,0,0,0,0,3,0,1,0,924935,1,0,1,2,0,1.0,2,0,0,0,110,True,0,0
1,0,11,0,2,0,-4,0,1,9,11.1,0,0,23,0,0,14,1,3,0,4.7,548,4.2,1,2018-08-18T16:30:00Z,18 Aug 17:30,0,0,90,2,0,6,0,0,0,0,0,0,2,0,903963,0,0,2,2,3,32.0,2,-88089,19244,107333,110,False,0,0
2,0,24,0,0,0,12,0,0,18,52.3,3,0,23,0,0,21,0,1,0,9.4,1075,13.4,4,2018-08-25T14:00:00Z,25 Aug 15:00,0,0,74,0,1,19,0,0,0,0,0,0,3,0,865202,1,0,1,1,3,28.0,2,-60940,107697,168637,110,True,0,0
3,0,28,0,0,2,33,0,2,23,6.2,0,0,23,0,0,33,0,2,1,7.5,1606,39.6,0,2018-09-02T12:30:00Z,02 Sep 13:30,0,0,88,0,0,5,0,0,0,0,3,0,4,0,744360,3,2,0,3,2,29.0,8,-125640,22584,148224,109,False,0,0
4,1,19,0,0,0,12,1,1,14,4.0,0,0,23,0,0,46,0,0,0,1.6,2143,0.0,0,2018-09-15T14:00:00Z,15 Sep 15:00,0,0,68,0,0,15,0,0,0,0,2,0,5,0,698556,2,0,1,2,1,12.0,5,-58195,25691,83886,109,False,0,0
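
Since there is one element-summary file per selection, the per-game histories will eventually need to be collected into a single table. The function below is a minimal sketch of one way to do that, assuming every file in the directory has a `history` key holding a list of per-game records. `load_all_histories` is a hypothetical helper, not part of the original notebook.

```python
import json
from pathlib import Path

import pandas as pd


def load_all_histories(summary_dir):
    """Stack the 'history' records of every JSON file in `summary_dir`."""
    frames = []
    for path in sorted(Path(summary_dir).glob("*.json")):
        with open(path, "r") as f:
            summary = json.load(f)
        # Selections with no games played may have an empty history list
        history = pd.DataFrame(summary.get("history", []))
        if not history.empty:
            frames.append(history)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Example call, assuming the same directory layout used in this notebook:
# histories = load_all_histories('../../data/json/element-summary')
```

Each record already carries an `element` column identifying the selection, so the rows stay attributable after concatenation.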


***

## element-types

The element-types data structure is identical to the element_types data structure found within the bootstrap-static file. As this is a separate file with a smaller size, it will be easier to use this as the look-up table.

In [29]:
with open('../../data/json/element-types/element-types.json', 'r') as f:
    element_types = json.load(f)
    
pd.DataFrame(element_types).head()

| | id | plural_name | plural_name_short | singular_name | singular_name_short |
|---|---|---|---|---|---|
| 0 | 1 | Goalkeepers | GKP | Goalkeeper | GKP |
| 1 | 2 | Defenders | DEF | Defender | DEF |
| 2 | 3 | Midfielders | MID | Midfielder | MID |
| 3 | 4 | Forwards | FWD | Forward | FWD |


***

## elements

This file contains information on the selections, including first name, second name, display name, squad number, team and position. This will be very important to keep track of, as we will have to split selections by position in order to build models.

In [41]:
with open('../../data/json/elements/elements.json', 'r') as f:
    elements = json.load(f)
    
pd.DataFrame(elements).loc[[0,1,2,623],:].head()

Unnamed: 0,assists,bonus,bps,chance_of_playing_next_round,chance_of_playing_this_round,clean_sheets,code,cost_change_event,cost_change_event_fall,cost_change_start,cost_change_start_fall,creativity,dreamteam_count,ea_index,element_type,ep_next,ep_this,event_points,first_name,form,goals_conceded,goals_scored,ict_index,id,in_dreamteam,influence,loaned_in,loaned_out,loans_in,loans_out,minutes,news,news_added,now_cost,own_goals,penalties_missed,penalties_saved,photo,points_per_game,red_cards,saves,second_name,selected_by_percent,special,squad_number,status,team,team_code,threat,total_points,transfers_in,transfers_in_event,transfers_out,transfers_out_event,value_form,value_season,web_name,yellow_cards
0,0,3,130,100.0,100.0,1,11334,0,0,-3,3,0.0,0,0,1,0.5,0.5,0,Petr,0.0,9,0,20.4,1,False,205.0,0,0,0,0,585,,2018-09-29T17:31:14Z,47,0,0,0,11334.jpg,3.4,0,27,Cech,1.1,False,1.0,a,1,3,0.0,24,83497,0,136211,0,0.0,5.1,Cech,0
1,0,5,568,,,6,80201,0,0,-1,1,0.0,1,0,1,3.7,3.1,3,Bernd,2.6,42,0,80.5,2,False,807.2,0,0,0,0,2835,,,49,0,0,0,80201.jpg,3.3,0,105,Leno,4.0,False,19.0,a,1,3,0.0,106,339095,0,250834,0,0.5,21.6,Leno,0
2,0,8,319,100.0,100.0,3,51507,0,0,-1,1,29.5,1,0,2,2.7,1.1,1,Laurent,0.6,23,3,59.1,3,False,456.4,0,0,0,0,1329,,2019-05-03T08:31:19Z,54,0,0,0,51507.jpg,3.6,0,0,Koscielny,0.9,False,6.0,a,1,3,105.0,62,128478,0,92187,0,0.1,11.5,Koscielny,1
623,0,0,0,,,0,215409,0,0,0,0,0.0,0,0,2,-1.0,-1.0,0,Cameron,0.0,0,0,0.0,596,False,0.0,0,0,0,0,0,,,40,0,0,0,215409.jpg,0.0,0,0,John,0.1,False,47.0,a,20,39,0.0,0,4794,0,2057,0,0.0,0.0,John,0


***

## events

Events contains information regarding each gameweek, such as deadline, highest score, average score, etc. There isn't really anything here we can use.

In [31]:
with open('../../data/json/events/events.json', 'r') as f:
    events = json.load(f)
    
pd.DataFrame(events).head()

Unnamed: 0,average_entry_score,data_checked,deadline_time,deadline_time_epoch,deadline_time_formatted,deadline_time_game_offset,finished,highest_score,highest_scoring_entry,id,is_current,is_next,is_previous,name
0,53,True,2018-08-10T18:00:00Z,1533924000,10 Aug 19:00,3600,True,137,890626,1,False,False,False,Gameweek 1
1,60,True,2018-08-18T10:30:00Z,1534588200,18 Aug 11:30,3600,True,153,639556,2,False,False,False,Gameweek 2
2,50,True,2018-08-25T10:30:00Z,1535193000,25 Aug 11:30,3600,True,122,5259490,3,False,False,False,Gameweek 3
3,44,True,2018-09-01T10:30:00Z,1535797800,01 Sep 11:30,3600,True,104,2344578,4,False,False,False,Gameweek 4
4,47,True,2018-09-15T10:30:00Z,1537007400,15 Sep 11:30,3600,True,138,923926,5,False,False,False,Gameweek 5


***

## fixtures

The fixtures file contains full information on every fixture of the season. Information included is home and away team, kickoff datetime, fixture ID, final score, etc. This will be important for determining running points totals, goals conceded (home and away), etc.

In [37]:
with open('../../data/json/fixtures/fixtures.json', 'r') as f:
    fixtures = json.load(f)
    
pd.DataFrame(fixtures).loc[[0,1,2,379],:]

Unnamed: 0,code,deadline_time,deadline_time_formatted,event,event_day,finished,finished_provisional,id,kickoff_time,kickoff_time_formatted,minutes,provisional_start_time,started,stats,team_a,team_a_difficulty,team_a_score,team_h,team_h_difficulty,team_h_score
0,987597,2018-08-10T18:00:00Z,10 Aug 19:00,1,1,True,True,6,2018-08-10T19:00:00Z,10 Aug 20:00,90,False,True,"[{'goals_scored': {'a': [{'value': 1, 'element...",11,4,1,14,3,2
1,987598,2018-08-10T18:00:00Z,10 Aug 19:00,1,2,True,True,7,2018-08-11T11:30:00Z,11 Aug 12:30,90,False,True,"[{'goals_scored': {'a': [{'value': 1, 'element...",17,3,2,15,4,1
2,987592,2018-08-10T18:00:00Z,10 Aug 19:00,1,2,True,True,2,2018-08-11T14:00:00Z,11 Aug 15:00,90,False,True,"[{'goals_scored': {'a': [], 'h': [{'value': 1,...",5,3,0,2,2,2
379,987971,2019-05-12T13:00:00Z,12 May 14:00,38,1,True,True,380,2019-05-12T14:00:00Z,12 May 15:00,90,False,True,"[{'goals_scored': {'a': [{'value': 2, 'element...",19,3,4,18,3,1
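
To show how goals conceded (home and away) might be derived from this file, the sketch below reshapes the first three fixtures shown above, typed in by hand, into one row per team per fixture. `team_results` and its output column names are hypothetical choices for illustration.

```python
import pandas as pd

# A few columns from the first three fixtures displayed above, entered by hand
fixtures = pd.DataFrame({
    "id": [6, 7, 2],
    "event": [1, 1, 1],
    "team_h": [14, 15, 2],
    "team_a": [11, 17, 5],
    "team_h_score": [2, 1, 2],
    "team_a_score": [1, 2, 0],
})


def team_results(fixtures):
    """Return one row per team per fixture, with goals for and against."""
    home = pd.DataFrame({
        "fixture": fixtures["id"],
        "event": fixtures["event"],
        "team": fixtures["team_h"],
        "opponent": fixtures["team_a"],
        "was_home": True,
        "goals_for": fixtures["team_h_score"],
        "goals_against": fixtures["team_a_score"],
    })
    away = pd.DataFrame({
        "fixture": fixtures["id"],
        "event": fixtures["event"],
        "team": fixtures["team_a"],
        "opponent": fixtures["team_h"],
        "was_home": False,
        "goals_for": fixtures["team_a_score"],
        "goals_against": fixtures["team_h_score"],
    })
    return pd.concat([home, away], ignore_index=True)


results = team_results(fixtures)

# Goals conceded per team, split by home and away
conceded = results.groupby(["team", "was_home"])["goals_against"].sum()
print(conceded)
```

With the full fixtures file, sorting by `event` and taking a cumulative sum within each team would give running totals instead of season totals.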


***

## region

This contains the possible locations the player can be playing from. Not important for the project.

In [33]:
with open('../../data/json/region/region.json', 'r') as f:
    region = json.load(f)
    
pd.DataFrame(region).head()

| | id | iso_code_long | iso_code_short | name |
|---|---|---|---|---|
| 0 | 1 | AFG | AF | Afghanistan |
| 1 | 2 | ALB | AL | Albania |
| 2 | 3 | DZA | DZ | Algeria |
| 3 | 4 | ASM | AS | American Samoa |
| 4 | 5 | AND | AD | Andorra |


***

## teams

This is identical to the team data structure within the bootstrap-static file. It contains the team name, ID, short name, home and away strength ratings, etc. This will be useful to keep handy when engineering features as it can be used as a look-up table.

In [40]:
with open('../../data/json/teams/teams.json', 'r') as f:
    teams = json.load(f)
    
pd.DataFrame(teams).loc[[0,1,2,19],:].head()

Unnamed: 0,code,current_event_fixture,draw,form,id,link_url,loss,name,next_event_fixture,played,points,position,short_name,strength,strength_attack_away,strength_attack_home,strength_defence_away,strength_defence_home,strength_overall_away,strength_overall_home,team_division,unavailable,win
0,3,"[{'is_home': False, 'month': 5, 'event_day': 1...",0,,1,,0,Arsenal,[],0,0,0,ARS,4,1270,1240,1340,1310,1320,1260,1,False,0
1,91,"[{'is_home': False, 'month': 5, 'event_day': 1...",0,,2,,0,Bournemouth,[],0,0,0,BOU,3,1100,1040,1130,1120,1130,1030,1,False,0
2,36,"[{'is_home': True, 'month': 5, 'event_day': 1,...",0,,3,,0,Brighton,[],0,0,0,BHA,2,1140,1040,1070,1010,1050,1030,1,False,0
19,39,"[{'is_home': False, 'month': 5, 'event_day': 1...",0,,20,,0,Wolves,[],0,0,0,WOL,3,1200,1180,1100,1080,1150,1130,1,False,0


***

# Summary

From investigating the available data, there are four useful datasets that can be used.

- element-summary > history
- fixtures
- teams
- elements

The __element-summary > history__ contains the points obtained from each game for each selection. This is the most important data structure.  

The __fixtures__ dataset will be important, as we need to know who the selection earned points against in order to engineer features.

The __teams__ dataset will be useful as a look-up table for plotting figures when team names are needed. The home and away strength ratings are also included, which may be useful for features (although I'm unsure how they are calculated).

The __elements__ dataset contains information on the individual selections, such as squad number, position, team, etc., which are all very useful for building the model.
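
As a rough sketch of how these datasets could fit together, the example below joins per-game history rows to the selection details and to the opponent's team name. The `elements` and `teams` rows reuse a few values shown earlier in this file, while the `history` rows are made up for illustration. The join keys (`element` to `id`, `opponent_team` to `id`) are an assumption based on the column names.

```python
import pandas as pd

# Made-up per-game rows, shaped like element-summary > history
history = pd.DataFrame({
    "element": [1, 3, 3],
    "round": [1, 1, 2],
    "opponent_team": [2, 2, 3],
    "total_points": [2, 6, 1],
})

# A few columns from the elements and teams rows shown earlier
elements = pd.DataFrame({
    "id": [1, 2, 3],
    "web_name": ["Cech", "Leno", "Koscielny"],
    "element_type": [1, 1, 2],
    "team": [1, 1, 1],
})
teams = pd.DataFrame({
    "id": [1, 2, 3],
    "short_name": ["ARS", "BOU", "BHA"],
})

# Left joins keep every history row, even if a look-up is missing
combined = (
    history
    .merge(elements.rename(columns={"id": "element"}), on="element", how="left")
    .merge(
        teams.rename(columns={"id": "opponent_team", "short_name": "opponent_short_name"}),
        on="opponent_team",
        how="left",
    )
)
print(combined)
```

Renaming the `id` columns before merging avoids ambiguous `id_x`/`id_y` columns in the result.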

### Next Steps

The next step would be to look at these four datasets in a bit more detail and clean up the information contained.