# First Look at FPL Data

In this file, I will look through all the data available through the FPL website. Here is a list of all files to be investigated:

- bootstrap-static
- element-summary
- element-types
- elements
- events
- fixtures
- region
- teams

## File Format
Before diving into anything, it's worth noting that all the data available from the FPL website is stored in the JSON (JavaScript Object Notation) format. JSON is a text-based format for storing data and data structures, and uses an attribute-value pair system. The types of data which can be stored in JSON files are numbers, strings, booleans, null, arrays and objects. Objects can hold entire new data structures, and objects can be stored within other objects and arrays, so data structures can be easily nested.
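As a minimal sketch of how nested JSON maps onto Python objects, the snippet below parses a small JSON string. The content of `raw` is invented for illustration and is not real FPL data.

```python
import json

# Hypothetical JSON text, invented for illustration (not real FPL data).
raw = """
{
  "name": "Example Team",
  "active": true,
  "budget": 100.0,
  "nickname": null,
  "players": [
    {"id": 1, "positions": ["GK"]},
    {"id": 2, "positions": ["DEF", "MID"]}
  ]
}
"""

# Parse the JSON text into Python objects.
data = json.loads(raw)

# JSON objects become dicts, arrays become lists,
# true/false become bool and null becomes None.
print(type(data).__name__)             # dict
print(type(data["players"]).__name__)  # list
print(data["nickname"])                # None

# Nested values are reached by chaining keys and indices.
print(data["players"][1]["positions"][0])  # DEF
```

The same chaining of keys and indices applies to the FPL files, which is why the nesting needs to be understood before the data is flattened into tables.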

In order to read JSON files into Python objects that we can work with, we need to import the _json_ module from the standard library.

In [2]:
import json

Now we can begin investigating the files downloaded through the FPL website.

## bootstrap-static

In [3]:
with open('../../data/json/bootstrap-static/bootstrap-static.json', 'r') as f:
    bootstrap_static_json = json.load(f)

First we need to look at what format the data has been read in as.

In [4]:
type(bootstrap_static_json)

dict

As the data is currently a dictionary, it's worth looking at the keys and listing them.

In [5]:
bootstrap_keys = list(bootstrap_static_json.keys())
print(bootstrap_keys)

['phases', 'elements', 'stats', 'game-settings', 'current-event', 'total-players', 'teams', 'element_types', 'last-entry-event', 'stats_options', 'next_event_fixtures', 'events', 'next-event']
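Beyond the key names, it can help to see what kind of value sits behind each key before deciding how to load it. The sketch below does this for a small stand-in dictionary; `sample` and `summarise` are hypothetical names, and the values in `sample` are placeholders rather than the real file contents.

```python
# Stand-in for bootstrap_static_json; the values are placeholders,
# not the real contents of the file.
sample = {
    "phases": [{"id": 1, "name": "Overall"}],
    "game-settings": {"game": {}},
    "total-players": 0,
    "current-event": None,
}


def summarise(payload):
    """Return (key, type name, length) for every top-level entry."""
    rows = []
    for key, value in payload.items():
        # Only containers have a meaningful length here.
        size = len(value) if isinstance(value, (list, dict)) else None
        rows.append((key, type(value).__name__, size))
    return rows


for key, kind, size in summarise(sample):
    print(f"{key:<15} {kind:<10} {size}")
```

Calling `summarise(bootstrap_static_json)` on the real data would show which keys hold lists of records (good candidates for a DataFrame) and which hold single values or nested dictionaries.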


### phases

To enhance readability for these data structures, it will be handy to convert them into pandas DataFrames. This requires us to import the pandas package.

We will convert __phases__ to a DataFrame and use the _head_ method to look at the first few rows.

In [6]:
import pandas as pd
pd.DataFrame(bootstrap_static_json['phases']).head()

|   | id | name | num_winners | start_event | stop_event |
|---|---|---|---|---|---|
| 0 | 1 | Overall | 3 | 1 | 38 |
| 1 | 2 | August | 10 | 1 | 3 |
| 2 | 3 | September | 10 | 4 | 7 |
| 3 | 4 | October | 10 | 8 | 10 |
| 4 | 5 | November | 10 | 11 | 14 |


The __phases__ data appears to split the season by calendar month, alongside an Overall phase covering gameweeks 1 to 38. __num_winners__ describes the number of people eligible for prizes from each phase. __start_event__ and __stop_event__ describe the range of gameweeks which each phase covers. This particular set of data won't be useful for our model.
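As a quick check of how __start_event__ and __stop_event__ relate, the sketch below counts the gameweeks in each phase using the first three rows of the output above. It assumes both ends of the range are inclusive; `gameweeks_in_phase` is a hypothetical helper name.

```python
# First three rows of the phases output above, as a list of dicts.
phases = [
    {"id": 1, "name": "Overall", "num_winners": 3, "start_event": 1, "stop_event": 38},
    {"id": 2, "name": "August", "num_winners": 10, "start_event": 1, "stop_event": 3},
    {"id": 3, "name": "September", "num_winners": 10, "start_event": 4, "stop_event": 7},
]


def gameweeks_in_phase(phase):
    """Count gameweeks in a phase, assuming both ends are inclusive."""
    return phase["stop_event"] - phase["start_event"] + 1


lengths = {p["name"]: gameweeks_in_phase(p) for p in phases}
print(lengths)  # {'Overall': 38, 'August': 3, 'September': 4}
```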

### elements

In [7]:
elements = pd.DataFrame(bootstrap_static_json['elements'])
elements.shape

(624, 58)

The __elements__ structure contains summary information on all the selections. Each row belongs to a different selection. We can see from the _shape_ attribute that there are 624 selections (rows) and 58 columns describing different aspects of each selection.
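To illustrate how a list of records turns into rows and columns, here is a small sketch using three records with values taken from the __elements__ output in this notebook, cut down to three fields. The names `records` and `sample_elements` are illustrative only.

```python
import pandas as pd

# Three records with values from the elements output, reduced to three fields.
records = [
    {"web_name": "Cech", "total_points": 24, "assists": 0},
    {"web_name": "Leno", "total_points": 106, "assists": 0},
    {"web_name": "Koscielny", "total_points": 62, "assists": 0},
]

# Each dict becomes a row and each key becomes a column.
sample_elements = pd.DataFrame(records)

# shape is an attribute holding a (rows, columns) tuple, so it takes no parentheses.
n_rows, n_cols = sample_elements.shape
print(n_rows, n_cols)  # 3 3
```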

In [8]:
elements.head()

|   | assists | bonus | bps | chance_of_playing_next_round | chance_of_playing_this_round | clean_sheets | code | cost_change_event | cost_change_event_fall | cost_change_start | ... | threat | total_points | transfers_in | transfers_in_event | transfers_out | transfers_out_event | value_form | value_season | web_name | yellow_cards |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 130 | 100.0 | 100.0 | 1 | 11334 | 0 | 0 | -3 | ... | 0.0 | 24 | 83497 | 0 | 136211 | 0 | 0.0 | 5.1 | Cech | 0 |
| 1 | 0 | 5 | 568 |  |  | 6 | 80201 | 0 | 0 | -1 | ... | 0.0 | 106 | 339095 | 0 | 250834 | 0 | 0.5 | 21.6 | Leno | 0 |
| 2 | 0 | 8 | 319 | 100.0 | 100.0 | 3 | 51507 | 0 | 0 | -1 | ... | 105.0 | 62 | 128478 | 0 | 92187 | 0 | 0.1 | 11.5 | Koscielny | 1 |
| 3 | 5 | 5 | 304 | 0.0 | 0.0 | 4 | 98745 | 0 | 0 | -2 | ... | 280.0 | 60 | 567084 | 0 | 1143684 | 0 | 0.0 | 11.3 | Bellerín | 3 |
| 4 | 4 | 7 | 392 | 100.0 | 100.0 | 5 | 38411 | 0 | 0 | -1 | ... | 224.0 | 77 | 298216 | 0 | 290921 | 0 | 0.3 | 14.3 | Monreal | 5 |


This dataset has some constant values such as name and ID, but most values are updated on a week-by-week basis and accumulate over the season, such as total points, goals, clean sheets, influence and creativity. Therefore, all values here are representative of an entire season. We are more interested in the breakdown of stats week by week, so I don't believe this will be useful.

### stats

In [16]:
stats = pd.DataFrame(bootstrap_static_json['stats'])
stats.head()

|   | headings | categories |
|---|---|---|
| 0 | {'category': None, 'field': 'minutes', 'abbr':... |  |
| 1 | {'category': None, 'field': 'goals_scored', 'a... |  |
| 2 | {'category': None, 'field': 'assists', 'abbr':... |  |
| 3 | {'category': None, 'field': 'clean_sheets', 'a... |  |
| 4 | {'category': None, 'field': 'goals_conceded', ... |  |


The __stats__ dataset contains another nested dataset called __headings__, and a column of None values named __categories__. The __categories__ are not of use, so we need to investigate the __headings__ dataset.

In [18]:
stats_headings = pd.DataFrame(bootstrap_static_json['stats']['headings'])
stats_headings.head()

|   | abbr | category | field | label |
|---|---|---|---|---|
| 0 |  |  | minutes | Minutes played |
| 1 |  |  | goals_scored | Goals scored |
| 2 |  |  | assists | Assists |
| 3 |  |  | clean_sheets | Clean sheets |
| 4 |  |  | goals_conceded | Goals conceded |


These are the stats which can be used to sort players in the web API when making transfers. They won't be useful for our model.
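Even though this data is not needed for the model, the __field__ and __label__ pairs could serve as a lookup for readable column names. The following is a minimal sketch under that assumption, using the first three headings from the output above; `field_labels` is a hypothetical name.

```python
# First three headings from the output above, as a list of dicts.
headings = [
    {"abbr": None, "category": None, "field": "minutes", "label": "Minutes played"},
    {"abbr": None, "category": None, "field": "goals_scored", "label": "Goals scored"},
    {"abbr": None, "category": None, "field": "assists", "label": "Assists"},
]

# Map each machine-readable field name to its human-readable label.
field_labels = {h["field"]: h["label"] for h in headings}

print(field_labels["goals_scored"])  # Goals scored
```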

### game-settings

In [37]:
game_settings = bootstrap_static_json['game-settings']['game']