##### In this brief data exploration tutorial, I take a look at the FPL API to retrieve player stats from last season and try to come up with features that may be useful in picking a team. You can find the notebook here: 


### Importing libraries and setting the display to show up to 500 rows and 500 columns
##### We import the requests library to call the Fantasy Premier League API and convert its response into several DataFrames

In [None]:
import requests
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

### Base URL
##### This is the base URL for the data. The response contains several keys, but for now we are mostly interested in three: elements, element_types and teams. The response currently provides last season's data; once the new season starts it will be populated with the current season's data. I have listed the columns of the DataFrames in the next few cells.

In [None]:
base_url = 'https://fantasy.premierleague.com/api/bootstrap-static/'

### Base URL for historical data, which we will use later
##### This is the base URL for the players' historical data. To get a response for a player we are interested in, we append that player's ID, which can be found in the elements DataFrame. We can then run a moving average over a player's points to get an indication of form

In [None]:

historical_data_base_url= 'https://fantasy.premierleague.com/api/element-summary/'
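
Here is a minimal sketch of how the historical URL and the moving average idea could fit together. The helper names (`element_summary_url`, `rolling_form`), the trailing slash after the player ID, and the sample points are my assumptions for illustration; nothing here calls the API.

```python
import pandas as pd

HISTORICAL_DATA_BASE_URL = "https://fantasy.premierleague.com/api/element-summary/"


def element_summary_url(player_id: int) -> str:
    """Build the per-player URL (assumes the ID is followed by a trailing slash)."""
    return f"{HISTORICAL_DATA_BASE_URL}{player_id}/"


def rolling_form(points: pd.Series, window: int = 3) -> pd.Series:
    """Moving average of points; early gameweeks average whatever is available."""
    return points.rolling(window=window, min_periods=1).mean()


# Made-up gameweek points for a single player, purely for illustration.
sample_points = pd.Series([2, 6, 1, 9, 12], name="total_points")
form = rolling_form(sample_points)
```

A rising `form` series suggests a player who is picking up points recently, which is the kind of form indication mentioned above. The window of 3 gameweeks is arbitrary and worth tuning.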

### Getting the data as a json and converting into relevant dataframes

In [None]:
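r = requests.get(base_url)
r.raise_for_status()  # stop early if the API did not return a successful response
data = r.json()  # named `data` so it does not shadow the standard `json` module

elements_df = pd.DataFrame(data['elements'])
elements_types_df = pd.DataFrame(data['element_types'])
teams_df = pd.DataFrame(data['teams'])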
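
If you want to reuse this step, one option is to wrap the conversion in a small function that checks the three keys are present before building the DataFrames. This is only a sketch: `frames_from_bootstrap` is a hypothetical helper name, and the payload below is made up so the example runs without calling the API.

```python
import pandas as pd


def frames_from_bootstrap(payload: dict) -> dict:
    """Turn the three keys we care about into DataFrames, failing loudly if one is missing."""
    keys = ("elements", "element_types", "teams")
    missing = [key for key in keys if key not in payload]
    if missing:
        raise KeyError(f"bootstrap payload is missing keys: {missing}")
    return {key: pd.DataFrame(payload[key]) for key in keys}


# Made-up payload with the same top-level shape as the real response (illustration only).
fake_payload = {
    "elements": [{"id": 1, "team": 1, "element_type": 1}],
    "element_types": [{"id": 1, "singular_name": "Goalkeeper"}],
    "teams": [{"id": 1, "name": "Arsenal"}],
    "events": [],
}
frames = frames_from_bootstrap(fake_payload)
```

With the real response you would pass the parsed JSON instead of `fake_payload`.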

In [None]:
teams_df.columns

In [None]:
elements_df.columns

In [None]:
elements_types_df.columns

### Let's have a look at one row from each DataFrame

In [None]:
elements_df.head(1)

In [None]:
elements_types_df.head(1)

In [None]:
teams_df.head(1)

### element_type is a numerical column that needs to be mapped through the element_types DataFrame to retrieve the player's position; team names come from teams_df in the same way

In [None]:
elements_df['element_type'] = elements_df.element_type.map(elements_types_df.set_index('id').singular_name)

elements_df['team'] = elements_df.team.map(teams_df.set_index('id').name)
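
To make the mapping step concrete, here is a small sketch on made-up data. It shows how `set_index('id')` turns the lookup table into a Series keyed by ID, and what happens to an ID with no match. The toy frames and the `position` column name are assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for elements_df and elements_types_df (values are made up).
players = pd.DataFrame({"second_name": ["A", "B", "C"], "element_type": [1, 4, 9]})
positions = pd.DataFrame(
    {"id": [1, 2, 3, 4],
     "singular_name": ["Goalkeeper", "Defender", "Midfielder", "Forward"]}
)

# A Series indexed by id acts like a dictionary: id -> position name.
lookup = positions.set_index("id")["singular_name"]
players["position"] = players["element_type"].map(lookup)

# IDs that are not in the lookup come back as NaN, so they are easy to spot.
unmatched = players[players["position"].isna()]
```

One thing to watch in the notebook: the mapping overwrites `element_type` and `team` in place, so running that cell a second time maps the already converted names against the numeric IDs and fills both columns with NaN.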

### We will use the ID column from elements to pull historical data

In [None]:
player_name_id = ['id','first_name','second_name','team']
elements_df.head(5)[player_name_id]

#### After the mapping above, element_type holds the player's position (from elements_types_df) and team holds the team name (from teams_df)

In [None]:
elements_df.loc[:,'code':].head()

### Finding the best players with the provided ICT index rank
##### The API provides an index rank for the players based on Influence, Creativity and Threat (ICT). We will take a look at the top players based on ICT index, total points last season, and value_season, which is total_points divided by the player's current cost

In [None]:
player_detail_cols = ['id','first_name','second_name','team','ict_index','total_points','now_cost','value_season']

In [None]:
elements_df.sort_values(['ict_index_rank']).head(10)[player_detail_cols]
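
As a quick worked example of value_season, here is the calculation on made-up numbers. I am assuming that now_cost is stored in tenths of a million (so 100 means 10.0), which is why it is divided by 10 first; `value_check` is a hypothetical column name.

```python
import pandas as pd

# Made-up players; now_cost is assumed to be in tenths of a million.
toy = pd.DataFrame(
    {"second_name": ["A", "B"], "total_points": [200, 150], "now_cost": [100, 60]}
)

# Points per million spent: total points divided by the cost in millions.
toy["value_check"] = toy["total_points"] / (toy["now_cost"] / 10)

# The cheaper player gives more points per million despite scoring fewer points.
best_value = toy.sort_values("value_check", ascending=False).iloc[0]["second_name"]
```

Comparing a column like this against the API's own value_season is a cheap way to confirm the assumption about the cost units.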

### Finding the players with the most points last season
##### The ranking of the players changes slightly, with Harry Kane moving up to number 2 instead of 3 in the ICT rank. The API also provides ict_index_rank_type, which is the player's ICT rank within his own position

In [None]:
elements_df.sort_values(['total_points'],ascending=False).head(10)[player_detail_cols]

### Trimming the DataFrame to keep the relevant columns for now
##### There are a lot of columns that we can ignore for the time being. I think the following columns will be helpful in finding a good team

In [None]:
# .copy() gives an independent DataFrame, so adding columns later does not
# trigger pandas' SettingWithCopyWarning
trimmed_elements_df = elements_df[['first_name', 'second_name',
                                   'id', 'ict_index',
                                   'code',
                                   'team',
                                   'element_type', 'selected_by_percent', 'now_cost',
                                   'minutes', 'transfers_in',
                                   'ict_index_rank', 'ict_index_rank_type',
                                   'value_season', 'total_points', 'points_per_game']].copy()

### Creating new features which will be used to create another ranking mechanism
##### Points per minute is a useful metric: if a player is a non-starter, we want players who produce the most output in the least time played. Since the number of games played is not provided, we estimate it as minutes divided by 90. Note that this is only an approximation, because substitutes play fewer than 90 minutes per appearance

In [None]:
trimmed_elements_df['points_per_minute'] = trimmed_elements_df['total_points']/trimmed_elements_df['minutes']
trimmed_elements_df['matches'] = trimmed_elements_df['minutes']/90
trimmed_elements_df['value_season'] =trimmed_elements_df['value_season'].astype(float)
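
One thing to keep in mind with points per minute is that players with zero minutes cause a division by zero. Here is a small sketch on made-up numbers that treats zero minutes as missing instead; the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up players: a regular starter, someone who never played, and a one-game player.
toy = pd.DataFrame({"total_points": [120, 0, 15], "minutes": [2700, 0, 90]})

# Replace 0 minutes with NaN so the ratio is missing rather than inf or 0/0.
minutes = toy["minutes"].replace(0, np.nan)
toy["points_per_minute"] = toy["total_points"] / minutes

# Estimated matches, using the same minutes / 90 approximation as above.
toy["matches"] = toy["minutes"] / 90
```

The later filter on minutes removes these players anyway, but handling them here keeps the unfiltered DataFrame free of infinite values.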


### Filtering down further to players with more than 900 minutes in the season and a value_season (total_points/cost) above 15

In [None]:
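# .copy() so that columns added to the filtered DataFrame later do not raise SettingWithCopyWarning
trimmed_elements_filtered = trimmed_elements_df[(trimmed_elements_df['minutes']>900)&(trimmed_elements_df['value_season']>15)].copy()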

trimmed_elements_filtered.sort_values('value_season',ascending=False).head(11)[player_detail_cols]

### Players with the fewest matches played: these may be potential non-starters

In [None]:
trimmed_elements_filtered.sort_values('matches').head()

### Correlation between ICT index and total points
##### The correlation between the ICT index and total points is high, so it would be wise to pick players with both a high ICT index and a high value_season in order to get the most out of the total budget

In [None]:
trimmed_elements_filtered['ict_index'].astype(float).corr(trimmed_elements_filtered['total_points'])
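
For anyone new to `corr`, here is a small sketch on made-up numbers. It converts a string column to numbers and computes the Pearson correlation, with NumPy as a cross-check. The values are invented to be nearly linear, so they say nothing about the real data.

```python
import numpy as np
import pandas as pd

# Made-up data; ict_index is stored as strings, as it is in the API response.
toy = pd.DataFrame(
    {"ict_index": ["10.0", "20.5", "31.0", "39.5"], "total_points": [50, 102, 155, 198]}
)

# to_numeric with errors="coerce" turns unparseable values into NaN instead of raising.
ict = pd.to_numeric(toy["ict_index"], errors="coerce")

# Series.corr defaults to the Pearson correlation coefficient.
r = ict.corr(toy["total_points"])

# Cross-check with NumPy's correlation matrix.
r_numpy = np.corrcoef(ict, toy["total_points"])[0, 1]
```

A value close to 1 means the two columns move together almost linearly. It does not tell us which one drives the other.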

### Making a column called season_played to find players with a good probability of playing
##### Some of the columns are stored as strings, so we need to convert them to floats before doing calculations

In [None]:
# Bracket assignment makes it explicit that an existing column is being overwritten
trimmed_elements_filtered['points_per_game'] = trimmed_elements_filtered['points_per_game'].astype(float)

# Round matches up to whole games and cap at the 38 games of a Premier League season
trimmed_elements_filtered['matches'] = np.ceil(trimmed_elements_filtered['matches']).clip(upper=38)

trimmed_elements_filtered['season_played'] = trimmed_elements_filtered['matches']/38
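
Here is the same calculation on made-up minutes, so the rounding and the cap at 38 are easy to follow. The toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up minutes: a full season, slightly over a full season, and two partial seasons.
toy = pd.DataFrame({"minutes": [3420, 3500, 950, 1800]})

# Round up to whole matches, then cap at 38 so nobody exceeds a full season.
toy["matches"] = np.ceil(toy["minutes"] / 90).clip(upper=38)

# Share of the season played, between 0 and 1.
toy["season_played"] = toy["matches"] / 38
```

Because of the rounding up, a player with 950 minutes is counted as having played 11 matches, so season_played slightly overstates playing time for players who are often substituted.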

trimmed_elements_filtered.head()

### Checking goalkeepers with the most starts in the season

In [None]:
trimmed_elements_filtered[trimmed_elements_filtered['element_type']=='Goalkeeper'].sort_values(by=['season_played'],ascending=False).head(5)

### Players who have played the whole season

In [None]:
trimmed_elements_filtered[trimmed_elements_filtered['season_played']==1].sort_values(['ict_index_rank_type'])

### Player positions with highest matches played during the season

In [None]:
trimmed_elements_filtered[trimmed_elements_filtered['season_played']==1]['element_type'].value_counts()

# The next step will be to build a budget optimizer and create a team under certain constraints. Will be back soon!