## Fantasy Premier League (FPL) Data Analysis

This notebook is intended to be the formal EDA (exploratory data analysis)
for FPL data. It collects useful data from other notebooks in this directory.
The primary source is the data collected by vaastav in
https://github.com/vaastav/Fantasy-Premier-League.git

We use that data as the basis of the analysis.  The goal is to figure out which elements
in the data are useful for building a model that can make weekly recommendations for
a user of the FPL game.


In [2]:
# Imports to collect and plot data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# I use emacs ein.  These settings make the plots visible with dark emacs background
plt.rcParams["figure.facecolor"] = "white"
plt.rcParams["axes.facecolor"] = "white"
plt.rcParams["savefig.facecolor"] = "white"


In [5]:
# load all four years of data for players.  Start with the raw data.

years = ['2016-17', '2017-18', '2018-19', '2019-20']

praw = {}
for y in years:
    praw[y] = pd.read_csv("data/{:}/players_raw.csv".format(y), index_col='id')
    praw[y].info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 683 entries, 1 to 683
Data columns (total 56 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   assists                       683 non-null    int64  
 1   bonus                         683 non-null    int64  
 2   bps                           683 non-null    int64  
 3   chance_of_playing_next_round  683 non-null    object 
 4   chance_of_playing_this_round  683 non-null    object 
 5   clean_sheets                  683 non-null    int64  
 6   code                          683 non-null    int64  
 7   cost_change_event             683 non-null    int64  
 8   cost_change_event_fall        683 non-null    int64  
 9   cost_change_start             683 non-null    int64  
 10  cost_change_start_fall        683 non-null    int64  
 11  creativity                    683 non-null    float64
 12  dreamteam_count               683 non-null    int64  
 13  ea_in

## Player table

The player table contains between 624 and 683 players each year (see
the counts in the summaries below) and 56 columns in the first three
years, 59 columns in the fourth year. In the fourth year the following
columns were added: *(creativity_rank, creativity_rank_type,
influence_rank, influence_rank_type, ict_index_rank,
ict_index_rank_type, threat_rank, threat_rank_type, news_added)*. The
following columns were removed: *(loans_in, loans_out, loaned_in,
loaned_out, ea_index)*.
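
The added and removed columns can be recovered by comparing the column sets of two seasons. The following is a minimal sketch on small stand-in frames rather than the real `praw` dictionary; the helper name `column_diff` and the toy data are assumptions made for illustration.

```python
import pandas as pd

def column_diff(old: pd.DataFrame, new: pd.DataFrame):
    """Return (added, removed) column names going from old to new, sorted."""
    old_cols, new_cols = set(old.columns), set(new.columns)
    return sorted(new_cols - old_cols), sorted(old_cols - new_cols)

# Toy stand-ins for two seasons of players_raw.csv (not the real data)
season_a = pd.DataFrame({"assists": [1], "ea_index": [0], "loans_in": [0]})
season_b = pd.DataFrame({"assists": [2], "threat_rank": [5], "news_added": ["x"]})

added, removed = column_diff(season_a, season_b)
print("added:  ", added)
print("removed:", removed)
```

With the notebook's data loaded, the equivalent call would be something like `column_diff(praw['2018-19'], praw['2019-20'])`.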

For yearly summary training, the following columns are not believed to
be useful: *(photo, news, news_added, in_dreamteam, first_name, web_name, second_name)*.

The following columns are not useful for yearly analysis but are
likely to be useful for weekly predictions:
*(chance_of_playing_next_round, chance_of_playing_this_round, ep_this,
ep_next, cost_change_event, cost_change_event_fall)*.
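
The `info()` output above lists *chance_of_playing_next_round* and *chance_of_playing_this_round* as `object` dtype, so they would probably need converting before being used as numeric features for weekly predictions. A minimal sketch follows, assuming missing values arrive as the string "None" (an assumption about the CSV contents, not something verified here) and should become NaN.

```python
import pandas as pd

# Toy frame; the "None" strings are an assumption about how missing values appear
df = pd.DataFrame({
    "chance_of_playing_next_round": ["100", "None", "75"],
    "chance_of_playing_this_round": ["None", "50", "0"],
})

weekly_cols = ["chance_of_playing_next_round", "chance_of_playing_this_round"]
for col in weekly_cols:
    # errors="coerce" turns anything that is not a number into NaN
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
```

Whether NaN should later be filled (for example with 100 for players with no injury flag) is a modelling choice that this sketch leaves open.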

Note that the columns *ep_this* and *ep_next* are FPL's own predictions
of points for this game and the next. We will remove them, as they are
likely to cause over-fitting in our own model.
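
One way to remove the FPL forecast columns before modelling is sketched below on a toy frame. The variable names `fpl_forecast_cols` and `features` are hypothetical, and the values are made up for illustration.

```python
import pandas as pd

# Columns holding FPL's own point forecasts
fpl_forecast_cols = ["ep_this", "ep_next"]

# Toy frame standing in for one season of player data
df = pd.DataFrame({
    "total_points": [120, 45],
    "ep_this": ["4.5", "2.0"],
    "ep_next": ["5.0", "1.5"],
})

# errors="ignore" keeps the call safe if a season lacks one of the columns
features = df.drop(columns=fpl_forecast_cols, errors="ignore")
print(list(features.columns))
```

The cell below takes a different route to the same end: it builds an explicit list of columns to keep and comments out the ones to exclude.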

In [19]:
useful_columns = ['assists',
                  'bonus',
                  'bps',
#                  'chance_of_playing_next_round',
#                  'chance_of_playing_this_round',
                  'clean_sheets',
                  'code',
#                  'cost_change_event',
#                  'cost_change_event_fall',
                  'cost_change_start',
                  'cost_change_start_fall',
                  'creativity',
                  'dreamteam_count',
#                  'ea_index',
                  'element_type',
#                  'ep_next',
#                  'ep_this',
                  'event_points',
                  'form',
                  'goals_conceded',
                  'goals_scored',
                  'ict_index',
#                  'id',
                  'in_dreamteam',
                  'influence',
                  'minutes',
                  'now_cost',
                  'own_goals',
                  'penalties_missed',
                  'penalties_saved',
                  'points_per_game',
                  'red_cards',
                  'saves',
                  'selected_by_percent',
                  'special',
                  'squad_number',
                  'status',
                  'team',
                  'team_code',
                  'threat',
                  'total_points',
                  'transfers_in',
                  'transfers_in_event',
                  'transfers_out',
                  'transfers_out_event',
                  'value_form',
                  'value_season',
                  'yellow_cards']


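# Print summary statistics for every useful column, season by season.
c = {}
for i in useful_columns:
    print("\n*** {:} ***".format(i))
    for y in years:
        # c[i] is overwritten on each pass, so it ends up holding the last season
        c[i] = praw[y][i]
        print("=== {:} ===".format(y))
        print(c[i].describe())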

    


*** assists ***
=== 2016-17 ===
count    683.000000
mean       1.373353
std        2.516163
min        0.000000
25%        0.000000
50%        0.000000
75%        2.000000
max       21.000000
Name: assists, dtype: float64
=== 2017-18 ===
count    647.000000
mean       1.401855
std        2.508659
min        0.000000
25%        0.000000
50%        0.000000
75%        2.000000
max       18.000000
Name: assists, dtype: float64
=== 2018-19 ===
count    624.000000
mean       1.495192
std        2.536408
min        0.000000
25%        0.000000
50%        0.000000
75%        2.000000
max       15.000000
Name: assists, dtype: float64
=== 2019-20 ===
count    666.000000
mean       1.352853
std        2.395509
min        0.000000
25%        0.000000
50%        0.000000
75%        2.000000
max       23.000000
Name: assists, dtype: float64

*** bonus ***
=== 2016-17 ===
count    683.000000
mean       3.582723
std        5.724629
min        0.000000
25%        0.000000
50%        0.000000
75%     


=== 2018-19 ===
count    624.000000
mean       0.030449
std        0.220977
min        0.000000
25%        0.000000
50%        0.000000
75%        0.000000
max        3.000000
Name: penalties_missed, dtype: float64
=== 2019-20 ===
count    666.000000
mean       0.030030
std        0.195434
min        0.000000
25%        0.000000
50%        0.000000
75%        0.000000
max        2.000000
Name: penalties_missed, dtype: float64

*** penalties_saved ***
=== 2016-17 ===
count    683.000000
mean       0.024890
std        0.189833
min        0.000000
25%        0.000000
50%        0.000000
75%        0.000000
max        2.000000
Name: penalties_saved, dtype: float64
=== 2017-18 ===
count    647.000000
mean       0.032457
std        0.230487
min        0.000000
25%        0.000000
50%        0.000000
75%        0.000000
max        3.000000
Name: penalties_saved, dtype: float64
=== 2018-19 ===
count    624.000000
mean       0.025641
std        0.217935
min        0.000000
25%        0.000000
