## Import Libraries
Let's import our `fpl_draft_league` tool, aliased as `fpl`, along with its `utils` module.

In [1]:
import sys
sys.path.append("../") # Enables importing from parent directory
import fpl_draft_league.fpl_draft_league as fpl
import fpl_draft_league.utils as utils

## Getting the data from draft.premierleague.com


In [6]:
utils.get_json('lee.gower17@gmail.com')

Enter Password:  ·······


## Inspecting the Data

Using `utils.get_data` we can load three useful dataframes:
* League entries
* Matches
* Current standings

In [2]:
league_entry_df = utils.get_data('league_entries')
matches_df = utils.get_data('matches')
standings_df = utils.get_data('standings')

The league entries dataframe contains all 10 league participants, with their IDs, names and waiver picks. The most useful part is probably the lookup between player names, team names and IDs. The waiver pick may also be interesting to compare against performance.

In [3]:
league_entry_df.head()

Unnamed: 0,entry_id,entry_name,id,joined_time,player_first_name,player_last_name,short_name,waiver_pick
0,144241,HanksYouAndGoodnight,144429,2019-07-26T13:41:41.866344Z,Benji,Hanks,BH,4
1,173515,RP’s Perks,173794,2019-07-29T14:42:11.017260Z,Rebecca,Perkins,RP,2
2,173968,Wado Wanderers,174248,2019-07-29T15:18:11.896421Z,John,Wadelin,JW,3
3,174011,Liquid Football,174291,2019-07-29T15:22:02.928452Z,James,Fry,JF,6
4,174687,Mein Bergkampf,174967,2019-07-29T16:15:32.391771Z,Liam,Gower,LG,8
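The lookup mentioned above could be built along these lines. This is a minimal sketch using three rows copied from the output; it assumes `id` is the key that the other dataframes refer to, which matches the `league_entry` values shown in the standings and matches outputs. The names `id_to_name` and `id_to_team` are illustrative, not part of `fpl_draft_league`.

```python
import pandas as pd

# Three sample rows copied from the league_entry_df output
league_entry_df = pd.DataFrame({
    "entry_name": ["HanksYouAndGoodnight", "RP’s Perks", "Wado Wanderers"],
    "id": [144429, 173794, 174248],
    "player_first_name": ["Benji", "Rebecca", "John"],
})

# Assumption: `id` is the league entry key used by matches and standings
indexed = league_entry_df.set_index("id")
id_to_name = indexed["player_first_name"].to_dict()  # illustrative name
id_to_team = indexed["entry_name"].to_dict()         # illustrative name

print(id_to_name[144429], "-", id_to_team[144429])
```

With dictionaries like these, a column of IDs can be turned into names with `Series.map`.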


The standings dataframe is straightforward: one row per team (10 in total) with its points, scores and rank. The one limitation is that this is a "BC" (Business Current) view, i.e. a snapshot of the table as it stands now. It would be good to see the rankings over time so you can spot the movers and shakers.

In [4]:
standings_df.head()

Unnamed: 0,last_rank,league_entry,matches_drawn,matches_lost,matches_played,matches_won,points_against,points_for,rank,rank_sort,total
0,1,199619,0,7,38,18,873,1113,1,1,54
1,2,361492,0,9,38,16,926,941,2,2,48
2,3,174967,2,10,38,13,937,1028,3,3,41
3,4,405873,0,12,38,13,1001,1100,4,4,39
4,5,174291,0,12,38,13,973,1014,5,5,39


The matches dataframe has every match, including unplayed ones, with details of who played whom and what each side scored. The `winning_league_entry` and `winning_method` columns are empty ("None") for every row, so I'm not sure what they are meant to hold.

In [5]:
matches_df.head()

Unnamed: 0,event,finished,league_entry_1,league_entry_1_points,league_entry_2,league_entry_2_points,started,winning_league_entry,winning_method
0,1,True,197646,46,144429,43,True,,
1,1,True,199619,36,177091,30,True,,
2,1,True,361492,40,174967,30,True,,
3,1,True,405873,69,174291,51,True,,
4,1,True,173794,23,174248,68,True,,
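Since `winning_league_entry` is empty, the winner can be derived from the two points columns instead. The sketch below uses three rows copied from the output; the `derived_winner` column is a hypothetical name, and leaving unplayed or drawn matches as NaN is an assumption about how you would want to treat them.

```python
import numpy as np
import pandas as pd

# Three sample rows copied from the matches_df output
matches_df = pd.DataFrame({
    "event": [1, 1, 1],
    "finished": [True, True, True],
    "league_entry_1": [197646, 199619, 173794],
    "league_entry_1_points": [46, 36, 23],
    "league_entry_2": [144429, 177091, 174248],
    "league_entry_2_points": [43, 30, 68],
})

finished = matches_df["finished"]
p1 = matches_df["league_entry_1_points"]
p2 = matches_df["league_entry_2_points"]

# Hypothetical column: the winner's league entry id.
# Draws and unfinished matches fall through to the NaN default.
matches_df["derived_winner"] = np.select(
    [finished & (p1 > p2), finished & (p2 > p1)],
    [matches_df["league_entry_1"], matches_df["league_entry_2"]],
    default=np.nan,
)

print(matches_df[["event", "derived_winner"]])
```

Because NaN is used as the default, the column comes back as floats rather than integers.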


## Standings Over Time

The first thing I want to explore is league standings over time (week by week). 

I realise that with all of the match data in `matches_df` I can rebuild the history of standings. The tricky part is that `matches_df` has one row per matchup, not one row per team per match. That makes it hard to plot, because plotting needs one row per team, week and result.

The `fpl.get_points_over_time` function builds a row per team's match and then plots the standings over time. Below, `fpl.get_matches_stacked` is called to get the stacked, row-per-team dataframe on its own.

In [7]:
stacked_df = fpl.get_matches_stacked(matches_df, league_entry_df)
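To make the reshaping concrete, here is a rough sketch of how a row-per-matchup table can be stacked into a row per team per match. It is not the implementation inside `fpl_draft_league`; `stack_matches` is a hypothetical helper. The 3/1/0 points for a win/draw/loss are inferred from the standings output (13 wins and 2 draws give a total of 41).

```python
import pandas as pd

# Two sample rows copied from the matches_df output
matches_df = pd.DataFrame({
    "event": [1, 1],
    "finished": [True, True],
    "league_entry_1": [197646, 199619],
    "league_entry_1_points": [46, 36],
    "league_entry_2": [144429, 177091],
    "league_entry_2_points": [43, 30],
})


def stack_matches(matches):
    """Hypothetical helper: return one row per team per finished match."""
    played = matches[matches["finished"]]
    # View each matchup once from each side
    side_1 = played.rename(columns={
        "league_entry_1": "team", "league_entry_1_points": "score",
        "league_entry_2_points": "opponent_score",
    })
    side_2 = played.rename(columns={
        "league_entry_2": "team", "league_entry_2_points": "score",
        "league_entry_1_points": "opponent_score",
    })
    cols = ["event", "team", "score", "opponent_score"]
    stacked = pd.concat([side_1[cols], side_2[cols]], ignore_index=True)
    stacked["margin"] = stacked["score"] - stacked["opponent_score"]
    # 3 points for a win, 1 for a draw, 0 for a loss
    stacked["points"] = stacked["margin"].apply(
        lambda m: 3 if m > 0 else (1 if m == 0 else 0)
    )
    return stacked.sort_values(["event", "team"]).reset_index(drop=True)


stacked = stack_matches(matches_df)
# A running total per team is the raw material for standings over time
stacked["cumulative_points"] = stacked.groupby("team")["points"].cumsum()
print(stacked)
```

Ranking each week by `cumulative_points` would give an approximate table per gameweek; any tie-break rule the real league applies is not modelled here.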

## Streaks
The next thing I want to explore is winning streaks.
* Who holds the record?!
* Who is on a hot streak right now and worth watching out for?

In [8]:
df = fpl.get_streaks(stacked_df)
df.head()

Unnamed: 0,match,team,score,points,margin,binary,streak
2,1,Benji,43,0,-3,-1,-1
18,2,Benji,25,3,1,1,1
22,3,Benji,27,0,-12,-1,-1
30,4,Benji,38,3,9,1,1
43,5,Benji,20,0,-10,-1,-1
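For readers wondering how a signed `streak` column like this can be derived from `binary`, the sketch below shows one common pandas approach. It is an assumption about the technique, not the code inside `fpl.get_streaks`, and `add_streaks` is a hypothetical helper. Benji's rows are copied from the output; the "Example" team is made up to show longer runs.

```python
import pandas as pd

df = pd.DataFrame({
    "match": [1, 2, 3, 4, 5] * 2,
    "team": ["Benji"] * 5 + ["Example"] * 5,
    # Benji's results come from the output; "Example" is hypothetical data
    "binary": [-1, 1, -1, 1, -1] + [1, 1, 1, -1, -1],
})


def add_streaks(results):
    """Hypothetical helper: signed length of each team's current run."""
    out = results.sort_values(["team", "match"]).copy()
    # A new run starts when the result differs from the team's previous result
    # (the first match of each team compares against NaN, so it also starts a run)
    new_run = out["binary"] != out.groupby("team")["binary"].shift()
    run_id = new_run.cumsum()
    # Position within the run (1, 2, 3, ...) times the sign of the result
    run_length = out.groupby(run_id).cumcount() + 1
    out["streak"] = run_length * out["binary"]
    return out


print(add_streaks(df))
```

Wins count upwards (1, 2, 3) and losses count downwards (-1, -2, -3), which is the pattern visible in the `streak` column.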


### What are people's record streaks?

In [9]:
df[['team', 'streak']].groupby(['team']).max().sort_values(by='streak', ascending=False)

team,streak
Cory,5
Dave,5
ben,5
Liam,4
Benji,3
Huw,3
James,3
Rebecca,3
Thomas,3
John,2


### Who's on the hot streak now?

In [10]:
df[df['match'] == df.match.max()].sort_values(by='streak', ascending=False)

Unnamed: 0,match,team,score,points,margin,binary,streak
241,25,Dave,52,3,17,1,5
244,25,James,54,3,9,1,2
246,25,Rebecca,33,3,3,1,2
240,25,Benji,47,3,7,1,1
249,25,Thomas,42,3,2,1,1
247,25,Cory,40,0,-7,-1,-1
243,25,Liam,40,0,-2,-1,-1
248,25,Huw,45,0,-9,-1,-2
245,25,John,35,0,-17,-1,-3
242,25,ben,30,0,-3,-1,-3


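## Gameweek High Scores

In [8]:
# Group the stacked (row-per-team) matches by gameweek
matches_group = stacked_df.groupby('match')

In [None]:
matches_group.groups

In [24]:
# idxmax returns index labels, so select the rows with .loc rather than .iloc
gw_highscores = stacked_df.loc[matches_group['score'].idxmax()]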

In [29]:
gw_highscores

Unnamed: 0,match,team,score,points
3,1,Cory,69,3
13,2,Thomas,39,3
23,3,Liam,47,3
37,4,Liam,63,3
49,5,Cory,58,3
52,6,Liam,80,3
60,7,Benji,63,3
76,8,Dave,58,3
86,9,Dave,59,3
99,10,ben,83,3


In [28]:
gw_highscores[['team','score']].groupby('team').count().sort_values(by='score', ascending=False)

team,score
Cory,6
Liam,4
Dave,3
James,2
ben,2
Benji,1
Huw,1
Rebecca,1
Thomas,1
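One caveat with `idxmax` is that it returns only the first row when two teams tie for the top score in a gameweek. A possible alternative, sketched below on made-up scores (teams A, B and C are hypothetical), compares every score against its gameweek maximum so that tied teams are all counted.

```python
import pandas as pd

# Hypothetical scores, invented to show a tie in match 2
stacked = pd.DataFrame({
    "match": [1, 1, 2, 2, 2],
    "team": ["A", "B", "A", "B", "C"],
    "score": [69, 51, 39, 39, 30],
})

# Broadcast each gameweek's top score back onto every row of that gameweek
top_score = stacked.groupby("match")["score"].transform("max")

# Keep every row that matches the top score, so ties are all retained
gw_highscores = stacked[stacked["score"] == top_score]

# Number of gameweek high scores per team
counts = gw_highscores["team"].value_counts()
print(counts)
```

Whether a shared high score should count for both teams is a judgment call; this sketch counts it for each.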


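In [17]:
def find_highscores(group):
    # Tag every row in a gameweek with the index label of that gameweek's top-scoring row
    group = group.copy()
    group['gw_highscore_index'] = group['score'].idxmax()
    return group

In [18]:
# Apply the function to each gameweek's rows. Calling find_highscores(matches_group)
# directly raises "TypeError: 'DataFrameGroupBy' object does not support item assignment".
df = matches_group.apply(find_highscores)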