In [27]:
import pandas as pd
import numpy as np
import os
import sys
from pathlib import Path
from IPython.display import display

sys.path.append('..')

In [28]:
# path to project directory
path = Path('../')

In [29]:
# read in training dataset
train_df = pd.read_csv(path/'fpl_predictor/data/train_v8.csv', 
                       index_col=0, 
                       dtype={'season':str,
                              'squad':str,
                              'comp':str})

## The FPL dataset

These are the fields in the base dataset which are updated after the conclusion of every gameweek.

In [30]:
# summary of fields
train_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 114801 entries, 0 to 114802
Data columns (total 59 columns):
 #   Column                                      Non-Null Count   Dtype  
---  ------                                      --------------   -----  
 0   player                                      114801 non-null  object 
 1   gw                                          114801 non-null  int64  
 2   position                                    114801 non-null  int64  
 3   minutes                                     114801 non-null  int64  
 4   team                                        114801 non-null  object 
 5   opponent_team                               114801 non-null  object 
 6   relative_market_value_team                  46738 non-null   float64
 7   relative_market_value_opponent_team         46715 non-null   float64
 8   was_home                                    114801 non-null  bool   
 9   total_points                                114801 non-null  int64  
 

Each row represents one player's performance in a single fixture, and will be unique across the player's name, their team, and kickoff time fields:

- player (player name)
- team (the player's team)
- kickoff_time (kickoff time for the fixture)
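
The uniqueness claim above can be checked directly. This is a minimal sketch on a toy frame (the players, teams and the deliberate duplicate are made up for illustration); the same check could be run on `train_df` with the three key columns.

```python
import pandas as pd

# Toy frame standing in for train_df; only the three key columns matter here.
# The last two rows are a deliberate duplicate so the check has something to find.
toy_df = pd.DataFrame({
    'player': ['Player A', 'Player A', 'Player B', 'Player B'],
    'team': ['Team X', 'Team X', 'Team Y', 'Team Y'],
    'kickoff_time': ['2016-08-13T14:00:00Z', '2016-08-20T14:00:00Z',
                     '2016-08-13T14:00:00Z', '2016-08-13T14:00:00Z'],
})

key_cols = ['player', 'team', 'kickoff_time']

# keep=False flags every row that shares its key with at least one other row
duplicate_rows = toy_df[toy_df.duplicated(subset=key_cols, keep=False)]
n_duplicates = len(duplicate_rows)
print(n_duplicates)
```

If the key really is unique in the full dataset, `duplicate_rows` should come back empty.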

The fixtures are further defined with the following fields:

- opponent_team (the opposition team)
- was_home (was it a home game for the player)
- season (e.g. '1920' for the 2019/20 season)
- gw (the FPL gameweek in which the fixture occurred)
- crowds (were there crowds present at the match)

There is evidence that the lack of crowds during the Covid-19 pandemic reduced home advantage, so I added a field for this. All games played between the 15th March 2020 and 17th June 2021 are marked as having no crowds. In practice some fans were allowed into certain stadiums in November 2020, but this affected relatively few matches with very few fans, so I left them all as false.

Note that there can be multiple fixtures (i.e. rows for a given player) in a single gameweek: so-called double or triple gameweeks.
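
One way to spot these is to count rows per player, season and gameweek. The sketch below uses a toy frame with made-up players; the column names match the dataset, but `n_fixtures` is just a name chosen for the example.

```python
import pandas as pd

# Toy frame: Player A has two fixtures in gameweek 35 (a double gameweek)
toy_df = pd.DataFrame({
    'player': ['Player A', 'Player A', 'Player A', 'Player B'],
    'season': ['1819', '1819', '1819', '1819'],
    'gw': [34, 35, 35, 35],
    'kickoff_time': ['2019-04-13T14:00:00Z', '2019-04-21T12:30:00Z',
                     '2019-04-24T19:00:00Z', '2019-04-20T14:00:00Z'],
})

# count fixtures (rows) per player per gameweek
fixtures_per_gw = (toy_df.groupby(['player', 'season', 'gw'])
                   .size()
                   .rename('n_fixtures')
                   .reset_index())

# anything above 1 is a double (or triple) gameweek for that player
multi_fixture_gws = fixtures_per_gw[fixtures_per_gw['n_fixtures'] > 1]
print(multi_fixture_gws)
```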

The position that a player plays is also given. This will be consistent for each player within a season, but may change between seasons:

- position (1 - goalkeeper, 2 - defender, 3 - midfielder, 4 - forward)

There are then fields with the player's (or team's) FPL metrics for the fixture, e.g. the number of minutes played, points scored, assists, goals, goals conceded while on the field, etc. Anything that is used in the game. These should be 100% complete for all rows.

There are also further stats from the start of the 2017/18 season, taken from sources outside of the FPL game, such as expected goals, expected assists, passes, dribbles, interceptions, etc. These will only be present if the player played (i.e. had at least 1 in the minutes field) in the fixture, otherwise the fields are null.

Other incomplete fields for FPL data are:

- transfer and selected values (transfers_in, transfers_out, transfers_balance, selected) - these were only collected from the start of the 2019/20 season, and require further investigation as to what they actually represent (in other words, treat with caution when modelling); values prior to the 2019/20 season are set to 0
- play_proba - again only collected from the start of the 2019/20 season, this is the probability that the player would actually be available for the fixture according to the FPL website (note that the time at which this is captured each week varies); values prior to the 2019/20 season are null, and they are also null for any new players in a given gameweek (i.e. players that FPL has added to the game during that gameweek)
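
A quick way to see where the incomplete fields start being populated is to look at the share of nulls per season. This is a sketch on a toy frame with invented values; `xg` and `play_proba` stand in for the wider set of partially populated columns.

```python
import numpy as np
import pandas as pd

# Toy frame: xg only exists from 1718 (and only when minutes > 0),
# play_proba only exists from 1920
toy_df = pd.DataFrame({
    'season': ['1617', '1617', '1718', '1718', '1920', '1920'],
    'minutes': [90, 0, 90, 0, 90, 0],
    'xg': [np.nan, np.nan, 0.3, np.nan, 0.1, np.nan],
    'play_proba': [np.nan, np.nan, np.nan, np.nan, 1.0, 0.75],
})

cols = ['xg', 'play_proba']

# share of null values in each column, by season
null_share = (toy_df[cols].isna()
              .astype(float)
              .groupby(toy_df['season'])
              .mean())
print(null_share)
```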

Finally, team transfer market values are taken from Transfermarkt, either each week (from the 2019/20 season onwards) or as a single value for the whole season:

- relative_market_value_team - the market value for the team taken during that gameweek (non null from start of 2019/20 season)
- relative_market_value_opponent_team - the market value for the opposition team taken during that gameweek (non null from start of 2019/20 season)
- relative_market_value_team_season - a single value for the team's value from the start of each season
- relative_market_value_opponent_team_season - a single value for the opposition team's value from the start of each season

In [5]:
# take a look at some data
pd.options.display.max_columns = None
train_df.head(10)

Unnamed: 0,player,gw,position,minutes,team,opponent_team,relative_market_value_team,relative_market_value_opponent_team,was_home,total_points,assists,bonus,bps,clean_sheets,creativity,goals_conceded,goals_scored,ict_index,influence,own_goals,penalties_missed,penalties_saved,red_cards,saves,selected,team_a_score,team_h_score,threat,transfers_balance,transfers_in,transfers_out,yellow_cards,kickoff_time,season,play_proba,relative_market_value_team_season,relative_market_value_opponent_team_season,date,squad,comp,shots_total,shots_on_target,touches,pressures,tackles,interceptions,blocks,xg,npxg,xa,sca,gca,passes_completed,passes,passes_pct,carries,dribbles_completed,dribbles,crowds
0,Aaron Cresswell,1,2,0,West Ham United,Chelsea,,,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,14023,1,2,0.0,0,0,0,0,2016-08-15T19:00:00Z,1617,,0.895471,2.243698,2016-08-15,,,,,,,,,,,,,,,,,,,,,True
1,Aaron Lennon,1,3,15,Everton,Tottenham Hotspur,,,True,1,0,0,6,0,0.3,0,0,0.9,8.2,0,0,0,0,0,13918,1,1,0.0,0,0,0,0,2016-08-13T14:00:00Z,1617,,1.057509,1.43369,2016-08-13,,,,,,,,,,,,,,,,,,,,,True
2,Aaron Ramsey,1,3,60,Arsenal,Liverpool,,,True,2,0,0,5,0,4.9,3,0,3.0,2.2,0,0,0,0,0,163170,4,3,23.0,0,0,0,0,2016-08-14T15:00:00Z,1617,,1.944129,1.46586,2016-08-14,,,,,,,,,,,,,,,,,,,,,True
3,Abdoulaye Doucouré,1,3,0,Watford,Southampton,,,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,1051,1,1,0.0,0,0,0,0,2016-08-13T14:00:00Z,1617,,0.7042,0.796805,2016-08-13,,,,,,,,,,,,,,,,,,,,,True
4,Abdul Rahman Baba,1,2,0,Chelsea,West Ham United,,,True,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,1243,1,2,0.0,0,0,0,0,2016-08-15T19:00:00Z,1617,,2.243698,0.895471,2016-08-15,,,,,,,,,,,,,,,,,,,,,True
5,Abel Hernández,1,4,90,Hull City,Leicester City,,,True,5,1,0,10,0,12.2,1,0,5.7,14.4,0,0,0,0,0,26039,1,2,30.0,0,0,0,0,2016-08-13T11:30:00Z,1617,,0.494447,0.650832,2016-08-13,,,,,,,,,,,,,,,,,,,,,True
6,Adama Diomande,1,4,90,Hull City,Leicester City,,,True,8,0,2,29,0,16.8,1,1,10.7,45.2,0,0,0,0,0,38151,1,2,45.0,0,0,0,0,2016-08-13T11:30:00Z,1617,,0.494447,0.650832,2016-08-13,,,,,,,,,,,,,,,,,,,,,True
7,Adam Clayton,1,3,90,Middlesbrough,Stoke City,,,True,2,0,0,6,0,2.2,1,0,1.4,3.2,0,0,0,0,0,17663,1,1,9.0,0,0,0,0,2016-08-13T14:00:00Z,1617,,0.452793,0.718705,2016-08-13,,,,,,,,,,,,,,,,,,,,,True
8,Adam Federici,1,1,0,Bournemouth,Manchester United,,,True,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,4315,3,1,0.0,0,0,0,0,2016-08-14T12:30:00Z,1617,,0.384921,1.983179,2016-08-14,,,,,,,,,,,,,,,,,,,,,True
9,Adam Forshaw,1,3,69,Middlesbrough,Stoke City,,,True,1,0,0,3,0,1.3,1,0,0.3,2.0,0,0,0,0,0,2723,1,1,0.0,0,0,0,1,2016-08-13T14:00:00Z,1617,,0.452793,0.718705,2016-08-13,,,,,,,,,,,,,,,,,,,,,True


In [6]:
# looking at start of 2017/18 season when the additional stats are available
train_df[train_df['season'] == '1718'].head(10)

Unnamed: 0,player,gw,position,minutes,team,opponent_team,relative_market_value_team,relative_market_value_opponent_team,was_home,total_points,assists,bonus,bps,clean_sheets,creativity,goals_conceded,goals_scored,ict_index,influence,own_goals,penalties_missed,penalties_saved,red_cards,saves,selected,team_a_score,team_h_score,threat,transfers_balance,transfers_in,transfers_out,yellow_cards,kickoff_time,season,play_proba,relative_market_value_team_season,relative_market_value_opponent_team_season,date,squad,comp,shots_total,shots_on_target,touches,pressures,tackles,interceptions,blocks,xg,npxg,xa,sca,gca,passes_completed,passes,passes_pct,carries,dribbles_completed,dribbles,crowds
23679,Aaron Cresswell,1,2,9,West Ham United,Manchester United,,,False,0,0,0,3,0,0.6,2,0,1.9,0.4,0,0,0,0,0,25136,0,4,18.0,0,0,0,0,2017-08-13T15:00:00Z,1718,,0.86633,2.110135,2017-08-13,West Ham,Premier League,1.0,0.0,15.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,9.0,12.0,75.0,11.0,0.0,0.0,True
23680,Aaron Lennon,1,3,0,Everton,Stoke City,,,True,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,4681,0,1,0.0,0,0,0,0,2017-08-12T14:00:00Z,1718,,1.134226,0.581587,2017-08-12,,,,,,,,,,,,,,,,,,,,,True
23681,Aaron Mooy,1,3,90,Huddersfield Town,Crystal Palace,,,False,6,1,0,22,1,46.9,0,0,8.7,40.2,0,0,0,0,0,59955,3,0,0.0,0,0,0,0,2017-08-12T14:00:00Z,1718,,0.210654,0.635984,2017-08-12,Huddersfield,Premier League,0.0,0.0,71.0,11.0,2.0,2.0,1.0,0.0,0.0,0.5,4.0,1.0,45.0,64.0,70.3,44.0,1.0,1.0,True
23682,Aaron Ramsey,1,3,23,Arsenal,Leicester City,,,True,6,0,0,16,0,11.2,0,1,6.7,29.6,0,0,0,0,0,33792,3,4,26.0,0,0,0,0,2017-08-11T18:45:00Z,1718,,2.0735,0.824624,2017-08-11,Arsenal,Premier League,4.0,1.0,19.0,6.0,0.0,0.0,1.0,0.3,0.3,0.2,2.0,0.0,9.0,11.0,81.8,13.0,0.0,0.0,True
23683,Abdoulaye Doucouré,1,3,90,Watford,Liverpool,,,True,9,0,2,36,0,25.2,3,1,10.9,48.6,0,0,0,0,0,1207,3,3,35.0,0,0,0,0,2017-08-12T11:30:00Z,1718,,0.547242,1.619155,2017-08-12,Watford,Premier League,1.0,1.0,79.0,12.0,1.0,1.0,2.0,0.6,0.6,0.1,3.0,0.0,59.0,70.0,84.3,54.0,2.0,2.0,True
23684,Adam Federici,1,1,0,Bournemouth,West Bromwich Albion,,,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,224542,0,1,0.0,0,0,0,0,2017-08-12T14:00:00Z,1718,,0.379765,0.541354,2017-08-12,,,,,,,,,,,,,,,,,,,,,True
23685,Adam Lallana,1,3,0,Liverpool,Watford,,,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,45019,3,3,0.0,0,0,0,0,2017-08-12T11:30:00Z,1718,,1.619155,0.547242,2017-08-12,,,,,,,,,,,,,,,,,,,,,True
23686,Adam Smith,1,2,10,Bournemouth,West Bromwich Albion,,,False,1,0,0,2,0,0.6,0,0,0.3,2.2,0,0,0,0,0,50168,0,1,0.0,0,0,0,0,2017-08-12T14:00:00Z,1718,,0.379765,0.541354,2017-08-12,Bournemouth,Premier League,0.0,0.0,7.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,6.0,66.7,6.0,0.0,0.0,True
23687,Ademola Lookman,1,3,0,Everton,Stoke City,,,True,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,8034,0,1,0.0,0,0,0,0,2017-08-12T14:00:00Z,1718,,1.134226,0.581587,2017-08-12,,,,,,,,,,,,,,,,,,,,,True
23688,Adrian Mariappa,1,2,0,Watford,Liverpool,,,True,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,171365,3,3,0.0,0,0,0,0,2017-08-12T11:30:00Z,1718,,0.547242,1.619155,2017-08-12,,,,,,,,,,,,,,,,,,,,,True


Since this is a time series problem, we have some functions that create various rolling totals and averages for players and teams:

In [7]:
def player_lag_features(df, features, lags):    
    df_new = df.copy()
    player_lag_vars = []
    
    # need minutes for per game stats; build a new list with it at the front
    # so that the caller's list is not modified in place
    features = ['minutes'] + list(features)

    # calculate totals for each lag period
    for feature in features:
        for lag in lags:
            feature_name = feature + '_last_' + str(lag)
            minute_name = 'minutes_last_' + str(lag)
            
            if lag == 'all':
                df_new[feature_name] = df_new.groupby(['player'])[feature].apply(lambda x: x.cumsum() - x)
            else: 
                df_new[feature_name] = df_new.groupby(['player'])[feature].apply(lambda x: x.rolling(min_periods=1, 
                                                                                            window=lag+1).sum() - x)
            if feature != 'minutes':

                pg_feature_name = feature + '_pg_last_' + str(lag)
                player_lag_vars.append(pg_feature_name)
                
                df_new[pg_feature_name] = 90 * df_new[feature_name] / df_new[minute_name]
                
                # some cases of -1 points and 0 minutes cause -inf values
                # change these to NaN
                df_new[pg_feature_name] = df_new[pg_feature_name].replace([np.inf, -np.inf], np.nan)
            
            else: player_lag_vars.append(minute_name)
                
    return df_new, player_lag_vars
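
The key trick in the function above is taking a rolling (or cumulative) sum that includes the current row and then subtracting the current row, which leaves the total over the previous rows only. Here is a small sketch of that idea on made-up numbers. Note that it uses `transform` rather than the `apply` used in the notebook, simply because `transform` always returns a result aligned to the original index; that is a substitution for the example, not the notebook's code.

```python
import pandas as pd

# Toy frame, already sorted in fixture order within each player
toy_df = pd.DataFrame({
    'player': ['A', 'A', 'A', 'A', 'B', 'B'],
    'total_points': [2, 6, 9, 1, 5, 8],
})

lag = 2
points = toy_df.groupby('player')['total_points']

# a window of lag + 1 rows includes the current row;
# subtracting it leaves the sum over the previous `lag` rows
toy_df['total_points_last_2'] = points.transform(
    lambda x: x.rolling(window=lag + 1, min_periods=1).sum() - x)

# same idea for the 'all' case: cumulative sum minus the current row
toy_df['total_points_last_all'] = points.transform(lambda x: x.cumsum() - x)

print(toy_df)
```

The first row for each player gets 0 because there is no history yet, which is why the per game values for those rows end up null.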

In [30]:
# team level lag features
def team_lag_features(df, features, lags):
    team_lag_vars = []
    df_new = df.copy()
    
    for feature in features:
        feature_team_name = feature + '_team'
        feature_conceded_team_name = feature_team_name + '_conceded'
        feature_team = (df.groupby(['team', 'season', 'gw',
                                   'kickoff_time', 'opponent_team'])
                        [feature].sum().rename(feature_team_name).reset_index())
        
        # join back for points conceded
        feature_team = feature_team.merge(feature_team,
                           left_on=['team', 'season', 'gw',
                                    'kickoff_time', 'opponent_team'],
                           right_on=['opponent_team', 'season', 'gw',
                                     'kickoff_time', 'team'],
                           how='left',
                           suffixes = ('', '_conceded'))
                
        feature_team.drop(['team_conceded', 'opponent_team_conceded'], axis=1, inplace=True)
                
        for lag in lags:
            feature_name = feature + '_team_last_' + str(lag)
            feature_conceded_name = feature + '_team_conceded_last_' + str(lag)
            pg_feature_name = feature + '_team_pg_last_' + str(lag)
            pg_feature_conceded_name = feature + '_team_conceded_pg_last_' + str(lag)
            
            team_lag_vars.extend([pg_feature_name])#, pg_feature_conceded_name])
            
            if lag == 'all':
                feature_team[feature_name] = (feature_team.groupby('team')[feature_team_name]
                                              .apply(lambda x: x.cumsum() - x))
                
                feature_team[feature_conceded_name] = (feature_team.groupby('team')[feature_conceded_team_name]
                                              .apply(lambda x: x.cumsum() - x))
                
                feature_team[pg_feature_name] = (feature_team[feature_name]
                                                 / feature_team.groupby('team').cumcount())
                
                feature_team[pg_feature_conceded_name] = (feature_team[feature_conceded_name]
                                                 / feature_team.groupby('team').cumcount())
                
            else:
                feature_team[feature_name] = (feature_team.groupby('team')[feature_team_name]
                                              .apply(lambda x: x.rolling(min_periods=1, 
                                                                         window=lag + 1).sum() - x))
                
                feature_team[feature_conceded_name] = (feature_team.groupby('team')[feature_conceded_team_name]
                                              .apply(lambda x: x.rolling(min_periods=1, 
                                                                         window=lag + 1).sum() - x))
                
                feature_team[pg_feature_name] = (feature_team[feature_name] / 
                                                 feature_team.groupby('team')[feature_team_name]
                                                 .apply(lambda x: x.rolling(min_periods=1, 
                                                                            window=lag + 1).count() - 1))
                
                feature_team[pg_feature_conceded_name] = (feature_team[feature_conceded_name] / 
                                                 feature_team.groupby('team')[feature_conceded_name]
                                                 .apply(lambda x: x.rolling(min_periods=1, 
                                                                            window=lag + 1).count() - 1))
        
        df_new = df_new.merge(feature_team, 
                          on=['team', 'season', 'gw', 'kickoff_time', 'opponent_team'], 
                          how='left')
        
        df_new = df_new.merge(feature_team,
                 left_on=['team', 'season', 'gw', 'kickoff_time', 'opponent_team'],
                 right_on=['opponent_team', 'season', 'gw', 'kickoff_time', 'team'],
                 how='left',
                 suffixes = ('', '_opponent'))
        
        df_new.drop(['team_opponent', 'opponent_team_opponent'], axis=1, inplace=True)
        
    team_lag_vars = team_lag_vars + [team_lag_var + '_opponent' for team_lag_var in team_lag_vars]  

    return df_new, team_lag_vars
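
The "conceded" values in the function above come from merging the team totals with themselves, with the team and opponent keys swapped. A minimal sketch of that self-merge on a single made-up fixture (the team names and point totals are illustrative only):

```python
import pandas as pd

# One row per team per fixture, as produced by the groupby/sum step
team_totals = pd.DataFrame({
    'team': ['Team X', 'Team Y'],
    'opponent_team': ['Team Y', 'Team X'],
    'kickoff_time': ['2016-08-13T14:00:00Z', '2016-08-13T14:00:00Z'],
    'total_points_team': [37, 26],
})

# match each team's row to its opponent's row for the same fixture;
# the opponent's total is what this team "conceded"
with_conceded = team_totals.merge(
    team_totals,
    left_on=['team', 'opponent_team', 'kickoff_time'],
    right_on=['opponent_team', 'team', 'kickoff_time'],
    how='left',
    suffixes=('', '_conceded'))

# the swapped key columns from the right-hand side are redundant
with_conceded = with_conceded.drop(columns=['team_conceded',
                                            'opponent_team_conceded'])
print(with_conceded)
```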

In [10]:
lag_train_df, team_lag_vars = team_lag_features(train_df, ['total_points'], ['all', 1, 2, 3])

In [11]:
team_lag_vars

['total_points_team_pg_last_all',
 'total_points_team_pg_last_1',
 'total_points_team_pg_last_2',
 'total_points_team_pg_last_3',
 'total_points_team_pg_last_all_opponent',
 'total_points_team_pg_last_1_opponent',
 'total_points_team_pg_last_2_opponent',
 'total_points_team_pg_last_3_opponent']

In [33]:
lag_train_df[lag_train_df['player'] == 'Kevin De Bruyne']

Unnamed: 0,player,gw,position,minutes,team,opponent_team,relative_market_value_team,relative_market_value_opponent_team,was_home,total_points,assists,bonus,bps,clean_sheets,creativity,goals_conceded,goals_scored,ict_index,influence,own_goals,penalties_missed,penalties_saved,red_cards,saves,selected,team_a_score,team_h_score,threat,transfers_balance,transfers_in,transfers_out,yellow_cards,kickoff_time,season,play_proba,relative_market_value_team_season,relative_market_value_opponent_team_season,date,squad,comp,shots_total,shots_on_target,touches,pressures,tackles,interceptions,blocks,xg,npxg,xa,sca,gca,passes_completed,passes,passes_pct,carries,dribbles_completed,dribbles,crowds,total_points_team,total_points_team_conceded,total_points_team_last_all,total_points_team_conceded_last_all,total_points_team_pg_last_all,total_points_team_conceded_pg_last_all,total_points_team_last_1,total_points_team_conceded_last_1,total_points_team_pg_last_1,total_points_team_conceded_pg_last_1,total_points_team_last_2,total_points_team_conceded_last_2,total_points_team_pg_last_2,total_points_team_conceded_pg_last_2,total_points_team_last_3,total_points_team_conceded_last_3,total_points_team_pg_last_3,total_points_team_conceded_pg_last_3,total_points_team_opponent,total_points_team_conceded_opponent,total_points_team_last_all_opponent,total_points_team_conceded_last_all_opponent,total_points_team_pg_last_all_opponent,total_points_team_conceded_pg_last_all_opponent,total_points_team_last_1_opponent,total_points_team_conceded_last_1_opponent,total_points_team_pg_last_1_opponent,total_points_team_conceded_pg_last_1_opponent,total_points_team_last_2_opponent,total_points_team_conceded_last_2_opponent,total_points_team_pg_last_2_opponent,total_points_team_conceded_pg_last_2_opponent,total_points_team_last_3_opponent,total_points_team_conceded_last_3_opponent,total_points_team_pg_last_3_opponent,total_points_team_conceded_pg_last_3_opponent
297,Kevin De Bruyne,1,3,90,Manchester City,Sunderland,,,True,2,0,0,6,0,25.9,1,0,5.2,3.2,0,0,0,0,0,176498,1,2,23.0,0,0,0,0,2016-08-13T16:30:00Z,1617,,2.311012,0.418392,2016-08-13,,,,,,,,,,,,,,,,,,,,,True,37,26.0,0,0.0,,,0.0,0.0,,,0.0,0.0,,,0.0,0.0,,,26.0,37.0,0.0,0.0,,,0.0,0.0,,,0.0,0.0,,,0.0,0.0,,
826,Kevin De Bruyne,2,3,87,Manchester City,Stoke City,,,False,4,1,0,19,0,51.8,1,0,8.5,21.2,0,0,0,0,0,199367,4,1,12.0,-7066,6203,13269,1,2016-08-20T11:30:00Z,1617,,2.311012,0.718705,2016-08-20,,,,,,,,,,,,,,,,,,,,,True,57,19.0,37,26.0,37.000000,26.000000,37.0,26.0,37.0,37.0,37.0,26.0,37.0,37.0,37.0,26.0,37.000000,37.000000,19.0,57.0,28.0,31.0,28.000000,31.000000,28.0,31.0,28.0,28.0,28.0,31.0,28.0,28.0,28.0,31.0,28.000000,28.000000
1371,Kevin De Bruyne,3,3,90,Manchester City,West Ham United,,,True,6,1,1,31,0,63.0,1,0,11.1,26.6,0,0,0,0,0,202158,1,3,21.0,-10163,10864,21027,0,2016-08-28T15:00:00Z,1617,,2.311012,0.895471,2016-08-28,,,,,,,,,,,,,,,,,,,,,True,53,24.0,94,45.0,47.000000,22.500000,57.0,19.0,57.0,57.0,94.0,45.0,47.0,47.0,94.0,45.0,47.000000,47.000000,24.0,53.0,91.0,61.0,45.500000,30.500000,62.0,22.0,62.0,62.0,91.0,61.0,45.5,45.5,91.0,61.0,45.500000,45.500000
1935,Kevin De Bruyne,4,3,89,Manchester City,Manchester United,,,False,13,1,3,47,0,75.6,1,1,16.9,48.0,0,0,0,0,0,202166,2,1,45.0,-9429,11646,21075,0,2016-09-10T11:30:00Z,1617,,2.311012,1.983179,2016-09-10,,,,,,,,,,,,,,,,,,,,,True,42,19.0,147,69.0,49.000000,23.000000,53.0,24.0,53.0,53.0,110.0,43.0,55.0,55.0,147.0,69.0,49.000000,49.000000,19.0,42.0,177.0,74.0,59.000000,24.666667,57.0,23.0,57.0,57.0,127.0,44.0,63.5,63.5,177.0,74.0,59.000000,59.000000
2517,Kevin De Bruyne,5,3,74,Manchester City,Bournemouth,,,True,14,1,3,57,1,59.0,0,1,17.3,71.6,0,0,0,0,0,372086,0,4,42.0,152780,160832,8052,0,2016-09-17T14:00:00Z,1617,,2.311012,0.384921,2016-09-17,,,,,,,,,,,,,,,,,,,,,True,78,15.0,189,88.0,47.250000,22.000000,42.0,19.0,42.0,42.0,95.0,43.0,47.5,47.5,152.0,62.0,50.666667,50.666667,15.0,78.0,154.0,170.0,38.500000,42.500000,62.0,24.0,62.0,62.0,102.0,58.0,51.0,51.0,124.0,120.0,41.333333,41.333333
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
111481,Kevin De Bruyne,34,3,0,Manchester City,Crystal Palace,2.401890,0.474268,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,1193825,2,0,0.0,14922,46095,31173,0,2021-05-01T11:30:00Z,2021,1.0,2.394374,0.476734,2021-05-01,,,,,,,,,,,,,,,,,,,,,False,69,18.0,10294,5145.0,55.643243,27.810811,40.0,23.0,40.0,40.0,66.0,66.0,33.0,33.0,134.0,84.0,44.666667,44.666667,18.0,69.0,6968.0,8164.0,37.869565,44.369565,31.0,44.0,31.0,31.0,55.0,107.0,27.5,27.5,90.0,148.0,30.000000,30.000000
112339,Kevin De Bruyne,35,3,0,Manchester City,Chelsea,2.400267,1.817212,True,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,1095870,2,1,0.0,-95682,8127,103809,0,2021-05-08T16:30:00Z,2021,1.0,2.394374,2.184688,2021-05-08,,,,,,,,,,,,,,,,,,,,,False,25,54.0,10363,5163.0,55.715054,27.758065,69.0,18.0,69.0,69.0,109.0,41.0,54.5,54.5,135.0,84.0,45.000000,45.000000,54.0,25.0,9237.0,6438.0,49.661290,34.612903,75.0,20.0,75.0,75.0,137.0,43.0,68.5,68.5,195.0,94.0,65.000000,65.000000
113136,Kevin De Bruyne,36,3,0,Manchester City,Newcastle United,2.400197,0.588864,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,1014198,4,3,0.0,-83627,4630,88257,0,2021-05-14T19:00:00Z,2021,0.0,2.394374,0.626058,2021-05-14,,,,,,,,,,,,,,,,,,,,,False,60,31.0,10388,5217.0,55.550802,27.898396,25.0,54.0,25.0,25.0,94.0,72.0,47.0,47.0,134.0,95.0,44.666667,44.666667,31.0,60.0,5616.0,6584.0,37.691275,44.187919,51.0,33.0,51.0,51.0,67.0,97.0,33.5,33.5,103.0,128.0,34.333333,34.333333
113774,Kevin De Bruyne,37,3,0,Manchester City,Brighton and Hove Albion,2.400323,0.595653,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,984462,2,3,0.0,-31923,3080,35003,0,2021-05-18T18:00:00Z,2021,1.0,2.394374,0.466873,2021-05-18,,,,,,,,,,,,,,,,,,,,,True,28,39.0,10448,5248.0,55.574468,27.914894,60.0,31.0,60.0,60.0,85.0,85.0,42.5,42.5,154.0,103.0,51.333333,51.333333,39.0,28.0,5474.0,6755.0,36.493333,45.033333,38.0,32.0,38.0,38.0,58.0,73.0,29.0,29.0,127.0,91.0,42.333333,42.333333


In [34]:
lag_train_df[lag_train_df['player'] == 'Antonio Valencia']

Unnamed: 0,player,gw,position,minutes,team,opponent_team,relative_market_value_team,relative_market_value_opponent_team,was_home,total_points,assists,bonus,bps,clean_sheets,creativity,goals_conceded,goals_scored,ict_index,influence,own_goals,penalties_missed,penalties_saved,red_cards,saves,selected,team_a_score,team_h_score,threat,transfers_balance,transfers_in,transfers_out,yellow_cards,kickoff_time,season,play_proba,relative_market_value_team_season,relative_market_value_opponent_team_season,date,squad,comp,shots_total,shots_on_target,touches,pressures,tackles,interceptions,blocks,xg,npxg,xa,sca,gca,passes_completed,passes,passes_pct,carries,dribbles_completed,dribbles,crowds,total_points_team,total_points_team_conceded,total_points_team_last_all,total_points_team_conceded_last_all,total_points_team_pg_last_all,total_points_team_conceded_pg_last_all,total_points_team_last_1,total_points_team_conceded_last_1,total_points_team_pg_last_1,total_points_team_conceded_pg_last_1,total_points_team_last_2,total_points_team_conceded_last_2,total_points_team_pg_last_2,total_points_team_conceded_pg_last_2,total_points_team_last_3,total_points_team_conceded_last_3,total_points_team_pg_last_3,total_points_team_conceded_pg_last_3,total_points_team_opponent,total_points_team_conceded_opponent,total_points_team_last_all_opponent,total_points_team_conceded_last_all_opponent,total_points_team_pg_last_all_opponent,total_points_team_conceded_pg_last_all_opponent,total_points_team_last_1_opponent,total_points_team_conceded_last_1_opponent,total_points_team_pg_last_1_opponent,total_points_team_conceded_pg_last_1_opponent,total_points_team_last_2_opponent,total_points_team_conceded_last_2_opponent,total_points_team_pg_last_2_opponent,total_points_team_conceded_pg_last_2_opponent,total_points_team_last_3_opponent,total_points_team_conceded_last_3_opponent,total_points_team_pg_last_3_opponent,total_points_team_conceded_pg_last_3_opponent
46,Antonio Valencia,1,2,90,Manchester United,Bournemouth,,,False,2,0,0,12,0,18.3,1,0,4.7,22.8,0,0,0,0,0,291254,3,1,6.0,0,0,0,0,2016-08-14T12:30:00Z,1617,,1.983179,0.384921,2016-08-14,,,,,,,,,,,,,,,,,,,,,True,50,30.0,0,0.0,,,0.0,0.0,,,0.0,0.0,,,0.0,0.0,,,30.0,50.0,0.0,0.0,,,0.0,0.0,,,0.0,0.0,,,0.0,0.0,,
571,Antonio Valencia,2,2,90,Manchester United,Southampton,,,True,6,0,0,25,1,7.5,0,0,2.8,16.2,0,0,0,0,0,340941,0,2,4.0,13032,23352,10320,0,2016-08-19T19:00:00Z,1617,,1.983179,0.796805,2016-08-19,,,,,,,,,,,,,,,,,,,,,True,70,21.0,50,30.0,50.000000,30.000000,50.0,30.0,50.0,50.0,50.0,30.0,50.0,50.0,50.0,30.0,50.000000,50.000000,21.0,70.0,31.0,34.0,31.000000,34.000000,31.0,34.0,31.0,31.0,31.0,34.0,31.0,31.0,31.0,34.0,31.000000,31.000000
1108,Antonio Valencia,3,2,90,Manchester United,Hull City,,,False,9,0,3,38,1,50.6,0,0,10.1,24.0,0,0,0,0,0,371930,1,0,26.0,13410,31224,17814,0,2016-08-27T16:30:00Z,1617,,1.983179,0.494447,2016-08-27,,,,,,,,,,,,,,,,,,,,,True,57,23.0,120,51.0,60.000000,25.500000,70.0,21.0,70.0,70.0,120.0,51.0,60.0,60.0,120.0,51.0,60.000000,60.000000,23.0,57.0,101.0,48.0,50.500000,24.000000,63.0,20.0,63.0,63.0,101.0,48.0,50.5,50.5,101.0,48.0,50.500000,50.500000
1660,Antonio Valencia,4,2,90,Manchester United,Manchester City,,,True,1,0,0,13,0,6.7,2,0,2.9,20.6,0,0,0,0,0,488199,2,1,2.0,84199,109077,24878,0,2016-09-10T11:30:00Z,1617,,1.983179,2.311012,2016-09-10,,,,,,,,,,,,,,,,,,,,,True,19,42.0,177,74.0,59.000000,24.666667,57.0,23.0,57.0,57.0,127.0,44.0,63.5,63.5,177.0,74.0,59.000000,59.000000,42.0,19.0,147.0,69.0,49.000000,23.000000,53.0,24.0,53.0,53.0,110.0,43.0,55.0,55.0,147.0,69.0,49.000000,49.000000
2241,Antonio Valencia,5,2,61,Manchester United,Watford,,,False,2,0,0,13,0,6.3,1,0,2.5,15.0,0,0,0,0,0,514792,1,3,4.0,14748,29513,14765,0,2016-09-18T11:00:00Z,1617,,1.983179,0.704200,2016-09-18,,,,,,,,,,,,,,,,,,,,,True,23,51.0,196,116.0,49.000000,29.000000,19.0,42.0,19.0,19.0,76.0,65.0,38.0,38.0,146.0,86.0,48.666667,48.666667,51.0,23.0,131.0,156.0,32.750000,39.000000,50.0,32.0,50.0,50.0,68.0,88.0,34.0,34.0,97.0,125.0,32.333333,32.333333
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65302,Antonio Valencia,35,2,0,Manchester United,Everton,,,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,41295,0,4,0.0,-154,62,216,0,2019-04-21T12:30:00Z,1819,,2.015531,1.039221,2019-04-21,,,,,,,,,,,,,,,,,,,,,True,13,83.0,5212,3700.0,47.816514,33.944954,43.0,32.0,43.0,43.0,64.0,72.0,32.0,32.0,109.0,102.0,36.333333,36.333333,83.0,13.0,4608.0,4425.0,41.890909,40.227273,20.0,74.0,20.0,20.0,84.0,94.0,42.0,42.0,156.0,112.0,52.000000,52.000000
65303,Antonio Valencia,35,2,0,Manchester United,Manchester City,,,True,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,41295,2,0,0.0,-154,62,216,0,2019-04-24T19:00:00Z,1819,,2.015531,2.540586,2019-04-24,,,,,,,,,,,,,,,,,,,,,True,18,66.0,5225,3783.0,47.500000,34.390909,13.0,83.0,13.0,13.0,56.0,115.0,28.0,28.0,77.0,155.0,25.666667,25.666667,66.0,18.0,6105.0,2962.0,55.500000,26.927273,60.0,24.0,60.0,60.0,114.0,54.0,57.0,57.0,185.0,73.0,61.666667,61.666667
66140,Antonio Valencia,36,2,0,Manchester United,Chelsea,,,True,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,41234,1,1,0.0,-110,24,134,0,2019-04-28T15:30:00Z,1819,,2.015531,2.540586,2019-04-28,,,,,,,,,,,,,,,,,,,,,True,35,35.0,5243,3849.0,47.234234,34.675676,18.0,66.0,18.0,18.0,31.0,149.0,15.5,15.5,74.0,181.0,24.666667,24.666667,35.0,35.0,5641.0,3671.0,50.819820,33.072072,34.0,33.0,34.0,34.0,52.0,106.0,26.0,26.0,122.0,123.0,40.666667,40.666667
66755,Antonio Valencia,37,2,0,Manchester United,Huddersfield Town,,,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,41199,1,1,0.0,-91,32,123,0,2019-05-05T13:00:00Z,1819,,2.015531,0.273778,2019-05-05,,,,,,,,,,,,,,,,,,,,,True,30,35.0,5278,3884.0,47.125000,34.678571,35.0,35.0,35.0,35.0,53.0,101.0,26.5,26.5,66.0,184.0,22.000000,22.000000,35.0,30.0,2298.0,3893.0,31.054054,52.608108,12.0,96.0,12.0,12.0,39.0,138.0,19.5,19.5,50.0,227.0,16.666667,16.666667


For example, the following creates totals and per game averages (per 90 minutes for players, per fixture for teams) for points going back 1, 2, 3, 4, 5, 10, 20 and all previous fixtures. This is done at both player and team level.

This has been checked for points totals, but should also work for any other stat such as player/team goals scored, assists, or player goals conceded. However, it will not currently work for team-level stats such as team goals conceded, where adding up the goals conceded across all of the team's players would be incorrect.
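
To illustrate the problem with team-level stats, here is a sketch of one possible workaround, which is not something the functions above do: take a single value per team per fixture from the score columns rather than summing over players. It assumes `team_h_score` and `team_a_score` are the home and away goals, and `team_goals_conceded` is a hypothetical column name used only here.

```python
import numpy as np
import pandas as pd

# Toy fixture: Team X (home) beat Team Y (away) 2-1
toy_df = pd.DataFrame({
    'player': ['Player A', 'Player B', 'Player C', 'Player D'],
    'team': ['Team X', 'Team X', 'Team Y', 'Team Y'],
    'kickoff_time': ['2016-08-13T14:00:00Z'] * 4,
    'was_home': [True, True, False, False],
    'team_h_score': [2, 2, 2, 2],
    'team_a_score': [1, 1, 1, 1],
    # player-level: goals conceded while that player was on the pitch
    'goals_conceded': [1, 1, 2, 2],
})

# summing the player-level stat counts each goal once per player on the pitch
summed = toy_df.groupby(['team', 'kickoff_time'])['goals_conceded'].sum()

# assumption: the home side concedes the away score and vice versa
toy_df['team_goals_conceded'] = np.where(toy_df['was_home'],
                                         toy_df['team_a_score'],
                                         toy_df['team_h_score'])

# every player row carries the same value, so take one per team per fixture
per_fixture = (toy_df.groupby(['team', 'kickoff_time'])
               ['team_goals_conceded'].first())

print(summed.tolist(), per_fixture.tolist())
```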

In [46]:
# create some lag features
lag_train_df, team_lag_vars = team_lag_features(train_df, ['total_points'], ['all', 1, 2, 3, 4, 5, 10, 20])
lag_train_df, player_lag_vars = player_lag_features(lag_train_df, ['total_points'], ['all', 1, 2, 3, 4, 5, 10, 20])

You can see below that the player's (Salah's) historic point totals and per game totals are given, as well as the totals for his team (Liverpool) and whichever team he is playing against in that fixture (e.g. his debut was versus Watford on the 12th August 2017, so Watford's running point totals and per game totals are also given).

Note that if it is the first game since the start of the 2016/17 season for the team or opposition, then the point totals for previous games will be 0 and the per game totals will be null. Likewise, if the player has not had any minutes in the previous number of games being calculated, the point totals will be 0 and the per game totals null.
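
The nulls come from the per 90 calculation dividing by zero minutes. A small sketch of that arithmetic on made-up values, using the same `replace` step as `player_lag_features`:

```python
import numpy as np
import pandas as pd

# Toy lag totals: no minutes and no points, no minutes but -1 points,
# and a normal case of 12 points in 180 minutes
toy_df = pd.DataFrame({
    'total_points_last_3': [0.0, -1.0, 12.0],
    'minutes_last_3': [0.0, 0.0, 180.0],
})

per_90 = 90 * toy_df['total_points_last_3'] / toy_df['minutes_last_3']

# 0 / 0 is already NaN; a non-zero total over 0 minutes gives +/-inf,
# which is replaced with NaN so all "no minutes" cases look the same
per_90_clean = per_90.replace([np.inf, -np.inf], np.nan)
print(per_90_clean)
```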

In [47]:
lag_train_df.shape

(114803, 151)

In [48]:
# look at resulting dataset for a player
lag_train_df[lag_train_df['player'] == 'Mohamed Salah']

Unnamed: 0,player,gw,position,minutes,team,opponent_team,relative_market_value_team,relative_market_value_opponent_team,was_home,total_points,assists,bonus,bps,clean_sheets,creativity,goals_conceded,goals_scored,ict_index,influence,own_goals,penalties_missed,penalties_saved,red_cards,saves,selected,team_a_score,team_h_score,threat,transfers_balance,transfers_in,transfers_out,yellow_cards,kickoff_time,season,play_proba,relative_market_value_team_season,relative_market_value_opponent_team_season,date,squad,comp,shots_total,shots_on_target,touches,pressures,tackles,interceptions,blocks,xg,npxg,xa,sca,gca,passes_completed,passes,passes_pct,carries,dribbles_completed,dribbles,crowds,total_points_team,total_points_team_conceded,total_points_team_last_all,total_points_team_conceded_last_all,total_points_team_pg_last_all,total_points_team_conceded_pg_last_all,total_points_team_last_1,total_points_team_conceded_last_1,total_points_team_pg_last_1,total_points_team_conceded_pg_last_1,total_points_team_last_2,total_points_team_conceded_last_2,total_points_team_pg_last_2,total_points_team_conceded_pg_last_2,total_points_team_last_3,total_points_team_conceded_last_3,total_points_team_pg_last_3,total_points_team_conceded_pg_last_3,total_points_team_last_4,total_points_team_conceded_last_4,total_points_team_pg_last_4,total_points_team_conceded_pg_last_4,total_points_team_last_5,total_points_team_conceded_last_5,total_points_team_pg_last_5,total_points_team_conceded_pg_last_5,total_points_team_last_10,total_points_team_conceded_last_10,total_points_team_pg_last_10,total_points_team_conceded_pg_last_10,total_points_team_last_20,total_points_team_conceded_last_20,total_points_team_pg_last_20,total_points_team_conceded_pg_last_20,total_points_team_opponent,total_points_team_conceded_opponent,total_points_team_last_all_opponent,total_points_team_conceded_last_all_opponent,total_points_team_pg_last_all_opponent,total_points_team_conceded_pg_last_all_opponent,total_points_team_last_1_opponent,total_points_team_conceded_last_1_opponent,total_points_team_pg_last_1_opponent,total_points_team_conceded_pg_last_1_opponent,total_points_team_last_2_opponent,total_points_team_conceded_last_2_opponent,total_points_team_pg_last_2_opponent,total_points_team_conceded_pg_last_2_opponent,total_points_team_last_3_opponent,total_points_team_conceded_last_3_opponent,total_points_team_pg_last_3_opponent,total_points_team_conceded_pg_last_3_opponent,total_points_team_last_4_opponent,total_points_team_conceded_last_4_opponent,total_points_team_pg_last_4_opponent,total_points_team_conceded_pg_last_4_opponent,total_points_team_last_5_opponent,total_points_team_conceded_last_5_opponent,total_points_team_pg_last_5_opponent,total_points_team_conceded_pg_last_5_opponent,total_points_team_last_10_opponent,total_points_team_conceded_last_10_opponent,total_points_team_pg_last_10_opponent,total_points_team_conceded_pg_last_10_opponent,total_points_team_last_20_opponent,total_points_team_conceded_last_20_opponent,total_points_team_pg_last_20_opponent,total_points_team_conceded_pg_last_20_opponent,minutes_last_all,minutes_last_1,minutes_last_2,minutes_last_3,minutes_last_4,minutes_last_5,minutes_last_10,minutes_last_20,total_points_last_all,total_points_pg_last_all,total_points_last_1,total_points_pg_last_1,total_points_last_2,total_points_pg_last_2,total_points_last_3,total_points_pg_last_3,total_points_last_4,total_points_pg_last_4,total_points_last_5,total_points_pg_last_5,total_points_last_10,total_points_pg_last_10,total_points_last_20,total_points_pg_last_20
24036,Mohamed Salah,1,3,85,Liverpool,Watford,,,False,11,1,1,26,0,2.8,2,1,8.2,24.6,0,0,0,0,0,874608,3,3,55.0,0,0,0,0,2017-08-12T11:30:00Z,1718,,1.619155,0.547242,2017-08-12,Liverpool,Premier League,5.0,1.0,32.0,18.0,0.0,1.0,0.0,1.2,1.2,0.0,2.0,1.0,15.0,21.0,71.4,26.0,3.0,3.0,True,44,43.0,1863,1231.0,49.026316,32.394737,78.0,20.0,78.0,78.0,159.0,33.0,79.5,79.5,206.0,88.0,68.666667,68.666667,266.0,112.0,66.50,66.50,292.0,154.0,58.4,58.4,510.0,301.0,51.0,51.0,916.0,661.0,45.80,45.80,43.0,44.0,1264.0,1835.0,33.263158,48.289474,15.0,90.0,15.0,15.0,41.0,144.0,20.5,20.5,63.0,203.0,21.000000,21.000000,84.0,281.0,21.00,21.00,108.0,341.0,21.6,21.6,319.0,544.0,31.9,31.9,633.0,1038.0,31.65,31.65,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,
24551,Mohamed Salah,2,3,29,Liverpool,Crystal Palace,,,True,1,0,0,0,0,12.3,0,0,5.1,10.4,0,0,0,0,0,1293309,0,1,28.0,175914,193660,17746,0,2017-08-19T14:00:00Z,1718,,1.619155,0.635984,2017-08-19,Liverpool,Premier League,2.0,2.0,18.0,4.0,1.0,0.0,0.0,0.1,0.1,0.1,5.0,0.0,9.0,12.0,75.0,14.0,1.0,1.0,True,60,29.0,1907,1274.0,48.897436,32.666667,44.0,43.0,44.0,44.0,122.0,63.0,61.0,61.0,203.0,76.0,67.666667,67.666667,250.0,131.0,62.50,62.50,310.0,155.0,62.0,62.0,522.0,312.0,52.2,52.2,901.0,680.0,45.05,45.05,29.0,60.0,1427.0,1754.0,36.589744,44.974359,15.0,66.0,15.0,15.0,34.0,135.0,17.0,17.0,116.0,144.0,38.666667,38.666667,129.0,235.0,32.25,32.25,146.0,299.0,29.2,29.2,356.0,492.0,35.6,35.6,769.0,929.0,38.45,38.45,85,85.0,85.0,85.0,85.0,85.0,85.0,85.0,11,11.647059,11.0,11.647059,11.0,11.647059,11.0,11.647059,11.0,11.647059,11.0,11.647059,11.0,11.647059,11.0,11.647059
25076,Mohamed Salah,3,3,90,Liverpool,Arsenal,,,True,11,1,0,39,1,25.3,0,1,19.9,70.4,0,0,0,0,0,1158692,0,4,103.0,-184736,27792,212528,0,2017-08-27T15:00:00Z,1718,,1.619155,2.073500,2017-08-27,Liverpool,Premier League,5.0,5.0,35.0,24.0,1.0,2.0,0.0,1.1,1.1,0.3,4.0,1.0,19.0,24.0,79.2,28.0,2.0,3.0,True,81,12.0,1967,1303.0,49.175000,32.575000,60.0,29.0,60.0,60.0,104.0,72.0,52.0,52.0,182.0,92.0,60.666667,60.666667,263.0,105.0,65.75,65.75,310.0,160.0,62.0,62.0,530.0,315.0,53.0,53.0,931.0,668.0,46.55,46.55,12.0,81.0,1956.0,1350.0,48.900000,33.750000,26.0,65.0,26.0,26.0,78.0,105.0,39.0,39.0,128.0,131.0,42.666667,42.666667,198.0,153.0,49.50,49.50,258.0,171.0,51.6,51.6,526.0,324.0,52.6,52.6,958.0,712.0,47.90,47.90,114,29.0,114.0,114.0,114.0,114.0,114.0,114.0,12,9.473684,1.0,3.103448,12.0,9.473684,12.0,9.473684,12.0,9.473684,12.0,9.473684,12.0,9.473684,12.0,9.473684
25614,Mohamed Salah,4,3,45,Liverpool,Manchester City,,,False,1,0,0,4,0,13.8,1,0,4.7,7.8,0,0,0,0,0,1422941,0,5,25.0,177596,238283,60687,0,2017-09-09T11:30:00Z,1718,,1.619155,2.016093,2017-09-09,Liverpool,Premier League,1.0,1.0,23.0,11.0,0.0,0.0,1.0,0.3,0.3,0.1,2.0,0.0,11.0,16.0,68.8,19.0,3.0,5.0,True,8,88.0,2048,1315.0,49.951220,32.073171,81.0,12.0,81.0,81.0,141.0,41.0,70.5,70.5,185.0,84.0,61.666667,61.666667,263.0,104.0,65.75,65.75,344.0,117.0,68.8,68.8,577.0,288.0,57.7,57.7,982.0,648.0,49.10,49.10,88.0,8.0,1985.0,1240.0,48.414634,30.243902,39.0,21.0,39.0,39.0,63.0,60.0,31.5,31.5,130.0,78.0,43.333333,43.333333,220.0,93.0,55.00,55.00,279.0,118.0,55.8,55.8,578.0,256.0,57.8,57.8,1080.0,550.0,54.00,54.00,204,90.0,119.0,204.0,204.0,204.0,204.0,204.0,23,10.147059,11.0,11.000000,12.0,9.075630,23.0,10.147059,23.0,10.147059,23.0,10.147059,23.0,10.147059,23.0,10.147059
26160,Mohamed Salah,5,3,90,Liverpool,Burnley,,,True,10,0,3,27,0,35.8,1,1,17.4,51.2,0,0,0,0,0,1571656,1,1,87.0,122769,224328,101559,0,2017-09-16T14:00:00Z,1718,,1.619155,0.316798,2017-09-16,Liverpool,Premier League,6.0,4.0,66.0,25.0,0.0,0.0,0.0,0.4,0.4,0.1,5.0,0.0,33.0,46.0,71.7,54.0,4.0,7.0,True,34,37.0,2056,1403.0,48.952381,33.404762,8.0,88.0,8.0,8.0,89.0,100.0,44.5,44.5,149.0,129.0,49.666667,49.666667,193.0,172.0,48.25,48.25,271.0,192.0,54.2,54.2,545.0,348.0,54.5,54.5,952.0,694.0,47.60,47.60,37.0,34.0,1540.0,1791.0,36.666667,42.642857,53.0,24.0,53.0,53.0,88.0,58.0,44.0,44.0,111.0,115.0,37.000000,37.000000,154.0,141.0,38.50,38.50,181.0,183.0,36.2,36.2,347.0,394.0,34.7,34.7,736.0,802.0,36.80,36.80,249,45.0,135.0,164.0,249.0,249.0,249.0,249.0,24,8.674699,1.0,2.000000,12.0,8.000000,13.0,7.134146,24.0,8.674699,24.0,8.674699,24.0,8.674699,24.0,8.674699
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
112487,Mohamed Salah,35,3,86,Liverpool,Southampton,2.353659,0.583754,True,6,1,0,16,1,31.7,0,0,16.0,32.4,0,0,0,0,0,2982553,0,2,96.0,83689,143359,59670,0,2021-05-08T19:15:00Z,2021,1.0,2.394822,0.495869,2021-05-08,Liverpool,Premier League,4.0,2.0,38.0,34.0,0.0,1.0,0.0,0.8,0.8,0.4,2.0,1.0,23.0,28.0,82.1,33.0,2.0,4.0,False,77,20.0,9719,5849.0,52.535135,31.616216,31.0,36.0,31.0,31.0,68.0,73.0,34.0,34.0,109.0,102.0,36.333333,36.333333,188.0,120.0,47.00,47.00,253.0,145.0,50.6,50.6,415.0,415.0,41.5,41.5,847.0,854.0,42.35,42.35,20.0,77.0,7005.0,8106.0,37.661290,43.580645,34.0,34.0,34.0,34.0,60.0,79.0,30.0,30.0,78.0,163.0,26.000000,26.000000,127.0,198.0,31.75,31.75,157.0,245.0,31.4,31.4,328.0,494.0,32.8,32.8,640.0,1053.0,32.00,32.00,11669,90.0,109.0,199.0,289.0,379.0,800.0,1586.0,994,7.666467,9.0,9.000000,10.0,8.256881,17.0,7.688442,25.0,7.785467,28.0,6.649077,44.0,4.950000,97.0,5.504414
112488,Mohamed Salah,35,3,90,Liverpool,Manchester United,2.353659,1.670283,False,7,0,0,22,0,23.5,2,1,11.2,33.2,0,0,0,0,0,2982553,4,2,55.0,83689,143359,59670,0,2021-05-13T19:15:00Z,2021,1.0,2.394822,1.840445,2021-05-13,Liverpool,Premier League,4.0,1.0,,,,0.0,,,,,,,,,,,,,False,55,30.0,9796,5869.0,52.666667,31.553763,77.0,20.0,77.0,77.0,108.0,56.0,54.0,54.0,145.0,93.0,48.333333,48.333333,186.0,122.0,46.50,46.50,265.0,140.0,53.0,53.0,463.0,384.0,46.3,46.3,818.0,863.0,40.90,40.90,30.0,55.0,8780.0,6578.0,46.951872,35.176471,29.0,47.0,29.0,29.0,84.0,69.0,42.0,42.0,140.0,120.0,46.666667,46.666667,191.0,147.0,47.75,47.75,239.0,175.0,47.8,47.8,510.0,350.0,51.0,51.0,1027.0,665.0,51.35,51.35,11755,86.0,176.0,195.0,285.0,375.0,796.0,1639.0,1000,7.656316,6.0,6.279070,15.0,7.670455,16.0,7.384615,23.0,7.263158,31.0,7.440000,43.0,4.861809,87.0,4.777303
113216,Mohamed Salah,36,3,90,Liverpool,West Bromwich Albion,2.353591,0.282547,False,10,0,3,31,0,45.1,1,1,17.4,49.8,0,0,0,0,0,3247999,2,1,79.0,257608,272307,14699,0,2021-05-16T15:30:00Z,2021,1.0,2.394822,0.185012,2021-05-16,Liverpool,Premier League,4.0,3.0,44.0,12.0,0.0,0.0,0.0,0.7,0.7,0.3,4.0,0.0,28.0,37.0,75.7,36.0,1.0,1.0,False,46,28.0,9851,5899.0,52.679144,31.545455,55.0,30.0,55.0,55.0,132.0,50.0,66.0,66.0,163.0,86.0,54.333333,54.333333,200.0,123.0,50.00,50.00,241.0,152.0,48.2,48.2,498.0,340.0,49.8,49.8,836.0,859.0,41.80,41.80,28.0,46.0,3849.0,5256.0,34.675676,47.351351,27.0,50.0,27.0,27.0,62.0,87.0,31.0,31.0,98.0,123.0,32.666667,32.666667,115.0,201.0,28.75,28.75,199.0,219.0,39.8,39.8,432.0,426.0,43.2,43.2,704.0,996.0,35.20,35.20,11845,90.0,176.0,266.0,285.0,375.0,796.0,1639.0,1007,7.651330,7.0,7.000000,13.0,6.647727,22.0,7.443609,23.0,7.263158,30.0,7.200000,48.0,5.427136,92.0,5.051861
113875,Mohamed Salah,37,3,90,Liverpool,Burnley,2.353714,0.303070,False,3,0,0,-1,1,27.3,0,0,6.9,1.2,0,0,0,0,0,3344939,3,0,40.0,108275,121305,13030,0,2021-05-19T19:15:00Z,2021,1.0,2.394822,0.344836,2021-05-19,Liverpool,Premier League,4.0,0.0,43.0,18.0,0.0,1.0,0.0,0.9,0.9,0.7,6.0,0.0,21.0,27.0,77.8,39.0,2.0,4.0,True,81,18.0,9897,5927.0,52.643617,31.526596,46.0,28.0,46.0,46.0,101.0,58.0,50.5,50.5,178.0,78.0,59.333333,59.333333,209.0,114.0,52.25,52.25,246.0,151.0,49.2,49.2,480.0,350.0,48.0,48.0,835.0,836.0,41.75,41.75,18.0,81.0,7071.0,8215.0,37.611702,43.696809,14.0,88.0,14.0,14.0,85.0,106.0,42.5,42.5,107.0,149.0,35.666667,35.666667,187.0,162.0,46.75,46.75,214.0,213.0,42.8,42.8,372.0,410.0,37.2,37.2,746.0,894.0,37.30,37.30,11935,90.0,180.0,266.0,356.0,375.0,796.0,1639.0,1017,7.669041,10.0,10.000000,17.0,8.500000,23.0,7.781955,32.0,8.089888,33.0,7.920000,55.0,6.218593,99.0,5.436242


In [49]:
# look at resulting dataset for a player
lag_train_df[(lag_train_df['player'] == 'Héctor Bellerín') & (lag_train_df['season'] == '1718')].head()

Unnamed: 0,player,gw,position,minutes,team,opponent_team,relative_market_value_team,relative_market_value_opponent_team,was_home,total_points,assists,bonus,bps,clean_sheets,creativity,goals_conceded,goals_scored,ict_index,influence,own_goals,penalties_missed,penalties_saved,red_cards,saves,selected,team_a_score,team_h_score,threat,transfers_balance,transfers_in,transfers_out,yellow_cards,kickoff_time,season,play_proba,relative_market_value_team_season,relative_market_value_opponent_team_season,date,squad,comp,shots_total,shots_on_target,touches,pressures,tackles,interceptions,blocks,xg,npxg,xa,sca,gca,passes_completed,passes,passes_pct,carries,dribbles_completed,dribbles,crowds,total_points_team,total_points_team_conceded,total_points_team_last_all,total_points_team_conceded_last_all,total_points_team_pg_last_all,total_points_team_conceded_pg_last_all,total_points_team_last_1,total_points_team_conceded_last_1,total_points_team_pg_last_1,total_points_team_conceded_pg_last_1,total_points_team_last_2,total_points_team_conceded_last_2,total_points_team_pg_last_2,total_points_team_conceded_pg_last_2,total_points_team_last_3,total_points_team_conceded_last_3,total_points_team_pg_last_3,total_points_team_conceded_pg_last_3,total_points_team_last_4,total_points_team_conceded_last_4,total_points_team_pg_last_4,total_points_team_conceded_pg_last_4,total_points_team_last_5,total_points_team_conceded_last_5,total_points_team_pg_last_5,total_points_team_conceded_pg_last_5,total_points_team_last_10,total_points_team_conceded_last_10,total_points_team_pg_last_10,total_points_team_conceded_pg_last_10,total_points_team_last_20,total_points_team_conceded_last_20,total_points_team_pg_last_20,total_points_team_conceded_pg_last_20,total_points_team_opponent,total_points_team_conceded_opponent,total_points_team_last_all_opponent,total_points_team_conceded_last_all_opponent,total_points_team_pg_last_all_opponent,total_points_team_conceded_pg_last_all_opponent,total_points_team_last_1_oppo
nent,total_points_team_conceded_last_1_opponent,total_points_team_pg_last_1_opponent,total_points_team_conceded_pg_last_1_opponent,total_points_team_last_2_opponent,total_points_team_conceded_last_2_opponent,total_points_team_pg_last_2_opponent,total_points_team_conceded_pg_last_2_opponent,total_points_team_last_3_opponent,total_points_team_conceded_last_3_opponent,total_points_team_pg_last_3_opponent,total_points_team_conceded_pg_last_3_opponent,total_points_team_last_4_opponent,total_points_team_conceded_last_4_opponent,total_points_team_pg_last_4_opponent,total_points_team_conceded_pg_last_4_opponent,total_points_team_last_5_opponent,total_points_team_conceded_last_5_opponent,total_points_team_pg_last_5_opponent,total_points_team_conceded_pg_last_5_opponent,total_points_team_last_10_opponent,total_points_team_conceded_last_10_opponent,total_points_team_pg_last_10_opponent,total_points_team_conceded_pg_last_10_opponent,total_points_team_last_20_opponent,total_points_team_conceded_last_20_opponent,total_points_team_pg_last_20_opponent,total_points_team_conceded_pg_last_20_opponent,minutes_last_all,minutes_last_1,minutes_last_2,minutes_last_3,minutes_last_4,minutes_last_5,minutes_last_10,minutes_last_20,total_points_last_all,total_points_pg_last_all,total_points_last_1,total_points_pg_last_1,total_points_last_2,total_points_pg_last_2,total_points_last_3,total_points_pg_last_3,total_points_last_4,total_points_pg_last_4,total_points_last_5,total_points_pg_last_5,total_points_last_10,total_points_pg_last_10,total_points_last_20,total_points_pg_last_20
23870,Héctor Bellerín,1,2,90,Arsenal,Leicester City,,,True,1,0,0,9,0,27.5,3,0,7.0,17.2,0,0,0,0,0,572986,3,4,25.0,0,0,0,0,2017-08-11T18:45:00Z,1718,,2.0735,0.824624,2017-08-11,Arsenal,Premier League,1.0,1.0,75.0,15.0,0.0,1.0,0.0,0.2,0.2,0.0,6.0,1.0,56.0,65.0,86.2,59.0,1.0,4.0,True,52,40.0,1878,1245.0,49.421053,32.763158,50.0,26.0,50.0,50.0,120.0,48.0,60.0,60.0,180.0,66.0,60.0,60.0,251.0,85.0,62.75,62.75,323.0,105.0,64.6,64.6,540.0,318.0,54.0,54.0,991.0,669.0,49.55,49.55,40.0,52.0,1392.0,1737.0,36.631579,45.710526,30.0,36.0,30.0,30.0,49.0,111.0,24.5,24.5,73.0,155.0,24.333333,24.333333,151.0,176.0,37.75,37.75,212.0,198.0,42.4,42.4,440.0,388.0,44.0,44.0,789.0,916.0,39.45,39.45,2503,90.0,180.0,270.0,325.0,332.0,619.0,1177.0,119,4.278865,8.0,8.0,13.0,6.5,22.0,7.333333,22.0,6.092308,23.0,6.23494,38.0,5.52504,57.0,4.358539
24383,Héctor Bellerín,2,2,90,Arsenal,Stoke City,,,False,2,0,0,8,0,12.8,1,0,5.6,7.4,0,0,0,0,0,628098,0,1,36.0,-28590,14368,42958,0,2017-08-19T16:30:00Z,1718,,2.0735,0.581587,2017-08-19,Arsenal,Premier League,2.0,1.0,83.0,5.0,2.0,0.0,0.0,0.1,0.1,0.0,5.0,0.0,61.0,71.0,85.9,52.0,0.0,2.0,True,26,65.0,1930,1285.0,49.487179,32.948718,52.0,40.0,52.0,52.0,102.0,66.0,51.0,51.0,172.0,88.0,57.333333,57.333333,232.0,106.0,58.0,58.0,303.0,125.0,60.6,60.6,518.0,340.0,51.8,51.8,975.0,689.0,48.75,48.75,65.0,26.0,1439.0,1650.0,36.897436,42.307692,24.0,53.0,24.0,24.0,83.0,76.0,41.5,41.5,101.0,136.0,33.666667,33.666667,129.0,167.0,32.25,32.25,182.0,219.0,36.4,36.4,315.0,482.0,31.5,31.5,733.0,859.0,36.65,36.65,2593,90.0,180.0,270.0,360.0,415.0,619.0,1177.0,120,4.16506,1.0,1.0,9.0,4.5,14.0,4.666667,23.0,5.75,23.0,4.987952,34.0,4.943457,50.0,3.82328
24906,Héctor Bellerín,3,2,90,Arsenal,Liverpool,,,False,0,0,0,10,0,2.9,4,0,0.6,1.0,0,0,0,0,0,579314,0,4,2.0,-67892,7781,75673,0,2017-08-27T15:00:00Z,1718,,2.0735,1.619155,2017-08-27,Arsenal,Premier League,0.0,0.0,58.0,11.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,37.0,49.0,75.5,35.0,2.0,4.0,True,12,81.0,1956,1350.0,48.9,33.75,26.0,65.0,26.0,26.0,78.0,105.0,39.0,39.0,128.0,131.0,42.666667,42.666667,198.0,153.0,49.5,49.5,258.0,171.0,51.6,51.6,526.0,324.0,52.6,52.6,958.0,712.0,47.9,47.9,81.0,12.0,1967.0,1303.0,49.175,32.575,60.0,29.0,60.0,60.0,104.0,72.0,52.0,52.0,182.0,92.0,60.666667,60.666667,263.0,105.0,65.75,65.75,310.0,160.0,62.0,62.0,530.0,315.0,53.0,53.0,931.0,668.0,46.55,46.55,2683,90.0,180.0,270.0,360.0,450.0,619.0,1177.0,122,4.092434,2.0,2.0,3.0,1.5,11.0,3.666667,16.0,4.0,25.0,5.0,35.0,5.088853,52.0,3.976211
25441,Héctor Bellerín,4,2,90,Arsenal,Bournemouth,,,True,6,0,0,26,1,28.3,0,0,4.7,12.4,0,0,0,0,0,533242,0,3,6.0,-59067,13066,72133,0,2017-09-09T14:00:00Z,1718,,2.0735,0.379765,2017-09-09,Arsenal,Premier League,0.0,0.0,64.0,14.0,1.0,2.0,2.0,0.0,0.0,0.1,2.0,0.0,47.0,54.0,87.0,47.0,1.0,2.0,True,79,18.0,1968,1431.0,48.0,34.902439,12.0,81.0,12.0,12.0,38.0,146.0,19.0,19.0,90.0,186.0,30.0,30.0,140.0,212.0,35.0,35.0,210.0,234.0,42.0,42.0,495.0,378.0,49.5,49.5,895.0,784.0,44.75,44.75,18.0,79.0,1613.0,1739.0,39.341463,42.414634,21.0,39.0,21.0,21.0,39.0,105.0,19.5,19.5,64.0,164.0,21.333333,21.333333,100.0,194.0,25.0,25.0,148.0,219.0,29.6,29.6,372.0,406.0,37.2,37.2,747.0,857.0,37.35,37.35,2773,90.0,180.0,270.0,360.0,450.0,708.0,1267.0,122,3.959611,0.0,0.0,2.0,1.0,3.0,1.0,11.0,2.75,16.0,3.2,34.0,4.322034,52.0,3.693765
25987,Héctor Bellerín,5,2,90,Arsenal,Chelsea,,,False,5,0,0,27,1,29.8,0,0,4.3,11.4,0,0,0,0,0,508976,0,0,2.0,-30945,3865,34810,1,2017-09-17T12:30:00Z,1718,,2.0735,2.125018,2017-09-17,Arsenal,Premier League,0.0,0.0,67.0,10.0,1.0,2.0,5.0,0.0,0.0,0.4,3.0,0.0,46.0,59.0,78.0,46.0,0.0,0.0,True,57,48.0,2047,1449.0,48.738095,34.5,79.0,18.0,79.0,79.0,91.0,99.0,45.5,45.5,117.0,164.0,39.0,39.0,169.0,204.0,42.25,42.25,219.0,230.0,43.8,43.8,512.0,377.0,51.2,51.2,928.0,774.0,46.4,46.4,48.0,57.0,2251.0,1175.0,53.595238,27.97619,46.0,23.0,46.0,46.0,119.0,40.0,59.5,59.5,164.0,62.0,54.666667,54.666667,190.0,105.0,47.5,47.5,259.0,128.0,51.8,51.8,590.0,244.0,59.0,59.0,986.0,581.0,49.3,49.3,2863,90.0,180.0,270.0,360.0,450.0,708.0,1355.0,128,4.023751,6.0,6.0,6.0,3.0,8.0,2.666667,9.0,2.25,17.0,3.4,33.0,4.194915,57.0,3.785978


In [50]:
# summary with lag features added
lag_train_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 114803 entries, 0 to 114802
Columns: 151 entries, player to total_points_pg_last_20
dtypes: bool(2), float64(115), int64(26), object(8)
memory usage: 136.6+ MB


We also return lists of the player lag and team lag variables, which will help us when modelling. For now these contain only the <i>per game</i> calculated values, plus minutes for players.

In [51]:
player_lag_vars

['minutes_last_all',
 'minutes_last_1',
 'minutes_last_2',
 'minutes_last_3',
 'minutes_last_4',
 'minutes_last_5',
 'minutes_last_10',
 'minutes_last_20',
 'total_points_pg_last_all',
 'total_points_pg_last_1',
 'total_points_pg_last_2',
 'total_points_pg_last_3',
 'total_points_pg_last_4',
 'total_points_pg_last_5',
 'total_points_pg_last_10',
 'total_points_pg_last_20']

In [52]:
team_lag_vars

['total_points_team_pg_last_all',
 'total_points_team_conceded_pg_last_all',
 'total_points_team_pg_last_1',
 'total_points_team_conceded_pg_last_1',
 'total_points_team_pg_last_2',
 'total_points_team_conceded_pg_last_2',
 'total_points_team_pg_last_3',
 'total_points_team_conceded_pg_last_3',
 'total_points_team_pg_last_4',
 'total_points_team_conceded_pg_last_4',
 'total_points_team_pg_last_5',
 'total_points_team_conceded_pg_last_5',
 'total_points_team_pg_last_10',
 'total_points_team_conceded_pg_last_10',
 'total_points_team_pg_last_20',
 'total_points_team_conceded_pg_last_20',
 'total_points_team_pg_last_all_opponent',
 'total_points_team_conceded_pg_last_all_opponent',
 'total_points_team_pg_last_1_opponent',
 'total_points_team_conceded_pg_last_1_opponent',
 'total_points_team_pg_last_2_opponent',
 'total_points_team_conceded_pg_last_2_opponent',
 'total_points_team_pg_last_3_opponent',
 'total_points_team_conceded_pg_last_3_opponent',
 'total_points_team_pg_last_4_opponent',

Now we have an easy way of getting the points per game (per 90 minutes) for any player at any point in time. Here is Salah's value at the last gameweek of the 2019/20 season.

In [54]:
lag_train_df[(lag_train_df['season'] == '1920') & 
             (lag_train_df['gw'] == 38) & 
             (lag_train_df['player'] == 'Mohamed Salah')]['total_points_pg_last_all'].mean()

7.9294274300932095

As a check, summing all of his points up to that point in time, scaling by 90, and dividing by his total minutes over the same period gives the same answer.

In [57]:
(train_df[:89771][train_df[:89771]['player'] == 'Mohamed Salah']['total_points'].sum() * 90 
 / train_df[:89771][train_df[:89771]['player'] == 'Mohamed Salah']['minutes'].sum())

7.9294274300932095
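The check above can be reproduced on a small scale. The following is a minimal sketch, not the function used to build `lag_train_df`: it assumes the "last all" lag features are cumulative totals over all fixtures before the current one, and it uses Salah's first four rows of 2017/18 from the output shown earlier as toy data.

```python
import pandas as pd

# Toy data: Salah's first four appearances in 2017/18 (from the output above)
toy = pd.DataFrame({
    'player': ['Mohamed Salah'] * 4,
    'gw': [1, 2, 3, 4],
    'minutes': [85, 29, 90, 45],
    'total_points': [11, 1, 11, 1],
})

grouped = toy.groupby('player')

# Cumulative totals *before* each fixture: cumulative sum minus the current row
toy['minutes_last_all'] = grouped['minutes'].cumsum() - toy['minutes']
toy['total_points_last_all'] = grouped['total_points'].cumsum() - toy['total_points']

# Points per 90 minutes; the first row is NaN because no minutes have been played yet
toy['total_points_pg_last_all'] = (toy['total_points_last_all'] * 90
                                   / toy['minutes_last_all'])

print(toy[['gw', 'minutes_last_all', 'total_points_last_all',
           'total_points_pg_last_all']])
```

Under that assumption, the gameweek 2 to 4 values match the `total_points_pg_last_all` column in the Salah output above.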

We can use the same approach to get the average points per game (per 90 minutes) across all players. We'll use this in the simple baseline model.

In [58]:
# points per 90 minutes across all players and appearances
(train_df['total_points'].sum() * 90 / train_df['minutes'].sum())

3.746600642069819

We need to be aware, though, that players whose appearances are mostly short may have artificially high points per 90 minutes, because they will usually get at least 1 point for only 1-10 minutes of playing time.

In [59]:
# extreme example: points per 90 minutes across all appearances under 10 minutes
(train_df[train_df['minutes'] < 10]['total_points'].sum() * 90 / train_df[train_df['minutes'] < 10]['minutes'].sum())

22.143577188940093
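To see why short appearances inflate the figure, here is a small worked example. The helper `points_per_90` and the two sample appearances are made up for illustration and do not come from the dataset.

```python
def points_per_90(total_points, minutes):
    """Scale a points total to a per-90-minutes rate."""
    return total_points * 90 / minutes

# Hypothetical substitute: 1 point for a 5 minute cameo
cameo = points_per_90(total_points=1, minutes=5)

# Hypothetical starter: 2 points for a full 90 minutes
full_game = points_per_90(total_points=2, minutes=90)

print(cameo, full_game)
```

The cameo scores far higher per 90 minutes than the full game, even though it earned fewer points.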

The performance of any model may vary across the season. For example, performance may be worse at the start of the season due to new players / teams, and big changes within existing teams that aren't captured in the historical data. Also, we will be generating forecasts at any one point for the remainder of the season - if we're at gameweek 1, then a forecast for gameweek 2 is likely to be more accurate than for gameweek 10.

We therefore need a sensible way to combine these different situations into our validation of models.

The standard way to do this with a time series problem is to assess the model on a sequence of time steps. In our case, we will do this for the most recent complete season, starting at gameweek 1 and moving through to the end of the season. In FPL we are also generally more concerned with the near future, so we'll only assess performance over the next 6 gameweeks.

Here's how the validation looks in practice:

1. Train using all data up to but not including gw 1; use model to predict gw 1-6; calculate error for gw 1-6 predictions
2. Train using all data up to but not including gw 2; use model to predict gw 2-7; calculate error for gw 2-7 predictions
3. Train using all data up to but not including gw 3; use model to predict gw 3-8; calculate error for gw 3-8 predictions

... repeat until ...

33. Train using all data up to but not including gw 33; use model to predict gw 33-38; calculate rmse for gw 33-38 predictions

We can then look at how the performance varies across the validation season, as well as averaging performance across all weeks to give us a single validation number for each model.
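The sequence of steps above can be sketched as a list of validation windows. This is only an illustration: the function name `rolling_validation_windows` and its parameters are hypothetical, and it assumes a 38 gameweek season with a fixed window of 6 gameweeks.

```python
def rolling_validation_windows(first_gw=1, last_gw=38, length=6):
    """Return (start_gw, end_gw) pairs for each validation step.

    For each pair, the model would be trained on all data before
    start_gw and evaluated on start_gw through end_gw inclusive.
    """
    windows = []
    # The last start is the one whose window ends exactly at last_gw
    for start_gw in range(first_gw, last_gw - length + 2):
        windows.append((start_gw, start_gw + length - 1))
    return windows

windows = rolling_validation_windows()
print(len(windows))   # number of validation steps
print(windows[0])     # first window
print(windows[-1])    # last window
```

With the defaults this gives 33 steps, from gameweeks 1-6 through to gameweeks 33-38. Averaging each step's error then gives a single validation number per model.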

It will be helpful to have a function that returns the indexes for the start and end of a validation period, given a season, gameweek and length of validation.

In [60]:
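# validation set indexes
# training will always be from start of data up to valid-start
def validation_gw_idx(df, season, gw, length):
    """Return (start, end) index labels of the validation window.

    The window starts at gameweek `gw` of `season` and spans `length`
    gameweeks, capped at gameweek 38.
    """
    season_mask = df['season'] == season
    last_gw = min(gw + length - 1, 38)

    valid_start = df[(df['gw'] == gw) & season_mask].index.min()
    valid_end = df[(df['gw'] == last_gw) & season_mask].index.max()

    return (valid_start, valid_end)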

In [61]:
# try it
validation_gw_idx(train_df, '1920', 1, 6)

(67936, 71131)
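As a rough sketch of how these indexes might be used to split the data, the example below applies the same function to a small made-up DataFrame. It assumes rows are sorted chronologically so that index order matches time order; the `toy` data and the `train` / `valid` names are illustrative only.

```python
import pandas as pd

def validation_gw_idx(df, season, gw, length):
    # Same logic as the function defined above
    valid_start = df[(df['gw'] == gw) & (df['season'] == season)].index.min()
    valid_end = df[(df['gw'] == min(gw + length - 1, 38)) & (df['season'] == season)].index.max()
    return (valid_start, valid_end)

# Toy data: two rows per gameweek, sorted by season and gameweek
toy = pd.DataFrame({
    'season': ['1819'] * 4 + ['1920'] * 8,
    'gw': [37, 37, 38, 38, 1, 1, 2, 2, 3, 3, 4, 4],
    'total_points': [2, 6, 1, 0, 3, 8, 2, 2, 5, 1, 0, 9],
})

# Validate on gameweeks 2-3 of the 2019/20 season
valid_start, valid_end = validation_gw_idx(toy, '1920', 2, 2)

# Training rows are everything before the validation window
train = toy.loc[toy.index < valid_start]
# .loc slicing by label includes both endpoints
valid = toy.loc[valid_start:valid_end]

print(valid_start, valid_end, len(train), len(valid))
```

Comparing against `toy.index` rather than slicing by position keeps the split correct even if the index has gaps.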

Now that we have a good sense of the dataset, have created a few extra time-series features, and have decided on a validation approach, it's time to create a simple baseline model in the next notebook.

Note: We want to use the above functions in subsequent notebooks, so to avoid having to write them out again they have been added to the util.py module, and can be imported into any notebook from the fpl_predictor package by running:

```from fpl_predictor.util import *```

This is the case for all functions in this series of notebooks.