In [1]:
import pandas as pd
import numpy as np
from pathlib import Path
from IPython.display import display

In [2]:
# path to project directory
path = Path('./')

In [3]:
# read in training dataset
train_df = pd.read_csv(path/'data/train_v4.csv', index_col=0, dtype={'season':str})

These are the fields in the base dataset, all from fpl and transfermarkt

In [4]:
# summary of fields
train_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 90437 entries, 0 to 90436
Data columns (total 37 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   player                                      90437 non-null  object 
 1   gw                                          90437 non-null  int64  
 2   position                                    90437 non-null  int64  
 3   minutes                                     90437 non-null  int64  
 4   team                                        90437 non-null  object 
 5   opponent_team                               90437 non-null  object 
 6   relative_market_value_team                  22501 non-null  float64
 7   relative_market_value_opponent_team         22501 non-null  float64
 8   was_home                                    90437 non-null  bool   
 9   total_points                                90437 non-null  int64  
 10  assists   

Each row represents one player's performance in a single fixture, and will be unique across the player name and kickoff time fields:

- player (player name)
- kickoff_time (kickoff time for the fixture)

The fixtures are futher defined with the following fields:

- team (the player's team)
- opponent_team (the opposition team)
- was_home (was it a home game for the player)
- season (e.g. '1920' for the 2019/20 season)
- gw (the FPL gameweek in which the fixture occured)

Note that there can be multiple fixtures (i.e. rows for a given player) in a single gameweek.

The position that a player plays is also given, this will be consistent for each player within seasons, but may change between seasons:

- position (1 - goalkeeper, 2 - defender, 3 - midfielder, 4 - forward)

Most of the other fields describe the player (or team's) performance in the fixture e.g. the number of munites played, points scored, assists, goals, goals conceded while on the field, etc.

All the above should be 100% complete for all rows.

Incomplete fields for FPL data are:

- transfer and selected values (transfers_in, transfers_out, transfers_balance, selected) - these were only collected from the start of the 2019/20 season, and require further investigation as to what they actually represent (in other words, treat with caution when modelling); values prior to the 2019/20 are set to 0
- play_proba - again only collected from the start of the 2019/20 season, this is the probability that the the player would actually be available for the fixture according to the FPL website (note that the time that this is captured each week  varies); values prior to the 2019/20 are null, and they are also null for any new players in a given gameweek (i.e. players that FPL has added to the game during that gameweek)

Finally, team transfer market value is taken from transfermarkt each week (for the 2019/20) season or a single value has been taken for the whole season:

- relative_market_value_team - the market value for the team scraped during that gameweek (non null from start of 2019/20 season)
- relative_market_value_opponent_team - the market value for the opposition team scraped during that gameweek (non null from start of 2019/20 season)
- relative_market_value_team_season - a single value for the team's value from the the start of each season 
- relative_market_value_team_season - a single value for the opposition team's value from the the start of each season 

In [5]:
# take a look at some data
pd.options.display.max_columns = None
train_df.head(10)

Unnamed: 0,player,gw,position,minutes,team,opponent_team,relative_market_value_team,relative_market_value_opponent_team,was_home,total_points,assists,bonus,bps,clean_sheets,creativity,goals_conceded,goals_scored,ict_index,influence,own_goals,penalties_missed,penalties_saved,red_cards,saves,selected,team_a_score,team_h_score,threat,transfers_balance,transfers_in,transfers_out,yellow_cards,kickoff_time,season,play_proba,relative_market_value_team_season,relative_market_value_opponent_team_season
0,Aaron_Cresswell,1,2,0,West Ham United,Chelsea,,,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,14023,1,2,0.0,0,0,0,0,2016-08-15T19:00:00Z,1617,,0.895471,2.243698
1,Aaron_Lennon,1,3,15,Everton,Tottenham Hotspur,,,True,1,0,0,6,0,0.3,0,0,0.9,8.2,0,0,0,0,0,13918,1,1,0.0,0,0,0,0,2016-08-13T14:00:00Z,1617,,1.057509,1.43369
2,Aaron_Ramsey,1,3,60,Arsenal,Liverpool,,,True,2,0,0,5,0,4.9,3,0,3.0,2.2,0,0,0,0,0,163170,4,3,23.0,0,0,0,0,2016-08-14T15:00:00Z,1617,,1.944129,1.46586
3,Abdoulaye_Doucouré,1,3,0,Watford,Southampton,,,False,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,1051,1,1,0.0,0,0,0,0,2016-08-13T14:00:00Z,1617,,0.7042,0.796805
4,Abdul Rahman_Baba,1,2,0,Chelsea,West Ham United,,,True,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,1243,1,2,0.0,0,0,0,0,2016-08-15T19:00:00Z,1617,,2.243698,0.895471
5,Abel_Hernández,1,4,90,Hull City,Leicester City,,,True,5,1,0,10,0,12.2,1,0,5.7,14.4,0,0,0,0,0,26039,1,2,30.0,0,0,0,0,2016-08-13T11:30:00Z,1617,,0.494447,0.650832
6,Adama_Diomande,1,4,90,Hull City,Leicester City,,,True,8,0,2,29,0,16.8,1,1,10.7,45.2,0,0,0,0,0,38151,1,2,45.0,0,0,0,0,2016-08-13T11:30:00Z,1617,,0.494447,0.650832
7,Adam_Clayton,1,3,90,Middlesbrough,Stoke City,,,True,2,0,0,6,0,2.2,1,0,1.4,3.2,0,0,0,0,0,17663,1,1,9.0,0,0,0,0,2016-08-13T14:00:00Z,1617,,0.452793,0.718705
8,Adam_Federici,1,1,0,Bournemouth,Manchester United,,,True,0,0,0,0,0,0.0,0,0,0.0,0.0,0,0,0,0,0,4315,3,1,0.0,0,0,0,0,2016-08-14T12:30:00Z,1617,,0.384921,1.983179
9,Adam_Forshaw,1,3,69,Middlesbrough,Stoke City,,,True,1,0,0,3,0,1.3,1,0,0.3,2.0,0,0,0,0,0,2723,1,1,0.0,0,0,0,1,2016-08-13T14:00:00Z,1617,,0.452793,0.718705


Since this is a time series problem we have some functions that create various rolling totals and averages for players and teams

In [6]:
# player level lag features
def player_lag_features(df, features, lags):    
    df_new = df.copy()
    
    # need minutes for per game stats, add to front of list
    features.insert(0, 'minutes')

    # calculate totals for each lag period
    for feature in features:
        for lag in lags:
            feature_name = feature + '_last_' + str(lag)
            
            if lag == 'all':
                df_new[feature_name] = df_new.groupby(['player'])[feature].apply(lambda x: x.cumsum() - x)
            else: 
                df_new[feature_name] = df_new.groupby(['player'])[feature].apply(lambda x: x.rolling(min_periods=1, 
                                                                                            window=lag+1).sum() - x)
            if feature != 'minutes':
                minute_name = 'minutes_last_' + str(lag)
                pg_feature_name = feature + '_pg_last_' + str(lag)
                
                df_new[pg_feature_name] = 90 * df_new[feature_name] / df_new[minute_name] 
#                 df_new[pg_feature_name] = df_new[pg_feature_name].fillna(0)
                
    return df_new

In [7]:
# team level lag features
def team_lag_features(df, features, lags):
    for feature in features:
        feature_team_name = feature + '_team'
        feature_team = (df.groupby(['team', 'season', 'gw',
                                   'kickoff_time', 'opponent_team'])
                        [feature].sum().rename(feature_team_name).reset_index())
                
        for lag in lags:
            feature_name = feature + '_team_last_' + str(lag)
            pg_feature_name = feature + '_team_pg_last_' + str(lag)
            
            if lag == 'all':
                feature_team[feature_name] = (feature_team.groupby('team')[feature_team_name]
                                              .apply(lambda x: x.cumsum() - x))
                
                feature_team[pg_feature_name] = (feature_team[feature_name]
                                                 / feature_team.groupby('team').cumcount())
                
            else:
                feature_team[feature_name] = (feature_team.groupby('team')[feature_team_name]
                                              .apply(lambda x: x.rolling(min_periods=1, 
                                                                         window=lag + 1).sum() - x))
                
                feature_team[pg_feature_name] = (feature_team[feature_name] / 
                                                 feature_team.groupby('team')[feature_team_name]
                                                 .apply(lambda x: x.rolling(min_periods=1, 
                                                                            window=lag + 1).count() - 1))
        
        df_new = df.merge(feature_team, 
                          on=['team', 'season', 'gw', 'kickoff_time', 'opponent_team'], 
                          how='left')
        
        df_new = df_new.merge(feature_team,
                 left_on=['team', 'season', 'gw', 'kickoff_time', 'opponent_team'],
                 right_on=['opponent_team', 'season', 'gw', 'kickoff_time', 'team'],
                 how='left',
                 suffixes = ('', '_opponent'))
        
        df_new.drop(['team_opponent', 'opponent_team_opponent'], axis=1)
        
        return df_new

For example, the following creates totals and per game (per 90 mins) averages for points going back 1, 2, 3, 4, 5, 10, 20 and all previous weeks. This is done at both player and team level.

This has been checked for points totals, but should also work for any other stat such as player/team goals scored, assists, player goals conceded. However, it will not currently work for team level stats such as team goals conceded where adding up the goals conceded across all the team players would be incorrect.

In [8]:
# create some lag features
lag_train_df = team_lag_features(train_df, ['total_points'], ['all', 1, 2, 3, 4, 5, 10, 20])
lag_train_df = player_lag_features(lag_train_df, ['total_points'], ['all', 1, 2, 3, 4, 5, 10, 20])

You can see below that the player's (Salah) historic point totals and per game totals are given, as well as the totals for his team (Liverpool) and whichever team he is playing in that gameweek (e.g. his debut was versus Watord on the 12th August 2017, so Watford's running point totals and per game totals are also given).

Note that if it is the first game since the start of the 2016/17 season for the team or opposition, then the point totals for previous games will be 0 and the per game totals will be null. If the player has not had any minutes in the previous number of games being calculated, then the point totals will also be 0, and per game totals null.

In [9]:
# look at resulting dataset for a player
lag_train_df[lag_train_df['player'] == 'Mohamed_Salah']

Unnamed: 0,player,gw,position,minutes,team,opponent_team,relative_market_value_team,relative_market_value_opponent_team,was_home,total_points,assists,bonus,bps,clean_sheets,creativity,goals_conceded,goals_scored,ict_index,influence,own_goals,penalties_missed,penalties_saved,red_cards,saves,selected,team_a_score,team_h_score,threat,transfers_balance,transfers_in,transfers_out,yellow_cards,kickoff_time,season,play_proba,relative_market_value_team_season,relative_market_value_opponent_team_season,total_points_team,total_points_team_last_all,total_points_team_pg_last_all,total_points_team_last_1,total_points_team_pg_last_1,total_points_team_last_2,total_points_team_pg_last_2,total_points_team_last_3,total_points_team_pg_last_3,total_points_team_last_4,total_points_team_pg_last_4,total_points_team_last_5,total_points_team_pg_last_5,total_points_team_last_10,total_points_team_pg_last_10,total_points_team_last_20,total_points_team_pg_last_20,team_opponent,opponent_team_opponent,total_points_team_opponent,total_points_team_last_all_opponent,total_points_team_pg_last_all_opponent,total_points_team_last_1_opponent,total_points_team_pg_last_1_opponent,total_points_team_last_2_opponent,total_points_team_pg_last_2_opponent,total_points_team_last_3_opponent,total_points_team_pg_last_3_opponent,total_points_team_last_4_opponent,total_points_team_pg_last_4_opponent,total_points_team_last_5_opponent,total_points_team_pg_last_5_opponent,total_points_team_last_10_opponent,total_points_team_pg_last_10_opponent,total_points_team_last_20_opponent,total_points_team_pg_last_20_opponent,minutes_last_all,minutes_last_1,minutes_last_2,minutes_last_3,minutes_last_4,minutes_last_5,minutes_last_10,minutes_last_20,total_points_last_all,total_points_pg_last_all,total_points_last_1,total_points_pg_last_1,total_points_last_2,total_points_pg_last_2,total_points_last_3,total_points_pg_last_3,total_points_last_4,total_points_pg_last_4,total_points_last_5,total_points_pg_last_5,total_points_last_10,total_points_pg_last_10,total_points_last_20,total_points_pg_last_20
24036,Mohamed_Salah,1,3,85,Liverpool,Watford,,,False,11,1,1,26,0,2.8,2,1,8.2,24.6,0,0,0,0,0,874608,3,3,55.0,0,0,0,0,2017-08-12T11:30:00Z,1718,,1.619155,0.547242,44,1863,49.026316,78.0,78.0,159.0,79.5,206.0,68.666667,266.0,66.50,292.0,58.4,510.0,51.0,916.0,45.80,Watford,Liverpool,43,1264,33.263158,15.0,15.0,41.0,20.5,63.0,21.000000,84.0,21.00,108.0,21.6,319.0,31.9,633.0,31.65,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,
24551,Mohamed_Salah,2,3,29,Liverpool,Crystal Palace,,,True,1,0,0,0,0,12.3,0,0,5.1,10.4,0,0,0,0,0,1293309,0,1,28.0,175914,193660,17746,0,2017-08-19T14:00:00Z,1718,,1.619155,0.635984,60,1907,48.897436,44.0,44.0,122.0,61.0,203.0,67.666667,250.0,62.50,310.0,62.0,522.0,52.2,901.0,45.05,Crystal Palace,Liverpool,29,1427,36.589744,15.0,15.0,34.0,17.0,116.0,38.666667,129.0,32.25,146.0,29.2,356.0,35.6,769.0,38.45,85,85.0,85.0,85.0,85.0,85.0,85.0,85.0,11,11.647059,11.0,11.647059,11.0,11.647059,11.0,11.647059,11.0,11.647059,11.0,11.647059,11.0,11.647059,11.0,11.647059
25076,Mohamed_Salah,3,3,90,Liverpool,Arsenal,,,True,11,1,0,39,1,25.3,0,1,19.9,70.4,0,0,0,0,0,1158692,0,4,103.0,-184736,27792,212528,0,2017-08-27T15:00:00Z,1718,,1.619155,2.073500,81,1967,49.175000,60.0,60.0,104.0,52.0,182.0,60.666667,263.0,65.75,310.0,62.0,530.0,53.0,931.0,46.55,Arsenal,Liverpool,12,1956,48.900000,26.0,26.0,78.0,39.0,128.0,42.666667,198.0,49.50,258.0,51.6,526.0,52.6,958.0,47.90,114,29.0,114.0,114.0,114.0,114.0,114.0,114.0,12,9.473684,1.0,3.103448,12.0,9.473684,12.0,9.473684,12.0,9.473684,12.0,9.473684,12.0,9.473684,12.0,9.473684
25614,Mohamed_Salah,4,3,45,Liverpool,Manchester City,,,False,1,0,0,4,0,13.8,1,0,4.7,7.8,0,0,0,0,0,1422941,0,5,25.0,177596,238283,60687,0,2017-09-09T11:30:00Z,1718,,1.619155,2.016093,8,2048,49.951220,81.0,81.0,141.0,70.5,185.0,61.666667,263.0,65.75,344.0,68.8,577.0,57.7,982.0,49.10,Manchester City,Liverpool,88,1985,48.414634,39.0,39.0,63.0,31.5,130.0,43.333333,220.0,55.00,279.0,55.8,578.0,57.8,1080.0,54.00,204,90.0,119.0,204.0,204.0,204.0,204.0,204.0,23,10.147059,11.0,11.000000,12.0,9.075630,23.0,10.147059,23.0,10.147059,23.0,10.147059,23.0,10.147059,23.0,10.147059
26160,Mohamed_Salah,5,3,90,Liverpool,Burnley,,,True,10,0,3,27,0,35.8,1,1,17.4,51.2,0,0,0,0,0,1571656,1,1,87.0,122769,224328,101559,0,2017-09-16T14:00:00Z,1718,,1.619155,0.316798,34,2056,48.952381,8.0,8.0,89.0,44.5,149.0,49.666667,193.0,48.25,271.0,54.2,545.0,54.5,952.0,47.60,Burnley,Liverpool,37,1540,36.666667,53.0,53.0,88.0,44.0,111.0,37.000000,154.0,38.50,181.0,36.2,347.0,34.7,736.0,36.80,249,45.0,135.0,164.0,249.0,249.0,249.0,249.0,24,8.674699,1.0,2.000000,12.0,8.000000,13.0,7.134146,24.0,8.674699,24.0,8.674699,24.0,8.674699,24.0,8.674699
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87220,Mohamed_Salah,34,3,90,Liverpool,Brighton and Hove Albion,2.432402,0.478857,False,18,1,3,47,0,24.6,1,2,24.9,91.4,0,0,0,0,0,2658414,3,1,133.0,-78275,15923,94198,0,2020-07-08T19:15:00Z,1920,1.0,2.297572,0.476156,52,8025,54.591837,74.0,74.0,87.0,43.5,178.0,59.333333,226.0,56.50,272.0,54.4,560.0,56.0,1206.0,60.30,Brighton and Hove Albion,Liverpool,28,3877,35.568807,63.0,63.0,83.0,41.5,125.0,41.666667,168.0,42.00,216.0,43.2,359.0,35.9,726.0,36.30,8672,90.0,180.0,270.0,270.0,360.0,810.0,1571.0,767,7.960101,6.0,6.000000,8.0,4.000000,19.0,6.333333,19.0,6.333333,28.0,7.000000,70.0,7.777778,133.0,7.619351
87873,Mohamed_Salah,35,3,90,Liverpool,Burnley,2.434670,0.358132,True,2,0,0,1,0,37.0,1,0,12.6,16.4,0,0,0,0,0,2758220,1,1,73.0,81427,110472,29045,0,2020-07-11T14:00:00Z,1920,1.0,2.297572,0.441799,43,8077,54.574324,52.0,52.0,126.0,63.0,139.0,46.333333,230.0,57.50,278.0,55.6,540.0,54.0,1211.0,60.55,Burnley,Liverpool,35,5576,37.675676,62.0,62.0,99.0,49.5,165.0,55.000000,225.0,56.25,237.0,47.4,472.0,47.2,821.0,41.05,8762,90.0,180.0,270.0,360.0,360.0,810.0,1593.0,785,8.063228,18.0,18.000000,24.0,12.000000,26.0,8.666667,37.0,9.250000,37.0,9.250000,74.0,8.222222,148.0,8.361582
88528,Mohamed_Salah,36,3,82,Liverpool,Arsenal,2.458682,1.500845,False,2,0,0,-3,0,3.3,2,0,9.1,7.4,0,0,0,0,0,2609146,1,2,80.0,-111419,24828,136247,0,2020-07-15T19:15:00Z,1920,1.0,2.297572,1.448866,31,8120,54.496644,43.0,43.0,95.0,47.5,169.0,56.333333,182.0,45.50,273.0,54.6,494.0,49.4,1191.0,59.55,Arsenal,Liverpool,38,6792,45.583893,27.0,27.0,61.0,30.5,127.0,42.333333,201.0,50.25,268.0,53.6,508.0,50.8,891.0,44.55,8852,90.0,180.0,270.0,360.0,450.0,810.0,1683.0,787,8.001582,2.0,2.000000,20.0,10.000000,26.0,8.666667,28.0,7.000000,39.0,7.800000,60.0,6.666667,150.0,8.021390
89191,Mohamed_Salah,37,3,78,Liverpool,Chelsea,2.451453,2.064820,True,5,1,0,15,0,16.8,3,0,9.5,25.6,0,0,0,0,0,2559844,3,5,53.0,-69649,24957,94606,0,2020-07-22T19:15:00Z,1920,1.0,2.297572,1.798870,64,8151,54.340000,31.0,31.0,74.0,37.0,126.0,42.000000,200.0,50.00,213.0,42.6,466.0,46.6,1148.0,57.40,Chelsea,Liverpool,33,7388,49.253333,62.0,62.0,82.0,41.0,129.0,43.000000,210.0,52.50,247.0,49.4,500.0,50.0,890.0,44.50,8934,82.0,172.0,262.0,352.0,442.0,802.0,1675.0,789,7.948287,2.0,2.195122,4.0,2.093023,22.0,7.557252,28.0,7.159091,30.0,6.108597,59.0,6.620948,139.0,7.468657


In [10]:
# summary with lag features added
lag_train_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 90437 entries, 0 to 90436
Data columns (total 97 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   player                                      90437 non-null  object 
 1   gw                                          90437 non-null  int64  
 2   position                                    90437 non-null  int64  
 3   minutes                                     90437 non-null  int64  
 4   team                                        90437 non-null  object 
 5   opponent_team                               90437 non-null  object 
 6   relative_market_value_team                  22501 non-null  float64
 7   relative_market_value_opponent_team         22501 non-null  float64
 8   was_home                                    90437 non-null  bool   
 9   total_points                                90437 non-null  int64  
 10  assists   

We can look at the points per game total for any player at any point in time. Here is Salah.

In [11]:
lag_train_df[(lag_train_df['season'] == '1920') & 
             (lag_train_df['gw'] == 38) & 
             (lag_train_df['player'] == 'Mohamed_Salah')]['total_points_pg_last_all'].mean()

7.9294274300932095

Here is a check that summing up and dividing all points and minutes to that point in time gives the same answer.

In [12]:
(train_df[:89771][train_df[:89771]['player'] == 'Mohamed_Salah']['total_points'].sum() * 90 
 / train_df[:89771][train_df[:89771]['player'] == 'Mohamed_Salah']['minutes'].sum())

7.9294274300932095

And we can use the same approach to see the average points per game (per 90 minutes) across all players. We'll use this in the simple baseline model.

In [13]:
# points per minute across all players and minutes
(train_df['total_points'].sum() * 90 / train_df['minutes'].sum())

3.7388996509104

But need to be somewhat aware that players with appearances with predominantly low number of minutes may have artificially high point per minute values due to the fact that they will get at least 1 point over 1-10 minutes of time

In [14]:
# extreme example of points per minute for all appearances under 10 minutes
(train_df[train_df['minutes'] < 10]['total_points'].sum() * 90 / train_df[train_df['minutes'] < 10]['minutes'].sum())

22.189395937747296

Any model is likely to perform worse at the start of the season due to new players / teams, and big changes within existing teams. It is also potentially better for games closer to the final game included in training.

We therefore need a sensisble way to combine these different situations into our validation of models. I decided to validate by looking at four points during the season (i.e. training the model four times, each up to one of these points), and assessing the RMSE for three different sets of predictions.

Validation points:
- Start of season
- 25% through season (gw 10)
- 50% through season (gw 20)
- 75% through season (gw 30)

Predictions used for RMSE
- gameweek following the last gameweek in training
- 2nd and 3rd gameweek
- 4th, 5th, 6th gameweek

The idea is that, when the model is trained each gameweek, we care most about the next few fixtures, and in particularly the next fixture.

In [45]:
# validation set indexes
# training will always be from start of data up to valid-start
def validation_gw_idx(season, gw):
    
    valid_start = train_df[(train_df['gw'] == gw) & (train_df['season'] == season)].index.min()
    valid_end_gw1 = train_df[(train_df['gw'] == gw) & (train_df['season'] == season)].index.max()
    valid_end_gw3 = train_df[(train_df['gw'] == min(gw+2, 38)) & (train_df['season'] == season)].index.max()
    valid_end_gw6 = train_df[(train_df['gw'] == min(gw+5, 38)) & (train_df['season'] == season)].index.max()
    
    return (valid_start, valid_end_gw1, valid_end_gw3, valid_end_gw6)

def validation_season_idx(season, gws):
    
    valid_idx_tups = []
    
    for gw in gws:
        valid_idx_tups.append(validation_gw_idx(season, gw))
    
    return valid_idx_tups        

In [46]:
validation_season_idx('1920', [1, 10, 20, 30])

[(67936, 68461, 69519, 71131),
 (72784, 73339, 74455, 76144),
 (78390, 78973, 80156, 82053),
 (84433, 85193, 86485, 88444)]

In [47]:
validation_gw_idx('1920', 1)

(67936, 68461, 69519, 71131)