### Training a Random Forest Regressor model
This section trains a random forest regressor that predicts each player's FPL points for the upcoming gameweek (GW). It is a single general model that uses position as one of the predictors. A possible next step is to train a separate model for each position so that position-specific features can be better considered.

Rolling game statistics are key to the model: for each player, they are computed over the previous three games (fewer at the start of a season, when less history is available) and used as predictor features.
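To make the windowing concrete, here is a minimal sketch on a toy series. The values and variable names are illustrative only. It uses the same `rolling` arguments as the cells below to show that the current game never contributes to its own feature.

```python
import pandas as pd

# toy points for five consecutive games by one player (illustrative values)
points = pd.Series([13, 5, 6, 17, 17])

# closed="left" excludes the current game, so only earlier games are averaged;
# min_periods=1 lets the average start as soon as one earlier game exists
rolling_mean = points.rolling(window=3, closed="left", min_periods=1).mean()
print(rolling_mean)
```

The first game has no history, so its rolling value is missing; the second game sees only the first, the third sees the first two, and from the fourth game onward the window holds exactly three earlier games.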

In [29]:
import pickle
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score, explained_variance_score

In [2]:
data = pd.read_csv("final_data_official.csv")
data

Unnamed: 0,name,club,element,fixture,opponent_team,total_points,was_home,kickoff_time,team_h_score,team_a_score,...,ict_index,value,transfers_balance,selected,transfers_in,transfers_out,position,season,opponent_team_name,result
0,Aaron Cresswell,West Ham United,454,10,4,0,False,2016-08-15 19:00:00+00:00,2.0,1.0,...,0.0,55,0,14023,0,0,Defender,2016/2017,Chelsea,L
1,Aaron Lennon,Everton,142,3,17,1,True,2016-08-13 14:00:00+00:00,1.0,1.0,...,0.9,60,0,13918,0,0,Midfielder,2016/2017,Tottenham Hotspur,D
2,Abdoulaye Doucouré,Watford,482,7,13,0,False,2016-08-13 14:00:00+00:00,1.0,1.0,...,0.0,50,0,1051,0,0,Midfielder,2016/2017,Southampton,D
3,Adam Forshaw,Middlesbrough,286,6,14,1,True,2016-08-13 14:00:00+00:00,1.0,1.0,...,0.3,45,0,2723,0,0,Midfielder,2016/2017,Stoke City,D
4,Adam Lallana,Liverpool,205,8,1,11,False,2016-08-14 15:00:00+00:00,3.0,4.0,...,14.2,70,0,155525,0,0,Midfielder,2016/2017,Arsenal,W
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
109075,Justin Hubner,Wolverhampton Wanderers,751,149,6,0,True,2023-12-05 19:30:00+00:00,1.0,0.0,...,0.0,40,344,555,375,31,Defender,2023/2024,Burnley,W
109076,Justin Hubner,Wolverhampton Wanderers,751,160,16,0,True,2023-12-09 15:00:00+00:00,1.0,1.0,...,0.0,40,329,1078,435,106,Defender,2023/2024,Nottingham Forest,D
109077,Justin Hubner,Wolverhampton Wanderers,751,170,19,0,False,2023-12-17 14:00:00+00:00,3.0,0.0,...,0.0,40,229,1583,418,189,Defender,2023/2024,West Ham United,L
109078,Justin Hubner,Wolverhampton Wanderers,751,180,7,0,True,2023-12-24 13:00:00+00:00,2.0,1.0,...,0.0,40,42,1763,197,155,Defender,2023/2024,Chelsea,W


In [3]:
# encoding categorical variables (club, opponent club, position, and result) as integers
club_encoder = LabelEncoder()
opp_encoder = LabelEncoder()
pos_encoder = LabelEncoder()
data["club_encoded"] = club_encoder.fit_transform(data["club"])
data["opponent_encoded"] = opp_encoder.fit_transform(data["opponent_team_name"])
data["position_encoded"] = pos_encoder.fit_transform(data["position"])
# there is inherent ordinality to results - W > D > L
result_mapping = {'W': 2, 'D': 1, 'L': 0}
data['result_encoded'] = data['result'].map(result_mapping)
data

Unnamed: 0,name,club,element,fixture,opponent_team,total_points,was_home,kickoff_time,team_h_score,team_a_score,...,transfers_in,transfers_out,position,season,opponent_team_name,result,club_encoded,opponent_encoded,position_encoded,result_encoded
0,Aaron Cresswell,West Ham United,454,10,4,0,False,2016-08-15 19:00:00+00:00,2.0,1.0,...,0,0,Defender,2016/2017,Chelsea,L,31,6,0,0
1,Aaron Lennon,Everton,142,3,17,1,True,2016-08-13 14:00:00+00:00,1.0,1.0,...,0,0,Midfielder,2016/2017,Tottenham Hotspur,D,9,27,3,1
2,Abdoulaye Doucouré,Watford,482,7,13,0,False,2016-08-13 14:00:00+00:00,1.0,1.0,...,0,0,Midfielder,2016/2017,Southampton,D,29,23,3,1
3,Adam Forshaw,Middlesbrough,286,6,14,1,True,2016-08-13 14:00:00+00:00,1.0,1.0,...,0,0,Midfielder,2016/2017,Stoke City,D,19,24,3,1
4,Adam Lallana,Liverpool,205,8,1,11,False,2016-08-14 15:00:00+00:00,3.0,4.0,...,0,0,Midfielder,2016/2017,Arsenal,W,15,0,3,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
109075,Justin Hubner,Wolverhampton Wanderers,751,149,6,0,True,2023-12-05 19:30:00+00:00,1.0,0.0,...,375,31,Defender,2023/2024,Burnley,W,32,5,0,2
109076,Justin Hubner,Wolverhampton Wanderers,751,160,16,0,True,2023-12-09 15:00:00+00:00,1.0,1.0,...,435,106,Defender,2023/2024,Nottingham Forest,D,32,21,0,1
109077,Justin Hubner,Wolverhampton Wanderers,751,170,19,0,False,2023-12-17 14:00:00+00:00,3.0,0.0,...,418,189,Defender,2023/2024,West Ham United,L,32,30,0,0
109078,Justin Hubner,Wolverhampton Wanderers,751,180,7,0,True,2023-12-24 13:00:00+00:00,2.0,1.0,...,197,155,Defender,2023/2024,Chelsea,W,32,6,0,2
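As a small illustration of what `LabelEncoder` does, the sketch below encodes a toy list of positions (the series is made up for the example). It shows that the integer codes follow the alphabetical order of the labels, so unlike `result_encoded` they carry no meaningful ranking, and that the original labels can be recovered with `inverse_transform`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# toy data, not taken from the dataset
positions = pd.Series(["Defender", "Midfielder", "Forward", "Goalkeeper", "Defender"])

encoder = LabelEncoder()
codes = encoder.fit_transform(positions)

# classes_ is sorted, so each code is the position of its label in this array
print(list(encoder.classes_))
print(codes.tolist())

# inverse_transform maps the integer codes back to the original labels
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Keeping the fitted encoders around (as the cell above does with `club_encoder`, `opp_encoder` and `pos_encoder`) is what makes it possible to decode the features later.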


In [4]:
# to compute rolling statistics, we group the dataset by player
players_grp = data.groupby(["name"])
haaland = players_grp.get_group("Erling Haaland")
haaland[["ict_index", "selected"]]

Unnamed: 0,ict_index,selected
69231,14.0,3398599
69802,5.5,5226268
70381,12.5,5676608
70971,19.4,5967805
71570,18.9,6793855
72174,13.5,7781487
72768,6.3,8407273
73233,30.5,8548463
73868,12.8,8905419
74505,15.4,8989666


In [5]:
# function that computes and assigns rolling averages of `cols` for each group (each player)
def rolling_averages(group, cols, new_cols):
    # sort all data by kickoff time, earliest to latest (ascending)
    group = group.sort_values(["kickoff_time"])
    # a mask that identifies where the season changes
    season_mask = group["season"] != group["season"].shift(1)
    # a cumulative count of season changes
    change_count = season_mask.cumsum()
    # rolling averages separately for each season - window resets if season changes
    group[new_cols] = (
        group.groupby(change_count)[cols]
        .rolling(window=3, closed="left", min_periods=1)
        .mean()
        .reset_index(level=0, drop=True)
    )
    group = group.dropna(subset=new_cols)
    return group
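The season reset in `rolling_averages` can be hard to picture, so here is a minimal sketch of the same pattern on a toy frame with a single statistic. The data are invented for illustration. It shows how the cumulative sum of the season-change mask labels each season block, and how the rolling window restarts inside each block.

```python
import pandas as pd

# toy data: three games in one season, then two in the next
toy = pd.DataFrame({
    "season": ["2021/2022"] * 3 + ["2022/2023"] * 2,
    "total_points": [2, 4, 6, 10, 20],
})

# True on the first row of each season; the cumulative sum labels each season block
block_id = (toy["season"] != toy["season"].shift(1)).cumsum()

toy["total_points_rolling"] = (
    toy.groupby(block_id)["total_points"]
    .rolling(window=3, closed="left", min_periods=1)
    .mean()
    .reset_index(level=0, drop=True)  # drop the block label, keep the original row index
)
print(toy)
```

The first game of each season ends up with a missing rolling value, which is why the function finishes by dropping rows with `NaN` in the new columns.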

In [6]:
# earlier version, kept for reference: the window does not reset when the season changes
def rolling_averages_old(group, cols, new_cols):
    # sort data by date, ascending
    group = group.sort_values(["kickoff_time"])
    # compute rolling average - note that `closed = left` means that future data isn't used
    rolling_stats = group[cols].rolling(window = 3, closed = "left", min_periods=1).mean()
    # assign the rolling statistics back to the original dataframe
    group[new_cols] = rolling_stats
    # drop the observations containing missing values to prevent propagation of NaN values
    group = group.dropna(subset = new_cols)
    return group

In [7]:
data.columns

Index(['name', 'club', 'element', 'fixture', 'opponent_team', 'total_points',
       'was_home', 'kickoff_time', 'team_h_score', 'team_a_score', 'round',
       'minutes', 'goals_scored', 'assists', 'clean_sheets', 'goals_conceded',
       'own_goals', 'penalties_saved', 'penalties_missed', 'yellow_cards',
       'red_cards', 'saves', 'bonus', 'bps', 'influence', 'creativity',
       'threat', 'ict_index', 'value', 'transfers_balance', 'selected',
       'transfers_in', 'transfers_out', 'position', 'season',
       'opponent_team_name', 'result', 'club_encoded', 'opponent_encoded',
       'position_encoded', 'result_encoded'],
      dtype='object')

In [8]:
# the cols we want to compute rolling averages for 
cols = ["assists", "clean_sheets", "goals_scored", "goals_conceded", "ict_index", "bps", 
        "minutes", "red_cards", "saves", "selected", "transfers_balance", 
        "result_encoded", "value", "total_points"]
# append "_rolling" onto each column name
new_cols = [f"{col}_rolling" for col in cols]

In [9]:
rolling_averages(haaland, cols, new_cols).head(20)

Unnamed: 0,name,club,element,fixture,opponent_team,total_points,was_home,kickoff_time,team_h_score,team_a_score,...,ict_index_rolling,bps_rolling,minutes_rolling,red_cards_rolling,saves_rolling,selected_rolling,transfers_balance_rolling,result_encoded_rolling,value_rolling,total_points_rolling
69802,Erling Haaland,Manchester City,318,17,3,5,True,2022-08-13 14:00:00+00:00,4.0,0.0,...,14.0,48.0,77.0,0.0,0.0,3398599.0,0.0,2.0,115.0,13.0
70381,Erling Haaland,Manchester City,318,28,15,6,False,2022-08-21 15:30:00+00:00,3.0,3.0,...,9.75,32.0,75.0,0.0,0.0,4312434.0,556968.5,2.0,115.5,9.0
70971,Erling Haaland,Manchester City,318,37,7,17,True,2022-08-27 14:00:00+00:00,4.0,2.0,...,10.666667,30.666667,80.0,0.0,0.0,4767158.0,447084.666667,1.666667,116.0,8.0
71570,Erling Haaland,Manchester City,318,49,16,17,True,2022-08-31 18:30:00+00:00,6.0,0.0,...,12.466667,39.0,82.0,0.0,0.0,5623560.0,480293.0,1.666667,116.666667,9.333333
72174,Erling Haaland,Manchester City,318,51,2,9,False,2022-09-03 16:30:00+00:00,1.0,1.0,...,16.933333,60.666667,80.333333,0.0,0.0,6146089.0,317286.0,1.666667,117.333333,13.333333
72768,Erling Haaland,Manchester City,318,80,20,6,False,2022-09-17 11:30:00+00:00,0.0,3.0,...,17.266667,62.0,80.333333,0.0,0.0,6847716.0,502273.333333,1.666667,118.0,14.333333
73233,Erling Haaland,Manchester City,318,88,14,23,True,2022-10-02 13:00:00+00:00,6.0,3.0,...,12.9,47.333333,82.666667,0.0,0.0,7660872.0,514079.0,1.666667,119.0,10.666667
73868,Erling Haaland,Manchester City,318,97,17,6,True,2022-10-08 14:00:00+00:00,4.0,0.0,...,16.766667,54.333333,90.0,0.0,0.0,8245741.0,337170.333333,1.666667,120.0,12.666667
74505,Erling Haaland,Manchester City,318,106,12,2,False,2022-10-16 15:30:00+00:00,1.0,0.0,...,16.533333,51.333333,90.0,0.0,0.0,8620385.0,167291.666667,2.0,121.0,11.666667
75727,Erling Haaland,Manchester City,318,125,5,13,True,2022-10-22 14:00:00+00:00,3.0,1.0,...,19.566667,41.0,90.0,0.0,0.0,8814516.0,135821.666667,1.333333,121.666667,10.333333


In [10]:
player_rolling = data.groupby("name").apply(lambda x: rolling_averages(x, cols, new_cols))

In [11]:
player_rolling

name,Unnamed: 1,name,club,element,fixture,opponent_team,total_points,was_home,kickoff_time,team_h_score,team_a_score,...,ict_index_rolling,bps_rolling,minutes_rolling,red_cards_rolling,saves_rolling,selected_rolling,transfers_balance_rolling,result_encoded_rolling,value_rolling,total_points_rolling
Aaron Connolly,20124,Aaron Connolly,Brighton & Hove Albion,78,16,14,8,False,2020-09-20 13:00:00+00:00,0.0,3.0,...,3.400000,-3.000000,45.000000,0.0,0.0,32205.000000,0.000000,0.000000,55.0,1.000000
Aaron Connolly,20660,Aaron Connolly,Brighton & Hove Albion,78,19,13,2,True,2020-09-26 11:30:00+00:00,2.0,3.0,...,5.150000,12.000000,67.000000,0.0,0.0,33617.500000,-580.500000,1.000000,55.0,4.500000
Aaron Connolly,21211,Aaron Connolly,Brighton & Hove Albion,78,32,7,2,False,2020-10-03 14:00:00+00:00,4.0,2.0,...,4.066667,8.666667,69.000000,0.0,0.0,40863.666667,4121.666667,0.666667,55.0,3.666667
Aaron Connolly,21770,Aaron Connolly,Brighton & Hove Albion,78,40,6,4,False,2020-10-18 13:00:00+00:00,1.0,1.0,...,3.166667,12.000000,75.666667,0.0,0.0,48503.666667,3684.666667,0.666667,55.0,4.000000
Aaron Connolly,22354,Aaron Connolly,Brighton & Hove Albion,78,51,18,0,True,2020-10-26 17:30:00+00:00,1.0,1.0,...,1.866667,7.333333,50.000000,0.0,0.0,52418.333333,1074.333333,0.333333,55.0,2.666667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Ørjan Nyland,41188,Ørjan Nyland,Aston Villa,35,340,13,0,True,2021-05-09 13:05:00+00:00,1.0,3.0,...,0.000000,0.000000,0.000000,0.0,0.0,281394.333333,-2122.333333,1.000000,40.0,0.000000
Ørjan Nyland,41189,Ørjan Nyland,Aston Villa,35,180,7,0,True,2021-05-13 17:00:00+00:00,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.0,278999.000000,-2346.666667,1.000000,40.0,0.000000
Ørjan Nyland,42080,Ørjan Nyland,Aston Villa,35,355,6,0,False,2021-05-16 11:00:00+00:00,3.0,2.0,...,0.000000,0.000000,0.000000,0.0,0.0,277527.000000,-2471.000000,1.000000,40.0,0.000000
Ørjan Nyland,42690,Ørjan Nyland,Aston Villa,35,367,17,0,False,2021-05-19 17:00:00+00:00,1.0,2.0,...,0.000000,0.000000,0.000000,0.0,0.0,276053.000000,-2426.000000,0.333333,40.0,0.000000


In [12]:
player_rolling = player_rolling.droplevel("name")
player_rolling

Unnamed: 0,name,club,element,fixture,opponent_team,total_points,was_home,kickoff_time,team_h_score,team_a_score,...,ict_index_rolling,bps_rolling,minutes_rolling,red_cards_rolling,saves_rolling,selected_rolling,transfers_balance_rolling,result_encoded_rolling,value_rolling,total_points_rolling
20124,Aaron Connolly,Brighton & Hove Albion,78,16,14,8,False,2020-09-20 13:00:00+00:00,0.0,3.0,...,3.400000,-3.000000,45.000000,0.0,0.0,32205.000000,0.000000,0.000000,55.0,1.000000
20660,Aaron Connolly,Brighton & Hove Albion,78,19,13,2,True,2020-09-26 11:30:00+00:00,2.0,3.0,...,5.150000,12.000000,67.000000,0.0,0.0,33617.500000,-580.500000,1.000000,55.0,4.500000
21211,Aaron Connolly,Brighton & Hove Albion,78,32,7,2,False,2020-10-03 14:00:00+00:00,4.0,2.0,...,4.066667,8.666667,69.000000,0.0,0.0,40863.666667,4121.666667,0.666667,55.0,3.666667
21770,Aaron Connolly,Brighton & Hove Albion,78,40,6,4,False,2020-10-18 13:00:00+00:00,1.0,1.0,...,3.166667,12.000000,75.666667,0.0,0.0,48503.666667,3684.666667,0.666667,55.0,4.000000
22354,Aaron Connolly,Brighton & Hove Albion,78,51,18,0,True,2020-10-26 17:30:00+00:00,1.0,1.0,...,1.866667,7.333333,50.000000,0.0,0.0,52418.333333,1074.333333,0.333333,55.0,2.666667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41188,Ørjan Nyland,Aston Villa,35,340,13,0,True,2021-05-09 13:05:00+00:00,1.0,3.0,...,0.000000,0.000000,0.000000,0.0,0.0,281394.333333,-2122.333333,1.000000,40.0,0.000000
41189,Ørjan Nyland,Aston Villa,35,180,7,0,True,2021-05-13 17:00:00+00:00,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.0,278999.000000,-2346.666667,1.000000,40.0,0.000000
42080,Ørjan Nyland,Aston Villa,35,355,6,0,False,2021-05-16 11:00:00+00:00,3.0,2.0,...,0.000000,0.000000,0.000000,0.0,0.0,277527.000000,-2471.000000,1.000000,40.0,0.000000
42690,Ørjan Nyland,Aston Villa,35,367,17,0,False,2021-05-19 17:00:00+00:00,1.0,2.0,...,0.000000,0.000000,0.000000,0.0,0.0,276053.000000,-2426.000000,0.333333,40.0,0.000000


In [13]:
player_rolling.index = range(player_rolling.shape[0])
player_rolling

Unnamed: 0,name,club,element,fixture,opponent_team,total_points,was_home,kickoff_time,team_h_score,team_a_score,...,ict_index_rolling,bps_rolling,minutes_rolling,red_cards_rolling,saves_rolling,selected_rolling,transfers_balance_rolling,result_encoded_rolling,value_rolling,total_points_rolling
0,Aaron Connolly,Brighton & Hove Albion,78,16,14,8,False,2020-09-20 13:00:00+00:00,0.0,3.0,...,3.400000,-3.000000,45.000000,0.0,0.0,32205.000000,0.000000,0.000000,55.0,1.000000
1,Aaron Connolly,Brighton & Hove Albion,78,19,13,2,True,2020-09-26 11:30:00+00:00,2.0,3.0,...,5.150000,12.000000,67.000000,0.0,0.0,33617.500000,-580.500000,1.000000,55.0,4.500000
2,Aaron Connolly,Brighton & Hove Albion,78,32,7,2,False,2020-10-03 14:00:00+00:00,4.0,2.0,...,4.066667,8.666667,69.000000,0.0,0.0,40863.666667,4121.666667,0.666667,55.0,3.666667
3,Aaron Connolly,Brighton & Hove Albion,78,40,6,4,False,2020-10-18 13:00:00+00:00,1.0,1.0,...,3.166667,12.000000,75.666667,0.0,0.0,48503.666667,3684.666667,0.666667,55.0,4.000000
4,Aaron Connolly,Brighton & Hove Albion,78,51,18,0,True,2020-10-26 17:30:00+00:00,1.0,1.0,...,1.866667,7.333333,50.000000,0.0,0.0,52418.333333,1074.333333,0.333333,55.0,2.666667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
105561,Ørjan Nyland,Aston Villa,35,340,13,0,True,2021-05-09 13:05:00+00:00,1.0,3.0,...,0.000000,0.000000,0.000000,0.0,0.0,281394.333333,-2122.333333,1.000000,40.0,0.000000
105562,Ørjan Nyland,Aston Villa,35,180,7,0,True,2021-05-13 17:00:00+00:00,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.0,278999.000000,-2346.666667,1.000000,40.0,0.000000
105563,Ørjan Nyland,Aston Villa,35,355,6,0,False,2021-05-16 11:00:00+00:00,3.0,2.0,...,0.000000,0.000000,0.000000,0.0,0.0,277527.000000,-2471.000000,1.000000,40.0,0.000000
105564,Ørjan Nyland,Aston Villa,35,367,17,0,False,2021-05-19 17:00:00+00:00,1.0,2.0,...,0.000000,0.000000,0.000000,0.0,0.0,276053.000000,-2426.000000,0.333333,40.0,0.000000
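The index clean-up in the last few cells (apply per player, drop the `name` level, renumber the rows) could also be written in one chain. The sketch below shows that variant on a toy frame; `add_previous_points` is a hypothetical stand-in for `rolling_averages`, and the data are invented. How `apply` treats the grouping column differs between pandas versions, so treat this as a sketch rather than a drop-in replacement.

```python
import pandas as pd

# toy data, not taken from the dataset
toy = pd.DataFrame({
    "name": ["B", "A", "B", "A"],
    "total_points": [1, 2, 3, 4],
})

def add_previous_points(group):
    # hypothetical stand-in for rolling_averages: previous game's points per player
    group = group.copy()
    group["prev_points"] = group["total_points"].shift(1)
    return group.dropna(subset=["prev_points"])

# group_keys=False keeps the player name out of the index, so no droplevel is needed;
# reset_index(drop=True) then renumbers the rows from 0
flat = (
    toy.groupby("name", group_keys=False)
    .apply(add_previous_points)
    .reset_index(drop=True)
)
print(flat)
```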


In [14]:
player_rolling = player_rolling.drop_duplicates()
player_rolling

Unnamed: 0,name,club,element,fixture,opponent_team,total_points,was_home,kickoff_time,team_h_score,team_a_score,...,ict_index_rolling,bps_rolling,minutes_rolling,red_cards_rolling,saves_rolling,selected_rolling,transfers_balance_rolling,result_encoded_rolling,value_rolling,total_points_rolling
0,Aaron Connolly,Brighton & Hove Albion,78,16,14,8,False,2020-09-20 13:00:00+00:00,0.0,3.0,...,3.400000,-3.000000,45.000000,0.0,0.0,32205.000000,0.000000,0.000000,55.0,1.000000
1,Aaron Connolly,Brighton & Hove Albion,78,19,13,2,True,2020-09-26 11:30:00+00:00,2.0,3.0,...,5.150000,12.000000,67.000000,0.0,0.0,33617.500000,-580.500000,1.000000,55.0,4.500000
2,Aaron Connolly,Brighton & Hove Albion,78,32,7,2,False,2020-10-03 14:00:00+00:00,4.0,2.0,...,4.066667,8.666667,69.000000,0.0,0.0,40863.666667,4121.666667,0.666667,55.0,3.666667
3,Aaron Connolly,Brighton & Hove Albion,78,40,6,4,False,2020-10-18 13:00:00+00:00,1.0,1.0,...,3.166667,12.000000,75.666667,0.0,0.0,48503.666667,3684.666667,0.666667,55.0,4.000000
4,Aaron Connolly,Brighton & Hove Albion,78,51,18,0,True,2020-10-26 17:30:00+00:00,1.0,1.0,...,1.866667,7.333333,50.000000,0.0,0.0,52418.333333,1074.333333,0.333333,55.0,2.666667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
105561,Ørjan Nyland,Aston Villa,35,340,13,0,True,2021-05-09 13:05:00+00:00,1.0,3.0,...,0.000000,0.000000,0.000000,0.0,0.0,281394.333333,-2122.333333,1.000000,40.0,0.000000
105562,Ørjan Nyland,Aston Villa,35,180,7,0,True,2021-05-13 17:00:00+00:00,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.0,278999.000000,-2346.666667,1.000000,40.0,0.000000
105563,Ørjan Nyland,Aston Villa,35,355,6,0,False,2021-05-16 11:00:00+00:00,3.0,2.0,...,0.000000,0.000000,0.000000,0.0,0.0,277527.000000,-2471.000000,1.000000,40.0,0.000000
105564,Ørjan Nyland,Aston Villa,35,367,17,0,False,2021-05-19 17:00:00+00:00,1.0,2.0,...,0.000000,0.000000,0.000000,0.0,0.0,276053.000000,-2426.000000,0.333333,40.0,0.000000


In [15]:
player_rolling.to_csv("player_rolling_final.csv", index=False)

In [16]:
def make_predictions(data, predictors, model, target="total_points", cutoff="2023-01-01"):
    # split data chronologically: train before the cutoff, test on or after it
    train = data[data["kickoff_time"] < cutoff]
    test = data[data["kickoff_time"] >= cutoff]
    # fit the model (e.g. a RandomForestRegressor) on the training rows
    model.fit(train[predictors], train[target])
    # create predictions
    preds = model.predict(test[predictors])
    # combine predictions with actual results
    combined = pd.DataFrame(dict(actual = test[target], prediction = preds), index = test.index)
    # mean absolute error (precision is a classification metric and does not apply to a regressor)
    mae = mean_absolute_error(test[target], preds)
    return combined, mae

In [17]:
player_rolling.columns

Index(['name', 'club', 'element', 'fixture', 'opponent_team', 'total_points',
       'was_home', 'kickoff_time', 'team_h_score', 'team_a_score', 'round',
       'minutes', 'goals_scored', 'assists', 'clean_sheets', 'goals_conceded',
       'own_goals', 'penalties_saved', 'penalties_missed', 'yellow_cards',
       'red_cards', 'saves', 'bonus', 'bps', 'influence', 'creativity',
       'threat', 'ict_index', 'value', 'transfers_balance', 'selected',
       'transfers_in', 'transfers_out', 'position', 'season',
       'opponent_team_name', 'result', 'club_encoded', 'opponent_encoded',
       'position_encoded', 'result_encoded', 'assists_rolling',
       'clean_sheets_rolling', 'goals_scored_rolling',
       'goals_conceded_rolling', 'ict_index_rolling', 'bps_rolling',
       'minutes_rolling', 'red_cards_rolling', 'saves_rolling',
       'selected_rolling', 'transfers_balance_rolling',
       'result_encoded_rolling', 'value_rolling', 'total_points_rolling'],
      dtype='object')

In [18]:
features = ["position_encoded", "club_encoded", "was_home", "opponent_encoded"] + new_cols
target = ["total_points"]

In [19]:
features

['position_encoded',
 'club_encoded',
 'was_home',
 'opponent_encoded',
 'assists_rolling',
 'clean_sheets_rolling',
 'goals_scored_rolling',
 'goals_conceded_rolling',
 'ict_index_rolling',
 'bps_rolling',
 'minutes_rolling',
 'red_cards_rolling',
 'saves_rolling',
 'selected_rolling',
 'transfers_balance_rolling',
 'result_encoded_rolling',
 'value_rolling',
 'total_points_rolling']

In [20]:
# splitting the dataset into training and testing sets by season (first four seasons train, the rest test)
seasons = np.unique(player_rolling["season"])
train_seasons = seasons[:4]
test_seasons = seasons[4:]
train = player_rolling[player_rolling["season"].isin(train_seasons)]
test = player_rolling[player_rolling["season"].isin(test_seasons)]
train_seasons, test_seasons, train.shape, test.shape

(array(['2016/2017', '2017/2018', '2020/2021', '2021/2022'], dtype=object),
 array(['2022/2023', '2023/2024'], dtype=object),
 (67168, 55),
 (38398, 55))
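The split above keeps whole seasons together, so the model is always evaluated on seasons later than the ones it was trained on. A minimal sketch of the same idea as a reusable function follows; `split_by_season` and the toy frame are hypothetical names introduced only for this example.

```python
import numpy as np
import pandas as pd

def split_by_season(df, n_train_seasons):
    # np.unique returns sorted labels; "YYYY/YYYY" strings sort chronologically
    seasons = np.unique(df["season"])
    train_seasons = seasons[:n_train_seasons]
    test_seasons = seasons[n_train_seasons:]
    train = df[df["season"].isin(train_seasons)]
    test = df[df["season"].isin(test_seasons)]
    return train, test

# toy data, deliberately out of order
toy = pd.DataFrame({
    "season": ["2021/2022", "2022/2023", "2020/2021", "2022/2023"],
    "total_points": [2, 6, 1, 9],
})
toy_train, toy_test = split_by_season(toy, n_train_seasons=2)

# no season appears on both sides of the split
print(sorted(set(toy_train["season"])), sorted(set(toy_test["season"])))
```

Splitting on season rather than on random rows avoids training on games that happen after the ones being predicted.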

In [21]:
rf_reg = RandomForestRegressor(max_depth=10, min_samples_leaf=12, min_samples_split=30, n_estimators=600, n_jobs=-1, random_state=21)
rf_reg.fit(train[features], np.ravel(train[target]))
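`GridSearchCV` is imported at the top of the notebook but not used in this section. As a rough sketch of how hyperparameters such as `max_depth` and `min_samples_leaf` could be searched, the example below runs a small grid on synthetic data. The grid values, the scoring choice and the number of folds are assumptions made for the example, not the settings that produced the model above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# synthetic stand-in for train[features] and train[target]
rng = np.random.default_rng(21)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

# hypothetical grid; the values are illustrative only
param_grid = {"max_depth": [3, 10], "min_samples_leaf": [1, 12]}

search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=21),
    param_grid,
    scoring="neg_mean_absolute_error",  # higher (closer to zero) is better
    cv=3,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

On the real data, plain k-fold splitting would mix seasons across folds; a season-aware splitter would be closer in spirit to the train/test split used here.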

In [22]:
predictions = rf_reg.predict(test[features])

In [23]:
mae = mean_absolute_error(test[target], predictions)
mse = mean_squared_error(test[target], predictions)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE (the `squared` argument is deprecated in recent scikit-learn releases)
r2 = r2_score(test[target], predictions)
explained_variance = explained_variance_score(test[target], predictions)

print(f'MAE: {mae}')
print(f'MSE: {mse}')
print(f'RMSE: {rmse}')
print(f'R2 Score: {r2}')
print(f'Explained Variance: {explained_variance}')

MAE: 1.0415005594073288
MSE: 3.8733586128669972
RMSE: 1.968085011595535
R2 Score: 0.2937093937261521
Explained Variance: 0.29375751060741095
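For reference, the metrics printed above follow directly from their definitions. The sketch below computes them by hand on a few toy values (invented for the example) and checks them against scikit-learn. Read this way, an R2 of about 0.29 means the model's squared error is roughly 71% of the squared error obtained by always predicting the mean of the test targets.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# toy targets and predictions
y_true = np.array([0.0, 2.0, 6.0, 1.0])
y_pred = np.array([0.5, 1.5, 4.0, 1.0])

errors = y_true - y_pred
mae_manual = np.mean(np.abs(errors))   # average absolute error
mse_manual = np.mean(errors ** 2)      # average squared error
rmse_manual = np.sqrt(mse_manual)      # back on the scale of the target
# 1 minus (model squared error / squared error of predicting the mean)
r2_manual = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae_manual, mse_manual, rmse_manual, r2_manual)
print(np.isclose(mae_manual, mean_absolute_error(y_true, y_pred)))
print(np.isclose(mse_manual, mean_squared_error(y_true, y_pred)))
print(np.isclose(r2_manual, r2_score(y_true, y_pred)))
```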


In [24]:
predictions

array([0.09743383, 0.09761317, 0.10202158, ..., 2.19249591, 1.86768639,
       1.01766534])

In [116]:
with open("secondary_rf_reg_model.pickle", "wb") as output:
    pickle.dump(rf_reg, output)
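Loading the model back mirrors the dump call. The sketch below does a full round trip with a small toy model and a temporary file (the data and the file name `toy_rf_model.pickle` are made up for the example) and checks that the restored model gives the same predictions.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# small toy model standing in for rf_reg
rng = np.random.default_rng(21)
X = rng.normal(size=(50, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=50)
model = RandomForestRegressor(n_estimators=10, random_state=21).fit(X, y)

# write to a temporary path so the example leaves the working directory untouched
path = os.path.join(tempfile.mkdtemp(), "toy_rf_model.pickle")
with open(path, "wb") as output:
    pickle.dump(model, output)

# read the model back and confirm it predicts identically
with open(path, "rb") as source:
    restored = pickle.load(source)

print(np.allclose(model.predict(X), restored.predict(X)))
```

Pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is best loaded with the same scikit-learn version that saved it.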

In [25]:
importances = dict(zip(features, rf_reg.feature_importances_))
sorted(importances.items(), key=lambda x:x[1], reverse=True)

[('minutes_rolling', 0.43815972922700475),
 ('ict_index_rolling', 0.22112648333705545),
 ('value_rolling', 0.08667050261902932),
 ('total_points_rolling', 0.062035681366624906),
 ('selected_rolling', 0.04910096804026523),
 ('transfers_balance_rolling', 0.035139627352461913),
 ('opponent_encoded', 0.02886475880528642),
 ('bps_rolling', 0.02323929960650149),
 ('club_encoded', 0.01625004371738621),
 ('result_encoded_rolling', 0.00913422525027533),
 ('goals_conceded_rolling', 0.007904847131841973),
 ('was_home', 0.0061767239851049155),
 ('saves_rolling', 0.005503704702499795),
 ('position_encoded', 0.0037833102750960005),
 ('clean_sheets_rolling', 0.0030571205859865167),
 ('assists_rolling', 0.002259111654604019),
 ('goals_scored_rolling', 0.0014732807167991062),
 ('red_cards_rolling', 0.00012058162617679816)]
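The values in `feature_importances_` are impurity-based and computed on the training data, which can overstate features with many distinct values. A common complement is permutation importance on held-out data. The sketch below shows the call on synthetic data, where only one feature is informative by construction; the feature names and data are invented for the example, and nothing here was run on the FPL model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# synthetic data: only column 0 drives the target
rng = np.random.default_rng(21)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=300)
names = ["informative", "noise_a", "noise_b"]

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]
model = RandomForestRegressor(n_estimators=50, random_state=21).fit(X_train, y_train)

# shuffle each column of the held-out set in turn and measure the drop in score
result = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=21)
ranked = sorted(zip(names, result.importances_mean), key=lambda x: x[1], reverse=True)
print(ranked)
```

Applied to this notebook, the same call would take `rf_reg`, `test[features]` and `np.ravel(test[target])` in place of the toy objects.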