<div class="alert alert-danger">
    <h4 style="font-weight: bold; font-size: 28px;">Feature Selection using stepwise algorithms</h4>
    <p style="font-size: 20px;">NBA API Data (2022-2024)</p>
</div>

<a name="Feature-Selection"></a>

# Table of Contents

[Setup](#Setup)

[Data](#Data)

**[1. Stepwise for Total Points](#1.-Stepwise-for-Total-Points)**

**[2. Stepwise for Plus Minus](#2.-Stepwise-for-Plus_Minus)**

**[3. Stepwise for Game Winner](#3.-Stepwise-for-Game-Winner)**

# Setup

[Return to top](#Feature-Selection)

In [1]:
import sys
from pathlib import Path
# get current working directory
cwd = %pwd
# add shared_code directory to Python sys.path
sys.path.append(str(Path(cwd).parent / "shared_code"))
# import all libraries in shared_code directory 'imports.py' file
from imports import *
%matplotlib inline



# Data

[Return to top](#Feature-Selection)

In [2]:
# load, filter (by time) and scale data
pts_scaled_df, pm_scaled_df, res_scaled_df, test_set_obs = utl.load_and_scale_data(
    file_path='../../data/processed/nba_team_matchups_rolling_box_scores_2022_2024_r05.csv',
    seasons_to_keep=['2021-22', '2022-23', '2023-24'],
    training_season='2021-22',
    feature_prefix='ROLL_',
    scaler_type='minmax', 
    scale_target=False
)

Season 2021-22: 1186 games
Season 2022-23: 1181 games
Season 2023-24: 692 games
Total number of games across sampled seasons: 3059 games
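The internals of `utl.load_and_scale_data` are not shown in this notebook. As a minimal sketch, assuming that `scaler_type='minmax'` together with `training_season` means the minimum and range are learned from the training season only and then reused for later seasons, the scaling step might look like the following. The helper names `fit_minmax` and `apply_minmax` and the sample values are hypothetical.

```python
import numpy as np

def fit_minmax(train):
    """Learn per-column minimum and range from the training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def apply_minmax(values, col_min, col_range):
    """Scale values using statistics learned elsewhere (no refitting)."""
    return (values - col_min) / col_range

# Hypothetical rolling averages: rows are games, columns are two features.
train = np.array([[100.0, 10.0], [110.0, 14.0], [120.0, 12.0]])  # training season
later = np.array([[125.0, 9.0], [105.0, 13.0]])                  # a later season

col_min, col_range = fit_minmax(train)
train_scaled = apply_minmax(train, col_min, col_range)
later_scaled = apply_minmax(later, col_min, col_range)
print(train_scaled)
print(later_scaled)
```

Under this assumption the training season always spans exactly [0, 1], while later seasons can fall slightly outside that interval. This avoids leaking information from the test seasons into the scaler.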


<a name="1.-Stepwise-for-Total-Points"></a>
# 1. Stepwise for Total Points

[Return to top](#Feature-Selection)

In [3]:
# automated feature selection
pts_scaled_df_selected = utl.sequential_feature_selection(
    df=pts_scaled_df, 
    outcome_name='TOTAL_PTS', 
    estimator=LinearRegression(), 
    forward=True
)

pts_scaled_df_selected.head()

There were 9 features selected out of 36 original features



| GAME_DATE | ROLL_HOME_PTS | ROLL_HOME_FG3M | ROLL_HOME_FTA | ROLL_HOME_REB | ROLL_HOME_AST | ROLL_HOME_PF | ROLL_AWAY_PTS | ROLL_AWAY_FG3_PCT | ROLL_AWAY_DREB | TOTAL_PTS |
|---|---|---|---|---|---|---|---|---|---|---|
| 2021-10-23 | 0.745 | 0.758 | 0.878 | 0.478 | 0.612 | 0.661 | 0.577 | 1.0 | 0.369 | 185 |
| 2021-10-23 | 0.0 | 0.076 | 0.534 | 0.826 | 0.0 | 0.576 | 0.096 | 0.364 | 0.685 | 198 |
| 2021-10-23 | 0.691 | 0.455 | 0.534 | 0.609 | 0.561 | 0.661 | 0.635 | 0.396 | 0.685 | 239 |
| 2021-10-23 | 0.727 | 0.53 | 0.382 | 0.348 | 0.918 | 0.661 | 0.25 | 0.317 | 0.369 | 232 |
| 2021-10-24 | 0.745 | 0.833 | 0.229 | 0.826 | 0.765 | 0.322 | 1.0 | 0.559 | 0.73 | 204 |
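How `utl.sequential_feature_selection` chooses features is not shown here, so the following is only a sketch of the general idea behind forward stepwise selection with a linear model. It starts from an intercept-only fit and greedily adds whichever feature most reduces the error on held-out games, stopping once the improvement falls below a tolerance. The hold-out scoring, the `tol` stopping rule, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def holdout_mse(X_train, y_train, X_val, y_val, cols):
    """Fit ordinary least squares on the chosen columns; return validation MSE."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train[:, cols]])
    A_val = np.column_stack([np.ones(len(X_val)), X_val[:, cols]])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return float(np.mean((A_val @ coef - y_val) ** 2))

def forward_select(X_train, y_train, X_val, y_val, tol=1e-3):
    """Greedy forward selection; returns column indices in the order added."""
    selected = []
    remaining = list(range(X_train.shape[1]))
    # Baseline: intercept-only model.
    best = holdout_mse(X_train, y_train, X_val, y_val, [])
    while remaining:
        # Score every candidate when added to the current feature set.
        scores = {
            j: holdout_mse(X_train, y_train, X_val, y_val, selected + [j])
            for j in remaining
        }
        j_best = min(scores, key=scores.get)
        # Stop when the best candidate no longer improves the error enough.
        if best - scores[j_best] < tol:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected

# Synthetic data: only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.standard_normal(200)

# Earlier rows train the model, later rows validate it (mimicking date order).
selected = forward_select(X[:150], y[:150], X[150:], y[150:])
print(selected)
```

Backward selection (`forward=False` in the notebook's helper, presumably) would run the same loop in reverse, starting from all features and removing the least useful one at each step.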


<a name="2.-Stepwise-for-Plus_Minus"></a>
# 2. Stepwise for Plus Minus

[Return to top](#Feature-Selection)

In [4]:
# automated feature selection
pm_scaled_df_selected = utl.sequential_feature_selection(
    df=pm_scaled_df, 
    outcome_name='PLUS_MINUS', 
    estimator=LinearRegression(), 
    forward=True
)

pm_scaled_df_selected.head()

There were 9 features selected out of 36 original features



| GAME_DATE | ROLL_HOME_FGA | ROLL_HOME_FG_PCT | ROLL_HOME_FG3A | ROLL_HOME_FTA | ROLL_HOME_DREB | ROLL_HOME_REB | ROLL_AWAY_FGM | ROLL_AWAY_FT_PCT | ROLL_AWAY_AST | PLUS_MINUS |
|---|---|---|---|---|---|---|---|---|---|---|
| 2021-10-23 | 0.296 | 0.753 | 0.58 | 0.878 | 0.292 | 0.478 | 0.586 | 0.603 | 0.5 | 7.0 |
| 2021-10-23 | 0.648 | 0.0 | 0.412 | 0.534 | 0.381 | 0.826 | 0.017 | 0.837 | 0.083 | -8.0 |
| 2021-10-23 | 0.507 | 0.758 | 0.454 | 0.534 | 0.602 | 0.609 | 0.586 | 0.469 | 0.708 | 29.0 |
| 2021-10-23 | 0.683 | 0.827 | 0.244 | 0.382 | 0.159 | 0.348 | 0.069 | 0.268 | 0.208 | -10.0 |
| 2021-10-24 | 0.577 | 0.848 | 0.58 | 0.229 | 0.779 | 0.826 | 0.897 | 0.446 | 0.833 | -10.0 |


<a name="3.-Stepwise-for-Game-Winner"></a>
# 3. Stepwise for Game Winner

[Return to top](#Feature-Selection)

In [5]:
# automated feature selection
res_scaled_df_selected = utl.sequential_feature_selection(
    df=res_scaled_df, 
    outcome_name='GAME_RESULT', 
    estimator=LinearRegression(), 
    forward=True
)
 
res_scaled_df_selected.head()

There were 1 features selected out of 36 original features



| GAME_DATE | ROLL_HOME_PTS | GAME_RESULT |
|---|---|---|
| 2021-10-23 | 0.745 | 1 |
| 2021-10-23 | 0.0 | 0 |
| 2021-10-23 | 0.691 | 1 |
| 2021-10-23 | 0.727 | 0 |
| 2021-10-24 | 0.745 | 0 |
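`GAME_RESULT` is a 0/1 outcome, so passing `LinearRegression()` treats the problem as a linear probability model. One alternative, which is not what this notebook does, is to run the selection with a classifier and a classification metric. The sketch below uses scikit-learn's `SequentialFeatureSelector` directly rather than the notebook's `utl` helper. The synthetic data, the fixed `n_features_to_select=2`, and the accuracy scoring are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic binary outcome driven by columns 1 and 4 only.
rng = np.random.default_rng(0)
X = rng.random((300, 6))
y = (6 * (X[:, 1] - X[:, 4]) + rng.standard_normal(300) > 0).astype(int)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,   # fixed here for illustration
    direction="forward",
    scoring="accuracy",       # cross-validated classification accuracy
    cv=5,
)
selector.fit(X, y)

# Indices of the columns kept by the selector.
selected = np.flatnonzero(selector.get_support())
print(selected)
```

Whether a classifier would retain more than the single feature reported above for the real data is an open question that this sketch does not answer.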
