<div class="alert alert-danger">
    <h4 style="font-weight: bold; font-size: 28px;">Feature Selection using stepwise algorithms</h4>
    <p style="font-size: 20px;">NBA API Data (2022-2024)</p>
</div>

<a name="Feature-Selection"></a>

# Table of Contents

[Setup](#Setup)

[Data](#Data)

**[1. Stepwise for Total Points](#1.-Stepwise-for-Total-Points)**

**[2. Stepwise for Plus Minus](#2.-Stepwise-for-Plus_Minus)**

**[3. Stepwise for Game Winner](#3.-Stepwise-for-Game-Winner)**

# Setup

[Return to top](#Feature-Selection)

In [1]:
import sys
from pathlib import Path
# get current working directory
cwd = %pwd
# add shared_code directory to Python sys.path
sys.path.append(str(Path(cwd).parent / "shared_code"))
# import all libraries in shared_code directory 'imports.py' file
from imports import *
%matplotlib inline



# Data

[Return to top](#Feature-Selection)

In [2]:
# load, filter (by time) and scale data
pts_scaled_df, pm_scaled_df, res_scaled_df, test_set_obs = utl.load_and_scale_data(
    file_path='../../data/processed/nba_team_matchups_rolling_box_scores_2022_2024_r05.csv',
    seasons_to_keep=['2021-22', '2022-23', '2023-24'],
    training_season='2021-22',
    feature_prefix='ROLL_',
    scaler_type='minmax', 
    scale_target=False
)

Season 2021-22: 1186 games
Season 2022-23: 1181 games
Season 2023-24: 692 games
Total number of games across sampled seasons: 3059 games
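
The body of `utl.load_and_scale_data` is not shown in this notebook. As a rough sketch of what the `scaler_type='minmax'`, `training_season` and `scale_target=False` arguments suggest, the example below scales only the `ROLL_`-prefixed columns, using minimum and maximum values taken from the training season alone. The function name, the `SEASON` column and the toy values are assumptions made for illustration, not the helper's actual code.

```python
import pandas as pd


def minmax_scale_by_training_season(df, training_season,
                                    feature_prefix="ROLL_",
                                    season_col="SEASON"):
    """Hypothetical helper: min-max scale prefixed feature columns.

    Min and max come from the training season only, so later seasons
    can fall outside [0, 1]. Non-feature columns (e.g. the target)
    are left untouched.
    """
    feature_cols = [c for c in df.columns if c.startswith(feature_prefix)]
    train = df.loc[df[season_col] == training_season, feature_cols]
    col_min = train.min()
    # Replace a zero range with 1 so constant columns do not divide by zero.
    col_range = (train.max() - col_min).replace(0, 1)
    scaled = df.copy()
    scaled[feature_cols] = (df[feature_cols] - col_min) / col_range
    return scaled


# Toy data (invented values, not taken from the NBA file).
toy = pd.DataFrame({
    "SEASON": ["2021-22", "2021-22", "2021-22", "2022-23"],
    "ROLL_HOME_PTS": [100.0, 110.0, 120.0, 130.0],
    "TOTAL_PTS": [185, 198, 239, 232],
})
scaled_toy = minmax_scale_by_training_season(toy, training_season="2021-22")
print(scaled_toy)
```

Fitting the scaler on the training season only keeps information from later seasons out of the scaling step, at the cost of scaled values above 1 or below 0 for games outside the training range.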


<a name="1.-Stepwise-for-Total-Points"></a>
# 1. Stepwise for Total Points

[Return to top](#Feature-Selection)

In [3]:
# automated feature selection and preprocessing
pts_scaled_df_selected = utl.vtreat_feature_selection(
    df=pts_scaled_df,
    outcome_name='TOTAL_PTS'
)

# how many features were selected?
print(f"There were {pts_scaled_df_selected.shape[1]-1} features selected out of {pts_scaled_df.shape[1]-1} original features\n") 
pts_scaled_df_selected.head()

There were 18 features selected out of 36 original features



|  | ROLL_AWAY_FTM | ROLL_AWAY_PTS | ROLL_AWAY_DREB | ROLL_HOME_PTS | ROLL_HOME_FG_PCT | ROLL_HOME_FTA | ROLL_AWAY_AST | ROLL_HOME_FGM | ROLL_HOME_FG3_PCT | ROLL_HOME_AST | ROLL_AWAY_FTA | ROLL_AWAY_FGM | ROLL_HOME_PF | ROLL_HOME_FTM | ROLL_HOME_FG3A | ROLL_AWAY_TOV | ROLL_AWAY_FGA | ROLL_AWAY_FG_PCT | TOTAL_PTS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.336 | 0.577 | 0.369 | 0.745 | 0.753 | 0.878 | 0.5 | 0.522 | 0.731 | 0.612 | 0.285 | 0.586 | 0.661 | 0.805 | 0.58 | 0.391 | 0.202 | 0.704 | 185 |
| 1 | 0.294 | 0.096 | 0.685 | 0.0 | 0.0 | 0.534 | 0.083 | 0.0 | 0.0 | 0.0 | 0.163 | 0.017 | 0.576 | 0.466 | 0.412 | 0.348 | 0.362 | 0.0 | 198 |
| 2 | 0.672 | 0.635 | 0.685 | 0.691 | 0.758 | 0.534 | 0.708 | 0.652 | 0.466 | 0.561 | 0.772 | 0.586 | 0.661 | 0.593 | 0.454 | 0.174 | 0.176 | 0.728 | 239 |
| 3 | 0.588 | 0.25 | 0.369 | 0.727 | 0.827 | 0.382 | 0.208 | 0.826 | 0.772 | 0.918 | 0.813 | 0.069 | 0.661 | 0.297 | 0.244 | 0.348 | 0.122 | 0.225 | 232 |
| 4 | 0.504 | 1.0 | 0.73 | 0.745 | 0.848 | 0.229 | 0.833 | 0.783 | 0.82 | 0.765 | 0.569 | 0.897 | 0.322 | 0.254 | 0.58 | 0.478 | 1.0 | 0.362 | 204 |
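
The selection above is delegated to `utl.vtreat_feature_selection`, whose internals are not shown. For readers who want to see what a stepwise procedure in the sense of this notebook's title could look like, here is a minimal sketch of forward stepwise selection using ordinary least squares and adjusted R². This is a swapped-in illustration, not the helper's method: the function names, the `min_gain` stopping threshold and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd


def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit (with intercept) of y on the columns of X."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_stepwise(df, outcome_name, min_gain=1e-3):
    """Greedily add the feature that most improves adjusted R^2.

    Stops when the best remaining candidate improves the score by
    less than `min_gain` (a hypothetical threshold).
    """
    y = df[outcome_name].to_numpy(dtype=float)
    candidates = [c for c in df.columns if c != outcome_name]
    selected, best_score = [], 0.0  # intercept-only model scores 0
    while candidates:
        scores = {
            c: adjusted_r2(df[selected + [c]].to_numpy(dtype=float), y)
            for c in candidates
        }
        best = max(scores, key=scores.get)
        if scores[best] - best_score < min_gain:
            break
        selected.append(best)
        candidates.remove(best)
        best_score = scores[best]
    return selected


# Synthetic data: y depends on x1 and x2 only; `noise` is irrelevant.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
    "noise": rng.normal(size=200),
})
demo["y"] = 3 * demo["x1"] - 2 * demo["x2"] + 0.05 * rng.normal(size=200)
selected = forward_stepwise(demo, outcome_name="y")
print(selected)
```

Applied to a frame such as `pts_scaled_df` with `outcome_name='TOTAL_PTS'`, a procedure like this would return a list of column names rather than a filtered DataFrame, so its result is not directly comparable to the 18 features reported above.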


<a name="2.-Stepwise-for-Plus_Minus"></a>
# 2. Stepwise for Plus Minus

[Return to top](#Feature-Selection)

In [4]:
# automated feature selection and preprocessing
pm_scaled_df_selected = utl.vtreat_feature_selection(
    df=pm_scaled_df,
    outcome_name='PLUS_MINUS'
)

# how many features were selected?
print(f"There were {pm_scaled_df_selected.shape[1]-1} features selected out of {pm_scaled_df.shape[1]-1} original features\n") 
pm_scaled_df_selected.head()

There were 14 features selected out of 36 original features



|  | ROLL_AWAY_PTS | ROLL_AWAY_DREB | ROLL_HOME_PTS | ROLL_HOME_DREB | ROLL_HOME_FG_PCT | ROLL_AWAY_AST | ROLL_HOME_FGM | ROLL_HOME_FG3_PCT | ROLL_AWAY_FT_PCT | ROLL_HOME_AST | ROLL_AWAY_FGM | ROLL_HOME_FG3A | ROLL_HOME_FG3M | ROLL_AWAY_FG_PCT | PLUS_MINUS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.577 | 0.369 | 0.745 | 0.292 | 0.753 | 0.5 | 0.522 | 0.731 | 0.603 | 0.612 | 0.586 | 0.58 | 0.758 | 0.704 | 7.0 |
| 1 | 0.096 | 0.685 | 0.0 | 0.381 | 0.0 | 0.083 | 0.0 | 0.0 | 0.837 | 0.0 | 0.017 | 0.412 | 0.076 | 0.0 | -8.0 |
| 2 | 0.635 | 0.685 | 0.691 | 0.602 | 0.758 | 0.708 | 0.652 | 0.466 | 0.469 | 0.561 | 0.586 | 0.454 | 0.455 | 0.728 | 29.0 |
| 3 | 0.25 | 0.369 | 0.727 | 0.159 | 0.827 | 0.208 | 0.826 | 0.772 | 0.268 | 0.918 | 0.069 | 0.244 | 0.53 | 0.225 | -10.0 |
| 4 | 1.0 | 0.73 | 0.745 | 0.779 | 0.848 | 0.833 | 0.783 | 0.82 | 0.446 | 0.765 | 0.897 | 0.58 | 0.833 | 0.362 | -10.0 |


<a name="3.-Stepwise-for-Game-Winner"></a>
# 3. Stepwise for Game Winner

[Return to top](#Feature-Selection)

In [5]:
# automated feature selection and preprocessing
res_scaled_df_selected = utl.vtreat_feature_selection(
    df=res_scaled_df,
    outcome_name='GAME_RESULT'
)

# how many features were selected?
print(f"There were {res_scaled_df_selected.shape[1]-1} features selected out of {res_scaled_df.shape[1]-1} original features\n") 
res_scaled_df_selected.head()

There were 12 features selected out of 36 original features



|  | ROLL_AWAY_PTS | ROLL_HOME_PTS | ROLL_HOME_DREB | ROLL_HOME_FG_PCT | ROLL_AWAY_AST | ROLL_HOME_FGM | ROLL_AWAY_FT_PCT | ROLL_AWAY_FGM | ROLL_AWAY_TOV | ROLL_HOME_FG3M | ROLL_HOME_REB | ROLL_AWAY_STL | GAME_RESULT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.577 | 0.745 | 0.292 | 0.753 | 0.5 | 0.522 | 0.603 | 0.586 | 0.391 | 0.758 | 0.478 | 0.28 | 1 |
| 1 | 0.096 | 0.0 | 0.381 | 0.0 | 0.083 | 0.0 | 0.837 | 0.017 | 0.348 | 0.076 | 0.826 | 0.28 | 0 |
| 2 | 0.635 | 0.691 | 0.602 | 0.758 | 0.708 | 0.652 | 0.469 | 0.586 | 0.174 | 0.455 | 0.609 | 0.36 | 1 |
| 3 | 0.25 | 0.727 | 0.159 | 0.827 | 0.208 | 0.826 | 0.268 | 0.069 | 0.348 | 0.53 | 0.348 | 0.2 | 0 |
| 4 | 1.0 | 0.745 | 0.779 | 0.848 | 0.833 | 0.783 | 0.446 | 0.897 | 0.478 | 0.833 | 0.826 | 0.76 | 0 |
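
With three selected feature sets in hand, one quick check is how much they overlap. The sketch below copies the column names from the three outputs above into plain Python sets and intersects them. In the notebook itself the same sets could presumably be built from `pts_scaled_df_selected.columns` and its siblings, minus the target column; the variable names here are illustrative only.

```python
# Selected feature names, copied from the three outputs above.
pts_features = {
    "ROLL_AWAY_FTM", "ROLL_AWAY_PTS", "ROLL_AWAY_DREB", "ROLL_HOME_PTS",
    "ROLL_HOME_FG_PCT", "ROLL_HOME_FTA", "ROLL_AWAY_AST", "ROLL_HOME_FGM",
    "ROLL_HOME_FG3_PCT", "ROLL_HOME_AST", "ROLL_AWAY_FTA", "ROLL_AWAY_FGM",
    "ROLL_HOME_PF", "ROLL_HOME_FTM", "ROLL_HOME_FG3A", "ROLL_AWAY_TOV",
    "ROLL_AWAY_FGA", "ROLL_AWAY_FG_PCT",
}
pm_features = {
    "ROLL_AWAY_PTS", "ROLL_AWAY_DREB", "ROLL_HOME_PTS", "ROLL_HOME_DREB",
    "ROLL_HOME_FG_PCT", "ROLL_AWAY_AST", "ROLL_HOME_FGM", "ROLL_HOME_FG3_PCT",
    "ROLL_AWAY_FT_PCT", "ROLL_HOME_AST", "ROLL_AWAY_FGM", "ROLL_HOME_FG3A",
    "ROLL_HOME_FG3M", "ROLL_AWAY_FG_PCT",
}
res_features = {
    "ROLL_AWAY_PTS", "ROLL_HOME_PTS", "ROLL_HOME_DREB", "ROLL_HOME_FG_PCT",
    "ROLL_AWAY_AST", "ROLL_HOME_FGM", "ROLL_AWAY_FT_PCT", "ROLL_AWAY_FGM",
    "ROLL_AWAY_TOV", "ROLL_HOME_FG3M", "ROLL_HOME_REB", "ROLL_AWAY_STL",
}

# Features selected for all three targets.
shared = sorted(pts_features & pm_features & res_features)
# Features selected only for the game-winner target.
res_only = sorted(res_features - pts_features - pm_features)

print(f"{len(shared)} features shared by all three targets: {shared}")
print(f"{len(res_only)} features unique to GAME_RESULT: {res_only}")
```

Six features appear in all three selections (`ROLL_AWAY_AST`, `ROLL_AWAY_FGM`, `ROLL_AWAY_PTS`, `ROLL_HOME_FGM`, `ROLL_HOME_FG_PCT`, `ROLL_HOME_PTS`), while `ROLL_AWAY_STL` and `ROLL_HOME_REB` are selected only for `GAME_RESULT`.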
