<div class="alert alert-danger">
    <h4 style="font-weight: bold; font-size: 28px;">Feature Selection using vtreat</h4>
    <p style="font-size: 20px;">NBA API Data (2022-2024)</p>
</div>

<a name="Feature-Selection"></a>

# Table of Contents

[Setup](#Setup)

[Data](#Data)

**[1. vtreat for Total Points](#1.-vtreat-for-Total-Points)**

**[2. vtreat for Plus Minus](#2.-vtreat-for-Plus_Minus)**

**[3. vtreat for Game Winner](#3.-vtreat-for-Game-Winner)**

# Setup

[Return to top](#Feature-Selection)

In [1]:
import sys
from pathlib import Path
# get current working directory
cwd = %pwd
# add shared_code directory to Python sys.path
sys.path.append(str(Path(cwd).parent / "shared_code"))
# import all libraries in shared_code directory 'imports.py' file
from imports import *
%matplotlib inline



# Data

[Return to top](#Feature-Selection)

In [2]:
# load the rolling box-score data, keep the listed seasons, and min-max scale the ROLL_ features (targets left unscaled)
pts_scaled_df, pm_scaled_df, res_scaled_df, test_set_obs = utl.load_and_scale_data(
    file_path='../../data/processed/nba_team_matchups_rolling_box_scores_2022_2024_r05.csv',
    seasons_to_keep=['2021-22', '2022-23', '2023-24'],
    training_season='2021-22',
    feature_prefix='ROLL_',
    scaler_type='minmax', 
    scale_target=False
)

Season 2021-22: 1186 games
Season 2022-23: 1181 games
Season 2023-24: 692 games
Total number of games across sampled seasons: 3059 games
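The call above passes `scaler_type='minmax'` together with a `training_season`. The internals of `utl.load_and_scale_data` are not shown here, so the following is only a minimal, dependency-free sketch of one plausible reading: fit the min and max on the training season and reuse them for every season. The season values and the helper names `fit_minmax` and `transform_minmax` are hypothetical.

```python
# Hypothetical rolling-average values for a single feature, keyed by season.
feature_by_season = {
    "2021-22": [100.0, 110.0, 120.0],  # assumed training season
    "2022-23": [105.0, 125.0],
}


def fit_minmax(train_values):
    """Learn the min and max from the training values only."""
    return min(train_values), max(train_values)


def transform_minmax(values, lo, hi):
    """Apply (v - lo) / (hi - lo); a constant feature maps to 0.0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Fit on the training season, then reuse the same bounds everywhere.
lo, hi = fit_minmax(feature_by_season["2021-22"])
scaled = {
    season: transform_minmax(values, lo, hi)
    for season, values in feature_by_season.items()
}
print(scaled)
```

Under this assumption, training-season values land in [0, 1], while later seasons can fall outside that range (here 125.0 maps to 1.25) because their extremes were not seen when fitting.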


<a name="1.-vtreat-for-Total-Points"></a>
# 1. vtreat for Total Points

[Return to top](#Feature-Selection)

In [3]:
# automated feature selection and preprocessing
pts_scaled_df_selected = utl.vtreat_feature_selection(
    df=pts_scaled_df,
    outcome_name='TOTAL_PTS'
)
 
pts_scaled_df_selected.head()

There were 18 features selected out of 36 original features



| | ROLL_AWAY_AST | ROLL_AWAY_FGA | ROLL_AWAY_FG_PCT | ROLL_HOME_FG3A | ROLL_HOME_FG_PCT | ROLL_AWAY_TOV | ROLL_AWAY_FTA | ROLL_HOME_FG3_PCT | ROLL_AWAY_DREB | ROLL_HOME_PTS | ROLL_HOME_PF | ROLL_HOME_FTA | ROLL_AWAY_FTM | ROLL_AWAY_FGM | ROLL_HOME_AST | ROLL_HOME_FTM | ROLL_HOME_FGM | ROLL_AWAY_PTS | TOTAL_PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.5 | 0.202 | 0.704 | 0.58 | 0.753 | 0.391 | 0.285 | 0.731 | 0.369 | 0.745 | 0.661 | 0.878 | 0.336 | 0.586 | 0.612 | 0.805 | 0.522 | 0.577 | 185 |
| 1 | 0.083 | 0.362 | 0.0 | 0.412 | 0.0 | 0.348 | 0.163 | 0.0 | 0.685 | 0.0 | 0.576 | 0.534 | 0.294 | 0.017 | 0.0 | 0.466 | 0.0 | 0.096 | 198 |
| 2 | 0.708 | 0.176 | 0.728 | 0.454 | 0.758 | 0.174 | 0.772 | 0.466 | 0.685 | 0.691 | 0.661 | 0.534 | 0.672 | 0.586 | 0.561 | 0.593 | 0.652 | 0.635 | 239 |
| 3 | 0.208 | 0.122 | 0.225 | 0.244 | 0.827 | 0.348 | 0.813 | 0.772 | 0.369 | 0.727 | 0.661 | 0.382 | 0.588 | 0.069 | 0.918 | 0.297 | 0.826 | 0.25 | 232 |
| 4 | 0.833 | 1.0 | 0.362 | 0.58 | 0.848 | 0.478 | 0.569 | 0.82 | 0.73 | 0.745 | 0.322 | 0.229 | 0.504 | 0.897 | 0.765 | 0.254 | 0.783 | 1.0 | 204 |
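The body of `utl.vtreat_feature_selection` is not part of this notebook. As a rough sketch of how such a helper could be built on the `vtreat` package, the code below fits a treatment, reads its `score_frame_`, and keeps the original columns that vtreat marks as `recommended`. The function names, the `binary_outcome` flag, and the printed message are assumptions made for illustration, not the project's actual implementation.

```python
def recommended_original_columns(score_rows):
    """Return original column names with at least one recommended derived variable.

    score_rows: iterable of dicts with 'orig_variable' and 'recommended' keys,
    e.g. treatment.score_frame_.to_dict("records").
    """
    keep = []
    for row in score_rows:
        name = row["orig_variable"]
        if row["recommended"] and name not in keep:
            keep.append(name)
    return keep


def vtreat_feature_selection_sketch(df, outcome_name, binary_outcome=False):
    """Hypothetical stand-in for utl.vtreat_feature_selection (an assumption)."""
    import vtreat  # third-party; imported lazily so the helper above has no dependencies

    if binary_outcome:
        # Classification target such as a 0/1 game result.
        treatment = vtreat.BinomialOutcomeTreatment(
            outcome_name=outcome_name, outcome_target=1
        )
    else:
        # Regression target such as total points or plus/minus.
        treatment = vtreat.NumericOutcomeTreatment(outcome_name=outcome_name)

    # Fitting populates treatment.score_frame_, one row per derived variable.
    treatment.fit_transform(df, df[outcome_name])
    score_rows = treatment.score_frame_.to_dict("records")
    keep = recommended_original_columns(score_rows)

    n_original = df.shape[1] - 1  # every column except the outcome
    print(f"There were {len(keep)} features selected out of {n_original} original features")
    return df[keep + [outcome_name]]
```

Under these assumptions, a call would look like `vtreat_feature_selection_sketch(pts_scaled_df, 'TOTAL_PTS')`, with `binary_outcome=True` for a 0/1 target. It is not guaranteed to reproduce the feature counts reported in this notebook, since the real helper may use different vtreat parameters.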


<a name="2.-vtreat-for-Plus_Minus"></a>
# 2. vtreat for Plus Minus

[Return to top](#Feature-Selection)

In [4]:
# automated feature selection and preprocessing
pm_scaled_df_selected = utl.vtreat_feature_selection(
    df=pm_scaled_df,
    outcome_name='PLUS_MINUS'
)

pm_scaled_df_selected.head()

There were 14 features selected out of 36 original features



| | ROLL_AWAY_AST | ROLL_HOME_FG3M | ROLL_AWAY_FG_PCT | ROLL_HOME_FG3A | ROLL_HOME_FG_PCT | ROLL_HOME_FG3_PCT | ROLL_AWAY_FT_PCT | ROLL_AWAY_DREB | ROLL_HOME_PTS | ROLL_AWAY_FGM | ROLL_HOME_AST | ROLL_HOME_FGM | ROLL_HOME_DREB | ROLL_AWAY_PTS | PLUS_MINUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.5 | 0.758 | 0.704 | 0.58 | 0.753 | 0.731 | 0.603 | 0.369 | 0.745 | 0.586 | 0.612 | 0.522 | 0.292 | 0.577 | 7.0 |
| 1 | 0.083 | 0.076 | 0.0 | 0.412 | 0.0 | 0.0 | 0.837 | 0.685 | 0.0 | 0.017 | 0.0 | 0.0 | 0.381 | 0.096 | -8.0 |
| 2 | 0.708 | 0.455 | 0.728 | 0.454 | 0.758 | 0.466 | 0.469 | 0.685 | 0.691 | 0.586 | 0.561 | 0.652 | 0.602 | 0.635 | 29.0 |
| 3 | 0.208 | 0.53 | 0.225 | 0.244 | 0.827 | 0.772 | 0.268 | 0.369 | 0.727 | 0.069 | 0.918 | 0.826 | 0.159 | 0.25 | -10.0 |
| 4 | 0.833 | 0.833 | 0.362 | 0.58 | 0.848 | 0.82 | 0.446 | 0.73 | 0.745 | 0.897 | 0.765 | 0.783 | 0.779 | 1.0 | -10.0 |


<a name="3.-vtreat-for-Game-Winner"></a>
# 3. vtreat for Game Winner

[Return to top](#Feature-Selection)

In [5]:
# automated feature selection and preprocessing
res_scaled_df_selected = utl.vtreat_feature_selection(
    df=res_scaled_df,
    outcome_name='GAME_RESULT'
)

res_scaled_df_selected.head()

There were 12 features selected out of 36 original features



| | ROLL_HOME_REB | ROLL_AWAY_AST | ROLL_HOME_FG3M | ROLL_HOME_FG_PCT | ROLL_AWAY_TOV | ROLL_AWAY_FT_PCT | ROLL_AWAY_STL | ROLL_HOME_PTS | ROLL_AWAY_FGM | ROLL_HOME_FGM | ROLL_HOME_DREB | ROLL_AWAY_PTS | GAME_RESULT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.478 | 0.5 | 0.758 | 0.753 | 0.391 | 0.603 | 0.28 | 0.745 | 0.586 | 0.522 | 0.292 | 0.577 | 1 |
| 1 | 0.826 | 0.083 | 0.076 | 0.0 | 0.348 | 0.837 | 0.28 | 0.0 | 0.017 | 0.0 | 0.381 | 0.096 | 0 |
| 2 | 0.609 | 0.708 | 0.455 | 0.758 | 0.174 | 0.469 | 0.36 | 0.691 | 0.586 | 0.652 | 0.602 | 0.635 | 1 |
| 3 | 0.348 | 0.208 | 0.53 | 0.827 | 0.348 | 0.268 | 0.2 | 0.727 | 0.069 | 0.826 | 0.159 | 0.25 | 0 |
| 4 | 0.826 | 0.833 | 0.833 | 0.848 | 0.478 | 0.446 | 0.76 | 0.745 | 0.897 | 0.783 | 0.779 | 1.0 | 0 |
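With three selections in hand, it can help to see which features were kept for every target. The snippet below is a small illustrative check, not part of the original notebook: the column lists are copied from the three `head()` outputs in sections 1 to 3, and the variable names are chosen here for illustration.

```python
# Selected feature columns, copied from the head() output of each section.
pts_features = {
    "ROLL_AWAY_AST", "ROLL_AWAY_FGA", "ROLL_AWAY_FG_PCT", "ROLL_HOME_FG3A",
    "ROLL_HOME_FG_PCT", "ROLL_AWAY_TOV", "ROLL_AWAY_FTA", "ROLL_HOME_FG3_PCT",
    "ROLL_AWAY_DREB", "ROLL_HOME_PTS", "ROLL_HOME_PF", "ROLL_HOME_FTA",
    "ROLL_AWAY_FTM", "ROLL_AWAY_FGM", "ROLL_HOME_AST", "ROLL_HOME_FTM",
    "ROLL_HOME_FGM", "ROLL_AWAY_PTS",
}
pm_features = {
    "ROLL_AWAY_AST", "ROLL_HOME_FG3M", "ROLL_AWAY_FG_PCT", "ROLL_HOME_FG3A",
    "ROLL_HOME_FG_PCT", "ROLL_HOME_FG3_PCT", "ROLL_AWAY_FT_PCT", "ROLL_AWAY_DREB",
    "ROLL_HOME_PTS", "ROLL_AWAY_FGM", "ROLL_HOME_AST", "ROLL_HOME_FGM",
    "ROLL_HOME_DREB", "ROLL_AWAY_PTS",
}
res_features = {
    "ROLL_HOME_REB", "ROLL_AWAY_AST", "ROLL_HOME_FG3M", "ROLL_HOME_FG_PCT",
    "ROLL_AWAY_TOV", "ROLL_AWAY_FT_PCT", "ROLL_AWAY_STL", "ROLL_HOME_PTS",
    "ROLL_AWAY_FGM", "ROLL_HOME_FGM", "ROLL_HOME_DREB", "ROLL_AWAY_PTS",
}

# Features selected for all three targets.
common = sorted(pts_features & pm_features & res_features)
print(len(common), common)
```

In a live session the same sets could be built from the selected DataFrames instead, for example `set(pts_scaled_df_selected.columns) - {'TOTAL_PTS'}`.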
