# Feature Engineering

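Turn game stats into predictive signals for **future** wins and losses.

**Why?**

A feature is only useful if its value is known *before* tip-off. Every feature must therefore be built only from games a team has already played. Using the same game's stats (for example, that game's PTS and PTS_ALLOWED) would leak the result into the inputs.

- Rolling averages

- Season-to-date stats

- **No** same-game stats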
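A minimal sketch of the past-games-only rule on a hypothetical toy frame (`toy`, `PTS_PREV` and the values are made up for illustration). `groupby(...).shift(1)` moves each value down one row *within its team*, so the feature on a row only reflects earlier games.

```python
import pandas as pd

# Hypothetical toy data: two teams, three games each, already sorted by team and date
toy = pd.DataFrame({
    'TEAM_ID': [1, 1, 1, 2, 2, 2],
    'PTS':     [100, 110, 120, 90, 95, 105],
})

# PTS itself is a same-game stat (not allowed as a feature).
# PTS_PREV is the previous game's PTS for the same team: past-only, so it is allowed.
toy['PTS_PREV'] = toy.groupby('TEAM_ID')['PTS'].shift(1)

print(toy)
```

Each team's first game has no history, so its `PTS_PREV` is `NaN`. The same thing happens to every shifted feature in this notebook, which is why rows are dropped at the end.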

# Load Data

In [1]:
```python
import pandas as pd
from google.colab import drive

# Mount Google Drive so the CSV is readable from Colab
drive.mount('/content/drive')

df = pd.read_csv(
    "/content/drive/My Drive/Colab Notebooks/Basketball Projected Wins/nba_games_2023_24.csv",
    parse_dates=['GAME_DATE']
)

# Sort each team's games chronologically; shift() and rolling() depend on this order
df = df.sort_values(['TEAM_ID', 'GAME_DATE'])

df.head()
```



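| | GAME_ID | GAME_DATE | TEAM_ID | TEAM_ABBREVIATION | MATCHUP | WL | PTS | REB | OREB | DREB | AST | TOV | FGA | FGM | FTA | FTM | HOME | WIN | PTS_ALLOWED |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22300063 | 2023-10-25 | 1610612737 | ATL | ATL @ CHA | L | 110 | 42 | 12 | 30 | 24 | 12 | 93 | 39 | 33 | 27 | 0 | 0 | 116.0 |
| 1 | 22300079 | 2023-10-27 | 1610612737 | ATL | ATL vs. NYK | L | 120 | 44 | 9 | 35 | 28 | 14 | 87 | 42 | 30 | 24 | 1 | 0 | 126.0 |
| 2 | 22300097 | 2023-10-29 | 1610612737 | ATL | ATL @ MIL | W | 127 | 46 | 13 | 33 | 32 | 17 | 93 | 47 | 22 | 18 | 0 | 1 | 110.0 |
| 3 | 22300104 | 2023-10-30 | 1610612737 | ATL | ATL vs. MIN | W | 127 | 36 | 4 | 32 | 28 | 11 | 86 | 48 | 18 | 17 | 1 | 1 | 113.0 |
| 4 | 22300117 | 2023-11-01 | 1610612737 | ATL | ATL vs. WAS | W | 130 | 57 | 14 | 43 | 26 | 21 | 92 | 46 | 32 | 29 | 1 | 1 | 121.0 |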


# Create Rolling Team Performance Features

Start with rolling averages:

- Use last 5 and last 10 games

Stats to roll:

- PTS

- PTS_ALLOWED

- REB, OREB, DREB

- AST

- TOV

- FGA, FGM

- FTA, FTM

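In [2]:
```python
rolling_cols = ['PTS', 'PTS_ALLOWED', 'REB', 'OREB', 'DREB', 'AST', 'TOV']

for col in rolling_cols:
    # Mean of the previous 5 games (games, not days); the current game is excluded.
    # transform() keeps both shift and rolling inside each team's own games.
    df[f'{col}_L5'] = (
        df.groupby('TEAM_ID')[col]
          .transform(lambda x: x.shift(1).rolling(5).mean())
    )
```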
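The plan above also calls for 10-game windows and for FGA, FGM, FTA and FTM, which this cell does not compute. A minimal sketch of one way to cover every window in a single pass, assuming `df` is sorted by team and date; the helper name `add_rolling_features` and its parameters are hypothetical, not part of the original notebook.

```python
import pandas as pd

def add_rolling_features(df, cols, windows=(5, 10), group_col='TEAM_ID'):
    """Add past-only rolling means named <col>_L<window>, e.g. PTS_L5 and PTS_L10.

    Hypothetical helper. Assumes df is already sorted by group_col and game date.
    """
    for col in cols:
        for w in windows:
            # shift(1) removes the current game; rolling(w) averages the w games before it.
            # The w=w default argument binds the current window size inside the lambda.
            df[f'{col}_L{w}'] = df.groupby(group_col)[col].transform(
                lambda x, w=w: x.shift(1).rolling(w).mean()
            )
    return df

# Hypothetical toy data: team 1 plays 12 games, team 2 plays 6
toy = pd.DataFrame({
    'TEAM_ID': [1] * 12 + [2] * 6,
    'PTS': list(range(100, 112)) + [50] * 6,
})
toy = add_rolling_features(toy, ['PTS'])
print(toy.tail(8))
```

On the notebook's frame this would be called as `add_rolling_features(df, ['PTS', 'PTS_ALLOWED', 'REB', 'OREB', 'DREB', 'AST', 'TOV', 'FGA', 'FGM', 'FTA', 'FTM'])`. Note that a 10-game window leaves each team's first 10 games empty, so a later `dropna()` would discard more rows.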


# Season-to-Date Strength

Average of every earlier game this season (the current game is excluded) for:

- PTS

- PTS_ALLOWED

- AST

- REB

- TOV

In [4]:
```python
# Season-to-date means: shift(1) excludes the current game,
# expanding() then averages every earlier game for that team
season_avg_cols = {
    'PTS': 'PTS_SEASON_AVG',
    'PTS_ALLOWED': 'PTS_ALLOWED_AVG',
    'AST': 'AST_AVG',
    'REB': 'REB_AVG',
    'TOV': 'TOV_AVG',
}

for col, new_col in season_avg_cols.items():
    df[new_col] = (
        df.groupby('TEAM_ID')[col]
          .transform(lambda x: x.shift(1).expanding().mean())
    )
```
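The cell above runs a Python lambda once per team through `transform`. A vectorized alternative, sketched here as an option rather than the notebook's method, gets the same prior-games average from cumulative sums: the total of earlier games is `cumsum() - x` and `cumcount()` is the number of earlier games. The helper name `season_to_date_mean` is hypothetical.

```python
import pandas as pd

def season_to_date_mean(df, col, group_col='TEAM_ID'):
    """Mean of col over each team's earlier games (current game excluded).

    Hypothetical helper. Assumes df is sorted by group_col and game date.
    """
    g = df.groupby(group_col)
    prior_sum = g[col].cumsum() - df[col]   # total of the games before this one
    prior_count = g.cumcount()              # number of games before this one
    # First game: the count is 0, so the denominator becomes NaN and so does the result
    return prior_sum / prior_count.where(prior_count > 0)

# Compare with the transform/expanding version on Boston's first five scores
toy = pd.DataFrame({'TEAM_ID': [1] * 5, 'PTS': [108, 119, 126, 155, 124]})
fast = season_to_date_mean(toy, 'PTS')
slow = toy.groupby('TEAM_ID')['PTS'].transform(lambda x: x.shift(1).expanding().mean())
print(pd.DataFrame({'fast': fast, 'slow': slow}))
```

Both columns should match the `PTS_SEASON_AVG` values in the Boston check below (NaN, 108.0, 113.5, 117.67, 127.0).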

In [5]:
```python
# Spot-check Boston: PTS_SEASON_AVG should be the mean of all earlier games
df[df['TEAM_ABBREVIATION'] == 'BOS'][[
    'GAME_DATE', 'PTS', 'PTS_SEASON_AVG'
]].head(10)
```


| | GAME_DATE | PTS | PTS_SEASON_AVG |
|---|---|---|---|
| 82 | 2023-10-25 | 108 | NaN |
| 83 | 2023-10-27 | 119 | 108.0 |
| 84 | 2023-10-30 | 126 | 113.5 |
| 85 | 2023-11-01 | 155 | 117.666667 |
| 86 | 2023-11-04 | 124 | 127.0 |
| 87 | 2023-11-06 | 109 | 126.4 |
| 88 | 2023-11-08 | 103 | 123.5 |
| 89 | 2023-11-10 | 121 | 120.571429 |
| 90 | 2023-11-11 | 117 | 120.625 |
| 91 | 2023-11-13 | 114 | 120.222222 |
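Written out, the season-to-date feature for a team's $t$-th game averages games $1$ through $t-1$ only, so it is undefined (NaN) for $t = 1$. Checking Boston's 5th game (2023-11-04) against the output above:

$$
\mathrm{PTS\_SEASON\_AVG}_t = \frac{1}{t-1}\sum_{i=1}^{t-1}\mathrm{PTS}_i \qquad (t \ge 2)
$$

$$
\mathrm{PTS\_SEASON\_AVG}_5 = \frac{108 + 119 + 126 + 155}{4} = \frac{508}{4} = 127.0
$$

The 124 points scored in that game are not part of its own feature; they first appear in the next row (126.4 = 632 / 5).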


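# Create Winning Momentum

Share of the previous 5 games won:

In [6]:
```python
# WIN is 1/0, so the mean of the previous 5 games is the recent win rate
df['WIN_L5'] = (
    df.groupby('TEAM_ID')['WIN']
      .transform(lambda x: x.shift(1).rolling(5).mean())
)
```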


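# Opponent Strength

Extract the opponent's abbreviation from `MATCHUP`, whose last three characters are the opponent (`ATL @ CHA` gives `CHA`, `ATL vs. NYK` gives `NYK`):

In [7]:
```python
# MATCHUP looks like "ATL @ CHA" (away) or "ATL vs. NYK" (home)
df['OPP_TEAM'] = df['MATCHUP'].str[-3:]
```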
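The abbreviation alone is not yet a strength measure. One way to turn it into one, sketched here as an assumption rather than part of the original notebook, is to attach the opponent's own pre-game features: every `GAME_ID` appears once per team, so a self-merge on `GAME_ID` that keeps the row with a different `TEAM_ID` gives each row its opponent's values. The helper name `add_opponent_features`, the `OPP_` prefix and the `OPP_TEAM_ID` column are hypothetical.

```python
import pandas as pd

def add_opponent_features(df, feature_cols):
    """Attach the opponent's pre-game features as OPP_<col> columns.

    Hypothetical helper. Assumes each GAME_ID has exactly two rows (one per team).
    """
    opp = df[['GAME_ID', 'TEAM_ID'] + feature_cols].rename(
        columns={'TEAM_ID': 'OPP_TEAM_ID', **{c: f'OPP_{c}' for c in feature_cols}}
    )
    merged = df.merge(opp, on='GAME_ID', how='left')
    # Each row matches itself and its opponent; keep only the opponent match
    return merged[merged['TEAM_ID'] != merged['OPP_TEAM_ID']].reset_index(drop=True)

# Hypothetical toy data: two games, two rows each
games = pd.DataFrame({
    'GAME_ID': [10, 10, 11, 11],
    'TEAM_ID': [1, 2, 1, 3],
    'PTS_L5':  [110.0, 105.0, 112.0, 99.0],
})
out = add_opponent_features(games, ['PTS_L5'])
print(out[['GAME_ID', 'TEAM_ID', 'OPP_TEAM_ID', 'OPP_PTS_L5']])
```

On the notebook's frame this could be called as `add_opponent_features(df, ['PTS_L5', 'PTS_ALLOWED_L5', 'WIN_L5'])` once those columns exist. Because the opponent's features are also shifted, they stay past-only.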


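# Drop Missing Values

Rows without enough history have `NaN` features and are dropped:

In [8]:
```python
# df_model is rebuilt in In [15] after all features have been added
df_model = df.dropna()
```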


# Create Rolling Features

Window size: L5 (the last 5 games), used for:

- PTS

- PTS_ALLOWED

- REB

- AST

- TOV

In [9]:
```python
# Re-sort and reset the index so every row is in team/date order before rolling
df = df.sort_values(['TEAM_ID', 'GAME_DATE']).reset_index(drop=True)
```


In [10]:
```python
rolling_stats = ['PTS', 'PTS_ALLOWED', 'REB', 'AST', 'TOV']

for col in rolling_stats:
    # Previous 5 games only, computed inside each team's own games
    df[f'{col}_L5'] = (
        df.groupby('TEAM_ID')[col]
          .transform(lambda x: x.shift(1).rolling(5).mean())
    )
```


Roll offensive and defensive rebounds separately:

In [11]:
```python
for col in ['OREB', 'DREB']:
    df[f'{col}_L5'] = (
        df.groupby('TEAM_ID')[col]
          .transform(lambda x: x.shift(1).rolling(5).mean())
    )
```


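# Rolling Stats for Shooting Volume

In [12]:
```python
# Field-goal and free-throw attempts/makes over the previous 5 games
for col in ['FGA', 'FGM', 'FTA', 'FTM']:
    df[f'{col}_L5'] = (
        df.groupby('TEAM_ID')[col]
          .transform(lambda x: x.shift(1).rolling(5).mean())
    )
```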


# Rolling Win Momentum

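In [13]:
```python
# Same definition as In [6], recomputed after the re-sort in In [9]
df['WIN_L5'] = (
    df.groupby('TEAM_ID')['WIN']
      .transform(lambda x: x.shift(1).rolling(5).mean())
)
```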


In [14]:
```python
# Spot-check Boston: PTS_L5 and WIN_L5 should average the 5 previous games
df[df['TEAM_ABBREVIATION'] == 'BOS'][[
    'GAME_DATE',
    'PTS',
    'PTS_L5',
    'WIN',
    'WIN_L5'
]].head(12)
```


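| | GAME_DATE | PTS | PTS_L5 | WIN | WIN_L5 |
|---|---|---|---|---|---|
| 82 | 2023-10-25 | 108 | NaN | 1 | NaN |
| 83 | 2023-10-27 | 119 | NaN | 1 | NaN |
| 84 | 2023-10-30 | 126 | NaN | 1 | NaN |
| 85 | 2023-11-01 | 155 | NaN | 1 | NaN |
| 86 | 2023-11-04 | 124 | NaN | 1 | NaN |
| 87 | 2023-11-06 | 109 | 126.4 | 0 | 1.0 |
| 88 | 2023-11-08 | 103 | 126.6 | 0 | 0.8 |
| 89 | 2023-11-10 | 121 | 123.4 | 1 | 0.6 |
| 90 | 2023-11-11 | 117 | 122.4 | 1 | 0.6 |
| 91 | 2023-11-13 | 114 | 114.8 | 1 | 0.6 |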
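Written out, the 5-game feature for a team's $t$-th game averages games $t-5$ through $t-1$, so it first exists at $t = 6$. Checking the Boston rows above for 2023-11-06 ($t = 6$) and 2023-11-08 ($t = 7$):

$$
\mathrm{PTS\_L5}_t = \frac{1}{5}\sum_{i=t-5}^{t-1}\mathrm{PTS}_i \qquad (t \ge 6)
$$

$$
\mathrm{PTS\_L5}_6 = \frac{108 + 119 + 126 + 155 + 124}{5} = \frac{632}{5} = 126.4,
\qquad
\mathrm{WIN\_L5}_7 = \frac{1 + 1 + 1 + 1 + 0}{5} = 0.8
$$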


# Drop Rows with Missing Features

Each team's first 5 games lack a full 5-game window and are removed:

In [15]:
```python
df_model = df.dropna().reset_index(drop=True)
```
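Since the widest window is 5 games and every feature is shifted by one, `dropna()` should remove exactly each team's first 5 games, assuming the raw columns have no missing values of their own. A small sanity check, sketched on hypothetical toy data; the helper name `rows_dropped_per_team` is an assumption, not part of the notebook.

```python
import pandas as pd

def rows_dropped_per_team(df_full, df_kept, group_col='TEAM_ID'):
    """Count how many rows each team lost between df_full and df_kept (hypothetical helper)."""
    before = df_full.groupby(group_col).size()
    after = df_kept.groupby(group_col).size().reindex(before.index, fill_value=0)
    return before - after

# Hypothetical toy frame: two teams, 8 games each, one 5-game past-only feature
toy = pd.DataFrame({'TEAM_ID': [1] * 8 + [2] * 8, 'PTS': range(16)})
toy['PTS_L5'] = toy.groupby('TEAM_ID')['PTS'].transform(
    lambda x: x.shift(1).rolling(5).mean()
)
toy_model = toy.dropna().reset_index(drop=True)
print(rows_dropped_per_team(toy, toy_model))
```

On the real data, `rows_dropped_per_team(df, df_model)` should report 5 for every team; a larger count would point to `NaN` values in some other column.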


# Save Features to a New Prediction Dataset

In [16]:
```python
# Write the model-ready features to Drive (no index column)
df_model.to_csv(
    "/content/drive/My Drive/Colab Notebooks/Basketball Projected Wins/nba_win_prediction_features_2023_24.csv",
    index=False
)
```