# NBA Game Data Enrichment

This notebook enriches the raw NBA game data with schedule and form features (rest days, a back-to-back flag, and rolling win percentages) for use in our prediction models.


## 1. Import Libraries
Import all necessary libraries.

In [1]:
import pandas as pd
import numpy as np
from nba_api.stats.endpoints import leaguegamefinder, playergamelog, teamgamelog
from nba_api.stats.static import teams
import requests
from datetime import datetime, timedelta
import time
from tqdm import tqdm

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

## 2. Load Raw Game Data
Load the raw NBA game data.

In [2]:
games_df = pd.read_csv('../data/raw/games.csv')
print(f'Loaded {len(games_df)} games')
games_df.head()

Loaded 5585 games


Unnamed: 0,SEASON_ID,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,PTS,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PF,PLUS_MINUS
0,12022,1610612746,LAC,LA Clippers,12200002,2022-09-30,LAC vs. MRA,W,242,121,39,77,0.506,13,29,0.448,30,43,0.698,15,40,55,27,9,6,23,23,44.2
1,12022,1610612764,WAS,Washington Wizards,12200001,2022-09-30,WAS vs. GSW,L,240,87,31,84,0.369,6,35,0.171,19,30,0.633,7,37,44,20,12,10,14,27,-9.0
2,12022,50009,MRA,Ra'anana Maccabi Ra'anana,12200002,2022-09-30,MRA @ LAC,L,240,81,28,85,0.329,3,16,0.188,22,31,0.71,13,25,38,16,14,5,16,34,-43.2
3,12022,1610612744,GSW,Golden State Warriors,12200001,2022-09-30,GSW @ WAS,W,239,96,29,79,0.367,7,26,0.269,31,40,0.775,10,52,62,18,6,3,17,27,9.0
4,12022,1610612763,MEM,Memphis Grizzlies,12200003,2022-10-01,MEM @ MIL,W,239,107,38,77,0.494,10,28,0.357,21,29,0.724,10,31,41,25,13,4,22,19,5.0
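The `Unnamed: 0` column in the preview looks like a row index that was written out when the CSV was first saved. A minimal sketch of one way to absorb it at load time, assuming that reading is correct; the in-memory CSV below is a hypothetical stand-in built from a few values in the preview, not the real file:

```python
import io

import pandas as pd

# Tiny in-memory stand-in for games.csv (values copied from the preview above).
csv_text = (
    ",TEAM_ABBREVIATION,GAME_DATE,WL\n"
    "0,LAC,2022-09-30,W\n"
    "1,WAS,2022-09-30,L\n"
)

# index_col=0 treats the unnamed first column as the row index instead of data.
sample_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample_df.columns.tolist())
```

If the column carries no information, dropping it after loading is an equally reasonable choice.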


## 3. Calculate Rest Days
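Add a REST_DAYS column holding the number of calendar days since each team's previous game. A team's first game in the data has no previous game, so its value is missing (NaN).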

In [3]:
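# Parse dates before sorting so the order is chronological, not lexical.
games_df['GAME_DATE'] = pd.to_datetime(games_df['GAME_DATE'])
games_df = games_df.sort_values(['TEAM_ID', 'GAME_DATE'])
# Date of each team's previous game (NaT for its first game in the data).
games_df['PREV_GAME_DATE'] = games_df.groupby('TEAM_ID')['GAME_DATE'].shift(1)
# Calendar days since the previous game; 1 means games on consecutive days.
games_df['REST_DAYS'] = (games_df['GAME_DATE'] - games_df['PREV_GAME_DATE']).dt.days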
print('Rest days calculated.')

Rest days calculated.
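Because the gap is computed per team only, the first game of a new season is measured against the last game of the previous one. The sketch below illustrates that effect on a hypothetical toy schedule and shows one possible variant that restarts the count at each season; the `toy` frame, its dates, and the `REST_DAYS_IN_SEASON` name are assumptions for illustration, not part of the pipeline:

```python
import pandas as pd

# Hypothetical toy schedule: one team, three games in one season, one in the next.
toy = pd.DataFrame({
    "SEASON_ID": [22022, 22022, 22022, 22023],
    "TEAM_ID": [1, 1, 1, 1],
    "GAME_DATE": pd.to_datetime(
        ["2023-04-05", "2023-04-06", "2023-04-09", "2023-10-25"]
    ),
})
toy = toy.sort_values(["TEAM_ID", "GAME_DATE"])

# Same shift-and-subtract logic as the cell above: gap to the team's previous game.
prev_any = toy.groupby("TEAM_ID")["GAME_DATE"].shift(1)
toy["REST_DAYS"] = (toy["GAME_DATE"] - prev_any).dt.days

# Variant: group by season as well, so the offseason gap becomes NaN
# instead of a value in the hundreds.
prev_in_season = toy.groupby(["TEAM_ID", "SEASON_ID"])["GAME_DATE"].shift(1)
toy["REST_DAYS_IN_SEASON"] = (toy["GAME_DATE"] - prev_in_season).dt.days

print(toy)
```

Whether a long offseason gap should be kept, capped, or treated as missing depends on how the downstream model uses the feature.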


## 4. Add Back-to-Back Game Indicator
Add a column flagging games played the day after the team's previous game (REST_DAYS equal to 1), that is, the second leg of a back-to-back.

In [4]:
games_df['IS_BACK_TO_BACK'] = games_df['REST_DAYS'] == 1
print('Back-to-back indicator added.')

Back-to-back indicator added.
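One detail worth seeing in isolation is how the comparison treats a missing REST_DAYS value. The sketch below uses hypothetical values, and the nullable variant is an optional alternative rather than what the notebook does:

```python
import numpy as np
import pandas as pd

# Hypothetical REST_DAYS values: a first game (no previous game), then gaps of 1, 2, 1.
rest_days = pd.Series([np.nan, 1.0, 2.0, 1.0], name="REST_DAYS")

# NaN == 1 evaluates to False, so a team's first game is never flagged.
is_back_to_back = rest_days == 1
print(is_back_to_back.tolist())  # [False, True, False, True]

# Optional variant: keep "unknown" distinct from "not a back-to-back"
# by using pandas' nullable boolean dtype.
is_back_to_back_nullable = rest_days.eq(1).astype("boolean")
is_back_to_back_nullable[rest_days.isna()] = pd.NA
print(is_back_to_back_nullable.tolist())
```

The plain boolean column is simpler to feed to most models; the nullable version only matters if missing rest information should be handled separately.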


## 5. Calculate Rolling Team Performance Metrics
Add each team's rolling win percentage over its last 5 and 10 games, current game included.

In [5]:
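# Derive the 0/1 WIN column that the rolling mean needs from the W/L flag.
games_df['WIN'] = (games_df['WL'] == 'W').astype(int)
for window in [5, 10]:
    games_df[f'ROLL_WIN_PCT_{window}'] = games_df.groupby('TEAM_ID')['WIN'].rolling(window, min_periods=1).mean().reset_index(0, drop=True)
print('Rolling win percentages calculated.')

Rolling win percentages calculated.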
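A rolling window that ends on the current row includes the result of the game being predicted. If these columns are meant as pre-game features, one common adjustment is to shift the series by one game first. The following is a minimal sketch on hypothetical toy data, assuming a 0/1 `WIN` column derived from `WL`; the `PRIOR_WIN_PCT_5` name is made up for illustration:

```python
import pandas as pd

# Hypothetical toy results for one team, already in date order.
toy = pd.DataFrame({
    "TEAM_ID": [1, 1, 1, 1],
    "WL": ["W", "L", "W", "W"],
})

# Assumption: wins are encoded as 1 and everything else as 0.
toy["WIN"] = (toy["WL"] == "W").astype(int)

# shift(1) removes the current game from its own window, so each row only
# sees results that were known before the game was played.
toy["PRIOR_WIN_PCT_5"] = toy.groupby("TEAM_ID")["WIN"].transform(
    lambda s: s.shift(1).rolling(5, min_periods=1).mean()
)
print(toy)
```

With this variant a team's first game has no history and comes out as NaN, which then needs an explicit fill or drop before modeling.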

## 6. Save Enriched Data
Save the enriched data for the next step.

In [6]:
games_df.to_csv('../data/enriched/enriched_games.csv', index=False)
print('Enriched data saved to ../data/enriched/enriched_games.csv')

Enriched data saved to ../data/enriched/enriched_games.csv
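`to_csv` does not create missing parent folders, so the save fails on a fresh checkout where the enriched directory does not exist yet. A minimal sketch of guarding against that, using a temporary directory and a stand-in frame (both hypothetical) so it can run without touching the project's data folder:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in frame; in the notebook this would be games_df.
sample_df = pd.DataFrame({"TEAM_ID": [1], "REST_DAYS": [2.0]})

# A temporary directory stands in for ../data so the sketch has no side effects.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "enriched" / "enriched_games.csv"
    # Create the enriched/ folder if it is missing before writing.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    sample_df.to_csv(out_path, index=False)
    # Read the file back to confirm the write succeeded.
    round_trip = pd.read_csv(out_path)

print(round_trip)
```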
