
Use Python and the NBA API to develop an advanced machine learning model that predicts player performance metrics in upcoming games


<h3 style="color:black;font-family:'Segoe UI Variable Display';font-size:20px;text-shadow:0.125px 0.25px 0.25px black;margin:0;font-weight:300;line-height:1;">Part 2.0</h3>



In [1]:
# main.py
import pandas as pd

from src.data_ingestion import (
    fetch_bulk_player_game_logs,
    get_player_game_logs,
    get_player_advanced_stats_parallel,
    get_opponent_stats,
    get_opponent_stats_last10,
)
from src.feature_engineering import feature_engineering_pipeline
from src.model_training import prepare_data, train_models, evaluate_model, DEFAULT_FEATURE_COLS
from src.utils import get_player_id, get_team_abbreviation_id_mapping
from src.aggregators import (
    add_opponent_position_allowed_pts,
    add_team_vs_opponent_allowed_pts,
    compute_position_allowed_pts,
    add_player_position,
)



# --- Configuration ---
PLAYER_NAMES = [
    "Nikola Jokić", "Shai Gilgeous-Alexander", "Anthony Edwards", "Tyrese Haliburton",
    "Pascal Siakam", "Jalen Williams", "Chet Holmgren"
]
season = '2024-25'

# --- Fetch & Process Data ---
print("=== Fetching Data ===")
bulk_logs_df = fetch_bulk_player_game_logs(season)
team_map = get_team_abbreviation_id_mapping()
opp_df = get_opponent_stats(season)                  # season-long opponent stats
opp_last10_df = get_opponent_stats_last10(season)    # opponent stats over the last 10 games

# Collect one processed frame per player and concatenate once at the end
player_frames = []

for name in PLAYER_NAMES:
    pid = get_player_id(name)
    if not pid:
        continue  # unknown player name

    logs = get_player_game_logs(pid, bulk_logs_df)
    if logs.empty:
        continue  # no games logged this season

    game_ids = logs['GAME_ID'].tolist()
    adv_stats = get_player_advanced_stats_parallel(pid, game_ids)
    if adv_stats.empty:
        continue  # advanced box scores unavailable

    logs['PLAYER_NAME'] = name
    merged = logs.merge(adv_stats, on=['GAME_ID', 'PLAYER_ID'], how='left')
    merged['PLAYER_ID'] = pid

    # Feature engineering
    processed = feature_engineering_pipeline(
        merged, team_map=team_map, opp_df=opp_df, opp_last10_df=opp_last10_df
    )
    player_frames.append(processed)
    print(f"  ✅ Processed data for {name}")

all_player_data = (
    pd.concat(player_frames, ignore_index=True) if player_frames else pd.DataFrame()
)

# --- Apply Aggregator Features ---
print("=== Applying Aggregators ===")
print("Rows right after per-player processing :", len(all_player_data))
all_player_data = add_opponent_position_allowed_pts(all_player_data)
print("Rows after position agg                :", len(all_player_data))
all_player_data = add_team_vs_opponent_allowed_pts(all_player_data)
print("Rows after team-vs-opp agg             :", len(all_player_data))


# --- Train & Evaluate Model ---
print("\n=== Training Model ===")
X_train_scaled, X_test_scaled, y_train, y_test, X_test_original = prepare_data(all_player_data)
best_model = train_models(X_train_scaled, y_train, X_test_scaled, y_test)
eval_df = evaluate_model(best_model, X_test_scaled, y_test, X_test_original)

# --- Next-Game Predictions ---
print("\n=== Predicting Next Game Points ===")
from src.prediction import predict_next_game
for name in PLAYER_NAMES:
    predict_next_game(name, DEFAULT_FEATURE_COLS, season)


=== Fetching Data ===
🔄 Loading bulk logs from cache: cache/bulk_logs_2024-25.parquet
  ✅ Processed data for Nikola Jokić
  ✅ Processed data for Shai Gilgeous-Alexander
  ✅ Processed data for Anthony Edwards
  ✅ Processed data for Tyrese Haliburton
  ✅ Processed data for Pascal Siakam
  ✅ Processed data for Jalen Williams
  ✅ Processed data for Chet Holmgren
=== Applying Aggregators ===
Rows right after per-player processing : 477
Rows after position agg                : 477
Rows after team-vs-opp agg             : 477

=== Training Model ===

CatBoost Performance:
  RMSE: 7.83, MAE: 6.05, R2: 0.35

RandomForest Performance:
  RMSE: 7.15, MAE: 5.81, R2: 0.46

GradientBoosting Performance:
  RMSE: 7.76, MAE: 6.21, R2: 0.36

Ridge Performance:
  RMSE: 6.52, MAE: 5.46, R2: 0.55

BayesianRidge Performance:
  RMSE: 6.52, MAE: 5.45, R2: 0.55

Best model: Ridge with RMSE: 6.52

Evaluation on Test Data:
  RMSE: 6.52, MAE: 5.46, R2: 0.55

=== Predicting Next Game Points ===
No next game for Nik

In [None]:
import os
import time
import joblib
import warnings
from datetime import datetime, timedelta
from collections import defaultdict, Counter

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# NBA API
from nba_api.stats.endpoints import (
    playergamelog, boxscoreadvancedv2,
    leaguedashteamstats, scoreboardv2, commonplayerinfo,
    leaguegamefinder, boxscoretraditionalv2
)
from nba_api.stats.static import players, teams

# Scikit-Learn & Modeling
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import (
    RandomForestRegressor, GradientBoostingRegressor
)
from sklearn.linear_model import Ridge, BayesianRidge
from catboost import CatBoostRegressor

warnings.filterwarnings("ignore")



Developing a machine learning model to predict NBA player performance metrics like points involves several steps:

Data Collection: Gather historical and current-season data through the NBA API, including advanced statistics such as Player Impact Estimate (PIE), Efficiency (EFF), and Player Efficiency Rating (PER), along with recent trends and opponent data.

Data Preprocessing: Clean the data and merge game logs with advanced box-score stats so that each row describes one player in one game.

Feature Engineering: Create features that capture what influences player performance, such as opponent stats over the season and over the last 10 games, and points allowed by opponent position.

Model Training: Train several candidate regressors (CatBoost, random forest, gradient boosting, Ridge, and Bayesian Ridge) on scaled features and keep the best performer.

Model Evaluation: Assess the model on held-out test data using RMSE, MAE, and R2, and fine-tune as necessary.

Prediction: Use the trained model to predict a player's points in their next game.
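
The steps above can be sketched end to end on synthetic data. This is a minimal illustration, not the code behind `src/`: the helper names (`make_synthetic_logs`, `add_rolling_features`, `chronological_split`, `fit_and_score`), the `GAME_NUM` column, the rolling windows, and the chronological hold-out are all assumptions made for the example. The game log is randomly generated rather than fetched from the NBA API, so the printed metrics say nothing about real players.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler


def make_synthetic_logs(n_games=120, seed=0):
    """Build a fake single-player game log (hypothetical data, not NBA API output)."""
    rng = np.random.default_rng(seed)
    minutes = np.clip(rng.normal(34, 4, n_games), 20, 44)
    pts = np.clip(0.7 * minutes + rng.normal(0, 5, n_games), 0, None)
    return pd.DataFrame({"GAME_NUM": np.arange(n_games), "MIN": minutes, "PTS": pts})


def add_rolling_features(df, windows=(5, 10)):
    """Add rolling means that only use games played before the one being predicted."""
    out = df.sort_values("GAME_NUM").copy()
    for w in windows:
        for col in ("PTS", "MIN"):
            # shift(1) keeps the current game out of its own window (no target leakage)
            out[f"{col}_ROLL{w}"] = out[col].shift(1).rolling(w).mean()
    # Early games lack a full window and are dropped
    return out.dropna().reset_index(drop=True)


def chronological_split(df, feature_cols, target="PTS", test_frac=0.2):
    """Hold out the most recent games rather than a random sample (an assumption here)."""
    cut = int(len(df) * (1 - test_frac))
    train, test = df.iloc[:cut], df.iloc[cut:]
    return train[feature_cols], test[feature_cols], train[target], test[target]


def fit_and_score(X_train, X_test, y_train, y_test):
    """Scale features on the training set only, fit Ridge, and report test metrics."""
    scaler = StandardScaler().fit(X_train)
    model = Ridge(alpha=1.0).fit(scaler.transform(X_train), y_train)
    preds = model.predict(scaler.transform(X_test))
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y_test, preds))),
        "MAE": float(mean_absolute_error(y_test, preds)),
        "R2": float(r2_score(y_test, preds)),
    }


logs = add_rolling_features(make_synthetic_logs())
feature_cols = [c for c in logs.columns if "_ROLL" in c]
metrics = fit_and_score(*chronological_split(logs, feature_cols))
print({k: round(v, 2) for k, v in metrics.items()})
```

Two design choices are worth noting. Shifting before rolling means each feature row is built only from earlier games, which mirrors what is known before tip-off. Fitting the scaler on the training rows alone keeps test-set statistics out of the model.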

-----