Predicting NBA game outcomes using schedule-related information. This is an example of supervised learning where an XGBoost model was trained with 20 seasons' worth of NBA games and uses SHAP values for model explainability.


Influence of schedule & travel metrics on game outcomes in the NBA.

Predicting NBA game outcomes and extracting win probabilities for each game based on schedule, travel and game density metrics.

Goal:

The model outputs game predictions (along with the probabilities of winning and losing). However, my main goal was to understand how game outcomes may be affected by schedule-related metrics such as mileage, rest, density, time zone shifts, etc. This information could potentially be used by teams to optimize travel plans and manage different schedule indicators during the season.

Data Source:

  • I built an R package ({airball}) to scrape the data. {airball} provides various functions to extract schedule-related metrics from public box score information (a short usage sketch follows this list).
  • Data preparation: my code to clean the data and prepare it for modeling is available here.
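As a rough, hedged sketch (not the repo's actual code) of what pulling the raw data could look like; nba_travel() is assumed to be the {airball} entry point and the column names below are placeholders, so check the package documentation for the exact interface:

# Illustrative only: pull schedule & travel metrics with {airball}.
# nba_travel() and its arguments are assumed; column names are placeholders.
library(airball)
library(dplyr)

raw <- nba_travel(start_season = 2000, end_season = 2019)

# Code the binary target from the game result (1 = win, 0 = loss)
games <- raw %>%
  mutate(outcome = if_else(result == "W", 1L, 0L))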

Once the data is ready:

  • To train the model, I used 20 seasons of NBA data (2000-19).
  • I also ran the model on the 2021 season data to check its performance given some of the COVID-related differences in the schedule (a season-split sketch follows below).
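Continuing the sketch above, the season split might look like this (the games data frame and its season column are placeholders, not the notebook's exact names):

# Illustrative season split: 2000-19 for training, 2021 held out for the COVID-era check.
train_df <- dplyr::filter(games, season >= 2000, season <= 2019)
covid_df <- dplyr::filter(games, season == 2021)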

Metrics:

  • Game Outcome (model target).
  • Distance Travelled: distance travelled over "X" time windows for both teams in a game.
  • Time Zone Shifts: number of time zone shifts over "X" time windows for both teams in a game.
  • Games Played: games played over "X" time windows for both teams.
  • Rest Days: number of rest days prior to a game for both teams.
  • Location: home or away.
  • Streak: consecutive wins or losses for both teams.
  • Win %: winning percentage for each team.
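Purely for illustration, one prepared team-game observation could look roughly like this (all column names and values below are hypothetical, not the notebook's exact ones):

# One hypothetical feature row (illustrative names and values only)
example_row <- data.frame(
  outcome      = 1,       # model target: 1 = win, 0 = loss
  distance_7d  = 2450,    # miles travelled over the last 7 days
  tz_shifts_7d = 2,       # time zone shifts over the last 7 days
  games_7d     = 4,       # games played over the last 7 days
  rest_days    = 1,       # rest days prior to the game
  location     = "home",  # home or away
  streak       = 3,       # consecutive wins (negative for losses)
  win_pct      = 0.62     # season winning percentage
)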

XGBoost model:

  • This is an example of supervised learning where an XGBoost classifier was implemented. I used the {h2o} package in R to build, train and evaluate the model (a minimal training sketch follows the performance summary below).
  • Current model performance:
MSE:  0.1803835
RMSE:  0.4247158
LogLoss:  0.5333432
Mean Per-Class Error:  0.2758835
AUC:  0.8041708
AUCPR:  0.8072966
Gini:  0.6083416
R^2:  0.2784605

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0    1    Error        Rate
0      2495 1335 0.348564  =1335/3830
1       774 3035 0.203203   =774/3809
Totals 3269 4370 0.276083  =2109/7639
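A minimal sketch of the kind of {h2o} workflow described above, continuing from the split sketch earlier; the frame names and hyperparameters are illustrative placeholders, not the tuned model:

# Illustrative {h2o} XGBoost workflow; settings are placeholders, not the notebook's actual code.
library(h2o)
h2o.init()

train_h2o <- as.h2o(train_df)
test_h2o  <- as.h2o(covid_df)

y <- "outcome"
x <- setdiff(names(train_h2o), y)
train_h2o[, y] <- as.factor(train_h2o[, y])   # classification target
test_h2o[, y]  <- as.factor(test_h2o[, y])

fit <- h2o.xgboost(
  x = x, y = y,
  training_frame = train_h2o,
  ntrees = 500, max_depth = 6, learn_rate = 0.05,   # illustrative values
  seed = 1234
)

perf <- h2o.performance(fit, newdata = test_h2o)
h2o.auc(perf)
h2o.confusionMatrix(perf)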

Model Explainability:

  • I used SHAP values to identify feature importance, as well as to explain how different features contribute to model predictions and outcome probabilities for each observation.
  • Below is an example image of how the model makes a decision for one game:

For more info on the science behind SHAP values visit this video.
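For reference, a small sketch of how the SHAP pieces can be pulled from the fitted model with {h2o}'s explainability helpers (available in recent {h2o} versions; fit and test_h2o come from the training sketch above):

# Per-prediction SHAP contributions (one column per feature plus a BiasTerm column)
shap_vals <- h2o.predict_contributions(fit, test_h2o)

# Global view: SHAP summary plot of feature importance
h2o.shap_summary_plot(fit, test_h2o)

# Local view: explanation for a single game, similar to the example image above
h2o.shap_explain_row_plot(fit, test_h2o, row_index = 1)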

Check Out the Code:

  • A static copy of the notebook is available here.
  • For access to the interactive notebook, visit this link to open Google Colab.

Future Work:

  • Continue improving model performance.
  • Identify other potentially relevant features.
  • Deploy the model as a Shiny app to provide user-friendly access to predictions.

Other NBA schedule-related work
