## Mixed Model NBA Shooting Predictor

#### Table of contents
1. Motivation
2. Exploratory Data Analysis
3. Model Building

#### Motivation

The goal of this project is to build a model that predicts the probability that a field goal attempt is made. This project will explore three separate types of models to compare which best fits this scenario.

The first model I will explore will be a fixed-effects model, such as gradient boosting. Each factor has the same effect in the model no matter the player or group.

The second model I will explore will be a random-effects model, such as variance component analysis. Each group (for example, each player) is allowed its own level of variability, so the model has different coefficients for each player or group.

Lastly, we will look at mixed models, which contain both fixed effects and random effects.
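In notation, one way to write the three model families for shot $i$ taken by player $j$, assuming a logistic link for the made/missed outcome $y_{ij}$ and a feature vector $\mathbf{x}_{ij}$ (the symbols are illustrative and not taken from any particular library):

$$
\begin{aligned}
\text{fixed:}\quad & \operatorname{logit} P(y_{ij}=1) = f(\mathbf{x}_{ij}) \\
\text{random:}\quad & \operatorname{logit} P(y_{ij}=1) = \beta_0 + u_j, \qquad u_j \sim \mathcal{N}(0,\sigma_u^2) \\
\text{mixed:}\quad & \operatorname{logit} P(y_{ij}=1) = \mathbf{x}_{ij}^\top\boldsymbol{\beta} + u_j, \qquad u_j \sim \mathcal{N}(0,\sigma_u^2)
\end{aligned}
$$

Here $f$ is shared by every player (a sum of trees in the gradient boosting case), and $u_j$ is a player-specific shift in log-odds. Letting the coefficients themselves vary by player (random slopes) would add terms such as $v_j x_{ij}$ with $v_j \sim \mathcal{N}(0,\sigma_v^2)$.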


In [1]:
#import libraries
import matplotlib.pyplot as plt
import numpy as np
import time
import pandas as pd
from py_ball import player, playbyplay, image, league_dash, team, boxscore
from sklearn.model_selection import train_test_split

In [2]:
#request headers expected by stats.nba.com; passed to every py_ball call below
HEADERS = {'Connection': 'keep-alive',
           'Host': 'stats.nba.com',
           'Origin': 'http://stats.nba.com',
           'Upgrade-Insecure-Requests': '1',
           'Referer': 'stats.nba.com',
           'x-nba-stats-origin': 'stats',
           'x-nba-stats-token': 'true',
           'Accept-Language': 'en-US,en;q=0.9',
           "X-NewRelic-ID": "VQECWF5UChAHUlNTBwgBVw==",
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6)' +\
                         ' AppleWebKit/537.36 (KHTML, like Gecko)' + \
                         ' Chrome/81.0.4044.129 Safari/537.36'}

#### Exploratory Data Analysis

I first explore what data is available through the py_ball API. The first goal is to determine which of the available features could contribute to a player's field goal percentage during a game.

The player's location on the floor, the angle they shoot from, the type of shot being taken, and the opposing team's defensive rating make a good starting set of features for our models.

In [None]:
#Build a dataframe of every shot taken in the 2022-23 regular season and cache it as shotchart.csv
#(slow: one request per game, so this cell may be removed from the final project)
league_id = '00' #NBA
player_id = '0' #All players

season = '2022-23'
season_split = season[2:4] #'22', used inside the game id
shot_frames = []

for game_num in range(1, 1231): #1230 regular-season games
    print('Game #' + str(game_num))
    t0 = time.time()
    game_id = '002' + season_split + str(game_num).zfill(5) #e.g. '0022200001'
    shots = player.Player(headers=HEADERS,
                          endpoint='shotchartdetail',
                          league_id=league_id,
                          player_id=player_id,
                          game_id=game_id,
                          season=season)
    shot_frames.append(pd.DataFrame(shots.data['Shot_Chart_Detail']))
    #pause as long as the last request took, so requests are spaced out
    delay = time.time() - t0
    print('Waiting ' + str(round(delay, 2)) + 's')
    time.sleep(delay)

#concatenate once at the end instead of growing a DataFrame inside the loop
all_df = pd.concat(shot_frames, axis=0).reset_index(drop=True)
all_df.to_csv('shotchart.csv', index=False)

In [5]:
pd.set_option('display.max_columns', None) # displays all columns of the shotchart
#shot_chart = pd.read_csv(r'/home/neil/Desktop/shotchart.csv')
shot_chart = pd.read_csv(r'C:\Users\nmani\OneDrive\Desktop\shotchart.csv')
#box_score = pd.read_csv(r'/home/neil/Desktop/game_box_scores_2223_Team.csv')
box_score = pd.read_csv(r'C:\Users\nmani\OneDrive\Desktop\game_box_scores_2223_Team.csv')

In [6]:
shot_chart.head()

| | GRID_TYPE | GAME_ID | GAME_EVENT_ID | PLAYER_ID | PLAYER_NAME | TEAM_ID | TEAM_NAME | PERIOD | MINUTES_REMAINING | SECONDS_REMAINING | EVENT_TYPE | ACTION_TYPE | SHOT_TYPE | SHOT_ZONE_BASIC | SHOT_ZONE_AREA | SHOT_ZONE_RANGE | SHOT_DISTANCE | LOC_X | LOC_Y | SHOT_ATTEMPTED_FLAG | SHOT_MADE_FLAG | GAME_DATE | HTM | VTM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Shot Chart Detail | 22200001 | 7 | 203954 | Joel Embiid | 1610612755 | Philadelphia 76ers | 1 | 11 | 38 | Missed Shot | Turnaround Fadeaway shot | 2PT Field Goal | Mid-Range | Left Side(L) | 8-16 ft. | 12 | -118 | 50 | 1 | 0 | 20221018 | BOS | PHI |
| 1 | Shot Chart Detail | 22200001 | 11 | 203935 | Marcus Smart | 1610612738 | Boston Celtics | 1 | 11 | 15 | Made Shot | Driving Floating Bank Jump Shot | 2PT Field Goal | Mid-Range | Right Side(R) | 8-16 ft. | 13 | 120 | 55 | 1 | 1 | 20221018 | BOS | PHI |
| 2 | Shot Chart Detail | 22200001 | 12 | 202699 | Tobias Harris | 1610612755 | Philadelphia 76ers | 1 | 11 | 5 | Missed Shot | Driving Floating Jump Shot | 2PT Field Goal | In The Paint (Non-RA) | Center(C) | 8-16 ft. | 14 | 50 | 135 | 1 | 0 | 20221018 | BOS | PHI |
| 3 | Shot Chart Detail | 22200001 | 14 | 202699 | Tobias Harris | 1610612755 | Philadelphia 76ers | 1 | 11 | 3 | Made Shot | Tip Layup Shot | 2PT Field Goal | Restricted Area | Center(C) | Less Than 8 ft. | 0 | 0 | 0 | 1 | 1 | 20221018 | BOS | PHI |
| 4 | Shot Chart Detail | 22200001 | 15 | 1628369 | Jayson Tatum | 1610612738 | Boston Celtics | 1 | 10 | 46 | Made Shot | Jump Shot | 3PT Field Goal | Left Corner 3 | Left Side(L) | 24+ ft. | 23 | -232 | 49 | 1 | 1 | 20221018 | BOS | PHI |


In [10]:
shot_chart.shape

(217220, 24)

In [12]:
box_score.shape

(2460, 29)

In [7]:
box_score.head()

| | GAME_ID | TEAM_ID | TEAM_NAME | TEAM_ABBREVIATION | TEAM_CITY | MIN | E_OFF_RATING | OFF_RATING | E_DEF_RATING | DEF_RATING | E_NET_RATING | NET_RATING | AST_PCT | AST_TOV | AST_RATIO | OREB_PCT | DREB_PCT | REB_PCT | E_TM_TOV_PCT | TM_TOV_PCT | EFG_PCT | TS_PCT | USG_PCT | E_USG_PCT | E_PACE | PACE | PACE_PER40 | POSS | PIE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22200001 | 1610612755 | 76ers | PHI | Philadelphia | 240.000000:00 | 114.3 | 119.4 | 126.9 | 129.9 | -12.5 | -10.5 | 0.4 | 1.14 | 13.1 | 0.19 | 0.744 | 0.457 | 13.683 | 14.3 | 0.581 | 0.634 | 1.0 | 0.192 | 100.82 | 97.5 | 81.25 | 98 | 0.434 |
| 1 | 22200001 | 1610612738 | Celtics | BOS | Boston | 240.000000:00 | 126.9 | 129.9 | 114.3 | 119.4 | 12.5 | 10.5 | 0.522 | 2.18 | 18.6 | 0.256 | 0.81 | 0.543 | 11.075 | 11.3 | 0.634 | 0.668 | 1.0 | 0.197 | 100.82 | 97.5 | 81.25 | 97 | 0.566 |
| 2 | 22200002 | 1610612747 | Lakers | LAL | Los Angeles | 240.000000:00 | 92.4 | 97.3 | 105.9 | 107.0 | -13.6 | -9.6 | 0.575 | 1.05 | 15.3 | 0.246 | 0.719 | 0.482 | 18.644 | 19.6 | 0.479 | 0.519 | 1.0 | 0.197 | 117.06 | 113.5 | 94.58 | 112 | 0.452 |
| 3 | 22200002 | 1610612744 | Warriors | GSW | Golden State | 240.000000:00 | 105.9 | 107.0 | 92.4 | 97.3 | 13.6 | 9.6 | 0.689 | 1.72 | 19.6 | 0.281 | 0.754 | 0.518 | 15.501 | 15.7 | 0.535 | 0.564 | 1.0 | 0.198 | 117.06 | 113.5 | 94.58 | 115 | 0.548 |
| 4 | 22200003 | 1610612753 | Magic | ORL | Orlando | 240.000000:00 | 106.5 | 106.9 | 107.0 | 111.9 | -0.6 | -5.0 | 0.5 | 1.17 | 15.7 | 0.277 | 0.707 | 0.514 | 17.585 | 17.6 | 0.552 | 0.578 | 1.0 | 0.199 | 103.96 | 101.5 | 84.58 | 102 | 0.475 |


In [8]:
#attach each shooting team's box-score row (ratings, pace, etc.) to every shot in its game
new_df = pd.merge(shot_chart, box_score, how='left', on=['GAME_ID', 'TEAM_ID'])
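Because this merge joins on the *shooting* team's `TEAM_ID`, the `DEF_RATING` attached to each shot is the shooter's own team's defensive rating (the `new_df.head` output below pairs Joel Embiid with Philadelphia's 129.9). To use the opposing team's rating, as the feature list above intends, one option is to pair each box-score row with the other team's row from the same game. A minimal sketch, assuming `box_score` has exactly two rows per `GAME_ID`; the helper `add_opponent_def_rating` and the `OPP_TEAM_ID`/`OPP_DEF_RATING` column names are hypothetical:

```python
import pandas as pd

def add_opponent_def_rating(shots, box):
    """Attach the opposing team's DEF_RATING to every shot (hypothetical helper).

    Assumes `box` has exactly two rows per GAME_ID, one per team.
    """
    ratings = box[['GAME_ID', 'TEAM_ID', 'DEF_RATING']]
    # self-join on GAME_ID gives every (team, team) pairing within a game...
    pairs = ratings.merge(ratings, on='GAME_ID', suffixes=('', '_OPP'))
    # ...then keep only the pairing with the *other* team
    pairs = pairs[pairs['TEAM_ID'] != pairs['TEAM_ID_OPP']]
    pairs = pairs.rename(columns={'TEAM_ID_OPP': 'OPP_TEAM_ID',
                                  'DEF_RATING_OPP': 'OPP_DEF_RATING'})
    pairs = pairs[['GAME_ID', 'TEAM_ID', 'OPP_TEAM_ID', 'OPP_DEF_RATING']]
    return shots.merge(pairs, how='left', on=['GAME_ID', 'TEAM_ID'])

# toy check using the two teams from game 22200001 in the box score above
box = pd.DataFrame({'GAME_ID': [22200001, 22200001],
                    'TEAM_ID': [1610612755, 1610612738],
                    'DEF_RATING': [129.9, 119.4]})
shots = pd.DataFrame({'GAME_ID': [22200001, 22200001],
                      'TEAM_ID': [1610612755, 1610612738],
                      'PLAYER_NAME': ['Joel Embiid', 'Marcus Smart']})
out = add_opponent_def_rating(shots, box)
print(out[['PLAYER_NAME', 'OPP_DEF_RATING']])  # Embiid gets Boston's 119.4, Smart gets Philadelphia's 129.9
```

On the real data this would be applied as `new_df = add_opponent_def_rating(new_df, box_score)`; a left merge keeps every shot even if a game's box score is missing.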


In [23]:
new_df.head(50)

| | GRID_TYPE | GAME_ID | GAME_EVENT_ID | PLAYER_ID | PLAYER_NAME | TEAM_ID | TEAM_NAME_x | PERIOD | MINUTES_REMAINING | SECONDS_REMAINING | EVENT_TYPE | ACTION_TYPE | SHOT_TYPE | SHOT_ZONE_BASIC | SHOT_ZONE_AREA | SHOT_ZONE_RANGE | SHOT_DISTANCE | LOC_X | LOC_Y | SHOT_ATTEMPTED_FLAG | SHOT_MADE_FLAG | GAME_DATE | HTM | VTM | TEAM_NAME_y | TEAM_ABBREVIATION | TEAM_CITY | MIN | E_OFF_RATING | OFF_RATING | E_DEF_RATING | DEF_RATING | E_NET_RATING | NET_RATING | AST_PCT | AST_TOV | AST_RATIO | OREB_PCT | DREB_PCT | REB_PCT | E_TM_TOV_PCT | TM_TOV_PCT | EFG_PCT | TS_PCT | USG_PCT | E_USG_PCT | E_PACE | PACE | PACE_PER40 | POSS | PIE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Shot Chart Detail | 22200001 | 7 | 203954 | Joel Embiid | 1610612755 | Philadelphia 76ers | 1 | 11 | 38 | Missed Shot | Turnaround Fadeaway shot | 2PT Field Goal | Mid-Range | Left Side(L) | 8-16 ft. | 12 | -118 | 50 | 1 | 0 | 20221018 | BOS | PHI | 76ers | PHI | Philadelphia | 240.000000:00 | 114.3 | 119.4 | 126.9 | 129.9 | -12.5 | -10.5 | 0.4 | 1.14 | 13.1 | 0.19 | 0.744 | 0.457 | 13.683 | 14.3 | 0.581 | 0.634 | 1.0 | 0.192 | 100.82 | 97.5 | 81.25 | 98 | 0.434 |
| 1 | Shot Chart Detail | 22200001 | 11 | 203935 | Marcus Smart | 1610612738 | Boston Celtics | 1 | 11 | 15 | Made Shot | Driving Floating Bank Jump Shot | 2PT Field Goal | Mid-Range | Right Side(R) | 8-16 ft. | 13 | 120 | 55 | 1 | 1 | 20221018 | BOS | PHI | Celtics | BOS | Boston | 240.000000:00 | 126.9 | 129.9 | 114.3 | 119.4 | 12.5 | 10.5 | 0.522 | 2.18 | 18.6 | 0.256 | 0.81 | 0.543 | 11.075 | 11.3 | 0.634 | 0.668 | 1.0 | 0.197 | 100.82 | 97.5 | 81.25 | 97 | 0.566 |
| 2 | Shot Chart Detail | 22200001 | 12 | 202699 | Tobias Harris | 1610612755 | Philadelphia 76ers | 1 | 11 | 5 | Missed Shot | Driving Floating Jump Shot | 2PT Field Goal | In The Paint (Non-RA) | Center(C) | 8-16 ft. | 14 | 50 | 135 | 1 | 0 | 20221018 | BOS | PHI | 76ers | PHI | Philadelphia | 240.000000:00 | 114.3 | 119.4 | 126.9 | 129.9 | -12.5 | -10.5 | 0.4 | 1.14 | 13.1 | 0.19 | 0.744 | 0.457 | 13.683 | 14.3 | 0.581 | 0.634 | 1.0 | 0.192 | 100.82 | 97.5 | 81.25 | 98 | 0.434 |
| 3 | Shot Chart Detail | 22200001 | 14 | 202699 | Tobias Harris | 1610612755 | Philadelphia 76ers | 1 | 11 | 3 | Made Shot | Tip Layup Shot | 2PT Field Goal | Restricted Area | Center(C) | Less Than 8 ft. | 0 | 0 | 0 | 1 | 1 | 20221018 | BOS | PHI | 76ers | PHI | Philadelphia | 240.000000:00 | 114.3 | 119.4 | 126.9 | 129.9 | -12.5 | -10.5 | 0.4 | 1.14 | 13.1 | 0.19 | 0.744 | 0.457 | 13.683 | 14.3 | 0.581 | 0.634 | 1.0 | 0.192 | 100.82 | 97.5 | 81.25 | 98 | 0.434 |
| 4 | Shot Chart Detail | 22200001 | 15 | 1628369 | Jayson Tatum | 1610612738 | Boston Celtics | 1 | 10 | 46 | Made Shot | Jump Shot | 3PT Field Goal | Left Corner 3 | Left Side(L) | 24+ ft. | 23 | -232 | 49 | 1 | 1 | 20221018 | BOS | PHI | Celtics | BOS | Boston | 240.000000:00 | 126.9 | 129.9 | 114.3 | 119.4 | 12.5 | 10.5 | 0.522 | 2.18 | 18.6 | 0.256 | 0.81 | 0.543 | 11.075 | 11.3 | 0.634 | 0.668 | 1.0 | 0.197 | 100.82 | 97.5 | 81.25 | 97 | 0.566 |
| 5 | Shot Chart Detail | 22200001 | 17 | 203954 | Joel Embiid | 1610612755 | Philadelphia 76ers | 1 | 10 | 33 | Missed Shot | Jump Shot | 3PT Field Goal | Above the Break 3 | Right Side Center(RC) | 24+ ft. | 26 | 116 | 241 | 1 | 0 | 20221018 | BOS | PHI | 76ers | PHI | Philadelphia | 240.000000:00 | 114.3 | 119.4 | 126.9 | 129.9 | -12.5 | -10.5 | 0.4 | 1.14 | 13.1 | 0.19 | 0.744 | 0.457 | 13.683 | 14.3 | 0.581 | 0.634 | 1.0 | 0.192 | 100.82 | 97.5 | 81.25 | 98 | 0.434 |
| 6 | Shot Chart Detail | 22200001 | 23 | 1630178 | Tyrese Maxey | 1610612755 | Philadelphia 76ers | 1 | 10 | 12 | Missed Shot | Driving Layup Shot | 2PT Field Goal | Restricted Area | Center(C) | Less Than 8 ft. | 3 | 38 | 8 | 1 | 0 | 20221018 | BOS | PHI | 76ers | PHI | Philadelphia | 240.000000:00 | 114.3 | 119.4 | 126.9 | 129.9 | -12.5 | -10.5 | 0.4 | 1.14 | 13.1 | 0.19 | 0.744 | 0.457 | 13.683 | 14.3 | 0.581 | 0.634 | 1.0 | 0.192 | 100.82 | 97.5 | 81.25 | 98 | 0.434 |
| 7 | Shot Chart Detail | 22200001 | 27 | 202699 | Tobias Harris | 1610612755 | Philadelphia 76ers | 1 | 10 | 9 | Missed Shot | Fadeaway Jump Shot | 2PT Field Goal | In The Paint (Non-RA) | Center(C) | 8-16 ft. | 12 | 31 | 121 | 1 | 0 | 20221018 | BOS | PHI | 76ers | PHI | Philadelphia | 240.000000:00 | 114.3 | 119.4 | 126.9 | 129.9 | -12.5 | -10.5 | 0.4 | 1.14 | 13.1 | 0.19 | 0.744 | 0.457 | 13.683 | 14.3 | 0.581 | 0.634 | 1.0 | 0.192 | 100.82 | 97.5 | 81.25 | 98 | 0.434 |
| 8 | Shot Chart Detail | 22200001 | 29 | 1628401 | Derrick White | 1610612738 | Boston Celtics | 1 | 10 | 4 | Missed Shot | Running Layup Shot | 2PT Field Goal | Restricted Area | Center(C) | Less Than 8 ft. | 1 | -10 | 12 | 1 | 0 | 20221018 | BOS | PHI | Celtics | BOS | Boston | 240.000000:00 | 126.9 | 129.9 | 114.3 | 119.4 | 12.5 | 10.5 | 0.522 | 2.18 | 18.6 | 0.256 | 0.81 | 0.543 | 11.075 | 11.3 | 0.634 | 0.668 | 1.0 | 0.197 | 100.82 | 97.5 | 81.25 | 97 | 0.566 |
| 9 | Shot Chart Detail | 22200001 | 35 | 201143 | Al Horford | 1610612738 | Boston Celtics | 1 | 9 | 53 | Missed Shot | Cutting Layup Shot | 2PT Field Goal | Restricted Area | Center(C) | Less Than 8 ft. | 2 | 20 | 10 | 1 | 0 | 20221018 | BOS | PHI | Celtics | BOS | Boston | 240.000000:00 | 126.9 | 129.9 | 114.3 | 119.4 | 12.5 | 10.5 | 0.522 | 2.18 | 18.6 | 0.256 | 0.81 | 0.543 | 11.075 | 11.3 | 0.634 | 0.668 | 1.0 | 0.197 | 100.82 | 97.5 | 81.25 | 97 | 0.566 |


In [19]:
new_df.isnull().sum()

GRID_TYPE              0
GAME_ID                0
GAME_EVENT_ID          0
PLAYER_ID              0
PLAYER_NAME            0
TEAM_ID                0
TEAM_NAME_x            0
PERIOD                 0
MINUTES_REMAINING      0
SECONDS_REMAINING      0
EVENT_TYPE             0
ACTION_TYPE            0
SHOT_TYPE              0
SHOT_ZONE_BASIC        0
SHOT_ZONE_AREA         0
SHOT_ZONE_RANGE        0
SHOT_DISTANCE          0
LOC_X                  0
LOC_Y                  0
SHOT_ATTEMPTED_FLAG    0
SHOT_MADE_FLAG         0
GAME_DATE              0
HTM                    0
VTM                    0
TEAM_NAME_y            0
TEAM_ABBREVIATION      0
TEAM_CITY              0
MIN                    0
E_OFF_RATING           0
OFF_RATING             0
E_DEF_RATING           0
DEF_RATING             0
E_NET_RATING           0
NET_RATING             0
AST_PCT                0
AST_TOV                0
AST_RATIO              0
OREB_PCT               0
DREB_PCT               0
REB_PCT                0


In [17]:
new_df['ACTION_TYPE'].value_counts()

ACTION_TYPE
Jump Shot                             63981
Pullup Jump shot                      27214
Driving Layup Shot                    19169
Driving Floating Jump Shot            11246
Step Back Jump shot                   10457
Running Layup Shot                     6766
Layup Shot                             5824
Cutting Layup Shot                     5518
Driving Finger Roll Layup Shot         5516
Tip Layup Shot                         4685
Running Jump Shot                      4398
Floating Jump shot                     4331
Fadeaway Jump Shot                     3923
Putback Layup Shot                     3525
Driving Floating Bank Jump Shot        3356
Cutting Dunk Shot                      3340
Turnaround Fadeaway shot               3214
Turnaround Jump Shot                   2868
Running Pull-Up Jump Shot              2601
Running Dunk Shot                      2569
Turnaround Hook Shot                   2546
Driving Reverse Layup Shot             2218
Driving Dunk Shot   
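These fine-grained descriptions are collapsed into four keyword flags (`DUNK`, `HOOK`, `LAYUP`, `JUMP`) by case-sensitive substring matching in the feature engineering that follows. A small stdlib sketch of that same mapping is handy for spotting types that match no keyword: `Turnaround Fadeaway shot` (3,214 attempts above) gets all four flags set to 0. The helper names are hypothetical, and the counts are copied from the output above.

```python
# keyword that sets each flag, mirroring the substring checks in feature_engineering
KEYWORDS = {'DUNK': 'Dunk', 'HOOK': 'Hook', 'LAYUP': 'Layup', 'JUMP': 'Jump'}

def shot_family_flags(action_type):
    """Return the 0/1 keyword flags for one ACTION_TYPE string (case-sensitive)."""
    return {flag: int(word in action_type) for flag, word in KEYWORDS.items()}

def unmatched_types(counts):
    """ACTION_TYPE values (with their counts) that set none of the four flags."""
    return {t: n for t, n in counts.items() if not any(shot_family_flags(t).values())}

# a subset of the value_counts output above
counts = {'Jump Shot': 63981, 'Pullup Jump shot': 27214, 'Driving Layup Shot': 19169,
          'Cutting Dunk Shot': 3340, 'Turnaround Fadeaway shot': 3214,
          'Turnaround Hook Shot': 2546}
print(unmatched_types(counts))
```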

In [6]:
#adapted from basketball_relativity
def feature_engineering(shotchart_df):
    """ feature_engineering calculates engineered
    features from the shotchart data

    @param shotchart_df (DataFrame): DataFrame containing
    shotchart data

    Returns:

        shotchart_df (DataFrame): DataFrame containing
        the engineered features
    """
    shotchart_df = shotchart_df.copy() #do not modify the caller's DataFrame

    #angle in degrees from the line straight out from the basket (0 = directly in front)
    shotchart_df['ANGLE'] = abs(np.rad2deg(np.arctan2(shotchart_df['LOC_X'],
                                                      shotchart_df['LOC_Y'])))

    #1 if the shot came from the right side of the court (LOC_X >= 0), else 0
    shotchart_df['SIDE'] = (shotchart_df['LOC_X'] >= 0).astype(int)

    #0/1 flags for the shot family, parsed from the ACTION_TYPE description
    for flag, keyword in [('DUNK', 'Dunk'), ('HOOK', 'Hook'), ('LAYUP', 'Layup'), ('JUMP', 'Jump')]:
        shotchart_df[flag] = shotchart_df['ACTION_TYPE'].str.contains(keyword).astype(int)

    shotchart_df = shotchart_df[['SHOT_DISTANCE', 'ANGLE', 'SIDE', 'DUNK', 'HOOK', 'LAYUP', 'JUMP',
                                 'SHOT_ATTEMPTED_FLAG', 'SHOT_MADE_FLAG']]

    return shotchart_df

In [7]:
shotchart = feature_engineering(shot_chart)

In [8]:
shotchart
#TODO: split the data first, stratifying so every player appears in both the training and test sets
#TODO: fit a StandardScaler for ANGLE and the rating features on the training set only


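| | SHOT_DISTANCE | ANGLE | SIDE | DUNK | HOOK | LAYUP | JUMP | SHOT_ATTEMPTED_FLAG | SHOT_MADE_FLAG |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | 67.036227 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 13 | 65.376435 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| 2 | 14 | 20.323137 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
| 3 | 0 | 0.000000 | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
| 4 | 23 | 78.074008 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 217215 | 1 | 74.744881 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| 217216 | 23 | 69.443955 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| 217217 | 23 | 57.659407 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
| 217218 | 8 | 53.781163 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |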
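As a sanity check on `ANGLE` and `SIDE`, the first rows can be recomputed by hand from the raw `LOC_X`/`LOC_Y` values in `shot_chart.head()`: the angle is measured from the line running straight out from the basket, so a shot from directly in front is 0° and a shot from the baseline (`LOC_Y = 0`) is 90°. A stdlib-only sketch; the helper `angle_and_side` is hypothetical and simply mirrors the vectorized code for one shot.

```python
import math

def angle_and_side(loc_x, loc_y):
    """Single-shot version of the ANGLE and SIDE features: |atan2(x, y)| in degrees, 1 if x >= 0."""
    angle = abs(math.degrees(math.atan2(loc_x, loc_y)))
    side = 1 if loc_x >= 0 else 0
    return angle, side

# (LOC_X, LOC_Y) for rows 0-4 of shot_chart.head(), paired with the ANGLE values in the table above
rows = [(-118, 50, 67.036227), (120, 55, 65.376435), (50, 135, 20.323137),
        (0, 0, 0.0), (-232, 49, 78.074008)]
for x, y, expected in rows:
    angle, side = angle_and_side(x, y)
    print(f"{x:5d} {y:4d} -> angle {angle:9.6f}, side {side} (table: {expected})")
```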


In [9]:
shotchart['SHOT_MADE_FLAG'].value_counts()

SHOT_MADE_FLAG
0    113960
1    103260
Name: count, dtype: int64

Checking the balance of made vs. missed shots: the classes are roughly even, with about 47.5% of attempts made.
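With 113,960 misses and 103,260 makes, always predicting "miss" would already be right about 52.5% of the time. That majority-class baseline is the number a classifier has to beat (for comparison, the XGBoost model below reports an accuracy of 0.6275). The arithmetic, using only the counts above:

```python
misses, makes = 113960, 103260   # SHOT_MADE_FLAG value counts above
total = misses + makes           # 217220, matching shot_chart.shape
baseline = max(misses, makes) / total
make_rate = makes / total
print(f"majority-class baseline accuracy: {baseline:.4f}")  # 0.5246
print(f"league-wide make rate:            {make_rate:.4f}")  # 0.4754
```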

Next, I split the data into training and test sets, holding out 10% for testing.

In [10]:
output = list(shotchart['SHOT_MADE_FLAG'])
shotchart = shotchart.drop(['SHOT_MADE_FLAG'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(shotchart, output, test_size=0.1, random_state=489)
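The notes in the feature-engineering cell call for a split stratified by player and for scaling fitted on the training set only. A sketch of both with scikit-learn, assuming a `PLAYER_ID` column is kept alongside the features (the engineered frame above drops it); the toy data, player IDs, and column choice are illustrative only. Note that `stratify` needs at least two shots per player, so players with a single attempt would have to be filtered out or grouped first.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# toy stand-in for the shot table: 4 hypothetical players x 10 shots each
df = pd.DataFrame({
    'PLAYER_ID': np.repeat([101, 102, 103, 104], 10),
    'ANGLE': rng.uniform(0, 90, 40),
    'SHOT_DISTANCE': rng.integers(0, 30, 40).astype(float),
    'SHOT_MADE_FLAG': rng.integers(0, 2, 40),
})

X = df.drop(columns='SHOT_MADE_FLAG')
y = df['SHOT_MADE_FLAG']
# stratify on PLAYER_ID so each player's shots are split ~90/10 across train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=489, stratify=X['PLAYER_ID'])

# fit the scaler on the training rows only, then apply the same transform to the test rows
num_cols = ['ANGLE', 'SHOT_DISTANCE']
scaler = StandardScaler()
X_train = X_train.copy()
X_test = X_test.copy()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])
print(X_test)
```

Fitting the scaler on the training rows alone keeps test-set statistics from leaking into the model; gradient-boosted trees do not need the scaling, but the mixed models will.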

### Model Building

#### Gradient Boosting

- Supervised learning: relies on labeled data
- Classification problem: shot made or not
- Evaluate with AUC (area under the ROC curve); a higher AUC means a better model
- Features are numeric or categorical
- Numeric features may need scaling (not for tree models, which split on thresholds, but for the mixed models)
- Categorical variables must be encoded
- XGBoost
    - An optimized gradient boosting library
    - Known for its speed
    - What is a decision tree?
        - A single yes/no question at each node, with two branches
        - The base learner: the individual learner in an ensemble algorithm
        - Constructed iteratively, one split at a time
        - Individual trees tend to overfit
    - Classification and Regression Trees (CART)
        - Each leaf always contains a real-valued score
    - Boosting
        - An ensemble algorithm that converts many weak learners into a strong learner
        - A weak learner is any model slightly better than chance (50% for a balanced binary problem)
        - Boosting sums each weak learner's prediction multiplied by its weight
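The boosting idea in these notes (weak learners added one at a time, each correcting the current ensemble) can be sketched from scratch. This is plain gradient boosting for a binary outcome, not XGBoost itself, which additionally uses second-order gradient information and regularization: each round fits a one-split tree ("stump") by least squares to the residuals `y - p`, which are the negative gradient of the log-loss with respect to the log-odds, and adds it with a fixed learning rate. The toy distances and outcomes are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Weak learner: the single threshold split on xs that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Additive model on the log-odds scale: F(x) = F0 + learning_rate * sum of stumps."""
    f0 = math.log(sum(ys) / (len(ys) - sum(ys)))  # start from the overall log-odds
    stumps = []

    def log_odds(x):
        return f0 + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        # negative gradient of the log-loss w.r.t. the log-odds is y - p
        residuals = [y - sigmoid(log_odds(x)) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return lambda x: sigmoid(log_odds(x))

# toy data: shot distance in feet -> made (1) or missed (0)
distance = [0, 1, 2, 3, 20, 22, 24, 26]
made = [1, 1, 1, 0, 0, 0, 1, 0]
predict = gradient_boost(distance, made)
print(round(predict(1), 3), round(predict(26), 3))
```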


In [11]:
# Fixed Effects Gradient Boosted Trees
import xgboost as xgb


In [12]:
#Instantiate the XGBClassifier
xg_cl = xgb.XGBClassifier(objective='binary:logistic',n_estimators=10,seed=123)

#fit the classifier to the training set
xg_cl.fit(X_train,y_train)

#Predict the labels
preds=xg_cl.predict(X_test)

#compute the accuracy
accuracy = float(np.sum(preds==y_test))/np.shape(y_test)[0]
print("accuracy: %f" % (accuracy))

accuracy: 0.627520
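The notes above list AUC as the evaluation metric, while this cell reports accuracy at the default 0.5 threshold. AUC can be read as the probability that a randomly chosen made shot receives a higher predicted probability than a randomly chosen miss. A stdlib sketch of that pairwise definition (ties count half), checked on a four-shot toy example; the helper name is hypothetical:

```python
def pairwise_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative), ties counting 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic, so for the roughly 21,700-shot test set scikit-learn's `roc_auc_score(y_test, xg_cl.predict_proba(X_test)[:, 1])` computes the same quantity far more efficiently.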


In [13]:
#Create the DMatrix from X and y: shot_dmatrix
shot_dmatrix = xgb.DMatrix(data=X_train, label=y_train)

#Create the parameter dictionary: params
params = {"objective": "reg:logistic", "max_depth": 3}

In [23]:
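from pymer4.models import Lmer

#the mixed model needs the outcome and the grouping column, which the feature frame above dropped
mixed_df = feature_engineering(shot_chart)
mixed_df['PLAYER_ID'] = shot_chart['PLAYER_ID']

#logistic mixed model: fixed effect for ANGLE, random intercept for each player
model = Lmer("SHOT_MADE_FLAG ~ ANGLE + (1|PLAYER_ID)",
             data=mixed_df, family='binomial')

print(model.fit())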

PackageNotInstalledError: The R package "lme4" is not installed.
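Until lme4 is available to pymer4, the intuition behind a random intercept per player can be illustrated without R: it pulls each player's estimate toward the league average, more strongly for players with few attempts. The sketch below shows that partial pooling with a simple beta-binomial (empirical Bayes) shrinkage of raw field goal percentage. It is a stand-in for intuition, not what `Lmer` fits; the prior strength `k` and the two players are hypothetical.

```python
def shrunk_fg_pct(makes, attempts, league_rate, k=100):
    """Posterior mean under a Beta prior centred on league_rate, worth k pseudo-attempts."""
    return (makes + k * league_rate) / (attempts + k)

league_rate = 103260 / 217220  # overall make rate in the 2022-23 shot chart, about 0.475
# hypothetical players: (makes, attempts)
players = {'high volume': (600, 1100), 'few attempts': (8, 10)}
for name, (m, n) in players.items():
    print(f"{name}: raw {m / n:.3f} -> shrunk {shrunk_fg_pct(m, n, league_rate):.3f}")
```

The player with 1,100 attempts barely moves, while the 8-for-10 player is pulled most of the way back toward the league rate, which is the same qualitative behaviour a random intercept produces.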

In [1]:
team_stats = team.Team(headers=HEADERS,
                       endpoint='teamgamelogs',
                       team_id='1610612755',
                       season='2022-23',
                       measure_type='Base',
                       league_id='00')

team_stats = pd.DataFrame(team_stats.data['TeamGameLogs'])

NameError: name 'team' is not defined

In [None]:
games = team_stats['GAME_ID']

In [76]:

#advanced (player-level) box score for every game in the team's game log
box_frames = []
for game_id in games:
    box = boxscore.BoxScore(headers=HEADERS,
                            endpoint='boxscoreadvancedv2',
                            game_id=game_id)
    box_frames.append(pd.DataFrame(box.data['PlayerStats']))
    time.sleep(1) #space out requests to stats.nba.com

#concatenate once at the end instead of growing a DataFrame inside the loop
all_team_stats = pd.concat(box_frames, axis=0).reset_index(drop=True)

KeyboardInterrupt: 

In [None]:
team_stats = team.Team(headers=HEADERS,
                       endpoint='teamgamelogs',
                       team_id='1610612755',
                       season='2022-23',
                       measure_type='Base',
                       league_id='00')

team_stats = pd.DataFrame(team_stats.data['TeamGameLogs'])

team_games = team_stats['GAME_ID']
box_frames = []

#collect each game's advanced box score, then combine them once
for game_id in team_games:
    box = boxscore.BoxScore(headers=HEADERS,
                            endpoint='boxscoreadvancedv2',
                            game_id=game_id)
    box_frames.append(pd.DataFrame(box.data['PlayerStats']))

all_team_stats = pd.concat(box_frames, axis=0).reset_index(drop=True)

In [11]:
team_stats = team.Team(headers=HEADERS,
                       endpoint='teamgamelogs',
                       team_id='1610612755',
                       season='2022-23',
                       measure_type='Base',
                       league_id='00')

team_stats = pd.DataFrame(team_stats.data['TeamGameLogs'])

team_games = team_stats['GAME_ID']
all_team_stats = pd.DataFrame({})