# Analysis of Foul Trouble Player Effectiveness

By: Erin Neaton

## Introduction
A common, yet difficult decision in basketball is whether to sit a player when they are in foul trouble. When players are in foul trouble, coaches have to balance missing out on a player's effectiveness and difference making on the court with risking the player getting more fouls and ultimately fouling out. In this project, I aim to compare the effectiveness of players while in and not in foul trouble. This analysis includes a new angle: how these elements are impacted by the player's distance from opposing players. I examine whether there are significant changes in a player's distance to other players on the court when they are in foul trouble versus when they are not. This would be an indication that the player is playing less aggressively, so this method will measure effectiveness that isn't captured by traditional statistics. This information would assist coaches in making informed decisions under pressure of when to sit a player and when to leave them in the game, based on their expected effectiveness.

## Purpose & Motivation


In [1]:
# Import packages.
import pandas as pd
import numpy as np
import json
import math
import warnings
import os

In [2]:
# pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
# pd.set_option('display.width', 1000)

In [3]:
warnings.filterwarnings('ignore')

In [4]:
def load_data(data_file):
    '''Takes in a file name. Returns a 
    dataframe of the unpacked file.'''
    
    # Open the tracking data. 
    with open(data_file, encoding="utf8") as f:
        lines = f.read().splitlines()
    # Create the dataframe.
    df = pd.DataFrame(lines)
    df.columns = ['json_data']
    unpacked_tracking_df = pd.io.json.json_normalize(df.json_data.apply(json.loads))

    return unpacked_tracking_df
        

In [5]:
def get_jersey_ids(col):
    '''Take in a column of dictionaries. Returns a dictionary of
    player ids with their jersey numbers.'''
    jersey_dict = {}
    for x in col:
        jersey_dict[x['playerId']] = x['jersey']
    return jersey_dict

In [6]:
def unpack_loc_data(df, loc_col, team):
    '''Takes in a dataframe, the column with the location data, and 
    whether the team was home or away. Unpacks the tracking data 
    into 3 columns per player in each row with each column being 
    the x, y, or z position. Returns a list of the player Ids.'''
    
    for player in loc_col:
        df[team + '_' + player['playerId'] + '_loc'] = (player['xyz'][0], player['xyz'][1])
        
    return None

In [7]:
def get_player_coords(event_tracking_df):
    '''Takes in the dataframe. Returns the x, y coordinates for the 
    home players and the away players.'''
    
    home_coords = {}
    print(event_tracking_df['home_ids'])
    for player in event_tracking_df['home_ids']: 
        home_coords[player] = (event_tracking_df['home_' + player + '_x'][idx],
                               event_tracking_df['home_' + player + '_y'][idx])

    opposing_coords = {}
    for player in event_tracking_df['home_ids']:
        opposing_coords[player] = (event_tracking_df['away_' + player + '_x'][idx], 
                                   event_tracking_df['away_' + player + '_y'][idx])
    
    return home_coords, opposing_coords

In [8]:
def clean_game(event_tracking_df):
    '''Takes in the combined tracking and events dataframe for one game.
    Returns a dataframe with the columns unpacked, the cumulative fouls per
    player per game, and the distance between each player and the next 
    closest player at each data capture. Also returns the player lists from
    the unpack_loc_data call.'''
    
    # Get a list of players who committed at least one foul.
    players_fouled = event_tracking_df[event_tracking_df['eventType'] == 'FOUL']['playerId'].unique()
    players_fouled = [x for x in players_fouled if x != None]
    
    # For each player who committed a foul, create a column to get their cumulative fouls in the game.
    for player in players_fouled:
        event_tracking_df['num_fouls_' + player] = np.where((event_tracking_df['eventType'] == 'FOUL') & \
                                                            (event_tracking_df['playerId'] == player),
                                                            1,
                                                            0)
        event_tracking_df['num_fouls_' + player] = event_tracking_df['num_fouls_' + player].cumsum()
    
        # Mark whether the player was in foul trouble.
        event_tracking_df['foul_trouble' + player] = np.where((event_tracking_df['num_fouls_' + player] >= 4) \
                                                             | ((event_tracking_df['num_fouls_' + player] >= 3) \
                                                               & (event_tracking_df['period_x'] <= 2)),
                                                             1,
                                                             0)
        
    # Get a list of players who had at least one rebound.
    players_rebounded = event_tracking_df[event_tracking_df['eventType'] == 'RB']['playerId'].unique()
    
    # For each player who had a rebound, create a column to get their cumulative rebounds in the game.
    for player in players_fouled:
        event_tracking_df['num_rebounds_' + player] = np.where((event_tracking_df['eventType'] == 'RB') & \
                                                            (event_tracking_df['playerId'] == player),
                                                            1,
                                                            0)
        event_tracking_df['num_rebounds_' + player] = event_tracking_df['num_rebounds_' + player].cumsum()
        
        
    # Get a list of players who had at least one shot.
    players_shot = event_tracking_df[event_tracking_df['eventType'] == 'SHOT']['playerId'].unique()
    
    # For each player who took a shot, create a column to get their total shots taken throughout the game.
    for player in players_fouled:
        event_tracking_df['num_shots_' + player] = np.where((event_tracking_df['eventType'] == 'SHOT') & \
                                                            (event_tracking_df['playerId'] == player),
                                                            1,
                                                            0)
        event_tracking_df['num_shots_' + player] = event_tracking_df['num_shots_' + player].cumsum()
        
        
    
    # Unpack the tracking data columns with unpack_loc_data.
    
    list_dicts = []
    for row in event_tracking_df['homePlayers_x']:
        row_dict = {}
        for test in row:
            row_dict[test['playerId']] = (test['xyz'][0], test['xyz'][1])
        list_dicts.append(row_dict)
        
    event_tracking_df['home_dict_loc'] = list_dicts
    
    list_dicts = []
    for row in event_tracking_df['awayPlayers_x']:
        row_dict = {}
        for test in row:
            row_dict[test['playerId']] = (test['xyz'][0], test['xyz'][1])
        list_dicts.append(row_dict)
        
    event_tracking_df['away_dict_loc'] = list_dicts
            
    
    return event_tracking_df

In [9]:
def get_min_dict(df):
    '''Takes in the dataframe. Uses the dictionaries of locations for the
    home and away players to calculate the distance between each home
    team player and the closest away team player. Returns that dictionary.'''
    dist_dict = {}
    min_dist = 9999999
    
    col1 = df['home_dict_loc']
    col2 = df['away_dict_loc']

    for home_player in col1.items():
        for away_player in col2.items():
            curr_val = math.dist([home_player[1][0], home_player[1][1]], 
                                 [away_player[1][0], away_player[1][1]])

            if curr_val < min_dist:
                min_dist = curr_val
        dist_dict[home_player[0]] = min_dist

    return dist_dict

In [32]:
def get_corr_categories(correl):
    '''Takes in the correlation and returns a categorical
    label for the value.'''
    if abs(correl) <= .1:
        return 'No change'
    elif correl > .1:
        return 'More Aggressive'
    else:
        return 'Less Aggressive'

In [11]:
# Load the tracking data into a dataframe.
tracking_data = pd.DataFrame()
directory = os.fsencode('./')
for file in os.listdir(directory)
    filename = os.fsdecode(file)
    if filename.endswith("_tracking.jsonl"): 
        tracking_data = pd.concat([tracking_data, load_data(filename)])
        
        print(tracking_data.shape)

(92880, 9)
(191040, 9)
(279840, 9)
(375240, 9)


In [12]:
# tracking_data.to_csv('tracking_data.csv')

In [13]:
# tracking_data = pd.read_csv('tracking_data.csv')

In [14]:
# Load the play by play data into a dataframe.
events_data = pd.DataFrame()
directory = os.fsencode('./')
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith("_events.jsonl"): 
        events_data = pd.concat([events_data, load_data(filename)])
        
        print(events_data.shape)

(3620, 10)
(7267, 10)
(11021, 10)
(14690, 10)
(18300, 10)
(21856, 10)
(25455, 10)
(29120, 10)
(32850, 10)
(36507, 10)
(40110, 10)
(43692, 10)
(47279, 10)
(50934, 10)
(54550, 10)
(58273, 10)
(61903, 10)
(65555, 10)


In [16]:
# Combine the tracking and events dataframes into one dataframe.
event_tracking_df = pd.merge(tracking_data, events_data, 
                             on=['shotClock', 'gameClock', 'wallClock'], 
                             how='inner')

In [17]:
df = clean_game(event_tracking_df)

In [18]:
df['min_dist'] = df.apply(get_min_dict, axis=1)

In [19]:
test =df[[x for x in df.columns if ('num_fouls_' in x)] + ['min_dist']].reset_index()

In [20]:
corr_df = test.merge(test.min_dist.dropna().apply(pd.Series).reset_index(), on='index', how='outer')

In [21]:
num_fouls_cols = [x for x in corr_df.columns if ('num_fouls_' in x)]
matching_cols = [x[10:] for x in num_fouls_cols]
final_list = list(zip(num_fouls_cols, matching_cols))

In [33]:
corr_dicts = {}
for pair in final_list:
    try:
        cor_ = corr_df[[pair[0], pair[1]]].corr()
        corr_dicts[pair[1]] = cor_[pair[0]][1]
                    
    except:
        #didnt foul

        pass
coach_table = pd.DataFrame.from_dict(corr_dicts, 
                       orient='index', 
                       columns=['Correlation']).reset_index().rename(columns={'index': 'PlayerId'})
print(coach_table)
coach_table['Correlation'] = coach_table['Correlation'].apply(get_corr_categories)
coach_table.rename(columns={'Correlation': 'Change in Aggresiveness'}, inplace=True)

                                PlayerId  Correlation
0   ff4187ba-89ef-11e6-9d60-a45e60e298d3    -0.113314
1   986b713a-b20b-4eb0-919e-c859d0508af7    -0.032114
2   a6904c1c-0dc5-41c8-a618-b3a276131726    -0.067883
3   33d595d2-295a-4291-a01b-c947b399ffc6    -0.124205
4   ff427c2e-89ef-11e6-8d2a-a45e60e298d3    -0.097359
5   ff41a38c-89ef-11e6-b14d-a45e60e298d3    -0.044342
6   ff4170c2-89ef-11e6-83e1-a45e60e298d3    -0.007962
7   79b83f96-5723-4219-b521-57de6111ee97    -0.089605
8   a53f4c58-d55e-4571-a980-808bdd874241     0.020494
9   297a0fda-0a03-4003-bf39-aa92b7a730ff          NaN
10  8f30d857-98df-476f-a924-f6720f29d3be     0.124765
11  ff4179a1-89ef-11e6-909f-a45e60e298d3          NaN
12  561048bb-c412-4d8b-a0bc-27d250a1a431     0.058192
13  ff41a31e-89ef-11e6-8405-a45e60e298d3          NaN


In [34]:
overall_dict = {}
for dict_ in df['homePlayers_x'].apply(get_jersey_ids).tolist():
    for key,val in dict_.items():
        overall_dict[key] = val

In [35]:
for dict_ in df['awayPlayers_x'].apply(get_jersey_ids).tolist():
    for key,val in dict_.items():
        overall_dict[key] = val

In [36]:
num_rbs_cols = [x for x in corr_df.columns if ('num_rebounds_' in x)]
matching_rbs_cols = [x[10:] for x in num_rbs_cols]
final_rbs_list = list(zip(matching_rbs_cols, matching_cols))

In [37]:
df.head()

Unnamed: 0,frameIdx,homePlayers_x,awayPlayers_x,ball,period_x,gameClock,gameClockStopped,shotClock,wallClock,gameId,eventType,period_y,homePlayers_y,awayPlayers_y,playerId,pbpId,num_fouls_ff4187ba-89ef-11e6-9d60-a45e60e298d3,foul_troubleff4187ba-89ef-11e6-9d60-a45e60e298d3,num_fouls_986b713a-b20b-4eb0-919e-c859d0508af7,foul_trouble986b713a-b20b-4eb0-919e-c859d0508af7,num_fouls_a6904c1c-0dc5-41c8-a618-b3a276131726,foul_troublea6904c1c-0dc5-41c8-a618-b3a276131726,num_fouls_33d595d2-295a-4291-a01b-c947b399ffc6,foul_trouble33d595d2-295a-4291-a01b-c947b399ffc6,num_fouls_ff427c2e-89ef-11e6-8d2a-a45e60e298d3,foul_troubleff427c2e-89ef-11e6-8d2a-a45e60e298d3,num_fouls_ff41a38c-89ef-11e6-b14d-a45e60e298d3,foul_troubleff41a38c-89ef-11e6-b14d-a45e60e298d3,num_fouls_ff4170c2-89ef-11e6-83e1-a45e60e298d3,foul_troubleff4170c2-89ef-11e6-83e1-a45e60e298d3,num_fouls_79b83f96-5723-4219-b521-57de6111ee97,foul_trouble79b83f96-5723-4219-b521-57de6111ee97,num_fouls_a53f4c58-d55e-4571-a980-808bdd874241,foul_troublea53f4c58-d55e-4571-a980-808bdd874241,num_fouls_297a0fda-0a03-4003-bf39-aa92b7a730ff,foul_trouble297a0fda-0a03-4003-bf39-aa92b7a730ff,num_fouls_8f30d857-98df-476f-a924-f6720f29d3be,foul_trouble8f30d857-98df-476f-a924-f6720f29d3be,num_fouls_ff4179a1-89ef-11e6-909f-a45e60e298d3,foul_troubleff4179a1-89ef-11e6-909f-a45e60e298d3,num_fouls_561048bb-c412-4d8b-a0bc-27d250a1a431,foul_trouble561048bb-c412-4d8b-a0bc-27d250a1a431,num_fouls_ff41a31e-89ef-11e6-8405-a45e60e298d3,foul_troubleff41a31e-89ef-11e6-8405-a45e60e298d3,num_rebounds_ff4187ba-89ef-11e6-9d60-a45e60e298d3,num_rebounds_986b713a-b20b-4eb0-919e-c859d0508af7,num_rebounds_a6904c1c-0dc5-41c8-a618-b3a276131726,num_rebounds_33d595d2-295a-4291-a01b-c947b399ffc6,num_rebounds_ff427c2e-89ef-11e6-8d2a-a45e60e298d3,num_rebounds_ff41a38c-89ef-11e6-b14d-a45e60e298d3,num_rebounds_ff4170c2-89ef-11e6-83e1-a45e60e298d3,num_rebounds_79b83f96-5723-4219-b521-57de6111ee97,num_rebounds_a53f4c58-d55e-4571-a980-808bdd874241,num_rebounds_297a0fda-0a03-4003-bf39-aa92b7a730ff,num_rebounds_8f30d857-98df-476f-a924-f6720f29d3be,num_rebounds_ff4179a1-89ef-11e6-909f-a45e60e298d3,num_rebounds_561048bb-c412-4d8b-a0bc-27d250a1a431,num_rebounds_ff41a31e-89ef-11e6-8405-a45e60e298d3,num_shots_ff4187ba-89ef-11e6-9d60-a45e60e298d3,num_shots_986b713a-b20b-4eb0-919e-c859d0508af7,num_shots_a6904c1c-0dc5-41c8-a618-b3a276131726,num_shots_33d595d2-295a-4291-a01b-c947b399ffc6,num_shots_ff427c2e-89ef-11e6-8d2a-a45e60e298d3,num_shots_ff41a38c-89ef-11e6-b14d-a45e60e298d3,num_shots_ff4170c2-89ef-11e6-83e1-a45e60e298d3,num_shots_79b83f96-5723-4219-b521-57de6111ee97,num_shots_a53f4c58-d55e-4571-a980-808bdd874241,num_shots_297a0fda-0a03-4003-bf39-aa92b7a730ff,num_shots_8f30d857-98df-476f-a924-f6720f29d3be,num_shots_ff4179a1-89ef-11e6-909f-a45e60e298d3,num_shots_561048bb-c412-4d8b-a0bc-27d250a1a431,num_shots_ff41a31e-89ef-11e6-8405-a45e60e298d3,home_dict_loc,away_dict_loc,min_dist
0,101352,"[{'xyz': [-3.35, 5.85, 0], 'jersey': '11', 'pl...","[{'xyz': [-2.4, -9.92, 0], 'jersey': '0', 'pla...","[22.2, -6.13, 1.81]",1,719.16,False,24.0,1654474102546,135d26c9-57f2-4e2e-a72c-e91e32d29957,TOUCH,1,"[ff4179a1-89ef-11e6-909f-a45e60e298d3, ff41a31...","[ff411907-89ef-11e6-9c68-a45e60e298d3, 986b713...",ff411907-89ef-11e6-9c68-a45e60e298d3,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,{'ff41a31e-89ef-11e6-8405-a45e60e298d3': (-3.3...,{'986b713a-b20b-4eb0-919e-c859d0508af7': (-2.4...,{'ff41a31e-89ef-11e6-8405-a45e60e298d3': 3.151...
1,101447,"[{'xyz': [-21.2, 6.57, 0], 'jersey': '11', 'pl...","[{'xyz': [-38.01, -19.39, 0], 'jersey': '0', '...","[-4.9, -14.15, 0.31]",1,715.36,False,20.8,1654474106346,135d26c9-57f2-4e2e-a72c-e91e32d29957,DRIBBLE,1,"[ff4179a1-89ef-11e6-909f-a45e60e298d3, ff41a31...","[ff411907-89ef-11e6-9c68-a45e60e298d3, 986b713...",ff411907-89ef-11e6-9c68-a45e60e298d3,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,{'ff41a31e-89ef-11e6-8405-a45e60e298d3': (-21....,{'986b713a-b20b-4eb0-919e-c859d0508af7': (-38....,{'ff41a31e-89ef-11e6-8405-a45e60e298d3': 9.221...
2,102239,"[{'xyz': [-35.08, 9.42, 0], 'jersey': '11', 'p...","[{'xyz': [-40.58, -22.92, 0], 'jersey': '0', '...","[-23.44, -16.4, 0.98]",1,704.2,False,10.4,1654474138026,135d26c9-57f2-4e2e-a72c-e91e32d29957,DRIBBLE,1,"[ff4179a1-89ef-11e6-909f-a45e60e298d3, ff41a31...","[ff411907-89ef-11e6-9c68-a45e60e298d3, 986b713...",ff411907-89ef-11e6-9c68-a45e60e298d3,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,{'ff41a31e-89ef-11e6-8405-a45e60e298d3': (-35....,{'986b713a-b20b-4eb0-919e-c859d0508af7': (-40....,{'ff41a31e-89ef-11e6-8405-a45e60e298d3': 13.49...
3,102262,"[{'xyz': [-33.39, 10.09, 0], 'jersey': '11', '...","[{'xyz': [-40.42, -22.92, 0], 'jersey': '0', '...","[-21.15, -16.98, 0.48]",1,703.24,False,9.4,1654474138946,135d26c9-57f2-4e2e-a72c-e91e32d29957,DRIBBLE,1,"[ff4179a1-89ef-11e6-909f-a45e60e298d3, ff41a31...","[ff411907-89ef-11e6-9c68-a45e60e298d3, 986b713...",ff411907-89ef-11e6-9c68-a45e60e298d3,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,{'ff41a31e-89ef-11e6-8405-a45e60e298d3': (-33....,{'986b713a-b20b-4eb0-919e-c859d0508af7': (-40....,{'ff41a31e-89ef-11e6-8405-a45e60e298d3': 12.24...
4,102287,"[{'xyz': [-31.88, 11.43, 0], 'jersey': '11', '...","[{'xyz': [-40.29, -22.92, 0], 'jersey': '0', '...","[-32.4, -15.73, 0.56]",1,702.24,False,8.4,1654474139946,135d26c9-57f2-4e2e-a72c-e91e32d29957,DRIBBLE,1,"[ff4179a1-89ef-11e6-909f-a45e60e298d3, ff41a31...","[ff411907-89ef-11e6-9c68-a45e60e298d3, 986b713...",ff411907-89ef-11e6-9c68-a45e60e298d3,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,{'ff41a31e-89ef-11e6-8405-a45e60e298d3': (-31....,{'986b713a-b20b-4eb0-919e-c859d0508af7': (-40....,{'ff41a31e-89ef-11e6-8405-a45e60e298d3': 10.37...


In [38]:
for player in overall_dict.keys():
    shots_col = 'num_shots_'+player
    rb_col = 'num_rebounds_'+player
    fouls = 'num_fouls_'+player
    foul_trouble = 'foul_trouble'+player
    
    if ('num_shots_'+player in df.columns) & ('num_fouls_'+player in df.columns):
        shots_df = df[[shots_col, foul_trouble]]
        print(shots_df[foul_trouble].value_counts())
#         shots_df = shots_df[shots_df[foul_trouble] == 1]
        print(shots_df)
    if ('num_rebounds_'+player in df.columns) & ('num_fouls_'+player in df.columns):
        rb_df = df[[rb_col, foul_trouble]]

0    5077
Name: foul_troubleff41a31e-89ef-11e6-8405-a45e60e298d3, dtype: int64
      num_shots_ff41a31e-89ef-11e6-8405-a45e60e298d3  \
0                                                  0   
1                                                  0   
2                                                  0   
3                                                  0   
4                                                  0   
...                                              ...   
5072                                              13   
5073                                              13   
5074                                              13   
5075                                              13   
5076                                              13   

      foul_troubleff41a31e-89ef-11e6-8405-a45e60e298d3  
0                                                    0  
1                                                    0  
2                                                    0  
3                   

In [40]:
coach_table['Player Jersey #'] = coach_table['PlayerId'].apply(lambda x: overall_dict[x])

In [41]:
coach_table[['Player Jersey #', 'Change in Aggresiveness']]

Unnamed: 0,Player Jersey #,Change in Aggresiveness
0,22,Less Aggressive
1,0,No change
2,7,No change
3,3,Less Aggressive
4,42,No change
5,23,No change
6,5,No change
7,44,No change
8,0,No change
9,9,Less Aggressive


## Proposed Solution / Recommendations
The solution proposed here is to use the code here to print out a foul number and expected effectiveness chart for coaches to quickly reference during the game. It would not require any in game analyses which they would not have time for, but instead would be a physical sheet the coaches would have prior to the game. It would be simple and clear to enable coaches to quickly understand whether the expected effectiveness of each player changes as players commit more fouls. They could factor this information into their decision of how long they should sit the player in foul trouble based on their expected effectiveness when in foul trouble. 

## Difficulties and Challenges
The main challenge with this project was not having a lot of cases of players in foul trouble. The dataset was imbalanced because the majority of the time, players are not in foul trouble. Additionally, my laptop could not handle the size of the dataset, so I was unable to run the entire dataset through my code without my kernel crashing. The output in my html notebook is just a subset of 4 games. 

Additionally, behaviors during playoff games and regular season games may be different. Players on teams that do not make the playoffs may also have different trends in performance in and out of foul trouble. If this method is used, a new model should be developed to make better predictions on this type of data. 

A second major problem I faced was that I had planned the project with my team and with the expectation that I would have 3 people working on this with me. Unfortunately my team stopped responding to messages after our initial meeting, so I had to complete the project alone. Although this caused a time crunch, I did get to complete the project from start to finish which gave me the opportunity to learn about working with sports data at every stage of the process.

## Next Steps
I would like to make the chart separate for both the coach's own team and the opposing team. Having the chart for the away team could provide insight as to which players on the opposing team are not as effective under foul trouble and, therefore, the coaches could call plays that take advantage of this information. 

I would also like to actually build out a predictive model for likelihood to foul again instead of just creating a simple chart. I would also like to a composite effectiveness score that includes expected shots and expected rebounds in addition to the change in distance with more fouls. 

The next step I would like to take with this project is to make the effectiveness measure more robust. I think the tracking data on the ball could be used to see how close players are getting to the ball/player with the ball on defense, how quickly players move, and whether they jump on defense when in foul trouble. There is a similar measure in football data (bite distance under expected) that could be used to analyze pump fakes in basketball and whether players are more/less likely to get tricked by pump fakes when in foul trouble. These ideas may not pan out, but I think they have potential and I would love to explore them. Additionally, if I could get feedback from coaches, I think I could improve the foul number vs. expected effectiveness chart to design it in a way they find most useful. Finally, with more data, I could improve the model.