# Basketball Analytics

Use the attached data to calculate plus/minus for each player in each game. Plus/Minus is defined as the team’s score differential while the player is on the court. 
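
As a minimal sketch of the definition (not part of the submitted pipeline), the hypothetical `plus_minus` helper below walks a list of scoring events. Each event records the scoring team, the points and whether the player was on the court. Points count only while the player is on: added when the player's team scores, subtracted when the opponent scores.

```python
# Hypothetical helper illustrating the definition; not used by the notebook.
def plus_minus(events, player_team):
    """Plus/minus for one player.

    events: list of (scoring_team, points, player_on_court) tuples, in game order.
    player_team: the player's team id.
    """
    total = 0
    for scoring_team, points, on_court in events:
        if not on_court:
            continue  # scoring while the player sits does not count
        total += points if scoring_team == player_team else -points
    return total


# Player on team "A": on the court for the first three scores, benched for the last one
events = [("A", 2, True), ("B", 3, True), ("A", 3, True), ("B", 2, False)]
print(plus_minus(events, "A"))  # 2 - 3 + 3 = 2
```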

Note: When a player is substituted before or during a set of free throws but was on the court at the time of the foul that caused the free throw, he is considered to be on the court for the free throws for the purposes of plus/minus. A player substituted in before a free throw but after a foul is not considered to be on the court until after the conclusion of the free throws.
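
One way to approximate this rule is to break ties within a PC_Time by Event_Msg_Type before WC_Time, so that free throws (3) are processed before substitutions (8). This assumes the foul, any substitutions and the resulting free throws all share the same PC_Time, since the game clock is stopped during free throws. The toy events below are made up. This tie-break also moves a substitution behind any other score logged at the same clock time, which is why it is only an approximation.

```python
import pandas as pd

# Made-up events at the same game-clock time (PC_Time 5000): the substitution was
# logged (WC_Time) between the two free throws of one trip to the line.
events = pd.DataFrame({
    "Period":         [2, 2, 2],
    "PC_Time":        [5000, 5000, 5000],
    "WC_Time":        [100, 105, 110],
    "Event_Num":      [40, 41, 42],
    "Event_Msg_Type": [3, 8, 3],   # free throw, substitution, free throw
})

# Within one PC_Time, order by Event_Msg_Type before WC_Time so that made shots (1)
# and free throws (3) are processed before substitutions (8).
ordered = events.sort_values(
    ["Period", "PC_Time", "Event_Msg_Type", "WC_Time", "Event_Num"],
    ascending=[True, False, True, True, True],
)
print(ordered["Event_Num"].tolist())  # [40, 42, 41]
```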

This folder includes three data sets: Play by Play, Event Code Description and Game Lineups. Please submit your answer in a .csv file and save your code, spreadsheets and all other work in a zip file.

Please submit your answer in a spreadsheet named “Your_Team_Name_Q1_BBALL.csv”, substituting the name of your team for “Your_Team_Name”. The final product should have 3 columns: Column 1: Game_ID, Column 2: Player_ID, Column 3: Player_Plus/Minus.
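
Here is a small sketch of producing the required headers. It assumes results are held under the column names this notebook uses internally (`Game_id`, `Person_id`, `Player_PlusMinus`). The data are made up, and a `StringIO` buffer stands in for the output file.

```python
import io

import pandas as pd

# Made-up per-player results using the notebook's internal column names
results = pd.DataFrame({
    "Game_id": ["g1", "g1"],
    "Person_id": ["p1", "p2"],
    "Player_PlusMinus": [7, -3],
})

# Rename to the three required headers, in the required order
submission = results.rename(columns={
    "Game_id": "Game_ID",
    "Person_id": "Player_ID",
    "Player_PlusMinus": "Player_Plus/Minus",
})[["Game_ID", "Player_ID", "Player_Plus/Minus"]]

buf = io.StringIO()  # stands in for "Your_Team_Name_Q1_BBALL.csv"
submission.to_csv(buf, index=False)
print(buf.getvalue().splitlines()[0])  # Game_ID,Player_ID,Player_Plus/Minus
```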

### Import Packages

```python
# import packages
import pandas as pd
import numpy as np
```

### Load 'NBA Hackathon - Event Codes.txt'

This dataset provides lookup values for the event message types and action types found in the play-by-play dataset, translating each code into an English-language description of the event.

```python
# read in Event_Codes
Event_Codes = pd.read_table('NBA Hackathon - Event Codes.txt', sep='\t', header=0)
Event_Codes.head()
```

|   | Event_Msg_Type | Action_Type | Event_Msg_Type_Description | Action_Type_Description |
|---|---|---|---|---|
| 0 | 1 | 0 | Made Shot | No Shot |
| 1 | 1 | 1 | Made Shot | Jump Shot |
| 2 | 1 | 3 | Made Shot | Hook Shot |
| 3 | 1 | 4 | Made Shot | Tip Shot |
| 4 | 1 | 5 | Made Shot | Layup Shot |
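
The lookup is joined to the play-by-play data on the pair (`Event_Msg_Type`, `Action_Type`). If a pair were listed more than once in the lookup file, a plain merge would silently duplicate the matching events, so a duplicated made shot would be counted twice. The made-up rows below show the effect and one guard: `drop_duplicates` on the keys plus `validate="many_to_one"`. Whether the actual file contains repeated pairs is not checked here.

```python
import pandas as pd

# Made-up lookup table in which one (Event_Msg_Type, Action_Type) pair is listed twice
codes = pd.DataFrame({
    "Event_Msg_Type": [1, 1, 1],
    "Action_Type": [1, 1, 5],
    "Action_Type_Description": ["Jump Shot", "Jump Shot", "Layup Shot"],
})
events = pd.DataFrame({"Event_Msg_Type": [1, 1], "Action_Type": [1, 5]})
keys = ["Event_Msg_Type", "Action_Type"]

# A plain inner merge duplicates the jump-shot event: 3 rows instead of 2
naive = events.merge(codes, on=keys)
print(len(naive))  # 3

# De-duplicate the keys first and let pandas verify each event matches at most one code;
# how="left" also keeps events whose code pair is missing from the lookup
dedup = codes.drop_duplicates(keys)
merged = events.merge(dedup, on=keys, how="left", validate="many_to_one")
print(len(merged))  # 2
```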


### Load 'NBA Hackathon - Game Lineup Data Sample (50 Games).txt'

This dataset provides start-of-period player availability.
* Game_id – A unique game code for each game
* Period (Quarter) – The associated period of the lineup (overtime periods are indicated by values greater than 4)
* Person_id – A unique identifier for each player
* Team_id – A unique identifier for each team
* Status – A variable indicating whether a player is active (A) or inactive (I)
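
Here is a sketch of turning these rows into an on-court set per game, period and team. Keeping only `status == "A"` rows is an assumption about what the status flag means for on-court players, and the data are made up.

```python
import pandas as pd

# Made-up lineup rows for one game: two teams, one period, one inactive player
lineup = pd.DataFrame({
    "Game_id":   ["g1"] * 5,
    "Period":    [1, 1, 1, 1, 1],
    "Person_id": ["a1", "a2", "b1", "b2", "b3"],
    "Team_id":   ["A", "A", "B", "B", "B"],
    "status":    ["A", "A", "A", "A", "I"],
})

# Assumption: only active ("A") rows count as on the court at the start of the period
on_court = (lineup[lineup["status"] == "A"]
            .groupby(["Game_id", "Period", "Team_id"])["Person_id"]
            .apply(set)
            .to_dict())
print(sorted(on_court[("g1", 1, "B")]))  # ['b1', 'b2']
```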

In [3]:
# read in Game_Lineup
Game_Lineup = pd.read_table('NBA Hackathon - Game Lineup Data Sample (50 Games).txt', sep='\t', header = 0)
Game_Lineup.head()

|   | Game_id | Period | Person_id | Team_id | status |
|---|---|---|---|---|---|
| 0 | 021fd159b55773fba8157e2090fe0fe2 | 1 | 881f83d2dee3f18c7d1751659406144e | 012059d397c0b7e5a30a5bb89c0b075e | A |
| 1 | 021fd159b55773fba8157e2090fe0fe2 | 1 | 27ea17a8685c4919f157e83fe9cb2d9e | cff694c8186a4bd377de400e4f60fe47 | A |
| 2 | 021fd159b55773fba8157e2090fe0fe2 | 1 | 57bbd7e30bc694aeee9ee40c583e6811 | cff694c8186a4bd377de400e4f60fe47 | A |
| 3 | 021fd159b55773fba8157e2090fe0fe2 | 1 | cec898a1d355dbfbad8c760615fde1af | 012059d397c0b7e5a30a5bb89c0b075e | A |
| 4 | 021fd159b55773fba8157e2090fe0fe2 | 1 | 33963fe856a1523ff46438ba07d1d99f | cff694c8186a4bd377de400e4f60fe47 | A |


### Load 'NBA Hackathon - Play by Play Data Sample (50 Games).txt'

This dataset provides play-by-play information at the event level for each game.

To properly sort the events in a game, use the following sequence of sorted columns: Period (ascending), PC_Time (descending), WC_Time (ascending), Event_Number (ascending)
* Event_Number – An ordered counter for each event in a game (the column is named Event_Num in the file). Note that this number may not be perfectly sequential, so please use the sorting methodology outlined above
* Event_Msg_Type, Action_Type – Coded descriptions of what happened during the event
* WC_Time – The in-arena time of the event in Unix format. It is coded in tenths of a second.
* PC_Time – The time on the game clock in tenths of a second (e.g. 7200 corresponds to 720 seconds/12 minutes remaining in the quarter)
* Option1 – On a shot attempt, this column will tell you the point value of the shot
    * On free throw attempts, if the value in this column is 1, it means it was a made free throw, otherwise, it was missed.
* Person1, Person2 – The person_ids of the players who are directly associated with the event (e.g., if the event is an assisted made basket, Person1 is the shot maker and Person2 is the player who assisted)
    * In the case of a substitution, the Event_Msg_Type will be 8, Person1 will be the ID for the player leaving the game, and Person2 will be the ID for the player entering the game.
* Team_id – The team_id associated with Person1

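```python
# read in Play_by_Play
Play_by_Play = pd.read_table('NBA Hackathon - Play by Play Data Sample (50 Games).txt', sep='\t', header=0)

# +/- doesn't depend on missed shots (2), rebounds (4), turnovers (5), fouls (6),
#     violations (7), timeouts (9), jump balls (10), or ejections (11)
# +/- depends on made shots (1), free throws (3), and substitutions (8)
Play_by_Play = Play_by_Play[Play_by_Play.Event_Msg_Type.isin([1, 3, 8])]

# add descriptors to Play_by_Play; a left merge on de-duplicated code pairs keeps every
# event exactly once, even if its pair is missing from (or repeated in) Event_Codes
codes = Event_Codes.drop_duplicates(['Event_Msg_Type', 'Action_Type'])
Play_by_Play = pd.merge(Play_by_Play, codes, on=['Event_Msg_Type', 'Action_Type'], how='left')

# sort Play_by_Play: the prescribed order (Period, PC_Time desc, WC_Time, Event_Num) with
# Event_Msg_Type inserted after PC_Time, so that at the same game-clock time made shots (1)
# and free throws (3) are processed before substitutions (8); this approximates the
# free-throw substitution rule
Play_by_Play = Play_by_Play.sort_values(
    ['Game_id', 'Period', 'PC_Time', 'Event_Msg_Type', 'WC_Time', 'Event_Num'],
    ascending=[True, True, False, True, True, True]).reset_index(drop=True)

Play_by_Play.head()
```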

|   | Game_id | Event_Num | Event_Msg_Type | Period | WC_Time | PC_Time | Action_Type | Option1 | Option2 | Option3 | Team_id | Person1 | Person2 | Team_id_type | Event_Msg_Type_Description | Action_Type_Description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 021fd159b55773fba8157e2090fe0fe2 | 9 | 1 | 1 | 547220 | 6740 | 49 | 2 | 0 | 0 | 012059d397c0b7e5a30a5bb89c0b075e | a99f44bbff39e352191a870e17f04537 | 881f83d2dee3f18c7d1751659406144e | 2 | Made Shot | Driving Dunk Shot |
| 1 | 021fd159b55773fba8157e2090fe0fe2 | 10 | 1 | 1 | 547395 | 6580 | 1 | 2 | 0 | 0 | cff694c8186a4bd377de400e4f60fe47 | 57bbd7e30bc694aeee9ee40c583e6811 | c00264c3114d23bac482e9de50fb7d28 | 3 | Made Shot | Jump Shot |
| 2 | 021fd159b55773fba8157e2090fe0fe2 | 17 | 1 | 1 | 547782 | 6190 | 97 | 2 | 0 | 0 | 012059d397c0b7e5a30a5bb89c0b075e | 89706b99ddd00dc05d37ef5cafc04276 | 6bcf6c1f8c373d25fca1579bc4464a91 | 2 | Made Shot | Tip Layup Shot |
| 3 | 021fd159b55773fba8157e2090fe0fe2 | 19 | 1 | 1 | 547964 | 6000 | 1 | 2 | 0 | 0 | cff694c8186a4bd377de400e4f60fe47 | 57bbd7e30bc694aeee9ee40c583e6811 | 307beab25b1021a548b4a47550bc4b25 | 3 | Made Shot | Jump Shot |
| 4 | 021fd159b55773fba8157e2090fe0fe2 | 22 | 1 | 1 | 548345 | 5620 | 1 | 2 | 0 | 0 | cff694c8186a4bd377de400e4f60fe47 | 57bbd7e30bc694aeee9ee40c583e6811 | 6bcf6c1f8c373d25fca1579bc4464a91 | 3 | Made Shot | Jump Shot |


### Create Helper Functions

```python
def create_game_plusminus(game_id):
    """Given a game_id, return a dataframe with Game_id, Team_id, Person_id and a zeroed
    Player_PlusMinus column. (Not called below; create_sub_answer is used instead.)"""

    # find the team_ids that play in game_id
    teams = Game_Lineup[Game_Lineup.Game_id == game_id].Team_id.drop_duplicates()

    # collect one frame per team, then concatenate (DataFrame.append was removed in pandas 2.0)
    frames = []
    for team_id in teams:
        # players listed in the period lineups plus anyone appearing in a substitution
        team_players = pd.unique(Game_Lineup[(Game_Lineup.Game_id == game_id) & (Game_Lineup.Team_id == team_id)]['Person_id'].values.ravel('K'))
        team_players = np.append(team_players, pd.unique(Play_by_Play[(Play_by_Play.Game_id == game_id) & (Play_by_Play.Team_id == team_id) &
                                                                  (Play_by_Play.Event_Msg_Type == 8)][['Person1', 'Person2']].values.ravel('K')))
        team_plusminus = pd.DataFrame({'Person_id': np.unique(team_players)})
        team_plusminus['Game_id'] = game_id
        team_plusminus['Team_id'] = team_id
        # use the same column name as everywhere else (previously 'Player_Plus/Minus')
        team_plusminus['Player_PlusMinus'] = 0
        frames.append(team_plusminus[['Game_id', 'Team_id', 'Person_id', 'Player_PlusMinus']])
    return pd.concat(frames, ignore_index=True)
```

```python
def create_sub_answer(game_id):
    '''Create the per-game answer frame: one row per player listed in Game_Lineup for
    game_id, with Player_PlusMinus initialized to 0 and rows sorted by Team_id.'''

    frames = []
    teams = Game_Lineup[Game_Lineup.Game_id == game_id].Team_id.drop_duplicates()
    for team_id in teams:
        temp = pd.DataFrame({'Person_id': pd.unique(Game_Lineup[(Game_Lineup.Game_id == game_id) & (Game_Lineup.Team_id == team_id)]['Person_id'].values.ravel('K'))})
        temp['Game_id'] = game_id
        temp['Team_id'] = team_id
        temp['Player_PlusMinus'] = 0
        frames.append(temp[['Game_id', 'Team_id', 'Person_id', 'Player_PlusMinus']])
    sub_answer = pd.concat(frames, ignore_index=True)

    # sort once, after both teams are added, so that row 0 belongs to the first team
    sub_answer = sub_answer.sort_values('Team_id', ascending=True).reset_index(drop=True)
    return sub_answer
```

```python
def create_new_lineup(sub_answer, period):
    '''Build the start-of-period lineup from Game_Lineup: one row per player in sub_answer,
    with mask 1 (first team, on court), -1 (second team, on court) or 0 (off court).'''

    # all rows of sub_answer belong to one game; read game_id from it instead of a global
    game_id = sub_answer.Game_id.iloc[0]
    lineup_arr = sub_answer[['Person_id']].copy()

    # mask players listed for this period, 0/1
    player_ids = Game_Lineup[(Game_Lineup.Game_id == game_id) & (Game_Lineup.Period == period)].Person_id
    lineup_arr['mask'] = np.where(lineup_arr.Person_id.isin(player_ids), 1, 0)

    # flip the sign for players on the second team, giving 1/-1
    lineup_arr['mask'] = lineup_arr['mask'] * np.where(sub_answer.Team_id == sub_answer.Team_id[0], 1, -1)

    # return the lineup_arr containing the player IDs and on-court status
    return lineup_arr
```

```python
def change_player_lineup(sub_answer, current_player_lineup, game_id, person1, person2):
    """Changes the current player lineup to reflect a substitution. Person1 is leaving, Person2 is entering."""

    # work on a copy so the caller's lineup is not modified in place
    new_player_lineup = current_player_lineup.copy()

    # if person2 has not appeared yet in this game, add them to sub_answer and the lineup
    if not (new_player_lineup.Person_id == person2).any():
        # person2 joins person1's team
        person1_team = sub_answer.loc[(sub_answer.Person_id == person1) & (sub_answer.Game_id == game_id), 'Team_id'].iloc[0]
        new_row = pd.DataFrame([[game_id, person1_team, person2, 0]], columns=['Game_id', 'Team_id', 'Person_id', 'Player_PlusMinus'])
        sub_answer = pd.concat([sub_answer, new_row], ignore_index=True)
        sub_answer = sub_answer.sort_values(['Game_id', 'Team_id', 'Person_id']).reset_index(drop=True)
        new_player_lineup = pd.concat([new_player_lineup, pd.DataFrame([[person2, 0]], columns=['Person_id', 'mask'])], ignore_index=True)

    # find indices of person1 and person2 in the lineup
    person1_idx = new_player_lineup.index[new_player_lineup.Person_id == person1][0]
    person2_idx = new_player_lineup.index[new_player_lineup.Person_id == person2][0]

    # person2 inherits person1's mask (+1/-1); person1 is now off the court
    new_player_lineup.at[person2_idx, 'mask'] = new_player_lineup.at[person1_idx, 'mask']
    new_player_lineup.at[person1_idx, 'mask'] = 0

    # return the updated answer frame and the new lineup
    return sub_answer, new_player_lineup
```
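
An alternative representation, sketched here under the assumption that each team's on-court players are kept as a set rather than a +1/-1/0 mask, turns a substitution into a set operation. Because `Team_id` on a substitution row is Person1's team, it can be passed straight through. The hypothetical `apply_substitution` raises if the departing player is not on the court, which surfaces lineup inconsistencies instead of quietly giving the entering player a mask of 0.

```python
# Hypothetical alternative to the mask-based lineup; not used by the notebook.
def apply_substitution(on_court, team_id, person1, person2):
    """Return a new on-court mapping after person1 (leaving) is replaced by person2 (entering).

    on_court maps team_id -> frozenset of the player ids currently on the court.
    """
    players = on_court[team_id]
    if person1 not in players:
        raise ValueError(f"{person1} is not on the court for {team_id}")
    updated = dict(on_court)  # leave the caller's mapping untouched
    updated[team_id] = (players - {person1}) | {person2}
    return updated


lineups = {"A": frozenset({"a1", "a2", "a3", "a4", "a5"}),
           "B": frozenset({"b1", "b2", "b3", "b4", "b5"})}
lineups = apply_substitution(lineups, "A", "a3", "a6")
print(sorted(lineups["A"]))  # ['a1', 'a2', 'a4', 'a5', 'a6']
```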

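```python
def add_plusminus(sub_answer, team1_scored, team2_scored, player_lineup):
    '''Add one chunk's score differential to every player on the court, in place.
    mask is +1 for the first team, -1 for the second team and 0 for players off the court.'''
    for _, row in player_lineup.iterrows():
        idx = sub_answer.index[sub_answer.Person_id == row['Person_id']][0]
        # write through .at: assigning to sub_answer.iloc[idx].Player_PlusMinus updates a
        # temporary row object and is not guaranteed to change sub_answer itself
        sub_answer.at[idx, 'Player_PlusMinus'] += (team1_scored - team2_scored) * row['mask']
```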
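
Because `mask` already encodes team (+1/-1) and on-court status (0 for off), the per-row loop in `add_plusminus` is equivalent to a single vectorized update. Here is a sketch with NumPy, assuming the players are held in a fixed order in an integer array; the hypothetical `add_chunk` helper and the numbers are made up.

```python
import numpy as np

# Made-up game with players indexed 0..5; mask is +1 / -1 for first / second team
# players on the court and 0 for players on the bench
plus_minus = np.zeros(6, dtype=int)
mask = np.array([1, 1, 0, -1, -1, 0])


def add_chunk(plus_minus, mask, team1_scored, team2_scored):
    """Add one chunk's differential to every player at once (in place): +diff for
    first-team players on court, -diff for second-team players on court, 0 otherwise."""
    plus_minus += mask * (team1_scored - team2_scored)
    return plus_minus


add_chunk(plus_minus, mask, team1_scored=7, team2_scored=4)
print(plus_minus.tolist())  # [3, 3, 0, -3, -3, 0]
```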

### Analyze

Create the answer dataframe, which we will submit at the end.

```python
answer = pd.DataFrame(columns=['Game_id', 'Team_id', 'Person_id', 'Player_PlusMinus'])
```

Calculate +/- for each game

```python
# will calculate +/- for each period played in each game
unique_games = Play_by_Play['Game_id'].drop_duplicates().reset_index(drop=True)

# for each game
for game_id in unique_games:

    # find number of periods
    num_periods = Game_Lineup[Game_Lineup.Game_id == game_id]['Period'].nunique()

    # create the frame holding each player's total +/- for this game; starts at all zeros
    sub_answer = create_sub_answer(game_id)

    # find the relevant data for this game
    game_data = Play_by_Play[Play_by_Play.Game_id == game_id]

    # for each period
    for period in range(1, num_periods + 1):

        # find the relevant data for this period
        game_period_data = game_data[game_data.Period == period].reset_index(drop=True)

        # points on each event: Option1 for made shots, 1 for a made free throw
        # (Option1 == 1), 0 for missed free throws and substitutions
        points = np.where(game_period_data.Event_Msg_Type == 1, game_period_data.Option1,
                          np.where((game_period_data.Event_Msg_Type == 3) & (game_period_data.Option1 == 1), 1, 0))
        game_period_data = game_period_data.assign(Points=points)

        # create lineup that represents players on the court at start of each period
        player_lineup = create_new_lineup(sub_answer, period)

        # find all substitution indices to chunk up the data; de-duplicate so that a
        # substitution at row 0 (or an empty period) does not create a repeated boundary
        boundaries = [0] + game_period_data.index[game_period_data['Event_Msg_Type'] == 8].tolist() + [len(game_period_data)]
        substitutions = sorted(set(boundaries))

        # for each chunk
        for sub in range(0, len(substitutions) - 1):

            # find the data from that chunk
            chunk_data = game_period_data[substitutions[sub]:substitutions[sub + 1]]

            # change player_lineup if the chunk starts with a substitution
            substitution_row = game_period_data.loc[substitutions[sub]]
            if substitution_row.Event_Msg_Type == 8:
                sub_answer, player_lineup = change_player_lineup(sub_answer, player_lineup, game_id, substitution_row.Person1, substitution_row.Person2)

            # calculate the plus/minus of that chunk and add to sub_answer
            team1_scored = chunk_data[chunk_data.Team_id == sub_answer.Team_id[0]].Points.sum()
            team2_scored = chunk_data[chunk_data.Team_id != sub_answer.Team_id[0]].Points.sum()
            add_plusminus(sub_answer, team1_scored, team2_scored, player_lineup)

    answer = pd.concat([answer, sub_answer], ignore_index=True)
```
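
The loop above splits each period at substitution rows, applies the substitution that opens a chunk and then credits that chunk's points. If the first event kept in a period is itself a substitution, index 0 appears twice among the raw boundaries. The first chunk is then empty, and the same substitution row opens both the first and second chunks, so it is applied twice: the second application also sets the entering player's mask to 0. De-duplicating the boundaries avoids this, and it also yields no chunks at all for a period with no kept events. Here is a toy illustration with made-up event types:

```python
import pandas as pd

# Made-up period in which the very first kept event is a substitution
period = pd.DataFrame({"Event_Msg_Type": [8, 1, 3, 8, 1]})

sub_rows = period.index[period["Event_Msg_Type"] == 8].tolist()
raw = [0] + sub_rows + [len(period)]
print(raw)  # [0, 0, 3, 5]: the first chunk is empty and row 0 would open two chunks

boundaries = sorted(set(raw))
chunks = [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]
print(chunks)  # [(0, 3), (3, 5)]
```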


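```python
# keep the three required columns, named as the submission format specifies
answer = answer[['Game_id', 'Person_id', 'Player_PlusMinus']].rename(
    columns={'Game_id': 'Game_ID', 'Person_id': 'Player_ID', 'Player_PlusMinus': 'Player_Plus/Minus'})
```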

### Save to .csv

```python
# Save to .csv
answer.to_csv('airballs_Q1_BBALL.csv', index=False)
```
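
Here is a quick consistency check one could run before the `Team_id` column is dropped. With five players per team on the court at all times, each team's player plus/minus values must sum to five times that team's final margin. The hypothetical `check_game` helper below tests this for one game. The final margins would have to be computed separately (e.g. from the play-by-play points), and the sample numbers are made up.

```python
import pandas as pd


# Hypothetical validation helper; not part of the submitted pipeline.
def check_game(pm, final_margin_by_team, players_on_court=5):
    """Sanity-check one game's plus/minus.

    pm: DataFrame with Team_id and Player_PlusMinus for one game.
    final_margin_by_team: dict team_id -> (team points - opponent points).
    Returns True if each team's summed plus/minus equals players_on_court * margin.
    """
    sums = pm.groupby("Team_id")["Player_PlusMinus"].sum()
    return all(sums[t] == players_on_court * m for t, m in final_margin_by_team.items())


# Made-up game that team A won by 3
pm = pd.DataFrame({
    "Team_id": ["A"] * 6 + ["B"] * 6,
    "Player_PlusMinus": [5, 4, 3, 2, 1, 0, -5, -4, -3, -2, -1, 0],
})
print(check_game(pm, {"A": 3, "B": -3}))  # True
```

A game that fails the check points to a lineup or substitution inconsistency in that game rather than to a scoring error alone.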