# Timeouts in the NBA

- The data set contains play-by-play data for every regular-season game from 2001 to 2018 and was downloaded from [here](https://eightthirtyfour.com/).

- The logos are from [here](http://www.stickpng.com/cat/sports/basketball/nba-teams).

- The team colors are from [here](https://teamcolorcodes.com/nba-team-color-codes/).

- My goal here is to understand the causes and effects of coach timeouts by analyzing the scoring in the minutes adjacent to each timeout.

- TODO: describe the structure of the file and explain each column (I know...).

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import datetime
from scipy.optimize import curve_fit
import plotly
# Use your own Plotly API key here.
plotly.tools.set_credentials_file(username='Zaca', api_key='YOUR_API_KEY')
import plotly.plotly as py  # plotly < 4.0; in plotly >= 4.0 this moved to chart_studio.plotly
import plotly.graph_objs as go

# Offline plotting while prototyping: the limit of 100 uploads/saves per 24 h can be annoying.
# Comment out these two lines when you want to upload to the Plotly cloud.
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)


# Select season to load
season = '2018-2019'
filename = 'C:/Users/Zaca/Dropbox/data_science/events_{season}_pbp.csv'.format(season=season)

data = pd.read_csv(filename)
print('List of columns loaded: ', list(data))

# A non-lockout 82-game regular season has 1230 games in total:
# each of the 30 NBA teams plays 82 games, and every game involves two teams,
# so the total is (30*82)/2 = 1230.
unique_game_ids = np.unique(data['GAME_ID'])
print('Total games found: ', len(unique_game_ids))


Columns (71,89,90) have mixed types. Specify dtype option on import or set low_memory=False.



List of columns loaded:  ['Unnamed: 0', 'EVENTMSGACTIONTYPE', 'EVENTMSGTYPE', 'EVENTNUM', 'GAME_ID', 'HOMEDESCRIPTION', 'NEUTRALDESCRIPTION', 'PCTIMESTRING', 'PERIOD', 'PERSON1TYPE', 'PERSON2TYPE', 'PERSON3TYPE', 'PLAYER1_ID', 'PLAYER1_NAME', 'PLAYER1_TEAM_ABBREVIATION', 'PLAYER1_TEAM_CITY', 'PLAYER1_TEAM_ID', 'PLAYER1_TEAM_NICKNAME', 'PLAYER2_ID', 'PLAYER2_NAME', 'PLAYER2_TEAM_ABBREVIATION', 'PLAYER2_TEAM_CITY', 'PLAYER2_TEAM_ID', 'PLAYER2_TEAM_NICKNAME', 'PLAYER3_ID', 'PLAYER3_NAME', 'PLAYER3_TEAM_ABBREVIATION', 'PLAYER3_TEAM_CITY', 'PLAYER3_TEAM_ID', 'PLAYER3_TEAM_NICKNAME', 'SCORE', 'SCOREMARGIN', 'VISITORDESCRIPTION', 'WCTIMESTRING', 'HOME_TEAM', 'AWAY_TEAM', 'HOME_SCORE', 'AWAY_SCORE', 'TIME', 'TEAM', 'TYPE', 'SUB_TYPE', 'ASSIST_PLAYER_ID', 'ASSIST_COUNT', 'BLOCK_PLAYER_ID', 'BLOCK_COUNT', 'REBOUND_PLAYER_ID', 'REBOUND_TEAM', 'REBOUND_OFFENSIVE_COUNT', 'REBOUND_DEFENSIVE_COUNT', 'JUMP_BALL_HOME_PLAYER_ID', 'JUMP_BALL_AWAY_PLAYER_ID', 'JUMP_BALL_RETRIEVED_PLAYER_ID', 'SUB_ENTERED_
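The game-count arithmetic in the comments above can be turned into a quick sanity check. This is a minimal sketch; `expected_games` and `check_game_count` are hypothetical names, and the toy `GAME_ID` values are made up for illustration.

```python
import pandas as pd

N_TEAMS = 30
GAMES_PER_TEAM = 82  # non-lockout regular season

# Each game involves two teams, so N_TEAMS * GAMES_PER_TEAM counts every game twice.
expected_games = N_TEAMS * GAMES_PER_TEAM // 2
print(expected_games)  # 1230


def check_game_count(df: pd.DataFrame, expected: int = expected_games) -> int:
    """Return how many games are missing from df (negative means extra games)."""
    return expected - df['GAME_ID'].nunique()


# Toy frame with two distinct (made-up) game IDs.
toy = pd.DataFrame({'GAME_ID': [1, 1, 2]})
print(check_game_count(toy))
```

On the full season file, a result of 0 would mean every regular-season game is present.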

In [9]:
print(data['TYPE'])

0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
5        NaN
6        NaN
7        NaN
8        NaN
9        NaN
10       NaN
11       NaN
12       NaN
13       NaN
14       NaN
15       NaN
16       NaN
17       NaN
18       NaN
19       NaN
20       NaN
21       NaN
22       NaN
23       NaN
24       NaN
25       NaN
26       NaN
27       NaN
28       NaN
29       NaN
          ..
582438   NaN
582439   NaN
582440   NaN
582441   NaN
582442   NaN
582443   NaN
582444   NaN
582445   NaN
582446   NaN
582447   NaN
582448   NaN
582449   NaN
582450   NaN
582451   NaN
582452   NaN
582453   NaN
582454   NaN
582455   NaN
582456   NaN
582457   NaN
582458   NaN
582459   NaN
582460   NaN
582461   NaN
582462   NaN
582463   NaN
582464   NaN
582465   NaN
582466   NaN
582467   NaN
Name: TYPE, Length: 582468, dtype: float64
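Printing the whole column only shows the head and tail, so it cannot confirm that `TYPE` is empty everywhere. A more direct check is sketched below; `empty_columns` is a hypothetical helper, and the toy frame stands in for `data`.

```python
import numpy as np
import pandas as pd


def empty_columns(df: pd.DataFrame) -> list:
    """Return the names of columns whose values are all missing."""
    return [col for col in df.columns if df[col].isna().all()]


# Toy frame: TYPE is entirely NaN, SCORE has at least one value.
toy = pd.DataFrame({'TYPE': [np.nan, np.nan], 'SCORE': ['0 - 2', np.nan]})
print(empty_columns(toy))

# Fully empty columns can also be dropped in one call.
trimmed = toy.dropna(axis=1, how='all')
print(list(trimmed.columns))
```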


# Data cleaning
- The time-left-in-quarter column (PCTIMESTRING) is in m:ss format and only relative to its own quarter. To simplify later analysis, we convert it to integer seconds and create an additional column with the time left in the game (GLOBALTIMELEFT).

In [25]:
# Change min:secs to seconds
def convert_time(times):
    mins, secs = map(int, times.split(':'))
    return 60*mins + secs

data['PCTIMESTRING'] = data['PCTIMESTRING'].apply(convert_time)

# Overtime periods are clipped to PERIOD 4, so in global time they overlap the last 5 minutes of the 4th quarter.
data['GLOBALTIMELEFT'] = data['PCTIMESTRING'] + (720*(4-data['PERIOD'].clip(upper=4)))
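The conversion and the global-time formula can be checked on a few rows. This sketch uses made-up toy rows and a vectorized split as an alternative to `apply(convert_time)`; the `SECONDS` column name is hypothetical.

```python
import pandas as pd

# Made-up rows standing in for the play-by-play data.
toy = pd.DataFrame({'PCTIMESTRING': ['12:00', '5:30', '0:07', '5:00'],
                    'PERIOD': [1, 2, 4, 5]})

# Vectorized alternative to convert_time: split 'm:ss' into two integer columns.
parts = toy['PCTIMESTRING'].str.split(':', expand=True).astype(int)
toy['SECONDS'] = 60 * parts[0] + parts[1]

# Same formula as above: each remaining regulation quarter adds 720 s,
# and PERIOD > 4 (overtime) is clipped to 4, so it adds nothing.
toy['GLOBALTIMELEFT'] = toy['SECONDS'] + 720 * (4 - toy['PERIOD'].clip(upper=4))
print(toy)
```

The first row (start of the game) gives 2880 s, i.e. 48 minutes. The last row, 5:00 left in overtime, gives 300 s, exactly the same value as 5:00 left in the 4th quarter, which is the overlap noted in the comment.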

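- In the raw data, the SCORE column is only populated on scoring plays, and it is stored as a string in 'visitor - home' format (e.g. '34 - 36'). To clean it up, we fill in the missing values within each game, split the string into two new integer columns, 'HOMESCORE' and 'VISITORSCORE', and recompute 'SCOREMARGIN' (home minus visitor).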

In [26]:
# This fills in SCORE for events when the ball didn't go in: forward-fill within each game,
# then use '0 - 0' for the events before the first basket...
data['SCORE'] = data.groupby('GAME_ID')['SCORE'].ffill().fillna('0 - 0')

# ...and extracts home and visitor score.
temp_score = data['SCORE'].str.split(" - ", expand=True).astype('int32')
data['HOMESCORE'] = temp_score[1]
data['VISITORSCORE'] = temp_score[0]

# Finally, compute the score margin (home - away)
data['SCOREMARGIN'] = data['HOMESCORE'] - data['VISITORSCORE']

# Replace NaNs in the DESCRIPTION columns with empty strings
data['HOMEDESCRIPTION'] = data.HOMEDESCRIPTION.fillna('')
data['VISITORDESCRIPTION'] = data.VISITORDESCRIPTION.fillna('')
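The per-game fill and the score split can be checked on a toy frame. This sketch uses made-up rows and `groupby(...).ffill()`, which is equivalent to forward-filling each game separately; it assumes the 'visitor - home' order used in the cell above.

```python
import numpy as np
import pandas as pd

# Made-up play-by-play for two games; SCORE is only set on scoring plays.
toy = pd.DataFrame({
    'GAME_ID': [1, 1, 1, 1, 2, 2, 2],
    'SCORE': [np.nan, '0 - 2', np.nan, '3 - 2', np.nan, np.nan, '2 - 0'],
})

# Forward-fill within each game only, so game 2 does not inherit game 1's last score.
toy['SCORE'] = toy.groupby('GAME_ID')['SCORE'].ffill().fillna('0 - 0')

# Split 'visitor - home' into integers.
scores = toy['SCORE'].str.split(' - ', expand=True).astype('int32')
toy['VISITORSCORE'] = scores[0]
toy['HOMESCORE'] = scores[1]
toy['SCOREMARGIN'] = toy['HOMESCORE'] - toy['VISITORSCORE']
print(toy)
```

The first event of game 2 gets '0 - 0' rather than game 1's final '3 - 2', which is why the fill has to be grouped by game.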

# Structuring our data
- I'm not an expert on this, but my intuition tells me there are a few things I'm gonna need before I keep going.
- I need some form of processed tables that include helpful information.
- For now I'm only gonna make a table where each row is a game, and another where each row is a team.

In [27]:
# Piggybacking on the game count, I'm gonna create an auxiliary info DataFrame that contains only one row per game.
per_game_data = pd.DataFrame(data=unique_game_ids, columns=['GAME_ID'])

# Get the list of teams. Timeout events seem like the way to go here, since the description names the team that called the timeout.
# Keep the descriptions that contain "Timeout", take the first word, lower-case it, and keep unique values
# (so multi-word nicknames are truncated, e.g. the Trail Blazers become 'trail').
team_list = data['HOMEDESCRIPTION'].loc[data['HOMEDESCRIPTION'].str.contains("Timeout", na=False)].str.split().str.get(0).str.lower().unique()

# While we're at it, let's also make a table with team stats.
team_data = pd.DataFrame(index=team_list)

# For now, let's just fill it in manually with each team's full name.

team_data.loc['hawks', 'FULL_NAME'] = 'Atlanta Hawks'
team_data.loc['celtics', 'FULL_NAME'] = 'Boston Celtics'
team_data.loc['mavericks', 'FULL_NAME'] = 'Dallas Mavericks'
team_data.loc['timberwolves', 'FULL_NAME'] = 'Minnesota Timberwolves'
team_data.loc['warriors', 'FULL_NAME'] = 'Golden State Warriors'
team_data.loc['heat', 'FULL_NAME'] = 'Miami Heat'
team_data.loc['grizzlies', 'FULL_NAME'] = 'Memphis Grizzlies'
team_data.loc['76ers', 'FULL_NAME'] = 'Philadelphia 76ers'
team_data.loc['hornets', 'FULL_NAME'] = 'Charlotte Hornets'
team_data.loc['suns', 'FULL_NAME'] = 'Phoenix Suns'
team_data.loc['clippers', 'FULL_NAME'] = 'Los Angeles Clippers'
team_data.loc['thunder', 'FULL_NAME'] = 'Oklahoma City Thunder'
team_data.loc['knicks', 'FULL_NAME'] = 'New York Knicks'
team_data.loc['nuggets', 'FULL_NAME'] = 'Denver Nuggets'
team_data.loc['nets', 'FULL_NAME'] = 'Brooklyn Nets'
team_data.loc['pistons', 'FULL_NAME'] = 'Detroit Pistons'
team_data.loc['raptors', 'FULL_NAME'] = 'Toronto Raptors'
team_data.loc['cavaliers', 'FULL_NAME'] = 'Cleveland Cavaliers'
team_data.loc['wizards', 'FULL_NAME'] = 'Washington Wizards'
team_data.loc['jazz', 'FULL_NAME'] = 'Utah Jazz'
team_data.loc['kings', 'FULL_NAME'] = 'Sacramento Kings'
team_data.loc['bulls', 'FULL_NAME'] = 'Chicago Bulls'
team_data.loc['bucks', 'FULL_NAME'] = 'Milwaukee Bucks'
team_data.loc['trail', 'FULL_NAME'] = 'Portland Trail Blazers'
team_data.loc['lakers', 'FULL_NAME'] = 'Los Angeles Lakers'
team_data.loc['pelicans', 'FULL_NAME'] = 'New Orleans Pelicans'
team_data.loc['spurs', 'FULL_NAME'] = 'San Antonio Spurs'
team_data.loc['rockets', 'FULL_NAME'] = 'Houston Rockets'
team_data.loc['pacers', 'FULL_NAME'] = 'Indiana Pacers'
team_data.loc['magic', 'FULL_NAME'] = 'Orlando Magic'

# Because we're gonna be working with 30 categories now, we need to customize the colors.
# I think team colors are gonna work best; luckily, there are websites that give you the hex codes for each team.
# I just copy-pasted them manually (it's just 30 though, probably not worth automating).

team_data.loc['hawks', 'COLOR'] = '#E03A3E'
team_data.loc['celtics', 'COLOR'] = '#007A33'
team_data.loc['mavericks', 'COLOR'] = '#00538C'
team_data.loc['timberwolves', 'COLOR'] = '#0C2340'
team_data.loc['warriors', 'COLOR'] = '#FDB927'
team_data.loc['heat', 'COLOR'] = '#98002E'
team_data.loc['grizzlies', 'COLOR'] = '#5D76A9'
team_data.loc['76ers', 'COLOR'] = '#006BB6'
team_data.loc['hornets', 'COLOR'] = '#00788C'
team_data.loc['suns', 'COLOR'] = '#1D1160'
team_data.loc['clippers', 'COLOR'] = '#C8102E'
team_data.loc['thunder', 'COLOR'] = '#007AC1'
team_data.loc['knicks', 'COLOR'] = '#006BB6'
team_data.loc['nuggets', 'COLOR'] = '#0E2240'
team_data.loc['nets', 'COLOR'] = '#000000'
team_data.loc['pistons', 'COLOR'] = '#C8102E'
team_data.loc['raptors', 'COLOR'] = '#CE1141'
team_data.loc['cavaliers', 'COLOR'] = '#6F263D'
team_data.loc['wizards', 'COLOR'] = '#002B5C'
team_data.loc['jazz', 'COLOR'] = '#002B5C'
team_data.loc['kings', 'COLOR'] = '#5A2D81'
team_data.loc['bulls', 'COLOR'] = '#CE1141'
team_data.loc['bucks', 'COLOR'] = '#00471B'
team_data.loc['trail', 'COLOR'] = '#E03A3E'
team_data.loc['lakers', 'COLOR'] = '#552583'
team_data.loc['pelicans', 'COLOR'] = '#0C2340'
team_data.loc['spurs', 'COLOR'] = '#C4CED4'
team_data.loc['rockets', 'COLOR'] = '#CE1141'
team_data.loc['pacers', 'COLOR'] = '#FDBB30'
team_data.loc['magic', 'COLOR'] = '#0077C0'

# Links for the logos!
    
team_data.loc['hawks', 'LOGO'] = 'https://i.ibb.co/dthFtZ6/hawks.png'
team_data.loc['celtics', 'LOGO'] = 'https://i.ibb.co/gPWLh3N/celtics.png'
team_data.loc['mavericks', 'LOGO'] = 'https://i.ibb.co/BCwN3dt/mavericks.png'
team_data.loc['timberwolves', 'LOGO'] = 'https://i.ibb.co/LPB0N1j/timberwolves.png'
team_data.loc['warriors', 'LOGO'] = 'https://i.ibb.co/vkdCkks/warriors.png'
team_data.loc['heat', 'LOGO'] = 'https://i.ibb.co/rfcK6bZ/heat.png'
team_data.loc['grizzlies', 'LOGO'] = 'https://i.ibb.co/4fVM3mt/grizzlies.png'
team_data.loc['76ers', 'LOGO'] = 'https://i.ibb.co/stY6xWK/76ers.png'
team_data.loc['hornets', 'LOGO'] = 'https://i.ibb.co/CsCR1cq/hornets.png'
team_data.loc['suns', 'LOGO'] = 'https://i.ibb.co/n0MHQhs/suns.png'
team_data.loc['clippers', 'LOGO'] = 'https://i.ibb.co/cTG6NSp/clippers.png'
team_data.loc['thunder', 'LOGO'] = 'https://i.ibb.co/TgPjZMv/thunder.png'
team_data.loc['knicks', 'LOGO'] = 'https://i.ibb.co/HNVrj2B/knicks.png'
team_data.loc['nuggets', 'LOGO'] = 'https://i.ibb.co/vHYY9yz/nuggets.png'
team_data.loc['nets', 'LOGO'] = 'https://i.ibb.co/zJnKfJV/nets.png'
team_data.loc['pistons', 'LOGO'] = 'https://i.ibb.co/C5hFw8r/pistons.png'
team_data.loc['raptors', 'LOGO'] = 'https://i.ibb.co/ynSsVd0/raptors.png'
team_data.loc['cavaliers', 'LOGO'] = 'https://i.ibb.co/mXCDk7q/cavaliers.png'
team_data.loc['wizards', 'LOGO'] = 'https://i.ibb.co/WyjnsGW/wizards.png'
team_data.loc['jazz', 'LOGO'] = 'https://i.ibb.co/JvhV0d9/jazz.png'
team_data.loc['kings', 'LOGO'] = 'https://i.ibb.co/JcGsj0B/kings.png'
team_data.loc['bulls', 'LOGO'] = 'https://i.ibb.co/pvP5P7Q/bulls.png'
team_data.loc['bucks', 'LOGO'] = 'https://i.ibb.co/4gf3Xtn/bucks.png'
team_data.loc['trail', 'LOGO'] = 'https://i.ibb.co/dryFPBj/trail.png'
team_data.loc['lakers', 'LOGO'] = 'https://i.ibb.co/RhqvZ2V/lakers.png'
team_data.loc['pelicans', 'LOGO'] = 'https://i.ibb.co/y5CHptN/pelicans.png'
team_data.loc['spurs', 'LOGO'] = 'https://i.ibb.co/Z21VDKx/spurs.png'
team_data.loc['rockets', 'LOGO'] = 'https://i.ibb.co/TBVgxxL/rockets.png'
team_data.loc['pacers', 'LOGO'] = 'https://i.ibb.co/ZNxxgJp/pacers.png'
team_data.loc['magic', 'LOGO'] = 'https://i.ibb.co/2g8Z9Mw/magic.png'
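The team-extraction chain and the team table can be sketched more compactly. The description strings below are made up in the style of timeout events (not copied from the data set), and only two teams are filled in; the remaining rows would follow the same pattern. `team_info` and `missing` are hypothetical names.

```python
import pandas as pd

# Made-up HOMEDESCRIPTION-style strings.
descriptions = pd.Series([
    'Hawks Timeout: Regular',
    'Trail Blazers Timeout: Short',
    "MISS Young 25' 3PT Jump Shot",
    'Hawks Timeout: Regular',
])

# Same chain as above: keep timeout rows, take the first word, lower-case, deduplicate.
team_list = (descriptions[descriptions.str.contains('Timeout', na=False)]
             .str.split().str.get(0).str.lower().unique())
print(team_list)

# Compact alternative to one .loc assignment per cell: a single dict of
# (full name, color, logo URL) tuples and one constructor call.
team_info = {
    'hawks': ('Atlanta Hawks', '#E03A3E', 'https://i.ibb.co/dthFtZ6/hawks.png'),
    'trail': ('Portland Trail Blazers', '#E03A3E', 'https://i.ibb.co/dryFPBj/trail.png'),
}
team_data = pd.DataFrame.from_dict(team_info, orient='index',
                                   columns=['FULL_NAME', 'COLOR', 'LOGO'])

# Any nickname found in the data but missing from the dict shows up here.
missing = set(team_list) - set(team_data.index)
print(missing)
```

Keeping all three attributes per team on one line makes it harder to fill in a color for one team and forget its logo, and the `missing` check flags teams that were extracted but never described.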
