# NBA Tracking - Sprint Speed
This notebook is a proof of concept for working with frame-by-frame tracking data. In particular, it applies MLB Statcast's "sprint speed" metric ([see article](http://tangotiger.com/index.php/site/article/statcast-lab-introducing-sprint-speed)) to the players in an NBA game.

In [1]:
# read in data
import json
import scipy.spatial.distance as spdist
import pandas as pd
import numpy as np

nba_fn = '0021500492.json'

# use a context manager so the file handle is closed after loading
with open(nba_fn) as nba_fo:
    nba = json.load(nba_fo)

NBA tracking data comes as a JSON file, which maps naturally onto Python dictionaries and lists. Loading the file produces a dictionary with three keys: the game date, an ID for the game, and a list of "events". Each event is itself a dictionary that holds (as far as I can tell) the tracking data for a single possession. See [this blog post](http://savvastjortjoglou.com/nba-play-by-play-movements.html) for more info on the structure of the tracking data.
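To make the nesting concrete, here is a hand-built miniature of the structure, inferred from how the rest of this notebook indexes into it. Every value in it is made up, `toy_game` is a hypothetical name, and the ball's `team_id` of -1 is an assumption (the notebook's output only shows that the ball's `player_id` is -1).

```python
# Hypothetical miniature of the tracking JSON; all values are invented for illustration.
toy_game = {
    'gamedate': '2016-01-01',
    'gameid': '0000000000',
    'events': [
        {
            'home': {'players': [{'playerid': 1, 'firstname': 'A', 'lastname': 'Home'}]},
            'visitor': {'players': [{'playerid': 2, 'firstname': 'B', 'lastname': 'Away'}]},
            'moments': [
                # each moment: [quarter, unix_ts, game_clock_sec, shot_clock_sec, extra, locations]
                [1, 1451606400000, 720.0, 24.0, None,
                 # each location: [team_id, player_id, x, y, z]
                 [[-1, -1, 47.0, 25.0, 4.0],   # the ball (team_id of -1 is an assumption)
                  [10, 1, 40.0, 20.0, 0.0],
                  [20, 2, 50.0, 30.0, 0.0]]],
            ],
        },
    ],
}

first_moment = toy_game['events'][0]['moments'][0]
moment_attrs = first_moment[:5]   # metadata shared by every row in this frame
locations = first_moment[5]       # one row for the ball plus one per player
```

Here `moment_attrs` and `locations` correspond to the first five fields and the sixth field of a moment, which is how the parsing loop later in the notebook splits each frame.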

In [3]:
print (type(nba))

<type 'dict'>


In [4]:
print(nba.keys())

[u'gamedate', u'gameid', u'events']


While parsing the data and computing speeds, it became apparent that some events contain no frame data, so filter those out of the list of events.

In [5]:
game_events = [i for i in nba['events'] if len(i['moments']) != 0]
# columns for each frame of location data
mom_columns = ['team_id','player_id','x','y','z']
# metadata columns for each event
mom_attr = ['quarter','unix_ts','game_clock_sec','shot_clock_sec','extra']

The function below calculates, for each player, the maximum distance travelled over any 1-second (25-frame) window in an event. The window length can also be adjusted to tweak the definition of sprint speed (e.g. 3/5 or 4/5 of a second instead of 1 second).

In [8]:
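def sprint_speed(frame_df,frames_window=25):
    # NOTE: relies on the globals `e_i` (event number) and `player_df` (player names)
    # set by the event loop, and adds 'dist_moved' and 'speed' columns to frame_df in place.
    # frame_df is indexed by (event, moment, player_id), so level=2 groups by player.

    # calculate distance moved for each player in each frame
    prev_x = frame_df.groupby(level=2).x.shift(1)
    prev_y = frame_df.groupby(level=2).y.shift(1)

    # there are some frames where the clock is stopped and people jump around a bit within events
    # filter these out by making sure distance moved is calculated only if the clock is running.
    # clock_run is True when the game clock differs from the same player's next frame
    clock_run = (frame_df.game_clock_sec != frame_df.groupby(level=2).game_clock_sec.shift(-1))
    frame_df['dist_moved'] = np.sqrt(np.square(frame_df.x - prev_x) + np.square(frame_df.y - prev_y))
    frame_df.loc[~clock_run,'dist_moved'] = np.nan
    # frames arrive at 25 per second, so distance per frame * 25 = distance per second
    frame_df['speed'] = frame_df['dist_moved'] * 25

    # calculate most distance moved over the rolling window (1 second by default, as in MLB sprint speed)
    f = lambda x: x.rolling(frames_window).sum()
    # transform keeps the original (event, moment, player_id) index regardless of pandas version
    spr_speed = frame_df.groupby(level=2).dist_moved.transform(f)
    # scale the window total to a per-second rate so different window lengths are comparable
    max_fps = (pd.DataFrame(spr_speed.groupby(level=2).max() * (25.0 / frames_window))
                .assign(event = e_i)
                .join(player_df)
                .reset_index()
                .set_index(['event','player_id']))
    return max_fps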
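To see why the window length matters, here is a minimal single-player sketch of the same rolling-window idea. The movement pattern is invented, and `toy_sprint_speed` is a hypothetical helper written for this illustration, not part of the notebook's pipeline. It assumes 25 frames per second and positions in feet, as the rest of the notebook does.

```python
import numpy as np
import pandas as pd

FPS = 25  # frames per second, as assumed throughout the notebook

# Hypothetical player: stands still for 1 s, bursts at 20 ft/s along x for 0.6 s (15 frames),
# then stands still again. All numbers are made up for illustration.
per_frame_step = np.concatenate([np.zeros(25), np.full(15, 20.0 / FPS), np.zeros(35)])
x = pd.Series(np.cumsum(per_frame_step))
y = pd.Series(np.zeros(len(x)))

# distance moved between consecutive frames (NaN for the first frame)
dist_moved = np.sqrt((x - x.shift(1)) ** 2 + (y - y.shift(1)) ** 2)

def toy_sprint_speed(dist, frames_window=25):
    """Largest distance covered in any window, rescaled to feet per second."""
    return dist.rolling(frames_window).sum().max() * (float(FPS) / frames_window)

one_sec = toy_sprint_speed(dist_moved, 25)       # best full-second window
short_window = toy_sprint_speed(dist_moved, 15)  # best 3/5-second window
```

With a 1-second window the burst is diluted by standing time, so `one_sec` comes out well below the 20 ft/s the player actually reached, while the 15-frame window recovers it. Shorter windows reward brief bursts; longer windows reward sustained running.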

Loop through the events, parse the moment data, calculate each player's best sprint speed over a 1-second window, and concatenate the results into a dataframe (this takes a while and prints one line per event, 400ish in total).

In [9]:
event_ss = []  # one sprint-speed dataframe per event

for e_i, ev in enumerate(game_events):
    print(e_i)
    ev_players = ev['home']['players'] + ev['visitor']['players']
    player_df = pd.DataFrame(ev_players).set_index('playerid')
    moment_frames = []
    for m_i, ev_m in enumerate(ev['moments']):
        # first five fields are moment metadata; the sixth is the list of ball/player locations
        attr_vals = ev_m[:5]
        ev_player_m = ev_m[5]
        ev_mdf = pd.DataFrame(ev_player_m,columns=mom_columns)
        # repeat the metadata once per location row
        ev_m_attrs = pd.DataFrame([attr_vals] * len(ev_player_m),columns=mom_attr)
        ev_m_data = pd.concat([ev_mdf,ev_m_attrs],axis=1)
        ev_m_data['event'] = e_i
        ev_m_data['moment'] = m_i
        moment_frames.append(ev_m_data.set_index(['event','moment','player_id']))
    # concatenate once per event instead of growing a dataframe inside the loop
    ev_frames = pd.concat(moment_frames,axis=0).sort_index()

    # could also calc distance from the ball in each frame and mess with speed on/off ball
    # frame_df = (pd.merge(frame_df.reset_index()
    #                     , ball_locs
    #                     , left_on=['event','moment']
    #                     , right_index=True
    #                     , suffixes=['','_ball'])
    #               .set_index(['event','moment','player_id']))
    #
    # frame_df['ball_dist'] = np.sqrt(
    #                             np.square(frame_df.x - frame_df.x_ball) +
    #                             np.square(frame_df.y - frame_df.y_ball)
    #                         )

    event_ss.append(sprint_speed(ev_frames))

event_max = pd.concat(event_ss)


0
1
2
...
276
...
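An alternative to assembling each event's frames moment by moment is to flatten all moments into plain rows first and build the table once, which avoids creating a small dataframe per moment. This is a minimal sketch of that idea, not the code that produced the results in this notebook. `event_to_frames` and `toy_moments` are hypothetical names, and the sample values are made up.

```python
import pandas as pd

mom_columns = ['team_id', 'player_id', 'x', 'y', 'z']
mom_attr = ['quarter', 'unix_ts', 'game_clock_sec', 'shot_clock_sec', 'extra']

def event_to_frames(moments, event_id):
    """Flatten one event's moments into a frame indexed by (event, moment, player_id)."""
    rows = [
        [event_id, m_i] + list(loc) + list(moment[:5])
        for m_i, moment in enumerate(moments)
        for loc in moment[5]          # one row per ball/player location
    ]
    columns = ['event', 'moment'] + mom_columns + mom_attr
    return (pd.DataFrame(rows, columns=columns)
              .set_index(['event', 'moment', 'player_id'])
              .sort_index())

# Two invented frames with the ball (player_id -1) and a single player
toy_moments = [
    [1, 1000, 720.00, 24.00, None, [[-1, -1, 47.0, 25.0, 4.0], [10, 1, 40.0, 20.0, 0.0]]],
    [1, 1040, 719.96, 23.96, None, [[-1, -1, 47.5, 25.0, 4.2], [10, 1, 40.5, 20.0, 0.0]]],
]
toy_frames = event_to_frames(toy_moments, event_id=0)
```

The result has the same index levels and columns that `sprint_speed` expects, so under these assumptions it could stand in for the per-moment concatenation.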

Now `event_max` holds the max distance moved over a 1-second window (sprint speed) for each player in each event. We can summarize it by looking at each player's average and deciles of sprint speed. The values look pretty reasonable overall. I think this window is too long to say much about ball speed, though: over a full second, someone is probably holding the ball or dribbling around or in place much of the time.

In [18]:
# avg sprint speed by player (+ball) across all events (feet/second)
event_max.groupby(level=1).dist_moved.mean().sort_values()

player_id
 201949     12.529066
 201946     12.641452
 201942     12.736357
 202335     12.777226
 202687     12.868852
 200768     12.900366
 203148     13.141373
 202709     13.250354
 201960     13.267593
 203082     13.286106
 201587     13.659056
 202689     13.829119
 202685     13.834249
 203087     14.018271
 1626163    14.101544
 203798     14.232963
 2449       14.884475
 203469     15.108584
 101107     15.331326
-1          29.994701
Name: dist_moved, dtype: float64

In [19]:
# player sprint speeds sorted by 9th decile (feet/second)
(event_max.reset_index()
        .groupby(['player_id','firstname','lastname'])
        .dist_moved.quantile(np.arange(0.1,1,0.1))
        .unstack().sort_values(0.9))

| player_id | firstname | lastname | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 201949 | James | Johnson | 10.582506 | 11.319458 | 11.896244 | 12.373454 | 12.644536 | 12.644536 | 12.866792 | 13.675345 | 14.92502 |
| 201942 | DeMar | DeRozan | 7.377614 | 9.813789 | 11.059613 | 12.843186 | 13.755189 | 14.351686 | 15.163799 | 15.994548 | 16.644875 |
| 202335 | Patrick | Patterson | 9.087915 | 10.275899 | 11.43338 | 12.035108 | 13.087133 | 13.538975 | 14.154991 | 15.466262 | 17.071844 |
| 200768 | Kyle | Lowry | 7.807444 | 10.023057 | 11.201471 | 12.492558 | 12.994817 | 13.747406 | 14.651598 | 16.133451 | 17.179131 |
| 201946 | Tyler | Hansbrough | 5.531378 | 8.176309 | 10.83395 | 13.316458 | 13.911914 | 14.740447 | 15.716749 | 16.266886 | 17.319034 |
| 202687 | Bismack | Biyombo | 6.874043 | 8.476644 | 10.186788 | 13.086877 | 14.174443 | 14.544412 | 15.362892 | 16.008535 | 17.422042 |
| 202709 | Cory | Joseph | 8.697169 | 9.937864 | 12.149085 | 12.710476 | 13.61431 | 14.071064 | 15.119002 | 16.290007 | 17.613275 |
| 201587 | Nicolas | Batum | 8.711772 | 11.383378 | 12.417168 | 13.442992 | 14.122165 | 15.02868 | 15.605203 | 16.369333 | 17.875923 |
| 202685 | Jonas | Valanciunas | 7.929556 | 10.416066 | 13.044748 | 13.887565 | 14.505652 | 15.525537 | 16.077205 | 17.01606 | 18.223899 |
| 203148 | Brian | Roberts | 7.968156 | 9.963497 | 11.617567 | 12.404818 | 13.201854 | 14.451626 | 14.996579 | 15.720249 | 18.423284 |
