# SPL Open Data Usage

Welcome to the MLSE Sport Performance Lab (SPL) Open Data repository! See below for a quick explanation of how best to load this data into a notebook and do some basic visualization.

### File structure

Assuming that you have cloned the repository, the files are named and organized according to the following tree:

```
[sport]/
├─ [action_type]/
│  ├─ participant_information.json
│  ├─ [participant_id]/
│  │  ├─ trial_data/
│  │  │  ├─ [trial_id].json
```

Here, trial data is unique to each individual participant and anonymized, while demographic information for all participants is stored in the `participant_information.json` file.
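
As a rough illustration of the tree above, the path to a single trial file can be assembled with `pathlib`. This is only a sketch: the helper name `trial_path` and the example folder and file names are hypothetical placeholders, not names taken from the repository.

```python
from pathlib import Path


def trial_path(root, sport, action_type, participant_id, trial_id):
    """Build the path to one trial file, following the tree shown above.

    All arguments are placeholders supplied by the caller; this helper is
    hypothetical and is not part of the repository.
    """
    return (
        Path(root)
        / sport
        / action_type
        / participant_id
        / "trial_data"
        / f"{trial_id}.json"
    )


# Example with made-up folder and file names, purely for illustration.
example = trial_path("repo", "some_sport", "some_action", "P0001", "some_trial")
print(example.as_posix())
```

Building paths this way keeps the folder convention in one place, so a change in layout only needs to be handled in a single helper.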

In this notebook, we will investigate how to load and work with the basketball free throw data from participant ```P0001```.

### Import Packages

In [None]:
import json
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from animate import animate_trial

### Free throw data 

The free throws are divided into "trials", each of which contains a single free throw. These were captured using SPL's internal motion capture setup and are provided at a frame rate of 30 fps. Each trial contains 25 body-pose keypoints for the participant, as well as a limited amount of ball data, from the initial dribbles to shortly after release. To open a particular free throw trial, we can use the `json` library in Python:

In [None]:
# Load only the first trial; trial IDs are zero-padded to four digits (e.g. 0001).
trial_id = str(1).zfill(4)

with open(f'./data/P0001/BB_FT_P0001_T{trial_id}.json') as json_file:
    free_throw_data = json.load(json_file)

We can then print the results of ```free_throw_data``` to see its contents:

In [None]:
free_throw_data

The `json` contains shot metadata and a `tracking` list of dictionaries, where each dictionary in the list corresponds to a single frame of data. As the code in this notebook indexes it, the structure is as follows:
1. `result`: The outcome of the shot (`'made'` for a made shot).
2. `landing_x`: Landing position x coordinate on hoop plane.
3. `landing_y`: Landing position y coordinate on hoop plane.
4. `entry_angle`: Angle at which ball breaks (enters) hoop plane.
5. `tracking`: A list with one dictionary per frame. Each frame dictionary has the following keys:
   1. `frame`: The frame number, starting from 0.
   2. `time`: The time in milliseconds since the beginning of the trial. Since the frame rate is 30 fps, the difference between subsequent frames' times will be 33 or 34 milliseconds (due to rounding).
   3. `data`: A dictionary that contains the following `xyz` data corresponding to the frame:
      1. `ball`: The `x`, `y`, and `z` coordinates of the ball center.
      2. `player`: A dictionary containing `x`, `y`, and `z` coordinates for all of the person's keypoints. Each key in this dictionary corresponds to a specific keypoint (for example, `R_HIP`).
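
Since `pandas` is already imported, one convenient way to work with a single keypoint is to flatten it into a table with one row per frame. The sketch below assumes the layout that this notebook's own code relies on (`tracking` as a list of frames, each with `frame`, `time`, and a `data` dictionary holding `player` keypoints as `[x, y, z]`). The helper `keypoint_to_dataframe` is hypothetical, and the toy trial uses made-up values rather than real data.

```python
import pandas as pd


def keypoint_to_dataframe(trial, keypoint):
    """Return one row per frame with the x, y, z coordinates of `keypoint`.

    Assumes trial["tracking"] is a list of frames and that each frame stores
    coordinates under frame["data"]["player"][keypoint] as [x, y, z].
    """
    rows = []
    for frame in trial["tracking"]:
        x, y, z = frame["data"]["player"][keypoint]
        rows.append(
            {"frame": frame["frame"], "time": frame["time"], "x": x, "y": y, "z": z}
        )
    return pd.DataFrame(rows)


# Tiny synthetic trial (values are made up) that mimics the assumed layout.
toy_trial = {
    "result": "made",
    "tracking": [
        {"frame": 0, "time": 0, "data": {"player": {"R_HIP": [0.0, 1.0, 3.0]}}},
        {"frame": 1, "time": 33, "data": {"player": {"R_HIP": [0.0, 1.0, 3.1]}}},
        {"frame": 2, "time": 67, "data": {"player": {"R_HIP": [0.0, 1.1, 3.3]}}},
    ],
}

hip_df = keypoint_to_dataframe(toy_trial, "R_HIP")
print(hip_df)
```

With a real trial, the same call would be `keypoint_to_dataframe(free_throw_data, 'R_HIP')`, provided the file follows the assumed layout.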

### Animating A Free Throw

To visualize what this data looks like, we can use the `animate_trial()` function in `animate.py`. This creates a 3D animation of the provided free throw data, with customizable views and arguments.

In [None]:
animate_trial(f'./data/P0001/BB_FT_P0001_T0001.json', azim=290, player_color='green')

Let's now look at all 125 trials instead of focusing on just one. What are some research questions we can ask using this data?

### Plotting Ball Landing Positions

One of the neat things we can do with our data is plot the ball landing positions for each shot. Let's make some histograms to show the distributions of these landing positions.

In [None]:
landing_x = []
landing_y = []

fig, ax = plt.subplots(ncols=2, figsize=(12,6))

for trial_number in range(1,126):

    trial_id = str(trial_number).zfill(4)

    with open(f'./data/P0001/BB_FT_P0001_T{trial_id}.json') as json_file:
        free_throw_data = json.load(json_file)
    
    landing_x.append(free_throw_data['landing_x'])
    landing_y.append(free_throw_data['landing_y'])

ax[0].hist(landing_x, bins=15, color='orange', edgecolor='black')
ax[0].set_xlabel('Landing X Position (inches)')
ax[0].set_ylabel('Frequency')
ax[0].set_title('Landing X Position Distribution')

ax[1].hist(landing_y, bins=15, color='orange', edgecolor='black')
ax[1].set_xlabel('Landing Y Position (inches)')
ax[1].set_ylabel('Frequency')
ax[1].set_title('Landing Y Position Distribution')

We can also plot the ball landing positions on top of where we know the rim of the basketball hoop is. This can help determine whether the participant has a bias to the left or right of the rim, or tends to shoot the ball too short or too long:

In [None]:
fig, ax = plt.subplots(figsize=(6,6))

for trial_number in range(1,126):

    trial_id = str(trial_number).zfill(4)

    with open(f'./data/P0001/BB_FT_P0001_T{trial_id}.json') as json_file:
        free_throw_data = json.load(json_file)
    
    color = 'green' if free_throw_data['result'] == 'made' else 'red'
    ax.scatter(free_throw_data['landing_x'], free_throw_data['landing_y'], color=color, alpha=0.8, edgecolors='black', s=30)

circle = plt.Circle((0, 9), 9, color='black', fill=False)
ax.add_artist(circle)

plt.xlim([-15,15])
plt.ylim([-5,25])
plt.xlabel('Landing X Position (inches)')
plt.ylabel('Landing Y Position (inches)')
plt.title('Landing Position of Free Throws')

plt.tight_layout()
plt.show()
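
The left/right and short/long bias discussed above can also be summarized numerically as the mean offset of the landing positions from the rim center. This is a minimal sketch, assuming the rim center sits at (0, 9) inches as in the circle drawn in the plot. The helper `landing_bias` is hypothetical and the sample coordinates are made up.

```python
from statistics import mean

# Rim center in inches, taken from the circle drawn in the plot above.
RIM_CENTER_X = 0.0
RIM_CENTER_Y = 9.0


def landing_bias(landing_x, landing_y):
    """Return the mean (x, y) offset of the landing positions from the rim center.

    Which sign corresponds to left/right or short/long depends on the
    dataset's axis convention, which is not assumed here.
    """
    bias_x = mean(landing_x) - RIM_CENTER_X
    bias_y = mean(landing_y) - RIM_CENTER_Y
    return bias_x, bias_y


# Made-up landing positions, for illustration only.
sample_x = [-2.0, 1.0, 4.0]
sample_y = [8.0, 10.0, 12.0]
bias_x, bias_y = landing_bias(sample_x, sample_y)
print(bias_x, bias_y)
```

With the real data, the `landing_x` and `landing_y` lists collected for the histograms could be passed in directly.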


This type of analysis is still very outcome-based. With the full-body pose data, we have the opportunity to look deeper into process-based information from the free throw. Let's do that below:

### Plotting Keypoints Over Time

Given these are free throw trials, we know a lot of the power for the shot comes from the legs. To visualize this, let's see how the vertical velocity of the hip marker changes across the shot.

First, let's plot the hip vertical position for one shot:

In [None]:
# Load the first trial only (T0001).
trial_id = str(1).zfill(4)

with open(f'./data/P0001/BB_FT_P0001_T{trial_id}.json') as json_file:
    free_throw_data = json.load(json_file)

# Index 2 of each keypoint is the vertical (z) coordinate.
hip_z_position = [f['data']['player']['R_HIP'][2] for f in free_throw_data['tracking']]

plt.plot(hip_z_position)

plt.xlabel('Frame')
plt.ylabel('Right Hip Vertical Position (ft)')
plt.title('Right Hip Vertical Position Over Shot')



Now, let's differentiate the position signal to get the vertical velocity of the hip in ft/s:

In [None]:
# Load the first trial only (T0001).
trial_id = str(1).zfill(4)

with open(f'./data/P0001/BB_FT_P0001_T{trial_id}.json') as json_file:
    free_throw_data = json.load(json_file)

hip_z_position = [f['data']['player']['R_HIP'][2] for f in free_throw_data['tracking']]

# Differentiate with a frame spacing of 1/30 s (30 fps) to get velocity in ft/s.
hip_z_velocity = np.gradient(hip_z_position, 1/30)

plt.plot(hip_z_velocity)

plt.xlabel('Frame')
plt.ylabel('Right Hip Vertical Velocity (ft/s)')
plt.title('Right Hip Vertical Velocity Over Shot')
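
To see what `np.gradient` does with a spacing of `1/30`, here is a small check on a synthetic signal (not dataset values): a position that rises at a constant 2 ft/s should differentiate to 2 ft/s at every frame.

```python
import numpy as np

FRAME_RATE = 30  # frames per second, as stated for this dataset

# Synthetic vertical position (ft) rising at a constant 2 ft/s for 10 frames.
time_s = np.arange(10) / FRAME_RATE
position_ft = 3.0 + 2.0 * time_s

# Passing the frame spacing in seconds (1/30) yields velocity in ft/s.
# np.gradient uses central differences in the interior and one-sided
# differences at the two ends, so the output has the same length as the input.
velocity_ft_s = np.gradient(position_ft, 1 / FRAME_RATE)
print(velocity_ft_s)
```

Because the output has one value per input frame, the velocity can be plotted against the same frame axis as the position.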



The velocity graph clearly shows a peak. This is the point where the participant is extending upward to propel the ball. Let's see how this looks for all shots:

In [None]:
for trial_number in range(1,126):

    trial_number_corrected = str(trial_number).zfill(4)

    with open(f'./data/P0001/BB_FT_P0001_T{trial_number_corrected}.json') as json_file:
        free_throw_data = json.load(json_file)

    hip_z_velocity = np.gradient([f['data']['player']['R_HIP'][2] for f in free_throw_data['tracking']], 1/30, axis=-1)

    plt.plot(hip_z_velocity)

plt.xlabel('Frame')
plt.ylabel('Right Hip Vertical Velocity (ft/s)')
plt.title('Right Hip Vertical Velocity Over All Shots')
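
The peak visible in these curves can also be located programmatically. The following sketch finds the frame, value, and time of the peak velocity for one position signal. The helper `peak_velocity` is hypothetical, the toy positions are made up, and a constant 30 fps spacing is assumed.

```python
import numpy as np

FRAME_RATE = 30  # frames per second, as stated for this dataset


def peak_velocity(z_position, frame_rate=FRAME_RATE):
    """Return (frame index, velocity in units/s, time in s) of the velocity peak.

    NaN values are ignored when searching for the peak; np.nanargmax raises
    a ValueError if the whole velocity signal is NaN.
    """
    velocity = np.gradient(np.asarray(z_position, dtype=float), 1 / frame_rate)
    peak_frame = int(np.nanargmax(velocity))
    return peak_frame, float(velocity[peak_frame]), peak_frame / frame_rate


# Made-up vertical positions (ft) with the steepest rise around frame 3.
toy_z = [3.0, 3.0, 3.1, 3.5, 3.7, 3.7]
peak_frame, peak_value, peak_time = peak_velocity(toy_z)
print(peak_frame, peak_value, peak_time)
```

Knowing the frame of the peak makes it possible to align shots in time, for example relative to the moment of maximum upward hip velocity.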



Great! Let's now calculate the peak hip vertical velocity and see whether this value relates to the ball landing y-position. My thinking here is that a greater hip velocity will typically result in a farther shot, so there should be a relationship between these values.

Let's also color the shots based on whether they were made (green) or missed (red):

In [None]:
for trial_number in range(1,126):

    trial_number_corrected = str(trial_number).zfill(4)

    with open(f'./data/P0001/BB_FT_P0001_T{trial_number_corrected}.json') as json_file:
        free_throw_data = json.load(json_file)

    result = free_throw_data['result']

    peak_hip_z_velocity = np.nanmax(np.gradient([f['data']['player']['R_HIP'][2] for f in free_throw_data['tracking']], 1/30, axis=-1))

    plt.scatter(peak_hip_z_velocity, free_throw_data['landing_y'], color='green' if result == 'made' else 'red')

plt.xlabel('Peak Hip Vertical Velocity (ft/s)')
plt.ylabel('Ball Landing Position Y (inches)')
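
To put a number on the relationship shown in the scatter plot, one option is the Pearson correlation coefficient. This is a sketch rather than part of the original analysis: `pearson_r` is a hypothetical helper, and the toy values below are made up and perfectly linear, so they say nothing about the real dataset.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between x and y, skipping pairs that contain NaN/inf."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    valid = np.isfinite(x) & np.isfinite(y)
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
    return float(np.corrcoef(x[valid], y[valid])[0, 1])


# Made-up, perfectly linear values, for illustration only.
toy_peak_velocity = [1.0, 2.0, 3.0, 4.0]
toy_landing_y = [5.0, 7.0, 9.0, 11.0]
r = pearson_r(toy_peak_velocity, toy_landing_y)
print(round(r, 3))
```

With the real data, the peak velocities and `landing_y` values would first need to be collected into two lists inside the loop over trials.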



Interesting! There does appear to be a weak relationship, but I've also found what looks like a cluster of misses with high peak hip vertical velocity and high ball landing y-position values (top right of the figure). My next step here would be to investigate what other biomechanical processes cause these shots to miss.

This is a great example of a basic line of biomechanical research that can be explored using the SPL Open Data basketball free throw dataset. The rest is up to you!