In [1]:
import sys
import requests
import json
import pandas as pd
import os
from pprint import pprint

In [2]:
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path+"/codes")

import peloton_api_toolkit as api_tool
import peloton_data_toolkit as data_tool

In the previous section, we successfully retrieved some basic performance data, specifically personal records, using the `user overview` endpoint.  This gave us a glimpse into our workout history, showing key metrics like ride length, total output, and the dates of those achievements.  While valuable, the user overview endpoint provides a limited snapshot.  

It's great for high-level tracking, but what if we want to dive deeper into the specifics of each workout?  What if we're curious about our cadence, heart rate zones, power output over time, or even the instructor who led the class?  For this more granular data, let's explore the `workout` endpoint.

The table below shows the personal best workouts. Let's get the workoud_id of 45 min ride, and examine the output from `workout` endpoint.

In [220]:
session = requests.Session()
userID = api_tool.get_user_id(session)
response_json = api_tool.extract_user_overview(session, userID)
df_personal_records, df_streaks, df_achievements, df_workout_counts = data_tool.clean_user_overview(response_json)

df_personal_records[['name', 'value', 'unit', 'workout_date']]

Unnamed: 0,name,value,unit,workout_date
4,5 min,71,kj,2021-06-27 09:48:37.000000
5,10 min,148,kj,2024-04-29 09:41:50.385748
6,15 min,173,kj,2021-01-07 21:03:53.000000
7,20 min,460,kj,2024-09-08 11:46:38.416654
8,30 min,572,kj,2024-06-23 20:08:42.478000
0,45 min,883,kj,2024-05-06 09:57:46.552438
1,60 min,1056,kj,2024-09-24 20:20:52.210217
2,75 min,587,kj,2024-03-09 07:27:46.642128
3,90 min,1554,kj,2024-09-01 20:10:55.379911


In [223]:
workout_id = df_personal_records[df_personal_records['name'] == '45 min']['workout_id'][0]
api_base_url = "https://api.onepeloton.com"
path_workout = "/api/workout/"

session = requests.Session()
userID = api_tool.get_user_id(session)
response = session.get(api_base_url + path_workout + workout_id)
response_json = json.loads(response.text)    

Diving into the response_json returned by the workout endpoint reveals a wealth of data far exceeding what we saw with the user overview.  Beyond the basic workout details, we now have access to granular information that paints a much richer picture of each session.  This includes overall workout metrics like the start time, status (completed, pending, etc.), effort score, the total number of users on the leaderboard, and even our rank.  For rides specifically, we get details like the title, instructor, difficulty rating, and duration. But the real power lies in the detailed performance metrics.  We can now see the duration spent in each heart rate zone, our Functional Threshold Power (FTP), and any achievements earned during the workout.

Output from `pprint(response_json, depth = 1)`:

```
{'achievement_templates': [...],
 'average_effort_score': 75.1,
 'created': 1715003866,
 'created_at': 1715003866,
 'device_time_created_at': 1714989466,
 'device_type': 'home_bike_v1',
 'device_type_display_name': 'Bike',
 'end_time': 1715006750,
 'fitbit_id': None,
 'fitness_discipline': 'cycling',
 'ftp_info': {...},
 'has_leaderboard_metrics': True,
 'has_paused': False,
 'has_pedaling_metrics': True,
 'is_outdoor': False,
 'is_pause_available': False,
 'is_paused': False,
 'is_skip_intro_available': False,
 'is_total_work_personal_record': True,
 'leaderboard_rank': 62,
 'metrics_type': 'cycling',
 'name': 'Cycling Workout',
 'pause_time_elapsed': None,
 'pause_time_remaining': None,
 'platform': 'home_bike',
 'ride': {...},
 'service_id': None,
 'start_time': 1715004050,
 'status': 'COMPLETE',
 'strava_id': None,
 'timezone': 'Etc/GMT+4',
 'title': None,
 'total_heart_rate_zone_durations': {...},
 'total_leaderboard_users': 23699,
 'total_music_audio_buffer_seconds': None,
 'total_music_audio_play_seconds': None,
 'total_video_buffering_seconds': 0,
 'total_video_watch_time_seconds': 0,
 'total_work': 882694.27,
 'v2_total_video_buffering_seconds': 2,
 'v2_total_video_watch_time_seconds': 3010,
 'workout_type': 'class'}

```

Now that we've seen the wealth of information available, let's put it into practice.  Our goal is to extract all of our personal best workouts and store them in a list called `responses`. From this data, we're particularly interested in a select few pieces of information for each workout. To make working with this data easier, we'll flatten it out into a Pandas DataFrame.

To accomplish this, we'll loop through each workout in our responses list. Within the loop, we'll extract only the relevant information we've identified, storing it in a new list.  A crucial step here is handling potentially missing data.  Not all workouts, for example, may have heart rate zone information. To gracefully manage these missing keys, we'll use the `.get()` method.  This allows us to specify a default value if a key isn't present, preventing our code from crashing.

In [226]:
workout_ids = df_personal_records['workout_id'].tolist()

session = requests.Session()
userID = api_tool.get_user_id(session)

responses = []

for workout_id in workout_ids:
    response = session.get(api_base_url + path_workout + workout_id)
    response_json = json.loads(response.text)
    responses.append(response_json)


all_workout_data = [] 

for response in responses:
    workout_data = {}  # Initialize an empty dictionary for each workout

    workout_data['workout_id'] = response.get('id')

    # Use .get() with a default value (None) to handle missing keys
    workout_data['ride_title'] = response['ride'].get('title') if 'ride' in response else None
    workout_data['ride_description'] = response['ride'].get('description') if 'ride' in response else None
    workout_data['ride_id'] = response['ride'].get('id') if 'ride' in response else None
    workout_data['ride_difficulty'] = response['ride'].get('difficulty_estimate') if 'ride' in response else None
    workout_data['ride_duration'] = response['ride'].get('duration') if 'ride' in response else None
    workout_data['ride_instructor'] = response['ride'].get('instructor_id') if 'ride' in response else None
    workout_data['average_effort_score'] = response.get('average_effort_score')
    workout_data['ftp'] = response['ftp_info'].get('ftp') if 'ftp_info' in response and 'ftp' in response['ftp_info'] else None
    workout_data['start_time'] = response.get('start_time')
    workout_data['status'] = response.get('status')

    # Handle nested dictionaries and potential missing keys
    total_heart_rate_zone_durations = response.get('total_heart_rate_zone_durations')
    workout_data['heart_rate_z1_duration'] = total_heart_rate_zone_durations.get('heart_rate_z1_duration') if total_heart_rate_zone_durations else None
    workout_data['heart_rate_z2_duration'] = total_heart_rate_zone_durations.get('heart_rate_z2_duration') if total_heart_rate_zone_durations else None
    workout_data['heart_rate_z3_duration'] = total_heart_rate_zone_durations.get('heart_rate_z3_duration') if total_heart_rate_zone_durations else None
    workout_data['heart_rate_z4_duration'] = total_heart_rate_zone_durations.get('heart_rate_z4_duration') if total_heart_rate_zone_durations else None
    workout_data['heart_rate_z5_duration'] = total_heart_rate_zone_durations.get('heart_rate_z5_duration') if total_heart_rate_zone_durations else None
    
    workout_data['total_leaderboard_users'] = response.get('total_leaderboard_users')
    workout_data['leaderboard_rank'] = response.get('leaderboard_rank')
    workout_data['total_work'] = response.get('total_work')

    all_workout_data.append(workout_data)

df_workouts = pd.DataFrame(all_workout_data)

# remove workout_id from displaying
df_workouts[list(df_workouts.columns[~df_workouts.columns.str.contains('workout_id')])].head(2).T

This extracted data is already a significant improvement, but we can make it even more useful with a few transformations.  

* First, the start_time is currently represented as Unix time, which isn't very human-readable.  We'll convert this to a standard timestamp format.  
* Second, the heart rate zone durations are currently raw numbers.  It's more insightful to see these as percentages of the total workout time spent in each zone. 
* Finally, while leaderboard rank is informative, it's even better to see it as a percentile.  This gives us a better sense of our performance relative to other riders.  These transformations will make our data much more meaningful and easier to analyze.

In [256]:
# convert unix time to timestamp
col_type_workout_records = {
    'start_time': 'datetime'
}
df_workouts = data_tool.coerce_columns(df_workouts, col_type_workout_records, date_unit = 's')


# create pct of time spent on each heard rate zone
df_workouts['heart_rate_durtation_total'] = df_workouts['heart_rate_z1_duration'] + df_workouts['heart_rate_z2_duration']\
                                            + df_workouts['heart_rate_z3_duration'] + df_workouts['heart_rate_z4_duration']\
                                            + df_workouts['heart_rate_z5_duration']

df_workouts['heart_rate_z1_pct'] = round(df_workouts['heart_rate_z1_duration'] / df_workouts['heart_rate_durtation_total'] * 100, 1)
df_workouts['heart_rate_z2_pct'] = round(df_workouts['heart_rate_z2_duration'] / df_workouts['heart_rate_durtation_total'] * 100, 1)
df_workouts['heart_rate_z3_pct'] = round(df_workouts['heart_rate_z3_duration'] / df_workouts['heart_rate_durtation_total'] * 100, 1)
df_workouts['heart_rate_z4_pct'] = round(df_workouts['heart_rate_z4_duration'] / df_workouts['heart_rate_durtation_total'] * 100, 1)
df_workouts['heart_rate_z5_pct'] = round(df_workouts['heart_rate_z5_duration'] / df_workouts['heart_rate_durtation_total'] * 100, 1)

# clean output from J to KJ
df_workouts['total_work_in_kj'] = round(df_workouts['total_work'] / 1000, 1)

# create heart leaderboard percentil.
df_workouts['leaderboard_top_pct'] = round(df_workouts['leaderboard_rank'] / workout_data['total_leaderboard_users'] * 100, 1)

# show only relevant info
heart_rate_columns = df_workouts.columns[df_workouts.columns.str.contains('heart_rate')]
heart_rate_pct_columns = list(heart_rate_columns[heart_rate_columns.str.contains('_pct')])
workout_columns = ['ride_title', 'start_time', 'total_work_in_kj', 'leaderboard_top_pct', 'total_leaderboard_users'] + heart_rate_pct_columns

df_workouts[workout_columns]

Unnamed: 0,ride_title,start_time,total_work_in_kj,leaderboard_top_pct,total_leaderboard_users,heart_rate_z1_pct,heart_rate_z2_pct,heart_rate_z3_pct,heart_rate_z4_pct,heart_rate_z5_pct
0,5 min AT's Ride to Greatness: Warm Up,2021-06-27 13:48:40,71.1,4.1,79832,15.2,12.0,51.0,21.8,0.0
1,10 min Warm Up: Palomar Mountain,2024-04-29 13:46:00,147.5,0.7,36852,,,,,
2,15 min Intro to Climb Ride,2021-01-08 02:04:08,172.5,7.0,30682,,,,,
3,20 min FTP Test Ride,2024-09-08 15:47:38,460.3,0.5,30306,,,,,
4,30 min HIIT & Hills Ride,2024-06-24 00:09:42,571.6,2.1,90311,,,,,
5,45 min Climb: Palomar Mountain Part 2,2024-05-06 14:00:50,882.7,0.2,23699,5.2,8.4,38.5,46.3,1.7
6,60 min Power Zone Endurance Ride,2024-09-25 00:21:52,1056.1,1.3,58788,5.5,12.8,55.4,26.3,0.0
7,75 min Power Zone Endurance Ride,2024-03-09 12:30:50,587.1,96.8,49854,,,,,
8,90 min Power Zone Endurance Ride,2024-09-02 00:11:55,1553.6,0.4,30809,,,,,


Looking at our transformed data, some interesting patterns emerge.  It appears I've been performing quite well on the leaderboard for most of my personal best rides, with the `leaderboard_top_pct` often falling within the top 1%.  That's encouraging!  However, one ride, the 75-minute class, stands out with a leaderboard_top_pct in the lower 5%.  This suggests I likely didn't complete that ride.  A reminder to revisit it and conquer that challenge!

The absence of heart rate zone data for some workouts is puzzling.  This information is crucial for detailed performance analysis, and its absence raises questions. Perhaps there's a more comprehensive endpoint that provides this missing data.  Ideally, I'd like to have this level of detail for all my rides, not just the personal bests.  To investigate this further and unlock the data for all my workouts, we'll need to explore a different part of the Peloton API.  In the next section, we'll dive into the `user workout` endpoint, which promises to provide a more complete picture of our Peloton activity.