# Peloton API Walkthrough


Like many people during the COVID pandemic, I found myself looking for a better way to work out at home (I have always been a gym-goer), and this led to the appearance of a Peloton stationary bike in my living room. The bike helped me through some of the dark days of the lockdown, particularly during the winter when getting outside was a challenge. But as an engineer I quickly became as interested in all of the data that my machine was throwing off as I was in using it. Power, Cadence, Resistance, Suffer-Score, FTP ... The bike is a data machine.

I also wanted to uncover the factors that would lead to the biggest improvements in my performance. Was it intervals, total time on the bike, or maybe peak intensity? This led me to explore how I could access my data via the Peloton API.

It quickly became apparent that Peloton does not yet have an official, publicly documented API. However, I was able to find some useful resources that helped me start walking through it. This notebook shows anyone interested what type of data can be found at different endpoints in the Peloton API and how to get their Peloton performance data into a tidy, data-science-ready format (tibble).

### References:

##### Unofficial API documentation listing many of the available endpoints:
https://app.swaggerhub.com/apis/DovOps/peloton-unofficial-api/0.2.3#/

##### Git repository for a community-built Peloton Python library:
https://github.com/geudrik/peloton-client-library

### Walk-through:

My goal in exploring the Peloton API was to get the data pertaining to my athletic performance and the drivers of improvements to that performance. There is a lot of other data, for example, social media connections or user-experience variables. However, my goal was specific to getting to the performance data. 

My functional goal was a tidy table (a "tibble") of my performance data that I could use for visualizations and analysis.

This walkthrough shows how to get to that end goal, but also shows what is sitting at a lot of the other endpoints.

Note: I have masked much of my personal information with #### along the way. Substitute your own information to make the code work.

In [1]:
## Package Imports

import requests
import pandas as pd
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0
import json
import numpy as np
import pprint

In [2]:
## Connect Session to Peloton API using Requests.

## Note: I have hashed out my username & password. Enter your own credentials to make the code work

s = requests.Session()
payload = {'username_or_email':'########', 'password':'############'}
s.post('https://api.onepeloton.com/auth/login', json=payload)

<Response [200]>
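
Rather than typing credentials into the notebook, one option is to read them from environment variables and to check the response status before moving on. The following is a minimal sketch of that idea; the variable names `PELOTON_USERNAME` and `PELOTON_PASSWORD` and the two helper functions are my own choices for illustration, not part of the Peloton API.

```python
import os


def build_login_payload(env=os.environ):
    """Build the login payload from environment variables.

    PELOTON_USERNAME and PELOTON_PASSWORD are hypothetical variable names
    chosen for this sketch.
    """
    try:
        return {
            "username_or_email": env["PELOTON_USERNAME"],
            "password": env["PELOTON_PASSWORD"],
        }
    except KeyError as missing:
        raise RuntimeError(f"Set the environment variable {missing} first") from None


def login(session, payload):
    """Post the payload to the login endpoint and fail loudly on an error status."""
    response = session.post("https://api.onepeloton.com/auth/login", json=payload)
    response.raise_for_status()  # with a requests.Session this raises on 4xx/5xx
    return response
```

With a `requests.Session` named `s`, this would be called as `login(s, build_login_payload())`.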

The first Peloton endpoint to look at is api/me. As the name suggests, this endpoint holds a lot of personal user information.

Below is the result of a query to this endpoint and an example of the data it returns. Note that I have only shown the keys of the returned data, because a full printout would display all of my own personal information.

In [3]:
## Query personal data
query_personal = s.get("https://api.onepeloton.com/api/me")

In [4]:
## Print personal query headings

print(json_normalize(query_personal.json()).columns.values)

['username' 'location' 'email' 'default_heart_rate_zones'
 'total_non_pedaling_metric_workouts' 'cycling_ftp_workout_id'
 'is_provisional' 'facebook_id' 'birthday' 'subscription_credits_used'
 'height' 'member_groups' 'height_unit' 'first_name' 'referral_code'
 'is_effort_score_enabled' 'customized_max_heart_rate'
 'has_active_device_subscription' 'weight' 'external_music_auth_list'
 'locale' 'created_country' 'last_workout_at' 'default_max_heart_rate'
 'id' 'is_fitbit_authenticated' 'workout_counts'
 'total_pedaling_metric_workouts' 'referrals_made'
 'is_internal_beta_tester' 'allow_marketing'
 'has_active_digital_subscription' 'contract_agreements' 'block_prenatal'
 'has_signed_waiver' 'total_followers' 'estimated_cycling_ftp' 'image_url'
 'instructor_id' 'is_profile_private' 'is_complete_profile' 'is_demo'
 'facebook_access_token' 'gender' 'obfuscated_email'
 'is_external_beta_tester' 'subscription_credits'
 'customized_heart_rate_zones' 'paired_devices' 'v1_referrals_made'
 'cyclin

From the keys you can see that this endpoint contains a lot of user account data, such as total followers, connected social media accounts, birthday, gender, number of workouts, whether the user is a beta tester, etc. However, it does not include much (if any) of the athletic performance data that I was hoping to find. It does, however, contain one key item for getting to that data: your user id.

As in many APIs, user IDs are opaque hash-like keys that keep identities private. This key is needed to query many of the Peloton endpoints, so it is useful to store your Peloton user id for running additional queries.

In [5]:
user_id = query_personal.json()['id']

Using this hash key you can query another endpoint, which gives additional workout data:

api/user/{user_id}/workouts?

In [6]:
pw_query_string = r"https://api.onepeloton.com/api/user/{}/workouts?joins=ride&limit=100".format(user_id)

q_personal_workouts = s.get(pw_query_string)

In [8]:
## Print personal workout query headings

print(json_normalize(q_personal_workouts.json()).columns.values)

['data' 'limit' 'page' 'total' 'count' 'page_count' 'show_previous'
 'show_next' 'sort_by' 'aggregate_stats' 'next.workout_id'
 'next.created_at' 'summary.2021-11' 'summary.2021-07' 'summary.2021-06'
 'summary.2021-05' 'summary.2021-04' 'summary.2021-03' 'summary.2021-02'
 'summary.2021-01'
 'total_heart_rate_zone_durations.heart_rate_z1_duration'
 'total_heart_rate_zone_durations.heart_rate_z2_duration'
 'total_heart_rate_zone_durations.heart_rate_z3_duration'
 'total_heart_rate_zone_durations.heart_rate_z4_duration'
 'total_heart_rate_zone_durations.heart_rate_z5_duration']
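
The `page`, `page_count` and `show_next` fields suggest that this endpoint is paginated, so a single request with `limit=100` may not return everything once you have more than 100 workouts. The sketch below collects all pages. It assumes the endpoint accepts a zero-based `page` query parameter, which the response fields hint at but which I have not confirmed; `fetch_page` is a hypothetical callable supplied by the caller.

```python
def fetch_all_workouts(fetch_page):
    """Collect the 'data' entries from every page of a paginated response.

    fetch_page(page) must return the decoded JSON for one page, i.e. a dict
    with at least the keys 'data' and 'page_count'.
    """
    workouts = []
    page = 0  # assumption: pages are numbered from zero
    while True:
        body = fetch_page(page)
        workouts.extend(body["data"])
        page += 1
        if page >= body["page_count"]:
            break
    return workouts


# Possible usage with the session and user_id from above (not run here):
# url = "https://api.onepeloton.com/api/user/{}/workouts".format(user_id)
# all_workouts = fetch_all_workouts(
#     lambda page: s.get(url, params={"joins": "ride", "limit": 100, "page": page}).json())
```

Passing the request in as a callable keeps the paging logic separate from the HTTP session, which also makes it easy to try out on canned data.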


In [11]:
## Show an example of information in the 'data' key
print(q_personal_workouts.json()['data'][0].keys())

dict_keys(['created_at', 'device_type', 'end_time', 'fitness_discipline', 'has_pedaling_metrics', 'has_leaderboard_metrics', 'id', 'is_total_work_personal_record', 'metrics_type', 'name', 'peloton_id', 'platform', 'start_time', 'status', 'timezone', 'title', 'total_work', 'user_id', 'workout_type', 'total_video_watch_time_seconds', 'total_video_buffering_seconds', 'v2_total_video_watch_time_seconds', 'v2_total_video_buffering_seconds', 'total_music_audio_play_seconds', 'total_music_audio_buffer_seconds', 'ride', 'created', 'device_time_created_at', 'strava_id', 'fitbit_id', 'effort_zones'])


In [12]:
## Show an example of information in the 'summary' key
print(q_personal_workouts.json()['summary'])

{'2021-11': 2, '2021-07': 1, '2021-06': 5, '2021-05': 6, '2021-04': 10, '2021-03': 10, '2021-02': 17, '2021-01': 1}
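
The `summary` key appears to map each month (as a `YYYY-MM` string) to the number of workouts recorded in it. As a small sketch using the values printed above, the months can be put in chronological order and totalled like this:

```python
summary = {'2021-11': 2, '2021-07': 1, '2021-06': 5, '2021-05': 6,
           '2021-04': 10, '2021-03': 10, '2021-02': 17, '2021-01': 1}

# 'YYYY-MM' strings sort chronologically when sorted as plain text
monthly_counts = sorted(summary.items())
total_workouts = sum(summary.values())

for month, count in monthly_counts:
    print(month, count)
print("total:", total_workouts)
```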


In [13]:
## Show an example of information in the 'aggregate_stats' key
print(q_personal_workouts.json()['aggregate_stats'])

[]


It seems that this endpoint contains a lot of overall user data and user experience data for each workout. However, it does not contain the detailed athletic performance data I was hoping to find. It has, however, given some other useful data, notably the workout id for each of my rides, as well as the overall number of rides I've done. This data is useful for digging into each workout in more detail.

In [90]:
print(q_personal_workouts.json()['data'][0]['id'])

deb6957a2baa45f88c75e926b4bf7af1


All of Peloton's API resources seem to be stored under different hash-key ids. As shown above, there is a personal id for each user, which you need to access your own data. In addition, each of your workouts has its own id. On top of that, each Peloton ride has its own id ("peloton_id"). This is different from your workout id, which is specific to your performance during that ride. Multiple users can take the same ride, but each personal performance is stored under its own id.

The ride ids ("peloton_id") can be useful for comparing the performance of different riders on the same ride.

First, let's look at a single example of the data for an individual workout, which has this endpoint:

api/workout/{workout_id}

In [14]:
wo_str = r'https://api.onepeloton.com/api/workout/deb6957a2baa45f88c75e926b4bf7af1'
q_single_workout = s.get(wo_str)

In [15]:
##view individual workout data

#print(q_single_workout.json().keys())

print(json_normalize(q_single_workout.json()).columns.values)

['created_at' 'device_type' 'end_time' 'fitness_discipline'
 'has_pedaling_metrics' 'has_leaderboard_metrics' 'id'
 'is_total_work_personal_record' 'metrics_type' 'name' 'peloton_id'
 'platform' 'start_time' 'status' 'timezone' 'title' 'total_work'
 'user_id' 'workout_type' 'total_video_watch_time_seconds'
 'total_video_buffering_seconds' 'v2_total_video_watch_time_seconds'
 'v2_total_video_buffering_seconds' 'total_music_audio_play_seconds'
 'total_music_audio_buffer_seconds' 'created' 'device_time_created_at'
 'strava_id' 'fitbit_id' 'is_skip_intro_available' 'has_paused'
 'is_pause_available' 'total_heart_rate_zone_durations'
 'average_effort_score' 'achievement_templates' 'leaderboard_rank'
 'total_leaderboard_users' 'device_type_display_name'
 'ride.availability.is_available' 'ride.availability.reason'
 'ride.class_type_ids' 'ride.content_provider' 'ride.content_format'
 'ride.description' 'ride.difficulty_estimate' 'ride.overall_estimate'
 'ride.difficulty_rating_avg' 'ride.diffi

This data is mostly user-experience data rather than performance or training data.

However, looking at the unofficial API reference we can see that there is another endpoint we can look into:

api/workout/{workout_id}/performance_graph?

In [16]:
##Query performance graph data for individual workout
q_performance_graph = s.get('https://api.onepeloton.com/api/workout/f9347eede5cc4d04812bea35bd25e509/performance_graph?every_n=30').json()

We can explore this endpoint by looking at the data behind each of the keys returned by the query.

In [19]:
print(q_performance_graph.keys())

dict_keys(['duration', 'is_class_plan_shown', 'segment_list', 'seconds_since_pedaling_start', 'average_summaries', 'summaries', 'metrics', 'has_apple_watch_metrics', 'location_data', 'is_location_data_accurate', 'splits_data', 'target_metrics_performance_data', 'effort_zones'])


In [21]:
print(q_performance_graph['segment_list'])

[{'id': 'c09097cbbf644a33a099ddc8bbed1a07', 'length': 657, 'start_time_offset': 0, 'icon_url': 'https://s3.amazonaws.com/static-cdn.pelotoncycle.com/segment-icons/warmup.png', 'intensity_in_mets': 3.5, 'metrics_type': 'cycling', 'icon_name': 'warmup', 'icon_slug': 'warmup', 'name': 'Warm Up', 'is_drill': False}, {'id': '7b4554cfcba44352bddcf599cbd67e8d', 'length': 1085, 'start_time_offset': 657, 'icon_url': 'https://s3.amazonaws.com/static-cdn.pelotoncycle.com/segment-icons/cycling.png', 'intensity_in_mets': 6.0, 'metrics_type': 'cycling', 'icon_name': 'cycling', 'icon_slug': 'cycling', 'name': 'Cycling', 'is_drill': False}, {'id': 'b276dc5b35c74ec499f9a7a596618850', 'length': 58, 'start_time_offset': 1742, 'icon_url': 'https://s3.amazonaws.com/static-cdn.pelotoncycle.com/segment-icons/cooldown.png', 'intensity_in_mets': 3.5, 'metrics_type': 'cycling', 'icon_name': 'cooldown', 'icon_slug': 'cooldown', 'name': 'Cool Down', 'is_drill': False}]
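
Each entry in `segment_list` carries a `start_time_offset` and a `length`. The three lengths above (657 + 1085 + 58) add up to 1800, so both fields appear to be in seconds. Under that assumption, the sketch below turns the segments into time windows and looks up which segment a given sample falls in. The helper names are mine, and the input is trimmed to the three fields used.

```python
segment_list = [
    {"name": "Warm Up", "start_time_offset": 0, "length": 657},
    {"name": "Cycling", "start_time_offset": 657, "length": 1085},
    {"name": "Cool Down", "start_time_offset": 1742, "length": 58},
]


def segment_windows(segments):
    """Return (name, start_second, end_second) for each segment."""
    return [
        (seg["name"], seg["start_time_offset"], seg["start_time_offset"] + seg["length"])
        for seg in segments
    ]


def segment_at(segments, second):
    """Return the name of the segment containing `second`, or None.

    Windows are treated as (start, end] because the samples are numbered
    from second 1 up to and including the final second.
    """
    for name, start, end in segment_windows(segments):
        if start < second <= end:
            return name
    return None


print(segment_windows(segment_list))
print(segment_at(segment_list, 661))
```

This makes it possible to label each 30-second sample as warm-up, main set, or cool-down before averaging.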


In [22]:
print(q_performance_graph['seconds_since_pedaling_start'])

[1, 31, 61, 91, 121, 151, 181, 211, 241, 271, 301, 331, 361, 391, 421, 451, 481, 511, 541, 571, 601, 631, 661, 691, 721, 751, 781, 811, 841, 871, 901, 931, 961, 991, 1021, 1051, 1081, 1111, 1141, 1171, 1201, 1231, 1261, 1291, 1321, 1351, 1381, 1411, 1441, 1471, 1501, 1531, 1561, 1591, 1621, 1651, 1681, 1711, 1741, 1771, 1800]


In [23]:
print(q_performance_graph['average_summaries'])

print(json_normalize(q_performance_graph['average_summaries']))

[{'display_name': 'Avg Output', 'display_unit': 'watts', 'value': 133, 'slug': 'avg_output'}, {'display_name': 'Avg Cadence', 'display_unit': 'rpm', 'value': 87, 'slug': 'avg_cadence'}, {'display_name': 'Avg Resistance', 'display_unit': '%', 'value': 41, 'slug': 'avg_resistance'}, {'display_name': 'Avg Speed', 'display_unit': 'kph', 'value': 29.2, 'slug': 'avg_speed'}]
     display_name display_unit  value            slug
0      Avg Output        watts  133.0      avg_output
1     Avg Cadence          rpm   87.0     avg_cadence
2  Avg Resistance            %   41.0  avg_resistance
3       Avg Speed          kph   29.2       avg_speed


In [24]:
print(q_performance_graph['summaries'])

print(json_normalize(q_performance_graph['summaries']))

[{'display_name': 'Total Output', 'display_unit': 'kj', 'value': 240, 'slug': 'total_output'}, {'display_name': 'Distance', 'display_unit': 'km', 'value': 14.61, 'slug': 'distance'}, {'display_name': 'Calories', 'display_unit': 'kcal', 'value': 321, 'slug': 'calories'}]
   display_name display_unit   value          slug
0  Total Output           kj  240.00  total_output
1      Distance           km   14.61      distance
2      Calories         kcal  321.00      calories
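
Every entry in `average_summaries` and `summaries` has a `slug` and a `value`, so both lists can be collapsed into one flat dictionary per workout. The sketch below does this with the values printed above; keying on `slug` avoids relying on the position of an entry in the list. The helper name `values_by_slug` is my own.

```python
average_summaries = [
    {'display_name': 'Avg Output', 'display_unit': 'watts', 'value': 133, 'slug': 'avg_output'},
    {'display_name': 'Avg Cadence', 'display_unit': 'rpm', 'value': 87, 'slug': 'avg_cadence'},
    {'display_name': 'Avg Resistance', 'display_unit': '%', 'value': 41, 'slug': 'avg_resistance'},
    {'display_name': 'Avg Speed', 'display_unit': 'kph', 'value': 29.2, 'slug': 'avg_speed'},
]
summaries = [
    {'display_name': 'Total Output', 'display_unit': 'kj', 'value': 240, 'slug': 'total_output'},
    {'display_name': 'Distance', 'display_unit': 'km', 'value': 14.61, 'slug': 'distance'},
    {'display_name': 'Calories', 'display_unit': 'kcal', 'value': 321, 'slug': 'calories'},
]


def values_by_slug(entries):
    """Map each entry's slug to its value."""
    return {entry['slug']: entry['value'] for entry in entries}


# One flat record per workout, ready to become a row in a table
row = {**values_by_slug(average_summaries), **values_by_slug(summaries)}
print(row)
```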


In [25]:
print(q_performance_graph['metrics'])

print(json_normalize(q_performance_graph['metrics']))

[{'display_name': 'Output', 'display_unit': 'watts', 'max_value': 223, 'average_value': 133, 'values': [121, 133, 130, 127, 141, 130, 143, 213, 223, 220, 135, 135, 138, 141, 138, 165, 165, 169, 159, 103, 107, 119, 153, 159, 156, 160, 154, 163, 154, 120, 125, 122, 167, 163, 153, 167, 182, 180, 166, 129, 137, 131, 181, 161, 175, 172, 166, 163, 163, 129, 133, 125, 154, 156, 142, 156, 158, 158, 166, 97, 94], 'slug': 'output'}, {'display_name': 'Cadence', 'display_unit': 'rpm', 'max_value': 116, 'average_value': 87, 'values': [78, 85, 84, 83, 88, 85, 90, 115, 116, 115, 101, 101, 87, 85, 87, 91, 91, 92, 89, 79, 81, 86, 88, 89, 88, 87, 88, 87, 84, 83, 85, 84, 88, 87, 76, 79, 85, 86, 82, 81, 83, 82, 102, 95, 100, 99, 97, 96, 95, 98, 100, 96, 108, 110, 104, 105, 106, 106, 110, 114, 112], 'slug': 'cadence'}, {'display_name': 'Resistance', 'display_unit': '%', 'max_value': 48, 'average_value': 41, 'values': [43, 43, 41, 41, 41, 41, 41, 41, 41, 41, 38, 38, 41, 41, 41, 44, 44, 44, 44, 39, 39, 39, 4

Under the performance_graph endpoint I was able to find the detailed athletic performance data that I was looking for, including power, cadence, heart rate, etc.
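
The `values` list of each metric lines up with `seconds_since_pedaling_start`, so the time series can be flattened into tidy long-format rows (one row per sample per metric). The sketch below uses only the first three samples of the output and cadence series printed above; the function name `metrics_to_rows` is my own.

```python
def metrics_to_rows(graph):
    """Flatten one performance_graph response into long-format rows."""
    seconds = graph["seconds_since_pedaling_start"]
    rows = []
    for metric in graph["metrics"]:
        # zip stops at the shorter sequence, so unequal lengths cannot raise
        for second, value in zip(seconds, metric["values"]):
            rows.append({
                "second": second,
                "metric": metric["slug"],
                "unit": metric["display_unit"],
                "value": value,
            })
    return rows


# Shortened excerpt of the response shown above
excerpt = {
    "seconds_since_pedaling_start": [1, 31, 61],
    "metrics": [
        {"slug": "output", "display_unit": "watts", "values": [121, 133, 130]},
        {"slug": "cadence", "display_unit": "rpm", "values": [78, 85, 84]},
    ],
}

for row in metrics_to_rows(excerpt):
    print(row)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(...)` if a dataframe is wanted.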

Now that I have found the location of the data, I can gather it for each of my workouts. One drawback of the current Peloton API (or at least of my understanding of it) is that each workout is stored at its own endpoint. There does not seem to be a single source for all of my workouts, so to get the data for all of them I have to run one query per workout.

However, knowing where the needed hash keys and endpoints are, it's fairly easy to set up a loop to query each workout.

In [26]:
df_my_id = json_normalize(q_personal_workouts.json()['data'])

In [28]:
## I am only interested in cycling data
df_my_id = df_my_id[df_my_id['fitness_discipline'] == 'cycling']

In [29]:
df_my_id = df_my_id[['id','peloton_id','end_time']]

I have created a table of the hash keys for each of my individual workouts, the Peloton ride ids, and the end time, which I can use to get the date.

I can use the hash keys to query each workout that I've completed:

In [31]:
q_pw_pgL = []       ## a List of personal workout performance graph data
for w in df_my_id['id']:
    
    pgs = r'https://api.onepeloton.com/api/workout/{}/performance_graph?every_n=30'.format(w)
    
    wo_pg = s.get(pgs).json()
    
    q_pw_pgL.append(wo_pg)
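
Since this loop sends one request per workout, it may be worth checking each response and pausing briefly between calls. The following is a sketch of that variation, not the loop I actually ran. The half-second pause is an arbitrary choice (I do not know what rate limits Peloton applies), and the function name and parameters are my own.

```python
import time


def fetch_performance_graphs(session, workout_ids, every_n=30, pause_seconds=0.5):
    """Fetch the performance graph for each workout id, one request at a time.

    Returns a dict keyed by workout id, so each response stays attached
    to the workout it belongs to.
    """
    url = "https://api.onepeloton.com/api/workout/{}/performance_graph"
    graphs = {}
    for workout_id in workout_ids:
        response = session.get(url.format(workout_id), params={"every_n": every_n})
        response.raise_for_status()  # stop on the first failed request
        graphs[workout_id] = response.json()
        time.sleep(pause_seconds)  # small courtesy delay between requests
    return graphs
```

With the objects defined above, this would be called as `fetch_performance_graphs(s, df_my_id['id'])`.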

The data I have now is a list of query results, each of which is a nested JSON dictionary.

Taking a look at one example of the workout data that I have compiled, I can figure out how to tidy and organize the final dataframe.

In [34]:
print(q_pw_pgL[5].keys())

dict_keys(['duration', 'is_class_plan_shown', 'segment_list', 'seconds_since_pedaling_start', 'average_summaries', 'summaries', 'metrics', 'has_apple_watch_metrics', 'location_data', 'is_location_data_accurate', 'splits_data', 'target_metrics_performance_data', 'effort_zones'])


In [35]:
print(q_pw_pgL[5]['duration'])

1800


In [36]:
print(q_pw_pgL[5]['seconds_since_pedaling_start'])

[1, 31, 61, 91, 121, 151, 181, 211, 241, 271, 301, 331, 361, 391, 421, 451, 481, 511, 541, 571, 601, 631, 661, 691, 721, 751, 781, 811, 841, 871, 901, 931, 961, 991, 1021, 1051, 1081, 1111, 1141, 1171, 1201, 1231, 1261, 1291, 1321, 1351, 1381, 1411, 1441, 1471, 1501, 1531, 1561, 1591, 1621, 1651, 1681, 1711, 1741, 1771, 1800]


In [37]:
print(q_pw_pgL[5]['average_summaries'])

[{'display_name': 'Avg Output', 'display_unit': 'watts', 'value': 138, 'slug': 'avg_output'}, {'display_name': 'Avg Cadence', 'display_unit': 'rpm', 'value': 82, 'slug': 'avg_cadence'}, {'display_name': 'Avg Resistance', 'display_unit': '%', 'value': 43, 'slug': 'avg_resistance'}, {'display_name': 'Avg Speed', 'display_unit': 'kph', 'value': 29.4, 'slug': 'avg_speed'}]


In [38]:
print(q_pw_pgL[5]['summaries'])

[{'display_name': 'Total Output', 'display_unit': 'kj', 'value': 248, 'slug': 'total_output'}, {'display_name': 'Distance', 'display_unit': 'km', 'value': 14.71, 'slug': 'distance'}, {'display_name': 'Calories', 'display_unit': 'kcal', 'value': 341, 'slug': 'calories'}]


In [39]:
print(q_pw_pgL[5]['metrics'])

[{'display_name': 'Output', 'display_unit': 'watts', 'max_value': 228, 'average_value': 138, 'values': [93, 93, 102, 138, 135, 201, 192, 201, 207, 191, 204, 216, 113, 116, 145, 163, 176, 167, 213, 205, 194, 88, 150, 147, 159, 156, 170, 198, 170, 176, 190, 198, 173, 144, 141, 135, 211, 228, 219, 94, 93, 96, 142, 153, 145, 148, 173, 160, 125, 160, 198, 198, 197, 166, 148, 140, 209, 201, 146, 97, 86], 'slug': 'output'}, {'display_name': 'Cadence', 'display_unit': 'rpm', 'max_value': 111, 'average_value': 82, 'values': [82, 81, 85, 87, 86, 109, 106, 109, 111, 106, 106, 110, 75, 76, 87, 92, 92, 88, 98, 96, 93, 59, 87, 85, 89, 89, 86, 90, 86, 87, 92, 94, 87, 84, 84, 81, 89, 92, 91, 88, 87, 90, 87, 90, 87, 88, 88, 83, 72, 81, 92, 94, 88, 84, 88, 85, 92, 89, 70, 59, 55], 'slug': 'cadence'}, {'display_name': 'Resistance', 'display_unit': '%', 'max_value': 49, 'average_value': 43, 'values': [38, 38, 38, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 44, 44, 46, 46, 46, 46, 46, 44, 44, 44, 4

In [40]:
print(q_pw_pgL[5]['splits_data'])

{}


In [41]:
print(q_pw_pgL[5]['target_metrics_performance_data'])

{'target_metrics': [], 'time_in_metric': []}


In [42]:
print(q_pw_pgL[5]['effort_zones'])

{'total_effort_points': 24.4, 'heart_rate_zone_durations': {'heart_rate_z1_duration': 284, 'heart_rate_z2_duration': 751, 'heart_rate_z3_duration': 751, 'heart_rate_z4_duration': 12, 'heart_rate_z5_duration': 0}}
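
The zone durations above add up to 1798, close to the 1800-second workout duration, so they appear to be in seconds. Under that assumption, the share of time spent in each heart-rate zone can be sketched as follows; the function name `zone_shares` is my own.

```python
effort_zones = {
    'total_effort_points': 24.4,
    'heart_rate_zone_durations': {
        'heart_rate_z1_duration': 284, 'heart_rate_z2_duration': 751,
        'heart_rate_z3_duration': 751, 'heart_rate_z4_duration': 12,
        'heart_rate_z5_duration': 0,
    },
}


def zone_shares(zones):
    """Return each zone's fraction of the total recorded zone time."""
    durations = zones['heart_rate_zone_durations']
    total = sum(durations.values())
    if total == 0:  # no heart-rate data recorded
        return {name: 0.0 for name in durations}
    return {name: seconds / total for name, seconds in durations.items()}


shares = zone_shares(effort_zones)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```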


In [43]:
#Close the API request session

s.close()

In [62]:
#Create a list for each metric of interest

DurL=[]     # Duration
TOL=[]      # Total Output
MOL=[]      # Max Output
AOL=[]      # Average Output
MCL=[]      # Max Cadence
ACL=[]      # Average Cadence
MRL=[]      # Max Resistance
ARL=[]      # Average Resistance
MHRL=[]     # Max Heart Rate
AHRL=[]     # Average Heart Rate


for data in q_pw_pgL:
    
    #print(data)
    
    DurL.append(data['duration'])
    TOL.append(data['summaries'][0]['value'])
    MOL.append(data['metrics'][0]['max_value'])
    AOL.append(data['metrics'][0]['average_value'])
    MCL.append(data['metrics'][1]['max_value'])
    ACL.append(data['metrics'][1]['average_value'])
    MRL.append(data['metrics'][2]['max_value'])
    ARL.append(data['metrics'][2]['average_value'])
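    # Workouts recorded without a heart-rate monitor have no fifth metric
    try:
        MHRL.append(data['metrics'][4]['max_value'])
    except (IndexError, KeyError):
        MHRL.append(np.nan)
    try:
        AHRL.append(data['metrics'][4]['average_value'])
    except (IndexError, KeyError):
        AHRL.append(np.nan)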
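
A positional lookup such as `data['metrics'][4]` assumes that heart rate is always the fifth metric in the list. A more defensive alternative is to look a metric up by its `slug`. The sketch below shows the idea on a trimmed excerpt of the response printed earlier. The slugs `output` and `cadence` appear in that output; `heart_rate` is an assumed slug, since the printed output is cut off before the heart-rate entry.

```python
import math


def metric_field(graph, slug, field):
    """Return the given field of the metric whose slug matches.

    Returns NaN when the metric (or the field) is not present.
    """
    for metric in graph.get("metrics", []):
        if metric.get("slug") == slug:
            return metric.get(field, math.nan)
    return math.nan


# Excerpt of one response, trimmed to the fields used here
example = {"metrics": [
    {"slug": "output", "max_value": 228, "average_value": 138},
    {"slug": "cadence", "max_value": 111, "average_value": 82},
]}

print(metric_field(example, "output", "max_value"))
# 'heart_rate' is an assumed slug; this excerpt has no such metric, so NaN is returned
print(metric_field(example, "heart_rate", "max_value"))
```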

In [63]:
## Combine the lists into a Pandas dataframe (including max cadence, which was collected above)
pelo_stats = pd.DataFrame(zip(df_my_id['id'],df_my_id['peloton_id'],df_my_id['end_time'],DurL,TOL,MOL,AOL,MCL,ACL,MRL,ARL,AHRL,MHRL))

pelo_stats = pelo_stats.rename(columns={0: 'workout_id', 1: 'peloton_id', 2: 'date', 3: 'duration', 4: 'total_output',
5: 'max_output', 6: 'average_output', 7: 'max_cadence', 8: 'average_cadence', 9: 'max_resistance', 10: 'average_resistance',
11: 'average_heart_rate', 12: 'max_heart_rate'})


In [65]:
## Convert the date from UNIX time to datetime
pelo_stats['date'] = pd.to_datetime(pelo_stats['date'], unit='s')
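
With `unit='s'`, pandas treats the number as seconds since the Unix epoch and returns timezone-naive timestamps that represent UTC. The same conversion with the standard library, making the UTC time zone explicit, can be sketched as follows. The epoch value used here is an illustrative one, not taken from my data.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(epoch_to_utc(1609459200).isoformat())  # 2021-01-01T00:00:00+00:00
```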

In [66]:
print(pelo_stats)

                          workout_id                        peloton_id  \
0   b88214101694484286873f85dacf714d  9a92c52459ef42f0ae3639ef03b2361b   
1   deb6957a2baa45f88c75e926b4bf7af1  fb1d3ac0f5aa4a1090b302acbb512ee1   
2   223b95068acd4d92b76b9c7935d2017d  2213f5679bbd479f9d93c7de0344fcfd   
3   a682dbdda66e4ff29483d968acb59d00  ade0f46f36044962b7cc61bac51d2c06   
4   e10adacca609456989bbe373dd8da100  8ea67fdadb5e4f319f6d422d92cedf12   
5   c042b7fa590c4b169d36187fffba2648  9da5a0c42b1849a8ba706a8c7b315452   
6   fe371827309545fa9c49af3258df6593  1f469f192e3646218c61dfc6fb9962c8   
7   df83a80fa5ac466ea615326f4d76bef7  3ef7ce2bbdd74c1d8210365e323de93c   
8   b915ae179b7e436e8c4c833f12c2a1ae  05157117b2324eed83df30b6d4e0b1f1   
9   bcdbedc3fa4041738f607594eaa6fbcd  6abf5955aafa4c9e934d44cafc3ed0ee   
10  f9347eede5cc4d04812bea35bd25e509  46f0d10a19cd49cda9f3f69a6c7821d7   
11  90ebcd7605cc4c818611b6cafa2acc1c  8680008901dd49829c53439716514c8c   
12  cbb9357b9f4d43f6b50a103cd51a4084  

In [67]:
## Now I can export the final tibble to my Dropbox so that I can work with the data without having to query the API

pelo_stats.to_csv(path_or_buf=r'C:\Users\Dropbox\#########')