# Peloton API Walkthrough

While there is no official Peloton API documentation, the Peloton community has shared its own packages (e.g. peloton-client-library, pelotonR) and documentation (peloton-unofficial-api).

These resources are somewhat dated, so I've opted to dig into the raw output of API calls using the `requests` package.

In [1]:
import requests
import json
import pandas as pd
import os

## Step 1. Authenticate

To access the Peloton API, you'll need to authenticate with your username (or email) and password. The endpoint for authentication is https://api.onepeloton.com/auth/login.

Here's how to authenticate:

* Prepare the request: Create a JSON object with your username (or email) and password.
* Make the request: Use the requests library to send a POST request to the authentication endpoint with the user credentials.
* Check the response: If authentication succeeds, the response is a JSON object that contains your `user_id`. Sending the request through a `requests.Session` keeps the login cookies, so later requests made with the same session are authorized.



In [2]:
peloton_username = os.environ.get('peloton_user_name') 
peloton_password = os.environ.get('peloton_password')

api_base_url = 'https://api.onepeloton.com'
path_auth = '/auth/login'

params_auth_query = {'username_or_email': peloton_username, 'password': peloton_password}

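# A Session keeps the login cookies, so later requests made through `s` stay authenticated.
s = requests.Session()
response = s.post(api_base_url + path_auth, json=params_auth_query)
response.raise_for_status()  # stop early if the login request failed

userID = response.json()['user_id']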

Your username, password, and session credentials are sensitive information. Do not share them publicly, and always follow best practices for storing credentials. I am using environment variables to store and access the credentials.
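As a minimal sketch of the environment-variable approach, the helper below reads both values and fails early with a clear message if either one is missing. The function name `load_peloton_credentials` and its `env` parameter are hypothetical; the variable names match the ones used in the cell above.

```python
import os


def load_peloton_credentials(env=None):
    """Return (username, password) read from environment-style variables.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    names = ('peloton_user_name', 'peloton_password')
    # Collect every missing or empty variable so the error names all of them at once.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variable(s): ' + ', '.join(missing))
    return env['peloton_user_name'], env['peloton_password']
```

Failing early this way avoids sending a login request with `None` as the username or password.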

## Step 2. Data Extraction

Now that you have an authenticated session and your user ID, let's start exploring the Peloton API.

A great place to begin is the /api/user/{user_ID}/overview endpoint. This endpoint provides an overview of your workout history, including the total number of workouts by type, earned achievements, and personal records.

Here's how to use the /api/user/{user_ID}/overview endpoint:

* Construct the URL: Replace {user_ID} with your actual Peloton user ID in the URL.
* Make the request: Use the authenticated session to send a GET request to the endpoint. The session already carries the login cookies, so the only extra header in this example is `Peloton-Platform`.

In [3]:
path_user_overview = f'/api/user/{userID}/overview'
headers = {
    'Peloton-Platform': 'web'
}

response = s.get(api_base_url + path_user_overview, headers = headers)

Explore the response: The API will return a JSON response containing information about your overall workout history.

In [4]:
response_json = json.loads(response.text)
response_json.keys()

dict_keys(['id', 'workout_counts', 'personal_records', 'streaks', 'achievement_counts'])

From the top-level structure, we can see that the response includes data like:

* `Total Workout Counts`: A breakdown of the total number of workouts you've completed across different disciplines (e.g., cycling, strength, yoga). See which categories dominate your fitness routine!
* `Personal Records`: This section reveals your personal records (PRs) for each workout type. (Note: PRs are only available if the workout was completed on a Peloton device, so my data focuses solely on cycling.)
* `Streaks`: Discover your dedication! This section highlights your daily and weekly workout streaks, showcasing your commitment to consistent exercise.
* `Achievements`: Ever wonder how many times you've snagged that 7-day streak badge? This section reveals all your hard-earned Peloton achievements, complete with descriptions, image URLs, and the number of times you've unlocked each one.
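Before reshaping anything, it can help to check which sections are dicts and which are lists. The sketch below does that for a small, hypothetical stand-in for the overview response; `summarize_structure` and `sample_overview` are illustrative names, and the sample values are trimmed placeholders rather than the full payload.

```python
def summarize_structure(payload):
    """Map each top-level key to a short description of the value stored under it."""
    summary = {}
    for key, value in payload.items():
        if isinstance(value, dict):
            summary[key] = 'dict with keys ' + ', '.join(sorted(value))
        elif isinstance(value, list):
            summary[key] = f'list of {len(value)} item(s)'
        else:
            summary[key] = type(value).__name__
    return summary


# Hypothetical, heavily trimmed stand-in for the overview response.
sample_overview = {
    'id': 'abc123',
    'workout_counts': {'workouts': [{'name': 'Cycling', 'count': 749}]},
    'personal_records': [{'records': []}],
    'streaks': {'current_weekly': 114},
}
structure = summarize_structure(sample_overview)
```

A summary like this shows at a glance that `personal_records` is a list while `workout_counts` and `streaks` are dicts, which determines how each one is turned into a dataframe.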

To make this data easier to analyze, we'll need to transform it into a more structured format. This involves:

* Converting to Tabular Format: We'll convert the JSON response into a tabular format, like a dataframe, which is ideal for analysis and manipulation.
* Data Type Coercion: We'll ensure that data entries are in the correct format. For example, we'll convert string representations of dates and times into datetime objects for easier manipulation and analysis.

For example, to show achievement info in tabular format, we first convert it to a dataframe and then unnest the `template` entries. The cells below apply the same idea to each section of the response.



In [5]:
df_workout_counts = pd.DataFrame(response_json['workout_counts']['workouts'])
df_workout_counts

Unnamed: 0,name,slug,count,icon_url,workout_name
0,Cycling,cycling,749,https://s3.amazonaws.com/static-cdn.pelotoncyc...,Cycling
1,Stretching,stretching,473,https://s3.amazonaws.com/static-cdn.pelotoncyc...,Stretching
2,Strength,strength,394,https://s3.amazonaws.com/static-cdn.pelotoncyc...,Strength
3,Walking,walking,66,https://s3.amazonaws.com/static-cdn.pelotoncyc...,Walking
4,Rowing,caesar,31,https://s3.amazonaws.com/static-cdn.pelotoncyc...,Rowing
5,Meditation,meditation,23,https://s3.amazonaws.com/static-cdn.pelotoncyc...,Meditation
6,Cardio,cardio,17,https://s3.amazonaws.com/static-cdn.pelotoncyc...,Cardio
7,Running,running,10,https://s3.amazonaws.com/static-cdn.pelotoncyc...,Running
8,Bike Bootcamp,bike_bootcamp,1,https://s3.amazonaws.com/static-cdn.pelotoncyc...,Bike Bootcamp
9,Yoga,yoga,1,https://s3.amazonaws.com/static-cdn.pelotoncyc...,Yoga


In [6]:
df_personal_records = pd.DataFrame(response_json['personal_records'][0]['records'])

# workout_id is removed from the view as it is personal info.
columns_to_exclude = ['workout_id']
columns_to_include = [col for col in df_personal_records.columns if col not in columns_to_exclude]

df_personal_records[columns_to_include].head(5)

Unnamed: 0,name,slug,value,raw_value,unit,unit_slug,workout_date
0,45 min,2700,883,882694.27,kj,kj,2024-05-06T09:57:46.552438
1,60 min,3600,1056,1056107.31,kj,kj,2024-09-24T20:20:52.210217
2,75 min,4500,587,587051.0,kj,kj,2024-03-09T07:27:46.642128
3,90 min,5400,1554,1553594.02,kj,kj,2024-09-01T20:10:55.379911
4,5 min,300,71,71090.35,kj,kj,2021-06-27T09:48:37


In [7]:
df_streaks = pd.DataFrame([response_json['streaks']])
df_streaks

Unnamed: 0,current_weekly,best_weekly,start_date_of_current_weekly,current_daily,start_date_of_current_daily
0,114,111,1669792908,0,


In [8]:
# Build a dataframe from the achievements list, then expand the nested `template` dict into columns.
df_achievements = pd.DataFrame(response_json['achievement_counts']['achievements'])
achievement_template_norm = pd.json_normalize(df_achievements['template'])
df_achievements = pd.concat([df_achievements.drop(columns=['template']), achievement_template_norm], axis=1)

# Show the ten most frequently earned achievements.
df_achievements.sort_values('count', ascending=False).head(10)

Unnamed: 0,count,id,name,slug,image_url,description,animated_image_url,kinetic_token_background
118,302,657e50c747d6458480f1ba6a0fa94c6a,Dynamic Duo,two_to_tango,https://s3.amazonaws.com/peloton-achievement-i...,Awarded for working out with a friend.,,
121,208,3a9ea8169d17455c86b9f52b1011e57b,Squad,socialite,https://s3.amazonaws.com/peloton-achievement-i...,Awarded for working out with 5 friends.,,
120,177,f0bdc95051b64c5bbd296e20d3fecb03,Pack,3_squad,https://s3.amazonaws.com/peloton-achievement-i...,Awarded for working out with 3 friends.,,
119,165,1b0f7ba0b9e945e88c93792484995c00,Three's Company,threes_company,https://s3.amazonaws.com/peloton-achievement-i...,Awarded for working out with 2 friends.,,
122,114,7a68d49d95ce4918b7408b26f91d9eac,Flock,10_flock,https://s3.amazonaws.com/peloton-achievement-i...,Awarded for working out with 10 friends.,,
207,98,df98e7119e4b478ea02494f22c004fe6,Movement Tracker Gold,movement_tracker_gold,https://s3.amazonaws.com/peloton-achievement-i...,Awarded for getting Movement Tracker credit fo...,,
0,93,bac5aefabb2940ba8f0a170fc9d63bf0,Best Output,best_output,https://s3.amazonaws.com/peloton-achievement-i...,Personal best output in a workout.,,
98,64,5298b832e2274ad59cf8857240440fb2,3-Day Streak,3_day_streak,https://s3.amazonaws.com/peloton-achievement-i...,Awarded for working out 3 days in a row.,,
29,42,4036702255e84f26bee944331ef92310,Artist Series,artist_series,https://s3.amazonaws.com/peloton-achievement-i...,Awarded for participating in an Artist Series ...,,
123,38,7bbfe8e58f6744b696b104acf2ecaa72,Swarm,20_swarm,https://s3.amazonaws.com/peloton-achievement-i...,Awarded for working out with 20 friends.,,
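To make the unnesting step easier to follow in isolation, here is a minimal sketch on a made-up sample. The two entries are hypothetical and only mimic the shape seen above: a `count` plus a nested `template` dict.

```python
import pandas as pd

# Hypothetical sample shaped like the achievements list: each entry has a count
# and a nested `template` dict.
sample_achievements = [
    {'count': 3, 'template': {'id': 'a1', 'name': 'Sample Badge', 'slug': 'sample_badge'}},
    {'count': 1, 'template': {'id': 'b2', 'name': 'Other Badge', 'slug': 'other_badge'}},
]

df_sample = pd.DataFrame(sample_achievements)

# json_normalize turns each nested dict into one row with one column per key.
template_columns = pd.json_normalize(df_sample['template'].tolist())

# Drop the nested column and attach the flattened columns side by side.
df_flat = pd.concat([df_sample.drop(columns=['template']), template_columns], axis=1)
```

Both frames share the same default integer index, which is what lets `pd.concat(..., axis=1)` line the rows up correctly.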


## Step 3. Data Type Coercion

As we review the dataframes, a few common data type issues make the data harder to read. Many columns are passed as strings, so we'll need to change the types of date and numeric columns. Some date fields, such as `start_date_of_current_weekly`, are passed as UNIX timestamps and need to be converted to datetimes to be easily interpretable.
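As a small illustration of the UNIX timestamp case, the sketch below converts the two streak start dates shown earlier. It assumes the values are seconds since the epoch (UTC), which is a guess based on their magnitude, and it models the empty daily start date as `None`.

```python
import pandas as pd

# Values copied from the streaks output; the empty daily start date is modeled as None.
df_streak_dates = pd.DataFrame({
    'start_date_of_current_weekly': [1669792908],
    'start_date_of_current_daily': [None],
})

for col in df_streak_dates.columns:
    # Coerce to numbers first so missing entries become NaN instead of raising.
    seconds = pd.to_numeric(df_streak_dates[col], errors='coerce')
    # Assumption: the API reports seconds since the epoch, hence unit='s'.
    df_streak_dates[col] = pd.to_datetime(seconds, unit='s')
```

Under that assumption, 1669792908 corresponds to 2022-11-30 07:21:48 UTC, and the missing value becomes `NaT`.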

In [9]:
def coerce_columns(df, type_dict, date_format=None, date_unit=None):
  """
  Coerces columns in a DataFrame to specified types, including date conversion.

  Args:
    df: The pandas DataFrame (modified in place and returned).
    type_dict: A dictionary mapping column names to data types.
               Use 'datetime' for columns that should be parsed as dates.
    date_format: (Optional) A string specifying the date format 
                 if any columns need to be converted to datetime.
    date_unit: (Optional) The epoch unit (e.g. 's' or 'ms') for columns
               stored as UNIX timestamps. Ignored when date_format is given.

  Returns:
    The DataFrame with coerced columns.
  """
  for col, col_type in type_dict.items():
    try:
      if col_type == 'datetime':
        # Prefer an explicit format, then an epoch unit; otherwise let pandas infer.
        if date_format:
          df[col] = pd.to_datetime(df[col], format=date_format)
        elif date_unit:
          df[col] = pd.to_datetime(df[col], unit=date_unit)
        else:
          df[col] = pd.to_datetime(df[col])
      else:
        df[col] = df[col].astype(col_type)
    except (ValueError, TypeError) as e:
      # Report the failure and leave the column unchanged.
      print(f"Error converting column {col} to {col_type}: {e}")
  return df

In [10]:
col_type_personal_records = {
    'slug':'int',
    'value':'int',
    'raw_value':'float',
    'workout_date': 'datetime'
}

df_personal_records = coerce_columns(df_personal_records, col_type_personal_records, date_unit = 'mixed')

df_personal_records.sort_values('slug').head(5)

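Error converting column workout_date to datetime: non convertible value 2024-05-06T09:57:46.552438 with the unit 'mixed', at position 0

The conversion fails because `unit` expects an epoch unit such as `'s'` or `'ms'`, while `'mixed'` is a value for the `format` argument. The `workout_date` values are ISO 8601 strings, so passing `date_format='ISO8601'` (pandas 2.0+) instead of `date_unit='mixed'` would parse them. As a result of the error, `workout_date` is left as text in the output below; the numeric columns are converted as expected.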


Unnamed: 0,name,slug,value,raw_value,unit,unit_slug,workout_id,workout_date
4,5 min,300,71,71090.35,kj,kj,fbdc1a79aa3742aa93f72ab821bfdd53,2021-06-27T09:48:37
5,10 min,600,148,147537.31,kj,kj,867b9f0e534f4eb99f8165fdb3a603ac,2024-04-29T09:41:50.385748
6,15 min,900,173,172514.45,kj,kj,6a574765b50645688e574ecb485c119d,2021-01-07T21:03:53
7,20 min,1200,460,460290.48,kj,kj,b8a1fc94d9824a11b31927c44c76ac6e,2024-09-08T11:46:38.416654
8,30 min,1800,572,571577.48,kj,kj,7700367be5464034b64a2d405ab24c29,2024-06-23T20:08:42.478000
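The `workout_date` values are ISO 8601 strings with differing precision: some carry fractional seconds and some do not. A minimal sketch of parsing them, assuming pandas 2.0 or later (where `format='ISO8601'` is available), could look like this:

```python
import pandas as pd

# Two values copied from the workout_date column: one with fractional seconds, one without.
workout_dates = pd.Series(['2024-05-06T09:57:46.552438', '2021-06-27T09:48:37'])

# format='ISO8601' (pandas 2.0+) accepts ISO 8601 strings that differ in precision.
parsed_dates = pd.to_datetime(workout_dates, format='ISO8601')
```

The result is a datetime column, so sorting, filtering by date range, and date arithmetic behave as expected.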


## Next Walkthrough

We've successfully accessed the Peloton API and retrieved valuable data. But as we explore more endpoints, we'll likely encounter similar patterns in our code. Repeating the same code blocks for different API calls can quickly become tedious and inefficient.

In the next walkthrough, I'll transform some of our existing code into reusable functions, making our Peloton data exploration more efficient and enjoyable.