In [10]:
import os
from urllib.parse import urlencode

import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
import pandas as pd
import requests
import seaborn as sns
import urllib3
from dotenv import load_dotenv
from pandas import json_normalize

## Setup for Accessing Strava Data

When you create a [Strava App](https://www.strava.com/settings/api), the access token it provides does not have sufficient scope to retrieve activity data. The "/athlete/activities" API endpoint requires a token with scope "activity:read" or "activity:read_all", while a new Strava API application only gets a plain read token. The easiest way to obtain a token with the right scope is the following.

1) In a browser, make a request to http://www.strava.com/oauth/authorize using your clientID. Set scope=activity:read (or your desired scope) in the request. redirect_uri=http://localhost is sufficient here, since localhost is whitelisted by default and the redirect does not need to succeed.
2) After making the request, you will be taken to an OAuth page for your Strava API application. Grant it permission to log in with the requested scope.
3) View the URL you were redirected to (the browser itself will probably show nothing, since it tried to load localhost) and save the value of the code parameter in that URL.
4) Using this code, make a new request to https://www.strava.com/oauth/token with your clientID, clientSecret, code=your_code, and grant_type=authorization_code. This step can be done from a Python script.
5) If successful, the response contains a new access_token and a refresh_token, which can be saved wherever you keep secrets (this project uses a .env file that is .gitignored).
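
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, `response_type=code` is not mentioned in the steps and is included on the assumption that the authorize endpoint expects it (as is usual for the OAuth authorization-code flow), and the HTTP call is passed in as `http_post` so the function can be exercised without touching the network.

```python
from urllib.parse import urlencode

AUTHORIZE_URL = "http://www.strava.com/oauth/authorize"
TOKEN_URL = "https://www.strava.com/oauth/token"


def build_authorize_url(client_id, scope="activity:read"):
    """Return the URL to open in a browser (steps 1 to 3)."""
    params = {
        "client_id": client_id,
        "response_type": "code",  # assumption: required by the authorize endpoint
        "redirect_uri": "http://localhost",
        "scope": scope,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


def exchange_code_for_tokens(http_post, client_id, client_secret, code):
    """POST the one-time code to the token endpoint (step 4).

    `http_post` is any callable with the signature of `requests.post`.
    """
    response = http_post(
        TOKEN_URL,
        data={
            "client_id": client_id,
            "client_secret": client_secret,
            "code": code,
            "grant_type": "authorization_code",
        },
        timeout=30,
    )
    response.raise_for_status()
    # Step 5: the decoded body should contain access_token and refresh_token
    return response.json()
```

In the notebook this would be called as `exchange_code_for_tokens(requests.post, client_id, client_secret, code)`, with the three values read from the .env file or pasted in by hand.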

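The refresh token can be reused to obtain new access tokens, so the browser-based authorization only needs to be done once. The access token itself is short-lived (six hours, per Strava's authentication documentation) and is renewed by sending the refresh token to the same token endpoint. While roundabout, this appears to be the easiest way to handle authorization when working with Strava's API in an isolated environment such as a notebook, rather than in an active application with a server and browser to handle the OAuth requests.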
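
Should a stored access token ever be rejected, a new one can be requested from the same token endpoint without repeating the browser step. The sketch below assumes the endpoint accepts `grant_type=refresh_token` together with the saved refresh token; the function name and the injected `http_post` callable are hypothetical choices made for illustration.

```python
TOKEN_URL = "https://www.strava.com/oauth/token"


def refresh_access_token(http_post, client_id, client_secret, refresh_token):
    """Trade a saved refresh token for a fresh access token.

    `http_post` is any callable with the signature of `requests.post`.
    """
    response = http_post(
        TOKEN_URL,
        data={
            "client_id": client_id,
            "client_secret": client_secret,
            "grant_type": "refresh_token",
            "refresh_token": refresh_token,
        },
        timeout=30,
    )
    response.raise_for_status()
    # The body is expected to hold a new access_token; the refresh_token
    # it returns should replace the stored one in case it has changed.
    return response.json()
```

Passing `requests.post` as `http_post` performs the real request; the returned values would then be written back to the .env file.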

In [12]:
# Set Strava URL for accessing API
activities_url = 'https://www.strava.com/api/v3/athlete/activities'

In [9]:
# Load .env file contents
load_dotenv()

True

## Request all my Strava Activity Data

Using an access token with the appropriate scope for reading my activities from Strava, I create the request headers used for each API call.

In [None]:
header = {'Authorization': 'Bearer ' + os.getenv("STRAVA_ACCESS_TOKEN")}
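
If the variable is missing from the .env file, `os.getenv` returns `None` and the string concatenation above fails with a `TypeError` that does not name the cause. A slightly more defensive variant might look like this; `build_auth_header` is a hypothetical helper, not part of the original notebook.

```python
import os


def build_auth_header(env_var="STRAVA_ACCESS_TOKEN"):
    """Build the Authorization header, failing early if the token is absent."""
    token = os.getenv(env_var)
    if not token:
        raise RuntimeError(f"{env_var} is not set; check the .env file")
    return {"Authorization": f"Bearer {token}"}
```

It can be used as a drop-in replacement: `header = build_auth_header()`.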

The function below loops through the pages of Strava response data and adds each page to an output list. The 'athlete/activities' GET endpoint returns the items as a bare JSON list, without a wrapping key such as 'data', so the decoded JSON can be iterated over or appended to a list directly. Once all activities have been returned, the next page request comes back as an empty list, and iteration stops there.

Each request asks for 200 results per page, the maximum that can be configured, to keep the number of requests low and reduce the chance of hitting Strava's API limits.

A small check is added for the rate-limit error. An error response is a JSON object with a 'message' key rather than a list of activities, and because it is non-empty, the loop's truthiness test alone would treat it as another page of data.
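
The difference between the three kinds of response can be made explicit with a type check. This is only an illustration: `classify_response` is a hypothetical helper, and the sample payloads in the usage note are made up.

```python
def classify_response(payload):
    """Label a decoded activities response as 'page', 'empty', or 'error'."""
    if isinstance(payload, list):
        # A list is a normal page; an empty one marks the end of the data
        return "page" if payload else "empty"
    # Anything else (for example a dict with a 'message' key) is an error
    return "error"
```

For example, `classify_response([{"id": 1}])` gives `"page"`, `classify_response([])` gives `"empty"`, and `classify_response({"message": "Rate Limit Exceeded"})` gives `"error"`.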

Strava allows 100 read requests per 15 minutes and 1,000 read requests per day. That isn't an issue when dealing with just my activities, but could become one when pulling data for multiple athletes.
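
Remaining quota can also be tracked from the response headers rather than waiting for an error. The sketch below assumes Strava reports limits and usage as comma-separated "15-minute,daily" pairs in headers named `X-RateLimit-Limit` and `X-RateLimit-Usage`; those names, the helper, and the sample values in the usage note are assumptions to verify against Strava's rate-limit documentation.

```python
def remaining_quota(headers,
                    limit_key="X-RateLimit-Limit",
                    usage_key="X-RateLimit-Usage"):
    """Return the remaining 15-minute and daily requests from response headers.

    The header names are assumptions; each value is expected to look
    like "100,1000" (15-minute figure first, daily figure second).
    """
    limit_15, limit_day = (int(x) for x in headers[limit_key].split(","))
    used_15, used_day = (int(x) for x in headers[usage_key].split(","))
    return {
        "remaining_15min": limit_15 - used_15,
        "remaining_daily": limit_day - used_day,
    }
```

With a `requests` response this would be called as `remaining_quota(response.headers)`. For made-up headers `{"X-RateLimit-Limit": "100,1000", "X-RateLimit-Usage": "2,30"}` it returns 98 remaining in the 15-minute window and 970 for the day.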

In [6]:
# Function to request activities data
def loop_through_pages(page=1):
    """Request activity pages starting at `page` and return them as one list."""
    # create an empty list to store our combined pages of data in
    data = []
    while True:
        # Give some feedback
        print(f'You are requesting page {page} of your activities data ...')
        # request one page of up to 200 results
        response = requests.get(activities_url, headers=header, params={'per_page': 200, 'page': page})
        get_strava = response.json()

        # errors come back as a JSON object with a 'message' key, not a list
        if isinstance(get_strava, dict) and 'message' in get_strava:
            if get_strava['message'] == "Rate Limit Exceeded":
                print("Rate Limit Exceeded, please wait before retrying")
            else:
                print(f"Request failed: {get_strava['message']}")
            break

        # an empty page means every activity has been returned
        if not get_strava:
            break
        # add our responses to the data list
        data.extend(get_strava)
        # increment the page
        page += 1
    # return the combined results of our get requests
    return data

# call the function to loop through our strava pages and set the starting page at 1
my_dataset = loop_through_pages(1)

You are requesting page 1 of your activities data ...
You are requesting page 2 of your activities data ...


In [13]:
print(f'Found {len(my_dataset)} activities!')

Found 106 activities!


## Converting Activities into a proper dataset