---

# Download Activity Summary and Details from Strava



Strava provides basic information on how to access the API at https://developers.strava.com/docs/getting-started/. That page, combined with https://towardsdatascience.com/using-the-strava-api-and-pandas-to-explore-your-activity-data-d94901d9bfde and other websites, gave me what I needed to write the code that connects to Strava.

With some additional research I was able to download all my activities and their summary data. However, in all my online searching I could not find any good examples of how to download the detailed activity streams associated with each Strava activity, so the code to download activity details was mostly written from scratch.

**The value of this notebook is:**
   - example code to authenticate and connect to the Strava API (follow Strava's instructions to create an app)
   - download all activities and their associated summaries to a dataframe
   - download all 11 detailed activity streams for each activity to a dataframe
   - a single notebook that authenticates to Strava, downloads activity summaries and downloads activity details
    
**Additional useful features:**
   - controls the number of requests so they don't exceed Strava's default limit (100 requests per 15 min)
   - stores all downloaded activity details in pkl and csv format
   - loads previously downloaded activity details and then only downloads details for new activities
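The request limits mentioned above can also be checked at run time instead of only by waiting. A minimal sketch, assuming Strava's `X-RateLimit-Limit` and `X-RateLimit-Usage` response headers each carry a comma-separated `"<15-minute>,<daily>"` pair as described in Strava's rate-limit documentation; `parse_rate_limit` and `requests_left` are hypothetical helpers, not part of this notebook, and the sample header values are made up:

```python
def parse_rate_limit(headers):
    """Return ((limit_15min, limit_daily), (used_15min, used_daily)) from Strava-style headers."""
    # assumed header format: "short-term,long-term", e.g. "100,1000"
    limit = tuple(int(x) for x in headers['X-RateLimit-Limit'].split(','))
    usage = tuple(int(x) for x in headers['X-RateLimit-Usage'].split(','))
    return limit, usage


def requests_left(headers):
    """Requests still allowed in the current 15-minute window and in the current day."""
    (limit_15, limit_day), (used_15, used_day) = parse_rate_limit(headers)
    return limit_15 - used_15, limit_day - used_day


# Example with made-up header values
sample_headers = {'X-RateLimit-Limit': '100,1000', 'X-RateLimit-Usage': '37,412'}
print(requests_left(sample_headers))  # (63, 588)
```

With a real response, `requests_left(res.headers)` should work the same way, since `requests` exposes response headers as a case-insensitive dict.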
    



###### NEXT STEPS: 
  - Create a separate notebook that loads the pkl file with activity details
      - run basic analysis and create visualizations on the activity details
      - run machine learning models on the activity details to find patterns in heartrate and wattage numbers
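A first step toward that analysis notebook might look like the sketch below. It assumes the pickled dataframe has the `id`, `type`, `heartrate` and `watts` columns that `activity_streams()` creates, and uses a small made-up dataframe in place of `pd.read_pickle('activities_details.pkl')`:

```python
import pandas as pd

# Made-up stand-in for pd.read_pickle('activities_details.pkl')
details = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'type': ['Ride', 'Ride', 'Ride', 'Run', 'Run'],
    'heartrate': [120, 130, 140, 150, 160],
    'watts': [180, 200, 220, None, None],  # no power meter on the run
})

# Average heart rate and power per activity (NaN where a stream is missing)
per_activity = details.groupby(['id', 'type'])[['heartrate', 'watts']].mean()
print(per_activity)
```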

---

**Written by:  Sheraz Choudhary**

**Date:        November 2021** 

---

In [None]:
# import libraries

import requests

import numpy as np
import pandas as pd

import os
from os.path import join
from dotenv import load_dotenv

import time

import matplotlib.pyplot as plt
%matplotlib inline

# from fitparse import FitFile # (http://johannesjacob.com/2019/03/13/analyze-your-cycling-data-python/)
# import fitparse

In [None]:
# display up to 100 columns and 100 rows

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)

## Connect to Strava -- Get Current Access Token

In [None]:
# (https://www.realpythonproject.com/3-ways-to-store-and-read-credentials-locally-in-python/)
credential_file = join(os.getcwd(), 'strava-credentials.env')

load_dotenv(credential_file)
client_id = os.environ.get('client_id')
client_secret = os.environ.get('client_secret')
refresh_token = os.environ.get('refresh_token')
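`load_dotenv` reads `KEY=value` lines, so `strava-credentials.env` is assumed to hold lines such as `client_id=12345`, plus matching `client_secret` and `refresh_token` lines. If a key is missing, `os.environ.get` quietly returns `None` and the problem only shows up later at the token request. A small hypothetical check, `missing_credentials`, catches that earlier:

```python
import os

REQUIRED_KEYS = ('client_id', 'client_secret', 'refresh_token')


def missing_credentials(env=os.environ):
    """Return the names of required Strava credentials that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Example: a mapping that lacks the refresh token
print(missing_credentials({'client_id': '12345', 'client_secret': 'abc'}))  # ['refresh_token']
```

In the notebook, calling `missing_credentials()` with no argument checks `os.environ` after `load_dotenv` has run.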

In [None]:
# (https://github.com/franchyze923/Code_From_Tutorials/blob/master/Strava_Api/strava_api.py)

auth_url = "https://www.strava.com/oauth/token"

payload = {
    'client_id': client_id,
    'client_secret': client_secret,
    'refresh_token': refresh_token,
    'grant_type': "refresh_token",
    'f': 'json'
}

print("Requesting Token...\n")
res = requests.post(auth_url, data=payload)  # keep TLS certificate verification on
res.raise_for_status()  # stop here if Strava rejected the credentials
access_token = res.json()['access_token']
print("Access Token = {}\n".format("Received!"))
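The token response is assumed (per Strava's OAuth documentation) to also include `expires_at`, a Unix timestamp in seconds, and a `refresh_token` that can change when a new access token is issued. A sketch of a hypothetical expiry check that could decide when to repeat the refresh request above:

```python
import time


def token_is_expired(expires_at, now=None, margin=60):
    """True if a token expiring at `expires_at` (Unix seconds) is expired or within `margin` s of expiry."""
    if now is None:
        now = time.time()
    return now >= expires_at - margin


# A token expiring 30 s from "now" counts as expired with the default 60 s margin
print(token_is_expired(expires_at=1_000_030, now=1_000_000))  # True
print(token_is_expired(expires_at=1_003_600, now=1_000_000))  # False
```

For example, `token_is_expired(res.json()['expires_at'])` right after the request. Saving a rotated refresh token back to the `.env` file is not shown.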

## Create Dataframe with Summary Info for All Activities *(Strava API)*

In [None]:
%%time

# (http://www.hainke.ca/index.php/2018/08/23/using-the-strava-api-to-retrieve-activity-data/)

url = "https://www.strava.com/api/v3/athlete/activities"
headers = {'Authorization': 'Bearer ' + access_token}  # send the token in a header, not the URL
page = 1
all_activities = []

while True:

    # get page of activities from Strava (200 per page)
    res = requests.get(url, headers=headers, params={'per_page': 200, 'page': page})
    res.raise_for_status()
    page_json = res.json()

    # if no results then exit loop
    if not page_json:
        break

    all_activities.extend(page_json)

    # increment page
    page += 1

# flatten the nested JSON (e.g. map.summary_polyline) into columns, one row per activity
activities_overview = pd.json_normalize(all_activities)  # (https://stackoverflow.com/questions/21104592/)
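The loop above relies on the API returning an empty list once the requested page is past the last activity. That stop condition can be isolated in a small generator and tested without network access; `iter_pages` and the fake page data are illustrative, not part of the notebook:

```python
def iter_pages(fetch_page, start=1):
    """Yield items from successive pages until a page comes back empty."""
    page = start
    while True:
        items = fetch_page(page)
        if not items:  # an empty page means there is nothing left to fetch
            return
        yield from items
        page += 1


# A fake fetcher standing in for the Strava request: two pages of data, then empty
fake_pages = {1: [{'id': 1}, {'id': 2}], 2: [{'id': 3}]}
activities = list(iter_pages(lambda p: fake_pages.get(p, [])))
print([a['id'] for a in activities])  # [1, 2, 3]
```

Against the real API, `fetch_page` would wrap the `requests.get(...).json()` call for one page.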

In [None]:
activities_overview = activities_overview.sort_values(by='id', ascending=True)  # oldest first, so new activities end up at the bottom

In [None]:
activities_overview.tail(5)

In [None]:
print("Number of Strava Activities Found: ", len(activities_overview))

In [None]:
activities_overview.to_csv('activities_overview.csv', header=True)

## Create Dataframe with Detailed Activity Data Streams for New Activities *(Strava API)*

In [None]:
def activity_streams(activity_id):
    """Download the detailed data streams for one activity into a dataframe (one row per sample)."""
    a_url = "https://www.strava.com/api/v3/activities/" + str(activity_id) + "/streams"
    headers = {'Authorization': 'Bearer ' + access_token}

    streams_list = ['time', 'distance', 'latlng', 'altitude', 'velocity_smooth', 'heartrate',
                    'cadence', 'watts', 'temp', 'moving', 'grade_smooth']

    res = requests.get(a_url, headers=headers,
                       params={'keys': ','.join(streams_list), 'key_by_type': 'true'})
    # a failed request (e.g. an activity without streams) is treated as "no streams"
    streams_json = res.json() if res.ok else {}

    # with key_by_type=true the response is a dict keyed by stream type: {'time': {'data': [...]}, ...}
    a_df = pd.DataFrame({s: pd.Series(streams_json[s]['data'])
                         for s in streams_list if s in streams_json})
    # streams the activity does not have (e.g. no power meter) become NaN columns
    a_df = a_df.reindex(columns=streams_list)

    # attach summary info for this activity
    summary = activities_overview.loc[activities_overview['id'] == activity_id].iloc[0]
    a_df['id'] = activity_id
    a_df['date'] = summary['start_date_local']
    a_df['name'] = summary['name']
    a_df['type'] = summary['type']

    return a_df
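With `key_by_type=true`, the streams endpoint is assumed to return a dict keyed by stream type, each entry holding a `data` list of samples. The conversion to a dataframe can be tested offline on a hand-made response; `streams_to_df`, the shortened stream list and the sample values are all hypothetical:

```python
import pandas as pd

STREAMS = ['time', 'distance', 'latlng', 'heartrate', 'watts']


def streams_to_df(streams_json, streams_list=STREAMS):
    """Turn a key_by_type streams response into one row per sample; absent streams become NaN."""
    columns = {key: pd.Series(streams_json[key]['data'])
               for key in streams_list if key in streams_json}
    return pd.DataFrame(columns).reindex(columns=streams_list)


# Hand-made response in the assumed shape (no power meter, so no 'watts' stream)
sample = {
    'time': {'data': [0, 1, 2]},
    'distance': {'data': [0.0, 4.1, 8.3]},
    'latlng': {'data': [[45.1, -75.2], [45.1, -75.2], [45.2, -75.2]]},
    'heartrate': {'data': [98, 101, 105]},
}
df = streams_to_df(sample)
print(df.shape)  # (3, 5)
```

An activity with no streams at all yields an empty dataframe that still has every expected column.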

### Load Already Downloaded Activity Details if Present

In [None]:
%%time

try:
    activities_details = pd.read_pickle('activities_details.pkl')

except FileNotFoundError:  # first run: nothing has been downloaded yet
    activities_details = pd.DataFrame()

In [None]:
activities_details.info()

In [None]:
activities_details.tail()

### Download only Details for New Activities

In [None]:
%%time

#(https://thispointer.com/pandas-check-if-a-value-exists-in-a-dataframe-using-in-not-in-operator-isin/)

# ids that already have details (an empty dataframe has no 'id' column yet)
if 'id' in activities_details.columns:
    a_already_downloaded = set(activities_details['id'].unique())
else:
    a_already_downloaded = set()

# keep only activities whose details have not been downloaded yet
a_details_to_import = [a for a in activities_overview['id'] if a not in a_already_downloaded]

print("Number of Activities to Import:  " + str(len(a_details_to_import)))
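The same filtering can be written with pandas `Series.isin`, the approach described in the thispointer article referenced in the cell above. A sketch on small stand-in dataframes (the ids and values are invented):

```python
import pandas as pd

# Tiny stand-ins for activities_overview and activities_details
overview = pd.DataFrame({'id': [101, 102, 103, 104]})
details = pd.DataFrame({'id': [101, 101, 103], 'time': [0, 1, 0]})

# ids present in the overview but never seen in the details dataframe
already = details['id'] if 'id' in details.columns else pd.Series(dtype='int64')
new_ids = overview.loc[~overview['id'].isin(already), 'id'].tolist()
print(new_ids)  # [102, 104]
```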

In [None]:
# Activities without any detail streams (such as manually entered ones) will always show up here, because no details are ever added for them

a_details_to_import

In [None]:
%%time

batch_size = 90       # 90 rather than 100 requests per batch to be safe
wait_seconds = 1000   # 16 min 40 s between batches, comfortably over the 15 min window
new_details = []

for start in range(0, len(a_details_to_import), batch_size):
    if start > 0:
        print('Waiting...')
        time.sleep(wait_seconds)

    batch = a_details_to_import[start:start + batch_size]
    print('Downloading activities ' + str(start) + ' to ' + str(start + len(batch) - 1) + ' ...')

    for a in batch:
        print('Downloading activity ', a)
        new_details.append(activity_streams(a))

# combine previously loaded details with the newly downloaded ones
activities_details = pd.concat([activities_details] + new_details, ignore_index=True)

print('Done getting details for all new activities.')
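Splitting the list into fixed-size batches from a single step variable guarantees that consecutive batches neither skip nor repeat an activity. A small sketch of that arithmetic; `batches` is a hypothetical helper and the 200 ids are made up:

```python
def batches(items, size):
    """Split `items` into consecutive lists of at most `size` elements, with no gaps or overlaps."""
    return [items[i:i + size] for i in range(0, len(items), size)]


ids = list(range(200))
chunks = batches(ids, 90)
print([len(c) for c in chunks])  # [90, 90, 20]

# every batch after the first is preceded by a 1000 s wait
total_wait = (len(chunks) - 1) * 1000
print(total_wait)  # 2000
```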

In [None]:
display(activities_details.head(2))
display(activities_details.tail(2))

In [None]:
print("Number of Rows (data points) for all Activities: ", len(activities_details))

In [None]:
print("Sum of Moving Time (seconds):   ", activities_overview['moving_time'].sum())
print("")
print("Sum of Elapsed Time (seconds):  ", activities_overview['elapsed_time'].sum())
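Strava reports `moving_time` and `elapsed_time` in seconds, so the sums are easier to read as durations. A sketch using `pd.to_timedelta` on a small made-up stand-in for `activities_overview`:

```python
import pandas as pd

# Made-up stand-in for activities_overview (times in seconds)
overview = pd.DataFrame({'moving_time': [3600, 5400, 1800], 'elapsed_time': [4000, 6000, 1900]})

moving = pd.to_timedelta(overview['moving_time'].sum(), unit='s')
elapsed = pd.to_timedelta(overview['elapsed_time'].sum(), unit='s')
print("Sum of Moving Time:   ", moving)   # 0 days 03:00:00
print("Sum of Elapsed Time:  ", elapsed)  # 0 days 03:18:20
print("Moving share of elapsed time:  {:.1%}".format(moving / elapsed))
```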

In [None]:
activities_details.to_pickle('activities_details.pkl')

In [None]:
%%time

activities_details.to_csv('activities_details.csv', header=True)
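The pickle keeps the `[lat, lng]` lists in the `latlng` column intact, while the CSV stores each one as text. A sketch of reading such a column back into lists with `ast.literal_eval`, using an in-memory buffer and two invented rows instead of `activities_details.csv`:

```python
import ast
import io

import pandas as pd

# Two invented rows shaped like activities_details
details = pd.DataFrame({'id': [1, 1], 'latlng': pd.Series([[45.1, -75.2], [45.2, -75.3]])})

buffer = io.StringIO()
details.to_csv(buffer, index=False)
buffer.seek(0)

# CSV holds each list as a string like "[45.1, -75.2]"; parse it back into a Python list
restored = pd.read_csv(buffer, converters={'latlng': ast.literal_eval})
print(restored['latlng'][0])  # [45.1, -75.2]
```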