INITIAL ONE-TIME SETUP
1. Go to https://www.strava.com/settings/api
2. Create an application
    2a. "Authorization Callback Domain" needs to be "localhost" and "Website"
    should be https://www.strava.testapp.com. Everything else can be whatever
    you want.
    2b. Thoroughly read the terms and conditions ;) then check the box saying
    you've read and agreed to them and click "Create".
    2c. Add an image (can be anything).
3. After adding an image, you should see a Strava page titled "My API
   Application". Copy the "Client ID" and "Client Secret" from that page and
   paste them into the box underneath this one, then run the box with the play
   button on the left.
   The Access Token and Refresh Token on this page are not useful for
   what we want to do, so we must take extra steps to get special "Read All"
   tokens.

In [None]:
# <------ press the play button in the upper left corner when finished
CLIENT_ID = 'YOUR CLIENT ID'
CLIENT_SECRET = 'YOUR CLIENT SECRET'

print("Paste this link into your browser's address bar and press Enter.")
print(f"https://www.strava.com/oauth/authorize?client_id={CLIENT_ID}&redirect_uri=http://localhost&response_type=code&scope=activity:read_all")
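
For the curious, the link printed above is just the Strava authorization page
plus four query parameters. Here is a minimal sketch of how that link could be
assembled with the standard library instead of an f-string. The function name
build_authorize_url and the client ID "12345" are made up for illustration;
the parameter names and values are the ones used in the box above.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_authorize_url(client_id, redirect_uri="http://localhost",
                        scope="activity:read_all"):
    """Assemble the authorization link from its query parameters."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": scope,        # "activity:read_all" is what makes the tokens "Read All"
    })
    return f"https://www.strava.com/oauth/authorize?{query}"


# "12345" is a made-up client ID, so this is NOT a link you should open.
example_url = build_authorize_url("12345")
print(example_url)

# Decoding the query string gives back the original values.
print(parse_qs(urlparse(example_url).query))
```

urlencode percent-encodes characters like ":" and "/", so the printed link
looks a little different from the one above, but it decodes to the same
parameters.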

4. Copy the link produced by the box above, paste it into your browser's
   address bar, and hit Enter.
5. On the page that follows, press the large orange "Authorize" button. The
   webpage will error after this, but that is okay.
6. The webpage URL should look something like:
    http://localhost/?state=&code=[YOUR_CODE]&scope=read,activity:read_all
    Copy the series of characters following "code=" and before the "&", then
    paste the code into the box below where it says 'YOUR CODE' and press play
    on that box. Note: THIS CODE CAN ONLY BE USED ONCE. If the box below gives
    an error after you have already run it once, repeat steps 4, 5, and 6 to
    get a fresh code.
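
If picking the code out of the URL by hand feels error-prone, something like
the sketch below can do it for you. This is only an illustration: the function
name extract_code and the example URL (with its fake code "abc123def456") are
made up, and you would paste your own full localhost URL instead.

```python
from urllib.parse import urlparse, parse_qs


def extract_code(redirect_url):
    """Return the value of the "code" query parameter, or None if it is absent."""
    params = parse_qs(urlparse(redirect_url).query)
    values = params.get("code")
    return values[0] if values else None


# Made-up example URL; a real one carries your own one-time code.
example = "http://localhost/?state=&code=abc123def456&scope=read,activity:read_all"
print(extract_code(example))   # abc123def456
```

If the function returns None, the URL did not contain a code at all, which
usually means the authorization step did not finish.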

In [None]:
YOUR_CODE = 'YOUR CODE'   # Leave the quotation marks surrounding the code.

import requests

data = {
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
    'code': YOUR_CODE,
    'grant_type': 'authorization_code'
}

res = requests.post('https://www.strava.com/api/v3/oauth/token', data=data)
print("Information about your profile should be displayed below:")
print(res.json())
print()
print()
print(f"READ ALL REFRESH TOKEN: {res.json()['refresh_token']}")
print(f"CLIENT ID: {CLIENT_ID}")
print(f"CLIENT SECRET: {CLIENT_SECRET}")

7. Copy and paste the "READ ALL REFRESH TOKEN" displayed above into the box
   below. I also recommend copying and pasting the CLIENT SECRET and CLIENT ID
   so that you can ignore everything above this box in the future.
   Don't try to run the above box again; your code is usable only one time and
   you will need to repeat steps 4, 5, and 6 to get a new code.

In [None]:
# <------ press the play button in the upper left corner when finished
REFRESH_TOKEN = 'YOUR REFRESH TOKEN'
CLIENT_ID = 'YOUR CLIENT ID'
CLIENT_SECRET = 'YOUR CLIENT SECRET'

You are now done with setup. If you have saved the refresh token you got from
the steps above, you can skip right to here in the future. Run the box below to
get all your data.
"NUMBER_OF_PAGES" (at top of box below) can be adjusted to get more or less
data (more pages = more data, more loading time; fewer pages = less data,
shorter loading time). Each page holds up to 200 activities of all activity
types, but only rides are kept.
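
To make the paging idea concrete, here is a small sketch that does not touch
Strava at all. The names fetch_all_pages and fake_fetch_page are made up for
this example, and the 450 fake activities stand in for whatever your account
actually holds. It shows how asking for more pages returns more data, and how
a loop can stop early once a page comes back empty.

```python
def fetch_all_pages(fetch_page, max_pages):
    """Collect items page by page, stopping early at the first empty page.

    fetch_page is any callable that takes a 1-based page number and
    returns a list of items.
    """
    items = []
    for page_num in range(1, max_pages + 1):
        page = fetch_page(page_num)
        if not page:          # nothing left to download
            break
        items.extend(page)
    return items


# Stand-in for the real API call: 450 fake activities served 200 per page.
fake_activities = [{"id": i} for i in range(450)]


def fake_fetch_page(page_num, per_page=200):
    start = (page_num - 1) * per_page
    return fake_activities[start:start + per_page]


print(len(fetch_all_pages(fake_fetch_page, max_pages=5)))   # 450
print(len(fetch_all_pages(fake_fetch_page, max_pages=2)))   # 400
```

With 5 pages allowed, the loop stops after the third page because the fourth
is empty; with only 2 pages allowed, the last 50 activities are never fetched.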

In [None]:
NUMBER_OF_PAGES = 5

import pandas as pd
import numpy as np
import scipy as sp
from scipy import stats
from numpy.polynomial import polynomial
import requests
import urllib3
import matplotlib.pyplot as plt
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


# coeffs is a list of the coefficients for the line of best fit, ordered from
# the lowest power to the highest, as returned by polynomial.polyfit (i.e.
# [19, -.5, 4] for the function "4x^2 - .5x + 19" or [28, 3] for "3x + 28").
# x and y are the column names for the data that is being graphed, and
# activity_df is the df with all of the data that is to be graphed (rows
# without data for x and y must be excluded).
def calc_r_sq(coeffs, x, y, activity_df):
    x = list(activity_df[x])
    y = list(activity_df[y])

    expected_values = []
    for x_val in x:
        # In english: expected_values.append(f(x)) where x=x_val and f(x) is an
        # nth order polynomial where n=len(coeffs)-1
        expected_values.append(sum([coeffs[coeff_num] * x_val**coeff_num for coeff_num in range(len(coeffs))]))
    
    r = stats.pearsonr(x=expected_values, y=y)[0]
    r_sq = r**2

    return r_sq


# Plot the line of best fit for a line and 2nd-order polynomial. x and y are
# the names of the columns to be used, and activity_df is the dataframe with
# all the data
def plot_best_fit(x, y, activity_df):
    y_vs_x_text = f"{y} vs {x}" # Name of graph
    print(f"\n{y_vs_x_text}")

    # Remove any rows that are missing a value for the specified x or y.
    # Then sort and plot the data.
    activity_df = activity_df.dropna(subset=[x, y])
    activity_df = activity_df.sort_values(by=x)
    activity_df.plot(x=x, y=y, kind='scatter', color='black', figsize=(10,5))

    # Calculate 2nd-degree polynomial and r^2 for line of best fit
    poly_coeff = polynomial.polyfit(x=activity_df[x], y=activity_df[y], deg=2)
    poly_r_sq = calc_r_sq(poly_coeff, x, y, activity_df)
    poly_equation_text = "y = %.3gx^2 + %.3gx + %.3g   (r^2 = %.2g)" % (poly_coeff[2], poly_coeff[1], poly_coeff[0], poly_r_sq)

    # Calculate linear equation and r^2 for line of best fit
    linear_coeff = polynomial.polyfit(x=activity_df[x], y=activity_df[y], deg=1)
    linear_r_sq = calc_r_sq(linear_coeff, x, y, activity_df)
    linear_equation_text = "y = %.3gx + %.3g   (r^2 = %.2g)" % (linear_coeff[1], linear_coeff[0], linear_r_sq)

    print(poly_equation_text)
    print(linear_equation_text)

    # Plot lines of best fit
    plt.plot(activity_df[x], linear_coeff[1] * activity_df[x] + linear_coeff[0], color='green')
    plt.plot(activity_df[x], poly_coeff[2] * (activity_df[x]**2) + poly_coeff[1] * activity_df[x] + poly_coeff[0], color='blue')

    # Display equations in upper-right-hand corner
    x_range = max(activity_df[x]) - min(activity_df[x])
    y_range = max(activity_df[y]) - min(activity_df[y])
    plt.text(max(activity_df[x]) - .5*x_range, max(activity_df[y]) + .2*y_range, poly_equation_text, size=14)
    plt.text(max(activity_df[x]) - .5*x_range, max(activity_df[y]) + .1*y_range, linear_equation_text, size=14)

    # Save and show image
    plt.savefig(f"{y_vs_x_text}.png")
    plt.show()


auth_url = "https://www.strava.com/oauth/token"
activites_url = "https://www.strava.com/api/v3/athlete/activities"

payload = {
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
    'refresh_token': REFRESH_TOKEN,
    'grant_type': "refresh_token",
    'f': 'json'
}

print("Requesting Token...\n")
res = requests.post(auth_url, data=payload)  # keep TLS verification on; this request carries your client secret
print(res.json())
access_token = res.json()['access_token']
print("Access Token = {}\n".format(access_token))

downloaded_df_columns = [
    'start_date_local',
    'name',
    'distance', # meters
    'moving_time',  # seconds
    'elapsed_time', # seconds
    'total_elevation_gain', # meters
    'elev_high',
    'elev_low',
    'trainer',
    'commute',
    'manual',
    'sport_type',
    'gear_id',
    'average_speed',
    'max_speed',
    'average_cadence',
    'average_heartrate',
    'max_heartrate',
    'weighted_average_watts',
    'average_watts',
    'kilojoules',
    'device_watts',
    'average_temp',
    'suffer_score'
]

activity_df_columns = [
    'start_date_local',
    'name',
    'distance', # meters
    'moving_time',  # seconds
    'elapsed_time', # seconds
    'total_elevation_gain', # meters
    'elev_high',
    'elev_low',
    'trainer',
    'commute',
    'manual',
    'sport_type',
    'gear_id',
    'average_speed',
    'max_speed',
    'average_cadence',
    'average_heartrate',
    'max_heartrate',
    'est_avg_watts',
    'weighted_average_watts',
    'average_watts',
    'kilojoules',
    'device_watts',
    'average_temp',
    'suffer_score'
]

activity_df = pd.DataFrame(columns=activity_df_columns)
print("Loading data... Shouldn't take more than 10 seconds per page.")
for page_num in range(1, NUMBER_OF_PAGES+1):
    # print()
    # print(f"---------------------PAGE: {page_num}----------------------")
    print(f"Downloading Page {page_num}/{NUMBER_OF_PAGES}...")
    header = {'Authorization': 'Bearer ' + access_token}
    param = {'per_page': 200, 'page': page_num}
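    initial_json = requests.get(activites_url, headers=header, params=param).json()
    if not initial_json:    # an empty page means there are no more activities
        break
    downloaded_df = pd.json_normalize(initial_json)

    # .copy() so the column assignments below modify a real dataframe, not a view
    downloaded_df = downloaded_df.loc[downloaded_df['sport_type'] == 'Ride'].copy()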
    downloaded_df['est_avg_watts'] = downloaded_df['average_watts'].loc[downloaded_df['device_watts'] == False]
    downloaded_df['average_watts'] = downloaded_df['average_watts'].loc[downloaded_df['device_watts'] == True]

    activity_df = pd.concat([activity_df, downloaded_df], ignore_index=True)

activity_df['start_date_local'] = pd.to_datetime(activity_df['start_date_local'])

print(f"Bottom 1% weighted avg power (rides to crop out): {activity_df['weighted_average_watts'].quantile(.01)}")
activity_df = activity_df.loc[activity_df['weighted_average_watts'] > activity_df['weighted_average_watts'].quantile(.01)]  # crop out noodle rides
activity_df['watts_per_bpm'] = activity_df['average_watts'] / activity_df['average_heartrate']
activity_df['weighted_watts_per_bpm'] = activity_df['weighted_average_watts'] / activity_df['average_heartrate']
activity_df['w_per_bpm_per_distance'] = activity_df['average_watts'] / activity_df['average_heartrate'] / activity_df['distance']

activity_df['distance'] = activity_df['distance'] / 1000    # convert from m to km
activity_df['average_speed'] = activity_df['average_speed'] * 3.6   # convert from m/s to km/hr
activity_df['max_speed'] = activity_df['max_speed'] * 3.6   # convert from m/s to km/hr
activity_df['avg_mph'] = activity_df['average_speed'] / 1.609344    # convert from km/hr (see above) to mi/hr
activity_df['max_mph'] = activity_df['max_speed'] / 1.609344        # convert from km/hr (see above) to mi/hr

# print(f"Available data: {activity_df.columns}")
print(f"\n{activity_df.iloc[:, :2].head()}")

You should see a few of your most recent rides above. If you do, everything
worked. If you don't, something broke idk.

The boxes below have examples of what I think the most interesting plots are.
Running those cells should produce scatter plots with both the parabolic and
linear lines of best fit. Some of the data you can plot are:

        'distance', 'moving_time', 'elapsed_time',
        'total_elevation_gain', 'elev_high', 'elev_low',
        'average_speed', 'max_speed',
        'average_cadence', 'average_heartrate', 'max_heartrate',
        'est_avg_watts', 'weighted_average_watts', 'average_watts',
        'kilojoules', 'device_watts', 'average_temp', 'suffer_score',
        'achievement_count', 'kudos_count', 'comment_count',
        'athlete_count', 'photo_count',
        'start_latlng', 'end_latlng',
        'pr_count', 'total_photo_count', 'max_watts',
        'watts_per_bpm', 'weighted_watts_per_bpm', 'w_per_bpm_per_distance',
        'avg_mph', 'max_mph'

The graphs also show "r^2" which is a measure of how well the equation of the
line follows the data. Data points scattered all over the place will have an
r^2 close to 0, while data points that are perfectly lined up will have an
r^2 of 1. Due to the extremely uncontrolled environment that our data are
collected in, if you find something that has an r^2 of ~.2-.25 or better,
that's a pretty good correlation, imho.
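
To get a feel for those two extremes, here is a tiny made-up example (none of
these numbers come from Strava). It fits a straight line to points that sit
exactly on "3x + 28" and to zigzag values that have almost no trend, then
prints r^2 for each. The helper name r_squared is just for this sketch, and it
uses np.corrcoef rather than the stats.pearsonr call in the box above, which
should give the same r.

```python
import numpy as np
from numpy.polynomial import polynomial


def r_squared(x, y, deg):
    """Fit a polynomial of the given degree and return the r^2 of the fit."""
    coeffs = polynomial.polyfit(x, y, deg)       # coefficients, lowest power first
    predicted = polynomial.polyval(x, coeffs)    # f(x) for every data point
    r = np.corrcoef(predicted, y)[0, 1]          # correlation of fit vs. data
    return r ** 2


x = np.arange(20, dtype=float)
perfect_line = 3 * x + 28                        # points exactly on a line
zigzag = np.where(x % 2 == 0, 1.0, -1.0)         # 1, -1, 1, -1, ... (no real trend)

print(r_squared(x, perfect_line, deg=1))         # 1 (up to rounding error)
print(r_squared(x, zigzag, deg=1))               # close to 0 (about 0.0075)
```

Real ride data will land somewhere in between, which is why an r^2 around
.2 already stands out.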

In [None]:
plot_best_fit('average_temp', 'weighted_average_watts', activity_df)
plot_best_fit('average_temp', 'weighted_watts_per_bpm', activity_df)

In [None]:
plot_best_fit('average_temp', 'suffer_score', activity_df)
plot_best_fit('average_watts', 'average_heartrate', activity_df)
plot_best_fit('weighted_average_watts', 'average_heartrate', activity_df)
plot_best_fit('average_temp', 'watts_per_bpm', activity_df)

Interpret the results as you wish. They're probably completely meaningless, but
still probably worthy of r/mildlyinteresting for cyclists.