# Load Strava Data
This notebook loads activity data from Strava, puts it in a pandas data frame, and stores the data frame on disk. The stored data frame is not listed in .gitignore, so the data is also committed to the git repo. If you have privacy concerns, you may want to change that.

## Connect to Strava
To run through the Strava OAuth workflow, run the first cell of this notebook and click the resulting URL. This takes you to the Strava login page and, once you are logged in, to a redirect that does not resolve. Ignore the failed redirect and copy the temporary code from the `code` parameter of the URL in the browser's address bar. Paste it into the STRAVA_CODE variable in the second cell of this notebook. Once you run the second cell, you have an authenticated Strava client.

In [1]:
import os

from stravalib.client import Client

# The Strava API keys are expected as environment variables.
# Indexing os.environ raises a KeyError naming the variable if it is missing.
STRAVA_ID = int(os.environ["STRAVA_ID"])
STRAVA_SECRET = os.environ["STRAVA_SECRET"]

client=Client() 
authorize_url = client.authorization_url(
    client_id=STRAVA_ID, 
    redirect_uri='http://localhost:8282/authorized') 
print(authorize_url)

https://www.strava.com/oauth/authorize?client_id=29670&redirect_uri=http%3A%2F%2Flocalhost%3A8282%2Fauthorized&approval_prompt=auto&response_type=code
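
Copying the temporary code out of the address bar by hand is easy to get wrong. The following is a minimal sketch of extracting it with the standard library instead. The helper name `extract_strava_code` and the example redirect URL are hypothetical and only serve as illustration.

```python
from urllib.parse import urlparse, parse_qs

def extract_strava_code(redirected_url):
    """Return the value of the `code` query parameter, or None if absent."""
    query = parse_qs(urlparse(redirected_url).query)
    codes = query.get("code")
    return codes[0] if codes else None

# Hypothetical redirect URL, for illustration only
example_url = "http://localhost:8282/authorized?state=&code=abc123&scope=read"
print(extract_strava_code(example_url))
```

The returned string could then be assigned to STRAVA_CODE in the next cell.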


In [2]:
STRAVA_CODE="171482836f49334b7bc81ea7365372d372131d8d"

access_token = client.exchange_code_for_token(
    client_id=STRAVA_ID, 
    client_secret=STRAVA_SECRET, 
    code=STRAVA_CODE)
client = Client(access_token=access_token) 

## Load basic data from Strava
The query below requests up to five activities recorded after 26 October 2018. For other time intervals, change the `after` parameter (and the `limit`) accordingly. The data is stored in a pandas data frame. The columns to include are defined in the `columns` list.

In [10]:
import pandas as pd

# Define columns and create data frame
data = []
columns =['average_cadence', 'average_heartrate', 'average_speed', 'calories',  'description', 'distance', 'elapsed_time', 'end_latlng', 'gear', 'id', 'location_city', 'location_country', 'start_date', 'start_date_local', 'start_latitude', 'start_longitude', 'start_latlng', 'type', 'workout_type']
index = []
index_column = "start_date_local"

# Fetch the activities and copy the selected attributes into one row each
activities = client.get_activities(after="2018-10-26T00:00:00Z", limit=5)

for activity in activities:
    activity_dict = {column: getattr(activity, column) for column in columns}
    data.append(activity_dict)
    index.append(activity_dict[index_column])
    
activity_df = pd.DataFrame(
    data, 
    index=index, 
    columns=columns)

No such attribute visibility on entity <Activity id=1930230520 name='Heßdorf - Dechsendorf' resource_state=2>
No such attribute heartrate_opt_out on entity <Activity id=1930230520 name='Heßdorf - Dechsendorf' resource_state=2>
No such attribute display_hide_heartrate_option on entity <Activity id=1930230520 name='Heßdorf - Dechsendorf' resource_state=2>
No such attribute visibility on entity <Activity id=1932623703 name='Indoor Running' resource_state=2>
No such attribute heartrate_opt_out on entity <Activity id=1932623703 name='Indoor Running' resource_state=2>
No such attribute display_hide_heartrate_option on entity <Activity id=1932623703 name='Indoor Running' resource_state=2>
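
To query a whole calendar year rather than the days after a fixed date, both ends of the interval need to be set. The sketch below builds the two bounds for 2018. The helper `year_bounds` is hypothetical, and passing the values as `after` and `before` assumes that `get_activities` accepts `datetime` objects for both parameters.

```python
from datetime import datetime, timezone

def year_bounds(year):
    """Return the UTC start of `year` and the UTC start of the following year."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return start, end

after, before = year_bounds(2018)
print(after.isoformat(), before.isoformat())

# Assumed usage with the authenticated client from above:
# activities = client.get_activities(after=after, before=before)
```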


## Enrich activity data
For each activity, additional data points such as heart rate are provided as streams. Streams must be requested separately for each activity id.

In [None]:
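# Look up the heart rate stream of one activity.
# Sketch: assumes stravalib's Client.get_activity_streams, which returns
# the requested streams keyed by stream type.
def lookup_heartrate_stream(activity_id):
    streams = client.get_activity_streams(activity_id, types=['heartrate'])
    # Activities recorded without a heart rate sensor have no such stream
    if streams and 'heartrate' in streams:
        return streams['heartrate'].data
    return None

activity_df['heartrate_stream'] = activity_df['id'].apply(lookup_heartrate_stream)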


## Enrich location data
The location data is provided as latitude/longitude only. We use the Nominatim service to convert the coordinates into an address and store the country and postcode.

In [11]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="mybinder")

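def lookup_address(start_latlng):
    # Reverse-geocode the start position; returns {} if nothing is found
    lat = start_latlng.lat
    lon = start_latlng.lon
    loc = geolocator.reverse(str(lat)+", "+str(lon))
    return loc.raw.get("address", {}) if loc else {}

def lookup_country(start_latlng):
    # Activities without GPS data (e.g. indoor runs) have no start position
    if start_latlng:
        return lookup_address(start_latlng).get("country")

def lookup_postcode(start_latlng):
    # .get avoids a KeyError for addresses that carry no postcode
    if start_latlng:
        return lookup_address(start_latlng).get("postcode")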

activity_df['location_country'] = activity_df.apply(
    lambda row: lookup_country(row['start_latlng']), axis=1)

activity_df['location_postcode'] = activity_df.apply(
    lambda row: lookup_postcode(row['start_latlng']), axis=1)
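
The cell above sends two reverse-geocoding requests per activity, one for the country and one for the postcode. The following sketch shows one way to get both values from a single cached lookup. It runs without network access because `fake_reverse` is a hypothetical stand-in for the Nominatim call. The helper names and the rounding to two decimals are assumptions made for illustration.

```python
from functools import lru_cache

def fake_reverse(lat, lon):
    """Hypothetical stand-in for a Nominatim reverse lookup (no network)."""
    return {"country": "Deutschland", "postcode": "91093"}

@lru_cache(maxsize=None)
def cached_address(lat, lon):
    # Identical (rounded) coordinates are looked up only once
    return fake_reverse(lat, lon)

def country_and_postcode(latlng):
    """Return (country, postcode), or (None, None) without a position."""
    if not latlng:
        return None, None
    # Rounding lets nearby start positions share one cached lookup
    address = cached_address(round(latlng[0], 2), round(latlng[1], 2))
    return address.get("country"), address.get("postcode")

print(country_and_postcode((49.63, 10.91)))
print(country_and_postcode(None))
```

In the notebook, the body of `fake_reverse` would be replaced by the real lookup, and the resulting tuple would be split into the two data frame columns.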


In [12]:
activity_df

Unnamed: 0,average_cadence,average_heartrate,average_speed,calories,description,distance,elapsed_time,end_latlng,gear,id,location_city,location_country,start_date,start_date_local,start_latitude,start_longitude,start_latlng,type,workout_type,location_postcode
2018-10-27 15:35:09,76.4,130.4,2.95 m / s,,,8272.90 m,00:49:11,"(49.63, 10.91)",,1930230520,,Deutschland,2018-10-27 13:35:09+00:00,2018-10-27 15:35:09,49.63,10.91,"(49.63, 10.91)",Run,,91093.0
2018-10-28 15:40:26,82.1,125.5,2.69 m / s,,,10027.50 m,01:02:14,,,1932623703,,,2018-10-28 14:40:26+00:00,2018-10-28 15:40:26,,,,Run,,
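
The introduction mentions storing the data frame on disk. A minimal sketch of that step follows, assuming CSV as the storage format. The file name `activities.csv` and the small `sample_df`, which stands in for `activity_df`, are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Stand-in for activity_df with two of the notebook's columns
sample_df = pd.DataFrame(
    {"id": [1930230520, 1932623703], "type": ["Run", "Run"]},
    index=["2018-10-27 15:35:09", "2018-10-28 15:40:26"])

# Hypothetical file name; a temp directory keeps the sketch free of side effects
csv_path = os.path.join(tempfile.mkdtemp(), "activities.csv")
sample_df.to_csv(csv_path)

# index_col=0 restores the index instead of creating an "Unnamed: 0" column
restored_df = pd.read_csv(csv_path, index_col=0)
print(restored_df)
```

Reading the file back without `index_col=0` yields the extra `Unnamed: 0` column that appears in the table above.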
