# Audio Features KNN 
## Strava API Client

Now that the Spotify features are loaded into a pandas dataframe, we need to build a similar dataframe for our health data. Every workout in the period we will be observing has been uploaded to both Garmin Connect and Strava. The Strava API is the easier of the two to integrate with, since the initial credentials can be obtained without applying for approval.

In this notebook, we will:
- Build a list of all activity ids and timeframes of workouts since March 2020
- Match workout timeframe with recorded song history in `history.csv`
- Build a dataframe including splits for each of the workouts. A split includes:
    - Split time
    - Split distance
    - Average heart rate
    - Elevation difference (optional)
    - Average grade adjusted speed (optional)
- Build a method to group track ids by split
    - Handle songs that overlap two splits (such a song belongs to both)

Once these steps are complete, we can move on to building the model that clusters songs according to workout output.
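The grouping step in the list above could look roughly like the following. This is only a sketch: `split_windows` and `assign_tracks` are hypothetical helper names, the track tuples stand in for rows of `history.csv` (whose exact columns are not shown here), and adding up each split's elapsed time from the activity start is assumed to approximate the wall-clock window of that split. A song that straddles a boundary is listed under both splits.

```python
from datetime import datetime, timedelta


def split_windows(activity_start, split_seconds):
    """Turn an activity start and per-split elapsed seconds into (start, end) windows."""
    windows = []
    cursor = activity_start
    for seconds in split_seconds:
        end = cursor + timedelta(seconds=seconds)
        windows.append((cursor, end))
        cursor = end
    return windows


def assign_tracks(tracks, windows):
    """Map each split index to the ids of the tracks that overlap it.

    tracks: iterable of (track_id, start, end) tuples (hypothetical layout).
    """
    assignment = {i: [] for i in range(len(windows))}
    for track_id, t_start, t_end in tracks:
        for i, (w_start, w_end) in enumerate(windows):
            # Two intervals overlap when each one starts before the other ends.
            if t_start < w_end and t_end > w_start:
                assignment[i].append(track_id)
    return assignment


# Illustrative values only: two splits and two made-up tracks.
start = datetime(2023, 3, 6, 16, 56, 9)
windows = split_windows(start, [282, 480])
tracks = [
    ("track_a", datetime(2023, 3, 6, 16, 56, 0), datetime(2023, 3, 6, 16, 59, 30)),
    ("track_b", datetime(2023, 3, 6, 16, 59, 30), datetime(2023, 3, 6, 17, 3, 0)),
]
assignment = assign_tracks(tracks, windows)
```

Here `track_b` starts in the first split and ends in the second, so it appears under both indices.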


# Set up notebook for local development

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
from tools import cd_to

cd_to(full_path="/Users/per.morten.halvorsen@schibsted.com/personal/website/radio")

Moving to /Users/per.morten.halvorsen@schibsted.com/personal/website/radio...
Current working directory: /Users/per.morten.halvorsen@schibsted.com/personal/website/radio


# Authorize Strava API Client

Authorize via browser login:
`https://www.strava.com/oauth/authorize?client_id=100479&response_type=code&redirect_uri=http://perhalvorsen.com&approval_prompt=force&scope=activity:read_all`

POST `https://www.strava.com/oauth/token?client_id=100479&client_secret=104b3ce334f234529230b7b940329c3493d68cb2&code=846957b8031838ba77daed50a880152b38b1b551&grant_type=authorization_code`


Response:<br/>
```
    "refresh_token": "57e0108089eead9b4191b17314adaa3b208cd71a",
    "access_token": "4c501d86c7b4903994614cbaa465b3f3a9ca1b79",
```
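The two requests above can also be assembled in code instead of by hand. The sketch below only builds the authorize URL and the token-exchange payload; it makes no network call. The helper names `build_authorize_url` and `build_token_request` are hypothetical, and the credential values in the usage lines are placeholders.

```python
from urllib.parse import urlencode

AUTHORIZE_URL = "https://www.strava.com/oauth/authorize"
TOKEN_URL = "https://www.strava.com/oauth/token"


def build_authorize_url(client_id, redirect_uri, scope="activity:read_all"):
    """Build the browser login URL; the client secret is not part of this step."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "approval_prompt": "force",
        "scope": scope,
    })
    return f"{AUTHORIZE_URL}?{query}"


def build_token_request(client_id, client_secret, code):
    """Payload for exchanging the authorization code for tokens."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "grant_type": "authorization_code",
    }


# Placeholder values, not real credentials.
login_url = build_authorize_url("12345", "http://localhost")
token_payload = build_token_request("12345", "<client-secret>", "<code-from-redirect>")
```

The payload would then be sent with something like `requests.post(TOKEN_URL, data=token_payload)`. Reading the credentials from environment variables rather than pasting them into the notebook keeps them out of shared copies.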

In [4]:
tokens = {
    "refresh_token": "57e0108089eead9b4191b17314adaa3b208cd71a",
    "access_token": "4c501d86c7b4903994614cbaa465b3f3a9ca1b79",
}

In [5]:
refresh = "5b3a271fcb4a958de7b9bf78f5ec3a24430c28fc"
client_id = "100479"
client_secret = "104b3ce334f234529230b7b940329c3493d68cb2"

In [6]:
from tools import refresh_access_token

tokens = refresh_access_token(
    client_id=client_id,
    client_secret=client_secret,
    refresh_token=tokens["refresh_token"],
)


Returned token info: 
 {'token_type': 'Bearer', 'access_token': '3931a00e2aa81c0054cf08affe27cffb6922a241', 'expires_at': 1678644428, 'expires_in': 11653, 'refresh_token': '57e0108089eead9b4191b17314adaa3b208cd71a'}


In [7]:
tokens

{'token_type': 'Bearer',
 'access_token': '3931a00e2aa81c0054cf08affe27cffb6922a241',
 'expires_at': 1678644428,
 'expires_in': 11653,
 'refresh_token': '57e0108089eead9b4191b17314adaa3b208cd71a'}
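The returned dictionary includes `expires_at` (an epoch timestamp), which makes it possible to refresh only when needed. The sketch below is not the implementation of `tools.refresh_access_token`, which is not shown in this notebook; `token_expired` and `build_refresh_request` are hypothetical helpers, and the 60-second safety margin is an arbitrary assumption.

```python
import time


def token_expired(tokens, now=None, margin=60):
    """Return True when the access token is expired or about to expire."""
    now = time.time() if now is None else now
    return tokens["expires_at"] - margin <= now


def build_refresh_request(client_id, client_secret, refresh_token):
    """Payload for a refresh call to the token endpoint."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }


# Illustrative token info using the expiry shown in the output above.
sample_tokens = {"expires_at": 1678644428}
still_valid = not token_expired(sample_tokens, now=1678644428 - 11653)
```

A caller could check `token_expired(tokens)` before each batch of requests and call the refresh helper only when it returns `True`.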

# Get Activity

**List Athlete Activities (getLoggedInAthleteActivities)**<br/>
Returns the activities of an athlete for a specific identifier. 

Requires `activity:read`. Only Me activities will be filtered out unless requested by a token with `activity:read_all`.

`GET /athlete/activities`

**Parameters**<br/>
- before (Integer, in query): An epoch timestamp to use for filtering activities that have taken place before a certain time.
- after (Integer, in query): An epoch timestamp to use for filtering activities that have taken place after a certain time.
- page (Integer, in query): Page number. Defaults to 1.
- per_page (Integer, in query): Number of items per page. Defaults to 30.

**Responses**<br/>
- HTTP code 200: An array of SummaryActivity objects.
- HTTP code 4xx, 5xx: A Fault describing the reason for the error.


```
http GET 
"https://www.strava.com/api/v3/athlete/activities" 
"Authorization: Bearer [[token]]"

params:
    before=
    after=
    page=
    per_page=
```
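To collect every activity since March 2020, the `after` and `page` parameters have to be combined. The sketch below builds the query parameters and walks the pages until an empty one comes back. `activity_params` and `iter_activities` are hypothetical names, and `fetch_page` is an assumed callable that takes the parameter dictionary and returns the decoded JSON list (for example a thin wrapper around `requests.get`), so the paging logic can be tried without network access.

```python
from datetime import datetime, timezone


def activity_params(after, before=None, page=1, per_page=30):
    """Build query parameters; datetimes are converted to epoch seconds."""
    params = {
        "after": int(after.timestamp()),
        "page": page,
        "per_page": per_page,
    }
    if before is not None:
        params["before"] = int(before.timestamp())
    return params


def iter_activities(fetch_page, after, per_page=30):
    """Yield activities page by page until an empty page is returned."""
    page = 1
    while True:
        batch = fetch_page(activity_params(after, page=page, per_page=per_page))
        if not batch:
            break
        yield from batch
        page += 1


# Stand-in for the API: two pages of made-up activities, then nothing.
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
since = datetime(2020, 3, 1, tzinfo=timezone.utc)
found = list(iter_activities(lambda p: fake_pages.get(p["page"], []), since, per_page=2))
```

Passing a timezone-aware datetime avoids the local-time ambiguity of naive datetimes when converting to an epoch timestamp.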

In [8]:
import requests

def get_athlete(
        access_token=tokens["access_token"],
        url="https://www.strava.com/api/v3/athlete/",
    ):
    """Fetch the profile of the authenticated athlete."""
    header = {"Authorization": f"Bearer {access_token}"}

    # The Strava reference documents this endpoint as GET. The scope comes
    # from the token itself, so no payload is sent.
    response = requests.get(
        url=url,
        headers=header,
    )
    return response.json()

get_athlete()

{'id': 40990567,
 'username': 'pmc_h',
 'resource_state': 2,
 'firstname': 'Per',
 'lastname': 'Halvorsen',
 'bio': 'A fish out of water.',
 'city': 'Oslo',
 'state': 'Norway',
 'country': None,
 'sex': 'M',
 'premium': True,
 'summit': True,
 'created_at': '2019-04-11T08:04:45Z',
 'updated_at': '2023-02-25T21:04:50Z',
 'badge_type_id': 1,
 'weight': 89.0,
 'profile_medium': 'https://dgalywyr863hv.cloudfront.net/pictures/athletes/40990567/13168113/1/medium.jpg',
 'profile': 'https://dgalywyr863hv.cloudfront.net/pictures/athletes/40990567/13168113/1/large.jpg',
 'friend': None,
 'follower': None}

In [9]:
def get_activities(
        access_token=tokens["access_token"],
        url="https://www.strava.com/api/v3/athlete/activities/",
    ):
    """Fetch the first page (30 items by default) of the athlete's activities."""
    header = {"Authorization": f"Bearer {access_token}"}

    # Documented above as GET /athlete/activities; the scope comes from the
    # token, so no payload is sent.
    response = requests.get(
        url=url,
        headers=header,
    )
    return response.json()

activities = get_activities()

In [10]:
import json 

for i, activity in enumerate(activities):
    print(json.dumps(activity, indent=4))
    if i>4:
        break

{
    "resource_state": 2,
    "athlete": {
        "id": 40990567,
        "resource_state": 1
    },
    "name": "Afternoon Yoga",
    "distance": 0.0,
    "moving_time": 3286,
    "elapsed_time": 3286,
    "total_elevation_gain": 0,
    "type": "Yoga",
    "sport_type": "Yoga",
    "id": 8697242695,
    "start_date": "2023-03-11T15:10:54Z",
    "start_date_local": "2023-03-11T16:10:54Z",
    "timezone": "(GMT+01:00) Africa/Algiers",
    "utc_offset": 3600.0,
    "location_city": null,
    "location_state": null,
    "location_country": null,
    "achievement_count": 0,
    "kudos_count": 1,
    "comment_count": 0,
    "athlete_count": 1,
    "photo_count": 0,
    "map": {
        "id": "a8697242695",
        "summary_polyline": "",
        "resource_state": 2
    },
    "trainer": true,
    "commute": false,
    "manual": false,
    "private": false,
    "visibility": "everyone",
    "flagged": false,
    "gear_id": null,
    "start_latlng": [],
    "end_latlng": [],
    "average_sp

In [11]:
import pandas as pd 

df = pd.DataFrame(activities)

ids = list(df.id[df.elapsed_time>1200])
len(ids)

28

In [12]:
idx = ids.index(8670834905)
ids[idx]

8670834905

In [13]:
def get_activity(
        access_token=tokens["access_token"],
        id=0,
        url="https://www.strava.com/api/v3/activities/{id}/",
    ):
    """Fetch the detailed representation of a single activity."""
    header = {"Authorization": f"Bearer {access_token}"}

    # The Strava reference documents this endpoint as GET; the scope comes
    # from the token, so no payload is sent.
    response = requests.get(
        url=url.format(id=id),
        headers=header,
    )
    return response.json()

activity = get_activity(id=ids[idx])
activity

{'resource_state': 3,
 'athlete': {'id': 40990567, 'resource_state': 1},
 'name': 'Afternoon Run',
 'distance': 8305.1,
 'moving_time': 3236,
 'elapsed_time': 3662,
 'total_elevation_gain': 226.0,
 'type': 'Run',
 'sport_type': 'Run',
 'workout_type': None,
 'id': 8670834905,
 'start_date': '2023-03-06T16:56:09Z',
 'start_date_local': '2023-03-06T17:56:09Z',
 'timezone': '(GMT+01:00) Europe/Oslo',
 'utc_offset': 3600.0,
 'location_city': None,
 'location_state': None,
 'location_country': None,
 'achievement_count': 6,
 'kudos_count': 6,
 'comment_count': 0,
 'athlete_count': 3,
 'photo_count': 0,
 'map': {'id': 'a8670834905',
  'polyline': 'i{qlJsccaAl@NdAZd@BRCj@UX@JDNNVj@Jd@FnA?rABb@NTnAEdAA|@Fr@JjBHP?^HL?hB_@tAIvH_Br@SnBu@\\BLFRCh@[j@c@pAuA^]bAw@PK`@K\\]^Qp@@n@Ep@K`@M^AVI\\WRIzAKnKUP@n@?`@DbA@RGFEJARB?BEBBiADmARyAPk@\\aATa@j@s@b@UrA]nAOf@KxBO\\QN@IGYPSFwDd@YHeAPw@\\o@r@QVQ\\i@~ASnAEh@ElB?{@B{@Fg@VsAXeARe@`AoAZU`@S\\Kr@OdAMP@nBSt@SJKDMEZIFMEg@TeE^uAT[Jg@^i@t@]r@e@|AWnBGhA?n@BgBJqALo

In [14]:
activity.keys()

dict_keys(['resource_state', 'athlete', 'name', 'distance', 'moving_time', 'elapsed_time', 'total_elevation_gain', 'type', 'sport_type', 'workout_type', 'id', 'start_date', 'start_date_local', 'timezone', 'utc_offset', 'location_city', 'location_state', 'location_country', 'achievement_count', 'kudos_count', 'comment_count', 'athlete_count', 'photo_count', 'map', 'trainer', 'commute', 'manual', 'private', 'visibility', 'flagged', 'gear_id', 'start_latlng', 'end_latlng', 'average_speed', 'max_speed', 'average_cadence', 'average_temp', 'has_heartrate', 'average_heartrate', 'max_heartrate', 'heartrate_opt_out', 'display_hide_heartrate_option', 'elev_high', 'elev_low', 'upload_id', 'upload_id_str', 'external_id', 'from_accepted_tag', 'pr_count', 'total_photo_count', 'has_kudoed', 'suffer_score', 'description', 'calories', 'perceived_exertion', 'prefer_perceived_exertion', 'segment_efforts', 'splits_metric', 'splits_standard', 'laps', 'best_efforts', 'photos', 'stats_visibility', 'hide_from

In [15]:
laps_df = pd.DataFrame(activity["laps"])

In [16]:
laps_df

|  | id | resource_state | name | activity | athlete | elapsed_time | moving_time | start_date | start_date_local | distance | ... | total_elevation_gain | average_speed | max_speed | average_cadence | device_watts | average_heartrate | max_heartrate | lap_index | split | pace_zone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29346036903 | 2 | Lap 1 | {'id': 8670834905, 'resource_state': 1} | {'id': 40990567, 'resource_state': 1} | 281 | 281 | 2023-03-06T16:56:09Z | 2023-03-06T17:56:09Z | 1000.0 | ... | 15.2 | 3.56 | 4.098 | 82.0 | False | 139.7 | 150.0 | 1 | 1 | 2 |
| 1 | 29346036906 | 2 | Lap 2 | {'id': 8670834905, 'resource_state': 1} | {'id': 40990567, 'resource_state': 1} | 397 | 219 | 2023-03-06T17:00:54Z | 2023-03-06T18:00:54Z | 709.73 | ... | 9.6 | 3.24 | 4.287 | 79.5 | False | 156.4 | 166.0 | 2 | 2 | 2 |
| 2 | 29346036909 | 2 | Lap 3 | {'id': 8670834905, 'resource_state': 1} | {'id': 40990567, 'resource_state': 1} | 111 | 111 | 2023-03-06T17:07:30Z | 2023-03-06T18:07:30Z | 376.11 | ... | 23.2 | 3.39 | 4.564 | 76.8 | False | 162.2 | 177.0 | 3 | 3 | 2 |
| 3 | 29346036910 | 2 | Lap 4 | {'id': 8670834905, 'resource_state': 1} | {'id': 40990567, 'resource_state': 1} | 28 | 28 | 2023-03-06T17:09:22Z | 2023-03-06T18:09:22Z | 15.79 | ... | 0.0 | 0.56 | 2.64 | 45.0 | False | 174.7 | 178.0 | 4 | 4 | 1 |
| 4 | 29346036911 | 2 | Lap 5 | {'id': 8670834905, 'resource_state': 1} | {'id': 40990567, 'resource_state': 1} | 125 | 125 | 2023-03-06T17:09:50Z | 2023-03-06T18:09:50Z | 373.64 | ... | 0.0 | 2.99 | 3.432 | 76.9 | False | 131.9 | 154.0 | 5 | 5 | 1 |
| 5 | 29346036914 | 2 | Lap 6 | {'id': 8670834905, 'resource_state': 1} | {'id': 40990567, 'resource_state': 1} | 195 | 109 | 2023-03-06T17:13:22Z | 2023-03-06T18:13:22Z | 386.65 | ... | 21.2 | 3.55 | 4.208 | 79.3 | False | 161.1 | 183.0 | 6 | 6 | 2 |
| 6 | 29346036916 | 2 | Lap 7 | {'id': 8670834905, 'resource_state': 1} | {'id': 40990567, 'resource_state': 1} | 28 | 28 | 2023-03-06T17:15:12Z | 2023-03-06T18:15:12Z | 32.94 | ... | 0.0 | 1.18 | 3.411 | 50.9 | False | 176.5 | 183.0 | 7 | 7 | 1 |
| 7 | 29346036918 | 2 | Lap 8 | {'id': 8670834905, 'resource_state': 1} | {'id': 40990567, 'resource_state': 1} | 144 | 141 | 2023-03-06T17:15:40Z | 2023-03-06T18:15:40Z | 379.41 | ... | 0.0 | 2.69 | 3.368 | 75.0 | False | 134.7 | 164.0 | 8 | 8 | 1 |
| 8 | 29346036920 | 2 | Lap 9 | {'id': 8670834905, 'resource_state': 1} | {'id': 40990567, 'resource_state': 1} | 115 | 115 | 2023-03-06T17:18:05Z | 2023-03-06T18:18:05Z | 380.76 | ... | 19.6 | 3.31 | 4.017 | 77.4 | False | 169.8 | 180.0 | 9 | 9 | 2 |
| 9 | 29346036922 | 2 | Lap 10 | {'id': 8670834905, 'resource_state': 1} | {'id': 40990567, 'resource_state': 1} | 306 | 249 | 2023-03-06T17:20:01Z | 2023-03-06T18:20:01Z | 353.28 | ... | 0.0 | 1.42 | 2.601 | 53.4 | False | 113.7 | 179.0 | 10 | 10 | 1 |


In [17]:
def get_activity_zones(
        access_token=tokens["access_token"],
        id=0,
        url="https://www.strava.com/api/v3/activities/{id}/zones/",
    ):
    """Fetch the heart rate and pace zone distributions of an activity."""
    header = {"Authorization": f"Bearer {access_token}"}

    # The Strava reference documents this endpoint as GET; the scope comes
    # from the token, so no payload is sent.
    response = requests.get(
        url=url.format(id=id),
        headers=header,
    )
    return response.json()

zones_json = get_activity_zones(
    access_token=tokens["access_token"],
    id=ids[idx]
)

zones_json

[{'score': 38.0,
  'distribution_buckets': [{'min': 0, 'max': 136, 'time': 756},
   {'min': 136, 'max': 170, 'time': 1377},
   {'min': 170, 'max': 186, 'time': 749},
   {'min': 186, 'max': 203, 'time': 14},
   {'min': 203, 'max': -1, 'time': 0}],
  'type': 'heartrate',
  'resource_state': 3,
  'sensor_based': True,
  'points': 0,
  'custom_zones': False},
 {'score': 3,
  'distribution_buckets': [{'max': 3.122941981057581, 'min': 0, 'time': 1595},
   {'max': 3.628123772111013, 'min': 3.122941981057581, 'time': 437},
   {'max': 4.0414543284274576, 'min': 3.628123772111013, 'time': 328},
   {'max': 4.31700803263842, 'min': 4.0414543284274576, 'time': 231},
   {'max': 4.592561736849383, 'min': 4.31700803263842, 'time': 113},
   {'max': -1, 'min': 4.592561736849383, 'time': 192}],
  'type': 'pace',
  'resource_state': 3,
  'sensor_based': True}]
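Each zone entry reports the seconds spent in every bucket, so the raw times can be turned into shares of the workout. The sketch below does that for the heart rate entry shown above; `zone_shares` is a hypothetical helper name, and the bucket values are copied from the output.

```python
def zone_shares(zone):
    """Return the fraction of recorded time spent in each bucket of one zone entry."""
    buckets = zone["distribution_buckets"]
    total = sum(bucket["time"] for bucket in buckets)
    if total == 0:
        # No recorded time: avoid dividing by zero.
        return [0.0 for _ in buckets]
    return [bucket["time"] / total for bucket in buckets]


# Heart rate entry copied from the zones output above.
heartrate_zone = {
    "type": "heartrate",
    "distribution_buckets": [
        {"min": 0, "max": 136, "time": 756},
        {"min": 136, "max": 170, "time": 1377},
        {"min": 170, "max": 186, "time": 749},
        {"min": 186, "max": 203, "time": 14},
        {"min": 203, "max": -1, "time": 0},
    ],
}
shares = zone_shares(heartrate_zone)
```

For this run, a little under half of the recorded time falls in the second bucket (136 to 170 bpm).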

In [18]:
ids[idx]

8670834905

In [19]:
splits_df = pd.DataFrame(activity['splits_standard'])

In [20]:
splits_df

|  | distance | elapsed_time | elevation_difference | moving_time | split | average_speed | average_grade_adjusted_speed | average_heartrate | pace_zone |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1610.2 | 454 | 2.4 | 454 | 1 | 3.55 | 3.66 | 146.169604 | 3 |
| 1 | 1617.1 | 833 | -7.4 | 568 | 2 | 2.85 | 3.07 | 148.694981 | 1 |
| 2 | 1604.2 | 697 | -0.6 | 637 | 3 | 2.52 | 2.72 | 146.488449 | 1 |
| 3 | 1609.5 | 654 | 1.6 | 552 | 4 | 2.92 | 3.15 | 168.874 | 2 |
| 4 | 1608.5 | 831 | 40.8 | 831 | 5 | 1.94 | 2.21 | 155.445967 | 1 |
| 5 | 255.6 | 194 | 1.6 | 194 | 6 | 1.32 | 1.49 | 144.291045 | 1 |


In [21]:
splits_metric_df = pd.DataFrame(activity['splits_metric'])
splits_metric_df

|  | distance | elapsed_time | elevation_difference | moving_time | split | average_speed | average_grade_adjusted_speed | average_heartrate | pace_zone |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001.6 | 282 | 3.8 | 282 | 1 | 3.55 | 3.68 | 139.698582 | 3 |
| 1 | 998.5 | 480 | 6.0 | 302 | 2 | 3.31 | 3.55 | 156.367133 | 2 |
| 2 | 1001.5 | 451 | -1.0 | 364 | 3 | 2.75 | 3.0 | 150.193939 | 1 |
| 3 | 1000.2 | 457 | -14.4 | 454 | 4 | 2.2 | 2.27 | 133.164733 | 1 |
| 4 | 1007.0 | 367 | 7.8 | 310 | 5 | 3.25 | 3.63 | 165.117241 | 3 |
| 5 | 992.1 | 451 | -3.0 | 349 | 6 | 2.84 | 3.04 | 166.305732 | 1 |
| 6 | 1002.5 | 351 | 1.2 | 351 | 7 | 2.86 | 3.09 | 175.567073 | 1 |
| 7 | 997.7 | 606 | 33.8 | 606 | 8 | 1.65 | 1.89 | 147.158242 | 1 |
| 8 | 304.0 | 218 | 4.2 | 218 | 9 | 1.39 | 1.64 | 145.986928 | 1 |
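The speed columns in the splits are in meters per second, which is harder to read than a running pace. As a small aid, the sketch below converts a speed into minutes and seconds per kilometer; `pace_per_km` is a hypothetical helper, not something the Strava response provides.

```python
def pace_per_km(speed_mps):
    """Convert a speed in m/s to a 'M:SS' pace per kilometer, or None if not moving."""
    if speed_mps <= 0:
        return None
    total_seconds = round(1000 / speed_mps)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


# First metric split above: average_speed of 3.55 m/s.
first_split_pace = pace_per_km(3.55)
```

Applied to a dataframe column, this could be used as `splits_metric_df["average_speed"].map(pace_per_km)`.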


In [22]:
streams = """time
distance
latlng
altitude
velocity_smooth
heartrate
cadence
watts
temp
moving
grade_smooth""".split("\n")
streams


['time',
 'distance',
 'latlng',
 'altitude',
 'velocity_smooth',
 'heartrate',
 'cadence',
 'watts',
 'temp',
 'moving',
 'grade_smooth']

In [80]:
def get_activity_stream(
        access_token=tokens["access_token"],
        id=0,
        stream_type="heartrate",  # https://developers.strava.com/docs/reference/#api-models-StreamSet
        url = "https://www.strava.com/api/v3/activities/{id}/streams/",
    ):
    header = {"Authorization": f"Bearer {access_token}"}
    payload = {
        # "activity": "read",
        "keys": stream_type,
        "key_by_type": True

    }

    # get request
    response = requests.get(
        url=url.format(id=id),
        params=payload,
        headers=header,
    )
    return response.json()


def build_stream_df(
        access_token=tokens["access_token"],
        id=0,
        stream_type="heartrate",
): 
    response = get_activity_stream(
        access_token=access_token,
        id=id,
        stream_type=stream_type,
    )

    if "latlng" in response.keys():
        data = {
            "distance": response["distance"]["data"],
            "lat": [x[0] for x in response["latlng"]["data"]],
            "lng": [x[1] for x in response["latlng"]["data"]],
        }
    else:
        data = {
            key: response[key]["data"]
            for key in response.keys() 
        }

    return pd.DataFrame(data)



In [73]:
def get_streams_df(
    access_token=tokens["access_token"],
    id=ids[idx],
    streams=["cadence", "heartrate", "altitude", "latlng", "velocity_smooth", "grade_smooth"],
):
    df = build_stream_df(
        access_token=access_token,
        id=id,
        stream_type="time",
    )


    for stream in streams:
        tmp_df = build_stream_df(
            access_token=access_token,
            id=id,
            stream_type=stream,
        ).drop(columns="distance")
        df = pd.concat([df, tmp_df], axis=1)
        
    return df.set_index("time")
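`get_streams_df` issues one request per stream type. Since the `keys` parameter of the streams endpoint accepts a comma-separated list, the same columns could in principle be fetched in a single call and flattened afterwards. The sketch below shows only the parameter building and the flattening; `stream_params` and `flatten_streams` are hypothetical names, the sample response is made up to mirror the `key_by_type` shape used above, and sending the flag as the string `"true"` is an assumption about what the API expects.

```python
def stream_params(keys):
    """Query parameters requesting several stream types in one call."""
    return {"keys": ",".join(keys), "key_by_type": "true"}


def flatten_streams(response):
    """Turn a key_by_type stream response into a dict of equally named columns."""
    columns = {}
    for key, stream in response.items():
        if key == "latlng":
            # Each latlng sample is a [lat, lng] pair; split it into two columns.
            columns["lat"] = [point[0] for point in stream["data"]]
            columns["lng"] = [point[1] for point in stream["data"]]
        else:
            columns[key] = stream["data"]
    return columns


# Made-up response with the same layout as the real one.
sample_response = {
    "time": {"data": [0, 1, 7]},
    "heartrate": {"data": [117, 117, 118]},
    "latlng": {"data": [[59.9, 10.7], [59.9, 10.8], [60.0, 10.8]]},
}
columns = flatten_streams(sample_response)
params = stream_params(["time", "heartrate", "latlng"])
```

The resulting dictionary can be passed straight to `pd.DataFrame(columns).set_index("time")`, which also avoids having to drop the repeated `distance` column after each request.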


In [82]:
streams_df = get_streams_df(streams=["cadence", "heartrate", "altitude", "velocity_smooth"])
streams_df

| time | distance | cadence | heartrate | altitude | velocity_smooth |
|---|---|---|---|---|---|
| 0 | 0.0 | 20 | 117 | 126.2 | 0.000 |
| 1 | 3.2 | 20 | 117 | 125.8 | 0.000 |
| 7 | 26.1 | 88 | 118 | 125.2 | 3.729 |
| 8 | 30.0 | 89 | 117 | 125.0 | 3.829 |
| 9 | 33.7 | 86 | 117 | 124.8 | 3.807 |
| ... | ... | ... | ... | ... | ... |
| 3658 | 8296.8 | 58 | 116 | 164.8 | 0.874 |
| 3659 | 8298.1 | 75 | 117 | 164.8 | 1.024 |
| 3661 | 8302.0 | 70 | 120 | 164.6 | 1.316 |
| 3662 | 8303.9 | 69 | 120 | 164.6 | 1.610 |
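With the streams indexed by seconds since the start of the activity, a value such as the average heart rate during one song can be read off by selecting the samples inside the song's time window. The sketch below is one way this might be done; `window_mean` is a hypothetical helper. Note that the samples are not evenly spaced (the index above jumps from 1 to 7), so this is a plain mean over samples rather than a time-weighted average.

```python
from statistics import mean


def window_mean(times, values, start, end):
    """Mean of the values whose timestamp falls in [start, end), or None if none do."""
    selected = [value for t, value in zip(times, values) if start <= t < end]
    return mean(selected) if selected else None


# First rows of the stream table above.
times = [0, 1, 7, 8, 9]
heartrate = [117, 117, 118, 117, 117]
early_hr = window_mean(times, heartrate, start=0, end=8)
```

Returning `None` for an empty window makes gaps in the recording explicit instead of silently producing a misleading number.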


# Plots

In [70]:
pd.options.plotting.backend = "plotly"

In [72]:
fig = streams_df.plot()
fig

In [81]:
figs = []
for col in streams_df.columns:
    fig = streams_df[col].plot()
    fig.show()
    figs.append(fig)