## Leah's Strava Heatmap Experiments 2016-2021

Leah started running consistently in fall 2016, right around Thanksgiving.

In [1]:
# to convert -- CLI
# jupyter nbconvert --to hide_code_html strava.ipynb
import os
from datetime import datetime
from glob import glob

import pandas as pd
import geopandas as gpd
import shapely
from shapely.geometry import LineString
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium.plugins import HeatMap

from stravalib.client import Client
from stravalib.util import limiter

from process_strava import authenticate, swap
from process_strava import get_activities, get_stream_data 



In [2]:
%%capture
client = Client(rate_limiter=limiter.DefaultRateLimiter())
client = authenticate("strava-secrets.txt", client)

athlete_info = client.get_athlete()
athlete_info

In [3]:
athlete_info

<Athlete id=10295934 firstname='Leah' lastname='Wasser'>

In [None]:
# Get all activities; their ids are used to request the GPS data below
activities = client.get_activities()

Below, I loop through each activity and add it to a new list object, which 
can then be turned into a DataFrame for easy parsing.

Running this cell takes time because it accesses a lot of data.

In [None]:
daily_data_df_orig = get_activities(client)
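
The `get_activities` helper lives in `process_strava` and is not shown here, so the following is only a rough sketch of the loop-and-collect pattern described above, not the actual implementation. The function name `activities_to_records` and the choice of fields are assumptions made for illustration, and stand-in objects replace real Strava activities so the example runs without API access.

```python
from types import SimpleNamespace


def activities_to_records(activities,
                          fields=("id", "name", "start_date_local", "distance")):
    """Collect selected attributes from each activity into a list of dicts.

    Hypothetical helper: the field names are assumptions, and missing
    attributes are stored as None.
    """
    records = []
    for activity in activities:
        records.append({field: getattr(activity, field, None)
                        for field in fields})
    return records


# Stand-in objects so the sketch runs without hitting the Strava API
fake_activities = [
    SimpleNamespace(id=1, name="Morning run",
                    start_date_local="2021-01-02", distance=5000.0),
    SimpleNamespace(id=2, name="Trail run",
                    start_date_local="2021-01-04", distance=8000.0),
]
records = activities_to_records(fake_activities)
```

A list of dicts like `records` can be passed straight to `pd.DataFrame(records)` to get one row per activity.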

## Summary Data

Below are several tables showing the data in aggregated form, summarized by:

1. Daily activities
2. Month
3. Cumulative totals by year

This is the data used to create the plots below!
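
The three summaries listed above could be built with `pandas` along these lines. This is a minimal sketch on made-up data: the column names `date` and `miles` and the function name `summarize_runs` are assumptions, and the real activity DataFrame may be shaped differently.

```python
import pandas as pd


def summarize_runs(df, date_col="date", dist_col="miles"):
    """Return daily totals, monthly totals, and a running total per year."""
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    # A stable sort keeps same-day activities in their original order
    df = df.sort_values(date_col, kind="stable")

    daily = df.groupby(df[date_col].dt.date)[dist_col].sum()
    monthly = df.groupby(df[date_col].dt.to_period("M"))[dist_col].sum()

    # The running total restarts at the beginning of each calendar year
    df["cumulative"] = df.groupby(df[date_col].dt.year)[dist_col].cumsum()
    return daily, monthly, df[[date_col, dist_col, "cumulative"]]


# Made-up example data
runs = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-02-10", "2021-01-05"],
    "miles": [3.0, 2.0, 4.0, 1.0],
})
daily, monthly, cumulative = summarize_runs(runs)
```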

# Spatial Data - The GPS Points Representing Runs on a Map 

This post helped me access this data with `stravalib`. Again, `stravalib` is a Python 
package that makes it easier to access the Strava API and its associated data requests.

https://medium.com/analytics-vidhya/accessing-user-data-via-the-strava-api-using-stravalib-d5bee7fdde17  


In [None]:
# activity_number = 0
# types = ['time', 'distance', 'latlng', 'altitude',
#          'velocity_smooth', 'moving', 'grade_smooth']
# #activity_data=client.get_activity_streams(df['id'][activity_number], types=types)

# act = 4506113699
# activity_data = client.get_activity_streams(act, types=types)
# activity_data

## Grab spatial run data

The download below is rate-limited, which I figured out in another notebook.
A better option is to save each year as a csv file and then open the saved data to process it.


### Download Spatial Data By Year

This operation hits significant rate limits, so it's best to 
do it year by year. 

Strava's limit used to be 600 requests every 15 minutes. It may be lower now, and 
it is definitely lower with a free account.
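
To illustrate what staying under a limit like this involves, here is a small sketch of a rolling-window throttle. It is not how `stravalib`'s `DefaultRateLimiter` works internally; the class name `SimpleThrottle` is made up, and the 600-per-15-minutes figure is just the number mentioned above, which may be out of date.

```python
import time


class SimpleThrottle:
    """Allow at most `max_calls` calls in any rolling window of `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.calls = []       # timestamps of calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Drop timestamps that have aged out of the window
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls = [t for t in self.calls if now - t < self.period]
        self.calls.append(now)


# Assumed numbers: 600 requests per 15 minutes, as noted above
throttle = SimpleThrottle(max_calls=600, period=15 * 60)
```

Calling `throttle.wait()` before each API request would pause the loop only when the window is full.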

In [None]:
types = ['time', 'distance', 
         'latlng', 'altitude',
         'velocity_smooth', 
         'moving', 'grade_smooth']

Getting stream data takes time, so I save each year and combine the years below to make sure I'm not redownloading data I already have. I could do this programmatically in a cleaner way...
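
One possible shape for that cleaner approach is a small "load from disk if it exists, otherwise download and save" helper. This is a sketch under assumptions: `load_or_download` is a made-up name, the `data/<year>.shp` naming scheme is hypothetical, and the download, read, and write steps are passed in as functions so the pattern does not depend on any particular file format.

```python
import os


def load_or_download(year, download, read, write, data_dir="data", ext=".shp"):
    """Return one year's data, downloading it only if no saved copy exists.

    `download(year)` fetches the data, `write(data, path)` saves it, and
    `read(path)` loads a saved copy. The file naming is an assumption.
    """
    path = os.path.join(data_dir, f"{year}{ext}")
    if os.path.exists(path):
        # A saved copy exists, so skip the slow, rate-limited download
        return read(path)
    os.makedirs(data_dir, exist_ok=True)
    data = download(year)
    write(data, path)
    return data
```

With the objects in this notebook, the arguments might be `download=lambda y: get_stream_data(client, y, daily_data_df_orig, types)`, `read=gpd.read_file`, and `write=lambda gdf, path: gdf.to_file(path)`. The current year would still need refreshing, since new runs keep arriving.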

In [None]:
# data_2016_gdf = get_stream_data(client, 2016, daily_data_df_orig, types)
# data_2017_gdf = get_stream_data(client, 2017, daily_data_df_orig, types)
# data_2018_gdf = get_stream_data(client, 2018, daily_data_df_orig, types)
# data_2019_gdf = get_stream_data(client, 2019, daily_data_df_orig, types)
# data_2020_gdf = get_stream_data(client,
#                                 2020,
#                                 daily_data_df_orig,
#                                 types)
year_to_get = 2021
# Get 2021 data
data_2021_gdf = get_stream_data(client,
                                year_to_get,
                                daily_data_df_orig,
                                types)

# open 2021


In [None]:
# Save all data as is as a pickle for now
#import pickle

# Create a list of all gdf objects
# TODO - change this to open years 2016-2020 when 2020 is done and
# then add 2021
# all_years = [data_2020_gdf,
#              data_2019_gdf,
#              data_2018_gdf,
#              data_2017_gdf,
#              data_2016_gdf]

# with open('strava_data', 'wb') as all_years_file:
#     pickle.dump(all_years, all_years_file)


# Add 2021 gdf to data?

In [None]:
# The 2021 data will need to be added each time

In [None]:
# Open each of the spatial datasets and combine to create a single gdf
all_shpfiles = sorted(glob(os.path.join("data", "*.shp")))

# Open every shapefile & concat for plotting
all_years = [gpd.read_file(afile) for afile in all_shpfiles]

all_years_gdf = pd.concat(all_years,
                          ignore_index=True)

In [None]:
all_years_gdf.tail(3)

In [None]:
# 2021 data plot
f, ax = plt.subplots(figsize=(12, 6))
data_2021_gdf.plot(ax=ax)
ax.set(title="Ugly plot of 2021 spatial data")
plt.show()

In [None]:
f, ax = plt.subplots(figsize=(12, 6))
all_years_gdf.plot(ax=ax)
ax.set(title="Ugly plot of all run spatial data")
plt.show()

In [None]:
# data_2020_gdf = gpd.read_file(all_shpfiles[4])
# # Plot of 2020 data only 
# f, ax = plt.subplots()
# data_2020_gdf.plot(ax=ax)
# ax.set(title="Leah's 2020 Adventures")
# plt.show()

## Interactive Map of All Activities

Below is an interactive plot of my data from 2020!

TODOs:

1. Add other years of data
2. Add label popups, maybe? TBD, as there are lots of overlapping lines here

In [None]:
# Simplify 2021 data
# data_simp = data_2021_gdf.copy()
# data_simp["geometry"] = data_simp.simplify(tolerance=.01)

In [None]:
# Plot 2020 data only
# m = folium.Map([39.95, -105.2],
#                zoom_start=11)

# for index, row in data_simp.iterrows():
#     folium.PolyLine(
#         row.xy,
#         line_weight=3,
#         color="purple",
#         opacity=.5
#     ).add_child(folium.Popup(str(row.activity_id))).add_to(m)

# m

In [None]:
# 2021 data
all_years_gdf.head(3)

Below, I'm grabbing only the 2021 data and creating a heat map.
There are a bunch of nuances to consider here:

1. Error in GPS: the points from each run may be close but slightly "off", yielding a less accurate and less clean heat map.
2. Selecting the gradient of colors.

There is lots to play with here to make the final map a bit cleaner.
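
One thing to try for the GPS jitter is snapping points to a coarse grid before plotting, so that near-identical points from different runs collapse into one weighted point. The sketch below is an experiment idea, not something this notebook currently does: `snap_points` is a made-up name, and the choice of 4 decimal places (about 11 m of latitude) is an assumption that would need tuning.

```python
from collections import Counter


def snap_points(points, precision=4):
    """Round [lat, lon] points to a grid and count how many land in each cell.

    Returns a list of [lat, lon, count] entries. The default precision of
    4 decimal places is an assumption, not a recommended value.
    """
    counts = Counter((round(lat, precision), round(lon, precision))
                     for lat, lon in points)
    return [[lat, lon, n] for (lat, lon), n in counts.items()]


# Two nearly identical points and one that is farther away
example_points = [[39.96001, -105.27001],
                  [39.96003, -105.27004],
                  [39.97, -105.28]]
snapped = snap_points(example_points)
```

folium's `HeatMap` accepts points of the form `[lat, lng, weight]`, so the counts could serve as weights, though how they interact with `radius`, `blur`, and the gradient would need some experimenting.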

In [None]:
def clean_points(gdf):
    """Return a flat list of [lat, lon] points from a GeoDataFrame of lines."""
    # Work on a copy so the caller's GeoDataFrame is not modified in place
    # (otherwise a second call would swap the coordinates back)
    gdf = gdf.copy()

    # Swap x and y for folium because it wants lat first, not lon
    gdf["geometry"] = gdf.geometry.map(swap)

    # Flatten the coordinates of every line into one list of points
    return [list(pt) for line in gdf.geometry for pt in line.coords]

In [None]:
points_2021 = clean_points(data_2021_gdf)


## Heatmap for 2021 Running Data

In [None]:
# Generate a heat map for 2021 data
m = folium.Map([39.96, -105.27], zoom_start=14)
HeatMap(points_2021,
        name="runs",
        radius=2,
        blur=3,
        gradient={0.1: 'thistle',
                  0.3: 'purple',
                  0.7: 'orange',
                  1: 'indigo'}).add_to(m)
m

In [None]:
all_points = clean_points(all_years_gdf)
len(all_points)

In [None]:
# # Because this takes a while to process, it could make sense to process and export as pickle
# all_years_gdf_points = all_years_gdf.copy()

# # swap x and y for folium because it wants lat first not lon
# all_years_gdf_points["geometry"] = all_years_gdf_points.geometry.map(swap)

# # Get points

# all_years_gdf_points['points'] = all_years_gdf_points.apply(
#     lambda x: [y for y in x['geometry'].coords], axis=1)


# line_list = []
# for aline in all_years_gdf_points.points:
#     line_list.append([list(tpl) for tpl in aline])

# # now flatten it - this should be all runs in 2021
# flat_list = [item for sublist in line_list for item in sublist]
#all_points[4]

In [None]:
# Save all points as a pickle - maybe process by year and save it out?
#all_points[0:400]

In [None]:
# Generate a heat map of all data across time - I believe some data is missing.
m = folium.Map([39.96, -105.27], zoom_start=14)
HeatMap(all_points,
        name="runs",
        radius=4,
        blur=5,
        gradient={0.1: 'thistle',
                  0.3: 'lime',
                  0.7: 'orange',
                  1: 'indigo'}).add_to(m)

m

In [None]:
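# Simplify all data. The tolerance is in the units of the coordinates;
# for lon/lat data that is degrees, so .01 is roughly 1 km.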
all_years_simp = all_years_gdf.copy()
all_years_simp["geometry"] = all_years_simp.simplify(tolerance=.01)

all_years_simp.crs = "EPSG:4326"

In [None]:
# # This is some experiment with a choropleth ??
# m = folium.Map([39.95, -105.2],
#                zoom_start=11) 
# folium.Choropleth(
#     all_years_simp,
#     line_weight=3,
#     line_color='blue'
# ).add_to(m)
 
# m

In [None]:
# # Patience - this is parsing a LOT of activities
# # TODO - this doesn't run right now. 
# m = folium.Map([39.95, -105.2],
#                zoom_start=11)

# for aline in all_years_simp.xy:
#     folium.PolyLine(
#         aline,
#         line_weight=3,
#         color="purple",
#         opacity=.5
#     ).add_to(m)

# m

In [None]:
# # this is parsing a LOT of activities 
# # TODO also doesn't run
# map_data.crs = "EPSG:4326"

# m = folium.Map([39.95, -105.2],
#                zoom_start=11)

# for aline in map_data.xy:
#     folium.PolyLine(
#         aline,
#         line_weight=3,
#         color="purple",
#         opacity=.5
#     ).add_to(m)

# m

In [None]:
%%capture
# Convert to html with code hidden!
!jupyter nbconvert --to html --TemplateExporter.exclude_input=True  leah-summary-strava-heat-map.ipynb

# https://github.com/jupyter/nbconvert/issues/944 <- export issues of plotly
# Fix == setting the save state setting -https://github.com/jupyter-widgets/ipywidgets/issues/1632#issuecomment-510138573