
This is a step-by-step guide to analyze your own Strava data using Python. In this repo, I’ll be sharing the steps I took in performing an Exploratory Data Analysis (EDA) of my Strava data.


jomarmartinezjordas/analyze-strava-data-using-python


Analyzing my Strava records using Python

Step-by-step guide to analyze your own Strava data using Python

In this repo, I’ll share the steps I took to perform an Exploratory Data Analysis (EDA) of my Strava data. In the first part, I'll show how I got my Strava data, and in the second part, I'll show my results. To keep this README short, please refer to this Jupyter notebook 🗒️🗒️🗒️ for the step-by-step code that I used. I've been coding for quite some time and realized that I haven't built any decent projects, hence this first small analysis.

Getting your Strava data

To do any analysis, you first need to get your data. I got mine as a CSV file from the Strava website. To get it, log in to your account, click the down arrow beside your profile picture, then select My Account on the right.

Afterwards, scroll down the resulting page and click the "Get Started" button in the 'Download or Delete Your Account' section. Don't worry, this won't delete your account.

On the next page, go to the 'Download Request (optional)' section and click the 'Request your archive' button. You'll see 'Request received', and the zip file will then be sent to your registered email.

The next step is to check your email and get the zip file. It may be in your Spam/Junk folder depending on your settings (at least that's where I found mine). Download and extract it, and it should look like this:

The file you're interested in is 'activities.csv'. This is the one we'll use in our analysis.

Then you can create a Jupyter Notebook in the IDE of your choice. I did this analysis in VSCode. For the step-by-step code, please refer to this Jupyter notebook.
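As a rough idea of what the loading step can look like, here is a minimal sketch with pandas. It assumes the column names are already lowercased as in the tables in this README (the raw export may use different capitalization and has many more columns). The three sample rows are copied from the filtered table further down; with your own data you would pass the path to 'activities.csv' instead of the inline text.

```python
import io
import pandas as pd

# Three rows copied from this README's table. With a real export, replace
# `sample` with the path to your file, e.g. pd.read_csv("activities.csv").
sample = io.StringIO(
    "activity id,activity date,activity type,elapsed time,distance\n"
    '3748239651,"Jul 11, 2020, 9:51:06 PM",Ride,3789,14.95\n'
    '3979407827,"Aug 28, 2020, 9:45:19 PM",Ride,2417,8.76\n'
    '4051298220,"Sep 12, 2020, 10:17:13 PM",Ride,12127,42.74\n'
)
df = pd.read_csv(sample)

# Parse the text timestamp into a real datetime column.
df["activity_date"] = pd.to_datetime(
    df["activity date"], format="%b %d, %Y, %I:%M:%S %p"
)

# Derived columns that appear in the tables of this README.
df["year"] = df["activity_date"].dt.year
df["dayofyear"] = df["activity_date"].dt.dayofyear
df["elapsed hour"] = df["elapsed time"] / 3600          # seconds -> hours
df["km per hour"] = df["distance"] / df["elapsed hour"]  # average over elapsed time

print(df[["activity_date", "year", "dayofyear", "elapsed hour", "km per hour"]])
```

The derived values for the first ride line up with the table (day 193 of the year, about 1.05 elapsed hours), which suggests 'km per hour' there is distance divided by elapsed hours rather than moving hours.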

Exploratory Data Analysis

After cleaning the data, I started with a summary statistics table so I could easily inspect the usual statistical values that I'm interested in. Afterwards, I created a pairs plot chart, which illustrates the distribution of each variable and its relationship with the other variables. Then I made a box plot to visually show the dispersion of my ride data and whether there are signs of skewness. Next, I illustrated the distance I've covered per month through a bar plot. Lastly, I made another box plot to see in which quarter I'm most actively riding my bike outdoors.

| | activity id | elapsed time | moving time | distance | max heart rate | elevation gain | max speed | calories | activity_date | year | dayofyear | elapsed hour | km per hour |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 1.380000e+02 | 138.0 | 138.0 | 138.0 | 34.0 | 138.0 | 138.0 | 86.0 | 138 | 138.0 | 138.0 | 138.0 | 138.0 |
| mean | 6.892066e+09 | 38999.0 | 6876.0 | 34.0 | 176.0 | 155.0 | 12.0 | 1078.0 | 2022-03-18 23:33:04.362318848 | 2022.0 | 228.0 | 11.0 | 12.0 |
| min | 3.748056e+09 | 171.0 | 171.0 | 1.0 | 118.0 | 0.0 | 5.0 | 20.0 | 2020-07-04 12:57:37 | 2020.0 | 14.0 | 0.0 | 0.0 |
| 25% | 5.807027e+09 | 2619.0 | 1843.0 | 9.0 | 171.0 | 32.0 | 10.0 | 277.0 | 2021-08-17 06:39:24.750000128 | 2021.0 | 168.0 | 1.0 | 8.0 |
| 50% | 7.058098e+09 | 7865.0 | 4951.0 | 25.0 | 178.0 | 104.0 | 12.0 | 579.0 | 2022-04-29 05:08:36 | 2022.0 | 232.0 | 2.0 | 11.0 |
| 75% | 8.008873e+09 | 20309.0 | 9597.0 | 45.0 | 181.0 | 192.0 | 14.0 | 1239.0 | 2022-10-23 22:42:44 | 2022.0 | 299.0 | 6.0 | 14.0 |
| max | 1.056717e+10 | 3055629.0 | 38296.0 | 210.0 | 241.0 | 982.0 | 23.0 | 7699.0 | 2023-12-29 05:53:09 | 2023.0 | 365.0 | 849.0 | 52.0 |
| std | 1.535680e+09 | 264373.0 | 6911.0 | 35.0 | 18.0 | 178.0 | 3.0 | 1311.0 | NaN | 1.0 | 89.0 | 73.0 | 6.0 |
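A table like this can be produced with pandas' `describe()`. The sketch below uses three rides from the filtered table in this README as stand-in data, so its numbers will not match the table above; the rounding step is an assumption based on how the values are displayed.

```python
import pandas as pd

# Stand-in data: three rides copied from this README.
# With real data, use the full DataFrame loaded from activities.csv.
df = pd.DataFrame({
    "elapsed time": [3789, 2417, 12127],      # seconds
    "moving time": [3115.0, 1564.0, 8909.0],  # seconds
    "distance": [14.95, 8.76, 42.74],         # km
})

# count, mean, std, min, quartiles and max for every numeric column,
# rounded to whole numbers for easier reading.
summary = df.describe().round(0)
print(summary)
```

Scanning the `min` and `max` rows of the result is the quickest way to spot values that cannot be right.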

The good thing about showing the summary statistics first is that the minimum and maximum values immediately reveal any extreme or funky values. In my data, my maximum elapsed time was 849 hrs (calculated as 3055629 seconds / 60 / 60), and I couldn't think of any ride where I was on the bike for that long. My maximum moving time, however, was about 10.6 hrs, which was when I did my first Laguna Loop.

Another funky value was my max heart rate, which peaked at 241 bpm. I don't have the best heart condition, but I'm pretty sure I can't reach a heart rate that high.

Those two might have skewed my data, so I think we should exclude them before proceeding with the other charts and analysis. As far as I recall, I haven't recorded an activity that lasted more than 20 hours. As for my max heart rate, there were times when my heart rate monitor acted strangely, which is why some of the readings were off. The highest reasonable heart rate I can recall was 210 bpm, so I will set another filter to exclude rides with a heart rate greater than 210 bpm.
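The seconds-to-hours conversion in the paragraph above can be checked with a few lines. The two inputs are the `max` values of 'elapsed time' and 'moving time' from the summary table; the variable names are just for illustration.

```python
SECONDS_PER_HOUR = 60 * 60

max_elapsed_s = 3055629  # max of "elapsed time" in the summary table
max_moving_s = 38296     # max of "moving time" in the summary table

max_elapsed_h = max_elapsed_s / SECONDS_PER_HOUR
max_moving_h = max_moving_s / SECONDS_PER_HOUR

# Expressing the elapsed time in days makes it obvious the value is not a real ride.
print(f"max elapsed: {max_elapsed_h:.2f} h (about {max_elapsed_h / 24:.1f} days)")
print(f"max moving:  {max_moving_h:.2f} h")
```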

| | activity id | activity date | activity type | elapsed time | moving time | distance | max heart rate | elevation gain | max speed | calories | activity_date | start_time | start_date_local | month | year | dayofyear | elapsed hour | km per hour |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 3748239651 | Jul 11, 2020, 9:51:06 PM | Ride | 3789 | 3115.0 | 14.95 | NaN | 64.584747 | 17.700001 | NaN | 2020-07-11 21:51:06 | 21:51:06 | 2020-07-11 | July | 2020 | 193 | 1.052500 | 14.204276 |
| 2 | 3979407827 | Aug 28, 2020, 9:45:19 PM | Ride | 2417 | 1564.0 | 8.76 | NaN | 12.994913 | 11.300000 | NaN | 2020-08-28 21:45:19 | 21:45:19 | 2020-08-28 | August | 2020 | 241 | 0.671389 | 13.047580 |
| 3 | 4051298220 | Sep 12, 2020, 10:17:13 PM | Ride | 12127 | 8909.0 | 42.74 | NaN | 148.551956 | 11.400000 | NaN | 2020-09-12 22:17:13 | 22:17:13 | 2020-09-12 | September | 2020 | 256 | 3.368611 | 12.687722 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 194 | 10386367323 | Dec 15, 2023, 11:27:39 PM | Ride | 2055 | 1802.0 | 8.82 | 168.0 | 59.000000 | 11.894043 | 286.0 | 2023-12-15 23:27:39 | 23:27:39 | 2023-12-15 | December | 2023 | 349 | 0.570833 | 15.451095 |
| 196 | 10567168689 | Dec 29, 2023, 5:53:09 AM | Ride | 4504 | 446.0 | 1.65 | NaN | 15.000000 | 7.702002 | 61.0 | 2023-12-29 05:53:09 | 05:53:09 | 2023-12-29 | December | 2023 | 363 | 1.251111 | 1.318828 |

After excluding those activities, I'm down to 135 records from the 138 I started with.
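The two filters described above could look something like the sketch below. The 20-hour and 210 bpm thresholds come from the text. The sample rows are made up for illustration, and keeping rides with no heart-rate reading is an assumption, based on the table above still containing rides with NaN in 'max heart rate'.

```python
import numpy as np
import pandas as pd

# Illustrative rows only (not real rides): a normal ride, a ride with no
# heart-rate reading, one with an impossible elapsed time, one with a bad HR.
df = pd.DataFrame({
    "elapsed hour": [1.05, 0.67, 849.0, 2.0],
    "max heart rate": [168.0, np.nan, 175.0, 241.0],
})

MAX_ELAPSED_HOURS = 20  # per the text, no real ride lasted longer than this
MAX_HEART_RATE = 210    # highest reading considered believable in the text

# Keep rides that are short enough AND whose heart rate is either missing
# or within the believable range. A comparison with NaN is always False,
# so rows without a reading need the explicit isna() check to be kept.
keep = (df["elapsed hour"] <= MAX_ELAPSED_HOURS) & (
    df["max heart rate"].isna() | (df["max heart rate"] <= MAX_HEART_RATE)
)
filtered = df[keep]

print(f"{len(df)} rides before, {len(filtered)} after")
```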

[Figure: pairs plot of the ride variables]

A pairs plot is very useful because it immediately visualizes the relationships among the variables you want to analyze.

As expected, there is a positive relationship between distance and calories: the longer the distance, the higher the number of calories burned. For most of my rides under 100km, I've kept my speed under 30km/hr. For the rides where I was able to record my heart rate, I was over 160bpm for most of them. This is definitely something I can improve, as I want to keep my efforts in Zone 2. And perhaps I should replace my heart rate monitor.
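As a numeric companion to the pairs plot, the strength of a relationship such as distance versus calories can be checked with a correlation coefficient. This is only a sketch on made-up values, not the actual ride data. The plot itself can be drawn with, for example, `seaborn.pairplot(df)`; whether the notebook uses that exact call is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; NaN mimics a ride without a calorie estimate.
df = pd.DataFrame({
    "distance": [8.8, 14.9, 25.0, 42.7, 100.0],          # km
    "calories": [286.0, np.nan, 600.0, 1100.0, 2500.0],
})

# Pearson correlation; pandas skips rows where either value is missing.
r = df["distance"].corr(df["calories"])
print(f"distance vs calories: r = {r:.2f}")
```

A value close to 1 means longer rides go with more calories burned, which is what the upward-sloping scatter in the pairs plot shows visually.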

Next stop is making a box-plot chart for the distance I've covered.

[Figure: box plot of ride distance per year]

The first observation that immediately struck me is the lack of 2024 data. Then I realized that, as of writing, I haven't recorded an outdoor bike ride this year because I suffered a gout attack at the start of the year. I've mostly had indoor rides so far this year.

The extremes were highest in 2023, when I did my Audax Populaire event in January (140km) and a Laguna Loop ride (>200km). In the previous years, however, my rides mostly ranged up to 100km, with the majority under 50km.
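The numbers behind a per-year box plot can also be inspected directly. The sketch below uses only the five rides visible in the filtered table in this README, so it is far too small to reproduce the chart; it just shows the grouping step.

```python
import pandas as pd

# Five rides copied from the filtered table in this README.
df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2023, 2023],
    "distance": [14.95, 8.76, 42.74, 8.82, 1.65],  # km
})

# The quartiles and median are what a box plot draws as its box;
# min and max show the overall spread for each year.
per_year = df.groupby("year")["distance"].describe()
print(per_year[["min", "25%", "50%", "75%", "max"]])

# With matplotlib installed, df.boxplot(column="distance", by="year")
# draws the corresponding chart.
```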

I then illustrated the total distance that I've covered on a monthly basis.

[Figure: bar plot of total distance per month]

Here, the total distance I covered per month is shown. There is no data for the first half of 2020 because I started recording my rides when I got my fixed gear in July.

The most I've done was in December 2022, when I completed the Rapha Festive 500 challenge.

I guess the best insight I can take from this is that I should ride more outside.
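The monthly totals behind a bar plot like this can be computed by grouping on the year-month of each ride. This is a sketch using the five rides visible in the filtered table in this README, so the totals are not my real monthly figures.

```python
import pandas as pd

# Five rides copied from the filtered table in this README.
df = pd.DataFrame({
    "activity_date": pd.to_datetime([
        "2020-07-11 21:51:06", "2020-08-28 21:45:19", "2020-09-12 22:17:13",
        "2023-12-15 23:27:39", "2023-12-29 05:53:09",
    ]),
    "distance": [14.95, 8.76, 42.74, 8.82, 1.65],  # km
})

# Sum the distance within each calendar month (one Period per year-month).
# Months without any ride simply do not appear in the result.
monthly = df.groupby(df["activity_date"].dt.to_period("M"))["distance"].sum()
print(monthly)

# With matplotlib installed, monthly.plot(kind="bar") draws the bar chart.
```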

[Figure: box plot by quarter]

Lastly, I created a box plot for the quarterly analysis. From the chart, it can be observed that I ride the most during the first quarter of the year.
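A quarterly breakdown can be derived from the ride date in the same way. The sketch below assumes distance is the variable being compared across quarters, and it uses only the five rides visible in the filtered table in this README, none of which fall in the first quarter.

```python
import pandas as pd

# Five rides copied from the filtered table in this README.
df = pd.DataFrame({
    "activity_date": pd.to_datetime([
        "2020-07-11 21:51:06", "2020-08-28 21:45:19", "2020-09-12 22:17:13",
        "2023-12-15 23:27:39", "2023-12-29 05:53:09",
    ]),
    "distance": [14.95, 8.76, 42.74, 8.82, 1.65],  # km
})

# Quarter of the year (1-4) for each ride.
df["quarter"] = df["activity_date"].dt.quarter

# Number of rides and distance spread per quarter.
per_quarter = df.groupby("quarter")["distance"].agg(["count", "median", "max"])
print(per_quarter)
```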

Again, I should really ride more outside.

Thank you for going through this simple exploratory data analysis project. I enjoyed coding it and generating insights from my cycling data. I plan to improve it further in the future, perhaps with better data after I replace my heart rate monitor. Please feel free to use this if you want to do a simple analysis of your own cycling data as well.
