# Running

I love running. I also collect a lot of data about my runs through my Garmin watch. It's about time I did something interesting with it.

## Where to get the data from?

Garmin Connect and Strava are the two obvious places to retrieve my running history. Both give the user the option to bulk export their data history.

- Garmin - [https://www.garmin.com/en-US/account/datamanagement/exportdata/](https://www.garmin.com/en-US/account/datamanagement/exportdata/)
- Strava - [https://www.strava.com/athlete/delete_your_account](https://www.strava.com/athlete/delete_your_account) (don't be scared by the page name)

Both exports need to be compiled, so the requests can take a while. Each service emails you a link to download the data. For me, Strava took about 5 minutes and Garmin about 20. After a quick look through both folders, the Strava export seems a lot more intuitive to understand, so for now I'll focus on files from there.

## File types

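There are several file types across the folders:

- `csv` - the activities summary is in this format
- `gpx` - routes and some activities are stored as this
- `fit.gz` - most of the activities are in this format. These are gzip-compressed `.fit` files.
- `tcx.gz` - some activities are in this format (see the `run_ids` output further down). These are gzip-compressed TCX files, which are XML.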

## Running overview
The activities summary file provides top-level statistics about my activities to date. I've tackled that part of the data in an [R script](https://github.com/patricktudor/running/blob/main/Activity%20summary%20visualisations.R) because tidyverse is epic.

## Individual activity data
In this notebook I'm going to focus on tackling the activity `gpx` and `gz` files.

In [116]:
# import packages
import pandas as pd
import os
import fnmatch
import glob
import gzip

from pathlib import Path

First get a list of activities that are runs.

In [198]:
# open file
activities = pd.read_csv("running-data-exports/Strava/activities.csv")

# get Activity IDs for run activities
metrics_run = activities.loc[activities['Activity Type'] == 'Run', ['Activity ID', 'Filename']]
run_ids = metrics_run['Filename'].to_list()

# add file type to IDs
# run_ids = [str(run) + '.fit.gz' for run in run_ids]

data_folder = Path('running-data-exports/Strava/')

# build the full path to each run file
run_files = [data_folder / run for run in run_ids]
    

# get run activities using glob
# note that recursive = True is required if '**' is specified for the directory
activity_files = glob.glob('**/*.fit.gz', recursive = True)


In [199]:
my_directory = os.getcwd()
print(my_directory)

C:\Users\ptudor\Documents\GitHub\running


In [200]:
run_ids

['activities/143805738.tcx.gz',
 'activities/143805734.tcx.gz',
 'activities/143859088.tcx.gz',
 'activities/143859096.tcx.gz',
 'activities/143859092.tcx.gz',
 'activities/143859101.tcx.gz',
 'activities/143859107.tcx.gz',
 'activities/143859109.tcx.gz',
 'activities/143859133.tcx.gz',
 'activities/143859111.tcx.gz',
 'activities/143859134.tcx.gz',
 'activities/143859132.tcx.gz',
 'activities/143859119.tcx.gz',
 'activities/143859127.tcx.gz',
 'activities/145453522.tcx.gz',
 'activities/145453524.tcx.gz',
 'activities/145844041.fit.gz',
 'activities/146261229.fit.gz',
 'activities/157810046.fit.gz',
 'activities/160644683.fit.gz',
 'activities/189025150.fit.gz',
 'activities/189025156.fit.gz',
 'activities/193105453.fit.gz',
 'activities/195327903.fit.gz',
 'activities/198107559.fit.gz',
 'activities/201247305.fit.gz',
 'activities/201802592.fit.gz',
 'activities/202411662.fit.gz',
 'activities/203266659.fit.gz',
 'activities/205719582.fit.gz',
 'activities/205784727.fit.gz',
 'activi

In [204]:
len(run_files)

837
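
The `run_ids` output above mixes `.tcx.gz` and `.fit.gz` names, and the two formats need different parsers. As a minimal sketch, the names could be grouped by their compound extension before going any further. `group_by_format` is a hypothetical helper, and skipping non-string entries is an assumption in case some rows of the `Filename` column are empty.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_format(filenames):
    """Group file names by compound extension, e.g. '.fit.gz' or '.tcx.gz'."""
    groups = defaultdict(list)
    for name in filenames:
        # Assumption: rows without a file show up as NaN (a float), so skip them
        if not isinstance(name, str):
            continue
        # 'activities/143805738.tcx.gz' has suffixes ['.tcx', '.gz']
        extension = "".join(PurePosixPath(name).suffixes)
        groups[extension].append(name)
    return dict(groups)


# A few names copied from the run_ids output
sample = [
    "activities/143805738.tcx.gz",
    "activities/143805734.tcx.gz",
    "activities/145844041.fit.gz",
]
for extension, names in group_by_format(sample).items():
    print(extension, len(names))
```

Calling `group_by_format(run_ids)[".fit.gz"]` would then give only the names that a FIT parser can handle.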

## .FIT files
A `.fit.gz` file is a gzip-compressed `.fit` file. FIT stands for Flexible and Interoperable Data Transfer. It is a binary format for storing data from sport and health devices, and it comes from Garmin / ANT.
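
Because FIT is binary, a quick way to tell whether a `.gz` file really contains FIT data is to look at its header. The sketch below assumes the standard FIT header layout: a 12- or 14-byte header whose bytes 8 to 11 hold the ASCII signature `.FIT`. `read_fit_header` is a hypothetical helper name, not part of any library.

```python
import gzip
import struct


def read_fit_header(path):
    """Return basic header fields of a gzip-compressed FIT file, or None if it is not FIT."""
    with gzip.open(path, "rb") as f:
        header = f.read(12)

    # Assumed layout: bytes 8-11 of a FIT header are the ASCII signature '.FIT'
    if len(header) < 12 or header[8:12] != b".FIT":
        return None

    # size (1 byte), protocol version (1 byte), profile version (2 bytes),
    # data size (4 bytes), all little-endian
    header_size, protocol, profile, data_size = struct.unpack("<BBHI", header[:8])
    return {
        "header_size": header_size,
        "protocol_version": protocol,
        "profile_version": profile,
        "data_size": data_size,
    }
```

A gzip-compressed TCX file starts with XML text rather than this header, so the function returns `None` for it.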

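Let's open the first run file and have a look at it. Note that `run_files[0]` here is actually a `.tcx.gz` file, so the output below is XML rather than FIT data.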

In [203]:
# select one file
my_run = run_files[0]

with gzip.open(my_run, 'r') as run:
    for line in run:
        print(line) 

b'          <?xml version="1.0" encoding="UTF-8"?>\n'
b'<TrainingCenterDatabase\n'
b'  xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd"\n'
b'  xmlns:ns5="http://www.garmin.com/xmlschemas/ActivityGoals/v1"\n'
b'  xmlns:ns3="http://www.garmin.com/xmlschemas/ActivityExtension/v2"\n'
b'  xmlns:ns2="http://www.garmin.com/xmlschemas/UserProfile/v2"\n'
b'  xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"\n'
b'  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns4="http://www.garmin.com/xmlschemas/ProfileExtension/v1">\n'
b'            <Activities>\n'
b'              <Activity Sport="Running">\n'
b'      <Id>2014-04-16T17:34:54.000Z</Id>\n'
b'      <Lap StartTime="2014-04-16T17:34:54.000Z">\n'
b'        <TotalTimeSeconds>376.804</TotalTimeSeconds>\n'
b'        <DistanceMeters>1000.0</DistanceMeters>\n'
b'        <MaximumSpeed>4.021999835968018</MaximumSpeed>\n'
b'        <C

b'                <RunCadence>81</RunCadence>\n'
b'              </TPX>\n'
b'            </Extensions>\n'
b'          </Trackpoint>\n'
b'          <Trackpoint>\n'
b'            <Time>2014-04-16T17:42:28.000Z</Time>\n'
b'            <Position>\n'
b'              <LatitudeDegrees>51.564907394349575</LatitudeDegrees>\n'
b'              <LongitudeDegrees>-4.073271341621876</LongitudeDegrees>\n'
b'            </Position>\n'
b'            <AltitudeMeters>66.4000015258789</AltitudeMeters>\n'
b'            <DistanceMeters>1203.1700439453125</DistanceMeters>\n'
b'            <HeartRateBpm>\n'
b'              <Value>138</Value>\n'
b'            </HeartRateBpm>\n'
b'            <Extensions>\n'
b'              <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">\n'
b'                <Speed>2.6410000324249263</Speed>\n'
b'                <RunCadence>80</RunCadence>\n'
b'              </TPX>\n'
b'            </Extensions>\n'
b'          </Trackpoint>\n'
b'          <Trackpoint>\n'
b' 

b'                <RunCadence>79</RunCadence>\n'
b'              </TPX>\n'
b'            </Extensions>\n'
b'          </Trackpoint>\n'
b'          <Trackpoint>\n'
b'            <Time>2014-04-16T17:52:09.000Z</Time>\n'
b'            <Position>\n'
b'              <LatitudeDegrees>51.563083408400416</LatitudeDegrees>\n'
b'              <LongitudeDegrees>-4.062165068462491</LongitudeDegrees>\n'
b'            </Position>\n'
b'            <AltitudeMeters>85.5999984741211</AltitudeMeters>\n'
b'            <DistanceMeters>2577.830078125</DistanceMeters>\n'
b'            <HeartRateBpm>\n'
b'              <Value>142</Value>\n'
b'            </HeartRateBpm>\n'
b'            <Extensions>\n'
b'              <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">\n'
b'                <Speed>2.3420000076293945</Speed>\n'
b'                <RunCadence>80</RunCadence>\n'
b'              </TPX>\n'
b'            </Extensions>\n'
b'          </Trackpoint>\n'
b'          <Trackpoint>\n'
b'     

b'              <Value>136</Value>\n'
b'            </HeartRateBpm>\n'
b'            <Extensions>\n'
b'              <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">\n'
b'                <Speed>2.1459999084472656</Speed>\n'
b'                <RunCadence>81</RunCadence>\n'
b'              </TPX>\n'
b'            </Extensions>\n'
b'          </Trackpoint>\n'
b'          <Trackpoint>\n'
b'            <Time>2014-04-16T17:59:16.000Z</Time>\n'
b'            <Position>\n'
b'              <LatitudeDegrees>51.55927064828575</LatitudeDegrees>\n'
b'              <LongitudeDegrees>-4.066391224041581</LongitudeDegrees>\n'
b'            </Position>\n'
b'            <AltitudeMeters>68.4000015258789</AltitudeMeters>\n'
b'            <DistanceMeters>3404.0</DistanceMeters>\n'
b'            <HeartRateBpm>\n'
b'              <Value>134</Value>\n'
b'            </HeartRateBpm>\n'
b'            <Extensions>\n'
b'              <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension

b'                <Speed>2.4820001125335693</Speed>\n'
b'                <RunCadence>84</RunCadence>\n'
b'              </TPX>\n'
b'            </Extensions>\n'
b'          </Trackpoint>\n'
b'          <Trackpoint>\n'
b'            <Time>2014-04-16T18:07:40.000Z</Time>\n'
b'            <Position>\n'
b'              <LatitudeDegrees>51.563008306548</LatitudeDegrees>\n'
b'              <LongitudeDegrees>-4.074660977348685</LongitudeDegrees>\n'
b'            </Position>\n'
b'            <AltitudeMeters>29.600000381469727</AltitudeMeters>\n'
b'            <DistanceMeters>4644.31982421875</DistanceMeters>\n'
b'            <HeartRateBpm>\n'
b'              <Value>133</Value>\n'
b'            </HeartRateBpm>\n'
b'            <Extensions>\n'
b'              <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">\n'
b'                <Speed>2.434999942779541</Speed>\n'
b'                <RunCadence>79</RunCadence>\n'
b'              </TPX>\n'
b'            </Extensions>\n'
b'       

b'            <AltitudeMeters>64.5999984741211</AltitudeMeters>\n'
b'            <DistanceMeters>5845.4599609375</DistanceMeters>\n'
b'            <HeartRateBpm>\n'
b'              <Value>150</Value>\n'
b'            </HeartRateBpm>\n'
b'            <Extensions>\n'
b'              <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2">\n'
b'                <Speed>2.3420000076293945</Speed>\n'
b'                <RunCadence>81</RunCadence>\n'
b'              </TPX>\n'
b'            </Extensions>\n'
b'          </Trackpoint>\n'
b'          <Trackpoint>\n'
b'            <Time>2014-04-16T18:16:57.000Z</Time>\n'
b'            <Position>\n'
b'              <LatitudeDegrees>51.566161243245006</LatitudeDegrees>\n'
b'              <LongitudeDegrees>-4.086668472737074</LongitudeDegrees>\n'
b'            </Position>\n'
b'            <AltitudeMeters>64.19999694824219</AltitudeMeters>\n'
b'            <DistanceMeters>5866.81982421875</DistanceMeters>\n'
b'            <HeartRateBpm>\n'
b'

What a mess!
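
The output above is TCX, which is plain XML, so the standard library can already tidy it up. Here is a minimal sketch, assuming the namespace URI shown in the output and the element names visible there. `read_tcx_trackpoints` is a hypothetical helper, and it skips the `Speed` and `RunCadence` values because they live in a separate extension namespace.

```python
import gzip
import xml.etree.ElementTree as ET

# Default namespace copied from the file's <TrainingCenterDatabase> element
TCX_NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Output column name -> element path inside each <Trackpoint>
FIELDS = {
    "latitude": "tcx:Position/tcx:LatitudeDegrees",
    "longitude": "tcx:Position/tcx:LongitudeDegrees",
    "altitude_m": "tcx:AltitudeMeters",
    "distance_m": "tcx:DistanceMeters",
    "heart_rate_bpm": "tcx:HeartRateBpm/tcx:Value",
}


def read_tcx_trackpoints(path):
    """Return one dict per <Trackpoint> in a gzip-compressed TCX file."""
    with gzip.open(path, "rb") as f:
        # The file starts with spaces before '<?xml', which the parser rejects
        raw = f.read().lstrip()

    root = ET.fromstring(raw)
    rows = []
    for point in root.iterfind(".//tcx:Trackpoint", TCX_NS):
        row = {"time": point.findtext("tcx:Time", namespaces=TCX_NS)}
        for column, element_path in FIELDS.items():
            text = point.findtext(element_path, namespaces=TCX_NS)
            # Trackpoints without a value for this field get None
            row[column] = float(text) if text is not None else None
        rows.append(row)
    return rows
```

The resulting list of dicts can be passed straight to `pd.DataFrame(...)` to get one row per trackpoint.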

A chap called dtcooper has created a Python library called [fitparse](https://github.com/dtcooper/python-fitparse) to parse .FIT files. Let's install it and see how it can help.

In [58]:
# install the package
# pip install fitparse

import fitparse

In [73]:
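# run_files[0] needs to be a .fit.gz file here (e.g. from the os.walk cell further down)
my_run = run_files[0]

# FitFile accepts an open file object, so pass the gzip stream to it directly
with gzip.open(my_run, 'rb') as run:
    fitfile = fitparse.FitFile(run)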

    # this next bit is taken from dtcooper github page

    # Iterate over all messages of type "record"
    # (other types include "device_info", "file_creator", "event", etc)
    for record in fitfile.get_messages("record"):

        # Records can contain multiple pieces of data (ex: timestamp, latitude, longitude, etc)
        for data in record:

            # Print the name and value of the data (and the units if it has any)
            if data.units:
                print(" * {}: {} ({})".format(data.name, data.value, data.units))
            else:
                print(" * {}: {}".format(data.name, data.value))

        print("---")

 * altitude: 29.799999999999955 (m)
 * cadence: 0 (rpm)
 * distance: 0.0 (m)
 * enhanced_altitude: 29.799999999999955 (m)
 * enhanced_speed: 0.0 (m/s)
 * fractional_cadence: 0.0 (rpm)
 * position_lat: 615739871 (semicircles)
 * position_long: -47370982 (semicircles)
 * speed: 0.0 (m/s)
 * timestamp: 2017-03-18 08:41:43
---
 * altitude: 40.39999999999998 (m)
 * cadence: 0 (rpm)
 * distance: 6.15 (m)
 * enhanced_altitude: 40.39999999999998 (m)
 * enhanced_speed: 4.133 (m/s)
 * fractional_cadence: 0.0 (rpm)
 * position_lat: 615739217 (semicircles)
 * position_long: -47372798 (semicircles)
 * speed: 4.133 (m/s)
 * timestamp: 2017-03-18 08:41:48
---
 * altitude: 28.399999999999977 (m)
 * cadence: 88 (rpm)
 * distance: 29.69 (m)
 * enhanced_altitude: 28.399999999999977 (m)
 * enhanced_speed: 3.704 (m/s)
 * fractional_cadence: 0.0 (rpm)
 * position_lat: 615737774 (semicircles)
 * position_long: -47376148 (semicircles)
 * speed: 3.704 (m/s)
 * timestamp: 2017-03-18 08:41:56
---
 * altitude: 27
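
The positions above are reported in semicircles rather than degrees. A semicircle is 180 / 2^31 of a degree, so the conversion is a single multiplication. The sketch below applies it to values copied from the first record. `semicircles_to_degrees` and `add_degrees` are hypothetical helper names, and the input is assumed to be a plain dict of field names to values.

```python
def semicircles_to_degrees(semicircles):
    """Convert a FIT position value from semicircles to decimal degrees."""
    return semicircles * 180 / 2**31


def add_degrees(record_values):
    """Return a copy of a record dict with latitude/longitude in degrees added."""
    row = dict(record_values)
    for source, target in (("position_lat", "latitude_deg"),
                           ("position_long", "longitude_deg")):
        value = row.get(source)
        # Records without a GPS fix have no position, so keep None in that case
        row[target] = semicircles_to_degrees(value) if value is not None else None
    return row


# Values copied from the first record printed above
first_record = {"position_lat": 615739871, "position_long": -47370982, "distance": 0.0}
converted = add_degrees(first_record)
print(round(converted["latitude_deg"], 4), round(converted["longitude_deg"], 4))
```

In fitparse, each record's fields can be collected into a dict like this one (I believe `record.get_values()` does that, but check it against the installed version), which makes it straightforward to build a DataFrame with one row per record.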

In [25]:
# get run activities using os.walk
run_files = []

for dirpath, dirnames, files in os.walk(my_directory):
    for file_name in files:
        if fnmatch.fnmatch(file_name, '*.fit.gz'):
            # os.path.join picks the right separator for the operating system
            run_file = os.path.join(dirpath, file_name)
            run_files.append(run_file)
            
# print(run_files)   

In [50]:
run_files[0]

'running-data-exports\\Strava\\activities\\1003625911.fit.gz'

In [83]:
metrics_run.shape

(837, 1)