This blog presents the results of applying Python analytics to fitness data.
The data comes from a ride in the Great Dublin Cycle - a 100km cycling sportive that took place in Dublin last September.
It was recorded with a Garmin bike computer and is stored as a file in the Garmin activity format.
Python was used to analyze this activity file, which contains sensor data such as time, distance, cadence, heart_rate and altitude.
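
As a rough sketch of the starting point for this kind of analysis, the snippet below loads a handful of per-sample records into a pandas DataFrame indexed by time. The field names follow the sensor data listed above. The sample values, the placeholder date and the `records_to_frame` helper are made up for illustration, and decoding the Garmin activity file itself is assumed to have happened already.

```python
import pandas as pd

# Made-up samples standing in for records decoded from the activity file.
# The date is a placeholder, not the date of the actual ride.
records = [
    {"time": "2000-01-01 09:00:00", "distance": 0.0, "cadence": 0,
     "heart_rate": 92, "altitude": 12.0},
    {"time": "2000-01-01 09:00:01", "distance": 4.1, "cadence": 62,
     "heart_rate": 95, "altitude": 12.2},
    {"time": "2000-01-01 09:00:02", "distance": 9.0, "cadence": 71,
     "heart_rate": 99, "altitude": 12.4},
]


def records_to_frame(records):
    """Return a DataFrame of samples indexed by timestamp (hypothetical helper)."""
    df = pd.DataFrame.from_records(records)
    df["time"] = pd.to_datetime(df["time"])
    return df.set_index("time").sort_index()


df = records_to_frame(records)
print(df)
```

Once the samples are in a time-indexed DataFrame, the summary statistics and plots discussed in the rest of this blog become ordinary pandas and matplotlib operations.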

The same activity file was uploaded to the Garmin Connect and Strava web-sites.
This allows the Python analytics to be benchmarked against the equivalent features of the Garmin and Strava web-sites.

This blog finds that Python can analyze fitness data just as accurately as well-known branded web-sites such as Garmin Connect and Strava.  This opens up the possibility of using Python analytics to provide customized and personalized fitness analytics for the performance athlete or serious amateur athlete.




## Comparison of Summary Statistics

This section compares the summary statistics generated by the three systems - Python Analytics, Garmin Connect and Strava.  The following summary statistics are compared:
- distance
- time (elapsed, activity, moving)
- average speed
- max speed
- heart rate (average and maximum)
- cadence (average and maximum)
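
To make the metrics in this list concrete, here is a minimal sketch of how they could be computed with pandas. It is not the code behind the figures in this blog. The definitions are assumptions: a gap of more than `max_gap_s` seconds between samples is treated as a pause in recording, a sample counts as moving when its speed is at least `min_moving_speed` m/s, and average speed is taken over activity time. The data is made up.

```python
import pandas as pd


def summarize(df, max_gap_s=10.0, min_moving_speed=1.0):
    """Summary statistics for one activity (illustrative definitions only).

    df needs columns: time (datetime), distance (m), heart_rate, cadence.
    """
    dt = df["time"].diff().dt.total_seconds().fillna(0.0)  # seconds per interval
    dd = df["distance"].diff().fillna(0.0)                 # metres per interval
    recording = dt <= max_gap_s            # assumption: long gaps are pauses
    speed = (dd / dt).where(dt > 0, 0.0)   # m/s, 0 where no time elapsed
    moving = recording & (speed >= min_moving_speed)

    distance_m = df["distance"].iloc[-1] - df["distance"].iloc[0]
    activity_s = dt[recording].sum()
    return {
        "distance_km": distance_m / 1000.0,
        "elapsed_s": (df["time"].iloc[-1] - df["time"].iloc[0]).total_seconds(),
        "activity_s": activity_s,
        "moving_s": dt[moving].sum(),
        "avg_speed_kmh": 3.6 * distance_m / activity_s,
        "max_speed_kmh": 3.6 * speed[recording].max(),
        "avg_heart_rate": df["heart_rate"].mean(),
        "max_heart_rate": df["heart_rate"].max(),
        "avg_cadence": df["cadence"].mean(),
        "max_cadence": df["cadence"].max(),
    }


# Made-up samples: a short ride with a 60 second pause in the recording.
df = pd.DataFrame({
    "time": pd.to_datetime([0, 1, 2, 3, 4, 64, 65, 66], unit="s"),
    "distance": [0.0, 5.0, 10.0, 10.0, 10.0, 10.0, 16.0, 22.0],
    "heart_rate": [100, 110, 120, 118, 115, 105, 112, 121],
    "cadence": [0, 80, 85, 0, 0, 0, 82, 88],
})
stats = summarize(df)
print(stats)
```

In this toy ride the elapsed time is 66 seconds, the activity time is 6 seconds (the 60 second gap is excluded) and the moving time is 4 seconds (the two stationary samples are also excluded). This is the same ordering of the three time metrics that the comparison below relies on.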

### Python Analytics Summary Statistics

The summary statistics generated by Python Analytics are as follows:
<img src='python_summary_stats.png'>

Here are some notes on the summary statistics from Python Analytics:
- distance agrees with Garmin
- elapsed time agrees with Strava
- activity time agrees (to within one second) with the time metric from Garmin
- moving time differs in all three systems: the Python number is within a few seconds of Strava
- average speed agrees with Garmin
- max speed differs in all three systems: the Python number lies between the numbers reported by Garmin and Strava
- avg heart rate is within one bpm of Strava and Garmin.  Max heart rate agrees with Garmin and Strava
- avg cadence is within one rpm of Strava and Garmin.  Max cadence agrees with Strava

### Garmin Connect Summary Statistics
The summary statistics generated by Garmin Connect are as follows:
<img src='garmin_summary_stats.png'>

Here are some notes on the summary statistics from Garmin Connect:
- elapsed time is one second less than the elapsed time from Strava and Python - this is possibly a very small calculation error.
- the Garmin time metric agrees (to within one second) with the Python activity time metric.  Strava does not report this metric.
- the Garmin moving time metric is over a minute lower than either the Strava or Python moving times.  This may indicate that the Garmin moving time algorithm sets a higher speed threshold for classifying a data point as moving.
- the max speed figure from Garmin is higher than either Python or Strava
- the max cadence figure does not agree with either Python or Strava
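
The effect of the moving threshold suggested in the notes above can be illustrated with a few lines of Python. This is only a sketch: the speeds and both thresholds are invented, and the actual Garmin and Strava moving time algorithms are not known here.

```python
def moving_time_s(speeds_ms, sample_period_s=1.0, threshold_ms=0.5):
    """Total time of the samples whose speed is at or above the threshold."""
    return sum(sample_period_s for v in speeds_ms if v >= threshold_ms)


# Made-up one-second speed samples in m/s, including some slow rolling.
speeds = [0.0, 0.3, 0.8, 1.2, 4.0, 6.5, 0.9, 0.2, 0.0, 5.0]

low_threshold = moving_time_s(speeds, threshold_ms=0.5)
high_threshold = moving_time_s(speeds, threshold_ms=1.0)
print(low_threshold, high_threshold)
```

With the lower threshold six samples count as moving. With the higher threshold only four do, so the same ride reports a shorter moving time.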

### Strava Summary Statistics
The summary statistics generated by Strava are as follows:
<img src='strava_summary_stats.png'>

Here are some notes on the summary statistics from Strava:
- Strava recalculates all distance values from the raw GPS data.  It reports a distance that is half a kilometer greater than the distance reported by Garmin
- elapsed time agrees with Python (and is within one second of Garmin)
- Strava does not report an activity time metric
- Strava uses the moving time metric for its summary metrics.  Its moving time is within a few seconds of Python, and over a minute greater than Garmin
- average speed is higher than Garmin and Python
- max speed differs in all three systems: Strava reports the lowest max speed
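
For readers curious what recalculating distance from raw GPS data might look like, here is a minimal sketch that sums great-circle (haversine) distances between consecutive GPS points. Strava's actual method is not known here, so haversine is an assumption. The coordinates are made-up points, and the activity file is assumed to contain latitude and longitude alongside the other sensor data.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def track_distance_m(points):
    """Sum the distances between consecutive (lat, lon) points."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


# Made-up track of three points (illustrative coordinates only).
track = [(53.3400, -6.2600), (53.3410, -6.2600), (53.3420, -6.2590)]
print(round(track_distance_m(track), 1), "m")
```

A distance built up this way depends on every GPS fix, so it will not in general match the distance recorded by the bike computer. That is one plausible source of the difference noted above.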


## Data Visualization of Fitness Activity
A number of web-sites provide visualizations of an athlete's activity over distance and time.  In this section we review the data visualization from Python Analytics and compare it with Garmin and Strava.


### Python Analytics Data Visualization
The data visualizations from Python are generated using the Python matplotlib visualization package.  This package is used extensively by the research community to create plots for scientific papers.  The header section of the visualization provides a quick overview of the activity using the main summary statistics.  This is followed by visualizations of speed, heart rate, cadence and altitude.  Temperature will be added at a future date.
<img src="summary_distance.png">
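
The layout described above, a header followed by one panel per metric over a shared distance axis, can be sketched with matplotlib as follows. This is a simplified illustration and not the code that produced the figure. The `plot_activity` helper, the panel labels and the data values are all made up.

```python
import matplotlib.pyplot as plt


def plot_activity(distance_km, series, title):
    """Stack one panel per metric over a shared distance axis.

    series maps a panel label (e.g. "Speed (km/h)") to a list of values.
    """
    fig, axes = plt.subplots(len(series), 1, sharex=True,
                             figsize=(10, 2 * len(series)), squeeze=False)
    for ax, (label, values) in zip(axes[:, 0], series.items()):
        ax.plot(distance_km, values)
        ax.set_ylabel(label)
        ax.grid(True)
    axes[-1, 0].set_xlabel("Distance (km)")
    fig.suptitle(title)  # header line, e.g. the main summary statistics
    return fig


# Made-up values, only to exercise the layout.
distance_km = [0, 1, 2, 3, 4]
fig = plot_activity(
    distance_km,
    {
        "Speed (km/h)": [0, 22, 27, 25, 30],
        "Heart rate (bpm)": [95, 120, 135, 140, 150],
        "Cadence (rpm)": [0, 78, 85, 82, 90],
        "Altitude (m)": [12, 15, 22, 30, 26],
    },
    title="Example ride: 4.0 km",
)
```

Sharing the x-axis keeps the panels aligned, so a climb in the altitude panel can be read directly against the matching drop in speed.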


### Garmin Connect Data Visualization
The data visualizations from Garmin Connect show speed, heart rate, cadence, altitude and temperature. 
<img src="garmin_distance.png">


### Strava Data Visualization
The data visualizations from Strava show speed, heart rate, cadence, altitude, estimated power and temperature.
<img src="strava_distance.png">

## Conclusions
This blog compares the results from Python analytics with similar summary statistics and data visualizations from two leading fitness web-sites.  It shows that Python Analytics can produce an accurate and reliable analysis of fitness data.

Python analytics is a powerful platform that is widely used in the financial services industry, and Python is the language of choice for many data scientists.  Its potential here is to provide customized and personalized analysis of fitness data that the current web-sites do not cater for, and as a result to provide new insights into the performance of the athlete.




### new metrics


### new analytics


## Conclusions



For many years I have been interested in running and biking - and in using technology to measure and analyze this activity.  I use a Garmin GPS enabled bike computer.  This gives me the standard time, distance and speed stats that almost all GPS enabled apps give.  However, when I bike I have additional sensors connected to the bike computer.  These allow the bike computer to also capture my heart rate, the bike cadence (pedalling strokes per minute) and altitude.  All this data is saved during the ride to a data file on the bike computer.  For years I have uploaded these data files to web-sites such as Garmin and Strava for data analytics.

This blog discusses a new approach to analyzing this data: pulling it into the Python Pandas programming platform for data analytics.  Python is a major open source computer language that powers data science.  Pandas is the analytics platform in Python, originally developed for use in the financial services industry.  The blog looks at the early results of applying Python Pandas to analyzing cycling data.

*******


Running and biking are particularly suited to tracking with GPS enabled devices such as smart phones, GPS enabled wrist watches or bike computers.  For biking I use a Garmin Edge 500 bike computer.  Smart phones, and all other GPS enabled devices, give time, distance and speed by default.  The Edge 500 gives these metrics and more: it also gives altitude, heart rate, cadence and temperature.

A bike computer, such as my Edge 500, has two basic functions.  On the road it gives a real time display of activity information such as speed, heart rate, cadence, distance travelled and elapsed time.  It also saves all this activity data on the device so that it can be uploaded after the activity and analysed.  Two common web-sites for analyzing such data are Garmin and Strava.  In this blog I describe how I have pulled this bike data into Python and developed programs to do a similar analysis to these web-sites.

In my opinion, web-sites such as Garmin and Strava provide very little real analytics that is useful to the serious athlete.  The data analytics from these web-sites is basic and focuses on giving summarized, aggregate level data.  It's a one-size-fits-all approach.  My first task in delivering personalized analytics is to develop similar summarized reports using Python.

The figure below shows the aggregate level data and graphs of speed, heart rate, cadence and altitude from Python.  The graphs show this data over local time.  The greyed out areas represent waiting times or breaks.  The first greyed out area represents a wait of about 30 minutes before the event start.  The second greyed out area represents the wait at the event feeding station, about 66km into the 100km event.  Note that these greyed out areas are not shown on the Strava or Garmin web-sites: they are a visualization specific to the Python analysis.  The greyed out areas make clear that this is actually a composite activity.
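
One possible way to find the breaks that get greyed out is sketched below. It assumes the samples keep being recorded during a wait with a speed close to zero, and it treats any stationary run of at least `min_break_s` seconds as a break. Both thresholds, the helper names and the data are made up for illustration. `ax` is assumed to be a matplotlib Axes whose x-axis is local time.

```python
import pandas as pd


def find_breaks(df, min_speed=0.5, min_break_s=120):
    """Return (start, end) timestamps of long stationary runs."""
    stopped = df["speed"] < min_speed
    # A new run starts wherever the stopped/moving state changes.
    run_id = stopped.astype(int).diff().ne(0).cumsum()
    breaks = []
    for _, run in df[stopped].groupby(run_id[stopped]):
        start, end = run["time"].iloc[0], run["time"].iloc[-1]
        if (end - start).total_seconds() >= min_break_s:
            breaks.append((start, end))
    return breaks


def shade_breaks(ax, breaks):
    """Grey out each break on a time-based matplotlib Axes."""
    for start, end in breaks:
        ax.axvspan(start, end, color="grey", alpha=0.3)


# Made-up samples, one per minute: a wait, a ride, a stop, a ride.
df = pd.DataFrame({
    "time": pd.to_datetime(list(range(0, 720, 60)), unit="s"),
    "speed": [0, 0, 0, 5, 6, 5, 0, 0, 0, 0, 6, 5],
})
breaks = find_breaks(df)
print(breaks)
```

In this toy data two breaks are found: the wait before the start and the stop in the middle of the ride.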




The two most common approaches to analysing cycling activity are over distance and over time.  This section takes a single bike ride from the Great Dublin Cycle.  It uses Python powered analytics to reproduce the common summary metrics and visualizations found on web-sites such as Garmin and Strava.  This allows us to compare and validate the output from Python with external web-sites.


### Cycling Analytics Over Time

It is very common to report cycling metrics over time.  The time based report needs to tally with the distance report - so waiting times, break times and other non-moving times are filtered out.  Filtering out these non-moving times also reconciles the report with the data the cyclist saw live during the ride.  It is common for external web-sites to report time this way.  The report below is produced from the Python system.  The summary stats do not change - however the horizontal axis is now activity time in hours, minutes and seconds.  The maximum time is the actual time spent cycling during the event.

<img src="summary_activity_time.png">
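
A minimal sketch of how such an activity time axis could be built with pandas is shown below: the non-moving samples are dropped and the remaining time steps are accumulated. The moving threshold, the `add_activity_time` helper and the data are assumptions for illustration, not the code behind the report above.

```python
import pandas as pd


def add_activity_time(df, min_speed=0.5):
    """Keep only moving samples and add a cumulative activity_time column."""
    dt = df["time"].diff().dt.total_seconds().fillna(0.0)
    moving = df["speed"] >= min_speed
    out = df.copy()
    # Only intervals that end in a moving sample add to the activity clock.
    out["activity_time"] = pd.to_timedelta(dt.where(moving, 0.0).cumsum(), unit="s")
    return out[moving]


# Made-up samples, one per minute, with a wait at the start and a mid-ride stop.
df = pd.DataFrame({
    "time": pd.to_datetime(list(range(0, 720, 60)), unit="s"),
    "speed": [0, 0, 0, 5, 6, 5, 0, 0, 0, 0, 6, 5],
})
ride = add_activity_time(df)
print(ride[["time", "activity_time"]])
```

Here twelve minutes of samples collapse to five minutes of activity time. Plotting the metrics against `activity_time` rather than `time` removes the waits from the horizontal axis.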



## Customized Analysis of Athlete Data

The purpose of using Python to analyse cycling data is to deliver personalized insights into activity data for the serious athlete.  

### Cycling Data showing local time and breaks
<img src="summary_local_time.png">
