---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "80%",
fig.width = 9,
fig.height = 9
)
```
# GoldenCheetahOpenData
<!-- badges: start -->
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/GoldenCheetahOpenData)](https://cran.r-project.org/package=GoldenCheetahOpenData)
[![R build status](https://github.com/ikosmidis/GoldenCheetahOpenData/workflows/R-CMD-check/badge.svg)](https://github.com/ikosmidis/GoldenCheetahOpenData/actions)
[![Travis build status](https://travis-ci.com/ikosmidis/GoldenCheetahOpenData.svg?branch=master)](https://travis-ci.com/ikosmidis/GoldenCheetahOpenData)
[![Codecov test coverage](https://codecov.io/gh/ikosmidis/GoldenCheetahOpenData/branch/master/graph/badge.svg)](https://codecov.io/gh/ikosmidis/GoldenCheetahOpenData?branch=master)
[![Licence](https://img.shields.io/badge/licence-GPL--2-blue.svg)](https://www.gnu.org/licenses/gpl-2.0.en.html)
[![Licence](https://img.shields.io/badge/licence-GPL--3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0.en.html)
<!-- badges: end -->
## Introduction
The **GoldenCheetahOpenData** R package provides methods for querying
the GoldenCheetah OpenData project database
([doi:10.17605/OSF.IO/6HFPZ](https://doi.org/10.17605/OSF.IO/6HFPZ)),
downloading workout data from it and managing
local workout databases. Methods are also provided for the
organization of the workout data into `trackeRdata` objects for
further data analysis and modelling in R using the infrastructure
provided by the [**trackeR**](https://CRAN.R-project.org/package=trackeR) R package.
## Installation
You can install the released version of GoldenCheetahOpenData from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("GoldenCheetahOpenData")
```
The development version can be installed directly from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("ikosmidis/GoldenCheetahOpenData")
```
## Workflow
**GoldenCheetahOpenData** implements a simple query-download-read workflow that allows users to gradually build and maintain a local repository with workouts from the GoldenCheetah OpenData project.
### Querying GoldenCheetah OpenData project mirrors
The first step in the **GoldenCheetahOpenData** R package workflow is to query the mirrors of the GoldenCheetah OpenData project for the available IDs. This is done with a call to `get_athlete_ids()`, which will create a `gcod_db` object.
```{r athletes, cache = TRUE}
library("GoldenCheetahOpenData")
ids <- get_athlete_ids()
class(ids)
```
### `gcod_db` objects
`gcod_db` objects have two perspectives: a remote perspective describing the state of the GoldenCheetah OpenData project's database, and a local perspective describing the state of any local database (more on this later).
`gcod_db` objects that are produced by `get_athlete_ids()` have no records in their local perspective because none have been downloaded.
```{r output}
print(ids, txtplot = TRUE)
```
The output above also gives us a quick snapshot of the GoldenCheetah OpenData project's database. It currently has
```{r size}
format(total_size(ids), unit = "auto")
```
worth of *compressed* workout sessions for
```{r n_ids}
n_ids(ids, perspective = "remote")
```
athletes. And it keeps growing!
### Downloading workouts
`gcod_db` objects can be passed directly to the `download_workouts()` function to get the workouts for all athlete IDs in the remote perspective, or for just a few of them. Below we only get the workouts for those athletes whose IDs contain **b7-9** (as a big fan of the [Star Trek Voyager series](https://en.wikipedia.org/wiki/Star_Trek:_Voyager), this choice made sense to me!).
```{r download, cache = TRUE}
ids_b79 <- download_workouts(ids,
pattern = "b7-9",
local_dir = "~/Downloads/GCOD-db/",
verbose = TRUE)
```
After the above code chunk was run, `download_workouts()` downloaded the requested workout data to my disk and placed it in the directory `~/Downloads/GCOD-db/`:
```{r dir}
dir("~/Downloads/GCOD-db")
```
The result from `download_workouts()` is a new `gcod_db` object, in which the local and remote perspectives of the original `gcod_db` object have been updated:
```{r ids_be79}
ids_b79
```
You may save your `gcod_db` objects, but this is not really necessary. They can be reconstructed using the `rebuild_db()` method, which reads the local workout database and makes appropriate queries to the GoldenCheetah OpenData project's mirrors:
```{r re_ids_b79}
ids_dir <- rebuild_db("~/Downloads/GCOD-db")
ids_dir
ids_b79
athlete_id(ids_dir, perspective = "local")
athlete_id(ids_b79, perspective = "local")
athlete_id(ids_dir, perspective = "remote")
athlete_id(ids_b79, perspective = "remote")
```
Caution! A careless call to `download_workouts()` can easily instruct R to start downloading all workouts from the **GoldenCheetah OpenData** project, which is rarely what you want. Instead, I recommend downloading only a few at a time. This can be done in various ways, including using the `prefix` argument of `get_athlete_ids()`, the `pattern` argument of `download_workouts()` (as above) or (a bit more advanced) by directly subsetting the `gcod_db` object:
```{r subset}
ids_sub <- subset(ids, subset = grepl("b7-9", athlete_id(ids)), perspective = "remote")
athlete_id(ids_sub)
```
In this way, whenever you rebuild the `gcod_db` object from the local database, you get access to all the workout files you have ever downloaded.
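As a rough illustration of what pattern-based selection does, the sketch below applies base R's `grepl()` to a few athlete IDs. Only the first ID appears elsewhere in this README; the other two are invented for illustration, and no **GoldenCheetahOpenData** function is called.

```r
## Hypothetical athlete IDs; only the first one appears elsewhere in this README
example_ids <- c("af3ab0e9-fc82-43b7-9d5b-60d496b77d70",
                 "00000000-0000-4000-8000-000000000001",
                 "11111111-2222-43b7-9aaa-333333333333")
## TRUE for IDs that contain the literal string "b7-9"
matches <- grepl("b7-9", example_ids, fixed = TRUE)
## Keep only the matching IDs
selected_ids <- example_ids[matches]
selected_ids
```

Using `fixed = TRUE` treats the pattern as a literal string rather than a regular expression, which is a choice made for this sketch and not necessarily how `download_workouts()` interprets `pattern`.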
### Reading workouts
Reading workouts involves: extracting the workout archives for each athlete ID, reading all the `.csv` files in the extracted directories, wrangling the information in them (e.g. inferring the workout timestamps, carrying out data quality checks, imputation, etc), and organizing the resulting data into objects that can be used for further analyses. The `read_workouts()` method can do all the above from a unified interface. For example (and this takes a while):
```{r extract, warning = FALSE, cache = TRUE}
b79 <- read_workouts(ids_b79)
```
By default, `read_workouts()`:
- does not overwrite existing directories with extracted workouts (which can be bypassed by setting `overwrite = TRUE`)
- deletes the extracted directories after everything has been read (which can be bypassed by setting `clean_db = FALSE`)
- writes the processed `trackeRdata` objects (see `?saveRDS`) in the same directory as the workout archives, using the convention `<athlete_id>.rds`.
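Because the processed objects are stored with `saveRDS()` under the `<athlete_id>.rds` convention, they can in principle be read back later with `readRDS()`. The sketch below shows the idea with base R only; the temporary directory, the file name `example-athlete-id.rds` and the placeholder object are all hypothetical stand-ins for a real local database and real `trackeRdata` objects.

```r
## Stand-in for a local workout directory (a temporary directory here)
db_dir <- file.path(tempdir(), "gcod-rds-example")
dir.create(db_dir, showWarnings = FALSE)
## Placeholder object; in practice this would be a trackeRdata object
saveRDS(list(note = "placeholder"),
        file.path(db_dir, "example-athlete-id.rds"))

## Locate the saved files and read them back into a list
rds_files <- list.files(db_dir, pattern = "\\.rds$", full.names = TRUE)
stored <- lapply(rds_files, readRDS)
## Name each element by the file name without its extension
names(stored) <- tools::file_path_sans_ext(basename(rds_files))
names(stored)
```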
## Analyses of workout data using `trackeR`
`b79` is now a list of `trackeRdata` objects, and **trackeR** can be used for exploration.
```{r trackeRdata}
library("trackeR")
## Reading was not possible for some athlete IDs (see `warnings`)
which(is.na(b79))
## so we remove them
b79 <- b79[!is.na(b79)]
## number of workout sessions per athlete ID
sapply(b79, nsessions)
## total duration per athlete ID in hours
sapply(b79, function(x) sum(session_duration(x, duration_unit = "h")))
```
Let's explore further the workout sessions for athlete ID `af3ab0e9-fc82-43b7-9d5b-60d496b77d70`
```{r trackeRdata1, fig.height = 7}
athlete1 <- b79[["af3ab0e9-fc82-43b7-9d5b-60d496b77d70"]]
## Number of sessions
nsessions(athlete1)
## Total workout duration
athlete1_duration <- session_duration(athlete1)
sum(athlete1_duration)
## Only keep workout sessions with duration more than 10 min
athlete1 <- athlete1[athlete1_duration > 10/60]
```
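The threshold `10/60` in the chunk above expresses ten minutes as a fraction of an hour, which assumes that `session_duration()` reports durations in hours when `duration_unit` is not given. A minimal sketch of the same filter on plain numbers (the durations below are made up, not real data):

```r
## Hypothetical session durations in hours (not real data)
durations_h <- c(0.05, 0.50, 1.25, 0.10)
## Ten minutes expressed in hours
threshold_h <- 10 / 60
## TRUE for sessions longer than ten minutes
keep <- durations_h > threshold_h
durations_h[keep]
```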
The workout timeline and some workout views can be easily produced using methods from the **trackeR** R package
```{r trackeRdata1.1}
## Training times
timeline(athlete1)
## Power and heart_rate for the 40th to 45th workout
plot(athlete1, session = 40:45, what = c("power", "heart_rate"))
```
We can also compute and visualize summaries for the workout sessions (see Section 5.2 of the [**trackeR** vignette](https://cran.r-project.org/package=trackeR/vignettes/trackeR.pdf) for details)
```{r trackeRdata2, fig.height = 7}
athlete1_summaries <- summary(athlete1)
## Choose some features (see `?trackeR::plot.trackeRdataSummary`
## for the names of the available summaries, or
## names(data.frame(athlete1_summaries)))
features <- c("duration", "distance", "avgPower", "avgHeartRate", "total_elevation_gain", "wrRatio")
## Plot each feature longitudinally
plot(athlete1_summaries, what = features)
```
and explore the relationships between those summaries
```{r trackeRdata2.1}
## Plot all pairs of features
plot(data.frame(athlete1_summaries)[features])
```
And a slightly more advanced analysis: the power concentration profiles (see Section 5.5 of the [**trackeR** vignette](https://cran.r-project.org/package=trackeR/vignettes/trackeR.pdf) for details and the definition of concentration profiles) for this athlete ID are
```{r trackeRdata3, fig.height = 7}
athlete1_cp <- concentration_profile(athlete1, what = c("power"))
plot(athlete1_cp, multiple = TRUE)
```
and a [functional PCA](https://en.wikipedia.org/wiki/Functional_principal_component_analysis) on them shows that the first 2 eigenfunctions explain about 90% of the variation in the concentration profiles
```{r}
athlete1_fpca <- funPCA(athlete1_cp, what = "power", nharm = 5)
round(athlete1_fpca$varprop[1:2] * 100, 2)
```
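To make "proportion of variation explained" concrete, here is a minimal sketch that uses ordinary PCA (`prcomp()` from base R) on simulated curves evaluated on a grid. It is a stand-in for, not a reproduction of, the functional PCA performed by `funPCA()`; the simulated data and all object names are hypothetical.

```r
set.seed(1)
grid <- seq(0, 1, length.out = 50)
n_curves <- 30
## Simulated curves: two dominant modes of variation plus small noise
a <- rnorm(n_curves, sd = 2)
b <- rnorm(n_curves, sd = 1)
curves <- outer(a, sin(2 * pi * grid)) + outer(b, cos(2 * pi * grid)) +
  matrix(rnorm(n_curves * length(grid), sd = 0.1), n_curves, length(grid))
## Ordinary PCA on the discretised curves (rows are curves, columns are grid points)
pca <- prcomp(curves, center = TRUE, scale. = FALSE)
## Proportion of total variance attributed to each component
var_prop <- pca$sdev^2 / sum(pca$sdev^2)
round(var_prop[1:2] * 100, 2)
```

By construction, the first two components capture most of the variance here, which mirrors the way `varprop` is read in the chunk above.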
The principal components can then be used for further analyses or as features in further modelling
```{r}
## Check which sessions have power data
has_power <- !is.na(athlete1_summaries$avgPower)
## Scatterplots of the first harmonic against session summaries (high correlation with distance and duration)
plot(cbind(data.frame(athlete1_summaries)[has_power, features], PC1 = athlete1_fpca$scores[, 1]))
## Scatterplots of the second harmonic against session summaries
plot(cbind(data.frame(athlete1_summaries)[has_power, features], PC2 = athlete1_fpca$scores[, 2]))
## etc
```
## Issues
Please use the **GoldenCheetahOpenData** [GitHub issue
page](https://github.com/ikosmidis/GoldenCheetahOpenData/issues) to
report any issues or suggest enhancements or improvements.
## Code of Conduct
Please note that this project is released with a [Contributor Code of Conduct](https://github.com/ikosmidis/GoldenCheetahOpenData/blob/master/CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms.
## References and resources
See the [**trackeR** R package CRAN
page](https://cran.r-project.org/package=trackeR) for vignettes on the
use of trackeR.
Frick, H., Kosmidis, I. (2017). trackeR: Infrastructure for Running
and Cycling Data from GPS-Enabled Tracking Devices in R. *Journal
of Statistical Software*, **82**(7),
1--29. [doi:10.18637/jss.v082.i07](https://doi.org/10.18637/jss.v082.i07)
Liversedge, M. (2020). GoldenCheetah OpenData
Project. OSF. [doi:10.17605/OSF.IO/6HFPZ](https://doi.org/10.17605/OSF.IO/6HFPZ)