---
output:
  github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
## vbvs.concurrent: Fitting Methods for the Functional Linear Concurrent Model
[![](https://travis-ci.org/jeff-goldsmith/vbvs.concurrent.svg?branch=master)](https://travis-ci.org/jeff-goldsmith/vbvs.concurrent)
[![codecov.io](https://codecov.io/gh/jeff-goldsmith/vbvs.concurrent/coverage.svg?branch=master)](https://codecov.io/gh/jeff-goldsmith/vbvs.concurrent?branch=master)
[![status](http://joss.theoj.org/papers/a3884174f17dbb9a695f7b8658887eff/status.svg)](http://joss.theoj.org/papers/a3884174f17dbb9a695f7b8658887eff)
```{r echo=FALSE}
knitr::opts_chunk$set(
  comment = '#>',
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  eval = TRUE,
  cache = FALSE
)
```
Author: Jeff Goldsmith
License: [GPL-3](https://opensource.org/licenses/GPL-3.0)
Version: 0.1
---------------
Functional data analysis is concerned with understanding measurements made over time, space, frequencies, and other domains for multiple subjects. Given the ubiquity of wearable devices, it is common to obtain several data streams in parallel on study participants, monitoring blood pressure, physical activity, heart rate, location, and other quantities. Each of these data streams can be thought of as functional data, and the functional linear concurrent model relates the value of an outcome at each point in the domain to the values of the predictors at that same point.
This package implements two statistical methods (with and without variable selection) for estimating the parameters in the functional linear concurrent model; these methods are described in detail [here](http://jeffgoldsmith.com/Downloads/VBVS.pdf). Additional functions create predictions based on parameter estimates, extract model coefficients, and choose tuning parameters via cross-validation. Interactive visualizations are supported through the [refund.shiny](https://github.com/refunders/refund.shiny) package.
### Installation
---------------
You can install the latest version directly from GitHub with [devtools](https://github.com/hadley/devtools):
```{r, results='hide', eval=FALSE}
install.packages("devtools")
devtools::install_github("jeff-goldsmith/vbvs.concurrent")
```
Interactive plotting is implemented through [refund.shiny](https://github.com/refunders/refund.shiny), which can be installed from CRAN or GitHub.
### Example of use
---------------
The code below simulates a dataset under the functional linear concurrent model. For each of 50 subjects, two predictor functions and a response function are observed at 20 random times between 0 and 1. The predictors and the coefficients that relate them to the response vary over time.
```{r simulate_data}
library(tidyverse)

## set design elements
set.seed(1)
I = 50   # number of subjects
p = 2    # number of predictors

## coefficient functions
beta1 = function(t) { 1 }
beta2 = function(t) { cos(2*t*pi) }

## generate subjects and observation times
concurrent.data =
  data.frame(
    subj = rep(1:I, each = 20)
  ) %>%
  mutate(time = runif(n())) %>%
  arrange(subj, time) %>%
  group_by(subj) %>%
  ## subject-level random scalings and offsets; noise is drawn per observation
  mutate(Cov_1 = runif(1, .5, 1.5) * sin(2 * pi * time),
         Cov_2 = runif(1, 0, 1) + runif(1, -.5, 2) * time,
         Y = Cov_1 * beta1(time) +
           Cov_2 * beta2(time) +
           rnorm(n(), 0, .5)) %>%
  ungroup()
```
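
For reference, the simulation above corresponds to the following instance of the functional linear concurrent model. The symbols $X_{i1}$, $X_{i2}$, and $\epsilon_i$ are notation introduced for this note; they map to `Cov_1`, `Cov_2`, and the `rnorm` noise term in the code.

$$
Y_i(t) = X_{i1}(t)\,\beta_1(t) + X_{i2}(t)\,\beta_2(t) + \epsilon_i(t), \qquad \epsilon_i(t) \sim N(0,\, 0.5^2),
$$

with $\beta_1(t) = 1$ and $\beta_2(t) = \cos(2\pi t)$, for subjects $i = 1, \ldots, 50$ and $t \in [0, 1]$. The noise terms are drawn independently for each observation.
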
The plot below shows the predictors and the response, highlighting four subjects.
```{r plot_data, fig.align='center', fig.height=3, fig.width=9, echo=FALSE}
library(gridExtra)

plot_function = function(value_var, data) {
  ggplot(data, aes_string(x = "time", y = value_var, group = "subj")) +
    geom_path(alpha = .1) +
    theme_bw() +
    geom_path(data = filter(data, subj %in% 1:4) %>% mutate(subj = as.factor(subj)),
              aes(color = subj)) +
    theme(legend.position = "none")
}

panels = lapply(c("Cov_1", "Cov_2", "Y"), plot_function, data = concurrent.data)
grid.arrange(panels[[1]], panels[[2]], panels[[3]], nrow = 1)
```
To fit the functional linear concurrent model, we can use `vb_concurrent`. Alternatively, we can use `vbvs_concurrent`, which is similar but adds variable selection to the estimation approach.
```{r fit}
library(vbvs.concurrent)

fit_vb = vb_concurrent(Y ~ Cov_1 + Cov_2 | time, id.var = "subj", data = concurrent.data,
                       t.min = 0, t.max = 1, standardized = TRUE)
fit_vbvs = vbvs_concurrent(Y ~ Cov_1 + Cov_2 | time, id.var = "subj", data = concurrent.data,
                           t.min = 0, t.max = 1, standardized = TRUE)
```
The plot below compares the true coefficient functions with the estimates from `vb_concurrent`, which does not use variable selection.
```{r coefficients, echo=FALSE, fig.height=3, fig.width=9}
coefs = coef(fit_vb, t.new = seq(0, 1, length = 101)) %>%
  mutate(True_1 = beta1(time),
         True_2 = beta2(time)) %>%
  gather(key, value, Cov_1:True_2) %>%
  extract(key, c("Estimate", "Beta"), "([[:alnum:]]+)_([[:alnum:]]+)")

ggplot(coefs, aes(x = time, y = value, color = Estimate)) +
  geom_path() +
  facet_wrap(~Beta) +
  theme_bw()
```
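
The comparison above is visual; a numeric summary can complement it. The sketch below defines a hypothetical helper, `ise`, which is not part of vbvs.concurrent. It approximates the integrated squared error between an estimated and a true coefficient function on a common time grid using the trapezoidal rule. The example call uses a stand-in curve rather than a fitted model so that it runs on its own.

```r
# Hypothetical helper (not part of vbvs.concurrent): integrated squared error
# between an estimated and a true coefficient function, approximated with the
# trapezoidal rule over the supplied time points.
ise = function(time, estimate, truth) {
  stopifnot(length(time) == length(estimate), length(time) == length(truth))
  ord = order(time)
  time = time[ord]
  sq_err = (estimate[ord] - truth[ord])^2
  sum(diff(time) * (head(sq_err, -1) + tail(sq_err, -1)) / 2)
}

# Stand-in for an estimated curve: the true beta2 shifted by a constant 0.1,
# so the squared error is 0.01 at every point of [0, 1].
grid = seq(0, 1, length.out = 101)
beta2 = function(t) { cos(2 * t * pi) }
ise(grid, beta2(grid) + 0.1, beta2(grid))  # approximately 0.01
```

With a fitted model, one might call `coefs = coef(fit_vb, t.new = grid)` and then `ise(coefs$time, coefs$Cov_2, beta2(coefs$time))`, assuming the returned data frame has `time` and `Cov_2` columns as the plotting code suggests.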
Interactive graphics show observed data, coefficient functions, and residual curves. The code below will produce such a graphic.
```{r interactive_plot, eval=FALSE}
library(refund.shiny)
plot_shiny(fit_vb)
plot_shiny(fit_vbvs)
```
### Contributions
---------------
If you find bugs or larger issues, or have suggestions, please report them using the [issue tracker](https://github.com/jeff-goldsmith/vbvs.concurrent/issues) or email the maintainer at <jeff.goldsmith@columbia.edu>. Contributions (via pull requests or otherwise) are welcome.