The goal of flexl is to implement flexible models for simple longitudinal data. Given repeated observations on different subjects, we model how the mean response for each subject varies over time.
Writing $y_{ij}$ for the observation on subject $i$ at time point $t_{ij}$, we model $y_{ij} = \mu_i(t_{ij}) + \epsilon_{ij}$, where $\mu_i(\cdot)$ is a function describing how the mean response for subject $i$ varies over time, and the $\epsilon_{ij}$ are independent error terms. Our interest is in estimating the subject-specific mean functions $\mu_i(\cdot)$. In doing this, we allow flexible dependence on time, and make use of similarities between these curves to improve estimation.
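To make the structure of this model concrete, the following base R sketch simulates data as a subject-specific mean curve evaluated at the observation time, plus independent noise. It is only an illustration: the mean curve `mu()`, the normal errors and all parameter values are assumptions made here, and this is not how `simulate_1dv()` generates data. The column names `c`, `x`, `y` and `mu_c` follow the ones used in the example data later in this document.

```r
# Sketch only: the mean curves and error distribution below are made up for
# illustration; they are not taken from flexl or from simulate_1dv().
set.seed(1)

n_subjects <- 3   # hypothetical number of subjects
n_obs <- 10       # hypothetical number of observations per subject
sigma <- 0.1      # hypothetical error standard deviation

# Hypothetical subject-specific mean curve: a shared shape plus a
# subject-level shift, so the curves are similar but not identical.
mu <- function(subject, time) sin(2 * pi * time) + 0.2 * subject

# One row per observation, with subject identifier c
sim <- expand.grid(obs = seq_len(n_obs), c = seq_len(n_subjects))
sim$x <- runif(nrow(sim))                          # observation times
sim$mu_c <- mu(sim$c, sim$x)                       # true subject-specific mean
sim$y <- sim$mu_c + rnorm(nrow(sim), sd = sigma)   # mean plus independent error
sim <- sim[, c("c", "x", "y", "mu_c")]
```

Because the curves share a common shape, information from one subject is informative about the others, which is the kind of similarity the model is designed to exploit.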
Details of the methods used by flexl are given in this preprint.
flexl can be installed from GitHub using devtools:
devtools::install_github("heogden/flexl")
For demonstration purposes, we use some simulated longitudinal data, with 20 subjects and 10 observations per subject (see Section 5.4 of the preprint for details of how the data are generated).
library(flexl)
data_full <- simulate_1dv(1, -0.5, 0.1, 0.5, 0.1, 20, 10)
data <- data_full$data
The data is in the format needed by flexl: it contains columns `c` (an identifier for the subject), `x` (the time the observation was made) and `y` (the response).
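If your own data are stored with one row per subject and one column per visit, they need to be stacked into this long format first. The sketch below shows one way to do that in base R. The data frame `wide`, its column names and the visit times are hypothetical, and using integer subject identifiers is an assumption: the type that flexl expects for `c` is not stated here.

```r
# Hypothetical wide-format data: one row per subject, one column per visit.
wide <- data.frame(
  id = 1:3,
  week0 = c(1.2, 0.8, 1.5),
  week4 = c(1.0, 0.9, 1.1),
  week8 = c(0.7, 1.1, 0.9)
)

# Hypothetical mapping from visit column to observation time.
visit_times <- c(week0 = 0, week4 = 4, week8 = 8)

# Stack into one row per observation, using the column names c, x and y.
my_data <- data.frame(
  c = rep(wide$id, times = length(visit_times)),
  x = rep(unname(visit_times), each = nrow(wide)),
  y = unlist(wide[names(visit_times)], use.names = FALSE)
)

# Sort by subject and then by time for readability.
my_data <- my_data[order(my_data$c, my_data$x), ]
```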
We can plot the data:
library(tidyverse)
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(vars(c), ncol = 5) +
  xlab("Time") +
  ylab("Response")
We can then fit the model using flexl:
mod <- fit_flexl(data)
We can then plot the estimated subject-specific mean curves, with pointwise 95% confidence intervals. Since the data are simulated, we can compare the fitted mean curves against the true mean curves.
pred_data <- data_full$pred_data %>%
  mutate(mu_c_hat = predict_flexl(mod,
                                  newdata = list(x = x, c = c),
                                  interval = TRUE))
ggplot(pred_data, aes(x = x)) +
  geom_line(aes(y = mu_c_hat$estimate)) +
  geom_ribbon(aes(ymin = mu_c_hat$lower, ymax = mu_c_hat$upper),
              alpha = 0.3) +
  geom_line(aes(y = mu_c), linetype = "dashed") +
  facet_wrap(vars(c), ncol = 5) +
  xlab("Time") +
  ylab("Subject-specific mean")
In this simulated example, the fitted mean curves (solid lines) and the true mean curves (dashed lines) match closely. The confidence intervals are narrow in this case, but contain the true mean most of the time.
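The statement that the intervals contain the true mean most of the time can be checked numerically by computing the proportion of prediction points at which the true mean lies inside the interval. The sketch below does this on a small made-up data frame `toy`, with hypothetical `lower` and `upper` columns standing in for the interval bounds; none of the numbers come from the fitted model.

```r
# Toy stand-in for the prediction data; all values are made up for illustration.
toy <- data.frame(
  c = rep(1:2, each = 3),
  x = rep(c(0, 0.5, 1), times = 2),
  mu_c = c(0.10, 0.40, 0.90, 0.00, 0.35, 0.70),    # true means
  lower = c(0.05, 0.30, 0.95, -0.10, 0.30, 0.60),  # lower interval bounds
  upper = c(0.20, 0.50, 1.10, 0.10, 0.45, 0.80)    # upper interval bounds
)

# TRUE where the true mean lies inside the interval at that point.
toy$covered <- toy$mu_c >= toy$lower & toy$mu_c <= toy$upper

# Proportion of points covered, overall and separately for each subject.
overall_coverage <- mean(toy$covered)
coverage_by_subject <- tapply(toy$covered, toy$c, mean)
```

With the `pred_data` used for the plot, the same comparison would presumably use `mu_c_hat$lower` and `mu_c_hat$upper` in place of `lower` and `upper`. Since neighbouring points on the same curve are strongly dependent, this proportion is best read as a rough diagnostic rather than a formal estimate of coverage.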