Skip to content
Analyse your researcher degrees of freedom
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

Joachim Gassen

Researcher Degrees of Freedom Analysis

A package to explore and document your degrees of freedom

This in-development package provides a coding infrastructure that allows researchers to systematically document and explore their researcher degrees of freedom when conducting analyses on observational data. The resulting code base is self-documenting, supports unit testing and power simulations based on simulated data. The documented researcher degrees of freedom can be exhausted to generate a distribution of outcome estimates.

To provide a quick tour we will construct a research design where an independent variable x is confounded by a co-variate z and where the only researcher degree of freedom is whether to control for z in the regression setup. We will ignore the testing bit in this quick walk-through.

For a more in-depth introduction into the package, please refer to the vignette included in the documentation.

Step 1: Open a new Rstudio project in a clean directory and install the rdfanalysis package


Step 2: Write a function that simulates data

sim_data <- function(n, effect_size) {
  z <- rnorm(n)
  x <- rnorm(n) + z 
  y <- effect_size*x + z + rnorm(n) 
  data.frame(x = x, y = y, z = z)

Step 3: Define your research design by a series of functions

Each research design consists of a series of steps. Each step becomes a function. To initialize these functions, you can use the call define_design(). It creates a code directory in your current working directory and produces template code for each step. In this case, our design will have only one step.

design <- define_design("est_model")

Step 4: Develop your code for each step

Edit the resulting template file est_model.R in the code directory until it looks like the code below.

est_model <- function(input = NULL, choice = NULL) {
  step_description <- c(
    "## Estimate model",
    "### Content",
    "This step estimates on OLS model based on simulated data."
  choice_description <- c(
    "### Choice",
    "A character value `control_for_z` that may take one of the following values:",
    "- `yes`: control for z",
    "- `no`: do not control for z"
  choice_type <- list(
    list(name = "control_for_z", 
         type = "character", 
         valid_values = c("yes", "no"))
  if (is.null(choice)) return(list(
    step_description = step_description,
    choice_description = choice_description,
    choice_type = choice_type
  )) else check_choice(choice, choice_type)

  # ___ Analysis code starts below ___
  if(choice[[1]] == "yes") 
    mod <- lm(y ~ x + z, data = input)
  else     mod <- lm(y ~ x, data = input)
    data = list(
      est = summary(mod)$coefficient[2,1],
      lb = confint(mod)[2,1],
      ub = confint(mod)[2,2]
    protocol = list(choice)

Step 5: Source your code


Step 6: Test your code

test_design(design, input = sim_data(100, 0.1), reporter = "minimal")
## ........................

Step 7: Document your design

The below serves documentation purposes. The function prepare_design_documentation() generates a PDF file in your project directory that documents your code. For it to work you need a local R Markdown installation that is capable to produce PDF files.

prepare_design_documentation(design, output_file = "my_design.pdf")

prepare_design_flow_chart() produces a quick visual of the code flow.

prepare_design_flow_chart(design, landscape = TRUE)

Step 8: Run a single protocol of choices

sim_data(100, 0.1) %>%
## $data
## $data$est
## [1] 0.236353
## $data$lb
## [1] 0.01058808
## $data$ub
## [1] 0.4621179
## $protocol
## $protocol[[1]]
## [1] "yes"

Step 9: Assess the power of your analysis for a certain protocol

power_df <- simulate_design_power(design, protocol = list("yes"), 
                                  input_sim_func = sim_data, 
                                  range_n = seq(100, 1000, 100),
                                  effect_size = 0.1)

power_df %>%
  group_by(n) %>%
  summarise(power = sum(lb > 0)/n()) %>%
  ggplot(aes(x = n, y = power)) +
  geom_line() + 

Step 10: Exhaust your researcher degrees of freedom

df <- exhaust_design(design, sim_data(1000, 0.1)) 
control_for_z est lb ub
yes 0.1014760 0.0394621 0.1634900
no 0.6016612 0.5489681 0.6543544

Only two researcher degrees of freedom in this setting but you will easily get into the thousands in a real research setting. For a real-life case study on how to use rdfanalysis, please refer to the vignette included in the documentation.

You can’t perform that action at this time.