# Table of Content

- [Table of Content](#table-of-content)
- [0-General](#0-general)
  - [0-Introduction](#0-introduction)
  - [0-Objective](#0-objective)
  - [0-Analysis](#0-analysis)
  - [0-Data Dictionary](#0-data-dictionary)
  - [0-Acknowledgements](#0-acknowledgements)
- [1-Sample Size Calculation](#1--sample-size-calculation)

## 0-Acknowledgements
[Back to Table of Content](#table-of-content)

We extend our gratitude to the entire research group of [Prof. Dr. med. Seifert](https://kinderonkologie.charite.de/forschung/ag_seifert/team/), and to all participating kindergartens and participants, for their invaluable contributions to this study. We also thank the funding agencies. Statistical analysis was conducted by Dr. Steven Schepanski, who also oversaw this notebook.

In [None]:
# install packages (only needed once per environment)
install.packages(c("dplyr", "ggplot2", "tidyr", "readr", "purrr", "stringr",
                "lubridate", "data.table", "plyr", "janitor", "reshape2", "readxl",
                "mice", "gridExtra", "rlang", "pwr"))

In [None]:
# load packages
# plyr is loaded before dplyr so that dplyr's verbs (summarise, mutate, ...) are not masked
library(plyr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(readr)
library(purrr)
library(stringr)
library(lubridate)
library(data.table)
library(janitor)
library(reshape2)
library(readxl)
library(mice)
library(gridExtra)
library(rlang)

# 1- Sample Size Calculation
[Back to Table of Content](#table-of-content)

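This section estimates the required sample size in two ways. The first calculation is a two-sample t-test power analysis based on change scores reported in a referenced paper. The second derives Cohen's d from a reported mean difference and its 95% confidence interval, then adjusts for covariates in an ANCOVA-style calculation across a range of assumed \( R^2 \) values. Under the conservative assumption of \( R^2 = 0.10 \) and a 30% dropout rate, the second calculation yields 42 participants per group (84 in total).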

In [None]:
library(pwr)

In [None]:
# Reported results from the referenced paper
intervention_baseline_mean <- 86.6
intervention_baseline_sd <- 52.3
intervention_week_12_mean <- 55.8
intervention_week_12_sd <- 53.2
intervention_sd_change <- 37.1

control_baseline_mean <- 79.4
control_baseline_sd <- 61.4
control_week_12_mean <- 69.4
control_week_12_sd <- 67.2
control_sd_change <- 41.5


In [None]:
# Calculate the pooled standard deviation of change
pooled_sd_change <- sqrt(((intervention_sd_change^2) + (control_sd_change^2)) / 2)


In [None]:
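# Calculate the between-group difference in mean change from baseline
intervention_change <- intervention_week_12_mean - intervention_baseline_mean
control_change <- control_week_12_mean - control_baseline_mean
mean_diff <- intervention_change - control_change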


In [None]:
# Assumptions based on the previous study
effect_size <- mean_diff / pooled_sd_change
alpha <- 0.05         # Significance level
power <- 0.80         # Power of the study

In [None]:
# Calculate the sample size for each group
sample_size <- pwr.t.test(d = effect_size, sig.level = alpha, power = power, type = "two.sample")$n

In [None]:
print(sample_size)

In [None]:
# Adjust for 30% dropout: recruit enough that the required n remains after attrition
dropout_rate <- 0.30
adjusted_sample_size <- sample_size / (1 - dropout_rate)

In [None]:
# Round up to whole participants; pwr.t.test() returns n per group
sample_size_per_group <- ceiling(adjusted_sample_size)
total_sample_size <- 2 * sample_size_per_group

In [None]:
# Output the results
cat("Total sample size required: ", total_sample_size, "\n")
cat("Sample size per group: ", sample_size_per_group, "\n")
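As a cross-check, the same two-sample calculation can be sketched with base R's `power.t.test()` from the `stats` package, which takes the raw mean difference and standard deviation rather than a standardized effect size. The sketch assumes that the effect of interest is the between-group difference in change from baseline, that both groups are the same size, and that 30% of participants drop out. The variable names are illustrative and not part of the original analysis.

```r
# Change from baseline in each group (values as reported in the referenced paper)
change_intervention <- 55.8 - 86.6
change_control <- 69.4 - 79.4

# Between-group difference in change and pooled SD of change (equal group sizes assumed)
diff_in_change <- change_intervention - change_control
sd_change <- sqrt((37.1^2 + 41.5^2) / 2)

# power.t.test() solves for n per group; abs() is used because only the magnitude matters
check <- power.t.test(delta = abs(diff_in_change), sd = sd_change,
                      sig.level = 0.05, power = 0.80, type = "two.sample")
n_per_group <- ceiling(check$n)

# Inflate so that the target n remains after an assumed 30% dropout
n_per_group_adjusted <- ceiling(n_per_group / (1 - 0.30))
cat("Per group:", n_per_group, " adjusted:", n_per_group_adjusted,
    " total:", 2 * n_per_group_adjusted, "\n")
```

Because `power.t.test()` and `pwr.t.test()` both solve the noncentral t power equation, the two results should agree closely when given the same inputs.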

In [51]:
# Install required package
if (!requireNamespace("pwr")) install.packages("pwr")
library(pwr)

# Input data from the study
delta <- -0.72  # Mean difference (e.g., between groups)
ci_upper <- -0.32  # Upper limit of the confidence interval
ci_lower <- -1.12  # Lower limit of the confidence interval
n_omnivore <- 72  # Sample size for the omnivore group
n_vegan <- 52  # Sample size for the vegan group
z_value <- 1.96  # Z-value for 95% confidence level

# Step 1: Calculate Standard Error (SE)
se <- (ci_upper - ci_lower) / (2 * z_value)
cat("Standard Error (SE):", se, "\n")

# Step 2: Calculate Pooled Standard Deviation (SD_pool)
sd_pool <- se / sqrt((1 / n_omnivore) + (1 / n_vegan))
cat("Pooled Standard Deviation (SD_pool):", sd_pool, "\n")

# Step 3: Calculate Cohen's d
cohens_d <- abs(delta) / sd_pool
cat("Cohen's d:", cohens_d, "\n")

# Parameters for sample size calculation
alpha <- 0.05  # Significance level
power <- 0.80  # Desired power

# Candidate R^2 values (total variance explained by the baseline measure and the covariates)
R2_total_values <- seq(0.1, 0.5, by = 0.05)  # Total R^2 (baseline + covariates)

# Number of predictors in the model
u <- 5  # Group effect + 4 additional covariates (sex, SES, maternal BMI, parental education)

# Function to calculate sample size with additional covariates
calculate_sample_size_covariates <- function(R2_total, d, alpha, power, u) {
  f2 <- d^2 / (1 - R2_total)  # Adjusted effect size for ANCOVA
  pwr.f2.test(u = u, v = NULL, f2 = f2, sig.level = alpha, power = power)$v + 1
}

# Calculate sample sizes for varying total R^2
sample_sizes_covariates <- sapply(R2_total_values, calculate_sample_size_covariates, 
                                  d = abs(cohens_d), alpha = alpha, power = power, u = u)

# Adjust for 30% dropout
dropout_rate <- 0.30
adjusted_sample_sizes <- ceiling(sample_sizes_covariates / (1 - dropout_rate))

# Create a data frame with results
results_covariates <- data.frame(
  R2_Total = R2_total_values,
  Required_Sample_Size_per_Group = ceiling(sample_sizes_covariates),
  Adjusted_Sample_Size_per_Group = adjusted_sample_sizes
)

# Print the updated results
print(results_covariates)

Standard Error (SE): 0.2040816 
Pooled Standard Deviation (SD_pool): 1.121401 
Cohen's d: 0.6420539 
  R2_Total Required_Sample_Size_per_Group Adjusted_Sample_Size_per_Group
1     0.10                             29                             42
2     0.15                             28                             40
3     0.20                             26                             38
4     0.25                             25                             35
5     0.30                             23                             33
6     0.35                             22                             31
7     0.40                             20                             29
8     0.45                             19                             27
9     0.50                             17                             25


# Interpretation of the Results

The calculations show that for \( R^2 = 0.10 \), which assumes the covariates (baseline, biological sex, socioeconomic status, maternal BMI, and parental education) together explain only 10% of the variability in the outcome, the required sample size per group is 29 participants. Accounting for a 30% dropout rate, this increases to 42 participants per group, for a total of 84 participants. This conservative assumption minimizes the expected explanatory power of the covariates, ensuring the study is robust even if their actual impact is weaker than anticipated.

In practical terms, the study will need to recruit a total of 84 participants to maintain sufficient power (80%) to detect the observed moderate effect size (\( d = 0.642 \)) at a 5% significance level, even in the presence of attrition. By considering potential dropout, the design reduces the risk of underpowering the study due to participant loss, ensuring the results remain reliable and interpretable.
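The dropout adjustment described above can be reproduced with a few lines of arithmetic. The sketch below uses a hypothetical helper, `inflate_for_dropout()`, which is not part of the original notebook. It divides by the expected retention rate and contrasts that with the tempting shortcut of multiplying by `1 + dropout_rate`, which would leave too few completers.

```r
# Hypothetical helper: number to recruit so that n_required remain after dropout
inflate_for_dropout <- function(n_required, dropout_rate) {
  stopifnot(dropout_rate >= 0, dropout_rate < 1)
  ceiling(n_required / (1 - dropout_rate))
}

n_required <- 29      # per-group requirement reported for R^2 = 0.10
dropout_rate <- 0.30  # planned attrition

n_recruit <- inflate_for_dropout(n_required, dropout_rate)
cat("Recruit per group:", n_recruit, "\n")       # 42
cat("Total to recruit:", 2 * n_recruit, "\n")    # 84
cat("Expected completers per group:", n_recruit * (1 - dropout_rate), "\n")  # 29.4

# Shortcut for comparison: multiplying by (1 + dropout_rate) under-recruits
naive_recruit <- ceiling(n_required * (1 + dropout_rate))
cat("Naive recruit per group:", naive_recruit, "\n")                             # 38
cat("Naive expected completers:", naive_recruit * (1 - dropout_rate), "\n")      # 26.6
```

With 42 recruited per group and 30% attrition, about 29.4 completers are expected, which meets the requirement of 29. The shortcut leaves about 26.6.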

## Practical Implications for the Study

Recruitment efforts and resources should focus on achieving the adjusted target of 42 participants per group. Dropout mitigation strategies, such as regular follow-ups, minimizing participant burden, and offering appropriate incentives, will be essential to keep dropout within the planned margin of 30%. Including covariates like biological sex, SES, maternal BMI, and parental education enhances the precision of the study by reducing unexplained variability, allowing for a smaller sample size than would otherwise be required. 

The observed effect size (\( d = 0.642 \)) is moderate, suggesting a meaningful difference between the two groups. The inclusion of covariates ensures the sample size calculation accounts for their potential impact while remaining cautious in estimating their explanatory power.
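For reference, the effect size quoted above follows from the reported mean difference and its 95% confidence interval in three steps. The worked equations below restate the notebook's calculation, assuming a normal approximation for the interval (\( z = 1.96 \)) and the group sizes used in the code (72 and 52).

```latex
\begin{align*}
SE &= \frac{CI_{\text{upper}} - CI_{\text{lower}}}{2 \times 1.96}
    = \frac{-0.32 - (-1.12)}{3.92} \approx 0.204 \\[4pt]
SD_{\text{pool}} &= \frac{SE}{\sqrt{\tfrac{1}{n_{\text{omnivore}}} + \tfrac{1}{n_{\text{vegan}}}}}
    = \frac{0.204}{\sqrt{\tfrac{1}{72} + \tfrac{1}{52}}} \approx 1.121 \\[4pt]
d &= \frac{|\Delta|}{SD_{\text{pool}}} = \frac{0.72}{1.121} \approx 0.642
\end{align*}
```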

## Justification for Choosing \( R^2 = 0.10 \)

Taking \( R^2 = 0.10 \) is a conservative approach that ensures the study design is robust. This value reflects a minimal assumption about the explanatory power of the covariates, avoiding overconfidence in their contribution. By doing so, the study avoids the risk of underpowering due to an overestimated \( R^2 \). If the actual \( R^2 \) is higher (e.g., 0.3 or 0.5), the planned sample size will remain sufficient. This conservative assumption is particularly important when planning for covariates whose actual impact is uncertain, ensuring the reliability of the study's findings.
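The direction of this argument can be checked numerically. The sketch below reuses the notebook's own conversion, \( f^2 = d^2 / (1 - R^2) \), taken as given here, and shows that the implied effect size grows with \( R^2 \). The smallest \( R^2 \) on the grid therefore corresponds to the smallest \( f^2 \) and the largest required sample.

```r
# Cohen's d as reported in the notebook output
d <- 0.6420539

# Grid of assumed total R^2 values, a coarser version of the notebook's grid
R2_total <- seq(0.10, 0.50, by = 0.10)

# Notebook's conversion: the same d maps to a larger f^2 as R^2 grows
f2 <- d^2 / (1 - R2_total)
print(data.frame(R2_total = R2_total, f2 = round(f2, 3)))

# f^2 increases strictly with R^2, so R^2 = 0.10 gives the smallest effect size
# (and therefore the largest required sample) on this grid
stopifnot(all(diff(f2) > 0))
```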

## Conclusion

Under the conservative assumption of \( R^2 = 0.10 \), the study requires 42 participants per group (84 total), including a 30% allowance for dropout. This keeps the study adequately powered (80% at a 5% significance level) to detect the observed effect size (\( d = 0.642 \)) while accounting for attrition and for uncertainty in how much variance the covariates explain. The design balances statistical rigor with practical feasibility and supports reliable, interpretable results.
