# 🧪 Analysing a Simulated RCT with Baseline and Follow-up Data
This notebook walks through the analysis of a simulated RCT dataset with baseline and follow-up measurements for a primary outcome (systolic blood pressure, SBP) and for secondary outcomes (LDL cholesterol and whether an SBP target was reached).

**Goals:**
- Explore the structure of the dataset
- Perform a basic power analysis
- Estimate effect sizes
- Compare within- and between-group differences
- Analyse binary outcomes
- Use linear models to adjust for covariates


## 📥 Load and Inspect the Data

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

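# Load the simulated trial data
df = pd.read_csv('https://raw.githubusercontent.com/ggkuhnle/data-analysis/main/data/rct_realistic_with_baseline.csv')
# Change is defined as before minus after, so a positive value means a reduction
df['SBP_Change'] = df['SBP_Before'] - df['SBP_After']
df['LDL_Change'] = df['LDL_Before'] - df['LDL_After']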
df.head()
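
Beyond `df.head()`, a quick structural check can confirm group sizes, column types and missing values before any testing. The following is a minimal sketch: the helper `describe_structure` is a hypothetical name, and the toy data frame uses made-up values that only mimic the column names seen above.

```python
import pandas as pd

def describe_structure(df: pd.DataFrame, group_col: str = "Group") -> dict:
    """Collect a few basic structural facts about a trial dataset."""
    return {
        "n_rows": len(df),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        "group_sizes": df[group_col].value_counts().to_dict(),
    }

# Toy stand-in for the real file (values are made up for illustration only)
toy = pd.DataFrame({
    "Group": ["Control", "Control", "Intervention", "Intervention"],
    "SBP_Before": [150.0, 142.0, 148.0, None],
    "SBP_After": [149.0, 140.0, 139.0, 135.0],
})

structure = describe_structure(toy)
print(structure["group_sizes"])  # participants per arm
print(structure["missing"])      # missing values per column
```

In the notebook itself, the same helper could be called as `describe_structure(df)`.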

## 🔋 Power Analysis

In [None]:
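from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()
effect_size = 0.5  # standardised difference (Cohen's d) to detect
power = 0.8        # desired probability of detecting the effect
alpha = 0.05       # two-sided significance level
# With nobs1 left unset, solve_power returns the sample size per group (equal allocation)
required_n = analysis.solve_power(effect_size=effect_size, power=power, alpha=alpha)
print(f"Estimated required sample size per group (d={effect_size}): {required_n:.1f}")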
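
The required sample size depends strongly on the assumed effect size, so it can help to repeat the calculation for several values of d and to round up to whole participants. The sketch below does this with the same `TTestIndPower` class; the helper `n_per_group` and the recruitment figure of 50 per group are illustrative assumptions, not part of the original analysis.

```python
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

def n_per_group(effect_size: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Per-group sample size for a two-sided, two-sample t-test, rounded up."""
    n = analysis.solve_power(effect_size=effect_size, power=power, alpha=alpha,
                             ratio=1.0, alternative="two-sided")
    return math.ceil(float(n))

# Sensitivity of the sample size to the assumed effect size
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {n_per_group(d)} per group")

# Power achieved if only 50 participants per group were recruited (hypothetical n)
achieved = analysis.power(effect_size=0.5, nobs1=50, alpha=0.05, ratio=1.0)
print(f"Power with n = 50 per group: {achieved:.2f}")
```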

## 📊 Are Groups Comparable at Baseline?

In [None]:
sns.boxplot(data=df, x='Group', y='SBP_Before')
plt.title('Baseline SBP by Group')
plt.show()
res = stats.ttest_ind(df[df['Group']=='Control']['SBP_Before'],
                      df[df['Group']=='Intervention']['SBP_Before'])
print(f"T-test SBP_Before: t = {res.statistic:.2f}, p = {res.pvalue:.3f}")
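
A significance test at baseline says little about how large any imbalance is. A common complement is the standardised mean difference (SMD), which expresses the gap between arms in units of the pooled standard deviation. The sketch below is one way to compute it; the function name and the toy values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def standardised_mean_difference(df, value_col, group_col="Group",
                                 groups=("Control", "Intervention")):
    """SMD = (mean of second group - mean of first group) / pooled SD."""
    a = df.loc[df[group_col] == groups[0], value_col].dropna()
    b = df.loc[df[group_col] == groups[1], value_col].dropna()
    # Simple average of the two variances (appropriate for similar group sizes)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (b.mean() - a.mean()) / pooled_sd

# Toy baseline data (made-up values, for illustration only)
toy = pd.DataFrame({
    "Group": ["Control"] * 3 + ["Intervention"] * 3,
    "SBP_Before": [140.0, 150.0, 160.0, 145.0, 155.0, 165.0],
})

# Descriptive baseline table per arm
print(toy.groupby("Group")["SBP_Before"].agg(["count", "mean", "std"]))

smd = standardised_mean_difference(toy, "SBP_Before")
print(f"SMD for SBP_Before: {smd:.2f}")
```

In the notebook, `standardised_mean_difference(df, 'SBP_Before')` would give the corresponding value for the trial data.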

## 🎯 Primary Outcome: Change in SBP

In [None]:
sns.boxplot(data=df, x='Group', y='SBP_Change')
plt.title('SBP Change by Group')
plt.show()
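# The test compares Control minus Intervention: because change is before minus after,
# a negative t means a larger mean reduction in the intervention group
t_stat, p_value = stats.ttest_ind(df[df['Group'] == 'Control']['SBP_Change'],
                                  df[df['Group'] == 'Intervention']['SBP_Change'])
print(f"T-test SBP Change: t = {t_stat:.2f}, p = {p_value:.3f}")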
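
The t-test gives a p-value but not the size of the effect. One way to estimate it is Cohen's d together with the raw mean difference and its confidence interval, both based on the pooled standard deviation. The sketch below assumes equal variances in the two arms; `cohens_d_with_ci` is a hypothetical helper and the input values are made up.

```python
import numpy as np
from scipy import stats

def cohens_d_with_ci(control, treated, confidence=0.95):
    """Return Cohen's d, the raw mean difference (treated - control) and its CI."""
    control = np.asarray(control, dtype=float)
    treated = np.asarray(treated, dtype=float)
    n1, n2 = len(control), len(treated)
    diff = treated.mean() - control.mean()
    # Pooled variance, weighting each arm by its degrees of freedom
    pooled_var = ((n1 - 1) * control.var(ddof=1)
                  + (n2 - 1) * treated.var(ddof=1)) / (n1 + n2 - 2)
    pooled_sd = np.sqrt(pooled_var)
    d = diff / pooled_sd
    # Standard error of the mean difference and the matching t quantile
    se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n1 + n2 - 2)
    return d, diff, (diff - t_crit * se, diff + t_crit * se)

# Made-up SBP changes in mmHg (before minus after), for illustration only
control_change = [1, 2, 3, 4, 5]
treated_change = [6, 7, 8, 9, 10]

d, diff, ci = cohens_d_with_ci(control_change, treated_change)
print(f"Cohen's d = {d:.2f}, mean difference = {diff:.1f} mmHg, "
      f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

With the trial data, the two inputs would be the `SBP_Change` values of the control and intervention groups.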

## 🧮 Linear Model Adjusting for Covariates

In [None]:
# Treatment effect on SBP change, adjusted for age and sex
model = smf.ols('SBP_Change ~ Group + Age + C(Sex)', data=df).fit()
print(model.summary())
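
An alternative to modelling the change score is ANCOVA: regress the follow-up value on group while adjusting for the baseline value. This is often preferred in trials because it accounts for regression to the mean. The sketch below uses simulated data with a known treatment effect of -8 mmHg; the data-generating values are assumptions chosen only so that the example runs on its own, and the column names mirror those used above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (all numbers are assumptions for illustration)
rng = np.random.default_rng(42)
n = 200
sim = pd.DataFrame({
    "Group": rng.choice(["Control", "Intervention"], size=n),
    "Age": rng.integers(40, 75, size=n),
    "Sex": rng.choice(["F", "M"], size=n),
    "SBP_Before": rng.normal(150, 10, size=n),
})
true_effect = -8.0  # assumed effect of the intervention on follow-up SBP
sim["SBP_After"] = (sim["SBP_Before"] - 2.0
                    + true_effect * (sim["Group"] == "Intervention")
                    + rng.normal(0, 5, size=n))

# ANCOVA: follow-up SBP on group, adjusted for baseline SBP, age and sex
ancova = smf.ols("SBP_After ~ C(Group) + SBP_Before + Age + C(Sex)", data=sim).fit()

# Control is the reference level, so this term is Intervention minus Control
term = "C(Group)[T.Intervention]"
estimate = ancova.params[term]
low, high = ancova.conf_int().loc[term]
print(f"Adjusted treatment effect: {estimate:.1f} mmHg (95% CI {low:.1f} to {high:.1f})")
```

Note the sign convention: here the outcome is follow-up SBP, so a negative coefficient indicates lower blood pressure in the intervention group, whereas `SBP_Change` above is positive for a reduction.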

## 🧪 Secondary Outcome: LDL Cholesterol Change

In [None]:
sns.boxplot(data=df, x='Group', y='LDL_Change')
plt.title('LDL Change by Group')
plt.show()
t_stat, p_value = stats.ttest_ind(df[df['Group'] == 'Control']['LDL_Change'],
                                  df[df['Group'] == 'Intervention']['LDL_Change'])
print(f"T-test LDL Change: t = {t_stat:.2f}, p = {p_value:.3f}")
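
The test above compares the two arms with each other. To look at within-group differences, each arm's baseline and follow-up values can be compared with a paired t-test. The following sketch applies `scipy.stats.ttest_rel` per group; `within_group_tests` is a hypothetical helper and the LDL values are made up.

```python
import pandas as pd
from scipy import stats

def within_group_tests(df, before_col, after_col, group_col="Group"):
    """Paired t-test of before vs after within each group."""
    rows = []
    for name, sub in df.groupby(group_col):
        # Keep only participants with both measurements
        pairs = sub[[before_col, after_col]].dropna()
        res = stats.ttest_rel(pairs[before_col], pairs[after_col])
        rows.append({
            "group": name,
            "n": len(pairs),
            "mean_change": (pairs[before_col] - pairs[after_col]).mean(),
            "t": res.statistic,
            "p": res.pvalue,
        })
    return pd.DataFrame(rows).set_index("group")

# Made-up LDL values, for illustration only
toy = pd.DataFrame({
    "Group": ["Control"] * 4 + ["Intervention"] * 4,
    "LDL_Before": [3.5, 3.8, 4.1, 3.9, 3.6, 4.0, 3.7, 4.2],
    "LDL_After":  [3.4, 3.9, 4.0, 3.9, 3.1, 3.6, 3.2, 3.8],
})

result = within_group_tests(toy, "LDL_Before", "LDL_After")
print(result)
```

A within-group change in the intervention arm alone is not evidence of a treatment effect; the between-group comparison remains the primary analysis in an RCT.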

## ✅ Binary Outcome: Reaching SBP Target

In [None]:
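# Cross-tabulate group against whether the SBP target was reached
ct = pd.crosstab(df['Group'], df['SBP_Target'])
# For a 2x2 table, chi2_contingency applies Yates' continuity correction by default
chi2, p, _, _ = stats.chi2_contingency(ct)
print(f"Chi-squared test: χ² = {chi2:.2f}, p = {p:.3f}")
ct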
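
The chi-squared test indicates whether the proportions differ, but not by how much. A risk difference and risk ratio make the size of the effect explicit. The sketch below assumes that `SBP_Target` is coded 1 for participants who reached the target and 0 otherwise, which is not confirmed by the dataset description; the helper name and the toy counts are also illustrative.

```python
import numpy as np
import pandas as pd

def risk_summary(df, outcome_col="SBP_Target", group_col="Group",
                 control="Control", treated="Intervention"):
    """Proportion reaching the target per arm, risk difference (Wald CI) and risk ratio."""
    reached = df[outcome_col].astype(int)  # assumes 1 = target reached, 0 = not reached
    risk = reached.groupby(df[group_col]).mean()
    n = reached.groupby(df[group_col]).size()
    p0, p1 = risk[control], risk[treated]
    rd = p1 - p0
    # Wald standard error for a difference of two independent proportions
    se = np.sqrt(p0 * (1 - p0) / n[control] + p1 * (1 - p1) / n[treated])
    return {
        "risk_control": p0,
        "risk_treated": p1,
        "risk_difference": rd,
        "rd_ci_95": (rd - 1.96 * se, rd + 1.96 * se),
        "risk_ratio": p1 / p0,
    }

# Made-up counts: 3 of 10 controls and 6 of 10 treated reach the target
toy = pd.DataFrame({
    "Group": ["Control"] * 10 + ["Intervention"] * 10,
    "SBP_Target": [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4,
})

summary = risk_summary(toy)
print(f"Risk difference: {summary['risk_difference']:.2f}, "
      f"risk ratio: {summary['risk_ratio']:.2f}")
```

The Wald interval is a simple approximation and can be poor for small samples or proportions near 0 or 1.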

## 📋 Summary
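- Power analysis suggests that about 64 participants per group are needed to detect d = 0.5 with 80% power at α = 0.05
- No significant baseline difference in SBP, which is consistent with successful randomisation (a non-significant test cannot confirm balance on its own)
- SBP change (before minus after) was significantly greater in the intervention group
- LDL change also shows a group effect
- More participants in the intervention group achieved the target SBP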

This notebook illustrates a realistic RCT analysis with baseline and follow-up measurements and covariate adjustment.