# 🩺 4.7 Analysing a Simulated Clinical Trial

In this notebook, we’ll walk through the analysis of a **simulated clinical trial**, a typical approach in nutrition and health research. You’ll simulate data, generate a "Table 1", explore distributions, calculate effect sizes (frequentist and Bayesian), and visualise outcomes.

**🎯 Objectives**  
- Simulate a clinical trial dataset  
- Create a "Table 1" for baseline characteristics  
- Inspect distributions  
- Calculate effect sizes (Cohen’s d & Bayesian posterior)  
- Visualise data

**🔬 Context**  
We simulate a trial comparing a biomarker between a Control and Intervention group (n=100). This is akin to testing a new dietary intervention versus a standard diet.

<details><summary>🦛 Fun Fact</summary>
Clinical trials are like hippos testing a new swimming spot: cautious, data-driven, and a little splashy!
</details>

In [None]:
import pandas as pd
import numpy as np
import ipywidgets as widgets
from IPython.display import display

participant_slider = widgets.IntSlider(value=100, min=50, max=200, step=10, description='Participants:')
display(participant_slider)

def simulate_data(n_participants):
    np.random.seed(42)
    df = pd.DataFrame({
        'participant_id': range(1, n_participants + 1),
        'age': np.random.normal(40, 10, n_participants),
        'bmi': np.random.normal(27, 4, n_participants),
        'group': np.random.choice([0, 1], size=n_participants)
    })
    df['outcome'] = np.where(
        df['group'] == 0,
        np.random.normal(0, 2, n_participants),
        np.random.normal(1, 2, n_participants)
    )
    return df

df = simulate_data(participant_slider.value)
df.head()
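Because `np.random.choice([0, 1], ...)` assigns each participant independently, the two arms will usually not contain exactly half the participants each. The sketch below contrasts that simple randomisation with a balanced allocation. The helper `allocate_balanced` is a hypothetical name introduced here for illustration and is not part of the notebook.

```python
import numpy as np

def allocate_balanced(n_participants, seed=42):
    """Hypothetical helper: split participants as evenly as possible, in random order."""
    rng = np.random.default_rng(seed)
    n_treated = n_participants // 2
    n_control = n_participants - n_treated
    # Build a vector with fixed numbers of 0s and 1s, then shuffle it
    labels = np.repeat([0, 1], [n_control, n_treated])
    return rng.permutation(labels)

# Simple randomisation: each participant is assigned independently
simple = np.random.default_rng(42).choice([0, 1], size=100)
# Balanced allocation: arm sizes are fixed in advance
balanced = allocate_balanced(100)

print("Simple randomisation counts:", np.bincount(simple, minlength=2))
print("Balanced allocation counts: ", np.bincount(balanced, minlength=2))
```

Unequal arm sizes are not an error, but they matter later when pooling standard deviations across groups.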

In [None]:
table1 = df.groupby('group')[['age', 'bmi']].agg(['mean', 'std']).round(1)
table1.columns = ['Age (Mean)', 'Age (SD)', 'BMI (Mean)', 'BMI (SD)']
table1.index = ['Control', 'Intervention']

table1['Age'] = table1.apply(lambda row: f"{row['Age (Mean)']} ± {row['Age (SD)']}", axis=1)
table1['BMI'] = table1.apply(lambda row: f"{row['BMI (Mean)']} ± {row['BMI (SD)']}", axis=1)
display(table1[['Age', 'BMI']].style.set_caption("📋 Table 1: Baseline Characteristics"))
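A Table 1 is often read alongside a standardised mean difference (SMD) for each baseline variable, which expresses the gap between arms in SD units. The sketch below is one possible way to compute it. The function name and the stand-in data frame `demo` are assumptions made for illustration, with columns chosen to mirror the notebook's `df`.

```python
import numpy as np
import pandas as pd

def standardised_mean_difference(x_control, x_treated):
    """Difference in means divided by the SD averaged across the two arms."""
    x_control = np.asarray(x_control, dtype=float)
    x_treated = np.asarray(x_treated, dtype=float)
    # Average the two sample variances (ddof=1), then take the square root
    avg_sd = np.sqrt((x_control.var(ddof=1) + x_treated.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / avg_sd

# Hypothetical stand-in data with the same columns as the notebook's df
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "age": rng.normal(40, 10, 100),
    "bmi": rng.normal(27, 4, 100),
    "group": rng.integers(0, 2, 100),
})

smd = {
    col: standardised_mean_difference(
        demo.loc[demo["group"] == 0, col],
        demo.loc[demo["group"] == 1, col],
    )
    for col in ["age", "bmi"]
}
print(pd.Series(smd).round(2))
```

A commonly cited rule of thumb treats an absolute SMD below about 0.1 as negligible imbalance, though this is a convention and not a formal test.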

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

var_selector = widgets.Dropdown(
    options=['age', 'bmi', 'outcome'],
    value='outcome',
    description='Variable:',
)

display(var_selector)

def plot_distribution(variable):
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    sns.histplot(data=df, x=variable, hue='group', kde=True, ax=axes[0])
    axes[0].set_title(f'{variable.capitalize()} Histogram by Group')

    sns.boxplot(data=df, x='group', y=variable, ax=axes[1])
    axes[1].set_title(f'{variable.capitalize()} Boxplot by Group')
    axes[1].set_xticks([0, 1])
    axes[1].set_xticklabels(['Control', 'Intervention'])

    plt.tight_layout()
    plt.show()

plot_distribution(var_selector.value)

In [None]:
from scipy.stats import ttest_ind

group0 = df[df['group'] == 0]['outcome']
group1 = df[df['group'] == 1]['outcome']

mean_diff = group1.mean() - group0.mean()
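n0, n1 = len(group0), len(group1)
# Pooled SD weighted by each group's degrees of freedom (group sizes are unequal here)
pooled_sd = np.sqrt(((n0 - 1) * group0.var() + (n1 - 1) * group1.var()) / (n0 + n1 - 2))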
cohens_d = mean_diff / pooled_sd

t_stat, p_val = ttest_ind(group1, group0)

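abs_d = abs(cohens_d)
# Conventional benchmarks (Cohen, 1988): 0.2 small, 0.5 medium, 0.8 large
label = 'Negligible' if abs_d < 0.2 else 'Small' if abs_d < 0.5 else 'Medium' if abs_d < 0.8 else 'Large'
print(f"Cohen's d: {cohens_d:.2f} → {label} effect")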
print(f"T-test: t = {t_stat:.2f}, p = {p_val:.3f}")
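The objectives also mention a Bayesian posterior. One minimal way to sketch it, assuming a normal model in each arm with the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2, is to draw each group mean from its Student-t marginal posterior and summarise the difference. The function name and the stand-in data below are hypothetical. With the notebook's data you would pass `group0` and `group1` instead.

```python
import numpy as np

def posterior_mean_difference(control, treated, n_draws=20_000, seed=0):
    """Monte Carlo draws of (treated mean - control mean).

    Assumes a normal model per arm with prior p(mu, sigma^2) proportional
    to 1/sigma^2, so each mean has a Student-t marginal posterior.
    """
    rng = np.random.default_rng(seed)

    def draw_mean(x):
        x = np.asarray(x, dtype=float)
        n = x.size
        scale = x.std(ddof=1) / np.sqrt(n)
        # Posterior of the mean: sample mean + scale * t with n - 1 degrees of freedom
        return x.mean() + scale * rng.standard_t(n - 1, size=n_draws)

    return draw_mean(treated) - draw_mean(control)

# Hypothetical stand-in data mirroring the simulation (true difference 1, SD 2)
rng = np.random.default_rng(42)
control = rng.normal(0, 2, 50)
treated = rng.normal(1, 2, 50)

diff = posterior_mean_difference(control, treated)
lower, upper = np.percentile(diff, [2.5, 97.5])
print(f"Posterior mean difference: {diff.mean():.2f}")
print(f"95% credible interval: [{lower:.2f}, {upper:.2f}]")
print(f"P(difference > 0 | data): {(diff > 0).mean():.3f}")
```

Unlike the p-value, the last line is a direct probability statement about the effect given the data and the stated prior. An informative prior would change these numbers.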

## ✅ Summary

You’ve completed the following:
- Simulated trial data
- Created a standard Table 1
- Explored distributions
- Compared outcome via frequentist & Bayesian lenses
- Visualised the results

**💡 Reflection**  
Why is effect size important beyond p-values?  
How might baseline imbalance affect interpretation?

**📘 Next**: Explore regression modelling in `4.6_logistic_and_survival.ipynb`