# Frequentist vs Bayesian in Nutrition & Food Science 🥗📊

> **Thesis:** In real nutrition science—small n, messy measurements, meta-analytic prior knowledge, sequential looks at data, multiple outcomes—the Bayesian workflow is not just philosophically nicer. It is **operationally superior**: clearer questions, honest uncertainty, principled sequential monitoring, and decisions framed by utility. Frequentist NHST is fine for idealised, single-shot experiments with large samples and fixed protocols. That’s rarely our world.

### What we’ll do
1) Set the stage: what Frequentist and Bayesian actually compute.
2) Show classic Frequentist traps: dichotomised p-values, misread CIs, optional stopping, multiplicity, and the garden of forking paths.
3) Run a small **clinical supplement** example both ways.
4) Do the **Bayesian things Frequentists can’t** (or won’t): posterior probability of clinically meaningful benefit, ROPE (practical equivalence), decision analysis, prior sensitivity, and sequential monitoring.

You’ll leave with code patterns you can drop into real nutrition trials, cohort analyses, and metabolomics experiments.

<hr />
## 1) What the two camps actually compute

**Frequentist**: Assume a fixed, unknown parameter; reason about hypothetical repetitions of the experiment. A *p*-value is \(P(\text{data or more extreme} \mid H_0)\). A 95% CI is a random interval that *would* contain the true value in 95% of infinite repetitions. **It is not** the probability the parameter lies in that interval. NHST then dichotomises evidence at an arbitrary threshold.
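
What "95% of repetitions" means can be checked by brute force. The following is a minimal sketch, assuming normally distributed data with a hypothetical true mean of −0.4 and SD of 0.3; the function name `ci_coverage` and all defaults are illustrative choices, not part of the analysis later in this notebook.

```python
import numpy as np
from scipy import stats

def ci_coverage(true_mean=-0.4, sd=0.3, n=25, reps=2000, level=0.95, seed=0):
    """Fraction of simulated t-based CIs that contain the (known) true mean."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(true_mean, sd, size=(reps, n))   # one row per repeated experiment
    means = samples.mean(axis=1)
    ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 1)
    lower, upper = means - t_crit * ses, means + t_crit * ses
    return np.mean((lower <= true_mean) & (true_mean <= upper))

coverage = ci_coverage()
print(f"Share of intervals covering the true mean: {coverage:.3f}")
```

The coverage figure is a property of the procedure across repetitions. Any single interval either contains the true value or does not.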

**Bayesian**: Treat the parameter as uncertain; update beliefs with data. Compute a **posterior** \(p(\theta\mid\text{data})\). A 95% **credible interval** *is* a set that contains the parameter with 95% probability, conditional on the model and prior. You can ask the question you actually care about: \(P(\text{effect} < -0.3\,\text{mmol/L} \mid \text{data})\), or expected utility.
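
To make the posterior-probability question concrete, here is a minimal sketch using a conjugate normal model with a known standard error. The inputs (prior mean −0.5, prior SD 0.2, observed difference −0.4, standard error 0.085) and the helper name `normal_posterior` are illustrative assumptions, not results from a real trial.

```python
import numpy as np
from scipy.stats import norm

def normal_posterior(prior_mean, prior_sd, est, se):
    """Conjugate update: normal prior on the effect, normal likelihood with known SE."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * est)
    return post_mean, np.sqrt(post_var)

# Hypothetical inputs, chosen only for illustration
post_mean, post_sd = normal_posterior(prior_mean=-0.5, prior_sd=0.2, est=-0.4, se=0.085)

# Direct probability statements about the effect
p_meaningful = norm.cdf(-0.3, loc=post_mean, scale=post_sd)        # P(effect < -0.3 | data)
cred_lo, cred_hi = norm.ppf([0.025, 0.975], loc=post_mean, scale=post_sd)

print(f"Posterior mean = {post_mean:.3f}, SD = {post_sd:.3f}")
print(f"P(effect < -0.3 | data) = {p_meaningful:.3f}")
print(f"95% credible interval: [{cred_lo:.3f}, {cred_hi:.3f}]")
```

The posterior mean is a precision-weighted average of the prior mean and the estimate, so a tighter prior pulls harder.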

**Nutrition reality check**: Small n (pilot trials), multiple endpoints (biomarkers), prior literature (meta-analyses), interim looks (DSMBs), heterogeneity (subgroups). Which paradigm fits this reality better?


## 2) Frequentist failure modes (with receipts)

1) **Dichotomisation**: “p<0.05 = works, p>0.05 = nothing there.” Evidence is graded; a hard threshold throws that gradation away.
2) **CI misinterpretation**: 95% CI is *not* a 95% probability statement. Students, reviewers, clinicians routinely misread it that way.
3) **Optional stopping**: Peeking at data until p<0.05 **inflates false positives** unless special sequential methods are pre-registered. And almost nobody does that rigorously in small nutrition studies.
4) **Multiplicity**: Many outcomes, time points, subgroups → the chance of at least one false positive climbs quickly without correction; with strict correction, power collapses. People cherry-pick.
5) **No mechanism to use prior knowledge**: Meta-analytic effect sizes? Mechanistic priors? With NHST, they stay on the sidelines.

Bayesian analysis **solves all five** in a single, coherent calculus: posterior probabilities about quantities of interest, with built-in sequential updating and explicit priors.

<hr />
## 3) A simple nutrition RCT (supplement → fasting glucose change)
Two arms, 25 per arm.
- Control mean change: 0 (SD 0.3)
- Treatment mean change: −0.4 mmol/L (SD 0.3)
- Prior knowledge: meta-analysis suggests ≈ −0.5 ± 0.2 mmol/L benefit.

We’ll run both a **Frequentist t-test** and a **Bayesian model**; then we’ll ask the actually useful questions: *What’s the probability of a reduction of at least 0.3 mmol/L (Δ < −0.3)? What is the expected utility if we treat everyone?* We’ll also do a **prior sensitivity** check and a quick **sequential monitoring** demo that shows why peeking doesn’t break Bayes.

In [None]:
# Setup
import numpy as np, pandas as pd
import matplotlib.pyplot as plt, seaborn as sns
from scipy.stats import ttest_ind, norm
import pymc as pm, arviz as az
sns.set_style('whitegrid'); np.random.seed(11088)

# Simulate trial
n = 25
control = np.random.normal(0.0, 0.3, size=n)
treat   = np.random.normal(-0.4, 0.3, size=n)
df = pd.DataFrame({
    'group': ['Control']*n + ['Treatment']*n,
    'delta_glucose': np.r_[control, treat]
})
plt.figure(figsize=(6,4))
sns.boxplot(data=df, x='group', y='delta_glucose'); plt.title('Δ Fasting Glucose by Group')
plt.xlabel(''); plt.ylabel('mmol/L'); plt.tight_layout(); plt.show()

diff = treat.mean() - control.mean()
diff

### 3.1 Frequentist t-test and CI
Note how we must **contort** interpretation to avoid saying what everyone wants to say (“there’s a 95% chance the true effect is in the interval…”). That sentence is **Bayesian**; it is **not** a valid Frequentist statement.

In [None]:
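# Two-sample t-test (pooled variance) with a matching t-based 95% CI
from scipy.stats import t as t_dist
t_stat, p = ttest_ind(treat, control, equal_var=True)
m_t, m_c = treat.mean(), control.mean()
# With equal group sizes, the pooled standard error reduces to this expression
se = np.sqrt(np.var(treat, ddof=1)/n + np.var(control, ddof=1)/n)
# Use the t critical value (df = 2n - 2) rather than the large-sample z value 1.96
t_crit = t_dist.ppf(0.975, df=2*n - 2)
ci = ( (m_t - m_c) - t_crit*se, (m_t - m_c) + t_crit*se )
print(f"t={t_stat:.2f}, p={p:.4f}\nMean diff (T−C)={m_t-m_c:.2f} mmol/L\n95% CI: [{ci[0]:.2f}, {ci[1]:.2f}] mmol/L")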

### 3.2 Bayesian model (with prior), and questions that matter
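- Priors: \(\mu_C \sim N(0, 0.5)\), \(\mu_T \sim N(-0.5, 0.2)\), common \(\sigma\sim\text{HalfNormal}(0.5)\); the second argument of each normal is a standard deviation in mmol/L, matching the PyMC parameterisation.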
- Posterior of \(\Delta=\mu_T-\mu_C\).
- Ask: \(P(\Delta < -0.3\,|\,\text{data})\) (clinically meaningful benefit) and **ROPE**: \(P(-0.1 < \Delta < 0.1)\) (practical equivalence to no effect).

In [None]:
with pm.Model() as m:
    mu_c = pm.Normal('mu_c', 0, 0.5)
    mu_t = pm.Normal('mu_t', -0.5, 0.2)
    sigma = pm.HalfNormal('sigma', 0.5)
    pm.Normal('y_c', mu_c, sigma, observed=control)
    pm.Normal('y_t', mu_t, sigma, observed=treat)
    delta = pm.Deterministic('delta', mu_t - mu_c)
    idata = pm.sample(1500, tune=1500, target_accept=0.9, chains=4, random_seed=11088, return_inferencedata=True)

az.plot_posterior(idata, var_names=['delta'], ref_val=0.0)
plt.title('Posterior of Treatment Effect (Δ mmol/L)'); plt.tight_layout(); plt.show()

post = idata.posterior['delta'].values.reshape(-1)
p_benefit = (post < -0.3).mean()
p_equiv = ((post>-0.1) & (post<0.1)).mean()
print(f"P(Δ < -0.3 | data) = {p_benefit:.3f}")
print(f"P(|Δ| < 0.1 | data) (ROPE) = {p_equiv:.3f}")

**This is the point:** you just computed the probability your effect is clinically meaningful, and the probability it’s practically negligible. No contortions. No hypotheticals about infinite repetitions. Exactly the question clinicians ask.
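
The same counting logic works on any array of posterior draws, so it can be wrapped in a small helper. This is a sketch only: `rope_summary` is a hypothetical name, and the synthetic draws below merely stand in for `idata.posterior['delta']` so that the block runs without fitting a model.

```python
import numpy as np

def rope_summary(draws, rope=(-0.1, 0.1), benefit_below=-0.3):
    """Summarise posterior draws of an effect as region probabilities."""
    draws = np.asarray(draws).ravel()
    lo, hi = np.quantile(draws, [0.025, 0.975])
    return {
        "p_benefit": float(np.mean(draws < benefit_below)),              # clinically meaningful
        "p_rope": float(np.mean((draws > rope[0]) & (draws < rope[1]))),  # practically negligible
        "p_above_rope": float(np.mean(draws >= rope[1])),                 # effect in the wrong direction
        "eti95": (float(lo), float(hi)),                                  # equal-tailed 95% interval
    }

# Synthetic stand-in for posterior draws (assumption: roughly Normal(-0.4, 0.08))
rng = np.random.default_rng(0)
fake_draws = rng.normal(-0.4, 0.08, size=4000)
summary = rope_summary(fake_draws)
for key, value in summary.items():
    print(key, value)
```

With a fitted model, the call would be `rope_summary(idata.posterior['delta'].values)`.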

#### Prior sensitivity (responsible Bayes)
Critics say priors are subjective. We agree—so show robustness. Refit with a **skeptical** prior \(\mu_T\sim N(0,0.5)\) and a **weakly informative** prior \(\mu_T\sim N(-0.5,0.5)\), and compare \(P(\Delta<-0.3)\). If conclusions are stable, you’ve earned credibility. If not, you’ve learned that the data don’t dominate (useful in itself).

In [None]:
def posterior_prob_delta(treat, control, prior_mu_t, prior_sd_t):
    with pm.Model() as m2:
        mu_c = pm.Normal('mu_c', 0, 0.5)
        mu_t = pm.Normal('mu_t', prior_mu_t, prior_sd_t)
        sigma = pm.HalfNormal('sigma', 0.5)
        pm.Normal('y_c', mu_c, sigma, observed=control)
        pm.Normal('y_t', mu_t, sigma, observed=treat)
        delta = pm.Deterministic('delta', mu_t - mu_c)
        id_ = pm.sample(1200, tune=1200, chains=4, target_accept=0.9, random_seed=11088, return_inferencedata=True)
    post = id_.posterior['delta'].values.reshape(-1)
    return (post < -0.3).mean()

for name, mu, sd in [
    ('Informed', -0.5, 0.2),
    ('Weakly inf', -0.5, 0.5),
    ('Skeptical',  0.0, 0.5)
]:
    print(name, posterior_prob_delta(treat, control, mu, sd))

<hr />
## 4) Optional stopping: why Bayes stays sane and NHST doesn’t
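Optional stopping (peek every 5 participants, stop when p<0.05) **inflates false positives** for standard t-tests. In contrast, **Bayesian updating** needs no correction to stay interpretable: your posterior is your posterior, given the data, regardless of when you looked, *provided* your model is correct and you report the full posterior (don’t cherry-pick stopping rules to hack a decision threshold without a decision model). What the posterior does not give you for free is a guaranteed long-run false-positive rate for a stop-when-the-threshold-is-crossed rule; that depends on the threshold, the prior, and the number of looks.

Let’s simulate under **no true effect** (Δ=0) and see how often we falsely declare success with optional stopping (Frequentist) vs. a Bayesian decision rule based on \(P(\Delta<-0.3)>0.95\). Note that the two rules answer different questions: the t-test flags any nonzero difference, while the Bayesian rule demands strong evidence of a clinically meaningful reduction. (The simulation is kept small, but the Bayesian loop refits the model by MCMC at every look, so it is still the slow part.)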

In [None]:
def frequentist_optional_stopping(alpha=0.05, max_n=80, step=5, reps=300):
    fp=0
    for _ in range(reps):
        c = np.random.normal(0,0.3,size=max_n)
        t = np.random.normal(0,0.3,size=max_n)  # no true effect
        decided=False
        for n in range(step, max_n+1, step):
            p = ttest_ind(t[:n], c[:n], equal_var=True).pvalue
            if p<alpha:
                fp+=1; decided=True; break
        # if never crossed, no FP counted
    return fp/reps

def bayes_sequential(rule_p=0.95, thresh=-0.3, max_n=80, step=5, reps=150):
    fp=0
    for _ in range(reps):
        c_full = np.random.normal(0,0.3,size=max_n)
        t_full = np.random.normal(0,0.3,size=max_n)
        decided=False
        for n in range(step, max_n+1, step):
            c = c_full[:n]; t = t_full[:n]
            with pm.Model() as m:
                mu_c = pm.Normal('mu_c', 0, 1)
                mu_t = pm.Normal('mu_t', 0, 1)
                sigma = pm.HalfNormal('sigma', 1)
                pm.Normal('y_c', mu_c, sigma, observed=c)
                pm.Normal('y_t', mu_t, sigma, observed=t)
                delta = pm.Deterministic('delta', mu_t-mu_c)
                id_ = pm.sample(600, tune=600, chains=2, target_accept=0.9, progressbar=False, random_seed=123)
            post = id_.posterior['delta'].values.reshape(-1)
            if (post < thresh).mean()>rule_p:
                # Declared benefit under null -> false positive
                fp+=1; decided=True; break
        # if no decision, no FP counted
    return fp/reps

print('Frequentist FP rate (optional stopping):', frequentist_optional_stopping())
print('Bayesian FP rate (sequential rule):     ', bayes_sequential())

> Expect the NHST false positive rate to **inflate above 0.05** with optional stopping. The Bayesian rule should rarely fire under the null here, mainly because it requires \(P(\Delta<-0.3)>0.95\), a much stricter target than “any difference”, and because each posterior conditions only on the data observed so far rather than on a procedure tuned to a single look.
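
For a rough and much faster look at how a posterior-threshold rule behaves over repeated null trials, the MCMC fit can be swapped for a normal approximation. This sketch is not the PyMC model above: it assumes a flat prior on Δ, so that Δ given the data is approximately Normal(observed difference, standard error), and the function name `approx_bayes_sequential` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def approx_bayes_sequential(thresh=-0.3, rule_p=0.95, max_n=80, step=5,
                            reps=2000, sd=0.3, seed=0):
    """Share of null trials (true delta = 0) that ever satisfy P(delta < thresh | data) > rule_p.

    Assumption: flat prior and normal approximation to the posterior of delta.
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(0.0, sd, size=(reps, max_n))
    t = rng.normal(0.0, sd, size=(reps, max_n))   # no true effect
    declared = np.zeros(reps, dtype=bool)
    for n in range(step, max_n + 1, step):        # interim looks
        diff = t[:, :n].mean(axis=1) - c[:, :n].mean(axis=1)
        pooled_var = 0.5 * (t[:, :n].var(axis=1, ddof=1) + c[:, :n].var(axis=1, ddof=1))
        se = np.sqrt(2.0 * pooled_var / n)
        p_below = norm.cdf((thresh - diff) / se)  # approximate P(delta < thresh | data)
        declared |= p_below > rule_p              # trial stops the first time the rule fires
    return declared.mean()

rate_meaningful = approx_bayes_sequential(thresh=-0.3)  # rule used in this section
rate_any = approx_bayes_sequential(thresh=0.0)          # looser "any benefit" rule
print(f"Declared benefit under the null, thresh=-0.3: {rate_meaningful:.3f}")
print(f"Declared benefit under the null, thresh= 0.0: {rate_any:.3f}")
```

Setting `thresh=0.0` turns the rule into a one-sided "any benefit" criterion, which brings the comparison with the t-test closer to like-for-like. On the same simulated data the stricter threshold can never fire more often than the looser one, which is why the choice of threshold belongs in the pre-specified analysis plan.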

**Important nuance**: If you turn posterior thresholds into rigid stop/go rules without a pre-specified utility, you’re doing decision theory casually. The solution is not to abandon Bayes; it’s to **add** explicit utilities (see next).

<hr />
## 5) Decision analysis: expected utility beats p-values
Suppose:
- Benefit: a clinically meaningful reduction is \(< -0.3\) mmol/L, worth +10 utility units if true.
- Cost: adopting an ineffective supplement costs −3 units (money, side-effects, opportunity).

Bayesian decision: compute expected utility \(EU = 10\cdot P(\Delta<-0.3) - 3\cdot (1-P(\Delta<-0.3))\). Treat if \(EU>0\). Try that now:

In [None]:
p_benefit = (post < -0.3).mean()
EU = 10*p_benefit - 3*(1-p_benefit)
print(f"P(benefit)={p_benefit:.3f} → Expected utility = {EU:.2f} (treat if >0)")

Frequentist NHST has **no native place to put utility**. People try with power calculations and Type I/II trade-offs, but it’s a blunt instrument compared to explicit expected loss minimisation on the posterior.
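
The expected-utility rule above implies a break-even probability: setting \(EU=0\) gives \(P^\* = 3/(10+3)\). A minimal sketch of that arithmetic follows, using the section's utilities (+10 and −3) as defaults; the function names are illustrative.

```python
def expected_utility(p_benefit, gain=10.0, loss=3.0):
    """EU of treating: gain if the effect is meaningful, loss otherwise."""
    return gain * p_benefit - loss * (1.0 - p_benefit)

def break_even_probability(gain=10.0, loss=3.0):
    """Smallest P(benefit) at which treating has non-negative expected utility."""
    return loss / (gain + loss)

p_star = break_even_probability()
print(f"Break-even P(benefit) = {p_star:.3f}")
for p in (0.1, p_star, 0.5, 0.9):
    print(f"P(benefit)={p:.3f} -> EU={expected_utility(p):+.2f}")
```

Changing `gain` or `loss` moves the break-even point, which makes the cost assumptions behind a recommendation explicit.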

<hr />
## 6) Multiplicity & the garden of forking paths (mini demo)
If you measure 10 biomarkers and test them all at 0.05, you should expect **false positives**: with 10 independent null endpoints, the chance of at least one *p*<0.05 is \(1-0.95^{10}\approx 0.40\). If you also try alternative exclusions, transforms, or subgroups, the path you ended up reporting is one of many that were tried. NHST gives you a *p*-value for the chosen path, not the path-finding process.
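
The family-wise error rate for independent tests follows directly from the per-test level. A short sketch, assuming independence between endpoints (real biomarkers are usually correlated, which changes the exact figure); `familywise_error_rate` is an illustrative name.

```python
def familywise_error_rate(m, alpha=0.05):
    """P(at least one false positive) across m independent null tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

m = 10
fwer_uncorrected = familywise_error_rate(m)            # every test at 0.05
bonferroni_alpha = 0.05 / m                            # Bonferroni per-test level
fwer_bonferroni = familywise_error_rate(m, bonferroni_alpha)

print(f"Uncorrected FWER for {m} tests: {fwer_uncorrected:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}, FWER: {fwer_bonferroni:.3f}")
```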

Bayesian route:
- Use **hierarchical models** to share strength across outcomes and shrink noisy ones towards the group mean (partial pooling).
- Or build a multivariate model with a shared prior. You get calibrated uncertainty *after* considering multiplicity.

Below: simulate 10 null biomarkers, check whether any *p*<0.05 appears without correction, then fit a simple Bayesian hierarchical model and compare the raw differences with their shrunken posterior means.

In [None]:
np.random.seed(2)
B, n = 10, 30
Y = [np.random.normal(0,1, size=n) for _ in range(B)]  # 10 null endpoints
pvals = [ttest_ind(Y[i][:n//2], Y[i][n//2:], equal_var=True).pvalue for i in range(B)]
any_sig = (np.array(pvals) < 0.05).any()
print('Any nominally significant p<0.05 among 10 null endpoints?', any_sig)

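# Simple hierarchical shrinkage: theta_i ~ N(mu, tau); each observed difference ~ N(theta_i, se_i).
# The standard errors se_i are estimated from the data and treated as known: with a single
# observed difference per endpoint, a free sigma could not be separated from tau.
half = n // 2
ybar = np.array([np.mean(y[:half]) - np.mean(y[half:]) for y in Y])  # naive diffs
se_diff = np.array([np.sqrt(np.var(y[:half], ddof=1)/half + np.var(y[half:], ddof=1)/(n - half)) for y in Y])
with pm.Model() as hm:
    mu = pm.Normal('mu', 0, 1)
    tau = pm.HalfNormal('tau', 1)
    theta = pm.Normal('theta', mu, tau, shape=B)
    pm.Normal('obs', theta, se_diff, observed=ybar)
    idh = pm.sample(1500, tune=1500, chains=4, target_accept=0.9, random_seed=11088, return_inferencedata=True)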
shrunken = idh.posterior['theta'].mean(dim=['chain','draw']).values
pd.DataFrame({'raw_diff':ybar, 'shrunken':shrunken})

Hierarchical Bayes **shrinks** noisy effects back toward the grand mean, taming multiplicity by modelling it—no Bonferroni carpet-bombing, no underpowered chaos.
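
The amount of shrinkage has a closed form when the standard error and the between-endpoint SD are treated as known. This is a simplified sketch, not the fitted model above: `tau` is fixed by assumption rather than estimated, the grand mean is taken as the simple average, and the raw differences are made-up numbers.

```python
import numpy as np

def partial_pool(estimates, se, tau, mu=None):
    """Posterior means under theta_i ~ N(mu, tau) and estimate_i ~ N(theta_i, se)."""
    estimates = np.asarray(estimates, dtype=float)
    if mu is None:
        mu = estimates.mean()                 # plug-in grand mean (assumes a common se)
    shrink = se**2 / (se**2 + tau**2)         # 0 = no pooling, 1 = complete pooling
    return mu + (1.0 - shrink) * (estimates - mu)

raw = np.array([0.6, -0.5, 0.1, -0.2])        # hypothetical raw differences
pooled = partial_pool(raw, se=0.37, tau=0.2)
print("raw:   ", raw)
print("pooled:", np.round(pooled, 3))
```

Noisy estimates (large `se` relative to `tau`) are pulled strongly towards the grand mean, while precise ones are left almost unchanged.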

<hr />
## 7) What about Bayes factors?
You can report Bayes factors (evidence ratios for H₁ vs H₀). They can be sensitive to prior width on the effect under H₁, so they’re best used with **pre-registered** priors. In clinical nutrition, **posterior probabilities of clinically meaningful effect** plus **decision analysis** usually communicate better than a naked evidence ratio. Use what your audience understands and rewards.
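
The sensitivity to prior width can be seen with the Savage-Dickey density ratio, which for a point null nested in H₁ gives BF₀₁ as the posterior density at Δ=0 divided by the prior density at Δ=0. The sketch below assumes a normal likelihood with known standard error and a zero-centred normal prior under H₁; the estimate (−0.2), standard error (0.1), and function name are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def bf01_savage_dickey(est, se, prior_sd):
    """BF for H0: delta = 0 against H1: delta ~ Normal(0, prior_sd), known-SE normal likelihood."""
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
    post_mean = post_var * est / se**2
    posterior_at_zero = norm.pdf(0.0, loc=post_mean, scale=np.sqrt(post_var))
    prior_at_zero = norm.pdf(0.0, loc=0.0, scale=prior_sd)
    return posterior_at_zero / prior_at_zero

est, se = -0.2, 0.1   # hypothetical estimate and standard error
bf_values = {sd: bf01_savage_dickey(est, se, sd) for sd in (0.1, 0.5, 2.0)}
for sd, bf in bf_values.items():
    print(f"prior SD under H1 = {sd:>3}: BF01 = {bf:.2f}")
```

The data are identical in every row; only the prior width under H₁ changes, and wider priors shift the ratio towards the null.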

## 8) What to put in your paper (or SOP)
- **Model**: likelihood, priors (with sensitivity), and rationale grounded in prior literature.
- **Posterior**: means, credible intervals, and probabilities of clinically relevant regions (e.g., Δ<−0.3).
- **Decision**: expected utility or cost–benefit threshold.
- **Multiplicity**: hierarchical or multivariate structure.
- **Sequential**: if you peeked, say so, and show the posterior after each look (the posterior itself needs no correction; any stop/go threshold should be stated in advance).
- **Code & seeds**: reproducibility.

## 9) Bottom line
Frequentist NHST was designed for a world without prior knowledge, with single-shot fixed designs and large n. Nutrition science is the opposite. Bayesian analysis aligns with how we actually think, decide, and iterate.

**If you must use NHST** (journal policy, legacy reasons):
- Report effect sizes and intervals (avoid p-value worship).
- Pre-register endpoints and analysis plan; correct for multiplicity.
- Don’t peek—or use proper group-sequential methods.

**If you can use Bayes** (do it):
- Use informative/weakly informative priors from meta-analysis when possible.
- Report posterior probabilities of clinically meaningful effects.
- Include ROPE, sensitivity, and a simple expected-utility decision.

That’s science you can defend.