# 01 · Introduction to Nutritional Epidemiology

> **Learning objectives**
>
> - Define epidemiology in the context of nutrition.
> - Recognise challenges: confounding, measurement error, missingness.
> - Load and inspect the FB2NEP synthetic cohort (N≈25k).

---

In [None]:
%run ../notebooks/_bootstrap.py
df.head()

In [None]:
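# Ensure the dataset exists (works in Colab and locally).
# PATH (location of the cohort CSV) is assumed to be defined by _bootstrap.py.
import os
import subprocess
import sys

import pandas as pd

if not os.path.exists(PATH):
    print("Dataset missing: generating via scripts/generate_dataset.py ...")
    # Use the current interpreter so the script runs in the same environment as the notebook
    ret = subprocess.run([sys.executable, "scripts/generate_dataset.py"])
    if ret.returncode != 0:
        raise SystemExit("Generation failed. Check scripts/generate_dataset.py output.")
df = pd.read_csv(PATH)
df.head(3)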

## First look

In [None]:
# df is already loaded; _bootstrap.py (run above) also defined CSV_REL, REPO_ROOT and IN_COLAB
df.head()

In [None]:
# datetime_is_numeric was removed in pandas 2.0; datetimes are now always summarised as numeric
df.describe(include='all')
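
The `describe` table reports a non-missing `count` per column, but it does not make missingness easy to scan. A minimal sketch of a per-column missingness summary, shown on a small hypothetical DataFrame (the values are invented for illustration; on the cohort you would pass `df` instead):

```python
import numpy as np
import pandas as pd


def missingness_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, most missing first."""
    n_missing = frame.isna().sum()
    pct_missing = (100 * n_missing / len(frame)).round(1)
    out = pd.DataFrame({"n_missing": n_missing, "pct_missing": pct_missing})
    return out.sort_values("n_missing", ascending=False)


# Hypothetical toy data (not from the FB2NEP cohort)
toy = pd.DataFrame(
    {
        "fruit_veg_g_d": [250.0, np.nan, 410.0, 120.0],
        "plasma_vitC_umol_L": [55.0, 62.0, np.nan, np.nan],
        "age": [45, 52, 61, 38],
    }
)
summary = missingness_summary(toy)
print(summary)
```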

### Measurement error (discussion)
- Self-reported diet vs biomarkers (e.g. reported fruit/veg intake vs plasma vitamin C).
- Random day-to-day (within-person) variation vs systematic bias (e.g. under-reporting).
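
Random day-to-day error in a self-reported intake weakens (attenuates) its observed association with a biomarker, even when that error averages out to zero. A minimal simulation, assuming classical additive error and arbitrary illustrative variances (none of these numbers come from the FB2NEP cohort): if the reported intake is $X = T + E$ with $E$ independent of the true intake $T$, the slope of the biomarker on $X$ is shrunk by roughly $\lambda = \operatorname{var}(T) / (\operatorname{var}(T) + \operatorname{var}(E))$.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 25_000

# Hypothetical "true" long-term fruit/veg intake (g/day) and a biomarker that depends on it
true_intake = rng.normal(300, 100, n)
biomarker = 20 + 0.1 * true_intake + rng.normal(0, 10, n)

# Classical error (assumed): one day's report = true intake + independent noise with SD 150 g/day
reported = true_intake + rng.normal(0, 150, n)


def ols_slope(x, y):
    """Least-squares slope of y on x (sample covariance / sample variance)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


slope_true = ols_slope(true_intake, biomarker)
slope_reported = ols_slope(reported, biomarker)
lam = 100**2 / (100**2 + 150**2)  # expected attenuation factor, about 0.31

print(f"slope using true intake:        {slope_true:.3f}")
print(f"slope using reported intake:    {slope_reported:.3f}")
print(f"expected slope (lambda * true): {lam * slope_true:.3f}")
```

Because the error here is purely random, the reported-intake slope lands close to $\lambda$ times the true slope; systematic bias such as under-reporting would distort it in ways this sketch does not model.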

### TODO · Quintiles of fruit/veg vs plasma vitamin C

In [None]:
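# Compute mean plasma vitamin C by quintile of fruit_veg_g_d and check the trend is monotone.
# duplicates='drop' merges tied cut-points, so fewer than 5 groups can result.
q = pd.qcut(df['fruit_veg_g_d'], 5, duplicates='drop')
# observed=True keeps only quintile categories that actually occur (and silences a pandas FutureWarning)
res = df.groupby(q, observed=True)['plasma_vitC_umol_L'].agg(['mean', 'std', 'count']).round(2)
# Expect mean vitamin C to increase (non-strictly) across fruit/veg quintiles
assert res['mean'].is_monotonic_increasing, "Mean vitamin C should increase across fruit/veg quintiles"
res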
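
Quintile means depend on where the cut-points fall, so a single rank-based statistic is a useful complement. A minimal sketch, assuming only pandas and NumPy: Spearman's rank correlation computed as the Pearson correlation of the two columns' average ranks, demonstrated on hypothetical simulated data that borrows the cohort's column names purely for illustration.

```python
import numpy as np
import pandas as pd


def spearman_rho(x: pd.Series, y: pd.Series) -> float:
    """Spearman's rho as the Pearson correlation of average ranks, using complete pairs only."""
    pair = pd.concat([x, y], axis=1).dropna()
    ranks = pair.rank(method="average")
    return float(ranks.iloc[:, 0].corr(ranks.iloc[:, 1]))


# Hypothetical simulated data (not the FB2NEP cohort)
rng = np.random.default_rng(0)
n = 2_000
fruit_veg = rng.gamma(shape=2.0, scale=150.0, size=n)
vit_c = 30 + 0.05 * fruit_veg + rng.normal(0, 15, n)
sim = pd.DataFrame({"fruit_veg_g_d": fruit_veg, "plasma_vitC_umol_L": vit_c})

rho = spearman_rho(sim["fruit_veg_g_d"], sim["plasma_vitC_umol_L"])
quintile_means = (
    sim.groupby(pd.qcut(sim["fruit_veg_g_d"], 5), observed=True)["plasma_vitC_umol_L"]
    .mean()
)
print(f"Spearman rho: {rho:.2f}")
print(quintile_means.round(1))
```

Ranking makes the statistic insensitive to the skewed intake distribution, and it does not depend on how many groups `qcut` produces.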

### Checkpoint
- Note any odd ranges or surprising values you want to revisit later.
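
One way to make the checkpoint concrete is to flag values outside Tukey's fences (below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR). This is a generic screening rule, not a definition of implausible intake, so flagged values are candidates to revisit rather than errors. A minimal sketch on a hypothetical numeric column:

```python
import pandas as pd


def tukey_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values of s outside [Q1 - k*IQR, Q3 + k*IQR]; NaNs are never flagged."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


# Hypothetical fruit/veg intakes in g/day, including one suspicious entry
intake = pd.Series([200, 250, 300, 310, 280, 400, 350, 5000], name="fruit_veg_g_d")
flagged = tukey_outliers(intake)
print(flagged)
```

On the cohort, the same function could be applied column by column, e.g. `tukey_outliers(df['fruit_veg_g_d'])`.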