# 📊 5.1 Bayesian Methods

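This notebook introduces Bayesian inference for analysing nutrient intakes. The approach suits nutrition studies such as analyses of the National Diet and Nutrition Survey (NDNS). Bayesian methods combine prior knowledge with observed data to produce a posterior distribution for each parameter. That distribution expresses both the most plausible values and the uncertainty around them.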
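
In symbols, take a parameter $\theta$ (here, mean iron intake) and observed data $y$. Bayes' theorem gives the posterior as the likelihood times the prior, divided by a normalising constant:

$$
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \propto p(y \mid \theta)\, p(\theta)
$$

Here $p(\theta)$ is the prior and $p(y \mid \theta)$ the likelihood. The marginal likelihood $p(y)$ does not depend on $\theta$. MCMC samplers such as PyMC's therefore only need the right-hand side, known up to that constant.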

**Objectives**:
- Understand Bayesian principles: priors, likelihoods, and posteriors.
- Apply PyMC to model nutrient intake data.
- Interpret posterior distributions for practical insights.

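**Context**: Bayesian methods are particularly useful for small datasets, or when prior information (e.g., Recommended Dietary Allowances) is available. In those cases the prior can stabilise estimates that the data alone pin down poorly. We will model iron intakes using `nutrient_survey.csv`.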
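
The simplest conjugate case shows why the prior matters most when data are scarce. Suppose the mean has a normal prior and the data are normal with a standard deviation assumed to be known. The posterior mean is then a precision-weighted average of the prior mean and the sample mean. This sketch is a simplification, not the model fitted later (which also estimates the SD). The function name `normal_known_sd_update` and all the numbers are hypothetical.

```python
import numpy as np

def normal_known_sd_update(prior_mean, prior_sd, data_mean, n, data_sd):
    """Posterior mean and SD of mu for a Normal likelihood with known SD
    and a Normal prior on mu (conjugate normal-normal model)."""
    prior_precision = 1.0 / prior_sd**2      # weight given to the prior
    data_precision = n / data_sd**2          # weight given to the sample mean
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * data_mean) / post_precision
    post_sd = np.sqrt(1.0 / post_precision)
    return post_mean, post_sd

# Hypothetical inputs: prior N(8, 2) on the mean, sample mean 10 mg,
# and an assumed known SD of 3 mg for individual intakes
for n in (2, 20, 200):
    m, s = normal_known_sd_update(8.0, 2.0, 10.0, n, 3.0)
    print(f"n={n:>3}: posterior mean {m:.2f} mg, posterior SD {s:.2f} mg")
```

With these inputs the posterior mean is about 8.94 mg for n = 2 and about 9.98 mg for n = 200. The posterior SD also shrinks as n grows, so the data gradually outweigh the prior.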

<details>
<summary>Advanced Note</summary>
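The frequentist approaches covered in 4.4 handle uncertainty differently. Bayesian methods express it as probability statements about the parameters themselves. For example, given the model and prior, a 95% credible interval contains the parameter with 95% probability. This is often easier to communicate in nutrition research.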
</details>

```python
# Install required packages
%pip install pymc pandas numpy matplotlib  # For Colab users
import pymc as pm
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
print('Bayesian analysis environment ready.')
```

## Data Preparation

We will use `nutrient_survey.csv`, a simulated dataset of nutrient intakes in long format (one row per participant, nutrient and year). The following code loads the file and keeps the non-missing iron values.

```python
df = pd.read_csv('../data_handling/data/nutrient_survey.csv')
# Keep only the iron rows and drop missing values
iron_data = df.loc[df['Nutrient'] == 'Iron', 'Value'].dropna()
print(df.head(2))
```

```
   ID Nutrient  Year  Value  Age Sex
0  P1     Iron  2024    8.2   25   F
1  P1     Iron  2025    8.5   26   F
```
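
Before fitting, it can help to check two things: how many values the filter keeps, and whether participants appear more than once. The output above, for instance, shows `P1` in both 2024 and 2025. The following is a minimal sketch on a small hypothetical DataFrame with the same columns. The values are made up for illustration and are not taken from `nutrient_survey.csv`.

```python
import pandas as pd

# Hypothetical long-format data mirroring the columns of nutrient_survey.csv
df = pd.DataFrame({
    'ID': ['P1', 'P1', 'P2', 'P2', 'P3', 'P3'],
    'Nutrient': ['Iron', 'Iron', 'Iron', 'Calcium', 'Iron', 'Iron'],
    'Year': [2024, 2025, 2024, 2024, 2024, 2025],
    'Value': [8.2, 8.5, 11.0, 700.0, None, 9.4],
    'Age': [25, 26, 40, 40, 33, 34],
    'Sex': ['F', 'F', 'M', 'M', 'F', 'F'],
})

# Same filter as in the notebook, but keeping the ID column for the checks
iron = df[df['Nutrient'] == 'Iron'].dropna(subset=['Value'])
iron_data = iron['Value']

# Quick checks before modelling
n_obs = len(iron_data)
n_people = iron['ID'].nunique()
repeated = (iron['ID'].value_counts() > 1).sum()
print(f"{n_obs} iron values from {n_people} participants; "
      f"{repeated} participant(s) contribute more than one value")
print(iron_data.describe())
```

If many participants contribute several values, treating every row as an independent observation understates the uncertainty. A hierarchical model with a per-participant term would be the usual extension.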


## Bayesian Model

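We assume individual iron intakes follow a normal distribution with unknown mean $\mu$ and standard deviation $\sigma$. The prior for $\mu$ is centred on 8 mg, a value based on the RDA, with a standard deviation of 2 mg. The prior for $\sigma$ is half-normal with scale 1 mg. The model then estimates the population mean intake and the spread of intakes, together with the uncertainty in each. For simplicity, every observation is treated as independent, even though some participants (e.g. `P1`) contribute values from more than one year.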
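
The model fitted in the next cell is written out in full below. Here $y_i$ is the $i$-th observed iron value (mg), $\mu$ is the population mean intake and $\sigma$ is the standard deviation of individual intakes:

$$
\begin{aligned}
\mu &\sim \mathcal{N}(8,\ 2^2) \\
\sigma &\sim \text{Half-Normal}(1) \\
y_i \mid \mu, \sigma &\sim \mathcal{N}(\mu,\ \sigma^2), \qquad i = 1, \dots, n
\end{aligned}
$$

PyMC parameterises `pm.Normal` and `pm.HalfNormal` by the standard deviation (`sigma`). So `sigma=2` in the code corresponds to a variance of $2^2 = 4$ in this notation.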

**Exercise 1**: Run the Bayesian model below to estimate mean iron intake.

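```python
with pm.Model() as model:
    # Prior on the mean intake: Normal with mean 8 mg and SD 2 mg
    mu = pm.Normal('mu', mu=8, sigma=2)
    # Prior on the SD of intakes: HalfNormal (positive values only), scale 1 mg
    sigma = pm.HalfNormal('sigma', sigma=1)
    # Likelihood: each observed iron value ~ Normal(mu, sigma)
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=iron_data.to_numpy())
    # 1000 posterior draws per chain after tuning, returned as ArviZ InferenceData
    idata = pm.sample(1000, random_seed=42)

# Pool the draws of mu from all chains and plot the posterior
mu_samples = idata.posterior['mu'].values.flatten()
plt.hist(mu_samples, bins=30, density=True)
plt.title('Posterior Distribution of Iron Intake Mean')
plt.xlabel('Mean Iron Intake (mg)')
plt.ylabel('Density')
plt.show()
```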
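
For continuous parameters, `pm.sample` uses the No-U-Turn Sampler (NUTS) by default, a gradient-based MCMC method. A much simpler random-walk Metropolis sampler can illustrate the underlying idea. The version below, written in plain NumPy, targets the same priors as Exercise 1: mu ~ Normal(8, 2) and sigma ~ HalfNormal(1). It is a teaching sketch under stated assumptions, not what PyMC does internally. The data are hypothetical, and the names `log_posterior` and `metropolis` are our own.

```python
import numpy as np

def log_posterior(mu, sigma, y):
    """Unnormalised log posterior:
    mu ~ Normal(8, 2), sigma ~ HalfNormal(1), y_i ~ Normal(mu, sigma)."""
    if sigma <= 0:
        return -np.inf  # outside the support of the HalfNormal prior
    log_prior_mu = -0.5 * ((mu - 8.0) / 2.0) ** 2
    log_prior_sigma = -0.5 * sigma ** 2
    log_lik = -len(y) * np.log(sigma) - 0.5 * np.sum(((y - mu) / sigma) ** 2)
    return log_prior_mu + log_prior_sigma + log_lik

def metropolis(y, n_steps=5000, step=0.3, seed=0):
    """Random-walk Metropolis sampler over (mu, sigma)."""
    rng = np.random.default_rng(seed)
    current = np.array([8.0, 1.0])  # starting values for (mu, sigma)
    current_lp = log_posterior(*current, y)
    samples = np.empty((n_steps, 2))
    accepted = 0
    for t in range(n_steps):
        # Symmetric Gaussian proposal around the current state
        proposal = current + rng.normal(0.0, step, size=2)
        proposal_lp = log_posterior(*proposal, y)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples[t] = current
    return samples, accepted / n_steps

# Hypothetical iron intakes (mg), for illustration only
y = np.array([8.2, 8.5, 9.1, 7.6, 8.9, 9.4, 8.0, 8.7, 9.0, 8.3])
samples, acc_rate = metropolis(y)
mu_draws = samples[1000:, 0]  # discard the first 1000 steps as burn-in
print(f"acceptance rate {acc_rate:.2f}; posterior mean of mu {mu_draws.mean():.2f} mg")
```

NUTS explores the posterior far more efficiently than this sampler, especially with many parameters. That efficiency is why PyMC uses it rather than a fixed-step random walk.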

## Exercise 2: Interpret Results

Examine the posterior plot. Estimate the likely range for the mean iron intake, not for individual intakes, which are spread more widely. Record your findings in a Markdown cell.

**Guidance**: Identify the histogram’s peak and read off an approximate 95% credible interval, i.e. the range containing the central 95% of the posterior draws.
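
You can also compute these quantities from the posterior draws rather than reading them off the plot. The following is a minimal sketch with a hypothetical helper, `summarise_posterior`. It returns the centre of the tallest histogram bin and an equal-tailed credible interval. The draws here are simulated stand-ins; in the notebook you would pass the posterior draws of `mu` from Exercise 1 instead.

```python
import numpy as np

def summarise_posterior(draws, prob=0.95, bins=30):
    """Return the histogram peak and an equal-tailed credible interval."""
    counts, edges = np.histogram(draws, bins=bins)
    i = np.argmax(counts)
    peak = 0.5 * (edges[i] + edges[i + 1])  # centre of the tallest bin
    tail = (1.0 - prob) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return peak, lower, upper

# Simulated stand-in draws for illustration (not real posterior output)
rng = np.random.default_rng(1)
draws = rng.normal(8.4, 0.3, size=4000)
peak, lower, upper = summarise_posterior(draws)
print(f"peak about {peak:.2f} mg, 95% credible interval about [{lower:.2f}, {upper:.2f}] mg")
```

An equal-tailed interval cuts 2.5% from each end. ArviZ's `az.hdi` instead returns the highest-density interval, which can differ when the posterior is skewed.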

**Answer**:

The likely range for mean iron intake is...

## Conclusion

This notebook demonstrated Bayesian inference with PyMC for estimating mean nutrient intake. You have learned how to:
- Specify priors and a likelihood.
- Fit a Bayesian model to nutrition data with PyMC.
- Interpret posterior distributions.

**Next Steps**: Explore workflow automation in 5.2.

**Resources**:
- [PyMC Documentation](https://www.pymc.io/)
- [Bayesian Methods for Nutrition](https://statswithr.com/book/bayesian-basics.html)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)