---
title: "Advanced Bayesian Modelling"
output-file: "advanced_bayesian.html"
format: html
---

In [None]:
# Setup for Google Colab: Fetch datasets automatically or manually
%run ../../bootstrap.py    # installs requirements + editable package

import fns_toolkit as fns

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # For displaying posterior plots
import pymc as pm  # For Bayesian modelling and sampling

# 🧠 5.3 Advanced Bayesian Modelling

This notebook explores advanced Bayesian modelling techniques for nutrition data analysis, building on basic Bayesian methods.

**Objectives**:
- Build hierarchical Bayesian models to account for group variations.
- Interpret posterior distributions for nutrition insights.
- Apply models to `hippo_nutrients.csv` to estimate nutrient intake variations.

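**Context**: Hierarchical Bayesian models suit nutrition research because they estimate nutrient intakes separately for each group (for example, by sex or year) while reporting the uncertainty in every estimate, which matters when groups are small, as with hippo populations.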

<details><summary>Fun Fact</summary>
Bayesian models are like a hippo’s intuition—blending prior knowledge with new data for wise decisions! 🦛
</details>

## Data Preparation

Load `hippo_nutrients.csv` from the data handling module and filter for Iron data to model intakes by sex.

In [2]:
# Load the nutrient dataset
df = fns.get_dataset('hippo_nutrients')  # Loaded by dataset name via the toolkit helper

# Keep Iron rows only, then drop rows with a missing value in any column
df_iron = df[df['Nutrient'] == 'Iron'].dropna()
print(f'Iron data shape: {df_iron.shape}')  # Display number of rows and columns

Iron data shape: (50, 6)


## Hierarchical Bayesian Model

Build a hierarchical Bayesian model to estimate Iron intakes, accounting for differences between female (F) and male (M) hippos.
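
Written out, the model in the next cell can be sketched as follows. The symbols $y_i$ (observed Iron intake of hippo $i$) and $g[i] \in \{0, 1\}$ (its sex index) are introduced here for illustration and do not appear in the notebook code; normal distributions are written with a variance, so PyMC's `sigma=2` becomes $2^2$.

$$
\begin{aligned}
\mu_g &\sim \mathcal{N}(8,\ 2^2), \quad g \in \{0, 1\} \\
\sigma &\sim \mathrm{HalfNormal}(1) \\
y_i &\sim \mathcal{N}(\mu_{g[i]},\ \sigma^2)
\end{aligned}
$$

The prior mean 8 and prior standard deviation 2 are fixed constants here, so the two group means share a prior but do not borrow strength from each other. One common extension, offered as an assumption rather than something this notebook does, gives the group means a shared hyperprior, for example $\mu_g \sim \mathcal{N}(\mu_0, \tau^2)$ with priors on $\mu_0$ and $\tau$, which produces partial pooling between groups.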

In [3]:
# Encode Sex as numerical index (F=0, M=1)
sex_idx = df_iron['Sex'].map({'F': 0, 'M': 1}).values

# Define hierarchical model
with pm.Model() as model:
    # Priors for group means (Female and Male)
    mu = pm.Normal('mu', mu=8, sigma=2, shape=2)  # Mean Iron intake for F (0) and M (1)
    sigma = pm.HalfNormal('sigma', sigma=1)  # Shared standard deviation
    
    # Likelihood of observed Iron intakes
    iron = pm.Normal('iron', mu=mu[sex_idx], sigma=sigma, observed=df_iron['Value'])
    
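    # Sample from posterior distribution: 1000 tuning steps, then 1000 draws per chain
    # return_inferencedata=False returns a MultiTrace, so trace['mu'] indexing works below
    trace = pm.sample(1000, tune=1000, return_inferencedata=False)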
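
To build intuition for what the sampler estimates, the following minimal sketch computes the posterior of a single group mean in closed form, under the simplifying assumption that the noise standard deviation is known (the model above estimates it instead). The function `normal_posterior` and the intake values are hypothetical and are not part of `fns_toolkit` or the dataset; the prior matches the `mu=8, sigma=2` prior used above.

```python
import statistics


def normal_posterior(prior_mean, prior_sd, data, noise_sd):
    """Posterior mean and sd of a normal mean with known noise sd (conjugate update)."""
    n = len(data)
    prior_precision = 1.0 / prior_sd**2      # weight carried by the prior
    data_precision = n / noise_sd**2         # weight carried by the observations
    post_precision = prior_precision + data_precision
    post_mean = (
        prior_precision * prior_mean + data_precision * statistics.fmean(data)
    ) / post_precision
    return post_mean, post_precision ** -0.5


# Hypothetical Iron intakes for one group; NOT taken from hippo_nutrients.csv
intakes = [7.2, 9.1, 8.4, 7.9, 8.8]
post_mean, post_sd = normal_posterior(
    prior_mean=8.0, prior_sd=2.0, data=intakes, noise_sd=1.0
)

print(f"sample mean    = {statistics.fmean(intakes):.2f}")
print(f"posterior mean = {post_mean:.2f}")
print(f"posterior sd   = {post_sd:.2f}")
```

The posterior mean is a precision-weighted average, so it lies between the prior mean and the sample mean and moves towards the sample mean as more observations arrive.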

## Posterior Analysis

Summarise and visualise the posterior distributions of Iron intake means for female and male hippos.

In [4]:
# Calculate posterior means for Female and Male
mu_posterior = trace['mu'].mean(axis=0)
print(f'Posterior means: Female={round(mu_posterior[0], 1)}, Male={round(mu_posterior[1], 1)}')

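# Visualise posterior distributions
# The ArviZ-based plotting functions expect InferenceData, so convert the MultiTrace first
idata = pm.to_inference_data(trace, model=model)
pm.plot_posterior(idata, var_names=['mu'])  # One density panel per group mean
plt.show()  # Display plot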

Posterior means: Female=8.1, Male=8.0
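
Posterior means alone hide how uncertain the comparison is. As a minimal sketch, the code below summarises the difference between two sets of posterior draws with an equal-tailed credible interval and the probability that one mean exceeds the other. The draws are synthetic stand-ins generated with `random.gauss`, not output from the model above, and `summarise_draws` is a hypothetical helper.

```python
import random
import statistics


def summarise_draws(draws, mass=0.94):
    """Return the mean and an equal-tailed credible interval of posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - mass) / 2.0
    lower = ordered[round(tail * (n - 1))]
    upper = ordered[round((1.0 - tail) * (n - 1))]
    return statistics.fmean(ordered), lower, upper


# Synthetic stand-ins for posterior draws of mu; NOT results from this notebook
rng = random.Random(42)
mu_female = [rng.gauss(8.1, 0.2) for _ in range(2000)]
mu_male = [rng.gauss(8.0, 0.2) for _ in range(2000)]

# Work draw by draw so the uncertainty in both means carries through
diff = [f - m for f, m in zip(mu_female, mu_male)]
mean_diff, lo, hi = summarise_draws(diff)
prob_female_higher = sum(d > 0 for d in diff) / len(diff)

print(f"Difference (F - M): mean={mean_diff:.2f}, 94% interval=({lo:.2f}, {hi:.2f})")
print(f"P(mu_F > mu_M) = {prob_female_higher:.2f}")
```

With the sampled trace from this notebook, the two lists would instead come from the columns of `trace['mu']`, for example `trace['mu'][:, 0]` for female hippos.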


## Exercise: Build Your Own Model

Create a hierarchical Bayesian model for Calcium intakes, grouped by `Year` (2024, 2025). Summarise the posterior means in a Markdown cell below.

**Guidance**:
- Filter `df` for `Nutrient == 'Calcium'` and remove missing values.
- Encode `Year` as a numerical index (e.g., 2024=0, 2025=1).
- Use `pm.Normal` for group means and `pm.HalfNormal` for sigma.
- Sample with `pm.sample(..., return_inferencedata=False)` and summarise each group with `trace['mu'].mean(axis=0)`; without `axis=0` the mean is taken across both groups at once.
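
The encoding step in the guidance can be sketched as below, using only the standard library. The `years` list is a hypothetical stand-in for the `Year` column; building the lookup from the observed values avoids hard-coding the years.

```python
# Hypothetical Year values; in the exercise these come from the filtered DataFrame
years = [2024, 2025, 2024, 2025, 2025]

# Map each distinct year to a consecutive index, in sorted order
levels = sorted(set(years))
lookup = {year: i for i, year in enumerate(levels)}
year_idx = [lookup[y] for y in years]

print(lookup)    # {2024: 0, 2025: 1}
print(year_idx)  # [0, 1, 0, 1, 1]
```

On a DataFrame, the same lookup can be applied with `.map(lookup).values`, mirroring how `sex_idx` was built earlier, and `len(levels)` gives the `shape` for the group means.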

**Answer**:

My Calcium model code and posterior summary are as follows:

```python
# Your code here
```

**Posterior Summary**:

- Year 2024: Mean = [Your Result]
- Year 2025: Mean = [Your Result]

## Conclusion

You’ve applied advanced Bayesian modelling to estimate nutrient intake variations, capturing uncertainty across groups.

**Next Steps**: Explore database querying with SQL in `5.4_databases_sql.ipynb`.

**Resources**:
- [PyMC Documentation](https://docs.pymc.io/)
- [Bayesian Analysis Guide](https://www.datacamp.com/community/tutorials/bayesian-methods-python)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)