# 📈 3.5 Data Aggregation

This notebook covers data aggregation techniques to summarise and combine nutrition datasets.

**Objectives**:
- Summarise data with group-by operations.
- Join datasets for comprehensive analysis.
- Aggregate `hippo_nutrients.csv` by nutrient, sex, and year.

**Context**: Aggregation condenses row-level nutrition data into group-level summaries, such as the average intake for each nutrient and sex.

<details><summary>Fun Fact</summary>
Aggregating data is like a hippo summing up its daily snacks—big picture, big impact! 🦛
</details>

In [None]:
# Setup for Google Colab: Fetch datasets automatically or manually
%run ../../bootstrap.py    # installs requirements + editable package

import fns_toolkit as fns

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print('Environment ready.')

## Data Preparation

Load `hippo_nutrients.csv` and inspect its structure.

In [None]:
# Load the dataset
df = fns.get_dataset('hippo_nutrients')  # Dataset is requested by name, not by file path
print(df.head(2))  # Display first two rows

   ID Nutrient  Year  Value  Age Sex
0  H1     Iron  2024    8.2   25   F
1  H1     Iron  2025    8.5   26   F


## Summarising Data

Summarise `Value` by `Nutrient` and `Sex` using mean and count.

In [3]:
# Group by Nutrient and Sex, compute mean and count
summary = df.groupby(['Nutrient', 'Sex'])['Value'].agg(['mean', 'count'])
print(summary)  # Display summary

                  mean count
Nutrient   Sex              
Calcium    F    1150.0    25
           M    1140.0    25
Iron       F       8.1    25
           M       8.0    25
Vitamin_D  F      10.6    25
           M      10.4    25
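
The grouped result above has a two-level index, which can be awkward to plot or join later. The sketch below shows one way to get flat, clearly named columns using pandas named aggregation. It runs on a small made-up frame (`toy`, with illustrative values only), not on the real `hippo_nutrients` data, and the output names `mean_value` and `n` are arbitrary choices.

```python
import pandas as pd

# Toy stand-in for hippo_nutrients (illustrative values, not from the real file)
toy = pd.DataFrame({
    'Nutrient': ['Iron', 'Iron', 'Iron', 'Calcium', 'Calcium'],
    'Sex': ['F', 'F', 'M', 'F', 'M'],
    'Value': [8.0, 9.0, 7.5, 1200.0, 1100.0],
})

# Named aggregation: new_column=(input_column, function)
toy_summary = (
    toy.groupby(['Nutrient', 'Sex'])
       .agg(mean_value=('Value', 'mean'), n=('Value', 'count'))
       .reset_index()  # turn the group keys back into ordinary columns
)
print(toy_summary)
```

Calling `reset_index()` is optional; it is mainly useful when the summary will be merged with another table or passed to a plotting function that expects plain columns.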


## Joining Data

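Create a small dataset of hippo weights and join it to `df` on `ID`. A left join (`how='left'`) keeps every row of `df`; any `ID` missing from `weights` gets `NaN` in the `Weight` column.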

In [4]:
# Create a small weight dataset
weights = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3'],
    'Weight': [2000, 2100, 1950]  # Weight in kg
})

# Join with main dataset
df_joined = df.merge(weights, on='ID', how='left')
print(df_joined.head(2))  # Display joined data

   ID Nutrient  Year  Value  Age Sex  Weight
0  H1     Iron  2024    8.2   25   F    2000
1  H1     Iron  2025    8.5   26   F    2000
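
After a join it is worth checking that no rows were duplicated and seeing which rows found no match. The sketch below uses two real `merge()` parameters, `validate` and `indicator`, on made-up data; the frame `toy`, its values, and the extra hippo `H4` are hypothetical and exist only to show an unmatched row.

```python
import pandas as pd

# Toy nutrient records; H4 is a hypothetical hippo with no weight entry
toy = pd.DataFrame({
    'ID': ['H1', 'H1', 'H2', 'H4'],
    'Nutrient': ['Iron', 'Calcium', 'Iron', 'Iron'],
    'Value': [8.2, 1150.0, 7.9, 8.4],
})
toy_weights = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3'],
    'Weight': [2000, 2100, 1950],  # Weight in kg
})

checked = toy.merge(
    toy_weights,
    on='ID',
    how='left',
    validate='many_to_one',  # raises MergeError if an ID repeats in toy_weights
    indicator=True,          # adds a '_merge' column describing each row's match
)

# Rows that exist only in the left table did not find a weight
unmatched = checked[checked['_merge'] == 'left_only']
print(checked)
print('Rows without a weight:', len(unmatched))
```

Because `H4` has no weight, its `Weight` is `NaN`, and pandas stores the whole `Weight` column as floats rather than integers.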


## Exercise 1: Aggregate and Join

Summarise mean `Value` by `Nutrient` and `Year`, then join with a dataset of reference intakes (e.g., Iron: 15 mg). Document your code.

**Guidance**: Use `groupby(['Nutrient', 'Year'])` and `merge()`.
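
As a starting point, the pattern in the guidance might look like the sketch below. It uses a made-up frame (`toy`) rather than the real dataset, and the reference table holds only the Iron value given in the exercise text; the column names `Reference` and `Pct_of_reference` are illustrative choices, and reference values for other nutrients are left for you to supply.

```python
import pandas as pd

# Toy stand-in for hippo_nutrients (illustrative values only)
toy = pd.DataFrame({
    'Nutrient': ['Iron', 'Iron', 'Iron', 'Calcium'],
    'Year': [2024, 2024, 2025, 2024],
    'Value': [8.0, 9.0, 8.5, 1150.0],
})

# Reference intakes: only Iron (15 mg) comes from the exercise text
reference = pd.DataFrame({'Nutrient': ['Iron'], 'Reference': [15.0]})

# Step 1: mean Value per Nutrient and Year, keeping the keys as columns
yearly = toy.groupby(['Nutrient', 'Year'], as_index=False)['Value'].mean()

# Step 2: attach the reference intake; nutrients without one get NaN
compared = yearly.merge(reference, on='Nutrient', how='left')

# Step 3: express each mean as a percentage of its reference intake
compared['Pct_of_reference'] = 100 * compared['Value'] / compared['Reference']
print(compared)
```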

**Answer**:

My aggregation and join code is...

## Conclusion

You’ve learned to summarise nutrition data with `groupby()` and `agg()`, and to combine datasets with `merge()`.

**Next Steps**: Move to data analysis in 4.1.

**Resources**:
- [Pandas Merging](https://pandas.pydata.org/docs/user_guide/merging.html)
- [Pandas GroupBy](https://pandas.pydata.org/docs/user_guide/groupby.html)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)