# 🔄 3.4 Data Transformation

This notebook explores data transformation techniques to prepare nutrition datasets for analysis.

**Objectives**:
- Filter and group data for insights.
- Pivot data for alternative views.
- Transform `hippo_nutrients.csv` for analysis.

**Context**: Filtering, grouping, and pivoting turn raw nutrition records into comparisons you can interpret, such as mean nutrient intake by sex or by year.

<details><summary>Fun Fact</summary>
Transforming data is like a hippo rearranging its snacks: same stuff, better view! 🦛
</details>

In [None]:
# Setup for Google Colab: Fetch datasets automatically or manually
%run ../../bootstrap.py    # installs requirements + editable package

import fns_toolkit as fns

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print('Environment ready.')

## Data Preparation

Load `hippo_nutrients.csv` and inspect its structure.

In [None]:
# Load the dataset
df = fns.get_dataset('hippo_nutrients')  # Load by dataset name via fns_toolkit
print(df.head(2))  # Display first two rows

   ID Nutrient  Year  Value  Age Sex
0  H1     Iron  2024    8.2   25   F
1  H1     Iron  2025    8.5   26   F
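
Beyond `head()`, a few more calls help when inspecting structure. The sketch below is a minimal example on a hypothetical toy table (`toy`) that mirrors the columns shown above. Its values are illustrative and are not taken from `hippo_nutrients.csv`.

```python
import pandas as pd

# Hypothetical toy records with the same columns as the notebook's dataset
# (illustrative values only, not the real hippo_nutrients.csv)
toy = pd.DataFrame({
    "ID": ["H1", "H1", "H2", "H2", "H3", "H3"],
    "Nutrient": ["Iron", "Calcium", "Iron", "Calcium", "Iron", "Vitamin_D"],
    "Year": [2024, 2024, 2024, 2025, 2025, 2024],
    "Value": [8.2, 1200.0, 7.5, 1100.0, 9.0, 10.0],
    "Age": [25, 25, 30, 31, 41, 40],
    "Sex": ["F", "F", "M", "M", "F", "F"],
})

print(toy.shape)                       # (rows, columns)
print(toy.dtypes)                      # data type of each column
print(toy["Nutrient"].value_counts())  # number of rows per nutrient
print(toy.isna().sum())                # missing values per column
```

Running the same four calls on `df` should show whether `Year` and `Value` were read as numbers and whether any column has gaps before you start transforming.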


## Filtering Data

Keep only the rows for female hippos (`Sex == 'F'`) where `Nutrient` is `Iron`.

In [3]:
# Filter for female hippos and Iron
df_female_iron = df[(df['Sex'] == 'F') & (df['Nutrient'] == 'Iron')]
print(df_female_iron.head(2))  # Display filtered data

   ID Nutrient  Year  Value  Age Sex
0  H1     Iron  2024    8.2   25   F
1  H1     Iron  2025    8.5   26   F
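
The same filter can be written in more than one way. The sketch below compares a boolean mask with `DataFrame.query` and shows `isin` for matching several values at once. It uses a hypothetical toy table (`toy`) with illustrative values, not the real dataset.

```python
import pandas as pd

# Hypothetical toy records (illustrative values only)
toy = pd.DataFrame({
    "ID": ["H1", "H1", "H2", "H2", "H3", "H3"],
    "Nutrient": ["Iron", "Calcium", "Iron", "Calcium", "Iron", "Vitamin_D"],
    "Year": [2024, 2024, 2024, 2025, 2025, 2024],
    "Value": [8.2, 1200.0, 7.5, 1100.0, 9.0, 10.0],
    "Sex": ["F", "F", "M", "M", "F", "F"],
})

# Boolean mask: each comparison needs its own parentheses,
# because & binds more tightly than ==
mask_filtered = toy[(toy["Sex"] == "F") & (toy["Nutrient"] == "Iron")]

# Equivalent filter written as a query string
query_filtered = toy.query("Sex == 'F' and Nutrient == 'Iron'")

# isin keeps rows whose value matches any item in the list
iron_or_calcium = toy[toy["Nutrient"].isin(["Iron", "Calcium"])]

print(mask_filtered)
print(mask_filtered.equals(query_filtered))
print(len(iron_or_calcium))
```

Filtering keeps the original row labels, so the result's index may have gaps. Call `reset_index(drop=True)` if you want it renumbered from zero.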


## Grouping Data

Group by `Nutrient` and calculate the mean `Value` for each group.

In [4]:
# Group by nutrient and compute mean
mean_values = df.groupby('Nutrient')['Value'].mean()
print(mean_values)  # Display mean values

Nutrient
Calcium     1150.0
Iron           8.0
Vitamin_D     10.5
Name: Value, dtype: float64
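
A single mean can hide how many rows sit behind it. As a minimal sketch on a hypothetical toy table (`toy`, illustrative values only), `agg` computes several statistics per group in one call, and passing a list of columns to `groupby` splits the groups further.

```python
import pandas as pd

# Hypothetical toy records (illustrative values only)
toy = pd.DataFrame({
    "Nutrient": ["Iron", "Calcium", "Iron", "Calcium", "Iron", "Vitamin_D"],
    "Value": [8.2, 1200.0, 7.5, 1100.0, 9.0, 10.0],
    "Sex": ["F", "F", "M", "M", "F", "F"],
})

# Several statistics per nutrient: one row per group, one column per statistic
summary = toy.groupby("Nutrient")["Value"].agg(["mean", "median", "count"])
print(summary)

# Group by two keys: the result is a Series with a (Nutrient, Sex) MultiIndex
by_sex = toy.groupby(["Nutrient", "Sex"])["Value"].mean()
print(by_sex)
```

Including `count` alongside the mean makes it easier to spot groups that rest on only one or two observations.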


## Pivoting Data

Pivot the data so that each row is a `Nutrient`, each column is a `Year`, and each cell holds the mean `Value` for that combination.

In [5]:
# Pivot data
df_pivot = df.pivot_table(values='Value', index='Nutrient', columns='Year', aggfunc='mean')
print(df_pivot)  # Display pivoted data

Year      2024  2025
Nutrient            
Calcium   1150  1140
Iron         8     8
Vitamin_D   10    11
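
Two details of `pivot_table` are worth seeing on a small example: rows that share the same `Nutrient` and `Year` are combined by `aggfunc`, and combinations with no data appear as `NaN`. The sketch below uses a hypothetical toy table (`toy`, illustrative values only) and also shows `melt` as one way back to the long layout.

```python
import pandas as pd

# Hypothetical toy records (illustrative values only)
toy = pd.DataFrame({
    "Nutrient": ["Iron", "Calcium", "Iron", "Calcium", "Iron", "Vitamin_D"],
    "Year": [2024, 2024, 2024, 2025, 2025, 2024],
    "Value": [8.2, 1200.0, 7.5, 1100.0, 9.0, 10.0],
})

# Long -> wide: the two Iron rows for 2024 are averaged into one cell,
# and Vitamin_D has no 2025 row, so that cell is NaN
wide = toy.pivot_table(values="Value", index="Nutrient",
                       columns="Year", aggfunc="mean")
print(wide)

# Wide -> long: melt turns the year columns back into rows;
# dropna removes the combination that had no data
long_again = (
    wide.reset_index()
        .melt(id_vars="Nutrient", var_name="Year", value_name="Value")
        .dropna(subset=["Value"])
)
print(long_again)
```

The round trip does not restore the original rows, because the pivot has already aggregated duplicates into a single mean.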


## Exercise 1: Transform Data

Filter for Vitamin_D data in 2024, group by `Sex`, and compute median `Value`. Document your code.

**Guidance**: Use `df[(df['Nutrient'] == 'Vitamin_D') & (df['Year'] == 2024)]` and `groupby('Sex')['Value'].median()`.

**Answer**:

My transformation code is...

## Conclusion

You’ve learned to transform nutrition data through filtering, grouping, and pivoting.

**Next Steps**: Explore data aggregation in 3.5.

**Resources**:
- [Pandas GroupBy](https://pandas.pydata.org/docs/user_guide/groupby.html)
- [Pandas Pivot](https://pandas.pydata.org/docs/user_guide/reshaping.html)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)