# 🧮 3.1 Data Types and Structures

This notebook explores data structures for nutrition research, focusing on tidy data principles.

**Objectives**:
- Understand vectors, tables, and tidy data.
- Transform data using pandas.
- Apply tidy principles to `hippo_nutrients.csv`.
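In pandas, a `Series` plays the role of a vector (one variable, one data type), and a `DataFrame` plays the role of a table (several equal-length columns, one per variable). The sketch below is minimal and reuses the two H1 iron intakes from the data preview further down. The names `iron` and `table` are illustrative only.

```python
import pandas as pd

# A vector: one variable with a single data type.
# Two iron intakes for hippo H1, copied from the data preview, indexed by year
iron = pd.Series([8.2, 8.5], index=[2024, 2025], name='Iron')
print(iron)

# A table: several equal-length vectors side by side, one column per variable
table = pd.DataFrame({
    'ID': ['H1', 'H1'],
    'Nutrient': ['Iron', 'Iron'],
    'Year': [2024, 2025],
    'Value': [8.2, 8.5],
})
print(table.dtypes)  # each column keeps its own data type

# Selecting a single column of a table gives back a vector
print(type(table['Value']).__name__)  # Series
```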

**Context**: Tidy data is critical for efficient analysis of nutrition datasets. 🦛

<details><summary>Fun Fact</summary>
Tidy data is like a hippo’s lunch tray—neat and ready to munch! 🦛
</details>
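The setup cell below imports `google.colab`, which exists only inside Google Colab, so that cell fails in a local Jupyter session. The sketch below checks the environment first. It is an addition, not part of the original setup, and the `IN_COLAB` flag is a hypothetical name.

```python
# Detect Google Colab: the google.colab package is only importable inside Colab
try:
    import google.colab  # noqa: F401 (imported only to test availability)
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    print('Running in Colab: the setup cell can clone or upload the dataset.')
else:
    print('Not in Colab: skip the setup cell and put hippo_nutrients.csv in a data/ folder next to this notebook.')
```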

In [None]:
# Setup for Google Colab: Fetch datasets automatically or manually
import os
from google.colab import files

# Define the module and dataset for this notebook
MODULE = '03_data_handling'
DATASET = 'hippo_nutrients.csv'
DATASET_PATH = os.path.join('data', DATASET)

# Step 1: Attempt to clone the repository (automatic method)
try:
    print('Attempting to clone repository...')
    # A failed `!git clone` does not raise an exception; a failure surfaces at os.chdir below
    !git clone https://github.com/ggkuhnle/data-analysis-toolkit-FNS.git
    os.chdir(f'/content/data-analysis-toolkit-FNS/notebooks/{MODULE}')
    if os.path.exists(DATASET_PATH):
        print(f'Dataset found: {DATASET_PATH} 🦛')
    else:
        raise FileNotFoundError(f'Dataset {DATASET} not found after cloning.')
except Exception as e:
    print(f'Automatic setup failed: {e}')
    print('Falling back to manual upload option...')

    # Step 2: Manual upload option
    print(f'Please upload {DATASET} manually.')
    print('1. Click the "Choose Files" button below.')
    print(f'2. Select {DATASET} from your local machine.')
    print(f'3. The file will be saved to {DATASET_PATH} in the current working directory.')
    
    # Create the data directory if it doesn't exist
    os.makedirs('data', exist_ok=True)
    
    # Prompt user to upload the dataset
    uploaded = files.upload()
    
    # Check if the dataset was uploaded
    if DATASET in uploaded:
        with open(DATASET_PATH, 'wb') as f:
            f.write(uploaded[DATASET])
        print(f'Successfully uploaded {DATASET} to {DATASET_PATH} 🦛')
    else:
        raise FileNotFoundError(f'Upload failed. Please ensure you uploaded {DATASET}.')

# Install required packages for this notebook
%pip install pandas numpy
print('Python environment ready.')

In [1]:
# pandas is installed by the setup cell above, so it only needs importing here
import pandas as pd
print('Data handling environment ready.')

## Data Preparation

Load `hippo_nutrients.csv` and inspect its structure.

In [2]:
df = pd.read_csv('data/hippo_nutrients.csv')
print(df.head(2))

   ID Nutrient  Year  Value  Age Sex
0  H1     Iron  2024    8.2   25   F
1  H1     Iron  2025    8.5   26   F
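`head()` shows only a few rows. The sketch below runs a few other quick structure checks and tests the tidy rule of one row per observation. It rebuilds the two preview rows above in memory. It also assumes, without confirmation from the dataset description, that `ID`, `Nutrient` and `Year` together identify one observation. On the real data, the same calls can be run on `df`.

```python
import io
import pandas as pd

# Rebuild the two preview rows in memory so the sketch runs without the CSV file.
# A separate name (preview) avoids overwriting the notebook's df.
csv_text = """ID,Nutrient,Year,Value,Age,Sex
H1,Iron,2024,8.2,25,F
H1,Iron,2025,8.5,26,F
"""
preview = pd.read_csv(io.StringIO(csv_text))

print(preview.shape)                 # (rows, columns)
print(preview.dtypes)                # one data type per column, i.e. per variable
print(preview['Nutrient'].unique())  # distinct nutrients recorded

# Tidy check, assuming ID + Nutrient + Year identifies one observation:
# no key combination should occur in more than one row
key = ['ID', 'Nutrient', 'Year']
n_duplicates = int(preview.duplicated(subset=key).sum())
print(f'Duplicate observations: {n_duplicates}')
```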


## Tidy Data Transformation

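The preview above suggests that `hippo_nutrients.csv` is already in long (tidy) form. Each variable (`ID`, `Nutrient`, `Year`, `Value`, `Age`, `Sex`) has its own column, and each row holds one nutrient value for one hippo in one year. Melting this table directly would mix nutrient names, years and intakes in a single column. `pandas.melt()` is meant for *wide* data, where one variable is spread across column headers. To see it in action, we therefore first spread the nutrients into columns with `pivot_table()` and then melt them back.

In [3]:
# Spread nutrients into columns (wide form), then melt them back into Nutrient/Value pairs (tidy form)
df_wide = df.pivot_table(index=['ID', 'Year', 'Age', 'Sex'], columns='Nutrient', values='Value').reset_index()
df_melted = df_wide.melt(id_vars=['ID', 'Year', 'Age', 'Sex'], var_name='Nutrient', value_name='Value').dropna(subset=['Value'])
print(df_melted.head(2))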
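For comparison, here is `melt()` on a table that is wide to begin with, followed by `pivot()` as its inverse. This is a minimal sketch. The one-row table is built by hand from the two H1 iron values in the data preview, and storing the year headers as strings is an assumption made for illustration. The `toy_` names avoid clashing with the notebook's own variables.

```python
import pandas as pd

# Wide (untidy) layout: the Year variable is spread across the column headers.
# Values are the two H1 iron intakes from the data preview; headers are strings by assumption.
toy_wide = pd.DataFrame({'ID': ['H1'], 'Nutrient': ['Iron'], '2024': [8.2], '2025': [8.5]})

# melt(): keep ID and Nutrient as identifiers and turn the year columns into Year/Value pairs
toy_long = toy_wide.melt(id_vars=['ID', 'Nutrient'], var_name='Year', value_name='Value')
toy_long['Year'] = toy_long['Year'].astype(int)  # headers were text; store Year as a number
print(toy_long)

# pivot(): the inverse, spreading Year back out into one column per year
toy_back = toy_long.pivot(index=['ID', 'Nutrient'], columns='Year', values='Value').reset_index()
print(toy_back)
```

`melt()` turns column headers into values of a new variable, and `pivot()` does the reverse. A round trip like this is a quick way to confirm that no observations were lost.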


## Exercise 1: Filter Tidy Data

Filter the tidy data to show only iron intakes and describe the result in a Markdown cell.

**Guidance**: Use `df_melted[df_melted['Nutrient'] == 'Iron']`.
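Boolean-mask filtering works the same way on any tidy table. The sketch below is minimal and uses a tiny hand-built table. The Iron rows come from the data preview. The Calcium rows are hypothetical values, added only so the filter has something to remove. The sketch also shows the equivalent `DataFrame.query()` call.

```python
import pandas as pd

# A tiny tidy table: the Iron rows come from the data preview; the Calcium rows
# are hypothetical values, added only so the filter has something to remove
toy = pd.DataFrame({
    'ID': ['H1', 'H1', 'H1', 'H1'],
    'Nutrient': ['Iron', 'Iron', 'Calcium', 'Calcium'],
    'Year': [2024, 2025, 2024, 2025],
    'Value': [8.2, 8.5, 900.0, 950.0],
})

# Boolean mask: True for rows whose Nutrient is 'Iron'
mask = toy['Nutrient'] == 'Iron'
iron_only = toy[mask]

# The same filter written with query(), which reads well when conditions combine
iron_query = toy.query("Nutrient == 'Iron'")

print(iron_only)
print(iron_only['Value'].describe())  # count, mean, spread: a starting point for describing the result
```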

**Answer**:

The filtered iron data shows...

## Conclusion

You’ve learned to transform data into a tidy format. Next, explore importing data in 3.2.

**Resources**:
- [Tidy Data Paper](https://vita.had.co.nz/papers/tidy-data.pdf)
- [Pandas Documentation](https://pandas.pydata.org/)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)