# 📊 2.4 Data Structures

This notebook introduces three Python data structures (lists, dictionaries, and pandas DataFrames) for organising nutrition data. They are core tools for MSc students working with real datasets.

**Objectives**:
- Use lists and dictionaries to store nutrient data.
- Introduce pandas DataFrames for tabular data.
- Manipulate data structures for analysis.

**Context**: Data structures are the backbone of nutrition datasets, such as the National Diet and Nutrition Survey (NDNS) or hippo diet logs.

<details><summary>Fun Fact</summary>
A DataFrame is like a hippo’s pantry—everything neatly organised for quick access! 🦛
</details>

In [None]:
# Setup for Google Colab: Ensure environment is ready
# Note: This module (Programming Basics) does not require datasets
print('No dataset required for this notebook 🦛')

# Install required packages for this notebook
%pip install pandas
print('Python environment ready.')

In [1]:
# Install required package (ensures compatibility in Colab)
%pip install pandas
import pandas as pd  # For DataFrames
print('Data structure environment ready.')

Data structure environment ready.


## Lists

A list stores an ordered sequence of values. Create a list of iron intakes, one value per hippo, and compute the average.

In [2]:
# List of iron intakes
iron_intakes = [8.2, 8.0, 8.4]  # Iron in mg for hippos H1, H2, H3

# Calculate average
average_iron = sum(iron_intakes) / len(iron_intakes)  # Compute mean
print(f'Iron intakes: {iron_intakes}')  # Display list
print(f'Average iron: {round(average_iron, 1)} mg')  # Display average

Iron intakes: [8.2, 8.0, 8.4]
Average iron: 8.2 mg
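
Lists can also be indexed, sliced, and extended. The sketch below reuses the iron values from the cell above; the value for a fourth hippo (H4) is invented purely for illustration and is not part of the original data.

```python
# Iron in mg for hippos H1, H2, H3 (same values as above)
iron_intakes = [8.2, 8.0, 8.4]

first_intake = iron_intakes[0]   # Indexing starts at 0, so this is H1
last_two = iron_intakes[-2:]     # Slice: the last two values (H2 and H3)

# Assumed value for a hypothetical new hippo, H4
iron_intakes.append(7.9)         # append() adds an item to the end of the list

highest = max(iron_intakes)      # Largest intake in the list
lowest = min(iron_intakes)       # Smallest intake in the list

print(f'First intake: {first_intake} mg')
print(f'Last two intakes (before adding H4): {last_two}')
print(f'Number of hippos: {len(iron_intakes)}')
print(f'Range: {lowest} to {highest} mg')
```

Note that `append()` changes the list in place, so any slice taken earlier (such as `last_two`) is not updated.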


## Dictionaries

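A dictionary maps keys (here, nutrient names) to values (intakes). Store nutrient data for a single hippo in a dictionary and look up one value by its key.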

In [3]:
# Dictionary for a hippo’s nutrients
hippo_nutrients = {
    'Iron': 8.2,  # mg
    'Calcium': 1200,  # mg
    'Protein': 80.5  # g
}

# Access a value
calcium = hippo_nutrients['Calcium']  # Retrieve calcium value
print(f'Hippo H1 nutrients: {hippo_nutrients}')  # Display dictionary
print(f'Calcium intake: {calcium} mg')  # Display calcium

Hippo H1 nutrients: {'Iron': 8.2, 'Calcium': 1200, 'Protein': 80.5}
Calcium intake: 1200 mg
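
Dictionaries can be updated and looped over as well. The following is a minimal sketch: the zinc value and the `units` lookup dictionary are assumptions added for illustration, not part of the original data.

```python
# Nutrients for hippo H1 (same values as above)
hippo_nutrients = {'Iron': 8.2, 'Calcium': 1200, 'Protein': 80.5}

# Hypothetical helper dictionary mapping each nutrient to its unit
units = {'Iron': 'mg', 'Calcium': 'mg', 'Protein': 'g', 'Zinc': 'mg'}

# Add a new key-value pair (assumed value, for illustration only)
hippo_nutrients['Zinc'] = 11.0

# .get() returns None instead of raising KeyError when a key is missing
vitamin_c = hippo_nutrients.get('Vitamin C')

# Loop over all key-value pairs
for nutrient, amount in hippo_nutrients.items():
    print(f'{nutrient}: {amount} {units[nutrient]}')

print(f'Vitamin C recorded: {vitamin_c is not None}')
```

Using `.get()` is a common choice when a nutrient may not have been recorded for every individual, because `hippo_nutrients['Vitamin C']` would stop the cell with a `KeyError`.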


## Pandas DataFrames

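A DataFrame is a table with named columns and one row per observation. Create a DataFrame for multiple hippos’ nutrient data from a dictionary of lists, where each key becomes a column.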

In [4]:
# Create a DataFrame
data = {
    'ID': ['H1', 'H2', 'H3'],
    'Iron': [8.2, 8.0, 8.4],
    'Calcium': [1200, 1150, 1250],
    'Protein': [80.5, 75.0, 82.3]
}
df = pd.DataFrame(data)  # Convert dictionary to DataFrame

# Display DataFrame
print(df)  # Show all rows

   ID  Iron  Calcium  Protein
0  H1   8.2     1200     80.5
1  H2   8.0     1150     75.0
2  H3   8.4     1250     82.3
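
Once the data are in a DataFrame, common manipulations take one line each. The sketch below rebuilds the same table and shows column selection, a summary statistic, row filtering, and a derived column. The 1200 mg calcium threshold and the `Calcium_g` column name are arbitrary choices for illustration, not recommendations.

```python
import pandas as pd

# Same data as the DataFrame above
df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3'],
    'Iron': [8.2, 8.0, 8.4],
    'Calcium': [1200, 1150, 1250],
    'Protein': [80.5, 75.0, 82.3]
})

# Select one column (a pandas Series) and compute its mean
mean_iron = df['Iron'].mean()

# Filter rows with a boolean condition (threshold chosen for illustration)
high_calcium = df[df['Calcium'] >= 1200]

# Add a derived column: calcium converted from mg to g
df['Calcium_g'] = df['Calcium'] / 1000

print(f'Mean iron: {round(mean_iron, 1)} mg')
print(high_calcium)
print(df)
```

Filtering returns a new DataFrame and leaves `df` unchanged, whereas assigning to `df['Calcium_g']` adds the column to `df` itself.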


## Exercise 1: Build a DataFrame

Create a DataFrame for two hippos with calorie and protein data (e.g., H4: 2400 kcal, 78 g; H5: 2550 kcal, 81 g). Print the DataFrame and add a comment explaining what each line of your code does.

**Guidance**: Use `pd.DataFrame()` with a dictionary.

**Answer**:

My DataFrame code is...

## Conclusion

You’ve learned to use lists, dictionaries, and DataFrames to organise nutrition data.

**Next Steps**: Explore object-oriented programming in 2.5.

**Resources**:
- [Pandas Documentation](https://pandas.pydata.org/)
- [Python Data Structures](https://docs.python.org/3/tutorial/datastructures.html)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)