# 📊 2.4 Data Structures

This notebook introduces Python data structures — lists, tuples, sets, dictionaries, NumPy arrays, and pandas DataFrames — for organizing and analysing nutrition data.

**Objectives**:
- Use **lists**, **tuples**, and **sets** to store and manipulate nutrient data.
- Work with **dictionaries** for labelled data.
- Perform numerical operations with **NumPy arrays** and **list comprehensions**.
- Explore **pandas DataFrames** for tabular data and read real CSV files.
- **Convert** between different data structures.

**Context**: Data structures are the backbone of nutrition datasets, such as the National Diet and Nutrition Survey (NDNS) or hippo diet logs.

<details><summary>Fun Fact</summary>
A DataFrame is like a hippo’s pantry — everything neatly organised for quick access! 🦛
</details>

In [None]:
# Setup: install & import core libraries
# Install if needed (comment kept on its own line so it is not passed to pip)
%pip install pandas numpy
import pandas as pd
import numpy as np
print('Python environment ready.')

## 📦 What Are Data Structures?

When working with data — nutrient intakes, participant IDs, or sensory scores — we need containers to **store**, **organise**, and **manipulate** it. Different structures suit different tasks:

- **List**: ordered collection of items (like a shopping list).
- **Tuple**: ordered, immutable sequence (like fixed recipe steps).
- **Set**: unordered collection of unique elements.
- **Dictionary**: key–value pairs (like nutrition labels).
- **NumPy array**: fast, single-type numerical arrays (like a spreadsheet column optimised for maths).
- **pandas DataFrame**: tabular data (rows + columns, like Excel/R tables).

In this notebook, we’ll explore each and see how they apply to nutrition research.

## 🧺 Lists in Python

A **list** holds an ordered collection of items, which you can modify, extend, and iterate over.

### ▶ Creating & Accessing
```python
# Daily calcium intakes for hippos (mg)
calcium_intakes = [1200, 1150, 1250]

print(calcium_intakes[0])   # First element
print(calcium_intakes[-1])  # Last element
```

### ▶ Common Operations
```python
print(len(calcium_intakes))   # Number of items: 3
print(sum(calcium_intakes))   # Total intake: 3600
print(max(calcium_intakes))   # Highest value: 1250

calcium_intakes.append(1180)     # Add a new intake to the end
calcium_intakes[1] = 1190        # Update the second value (index 1)
calcium_intakes.remove(1250)     # Remove the first item equal to 1250
```
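
Because lists are ordered and iterable, slicing and looping come up often alongside the operations above. The following is a minimal sketch using a fresh list; the name `weekly_calcium` and its values are illustrative and not taken from a real dataset.

```python
# Illustrative values only (mg of calcium per day)
weekly_calcium = [1200, 1150, 1250, 1180, 1210]

first_three = weekly_calcium[:3]   # slice: items at index 0, 1 and 2
last_two = weekly_calcium[-2:]     # slice: the final two items
print(first_three)
print(last_two)

# enumerate() yields (position, value) pairs while looping
for day, intake in enumerate(weekly_calcium, start=1):
    print(f'Day {day}: {intake} mg')

# A slice is a new list, so changing it leaves the original untouched
first_three.append(9999)
print(len(weekly_calcium))         # still 5
```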

### ▶ List Comprehension vs. NumPy
```python
# Python list comprehension: convert each intake from mg to g
calcium_g = [c / 1000 for c in calcium_intakes]
print(calcium_g)

# Equivalent NumPy operation (vectorised, no explicit loop)
calcium_g_array = np.array(calcium_intakes) / 1000
print(calcium_g_array)
```

<details><summary>💡 Advanced Tip</summary>
Lists can store mixed types, even nested lists:
```python
mixed = ['apple', 3.5, True, [1,2,3]]
```
</details>

## 🧪 Exercise: List Practice

1. Create a list of iron intakes for three hippos (`8.2`, `7.9`, `8.5`).
2. Print the full list.
3. Access and print the second value.
4. Add another intake (`8.1`).
5. Calculate and print the average.

_Hint_: `average = sum(my_list) / len(my_list)`

In [None]:
# Your code here

## 🔗 Tuples & Sets

- **Tuple**: immutable ordered sequence. Good for fixed collections (e.g. coordinate pairs).
- **Set**: unordered collection of unique items. Useful to remove duplicates.

```python
# Tuple example
hippo_ids = ('H1', 'H2', 'H3')

# Set example
nutrients = {'Iron', 'Calcium', 'Protein', 'Iron'}  # 'Iron' appears once
print(hippo_ids)
print(nutrients)
```
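
To make "immutable" and "unique" concrete, here is a short sketch. The `logged`, `diet_a`, and `diet_b` names and their contents are made up for illustration.

```python
hippo_ids = ('H1', 'H2', 'H3')

# Tuples cannot be changed in place: item assignment raises TypeError
try:
    hippo_ids[0] = 'H9'
except TypeError as err:
    print(f'Cannot modify a tuple: {err}')

# Building a set from a list drops duplicates
logged = ['Iron', 'Calcium', 'Iron', 'Protein']
unique_nutrients = set(logged)
print(len(unique_nutrients))             # 3
print('Calcium' in unique_nutrients)     # True (membership test)

# Set algebra: compare the nutrients recorded in two hypothetical diets
diet_a = {'Iron', 'Calcium', 'Protein'}
diet_b = {'Calcium', 'Vitamin C'}
shared = diet_a & diet_b      # intersection: in both
combined = diet_a | diet_b    # union: in either
print(shared)
```

Because sets are unordered, the printed order of a larger set may differ between runs; use `sorted(my_set)` when a stable order matters.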

## 🧪 Exercise: Tuples & Sets

1. Create a tuple of hippo IDs, `('H1','H2','H3','H2')`, and print it. Notice that the duplicate `'H2'` is kept.
2. Create a set from the list `['apple','banana','apple','carrot']` and print the result.

In [None]:
# Your code here

## 📚 Dictionaries

Store **key–value** pairs for labelled data.

```python
hippo_nutrients = {
    'Iron': 8.2,
    'Calcium': 1200,
    'Protein': 80.5
}
print(hippo_nutrients['Calcium'])  # 1200

# Add/update/remove
hippo_nutrients['Vitamin C'] = 90
hippo_nutrients['Iron'] = 8.5
del hippo_nutrients['Protein']

# Looping
for k, v in hippo_nutrients.items():
    print(f'{k}: {v}')
```
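
Looking up a key that does not exist raises a `KeyError`, which is a common stumbling block with labelled data. The sketch below shows two ways to handle it; the dictionary is rebuilt here so the block runs on its own, and `'Fibre'` is simply an example of a missing key.

```python
hippo_nutrients = {'Iron': 8.5, 'Calcium': 1200, 'Vitamin C': 90}

# Indexing a missing key raises KeyError
try:
    hippo_nutrients['Fibre']
except KeyError:
    print('No Fibre entry recorded')

# .get() returns a default value instead of raising
fibre = hippo_nutrients.get('Fibre', 0)
print(fibre)                              # 0

# 'in' checks whether a key is present
has_iron = 'Iron' in hippo_nutrients
print(has_iron)                           # True

# Iterating over a dict yields its keys in insertion order
nutrient_names = list(hippo_nutrients)
print(nutrient_names)
```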

## 🧪 Exercise: Build a Dictionary

Create `hippo_H5` with:
- `'Calories'`: 2500  
- `'Protein'`: 82.0  
- `'Water'`: 3500  

Print it, then update `'Water'` to `3600`.

In [None]:
# Your code here

## 🧮 NumPy Arrays

Fast, vectorised arrays for numerical computing.

```python
import numpy as np
iron = np.array([8.2, 7.9, 8.5])
print(iron + 1)           # [9.2 8.9 9.5]
print(np.mean(iron))      # approximately 8.2
```
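
Vectorisation also applies to comparisons, which gives a compact way to filter values. This is a minimal sketch reusing the `iron` values from the example above; the `threshold` of 8.0 is an arbitrary cut-off chosen for illustration, not a nutritional reference value.

```python
import numpy as np

iron = np.array([8.2, 7.9, 8.5])   # mg/day
threshold = 8.0                    # hypothetical cut-off, illustration only

above = iron > threshold           # element-wise comparison gives a boolean array
print(above)
print(iron[above])                 # boolean mask keeps values where above is True
print(int(above.sum()))            # True counts as 1, so this counts matches

# Every element of an array shares one data type
print(iron.dtype)
```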

## 🧪 Exercise: NumPy Practice

1. Create `calcium = np.array([1200,1150,1250])`  
2. Print it and its type.  
3. Compute the mean.  
4. Convert to grams (`*0.001`).  
5. Access the last element.

In [None]:
# Your code here

## 🐼 pandas DataFrames

A DataFrame stores tabular data in labelled rows and columns, much like a spreadsheet.

```python
# From a dict
data = {
  'ID': ['H1','H2','H3'],
  'Calories': [2500,2450,2600],
  'Protein': [80.5,78.0,85.2]
}
df = pd.DataFrame(data)
print(df.head())

# From a CSV
df2 = pd.read_csv('data/hippo_diets.csv')
df2.head()
```
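
Once data is in a DataFrame, the usual next steps are selecting columns, filtering rows, and deriving new columns. The sketch below uses `io.StringIO` as a stand-in for a file on disk, since the contents of `data/hippo_diets.csv` are not shown here; the column names and values are assumptions copied from the dictionary example, and the energy conversion assumes `Calories` is recorded in kilocalories.

```python
import io
import pandas as pd

# Stand-in for a CSV file; columns and values are assumed for illustration
csv_text = """ID,Calories,Protein
H1,2500,80.5
H2,2450,78.0
H3,2600,85.2
"""
df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape            # (rows, columns)
print(n_rows, n_cols)
print(df['Protein'].mean())          # a single column is a pandas Series

# Boolean filtering: keep rows where Protein is at least 80
high_protein = df[df['Protein'] >= 80]
print(high_protein['ID'].tolist())

# Derived column: kcal to kJ (assumes Calories is in kcal)
df['Energy_kJ'] = df['Calories'] * 4.184
print(df.head())
```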

## 🧪 Exercise: DataFrame Practice

1. Create a DataFrame for hippos H4/H5 with Calories, Protein, Water.
2. Read `data/hippo_diets.csv` and show the first 5 rows.
3. Filter rows where Calories > 2500.

In [None]:
# Your code here

## 🔄 Converting Between Structures

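- `list → array`: `np.array(my_list)`  
- `array → list`: `my_array.tolist()`  
- `dict → DataFrame`: `pd.DataFrame([my_dict])` for a single record (a dict of scalars), or `pd.DataFrame(my_dict)` for a dict of equal-length lists  
- `DataFrame → list of dicts`: `df.to_dict(orient='records')` (one dict per row)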
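
The conversions listed above can be sketched as a round trip. All variable names and values here are illustrative.

```python
import numpy as np
import pandas as pd

intakes = [8.2, 7.9, 8.5]
arr = np.array(intakes)          # list to array
back_to_list = arr.tolist()      # array to list of plain Python floats

# A dict of scalars describes one row, so wrap it in a list
one_hippo = {'ID': 'H1', 'Calories': 2500}
row_df = pd.DataFrame([one_hippo])

# A dict of equal-length lists describes whole columns
columns = {'ID': ['H1', 'H2'], 'Calories': [2500, 2450]}
table_df = pd.DataFrame(columns)

records = table_df.to_dict(orient='records')   # list with one dict per row
as_columns = table_df.to_dict(orient='list')   # dict mapping column to list
print(records)
print(as_columns)
```

The `orient` argument decides the shape of the result: `'records'` suits row-by-row processing, while `'list'` gives back the column-oriented dictionary.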


## 📋 Summary

| Structure       | Ordered? | Mutable? | When to Use                                      |
|-----------------|----------|----------|--------------------------------------------------|
| **List**        | Yes      | Yes      | Ordered collections, simple sequences            |
| **Tuple**       | Yes      | No       | Fixed sequences, safe keys for dicts             |
| **Set**         | No       | Yes      | Unique elements, membership tests                |
| **Dictionary**  | Yes (insertion order, Python 3.7+) | Yes | Labelled data, fast lookups by key |
| **NumPy Array** | Yes      | Yes      | Fast numerical operations, vectorisation         |
| **DataFrame**   | Yes      | Yes      | Tabular data with mixed types and rich methods   |
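
Two entries in the table, tuples as dictionary keys and sets for membership tests, can be illustrated with a short sketch. The `(hippo ID, nutrient)` key layout and the values are hypothetical.

```python
# Tuples are immutable and hashable, so they can serve as dictionary keys
intake_by_pair = {
    ('H1', 'Iron'): 8.2,
    ('H1', 'Calcium'): 1200,
    ('H2', 'Iron'): 7.9,
}
print(intake_by_pair[('H2', 'Iron')])

# Lists are mutable and unhashable, so using one as a key raises TypeError
try:
    bad = {['H1', 'Iron']: 8.2}
except TypeError as err:
    print(f'Lists cannot be keys: {err}')

# Sets give quick membership checks
seen_ids = {'H1', 'H2', 'H3'}
print('H2' in seen_ids)    # True
print('H9' in seen_ids)    # False
```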

**Next:** Dive into object-oriented programming in `2.4_oop_basics.ipynb` 🦛