# 🧮 2.2 Data Types and Conversion

This notebook explores Python data types and type conversion, critical for handling nutrition data accurately.

**Objectives**:
- Understand basic data types: integers, floats, strings, dates.
- Perform type conversion for data consistency.
- Handle missing values effectively.
- Apply data types to nutrition-related variables.

**Context**: Correct data types ensure reliable calculations, such as averaging nutrient intakes.

<details><summary>Fun Fact</summary>
Data types are like a hippo’s food labels—knowing what’s what prevents a mix-up! 🦛
</details>

In [None]:
# Setup: This module (Programming Basics) does not require external datasets
print('No dataset required for this notebook 🦛')

## Exploring Data Types

Create variables with different data types for a hippo’s diet and check their types.

In [None]:
hippo_id = 'H1'           # String for hippo identifier
calories = 2500           # Integer for daily calorie intake
protein = 80.5            # Float for protein intake in grams

print(f'hippo_id: {type(hippo_id)}')
print(f'calories: {type(calories)}')
print(f'protein: {type(protein)}')
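
Types also determine what an operation returns. As a small illustration (the `meals` variable is hypothetical and not part of the notebook), dividing two integers with `/` yields a float, while `//` keeps an integer:

```python
calories = 2500   # int, as in the cell above
meals = 3         # hypothetical number of meals per day

per_meal = calories / meals         # true division always returns a float
per_meal_whole = calories // meals  # floor division of two ints returns an int

print(type(per_meal), type(per_meal_whole))
print(isinstance(calories, int))    # isinstance is the idiomatic type check
```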

## Type Conversion

Convert a string nutrient value to a float for calculations.

In [None]:
iron_str = '8.2'  # Iron intake as a string
iron_float = float(iron_str)
print(f'Iron as float: {iron_float}')
print(f'Type after conversion: {type(iron_float)}')
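
To see why the conversion matters, here is a minimal sketch using made-up intake values (`intakes_str` is an assumption for illustration). Adding strings joins them end to end, so an average only makes sense after converting to floats:

```python
intakes_str = ['8.2', '7.9', '9.1']  # hypothetical iron intakes read in as text

# '+' on strings concatenates instead of summing
joined = intakes_str[0] + intakes_str[1]
print(joined)  # prints 8.27.9

# Convert every value to float before doing arithmetic
intakes = [float(value) for value in intakes_str]
mean_intake = sum(intakes) / len(intakes)
print(f'Mean iron intake: {mean_intake:.2f}')
```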

## 🧾 Loading Excel and Mixed-Type Data

In real-world datasets, you might find text, numbers, and percentages mixed in a single column.

<details><summary>Common Problems</summary>

- Percent signs like `'45%'` can't be converted directly to float.
- Missing entries might appear as `''`, `'NA'`, or `'-'`.
- Excel may auto-format values such as `'01/02'` as dates, even when they are not dates.

Use `pd.read_excel(..., dtype=str)` to read every column as text, so pandas does not infer types before you have cleaned the values.

You can strip `%` and convert manually:

```python
df['Iron (%)'] = df['Iron (%)'].str.replace('%', '').astype(float)
```
</details>
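
The problems listed above often occur together, and `.astype(float)` raises an error as soon as it meets a marker such as `'NA'` or `'-'`. One possible approach, sketched here on a small made-up DataFrame standing in for a sheet read with `dtype=str`, is to strip the percent sign and let `pd.to_numeric` turn anything unparseable into `NaN`:

```python
import pandas as pd

# Hypothetical raw column, as it might look after reading everything as text
raw = pd.DataFrame({'Iron (%)': ['45%', '12.5%', 'NA', '-', '']})

cleaned = (
    raw['Iron (%)']
    .str.strip()                        # drop stray whitespace
    .str.replace('%', '', regex=False)  # remove the literal percent sign
)

# errors='coerce' converts unparseable entries ('NA', '-', '') to NaN
raw['Iron (%)'] = pd.to_numeric(cleaned, errors='coerce')
print(raw)
```

Coercing is convenient but silent, so it is worth counting the resulting `NaN` values to confirm that only genuine missing markers were dropped.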

## 📅 Working with Dates

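`pd.to_datetime` converts date strings into pandas `Timestamp` objects, which extend Python's standard `datetime` type.

In [None]:
import pandas as pd
from datetime import datetime

# Convert string to datetime
date_str = '2024-01-01'
date_obj = pd.to_datetime(date_str)
print(f'Date: {date_obj}, Type: {type(date_obj)}')
# A Timestamp is also a datetime, so standard datetime tools work on it
print(f'Is a datetime: {isinstance(date_obj, datetime)}')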
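
Once strings are parsed, date arithmetic becomes possible. The sketch below uses made-up dates (the `dates` Series is an assumption for illustration); it parses a whole column with an explicit format, marks unparseable entries as `NaT`, and computes the gap between two dates:

```python
import pandas as pd

# Hypothetical measurement dates, including one bad entry
dates = pd.Series(['2024-01-01', '2024-02-15', 'not a date'])

# An explicit format avoids day/month ambiguity; errors='coerce' yields NaT
parsed = pd.to_datetime(dates, format='%Y-%m-%d', errors='coerce')
print(parsed)

# Subtracting two Timestamps gives a Timedelta
days_between = (parsed[1] - parsed[0]).days
print(f'Days between first two measurements: {days_between}')
```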

## 🕳 Handling Missing Data

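Missing values can be marked in several ways within the same dataset. The loop below converts what it can and records `None` for everything else. Note that `float('NaN')` does not raise an error: it returns the special float `nan`, so it needs an explicit check.

In [None]:
import math

protein_values = ['80.5', '75.2', '', '82.0', 'NaN', 'missing']
converted = []
for val in protein_values:
    try:
        number = float(val)
    except ValueError:
        number = None  # '' and 'missing' cannot be parsed
    # float('NaN') succeeds and returns nan, so treat it as missing too
    if number is not None and math.isnan(number):
        number = None
    converted.append(number)
print(converted)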

In [None]:
# Using Pandas to coerce errors
df = pd.DataFrame({'Protein (g)': ['80.5', 'NaN', '75.0', '', None, 'missing']})
df['Protein (g)'] = pd.to_numeric(df['Protein (g)'], errors='coerce')
print(df)

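# Optional cleaning: fill missing values with 0.
# Only do this when a missing value really means zero intake;
# otherwise the zeros will drag down averages.
print('Filled:')
print(df.fillna(0))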
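
How missing values are handled changes the result of a calculation. As a rough illustration with made-up numbers (the `protein` Series is hypothetical), compare the mean when pandas skips `NaN`, which is its default, with the mean after filling with 0:

```python
import pandas as pd

# Hypothetical protein intakes with two missing entries
protein = pd.Series([80.5, None, 75.0, None])

n_missing = int(protein.isna().sum())
mean_skipping = protein.mean()               # NaN values are skipped by default
mean_zero_filled = protein.fillna(0).mean()  # zeros count as real measurements

print(f'Missing values: {n_missing}')
print(f'Mean, skipping missing: {mean_skipping}')
print(f'Mean, zero-filled: {mean_zero_filled}')
```

Here the zero-filled mean is half the skipped mean, because two of the four values were replaced by 0.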

## Conclusion

You’ve learned Python data types, conversion, date handling, and how to manage missing values.

**Next Steps**: Explore functions and loops in 2.3.

**Resources**:
- [Python Data Types](https://docs.python.org/3/library/stdtypes.html)
- [W3Schools Python](https://www.w3schools.com/python/)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)