# 🧮 2.2 Data Types and Conversion (Revised)

*This notebook explores Python data types and type conversion, which are critical for handling nutrition data accurately. It covers Boolean values, error handling with `try`/`except`, and inspecting pandas dtypes after conversion.*

**Objectives**:
- Understand basic data types: integers, floats, strings, booleans, dates.
- Perform type conversion for data consistency.
- Use `else` and `finally` in `try`/`except`.
- Inspect pandas dtypes after conversion.
- Handle missing values effectively.

**Context**: Correct data types ensure reliable calculations, such as averaging nutrient intakes.
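
A quick illustration of why types matter for averaging: if intake values arrive as strings (a common result of reading messy files), `+` joins text instead of adding numbers. The values below are made up for illustration.

```python
import pandas as pd

# Illustrative protein intakes (g) that arrived as text, e.g. from a messy CSV
intakes_str = ['80.5', '75.0']

# String "addition" concatenates rather than sums
print(intakes_str[0] + intakes_str[1])  # 80.575.0

# After conversion to float, arithmetic behaves as expected
intakes = pd.Series(intakes_str).astype(float)
print(intakes.mean())  # 77.75
```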

<details><summary>🦛 Fun Fact</summary>
Data types are like a hippo’s food labels—knowing what’s what prevents a mix-up!
</details>


In [None]:
# Setup imports
import pandas as pd
import numpy as np
from datetime import datetime

print('Environment ready 🦛')

## 🔍 Exploring Data Types

Create variables with different data types for a hippo’s diet and check their types.

In [None]:
hippo_id = 'H1'             # string
calories = 2500             # integer
protein = 80.5              # float
is_hungry = True            # boolean

print(f"hippo_id: {hippo_id}  → {type(hippo_id)}")
print(f"calories: {calories}      → {type(calories)}")
print(f"protein: {protein}     → {type(protein)}")
print(f"is_hungry: {is_hungry}    → {type(is_hungry)}")
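
`type()` reports the exact type, while `isinstance()` also accepts subclasses and is usually preferred for checks. One consequence worth knowing: `bool` is a subclass of `int`, so `True` counts as `1` and summing a list of flags counts the `True` values. The `hunger_flags` list is hypothetical.

```python
# Hypothetical hunger flags for three hippos
hunger_flags = [True, False, True]

# isinstance accepts subclasses: bool is a subclass of int
print(isinstance(True, bool))  # True
print(isinstance(True, int))   # True
print(type(True) is int)       # False

# Because True == 1 and False == 0, sum() counts the True values
n_hungry = sum(hunger_flags)
print(f"Hungry hippos: {n_hungry}")  # Hungry hippos: 2

# A tuple of types checks several allowed types at once
protein = 80.5
print(isinstance(protein, (int, float)))  # True
```

The same subclass relationship means `isinstance(x, (int, float))` also lets booleans through, so exclude `bool` explicitly if a numeric check must reject them.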

## 🔄 Type Conversion

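Convert string values to numeric and Boolean types so they can be used in calculations. Comparing the text is safer than calling `bool()`, because any non-empty string, including `'False'`, is truthy.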

In [None]:
iron_str = '8.2'             # nutrient as string
iron_float = float(iron_str)

hungry_str = 'True'          # boolean as string
hungry_bool = hungry_str.lower() == 'true'  # bool('False') would be True, so compare the text

print(f"Iron: {iron_str} → {iron_float} ({type(iron_float)})")
print(f"Hungry: {hungry_str} → {hungry_bool} ({type(hungry_bool)})")
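
Two conversion pitfalls are easy to hit with nutrient data: `int()` refuses a decimal string, and `bool()` on any non-empty string returns `True`. The sketch below shows both, plus a small helper, `parse_bool`, which is a hypothetical name; the set of accepted spellings is an assumption to adapt to your own data.

```python
# int() cannot parse a decimal string directly
try:
    int('8.2')
except ValueError as err:
    print("int('8.2') failed:", err)

# Going via float works, but int() truncates toward zero (8.2 -> 8)
print(int(float('8.2')))  # 8

# bool() tests emptiness, not meaning: any non-empty string is True
print(bool('False'))  # True
print(bool(''))       # False


def parse_bool(text):
    """Hypothetical helper: map common yes/no spellings to True/False.

    The accepted spellings are an assumption; unknown values raise ValueError.
    """
    cleaned = text.strip().lower()
    if cleaned in {'true', 'yes', 'y', '1'}:
        return True
    if cleaned in {'false', 'no', 'n', '0'}:
        return False
    raise ValueError(f"Cannot interpret {text!r} as a boolean")


print(parse_bool(' Yes '))  # True
print(parse_bool('FALSE'))  # False
```

Raising on unknown values, rather than silently returning `False`, keeps typos such as `'ture'` from turning into valid data.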

## 🚨 Enhanced `try`/`except` with `else` and `finally`

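Handle conversion errors gracefully: the `except` block runs only if an exception is raised, `else` runs only if none was raised, and `finally` always runs, which makes it the place for cleanup code.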

In [None]:
sample = 'not_a_number'

try:
    val = float(sample)
except ValueError:
    val = np.nan
    print("⚠️ Conversion failed, set to NaN")
else:
    print("✅ Conversion succeeded, value =", val)
finally:
    print("🔚 Finished attempt to convert sample")
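
The same pattern can be wrapped in a reusable function and applied to a whole list. `safe_float` below is a hypothetical helper, not a library function. It catches `TypeError` as well as `ValueError` because `float(None)` raises `TypeError`, and its `else` branch, which runs only after a successful conversion, also rejects infinities; treating `'inf'` as invalid is an assumption about nutrition data.

```python
import math

import numpy as np


def safe_float(value):
    """Hypothetical helper: return value as a float, or NaN if it cannot be converted."""
    try:
        result = float(value)
    except (ValueError, TypeError):
        # ValueError: text such as 'abc'; TypeError: None or other non-string objects
        return np.nan
    else:
        # Runs only if float() succeeded; infinities are treated as invalid (an assumption)
        return result if math.isfinite(result) else np.nan


raw_values = ['8.2', 'abc', None, '12', 'inf']
converted = [safe_float(v) for v in raw_values]
print(converted)  # [8.2, nan, nan, 12.0, nan]
```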

## 📊 Loading & Cleaning Mixed-Type Data

In real-world datasets, you might see `%`, units, or currency mixed in. Use pandas string methods to clean before conversion.

In [None]:
df = pd.DataFrame({
    'Iron (%)': ['45%', '50%', None, 'NA'],
    'Dose (mg)': ['10mg', '12mg', '15 mg', '']
})

# Strip '%' and convert
df['Iron (%)'] = (
    df['Iron (%)']
      .str.rstrip('%')
      .replace({'NA': np.nan})
      .astype(float)
)

# Strip 'mg' and whitespace, then convert
df['Dose (mg)'] = (
    df['Dose (mg)']
      .str.replace(r'\s*mg', '', regex=True)
      .replace({'': np.nan})
      .astype(float)
)

print(df)
print("\nDataFrame dtypes:")
print(df.dtypes)
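
An alternative to stripping each known suffix is to extract the first number with a regular expression and let `pd.to_numeric(..., errors='coerce')` turn anything unmatched into `NaN`. This is a sketch: the pattern assumes plain non-negative decimals such as `45` or `2.5` and would need extending for negative values, thousands separators, or decimal commas.

```python
import pandas as pd

doses = pd.Series(['10mg', '12 mg', '2.5mg', '', None, 'NA'])

# Capture the first run of digits with an optional decimal part; no match gives NaN
numbers = doses.str.extract(r'(\d+(?:\.\d+)?)', expand=False)

# Coerce to float; entries that are already missing stay NaN
doses_mg = pd.to_numeric(numbers, errors='coerce')
print(doses_mg)
print(doses_mg.dtype)  # float64
```

Because the pattern ignores everything around the number, check that a column really uses a single unit first: a stray `'1 g'` in a milligram column would silently become `1.0`.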

## 📅 Working with Dates

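Convert date strings to pandas datetimes with `pd.to_datetime` and inspect the dtype. Since pandas 2.0 the format is inferred from the first entry, so a list mixing styles needs `format='mixed'`. An ambiguous string such as `01/02/2024` is read month-first (2 January) unless you pass `dayfirst=True`.

In [None]:
date_strs = ['2024-01-01', '01/02/2024', 'March 3, 2024']
# format='mixed' parses each string individually (requires pandas >= 2.0);
# '01/02/2024' is read month-first as 2 January 2024
dates = pd.to_datetime(date_strs, format='mixed')
print(dates)
print("dtype of dates object:", dates.dtype)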
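
When the format is known, passing it explicitly is faster and unambiguous, and `errors='coerce'` turns unparseable dates into `NaT` (pandas' missing-value marker for datetimes). The visit dates below are illustrative.

```python
import pandas as pd

# Explicit ISO format; invalid entries become NaT instead of raising
visits = pd.Series(['2024-01-01', '2024-01-15', '2024-13-01', 'unknown'])
visit_dates = pd.to_datetime(visits, format='%Y-%m-%d', errors='coerce')
print(visit_dates)

# The .dt accessor exposes date components once the dtype is datetime
print(visit_dates.dt.day_name())

# Day-first interpretation of an ambiguous string
print(pd.to_datetime('01/02/2024', dayfirst=True))  # 2024-02-01 00:00:00
```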

## 🕳 Handling Missing Data & dtype Introspection

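Use `pd.to_numeric(..., errors='coerce')` to turn entries that cannot be parsed (empty strings, `None`, text such as `'missing'`) into `NaN`, then inspect the resulting dtype.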

In [None]:
raw = ['80.5', 'NaN', '75.0', '', None, 'missing']
df2 = pd.DataFrame({'Protein (g)': raw})
df2['Protein (g)'] = pd.to_numeric(df2['Protein (g)'], errors='coerce')

print(df2)
print("\nAfter coercion, dtypes:")
print(df2.dtypes)

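# Optional: fill or drop missing values
# Note: 0 g is a real intake, so filling with 0 pulls averages down
filled = df2.fillna(0)
print("\nFilled missing with 0:")
print(filled)

dropped = df2.dropna()
print("\nDropped rows with missing values:")
print(dropped)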
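
Filling with 0 is convenient but changes averages, because a missing protein value is not the same as 0 g. The comparison below reuses the raw values from this section. Whether to drop, keep `NaN`, or impute is an analysis decision; the mean imputation shown is only one simple option and it understates the spread of the data.

```python
import pandas as pd

raw = ['80.5', 'NaN', '75.0', '', None, 'missing']
protein = pd.to_numeric(pd.Series(raw), errors='coerce')

print("Missing values:", protein.isna().sum())            # 4
print("Mean, skipping NaN:", protein.mean())              # 77.75
print("Mean after fillna(0):", protein.fillna(0).mean())  # about 25.92

# Simple mean imputation: keeps the mean but shrinks the variance
imputed = protein.fillna(protein.mean())
print("Mean after mean imputation:", imputed.mean())      # 77.75
```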

## ✅ Conclusion

You’ve learned:
- Core Python types (including `bool`).
- Enhanced `try`/`except` with `else` & `finally`.
- Cleaning mixed-type columns (%, mg, NA).
- Date parsing with `pd.to_datetime`.
- Missing data coercion & dtype inspection with pandas.

**Next Steps**: Explore functions and loops in 2.3.

**Resources**:
- [Python stdtypes](https://docs.python.org/3/library/stdtypes.html)
- [pandas.to_numeric](https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html)
- [Course repo](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)
