# 📊 3.2 Importing Data

This notebook covers importing datasets into Python, a critical step for nutrition data analysis.

**Objectives**:
- Import CSV and Excel files using pandas.
- Verify data integrity after import.
- Apply importing skills to `hippo_nutrients.csv`.

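**Context**: Analyses of nutrition datasets such as the NDNS (National Diet and Nutrition Survey) are only as reliable as the data that was imported, so every import should be checked before any analysis begins.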

<details><summary>Fun Fact</summary>
Importing data is like a hippo gathering ingredients—get it right, and the feast begins! 🦛
</details>

In [1]:
# Install required packages (ensures compatibility in Colab; openpyxl is needed for Excel files)
%pip install pandas openpyxl
import pandas as pd  # For data manipulation
print('Data import environment ready.')

Data import environment ready.


## Importing a CSV File

Load `hippo_nutrients.csv` and verify its contents.

In [2]:
# Load CSV file
df_csv = pd.read_csv('data/hippo_nutrients.csv')  # Path relative to notebook

# Verify data
print(f'Shape: {df_csv.shape}')  # Display number of rows and columns
print(f'Columns: {df_csv.columns.tolist()}')  # Display column names

Shape: (100, 6)
Columns: ['ID', 'Nutrient', 'Year', 'Value', 'Age', 'Sex']
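
Shape and column names confirm that the file loaded, but verifying data integrity usually also means checking data types, missing values and duplicate rows. The following is a minimal sketch of those checks. It assumes a small in-memory sample with the same six columns; the rows are invented for illustration and are not taken from `hippo_nutrients.csv`.

```python
import io

import pandas as pd

# Hypothetical sample rows with the same columns as hippo_nutrients.csv;
# the values are invented for illustration only.
sample_csv = io.StringIO(
    "ID,Nutrient,Year,Value,Age,Sex\n"
    "H1,Iron,2024,8.2,25,F\n"
    "H2,Calcium,2024,,30,M\n"
    "H1,Iron,2024,8.2,25,F\n"
)
df_check = pd.read_csv(sample_csv)

# Column data types: numeric columns should not be read in as 'object'
print(df_check.dtypes)

# Number of missing values in each column
missing_per_column = df_check.isna().sum()
print(missing_per_column)

# Number of rows that exactly repeat an earlier row
n_duplicates = int(df_check.duplicated().sum())
print(f'Duplicate rows: {n_duplicates}')
```

The same three checks can be run on `df_csv` directly once the real file has been loaded.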


## Importing an Excel File

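If `hippo_nutrients.xlsx` is available (same data as the CSV), it can be imported with `pd.read_excel`. Because the file may not exist, the call is commented out in the cell below and the CSV data stands in for the Excel import.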

In [3]:
# Load Excel file (commented out as file may not exist)
# df_excel = pd.read_excel('data/hippo_nutrients.xlsx')  # Path relative to notebook
# print(f'Excel shape: {df_excel.shape}')  # Display shape

# For this example, reuse the CSV data
df_excel = df_csv.copy()  # Simulate Excel import with an independent copy, not an alias of df_csv
print(f'Excel shape: {df_excel.shape}')  # Display shape

Excel shape: (100, 6)

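
Both `pd.read_csv` and `pd.read_excel` return a DataFrame, so one way to handle either format is to choose the reader from the file extension. The sketch below assumes a hypothetical helper named `load_table` (it is not part of pandas) and tests it on a tiny made-up CSV written to a temporary folder. The Excel branch is only reached for `.xlsx` or `.xlsm` paths and needs `openpyxl` installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path):
    """Load a CSV or Excel file into a DataFrame, chosen by file extension.

    Hypothetical helper for illustration; not part of pandas.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == '.csv':
        return pd.read_csv(path)
    if suffix in {'.xlsx', '.xlsm'}:
        # Reading these formats requires the openpyxl package
        return pd.read_excel(path, engine='openpyxl')
    raise ValueError(f'Unsupported file type: {suffix!r}')


# Usage sketch: write a tiny made-up CSV to a temporary folder and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / 'toy_nutrients.csv'
    toy_path.write_text('ID,Nutrient,Value\nH1,Iron,8.2\nH2,Iron,9.1\n')
    df_toy = load_table(toy_path)

print(df_toy.shape)
```

With the course data in place, the equivalent call would be `load_table('data/hippo_nutrients.csv')`.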

## Exercise 1: Import and Summarise

Import `hippo_nutrients.csv` and calculate the mean `Value` for Iron intakes. Document your code with comments.

**Guidance**: Filter for `Nutrient == 'Iron'` and use `df['Value'].mean()`.
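
The filter-then-average pattern from the guidance can be sketched on a made-up DataFrame. This example uses a different nutrient (Calcium) and invented values, so the Iron calculation on the real file is left as the exercise.

```python
import pandas as pd

# Made-up rows for illustration; not taken from hippo_nutrients.csv
df_example = pd.DataFrame({
    'Nutrient': ['Calcium', 'Iron', 'Calcium', 'Iron'],
    'Value': [1000.0, 8.0, 1200.0, 10.0],
})

# Boolean mask that is True for the rows of one nutrient
is_calcium = df_example['Nutrient'] == 'Calcium'

# Mean of the 'Value' column over the selected rows only
calcium_mean = df_example.loc[is_calcium, 'Value'].mean()
print(f'Mean Calcium value: {calcium_mean}')
```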

**Answer**:

My import and summary code is...

## Conclusion

You’ve learned to import CSV and Excel files, preparing nutrition data for analysis.

**Next Steps**: Explore data cleaning in 3.3.

**Resources**:
- [Pandas I/O](https://pandas.pydata.org/docs/user_guide/io.html)
- [OpenPyXL Documentation](https://openpyxl.readthedocs.io/)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)