# 🐍 Welcome to Python and Colab for Public Health Nutrition

This notebook is your first step into the world of **Python** and **data analysis**.  
We'll use **Google Colab**, a free cloud-based coding environment, to explore datasets and uncover insights relevant to **public health nutrition**.

📌 **What is Python?**
> Python is a programming language used for data science, machine learning, and scientific research. It’s readable, flexible, and widely used.

📌 **What is Google Colab?**
> Colab is a free, browser-based Jupyter Notebook environment that runs Python code on Google’s servers, so there is nothing to install!

You'll use Python to analyse real-world nutrition data, make plots, and understand key epidemiological concepts like bias, confounding, and causality.


## ▶️ How to Use This Notebook

- Press the **Play button** (or `Shift + Enter`) to run a code cell.
- Wait for the cell to finish: the spinning circle stops and a green tick ✅ appears next to it.
- If you break something, go to `Runtime > Restart and run all`.

💡 This notebook includes both **code cells** and **text explanations** (like this one).


## 🧠 Python Basics – Just Enough to Get Started

In [None]:
# Variables: a text string and an integer
hippo_name = "Helga"
hippo_age = 12

# A list
favourite_fruits = ["apple", "banana", "papaya"]

# A dictionary
hippo = {"name": "Helga", "age": 12, "weight": 1500}

print(f"{hippo['name']} is {hippo['age']} years old and weighs {hippo['weight']} kg.")
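Lists and dictionaries are mainly useful because you can read, change and loop over their contents. The sketch below reuses the same made-up hippo objects to show a few everyday operations. The `"fruits"` key and the added `"mango"` are purely illustrative.

```python
favourite_fruits = ["apple", "banana", "papaya"]
hippo = {"name": "Helga", "age": 12, "weight": 1500}

# Lists are indexed from 0, so [0] is the first item and [-1] the last
print(favourite_fruits[0])   # apple
print(favourite_fruits[-1])  # papaya

# Add an item to the end of a list
favourite_fruits.append("mango")

# Loop over every item in the list
for fruit in favourite_fruits:
    print(f"{hippo['name']} likes {fruit}")

# Update a dictionary value by its key, then add a new key
hippo["age"] = hippo["age"] + 1
hippo["fruits"] = favourite_fruits
print(hippo)
```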


## 🗃️ Working with Tables – `pandas` DataFrames

In [None]:
import pandas as pd

# A small example: each dictionary key becomes a column
data = {
    "name": ["Helga", "Hugo", "Harriet"],
    "age": [12, 14, 10],
    "BMI": [22.5, 27.8, 19.4]
}

df = pd.DataFrame(data)
df
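After building a DataFrame, the next step is usually to select columns, filter rows and calculate summaries. The sketch below uses the same small table under the hypothetical name `hippos`, so it does not overwrite `df`. The cut-off of 25 is the usual adult human threshold for overweight and is only used here to illustrate filtering.

```python
import pandas as pd

hippos = pd.DataFrame({
    "name": ["Helga", "Hugo", "Harriet"],
    "age": [12, 14, 10],
    "BMI": [22.5, 27.8, 19.4],
})

# Select one column (this returns a pandas Series)
print(hippos["BMI"])

# Filter rows with a condition: keep rows where BMI is 25 or above
high_bmi = hippos[hippos["BMI"] >= 25]
print(high_bmi)

# Calculate a summary statistic for one column
print(hippos["BMI"].mean())

# Create a new column from an existing one
hippos["age_months"] = hippos["age"] * 12
print(hippos)
```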

## 📥 Load the Public Health Nutrition Dataset

In [None]:
# Read the CSV file straight from GitHub into a DataFrame
url = "https://raw.githubusercontent.com/ggkuhnle/FB2NEP_datascience/main/data/synthetic_phn_data.csv"
df = pd.read_csv(url)

# Show the first five rows
df.head()
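`pd.read_csv` accepts a file path, a URL or any file-like object. This offline sketch reads a tiny CSV held in memory, so you can see how it works without an internet connection. The columns and values are made up and are not taken from the course dataset. The empty field in the last row becomes a missing value (`NaN`).

```python
import io

import pandas as pd

# A tiny, made-up CSV held in memory (hypothetical values for illustration only)
csv_text = """id,sex,age,BMI
1,F,34,23.1
2,M,51,28.4
3,F,47,
"""

# read_csv works the same way on a file-like object as on a URL or file path
mini = pd.read_csv(io.StringIO(csv_text))

print(mini.shape)             # (number of rows, number of columns)
print(mini.columns.tolist())  # column names
print(mini.head())
```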

In [None]:
# Column names, data types, and number of non-missing values per column
df.info()
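Besides `df.info()`, three pandas methods help you check what kind of data you have and whether any of it is missing. The sketch below applies them to a small hypothetical DataFrame (`df_demo`) that has one missing BMI value. The same calls work on `df`.

```python
import numpy as np
import pandas as pd

# Small hypothetical dataset with one missing BMI value
df_demo = pd.DataFrame({
    "sex": ["F", "M", "F", "M"],
    "age": [34, 51, 47, 62],
    "BMI": [23.1, 28.4, np.nan, 26.0],
})

# Count missing values in each column
print(df_demo.isna().sum())

# Data type of each column: text columns (often categorical) are not int/float
print(df_demo.dtypes)

# Summary statistics for numeric columns; "count" excludes missing values
print(df_demo.describe())
```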

## 📊 Quick Plots with `seaborn`

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Histogram of BMI, with a smoothed density curve (kde=True)
sns.histplot(df["BMI"], kde=True)
plt.title("BMI Distribution")
plt.show()
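A histogram shows the distribution of a single numeric variable. Boxplots compare a numeric variable across groups, and countplots show how many observations fall into each category. The sketch below draws both side by side from simulated data. The `sex` and `BMI` columns here are hypothetical, so check the real column names in `df` before adapting it.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Simulated, hypothetical data (not the course dataset)
rng = np.random.default_rng(1)
plot_demo = pd.DataFrame({
    "sex": rng.choice(["F", "M"], size=200),
    "BMI": rng.normal(26, 4, size=200),
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Boxplot: compare a numeric variable across groups
sns.boxplot(data=plot_demo, x="sex", y="BMI", ax=axes[0])
axes[0].set_title("BMI by sex")

# Countplot: number of observations in each category
sns.countplot(data=plot_demo, x="sex", ax=axes[1])
axes[1].set_title("Participants by sex")

plt.tight_layout()
plt.show()
```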

## 🧭 Principles of Data Analysis (for Nutrition & Public Health)

- Look at your data before analysing it
- Visualisation helps you spot patterns, outliers, and data-entry errors
- Correlation ≠ causation
- Always think about confounding, bias, and missing data
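To see why correlation is not causation, it helps to simulate data where you know the truth. In this hypothetical sketch, age raises both coffee intake (in arbitrary units) and systolic blood pressure, while coffee itself has no effect on blood pressure. The crude analysis still finds a strong association. Adding age to a linear regression (one common way to adjust for a confounder; stratifying by age is another) brings the coffee coefficient close to zero. All variables and effect sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Hypothetical confounder: age drives both coffee intake and blood pressure
age = rng.uniform(20, 80, n)
coffee = 0.05 * age + rng.normal(0, 1, n)    # arbitrary units (simulated)
sbp = 100 + 0.6 * age + rng.normal(0, 5, n)  # mmHg; coffee has NO effect here

# Crude association: coffee and blood pressure look clearly related
r = np.corrcoef(coffee, sbp)[0, 1]
print(f"Correlation: {r:.2f}")

# Crude slope: regress blood pressure on coffee only
X_crude = np.column_stack([np.ones(n), coffee])
crude_slope = np.linalg.lstsq(X_crude, sbp, rcond=None)[0][1]

# Adjusted slope: also include age as a covariate
X_adj = np.column_stack([np.ones(n), coffee, age])
adjusted_slope = np.linalg.lstsq(X_adj, sbp, rcond=None)[0][1]

print(f"Crude slope:    {crude_slope:.2f} mmHg per unit of coffee")
print(f"Adjusted slope: {adjusted_slope:.2f} mmHg per unit of coffee")
```

Because the data are simulated, we know that the "true" effect of coffee is zero. In real observational data you never know this, which is why confounding has to be considered in the study design and the analysis.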


## 📚 Recommended Resources

- [Colab Intro](https://colab.research.google.com/notebooks/intro.ipynb)
- [Python for Beginners](https://swcarpentry.github.io/python-novice-gapminder/)
- [Pandas Cheat Sheet (PDF)](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)
- [Python Tutor – Visual Code Tracer](https://pythontutor.com/)


## ✅ Tasks for You

1. List all the variables in the dataset. Which are categorical and which are numeric?
2. Are there any missing values?
3. Make a plot of one of the variables using `sns.histplot`, `sns.boxplot`, or `sns.countplot`.
4. Write down one strength and one limitation of this dataset.
