# Introduction to Nutritional Epidemiology - FB2NEP (Part 1)

Welcome to the Nutritional Epidemiology part of FB2NEP. In this part of the module, we will use **Jupyter notebooks** to **demonstrate and explain key concepts in nutritional epidemiology** and to **work through small, realistic examples** step by step.

You can **run the code, edit it, and re-run it** to see how changes affect the results. This is an ideal way to practise the methods you will use in assignments and perhaps even in your own projects.


> You do not need to **"know how to code"** to succeed in this module. The code used here is **simple, fully explained**, and always accompanied by interpretation. You can treat it as a tool to **illustrate and explore concepts**, rather than as something to master in itself. **In other words, you will learn *with* the code, not *through* coding.**

These notebooks are stored in a **shared, read-only repository**, so you **cannot accidentally overwrite or damage** the original files. Please feel free to **experiment, modify code, and explore**. This is the best way to learn.

**To save your own work:**

- In **Google Colab**, go to *File → Save a copy in Drive*.  
- If you are using a **local Jupyter installation**, simply use *Save As…* and choose a different filename.

> What is a "local installation"? See the [Appendix: Local installation](#appendix-local-installation) for setup instructions.

---

This introductory notebook provides **detailed guidance on how to use Jupyter notebooks**, including:
- accessing the notebooks,
- running and editing cells,
- understanding outputs and Markdown formatting,
- saving and managing your own copies,
- and using notebooks to **explore and practise core concepts in nutritional epidemiology**.

You can safely try everything demonstrated here. Nothing you do will affect the original teaching materials.


## Where to find the learning resources?

These teaching notebooks live in a **read-only repository**. You only need a **web browser** to view and run them (e.g. in Google Colab). You **cannot break** the originals, so please **experiment freely**.

- **Source repository (GitHub):** https://github.com/ggkuhnle/fb2nep-epi  
  The **README.md** contains all essential information and links.
- **Published site (browse everything easily):** https://ggkuhnle.github.io/fb2nep-epi/

### Using the notebooks in the browser
- Open a notebook from the published site or repository.
- If using **Google Colab**, choose *File → Save a copy in Drive* to keep your own editable version.
- Work through cells top-to-bottom; you can rerun or modify cells safely.

### If you prefer to run locally
You can also use these notebooks on **your own computer** (no internet required once set up).  
See **[Appendix: Local installation](#appendix-local-installation)** for a short guide.

> Tip: Nothing you do here will overwrite the originals. If you want to keep changes, save a copy (Colab) or use *Save As…* locally.

---

<details>
<summary><strong>What's in this repository?</strong></summary>

- Teaching notebooks for FB2NEP (Nutritional Epidemiology & Public Health)  
- Example code, small datasets, and step-by-step exercises  
- Links to documentation and further reading

Everything is explained in the repository's **README.md**, including how the materials are organised and how to get started quickly.
</details>


## Jupyter notebooks

A **Jupyter Notebook** is an interactive environment that lets you combine **code, text, images, and visualisations** in a single, shareable document. Notebooks are widely used in science, education, and data analysis because they make it easy to **run code step-by-step**, document your process, and share reproducible results.

In a Jupyter Notebook:
- Each **cell** can contain either **Python code** or **Markdown text**.
- You can run code one cell at a time; this helps you experiment and explore data interactively.
- The **output** (tables, plots, or text) appears directly below the code that generated it.
- Markdown cells (like this one) are used for explanations, instructions, and formatting.

> 💡 The name *Jupyter* refers to **Ju**lia, **Py**thon, and **R**, the three core languages it was originally designed for; today it supports many more.

<details>
<summary><strong>Background: A (very) short history of Jupyter</strong></summary>

- **2001: IPython begins.** Fernando Pérez starts **IPython**, an enhanced interactive Python shell for scientific computing.
- **2011: IPython Notebook.** A browser-based, document-centred interface appears (the `.ipynb` format), letting users mix code, text and outputs.
- **2014: Project Jupyter announced.** The project becomes **language-agnostic** and separates from IPython's Python-only core. The name "Jupyter" nods to **Ju**lia, **Py**thon and **R**. The **kernel** architecture and **messaging protocol** enable many languages to run in notebooks.
- **2015–2017: Ecosystem grows.** Tools like **JupyterHub** (multi-user servers for classes/labs) and **nbconvert** (exporting notebooks) mature; **Binder/mybinder.org** appears for shareable, reproducible environments.
- **2018–2021: JupyterLab era.** **JupyterLab**, a more flexible, IDE-like interface, moves from beta to stable and continues to evolve; the classic Notebook interface remains available.
- **Today.** Jupyter supports dozens of languages via kernels (R, Julia, Python, and many more), powers education and research, and sits at the centre of many reproducible data-science workflows.

**Further reading**
- Project overview: https://jupyter.org/about  
- Jupyter documentation: https://docs.jupyter.org/en/latest/  
- IPython history (context): https://ipython.readthedocs.io/en/stable/overview/history.html  
- JupyterLab docs: https://jupyterlab.readthedocs.io/

</details>

## How to use this notebook
- **Run cells in order** from top to bottom (Shift+Enter).  
- If you're in **Google Colab**:  
  - *File → Save a copy in Drive* to keep your own version.  
  - *Runtime → Restart runtime* if things get out of sync, then run cells again from the top.  
- If running locally: ensure the required packages are installed (instructions are provided below when needed).

## What you should try
- **Tweak parameters** in the examples (e.g. sample size, effect sizes, noise) and observe the impact.  
- **Add short Markdown notes** explaining what you changed and what you learned.  
- **Re-run plots/tables** and check that results are consistent with expectations.
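The suggestion to *tweak parameters* can be tried on a toy example. This is a minimal sketch, assuming normally distributed daily fruit intakes (a simplification: real intakes are skewed and cannot be negative). The function name `simulate_mean` and its default values (`true_mean=3.0`, `sd=1.0`, `seed=42`) are hypothetical choices for illustration only. Change `n`, `sd` or `true_mean` and re-run the cell to see how the estimate moves.

```python
import numpy as np

def simulate_mean(n, true_mean=3.0, sd=1.0, seed=42):
    """Simulate n daily fruit intakes (portions) and return their sample mean.

    Hypothetical helper: the normal distribution and default values are
    illustrative assumptions, not properties of real intake data.
    """
    rng = np.random.default_rng(seed)          # reproducible random numbers
    intakes = rng.normal(loc=true_mean, scale=sd, size=n)
    return intakes.mean()

# Larger samples tend to give estimates closer to true_mean (3.0)
for n in [10, 100, 10_000]:
    print(f"n = {n:>6}: estimated mean = {simulate_mean(n):.2f}")
```

Fixing the `seed` makes the "random" numbers identical each time you run the cell, so any change in the output comes from the parameters you edited.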

> ✅ **Learning goal:** by the end, you should feel confident running cells, reading outputs, and making small edits to explore "what-if" scenarios. This is the essence of hands-on epidemiology with data.

---

<details>
<summary><strong>Practical etiquette & tips</strong></summary>

- Run one cell at a time; watch for errors and read the messages.
- Keep variable names clear; avoid reusing the same name for different things.
- When unsure, re-run from the top to ensure a clean state.
- Write short reflections in Markdown so future-you remembers what you did.
</details>

# Let's get started

## Your first code cell: "Hello, world!"

Run the cell below (Shift+Enter) to print a message.  
This shows how **code ‚Üí output** works in a notebook.

**Expected output:** a single line saying `Hello, world!`

```python
# Run me (Shift+Enter)
print("Hello, world!")
```

### Try it
- Change the message to your own text, e.g. `print("Hallo, FB2NEP!")`
- Add a second line that does a quick calculation:
  ```python
  print(2 + 2)
  ```

- Run the cell again and check the output appears directly under the code.

<details> <summary><strong>Troubleshooting</strong></summary>

If nothing happens, make sure you clicked inside the cell before pressing Shift+Enter.

If you see an error, read the message (the last line usually names the problem), correct the code, and re-run the cell. If problems persist, try *Runtime → Restart and run all* (Colab) or *Kernel → Restart Kernel and Run All* (JupyterLab).

</details>

## Using packages (libraries): a super-simple example

In Python, **packages** (also called libraries) add extra features so you don't have to code everything yourself.

Typical steps:
1) *(If needed)* **install** a package into your environment.  
2) **import** it in your notebook.  
3) **use** its functions.
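The three steps can be sketched in code. This minimal example assumes you only want to *check* whether a package is available before importing it; it uses the standard-library function `importlib.util.find_spec`, which returns `None` when a module cannot be found. The helper name `is_installed` is hypothetical.

```python
import importlib.util

def is_installed(package_name):
    """Hypothetical helper: True if the package can be imported in this kernel."""
    return importlib.util.find_spec(package_name) is not None

# Step 1: check (if this prints False, run %pip install numpy)
print("numpy available:", is_installed("numpy"))

# Step 2: import
import numpy as np

# Step 3: use one of its functions
print(np.mean([2, 1, 3]))  # mean of three values
```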

Below we'll:
- install (if needed) and import three common libraries,
- make a tiny table of **fruit portions per day**,
- calculate a simple **average**,
- and draw a quick **bar chart**.


### Common Python packages you'll see in this module

Below is a quick guide to the libraries we'll use most often. You don't need to memorise any of this; treat it as a mini cheat-sheet you can scroll back to.

| Package | What it's for (in plain English) | Typical import | One-liner example |
|---|---|---|---|
| **NumPy** | Fast maths on arrays; random numbers; underpins many other libraries. | `import numpy as np` | `np.mean([1, 2, 3])` |
| **pandas** | Tables (data frames); reading CSV/Excel; filtering, grouping, reshaping. | `import pandas as pd` | `pd.read_csv("data.csv").head()` |
| **Matplotlib** | Core plotting (line, bar, scatter, histograms). | `import matplotlib.pyplot as plt` | `plt.plot([1,2,3]); plt.show()` |
| **Seaborn** *(optional)* | Nicer statistical plots built on Matplotlib. | `import seaborn as sns` | `sns.barplot(x="day", y="value", data=df)` |
| **SciPy** | Scientific utilities (statistics, optimisation, signals). | `from scipy import stats` | `stats.ttest_ind(a, b)` |
| **statsmodels** | Classical statistical models (OLS/logistic/regression summaries). | `import statsmodels.api as sm` | `sm.OLS(y, X).fit().summary()` |
| **scikit-learn** | Machine learning (train/test split, models, metrics). | `from sklearn.model_selection import train_test_split` | `X_tr, X_te, y_tr, y_te = train_test_split(X, y)` |
| **Plotly** *(optional)* | Interactive plots you can hover/zoom/save. | `import plotly.express as px` | `px.scatter(df, x="x", y="y")` |


> In most FB2NEP notebooks we'll prioritise **NumPy**, **pandas**, and **Matplotlib**. We'll introduce others only when useful.

---

### Installing vs importing (very short reminder)
- **Importing** uses what's already installed: `import pandas as pd`.  
- **Installing** adds a package to your environment (usually once per environment):  

```python
# In a notebook, prefer %pip so it targets the current kernel
%pip install numpy pandas matplotlib
```

In Colab, many packages are already present, so installing is often unnecessary.
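To see *which version* of a package is installed in the current environment (useful when results differ between computers), a minimal sketch using the standard-library module `importlib.metadata` is shown below. The helper name `installed_versions` is hypothetical. Note that `version()` expects the name used to *install* a package (e.g. `scikit-learn`), which is not always the name used to *import* it (`sklearn`).

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Hypothetical helper: map each package name to its version, or None if missing."""
    versions = {}
    for package in packages:
        try:
            versions[package] = version(package)
        except PackageNotFoundError:
            versions[package] = None  # not installed: try %pip install <package>
    return versions

for name, ver in installed_versions(["numpy", "pandas", "matplotlib"]).items():
    print(f"{name}: {ver or 'not installed'}")
```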



### Try it

Use libraries to analyse food intake data.

```python
# Only run this install cell if something is missing.
# In Colab the packages are usually already installed, so this finishes quickly with little or no output.
%pip install numpy pandas matplotlib --quiet
```

```python
# Import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

In the next cell, we store the data in two **lists**. A list in Python is an ordered collection of items. In this case:
- the first list contains text (strings),
- the second contains numbers (integers).

Lists are written inside square brackets `[ ]`, and the items are separated by commas.
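A short sketch of how lists behave, using the same two lists as the data cell: items are accessed by their position (counting from 0), `len()` counts them, and `zip()` pairs the items that share a position. That pairing by position is also how the pandas table lines up each day with its portions.

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
fruit_portions = [2, 1, 3, 2, 4, 1, 3]

print(days[0])             # first item: Mon (Python counts from 0)
print(fruit_portions[-1])  # last item: 3
print(len(days))           # number of items: 7

# zip() pairs items that share the same position
for day, portions in zip(days, fruit_portions):
    print(day, portions)
```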

```python
# Tiny example data: fruit portions eaten each day over a week
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
fruit_portions = [2, 1, 3, 2, 4, 1, 3]
```

Now we can summarise the data: calculate the average number of fruit portions eaten per day and display the data as a table.

- **`average = np.mean(fruit_portions)`**  
  Uses **NumPy** to compute the **mean** (arithmetic average) of the numeric list `fruit_portions`.

- **`df = pd.DataFrame({"day": days, "fruit_portions": fruit_portions})`**  
  Creates a **pandas DataFrame** (a tidy table) with two columns.  
  Each position in `days` is paired with the value at the **same position** in `fruit_portions`.

- **`display(df)`**  
  Nicely renders the table **below the cell** in Jupyter/Colab.

- **`print(f"\nAverage fruit portions per day: {average:.2f}")`**  
  Prints a message using an **f-string**; `{average:.2f}` formats the number to **2 decimal places**.  
  `\n` adds a blank line before the message for readability.
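To show that these functions do nothing mysterious, the sketch below computes the same mean three ways (by hand, with NumPy, and with the pandas column method `.mean()`) and shows what the `:.2f` format does to the result.

```python
import numpy as np
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
fruit_portions = [2, 1, 3, 2, 4, 1, 3]

by_hand = sum(fruit_portions) / len(fruit_portions)  # 16 / 7
with_numpy = np.mean(fruit_portions)
df = pd.DataFrame({"day": days, "fruit_portions": fruit_portions})
with_pandas = df["fruit_portions"].mean()

print(by_hand, with_numpy, with_pandas)  # all three agree
print(f"{by_hand:.2f}")                  # rounded to 2 decimal places: 2.29
```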

```python
# Use numpy to calculate a simple statistic
average = np.mean(fruit_portions)

# Make a small, tidy table with pandas
df = pd.DataFrame({"day": days, "fruit_portions": fruit_portions})
display(df)

print(f"\nAverage fruit portions per day: {average:.2f}")
```

In the next cell, we create a figure:

- **`plt.figure()`**  
  Starts a new, empty **figure** (a plotting canvas). This ensures the plot is created cleanly rather than added to any previous figure.

- **`plt.bar(df["day"], df["fruit_portions"])`**  
  Draws a **bar chart**, using the column `"day"` for the labels on the x-axis and `"fruit_portions"` for the bar heights.

- **`plt.title("Fruit portions per day (example)")`**  
  Adds a **title** to the chart.

- **`plt.ylabel("Portions")`** and **`plt.xlabel("Day")`**  
  Label the y- and x-axes respectively.

- **`plt.show()`**  
  Displays the figure **below the cell**.  
  (In Jupyter notebooks, plots are not always shown automatically, so calling `plt.show()` makes sure they appear.)
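Matplotlib also has an *object-oriented* style in which you work with the figure (`fig`) and axes (`ax`) objects directly; many examples you find online use it. Below is a minimal sketch of the same chart in that style. The dashed line at the weekly average is an illustrative addition, not part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
fruit_portions = [2, 1, 3, 2, 4, 1, 3]
average = np.mean(fruit_portions)

fig, ax = plt.subplots()                      # one figure with one set of axes
ax.bar(days, fruit_portions)                  # one bar per day
ax.axhline(average, linestyle="--", color="grey",
           label=f"Average ({average:.2f})")  # horizontal reference line
ax.set_title("Fruit portions per day (example)")
ax.set_xlabel("Day")
ax.set_ylabel("Portions")
ax.legend()
plt.show()
```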


```python
# Quick bar chart with matplotlib
plt.figure()
plt.bar(df["day"], df["fruit_portions"])
plt.title("Fruit portions per day (example)")
plt.ylabel("Portions")
plt.xlabel("Day")
plt.show()
```


### Try it
- Change the numbers in `fruit_portions` (e.g. `[1,2,2,3,2,4,5]`) and re-run the cells that create the data, the table, and the chart.
- Add a new column (e.g. `veg_portions`) to the table.
- Rename the title and axis labels to match your changes.
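One way to add the `veg_portions` column is to assign a list with one value per row to a new column name. The vegetable values below are made up for illustration, and the `total_portions` column is an optional extra. If the list length does not match the number of rows, pandas raises a `ValueError`.

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
fruit_portions = [2, 1, 3, 2, 4, 1, 3]
df = pd.DataFrame({"day": days, "fruit_portions": fruit_portions})

# Hypothetical vegetable portions: exactly one value per day (7 values)
df["veg_portions"] = [3, 2, 2, 4, 1, 3, 2]

# A derived column: fruit plus vegetable portions for each day
df["total_portions"] = df["fruit_portions"] + df["veg_portions"]

print(df)
print(f"Average total portions per day: {df['total_portions'].mean():.2f}")
```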

---

<details>
<summary><strong>Installing vs importing (quick tips)</strong></summary>

- Use **`%pip install ...`** inside notebooks so the package installs into the **current kernel**.  
- If you install something but still get "module not found", **restart the runtime/kernel** and run the cells again from the top.  
- In **Colab**, many libraries are pre-installed; the install step is often unnecessary.
</details>

<details>
<summary><strong>Troubleshooting</strong></summary>

- **ImportError / ModuleNotFoundError:** run the `%pip install` cell, then restart **Runtime/Kernel** and re-run.  
- **Plots not showing:** ensure the plotting cell ends with `plt.show()`.  
- **Jumbled state:** run cells **top to bottom**; if things look odd, choose *Restart & run all*.
</details>


---
# Appendix and further reading

## Appendix: Local installation

A **local installation** means running Jupyter on **your own computer** (laptop/desktop) rather than in the cloud (e.g. Google Colab). You install Python and Jupyter yourself, and notebooks execute using your machine's CPU/GPU, files, and internet connection.

**Why use it?**
- Works **offline** and with **confidential data** stored locally.
- Full **control over packages**, versions, and performance.
- Easier to integrate with local tools (Git, VS Code, command line).

**Two common ways to install**
- **Anaconda/Miniconda (recommended for beginners):**  
  Creates isolated "environments" with specific package sets.  
  ```bash
  # Install Miniconda first, then:
  conda create -n fb2nep python=3.11 -y
  conda activate fb2nep
  conda install jupyterlab numpy pandas matplotlib -y
  jupyter lab
  ```
- **pip (lightweight):**  
  ```bash
  python -m venv fb2nep
  source fb2nep/bin/activate   # Windows: fb2nep\Scripts\activate
  pip install jupyterlab numpy pandas matplotlib
  jupyter lab
  ```

**Key concepts**
- **Environment:** a sandboxed Python setup (avoids package conflicts).
- **Kernel:** the runtime a notebook uses; each environment provides a kernel.
- **Working directory:** the folder where your notebook reads/writes files.
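A minimal sketch of the working directory in practice, using the standard-library `pathlib` module and pandas: a relative file name such as `fruit_example.csv` (a hypothetical file name) is written to, and read from, this folder.

```python
from pathlib import Path
import pandas as pd

print("Working directory:", Path.cwd())

# Save a tiny table using a *relative* path: it lands in the working directory
df = pd.DataFrame({"day": ["Mon", "Tue"], "fruit_portions": [2, 1]})
df.to_csv("fruit_example.csv", index=False)
print("Full path:", Path("fruit_example.csv").resolve())

# Reading with the same relative path finds the same file
df_back = pd.read_csv("fruit_example.csv")
print(df_back)
```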

**Common pitfalls**
- Mixing `pip` and `conda` in the same environment (can break dependencies).
- Installing a package but not seeing it in the notebook → ensure the **kernel** matches the environment where you installed it.
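To check which environment a notebook's kernel is really using, a minimal sketch with the standard-library `sys` module may help: `sys.executable` is the path of the running Python interpreter and `sys.prefix` is the folder of the environment it belongs to. If these do not point into your `fb2nep` environment, the notebook is running a different kernel from the one you installed packages into.

```python
import sys

print("Python interpreter:", sys.executable)
print("Environment folder:", sys.prefix)
print("Python version:", sys.version.split()[0])
```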

