# 01 – Introduction to Colab and Jupyter

FB2NEP – Nutritional Epidemiology and Public Health

In the epidemiology component of FB2NEP, we will use **Jupyter notebooks** (in particular **Google Colab**) as a practical environment to:

- run small examples,
- inspect and summarise datasets,
- produce simple plots,
- and introduce core ideas of reproducible and transparent analysis.

The purpose of this notebook is to give you a first overview of:

1. How the teaching notebooks are stored in GitHub and opened in **Google Colab**.
2. How to run and edit notebook cells.
3. A brief introduction to **Python** and basic programme structure.
4. How to import and use core libraries: **NumPy**, **pandas**, and **Matplotlib**.
5. How to load and explore a small **hippo dietary survey** dataset.
6. Why notebooks are helpful for **reproducibility and open science**.

You do not need prior programming experience. The examples are small, explained line by line, and you will use them mainly as tools to understand epidemiological ideas.

> **Setup cell**  
>
> Please run the cell below. It prepares the environment in Colab.  
> You may see messages or warnings (for example *“not authored by Google”*).  
> These are normal and can be ignored.  
> If Colab asks, choose **“Run anyway”**.  
>
> If you are interested in the details, you can find an explanation at the very end of this notebook.

In [None]:
# ============================================================
# FB2NEP bootstrap cell (works both locally and in Colab)
#
# What this cell does:
# - Ensures that we are inside the fb2nep-epi repository.
# - In Colab: clones the repository from GitHub if necessary.
# - Loads and runs scripts/bootstrap.py.
# - Makes the main dataset available as the variable `df`.
#
# Important:
# - You may see messages printed below (for example from pip
#   or from the bootstrap script). This is expected.
# - You may also see WARNINGS (often in yellow). In most cases
#   these are harmless and can be ignored for this module.
# - The main thing to watch for is a red error traceback
#   (for example FileNotFoundError, ModuleNotFoundError).
#   If that happens, please re-run this cell first. If the
#   error persists, ask for help.
# ============================================================

import os
import sys
import pathlib
import subprocess
import importlib.util

# ------------------------------------------------------------
# Configuration: repository location and URL
# ------------------------------------------------------------
# REPO_URL: address of the GitHub repository.
# REPO_DIR: folder name that will be created when cloning.
REPO_URL = "https://github.com/ggkuhnle/fb2nep-epi.git"
REPO_DIR = "fb2nep-epi"

# ------------------------------------------------------------
# 1. Ensure we are inside the fb2nep-epi repository
# ------------------------------------------------------------
# In local Jupyter, you may already be inside the repository,
# for example in fb2nep-epi/notebooks.
#
# In Colab, the default working directory is /content, so
# we need to clone the repository into /content/fb2nep-epi
# and then change into that folder.
cwd = pathlib.Path.cwd()

# Case A: we are already in the repository, i.e. scripts/bootstrap.py
# exists in the current folder or in one of its parent folders
# (for example when the notebook was started from fb2nep-epi/notebooks).
repo_root = next(
    (p for p in [cwd, *cwd.parents] if (p / "scripts" / "bootstrap.py").is_file()),
    None,
)

# Case B: we are outside the repository (for example in Colab)
if repo_root is None:
    repo_root = cwd / REPO_DIR

    # Clone the repository if it is not present yet
    if not repo_root.is_dir():
        print(f"Cloning repository from {REPO_URL} into {repo_root} ...")
        subprocess.run(["git", "clone", REPO_URL, str(repo_root)], check=True)
    else:
        print(f"Using existing repository at {repo_root}")

# Change the working directory to the repository root
os.chdir(repo_root)
repo_root = pathlib.Path.cwd()

print(f"Repository root set to: {repo_root}")

# ------------------------------------------------------------
# 2. Load scripts/bootstrap.py as a module and call init()
# ------------------------------------------------------------
# The shared bootstrap script contains all logic to:
# - Ensure that required Python packages are installed.
# - Ensure that the synthetic dataset exists (and generate it
#   if needed).
# - Load the dataset into a pandas DataFrame.
#
# We load the script as a normal Python module (fb2nep_bootstrap)
# and then call its init() function.
bootstrap_path = repo_root / "scripts" / "bootstrap.py"

if not bootstrap_path.is_file():
    raise FileNotFoundError(
        f"Could not find {bootstrap_path}. "
        "Please check that the fb2nep-epi repository structure is intact."
    )

# Create a module specification from the file
spec = importlib.util.spec_from_file_location("fb2nep_bootstrap", bootstrap_path)
bootstrap = importlib.util.module_from_spec(spec)
sys.modules["fb2nep_bootstrap"] = bootstrap

# Execute the bootstrap script in the context of this module
spec.loader.exec_module(bootstrap)

# The init() function is defined in scripts/bootstrap.py.
# It returns:
# - df   : the main synthetic cohort as a pandas DataFrame.
# - CTX  : a small context object with paths, flags and settings.
df, CTX = bootstrap.init()

# Optionally expose a few additional useful variables from the
# bootstrap module (if they exist). These are not essential for
# most analyses, but can be helpful for advanced use.
for name in ["CSV_REL", "REPO_NAME", "REPO_URL", "IN_COLAB"]:
    if hasattr(bootstrap, name):
        globals()[name] = getattr(bootstrap, name)

print("Bootstrap completed successfully.")
print("The main dataset is available as the variable `df`.")
print("The context object is available as `CTX`.")


## 1. Where are the notebooks and how do I open them in Colab?

All teaching notebooks for FB2NEP live in a **read-only GitHub repository**:

- GitHub repository:  
  https://github.com/ggkuhnle/fb2nep-epi
- Published site (easier browsing):  
  https://ggkuhnle.github.io/fb2nep-epi/

You will usually access notebooks via the **published site**. For each notebook there is a link or badge labelled something like **"Open in Colab"**.

Typical workflow during the module:

1. Go to the published site and navigate to the notebook for the week.
2. Click the **"Open in Colab"** link or badge.
3. Colab will open the notebook in your browser.
4. At the top of the notebook Colab may display a warning such as:
   > This notebook was not authored by Google.
   
   This is a standard message. In the context of this module it simply means that the notebook comes from the course repository, not from Google. You can safely choose **"Run anyway"** for FB2NEP notebooks.
5. Once the notebook is open in Colab, use **File → Save a copy in Drive** to create **your own copy**. All your edits will then be stored in your Google Drive.

The original notebooks in GitHub remain unchanged. You cannot accidentally damage them. You work in your own copy.

## 2. Notebook basics: cells and Markdown

A notebook consists of **cells** arranged from top to bottom.

- **Code cells** contain Python code and produce outputs such as numbers, tables, or plots.
- **Text cells** (Markdown cells) contain formatted text for headings, lists, and explanations.

To run a code cell:
1. Click inside the cell.
2. Press **Shift + Enter** (or click the small play button on the left in Colab).

The output will appear directly below the cell.

**Markdown** is a lightweight markup language that controls basic formatting (headings, lists, bold, italics). In this module you only need a very small subset, and you can look it up when needed. A concise reference is available at:

- https://www.markdownguide.org/basic-syntax/

In the rest of this notebook we will focus on code cells and data handling.

### 2.1 First code cell: a simple message

Run the cell below. It prints a short message and demonstrates the basic **code → output** pattern.

When you run a cell in **Colab** you may see one or both of the following:

- A yellow bar at the top saying something like  
  *“This notebook was not authored by Google”* with a button **Run anyway**.  
  For FB2NEP notebooks this is expected: the code comes from the module GitHub repository, not from Google. Choose **Run anyway** to continue.

- Extra text in the output area such as  
  *“Connecting to kernel…”* or *“Setting up notebook environment”*.  
  These are system messages from Colab, not errors. As long as the expected output (a short printed message) appears under the cell, the code has run successfully.


In [None]:
# Run this cell (Shift + Enter)
print("Hello, FB2NEP")

#### Try it

- Change the text inside the quotation marks and run the cell again.
- Add a second line, for example:

```python
print(2 + 3)
```

Run the cell again and observe that both lines of output appear under the cell.

## 3. A very brief introduction to Python

This section introduces three ideas that are useful throughout FB2NEP:

1. What a **Python programme** is and why **indentation** matters.
2. What **libraries** are and how to use them.
3. Very basic **programme structure**: a condition (`if`) and a loop (`for`).

### 3.1 Core Python data structures (lists, sets, dictionaries)

Before we look at programmes and libraries, it is useful to know the basic data structures that appear throughout the notebooks. These are the building blocks for more advanced tools such as pandas.

We will mainly use:

- **List**  
  An ordered collection of items. Lists can contain duplicates and can be changed.  
  Example: a list of hippo names:  
  `["Helga", "Bruno", "Ama"]`

- **Set**  
  An unordered collection of **unique** items. Sets automatically remove duplicates.  
  Example: the unique habitats in a list:  
  `{"River", "Lake", "Zoo"}`

- **Dictionary** (`dict`)  
  A collection of key–value pairs. Each key maps to a value.  
  Example: information about one hippo:  
  `{"name": "Helga", "age_years": 5, "habitat": "River"}`

These are standard Python structures. In practice we often start from **lists** and **dictionaries**, and then build a pandas **DataFrame** from them.


In [None]:
# Lists: ordered collections of items.

# A list of hippo names.
hippo_names = ["Helga", "Bruno", "Ama"]

# A list of ages, in the same order.
hippo_ages = [5, 12, 9]

print("Names:", hippo_names)
print("Ages: ", hippo_ages)

print("\nType of hippo_names:", type(hippo_names))

# Accessing elements by position (indexing starts at 0).
print("\nFirst hippo:", hippo_names[0])
print("Age of first hippo:", hippo_ages[0])

# Adding a new hippo to the list.
hippo_names.append("Jessica")
hippo_ages.append(3)

print("\nAfter appending a new hippo:")
print("Names:", hippo_names)
print("Ages: ", hippo_ages)


In [None]:
# Sets: collections of unique items.

# Suppose we have a list of habitats with duplicates.
habitats_list = ["River", "River", "Lake", "Zoo", "Lake", "River"]

print("Habitats list:", habitats_list)

# Convert to a set to obtain only unique habitats.
habitats_set = set(habitats_list)

print("Unique habitats (set):", habitats_set)
print("Type of habitats_set:", type(habitats_set))


In [None]:
# Dictionaries: key–value pairs.

# Information about one hippo.
hippo_info = {
    "name": "Helga",
    "age_years": 5,
    "habitat": "River"
}

print("Hippo info dictionary:", hippo_info)
print("Type:", type(hippo_info))

# Access values by key.
print("\nHippo name:", hippo_info["name"])
print("Hippo age:", hippo_info["age_years"])
print("Hippo habitat:", hippo_info["habitat"])

# A dictionary is also useful for look-up tables.
# For example, baseline grass intake (kg/day) by habitat.
baseline_grass = {
    "River": 55.0,
    "Lake": 45.0,
    "Zoo": 50.0
}

print("\nBaseline grass intake for river hippos:",
      baseline_grass["River"], "kg per day")
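
As a small sketch of the look-up idea, the loop below combines a list with a dictionary. The habitat names and baseline values are the ones used in the cell above; the list `observed_habitats` and the habitat `"Swamp"` are made up for illustration. The `.get()` method returns `None` for an unknown key, whereas square brackets would raise a `KeyError`.

```python
# Look-up table from the example above: baseline grass intake (kg/day).
baseline_grass = {"River": 55.0, "Lake": 45.0, "Zoo": 50.0}

# Hypothetical list of habitats; "Swamp" is deliberately not in the table.
observed_habitats = ["River", "Zoo", "Swamp"]

looked_up = []
for habitat in observed_habitats:
    # .get() returns None for unknown keys instead of raising KeyError.
    value = baseline_grass.get(habitat)
    looked_up.append(value)
    print(habitat, "→", value)
```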


These basic structures can also be used to build a pandas **DataFrame**, for example by combining:

- a list of `hippo_id` values,
- a list of `name` values,
- and a dictionary that maps column names to those lists.

The result is a table that is easier to analyse and plot.
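
A minimal sketch of that construction, assuming pandas is installed. The ID values are invented for illustration and the variable names are not used elsewhere in this notebook.

```python
import pandas as pd

# Two lists in the same order (illustrative values only).
hippo_ids = [1, 2, 3]
names = ["Helga", "Bruno", "Ama"]

# A dictionary that maps column names to the lists.
columns = {"hippo_id": hippo_ids, "name": names}

# pandas turns each key into a column and each list into that column's values.
small_table = pd.DataFrame(columns)

print(small_table)
print("Shape (rows, columns):", small_table.shape)
```

All lists must have the same length, because each position in the lists becomes one row of the table.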


### 3.2 What is a Python programme?

A Python programme is a sequence of **statements** that are executed from top to bottom. In a notebook, the programme is effectively the combination of all code cells that you run, in the order in which you run them.

Important points:

- Python runs statements in **line order**: within a cell, earlier lines run before later lines, unless a condition or loop changes the flow.
- Python uses **indentation** (spaces at the beginning of a line) to define structure. Indentation is not cosmetic formatting; it is part of the language.
- Lines starting with `#` are **comments** and are ignored by Python. They are for humans.

The small example below shows indentation and comments.

In [None]:
# A tiny example that uses indentation and a comment.

hippo_age = 12  # age in years

if hippo_age >= 10:
    # This line is indented and belongs to the 'if' block.
    print("This is an older hippo.")
else:
    # This line belongs to the 'else' block.
    print("This is a younger hippo.")

### 3.3 Conditions and loops (basic structure)

The two most common control structures are:

- **Condition**: `if condition: ... else: ...` to choose between two branches.
- **Loop**: `for item in collection: ...` to repeat an action for each element of a sequence.

The following example uses a `for` loop to look at several hippo ages.

In [None]:
# Example: loop over a list of hippo ages.

hippo_ages = [3, 7, 12]

for age in hippo_ages:
    if age >= 10:
        print("Age", age, "→ older hippo")
    else:
        print("Age", age, "→ younger hippo")
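
Indentation also decides which lines belong to the loop and which belong only to the condition. The sketch below is an illustrative variation of the example above: it collects messages in a list (a hypothetical name, `messages`) so that the order of execution is easy to see.

```python
hippo_ages = [3, 12]
messages = []

for age in hippo_ages:
    if age >= 10:
        # Indented twice: runs only when the condition is true.
        messages.append(f"{age}: older hippo")
    # Indented once: runs for every age, whatever the condition.
    messages.append(f"{age}: checked")

# Not indented: runs once, after the loop has finished.
messages.append("done")

for message in messages:
    print(message)
```

Moving a line one level to the left or right changes when it runs, even though the text of the line stays the same.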

### 3.4 Libraries and how to use them

The Python standard library covers many general-purpose tasks, but it does not include specialised data analysis tools. These live in third-party **libraries** that you import when you need them.

In FB2NEP we will mainly use three libraries:

| Library | Typical import | Main purpose |
|--------|-----------------|--------------|
| NumPy  | `import numpy as np` | Fast numerical operations and random numbers |
| pandas | `import pandas as pd` | Reading, cleaning, and summarising tabular data |
| Matplotlib | `import matplotlib.pyplot as plt` | Creating plots |

We will now import these libraries. In Colab they are already installed.

In [None]:
# Only run this cell if Colab reports that a library is missing.
# In the teaching environment this step is usually not necessary.
%pip install numpy pandas matplotlib --quiet

In [None]:
# Import the core libraries used in this module.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Print versions (useful for reproducibility and debugging).
import sys, matplotlib
print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)
print("pandas:", pd.__version__)
print("Matplotlib:", matplotlib.__version__)

### 3.5 Objects, attributes, and methods (very briefly)

Libraries such as pandas and Matplotlib are built around **objects**. Examples include:

- a pandas **DataFrame** (a table),
- a pandas **Series** (a single column),
- a Matplotlib **Figure** (a plot canvas).

Objects usually provide:

- **attributes** (properties that you access with a dot, for example `hippos.shape`),
- **methods** (functions that belong to the object and are called with parentheses, for example `hippos.describe()` or `hippos["age_years"].mean()`).

You will see this in practice when we work with the hippo dietary survey.
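
The difference between an attribute and a method can be sketched with a tiny table. The DataFrame `demo` and its values are invented for this example and are not part of the hippo survey.

```python
import pandas as pd

# A tiny illustrative table (values invented for this example).
demo = pd.DataFrame({"name": ["Helga", "Bruno", "Ama"], "age_years": [5, 12, 9]})

# Attribute: no parentheses, it simply reports a property of the object.
print("Shape:", demo.shape)

# Method: parentheses, it performs a computation and returns a result.
mean_age = demo["age_years"].mean()
print("Mean age:", mean_age)
```

Here the mean is computed on a single numeric column. Calling `.mean()` on a whole table that also contains text columns may fail in recent pandas versions.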

## 4. Hippo dietary survey: loading a small dataset

For this module we will use a small **hippo dietary survey** as a toy example. In the teaching repository it is stored as a CSV file at:

- `data/hippo_diet_survey.csv`

Each row represents one hippo. Columns include for example:

- `hippo_id` – unique identifier,  
- `name` – hippo name,  
- `age_years` – age in years,  
- `habitat` – for example `River`, `Lake`, `Zoo`,  
- `fruit_portions` – fruit portions per day,  
- `veg_portions` – vegetable portions per day,  
- `grass_kg` – kilograms of grass consumed per day.

This is a typical situation in applied work: **someone has prepared a dataset**, and your task is to load it, inspect it, and analyse it.

In this section we will use the hippo survey to illustrate:

- loading data from CSV,  
- basic inspection (rows, columns, types),  
- looking at individual variables,  
- using object methods such as `.mean()` and `.groupby()`,  
- creating a simple plot.


### 4.1 Loading the hippo dataset from CSV

In a typical workflow the data file already exists and you only need to **read** it. The code below reads `data/hippo_diet_survey.csv` into a pandas **DataFrame** called `hippos` and shows the first few rows.

In [None]:
# Load the hippo dietary survey from the CSV file.
# This assumes you are running the notebook from the root of the repository,
# with the data file present in the "data" subfolder.

import os

hippo_path = "data/hippo_diet_survey.csv"

if not os.path.exists(hippo_path):
    raise FileNotFoundError(
        f"Could not find {hippo_path}. "
        "Please check that you are running this notebook in the FB2NEP repository "
        "and that the data file is present."
    )

hippos = pd.read_csv(hippo_path)
hippos.head()
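
To see what `pd.read_csv` does, it can help to read a few lines of CSV text directly. The sketch below uses `io.StringIO` from the standard library so that no file is needed. The two rows are invented and are not taken from the real survey file.

```python
import io

import pandas as pd

# CSV text as it might appear inside a file (invented values).
csv_text = """hippo_id,name,habitat,grass_kg
1,Helga,River,54.5
2,Bruno,Zoo,49.0
"""

# io.StringIO lets pandas read the text as if it were a file.
example = pd.read_csv(io.StringIO(csv_text))

print(example)
print(example.dtypes)
```

The first line of the text becomes the column names, each further line becomes one row, and pandas infers a data type for every column.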

### 4.2 Inspecting the data and variables

The `hippos` object is a pandas **DataFrame**. It has attributes and methods that help you to understand the structure of the data.

Commonly used methods and attributes include:

- `hippos.shape` – attribute with number of rows and columns.
- `hippos.columns` – attribute with column names.
- `hippos.info()` – method that prints the data type and the number of non-missing values for each column.
- `hippos.describe()` – method with summary statistics for numeric columns.

Run the cell below and examine the output carefully.

In [None]:
# Inspect the structure of the hippo dataset.

print("Shape (rows, columns):", hippos.shape)

print("\nColumn names:")
print(hippos.columns.tolist())

print("\nBasic information:")
hippos.info()

print("\nSummary statistics for numeric columns:")
hippos.describe()

### 4.3 Using methods to summarise the hippo data

Many operations are available as **methods**. For example:

- `hippos["fruit_portions"].mean()` computes the mean of the `fruit_portions` column.
- `hippos.groupby("habitat")["grass_kg"].mean()` computes the mean grass intake per habitat.

These methods are part of the **object-oriented** design of pandas: the DataFrame and Series objects provide the relevant functionality via the dot notation.

In [None]:
# Mean fruit portions per day (all hippos).
mean_fruit = hippos["fruit_portions"].mean()
print(f"Mean fruit portions per day (all hippos): {mean_fruit:.2f}")

# Mean grass intake per habitat.
mean_grass_by_habitat = hippos.groupby("habitat")["grass_kg"].mean()
print("\nMean grass intake (kg per day) by habitat:")
print(mean_grass_by_habitat)

# Example of a simple condition on the DataFrame: hippos older than 10 years.
older_hippos = hippos[hippos["age_years"] > 10]
print("\nNumber of hippos older than 10 years:", len(older_hippos))
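
When several summaries per group are needed, `.agg()` accepts a list of function names. The following is a sketch on a toy table with invented values, not on the survey data.

```python
import pandas as pd

# Toy data: four hippos in three habitats (values invented).
toy = pd.DataFrame({
    "habitat": ["River", "River", "Lake", "Zoo"],
    "grass_kg": [50.0, 60.0, 45.0, 52.0],
})

# One row per habitat, one column per summary.
summary = toy.groupby("habitat")["grass_kg"].agg(["count", "mean"])
print(summary)
```

Reporting the count next to the mean is a useful habit, because a mean based on one or two animals deserves less weight than a mean based on many.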

### 4.4 Plotting the hippo data

We can now create a simple plot using **Matplotlib**. A common pattern is:

1. Prepare a summary table in pandas.
2. Pass the summary values to Matplotlib.

Below we create a bar chart of **mean grass intake by habitat**.

In [None]:
# Prepare the summary again (for clarity).
mean_grass_by_habitat = hippos.groupby("habitat")["grass_kg"].mean()

# Create a bar chart.
plt.figure()
plt.bar(mean_grass_by_habitat.index, mean_grass_by_habitat.values)
plt.xlabel("Habitat")
plt.ylabel("Mean grass intake (kg per day)")
plt.title("Hippo grass intake by habitat")
plt.xticks(rotation=15)
plt.tight_layout()
plt.show()

#### Try it

Using the `hippos` DataFrame:

1. Compute the mean fruit portions per habitat using `groupby` and `mean`.
2. Create a bar chart for mean fruit portions by habitat.
3. Change the title and axis labels of the plot so that they describe your new chart.

Optional:
- Check for missing values using `hippos.isna().sum()`.
- Select only hippos from one habitat, for example river hippos:  
  `river_hippos = hippos[hippos["habitat"] == "River"]`.
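
The two optional steps can be tried first on a toy table, where the result is easy to verify by eye. This is a sketch with invented values: `toy` is a hypothetical DataFrame with one missing entry.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "habitat": ["River", "Lake", "River"],
    "fruit_portions": [2.0, np.nan, 4.0],
})

# isna() marks missing cells as True; sum() counts the True values per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# A comparison gives True/False per row; indexing keeps only the True rows.
river_only = toy[toy["habitat"] == "River"]
print(river_only)

# mean() skips missing values by default.
print("Mean fruit portions:", toy["fruit_portions"].mean())
```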

## 5. Reproducibility and open-science principles

One key reason to use notebooks and version-controlled repositories (GitHub) in nutritional epidemiology is **reproducibility**.

In a reproducible analysis:

- The path from data to results is visible.
- Another researcher (or your future self) can rerun the analysis and obtain the same numbers and plots.
- Important choices (for example exclusion criteria, variable definitions) are documented in text near the code.

Notebooks help with this because they combine:

- code (what you did),
- outputs (what you obtained),
- and explanations (why you did it).

The teaching notebooks for FB2NEP are stored in a **Git repository** on GitHub. Git records the history of changes over time. In later parts of your degree you may use Git directly for your own projects.

For now, a few simple good practices are sufficient:

- Keep notebooks and data in a consistent folder structure.
- Use clear, descriptive variable names in code.
- Record decisions in short Markdown notes.
- Fix a random seed (`np.random.seed(...)`) when you use random numbers, so that results are repeatable.
- When possible, use open formats such as CSV for data and share both data (if appropriate) and analysis code.

In [None]:
# Small demonstration of a fixed random seed.
# Run this cell several times and check that the numbers stay the same.

np.random.seed(11088)
values = np.random.normal(loc=0, scale=1, size=5)
print("Random values:", values)
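
The effect of the seed can also be checked in code rather than by eye. The sketch below resets the seed and compares two draws. It also shows `np.random.default_rng`, the newer NumPy interface, in which the seed belongs to a generator object instead of a global setting. The variable names are illustrative.

```python
import numpy as np

# Legacy interface: set the global seed, draw, reset the seed, draw again.
np.random.seed(11088)
first = np.random.normal(loc=0, scale=1, size=3)

np.random.seed(11088)
second = np.random.normal(loc=0, scale=1, size=3)

print("Identical after re-seeding:", np.array_equal(first, second))

# Newer interface: each generator carries its own seed.
draws_a = np.random.default_rng(11088).normal(size=3)
draws_b = np.random.default_rng(11088).normal(size=3)

print("Identical generators:", np.array_equal(draws_a, draws_b))
```

The two interfaces use different algorithms, so the same seed gives different numbers in each of them. What matters for reproducibility is that the choice is stated and kept fixed.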

## 6. Recap

In this introductory notebook you have:

- seen how the FB2NEP notebooks live in a GitHub repository and are opened in **Google Colab**,
- run and edited simple Python code cells,
- learned that indentation and line order matter for Python programmes,
- imported and briefly used the core libraries **NumPy**, **pandas**, and **Matplotlib**,
- loaded a small **hippo dietary survey** dataset from a CSV file,
- inspected the dataset with methods such as `.head()`, `.info()`, `.describe()`,
- used methods such as `.mean()` and `.groupby()` to summarise variables,
- produced a simple bar chart from summarised data,
- and discussed how notebooks and Git support **reproducible and transparent** analyses.

These elements will recur throughout the FB2NEP epidemiology materials. The aim is that the tools become familiar so that you can concentrate on the underlying nutritional and epidemiological questions.

---



## Appendix: running notebooks locally

If you prefer, you can also run the notebooks on your own computer instead of Colab. This requires a local installation of Python and Jupyter.

Two common approaches are:

1. **Conda / Miniconda** (recommended for beginners):

   ```bash
   conda create -n fb2nep python=3.11 -y
   conda activate fb2nep
   conda install jupyterlab numpy pandas matplotlib -y
   jupyter lab
   ```

2. **`venv` and `pip`**:

   ```bash
   python -m venv fb2nep
   # macOS / Linux
   source fb2nep/bin/activate
   # Windows (PowerShell)
   fb2nep\Scripts\activate
   pip install jupyterlab numpy pandas matplotlib
   jupyter lab
   ```

Key concepts:

- **Environment**: an isolated Python installation with its own set of packages.
- **Kernel**: the Python process that executes the code of a notebook.
- **Working directory**: the folder from which the notebook reads and writes files.
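
These three concepts can be inspected from inside a running notebook. The sketch below uses only the standard library; the dictionary name `session_info` is illustrative.

```python
import os
import sys

session_info = {
    # Path of the Python interpreter that the kernel is running.
    "python_executable": sys.executable,
    # Version of that interpreter.
    "python_version": sys.version.split()[0],
    # Folder against which relative paths such as "data/..." are resolved.
    "working_directory": os.getcwd(),
}

for key, value in session_info.items():
    print(f"{key}: {value}")
```

If a file cannot be found, comparing the working directory with the expected repository folder is usually the quickest check.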


## Appendix: What the setup / bootstrap cell does

At the top of this notebook you were asked to run a **setup (bootstrap) cell** and to ignore the details for now.  
This appendix explains what that cell and the underlying `scripts/bootstrap.py` file actually do.

### 1. The setup cell in the notebook

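The code below is a simplified sketch of that cell. The actual cell at the top of this notebook loads `scripts/bootstrap.py` with `importlib` and calls `init()`, whereas this shorter version runs the script with `runpy`; the steps are otherwise similar: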

```python
import os
import sys
import runpy
import pathlib
import subprocess

REPO_URL = "https://github.com/ggkuhnle/fb2nep-epi.git"
REPO_NAME = "fb2nep-epi"

# 1. If we are in Colab and scripts/bootstrap.py is not present,
#    clone the repository and change into it.
if "google.colab" in sys.modules and not pathlib.Path("scripts/bootstrap.py").exists():
    root = pathlib.Path("/content")
    repo_dir = root / REPO_NAME

    if not repo_dir.exists():
        print(f"Cloning {REPO_URL} …")
        subprocess.run(["git", "clone", REPO_URL, str(repo_dir)], check=True)

    os.chdir(repo_dir)
    print("Changed working directory to:", os.getcwd())

# 2. Now try to locate and run scripts/bootstrap.py
for p in ["scripts/bootstrap.py", "../scripts/bootstrap.py", "../../scripts/bootstrap.py"]:
    if pathlib.Path(p).exists():
        print(f"Bootstrapping via: {p}")
        runpy.run_path(p)
        break
else:
    print("⚠️ scripts/bootstrap.py not found – "
          "please check that the FB2NEP repository is available.")
```

This does two main things:

1. **If running in Google Colab and the repository is not present**, it clones the `fb2nep-epi` repository from GitHub into `/content/fb2nep-epi` and changes the working directory to that folder.

2. It then looks for `scripts/bootstrap.py` in the current directory and up to two levels above it, and **runs** the first copy it finds.

---

### 2. Goals of `scripts/bootstrap.py`

`bootstrap.py` is designed to solve three practical problems:

1. **Finding the repository root**  
   In Colab the working directory is often `/content`.  
   The script ensures the notebook ends up in the FB2NEP repository (the folder containing `scripts/` and `notebooks/`).

2. **Ensuring that required Python packages are available**  
   It checks whether libraries like `numpy`, `pandas`, `matplotlib`, and `statsmodels` can be imported.  
   - In **Colab**, if they are missing, it installs them.  
   - On a **local machine**, it prints a warning rather than installing anything automatically.

3. **Ensuring that the main teaching dataset exists**  
   It checks whether the primary synthetic FB2NEP dataset (e.g. `data/synthetic/fb2nep.csv`) is present.  
   - If missing, it runs the generator script (e.g. `scripts/generate_dataset.py`).  
   - In Colab it can also prompt for manual upload.

These steps are handled by the helper functions `ensure_repo_root`, `ensure_deps`, and `ensure_data`.

---

### 3. Details of the main functions

**(a) `ensure_repo_root()`**

- Looks for a directory that contains both `scripts/` and `notebooks/`.  
- Moves up one directory if the notebook was opened from within `notebooks/`.  
- In Colab, clones the repository if it is not present.
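
The search for the repository root can be sketched as follows. This is not the code from `bootstrap.py`: the function name `find_repo_root` is hypothetical, and the sketch walks up through all parent folders rather than only one level. The demonstration uses a temporary folder so that it runs anywhere.

```python
import pathlib
import tempfile


def find_repo_root(start):
    """Return the first folder at or above `start` that contains
    both scripts/ and notebooks/, or None if there is none."""
    start = pathlib.Path(start).resolve()
    for folder in [start, *start.parents]:
        if (folder / "scripts").is_dir() and (folder / "notebooks").is_dir():
            return folder
    return None


# Demonstration with a throwaway folder structure.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp).resolve()
    (root / "scripts").mkdir()
    (root / "notebooks").mkdir()

    # Start the search from inside notebooks/, as a notebook would.
    found = find_repo_root(root / "notebooks")
    found_matches = found == root
    print("Found repository root:", found_matches)
```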

---

**(b) `ensure_deps()`**

- Attempts to import `numpy`, `pandas`, `matplotlib`, and `statsmodels`.  
- In Colab, installs missing dependencies.  
- Locally, prints a warning but does not modify the environment.
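
A dependency check of this kind can be sketched with the standard library. The function `find_missing_packages` is a hypothetical helper, not the one in `bootstrap.py`. It uses `importlib.util.find_spec`, which reports whether a package could be imported without actually importing it, and it only warns, mirroring the local behaviour described above.

```python
import importlib.util


def find_missing_packages(package_names):
    """Return the names from `package_names` that cannot be imported."""
    missing = []
    for name in package_names:
        # find_spec returns None when no top-level package of this name is installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


required = ["numpy", "pandas", "matplotlib", "statsmodels"]
missing = find_missing_packages(required)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages can be imported.")
```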

---

**(c) `ensure_data(csv_rel, gen_script)`**

- Checks whether the dataset exists.  
- Attempts to generate it via the relevant script if necessary.  
- In Colab, as a final fallback, requests manual upload.
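
The "use the file if present, otherwise generate it" pattern can be sketched as below. The names `ensure_file` and `write_placeholder` are hypothetical, and the generator here is a plain Python function standing in for a real generator script. The manual-upload fallback is not covered.

```python
import pathlib
import tempfile


def ensure_file(path, generate):
    """Return "found" if `path` exists; otherwise call `generate(path)`
    and return "generated". Raise an error if the file is still missing."""
    path = pathlib.Path(path)
    if path.is_file():
        return "found"
    path.parent.mkdir(parents=True, exist_ok=True)
    generate(path)
    if not path.is_file():
        raise FileNotFoundError(f"Could not create {path}")
    return "generated"


def write_placeholder(path):
    # Stand-in for a real generator script (invented content).
    path.write_text("id,value\n1,10\n")


# Demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "data" / "example.csv"
    first_call = ensure_file(target, write_placeholder)
    second_call = ensure_file(target, write_placeholder)
    print(first_call, second_call)
```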

---

### 4. The `init()` function

The `init()` function in `bootstrap.py`:

1. Calls the helper functions (`ensure_repo_root`, `ensure_deps`, `ensure_data`).  
2. Loads the primary dataset into a DataFrame `df`.  
3. Creates a small context object `CTX`.  
4. Injects helper variables (`df`, `CTX`, `CSV_REL`, `REPO_ROOT`, `IN_COLAB`) into the notebook environment.

Example use:

```python
from scripts.bootstrap import init
df, CTX = init()
df.head()
```

---

### 5. Why this is hidden at the top

The purpose of the setup cell is to ensure that:

- the notebook runs in the correct **repository directory**,  
- required Python libraries are available,  
- required datasets **exist and can be loaded**.

Once this is done, the rest of the notebook can focus on **epidemiology and analysis**, not technical setup.
