# Introduction to Colab & Python (for FB2NEP)

Welcome! This notebook provides a brief introduction to **Colab** and **Python**. You can use it for practice and for reference. Like everything else in this module, it assumes **no prior Python experience**. You will learn how to run cells, accept Colab warnings, and make tiny edits to code.

> **Tip:** If you see a message like *“This notebook was not authored by Google”*, click **Run anyway**. Our teaching notebooks are plain text and safe to run in Colab’s sandbox.

## 0. What does Python *code* look like?

<details>
<summary><strong>What is Python?</strong></summary>

**Python** is a popular, general-purpose programming language known for being **readable**, **beginner-friendly**, and widely used in **science, data analysis, and education**.

**Key points**
- **Easy to read:** clear, English-like syntax; indentation (spacing) shows structure.
- **Interpreted:** you run code line-by-line (great for notebooks and exploration).
- **“Batteries included”:** comes with many built-in tools; huge ecosystem of libraries.
- **Cross-platform:** works on Windows, macOS, and Linux; also runs in the cloud (Colab).

**Why we use it in FB2NEP**
- Excellent libraries for data work: **NumPy** (numbers), **pandas** (tables), **Matplotlib** (plots).
- Integrates smoothly with **Jupyter/Colab**, so you can mix **code + text + results**.
- Perfect for **transparent, reproducible** analysis.

**Do I need to be a programmer?**
No. You’ll use short, explained snippets to explore concepts. You’re learning **with** code, not learning to code per se.

**Which version?**
We use **Python 3** (the modern standard). Colab already provides it; local users can install it via **Miniconda/Anaconda** or `venv`.

*Tiny example:*
```python
print("Hello from Python!")
```
</details>

Python code is a set of **instructions** written line by line.  
Anything after a `#` on a line is a **comment**: it’s for humans to read and **Python ignores it**.

Example:

```python
# This is a comment: it explains the next line
x = 2 + 3            # ← comment at the end of a line
print(x)             # prints 5 on the screen
```

You **run** code by clicking the little ▶ button to the left of a cell, or by pressing **Shift + Enter**.

> If you see a warning like *“This notebook was not authored by Google”*, click **Run anyway**.  
> Colab runs code safely in the cloud and won’t change anything on your computer.


### Python has *commands* and *data*

When you run a cell, Python executes **commands** (instructions) that act on **data** (values).

- **Commands**: things like `print(...)`, `len(...)`, `sum(...)`, or your own functions.  
- **Data**: numbers, text, lists, tables, etc., usually stored in **variables** (names you choose).

Tiny example:
```python
name = "Fiona"       # data (text)
n = 3                # data (number)
print("Hello", name) # command acting on data
print("n + 2 =", n + 2)
```


### What do we mean by “data”?

**Data** are the values your code works with. Common kinds you’ll see here:

- **Numbers**: `3`, `2.5` (counts, measurements)
- **Text (strings)**: `"apple"`, `"F"` (labels, categories)
- **Booleans**: `True` / `False` (yes/no flags)
- **Lists**: `["Mon","Tue","Wed"]`, `[2, 1, 3]` (ordered collections)
- **Dictionaries**: `{"age": 10, "height_cm": 140}` (named pieces of data)
- **Tables (pandas DataFrame)**: spreadsheet-like rows & columns for analysis

Mini examples:
```python
age = 10                           # number
fruit = ["apple", "banana"]        # list of text
heights = {"Alex":150, "Sam":146}  # dictionary
```

In FB2NEP we mostly use numbers, text labels, and tables (pandas DataFrames) because they map naturally onto real datasets (participants × variables). You will also encounter missing values (shown as `NaN`), which simply means “no recorded value”.
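
As a small illustration of the kinds of data listed above, the sketch below asks Python for the type of a few values using the built-in `type()` and shows how a missing value can be detected. The variable names are made up for this example and are not part of the module data.

```python
import math

# A few values of different kinds (names are illustrative only)
portions = 3                    # whole number (int)
height_m = 1.52                 # decimal number (float)
label = "apple"                 # text (str)
is_vegetarian = True            # Boolean (bool)
missing = float("nan")          # NaN: a placeholder for "no recorded value"

# type() tells you what kind of data a value is
for value in [portions, height_m, label, is_vegetarian]:
    print(value, "is of type", type(value).__name__)

# NaN is special: use math.isnan() to test for it
print("Is the value missing?", math.isnan(missing))
```

In tables, pandas offers its own check for missing values (`isna()`), which appears in the quick checks near the end of this notebook.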

## 1. Running a cell

Click the ▶ icon to the left of a cell, or press **Shift + Enter**.

Try it now: run the cell below.


In [None]:
# ----------------------------------------------
# Adding two numbers
# This cell has comments that explain each step.
# Lines starting with '#' are comments and do not run.
# ----------------------------------------------
# 🦛 Friendly hippo says: Maths can be fun!
2 + 2    # add two numbers


Now **edit** the code above (e.g. change it to `2 + 3`) and run it again. You should see the new result immediately.


## 2. What Python are we using?

In [None]:
# ----------------------------------------------
# Show Python version and executable
# ----------------------------------------------
import sys   # import = bring tools into Python
print("Python executable:", sys.executable)      # where Python lives in Colab
print("Python version:", sys.version.split()[0])  # which Python version


### What are “libraries” (a.k.a. packages/modules)?

**Libraries** are add-on toolboxes for Python. They provide ready-made functions so you don’t have to write everything from scratch.

- **Terminology** (you’ll see these used loosely):
  - **Library / package**: the installable toolbox (e.g. *pandas*, *NumPy*).
  - **Module**: a file inside a package (e.g. `matplotlib.pyplot`).
- **Install vs import**:
  - **Install** (usually once per environment): makes the library available to Python.
  - **Import** (every time you run a notebook): brings the library *into* your session.
- **Where do they “live”?**  
  In your current **environment/kernel** (Colab’s runtime or your local conda/venv). If you install into one environment but run a notebook with a different kernel, imports will fail.
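
To make the difference between *install* and *import* concrete, here is a minimal sketch that checks whether a library can be found in the current environment before you try to import it. The helper name `is_installed` is an assumption made for this example; it relies only on `importlib.util.find_spec` from Python's standard library.

```python
import importlib.util

def is_installed(library_name):
    """Return True if the library can be found in the current environment."""
    # find_spec() returns None when a top-level package is not installed
    return importlib.util.find_spec(library_name) is not None

for lib in ["numpy", "pandas", "matplotlib"]:
    if is_installed(lib):
        print(lib, "is available: 'import' will work")
    else:
        print(lib, "is missing: install it first, e.g. with %pip install", lib)
```

If a library is reported as missing even though you installed it, the notebook is probably running in a different environment from the one you installed into.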

**Common libraries in FB2NEP**
- **NumPy** (`import numpy as np`): fast maths and arrays.
- **pandas** (`import pandas as pd`): tables (DataFrames), reading CSV/Excel, tidying.
- **Matplotlib** (`import matplotlib.pyplot as plt`): plots and figures.

**Typical workflow**
1) *(If needed)* install  
   ```python
   # In notebooks, prefer %pip so it targets the current kernel
   %pip -q install numpy pandas matplotlib
   ```
2) Import
   ```python
   import numpy as np
   import pandas as pd
   import matplotlib.pyplot as plt
   ```
3) Use
   ```python
   arr = np.array([1, 2, 3])
   df = pd.DataFrame({"x": [1, 2, 3]})
   plt.plot(df["x"]); plt.show()
   ```

> Colab note: many libraries are already installed. If you install new ones, go to **Runtime → Restart runtime** and re-run the notebook from the top.

Example: how to use libraries.

In [None]:
# Quick library demo (imports + versions)
import numpy as np, pandas as pd, matplotlib
print("NumPy:", np.__version__)
print("pandas:", pd.__version__)
print("Matplotlib:", matplotlib.__version__)


## 3. Variables: storing and using information

A **variable** is a named container that holds a piece of information, such as text, numbers, or data.  
You can give variables meaningful names (e.g. `name`, `year`) and then use them in commands.

In this example:
- **`name = "Jessica"`** assigns a piece of text (called a *string*) to the variable `name`.
- **`year = datetime.now().year`** automatically retrieves the *current year* from your computer or Colab environment.
- **`print("Hello", name, "- welcome to FB2NEP", year)`** combines both variables in one output line.

Variables make your code flexible: if you change `name` and re-run the cell, the message changes with it.

In [None]:
# ----------------------------------------------
# Say hello with variables
# ----------------------------------------------
from datetime import datetime   # import the datetime class to get the current date/time
name = "Jessica"            # a text value (a 'string')
year = datetime.now().year  # a number
print("Hello", name, "- welcome to FB2NEP", year)   # show something on the screen
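
The short sketch below illustrates what changing a variable does. It uses an f-string (text with `{...}` placeholders), which is just one of several ways to build a message; the variable `greeting` is introduced only for this example.

```python
# Store a value and build a message from it
name = "Jessica"
greeting = f"Hello {name}, welcome to FB2NEP"
print(greeting)

# Store a new value under the same name, then build the message again
name = "Fiona"
greeting = f"Hello {name}, welcome to FB2NEP"
print(greeting)
```

Note that the message is rebuilt only when the line that creates it runs again, which is why you re-run a cell after editing it.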


## 4. Working with small collections of data: lists and simple calculations

In Python, a **list** is an *ordered collection* of values. It can hold numbers, text, or a mixture of both, and is written inside square brackets `[ ]`, with items separated by commas.

In this example, we have a small list of numbers representing, for instance, the **number of fruit portions eaten by four participants**.  
We then use a few **built-in functions** to calculate basic descriptive statistics:

- **`sum(nums)`**: adds up all numbers in the list.
- **`len(nums)`**: counts how many items are in the list (its *length*).
- **`total / n`**: divides the total by the number of items to obtain the **mean** (average).
- **`print(...)`**: displays the results clearly below the cell.

You can change the numbers, re-run the cell, and observe how the total and mean change accordingly.

In [None]:
# ----------------------------------------------
# List of numbers: total and mean
# ----------------------------------------------
nums = [3, 5, 8, 12]  # a list: an ordered box of values
total = sum(nums)     # sum() adds up numbers
n = len(nums)         # length of list = number of observations
mean = total / n      # average = total divided by how many
print("Numbers:", nums)
print("Total:", total, "Observations:", n, "Mean:", mean)
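
As a cross-check on the hand-calculated mean, Python's standard library also has a `statistics` module. The sketch below is an optional alternative, not the method used in the module notebooks; it reuses the same four numbers.

```python
import statistics

nums = [3, 5, 8, 12]

print("Mean:", statistics.mean(nums))      # same result as sum(nums) / len(nums)
print("Median:", statistics.median(nums))  # middle value (here: halfway between 5 and 8)
print("Smallest:", min(nums), "Largest:", max(nums))
```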


## 5. First steps with data

**NumPy** is the library Python uses for **numbers and maths with arrays**.  
An array is like a list, but stored very efficiently, so you can do big calculations quickly.  

For example, you can add two arrays together in one go, instead of looping through numbers.  
Pandas actually uses NumPy *under the hood* to store and crunch the numbers inside its tables.

**Pandas** is a Python library that makes it easy to handle **tables of data** (like spreadsheets).  
The main building block is the **DataFrame**, which is like a whole sheet with rows and columns.  
You can put numbers or text inside, and then quickly look at, sort, filter, or calculate things.  

Think of pandas as your **spreadsheet inside Python**, but much more powerful and flexible.

In [None]:
# ----------------------------------------------
# Import NumPy and pandas, then try a NumPy array
# ----------------------------------------------
import numpy as np       # import = bring tools into Python
import pandas as pd

# 🦛 Hippo hint: NumPy = numbers, pandas = tables

# --- NumPy examples ---
# Make a small NumPy array (like a list, but faster for maths)
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)

# Do maths with the whole array at once
print("Array + 10:", arr + 10)     # add 10 to each element
print("Array squared:", arr ** 2)  # square every element
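
To show what "adding two arrays in one go, instead of looping" means in practice, the following sketch does the same addition twice: once with a loop over plain lists and once with NumPy arrays. The lists `a` and `b` are made-up values.

```python
import numpy as np

a = [1, 2, 3]
b = [10, 20, 30]

# With plain lists, you add element by element in a loop
loop_result = []
for x, y in zip(a, b):
    loop_result.append(x + y)
print("Loop result: ", loop_result)

# With NumPy arrays, one '+' adds all matching elements at once
array_result = np.array(a) + np.array(b)
print("Array result:", array_result)

# Careful: '+' on plain lists joins them instead of adding
print("List + list: ", a + b)
```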


In [None]:
# --- pandas example ---
# Create a tiny DataFrame (like a mini spreadsheet)
df = pd.DataFrame({
  "age": [8, 9, 10, 11, 12],
  "height_cm": [130, 135, 140, 145, 150]
})

# Show the DataFrame
df


Change a number in the table creation above (e.g. one of the heights) and run again. Do you see the change?
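
The text above says a DataFrame lets you look at, sort, filter, and calculate things. A minimal sketch of each, using a copy of the same tiny table (named `df_demo` here so it does not overwrite `df`):

```python
import pandas as pd

# Same tiny table as above, rebuilt here so the example runs on its own
df_demo = pd.DataFrame({
    "age": [8, 9, 10, 11, 12],
    "height_cm": [130, 135, 140, 145, 150],
})

print(df_demo.head(2))                                    # look: first two rows
print(df_demo.sort_values("height_cm", ascending=False))  # sort: tallest first

older = df_demo[df_demo["age"] >= 10]                     # filter: rows where age is 10 or more
print(older)

print("Mean height:", df_demo["height_cm"].mean())        # calculate: average of one column
```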


## 6. A first plot

**Matplotlib** is Python’s main tool for making graphs and pictures.  
It lets you draw lines, bars, scatter plots, histograms, and much more.  

Think of it as the **drawing toolbox**:  
- You tell it what data to use (x and y values).  
- You can add labels, titles, and colours.  
- When you’re ready, `plt.show()` displays the finished picture.  

*(pandas can also plot directly, but under the hood it still uses Matplotlib.)*


In [None]:
# ----------------------------------------------
# Line plot of age vs height
# ----------------------------------------------
import matplotlib.pyplot as plt   # plotting library

# 🦛 Friendly hippo says: A picture tells a story!
plt.plot(df["age"], df["height_cm"], marker="o")   # draw a line with dots
plt.xlabel("Age (years)")   # label the x-axis
plt.ylabel("Height (cm)")   # label the y-axis
plt.title("Example plot")   # add a title
plt.show()                   # display the picture
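
For comparison, here is a sketch of the same figure drawn through pandas' own `plot` method, which hands the drawing over to Matplotlib. The table is rebuilt as `df_demo` so the example runs on its own; the title text is made up.

```python
import pandas as pd
import matplotlib.pyplot as plt

df_demo = pd.DataFrame({
    "age": [8, 9, 10, 11, 12],
    "height_cm": [130, 135, 140, 145, 150],
})

# DataFrame.plot() draws with Matplotlib and returns the axes it drew on
ax = df_demo.plot(x="age", y="height_cm", marker="o", legend=False)
ax.set_xlabel("Age (years)")
ax.set_ylabel("Height (cm)")
ax.set_title("Example plot (made with pandas)")
plt.show()
```

Both routes give the same kind of picture; which one to use is mostly a matter of convenience.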


## 7. Save your own copy (important)

Go to **File → Save a copy in Drive**. That creates a personal copy you can edit without affecting the teaching version.

## 8. If something breaks

- **Runtime → Restart runtime**, then run cells again from the top.
- If a library is missing, install it with `%pip install ...`, then restart the runtime.


---

## 9. Some further reading

These are ideas you’ll meet later. It’s okay if they feel new!

- **List**: an ordered collection that you can change.  
- **Tuple**: like a list, but *fixed* once created.  
- **Dictionary (dict)**: pairs of `key: value` (like a mini address book).  
- **Set**: a bag of *unique* items, no duplicates.  
- **OOP (Object-Oriented Programming)**: a way to bundle **data** and **behaviour** together.


In [None]:
# ----------------------------------------------
# Advanced containers (lists, tuples, dicts, sets)
# ----------------------------------------------

# List: ordered, changeable
fruits = ["apple", "banana", "pear"]
fruits.append("apple")        # duplicate allowed
# 🦛 notice: lists keep order and allow repeats
print("List:", fruits)

# Tuple: ordered, not changeable (immutable)
coords = (51.44, -0.94)       # (lat, lon) for Reading-ish
print("Tuple:", coords)

# Dict: key → value mapping
heights = {"Alex": 150, "Sam": 145}
heights["Sam"] = 146          # update a value
print("Dict:", heights)

# Set: unique items, no order
unique_fruits = set(fruits)
print("Set (unique):", unique_fruits)
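
A few further behaviours of these containers, sketched with the same example values: what happens when you try to change a tuple, how to look up a dictionary key safely, and how to test whether a set contains an item. The key `"Kim"` is made up to show a missing entry.

```python
# Tuples cannot be changed after creation: trying raises a TypeError
coords = (51.44, -0.94)
try:
    coords[0] = 52.0
except TypeError as err:
    print("Cannot change a tuple:", err)

# Dictionaries: look up a value by its key;
# .get() returns a default instead of an error if the key is absent
heights = {"Alex": 150, "Sam": 146}
print("Sam:", heights["Sam"])
print("Kim:", heights.get("Kim", "not recorded"))

# Sets: test membership with 'in'
unique_fruits = {"apple", "banana", "pear"}
print("Is pear in the set?", "pear" in unique_fruits)
```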


### A tiny taste of OOP (Object-Oriented Programming)

Think of **OOP** as a way to bundle **data** and the **things you can do with that data** into one unit.

#### Core ideas (mapped to our example)
- **Class** → a **blueprint** that defines what data an object has and what it can do.  
  *Example:* `Hippo` is the blueprint.
- **Object / instance** → a **concrete thing** made from a class.  
  *Example:* `h = Hippo("Hildegard", favourite_food="lotus leaves")`.
- **Attributes** → the **data** an object stores.  
  *Example:* `self.name`, `self.favourite_food`.
- **Methods** → the **behaviours** (functions) an object can perform.  
  *Example:* `introduce()`, `eat(food)`.
- **`__init__`** → the **constructor**: runs when you create a new object, initialising its attributes.
- **`self`** → the current object; it lets methods access the object’s own data.

#### Why OOP can be useful
- **Encapsulation:** keep related data and actions together (tidy, easier to reason about).
- **Reusability:** create many objects from one blueprint (e.g., lots of hippos with different names).
- **Extensibility:** add new methods or attributes later without breaking existing code.

#### Reading the Hippo example
- The class defines what every hippo **has** (`name`, `favourite_food`) and what every hippo **does** (`introduce`, `eat`).
- When we write `h = Hippo("Hildegard", ...)`, Python calls `__init__` to set up that object’s data.
- Calling `h.eat("cabbage")` runs the `eat` **method** *on that object’s data* (via `self`).

#### When should you use OOP here?
For most FB2NEP notebooks, **functions + data frames** are enough.  
Use a small class only when it **clarifies structure** (e.g., simulating participants, instruments, or study arms with shared behaviour).

---

<details>
<summary><strong>Optional: two more OOP ideas</strong></summary>

- **Inheritance:** make a new class that extends another (e.g., `BabyHippo(Hippo)`).
- **Polymorphism:** different classes can share a method name but behave differently (e.g., `eat` for different animals).

You won’t need these for this module, but you’ll see them in larger software projects.
</details>

### Try it
- Make another object: `h2 = Hippo("Fiona")`; call `h2.introduce()`.
- Change the behaviour: inside `eat`, add a line to count how many times the hippo has eaten.
- Add an attribute: e.g., `age_years`, and show it in `introduce()`.


In [None]:
# ----------------------------------------------
# A tiny class: Hippo, with data and behaviours
# ----------------------------------------------
class Hippo:
  # The special __init__ method sets up new hippos
  def __init__(self, name, favourite_food="water plants"):
    self.name = name                    # data (an attribute)
    self.favourite_food = favourite_food

  # A behaviour (a method): the hippo can introduce itself
  def introduce(self):
    print(f"Hello, I'm {self.name} the hippo. I like {self.favourite_food}.")

  # Another behaviour
  def eat(self, food):
    if food == self.favourite_food:
      print(f"Yum! {self.name} loves {food}. 🦛")
    else:
      print(f"{self.name} tries {food}… not bad!")

# Make an object (an instance) from the class
h = Hippo("Hildegard", favourite_food="lotus leaves")
h.introduce()      # call a method
h.eat("lotus leaves")
h.eat("cabbage")
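
One possible answer to the "Try it" suggestions is sketched below. It is written as a separate class, `CountingHippo` (a name invented for this example), so that the original `Hippo` stays untouched; the attribute names `age_years` and `meals_eaten` are likewise assumptions.

```python
class CountingHippo:
    """A variant of Hippo that also stores an age and counts its meals."""

    def __init__(self, name, favourite_food="water plants", age_years=None):
        self.name = name
        self.favourite_food = favourite_food
        self.age_years = age_years      # new attribute (None = not recorded)
        self.meals_eaten = 0            # new attribute: starts at zero

    def introduce(self):
        message = f"Hello, I'm {self.name} the hippo. I like {self.favourite_food}."
        if self.age_years is not None:
            message += f" I am {self.age_years} years old."
        print(message)

    def eat(self, food):
        self.meals_eaten += 1           # count every meal
        print(f"{self.name} eats {food} (meal number {self.meals_eaten}).")


h2 = CountingHippo("Fiona", age_years=7)
h2.introduce()
h2.eat("cabbage")
h2.eat("water plants")
print("Meals so far:", h2.meals_eaten)
```

Each object keeps its own counter, so a second hippo created from the same class would start again at zero.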


## 10. Further information & resources

If you’d like to explore beyond this notebook:

- 📘 **Course-specific projects**  
  [Data Analysis Projects](https://ggkuhnle.github.io/data-analysis-projects/): examples, workflows, and ideas for project students.

- 🌐 **Official documentation:**  
  - [NumPy](https://numpy.org/doc/stable/): arrays and fast maths.  
  - [pandas](https://pandas.pydata.org/docs/): data tables and analysis.  
  - [matplotlib](https://matplotlib.org/stable/contents.html): plotting and visualisation.

- 💡 **Gentle tutorials:**  
  - [W3Schools Python Tutorial](https://www.w3schools.com/python/) (step-by-step basics).  
  - [Real Python](https://realpython.com/) (clear, example-driven articles).  
  - [Google Colab Guide](https://colab.research.google.com/notebooks/intro.ipynb) (how Colab works).

*(These are optional reading: you don’t need to learn everything at once!)*


## 11. Loading data (reference)

> **Good to know:** In FB2NEP **teaching notebooks, data are loaded automatically**. You do *not* need to do this yourself for the module activities. The examples below are here in case you want to practise with your own files.

### Option A: Load a small CSV from the web (GitHub raw)
Use this for public CSVs hosted online (fastest way to test).

In [None]:
import pandas as pd
url = "https://raw.githubusercontent.com/ggkuhnle/fb2nep-epi/main/data/synthetic/fb2nep.csv"  # replace with the URL of your own CSV
df = pd.read_csv(url)
df.head()
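
If you want to practise `pd.read_csv` without relying on a network connection, one option is to read CSV text held in a string. The sketch below assumes made-up column names and values; `io.StringIO` simply makes the text behave like a file.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a downloaded file
csv_text = """id,age,fruit_portions
1,34,2
2,51,5
3,47,
"""

# read_csv accepts file-like objects as well as URLs and file paths
df_practice = pd.read_csv(io.StringIO(csv_text))
print(df_practice)
print("Rows and columns:", df_practice.shape)
```

The empty field in the last row is read as a missing value (`NaN`).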


### Option B: Upload a file from your computer (Colab)
Uncomment and run the lines below, then pick a CSV. Colab will store it temporarily in the session.

In [None]:
# from google.colab import files
# up = files.upload()                            # choose a file
# import io
# import pandas as pd
# name = next(iter(up))                          # first uploaded filename
# df = pd.read_csv(io.BytesIO(up[name]))
# df.head()


### Option C: Load/save via Google Drive (persistent across Colab sessions)
Mount Drive, then read/write using a path in your Drive.

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')
# import pandas as pd
# df = pd.read_csv('/content/drive/MyDrive/fb2nep/example.csv')
# df.head()

# Save back to Drive
# df.to_csv('/content/drive/MyDrive/fb2nep/output.csv', index=False)


### Quick checks (useful with any dataset)
- What are the dimensions? What columns exist? Any missing values?

In [None]:
print("Shape:", df.shape)
df.info()
df.isna().mean()  # fraction missing per column
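
To see what these checks report, it can help to run them on a table where you already know the answer. The sketch below builds a made-up four-row table (`df_check`) with one missing value in each column.

```python
import numpy as np
import pandas as pd

# Made-up table with two missing values
df_check = pd.DataFrame({
    "age": [34, 51, np.nan, 47],
    "sex": ["F", "M", "F", None],
})

print("Shape:", df_check.shape)             # (rows, columns)
print("Columns:", list(df_check.columns))

# isna() marks missing cells as True; True counts as 1,
# so the mean of each column is the fraction missing
missing_fraction = df_check.isna().mean()
print(missing_fraction)
```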


### Optional: Save a copy locally (Colab session storage)
Files saved here can be downloaded from the **Files** pane on the left.

In [None]:
df.to_csv('my_output.csv', index=False)
print("Saved as my_output.csv. In Colab, open the Files pane → three dots → Download.")
