# Python Review
---
This notebook provides a concise review of Python concepts essential for the course. 

You are expected to already know how to program in Python.

## Using Environments with `uv`

Virtual environments are crucial for managing project dependencies. They solve problems such as:
- Using specific package versions for specific applications
- Using package versions different from those the operating system provides
- Building and testing in reproducible environments
- ...

There are several tools for this: pyenv, conda, uv, poetry, ... Here, we will use `uv`.

[`uv`](https://docs.astral.sh/uv/) is a modern, fast package and environment manager.

**Comparison: `uv` vs. `conda`**

| Feature | `uv` | `conda` |
|---|---|---|
| **Speed** | Extremely fast (written in Rust) | Slower | 
| **Environment Location** | Local to the project (`.venv` folder) | Centralized (in the `conda` installation) |
| **Package Types** | Python packages (from PyPI and, recently, from [pyx](https://astral.sh/pyx)) | Python and non-Python packages |
| **Use Case** | Web development, scripting, API backends, Scientific computing | Scientific computing, complex dependencies |

To create a virtual environment, use
```bash
uv venv NAME --python=PYTHONVERSION
```
Here, `NAME` is the name (and path) of the virtual environment, and `PYTHONVERSION` is the Python version you want.
Check all the available options with
```bash
uv venv --help
```

To activate the environment in your terminal (the commands below assume it lives in `.venv`, the default name):
```bash
# On macOS/Linux
source .venv/bin/activate

# On Windows
.venv\Scripts\activate
```
Once the venv is activated, the `PATH` is modified, and you will be using the commands inside the venv. To deactivate it, just execute `deactivate`.
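
One way to confirm which interpreter is in use is to ask Python itself. The sketch below relies on `sys.prefix` and `sys.base_prefix`, which differ when the interpreter runs inside a virtual environment; the helper name `in_virtualenv` is illustrative only.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix

print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```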

To install packages
```bash
uv pip install numpy pandas
```

To run a script without activating the environment
```python 
# Create a file named 'my_script.py' with:
# import numpy as np
# print(np.zeros(5))
```

then run it as 
```bash
uv run my_script.py
```

You can also declare dependencies inside the script itself, using inline script metadata ([PEP 723](https://peps.python.org/pep-0723/)):

In [None]:
%%writefile example.py
# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "requests<3",
#   "beautifulsoup4",
# ]
# ///

import requests
from bs4 import BeautifulSoup

def fetch_and_parse(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    return soup.title.string

if __name__ == "__main__":
    url = "https://example.com"
    title = fetch_and_parse(url)
    print(f"The title of {url} is: {title}")

Now run as 
```bash
uv run example.py
```
and it will download and install everything it needs (even Python) in a temporary venv and run the script.
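
To see what a tool has to read before it can build that environment, here is a minimal sketch that extracts the `script` metadata block from a source string. The regular expression is adapted from the reference implementation in PEP 723; the function name `extract_script_block` and the `EXAMPLE` string are illustrative, and this is not how `uv` itself is implemented.

```python
import re
from typing import Optional

# Pattern adapted from the PEP 723 reference implementation.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

def extract_script_block(source: str) -> Optional[str]:
    """Return the TOML text of the first `script` block, or None if absent."""
    for match in BLOCK_RE.finditer(source):
        if match.group("type") == "script":
            lines = match.group("content").splitlines()
            # Drop the leading "# " (or a bare "#") from every line.
            return "\n".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None

EXAMPLE = '''\
# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "requests<3",
#   "beautifulsoup4",
# ]
# ///

import requests
'''

print(extract_script_block(EXAMPLE))
```

The extracted text is plain TOML, so on Python 3.11+ it could be parsed further with the standard-library `tomllib` module.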

### Exercise

1. Create a new virtual environment using `uv`.
2. Install the `scikit-learn` library.
3. Create a `requirements.txt` file from the installed packages (`uv pip freeze > requirements.txt`).

## Simple Variables

Variables are used to store data.

In [None]:
name = "Alice"
age = 30
is_scientist = True
pi = 3.14
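
Python infers the type of each value at runtime, so no declarations are needed. As a small illustration, the sketch below inspects the same four variables with the built-in `type()` and then rebinds one of them to a different type.

```python
name = "Alice"
age = 30
is_scientist = True
pi = 3.14

# type() reports the class of the object a name currently refers to.
for value in (name, age, is_scientist, pi):
    print(f"{value!r:>8} -> {type(value).__name__}")

# A name can be rebound to an object of a different type.
age = "thirty"
print(type(age).__name__)  # str
```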

### Exercise

1. Find the limits on integers and floats in Python. Does Python follow IEEE 754?
2. Are there operator precedence rules in Python?

## Printing with f-strings

f-strings provide a concise and convenient way to embed expressions inside string literals for formatting.

In [None]:
name = "Einstein"
year = 1905
print(f"{name}'s miracle year was {year}.")

pi = 3.14159
print(f"Pi rounded to two decimal places: {pi:.2f}")
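
The part after the colon is a format specifier, and a few more of them are handy for scientific output. The values below are illustrative; the `=` form requires Python 3.8 or later.

```python
avogadro = 6.02214076e23
fraction = 0.256
count = 1234567

print(f"{avogadro:.3e}")   # scientific notation: 6.022e+23
print(f"{fraction:.1%}")   # percentage: 25.6%
print(f"{count:,}")        # thousands separator: 1,234,567
print(f"[{42:>6}]")        # right-aligned in 6 characters: [    42]
print(f"{count=}")         # name and value together: count=1234567
```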

### Exercise

1. Use an f-string to print the values of the variables created in the 'Simple Variables' section.

## Functions with Type Declarations, Docstrings, and Decorators

Functions are reusable blocks of code. Type hints and docstrings improve code clarity.

In [None]:
def greet(name: str) -> str:
    """Greets the given name."""
    return f"Hello, {name}!"

print(greet("World"))

Type hints allow static analysis tools to check your code for correctness, something you should always do and ideally automate (for example, on each git commit). Some well-known linters and type checkers are
- [pylint](https://github.com/pylint-dev/pylint)
- [flake8](https://flake8.pycqa.org/en/latest/)
- [ruff](https://docs.astral.sh/ruff/)
- [mypy](https://mypy-lang.org/)
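
These tools matter because the interpreter itself does not enforce type hints. The short sketch below reuses `greet` to show that the hints are stored as metadata but a call with the wrong type still runs.

```python
def greet(name: str) -> str:
    """Greets the given name."""
    return f"Hello, {name}!"

# Hints are stored as metadata on the function...
print(greet.__annotations__)

# ...but they are not checked at runtime: this call runs without error,
# although a static type checker would be expected to flag it.
print(greet(42))  # Hello, 42!
```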

A [`decorator`](https://realpython.com/primer-on-python-decorators/) wraps a function to extend its behavior, for example with logging or timing, without modifying the function's body:

In [None]:
import functools

def log_activity(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"Calling function '{func.__name__}'...")
        return func(*args, **kwargs)
    return wrapper

@log_activity
def greet(name: str) -> str:
    """Greets the given name."""
    return f"Hello, {name}!"

print(greet("World"))
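
Decorating a function replaces it with the wrapper, so its name and docstring are lost unless they are copied over. The sketch below compares a wrapper with and without `functools.wraps`; the decorator names `plain` and `preserving` and the two sample functions are illustrative.

```python
import functools

def plain(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def preserving(func):
    @functools.wraps(func)  # copy __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain
def area(r: float) -> float:
    """Area of a circle."""
    return 3.14159 * r**2

@preserving
def volume(r: float) -> float:
    """Volume of a sphere."""
    return 4 / 3 * 3.14159 * r**3

print(area.__name__, area.__doc__)      # wrapper None
print(volume.__name__, volume.__doc__)  # volume Volume of a sphere.
```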

What are `*args` and `**kwargs`?  In Python, `*args` and `**kwargs` are special syntax that let you pass a **variable number of arguments** to a function.

- `*args`:  For **positional arguments** (non-keyword)
    - `args` is a tuple of all extra **positional** arguments.
    - Use `*` before the parameter name.

- `**kwargs`: For **keyword arguments** (named arguments)
    - `kwargs` is a dictionary of all extra **named** arguments.
    - Use `**` before the parameter name.

This is a simple example:

In [None]:
def my_function(*args, **kwargs):
    print("Positional args (args):", args)
    print("Keyword args (kwargs):", kwargs)

# Call it with different numbers of arguments
print("First call:")
my_function(1, 2, 3, name="Alice", age=25, city="Boston")
print("\nSecond call:")
my_function(4, age=52, city="Tabogo")
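
The same `*` and `**` symbols also work in the opposite direction, at the call site, to unpack an existing sequence or dictionary into arguments. The function `describe` and its data are a made-up illustration.

```python
def describe(name, age, city="unknown"):
    return f"{name}, {age}, {city}"

person = ("Alice", 25)
extra = {"city": "Boston"}

# * unpacks a sequence into positional arguments,
# ** unpacks a dictionary into keyword arguments.
print(describe(*person, **extra))  # Alice, 25, Boston
```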


Imagine you're analyzing gene expression data and want to write a flexible function that:
- Accepts any number of gene names (positional),
- And allows optional parameters like `normalization`, `plot=True`, etc.

```python
def analyze_genes(*genes, normalization="zscore", plot=True, threshold=0.5):
    print(f"Analyzing genes: {genes}")
    print(f"Normalization method: {normalization}")
    if plot:
        print("Plotting results...")
    if threshold > 0.1:
        print(f"Threshold set to {threshold}, may filter noisy data.")
    else:
        print("Low threshold — keep more signals.")

# Use it in different ways:
analyze_genes("TP53", "BRCA1", "MYC", plot=False, threshold=0.3)
```

### Exercise

1. Write a function that takes two numbers and returns their sum. Include type hints and a docstring.
2. Create a decorator that times how long a function takes to execute.

## Lists, Sets, and Dictionaries

In [None]:
# Lists (ordered, mutable)
elements = ["H", "He", "Li"]
elements.append("Be")
print(f"First element: {elements[0]}")

In [None]:
# Sets (unordered, UNIQUE elements)
primes = {2, 3, 5, 7, 7}
print(f"Primes: {primes}")

In [None]:
# Dictionaries (key-value pairs)
element_masses = {"H": 1.008, "He": 4.0026}
print(f"Mass of Helium: {element_masses['He']}")



A comparison:

| Feature | **List** | **Set** | **Dictionary** |
|--------|----------|--------|----------------|
| **Purpose** | Ordered collection of items (e.g., time series, gene order) | Unordered collection of **unique** items (e.g., unique genes, species) | Key-value pairs (e.g., gene → expression level) |
| **Syntax** | `[]` | `{a, b}` or `set()` (`{}` creates an empty dictionary) | `{key: value}` |
| **Order** | ✅ Ordered (index matters) | ❌ Unordered | ✅ Ordered (in Python 3.7+) |
| **Duplicates Allowed?** | ✅ Yes | ❌ No (automatically removes duplicates) | ✅ Keys must be unique; values can repeat |
| **Access by Index?** | ✅ Yes (`my_list[0]`) | ❌ No (no indexing) | ❌ No (use keys) |
| **Access by Key?** | ❌ No | ❌ No | ✅ Yes (`my_dict['key']`) |
| **Mutability** | ✅ Mutable (can change) | ✅ Mutable | ✅ Mutable |
| **Use Case in Science** | Time series, ordered data (e.g., measurements over time) | Unique elements (e.g., all unique proteins in a sample) | Gene → expression level, sample → metadata |
| **Example** | `genes = ["TP53", "BRCA1", "MYC"]` | `unique_genes = {"TP53", "BRCA1", "MYC"}` | `expression = {"TP53": 2.1, "BRCA1": 0.8}` |
| **Fast Lookup?** | ❌ Slow (linear search) | ✅ Very fast (hash-based) | ✅ Very fast (by key) |
| **Can contain different types?** | ✅ Yes (e.g., numbers, strings, even other lists) | ✅ Yes (but only immutable types like strings, numbers) | ✅ Yes (keys must be immutable) |
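
Two rows of the table are easy to trip over in practice: the literal syntax for empty containers and the restriction on what a set may contain. A short sketch with made-up values:

```python
empty = {}
print(type(empty).__name__)   # dict, not set
print(type(set()).__name__)   # set

# Tuples are hashable, so they can be set members or dictionary keys.
visited = {(0, 0), (1, 2)}
print((1, 2) in visited)      # True

# Lists are mutable and therefore unhashable.
try:
    visited.add([3, 4])
except TypeError as err:
    print(f"TypeError: {err}")
```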

:::{warning}
Never use a `list` for data that must be traversed quickly; use `numpy` arrays or similar. Never use a `numpy` array when you frequently need to insert data in the middle.
:::
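
As a rough illustration of the traversal point, the sketch below computes the same sum of squares with a Python loop over a list and with a vectorized NumPy call. The array size and function names are arbitrary choices, and the timings depend on the machine, so no particular speed-up is claimed here.

```python
import timeit

import numpy as np

n = 100_000
values_list = [float(i) for i in range(n)]
values_array = np.arange(n, dtype=float)

def sum_squares_list():
    # Element-by-element traversal in the interpreter.
    return sum(v * v for v in values_list)

def sum_squares_array():
    # One vectorized call over the whole array.
    return float(np.dot(values_array, values_array))

# Both approaches agree on the result (up to floating-point rounding).
print(np.isclose(sum_squares_list(), sum_squares_array()))

print("list :", timeit.timeit(sum_squares_list, number=10))
print("numpy:", timeit.timeit(sum_squares_array, number=10))
```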

### Exercises
#### Tracking Particle Trajectories
You're simulating the motion of 3 particles (e.g., electrons in a magnetic field) over time. You have:

- Time points (list)
- Positions at each time (list of tuples)
- Unique particle IDs (set)
- A dictionary mapping particle ID → trajectory (list of positions)
- Goal: Find all particles that passed near a specific point (e.g., (0,0)).





In [None]:
# Given:
time_points = [0, 1, 2, 3, 4]  # seconds
particles = {"P1", "P2", "P3"}  # unique IDs
trajectories = {
    "P1": [(1.0, 2.0), (0.5, 0.3), (0.1, -0.1), (0.0, 0.01), (-0.2, -0.3)],
    "P2": [(2.0, 1.0), (1.8, 0.9), (1.5, 0.6), (1.0, 0.2), (0.5, -0.1)],
    "P3": [(-1.0, -1.0), (-0.5, -0.5), (0.0, 0.0), (0.2, 0.1), (0.3, 0.4)]
}

# Task: Find which particles came within 0.2 units of (0,0)

# Your code here:
# YOUR CODE HERE




#### Analyzing Chemical Reactions
You're studying a reaction network. You have:

- A list of molecules involved
- A set of unique reaction types
- A dictionary: reaction_type → list of molecules
- Goal: Identify all molecules that appear in more than one reaction type.



In [None]:
# Given:
molecules = ["H2", "O2", "H2O", "CO2", "CH4", "O2", "CO2"]  # list (duplicates allowed)
reaction_types = {"combustion", "oxidation", "hydrolysis"}  # set
reactions = {
    "combustion": ["CH4", "O2", "CO2", "H2O"],
    "oxidation": ["Fe", "O2", "Fe2O3"],
    "hydrolysis": ["NaCl", "H2O", "NaOH", "HCl"]
}

# Task: Find molecules that appear in ≥2 reaction types

# Your code here:
# YOUR CODE HERE




## Objects and Basic OOP

Object-Oriented Programming (OOP) helps in structuring programs.

In [None]:
class Scientist:
    def __init__(self, name: str, field: str):
        self.name = name
        self.field = field
        self.another_field_ = 3.14
        print(f"Scientist {self.name} in {self.field} has been created.")

    def introduce(self):
        return f"I am {self.name}, and I work in {self.field}."

    def __del__(self):
        print(f"Scientist {self.name} in {self.field} is being destroyed.")


# Create an instance
marie = Scientist("Marie Curie", "Physics and Chemistry")
print(marie.introduce())

# Delete the object explicitly
del marie


Notes on `__del__`:
- The `__del__` method is called **just before** the object is deleted from memory.
- It's useful for cleanup tasks like:
  - Closing files
  - Releasing external resources
  - Logging destruction (as in this example)
- **Don’t rely on `__del__` for critical cleanup**; use `with` blocks (context managers) for reliable resource handling.
- `del marie` only removes the name `marie`. `__del__` runs once no references to the object remain, so it may be delayed if other references still exist, and it is not guaranteed to run when the interpreter exits.
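
For comparison, here is a minimal sketch of the context-manager alternative recommended in the notes. The `Instrument` class is hypothetical; the point is that `__exit__` runs as soon as the `with` block ends, whether or not an exception occurred.

```python
class Instrument:
    """Hypothetical resource that must be released after use."""

    def __init__(self, name: str):
        self.name = name
        self.is_open = False

    def __enter__(self):
        self.is_open = True
        print(f"{self.name} opened")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with-block ends, even if an exception was raised.
        self.is_open = False
        print(f"{self.name} closed")
        return False  # do not suppress exceptions

with Instrument("spectrometer") as device:
    print(f"Measuring with {device.name}")

print(device.is_open)  # False
```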

### Exercise

1. Create a `Model` class with attributes like `name` and `accuracy`.
2. Add a method to the `Model` class that prints the model's information.

## Using Libraries: NumPy and Scikit-learn

These are fundamental libraries for scientific computing and machine learning.

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression

# NumPy
X = np.array([[1], [2], [3], [4]])
y = np.array([2, 4, 5, 4])

# Scikit-learn
model = LinearRegression()
model.fit(X, y)
print(f"Coefficient: {model.coef_}")
print(f"Intercept: {model.intercept_}")
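
With default settings, `LinearRegression` fits ordinary least squares with an intercept, so for a single feature the result can be cross-checked against the closed-form formulas. The sketch below uses only NumPy and the same data; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=float)
y = np.array([2, 4, 5, 4], dtype=float)

# Closed-form simple linear regression:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)**2)
# intercept = mean_y - slope * mean_x
dx = x - x.mean()
slope = np.sum(dx * (y - y.mean())) / np.sum(dx**2)
intercept = y.mean() - slope * x.mean()

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")  # slope = 0.700, intercept = 2.000
```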

Always read the library documentation: https://scikit-learn.org/stable/

### Exercise

1. Create a NumPy array with 10 random numbers. Compute the mean. Write it to a file.
2. Use scikit-learn to train a simple classification model on a sample dataset.

## Plotting

Visualization is key in data science. Here are some popular libraries:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from bokeh.plotting import figure, show

# Matplotlib
plt.scatter(X, y)
plt.title("Matplotlib Scatter Plot")
plt.show()

# Seaborn
sns.set_context('talk')
sns.scatterplot(x=X.flatten(), y=y)
plt.title("Seaborn Scatter Plot")
plt.show()

# Plotly
fig = px.scatter(x=X.flatten(), y=y, title="Plotly Scatter Plot")
fig.show()

# Bokeh
p = figure(title="Bokeh Scatter Plot")
p.scatter(X.flatten(), y, size=10)  # scatter() replaces circle(..., size=...), deprecated in recent Bokeh releases
show(p)

### Exercise

1. Create a line plot of a sine wave using Matplotlib.
2. Create a histogram of a random dataset using Seaborn.

## Best Practices

- **Maintainable Scripts:** Use functions, add comments, and follow a consistent style (e.g., PEP 8). Use linters, tests, and logging.
- **Notebooks:** Use markdown cells for explanations, keep code cells short, and restart the kernel and run all cells before saving.
- **Testing:** Use the `assert` statement for simple tests.

In [None]:
def add(a, b):
    return a + b

assert add(2, 3) == 5
assert add(-1, 1) == 0
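
Exact equality works for integers but is fragile for floating-point results. A small sketch of the usual workaround, using the standard-library `math.isclose` with its default tolerance:

```python
import math

def add(a, b):
    return a + b

# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly.
print(add(0.1, 0.2) == 0.3)              # False
assert math.isclose(add(0.1, 0.2), 0.3)  # passes: compares within a tolerance
```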

### Exercise

1. Write a simple test for the function you created in the 'Functions' exercise.