# Unit Testing Basics


## Learning goals

Unit tests are a way to “lock in” the behavior of your code. After this lesson, you should be able to:

- Use `assert` as the core mechanism behind tests
- Write tests with `pytest` (recommended in most projects)
- Test both expected behavior and failure behavior (exceptions)
- Reuse test data with fixtures
- Test file output safely without touching your real filesystem
- Understand why mocking / dependency injection matters for unit tests


## What is a unit test and why do we care?

A **unit test** is a small automated check for a small unit of code (often a single function). The goal is simple:
given some input, verify the output or behavior is what we expect.

In real projects, tests matter because code changes constantly: new features, bug fixes, refactors, performance work.
Tests give you a safety net. If a change breaks something old, a test should fail quickly and tell you exactly what went wrong.

- Unit tests should be **fast**, **repeatable**, and **deterministic**.
- A unit test checks a **single behavior** (or a very small set of related behaviors).
- Good tests act as **documentation**: they show what the code is supposed to do.


## Assertions (the core idea that testing frameworks build on)

At the heart of unit testing is the concept of an **assertion**: a statement that must be true.
In Python, `assert condition` raises an `AssertionError` if the condition is false.

Most testing frameworks (like `pytest`) are structured ways to run many `assert`s and report failures clearly.


### Example function:

In [None]:
def normalize_email(email: str) -> str:
    return email.strip().lower()

### “Test” using plain assert:

In [None]:
def test_normalize_email_basic():
    result = normalize_email("  Carlos@Example.COM ")
    assert result == "carlos@example.com"
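
To see what a failed assertion looks like without any framework, the sketch below catches the `AssertionError` by hand. The helper name `check_normalized` and the message text are illustrative assumptions, not part of the lesson's code.

```python
def normalize_email(email: str) -> str:
    # Same function as in the example above, repeated so the block runs on its own.
    return email.strip().lower()


def check_normalized(raw: str, expected: str) -> None:
    # Hypothetical helper: the text after the comma becomes the error message.
    result = normalize_email(raw)
    assert result == expected, f"expected {expected!r}, got {result!r}"


# A passing check is silent.
check_normalized("  Carlos@Example.COM ", "carlos@example.com")

# A failing check raises AssertionError carrying the message.
try:
    check_normalized("Carlos@Example.COM", "Carlos@example.com")
except AssertionError as exc:
    failure_message = str(exc)
    print("Test failed:", failure_message)
```

Note that running Python with the `-O` flag strips `assert` statements, so assertions belong in tests rather than in production input validation.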

## Arrange–Act–Assert

When tests get longer, structure matters. A common pattern is **Arrange–Act–Assert (AAA)**:

- **Arrange**: prepare inputs and context  
- **Act**: call the function  
- **Assert**: verify the result  

This pattern makes tests easier to read later, especially when a test fails.


In [None]:
def add_tax(price: float, tax_rate: float) -> float:
    return round(price * (1 + tax_rate), 2)

def test_add_tax():
    # Arrange
    price = 100.0
    tax_rate = 0.16

    # Act
    total = add_tax(price, tax_rate)

    # Assert
    assert total == 116.00
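
The test above can compare with `==` because `add_tax` rounds its result. As a hedged sketch of what to do when a function returns an unrounded float, `pytest.approx` compares within a tolerance. The function `add_tax_unrounded` is a hypothetical variant introduced only for this illustration.

```python
import pytest


def add_tax_unrounded(price: float, tax_rate: float) -> float:
    # Hypothetical variant of add_tax that skips the rounding step.
    return price * (1 + tax_rate)


def test_float_sum_needs_approx():
    # 0.1 + 0.2 is not exactly 0.3 in binary floating point.
    assert 0.1 + 0.2 != 0.3
    assert 0.1 + 0.2 == pytest.approx(0.3)


def test_add_tax_unrounded():
    total = add_tax_unrounded(100.0, 0.16)
    # Compare within an explicit absolute tolerance rather than exactly.
    assert total == pytest.approx(116.0, abs=1e-9)
```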


## pytest
### Usually the best starting point

In modern Python teams, `pytest` is commonly preferred because it keeps tests readable:

- You write tests as normal functions
- You use plain `assert`
- It gives good failure messages (diffs, values, tracebacks)
- It supports fixtures for reusable setup

### Typical structure:

```
project/
  src/
    pricing.py
  tests/
    test_pricing.py
```

### How to run:

- `pytest -q`


### Example code: business rule + validation

In [None]:
def apply_discount(price: float, discount: float) -> float:
    """
    discount in range [0, 1]
    """
    if not 0 <= discount <= 1:
        raise ValueError("discount must be between 0 and 1")
    return round(price * (1 - discount), 2)


### Tests (pytest style)

> Note: In a real project, this code would live in `tests/test_pricing.py` and you would run it with `pytest`.
> Here I show the standard pytest syntax.



In [None]:
# pytest example (place each of these examples inside tests/test_pricing.py)
import pytest

def test_apply_discount_happy_path():
    assert apply_discount(100, 0.2) == 80.0

def test_apply_discount_zero_discount():
    assert apply_discount(100, 0.0) == 100.0

def test_apply_discount_invalid_discount_raises():
    with pytest.raises(ValueError):
        apply_discount(100, 1.5)
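
When several tests differ only in their inputs, `pytest.mark.parametrize` can collapse them into one function that runs once per case. The sketch below is one possible arrangement; the test names and the extra boundary cases are assumptions added for illustration.

```python
import pytest


def apply_discount(price: float, discount: float) -> float:
    # Same function as above, repeated so the block runs on its own.
    if not 0 <= discount <= 1:
        raise ValueError("discount must be between 0 and 1")
    return round(price * (1 - discount), 2)


@pytest.mark.parametrize(
    "price, discount, expected",
    [
        (100, 0.2, 80.0),
        (100, 0.0, 100.0),
        (100, 1.0, 0.0),  # boundary: full discount
    ],
)
def test_apply_discount_valid(price, discount, expected):
    assert apply_discount(price, discount) == expected


@pytest.mark.parametrize("bad_discount", [-0.1, 1.5])
def test_apply_discount_invalid(bad_discount):
    # match= checks the error message with a regular expression search.
    with pytest.raises(ValueError, match="between 0 and 1"):
        apply_discount(100, bad_discount)
```

Each parameter set is reported as its own test, so a failure points at the exact input that broke.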


## Happy path vs edge cases

A common beginner mistake is writing only one test for a function and assuming it’s enough.
Real bugs often live in **edge cases**.

From a data engineering perspective, edge cases often include:
- empty strings
- missing values
- malformed numeric formats
- unexpected whitespace
- negative numbers or invalid ranges

Below is a small example where we decide (explicitly) what happens when `b == 0`.


In [None]:
def safe_divide(a: float, b: float) -> float:
    if b == 0:
        raise ZeroDivisionError("b must not be zero")
    return a / b

In [None]:
# pytest example tests
import pytest

def test_safe_divide_normal():
    assert safe_divide(10, 2) == 5

def test_safe_divide_raises_on_zero():
    with pytest.raises(ZeroDivisionError):
        safe_divide(10, 0)
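
The data-quality edge cases listed earlier (empty strings, missing values, malformed numbers, stray whitespace, negative values) can be covered in the same style. The parser `parse_quantity` below is a hypothetical function invented for this sketch, and the decision to return `None` for blank input is an assumption, not a rule from the lesson.

```python
from typing import Optional

import pytest


def parse_quantity(value: Optional[str]) -> Optional[int]:
    # Hypothetical parser: missing or blank input becomes None,
    # anything else must be a whole, non-negative number.
    if value is None or not value.strip():
        return None
    quantity = int(value.strip())  # raises ValueError on malformed text
    if quantity < 0:
        raise ValueError("quantity must not be negative")
    return quantity


@pytest.mark.parametrize(
    "raw, expected",
    [
        ("7", 7),       # happy path
        ("  7\n", 7),   # unexpected whitespace
        ("", None),     # empty string
        ("   ", None),  # whitespace only
        (None, None),   # missing value
    ],
)
def test_parse_quantity_accepts(raw, expected):
    assert parse_quantity(raw) == expected


@pytest.mark.parametrize("raw", ["abc", "1.5", "-3"])
def test_parse_quantity_rejects(raw):
    with pytest.raises(ValueError):
        parse_quantity(raw)
```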

## Fixtures

A **fixture** is reusable test setup. Instead of repeating the same data in every test,
you define it once and inject it into any test that needs it.

Tip: This is especially helpful in ETL contexts where “sample records” show up everywhere.


In [None]:
# pytest fixture example
import pytest

@pytest.fixture
def sample_orders():
    return [
        {"id": 1, "amount": 100.0},
        {"id": 2, "amount": 250.5},
        {"id": 3, "amount": 0.0},
    ]

def total_amount(orders):
    return sum(o["amount"] for o in orders)

def test_total_amount(sample_orders):
    assert total_amount(sample_orders) == 350.5
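
Two further fixture properties are worth seeing in code: with the default function scope, each test gets a freshly built value, and one fixture can request another. The fixture `nonzero_orders` and the test names below are assumptions made up for this sketch.

```python
import pytest


@pytest.fixture
def sample_orders():
    # Built fresh for every test that requests it.
    return [
        {"id": 1, "amount": 100.0},
        {"id": 2, "amount": 250.5},
        {"id": 3, "amount": 0.0},
    ]


@pytest.fixture
def nonzero_orders(sample_orders):
    # Hypothetical second fixture that depends on the first one.
    return [o for o in sample_orders if o["amount"] > 0]


def test_mutation_stays_local(sample_orders):
    sample_orders.append({"id": 4, "amount": 10.0})
    assert len(sample_orders) == 4


def test_sees_fresh_data(sample_orders):
    # Still three records, whatever other tests did to their own copy.
    assert len(sample_orders) == 3


def test_nonzero_orders(nonzero_orders):
    assert [o["id"] for o in nonzero_orders] == [1, 2]
```

Because every test receives its own list, the tests pass in any order, which keeps them independent.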


## Testing file output safely

ETL code often reads/writes files. A bad testing practice is writing to real paths like `./output/report.txt`
because tests can:

- overwrite real files
- conflict with other runs
- behave differently on different machines

Pytest provides `tmp_path`, a built-in fixture that gives each test function its own temporary directory as a `pathlib.Path` object.


In [None]:
from pathlib import Path

def write_report(path: Path, content: str) -> None:
    path.write_text(content, encoding="utf-8")

In [None]:
# pytest example using tmp_path
def test_write_report(tmp_path):
    report_path = tmp_path / "report.txt"
    write_report(report_path, "hello")

    assert report_path.exists()
    assert report_path.read_text(encoding="utf-8") == "hello"
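
The same fixture also helps when the code under test reads a file: the test creates its input inside the temporary directory first. The reader `read_amounts` and the file name `orders.csv` are hypothetical, chosen only to illustrate the pattern.

```python
import csv
from pathlib import Path


def read_amounts(path: Path) -> list[float]:
    # Hypothetical reader: returns the "amount" column of a CSV file.
    with path.open(newline="", encoding="utf-8") as handle:
        return [float(row["amount"]) for row in csv.DictReader(handle)]


def test_read_amounts(tmp_path):
    # Arrange: create the input file inside the temporary directory.
    csv_path = tmp_path / "orders.csv"
    csv_path.write_text("id,amount\n1,100.0\n2,250.5\n", encoding="utf-8")

    # Act
    amounts = read_amounts(csv_path)

    # Assert
    assert amounts == [100.0, 250.5]
```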

## Mocking and dependency injection (unit tests should not call the real world)

Unit tests should be fast and stable. External systems (network calls, DB queries, cloud storage)
are slow and can fail for reasons unrelated to your code.

A clean approach is **dependency injection**: pass the dependency in. Then in tests, you pass a fake version.


In [None]:
def transform_user(user: dict) -> dict:
    return {"id": user["id"], "email": user["email"].strip().lower()}

def process_users(fetch_users_func):
    users = fetch_users_func()
    return [transform_user(u) for u in users]

In [None]:
def test_process_users_with_fake_fetch():
    def fake_fetch():
        return [{"id": 1, "email": "  A@B.COM "}]

    result = process_users(fake_fetch)
    assert result == [{"id": 1, "email": "a@b.com"}]
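
A hand-written fake is often enough, but the standard library's `unittest.mock.Mock` can play the same role and also record how it was called. This is a minimal sketch of that alternative, reusing the two functions above; the test name is an assumption.

```python
from unittest.mock import Mock


def transform_user(user: dict) -> dict:
    # Same function as above, repeated so the block runs on its own.
    return {"id": user["id"], "email": user["email"].strip().lower()}


def process_users(fetch_users_func):
    users = fetch_users_func()
    return [transform_user(u) for u in users]


def test_process_users_with_mock():
    # Mock stands in for the real fetch function and records its calls.
    fetch = Mock(return_value=[{"id": 1, "email": "  A@B.COM "}])

    result = process_users(fetch)

    assert result == [{"id": 1, "email": "a@b.com"}]
    fetch.assert_called_once_with()  # called exactly once, with no arguments
```

The call assertion is what a plain fake function does not give you: it verifies that the dependency was actually used, and used as expected.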

# Exercises

These exercises are designed to practice:
- normal expected behavior
- exception testing
- file round-trip testing


## Exercise 1: Test input cleaning + exception behavior

### Function


In [None]:
def clean_country(country: str) -> str:
    country = country.strip().title()
    if not country:
        raise ValueError("country must not be empty")
    return country


### Task

Write 3 tests:

- `"  mexico "` → `"Mexico"`
- `"Canada"` → `"Canada"`
- `""` raises `ValueError`


In [None]:
import pytest
# Here is a little help: fill in the required code
def test_clean_country_normal():
    pass

def test_clean_country_already_clean():
    pass

def test_clean_country_empty_raises():
    with pytest.raises(ValueError):
        pass

## Exercise 2: Test a mini ETL parsing function

### Function


In [None]:
def parse_amount(value: str) -> float:
    value = value.replace(",", "").strip()
    return float(value)

### Task

Write 3 tests:

- `"1,234.50"` → `1234.50`
- `" 0 "` → `0.0`
- `"abc"` raises `ValueError`


## Exercise 3: File round-trip test with tmp_path

### Function


In [None]:
from pathlib import Path
import json

def write_json(path: Path, data: dict) -> None: # Note it receives a Path object, not a path string
    path.write_text(json.dumps(data), encoding="utf-8")


### Task

Use `tmp_path` to verify:

- file exists  
- content loads back to the same dict  


### Solution (fill in the blanks)

In [None]:
import json

# Fill in the spaces

def test_write_json_roundtrip(tmp_path):
    path = ___ / "data.json" # Needs to be a Path object! 
    payload = {"a": 1, "b": "x"}

    write_json(path, payload)

    assert ___.exists()
    loaded = json.loads(___) # Psst: ___.read_text(encoding="utf-8")
    assert ___ # Compare what you loaded with the original payload


## Common mistakes (and how to talk about them)

A lot of testing pain comes from a few repeating mistakes:

- Writing tests that depend on the network / DB / real APIs
- Using random test data without controlling it (seed it, or use fixed fixtures; for seeding, see the `random` module or NumPy's `random.seed()`)
- Testing too many things in one test (hard to diagnose failures)
- Tests that rely on execution order (tests should be independent)
- Not testing failure behavior (exceptions, invalid input)
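
As an illustration of the point about random data, one way to keep it deterministic is to pass a seeded `random.Random` instance into the code that generates values. The generator function `sample_amounts` and the seed `42` are assumptions for this sketch.

```python
import random


def sample_amounts(rng: random.Random, n: int) -> list[float]:
    # Hypothetical data generator; the caller supplies the random source.
    return [round(rng.uniform(0, 500), 2) for _ in range(n)]


def test_sample_amounts_is_reproducible():
    # Two generators with the same seed produce the same sequence.
    first = sample_amounts(random.Random(42), 5)
    second = sample_amounts(random.Random(42), 5)

    assert first == second
    assert all(0 <= amount <= 500 for amount in first)
```

Passing the generator in as an argument is the same dependency injection idea used for `process_users`: the test controls the source of randomness instead of relying on global state.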


> Content created by [**Carlos Cruz-Maldonado**](https://www.linkedin.com/in/carloscruzmaldonado/).  
> I am available to answer any questions or provide further assistance.   
> Feel free to reach out to me at any time.