# API Design (FastAPI + Pydantic)

## Agenda (high-level)
1. Expectations + objectives
2. Recap: OOP pipeline → service layer → API boundary
3. Environment setup (venv, dependencies, running FastAPI)
4. Guided build: `POST /normalize-text`
5. Validation + API contract behavior (422)
6. Testing basics: pytest + TestClient
7. Lab: `POST /keywords-simple` + validations + tests
8. Wrap-up + next session preview

## Session expectations
- You will **run commands in a terminal** (venv, uvicorn, pytest).
- We will keep endpoints **deterministic** (no LLM yet).

## Session objectives
By the end of this session, you will be able to:
1. Explain the role of an API as a “data contract boundary”.
2. Create and run a basic FastAPI service locally.
3. Define request/response schemas using Pydantic.
4. Separate route handlers from service logic (OOP layering).
5. Write simple endpoint tests with pytest + TestClient.
6. Complete a lab extending endpoints and validations.

## 2) Quick recap: OOP pipeline → service layer → API boundary (10–15 min)


## 3) Environment setup

### 3.1 Create and activate a virtual environment (venv)

Run these in a terminal in your project folder:

**macOS / Linux**
```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
```

**Windows (PowerShell)**
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
```

### 3.2 Install dependencies
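We’ll use only: `fastapi`, `uvicorn`, `pydantic`, `pytest`, `httpx` (`httpx` is needed by FastAPI’s `TestClient`). Once `requirements.txt` exists (the cell below writes it), install everything with `python -m pip install -r requirements.txt`.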

### 3.3 Run FastAPI
From the same folder where `app.py` exists:
```bash
uvicorn app:app --reload
```

**What does `--reload` do?**
- It watches your code files; when you save changes, it restarts the server automatically.
- It’s great for development, but it’s not used for production deployments.

### 3.4 `requirements.txt`
- It’s a minimal “dependency contract” for your project.
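- In teams, this is how you keep environments reproducible; pinning versions makes that guarantee stronger (the file written below is left unpinned for simplicity).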
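
To see what the dependency list actually resolved to on your machine, you can ask the active interpreter which versions are installed. The sketch below uses only the standard library (`importlib.metadata`); the helper name `installed_versions` is illustrative and not part of any tool used in this session.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


if __name__ == "__main__":
    session_packages = ["fastapi", "uvicorn", "pydantic", "pytest", "httpx"]
    for name, ver in installed_versions(session_packages).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If a package shows as not installed, the venv is probably not active or the install step was skipped.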

### Common mistakes / troubleshooting
- **Port in use**: change port: `uvicorn app:app --reload --port 8001`
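- **venv not activated**: check which interpreter is on your PATH with `which python` (macOS/Linux) or `where.exe python` (Windows PowerShell, where plain `where` is an alias for `Where-Object`); the path should point inside `.venv`.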
- **pytest not found**: run `python -m pytest -q` instead of `pytest -q`
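
If you are unsure whether the venv is active, Python itself can tell you. This is a minimal check, relying on the fact that inside a venv `sys.prefix` differs from `sys.base_prefix`; the function name `in_virtualenv` is just for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a venv:", in_virtualenv())
```

Run it with the same `python` you use for `pytest` and `uvicorn`; the interpreter path should point inside `.venv`.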


In [None]:
# Create a minimal project on disk: app.py, requirements.txt, tests/test_api.py

from pathlib import Path

project_root = Path.cwd()
tests_dir = project_root / "tests"
tests_dir.mkdir(parents=True, exist_ok=True)

requirements_txt = """fastapi
uvicorn
pydantic
pytest
httpx
"""

(project_root / "requirements.txt").write_text(requirements_txt, encoding="utf-8")

print("Wrote requirements.txt")
print((project_root / "requirements.txt").read_text(encoding="utf-8"))


## 4) “Hello API” guided build

### Suggested project structure (what we would do outside a notebook)
- `app.py` (FastAPI application + routes)
- `schemas/` (Pydantic request/response models)
- `services/` (business logic / transformations)
- `tests/` (pytest tests)

In this notebook we will **simulate** that structure but keep everything runnable by writing a single `app.py`.

### Endpoint #1: `POST /normalize-text`
- Input schema: `NormalizeTextRequest(text: str, lowercase: bool=False)`
- Output schema: `NormalizeTextResponse(normalized_text: str, char_count: int, word_count: int)`

In [None]:
# Full path where app.py will live (use it as the %%writefile target below if your working directory differs)
print(f"{project_root}/app.py")

In [None]:
%%writefile app.py

import re
from typing import Tuple

from fastapi import FastAPI
from pydantic import BaseModel, Field


app = FastAPI(title="Module 1 - Session 1 API", version="1.0.0")


# -----------------------------
# Schemas (Pydantic models)
# -----------------------------

class NormalizeTextRequest(BaseModel):
    text: str
    lowercase: bool = False


class NormalizeTextResponse(BaseModel):
    normalized_text: str
    char_count: int
    word_count: int


# -----------------------------
# Services (business logic)
# -----------------------------

def _collapse_whitespace(value: str) -> str:
    return re.sub(r"\s+", " ", value).strip()


def normalize_text(text: str, lowercase: bool = False) -> Tuple[str, int, int]:
    """Deterministic text normalization.

    Design notes:
    - Keep this function pure-ish (no global state, no I/O).
    - This style is easy to test and safe to reuse in pipelines.
    """
    if lowercase:
        text = text.lower()

    normalized = _collapse_whitespace(text)
    char_count = len(normalized)
    word_count = 0 if not normalized else len(normalized.split(" "))
    return normalized, char_count, word_count


# -----------------------------
# Routes (API boundary)
# -----------------------------

@app.post("/normalize-text", response_model=NormalizeTextResponse)
def post_normalize_text(payload: NormalizeTextRequest) -> NormalizeTextResponse:
    normalized, char_count, word_count = normalize_text(
        text=payload.text,
        lowercase=payload.lowercase,
    )
    return NormalizeTextResponse(
        normalized_text=normalized,
        char_count=char_count,
        word_count=word_count,
    )

In [None]:
print("app.py size:", (project_root / "app.py").stat().st_size, "bytes")

### How to run the API (terminal)

From the folder that contains `app.py`:
```bash
uvicorn app:app --reload
```

Then open:
- Swagger UI: `http://127.0.0.1:8000/docs`
- ReDoc UI: `http://127.0.0.1:8000/redoc`
- OpenAPI JSON: `http://127.0.0.1:8000/openapi.json`

**Why this matters:**
- FastAPI generates docs from your schemas. This means our API is now self-describing! :D
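
To see the raw material those docs are built from, you can ask a model for its JSON Schema. This sketch assumes Pydantic v2, where the method is `model_json_schema()` (Pydantic v1 used `.schema()` instead); the model is a copy of the request schema from `app.py`.

```python
from pydantic import BaseModel


class NormalizeTextRequest(BaseModel):
    text: str
    lowercase: bool = False


# Assumes Pydantic v2: model_json_schema() returns a plain dict.
schema = NormalizeTextRequest.model_json_schema()

print("Required fields:", schema["required"])
for name, spec in schema["properties"].items():
    print(f"{name}: type={spec['type']}, default={spec.get('default', '<none>')}")
```

FastAPI embeds this kind of schema in `/openapi.json`, which is what Swagger UI and ReDoc render.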


In [None]:
# Quick in-notebook smoke test (no server needed): use TestClient to call the app directly.

from fastapi.testclient import TestClient
import importlib

app_module = importlib.import_module("app")
client = TestClient(app_module.app)

payload = {
    "text": "  Hello Python Academy Trainees!   ",
    "lowercase": False
}

resp = client.post("/normalize-text", json=payload)
print("Status:", resp.status_code)
print(resp.json())


## 5) API contract and validation

Now we’ll add a schema constraint:
- `text` must be at least **20 characters**.

Remember validation and error handling from our previous sessions:
- Upstream producers often send incomplete data.
- If you validate early, you avoid downstream processing of low-quality records.

FastAPI behavior:
- If the request body doesn’t match the schema, FastAPI returns **422 Unprocessable Entity**.
- This is a *contract-level* failure, not a business-rule failure.
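
The 422 originates in Pydantic: FastAPI validates the body against the model and converts the resulting `ValidationError` into the response. Here is a minimal sketch of that first step, outside of any web server. The model is a stand-in for the request schema with the 20-character constraint added, and `first_error_location` is a hypothetical helper used only for this demonstration.

```python
from pydantic import BaseModel, Field, ValidationError


class NormalizeTextRequest(BaseModel):
    # Contract-level constraint: at least 20 characters.
    text: str = Field(..., min_length=20)
    lowercase: bool = False


def first_error_location(payload: dict):
    """Return the location of the first validation error, or None if valid."""
    try:
        NormalizeTextRequest(**payload)
    except ValidationError as exc:
        return exc.errors()[0]["loc"]
    return None


print(first_error_location({"text": "Too short"}))
print(first_error_location({"text": "Hello Python Academy Trainees!"}))
```

In the HTTP response, FastAPI prefixes the location with `"body"`, which is why the error points at `["body", "text"]`.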

### Sample requests/responses:

**Request (valid)**
```json
{
  "text": "Hello Python Academy Trainees!",
  "lowercase": true
}
```

**Response (200)**
```json
{
  "normalized_text": "hello python academy trainees!",
  "char_count": 30,
  "word_count": 4
}
```

**Request (invalid: too short)**
```json
{
  "text": "Too short",
  "lowercase": false
}
```

**Response (422, abbreviated; the actual body may include extra fields such as `input` and `ctx`)**
```json
{
  "detail": [
    {
      "loc": ["body", "text"],
      "msg": "String should have at least 20 characters",
      "type": "string_too_short"
    }
  ]
}
```
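
Callers usually want a readable message rather than the raw `detail` list. Below is a small sketch of how a client might flatten it, assuming the body has the shape shown above; `summarize_validation_errors` is a hypothetical helper, not part of FastAPI.

```python
from typing import Any, Dict, List


def summarize_validation_errors(body: Dict[str, Any]) -> List[str]:
    """Turn a 422 response body into 'location: message' strings."""
    messages = []
    for error in body.get("detail", []):
        # loc is a path such as ["body", "text"]; join it into "body.text".
        location = ".".join(str(part) for part in error.get("loc", []))
        messages.append(f"{location}: {error.get('msg', '')}")
    return messages


sample_422 = {
    "detail": [
        {
            "loc": ["body", "text"],
            "msg": "String should have at least 20 characters",
            "type": "string_too_short",
        }
    ]
}

print(summarize_validation_errors(sample_422))
```

This helper only fits 422 bodies, where `detail` is a list of error objects; an `HTTPException` normally carries a plain string in `detail`.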


In [None]:
%%writefile ____
# Replace ____ on the first line with the correct path for the file.
# app.py with validation: enforces a minimum length of 20 characters for `text`.

import re
from typing import Tuple

from fastapi import FastAPI
from pydantic import BaseModel, Field


app = FastAPI(title="Module 1 - Session 1 API", version="1.0.0")


# -----------------------------
# Schemas (Pydantic models)
# -----------------------------

class NormalizeTextRequest(BaseModel):
    # Contract-level constraint: reject short inputs early
    text: str = Field(..., min_length=20)
    lowercase: bool = False


class NormalizeTextResponse(BaseModel):
    normalized_text: str
    char_count: int
    word_count: int


# -----------------------------
# Services (business logic)
# -----------------------------

def _collapse_whitespace(value: str) -> str:
    return re.sub(r"\s+", " ", value).strip()


def normalize_text(text: str, lowercase: bool = False) -> Tuple[str, int, int]:
    """Deterministic text normalization."""
    if lowercase:
        text = text.lower()

    normalized = _collapse_whitespace(text)
    char_count = len(normalized)
    word_count = 0 if not normalized else len(normalized.split(" "))
    return normalized, char_count, word_count


# -----------------------------
# Routes (API boundary)
# -----------------------------

@app.post("/normalize-text", response_model=NormalizeTextResponse)
def post_normalize_text(payload: NormalizeTextRequest) -> NormalizeTextResponse:
    normalized, char_count, word_count = normalize_text(
        text=payload.text,
        lowercase=payload.lowercase,
    )
    return NormalizeTextResponse(
        normalized_text=normalized,
        char_count=char_count,
        word_count=word_count,
    )

In [None]:
# Reload module to reflect changes
import importlib
import app as app_module
importlib.reload(app_module)

from fastapi.testclient import TestClient
client = TestClient(app_module.app)

valid = {"text": "Hello Python Academy Trainees!", "lowercase": True}
invalid = {"text": "Too short", "lowercase": False}

r1 = client.post("/normalize-text", json=valid)
r2 = client.post("/normalize-text", json=invalid)

print("Valid status:", r1.status_code)
print("Valid body:", r1.json())
print("Invalid status:", r2.status_code)
print("Invalid body (detail keys):", list(r2.json().keys()))
print("Invalid detail sample:", r2.json().get("detail", [])[:1])

## 6) Testing basics (pytest)

Testing an API:
- You’re testing the **contract** and the **deterministic behavior**.
- If your downstream pipeline calls this service, tests prevent “silent changes”.

We’ll add 3 tests:
1. Happy path returns 200 + expected fields
2. Invalid payload returns 422
3. `lowercase=true` affects output

### How to run (terminal)
```bash
pytest -q
```
If that fails because `pytest` is not in PATH:
```bash
python -m pytest -q
```

### Minimal `tests/test_api.py` content
```python
from fastapi.testclient import TestClient
import app

client = TestClient(app.app)

def test_normalize_text_happy_path():
    payload = {"text": "Hello Data Engineering world!", "lowercase": False}
    resp = client.post("/normalize-text", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert "normalized_text" in data
    assert "char_count" in data
    assert "word_count" in data

def test_normalize_text_invalid_payload_returns_422():
    payload = {"text": "Too short", "lowercase": False}
    resp = client.post("/normalize-text", json=payload)
    assert resp.status_code == 422

def test_normalize_text_lowercase_changes_output():
    payload = {"text": "Hello Data Engineering world!", "lowercase": True}
    resp = client.post("/normalize-text", json=payload)
    assert resp.status_code == 200
    assert resp.json()["normalized_text"].startswith("hello")
```
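
Because the service logic lives in a plain function, it can also be tested directly, without an HTTP client. A sketch of such unit tests follows. `normalize_text` is repeated here (copied from `app.py`) so the block runs on its own; in the project you would write `from app import normalize_text` instead. The test names are illustrative.

```python
import re
from typing import Tuple


def normalize_text(text: str, lowercase: bool = False) -> Tuple[str, int, int]:
    """Copy of the service function from app.py, repeated for self-containment."""
    if lowercase:
        text = text.lower()
    normalized = re.sub(r"\s+", " ", text).strip()
    word_count = 0 if not normalized else len(normalized.split(" "))
    return normalized, len(normalized), word_count


def test_collapses_whitespace_and_counts():
    result = normalize_text("  Hello   Python\tAcademy  ")
    assert result == ("Hello Python Academy", 20, 3)


def test_lowercase_flag():
    assert normalize_text("Hello World", lowercase=True) == ("hello world", 11, 2)


def test_whitespace_only_input_has_zero_words():
    assert normalize_text("   \n ") == ("", 0, 0)
```

Note that the 20-character minimum is enforced by the request schema, not by the service function, so short strings are valid inputs at this level.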


In [None]:
%%writefile ____
# Replace ____ on the first line with the path to the test file (tests/test_api.py).

import app
from fastapi.testclient import TestClient

client = TestClient(app.app)


def test_normalize_text_happy_path():
    payload = {"text": "Hello Data Engineering world!", "lowercase": False}
    resp = client.post("/normalize-text", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert "normalized_text" in data
    assert "char_count" in data
    assert "word_count" in data
    assert isinstance(data["char_count"], int)
    assert isinstance(data["word_count"], int)


def test_normalize_text_invalid_payload_returns_422():
    payload = {"text": "Too short", "lowercase": False}
    resp = client.post("/normalize-text", json=payload)
    assert resp.status_code == 422


def test_normalize_text_lowercase_changes_output():
    payload = {"text": "Hello Data Engineering world!", "lowercase": True}
    resp = client.post("/normalize-text", json=payload)
    assert resp.status_code == 200
    assert resp.json()["normalized_text"].startswith("hello")

In [None]:
# Run pytest from within the notebook (mirrors the terminal command); alternatively, run `!python -m pytest -q` in a cell

import sys
import subprocess

result = subprocess.run(
    [sys.executable, "-m", "pytest", "-q"],
    capture_output=True,
    text=True,
)

print("pytest exit code:", result.returncode)
print("--- stdout ---")
print(result.stdout)
print("--- stderr ---")
print(result.stderr)

assert result.returncode == 0, "Pytest failed. Read stdout/stderr above."


## 7) Lab (30–40 min)

You will extend the API with a second endpoint and add business-rule validation + tests.

### Task A: Add endpoint `POST /keywords-simple`

**Input schema**
- `KeywordsRequest(text: str, top_k: int=5)`

**Output schema**
- `KeywordsResponse(keywords: list[str])`

**Baseline keyword extraction (deterministic)**
1. Lowercase
2. Remove punctuation
3. Split on whitespace
4. Count frequency
5. Return top `top_k` tokens
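
The five steps above can be sketched with the standard library alone. This is one possible baseline, not the reference solution: the tie-breaking rule (alphabetical among equal counts) and the use of `string.punctuation` (ASCII punctuation only) are assumptions made here to keep the output deterministic.

```python
import string
from collections import Counter
from typing import List

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def extract_keywords_simple(text: str, top_k: int = 5) -> List[str]:
    """Return the top_k most frequent tokens, ties broken alphabetically."""
    lowered = text.lower()                          # 1. lowercase
    cleaned = lowered.translate(_PUNCTUATION_TABLE)  # 2. remove punctuation
    tokens = cleaned.split()                        # 3. split on whitespace
    counts = Counter(tokens)                        # 4. count frequency
    # 5. sort by descending count, then by token, so ties are stable
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:top_k]]


sample = "Data Engineering is engineering data. Data quality matters a lot."
print(extract_keywords_simple(sample, top_k=3))  # ['data', 'engineering', 'a']
```

The explicit sort key is what makes the result reproducible: without it, tokens with equal counts come back in whatever order they were first seen.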

> Data Engineering mindset: this is a tiny “feature extraction” stage.

### Task B: Add business-rule validation
- If `top_k < 1` or `top_k > 20`, raise `HTTPException(400)` with a clear message.

**Important:** This is different from schema validation (422).
- 422 = request body does not satisfy the contract schema.
- 400 = request is well-formed but violates a business rule.

### Task C: Add tests
- Happy path: returns 200 and a list of at most `top_k` keywords
- Invalid `top_k` (0 or 21): returns 400

### Student placeholders
Below we write a **starter template** to `lab_starter.txt` so you can copy/paste into `app.py` and `tests/test_api.py`.
You do not need to run anything in this section until you implement the endpoint.

### Common mistakes
- Forgetting to import `HTTPException`.
- Returning a plain dict instead of a Pydantic response model.
- Using non-deterministic ordering (fix by sorting ties consistently).


In [None]:
%%writefile lab_starter.txt

# Write lab starter guidance to disk (does not affect the running code yet)

r"""LAB STARTER (copy/paste hints)

1) In app.py:
- Add Pydantic models: KeywordsRequest, KeywordsResponse
- Add a service function: extract_keywords_simple(text: str, top_k: int) -> list[str]
- Add route: POST /keywords-simple
- Add business-rule validation for top_k using HTTPException(400)

2) In tests/test_api.py:
- Add two tests:
  a) POST /keywords-simple happy path
  b) POST /keywords-simple top_k invalid returns 400

Suggested happy-path payload:
{
  "text": "Data Engineering is engineering data. Data quality matters a lot.",
  "top_k": 3
}

Expected response shape:
{
  "keywords": ["data", "engineering", "..."]
}
"""

In [None]:
# Reload app module and run pytest again (should pass once your lab solution is implemented)

import importlib
import app as app_module
importlib.reload(app_module)

import sys
import subprocess

result = subprocess.run(
    [sys.executable, "-m", "pytest", "-q"],
    capture_output=True,
    text=True,
)

print("pytest exit code:", result.returncode)
print("--- stdout ---")
print(result.stdout)
print("--- stderr ---")
print(result.stderr)

assert result.returncode == 0, "Pytest failed. Read stdout/stderr above."


## Wrap-up

### Key takeaways
- An API is a **data contract boundary**: it stabilizes how upstream and downstream systems interact.
- Pydantic schemas give you **typed contracts**, **validation**, and **auto-generated docs**.
- Keep route handlers **thin**; move deterministic logic to **services**.
- FastAPI returns **422** automatically for schema validation errors; **business-rule errors (400)** are raised explicitly in your code with `HTTPException`.
- Tests protect the contract: status codes + response shape + critical behavior.

### Next session
- Error handling patterns
- Logging and debug visibility
- Configuration (env vars, settings)
- Stronger project layout for multi-module services


> Content created by [**Carlos Cruz-Maldonado**](https://www.linkedin.com/in/carloscruzmaldonado/).  
> I am available to answer any questions or provide further assistance.   
> Feel free to reach out to me at any time.