<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/119_Error_Handling_Introduction_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 13) Assertions vs exceptions

* `assert condition, "message"` is for **internal invariants** while developing/testing.
* Assertions can be disabled with optimizations, so **do not** rely on them for user-facing validation. Use `raise ValueError(...)` instead.


This is one of those “subtle but important” distinctions. Let’s slow down and compare **what they are**, **what they mean**, and **when to use each**.

---

# 1. what they are

### **exception**

* an **object** python raises to signal an error.
* examples: `ValueError`, `TypeError`, `TimeoutError`.
* meant for **real error handling** in production.
* you (or your caller) can catch it with `try/except`.

### **assertion**

* a **debugging aid** built into python:

  ```python
  assert condition, "message"
  ```
* if `condition` is false → raises `AssertionError`.
* but python can be run with optimizations (`python -O`), and then **all asserts are stripped out**.
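
To see the stripping for yourself, you can run the same one-line program with and without `-O`. The sketch below is only a demonstration: the `run_snippet` helper and the `SNIPPET` string are made up for this example, and it assumes the `PYTHONOPTIMIZE` environment variable is not set.

```python
import subprocess
import sys

# A one-line program whose only statement is a failing assert.
SNIPPET = "assert False, 'invariant broken'"

def run_snippet(*flags):
    """Run SNIPPET in a fresh interpreter and return its exit code."""
    proc = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )
    return proc.returncode

normal_exit = run_snippet()         # assert runs, AssertionError, non-zero exit
optimized_exit = run_snippet("-O")  # assert is compiled away, exit code 0

print("normal:", normal_exit, "optimized:", optimized_exit)
```

The second run exits cleanly even though the condition is false, which is exactly why an `assert` cannot be your only line of defense for user-facing checks.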

---

# 2. what they *mean*

### exceptions

> “Something went wrong with the program’s **inputs, outputs, or environment**. The caller/user/system needs to know and decide what to do.”

### assertions

> “I, the developer, believe this should **always be true inside my code**. If not, it’s a bug in my program.”

---

# 3. when to use which

### choose an **exception** when:

* validating **user input** (e.g., `k must be between 1 and 50`).
* checking **external data** (e.g., JSON missing a field).
* handling **runtime failures** (timeouts, missing files).
* you need the error to **always exist in production**.

### choose an **assertion** when:

* documenting assumptions about your own code logic.
* catching **developer mistakes early**.
* testing invariants (“this should never happen if the code is correct”).
* the check is **safe to remove** in optimized runs.

---

# 4. analogy

* **exception** = airport security scanner stopping you: *“your passport is invalid — you cannot board.”*
* **assertion** = pilot’s checklist item: *“flaps should be down at takeoff — if not, something is wrong with my plane.”*

exceptions are for **external contracts**, assertions are for **internal sanity checks**.

---

# 5. example side by side

```python
# EXCEPTION: input validation
def normalize_score(x):
    if not isinstance(x, (int, float)):
        raise TypeError("score must be a number")    # input contract
    if not (0 <= x <= 1):
        raise ValueError("score must be between 0 and 1")
    return x

# ASSERTION: internal invariant
def compute_ratio(a, b):
    result = a / b
    assert 0 <= result <= 1, "ratio out of expected range"  # dev sanity
    return result
```

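* `normalize_score("oops")` → raises a **TypeError** (real validation).
* `compute_ratio(1, 2)` → returns `0.5`; the assertion stays silent. `compute_ratio(10, 2)` → raises an **AssertionError**, because `5.0` breaks the invariant, which tells you the calling code (your own) has a bug.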

---

# 6. in agent development

* **exceptions** = guard rails at **boundaries** (tool inputs/outputs, API payloads). They keep the agent safe.
* **assertions** = guard rails in your **own logic** (“a tool choice must always map to a callable”). Helps you debug agent code.
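
Here is a small sketch of how the two can sit side by side in a tool dispatcher. The `TOOLS` registry, the two toy tools, and `run_tool` are all hypothetical names invented for illustration; the point is only where each kind of check goes.

```python
from typing import Callable, Dict

def search(query: str) -> str:
    return f"results for {query!r}"

def summarize(text: str) -> str:
    return text[:20]

# Hypothetical tool registry owned by our own code.
TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "summarize": summarize}

def run_tool(tool_name: str, argument: str) -> str:
    # Boundary: tool_name comes from the LLM/planner, so validate with exceptions.
    if not isinstance(tool_name, str):
        raise TypeError("tool_name must be a string")
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name!r}")

    tool = TOOLS[tool_name]
    # Internal invariant: the registry we built should only ever hold callables.
    assert callable(tool), f"registry entry for {tool_name!r} is not callable"
    return tool(argument)

print(run_tool("search", "cats"))
```

A bad tool name from the model is an expected runtime event, so it gets a `ValueError` that survives `-O`. A non-callable registry entry can only come from a mistake in your own setup code, so an `assert` is enough.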

---

✅ **summary**

* exceptions = part of the *program’s contract* with the outside world.
* assertions = part of the *developer’s contract* with themselves.




# 14) Testing error behavior

With `pytest`:

```py
import pytest

def test_normalize_score_bad_type():
    with pytest.raises(TypeError):
        normalize_score("oops")
```

This checks that `normalize_score` raises the **right** error type; checking the **message** as well needs `match=...`, covered below.

This is where error handling meets **testing**, and it matters a lot for building reliable agents. Let’s break it down.

---

# what you should learn here

## 1. errors are **part of your contract**

When you write functions, you’re not only defining *what they return on success* — you’re also defining *what errors they raise on bad inputs or bad states*.

Testing isn’t just about outputs, it’s also about making sure the **right errors** are raised for the **right reasons**.

---

## 2. pytest has tools for testing errors

```python
import pytest

def test_normalize_score_bad_type():
    with pytest.raises(TypeError):
        normalize_score("oops")
```

* `with pytest.raises(TypeError):` → test will **pass only if** the block raises `TypeError` (or a subclass of it).
* if no error is raised the test fails; an unrelated exception type propagates and also fails the test.

---

## 3. test **messages** too

Sometimes the error type is not enough; you also want to check the message for clarity:

```python
def test_normalize_score_out_of_range():
    with pytest.raises(ValueError, match="between 0 and 1"):
        normalize_score(42)
```

Now the test checks both:

* correct error type (`ValueError`)
* correct error message (must contain “between 0 and 1”; `match` is applied as a regular-expression search on the message).

---

## 4. why this matters for agents

* Agents often deal with **unreliable input/output** (LLMs, APIs, users).
* Your tools and pipelines should **fail in predictable, testable ways**.
* If you change validation logic later, tests will catch regressions (e.g., you forgot to check for empty strings).

---

## 5. a mini checklist for testing errors

* Test **bad types** (e.g., str instead of int).
* Test **bad values** (e.g., out of range).
* Test **edge cases** (0, empty string, None).
* Assert both **type of exception** and **message clarity**.
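
One way to cover this checklist without writing a separate test per case is `pytest.mark.parametrize`. The sketch below repeats `normalize_score` from section 13 so it runs standalone; the specific bad inputs are just example picks, not an exhaustive list.

```python
import pytest

def normalize_score(x):
    # Same validator as in section 13, repeated so the example is self-contained.
    if not isinstance(x, (int, float)):
        raise TypeError("score must be a number")
    if not (0 <= x <= 1):
        raise ValueError("score must be between 0 and 1")
    return x

@pytest.mark.parametrize(
    "bad_input, expected_error, message_part",
    [
        ("oops", TypeError, "must be a number"),   # bad type
        (None, TypeError, "must be a number"),     # edge case: None
        ("", TypeError, "must be a number"),       # edge case: empty string
        (42, ValueError, "between 0 and 1"),       # bad value: too high
        (-0.1, ValueError, "between 0 and 1"),     # bad value: too low
    ],
)
def test_normalize_score_rejects(bad_input, expected_error, message_part):
    # Checks both the exception type and part of the message.
    with pytest.raises(expected_error, match=message_part):
        normalize_score(bad_input)

@pytest.mark.parametrize("good_input", [0, 1, 0.5])
def test_normalize_score_accepts_edges(good_input):
    # Boundary values 0 and 1 must be accepted, not rejected.
    assert normalize_score(good_input) == good_input
```

Each tuple becomes its own test case in the report, so a regression shows you exactly which input stopped being rejected.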

---

## 6. quick example (tool boundary)

```python
import pytest


def build_search_query(q: str, k: int):
    if not isinstance(q, str) or not q.strip():
        raise ValueError("q must be a non-empty string")
    if not isinstance(k, int) or not (1 <= k <= 50):
        raise ValueError("k must be an int 1..50")
    return {"q": q.strip(), "k": k}

def test_bad_query_empty():
    with pytest.raises(ValueError, match="non-empty"):
        build_search_query("   ", 10)

def test_bad_k_too_high():
    with pytest.raises(ValueError, match=r"1\.\.50"):  # escape dots: match is a regex
        build_search_query("cats", 100)
```

These tests make sure your **tool adapter is safe**: the agent won’t accidentally build and send garbage requests downstream.

---

✅ **summary**

* errors are **expected outcomes**, not just accidents.
* test them like you test return values.
* use `pytest.raises` (and `match=...`) to check both **error type** and **error message**.
* for agents: this is your safety net — tools break gracefully, not unpredictably.



# 15) Concurrency & async quick notes

* In `asyncio.gather`, set `return_exceptions=True` to collect failures without crashing all tasks:

```py
results = await asyncio.gather(*tasks, return_exceptions=True)
for r in results:
    if isinstance(r, Exception):
        log(r)
```

* Always add **timeouts** around awaits (`asyncio.wait_for(coro, timeout)`).

This is the **concurrency angle**, which is especially relevant for agents since they often call multiple tools / APIs at once. Here’s what to take away from this section:

---

# 1. async tasks fail independently

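When you launch multiple async tasks with `asyncio.gather`, the **default** is: the first exception raised by *any* task propagates immediately to the code awaiting `gather`. The other tasks are *not* cancelled; they keep running in the background, but the caller never sees their results.

That’s bad for agents: you don’t want *one flaky API call* to throw away the results of *all* the other tool calls.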

### safer pattern:

```python
results = await asyncio.gather(*tasks, return_exceptions=True)
for r in results:
    if isinstance(r, Exception):
        log(r)       # record failure
    else:
        use(r)       # process good result
```

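→ every task gets you *either* a value *or* an exception object you can log/handle. (On Python 3.8+, a cancelled task shows up as `asyncio.CancelledError`, which derives from `BaseException`, so test for `BaseException` if you also want to catch cancellations.)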
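
For a version you can run end to end, here is a minimal sketch with three fake tools, one of which fails. `fake_tool` and `fan_out` are made-up names for this demo; real tools would do actual I/O instead of `asyncio.sleep`.

```python
import asyncio

async def fake_tool(name: str, fail: bool = False) -> str:
    # Stand-in for a real tool call; sleeps briefly instead of doing I/O.
    await asyncio.sleep(0.01)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name} ok"

async def fan_out():
    calls = [fake_tool("search"), fake_tool("weather", fail=True), fake_tool("news")]
    # Results come back in the same order as the calls, failures included.
    results = await asyncio.gather(*calls, return_exceptions=True)

    values, failures = [], []
    for r in results:
        if isinstance(r, BaseException):
            failures.append(r)   # record failure
        else:
            values.append(r)     # process good result
    return values, failures

values, failures = asyncio.run(fan_out())
print(values)
print([str(f) for f in failures])
```

The failing `weather` call ends up in `failures`, while the two healthy calls still deliver their values.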

---

# 2. timeouts are critical

Async code can hang forever (waiting on a slow API or socket). That’s poison for agents: they run in loops, so one stuck await stalls the whole run.

### fix:

```python
import asyncio

result = await asyncio.wait_for(coro(), timeout=5.0)
```

* If the coroutine doesn’t finish in 5 seconds → `wait_for` cancels it and raises `asyncio.TimeoutError` (an alias of the built-in `TimeoutError` since Python 3.11).
* Combine with your retry/backoff logic to make agents more resilient.
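
As a rough illustration of combining a deadline with retry/backoff, the sketch below retries a call only when it times out. `slow_tool` and `call_with_deadline` are hypothetical helpers, and the tiny delays are chosen only so the demo finishes quickly.

```python
import asyncio

async def slow_tool(delay: float) -> str:
    # Stand-in for a tool call that takes `delay` seconds.
    await asyncio.sleep(delay)
    return "done"

async def call_with_deadline(delay, timeout, attempts=3, base=0.01):
    """Give each attempt its own deadline; back off between timed-out attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(slow_tool(delay), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise                                # out of attempts: let the caller decide
            await asyncio.sleep(base * 2 ** (attempt - 1))  # exponential backoff

async def expect_timeout() -> bool:
    try:
        await call_with_deadline(delay=1.0, timeout=0.02, attempts=2)
    except asyncio.TimeoutError:
        return True
    return False

fast = asyncio.run(call_with_deadline(delay=0.01, timeout=1.0))
timed_out = asyncio.run(expect_timeout())
print(fast, timed_out)
```

Note that the deadline here is per attempt. If you need a budget for the whole step, wrap the outer call in another `asyncio.wait_for`.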

---

# 3. what this means for agent development

* **Parallelism**: Your agent can call multiple tools / fetch multiple docs at once.
* **Resilience**: With `return_exceptions=True`, one broken tool doesn’t take down the whole batch.
* **Bounded latency**: With `asyncio.wait_for`, no step hangs forever — every tool has a deadline.
* **Integration with your error boundary (#11)**:

  * Treat each result as a `StepResult` (`ok/error/retryable`).
  * Log failures, retry if transient, fallback otherwise.
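
To make the last bullet concrete, here is one possible shape for turning `gather` outcomes into step results. The `StepResult` dataclass below is an assumption about what the error boundary from #11 looks like (only the `ok/error/retryable` fields come from the text); `to_step_result`, `tool`, and `run_batch` are invented for the demo.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class StepResult:
    # Assumed shape; adapt to whatever your own boundary returns.
    ok: bool
    value: Any = None
    error: Optional[str] = None
    retryable: bool = False

def to_step_result(outcome: Any) -> StepResult:
    if isinstance(outcome, asyncio.TimeoutError):
        return StepResult(ok=False, error="timeout", retryable=True)   # transient
    if isinstance(outcome, BaseException):
        return StepResult(ok=False, error=f"{type(outcome).__name__}: {outcome}")
    return StepResult(ok=True, value=outcome)

async def tool(name: str, delay: float, fail: bool = False) -> str:
    await asyncio.sleep(delay)
    if fail:
        raise ValueError(f"{name}: bad payload")
    return f"{name} ok"

async def run_batch():
    calls = [
        asyncio.wait_for(tool("search", 0.01), timeout=0.5),
        asyncio.wait_for(tool("slow", 1.0), timeout=0.05),              # will time out
        asyncio.wait_for(tool("broken", 0.01, fail=True), timeout=0.5),
    ]
    outcomes = await asyncio.gather(*calls, return_exceptions=True)
    return [to_step_result(o) for o in outcomes]

step_results = asyncio.run(run_batch())
for r in step_results:
    print(r)
```

The planner then only ever sees `StepResult` objects and can retry the ones marked `retryable` without inspecting exception types itself.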

---

# 4. practical checklist

* Use `asyncio.gather(..., return_exceptions=True)` any time you fan out multiple tool calls.
* Always guard long awaits with `asyncio.wait_for`.
* Classify timeouts as **retryable errors** in your boundary.
* Log all exceptions before discarding or retrying.
* If mixing sync + async tools, wrap sync ones in `loop.run_in_executor` (or `asyncio.to_thread` on Python 3.9+) so they don’t block the event loop.
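
The last item is easy to get wrong, so here is a small sketch of mixing one blocking tool with one async tool in the same fan-out. `blocking_tool` and `async_tool` are placeholders; passing `None` as the executor uses the loop's default thread pool.

```python
import asyncio
import time

def blocking_tool(x: int) -> int:
    # Synchronous tool (think: a client library without async support).
    time.sleep(0.05)
    return x * 2

async def async_tool(x: int) -> int:
    await asyncio.sleep(0.05)
    return x + 1

async def main():
    loop = asyncio.get_running_loop()
    return await asyncio.gather(
        # Runs in a worker thread so the event loop stays responsive.
        loop.run_in_executor(None, blocking_tool, 10),
        asyncio.wait_for(async_tool(10), timeout=1.0),
        return_exceptions=True,
    )

mixed_results = asyncio.run(main())
print(mixed_results)
```

Calling `blocking_tool(10)` directly inside `main` would also work, but it would freeze every other task for the duration of the sleep.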

---

✅ **takeaway**

* **without `return_exceptions`** → the first failure propagates and you lose the other results.
* **with `return_exceptions=True`** → you control error handling per task.
* **with timeouts** → you control how long the agent will wait.




A lot of “no-code agent builders” emphasize *ease of assembly* (drag-and-drop flows, connect tools fast), but they often don’t expose or encourage the **defensive engineering practices** covered in these notes (error boundaries, retries, validation, timeouts, logging, context).

---

## what’s usually missing in no-code systems

* **error boundaries** → one broken tool call can crash the whole run.
* **timeouts** → a stuck API call can freeze the entire agent.
* **retry/backoff** → transient errors (network hiccups, 5xxs) kill tasks instead of being retried.
* **input validation** → tools happily accept garbage until something fails downstream with a vague error.
* **logging with context** → you get “tool failed” but not *why* or *where*.
* **classification (retryable vs fatal)** → everything is treated the same, so the agent can’t make smart recovery choices.

The result: flashy demos that work on **happy paths** but are brittle in real-world environments.

---

## why engineering-level error handling matters

Agents are **control loops** sitting on top of unpredictable components:

* users (messy input),
* LLMs (hallucinate, mis-format),
* tools/APIs (timeouts, throttling, schema drift).

Without careful error handling, one bad piece propagates chaos. With the patterns you’re learning:

* the agent stays **alive**,
* failures become **data** the planner can reason about,
* developers get **tracebacks/logs** for debugging.

That’s the difference between a “toy demo agent” and a **production-grade agent**.

---

## analogy

no-code agents are like building with Lego blocks without glue: looks great, but the moment you bump it, pieces fall off. adding the engineering practices you’re learning is like **reinforcing the structure** so it survives stress in the wild.

---

## my take

* no-code tools are fantastic for **exploration, prototyping, and learning**.
* but for **real systems** (where uptime, reliability, and debugging matter), you need the Python-level skills you’re practicing:

  * tight `try/except` boundaries,
  * retries with backoff,
  * timeouts,
  * validation at boundaries,
  * structured results (`ok/error`).

---

✅ **bottom line**: most no-code agent builders gloss over this, which makes them fragile. Learning these fundamentals puts you ahead, because you’ll be able to take what they generate and **harden it** into something that works reliably.





# 16) Small, realistic agent step example

```py
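import json, time, logging

# Base classes are assumed to match the hierarchy from the earlier sections;
# repeated here so the snippet runs on its own.
class RetryableToolError(Exception): pass
class NonRetryableToolError(Exception): pass

class ToolHTTPError(NonRetryableToolError): pass
class ToolTimeout(RetryableToolError): pass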

def call_tool_raw():
    # imagine this actually calls an API and returns (status, text)
    return 200, '{"answer": 42}'

def call_tool():
    t0 = time.time()
    try:
        status, text = call_tool_raw()
        if status >= 500:
            raise ToolTimeout(f"server 5xx: {status}")
        if status >= 400:
            raise ToolHTTPError(f"client 4xx: {status}")

        try:
            payload = json.loads(text)
        except json.JSONDecodeError as e:
            raise NonRetryableToolError("bad JSON from tool") from e

        if "answer" not in payload:
            raise NonRetryableToolError("missing 'answer' field")
        return payload["answer"]

    except RetryableToolError as e:
        logging.warning("Retryable tool failure: %s", e, exc_info=True)
        raise
    except NonRetryableToolError as e:
        logging.error("Non-retryable tool failure: %s", e, exc_info=True)
        raise
    finally:
        logging.info("tool call took %.3fs", time.time() - t0)
```

Wrap with the `retry` helper from section 9 when you invoke it. This snippet is a mini “all-in-one” of the patterns you’ve been learning. Here’s what to focus on and why it matters for production-ish agents.

# What to notice (and emulate)

1. **Separate layers**

* `call_tool_raw()` = thin I/O layer (status, raw text).
* `call_tool()` = adapter/validator that turns raw stuff into **domain data** (an `int` answer) or raises meaningful exceptions.
  → Keep these separate so you can test or mock each one independently.

2. **Classify failures early**

* `status >= 500` → `ToolTimeout` (retryable).
* `status >= 400` → `ToolHTTPError` (non-retryable).
  → Turning HTTP categories into “retry?” decisions is the essence of robust agents.

3. **Validate outputs (schema)**

* Parse JSON; if bad → raise with `from e`.
* Check required key `"answer"` exists.
  → Never trust tool outputs; make postconditions explicit.

4. **Tight try/except scope**

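* The inner `try` wraps only `json.loads`, and the outer handlers name just the two domain exception types.
  → Unrelated bugs (say, a `KeyError` from a typo) propagate instead of being caught accidentally.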

5. **Right logging at the right place**

* `logging.warning` for retryable, `logging.error` for non-retryable; `exc_info=True` attaches traceback.
* `finally` logs latency.
  → You get observability without leaking internals to the agent/user.

6. **Reraise after logging**

* Boundary *logs* and *re-raises* domain exceptions; the caller (or your error boundary) decides retry/fallback.
  → Separation of concerns: this function doesn’t also implement the retry loop.

7. **Small, actionable error messages**

* “server 5xx: 503”, “bad JSON from tool”, “missing 'answer' field”.
  → Fast to read; easy to classify and test with `pytest.raises(..., match=...)`.

# How to use it (controller side)

```python
# combine with your backoff logic
answer = retry(call_tool, attempts=5, base=0.2,
               exceptions=(ToolTimeout,))
```

* Retry only **ToolTimeout** (transient).
* Do **not** retry `ToolHTTPError` (client faults).
* Wrap this call in your **error boundary** (StepResult) if you want pass/fail at the planner layer.
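
The `retry` helper itself lives in section 9 and is not shown here. Purely as a hypothetical stand-in with the same call signature (`attempts`, `base`, `exceptions`), something like the following would behave the way the bullets describe. The `sleep` parameter, `DemoTimeout`, and `flaky` are additions for the demo, not part of the original helper.

```python
import random
import time

def retry(fn, attempts=3, base=0.1, exceptions=(Exception,), sleep=time.sleep):
    """Hypothetical sketch: call fn(), retrying only the listed exception types."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise                              # attempts exhausted: re-raise the last error
            delay = base * 2 ** (attempt - 1)      # exponential backoff
            sleep(delay + random.uniform(0, delay))  # plus jitter

class DemoTimeout(Exception):
    """Stand-in for ToolTimeout so this block runs on its own."""

calls = {"n": 0}

def flaky():
    # Fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise DemoTimeout("server 5xx: 503")
    return 42

answer = retry(flaky, attempts=5, base=0.001, exceptions=(DemoTimeout,))
print(answer, "after", calls["n"], "calls")
```

Any exception type not listed in `exceptions` passes straight through on the first attempt, which is what keeps 4xx-style errors from being retried.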

# Common pitfalls to avoid

* Catching `Exception` and returning None (hides bugs).
* Skipping schema checks after JSON parse.
* Retrying 4xxs (wastes time/money).
* Huge `try` blocks (mask real failures).
* No latency logging (you won’t see slow drifts).

# Nice upgrades (optional)

* Add a tool name to logs: `logging.error("[search] ...")`.
* Include `status` and a short `text[:200]` preview in error messages (redact PII if needed).
* Make `call_tool_raw()` actually set timeouts and raise on connection errors so they map to `ToolTimeout`.
* Return structured data (e.g., `{"answer": int}`) not just a primitive, if you’ll add fields later.

# Tiny test sketch

```python
import json
import sys

import pytest

# Assumes call_tool, call_tool_raw and the exception classes live in this same module.
# monkeypatch.setattr needs the module object, not its name string.
THIS_MODULE = sys.modules[__name__]

def test_5xx_retryable(monkeypatch):
    def raw(): return 503, "busy"
    monkeypatch.setattr(THIS_MODULE, "call_tool_raw", raw)
    with pytest.raises(ToolTimeout, match="5xx"):
        call_tool()

def test_bad_json(monkeypatch):
    def raw(): return 200, "not-json"
    monkeypatch.setattr(THIS_MODULE, "call_tool_raw", raw)
    with pytest.raises(NonRetryableToolError, match="bad JSON"):
        call_tool()

def test_ok(monkeypatch):
    def raw(): return 200, json.dumps({"answer": 7})
    monkeypatch.setattr(THIS_MODULE, "call_tool_raw", raw)
    assert call_tool() == 7

# Mental model to keep

* **Raw I/O → classify → parse → validate → return**
* Log + time every call; retry only transients; escalate others with clear messages.

Master this shape and you can harden almost any tool call an agent makes.




# 17) A tiny checklist (tape this near your keyboard)

* Does my `try` only cover the **risky** lines?
* Am I catching the **specific** exception type?
* Did I provide a **useful message** or add context (`raise ... from e`)?
* Are resources closed (`with ...`)?
* For external calls: **timeout**, **retry** (transient only), **max attempts**, **backoff**.
* Do I log exceptions with a **traceback** (`logging.exception`)?
* At top-level agent steps: return a **structured error** `{ok, error, retryable}`.
* **Validate at boundaries (inputs & outputs).** Normalize first (strip/lower), then `TypeError`/`ValueError` for bad inputs; after external calls, **schema-check** the response (required keys/ranges).
* **Keep `try` blocks tiny.** Only wrap the line(s) that can fail; avoid hiding unrelated bugs.
* **Never use bare `except:` or catch `BaseException`.** Catch concrete types; reserve a final `except Exception as e:` only at top-level boundaries.
* **Always chain causes.** `raise Something("context") from e` so you keep the original traceback.
* **Prefer `with` over `finally` for resources.** Files, DB connections, HTTP sessions, locks, temp files.
* **Structured logging.** Include tool name, request/correlation id, and key params; consider JSON logs; redact secrets/PII.
* **Retry etiquette.** Exponential backoff **with jitter**; cap attempts and total time; honor `Retry-After` when present; **never** retry deterministic 4xx/validation errors.
* **Deadlines & cancellation.** Wrap awaits/calls with timeouts; propagate cancellations; budget time across a whole step/run.
* **Result contract at boundaries.** Convert exceptions to a small `{ok, value|error, retryable, took_s}` object; log full traceback before converting.
* **Custom exception hierarchy.** Domain types like `RetryableToolError` vs `NonRetryableToolError` make policy simple.
* **Concurrency safety.** For `asyncio.gather`, use `return_exceptions=True`; handle each result; protect critical sections with locks if needed.
* **Testing errors.** Use `pytest.raises(..., match="...")` for type + message; parametrize bad cases; include tests for timeouts/retries.
* **Feature flags & fallbacks.** Have a “plan B” tool or degraded mode when a step fails non-retryably.
* **Metrics & alerts.** Count successes/retries/failures per tool; alert on spikes or rising latency.
* **Assertions for invariants only.** Use `assert` for internal sanity (dev) and exceptions for user/input/contracts (prod).
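
Several of these items meet at the "result contract" boundary, so here is one minimal way it could look. This is a sketch under assumptions: `run_step` is a made-up name, the two exception classes are re-declared so the block runs standalone, and the dict keys follow the `{ok, value|error, retryable, took_s}` shape from the checklist.

```python
import logging
import time

class RetryableToolError(Exception):
    pass

class NonRetryableToolError(Exception):
    pass

def run_step(fn, *args, **kwargs):
    """Top-level boundary: never raises, always returns a small result dict."""
    t0 = time.perf_counter()
    try:
        result = {"ok": True, "value": fn(*args, **kwargs), "retryable": False}
    except RetryableToolError as e:
        logging.exception("retryable step failure")   # full traceback goes to the log
        result = {"ok": False, "error": str(e), "retryable": True}
    except Exception as e:                             # final catch-all, boundary only
        logging.exception("step failed")
        result = {"ok": False, "error": f"{type(e).__name__}: {e}", "retryable": False}
    result["took_s"] = round(time.perf_counter() - t0, 3)
    return result

def good():
    return 7

def transient():
    raise RetryableToolError("server 5xx: 503")

def buggy():
    raise KeyError("answer")

ok_result = run_step(good)
retry_result = run_step(transient)
bug_result = run_step(buggy)
print(ok_result, retry_result, bug_result, sep="\n")
```

The broad `except Exception` is acceptable here only because this is the outermost layer of a step; inside tools, keep catching concrete types as the rest of the checklist says.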



