<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/117_Error_Handling_Introduction_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# 0) What “errors” are and why exceptions exist

* **Bug**: your code is wrong. (e.g., divide by zero)
* **Exceptional situation**: something outside your control. (e.g., network down, file missing)
* Python uses **exceptions** to signal both. If an exception isn’t handled, the program stops and prints a **traceback** (the error stack).

> **Goal**: let *normal logic* stay clean and route *non-normal cases* to a safe place where you can decide to fix, retry, or report.

# 1) Reading a traceback (the most important habit)

## how to read a traceback

* **start at the bottom** → that’s the **exception type + message**.
  👉 tells you *what* went wrong.

* then **move upward one frame at a time** → these are the **calls that led there**.
  👉 tells you *where in your code* it happened, and who called it.

so you read it **bottom-up** to reconstruct the story of the error.

---

## example (your traceback)

bottom:

```
ZeroDivisionError: division by zero
```

→ problem type: `ZeroDivisionError`.

one frame up:

```
----> 2     return 1/0
```

→ happened inside `f`, at line 2.

one more frame up:

```
----> 4 f()
```

→ that call to `f()` in your notebook cell triggered it.

if there were 10 more frames (e.g. from libraries), you’d usually only care about the **last few that are *your* code**.

---

✅ **rule of thumb**:

* **bottom** = what happened
* **just above** = where it happened in your code
* **further up** = who called it




---

## 1. The **error type** (bottom line)

```
ZeroDivisionError: division by zero
```

👉 **focus here first**.

* `ZeroDivisionError` is the **exception type**. it tells you the “kind” of failure.
* `division by zero` is the **message**: extra info about what went wrong.

that pair (`ZeroDivisionError` + message) is always the *essential clue*.

---

## 2. The **call stack** (the lines above)

These lines show *where the error came from*, listed with the most recent call last:

* **last thing executed** → at the bottom
* **who called it** → further up

example from your traceback:

```
----> 4 f()
```

means: line 4 in your code tried to run `f()`.

```
----> 2     return 1/0
```

means: inside `f()`, line 2 tried to do `1/0`.

so the error happened inside `f`, specifically at `return 1/0`.

---

## 3. What you can **ignore (at first)**

* the “Traceback (most recent call last)” header → it carries no detail about your error; it only reminds you that the most recent call is at the bottom.
* paths like `/tmp/ipython-input-428337117.py` → these are notebook-specific filenames, not important for logic.
* repeated boilerplate → jupyter/ipython often repeats cells and internal machinery; focus on your own function lines.

---

## 4. What you should **always read**

1. the **exception type & message** (bottom line).
2. the **last 1–2 frames** of your own code (not the whole giant stack if it’s library calls).

that gives you:

* **what** went wrong (`ZeroDivisionError`)
* **where** in *your* code it happened (`return 1/0` in `f`)

---

✅ **summary habit**:
when you see a traceback, jump straight to the **bottom line** → learn the error type & message. then look just 1–2 frames above to see *your code line* where it triggered. ignore system/noise frames unless you’re debugging deep library code.
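
The same bottom-up reading can be done in code. This is a minimal sketch, assuming the toy function `f` used in this section; it relies on the standard-library `traceback` module to capture the traceback as text instead of letting Python print it.

```python
import traceback

def f():
    return 1 / 0

try:
    f()
except ZeroDivisionError as exc:
    # Full traceback text, the same text Python would print if unhandled.
    tb_text = traceback.format_exc()
    # Bottom line: exception type + message.
    bottom_line = tb_text.strip().splitlines()[-1]
    # Frames are ordered oldest call first, most recent call last.
    frames = traceback.extract_tb(exc.__traceback__)
    innermost = frames[-1]

print(bottom_line)      # ZeroDivisionError: division by zero
print(innermost.name)   # f
```

`bottom_line` answers *what* went wrong, and `innermost` is the frame *where* it happened.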




```python
def f():
    return 1/0

f()
```

```
ZeroDivisionError: division by zero
```


# 2) The basic `try / except / else / finally`
```py
try:
    x = int("42")          # risky code
except ValueError:
    x = 0                  # recovery path for a specific error
else:
    print("Converted!")    # runs only if no exception
finally:
    print("Always runs")   # cleanup (close files, release locks)
```

**Guidelines**

* Catch the **narrowest** exception you can (e.g., `ValueError`, not plain `Exception`) unless you’re at a top-level boundary.
* Put only the **risky line(s)** inside `try`, not the whole function.

---

## 1. **Exception**

* an **object** in python that represents an error or unusual condition.
* every exception is an instance of a class (e.g., `ValueError`, `TypeError`, `ZeroDivisionError`).
* python raises (throws) exceptions when something goes wrong.

example:

```python
1 / 0
```

raises an **exception**:

```
ZeroDivisionError: division by zero
```

---

## 2. **except**

* a **keyword** in python.
* part of the `try / except` statement.
* it tells python *how to catch and handle* exceptions of a certain type.

example:

```python
try:
    x = int("not a number")
except ValueError:
    print("Oops, that wasn’t a number!")
```

here:

* the `ValueError` **exception** is raised.
* the `except` block **catches** it.

---

## 3. how they connect

* **exception** = the “thing” that happened.
* **except** = the “door” that can catch it and decide what to do.

---

✅ think of it like this:

* exceptions are **thrown** by python when it sees a problem.
* `except` is you saying: “if that kind of problem is thrown, I know how to handle it.”




This is the core pattern. let’s slow down and unpack each piece of `try / except / else / finally` so you know exactly when each block runs and why it’s useful.

---

## 1. The `try` block

this is the **risky zone** — code that *might* raise an exception.

```python
try:
    x = int("42")          # risky code
```

* python runs everything in here.
* if no exception → it moves to the `else` block (if present).
* if an exception happens → it jumps immediately to the matching `except`.
* any code *after* the error inside `try` is skipped.
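
A small sketch of the "skipped" behavior described above. The `steps` list is only there to record which lines actually ran.

```python
steps = []

try:
    steps.append("before")
    value = int("not a number")   # raises ValueError here
    steps.append("after")         # skipped: control has already left the try block
except ValueError:
    steps.append("handled")

print(steps)   # ['before', 'handled']
```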

---

## 2. The `except` block(s)

this is the **recovery path**.

```python
except ValueError:
    x = 0
```

* only runs if the `try` raised that specific error type (`ValueError` here).
* you can have multiple excepts:

  ```python
  except ValueError:
      ...
  except TypeError:
      ...
  ```
* if the error type doesn’t match, python keeps searching higher up the call stack.
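
To see the "keeps searching higher up the call stack" rule in action, here is a minimal sketch with two hypothetical functions: `parse_age` only handles `ValueError`, so a `TypeError` travels up to its caller `load_age`.

```python
def parse_age(raw):
    try:
        return int(raw)      # int("abc") raises ValueError, int(None) raises TypeError
    except ValueError:
        return -1            # only ValueError is handled at this level

def load_age(raw):
    try:
        return parse_age(raw)
    except TypeError:
        return None          # not matched inside parse_age, so it is caught here

print(parse_age("abc"))   # -1
print(load_age(None))     # None
print(load_age("42"))     # 42
```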

👉 rule of thumb: **catch the narrowest error you can**.

---

## 3. The `else` block

this is the **“all good” branch**.

```python
else:
    print("Converted!")
```

* only runs **if the try finished with no exceptions**.
* great for code that you *don’t want in try*, but should only happen if try succeeded.

example:

```python
try:
    data = json.loads(raw)
except json.JSONDecodeError:
    log("bad json")
else:
    process(data)   # only if json parsing worked
```

---

## 4. The `finally` block

this is the **always-do cleanup**.

```python
finally:
    print("Always runs")
```

* runs no matter what:

  * success
  * handled error
  * unhandled error
* use it for cleanup tasks: closing files, releasing resources, stopping timers.

example:

```python
f = open("data.txt")
try:
    content = f.read()
except OSError:
    print("could not read file")
finally:
    f.close()  # guarantee resource is released
```

---

## 5. Execution flow summary

1. try runs.
2. if exception → jump to matching except, skip else.
3. if no exception → skip excepts, run else.
4. finally runs **always** (even if there was an error).
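
The flow above can be traced with a small helper. This is an illustrative sketch: `convert` is a hypothetical function that records which blocks ran, once for a good input and once for a bad one.

```python
def convert(text):
    events = []
    try:
        events.append("try")
        number = int(text)
    except ValueError:
        events.append("except")
        number = 0
    else:
        events.append("else")
    finally:
        events.append("finally")
    return number, events

print(convert("42"))     # (42, ['try', 'else', 'finally'])
print(convert("oops"))   # (0, ['try', 'except', 'finally'])
```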

---

✅ **mnemonic**

* `try` → attempt risky stuff.
* `except` → handle problems.
* `else` → success-only code.
* `finally` → always-cleanup code.

---

👌 You don’t *always* need all four.

think of `try/except/else/finally` as a **toolbox**. you grab only the tools you need.

---

## the minimal case (most common)

```python
try:
    risky()
except ValueError:
    recover()
```

* this is **enough** most of the time.
* you only reach for the other two (`else`, `finally`) when they add clarity.

---

## when to add `else`

use `else` for code that should run **only if try succeeded**, but that you don’t want to live inside the `try` block (to avoid catching unintended errors).

example:

```python
try:
    data = json.loads(raw)
except json.JSONDecodeError:
    print("bad json")
else:
    process(data)   # safe: only runs if parsing succeeded
```

👉 best for “happy path continuation.”

---

## when to add `finally`

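use `finally` when you must **clean up resources** no matter how the block exits (success, handled error, or an exception that keeps propagating). it does not run if the interpreter process itself is killed.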

example:

```python
f = open("data.txt")
try:
    content = f.read()
except OSError:
    print("problem reading file")
finally:
    f.close()   # always close file
```

👉 best for closing files, releasing locks, stopping timers, disconnecting.

---

## do you *ever* use all 4 together?

rarely. it’s valid and sometimes helpful for clarity, but most real-world code uses **2 or 3 pieces**.

---

✅ **rule of thumb**

* always start with **try + except**.
* add **else** if you have follow-up “happy path” logic.
* add **finally** if you must clean up regardless of outcome.

---

This is the part that makes error handling feel more like **design** than syntax.

you start with `try + except` (the minimum), and then you **ask yourself some guiding questions**.

---

## 1. do i have “happy path” code i don’t want inside the try? → add `else`

rule of thumb: keep the `try` block **as small as possible**.
if you’d otherwise dump a lot of normal logic inside it, use `else`.

example without `else` (messy):

```python
try:
    data = json.loads(raw)
    process(data)        # ← this also lives inside try now
except json.JSONDecodeError:
    print("bad json")
```

better with `else`:

```python
try:
    data = json.loads(raw)
except json.JSONDecodeError:
    print("bad json")
else:
    process(data)        # ← clearly “happy path only”
```

👉 if you want to separate **risk zone** from **normal flow**, reach for `else`.

---

## 2. do i need to clean something up no matter what? → add `finally`

examples:

* close a file or database connection
* release a lock in multithreading
* stop a timer
* free GPU memory

```python
f = open("data.txt")
try:
    contents = f.read()
except OSError:
    print("could not read")
finally:
    f.close()   # must always happen
```

👉 if your code *uses a resource* that needs tidy-up, add `finally` (or a context manager `with`).

---

## 3. am i dealing with external systems (files, APIs, models)?

* these often require both `else` (to keep “happy path” separate) **and** `finally` (to release resources).

---

## 4. when to stick to just try + except

* if the risky code is just **one or two lines**
* and there’s **no cleanup** needed
* and you can continue right there

example:

```python
try:
    x = int(user_input)
except ValueError:
    x = 0
```

simple, clean, enough.

---

✅ **decision flow**

1. start with `try + except`.
2. do you need follow-up logic only if success? → add `else`.
3. do you need cleanup always? → add `finally`.




# 3) Common built-in exceptions you’ll actually use

* `ValueError`: bad value (right type, wrong content).
* `TypeError`: wrong type of argument.
* `KeyError`, `IndexError`: lookup failures.
* `FileNotFoundError`, `PermissionError`: file system issues.
* `TimeoutError` or library-specific timeouts (e.g., `requests.Timeout`).
* `RuntimeError`: generic “this state shouldn’t happen.”
* `AssertionError`: for internal invariants in dev/test (not user errors).

---
Here’s a gentle, practical map of exceptions: what kinds exist, who defines them, and how to choose the right one.

# Big picture

* **Exceptions are classes** (objects). They live in a **hierarchy**.
* You can use **built-in** ones (most of the time), **library-specific** ones (e.g., `requests.Timeout`), or **custom** ones you define for your domain.
* Rule of thumb: **prefer a built-in** if it already expresses the situation. Use **custom** only when you need domain meaning (e.g., `ToolError`, `RateLimitExceeded`).

---

# The core built-ins you’ll actually use (with when to choose them)

### Value vs Type

* **`ValueError`** — the type is correct, but the **content** is invalid.
  “Right box, wrong stuff.”

  ```py
  def set_k(k: int):
      if not (1 <= k <= 100): raise ValueError("k must be 1..100")
  ```
* **`TypeError`** — the **type** is wrong.
  “Wrong box entirely.”

  ```py
  if not isinstance(user_id, int): raise TypeError("user_id must be int")
  ```

### Lookup failures

* **`KeyError`** — missing dict key.
* **`IndexError`** — out-of-range list/tuple index.

### Files & OS

* **`FileNotFoundError`**, **`PermissionError`**, **`IsADirectoryError`** — specific OS issues.
  (All inherit from `OSError`, which you can catch if you truly mean “any filesystem error.”)
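
A short sketch of how the `OSError` hierarchy plays out when catching. `read_or_explain` is a hypothetical helper; it handles the specific case first and falls back to `OSError` for everything else the filesystem can throw.

```python
import os
import tempfile

print(issubclass(FileNotFoundError, OSError))   # True
print(issubclass(IsADirectoryError, OSError))   # True

def read_or_explain(path):
    try:
        with open(path) as fh:
            return fh.read()
    except FileNotFoundError:
        return "missing"                                   # specific case first
    except OSError as exc:
        return f"other OS error: {type(exc).__name__}"     # any other filesystem failure

with tempfile.TemporaryDirectory() as tmp:
    missing_result = read_or_explain(os.path.join(tmp, "nope.txt"))
    dir_result = read_or_explain(tmp)   # opening a directory as a file fails

print(missing_result)   # missing
print(dir_result)       # exact subclass depends on the operating system
```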

### Time & external calls

* **`TimeoutError`** — generic timeout; prefer **library-specific** timeouts when available:

  * `requests.Timeout`, `httpx.ReadTimeout`, `asyncio.TimeoutError`, etc.
    Catch the library’s exception so your handler is precise.

### Program state / contracts

* **`RuntimeError`** — a reasonable generic when nothing else fits (“this state shouldn’t happen”).
* **`NotImplementedError`** — abstract method or feature stub not implemented *yet*.
* **`AssertionError`** — internal invariants during dev/tests (don’t use for user input validation).

### Parsing / text

* **`UnicodeDecodeError`**, **`json.JSONDecodeError`** — use the specific parser error when available.

### Arithmetic

* **`ZeroDivisionError`**, **`OverflowError`** — math blew up.

> Choosing heuristic:
>
> 1. Is there a **specific** built-in? Use it.
> 2. Is there a **library** exception? Prefer that.
> 3. Otherwise, `ValueError`/`TypeError`/`RuntimeError` are safe defaults (in that order).

---

# A tiny “which one?” decision cheat-sheet

* Invalid *content* (range/format): **ValueError**
* Wrong *type* (expected int, got str): **TypeError**
* Missing dict key: **KeyError**
* Bad list index: **IndexError**
* File doesn’t exist / no permission: **FileNotFoundError / PermissionError**
* Timed out external call: **library timeout** (fallback: **TimeoutError**)
* Feature stub / abstract method: **NotImplementedError**
* Parser failed (JSON/YAML/etc): **parser’s exception** (e.g., **JSONDecodeError**)
* Unexpected internal state with no better fit: **RuntimeError**
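
To check the cheat-sheet against real behavior, the sketch below triggers a few of these failures and reports which exception type Python raises. `raised_by` is a hypothetical helper; it catches `Exception` broadly only because its whole job is to inspect the error.

```python
import json

def raised_by(fn):
    """Return the name of the exception type that fn() raises, or None."""
    try:
        fn()
    except Exception as exc:   # broad on purpose: we only report the type
        return type(exc).__name__
    return None

examples = {
    "int('abc')": lambda: int("abc"),
    "len(5)": lambda: len(5),
    "{}['k']": lambda: {}["k"],
    "[][0]": lambda: [][0],
    "1 / 0": lambda: 1 / 0,
    "json.loads('{')": lambda: json.loads("{"),
}

results = {label: raised_by(fn) for label, fn in examples.items()}
for label, name in results.items():
    print(f"{label:18} -> {name}")
```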

---

# Custom exceptions (when domain meaning matters)

Define a small hierarchy so you can catch by category:

```py
class AgentError(Exception): pass
class ToolError(AgentError): pass
class RetryableToolError(ToolError): pass
class NonRetryableToolError(ToolError): pass
```

Use them at **boundaries** (tool calls, API adapters) to translate messy library errors into meaningful agent semantics:

```py
try:
    resp = httpx.get(url, timeout=5)
    resp.raise_for_status()   # without this call, 4xx/5xx responses never raise HTTPStatusError
except httpx.TimeoutException as e:
    raise RetryableToolError("tool timed out") from e
except httpx.HTTPStatusError as e:
    if 400 <= e.response.status_code < 500:
        raise NonRetryableToolError(f"client error {e.response.status_code}") from e
    else:
        raise RetryableToolError(f"server error {e.response.status_code}") from e

Now upstream logic can simply say: “retry if `RetryableToolError`, otherwise summarize the failure.”
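
A minimal sketch of what that upstream logic might look like. `call_with_retry` and `flaky_tool` are hypothetical names, the exception classes are redefined so the block runs on its own, and the fixed delay is a placeholder for whatever backoff policy you prefer.

```python
import time

class AgentError(Exception): pass
class ToolError(AgentError): pass
class RetryableToolError(ToolError): pass
class NonRetryableToolError(ToolError): pass

def call_with_retry(tool, max_attempts=3, delay=0.0):
    """Hypothetical helper: retry `tool` on RetryableToolError only."""
    for attempt in range(1, max_attempts + 1):
        try:
            return tool()
        except RetryableToolError:
            if attempt == max_attempts:
                raise              # out of attempts: let the caller see the error
            time.sleep(delay)      # fixed pause; real code might back off instead
        # NonRetryableToolError is deliberately not caught, so it propagates at once.

attempts = {"n": 0}

def flaky_tool():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RetryableToolError("tool timed out")
    return "ok"

result = call_with_retry(flaky_tool)
print(result)          # ok
print(attempts["n"])   # 3
```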

---

# Catching: how specific?

* **Narrow first**: catch `json.JSONDecodeError`, not plain `Exception`, when you know what you’re handling.
* **Broaden at top-level boundaries** (e.g., one agent step) to avoid crashing:

  ```py
  try:
      run_step()
  except RetryableToolError as e:
      schedule_retry(e)
  except Exception as e:
      log_unexpected(e)   # last-resort guardrail
  ```
* **Avoid catching `BaseException`** — it includes `KeyboardInterrupt`, `SystemExit`. If you write `except Exception:`, that’s broad enough for app code.
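
The difference is easy to verify. In this sketch, `guarded` is a hypothetical wrapper that uses `except Exception`; ordinary errors are handled, while `SystemExit` passes straight through it.

```python
print(issubclass(KeyboardInterrupt, Exception))       # False
print(issubclass(SystemExit, Exception))              # False
print(issubclass(KeyboardInterrupt, BaseException))   # True

def guarded(fn):
    try:
        return fn()
    except Exception as exc:
        return f"handled {type(exc).__name__}"

def quit_now():
    raise SystemExit(1)

ordinary = guarded(lambda: 1 / 0)
print(ordinary)   # handled ZeroDivisionError

passed_through = False
try:
    guarded(quit_now)
except SystemExit:
    passed_through = True   # the guard did not intercept it
print(passed_through)       # True
```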

---

# Messaging & chaining (make errors useful)

Add context and keep the original traceback:

```py
try:
    payload = json.loads(text)
except json.JSONDecodeError as e:
    raise ValueError("LLM tool returned invalid JSON") from e
```

This shows *both* your message and the original cause.
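
You can also inspect the chain in code: `raise ... from e` stores the original exception on the new one's `__cause__` attribute. A small sketch, with `parse_payload` as a hypothetical wrapper around the snippet above:

```python
import json

def parse_payload(text):
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError("LLM tool returned invalid JSON") from e

try:
    parse_payload("{not json")
except ValueError as err:
    wrapper_message = str(err)
    original = err.__cause__     # the JSONDecodeError that started it

print(wrapper_message)           # LLM tool returned invalid JSON
print(type(original).__name__)   # JSONDecodeError
```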

---

# Anti-patterns to avoid

* Swallowing errors silently:

  ```py
  except Exception:
      pass    # ❌ you just hid a bug
  ```
* Over-broad catching far from the source (makes debugging hard).
* Massive `try:` blocks that catch unrelated errors — keep `try` **tight**.
* Using `AssertionError` for user input validation (use `ValueError`/`TypeError`).

---

# Quick practice prompts (for your notebook)

1. Write `parse_score(s)`:

   * Accepts str; returns float in `[0,1]`.
   * `TypeError` if not str; `ValueError` if parse fails or out of range.

2. Build an HTTP wrapper using `httpx` (or `requests`):

   * Map timeouts to `RetryableToolError`.
   * Map 4xx to `NonRetryableToolError`, 5xx to `RetryableToolError`.
   * Use `raise ... from e` to chain.

3. Dictionary access helper:

   * `get_required(d, key)` → returns value or raises `KeyError` with a helpful message that includes available keys.




# 4) Raising your own exceptions (to fail early & clearly)

```py
def normalize_score(x):
    if not isinstance(x, (int, float)):
        raise TypeError("score must be a number")
    if not (0 <= x <= 1):
        raise ValueError("score must be between 0 and 1")
    return x
```

* Prefer **standard types** first (`ValueError`, `TypeError`) so others instantly know what went wrong.

### Chaining for context

```py
try:
    data = json.loads(bad_text)
except json.JSONDecodeError as e:
    raise ValueError("Invalid JSON from LLM tool output") from e
```

`from e` stores the original exception on the new one (as `__cause__`), so the traceback shows both errors.

---

You can build **small, reusable error-checking functions** that sit inside (or around) other functions to *validate*, *guard*, or *normalize* data.

`normalize_score` above is a **classic defensive programming helper**:

```python
def normalize_score(x):
    if not isinstance(x, (int, float)):
        raise TypeError("score must be a number")
    if not (0 <= x <= 1):
        raise ValueError("score must be between 0 and 1")
    return x
```

---

## how this helps in practice

### 1. validation helpers

Instead of repeating checks everywhere, you centralize them:

```python
def validate_id(user_id):
    if not isinstance(user_id, int):
        raise TypeError("user_id must be int")
    if user_id <= 0:
        raise ValueError("user_id must be positive")
    return user_id
```

Now every function that deals with `user_id` can just call `validate_id(user_id)` at the start.
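
A sketch of that calling pattern. `fetch_profile` and `describe` are hypothetical callers and the returned dictionary is made up for illustration; `validate_id` is repeated so the block runs on its own.

```python
def validate_id(user_id):
    if not isinstance(user_id, int):
        raise TypeError("user_id must be int")
    if user_id <= 0:
        raise ValueError("user_id must be positive")
    return user_id

def fetch_profile(user_id):
    """Hypothetical caller: validate first, then do the real work."""
    user_id = validate_id(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}

def describe(raw_id):
    try:
        return fetch_profile(raw_id)["name"]
    except (TypeError, ValueError) as exc:
        return f"rejected: {exc}"

print(describe(7))     # user-7
print(describe("7"))   # rejected: user_id must be int
print(describe(-1))    # rejected: user_id must be positive
```

The validation lives in one place, and the caller decides what a rejection should look like.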

---

### 2. decorators for error handling

You can even wrap functions with a *reusable handler*:

```python
import functools

def swallow_errors(default=None, exceptions=(Exception,)):
    def decorator(fn):
        @functools.wraps(fn)   # keep fn's name and docstring on the wrapper
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except exceptions as e:
                print(f"Handled error: {e}")
                return default
        return wrapper
    return decorator

@swallow_errors(default="fallback")
def risky():
    return int("oops")

print(risky())   # → "fallback"
```

---

### 3. custom exception hierarchies

For AI agents, this is super useful. You can write a **translator** that maps low-level library errors into your own “agent language”:

```python
import json

class AgentError(Exception): pass
class ToolError(AgentError): pass
class RetryableToolError(ToolError): pass
class NonRetryableToolError(ToolError): pass

def safe_parse_json(text):
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        raise NonRetryableToolError("bad JSON from LLM tool") from e

Now the rest of your system only has to think about `RetryableToolError` vs `NonRetryableToolError`, not every possible library exception.

---

✅ **big idea**:

* validation helpers → fail fast, clear messages.
* wrappers/decorators → reusable handling.
* custom exceptions → make errors meaningful at the right abstraction level.



# 5) Custom exception classes (great for agents)

```py
class AgentError(Exception): pass
class ToolError(AgentError): pass
class RetryableToolError(ToolError): pass
class NonRetryableToolError(ToolError): pass
```

Now you can catch meaningful categories:

```py
try:
    call_tool()
except RetryableToolError:
    schedule_retry()
except NonRetryableToolError as e:
    report_to_user(str(e))
```
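
Because the classes form a hierarchy, you can also catch a whole category through its base class. A minimal sketch (`classify` is a hypothetical helper; the classes are repeated so the block runs on its own). Note the ordering: the most specific `except` goes first, since Python picks the first clause that matches.

```python
class AgentError(Exception): pass
class ToolError(AgentError): pass
class RetryableToolError(ToolError): pass
class NonRetryableToolError(ToolError): pass

def classify(exc_to_raise):
    try:
        raise exc_to_raise
    except RetryableToolError:
        return "retry"
    except ToolError:
        return "tool failure"    # any other ToolError, e.g. NonRetryableToolError
    except AgentError:
        return "agent failure"   # anything else in the agent hierarchy

print(classify(RetryableToolError("timeout")))      # retry
print(classify(NonRetryableToolError("bad args")))  # tool failure
print(classify(AgentError("planner confused")))     # agent failure
```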


# 6) Resource safety: `finally` vs `with`

These two snippets are solving the **same problem** (making sure the file always closes), but they do it in different styles. let’s unpack the difference.

---

## 1. `try / finally` version

```python
f = open("data.txt")
try:
    contents = f.read()
finally:
    f.close()
```

* **you** are responsible for opening, handling, and closing.
* `finally` guarantees the `f.close()` happens whether `read()` succeeds or fails.
* explicit, but a little noisy — you always have to remember to write that `finally`.

---

## 2. `with` (context manager) version

```python
with open("data.txt") as f:
    contents = f.read()
```

* `with` is just a **shorthand** for the pattern above.
* it calls `f.__enter__()` when entering the block, and `f.__exit__()` when leaving (which does the `close()` for you).
* **cleaner, less error-prone** — no chance of forgetting the `finally`.
* preferred for *anything that acquires and releases a resource*.
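
To make the `__enter__` / `__exit__` mechanics visible, here is a small sketch with a hypothetical `Tracked` class that records each step. `__exit__` runs even though the body raises.

```python
class Tracked:
    """Hypothetical resource that records when it is entered and exited."""
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self                   # this is what `as r` receives

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")    # runs on success and on error
        return False                  # False: do not swallow the exception

resource = Tracked()
try:
    with resource as r:
        r.events.append("body")
        raise RuntimeError("boom")
except RuntimeError:
    resource.events.append("caught outside")

print(resource.events)   # ['enter', 'body', 'exit', 'caught outside']
```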

---

## 3. Where else `with` is used

many libraries give you context managers:

* **HTTP sessions** (e.g., `with requests.Session() as s:`) → session closes at the end.
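* **database connections / cursors** (`with sqlite3.connect("db.sqlite") as conn:`). note that for `sqlite3` the `with` block commits or rolls back the transaction; it does not close the connection, so you still call `conn.close()` yourself.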
* **temporary files**, **locks**, **thread pools**, **async context managers** (`async with`).

---

## 4. When you might still use `finally`

* when you’re working with an object that doesn’t provide a context manager (`with` won’t work unless the class implements `__enter__`/`__exit__`).
* when you need more **complex cleanup** than just “close this one thing.”

  ```python
  acquire_resource()
  try:
      do_work()
  finally:
      cleanup_part_A()
      cleanup_part_B()
  ```
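
For the first case (an object with `close()` but no `__enter__`/`__exit__`), the standard library offers a middle ground: `contextlib.closing` wraps the object so it can be used in a `with` block. A minimal sketch, where `LegacyConnection` is a hypothetical class invented for the demo:

```python
from contextlib import closing

class LegacyConnection:
    """Hypothetical resource: has close(), but is not a context manager."""
    def __init__(self):
        self.closed = False

    def query(self):
        raise RuntimeError("query failed")

    def close(self):
        self.closed = True

conn = LegacyConnection()
try:
    with closing(conn) as c:   # closing() calls c.close() when the block exits
        c.query()
except RuntimeError:
    pass

print(conn.closed)   # True
```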

---

✅ **rule of thumb**:

* if the library/object supports `with` → **use it** (cleaner, safer).
* if it doesn’t → fall back to `try / finally`.



# 7) Logging errors (don’t just print)

```py
import logging
logging.basicConfig(level=logging.INFO)

try:
    risky()
except Exception:
    logging.exception("risky() failed")  # includes traceback
```

Use `logging.exception` inside an `except` block to capture stack traces.

---




## Traceback vs Logging
Both are related but serve different purposes:

---

## 1. A traceback

* what python automatically prints when an exception is **unhandled**.
* shows the call stack (frames, lines of code) → ends with `ExceptionType: message`.
* output goes to the console (stderr).
* once it prints, your program usually **stops** (unless you’re in a notebook/REPL, where it just halts that cell).

example:

```
Traceback (most recent call last):
  File "script.py", line 5, in <module>
    risky()
  File "script.py", line 2, in risky
    return 1/0
ZeroDivisionError: division by zero
```

👉 a traceback is *automatic debugging info*, but it’s ephemeral — gone once the program ends.

---

## 2. Logging an error

* **you** decide what to record, how much detail, and *where* it goes (console, file, JSON log aggregator, cloud service).
* `logging.exception(...)` is special:

  * you call it inside an `except` block.
  * it logs your message **plus the full traceback** of the exception that was caught.

example:

```py
import logging
logging.basicConfig(level=logging.INFO, filename="app.log")

try:
    1/0
except Exception:
    logging.exception("Computation failed")
```

contents of `app.log`:

```
ERROR:root:Computation failed
Traceback (most recent call last):
  File "script.py", line 5, in <module>
    1/0
ZeroDivisionError: division by zero
```

👉 logging gives you a **persistent record** (with context + traceback) without killing the program.
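
One caveat worth knowing: `logging.basicConfig` does nothing if the root logger already has handlers, which can be the case in some notebook environments (passing `force=True`, available since Python 3.8, replaces them). The sketch below sidesteps the issue with a dedicated logger and its own file handler. The logger name `demo.app` and the temporary log location are assumptions for the demo.

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")   # throwaway location for the demo

logger = logging.getLogger("demo.app")    # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False                  # keep the demo output out of the root logger
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)

try:
    1 / 0
except ZeroDivisionError:
    logger.exception("Computation failed")   # message + traceback go to the file

logger.removeHandler(handler)
handler.close()

with open(log_path) as fh:
    log_text = fh.read()

print(log_text.splitlines()[0])            # ERROR:demo.app:Computation failed
print(log_text.strip().splitlines()[-1])   # ZeroDivisionError: division by zero
```

The program keeps running after the `except` block, and the file holds the labeled traceback.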

---

## 3. Key differences

| **Traceback**                                            | **Logging error**                              |
| -------------------------------------------------------- | ---------------------------------------------- |
| automatic, printed only if you don’t catch the exception | deliberate, you control when & what to log     |
| goes to console/stderr only                              | can go to file, syslog, JSON, cloud logs, etc. |
| program halts (unless caught)                            | program can continue after logging             |
| bare stack trace                                         | you add your own message/context               |

---

## 4. Why it matters for AI agents

* you don’t want an agent **crashing silently** on some bad tool call.
* the agent can convert the error into a `Result` (pass/fail) for its reasoning loop.
* meanwhile, **logging.exception** captures the full traceback so *you* (the developer) can debug what really went wrong.
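
A minimal sketch of that split, assuming a hypothetical `Result` dataclass and `run_tool` wrapper (neither is a standard API): the agent loop sees a simple pass/fail value, while `logger.exception` keeps the traceback for the developer.

```python
import logging
from dataclasses import dataclass
from typing import Any, Optional

logger = logging.getLogger("agent.tools")   # hypothetical logger name

@dataclass
class Result:
    ok: bool
    value: Any = None
    error: Optional[str] = None

def run_tool(tool, *args):
    """Hypothetical step boundary: never raises, always returns a Result."""
    try:
        return Result(ok=True, value=tool(*args))
    except Exception as exc:   # last-resort guard at the step boundary
        logger.exception("tool %s failed", getattr(tool, "__name__", tool))
        return Result(ok=False, error=f"{type(exc).__name__}: {exc}")

def divide(a, b):
    return a / b

good = run_tool(divide, 6, 3)
bad = run_tool(divide, 1, 0)
print(good)   # Result(ok=True, value=2.0, error=None)
print(bad)    # Result(ok=False, value=None, error='ZeroDivisionError: division by zero')
```

The reasoning loop can branch on `result.ok` without ever seeing a traceback.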

---

✅ **rule of thumb**

* use exceptions for *control flow of failures* inside the program.
* use logging to *record those failures* for later debugging/monitoring.

---

Logging is basically your **memory + notebook of what happened while the program was running**.

here’s why it’s so powerful for debugging and for agent development:

---

## why logging > just seeing a traceback

* **traceback alone** = tells you *where* and *what* crashed, but only at that moment. if you weren’t watching, it’s gone.
* **logging.exception** = lets you:

  1. **label the error** with your own context
     (e.g., “failed during embedding call” vs. “failed during db save”).
  2. **preserve the traceback** for later inspection.
  3. **choose the destination**: console, file, monitoring system, cloud log aggregator.

---

## simple example

```py
import logging
logging.basicConfig(level=logging.INFO, filename="agent.log")

def risky():
    return 1/0

try:
    risky()
except Exception:
    logging.exception("step: math tool call failed")
```

content of `agent.log`:

```
ERROR:root:step: math tool call failed
Traceback (most recent call last):
  File "agent.py", line 8, in <module>
    risky()
  File "agent.py", line 5, in risky
    return 1/0
ZeroDivisionError: division by zero
```

* the **agent logic** just knows “step failed.”
* but *you* get a file that says: *which step, why, and exactly where in code it happened*.

---

## why this matters for AI agents

* agents can stay **mentally simple** (pass/fail → retry or not).
* developers can still **debug deeply** using logs.
* when things fail in production, logs = your **black box recorder**.

---

✅ **core insight**:

* **exceptions** = program’s way of signaling an error *now*.
* **logging** = developer’s way of keeping a durable, searchable *history* of those errors (plus context).

