# Tutorial: Concurrency Error Handling & Timeouts

**Category**: Concurrency  
**Difficulty**: Beginner to Intermediate  
**Time**: 20 minutes

## Overview

Learn lionherd-core's three essential concurrency utilities for handling errors and timeouts:

1. **`fail_after()`** - Hard timeout that raises `TimeoutError`
2. **`move_on_after()`** - Soft timeout with silent cancellation
3. **`bounded_map()`** - Parallel execution with concurrency limits

**What You'll Learn**:
- When to use hard vs soft timeouts
- How to handle partial failures in parallel operations
- Controlling concurrency to avoid overwhelming resources

**Prerequisites**:
```bash
pip install lionherd-core
```

In [1]:
# Import lionherd-core concurrency utilities
from lionherd_core.libs.concurrency import (
    bounded_map,
    fail_after,
    move_on_after,
    sleep,
)

## Section 1: `fail_after()` - Hard Timeouts

**Use Case**: Critical operations that MUST complete within a deadline or fail explicitly.

**API**:
```python
with fail_after(seconds):
    await operation()  # Raises TimeoutError if exceeds seconds
```

**When to Use**:
- ✅ API requests with SLA requirements
- ✅ Database queries that shouldn't hang
- ✅ User-facing operations (prevent indefinite waiting)
- ❌ Optional operations (use `move_on_after` instead)

In [2]:
# Example 1: Successful operation within timeout
async def quick_operation():
    """Completes in 0.5 seconds."""
    await sleep(0.5)
    return "success"


# This succeeds (0.5s < 2s timeout)
with fail_after(2.0):
    result = await quick_operation()
    print(f"✓ Completed: {result}")


# Example 2: Operation exceeds timeout
async def slow_operation():
    """Takes 3 seconds (too slow)."""
    await sleep(3.0)
    return "too late"


# This raises TimeoutError (3s > 1s timeout)
try:
    with fail_after(1.0):
        result = await slow_operation()
except TimeoutError:
    print("✗ TimeoutError: Operation took too long")

✓ Completed: success
✗ TimeoutError: Operation took too long


**Key Points**:
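- **Raises exception**: The operation is cancelled and `TimeoutError` is raised from the `with` block
- **Caller handles failure**: Wrap the block in `try`/`except TimeoutError` to handle the timeout
- **Resource cleanup**: Cancellation is delivered inside the block, so `finally` clauses and context managers there still run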
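
The cleanup point can be checked with a small standard-library sketch. It is an analogy rather than lionherd-core code: `asyncio.wait_for` stands in for `fail_after()`, and `hold_resource` and `events` are hypothetical names used only for illustration.

```python
import asyncio

events = []  # records what the task did, in order


async def hold_resource() -> None:
    """Hypothetical task that acquires something and must release it."""
    events.append("acquired")
    try:
        await asyncio.sleep(1.0)  # far longer than the 0.05s deadline below
        events.append("finished")
    finally:
        # Runs on normal exit and on cancellation alike
        events.append("released")


async def main() -> str:
    try:
        # Stand-in for `with fail_after(0.05): ...`
        await asyncio.wait_for(hold_resource(), timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"


outcome = asyncio.run(main())
print(outcome, events)
```

The task never reaches `"finished"`, but its `finally` block still runs before the timeout error reaches the caller.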

In [3]:
# Practical Example: HTTP-like request with timeout
async def fetch_data(url: str, timeout: float = 2.0) -> dict:
    """Fetch data with enforced timeout."""
    try:
        with fail_after(timeout):
            # Simulate network request
            await sleep(1.5)  # Realistic latency
            return {"url": url, "data": "response"}
    except TimeoutError as exc:
        # Treat the timeout as an error, keeping the original as __cause__
        raise RuntimeError(f"Request to {url} timed out after {timeout}s") from exc


# Usage
result = await fetch_data("https://api.example.com/data")
print(f"Success: {result}")

# This will raise RuntimeError (wrapping TimeoutError)
try:
    await fetch_data("https://slow-api.example.com/data", timeout=1.0)
except RuntimeError as e:
    print(f"Error: {e}")

Success: {'url': 'https://api.example.com/data', 'data': 'response'}
Error: Request to https://slow-api.example.com/data timed out after 1.0s


## Section 2: `move_on_after()` - Soft Timeouts

**Use Case**: Optional operations that shouldn't block the main flow.

**API**:
```python
with move_on_after(seconds) as scope:
    await optional_operation()

if scope.cancel_called:
    ...  # Operation timed out (no exception raised)
```

**When to Use**:
- ✅ Optional cache lookups
- ✅ Best-effort enrichment
- ✅ Graceful degradation (show partial data)
- ❌ Critical operations (use `fail_after` instead)

In [4]:
# Example 1: Operation completes within timeout
with move_on_after(2.0) as scope:
    await sleep(0.5)
    print("✓ Operation completed")

print(f"Timed out: {scope.cancel_called}")  # False

# Example 2: Operation times out silently
with move_on_after(1.0) as scope:
    await sleep(3.0)  # This gets cancelled
    print("This won't print")  # Never reached

print(f"Timed out: {scope.cancel_called}")  # True

✓ Operation completed
Timed out: False
Timed out: True


**Key Points**:
- **No exception**: Operation is silently cancelled
- **Check `scope.cancel_called`**: Detect if timeout occurred
- **Execution continues**: Code after `with` block runs normally
- **Common pattern**: Cache with fallback. Try the fast source under a soft timeout; if it times out, use the slower but reliable source
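
The same soft-timeout idea can be sketched with the standard library alone. This is an analogy, not how lionherd-core implements `move_on_after()`: `soft_timeout` and `lookup` are hypothetical helpers, and `asyncio.wait_for` supplies the deadline.

```python
import asyncio
from typing import Any, Awaitable


async def soft_timeout(awaitable: Awaitable[Any], seconds: float, default: Any = None) -> Any:
    """Hypothetical helper: return `default` instead of raising on timeout."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        return default


async def lookup(delay: float) -> str:
    """Simulated cache lookup that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return "cached"


async def main() -> tuple:
    # Fast lookup finishes well inside its deadline
    fast = await soft_timeout(lookup(0.01), seconds=0.5, default="fallback")
    # Slow lookup is cancelled and replaced by the default
    slow = await soft_timeout(lookup(1.0), seconds=0.05, default="fallback")
    return fast, slow


fast, slow = asyncio.run(main())
print(fast, slow)
```

Passing an explicit default makes the fallback visible at the call site, which plays the same role as checking `scope.cancel_called` after the `with` block.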

In [5]:
# Practical Example: Optional cache with fallback
async def get_user_data(user_id: int, cache_latency: float = 0.2) -> dict:
    """Get user data with optional cache lookup.

    Pattern: Try fast cache first (with timeout), fall back to reliable database.
    This prevents slow caches from degrading user experience.

    Args:
        user_id: User identifier
        cache_latency: Simulated cache response time (default: 0.2s fast cache)
    """
    cache_data = None

    # Try cache with 0.5s timeout (optional operation)
    with move_on_after(0.5) as scope:
        await sleep(cache_latency)  # Simulate cache lookup
        cache_data = {"id": user_id, "name": "Cached User", "source": "cache"}

    # Check if timeout occurred
    if scope.cancel_called:
        print(f"⏱ Cache timeout for user {user_id}, fetching from DB")

    # Use cache result if available, otherwise fall back to database
    if cache_data is not None:
        return cache_data

    # Fallback: reliable database query (slower but always available)
    await sleep(0.3)
    return {"id": user_id, "name": "DB User", "source": "database"}


# Scenario 1: Fast cache (0.2s < 0.5s timeout) - cache succeeds
user1 = await get_user_data(123, cache_latency=0.2)
print(f"✓ Fast cache: {user1}")

# Scenario 2: Slow cache (2.0s > 0.5s timeout) - timeout triggers, DB fallback
user2 = await get_user_data(456, cache_latency=2.0)
print(f"✓ Slow cache (timeout): {user2}")

✓ Fast cache: {'id': 123, 'name': 'Cached User', 'source': 'cache'}
⏱ Cache timeout for user 456, fetching from DB
✓ Slow cache (timeout): {'id': 456, 'name': 'DB User', 'source': 'database'}


## Section 3: `bounded_map()` - Parallel with Concurrency Limits

**Use Case**: Process many items in parallel without overwhelming resources.

**API**:
```python
results = await bounded_map(
    async_function,
    items,
    limit=10,  # Max concurrent operations
    return_exceptions=False,  # False: raise on first error; True: return exceptions in the results
)
```

**When to Use**:
- ✅ Batch API calls (rate limiting)
- ✅ Parallel file processing
- ✅ Database bulk operations
- ❌ Workloads where unlimited concurrency is fine (use `asyncio.gather`)

In [6]:
# Example 1: Basic parallel processing with concurrency limit
async def process_item(item: int) -> int:
    """Simulate processing (I/O-bound work)."""
    await sleep(0.1)
    return item * 2


items = list(range(10))

# Process with max 3 concurrent operations
results = await bounded_map(process_item, items, limit=3)
print(f"Results: {results}")
print(f"Processed {len(results)} items")

Results: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Processed 10 items


**Key Points**:
- **Concurrency control**: Max `limit` operations run simultaneously
- **Order preserved**: Results match input order
- **Automatic throttling**: Remaining items wait until one of the `limit` slots is free
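
One common way to get these properties is a semaphore around each call. The sketch below uses only the standard library; `bounded_map_sketch` and `tracked` are hypothetical names, and whether lionherd-core's `bounded_map()` is built this way is an assumption. It records the peak number of in-flight calls to show that the cap holds.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def bounded_map_sketch(
    func: Callable[[T], Awaitable[R]], items: Iterable[T], limit: int
) -> List[R]:
    """Hypothetical stand-in: apply func to items, at most `limit` at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:  # wait here until a slot is free
            return await func(item)

    # gather returns results in input order, whatever the completion order
    return await asyncio.gather(*(run_one(item) for item in items))


in_flight = 0  # calls currently running
peak = 0  # highest value in_flight ever reached


async def tracked(item: int) -> int:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return item * 2


results = asyncio.run(bounded_map_sketch(tracked, range(10), limit=3))
print(results, peak)
```

All ten tasks are created up front, but only three hold the semaphore at any moment, so `peak` never exceeds the limit.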

In [7]:
# Example 2: Handling partial failures with return_exceptions=True
async def unreliable_operation(item: int) -> int:
    """Sometimes fails."""
    await sleep(0.05)
    if item % 3 == 0:
        raise ValueError(f"Item {item} failed")
    return item * 2


items = list(range(10))

# Collect both successes and failures
results = await bounded_map(
    unreliable_operation,
    items,
    limit=5,
    return_exceptions=True,  # Don't halt on first error
)

# Separate successes from failures
successes = [r for r in results if not isinstance(r, Exception)]
failures = [r for r in results if isinstance(r, Exception)]

print(f"Successes: {successes}")
print(f"Failures: {len(failures)} items failed")
print(f"Success rate: {len(successes)}/{len(items)} ({len(successes) / len(items) * 100:.0f}%)")

Successes: [2, 4, 8, 10, 14, 16]
Failures: 4 items failed
Success rate: 6/10 (60%)


**`return_exceptions` Behavior**:
- **`False` (default)**: Raises on first error, cancels remaining
- **`True`**: Collects all results, exceptions included
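
Because results come back in input order, each exception can be paired with the item that caused it. The sketch below illustrates this with `asyncio.gather(..., return_exceptions=True)` as a stand-in, assuming `bounded_map()` returns the same shape of list; `unreliable` is a hypothetical function.

```python
import asyncio


async def unreliable(item: int) -> int:
    """Fails for multiples of 3, otherwise doubles the item."""
    await asyncio.sleep(0.01)
    if item % 3 == 0:
        raise ValueError(f"Item {item} failed")
    return item * 2


async def main() -> list:
    items = list(range(6))
    results = await asyncio.gather(
        *(unreliable(i) for i in items), return_exceptions=True
    )
    # Same order as the inputs, so zip lines each result up with its item
    return list(zip(items, results))


pairs = asyncio.run(main())
failed_items = [item for item, r in pairs if isinstance(r, Exception)]
succeeded = {item: r for item, r in pairs if not isinstance(r, Exception)}
print(failed_items)
print(succeeded)
```

Keeping the input alongside the outcome makes it straightforward to retry only the failed items.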

In [8]:
# Practical Example: Rate-limited API calls
async def fetch_user(user_id: int) -> dict:
    """Simulate API call with realistic latency."""
    await sleep(0.2)  # 200ms per request
    return {"id": user_id, "name": f"User {user_id}"}


# Fetch 20 users with max 5 concurrent requests (respect rate limit)
user_ids = list(range(1, 21))

import time

start = time.perf_counter()  # monotonic clock, suited to measuring intervals
users = await bounded_map(fetch_user, user_ids, limit=5)
elapsed = time.perf_counter() - start

print(f"Fetched {len(users)} users in {elapsed:.2f}s")
print(f"Average: {elapsed / len(users) * 1000:.0f}ms per user")
print(f"First 3 users: {users[:3]}")

Fetched 20 users in 0.80s
Average: 40ms per user
First 3 users: [{'id': 1, 'name': 'User 1'}, {'id': 2, 'name': 'User 2'}, {'id': 3, 'name': 'User 3'}]


**Performance Insight**:
- Sequential: 20 users × 200ms = 4 seconds
- `bounded_map(limit=5)`: 20 ÷ 5 = 4 rounds × 200ms ≈ 800ms (5× speedup)
- Unlimited (`asyncio.gather`): ~200ms, but 20 simultaneous requests may overwhelm the API
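
The arithmetic behind these figures can be written as a rough estimate. It assumes every call takes the same time and ignores scheduling overhead; `estimated_seconds` is a hypothetical helper, not part of lionherd-core.

```python
import math


def estimated_seconds(n_items: int, limit: int, latency: float) -> float:
    """Rough estimate under uniform latency: rounds of `limit` items each."""
    rounds = math.ceil(n_items / limit)
    return rounds * latency


sequential = estimated_seconds(20, limit=1, latency=0.2)  # one call at a time
bounded = estimated_seconds(20, limit=5, latency=0.2)  # five at a time
unbounded = estimated_seconds(20, limit=20, latency=0.2)  # all at once

print(f"sequential={sequential:.1f}s bounded={bounded:.1f}s unbounded={unbounded:.1f}s")
```

With uneven latencies the real figure will differ, so treat this as a planning estimate rather than a guarantee.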

## Combining Techniques

Combine a per-item hard timeout with bounded concurrency for robust parallel processing:

In [9]:
# Real-world scenario: Fetch data from multiple sources with timeouts and concurrency control
async def fetch_with_timeout(source_id: int) -> dict | None:
    """Fetch from source with hard timeout, return None on failure."""
    try:
        # Hard timeout for critical operation
        with fail_after(1.0):
            # Simulate variable latency
            await sleep(0.3 + (source_id % 3) * 0.2)
            return {"source": source_id, "data": f"data_{source_id}"}
    except TimeoutError:
        return None  # Graceful degradation


# Process 10 sources with max 3 concurrent fetches
source_ids = list(range(10))
results = await bounded_map(
    fetch_with_timeout,
    source_ids,
    limit=3,
    return_exceptions=True,  # Don't fail entire batch on error
)

# Filter out None and exceptions
successful = [r for r in results if r is not None and not isinstance(r, Exception)]
failed_count = len([r for r in results if r is None or isinstance(r, Exception)])

print(f"✓ Successful: {len(successful)}/{len(source_ids)}")
print(f"✗ Failed/Timeout: {failed_count}")
print(f"Data: {successful[:3]}...")  # Show first 3

✓ Successful: 10/10
✗ Failed/Timeout: 0
Data: [{'source': 0, 'data': 'data_0'}, {'source': 1, 'data': 'data_1'}, {'source': 2, 'data': 'data_2'}]...
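
In the run above every source finishes inside the 1.0s deadline, so the timeout branch is never taken. The sketch below is a standard-library variation in which some sources are deliberately too slow. `asyncio.wait_for` and `asyncio.Semaphore` stand in for `fail_after()` and `bounded_map()`, and `fetch`, `fetch_or_none`, and `DEADLINE` are hypothetical names with illustrative values.

```python
import asyncio
from typing import Optional

DEADLINE = 0.2  # seconds; illustrative value


async def fetch(source_id: int) -> dict:
    # Every third source is deliberately slower than the deadline
    await asyncio.sleep(0.5 if source_id % 3 == 2 else 0.01)
    return {"source": source_id}


async def fetch_or_none(source_id: int, semaphore: asyncio.Semaphore) -> Optional[dict]:
    async with semaphore:  # the deadline starts only once a slot is held
        try:
            return await asyncio.wait_for(fetch(source_id), timeout=DEADLINE)
        except asyncio.TimeoutError:
            return None  # graceful degradation


async def main() -> list:
    semaphore = asyncio.Semaphore(3)  # at most 3 fetches in flight
    return await asyncio.gather(*(fetch_or_none(i, semaphore) for i in range(10)))


results = asyncio.run(main())
timed_out = [i for i, r in enumerate(results) if r is None]
print(f"Timed out: {timed_out}")
print(f"Succeeded: {len(results) - len(timed_out)}/{len(results)}")
```

Placing the timeout inside the semaphore means time spent waiting for a slot does not count against the deadline, which is usually what a per-request timeout is meant to measure.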


## Summary

### Quick Reference

| Function | Behavior | Use Case | Raises? |
|----------|----------|----------|---------|
| `fail_after(t)` | Cancels the block and raises `TimeoutError` | Critical operations | ✅ Yes |
| `move_on_after(t)` | Cancels the block silently | Optional operations | ❌ No (check `scope.cancel_called`) |
| `bounded_map(..., limit=N)` | Runs at most `N` calls concurrently | Batch processing | Configurable via `return_exceptions` |

### Decision Tree

**Need timeout?**
- Operation is critical → `fail_after()` (raises error)
- Operation is optional → `move_on_after()` (graceful degradation)

**Processing multiple items?**
- Unlimited concurrency OK → `asyncio.gather()`
- Need rate limiting → `bounded_map(limit=N)`
- Partial failures acceptable → `bounded_map(..., return_exceptions=True)`

### Key Takeaways

1. **`fail_after()`**: Hard timeouts for critical paths
2. **`move_on_after()`**: Soft timeouts for graceful degradation
3. **`bounded_map()`**: Parallel execution with resource control
4. **Combine them**: Build robust error-tolerant systems

### Related Resources

- [API Reference: Concurrency](../../docs/api/libs/concurrency/)
- [Reference Notebooks: Concurrency](../references/concurrency.ipynb)