# Byte #2: Calling Models Asynchronously (Fast Calls) ⚡

**⏱️ Time to Complete: 5-10 minutes**

## AsyncIO in Python
 
Asynchronous programming lets Python juggle multiple tasks without waiting for each one to finish before starting the next. Instead of blocking on slow operations, code yields control back to an event loop, which resumes work as soon as results are ready.
 
Picture a restaurant shift with one chef:
 
- **🐌 Synchronous shift:** Cook one order to completion before starting the next, which means long waits when dishes take time.
- **⚡ AsyncIO shift:** Prep several orders, sending each to simmer while you season the next. The chef keeps moving, and diners eat sooner.
 
### Core Ideas
- **Event loop:** The scheduler that keeps track of what should run next.
- **Coroutines (`async def`):** Functions that can pause themselves with `await` and resume later.
- **Awaitables:** Objects you can `await`: coroutines, tasks, or anything implementing `__await__`.
- **Tasks:** Wrappers created with `asyncio.create_task` so the event loop can run multiple coroutines concurrently.
- **Non-blocking I/O:** The ideal use case: network calls, file reads, timers, or anything else that spends time waiting on external work.
 
AsyncIO doesn't give Python extra CPU cores; it simply keeps your program busy while I/O waits resolve.
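
To make the vocabulary concrete, here is a minimal sketch. The `fetch_number` coroutine is a hypothetical stand-in for any I/O-bound call. It shows that calling an `async def` function only builds a coroutine object, while `asyncio.create_task` schedules the work on the event loop right away.

```python
import asyncio


async def fetch_number(n: int) -> int:
    """Hypothetical I/O-bound call: sleeps briefly, then returns n doubled."""
    await asyncio.sleep(0.01)
    return n * 2


async def demo() -> list[int]:
    # Calling the function builds a coroutine object; no code has run yet.
    coro = fetch_number(1)
    # create_task wraps a coroutine in a Task and schedules it on the loop.
    task = asyncio.create_task(fetch_number(2))
    # Both are awaitables, so both can be awaited for their result.
    first = await coro
    second = await task
    return [first, second]


# In a script: asyncio.run(demo())    In a notebook: await demo()
```

Forgetting the `await` on a coroutine is a common slip: the function body never runs and Python emits a "coroutine was never awaited" warning.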

## Why Use Async with OpenAI?
 
Now that you know how AsyncIO keeps Python busy while I/O waits resolve, apply the same pattern to OpenAI calls. Every request is network-bound, so overlapping them can slash total runtime when you have multiple prompts to evaluate.
 
Use async when you need to:
- Ask multiple models the same question (compare responses)
- Process many questions at once
- Call different APIs simultaneously
- Save time when making multiple API calls

### AsyncIO Building Blocks in Practice
 
1. Define coroutines with `async def`.
2. Pause inside those coroutines with `await` whenever you hit an I/O wait.
3. Group coroutines into tasks so the event loop can juggle them.
4. Run the top-level coroutine: use `await main()` in notebooks, where an event loop is already running, or `asyncio.run(main())` in scripts, where you have to start one.
 
This pattern scales from simple timers to complex services that stream data, call APIs, or orchestrate background jobs.

In [None]:
import asyncio

async def boil_pasta(name: str, seconds: int) -> str:
    """Simulate a slow kitchen task."""
    await asyncio.sleep(seconds)
    return f"{name} done in {seconds}s"

async def main():
    # Start two pots at once; the event loop keeps track of both timers.
    tasks = [asyncio.create_task(boil_pasta("Rigatoni", 2)),
             asyncio.create_task(boil_pasta("Farfalle", 3))]

    for result in await asyncio.gather(*tasks):
        print(result)

await main()

## The Performance Difference

### Side-by-Side Timing Sketch


```text
Synchronous (one at a time)
- Call Model 1 → wait 2 seconds
- Call Model 2 → wait 2 seconds
- Call Model 3 → wait 2 seconds
- Total: 6 seconds

Asynchronous (all at once)
- Dispatch all 3 requests together
- Wait 2 seconds while they run concurrently
- Total: 2 seconds ✨
```
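
The same arithmetic can be checked without spending any tokens. The sketch below swaps the real API for a hypothetical `fake_model_call` that just sleeps for a simulated latency (0.2s is an arbitrary choice), then times the sequential and concurrent versions.

```python
import asyncio
import time


async def fake_model_call(name: str, latency: float = 0.2) -> str:
    """Stand-in for a network request; the latency is simulated."""
    await asyncio.sleep(latency)
    return f"{name} replied"


async def run_sequential(names: list[str]) -> float:
    start = time.perf_counter()
    for name in names:
        await fake_model_call(name)  # each call waits for the previous one
    return time.perf_counter() - start


async def run_concurrent(names: list[str]) -> float:
    start = time.perf_counter()
    # gather starts every call before waiting on any of them
    await asyncio.gather(*(fake_model_call(name) for name in names))
    return time.perf_counter() - start


async def compare() -> tuple[float, float]:
    names = ["model-1", "model-2", "model-3"]
    sequential = await run_sequential(names)
    concurrent = await run_concurrent(names)
    print(f"Sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
    return sequential, concurrent


# In a script: asyncio.run(compare())    In a notebook: await compare()
```

With three simulated calls, the sequential run should take roughly three times the latency and the concurrent run roughly one. Real API calls vary more, because each request has its own latency.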

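## Setup: Install Async OpenAI

`AsyncOpenAI` ships in the same `openai` package as the synchronous client, so `pip install --upgrade openai` covers both. By default the client reads your API key from the `OPENAI_API_KEY` environment variable.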

## Basic Async Example: Single Call

In [None]:
import asyncio
from openai import AsyncOpenAI

async def ask_ai(question):
    """Ask a question asynchronously using the Responses API."""
    client = AsyncOpenAI()

    response = await client.responses.create(
        model="gpt-5-nano",
        input=[
            {
                "role": "user",
                "content": [{"type": "input_text", "text": question}],
            }
        ],
    )

    return response.output_text

async def main():
    answer = await ask_ai("What is Python?")
    print(answer)

await main()

**Key Differences from Sync:**
1. Use `AsyncOpenAI` instead of `OpenAI`
2. Functions are defined with `async def`
3. Use `await` before async operations
4. Run the coroutine with `await main()` here (or `asyncio.run(main())` in a standalone script)
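
The last point tends to trip people up when moving code between notebooks and scripts. Here is a small sketch of how to tell the two situations apart using `asyncio.get_running_loop()`, which raises `RuntimeError` when no loop is running. The helper name `loop_is_running` is ours, not part of any library.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True when called from inside a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def main() -> bool:
    # Inside a coroutine, a loop is always running.
    return loop_is_running()


# Plain script: no loop yet, so asyncio.run(main()) is the entry point.
# Notebook: the kernel already runs a loop, so use `await main()` instead;
# calling asyncio.run() there raises RuntimeError.
```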

## Calling Multiple Models at Once

In [None]:
import asyncio
from openai import AsyncOpenAI
import time

async def ask_model(client, model, question):
    """Ask one model a question via the Responses API."""
    start = time.time()

    response = await client.responses.create(
        model=model,
        input=[
            {
                "role": "user",
                "content": [{"type": "input_text", "text": question}],
            }
        ],
    )

    duration = time.time() - start

    return {
        "model": model,
        "answer": response.output_text,
        "tokens": response.usage.total_tokens,
        "time": duration,
    }

async def compare_models(question):
    """Ask the same question to multiple models simultaneously."""
    client = AsyncOpenAI()

    # List of models to compare
    models = ["gpt-5-nano", "gpt-4.1"]

    # Build one coroutine per model; gather() wraps each one in a task
    tasks = [ask_model(client, model, question) for model in models]

    # Run them concurrently; results come back in the same order as `models`
    results = await asyncio.gather(*tasks)

    return results

async def main():
    print("Asking multiple models the same question...\n")

    results = await compare_models("Explain async programming in one sentence")

    for result in results:
        print(f"Model: {result['model']}")
        print(f"Answer: {result['answer']}")
        print(f"Tokens: {result['tokens']}")
        print(f"Time: {result['time']:.2f}s")
        print("-" * 60)

await main()
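
One detail worth knowing when comparing models: `asyncio.gather` returns results in the order the awaitables were passed in, not the order they finished. If you would rather handle each answer as soon as it arrives, `asyncio.as_completed` is an alternative. The sketch below uses a hypothetical `fake_model` with simulated latencies in place of real API calls.

```python
import asyncio


async def fake_model(name: str, latency: float) -> str:
    """Stand-in for a model call with a simulated latency."""
    await asyncio.sleep(latency)
    return name


async def demo() -> tuple[list[str], list[str]]:
    specs = [("slow", 0.15), ("fast", 0.05), ("medium", 0.10)]

    # gather: results line up with the order of the inputs.
    gathered = await asyncio.gather(*(fake_model(n, s) for n, s in specs))

    # as_completed: results arrive in the order the calls finish.
    finished = []
    for next_done in asyncio.as_completed([fake_model(n, s) for n, s in specs]):
        finished.append(await next_done)

    return list(gathered), finished


# In a script: asyncio.run(demo())    In a notebook: await demo()
```

`gather` is the simpler choice when you need every answer before continuing; `as_completed` suits progress displays or streaming results to a user.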

## Speed Comparison: Sync vs Async

In [None]:
import time

from openai import OpenAI

questions = [
    "What is Python?",
    "What is JavaScript?",
]


# SYNCHRONOUS VERSION (Slow)
def sync_multiple_calls():
    """Make one call per question, one at a time, using the Responses API."""
    client = OpenAI()

    start = time.time()

    results = []
    for question in questions:
        response = client.responses.create(
            model="gpt-5-nano",
            input=[
                {
                    "role": "user",
                    "content": [{"type": "input_text", "text": question}],
                }
            ],
        )
        results.append(response.output_text)

    duration = time.time() - start
    print(f"⏱️  Synchronous: {duration:.2f} seconds")
    return results


In [None]:

# ASYNCHRONOUS VERSION (Fast)
async def async_multiple_calls():
    """Make one call per question, all at once, using the Responses API."""
    client = AsyncOpenAI()

    start = time.time()

    async def ask(question):
        response = await client.responses.create(
            model="gpt-5-nano",
            input=[
                {
                    "role": "user",
                    "content": [{"type": "input_text", "text": question}],
                }
            ],
        )
        return response.output_text

    # Build one coroutine per question
    tasks = [ask(q) for q in questions]

    # Run them concurrently on the event loop
    results = await asyncio.gather(*tasks)

    duration = time.time() - start
    print(f"⚡ Asynchronous: {duration:.2f} seconds")
    return results

print(f"Making {len(questions)} API calls each way...\n")
sync_multiple_calls()
await async_multiple_calls()

## Practical Example: Batch Question Processor

In [None]:
async def process_questions_batch(questions, model="gpt-5-mini"):
    """Process multiple questions efficiently via the Responses API."""
    client = AsyncOpenAI()

    async def get_answer(question):
        response = await client.responses.create(
            model=model,
            input=[
                {
                    "role": "user",
                    "content": [{"type": "input_text", "text": question}],
                }
            ],
        )
        return {
            "question": question,
            "answer": response.output_text,
            "tokens": response.usage.total_tokens,
        }

    # Process all questions in parallel
    tasks = [get_answer(q) for q in questions]
    results = await asyncio.gather(*tasks)

    return results

# Example usage
async def main():
    questions = [
        "What is machine learning?",
        "What is deep learning?",
        "What is a neural network?",
        "What is GPT?",
        "What is the transformer architecture?",
    ]

    print(f"Processing {len(questions)} questions...\n")

    results = await process_questions_batch(questions)

    total_tokens = 0
    for i, result in enumerate(results, 1):
        print(f"{i}. Q: {result['question']}")
        print(f"   A: {result['answer'][:80]}...")
        print(f"   Tokens: {result['tokens']}")
        total_tokens += result['tokens']
        print()

    print(f"📊 Total tokens used: {total_tokens}")

await main()
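
Sending every question at once works for a handful of prompts, but a large batch may run into API rate limits. One common way to cap the number of in-flight requests is an `asyncio.Semaphore`. The sketch below is an illustration under assumptions: the `run_limited` helper and the limit of 2 are ours, and `asyncio.sleep` stands in for the real API call.

```python
import asyncio


async def run_limited(items: list[str], limit: int = 2) -> tuple[list[str], int]:
    """Process items concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # highest number of simultaneous calls observed

    async def worker(item: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while `limit` calls are active
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.05)  # stand-in for the real API call
            in_flight -= 1
            return item.upper()

    results = await asyncio.gather(*(worker(item) for item in items))
    return list(results), peak


# In a script: asyncio.run(run_limited(["a", "b", "c"]))
# In a notebook: await run_limited(["a", "b", "c"])
```

In the batch processor, the same pattern would wrap the `client.responses.create` call inside the `async with semaphore:` block. A sensible limit depends on your account's rate limits.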

## Error Handling in Async

In [None]:
import asyncio
from openai import AsyncOpenAI

async def safe_api_call(client, question, model="gpt-5-mini"):
    """API call with error handling using the Responses API."""
    try:
        response = await client.responses.create(
            model=model,
            input=[
                {
                    "role": "user",
                    "content": [{"type": "input_text", "text": question}],
                }
            ],
        )
        return {
            "success": True,
            "answer": response.output_text,
        }
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
        }

async def main():
    client = AsyncOpenAI()

    questions = [
        "What is Python?",
        "What is AI?",
        "What is blockchain?",
    ]

    tasks = [safe_api_call(client, q) for q in questions]
    results = await asyncio.gather(*tasks)

    for i, result in enumerate(results):
        if result["success"]:
            print(f"✅ Question {i+1}: {result['answer'][:50]}...")
        else:
            print(f"❌ Question {i+1}: Error - {result['error']}")

await main()
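
Wrapping each call in `try`/`except` is one option. Another is to let `asyncio.gather` collect the exceptions for you with `return_exceptions=True`, optionally combined with `asyncio.wait_for` to put a deadline on each call. The sketch below uses a hypothetical `flaky_call` with simulated failures and delays; the 0.2s timeout is arbitrary.

```python
import asyncio


async def flaky_call(question: str, latency: float, fail: bool = False) -> str:
    """Hypothetical stand-in for an API call that may fail or be slow."""
    await asyncio.sleep(latency)
    if fail:
        raise ValueError(f"simulated failure for {question!r}")
    return f"answer to {question!r}"


async def demo() -> list[object]:
    calls = [
        flaky_call("ok", 0.01),
        flaky_call("broken", 0.01, fail=True),
        flaky_call("too slow", 1.0),
    ]
    # wait_for cancels a call that exceeds the timeout; return_exceptions=True
    # makes gather hand back exceptions as values instead of raising the first.
    return await asyncio.gather(
        *(asyncio.wait_for(call, timeout=0.2) for call in calls),
        return_exceptions=True,
    )


def summarize(results: list[object]) -> list[str]:
    """Label each gathered result as ok, error, or timeout."""
    labels = []
    for result in results:
        if isinstance(result, asyncio.TimeoutError):
            labels.append("timeout")
        elif isinstance(result, Exception):
            labels.append("error")
        else:
            labels.append("ok")
    return labels


# In a script: print(summarize(asyncio.run(demo())))
# In a notebook: print(summarize(await demo()))
```

Without `return_exceptions=True`, the first exception propagates out of `gather` and the other results are lost to the caller, which is why the per-call `try`/`except` in `safe_api_call` is needed in the version above.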

## When to Use Async vs Sync

| Scenario | Use This | Why |
|----------|----------|-----|
| Single API call | Sync | Simpler code |
| Multiple independent API calls | Async | Network waits overlap, so total time drops |
| Real-time chatbot | Sync | Sequential conversation |
| Batch processing | Async | Process many at once |
| Comparing models | Async | Get all responses together |
| Simple script | Sync | Easier to understand |

## 🎯 Your Practice Tasks

1. **Task 1:** Create an async function that asks 3 different models the same question and compares their response times.

2. **Task 2:** Process this list of questions asynchronously:
   ```python
   questions = [
       "Define recursion",
       "What is a REST API?",
       "Explain Python decorators",
       "What is async/await?"
   ]
   ```

3. **Task 3:** Time the difference - run 10 API calls synchronously, then asynchronously. Calculate the speedup.

## ✅ Key Takeaways

✓ Async = overlapping the wait time of multiple I/O-bound tasks on a single thread (concurrency, not extra CPU cores)

✓ Use `AsyncOpenAI` for async calls

✓ Use `await` for async operations

✓ Use `asyncio.gather()` to run multiple tasks together

✓ For N independent calls, total time approaches the slowest single call instead of the sum of all calls (rate limits permitting)

✓ Perfect for batch processing and model comparison

---

**🎉 Congratulations!** You now know how to efficiently work with OpenAI models!

## Quick Reference Cheat Sheet

In [None]:
# Sync (simple, one at a time)
from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-5-nano",
    input=[
        {
            "role": "user",
            "content": [{"type": "input_text", "text": "Explain AsyncIO in one sentence."}],
        }
    ],
)
print(response.output_text)

In [None]:
# Async (fast, many at once)
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI()
    response = await client.responses.create(
        model="gpt-5-nano",
        input=[
            {
                "role": "user",
                "content": [{"type": "input_text", "text": "Explain AsyncIO in one sentence."}],
            }
        ],
    )
    print(response.output_text)

await main()