# Real-Time Interaction with Streaming

Hello everyone. So far, our API calls have followed a standard pattern: we send a request, wait for the *entire* response to be generated, and then receive it all at once. This is fine for quick tasks, but for longer generations, it can lead to a poor user experience with a lot of waiting.

This is where **streaming** comes in. Instead of waiting for the full response, streaming lets us receive the output **a few tokens at a time, as they're being generated.** The total generation time stays roughly the same, but the user sees the first words almost immediately. This is why applications like ChatGPT feel so fast and responsive.

Let's see how to implement this simple but powerful feature.

## Setup

To appreciate the benefit of streaming, we need a task that isn't instantaneous. We'll ask the model to explain a programming concept. This will take a few seconds, making the difference between blocking and streaming very clear.

In [None]:
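import time  # for timing requests
import litellm
from textwrap import dedent
from dotenv import load_dotenv

# Load API keys (e.g. OPENAI_API_KEY) from a local .env file
load_dotenv()

MODEL_NAME = "openai/gpt-4o-mini"  # litellm "provider/model" identifier
MAX_TOKENS = 200  # cap on the length of the generated reply

long_prompt = [
    {
        "role": "user",
        "content": dedent("""
            You are a senior Python developer tutoring a junior colleague.
            Explain the difference between concurrency and parallelism in Python.
            Keep it concise.
        """).strip(),  # drop the blank first and last lines left by the triple quotes
    }
]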

## The Blocking Request

First, let's make a standard API call. When you run the cell below, pay close attention to the pause between when you execute it and when the full text appears. Nothing is shown during that pause, even though the model is already producing text. This silent wait is what we want to eliminate.
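
A blocking call along these lines might look like the sketch below. It assumes litellm's `completion` function and the OpenAI-style response shape (`response.choices[0].message.content`). The helper names `timed_call` and `blocking_request` are hypothetical, introduced only for illustration, and `litellm` is imported inside the function so the timing helper can be tried without an API key.

```python
import time


def timed_call(make_request):
    """Run a zero-argument callable and return (result, seconds elapsed)."""
    start = time.perf_counter()
    result = make_request()
    return result, time.perf_counter() - start


def blocking_request(model, messages, max_tokens):
    """Send one non-streaming request and return the full reply text.

    Assumption: litellm is installed and an API key is configured.
    """
    import litellm  # imported lazily so the rest of this sketch runs offline

    response = litellm.completion(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
    )
    # Nothing is available until the whole reply has been generated.
    return response.choices[0].message.content
```

With the names from the Setup cell, one way to use it would be `text, elapsed = timed_call(lambda: blocking_request(MODEL_NAME, long_prompt, MAX_TOKENS))`, then printing `text` followed by `elapsed` to see how long the screen stayed empty.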

## The Streaming Request

Now, let's make the exact same request, but this time we'll add one crucial parameter: `stream=True`.

When we do this, the function no longer returns a complete response object. Instead, it returns an iterable stream object that behaves like a **generator**. We can loop over it to get each "chunk" of the response as it's created, so the first words can appear on screen almost immediately instead of after the whole response is done.
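
The loop described above can be sketched as follows. This assumes OpenAI-style chunks, where each chunk's new text sits in `chunk.choices[0].delta.content` and may be `None` (typically on the final chunk). The names `stream_text`, `print_stream`, `streaming_request`, and `fake_chunk` are hypothetical helpers. `fake_chunk` exists only so the loop can be tried offline without calling the API.

```python
from types import SimpleNamespace


def stream_text(chunks):
    """Yield each non-empty text piece from an OpenAI-style chunk stream."""
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry metadata only
        piece = chunk.choices[0].delta.content
        if piece:  # skip None or empty deltas
            yield piece


def print_stream(chunks):
    """Print pieces as they arrive and return the assembled text."""
    parts = []
    for piece in stream_text(chunks):
        # end="" keeps pieces on one line; flush=True shows them right away
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


def streaming_request(model, messages, max_tokens):
    """Start a streaming request (assumes litellm and an API key are set up)."""
    import litellm  # imported lazily so the offline demo needs no install

    return litellm.completion(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
        stream=True,  # the one parameter that switches on streaming
    )


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute path (demo only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

Against the real API, the call would be `print_stream(streaming_request(MODEL_NAME, long_prompt, MAX_TOKENS))`. Collecting the pieces in a list while printing them keeps the full text available afterwards, since a stream can only be consumed once.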