# Chat Completions Three

## Basic Connection and Packages

### Importing OpenAI and Initializing the Client

To begin, we'll import the `OpenAI` class from the `openai` library, which allows us to interact with the OpenAI API. Next, we initialize a client instance, which we'll use to send requests and receive responses from the OpenAI models.

In [None]:
"""
This script is a simple example of using the OpenAI API.
It uses the OpenAI Python client library to open a connection to the OpenAI API.
The client reads the OPENAI_API_KEY environment variable to authenticate.
"""
from openai import OpenAI

client = OpenAI()
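
Because the client depends on the `OPENAI_API_KEY` environment variable, it can help to check for the key before creating the client. The following is a minimal sketch; `require_api_key` is a hypothetical helper written for this illustration, not part of the `openai` library.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the openai library.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client.")
    return key


# Example with an explicit mapping, so no real key is read or printed.
example_key = require_api_key({"OPENAI_API_KEY": " sk-example "})
```

Calling `require_api_key()` with no arguments checks the real process environment and fails early with a readable message if the key is missing.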

## Stream Options

### Real-Time Responses Using the Stream Parameter

In this example, we introduce the `stream` parameter (`stream=True`) to enable real-time streaming of the model's output. Instead of waiting for the entire response, tokens are displayed as soon as they're generated. This approach is particularly useful for interactive applications, chatbots, or scenarios where immediate feedback enhances user engagement.

In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the stream parameter to dynamically show tokens to the user in real-time.
In this case, we want the response to start showing as soon as possible.
"""

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "developer", "content": "You are a brilliant author of children's books."},
        {"role": "user", "content": "Write two paragraphs about a frog."},
    ],
    response_format=None,
    temperature=None,
    max_completion_tokens=None,
    stop=None,
    top_p=None,
    frequency_penalty=None,
    presence_penalty=None,
    stream=True,  # Return the response incrementally as chunks
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
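
Applications often need the complete text as well as the live output. The sketch below prints each piece as it arrives and also returns the concatenated result. To run without an API call, it uses stand-in objects built with `SimpleNamespace` that mimic only the `choices[0].delta.content` shape used above; `fake_chunk` and `collect_text` are hypothetical names introduced for this illustration.

```python
from types import SimpleNamespace


def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(index=0, delta=delta)])


def collect_text(chunks, echo=True):
    """Optionally print each delta as it arrives; return the full text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        piece = chunk.choices[0].delta.content
        if piece is not None:
            if echo:
                print(piece, end="", flush=True)
            parts.append(piece)
    return "".join(parts)


demo_chunks = [fake_chunk("Once "), fake_chunk(None), fake_chunk("upon a pond.")]
full_text = collect_text(demo_chunks, echo=False)
```

With a real stream, the same function could be called as `collect_text(stream)`, assuming the chunks expose the fields shown in the loop above.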

### Displaying Token Usage Statistics with `stream_options`

In this example, we introduce the `stream_options` parameter to retrieve token usage statistics from a streaming response. With `stream_options={"include_usage": True}`, the API sends one additional chunk just before the stream ends. That chunk has an empty `choices` list, and its `usage` field holds the totals for the whole request; on every other chunk, `usage` is `None`. The statistics include:

- **Prompt Tokens**: Number of tokens used in your prompt.
- **Completion Tokens**: Number of tokens generated by the model in response.
- **Total Tokens**: Combined total of prompt and completion tokens.

This approach is helpful for monitoring and managing token consumption during model interactions, especially when optimizing costs or tracking usage limits.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the stream_options parameter to show token usage statistics.
"""

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a brilliant author of children's books."},
        {"role": "user", "content": "Write one paragraph about a frog."}
    ],
    response_format=None,  
    temperature=None,
    max_completion_tokens=None, 
    stop=None,
    top_p=None, 
    frequency_penalty=None,
    presence_penalty=None,
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Handle content chunks
    if hasattr(chunk, 'choices') and len(chunk.choices) > 0:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="")
    # Handle usage or other non-content chunks
    elif hasattr(chunk, 'usage') and chunk.usage is not None:
        print("\n\nUsage Statistics:")
        print(f"Prompt Tokens: {chunk.usage.prompt_tokens}")
        print(f"Completion Tokens: {chunk.usage.completion_tokens}")
        print(f"Total Tokens: {chunk.usage.total_tokens}")
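
The same branching can be wrapped in a function that returns both the text and the usage object, which is convenient for logging. This is a sketch under the assumption that content chunks have a non-empty `choices` list and the usage chunk has an empty one, as in the loop above. The stand-in chunks and the names `content_chunk`, `usage_chunk`, and `consume` are hypothetical, and the token counts are made-up illustration values.

```python
from types import SimpleNamespace


def content_chunk(text):
    """Stand-in for a chunk that carries a piece of generated text."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(index=0, delta=delta)], usage=None)


def usage_chunk(prompt_tokens, completion_tokens):
    """Stand-in for the final chunk: no choices, only usage totals."""
    usage = SimpleNamespace(
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
    )
    return SimpleNamespace(choices=[], usage=usage)


def consume(chunks):
    """Return (full_text, usage); usage stays None if no usage chunk arrives."""
    parts, usage = [], None
    for chunk in chunks:
        if chunk.choices:
            piece = chunk.choices[0].delta.content
            if piece is not None:
                parts.append(piece)
        elif getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage


# Made-up token counts, for illustration only.
text, usage = consume([content_chunk("A frog "), content_chunk("jumps."), usage_chunk(20, 5)])
```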

### Inspecting Raw Stream Chunks for Debugging

In this example, we print each raw `chunk` returned by the streaming response. Inspecting raw chunks shows how the streamed data is structured, which is valuable when debugging code that must handle different kinds of chunks, such as content chunks and the chunk containing usage statistics.


In [None]:
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a brilliant author of children's books."},
        {"role": "user", "content": "Write one paragraph about a frog."}
    ],
    response_format=None,  
    temperature=None,
    max_completion_tokens=None, 
    stop=None,
    top_p=None, 
    frequency_penalty=None,
    presence_penalty=None,
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Print the raw chunk data for inspection
    print("\nRaw Chunk:", chunk)
    
    # Handle content chunks
    if hasattr(chunk, 'choices') and len(chunk.choices) > 0:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="")
    # Handle usage or other non-content chunks
    elif hasattr(chunk, 'usage') and chunk.usage is not None:
        print("\n\nUsage Statistics:")
        print(f"Prompt Tokens: {chunk.usage.prompt_tokens}")
        print(f"Completion Tokens: {chunk.usage.completion_tokens}")
        print(f"Total Tokens: {chunk.usage.total_tokens}")
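
When raw output gets long, a one-word label per chunk can make the stream easier to scan. The sketch below classifies chunks by the fields they carry. It assumes each choice exposes `delta.content` and `finish_reason`, and it runs on stand-in objects rather than real API responses; `make_chunk` and `describe_chunk` are hypothetical helpers, and the labels are arbitrary.

```python
from types import SimpleNamespace


def make_chunk(content=None, finish_reason=None, usage=None, with_choice=True):
    """Build a stand-in chunk (illustration only, not an openai type)."""
    choices = []
    if with_choice:
        delta = SimpleNamespace(content=content)
        choices = [SimpleNamespace(index=0, delta=delta, finish_reason=finish_reason)]
    return SimpleNamespace(choices=choices, usage=usage)


def describe_chunk(chunk):
    """Return a short label for the kind of chunk, based on its fields."""
    choices = getattr(chunk, "choices", None) or []
    if not choices:
        has_usage = getattr(chunk, "usage", None) is not None
        return "usage" if has_usage else "empty"
    choice = choices[0]
    if choice.finish_reason is not None:
        return f"finish:{choice.finish_reason}"
    if choice.delta.content:
        return "content"
    return "no-content"


demo = [
    make_chunk(content=""),
    make_chunk(content="Ribbit"),
    make_chunk(finish_reason="stop"),
    make_chunk(with_choice=False, usage=SimpleNamespace(total_tokens=12)),
]
labels = [describe_chunk(c) for c in demo]
```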

## Modalities

### Specifying Output Modality with the `modalities` Parameter

In this example, we introduce the `modalities` parameter to explicitly define the type of content the OpenAI API returns. By setting `modalities=["text"]`, we specify that the model's output should be purely text-based. This parameter is useful with multimodal models that can produce more than one output type (for example, text and audio), because it lets you control which output types are returned.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the modalities parameter to set the type of output we receive.
In this case, we want the response to be text-based.
"""

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "developer", "content": "You are a brilliant author of children's books."},
        {"role": "user", "content": "Write two paragraphs about a frog."},
    ],
    response_format=None,
    temperature=None,
    max_completion_tokens=None,
    stop=None,
    top_p=None,
    frequency_penalty=None,
    presence_penalty=None,
    stream=True,
    modalities=["text"],  # Request text-only output
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

## n

### Generating Multiple Completions Using the `n` Parameter

In this example, we introduce the `n` parameter, which specifies the number of distinct completions to generate from a single prompt. By setting `n=3`, we request three separate responses from the model for the same input. This approach is valuable for generating diverse ideas, exploring multiple writing styles, or choosing from alternative outputs.

The maximum possible value for `n` is `128` completions.

The script collects each of the three completions individually from the stream and prints them separately for easy comparison.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the n parameter to set the number of completions we receive.
"""

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "developer", "content": "You are a brilliant author of children's books."},
        {"role": "user", "content": "Write two paragraphs about a frog."}
    ],
    response_format=None,  
    temperature=None,
    max_completion_tokens=None, 
    stop=None,
    top_p=None, 
    frequency_penalty=None,
    presence_penalty=None,
    stream=True,
    modalities=None,
    n=3,  # Number of completions to generate 
)

# Dictionary to store the content of each completion
completions = {0: "", 1: "", 2: ""}

# Iterate over the stream chunks
for chunk in stream:
    # Loop through all choices in the current chunk
    for choice in chunk.choices:
        if choice.delta.content is not None:
            # Append the content to the corresponding completion
            completions[choice.index] += choice.delta.content

# Print all three completions
for i in range(3):
    print(f"\nCompletion {i + 1}:\n{completions[i]}\n{'-' * 50}")
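
The dictionary above is sized by hand for `n=3`. A variant that works for any `n` can create a buffer the first time each `choice.index` appears. This is a sketch using stand-in objects in place of real chunks; `fake_choice` and `collect_by_index` are hypothetical names introduced for this illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def fake_choice(index, text):
    """Stand-in for one streamed choice (illustration only)."""
    return SimpleNamespace(index=index, delta=SimpleNamespace(content=text))


def collect_by_index(chunks):
    """Group streamed deltas by choice index; return {index: full_text}."""
    buffers = defaultdict(list)
    for chunk in chunks:
        for choice in chunk.choices:
            if choice.delta.content is not None:
                buffers[choice.index].append(choice.delta.content)
    return {index: "".join(parts) for index, parts in sorted(buffers.items())}


# Deltas for different completions may arrive interleaved.
demo = [
    SimpleNamespace(choices=[fake_choice(1, "Hop ")]),
    SimpleNamespace(choices=[fake_choice(0, "Splash ")]),
    SimpleNamespace(choices=[fake_choice(1, "hop."), fake_choice(0, "splash.")]),
    SimpleNamespace(choices=[]),
]
completions_by_index = collect_by_index(demo)
```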

### Can We Stream All Three Simultaneously?

When `stream=True` and `n>1`, the OpenAI API doesn't guarantee simultaneous, perfectly synchronized streaming of all completions. Instead:

- **Each chunk** may contain updates for one or more of the completions, depending on how the model generates them.
- **Chunks arrive sequentially**, and each chunk's choices reflect the progress of each completion at that moment.
- **Real-time parallel streaming**, such as displaying multiple completions side by side as they grow, is not built in: the API delivers every completion through a single sequential event stream, so a side-by-side display requires your own code to route each delta by `choice.index`.


### Generating Multiple Completions without Streaming

In this example, we use the `n` parameter along with `stream=False` to generate multiple completions simultaneously without streaming. By setting `n=3`, the API returns three distinct completions in a single response, each accessible individually. This method simplifies gathering multiple responses, making it straightforward to review, compare, and select from multiple generated outputs.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the n parameter to set the number of completions we receive.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "developer", "content": "You are a brilliant author of children's books."},
        {"role": "user", "content": "Write one sentence about a frog."}
    ],
    response_format=None,  
    temperature=None,
    max_completion_tokens=None, 
    stop=None,
    top_p=None, 
    frequency_penalty=None,
    presence_penalty=None,
    stream=False,  # Disable streaming
    n=3,  # Number of completions to generate
)

# Print all three completions
for i, choice in enumerate(response.choices):
    print(f"\nCompletion {i + 1}:\n{choice.message.content}\n{'-' * 50}")

### Adding User, Store, and Metadata Parameters for Enhanced Tracking

In this example, we use additional parameters—`user`, `store`, and `metadata`—with the OpenAI API. These parameters provide ways to track and organize your API interactions:

- **`user`**: Assigns an identifier (`"user_id_123"`) to associate completions with a specific user.
- **`store`**: Indicates (`True`) that the API should retain the interaction, allowing later retrieval or analysis.
- **`metadata`**: Adds custom structured data (such as project details, audience age, priority, and genre) to help categorize or filter interactions within your application or analytics pipeline.

**Note:** The `store` parameter must be set to `True` in order to use the `metadata` parameter.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the user, store, and metadata parameters to set values that we can use later.
"""

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "developer", "content": "You are a brilliant author of children's books."},
        {"role": "user", "content": "Write two paragraphs about a frog."},
    ],
    response_format=None,
    temperature=None,
    max_completion_tokens=None,
    stop=None,
    top_p=None,
    frequency_penalty=None,
    presence_penalty=None,
    stream=True,
    modalities=None,
    user="user_id_123",  # Identifier for the end user
    store=True,  # Retain the interaction for later retrieval
    metadata={
        "project": "Frog Adventures Book Series",
        "audience_age": "5-8",
        "priority": "medium",
        "genre": "educational fiction",
    },
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
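
Metadata that breaks the API's constraints is rejected, so checking it locally first can save a round trip. The sketch below assumes the limits are 16 key-value pairs, string keys of at most 64 characters, and string values of at most 512 characters. These limits are an assumption to verify against the current API reference, and `metadata_problems` is a hypothetical helper, not part of the `openai` library.

```python
# Assumed limits; confirm against the current API reference before relying on them.
MAX_PAIRS = 16
MAX_KEY_LEN = 64
MAX_VALUE_LEN = 512


def metadata_problems(metadata):
    """Return a list of human-readable problems; an empty list means it looks valid."""
    problems = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"too many pairs: {len(metadata)} > {MAX_PAIRS}")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"{key!r}: keys and values must be strings")
            continue
        if len(key) > MAX_KEY_LEN:
            problems.append(f"{key!r}: key longer than {MAX_KEY_LEN} characters")
        if len(value) > MAX_VALUE_LEN:
            problems.append(f"{key!r}: value longer than {MAX_VALUE_LEN} characters")
    return problems


book_metadata = {
    "project": "Frog Adventures Book Series",
    "audience_age": "5-8",
    "priority": "medium",
    "genre": "educational fiction",
}
issues = metadata_problems(book_metadata)
```

Note that values such as `"5-8"` are passed as strings; a numeric value like `5` would be flagged by this check.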