# Image Generation

## Creating the Connection

### Importing OpenAI and Initializing the Client

To begin, we'll import the `OpenAI` class from the `openai` library, which allows us to interact with the OpenAI API. Next, we initialize a client instance, which we'll use to send requests and receive responses from the OpenAI models.

In [1]:
"""
This script is a simple example of using the OpenAI API.
It uses the OpenAI Python client library to open a connection to the OpenAI API.
The client looks for the OPENAI_API_KEY environment variable to authenticate.
"""
from openai import OpenAI

client = OpenAI()

## Extra packages to import

In [None]:
import time
from IPython.display import display, HTML, update_display

## Web Search
Using the Responses API, you can enable web search by configuring it in the tools array in an API request to generate content. Like any other tool, the model can choose to search the web or not based on the content of the input prompt.


In [2]:

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="What was a positive news story from today? give me a one paragraph summary and a link to the story.",
    stream=True,
)

for event in response:  
    if event.type == "response.output_text.delta": 
        print(event.delta, end='', flush=True)  

As of April 13, 2025, a significant positive development is the agreement between Israel and Hamas on a ceasefire and hostage release deal, aiming to halt a conflict that has resulted in over 46,000 Palestinian deaths over the past 15 months. The ceasefire is set to commence today and last for an initial six weeks. While its durability remains uncertain, human rights agencies have cautiously welcomed the agreement and urged all parties to adhere to its terms. ([positive.news](https://www.positive.news/society/good-news-stories-from-week-03-of-2025/?utm_source=openai)) 

### Output and Citations

Model responses that use the web search tool will include two parts:

1. A `web_search_call` output item with the ID of the search call.
2. A `message` output item containing:
   - The text result in `message.content[0].text`
   - Annotations `message.content[0].annotations` for the cited URLs

By default, the model's response will include inline citations for URLs found in the web search results. In addition to this, the `url_citation` annotation object will contain the URL, title and location of the cited source.
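As a rough sketch of how those two parts might be read from a non-streaming response, the helper below walks a list of output items and collects every `url_citation` annotation. It assumes the items expose `type`, `content`, `annotations`, `url`, `title`, `start_index`, and `end_index` as attributes, which matches the annotation printed later in this notebook. The name `extract_citations` is hypothetical, and the stand-in objects exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_citations(output_items):
    """Collect url_citation annotations from message output items."""
    citations = []
    for item in output_items:
        # Skip web_search_call items; only message items carry text and annotations.
        if getattr(item, "type", None) != "message":
            continue
        for part in item.content:
            for ann in getattr(part, "annotations", None) or []:
                if getattr(ann, "type", None) == "url_citation":
                    citations.append({
                        "url": ann.url,
                        "title": ann.title,
                        "start_index": ann.start_index,
                        "end_index": ann.end_index,
                    })
    return citations


# Stand-in objects shaped like the output described above (not real API data).
fake_output = [
    SimpleNamespace(type="web_search_call", id="ws_example"),
    SimpleNamespace(
        type="message",
        content=[
            SimpleNamespace(
                type="output_text",
                text="Example text (example.com)",
                annotations=[
                    SimpleNamespace(
                        type="url_citation",
                        url="https://example.com",
                        title="Example",
                        start_index=13,
                        end_index=26,
                    )
                ],
            )
        ],
    ),
]

for citation in extract_citations(fake_output):
    print(citation["title"], citation["url"])
```

With a real response, you would pass `response.output` from a call made without `stream=True`.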


In [None]:

# Define the prompt.
prompt = "Give me the first paragraph of one recent article about the US economy."

# Create the streaming response.
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input=prompt,
    stream=True,
)

# Initialize an empty string to hold the streamed text.
story_text = ""

# Create a single HTML display cell with styles to auto-wrap text.
# 'overflow-wrap: break-word;' and 'white-space: pre-wrap;' ensure the text fits the screen.
display_handle = display(
    HTML("<div style='max-width:100%; overflow-wrap: break-word; white-space: pre-wrap;'></div>"),
    display_id=True
)

# Process and stream each incoming event.
for event in response:
    if event.type == "response.output_text.delta":
        # If event.delta is a dict, extract the text; otherwise assume it's a plain string.
        chunk = event.delta.get("text", "") if isinstance(event.delta, dict) else event.delta
        story_text += chunk

        # Build the new HTML content.
        html_content = f"<div style='max-width:100%; overflow-wrap: break-word; white-space: pre-wrap;'>{story_text}</div>"

        # Update the same display cell with the new content.
        update_display(HTML(html_content), display_id=display_handle.display_id)
        
        # (Optional) Small pause to give the cell time to update and the user to read the text as it is produced
        # time.sleep(0.1)
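
The loop above inserts the streamed text into the HTML as-is, so any `<` or `&` in the model's output would be read as markup by the notebook. One possible way to guard against that is to escape the text before wrapping it. This is a minimal sketch; `render_wrapped` and `WRAP_STYLE` are hypothetical names, not part of the original cell.

```python
import html

# Same CSS as the display cell above, kept in one place.
WRAP_STYLE = "max-width:100%; overflow-wrap: break-word; white-space: pre-wrap;"


def render_wrapped(text):
    """Return the wrapping <div> with the text HTML-escaped."""
    return f"<div style='{WRAP_STYLE}'>{html.escape(text)}</div>"


print(render_wrapped("GDP grew <2% & inflation cooled"))
```

Inside the streaming loop, `html_content = render_wrapped(story_text)` would replace the f-string that builds the `<div>` by hand.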


### Location Data

To refine search results based on geography, you can specify an approximate user location using country, city, region, and/or timezone.

- The `type` field is fixed at "approximate".
- The `city` and `region` fields are free text strings, like Minneapolis and Minnesota respectively.
- The `country` field is a two-letter ISO country code, like US.
- The `timezone` field is an IANA timezone like America/Chicago.

ISO Country Codes:
- [ISO 3166-1 alpha-2 Codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2)

IANA Time Zones:
- [List of TZ time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones)
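
The field rules above can be sketched as a small builder that assembles the tool entry and rejects a malformed country code. `build_web_search_tool` is a hypothetical helper, and the two-uppercase-letter check is an assumption about what a sensible client-side validation would look like; the API performs its own validation.

```python
import re


def build_web_search_tool(country=None, city=None, region=None, timezone=None):
    """Build a web_search_preview tool entry with an approximate user location."""
    if country is not None and not re.fullmatch(r"[A-Z]{2}", country):
        raise ValueError(f"country must be a two-letter ISO code like 'US', got {country!r}")

    # The type field is fixed at "approximate".
    location = {"type": "approximate"}

    # Only include the fields that were actually supplied.
    for key, value in (("country", country), ("city", city),
                       ("region", region), ("timezone", timezone)):
        if value is not None:
            location[key] = value

    return {"type": "web_search_preview", "user_location": location}


tool = build_web_search_tool(
    country="US", city="Minneapolis", region="Minnesota", timezone="America/Chicago"
)
print(tool)
```

The returned dictionary can be passed as the single element of the `tools` list in `client.responses.create`.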



In [7]:
# Define a basic prompt.
prompt = (
    "Search the web and give me the first paragraph of one recent article from my area. Make sure to include the current date and time before the article."
)

# Create the streaming response.
response = client.responses.create(
    model="gpt-4o",
    tools=[{
        "type": "web_search_preview",
        "user_location": {
            "type": "approximate",
            "country": "US",
            "city": "Houston",
            "region": "Texas",
            "timezone": "America/Chicago",
        }
    }],
    input=prompt,
    stream=True,
)

story_text = ""
annotations = []  # We'll collect raw annotation objects here.

# Process each event from the response.
for event in response:
    # For text delta events (which deliver parts of the story text)
    if event.type == "response.output_text.delta":
        delta = event.delta
        # If delta is a dict, it may contain text and annotations.
        if isinstance(delta, dict):
            story_text += delta.get("text", "")
            # If the delta includes annotations, add them.
            if "annotations" in delta:
                annotations.extend(delta["annotations"])
        else:
            story_text += delta

    # Some annotation events come separately.
    elif event.type == "response.output_text.annotation.added":
        # These events include an 'annotation' attribute.
        annotations.append(event.annotation)

# Optionally, sort annotations by their starting index.
annotations.sort(key=lambda ann: ann.start_index if hasattr(ann, 'start_index') else 0)

# Print the final raw story text.
print("Story Summary:\n")
print(story_text.strip())

# Print the raw annotation objects.
print("\nRaw Annotations:")
for ann in annotations:
    # Print the annotation object's repr (an AnnotationURLCitation instance, not a dict).
    print(ann)


Story Summary:

As of Sunday, April 13, 2025, at 09:19:58 AM CDT, here's the first paragraph of a recent article from your area:

"Tyler Anderson delivered a stellar performance, taking a no-hitter into the sixth inning and helping the Los Angeles Angels defeat the Houston Astros 4-1 on Saturday." ([reuters.com](https://www.reuters.com/sports/baseball/tyler-anderson-sparkles-angels-pound-two-homers-win-over-astros-2025-04-13/?utm_source=openai))

Raw Annotations:
AnnotationURLCitation(end_index=433, start_index=283, title='Tyler Anderson sparkles, Angels pound 2 HRs in win over Astros', type='url_citation', url='https://www.reuters.com/sports/baseball/tyler-anderson-sparkles-angels-pound-two-homers-win-over-astros-2025-04-13/?utm_source=openai')


In [15]:

# Define a basic prompt.
prompt = (
    "Search the web and give me the first paragraph of one recent article from my area. Make sure to include the current date and time before the article."
)

# Create the streaming response.
response = client.responses.create(
    model="gpt-4o",
    tools=[{
        "type": "web_search_preview",
        "user_location": {
            "type": "approximate",
            "country": "US",
            "city": "Houston",
            "region": "Texas",
            "timezone": "America/Chicago",
        }
    }],
    input=prompt,
    stream=True,
)

# Initialize a string to accumulate the story text and a list for annotations.
story_text = ""
annotations = []  

# Create a single HTML display cell with CSS for auto-wrapping.
display_handle = display(
    HTML("<div style='max-width:100%; overflow-wrap: break-word; white-space: pre-wrap;'></div>"),
    display_id=True
)

# Process each incoming event from the streaming response.
for event in response:
    if event.type == "response.output_text.delta":
        # If delta is a dict, extract the text and annotations (if any).
        if isinstance(event.delta, dict):
            text_chunk = event.delta.get("text", "")
            story_text += text_chunk
            if "annotations" in event.delta:
                annotations.extend(event.delta["annotations"])
        else:
            # Otherwise, assume delta is just a text string.
            story_text += event.delta

        # Build updated HTML content that auto-wraps text.
        html_content = f"""
        <div style='max-width:100%; overflow-wrap: break-word; white-space: pre-wrap;'>
            {story_text}
        </div>
        """

        # Update the same output cell.
        update_display(HTML(html_content), display_id=display_handle.display_id)
        
        # time.sleep(0.1)  # Optional brief pause for smoother updates.

    elif event.type == "response.output_text.annotation.added":
        annotations.append(event.annotation)

# Once streaming has finished, append any annotations at the bottom.
if annotations:
    annotations_html = "<hr><strong>Raw Annotations:</strong><br>" + "<br>".join(str(ann) for ann in annotations)
    final_html = f"""
    <div style='max-width:100%; overflow-wrap: break-word; white-space: pre-wrap;'>
        {story_text}
        {annotations_html}
    </div>
    """
else:
    final_html = f"""
    <div style='max-width:100%; overflow-wrap: break-word; white-space: pre-wrap;'>
        {story_text}
    </div>
    """

# Final update to the same cell with the complete output.
update_display(HTML(final_html), display_id=display_handle.display_id)


# File Searching

## Base64
Next, we'll import Python's built-in `base64` library. This module allows us to encode or decode binary data (such as images or files) into a text-based representation, which is often required when working with images in API requests or responses.


In [None]:
import base64

# Helper function to encode images in base64
# Local image files are sent to the API as base64 data URLs (remote images can be passed by URL instead)
# Base64 encoding converts binary image data to a text string that can be safely transmitted
def encode_image(image_path):
    # Open the image file in binary read mode ("rb")
    with open(image_path, "rb") as image_file:
        # Read the binary data, encode it to base64, and convert to UTF-8 string
        # This format is required by the API as part of the data:image/jpeg;base64 URL format
        return base64.b64encode(image_file.read()).decode("utf-8")
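
The cells in this notebook hard-code `data:image/jpeg;base64,` because the sample image is a JPEG. For other formats, one option is to guess the MIME type from the file name and build the whole data URL in one step. This is a sketch under that assumption; `to_data_url` is a hypothetical helper, not something the OpenAI library provides.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Encode a local image as a data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Could not determine an image type for {image_path!r}")

    # Read the raw bytes and convert them to a base64 text string.
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")

    return f"data:{mime_type};base64,{encoded}"
```

The result can be used directly as the `image_url` value of an `input_image` content part.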

## Passing a Base64 Encoded Image
In the following code cell, we'll use `gpt-4o-mini` to analyze a Base64-encoded image. The model will examine the picture and generate a descriptive response, which we'll then print out. This demonstrates how the model can interpret visual content alongside text-based instructions.

<img src="./artifacts/mystery_gathering.jpg" width="512" height="512">



In [None]:
# Path to our image
image_path = "./artifacts/mystery_gathering.jpg"

# Getting the Base64 string - converting the binary image to a text representation
base64_image = encode_image(image_path)


response = client.responses.create(
    # Specify the model to use - gpt-4o-mini is a smaller, cheaper variant of GPT-4o
    model="gpt-4o-mini",
    input=[
        {
            # Define the role of the message sender (user in this case)
            "role": "user",
            "content": [
                # The text instruction for the model - what we want it to do with the image
                { "type": "input_text", "text": "Tell me what is in this image." },
                {
                    # The image data being sent to the model via base64 encoding
                    # This format allows us to send images from local files rather than URLs
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                },
            ],
        }
    ],
)

# Print the model's text response to the console
print(response.output_text)

## Detail: Low vs High Resolution
The `detail` parameter tells the model what level of detail to use when processing the image: `low`, `high`, or `auto` to let the model decide. If you omit the parameter, the model uses `auto`.

### Low Detail
You can save tokens and speed up responses by using `"detail": "low"`. This lets the model process the image with a budget of 85 tokens. The model receives a low-resolution 512px x 512px version of the image. This is fine if your use case doesn't require high-resolution detail (for example, if you're asking about the dominant shape or color in the image).

In [None]:
# Path to our image
image_path = "./artifacts/mystery_gathering.jpg"

# Getting the Base64 string - converting the image to base64 format
base64_image = encode_image(image_path)


response = client.responses.create(
    # Using gpt-4o-mini, a vision-capable model
    model="gpt-4o-mini",
    input=[
        {
            # Set the role to user for this conversation
            "role": "user",
            "content": [
                # The text prompt asking for detailed image analysis
                { "type": "input_text", "text": "Tell me all the details you can see in this picture." },
                {
                    # Include the base64-encoded image data
                    # The "detail":"low" parameter tells the model to use low-resolution analysis
                    # This saves tokens (85 tokens) and is suitable for basic image understanding
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                    "detail":"low"
                },
            ],
        }
    ],
)

# Print the model's response to the console
print(response.output_text)

### High Detail
You can give the model more detail to work with by using `"detail": "high"`. The model first sees the low-resolution image (using 85 tokens) and then receives detailed crops, at 170 tokens for each 512px x 512px tile.
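
To make the arithmetic concrete, here is a small estimator that uses only the figures quoted above: 85 tokens for the low-resolution overview and 170 tokens per 512px tile. It assumes tiles are counted by rounding up in each dimension and that the width and height are those of the image as the model sees it, so it ignores any resizing the service may apply first. The per-model figures may also differ, so treat this as an illustration rather than a billing calculator. `estimate_image_tokens` is a hypothetical name.

```python
import math

BASE_TOKENS = 85       # low-resolution overview, as stated above
TOKENS_PER_TILE = 170  # per 512px x 512px tile, as stated above
TILE_SIZE = 512


def estimate_image_tokens(width, height, detail="high"):
    """Estimate image tokens for an image of the given pixel size."""
    if detail == "low":
        # Low detail always costs the flat overview budget.
        return BASE_TOKENS
    # Assumption: partial tiles count as full tiles.
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


print(estimate_image_tokens(1024, 1024, "high"))  # 85 + 170 * 4 = 765
print(estimate_image_tokens(1024, 1024, "low"))   # 85
```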

In [None]:
# Path to our image
image_path = "./artifacts/mystery_gathering.jpg"

# Getting the Base64 string - converting the image to base64 format
base64_image = encode_image(image_path)


response = client.responses.create(
    # Using gpt-4o-mini, a vision-capable model
    model="gpt-4o-mini",
    input=[
        {
            # Set the role to user for this conversation
            "role": "user",
            "content": [
                # The text prompt asking for detailed image analysis
                { "type": "input_text", "text": "Tell me all the details you can see in this picture." },
                {
                    # Include the base64-encoded image data
                    # The "detail":"high" parameter tells the model to analyze at high resolution
                    # This uses more tokens (85 tokens for low-res overview plus 170 tokens per 512px tile)
                    # High detail is useful for reading text, identifying small details, or complex images
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                    "detail":"high"
                },
            ],
        }
    ],
)

# Print the model's detailed response to the console
print(response.output_text)

## Multiple Image Inputs
The Responses API can take in and process multiple image inputs. The model processes each image and uses the information to answer questions about all images or each image independently.

In [None]:
# Path to our image
image_path = "./artifacts/mystery_gathering.jpg"

# Getting the Base64 string - converting the image to base64 format
base64_image = encode_image(image_path)


response = client.responses.create(
    # Using gpt-4o-mini for efficient multi-image processing
    model="gpt-4o-mini",
    input=[
        {
            # Set the role to user for this conversation
            "role": "user",
            "content": [
                # Ask the model to describe each image with one sentence
                { "type": "input_text", "text": "Give me one sentence describing each image you see." },
                {
                    # First image: local file converted to base64
                    # No "detail" is set here, so the model falls back to auto
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                },
                {
                    # Second image: directly from URL (Taiwan train station)
                    "type": "input_image",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/5/53/202412_Taiwan_Railway_Haifeng_EMU500_Tourist_Train_at_Houlong_Station.jpg",
                },
            ],
        }
    ],
)

# Print the model's descriptions of both images
print(response.output_text)
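
When the number of images grows, writing each `input_image` dictionary by hand becomes repetitive. The content list above could instead be built from a plain list of URLs, as in this sketch. `build_image_content` is a hypothetical helper; the dictionary keys are the ones used in the cell above.

```python
def build_image_content(prompt, image_urls, detail=None):
    """Build the content list for one user message with several images."""
    # The text instruction comes first, followed by one part per image.
    content = [{"type": "input_text", "text": prompt}]
    for url in image_urls:
        part = {"type": "input_image", "image_url": url}
        if detail is not None:
            # Only set "detail" when the caller asks for a specific level.
            part["detail"] = detail
        content.append(part)
    return content


content = build_image_content(
    "Give me one sentence describing each image you see.",
    [
        "https://upload.wikimedia.org/wikipedia/commons/5/53/202412_Taiwan_Railway_Haifeng_EMU500_Tourist_Train_at_Houlong_Station.jpg",
        "https://i.redd.it/jeuusd992wd41.jpg",
    ],
    detail="low",
)
print(len(content))
```

The list would then be sent as `input=[{"role": "user", "content": content}]`. Base64 data URLs can be mixed into `image_urls` alongside remote URLs.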

### Maximum Number of Images

### Passing 10 Images

Conventional wisdom (and ChatGPT) says you can only pass 10 images at a time.

<table>
  <tr>
    <td><img src="./artifacts/mystery_gathering.jpg" width="200" height="200" alt="Image 1"></td>
    <td><img src="https://upload.wikimedia.org/wikipedia/commons/5/53/202412_Taiwan_Railway_Haifeng_EMU500_Tourist_Train_at_Houlong_Station.jpg" width="200" height="200" alt="Image 2"></td>
    <td><img src="https://plantperfect.com/wp-content/uploads/2021/02/plantperfect_planningyourspringgarden_header.png" width="200" height="200" alt="Image 3"></td>
    <td><img src="https://leehamnews.com/wp-content/uploads/2024/04/1379352511007-AP-Canada-Bombardier-CSeries-002-scaled.webp" width="200" height="200" alt="Image 4"></td>
    <td><img src="https://cdn.pixabay.com/photo/2021/12/12/20/00/play-6865967_640.jpg" width="200" height="200" alt="Image 5"></td>
  </tr>
  <tr>
    <td><img src="https://thumbs.dreamstime.com/b/obesicat-garden-random-image-fat-pussy-cat-dressed-as-soccer-player-dutch-national-team-exercising-spring-87947898.jpg" width="200" height="200" alt="Image 6"></td>
    <td><img src="https://hatrabbits.com/wp-content/uploads/2017/01/tafel-1.jpg" width="200" height="200" alt="Image 7"></td>
    <td><img src="https://awkwardfamilyphotos.com/wp-content/uploads/2009/05/IMG_7352-e1458253508588-835x1024.jpg" width="200" height="200" alt="Image 8"></td>
    <td><img src="https://i.redd.it/jeuusd992wd41.jpg" width="200" height="200" alt="Image 9"></td>
    <td><img src="https://c8.alamy.com/comp/2J53W86/human-brain-with-electric-plug-3d-illustration-2J53W86.jpg" width="200" height="200" alt="Image 10"></td>
  </tr>
</table>



In [None]:
# Path to our image
image_path = "./artifacts/mystery_gathering.jpg"

# Getting the Base64 string
base64_image = encode_image(image_path)


response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {
            "role": "user",
            "content": [
                { "type": "input_text", "text": "Give me one sentence describing each image you see." },
                {
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                    "detail":"low"
                },
                {
                "type": "input_image",
                "image_url": "https://upload.wikimedia.org/wikipedia/commons/5/53/202412_Taiwan_Railway_Haifeng_EMU500_Tourist_Train_at_Houlong_Station.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://plantperfect.com/wp-content/uploads/2021/02/plantperfect_planningyourspringgarden_header.png",
                },
                {
                "type": "input_image",
                "image_url": "https://leehamnews.com/wp-content/uploads/2024/04/1379352511007-AP-Canada-Bombardier-CSeries-002-scaled.webp",
                },
                {
                "type": "input_image",
                "image_url": "https://cdn.pixabay.com/photo/2021/12/12/20/00/play-6865967_640.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://thumbs.dreamstime.com/b/obesicat-garden-random-image-fat-pussy-cat-dressed-as-soccer-player-dutch-national-team-exercising-spring-87947898.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://hatrabbits.com/wp-content/uploads/2017/01/tafel-1.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://awkwardfamilyphotos.com/wp-content/uploads/2009/05/IMG_7352-e1458253508588-835x1024.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://i.redd.it/jeuusd992wd41.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://c8.alamy.com/comp/2J53W86/human-brain-with-electric-plug-3d-illustration-2J53W86.jpg",
                },
            ],
        }
    ],
)

print(response.output_text)

### Passing 21 Images
Now let's pass 21 images.

<table>
  <tr>
    <td><img src="./artifacts/mystery_gathering.jpg" width="200" height="200" alt="Image 1"></td>
    <td><img src="https://upload.wikimedia.org/wikipedia/commons/5/53/202412_Taiwan_Railway_Haifeng_EMU500_Tourist_Train_at_Houlong_Station.jpg" width="200" height="200" alt="Image 2"></td>
    <td><img src="https://plantperfect.com/wp-content/uploads/2021/02/plantperfect_planningyourspringgarden_header.png" width="200" height="200" alt="Image 3"></td>
    <td><img src="https://leehamnews.com/wp-content/uploads/2024/04/1379352511007-AP-Canada-Bombardier-CSeries-002-scaled.webp" width="200" height="200" alt="Image 4"></td>
    <td><img src="https://cdn.pixabay.com/photo/2021/12/12/20/00/play-6865967_640.jpg" width="200" height="200" alt="Image 5"></td>
    <td><img src="https://thumbs.dreamstime.com/b/obesicat-garden-random-image-fat-pussy-cat-dressed-as-soccer-player-dutch-national-team-exercising-spring-87947898.jpg" width="200" height="200" alt="Image 6"></td>
    <td><img src="https://hatrabbits.com/wp-content/uploads/2017/01/tafel-1.jpg" width="200" height="200" alt="Image 7"></td>
  </tr>
  <tr>
    <td><img src="https://awkwardfamilyphotos.com/wp-content/uploads/2009/05/IMG_7352-e1458253508588-835x1024.jpg" width="200" height="200" alt="Image 8"></td>
    <td><img src="https://i.redd.it/jeuusd992wd41.jpg" width="200" height="200" alt="Image 9"></td>
    <td><img src="https://c8.alamy.com/comp/2J53W86/human-brain-with-electric-plug-3d-illustration-2J53W86.jpg" width="200" height="200" alt="Image 10"></td>
    <td><img src="https://sunshinehouse.com/media/vwrd2hsm/8-random-acts-of-kindness-ideas-for-kids.jpg" width="200" height="200" alt="Image 11"></td>
    <td><img src="https://machinelearningmastery.com/wp-content/uploads/2017/01/A-Gentle-Introduction-to-the-Random-Walk-for-Times-Series-Forecasting-with-Python.jpg" width="200" height="200" alt="Image 12"></td>
    <td><img src="https://m.media-amazon.com/images/M/MV5BOWM2OWZmMDktOTMyZi00OWRiLWFkZTMtZGZlNTMyYzA0YjI1XkEyXkFqcGdeQXRyYW5zY29kZS13b3JrZmxvdw@@._V1_.jpg" width="200" height="200" alt="Image 13"></td>
    <td><img src="https://i.pinimg.com/736x/65/01/2e/65012e67a6c13dd3174d2949bbd815ed.jpg" width="200" height="200" alt="Image 14"></td>
  </tr>
  <tr>
    <td><img src="https://i.pinimg.com/736x/44/8f/57/448f57e6c69f821c1e0295478b1e5a18.jpg" width="200" height="200" alt="Image 15"></td>
    <td><img src="https://creator.nightcafe.studio/jobs/tfhWtka8Mb8qxquFAaKZ/tfhWtka8Mb8qxquFAaKZ--1--yljno.jpg" width="200" height="200" alt="Image 16"></td>
    <td><img src="https://i.redd.it/gpwe1akq6v7d1.jpeg" width="200" height="200" alt="Image 17"></td>
    <td><img src="https://www.randomlists.com/img/animals/snowy_owl.webp" width="200" height="200" alt="Image 18"></td>
    <td><img src="https://www.thewordfinder.com/random-animal-generator/images/data_mountaingoat.webp" width="200" height="200" alt="Image 19"></td>
    <td><img src="https://www.techtarget.com/rms/onlineImages/GC5A444272_ram_mobile.jpg" width="200" height="200" alt="Image 20"></td>
    <td><img src="https://thumbs.dreamstime.com/b/cute-cat-sleeping-street-car-random-58655731.jpg" width="200" height="200" alt="Image 21"></td>
  </tr>
</table>


In [None]:
# Path to our image
image_path = "./artifacts/mystery_gathering.jpg"

# Getting the Base64 string
base64_image = encode_image(image_path)


response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {
            "role": "user",
            "content": [
                { "type": "input_text", "text": "Give me one sentence describing each image you see." },
                {
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                    "detail":"low"
                },
                {
                "type": "input_image",
                "image_url": "https://upload.wikimedia.org/wikipedia/commons/5/53/202412_Taiwan_Railway_Haifeng_EMU500_Tourist_Train_at_Houlong_Station.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://plantperfect.com/wp-content/uploads/2021/02/plantperfect_planningyourspringgarden_header.png",
                },
                {
                "type": "input_image",
                "image_url": "https://leehamnews.com/wp-content/uploads/2024/04/1379352511007-AP-Canada-Bombardier-CSeries-002-scaled.webp",
                },
                {
                "type": "input_image",
                "image_url": "https://cdn.pixabay.com/photo/2021/12/12/20/00/play-6865967_640.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://thumbs.dreamstime.com/b/obesicat-garden-random-image-fat-pussy-cat-dressed-as-soccer-player-dutch-national-team-exercising-spring-87947898.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://hatrabbits.com/wp-content/uploads/2017/01/tafel-1.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://awkwardfamilyphotos.com/wp-content/uploads/2009/05/IMG_7352-e1458253508588-835x1024.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://i.redd.it/jeuusd992wd41.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://c8.alamy.com/comp/2J53W86/human-brain-with-electric-plug-3d-illustration-2J53W86.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://sunshinehouse.com/media/vwrd2hsm/8-random-acts-of-kindness-ideas-for-kids.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://machinelearningmastery.com/wp-content/uploads/2017/01/A-Gentle-Introduction-to-the-Random-Walk-for-Times-Series-Forecasting-with-Python.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://m.media-amazon.com/images/M/MV5BOWM2OWZmMDktOTMyZi00OWRiLWFkZTMtZGZlNTMyYzA0YjI1XkEyXkFqcGdeQXRyYW5zY29kZS13b3JrZmxvdw@@._V1_.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://i.pinimg.com/736x/65/01/2e/65012e67a6c13dd3174d2949bbd815ed.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://i.pinimg.com/736x/44/8f/57/448f57e6c69f821c1e0295478b1e5a18.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://creator.nightcafe.studio/jobs/tfhWtka8Mb8qxquFAaKZ/tfhWtka8Mb8qxquFAaKZ--1--yljno.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://i.redd.it/gpwe1akq6v7d1.jpeg",
                },
                {
                "type": "input_image",
                "image_url": "https://www.randomlists.com/img/animals/snowy_owl.webp",
                },
                {
                "type": "input_image",
                "image_url": "https://www.thewordfinder.com/random-animal-generator/images/data_mountaingoat.webp",
                },
                {
                "type": "input_image",
                "image_url": "https://www.techtarget.com/rms/onlineImages/GC5A444272_ram_mobile.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://thumbs.dreamstime.com/b/cute-cat-sleeping-street-car-random-58655731.jpg",
                },
            ],
        }
    ],
)

print(response.output_text)

So the 10-image limit does not apply to the API: the request with 21 images goes through, and the practical constraint is how many tokens we are willing to pay for. This isn't ChatGPT, and you must get out of that mindset.

### Confusing Server Errors
Now let's pass just four images. We get an error. Why?

In [None]:
# Path to our image
image_path = "./artifacts/mystery_gathering.jpg"

# Getting the Base64 string
base64_image = encode_image(image_path)


response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {
            "role": "user",
            "content": [
                { "type": "input_text", "text": "Give me one sentence describing each image you see." },
                {
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                    "detail":"low"
                },
                {
                "type": "input_image",
                "image_url": "https://upload.wikimedia.org/wikipedia/commons/5/53/202412_Taiwan_Railway_Haifeng_EMU500_Tourist_Train_at_Houlong_Station.jpg",
                },
                {
                "type": "input_image",
                "image_url": "https://www.bmwgroup.com/en/news/general/2024/humanoid-robots/_jcr_content/newsarticle.coreimg.jpeg/1725965708987/humanoid-robots-2560x896px.jpeg",
                },
                {
                "type": "input_image",
                "image_url": "https://plantperfect.com/wp-content/uploads/2021/02/plantperfect_planningyourspringgarden_header.png",
                },
            ],
        }
    ],
)

print(response.output_text)

The clue is in the third image URL:

https://www.bmwgroup.com/en/news/general/2024/humanoid-robots/_jcr_content/newsarticle.coreimg.jpeg/1725965708987/humanoid-robots-2560x896px.jpeg

Look at the size in the file name: 2560x896.

Even at high resolution, the system will only accept a maximum size of 2000x768.

So we are well over the size limit, and all we get is a generic message: "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists."

This will drive you crazy if you don't know what is going on. Be on the lookout for this 500 error, as it usually means one of your images isn't allowed.
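
One way to catch this before sending the request is to look for a size hint in the URL, as we did by eye above. The sketch below takes the 2000x768 figure from the text at face value and only works when the URL happens to contain a `WIDTHxHEIGHT` pattern; a more robust check would download the image and read its real dimensions. Both function names are hypothetical.

```python
import re

MAX_LONG_SIDE = 2000   # limit quoted in the text above
MAX_SHORT_SIDE = 768   # limit quoted in the text above


def dimensions_from_url(url):
    """Return (width, height) if the URL contains a 'WIDTHxHEIGHT' hint, else None."""
    match = re.search(r"(?<!\d)(\d{2,5})x(\d{2,5})(?!\d)", url)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def exceeds_size_limit(width, height):
    """Check the dimensions against the limit, regardless of orientation."""
    long_side, short_side = max(width, height), min(width, height)
    return long_side > MAX_LONG_SIDE or short_side > MAX_SHORT_SIDE


url = "https://www.bmwgroup.com/en/news/general/2024/humanoid-robots/_jcr_content/newsarticle.coreimg.jpeg/1725965708987/humanoid-robots-2560x896px.jpeg"
dims = dimensions_from_url(url)
if dims is not None and exceeds_size_limit(*dims):
    print(f"Image looks too large: {dims[0]}x{dims[1]}")
```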
