Open
Labels
bug (Something isn't working)
Description
Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
Describe the bug
I have a feature that makes a multimodal API call to one of the chat completions models to parse information found in images. After empirically evaluating the responses for factual accuracy and completeness, I found that the results are much poorer than those from the same API call, with the exact same hyperparameters, made as an HTTP request through Postman. (Case in point: the content length in the attached screenshot.)
P.S. I also tried making the API call with the requests library instead of the openai client, and the responses were equally poor.
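One way to rule out a payload difference between the two clients is to fingerprint the request body on each side and compare the results. The following is only a sketch: `body_fingerprint` is a hypothetical helper, and it assumes the Postman body can be exported as raw JSON text and loaded with `json.loads`.

```python
import hashlib
import json


def body_fingerprint(body: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON serialization of body.

    Keys are sorted and whitespace is removed, so two bodies that differ only
    in key order or formatting produce the same digest.
    """
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

If the digest of the Python request body matches the digest of the body sent from Postman, the difference in output quality is unlikely to come from the payload itself (for example, from a differently encoded image).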
To Reproduce
- Get the base64 encoding of the attached sample image.
- Populate the necessary variables in the code snippet below.
- Make the API call several times (try 10).
- Validate completeness by comparing the outputs with the image. (Based on our extensive testing, a shortcut is to check whether the value 0.781 appears a total of 6 times in the output.)
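The completeness shortcut in the last step can be sketched as below. This is a minimal sketch, assuming each response has already been reduced to its text content; `is_complete` and `completeness_rate` are hypothetical helper names, and the marker `0.781` with an expected count of 6 is taken from the step above.

```python
from typing import Iterable


def is_complete(output_text: str, marker: str = "0.781", expected_count: int = 6) -> bool:
    """Return True if marker occurs exactly expected_count times in output_text.

    Assumption: plain substring counting is good enough here. It would also
    match longer numbers that contain the marker, such as 0.7812.
    """
    return output_text.count(marker) == expected_count


def completeness_rate(outputs: Iterable[str]) -> float:
    """Return the fraction of outputs that pass is_complete (0.0 if there are none)."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    return sum(is_complete(text) for text in outputs) / len(outputs)
```

Collecting the text of each of the 10 responses into a list and passing it to `completeness_rate` gives a single number that can be compared between the Python client and Postman.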

Code snippets
import requests
from utility_modules.util_funcs import encode_image
import json

# Azure OpenAI endpoint configuration
# ({ENDPOINT_DNS} must be replaced by hand; {model_deployment} is filled in at call time)
url = "https://{ENDPOINT_DNS}.openai.azure.com/openai/deployments/{model_deployment}/chat/completions"
api_version = "{{API_VERSION}}"
api_key = "{{OPENAI_API_KEY}}"
img_file_name = "{{IMG_FILE}}"

# Headers
headers = {
    "Content-Type": "application/json",
    "api-key": api_key
}

def populate_request_dict(body_dict, encoded_img, model_deployment="gpt-4.1"):
    # Fill the model and image placeholders in the request body (mutates body_dict in place)
    body_dict["model"] = body_dict["model"].format(model_deployment=model_deployment)
    image_url = body_dict["messages"][0]["content"][0]["image_url"]
    image_url["url"] = image_url["url"].format(encoded_image=encoded_img)
    return body_dict
# Request payload
data = {
    "model": "{model_deployment}",
    "max_tokens": 3000,
    "temperature": 0.0,
    "seed": 42,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/png;base64,{encoded_image}"
                    }
                },
                {
                    "type": "text",
                    "text": """What is the information in this table?\n Return the information found in this image of a table based on the following specifications:\n - Return the table information in the form of key value pairs separated by a comma.\n - Each key corresponds to a column in the table, and each value corresponds to the cell value.\n - Make sure to include all the information in the table in your final output.\n - You must treat each row as a completely independent entity - if multiple rows have the same value in a column, you must repeat that value for each row explicitly.\n - Never combine, merge or deduplicate values across rows, even if they are identical.\n - For tables with repeated values:\n * Each row must have all its values specified explicitly\n * Do not use any shorthand notations like \"same as above\" or \"ditto\"\n * If a value appears in multiple rows, repeat it fully in each row\n * Never reference other rows or use relative references\n - You may encounter nested columns; if so, take the time to deduce the right way to format them as key value pairs.\n - Return HTML tags instead of plain text to preserve the original style of the text, whenever needed.\n\t- Merged cells: Repeat the merged cell value in ALL cells that are part of the merge area\n - Do not leave any row empty unless the original table cell is empty.\n - Each row's output must be self-contained and complete, regardless of what appears in other rows.\n - If a cell is empty in the original table, leave it empty in your output for that row.\n - Lastly, If the image that you see is not an image of a table, return an empty string."""
                }
            ]
        }
    ]
}
# Parameters for the API version
params = {
    "api-version": api_version
}
populated_data_dict = populate_request_dict(data, encoded_img=encode_image(img_file_name))
model = "gpt-4.1"

def parse_model_txt_resp(resp_obj):
    # Extract the generated text from a chat completions response dict
    return resp_obj['choices'][0]['message']['content']

def make_multimodal_curl_call(model_deployment="gpt-4.1", data=populated_data_dict):
    # Make the API request
    response = requests.post(url.format(model_deployment=model_deployment), headers=headers, params=params, json=data)
    # Check if the request was successful
    response.raise_for_status()
    # Parse the JSON response
    result = response.json()
    return result
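The snippet imports `encode_image` from a private module (`utility_modules.util_funcs`) that is not included here. A minimal stand-in might look like the sketch below, assuming the helper simply returns the file's bytes as base64 text without the `data:image/png;base64,` prefix (the request payload adds that prefix itself); the real helper may differ.

```python
import base64
from pathlib import Path


def encode_image(img_path: str) -> str:
    """Read the file at img_path and return its bytes as a base64 ASCII string.

    Assumption: no resizing or re-encoding of the image is done before encoding.
    """
    return base64.b64encode(Path(img_path).read_bytes()).decode("ascii")
```

Any preprocessing inside the real helper (resizing, format conversion) would change what the model sees, so it is worth confirming that the string it produces is identical to the one used in Postman.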
OS
Windows
Python version
3.11.4
Library version
openai v1.12.1