Uploading JSON to Files API returns invalid file format #727

Closed
VashishtMadhavan opened this issue Nov 8, 2023 · 15 comments

Comments

@VashishtMadhavan

VashishtMadhavan commented Nov 8, 2023

Uploading a JSON file to the files endpoint throws an error.

Code:

from openai import OpenAI

client = OpenAI()
file = client.files.create(
    file=open("example_1.json", "rb"),
    # Can be either "fine-tune" or "assistants"
    purpose="assistants",
)

Stacktrace:

File ~/anaconda3/lib/python3.10/site-packages/openai/resources/files.py:88, in Files.create(self, file, purpose, extra_headers, extra_query, extra_body, timeout)
     82 if files:
     83     # It should be noted that the actual Content-Type header that will be
     84     # sent to the server will contain a `boundary` parameter, e.g.
     85     # multipart/form-data; boundary=---abc--
     86     extra_headers = {"Content-Type": "multipart/form-data", **(extra_headers or {})}
---> 88 return self._post(
     89     "/files",
     90     body=maybe_transform(body, file_create_params.FileCreateParams),
     91     files=files,
     92     options=make_request_options(
     93         extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout
     94     ),
     95     cast_to=FileObject,
     96 )

File ~/anaconda3/lib/python3.10/site-packages/openai/_base_client.py:1055, in SyncAPIClient.post(self, path, cast_to, body, options, files, stream, stream_cls)
   1041 def post(
   1042     self,
   1043     path: str,
   (...)
   1050     stream_cls: type[_StreamT] | None = None,
   1051 ) -> ResponseT | _StreamT:
   1052     opts = FinalRequestOptions.construct(
   1053         method="post", url=path, json_data=body, files=to_httpx_files(files), **options
   1054     )
-> 1055     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))

File ~/anaconda3/lib/python3.10/site-packages/openai/_base_client.py:834, in SyncAPIClient.request(self, cast_to, options, remaining_retries, stream, stream_cls)
    825 def request(
    826     self,
    827     cast_to: Type[ResponseT],
   (...)
    832     stream_cls: type[_StreamT] | None = None,
    833 ) -> ResponseT | _StreamT:
--> 834     return self._request(
    835         cast_to=cast_to,
    836         options=options,
    837         stream=stream,
    838         stream_cls=stream_cls,
    839         remaining_retries=remaining_retries,
    840     )

File ~/anaconda3/lib/python3.10/site-packages/openai/_base_client.py:877, in SyncAPIClient._request(self, cast_to, options, remaining_retries, stream, stream_cls)
    874     # If the response is streamed then we need to explicitly read the response
    875     # to completion before attempting to access the response text.
    876     err.response.read()
--> 877     raise self._make_status_error_from_response(err.response) from None
    878 except httpx.TimeoutException as err:
    879     if retries > 0:

BadRequestError: Error code: 400 - {'error': {'message': "Invalid file format. Supported formats: ['c', 'cpp', 'csv', 'docx', 'html', 'java', 'json', 'md', 'pdf', 'php', 'pptx', 'py', 'rb', 'tex', 'txt', 'css', 'jpeg', 'jpg', 'js', 'gif', 'png', 'tar', 'ts', 'xlsx', 'xml', 'zip']", 'type': 'invalid_request_error', 'param': None, 'code': None}}
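
The odd part of this error is that 'json' appears in the server's own list of supported formats. A minimal sketch of that check, using only the standard library and the message text copied from the response above (the helper names `supported_formats` and `extension_is_listed` are made up for illustration and are not part of the SDK):

```python
import ast
import re
from pathlib import Path

# Error message copied from the 400 response above.
ERROR_MESSAGE = (
    "Invalid file format. Supported formats: ['c', 'cpp', 'csv', 'docx', "
    "'html', 'java', 'json', 'md', 'pdf', 'php', 'pptx', 'py', 'rb', 'tex', "
    "'txt', 'css', 'jpeg', 'jpg', 'js', 'gif', 'png', 'tar', 'ts', 'xlsx', "
    "'xml', 'zip']"
)


def supported_formats(message: str) -> list[str]:
    """Extract the extension list from the error message (hypothetical helper)."""
    match = re.search(r"Supported formats: (\[.*?\])", message)
    if match is None:
        return []
    # The list is printed as a Python literal, so literal_eval can parse it.
    return ast.literal_eval(match.group(1))


def extension_is_listed(filename: str, message: str) -> bool:
    """Return True if the file's extension appears in the server's own list."""
    ext = Path(filename).suffix.lstrip(".").lower()
    return ext in supported_formats(message)


# The rejected file has an extension the server claims to support.
print(extension_is_listed("example_1.json", ERROR_MESSAGE))
```

If the extension is listed and the upload still fails, the rejection is probably not about the extension at all, which fits the later conclusion in this thread that the problem is on the backend.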

Here is the example file: example_1.json

@athyuttamre
Collaborator

Thanks for reporting! We can repro this bug and are working on fixing it.

RobertCraigie added the bug and API-feedback labels on Nov 8, 2023
@thiswillbeyourgithub
Contributor

Hi, I think this bug is the same as #333, and I found a way to fix it.

In this line, if I pass file=open(args.file, "rb") instead of file=buffer_reader, the function returns normally instead of raising a 400 error.

The only difference I see in BufferReader between v1.1.1 and v0.28.1 is the omission of a check, but I doubt this is the culprit.

Unfortunately there is currently an API Outage so I can't investigate that much more at the moment. At least my workaround fixes it for the whisper case.

@RobertCraigie
Collaborator

@thiswillbeyourgithub you're looking at the CLI code, not the client code itself.

This issue also appears to be an API issue as the same error is reported in the Node SDK.

Please open a separate issue for the CLI error you're seeing!

@thiswillbeyourgithub
Contributor

I'm sorry @RobertCraigie but I don't understand the problem. My issue is indeed appearing when I use the CLI like so python -m openai api audio.transcriptions.create -f audio.mp3 and the workaround I suggest solves it apparently.

I don't really get the meaning of "the CLI code, not the client code itself" as cli is part of the client and both are in this repo.

In any case, my issue is originally with whisper and might not be related to this one after all, so I will stop talking here. I messaged the thread #333 with my workaround anyway.

Cheers

@TomasVotruba

TomasVotruba commented Nov 9, 2023

It might be related to a bug in the OpenAI API reported today: https://community.openai.com/t/possible-bug-with-agent-creation-php-file-upload/484490/5

I have the same issue with the PHP SDK: uploading php and json files fails, while txt and html succeed.

@rattrayalex
Collaborator

This is a backend bug, not an SDK bug, so I'm going to go ahead and close this issue.

@albertaleksieiev

Same issue here. Just try sending an invalid JSON file, and it will work ;)

@tnwill

tnwill commented Nov 20, 2023

Thanks for reporting! We can repro this bug and are working on fixing it.

Hi @athyuttamre, is there an issue # to track this bug?

@NosfeKgb

Yes, I also need to keep track of this bug; I am unable to work with some files.

@kennymatic

kennymatic commented Dec 5, 2023

Same issue here. Just try sending an invalid JSON file, and it will work ;)

I was really hoping you were kidding about this. I've been banging my head over this for the past 5 hours. I added garbage to the beginning of my file and now it works. 🤦🏻‍♂️

I don't even understand how this can be possible on their back end.

@kennymatic

OK, I think I finally found a pattern, at least for our case. Valid JSON files upload fine as long as they are over 1025 bytes in size. If they are 1025 bytes or under, they will still upload if you make them invalid, as @albertaleksieiev suggested.
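
To check a file against this pattern before uploading, something like the following sketch could help. It only encodes the observation above (valid JSON at or under 1025 bytes gets rejected); the threshold is what was seen in this thread, not a documented limit, and `likely_to_be_rejected` is a made-up helper name.

```python
import json

# Threshold reported in this thread; an observation, not a documented limit.
SIZE_THRESHOLD_BYTES = 1025


def likely_to_be_rejected(data: bytes, threshold: int = SIZE_THRESHOLD_BYTES) -> bool:
    """Return True if data is valid JSON and at or under the reported threshold."""
    if len(data) > threshold:
        # Larger files were reported to upload fine.
        return False
    try:
        json.loads(data)
    except (ValueError, UnicodeDecodeError):
        # Invalid JSON was reported to upload fine, even when small.
        return False
    return True


small_valid = b'[{"id": "msg_0001", "role": "user"}]'
print(likely_to_be_rejected(small_valid))
print(likely_to_be_rejected(b"garbage" + small_valid))
```

The second call mirrors the "add garbage to the beginning" workaround: the bytes no longer parse as JSON, so the check no longer flags them.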

@RonABarrett

RonABarrett commented Dec 17, 2023

@kennymatic, I agree that size matters in this case. @albertaleksieiev also seems to be correct (i.e., send an invalid JSON file and it will work).

I have been routinely successful uploading a 10 KB JSON file.
In a small file (i.e., < 1 KB), I replaced the opening and closing square brackets with curly brackets, then saved. The file uploaded successfully.

It's one of those, "Are you kidding me?" bugs.

BTW, ChatGPT can't help solve this bug. I asked for help repeatedly even after providing documentation from:
https://github.com/openai/openai-python
https://github.com/openai/openai-python/tree/main#file-uploads

@RonABarrett

RonABarrett commented Dec 17, 2023

BTW, a stupid but successful work-around is to create an initial entry in the JSON file with enough content that will make the file size exceed the 1kb that @kennymatic mentioned.
I put in the following:
[
{
"id": "msg_0001",
"role": "user",
"content": "\n This messgage is to create and initialize the JSON file with enough file size that OpenAI will upload it.\n A JSON (JavaScript Object Notation) file is a lightweight data interchange format that is easy for humans to read and write,\n and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition - December 1999.\n JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages,\n including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.\n The structure of a JSON file is simple yet flexible. It represents data in a text format consisting of key-value pairs,\n making it analogous to a dictionary in Python or an object in JavaScript.\n These key-value pairs are enclosed in curly braces, with the key being a string and the value being a valid JSON data type such as\n a string, number, array, or even another JSON object.\n This hierarchical structure allows for the representation of complex data in an organized and hierarchical manner,\n which is particularly useful in web applications for data exchange between a client and a server, as well as in many other\n programming contexts where data needs to be stored or transmitted in a structured format.\n "
},
{
"id": "msg_0002",
"role": "user",
"content": "\n We need to have a meeting of everyone within the company. Our strategic goal is to raise revenue by 10%.\n Your mission is to discuss amongst yourself to provide a consensus suggestion of the top five initiatives the company should take.\n When the mission has been completed provide the five initiatives and your reasonsing for each.\n "
}
]

@kennymatic

BTW, a stupid but successful work-around is to create an initial entry in the JSON file with enough content that will make the file size exceed the 1kb that @kennymatic mentioned.

This is basically what we did, except we computed how many characters short the file was of the minimum, then added a field to the JSON and filled it with spaces to bring it up to the minimum size. 😬
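
A rough sketch of that padding approach, for anyone who wants to automate it. It assumes the top-level JSON value is an object (a dict), that a minimum of 1026 bytes is enough (one over the 1025-byte threshold reported above), and that an extra field is harmless for your use case. The field name `_padding` and the function `pad_json_object` are made up for illustration.

```python
import json

# One byte over the 1025-byte threshold reported in this thread (an assumption).
MIN_SIZE_BYTES = 1026


def pad_json_object(obj: dict, min_size: int = MIN_SIZE_BYTES, key: str = "_padding") -> bytes:
    """Serialize obj, adding a space-filled field if the result is too small."""
    encoded = json.dumps(obj).encode("utf-8")
    if len(encoded) >= min_size:
        # Already large enough; leave the payload untouched.
        return encoded
    padded = dict(obj)
    padded[key] = ""  # note: overwrites an existing field with the same name
    # Size with an empty padding field; each added space adds exactly one byte.
    base = len(json.dumps(padded).encode("utf-8"))
    padded[key] = " " * max(0, min_size - base)
    return json.dumps(padded).encode("utf-8")


payload = pad_json_object({"id": "msg_0001", "role": "user"})
print(len(payload))
```

The result is still valid JSON, so nothing downstream has to strip garbage out again; it only has to ignore the extra field.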

@RonABarrett

RonABarrett commented Dec 19, 2023 via email
