client.files.retrieve_content only returns strings, and not bytes/binary data #699

Closed
eware-godaddy opened this issue Nov 7, 2023 · 3 comments

Comments


eware-godaddy commented Nov 7, 2023

When I try to download/retrieve a binary file (e.g. a PNG image) created by an assistant, it gets automatically cast to a string, so it can't be correctly parsed or displayed.

E.g.:

ret_file = client.files.retrieve_content('file-XXX')
ret_file[:10]
# '�PNG\r\n\x1a\n\x00\x00'

As far as I can see, the API has no clean way to retrieve a file as raw bytes.

This is important for code interpreter scenarios where the agent returns binary files that need to be rendered, such as images.
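To illustrate why a string cast is a problem for binary files, here is a minimal sketch. It assumes the string was produced by a UTF-8 decode with replacement characters, which is what the '�PNG' output above suggests but which the thread does not confirm. The helper name lossy_round_trip is hypothetical.

```python
# The 8-byte PNG signature; the first byte (0x89) is not valid UTF-8 on its own.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def lossy_round_trip(data: bytes) -> bytes:
    """Decode bytes as UTF-8 with replacement, then encode back to bytes."""
    # Assumption: this mimics the string cast; the SDK's actual decode is not shown in the thread.
    text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8")


original = PNG_SIGNATURE + b"\x00\x00"
text = original.decode("utf-8", errors="replace")

# The invalid first byte becomes U+FFFD, the replacement character.
print(repr(text[:10]))

# Re-encoding does not restore the original bytes, so the image is unrecoverable.
print(lossy_round_trip(original) == original)  # False
```

Once 0x89 has been replaced by U+FFFD, re-encoding yields the three bytes EF BF BD instead, so the PNG signature no longer matches and image libraries reject the data.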

For others having this issue, you can request the file directly with the requests library:

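import os
from io import BytesIO

import requests
from PIL import Image  # assumed: Image in the original snippet is Pillow's PIL.Image

file_id = 'file-XXXX'
headers = {
    'Authorization': f"Bearer {os.environ['OPENAI_API_KEY']}"
}
# Call the file content endpoint directly; response.content holds the raw bytes
response = requests.get(f'https://api.openai.com/v1/files/{file_id}/content', headers=headers)
Image.open(BytesIO(response.content))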

tnm commented Nov 7, 2023

Hit this as well, and I'm also using requests directly as a workaround for now. Relatedly, the general OpenAI API documentation should probably cover the format and expectations of what comes back from the /content request.

Collaborator

RobertCraigie commented Nov 7, 2023

Note that you do not have to drop down to raw requests here. The SDK exposes methods for accessing the raw response directly through .with_raw_response properties, e.g.

response = client.files.with_raw_response.retrieve_content('file-XXX')
response  # APIResponse
response.content  # raw bytes

We cast to a string because you can also upload files that are not binary content. If you need the binary content, you can use the method above.

We'll look into whether there are ways to make this easier and to document it better.
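The raw-response approach above could be wrapped as in the following sketch. The function names retrieve_raw_bytes and looks_like_png are hypothetical helpers, not part of the SDK; the only SDK calls used are the ones shown in the comment above, and the client is passed in so the snippet does not depend on how it was constructed.

```python
# The 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def retrieve_raw_bytes(client, file_id: str) -> bytes:
    """Return a file's content as bytes via the raw-response accessor."""
    # with_raw_response.retrieve_content and .content are taken from the comment above.
    response = client.files.with_raw_response.retrieve_content(file_id)
    return response.content


def looks_like_png(data: bytes) -> bool:
    """Cheap sanity check that the bytes were not mangled by a string cast."""
    return data.startswith(PNG_SIGNATURE)
```

With a configured client, something like `data = retrieve_raw_bytes(client, 'file-XXX')` followed by `looks_like_png(data)` would confirm the bytes arrived intact before handing them to an image library.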

@RobertCraigie
Collaborator

This has been fixed in the v1.2.1 release. You can now use client.files.content() instead; client.files.retrieve_content() has been deprecated.
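A minimal sketch of using the newer method to save a file to disk follows. The helper download_file is hypothetical, and the sketch assumes the object returned by client.files.content() exposes a read() method that returns bytes; the thread itself does not describe the return type, so check it against the SDK version you have installed.

```python
from pathlib import Path


def download_file(client, file_id: str, destination: str) -> int:
    """Fetch a file's binary content and write it to destination.

    Returns the number of bytes written.
    """
    # client.files.content() is the method named in the comment above.
    binary = client.files.content(file_id)
    # Assumption: the returned object has read() returning the full body as bytes.
    data = binary.read()
    Path(destination).write_bytes(data)
    return len(data)
```

For example, `download_file(client, 'file-XXX', 'plot.png')` would write the image produced by the assistant to plot.png without any string conversion in between.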
