Description
Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
Describe the bug
I noticed that embeddings.create behaves differently depending on whether encoding_format='float' is set.
I requested the embedding for 'chain of thoughts' 100 times with encoding_format='float' and 100 times without it, and the two settings gave different results.
With encoding_format='float', all returned embeddings are the same vector (list of floats).
Without the parameter, the embeddings API uses base64 by default and returned 3 different vectors, although the differences among the 3 vectors are minimal and their cosine distances are negligible (< 1e-6).
I suspect this inconsistency comes from the base64 encoding of the embedding results, which is decoded using numpy.
Reporting this in the hope that the team can investigate and fix it.
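To help check the base64 suspicion, here is a minimal sketch of what decoding a base64 embedding payload involves. It assumes the payload is a sequence of little-endian float32 values (an assumption, not confirmed in this report) and uses the standard library's struct module in place of numpy so that it runs without dependencies. The helper names encode_float32_base64 and decode_float32_base64 are hypothetical and are not part of the openai package.

```python
import base64
import struct


def encode_float32_base64(values):
    """Pack floats as little-endian float32 and base64-encode the bytes."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_float32_base64(payload):
    """Decode a base64 string back into a list of Python floats."""
    raw = base64.b64decode(payload)
    count = len(raw) // 4  # 4 bytes per float32 (assumed layout)
    return list(struct.unpack(f"<{count}f", raw))


original = [0.1, -0.25, 0.0123456789]

# The first pass rounds each value to float32 precision.
as_float32 = decode_float32_base64(encode_float32_base64(original))

# A second pass over values that are already float32 changes nothing.
round_tripped = decode_float32_base64(encode_float32_base64(as_float32))
```

In this sketch the encode/decode step is deterministic: the same bytes always decode to the same floats. If that also holds for the library's decoding, the 3 distinct vectors may already differ in the bytes the API returns, which would be worth ruling out when investigating.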
context:
openai version: 1.3.4
python: 3.10.12
numpy: 1.26.1
model: text-embedding-ada-002
To Reproduce
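Run the following 100 times (default encoding, which is base64) and compare the returned vectors:
openai_client.embeddings.create(input='chain of thoughts', model='text-embedding-ada-002')
Then run the following 100 times (explicit float encoding) and compare the returned vectors:
openai_client.embeddings.create(input='chain of thoughts', model='text-embedding-ada-002', encoding_format='float')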
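The comparison across the 100 runs can be sketched as below. This is not the exact script I ran, only an illustration: summarize_runs and cosine_distance are hypothetical helpers, and fetch_embedding stands for any zero-argument callable that returns one embedding as a list of floats.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def summarize_runs(fetch_embedding, runs=100):
    """Call fetch_embedding `runs` times.

    Returns (number of distinct vectors, largest pairwise cosine distance).
    """
    distinct = {tuple(fetch_embedding()) for _ in range(runs)}
    max_distance = max(
        (cosine_distance(a, b) for a, b in combinations(distinct, 2)),
        default=0.0,  # only one distinct vector was seen
    )
    return len(distinct), max_distance


# Example wiring against the client from this report (needs an API key):
# fetch = lambda: openai_client.embeddings.create(
#     input='chain of thoughts', model='text-embedding-ada-002'
# ).data[0].embedding
# print(summarize_runs(fetch))
```

With encoding_format='float' I would expect this to report 1 distinct vector, and without it the 3 distinct vectors described above.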
Code snippets
No response
OS
macOS
Python version
Python 3.10.12
Library version
1.3.4