
embedding behavior inconsistency with different parameters. #1064

@gepolvtest

Description


Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

I noticed that the embeddings endpoint behaves differently depending on whether encoding_format='float' is set.
I ran the embedding request for the input 'chain of thoughts' 100 times with and without encoding_format='float', and the two settings gave different results.

If I specify encoding_format='float', all of the returned embeddings are the same vector (list of floats).
If I don't specify the parameter, the Python library requests base64 encoding by default, and across the 100 runs it returned 3 distinct vectors. The differences among the 3 vectors are very small, and their cosine distances are negligible (< 1e-6).
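To quantify the observation above, one option is to deduplicate the returned vectors and take the largest pairwise cosine distance among the distinct ones. Below is a minimal sketch in plain Python (so it runs without numpy); the function names are hypothetical helpers, not part of the openai library.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def max_pairwise_cosine_distance(vectors: Sequence[Sequence[float]]) -> float:
    """Largest cosine distance between any two distinct vectors in the list.

    Exact duplicates are collapsed first, so identical vectors contribute 0.0
    instead of a tiny floating-point residue.
    """
    unique = list({tuple(v) for v in vectors})
    worst = 0.0
    for i in range(len(unique)):
        for j in range(i + 1, len(unique)):
            worst = max(worst, cosine_distance(unique[i], unique[j]))
    return worst
```

Feeding it the 100 default-encoding results would give the "< 1e-6" figure quoted above, while the 100 float-encoding results, being identical, would give 0.0.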

I suspect this inconsistency comes from the base64 encoding of the embedding results, which the library decodes with numpy.
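One way to probe this suspicion is to look at the decoding step in isolation. The sketch below decodes a base64 payload with the standard-library struct module instead of numpy, assuming the payload is a flat array of little-endian float32 values (what numpy's frombuffer with dtype "float32" reads on a little-endian machine). The helper names are hypothetical, and the encoder exists only to build test payloads.

```python
import base64
import struct
from typing import List


def decode_base64_embedding(data: str) -> List[float]:
    """Decode a base64 string of packed little-endian float32 values.

    Assumption: 4 bytes per value, little-endian, no header.
    """
    raw = base64.b64decode(data)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def encode_base64_embedding(values: List[float]) -> str:
    """Inverse of decode_base64_embedding; each value is rounded to float32."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(packed).decode("ascii")
```

Decoding the same string always yields the same list, and values that are not exactly representable in float32 (for example 0.1) come back rounded to the nearest float32, so a comparison against float-format results should allow for that precision.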

I'm reporting this in the hope that the team can investigate and fix it.

Context:
openai version: 1.3.4
python: 3.10.12
numpy: 1.26.1
model: text-embedding-ada-002

To Reproduce

Run the following 100 times (default encoding, i.e. base64):

openai_client.embeddings.create(input='chain of thoughts', model='text-embedding-ada-002')

Run the following 100 times (explicit encoding_format='float'):

openai_client.embeddings.create(input='chain of thoughts', model='text-embedding-ada-002', encoding_format='float')
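The two reproduction commands above can be wrapped in a small harness that counts how many distinct vectors come back. This is a sketch, not code from the report: count_distinct_embeddings and make_openai_embedder are hypothetical names, and the embed function is injected so the counting logic can be checked without calling the API.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def count_distinct_embeddings(embed: Callable[[], Sequence[float]], runs: int = 100) -> int:
    """Call embed() `runs` times and return how many distinct vectors came back."""
    seen = set()
    for _ in range(runs):
        seen.add(tuple(embed()))
    return len(seen)


def make_openai_embedder(encoding_format: Optional[str] = None) -> Callable[[], Sequence[float]]:
    """Build a zero-argument embed function for the request in this report.

    Needs the openai package and an API key in the environment. The import is
    done lazily so the counting helper above works without either.
    """
    from openai import OpenAI

    client = OpenAI()
    kwargs: Dict[str, Any] = {"input": "chain of thoughts", "model": "text-embedding-ada-002"}
    if encoding_format is not None:
        # Only pass the parameter when set, so the default (base64) path is exercised otherwise.
        kwargs["encoding_format"] = encoding_format

    def embed() -> Sequence[float]:
        return client.embeddings.create(**kwargs).data[0].embedding

    return embed
```

With credentials configured, count_distinct_embeddings(make_openai_embedder()) corresponds to the first command and count_distinct_embeddings(make_openai_embedder("float")) to the second; per this report, they yielded 3 and 1 distinct vectors respectively.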

Code snippets

No response

OS

macOS

Python version

Python 3.10.12

Library version

1.3.4

Labels

bug (Something isn't working)
