
Error when querying for the most similar vector #50

Closed
vemonet opened this issue Jan 17, 2024 · 3 comments

@vemonet

vemonet commented Jan 17, 2024

Hi @ankane, we are trying to use pgvector to perform similarity search.

We have successfully loaded embeddings into a Postgres table using pgvector, psycopg 3, and fastembed, following the README:

https://github.com/vemonet/concept-resolver/blob/main/src/pubdict_load.py#L157

The table has been properly populated. I checked it with this command:

```python
similar = conn.execute("SELECT * FROM pubdictionaries_embeddings LIMIT 5").fetchall()
```

But if I try to retrieve the most similar vectors (full code here: https://github.com/vemonet/concept-resolver/blob/main/src/pubdict_search.py):

```python
similar = conn.execute('SELECT * FROM pubdictionaries_embeddings ORDER BY embedding <-> %s LIMIT 5', (embeddings,)).fetchall()
```
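For reference, `<->` is pgvector's Euclidean (L2) distance operator, so this query asks Postgres for the five rows whose embeddings are closest to the query vector. The following is a minimal pure-Python sketch of the same ranking, which can be handy for sanity-checking results on a few rows. The function names and the in-memory `rows` list are hypothetical and not part of pgvector.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, the metric behind pgvector's `<->` operator."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(
    rows: List[Tuple[str, List[float]]], query: Sequence[float], k: int = 5
) -> List[Tuple[str, List[float]]]:
    """Mimic `ORDER BY embedding <-> query LIMIT k` over (id, embedding) pairs."""
    return sorted(rows, key=lambda row: l2_distance(row[1], query))[:k]


# Hypothetical in-memory stand-in for the embeddings table.
rows = [
    ("a", [0.0, 0.0]),
    ("b", [1.0, 1.0]),
    ("c", [3.0, 4.0]),
]
print([row_id for row_id, _ in nearest(rows, [0.9, 1.2], k=2)])  # ['b', 'a']
```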

I get an error where Postgres claims it does not know the type I am passing:

```
Traceback (most recent call last):
  File "/app/src/pubdict_search.py", line 22, in <module>
    similar = conn.execute('SELECT * FROM pubdictionaries_embeddings ORDER BY embedding <-> %s LIMIT 5', (embeddings,)).fetchall()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/connection.py", line 896, in execute
    raise ex.with_traceback(None)
psycopg.errors.UndefinedFunction: operator does not exist: vector <-> double precision[]
LINE 1: ...ROM pubdictionaries_embeddings ORDER BY embedding <-> $1 LIM...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
```

This error seems to indicate that the vector extension is not properly enabled, which does not seem right since:

  1. It has been enabled in the Postgres database by the load script.
  2. I am even rerunning `register_vector(conn)` in the search script, just in case. I would not expect it to be needed, but since Postgres complained that it no longer knew what a vector is, I thought it might help.
  3. I am using the exact same function to generate embeddings for both loading and searching. It returns a list of lists of floats (plain Python lists, not NumPy arrays), so if it works for loading it should also work for searching:

     ```python
     return [
         embedding.tolist() for embedding in self.embedding_model.embed(labels)
     ]
     ```
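Because this helper returns one vector per label, the parameter compared with `<->` for a single search label should be the inner list rather than the whole list of lists. The following is a minimal sketch of that extraction with a dimension check. `single_query_vector` and its `dim` argument are hypothetical, and `dim` must match whatever size the `vector` column was declared with. On its own this does not resolve the operator error, because a flat Python list still reaches Postgres as `double precision[]`.

```python
from typing import List, Sequence


def single_query_vector(embeddings: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Return the one vector to search with, checking its dimension.

    `embeddings` is the list of lists returned by the embedding helper
    (one inner list per label); `dim` is the assumed size of the
    `vector(dim)` column.
    """
    if len(embeddings) != 1:
        raise ValueError(f"expected one embedding, got {len(embeddings)}")
    vector = [float(x) for x in embeddings[0]]
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    return vector


# Hypothetical usage with a 3-dimensional toy embedding.
query = single_query_vector([[0.1, 0.2, 0.3]], dim=3)
print(query)
```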

We also tried passing a single embedding to the search (a flat list of floats instead of a list of lists of floats):

```python
similar = conn.execute('SELECT * FROM pubdictionaries_embeddings ORDER BY embedding <-> %s LIMIT 5', (embeddings,)).fetchall()
```

But we get exactly the same error.

We also tried converting our Python lists of floats with `np.array()`:

```python
return [
    np.array(embedding.tolist()) for embedding in self.embedding_model.embed(labels)
]
```

But then we get a different error:

```
Traceback (most recent call last):
  File "/app/src/pubdict_load.py", line 157, in <module>
    cursor.execute(
  File "/usr/local/lib/python3.11/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)
psycopg.ProgrammingError: cannot adapt type 'ndarray' using placeholder '%s' (format: AUTO)
```

There is a somewhat similar question on Stack Overflow: https://stackoverflow.com/questions/75904637/how-to-fix-postgres-error-operator-does-not-exist-using-pgvector
But its conclusion is to enable the vector extension, which we have already done.

Any idea how this could be fixed? We carefully followed the provided docs, but we are not Postgres extension experts, so we are a bit stuck now.

Thanks a lot!

@ankane
Member

ankane commented Jan 17, 2024

Hi @vemonet, check out #4.

@ankane ankane closed this as completed Jan 17, 2024
@vemonet
Author

vemonet commented Jan 17, 2024

Thanks a lot @ankane !

I don't understand, though. Is there a reason why loading only works with Python lists and fails with `np.array()`, while searching only works with `np.array()` and not with Python lists?

It makes the whole system really inconsistent in how it handles types...

@ankane
Member

ankane commented Jan 17, 2024

np.array should work in all cases when the type is registered correctly. Python lists happen to work on inserts since there's implicit conversion from double precision[] (which is how psycopg 3 passes float lists to Postgres) to vector.
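Following that explanation, a plain list can probably also be used for search if the parameter is cast explicitly. The cast that makes inserts work is an assignment-level conversion, and Postgres also allows that kind of cast to be requested by hand with `::vector`. The following is a minimal sketch that builds such a query string. It assumes the `double precision[]` to `vector` cast described above can be applied explicitly. The helper names are hypothetical, and passing `np.array` values with the type registered through `register_vector(conn)` remains the approach suggested above. The second helper shows pgvector's text input form (`'[1,2,3]'`), which can also be cast with `::vector`.

```python
from typing import Sequence


def nearest_neighbors_sql(table: str, column: str = "embedding", k: int = 5) -> str:
    """Build a k-NN query whose parameter is cast explicitly to `vector`.

    The `::vector` cast asks Postgres to convert the `double precision[]`
    that psycopg 3 sends for a plain list of floats before `<->` is resolved.
    `table` and `column` are interpolated directly, so only pass trusted names.
    """
    return f"SELECT * FROM {table} ORDER BY {column} <-> %s::vector LIMIT {int(k)}"


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


sql = nearest_neighbors_sql("pubdictionaries_embeddings")
print(sql)
print(to_vector_literal([1, 2.5]))  # [1.0,2.5]
# With a live connection this would be used as, e.g.:
# conn.execute(sql, (query_vector,)).fetchall()
```

The explicit cast keeps the query independent of how the client library adapts Python lists. The trade-off is that the SQL text now has to remember the cast, whereas a registered type adapter handles it for every query.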
