-
Notifications
You must be signed in to change notification settings - Fork 20.5k
feature: removed pandas dataframe dependency for similary_search when using DuckDB as vector store #30445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: removed pandas dataframe dependency for similary_search when using DuckDB as vector store #30445
Conversation
… using DuckDB as vector store
|
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
ccurme
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
| return ids | ||
|
|
||
| def similarity_search( | ||
| def similarity_search_pd( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should ensure that the behavior of similarity_search is identical to what it was previously, and not keep the pandas stuff around IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That I did verify with a test example, with text and metadata filled in. Doesn't seem to be breaking anything. However, I'd like you to verify as well, just to be doubly sure.
ccurme
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you run the tests in tests/integration_tests/vectorstores/test_duckdb.py? I'm seeing some of them fail in this branch.
|
Great catch! I've updated the code now. I missed a small, but critical case when metadata is None, but is passed to json.loads() as such which caused the issue. Rectified now. |
PR title: "community: Removes pandas dependency for using DuckDB for similarity search"
PR message:
similarity_search_pd, while the new one is atsimilarity_searchand requires no code changes. Return format remains the same.langchain_community.vectorstores.DuckDBrequires pandas - add a helpful (error) message #29933 and update on PR feature: added warning when duckdb is used as a vectorstore without pandas #30435