community: DuckDB VS - expose similarity, improve performance of from_texts #20971
base: master
@baskaryan @eyurtsev should I try to fix the failing test? Any suggestions would be appreciated.
@hwchase17 can we merge it now?
Row-by-row INSERTs are discouraged by the official DuckDB documentation: they are very slow and put a heavy load on storage. I tested the change with 100+ documents; the duration went down from 27s to 7s, and the local SSD is far less utilized.
3 fixes of DuckDB vector store:
vector_key
).from_documents
Dependencies: added Pandas to speed up `from_documents`.
I was thinking about CSV and JSON options, but I expect trouble loading JSON values this way, and both options also require storing data to disk.
Anyway, the poetry file for langchain-community already contains a dependency on Pandas.