Replies: 1 comment
I'm also having this issue and did a similar workaround to yours, but I'm not sure how well it will work for some Embeddings. I'm currently using `BedrockEmbeddings`, and while it seems to work, I'm not 100% sure it behaves as expected, since Boto3 (the AWS SDK) is not thread/process-safe in some cases and doesn't properly support async. LangChain's async methods usually just call `run_in_executor` to achieve async behavior; I don't know how this works in depth, but I assume it effectively makes the code multithreaded. It would be good to get input from the LangChain team on this.
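For context, the `run_in_executor` pattern mentioned above can be sketched like this. This is my own minimal illustration, not LangChain's actual code; `embed_sync` is a hypothetical stand-in for a blocking SDK call:

```python
import asyncio
from functools import partial

def embed_sync(texts):
    # Hypothetical stand-in for a blocking SDK call (e.g. a Boto3 invoke).
    return [f"vec({t})" for t in texts]

async def embed_async(texts):
    # The common pattern: delegate the sync call to the default
    # thread-pool executor, so "async" here really means
    # "run the blocking call on a worker thread".
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, partial(embed_sync, texts))

print(asyncio.run(embed_async(["a", "b"])))  # ['vec(a)', 'vec(b)']
```

This is why the thread-safety caveat matters: awaiting such an "async" method still runs the underlying SDK call on a worker thread.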
Feature request
I would like to propose multithreading when initializing a VectorStore or adding texts/documents to it.
Currently, the sync and async methods of `add_texts`, `add_documents`, `from_texts`, and `from_documents` all process texts sequentially. This does not fully utilize the Embeddings API throughput and becomes a bottleneck.

My workaround to this problem was to split the documents into N groups and run `aadd_documents` on each group in parallel, which speeds up the overall embedding process. I think it would be a great feature if `VectorStore` supported something like this internally, so users could get it as a one-liner. One option is to support a `concurrency` parameter in `VectorStore` that defaults to 1. I also noticed `ContextThreadPoolExecutor` already exists, so we could probably leverage that in `VectorStore`.

Let me know if there is already a better way to achieve this!
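As a rough illustration of the workaround described above (the original snippet was not preserved here), a sketch might look like the following. The function name `add_documents_concurrently` and the splitting logic are my own assumptions; only `aadd_documents` is the real async `VectorStore` method:

```python
import asyncio

def split_into_groups(items, n_groups):
    # Split items into n_groups roughly equal contiguous slices.
    k, m = divmod(len(items), n_groups)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n_groups)]

async def add_documents_concurrently(vectorstore, docs, concurrency=4):
    # Run aadd_documents on each non-empty group in parallel, then
    # flatten the per-group ID lists back into one list.
    groups = [g for g in split_into_groups(docs, concurrency) if g]
    results = await asyncio.gather(
        *(vectorstore.aadd_documents(g) for g in groups)
    )
    return [doc_id for group in results for doc_id in group]
```

A built-in `concurrency` parameter on `VectorStore` could wrap essentially this logic, falling back to the current sequential behavior when it is 1.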
Motivation
Adding a large number of chunks to a VectorStore currently takes a very long time and easily becomes a bottleneck. There is a workaround, but it is cumbersome to hand-roll the concurrent processing every time.
Proposal (If applicable)
No response