vector load_documents in batches #2251

Merged
dayesouza merged 4 commits into main from embed_v2
Feb 27, 2026

@dayesouza
Contributor

This pull request refactors document loading in several vector store backends to support batch operations, improving efficiency and consistency. The main change makes a batch-oriented load_documents method the primary insertion path in all relevant classes, in place of the single-document insert method, and updates the embedding operation and its tests to use the new batching logic.

Batch document loading refactor:

  • Replaced the insert method with a load_documents method in azure_ai_search.py, cosmosdb.py, and lancedb.py, enabling batch uploads for Azure AI Search and LanceDB, and iterative uploads for CosmosDB [1] [2] [3].
  • Updated the abstract base class in vector_store.py to define load_documents as the primary method for document insertion, with insert now delegating to it for single documents.
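
To illustrate the delegation described above, here is a minimal sketch, not the code from this PR. Only the method names load_documents and insert come from the description; the Document dataclass, the class names, and the InMemoryStore backend are hypothetical stand-ins.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical document shape; the real class in the repository may differ.
    id: str
    vector: list[float]


class VectorStoreSketch(ABC):
    """Illustrative base class: batch loading is the primary entry point."""

    @abstractmethod
    def load_documents(self, documents: list[Document]) -> None:
        """Insert a batch of documents into the store."""

    def insert(self, document: Document) -> None:
        # Single-document insert delegates to the batch method.
        self.load_documents([document])


class InMemoryStore(VectorStoreSketch):
    """Hypothetical backend used only to exercise the sketch."""

    def __init__(self) -> None:
        self.rows: dict[str, list[float]] = {}
        self.calls = 0  # number of load_documents invocations

    def load_documents(self, documents: list[Document]) -> None:
        self.calls += 1
        for doc in documents:
            self.rows[doc.id] = doc.vector


store = InMemoryStore()
store.insert(Document("a", [0.1]))
store.load_documents([Document("b", [0.2]), Document("c", [0.3])])
```

With this shape, a backend only has to implement load_documents; callers that still use insert go through the same code path with a one-element batch.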

Embedding operation and test updates:

  • Changed the batching logic in embed_text.py to flush the buffer when it exceeds batch_size * 4 rather than batch_size, so each flush carries a larger batch [1] [2].
  • Updated the corresponding test in test_embed_text.py to reflect the new batching behavior, verifying that documents are processed in larger batches and adjusting assertions accordingly [1] [2] [3].
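
The buffering behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of embed_text.py: the function name load_in_batches, its parameters, and the choice of >= for the threshold comparison are all hypothetical, and only the batch_size * 4 threshold comes from the description.

```python
from typing import Callable, Iterable


def load_in_batches(
    documents: Iterable[object],
    load_documents: Callable[[list], None],
    batch_size: int,
    flush_multiplier: int = 4,
) -> int:
    """Buffer documents and flush them in groups of batch_size * flush_multiplier.

    Returns the number of flushes performed.
    """
    threshold = batch_size * flush_multiplier
    buffer: list = []
    flushes = 0

    for doc in documents:
        buffer.append(doc)
        # Assumption: flush once the buffer reaches the threshold; the PR text
        # says "exceeds", so the real comparison may be strict.
        if len(buffer) >= threshold:
            load_documents(buffer)
            buffer = []  # start a fresh list so the flushed batch is not mutated
            flushes += 1

    # Flush whatever is left after the input is exhausted.
    if buffer:
        load_documents(buffer)
        flushes += 1

    return flushes


batches: list[list] = []
flush_count = load_in_batches(range(10), batches.append, batch_size=1)
batch_lengths = [len(b) for b in batches]
```

In this example, ten documents with batch_size=1 are flushed in groups of four, four, and two, so the store sees three calls instead of ten.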

@dayesouza dayesouza requested a review from a team as a code owner February 27, 2026 20:29
@dayesouza dayesouza merged commit 6f26d0e into main Feb 27, 2026
18 checks passed
@dayesouza dayesouza deleted the embed_v2 branch February 27, 2026 20:59

2 participants