vector load_documents in batches by dayesouza · Pull Request #2251 · microsoft/graphrag

dayesouza · 2026-02-27T20:29:42Z

This pull request refactors the document loading process for several vector store backends to support batch operations, improving efficiency and consistency. The main change replaces the single-document insert method with a batch-oriented load_documents method across all relevant classes, and updates the embedding operation and its tests to use this new batching logic.

Batch document loading refactor:

Replaced the insert method with a load_documents method in azure_ai_search.py, cosmosdb.py, and lancedb.py, enabling batch uploads for Azure AI Search and LanceDB, and iterative uploads for CosmosDB [1] [2] [3].
Updated the abstract base class in vector_store.py to define load_documents as the primary method for document insertion, with insert now delegating to it for single documents.
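
A minimal sketch of the delegation pattern described above, assuming a simplified document type. `VectorStoreDocument` is reduced to two fields here, and `InMemoryVectorStore` is a hypothetical toy backend that is not part of this PR. The real signatures in vector_store.py may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class VectorStoreDocument:
    """Simplified stand-in for the real document type (assumption)."""

    id: str
    vector: list[float]


class VectorStore(ABC):
    @abstractmethod
    def load_documents(self, documents: list[VectorStoreDocument]) -> None:
        """Insert a batch of documents; each backend implements this."""

    def insert(self, document: VectorStoreDocument) -> None:
        """Single-document insert, delegating to the batch method."""
        self.load_documents([document])


class InMemoryVectorStore(VectorStore):
    """Hypothetical backend used only to illustrate the pattern."""

    def __init__(self) -> None:
        self.rows: dict[str, VectorStoreDocument] = {}
        self.batch_calls = 0

    def load_documents(self, documents: list[VectorStoreDocument]) -> None:
        self.batch_calls += 1  # count store round trips
        for doc in documents:
            self.rows[doc.id] = doc  # upsert keyed by document id
```

With this shape, a backend only has to implement `load_documents`, and existing callers of `insert` keep working unchanged.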

Embedding operation and test updates:

Changed the batching logic in embed_text.py to flush the buffer when it exceeds batch_size * 4 instead of just batch_size, so documents are written to the vector store in larger batches [1] [2].
Updated the corresponding test in test_embed_text.py to reflect the new batching behavior, verifying that documents are processed in larger batches and adjusting assertions accordingly [1] [2] [3].
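
The buffering change can be sketched roughly as follows. This is an illustration under assumptions, not the code in embed_text.py: `flush_in_batches` and its parameters are hypothetical names, and the exact comparison used for the threshold (`>=` here) may differ in the real implementation.

```python
from typing import Callable, Iterable


def flush_in_batches(
    documents: Iterable[object],
    batch_size: int,
    load_documents: Callable[[list], None],
) -> int:
    """Buffer documents and flush once the buffer reaches batch_size * 4.

    Returns the number of flushes performed.
    """
    flush_threshold = batch_size * 4  # previously just batch_size
    buffer: list = []
    flushes = 0
    for doc in documents:
        buffer.append(doc)
        if len(buffer) >= flush_threshold:
            load_documents(buffer)
            flushes += 1
            buffer = []  # start a fresh buffer; the flushed list is not reused
    if buffer:
        # Flush whatever remains after the input is exhausted.
        load_documents(buffer)
        flushes += 1
    return flushes
```

For example, with `batch_size=2` and ten documents, this sketch flushes twice: once with eight documents and once with the remaining two.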

dayesouza added 2 commits February 27, 2026 20:08

vectors bulk load_documents

6519368

vector load

977dd24

dayesouza requested a review from a team as a code owner February 27, 2026 20:29

dayesouza added 2 commits February 27, 2026 20:36

add upsert work into dictionary

fafb06b

fic dictionary

6aa6d92

andresmor-ms approved these changes Feb 27, 2026

View reviewed changes

dayesouza merged commit 6f26d0e into main Feb 27, 2026
18 checks passed

dayesouza deleted the embed_v2 branch February 27, 2026 20:59

dayesouza merged 4 commits into main from embed_v2
