Feature/knn vector update mongodb #3971
Draft
ranfysvalle02 wants to merge 4 commits into mem0ai:main
MongoDB Vector Store: Migration to vectorSearch
Description
Migrates the MongoDB vector store from the deprecated knnVector index type to the MongoDB Atlas vectorSearch index type with the $vectorSearch aggregation pipeline. Includes comprehensive integration tests that verify $vectorSearch functionality end-to-end, plus automatic migration of legacy indexes.
🔍 Integration tests explicitly exercise the $vectorSearch aggregation pipeline using MongoDB Atlas Local containers to ensure the vector search implementation works correctly in production-like environments.
Fixes #3970
Type of Change
knnVector to vectorSearch)
Summary of Changes
1. Migration: knnVector → vectorSearch Index Type
Before:
After:
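The two index shapes can be sketched as plain dictionaries. This is a minimal illustration, not the PR's actual code: the helper names, the embedding field path, and the dimension count are assumptions chosen for the example. The definition layouts follow the Atlas Search (knnVector under mappings) and Atlas Vector Search (a fields list of vector entries) formats.

```python
def legacy_knn_definition(path: str, dims: int, similarity: str = "cosine") -> dict:
    """Legacy Atlas Search definition: the vector field lives under mappings.fields."""
    return {
        "mappings": {
            "dynamic": False,
            "fields": {
                path: {
                    "type": "knnVector",
                    "dimensions": dims,
                    "similarity": similarity,
                }
            },
        }
    }


def vector_search_definition(path: str, dims: int, similarity: str = "cosine") -> dict:
    """Definition for an index whose type is vectorSearch: a flat list of fields."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dims,
                "similarity": similarity,
            }
        ]
    }


# Hypothetical field path and dimension count, for illustration only.
before = legacy_knn_definition("embedding", 1536)
after = vector_search_definition("embedding", 1536)
```

With a recent PyMongo release, the second dictionary would typically be wrapped in a SearchIndexModel with type="vectorSearch" before being passed to create_search_index(); whether this PR does exactly that is not shown here.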
Impact:
vectorSearch index type)
knnVector field type
2. Vector Search Improvements
numCandidates changed from limit to limit * 20 for better HNSW index recall
list_search_indexes() check on every search operation
$vectorSearch aggregation stage with proper score extraction
3. Auto-Healing Legacy Indexes (Zero Data Loss)
NEW: Automatic detection and migration of legacy knnVector indexes without data loss.
create_col() inspects existing indexes
knnVector type in old mappings structure or outdated index configurations
vectorSearch
Before (Manual Migration Required):
After (Automatic):
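The detection step behind the automatic path can be sketched as a pure function over the documents returned by list_search_indexes(). This is a simplified illustration under stated assumptions: the function names are hypothetical, the check only looks for knnVector entries under latestDefinition.mappings.fields, and the real create_col() logic may test additional conditions.

```python
def is_legacy_knn_index(index_doc: dict) -> bool:
    """Return True if the index definition declares a knnVector field (legacy format)."""
    definition = index_doc.get("latestDefinition") or {}
    fields = (definition.get("mappings") or {}).get("fields") or {}
    # Only plain single-type field specs are inspected in this sketch.
    return any(
        isinstance(spec, dict) and spec.get("type") == "knnVector"
        for spec in fields.values()
    )


def plan_migration(index_docs: list, index_name: str) -> str:
    """Decide what to do with the named index: 'create', 'recreate', or 'keep'."""
    for doc in index_docs:
        if doc.get("name") != index_name:
            continue
        # A legacy index is dropped and rebuilt; documents in the collection are untouched.
        return "recreate" if is_legacy_knn_index(doc) else "keep"
    return "create"
```

Keeping the decision separate from the drop and create calls makes it testable without a running Atlas instance, which is the only reason it is split out this way here.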
4. Asynchronous Index Operations
NEW: Robust handling of MongoDB Atlas Search's asynchronous index operations.
_wait_for_index_status() method polls index status until ready or deleted
queryable=True before using indexes (not just existence)
index_creation_timeout parameter (default 300s, adjustable for large datasets)
wait_for_index_ready=False option for non-blocking initialization in APIs
Key Features:
queryable=True
5. Code Quality
_id in insert operations
Integration Tests: $vectorSearch Verification
🎯 NEW: Comprehensive Integration Test Suite
Added tests/vector_stores/test_mongodb_integration.py that explicitly tests the $vectorSearch aggregation pipeline using real MongoDB Atlas Local containers.
Why MongoDB Atlas Local?
$vectorSearch
$vectorSearch functionality
Test Coverage
test_vector_lifecycle - Full CRUD with $vectorSearch
What it verifies:
$vectorSearch aggregation pipeline execution
vectorSearchScore metadata extraction
$match stage filtering with $vectorSearch
test_list_functionality - List operations with filters
Tests the list() method with payload filters.
test_knnvector_to_vectorsearch_migration - NEW: Migration Test
Comprehensive test that verifies automatic migration from legacy knnVector to vectorSearch:
What it verifies:
knnVector index creation (old mappings format)
Running the Tests
Test Results:
Migration Test Output:
Test Infrastructure
testcontainers library for Docker container management
AtlasContainer wrapper for MongoDB Atlas Local
How Has This Been Tested?
✅ Integration Tests (NEW)
$vectorSearch functionality and migration
knnVector → vectorSearch migration with data preservation
✅ Existing Unit Tests
vectorSearch index format
✅ Manual Testing
$vectorSearch pipeline works correctly
Migration Guide
For Existing Users
No API changes - The public interface remains identical. Migration is automatic and zero-downtime:
Automatic Index Migration: Existing collections with legacy knnVector indexes are automatically detected and migrated
What happens:
create_col() inspects existing indexes
If knnVector is detected (checks old mappings structure), it drops only the index (preserving data)
vectorSearch configuration
wait_for_index_ready=True)
MongoDB Version: Requires MongoDB Atlas or MongoDB Atlas Local (standard MongoDB doesn't support $vectorSearch)
Manual Reset (Optional): If you prefer to start fresh, reset() is still available but no longer required for migration
Technical Details
$vectorSearch Pipeline Structure
The search method now uses this aggregation pipeline:
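A pipeline of this shape might be assembled as follows. This is a sketch under assumptions, not the PR's code: the function name, the embedding path, the payload.<key> filter layout, and the projected fields are all hypothetical. The numCandidates = limit * 20 rule and the optional $match stage placed after $vectorSearch come from this description.

```python
def build_search_pipeline(query_vector, limit, index_name, path="embedding", filters=None):
    """Build a $vectorSearch aggregation pipeline with an optional post-filter."""
    pipeline = [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                # Oversample candidates for better HNSW recall, as described in this PR.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        }
    ]
    if filters:
        # Assumed layout: filterable metadata is stored under a "payload" sub-document.
        pipeline.append({"$match": {f"payload.{k}": v for k, v in filters.items()}})
    # Expose the similarity score computed by $vectorSearch alongside the payload.
    pipeline.append({"$project": {"payload": 1, "score": {"$meta": "vectorSearchScore"}}})
    return pipeline


# Example: top 5 neighbours for a toy 3-dimensional query, restricted to one user.
example = build_search_pipeline([0.1, 0.2, 0.3], limit=5, index_name="idx",
                                filters={"user_id": "alice"})
```

Because the $match stage runs after $vectorSearch, it filters an already limited result set, so a filtered query can return fewer than limit documents. Atlas also documents an upper bound on numCandidates, so very large limits may need clamping.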
If filters are provided, a $match stage is inserted after $vectorSearch:
Auto-Healing Implementation
The create_col() method now includes intelligent legacy index detection and migration with robust asynchronous handling:
Key Features:
mappings structure and missing type field
wait_for_index_ready and index_creation_timeout parameters
Asynchronous Index Operations
MongoDB Atlas Search index operations are asynchronous. The implementation includes:
_wait_for_index_status() Helper Method:
queryable=True for readiness (not just existence)
Why This Matters:
Configuration:
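How wait_for_index_ready and index_creation_timeout could drive a readiness poll is sketched below. The class, the function, and the poll_interval field are hypothetical names introduced for illustration. Only the two parameter names, the 300-second default, and the queryable=True readiness check are taken from this description.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class IndexWaitConfig:
    wait_for_index_ready: bool = True      # block until the index is queryable
    index_creation_timeout: float = 300.0  # seconds; raise for large datasets
    poll_interval: float = 2.0             # hypothetical: seconds between polls


def wait_until_queryable(
    list_indexes: Callable[[], Iterable[dict]],
    index_name: str,
    cfg: IndexWaitConfig,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the named index reports queryable=True.

    Returns False immediately when waiting is disabled; raises TimeoutError
    if the index is not queryable before the timeout elapses.
    """
    if not cfg.wait_for_index_ready:
        return False
    deadline = clock() + cfg.index_creation_timeout
    while True:
        for doc in list_indexes():
            # Existence is not enough: the index must be marked queryable.
            if doc.get("name") == index_name and doc.get("queryable") is True:
                return True
        if clock() >= deadline:
            raise TimeoutError(f"index {index_name!r} not queryable in time")
        sleep(cfg.poll_interval)
```

With PyMongo, list_indexes could be something like lambda: collection.list_search_indexes(), assuming a collection handle is available; injecting sleep and clock keeps the loop testable without real delays.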
Production Considerations:
index_creation_timeout
wait_for_index_ready=False and handle "index not ready" errors gracefully
Performance Considerations
list_search_indexes())
numCandidates = limit * 20 for better recall (trade-off: slightly slower but more accurate)
Checklist
Breaking Changes
None - The API remains fully backward compatible. Only internal implementation changed.
Migration: Existing knnVector indexes are automatically detected and migrated on initialization. No manual intervention required - your data is preserved during the migration process.
Maintainer Checklist