feat: Add Vector Search support to MongoDBOnlineStore#6344
ntkathole merged 11 commits into feast-dev:master
Note: This will be rebased onto master. I mistakenly built from #6138.
  "pytest-asyncio<=0.24.0",
  "py>=1.11.0",
- "testcontainers==4.9.0",
+ "testcontainers==4.15.0rc2",
Need to update requirements using make lock-python-dependencies-all. Wouldn't using a stable release help?
- Lockfiles regenerated via make lock-python-dependencies-all — included in latest push.
- 4.15.0rc2 is the earliest release that includes MongoDBAtlasLocalContainer — there's no stable release with it yet. Changed to >=4.15.0rc2 so it will automatically pick up 4.15.0 stable once it ships.
- Extend MongoDBOnlineStoreConfig with VectorStoreConfig (vector_enabled, similarity, vector_index_wait_timeout)
- Auto-create/drop Atlas vector search indexes in update() for feature views with vector_index=True fields
- Implement retrieve_online_documents_v2 using $vectorSearch aggregation
- Add MongoDBAtlasOnlineStoreCreator using MongoDBAtlasLocalContainer from testcontainers-python fork
- Add integration tests for index lifecycle, write+retrieve round-trip, top_k limiting, and teardown cleanup

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
…_WAIT
- Explicitly create collection in _ensure_vector_indexes before calling create_search_index (Atlas requires it to exist)
- Restructure tests to share store instance and setup via module-scoped fixture
- Add INDEX_WAIT (default 5s) for Atlas Search eventual consistency after writes

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
- Test write_data now uses np.float32 to match Array(Float32) schema
- retrieve_online_documents_v2 coerces embedding to native Python floats before passing to $vectorSearch, avoiding BSON encoding errors with numpy float types

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
…tion test
- Add unit tests for retrieve_online_documents_v2 error paths: vector_enabled=False, embedding=None, no vector_index fields, missing vector_length
- Add idempotency integration test: calling update() twice does not duplicate indexes
- Add vector_index_wait_poll_interval config option
- Change _wait_for_index_ready to raise TimeoutError instead of silently continuing on timeout

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
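The timeout behaviour described in this commit can be sketched as a small polling loop. This is a minimal illustration only, not the code in the PR: the function name wait_until_ready and the is_ready callback are hypothetical, and the real _wait_for_index_ready presumably checks the Atlas index status rather than an arbitrary callable.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: float,
    poll_interval: float,
) -> None:
    """Poll is_ready() until it returns True.

    Raises TimeoutError when the deadline passes instead of returning
    silently, so callers cannot mistake a timeout for success.
    (Hypothetical helper; names are illustrative.)
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index not ready after {timeout:.2f}s")
        time.sleep(poll_interval)
```

Raising on timeout means a slow index build surfaces as an error at update() time rather than as an empty or failing query later.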
…ntainer
Replace MongoDbContainer with MongoDBAtlasLocalContainer in MongoDB offline store unit tests. This uses the mongodb/mongodb-atlas-local image which includes Atlas Search services, enabling future vector search testing.
- Bump testcontainers from 4.9.0 to 4.15.0rc2
- Switch test fixture to MongoDBAtlasLocalContainer
- Simplify connection string fixture to use get_connection_url()

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
bool is a subclass of int in Python, so isinstance(True, int) returns True. Move the bool check before int so boolean values are correctly converted to ValueProto(bool_val=...) instead of ValueProto(int64_val=...). Signed-off-by: Casey Clements <casey.clements@mongodb.com>
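The ordering issue in this commit is easy to reproduce in isolation. The sketch below is an assumption-laden illustration: proto_field_for is a hypothetical helper that returns the name of the ValueProto field that would be set, rather than building a real proto.

```python
def proto_field_for(value: object) -> str:
    """Return the ValueProto field name for a Python scalar.

    Hypothetical helper for illustration. The bool check must come
    first: bool is a subclass of int, so an int check placed earlier
    would also match True and False.
    """
    if isinstance(value, bool):
        return "bool_val"
    if isinstance(value, int):
        return "int64_val"
    raise TypeError(f"unsupported type: {type(value).__name__}")


# True passes both isinstance checks; only the ordering separates them.
print(isinstance(True, int))   # True
print(proto_field_for(True))   # bool_val
print(proto_field_for(7))      # int64_val
```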
Add docstring to _ensure_vector_indexes explaining the current one-index-per-field design and noting that a single composite index with multiple field definitions would reduce cluster-wide index count and memory overhead. Signed-off-by: Casey Clements <casey.clements@mongodb.com>
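The trade-off noted in that docstring can be sketched with plain dictionaries. The "fields" list with "type", "path", "numDimensions", and "similarity" keys follows the Atlas vector search index definition format; the index naming scheme and the helper names below are assumptions made for illustration, not what the PR uses.

```python
from typing import Dict, List, Tuple

VectorField = Tuple[str, int]  # (field path, number of dimensions)


def vector_field(path: str, num_dimensions: int, similarity: str) -> Dict[str, object]:
    """One entry in the "fields" list of an Atlas vector index definition."""
    return {
        "type": "vector",
        "path": path,
        "numDimensions": num_dimensions,
        "similarity": similarity,
    }


def per_field_indexes(
    view_name: str, fields: List[VectorField], similarity: str = "cosine"
) -> List[Dict[str, object]]:
    """Current design as described: one index per vector field.

    The index names are hypothetical.
    """
    return [
        {
            "name": f"{view_name}__{path}__vs_index",
            "definition": {"fields": [vector_field(path, dims, similarity)]},
        }
        for path, dims in fields
    ]


def composite_index(
    view_name: str, fields: List[VectorField], similarity: str = "cosine"
) -> Dict[str, object]:
    """Alternative: a single index whose definition lists every vector field."""
    return {
        "name": f"{view_name}__vs_index",
        "definition": {
            "fields": [vector_field(path, dims, similarity) for path, dims in fields]
        },
    }
```

For a feature view with two embedding fields, per_field_indexes yields two index specifications while composite_index yields one, which is where the reduction in cluster-wide index count comes from.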
Run 'make lock-python-dependencies-all' to update all lockfiles to reflect testcontainers>=4.15.0rc2 and mongodb in ci extras. Signed-off-by: Casey Clements <casey.clements@mongodb.com>
@ntkathole - This is ready for your review/merge. TIA!
@caseyclements Let's update the pixi lock file to fix failures and will merge
I've updated in my environment - MacOS. CI is still failing. What needs to be done?
@caseyclements seems there are conflicts with master
Oh. Of course. Thanks.
Signed-off-by: Casey Clements <casey.clements@mongodb.com>
@ntkathole Are we good to merge? Failures appear to be unrelated. Here's the summary of all 4 failures:
@caseyclements The Snowflake failures are expected, but the MySQL failures are due to the testcontainers version upgrade (testcontainers/testcontainers-python#739). Change conftest.py to include dialect:
The testcontainers upgrade changed the default MySQL dialect, causing 'No module named MySQLdb' errors. Explicitly set dialect='pymysql' in all MySqlContainer instantiations. Signed-off-by: Casey Clements <casey.clements@mongodb.com>
# [0.63.0](v0.62.0...v0.63.0) (2026-05-04)

### Bug Fixes

* Add project filter to apply_data_source and delete_data_source (closes [#6206](#6206)) ([#6322](#6322)) ([96562c4](96562c4))
* Add project_id filter to SnowflakeRegistry UPDATE path ([#6243](#6243)) ([6658b71](6658b71)), closes [#6208](#6208) [#6208](#6208)
* Add subprocess timeouts to prevent test_e2e_local hanging on Dask atexit handler ([3de6556](3de6556))
* Ambiguous truth value of array during materialization ([#6259](#6259)) ([d0c8984](d0c8984))
* Auto-detect GCS/S3 registry store when registry is passed as string ([#6260](#6260)) ([7ebcf03](7ebcf03))
* **bigquery:** Prefer query over table in get_table_query_string ([#6360](#6360)) ([77ed779](77ed779)), closes [#6200](#6200)
* correct project_id scoping in get_user_metadata and delete_project ([0c469a7](0c469a7))
* disable Redis RDB persistence in test deployments ([44cd682](44cd682))
* Disable snowflake tests temporarily in CI ([#6356](#6356)) ([31d5a98](31d5a98))
* Filter empty SQL commands at execute_snowflake_statement call sites ([#6249](#6249)) ([92ffbb9](92ffbb9))
* Fix five bugs in milvus online store ([#6275](#6275)) ([212504b](212504b))
* Fix issue with apply feature view ([835cda8](835cda8))
* Fix streaming materialization for exotic sources with lazy UDF pipelines ([c07972d](c07972d))
* Handle missing features gracefully instead of panicking ([7d00b3a](7d00b3a))
* Harden informer cache with label selectors and memory optimizations ([#6242](#6242)) ([3f11356](3f11356))
* **helm:** Avoid nil pointer for metrics.enabled inside podAnnotations ([#6251](#6251)) ([c833f1a](c833f1a))
* Include git in feast server image ([fb03c46](fb03c46))
* Include StreamFeatureView in freshness metric ([#6269](#6269)) ([463f16c](463f16c))
* Pre-create S3A event log dir before SparkContext init ([#6317](#6317)) ([9feca77](9feca77))
* Remote Online Store Type Inference Error with All-NULL Columns ([#6063](#6063)) ([de67bdd](de67bdd))
* Remove selector with kustomize overlay using a JSON 6902 patch ([9107a43](9107a43))
* Resolve multiple bugs in SnowflakeRegistry and Snowflake connection handling ([#6315](#6315)) ([7e66a2e](7e66a2e))
* **spark:** BatchFeatureView with TransformationMode.PYTHON now reads all source columns ([a310eaf](a310eaf))
* **spark:** Use SELECT * when feature_name_columns is empty in pull_all_from_table_or_query ([e1b1d2d](e1b1d2d))
* Support pandas mode in feature builder and fix dask column extraction ([863315e](863315e))
* support SQL string as entity_df in RemoteOfflineStore.get_historical_features ([c559889](c559889))
* Wrap LocalOutputNode return value in ArrowTableValue for consist… ([#6286](#6286)) ([a16cd55](a16cd55))

### Features

* Add agent skills and Cursor/Claude rules for Feast development ([312eea3](312eea3))
* Add feature view versioning support to FAISS online store ([b36acb7](b36acb7))
* Add feature view versioning support to Redis and DynamoDB online stores ([#6257](#6257)) ([edf25af](edf25af)), closes [#6164](#6164) [#6163](#6163)
* Add optional 'org' in feature view ([#6288](#6288)) ([#6301](#6301)) ([608b105](608b105))
* Add RaySource, to_ray_dataset first-class method, docs, and tests ([1c98157](1c98157))
* Add TLS support for Go Feature Server ([#6229](#6229)) ([28a58d0](28a58d0))
* Add Vector Search support to MongoDBOnlineStore ([#6344](#6344)) ([c102738](c102738))
* Add versioning support to Milvus online store ([#6330](#6330)) ([3268ced](3268ced))
* Addresses performance issues in the Redis online store ([2e50da0](2e50da0))
* Allow to set gpu for ray ([5580ab4](5580ab4))
* Bump redis-py version cap from <5 to <8 ([#6339](#6339)) ([9538180](9538180))
* Expose feature_server, materialization, and openlineage configuration via FeatureStore CRD ([ec6ecfd](ec6ecfd))
* Make online_write_batch_size configurable in MaterializationConfig ([#6268](#6268)) ([d41becf](d41becf))
* Make udf optional if agg defined ([#5689](#5689)) ([#6328](#6328)) ([f630056](f630056))
* MongoDB offline store ([#6138](#6138)) ([8eebad7](8eebad7))
* Optional input_schema for ODFV ([#6308](#6308)) ([#6312](#6312)) ([f08b4e8](f08b4e8))
* Provision minimal TokenReview RBAC for OIDC auth and add SSL error logging in token parser ([#6240](#6240)) ([dca57e8](dca57e8))
* **spark:** Add compute-on-read support for BatchFeatureView in get_… ([#6357](#6357)) ([630d9f8](630d9f8))
What this PR does / why we need it:
Adds MongoDB Atlas Vector Search support to MongoDBOnlineStore, enabling similarity search over feature embeddings stored in MongoDB Atlas.

Changes:
- MongoDBOnlineStoreConfig: Extends VectorStoreConfig to inherit vector_enabled and similarity fields. Adds vector_index_wait_timeout for controlling how long to wait for newly created Atlas Search indexes to become queryable.
- MongoDBOnlineStore.update(): When vector_enabled=True, automatically creates Atlas vector search indexes for any FeatureView fields that have vector_index=True. Ensures the underlying collection exists before index creation (required by Atlas). Drops vector indexes for removed feature views.
- MongoDBOnlineStore.retrieve_online_documents_v2(): Implements similarity search using the $vectorSearch aggregation stage. Returns results as (event_ts, entity_key_proto, feature_dict) tuples with a synthetic distance field containing the vector search score. Coerces query vectors to native Python floats for BSON compatibility.
- MongoDBAtlasOnlineStoreCreator: Test infrastructure using MongoDBAtlasLocalContainer from testcontainers-python to spin up mongodb/mongodb-atlas-local:8.0.4 for integration testing.
- Integration tests (test_mongodb_vector_search.py): Covers index lifecycle (create on update(), drop on teardown()), write + retrieve round-trip with known embeddings, and top_k limiting.

Dependencies:
- pymongo >= 4.13.0: Atlas Search index APIs (SearchIndexModel, create_search_index, list_search_indexes, drop_search_index). No change here.
- testcontainers.mongodb.MongoDBAtlasLocalContainer, which is, as of April 30th, available in testcontainers=4.15.0rc2. See testcontainers pull/873.

Which issue(s) this PR fixes:
Closes MongoDB ticket INTPYTHON-921

Checks
… git commit -s)

Testing Strategy

Misc
The integration tests use the mongodb/mongodb-atlas-local:8.0.4 Docker image via testcontainers. A configurable INDEX_WAIT (default 5s) accounts for Atlas Search index eventual consistency after writes.

Verified that existing unit tests (676 passed) and MongoDB universal integration tests (75 passed) are unaffected by these changes.
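As a rough sketch of the retrieval path described under Changes, the pipeline below shows the shape of a $vectorSearch query with a score projected into a distance field. It is not the PR's implementation: build_vector_search_pipeline and its parameters are hypothetical, and the numCandidates default (ten times top_k, capped at 10000) is an assumption chosen for illustration.

```python
from typing import Dict, Iterable, List, Optional


def build_vector_search_pipeline(
    index_name: str,
    path: str,
    query_vector: Iterable[float],
    top_k: int,
    feature_names: List[str],
    num_candidates: Optional[int] = None,
) -> List[Dict[str, object]]:
    """Build an aggregation pipeline for an Atlas $vectorSearch query.

    Hypothetical helper. The query vector is coerced to native Python
    floats so that numpy scalar types never reach the BSON encoder.
    """
    vector = [float(x) for x in query_vector]
    if num_candidates is None:
        # Assumed default: oversample relative to the requested limit.
        num_candidates = min(max(top_k * 10, top_k), 10000)

    # Keep the requested feature fields and expose the search score
    # under the synthetic "distance" field.
    projection: Dict[str, object] = {name: 1 for name in feature_names}
    projection["distance"] = {"$meta": "vectorSearchScore"}

    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": vector,
                "numCandidates": num_candidates,
                "limit": top_k,
            }
        },
        {"$project": projection},
    ]
```

With pymongo, a pipeline like this would typically be passed to collection.aggregate(); the index name and field path must match an existing vector search index on that collection.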