feat: Add Vector Search support to MongoDBOnlineStore #6344

Merged
ntkathole merged 11 commits into feast-dev:master from caseyclements:INTPYTHON-921 on May 1, 2026

@caseyclements (Contributor) commented Apr 28, 2026

What this PR does / why we need it:

Adds MongoDB Atlas Vector Search support to MongoDBOnlineStore, enabling similarity search over feature embeddings stored in MongoDB Atlas.

Changes:

  • MongoDBOnlineStoreConfig: Extends VectorStoreConfig to inherit vector_enabled and similarity fields. Adds vector_index_wait_timeout for controlling how long to wait for newly created Atlas Search indexes to become queryable.

  • MongoDBOnlineStore.update(): When vector_enabled=True, automatically creates Atlas vector search indexes for any FeatureView fields that have vector_index=True. Ensures the underlying collection exists before index creation (required by Atlas). Drops vector indexes for removed feature views.

  • MongoDBOnlineStore.retrieve_online_documents_v2(): Implements similarity search using the $vectorSearch aggregation stage. Returns results as (event_ts, entity_key_proto, feature_dict) tuples with a synthetic distance field containing the vector search score. Coerces query vectors to native Python floats for BSON compatibility.

  • MongoDBAtlasOnlineStoreCreator: Test infrastructure using MongoDBAtlasLocalContainer from testcontainers-python to spin up mongodb/mongodb-atlas-local:8.0.4 for integration testing.

  • Integration tests (test_mongodb_vector_search.py): Covers index lifecycle (create on update(), drop on teardown()), write + retrieve round-trip with known embeddings, and top_k limiting.
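As a rough illustration of the retrieval path described above, the following sketch builds an aggregation pipeline around a `$vectorSearch` stage and coerces the query vector to built-in floats. It is not the code from this PR: the function name, the `num_candidates_factor` parameter, and the way the score is copied into a `distance` field via `$set` are assumptions made for the example.

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    index_name: str,
    path: str,
    query_vector: Sequence[Any],
    top_k: int,
    num_candidates_factor: int = 10,  # hypothetical oversampling factor
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that starts with a $vectorSearch stage."""
    # Coerce every element to a built-in float so BSON can encode it,
    # even if the caller passed numpy scalars.
    native_vector = [float(x) for x in query_vector]
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": native_vector,
                "numCandidates": top_k * num_candidates_factor,
                "limit": top_k,
            }
        },
        # Copy the similarity score into a regular field on each result.
        {"$set": {"distance": {"$meta": "vectorSearchScore"}}},
    ]
```

A caller would typically pass the returned list to `collection.aggregate(...)`; the `limit` value is what enforces `top_k`.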

Dependencies:

  • Uses existing pymongo >= 4.13.0 Atlas Search index APIs (SearchIndexModel, create_search_index, list_search_indexes, drop_search_index). No change here.
  • The integration tests require testcontainers.mongodb.MongoDBAtlasLocalContainer, which, as of April 30th, is available in testcontainers==4.15.0rc2. See testcontainers pull/873.
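For readers unfamiliar with the Atlas Search index APIs listed above, here is a minimal sketch of the definition document that a vector search index expects. The helper name and its validation are assumptions for illustration; only the `fields` layout (`type`, `path`, `numDimensions`, `similarity`) follows the Atlas Vector Search index format.

```python
from typing import Any, Dict

# Similarity functions accepted by Atlas Vector Search index definitions.
ATLAS_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def vector_index_definition(
    path: str, num_dimensions: int, similarity: str = "cosine"
) -> Dict[str, Any]:
    """Return the definition document for a single-field vector index."""
    if similarity not in ATLAS_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be positive")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


# With pymongo installed, the definition could be used roughly like this
# (index name is hypothetical):
#   from pymongo.operations import SearchIndexModel
#   model = SearchIndexModel(
#       definition=vector_index_definition("embedding", 384),
#       name="embedding_vs_index",
#       type="vectorSearch",
#   )
#   collection.create_search_index(model)
```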

Which issue(s) this PR fixes:

Closes MongoDB ticket INTPYTHON-921

Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Unit tests
  • Integration tests
  • Manual tests
  • Testing is not required for this change

Misc

The integration tests use the mongodb/mongodb-atlas-local:8.0.4 Docker image via testcontainers. A configurable INDEX_WAIT (default 5s) accounts for Atlas Search index eventual consistency after writes.

Verified that existing unit tests (676 passed) and MongoDB universal integration tests (75 passed) are unaffected by these changes.

@caseyclements
Contributor Author

Note: This will be rebased onto master. I mistakenly built it on top of #6138.

devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread pyproject.toml Outdated
  "pytest-asyncio<=0.24.0",
  "py>=1.11.0",
- "testcontainers==4.9.0",
+ "testcontainers==4.15.0rc2",
Member

The requirements need to be updated using make lock-python-dependencies-all. Wouldn't using a stable release help?

https://github.com/feast-dev/feast/blob/master/sdk/python/requirements/py3.12-ci-requirements.txt#L5740

Contributor Author

  1. Lockfiles regenerated via make lock-python-dependencies-all — included in latest push.
  2. 4.15.0rc2 is the earliest release that includes MongoDBAtlasLocalContainer — there's no stable release with it yet. Changed to >=4.15.0rc2 so it will automatically pick up 4.15.0 stable once it ships.

- Extend MongoDBOnlineStoreConfig with VectorStoreConfig (vector_enabled,
  similarity, vector_index_wait_timeout)
- Auto-create/drop Atlas vector search indexes in update() for feature
  views with vector_index=True fields
- Implement retrieve_online_documents_v2 using $vectorSearch aggregation
- Add MongoDBAtlasOnlineStoreCreator using MongoDBAtlasLocalContainer
  from testcontainers-python fork
- Add integration tests for index lifecycle, write+retrieve round-trip,
  top_k limiting, and teardown cleanup

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
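The auto-create/drop behaviour in this commit can be pictured as a reconciliation between the indexes that already exist and the ones the current feature views call for. The sketch below is an assumption about how such a plan could be computed, not the PR's code; it presumes that `existing` contains only index names managed by the store.

```python
from typing import Iterable, List, Tuple


def plan_index_changes(
    existing: Iterable[str], desired: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (names to create, names to drop) for vector search indexes."""
    existing_set, desired_set = set(existing), set(desired)
    # Create only what is missing, so repeated calls add nothing new.
    to_create = sorted(desired_set - existing_set)
    # Drop indexes whose feature view or field is no longer declared.
    to_drop = sorted(existing_set - desired_set)
    return to_create, to_drop
```

Running the plan a second time with the same inputs, after the first plan has been applied, yields two empty lists.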
…_WAIT

- Explicitly create collection in _ensure_vector_indexes before calling
  create_search_index (Atlas requires it to exist)
- Restructure tests to share store instance and setup via module-scoped
  fixture
- Add INDEX_WAIT (default 5s) for Atlas Search eventual consistency
  after writes

Signed-off-by: Casey Clements <casey.clements@mongodb.com>

- Test write_data now uses np.float32 to match Array(Float32) schema
- retrieve_online_documents_v2 coerces embedding to native Python floats
  before passing to $vectorSearch, avoiding BSON encoding errors with
  numpy float types

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
…tion test

- Add unit tests for retrieve_online_documents_v2 error paths:
  vector_enabled=False, embedding=None, no vector_index fields,
  missing vector_length
- Add idempotency integration test: calling update() twice does not
  duplicate indexes
- Add vector_index_wait_poll_interval config option
- Change _wait_for_index_ready to raise TimeoutError instead of
  silently continuing on timeout

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
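The change from silently continuing to raising on timeout can be sketched with a generic polling helper. The function name and signature here are hypothetical; `timeout` and `poll_interval` stand in for the `vector_index_wait_timeout` and `vector_index_wait_poll_interval` options mentioned in this PR.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool], timeout: float, poll_interval: float
) -> None:
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return
        if time.monotonic() >= deadline:
            # Fail loudly rather than letting later queries hit a
            # half-built index.
            raise TimeoutError(f"index not ready after {timeout} seconds")
        time.sleep(poll_interval)
```

In a real store, `is_ready` would presumably inspect the result of `list_search_indexes` for the index in question; that check is left to the caller here.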
…ntainer

Replace MongoDbContainer with MongoDBAtlasLocalContainer in MongoDB
offline store unit tests. This uses the mongodb/mongodb-atlas-local
image which includes Atlas Search services, enabling future vector
search testing.

- Bump testcontainers from 4.9.0 to 4.15.0rc2
- Switch test fixture to MongoDBAtlasLocalContainer
- Simplify connection string fixture to use get_connection_url()

Signed-off-by: Casey Clements <casey.clements@mongodb.com>

bool is a subclass of int in Python, so isinstance(True, int) returns
True. Move the bool check before int so boolean values are correctly
converted to ValueProto(bool_val=...) instead of ValueProto(int64_val=...).

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
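The ordering issue in this commit is easy to reproduce in isolation. The sketch below uses a hypothetical helper that returns the name of the ValueProto field a Python value would map to, rather than building a protobuf message, so it runs without Feast installed.

```python
def proto_field_for(value: object) -> str:
    """Pick the ValueProto field name for a Python scalar."""
    # bool must be tested before int: isinstance(True, int) is True.
    if isinstance(value, bool):
        return "bool_val"
    if isinstance(value, int):
        return "int64_val"
    if isinstance(value, float):
        return "double_val"
    if isinstance(value, str):
        return "string_val"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

With the two checks swapped, `proto_field_for(True)` would return `"int64_val"`, which is the bug the commit describes.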
Add docstring to _ensure_vector_indexes explaining the current
one-index-per-field design and noting that a single composite index
with multiple field definitions would reduce cluster-wide index count
and memory overhead.

Signed-off-by: Casey Clements <casey.clements@mongodb.com>

Run 'make lock-python-dependencies-all' to update all lockfiles
to reflect testcontainers>=4.15.0rc2 and mongodb in ci extras.

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
@bisht2050

@ntkathole - This is ready for your review/merge. TIA!

Member

@ntkathole left a comment

Looks good

@ntkathole
Member

@caseyclements Let's update the pixi lock file to fix failures and will merge

make lock-python-dependencies-all and pixi lock

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
@caseyclements
Contributor Author

@caseyclements Let's update the pixi lock file to fix failures and will merge

I've updated it in my environment (macOS). CI is still failing. What needs to be done?

(feast) ~/src/feast (INTPYTHON-921)
$ pixi install -e registration-tests --locked
✔ The registration-tests environment has been installed.
(feast) ~/src/feast (INTPYTHON-921)
$ git ss
## INTPYTHON-921...origin/INTPYTHON-921
(feast) ~/src/feast (INTPYTHON-921)
$ pixi lock
✔ Lock-file was already up-to-date
(feast) ~/src/feast (INTPYTHON-921)

@ntkathole
Member

@caseyclements seems there are conflicts with master

@caseyclements
Contributor Author

@caseyclements seems there are conflicts with master

Oh. Of course. Thanks.

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
@caseyclements
Contributor Author

@ntkathole Are we good to merge? Failures appear to be unrelated.

● Here's the summary of all 4 failures:

| Check | Failing tests | Ours? |
| --- | --- | --- |
| unit-test-python (3.11, macos-14) | test_redis.py::test_online_write_batch_async_skip_dedup_single_pipeline (event loop error) | No, Redis test |
| integration-test-python (3.11) | Snowflake materialization tests + Snowflake offline store tests | No, all Snowflake |
| integration-test-registration-ci | No module named 'MySQLdb' | No, MySQL driver missing |
| integration-test-registration-local | Same MySQL issue | No, MySQL driver missing |

@ntkathole
Member

@caseyclements The Snowflake failures are expected, but the MySQL failures are due to the testcontainers version upgrade (testcontainers/testcontainers-python#739).

Change conftest.py to include dialect:

@pytest.fixture(scope="session")
def mysql_server():
    container = MySqlContainer("mysql:latest", dialect="pymysql")
    container.start()

The testcontainers upgrade changed the default MySQL dialect, causing
'No module named MySQLdb' errors. Explicitly set dialect='pymysql' in
all MySqlContainer instantiations.

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
@ntkathole ntkathole merged commit c102738 into feast-dev:master May 1, 2026
23 of 26 checks passed
@caseyclements caseyclements deleted the INTPYTHON-921 branch May 1, 2026 14:20
franciscojavierarceo pushed a commit that referenced this pull request May 4, 2026
# [0.63.0](v0.62.0...v0.63.0) (2026-05-04)

### Bug Fixes

* Add project filter to apply_data_source and delete_data_source (closes [#6206](#6206)) ([#6322](#6322)) ([96562c4](96562c4))
* Add project_id filter to SnowflakeRegistry UPDATE path ([#6243](#6243)) ([6658b71](6658b71)), closes [#6208](#6208) [#6208](#6208)
* Add subprocess timeouts to prevent test_e2e_local hanging on Dask atexit handler ([3de6556](3de6556))
* Ambiguous truth value of array during materialization ([#6259](#6259)) ([d0c8984](d0c8984))
* Auto-detect GCS/S3 registry store when registry is passed as string ([#6260](#6260)) ([7ebcf03](7ebcf03))
* **bigquery:** Prefer query over table in get_table_query_string ([#6360](#6360)) ([77ed779](77ed779)), closes [#6200](#6200)
* correct project_id scoping in get_user_metadata and delete_project ([0c469a7](0c469a7))
* disable Redis RDB persistence in test deployments ([44cd682](44cd682))
* Disable snowflake tests temporarily in CI ([#6356](#6356)) ([31d5a98](31d5a98))
* Filter empty SQL commands at execute_snowflake_statement call sites ([#6249](#6249)) ([92ffbb9](92ffbb9))
* Fix five bugs in milvus online store ([#6275](#6275)) ([212504b](212504b))
* Fix issue with apply feature view ([835cda8](835cda8))
* Fix streaming materialization for exotic sources with lazy UDF pipelines ([c07972d](c07972d))
* Handle missing features gracefully instead of panicking ([7d00b3a](7d00b3a))
* Harden informer cache with label selectors and memory optimizations ([#6242](#6242)) ([3f11356](3f11356))
* **helm:** Avoid nil pointer for metrics.enabled inside podAnnotations ([#6251](#6251)) ([c833f1a](c833f1a))
* Include git in feast server image ([fb03c46](fb03c46))
* Include StreamFeatureView in freshness metric ([#6269](#6269)) ([463f16c](463f16c))
* Pre-create S3A event log dir before SparkContext init ([#6317](#6317)) ([9feca77](9feca77))
* Remote Online Store Type Inference Error with All-NULL Columns ([#6063](#6063)) ([de67bdd](de67bdd))
* Remove selector with kustomize overlay using a JSON 6902 patch ([9107a43](9107a43))
* Resolve multiple bugs in SnowflakeRegistry and Snowflake connection handling ([#6315](#6315)) ([7e66a2e](7e66a2e))
* **spark:** BatchFeatureView with TransformationMode.PYTHON now reads all source columns ([a310eaf](a310eaf))
* **spark:** Use SELECT * when feature_name_columns is empty in pull_all_from_table_or_query ([e1b1d2d](e1b1d2d))
* Support pandas mode in feature builder and fix dask column extraction ([863315e](863315e))
* support SQL string as entity_df in RemoteOfflineStore.get_historical_features ([c559889](c559889))
* Wrap LocalOutputNode return value in ArrowTableValue for consist… ([#6286](#6286)) ([a16cd55](a16cd55))

### Features

* Add agent skills and Cursor/Claude rules for Feast development ([312eea3](312eea3))
* Add feature view versioning support to FAISS online store ([b36acb7](b36acb7))
* Add feature view versioning support to Redis and DynamoDB online stores ([#6257](#6257)) ([edf25af](edf25af)), closes [#6164](#6164) [#6163](#6163)
* Add optional 'org' in feature view ([#6288](#6288)) ([#6301](#6301)) ([608b105](608b105))
* Add RaySource, to_ray_dataset first-class method, docs, and tests ([1c98157](1c98157))
* Add TLS support for Go Feature Server ([#6229](#6229)) ([28a58d0](28a58d0))
* Add Vector Search support to MongoDBOnlineStore ([#6344](#6344)) ([c102738](c102738))
* Add versioning support to Milvus online store ([#6330](#6330)) ([3268ced](3268ced))
* Addresses performance issues in the Redis online store ([2e50da0](2e50da0))
* Allow to set gpu for ray ([5580ab4](5580ab4))
* Bump redis-py version cap from <5 to <8 ([#6339](#6339)) ([9538180](9538180))
* Expose feature_server, materialization, and openlineage configuration via FeatureStore CRD ([ec6ecfd](ec6ecfd))
* Make online_write_batch_size configurable in MaterializationConfig ([#6268](#6268)) ([d41becf](d41becf))
* Make udf optional if agg defined ([#5689](#5689)) ([#6328](#6328)) ([f630056](f630056))
* MongoDB offline store ([#6138](#6138)) ([8eebad7](8eebad7))
* Optional input_schema for ODFV ([#6308](#6308)) ([#6312](#6312)) ([f08b4e8](f08b4e8))
* Provision minimal TokenReview RBAC for OIDC auth and add SSL error logging in token parser ([#6240](#6240)) ([dca57e8](dca57e8))
* **spark:** Add compute-on-read support for BatchFeatureView in get_… ([#6357](#6357)) ([630d9f8](630d9f8))
3 participants