JAVA-6099 Add new fields for Auto embedding#1936

Open

strogiyotec wants to merge 3 commits intomongodb:mainfrom

strogiyotec:JAVA-6099

Contributor

strogiyotec commented Apr 9, 2026 •

edited

Loading

JAVA-6099
The doc in JIRA list new fields that must be supported by the drivers
but because the Search index doesn't use builders, java driver makes it possible to use new fields without changing API
As part of this PR I added a new unit test to disabled test class and enabled it locally for testing
The reason it's disabled is

This class was created to add support for queryText index that was only available in custom docker image ,right now it's available in community edition so we have to adjust evergreen first
New Search Index fields are only available in Atlas and are not in public preview yet, I had to use specific cluster in Atlas QA environment
For other details please check the JIRA ticket comment


          JAVA-6099 Add new fields for Auto embedding

strogiyotec requested a review from a team as a code owner

April 9, 2026 04:55

strogiyotec requested a review from rozza

April 9, 2026 04:55


          disable test case

c140c58

codeowners-service-app bot requested a review from atandon2024

April 9, 2026 04:59

codeowners-service-app bot commented Apr 9, 2026

Assigned atandon2024 for team dbx-java because rozza is out of office.


          upgrade embedding test cases

b674eb9

strogiyotec requested review from vbabanin and removed request for atandon2024 and rozza

April 9, 2026 05:20

vbabanin requested a review from Copilot

April 9, 2026 06:40

Copilot started reviewing on behalf of vbabanin

April 9, 2026 06:40

Copilot AI reviewed

View reviewed changes

Contributor

Copilot AI left a comment

Pull request overview

Adds functional coverage for new Atlas Search “autoEmbed” index fields and related vector search behaviors in the sync driver test suite (JAVA-6099), primarily by expanding the existing automated-embedding functional test class.

Changes:

Adds new functional tests for auto-embedding search index creation scenarios (quantization variants, filter fields, invalid configurations).
Adds query coverage for model override and exact vector search options.
Introduces a disabled (intended-to-skip) test case for numDimensions on autoEmbed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

...nctional/com/mongodb/client/vector/AbstractAutomatedEmbeddingVectorSearchFunctionalTest.java

Comment on lines 31 to +243

@@ @@ -44,6 +47,7 @@ @@
               import static com.mongodb.client.model.Aggregates.vectorSearch;
               import static com.mongodb.client.model.search.SearchPath.fieldPath;
               import static com.mongodb.client.model.search.VectorSearchOptions.approximateVectorSearchOptions;
+              import static com.mongodb.client.model.search.VectorSearchOptions.exactVectorSearchOptions;
               import static com.mongodb.client.model.search.VectorSearchQuery.textQuery;
               import static java.util.Arrays.asList;
               import static org.bson.codecs.configuration.CodecRegistries.fromProviders;
@@ @@ -210,6 +214,200 @@ @@
                       ));
                   }
+                  @ParameterizedTest(name = "should create auto embedding index with {0} quantization")
+                  @ValueSource(strings = {"float", "scalar", "binary", "binaryNoRescore"})
+                  void shouldCreateAutoEmbeddingIndexWithQuantization(final String quantization) {
+                      final String indexName = INDEX_NAME + "_" + quantization;
+                      mongoClient.getDatabase(getDatabaseName()).createCollection(getCollectionName());
+                      SearchIndexModel indexModel = new SearchIndexModel(
+                              indexName,
+                              new Document(
+                                      "fields",
+                                      Collections.singletonList(
+                                              new Document("type", "autoEmbed")
+                                                      .append("modality", "text")
+                                                      .append("path", FIELD_SEARCH_PATH)
+                                                      .append("model", "voyage-4-large")
+                                                      .append("quantization", quantization)
+                                      )),
+                              SearchIndexType.vectorSearch()
+                      );
+                      List<String> result = documentCollection.createSearchIndexes(Collections.singletonList(indexModel));
+                      Assertions.assertFalse(result.isEmpty());
+                  }
+                  @Test
+                  @DisplayName("should create auto embedding index with custom numDimensions")
+                  @Ignore("Currently numDimensions can't be used, it fails with server error: 'Invalid numDimensions value for autoEmbed field in index: test_auto_embed. Expected an integer.'")
+                  void shouldCreateAutoEmbeddingIndexWithCustomNumDimensions() {

Copilot AI Apr 9, 2026

This test class is using JUnit Jupiter (@test, @beforeeach, etc.), but the method is annotated with JUnit 4's @ignore. JUnit Jupiter will not honor @ignore, so when the class-level Assumption is eventually removed this test will run (and currently looks intended to be skipped). Use org.junit.jupiter.api.Disabled (or the project’s existing conditional-disable mechanism) and drop the org.junit.Ignore import.

Copilot uses AI. Check for mistakes.

...nctional/com/mongodb/client/vector/AbstractAutomatedEmbeddingVectorSearchFunctionalTest.java

+                  @Test
+                  @DisplayName("should create auto embedding index with custom numDimensions")
+                  @Ignore("Currently numDimensions can't be used, it fails with server error: 'Invalid numDimensions value for autoEmbed field in index: test_auto_embed. Expected an integer.'")

Copilot AI Apr 9, 2026

The @Ignore reason mentions an index name test_auto_embed, but this test creates the index using INDEX_NAME (currently "voyage_4"). Consider making the ignore/disabled message generic or matching the actual index name to avoid confusion when diagnosing failures.

Suggested change

      
                @Ignore("Currently numDimensions can't be used, it fails with server error: 'Invalid numDimensions value for autoEmbed field in index: test_auto_embed. Expected an integer.'")
          
                @Ignore("Currently numDimensions can't be used; it fails with a server error for the autoEmbed field: 'Invalid numDimensions value for autoEmbed field. Expected an integer.'")

Copilot uses AI. Check for mistakes.

vbabanin reviewed

View reviewed changes

...nctional/com/mongodb/client/vector/AbstractAutomatedEmbeddingVectorSearchFunctionalTest.java

Comment on lines +367 to +375

+                      createAutoEmbeddingIndex("voyage-4-large");
+                      TimeUnit.SECONDS.sleep(2L);
+                      insertDocumentsForEmbedding();
+                      TimeUnit.SECONDS.sleep(2L);
+                      List<Bson> pipeline = asList(
+                              vectorSearch(
+                                      fieldPath(FIELD_SEARCH_PATH),
+                                      textQuery("movies about love").model("voyage-4-large"),

Member

vbabanin Apr 13, 2026

This doesn’t actually exercise an override. The index is created with voyage-4-large, and the query also specifies .model("voyage-4-large") - same model. An override would mean using a different compatible model at query time.

The spec calls this out for the voyage-4 family (interoperability within the same family). To test override, we should create the index with voyage-4-large and run the query with a different compatible model (e.g., voyage-4 or voyage-4-lite).

...nctional/com/mongodb/client/vector/AbstractAutomatedEmbeddingVectorSearchFunctionalTest.java

                * limitations under the License.
                */
               package com.mongodb.client.vector;

Member

vbabanin Apr 14, 2026 •

edited

Loading

I missed this in the previous review (#1862, https://github.com/mongodb/mongo-java-driver/pull/1862/changes
), but i think AbstractAutomatedEmbeddingVectorSearchFunctionalTest should be relocated to com.mongodb.client.model.search (where AggregatesBinaryVectorSearchIntegrationTest lives).

Both tests exercise the same surface area: $vectorSearch aggregation stage behavior - one with binary vectors, the other with auto-embedded text queries. They use the same API (Aggregates.vectorSearch(), VectorSearchOptions, VectorSearchQuery, SearchPath.fieldPath()), all of which live under com.mongodb.client.model.search, so co-locating the tests with that API keeps the package structure consistent.

com.mongodb.client.vector should remain focused on BSON vector type coverage, not aggregation-stage behavior, which is what is done in AbstractBinaryVectorFunctionalTest - this test is specifically for vector BSON type.

...nctional/com/mongodb/client/vector/AbstractAutomatedEmbeddingVectorSearchFunctionalTest.java

+                  @DisplayName("should create auto embedding index and run query with model override")
+                  void shouldCreateAutoEmbeddingIndexAndRunQueryWithModelOverride() throws InterruptedException {
+                      mongoClient.getDatabase(getDatabaseName()).createCollection(getCollectionName());
+                      createAutoEmbeddingIndex("voyage-4-large");

Member

vbabanin Apr 14, 2026 •

edited

Loading

This comment is related to the suggestion in: #1936 (comment)

Currently, these tests mix two concerns: index setup and running a query. When a test covers both, a failure doesn’t tell us - without digging into logs/code - whether the setup broke (index creation) or the operation broke (query execution).

This applies to:
shouldCreateAutoEmbeddingIndexAndRunQueryWithModelOverride
shouldCreateAutoEmbeddingIndexAndRunExactVectorSearchQuery
shouldCreateAutoEmbeddingIndexAndRunVectorSearchQuery

The createAutoEmbeddingIndex("voyage-4-large") is the exact same setup duplicated across all three - it's not the SUT, it's a prerequisite:

        createAutoEmbeddingIndex("voyage-4-large");
        TimeUnit.SECONDS.sleep(2L);
        insertDocumentsForEmbedding();
        TimeUnit.SECONDS.sleep(2L);

I think we should follow the pattern in AggregatesBinaryVectorSearchIntegrationTest: create the index and shared test data once, and have each test method exercise a single query concern (approximate, exact, model override), e.g. via @BeforeAll (and later it will be a pre-created index on the QA cluster).

The same pattern we did before:

AggregatesBinaryVectorSearchIntegrationTest (38fc561) - index setup in the BeforeAll.
Migration to QA cluster.

...nctional/com/mongodb/client/vector/AbstractAutomatedEmbeddingVectorSearchFunctionalTest.java

+                  @Test
+                  @DisplayName("should create auto embedding index with filter field")
+                  void shouldCreateAutoEmbeddingIndexWithFilterField() {

Member

vbabanin Apr 14, 2026 •

edited

Loading

Index creation and vector search queries are separate concerns, and the codebase currently reflects this split:

Index management tests (index creation, deletion etc) live in AbstractAtlasSearchIndexManagementProseTest
Vector search query tests live in AggregatesBinaryVectorSearchIntegrationTest / AbstractAutomatedEmbeddingVectorSearchFunctionalTest

I suggest we move the index creation tests in this PR to AbstractAtlasSearchIndexManagementProseTest:

shouldCreateAutoEmbeddingIndexWithQuantization
shouldCreateAutoEmbeddingIndexWithCustomNumDimensions
shouldCreateAutoEmbeddingIndexWithFilterField
shouldFailWhenMixingVectorAndAutoEmbedTypes
shouldFailWhenDuplicatePathsAreUsed
shouldFailWhenAutoEmbedFieldUsedAsFilterField

They aren’t spec-defined prose tests today, but colocating this additional (non-prose) coverage alongside the prose tests is still a good fit: it extends index-management coverage in the same area, and if/when the spec adds auto-embedding prose tests, they’d naturally land there as well. The C# driver took the same approach by adding auto-embed index creation tests under their Atlas Search index management tests (mongo-csharp-driver PR #1900).

The class would stay then quite simple and would have only:

shouldExecuteVectorSearchQuery (existing, from previous PR)
shoulduExecuteVectorSearchWithModelOverride (new, added in this PR)
shouldExecuteExactVectorSearchQuery (new, added in this PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet