
@nirinchev
Collaborator

@nirinchev nirinchev commented Oct 27, 2025

Proposed changes

This adds support for creating Atlas Search indexes. Dropping and listing are already supported, so they needed no changes.

Notable omissions from the schema, which we will want to evaluate based on user feedback and internal testing:

  • analyzers - based on consultation with the Search team
  • mappings.dynamic.typeSet - preview feature
  • searchAnalyzer - based on consultation with the Search team
  • storedSource - based on testing of model performance
  • synonyms - for simplicity
  • typeSets - for simplicity
  • mappings.fields.* - the field definition only includes the type and uses passthrough for the remaining properties. We'll have to evaluate how well this performs and, if necessary, define the complete JSON schema.
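To make the passthrough trade-off for mappings.fields.* concrete, here is a minimal dependency-free sketch, assuming the schema validates only the type and keeps every other key unchanged. parseFieldMapping is a hypothetical helper, not the PR's code (which uses zod), and tokenization and minGrams are autocomplete options used only as sample data.

```typescript
// Hypothetical model of a field mapping under passthrough: only `type` is
// checked; any other properties are kept as-is without validation.
type FieldMapping = { type: string; [option: string]: unknown };

function parseFieldMapping(input: unknown): FieldMapping {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    throw new Error("field mapping must be an object");
  }
  const record = input as Record<string, unknown>;
  const type = record.type;
  if (typeof type !== "string") {
    throw new Error("field mapping requires a string type");
  }
  // The spread copies unknown keys through unchanged.
  return { ...record, type };
}

// Autocomplete options survive even though the parser knows nothing about them.
const field = parseFieldMapping({ type: "autocomplete", tokenization: "edgeGram", minGrams: 2 });
console.log(field);
```

The flip side is that a misspelled option name also passes validation silently, which is part of what still needs evaluating.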

@nirinchev nirinchev requested a review from a team as a code owner October 27, 2025 15:11
@nirinchev nirinchev requested a review from Copilot October 27, 2025 16:03
Base automatically changed from ni/feature-flags to main October 27, 2025 16:03
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for creating Atlas Search (lexical) indexes through the MCP server. The changes enable users to create Atlas Search indexes with both dynamic and explicit field mappings, complementing the existing support for listing and dropping search indexes.

Key changes:

  • Added Atlas Search index creation support to create-index tool with comprehensive schema validation
  • Enhanced testing to differentiate between search and vector search indexes
  • Added accuracy tests for various Atlas Search index creation scenarios

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

  • src/tools/mongodb/create/createIndex.ts: Adds the atlasSearchIndexDefinition schema and implements search index creation logic
  • tests/integration/tools/mongodb/create/createIndex.test.ts: Adds comprehensive integration tests for Atlas Search index creation scenarios
  • tests/integration/tools/mongodb/delete/dropIndex.test.ts: Refactors tests to handle search and vector search indexes separately
  • tests/accuracy/createIndex.test.ts: Adds accuracy test cases for Atlas Search index creation with various configurations
  • tests/accuracy/sdk/accuracyTestingClient.ts: Removes the deprecated --connectionString flag from CLI arguments
  • README.md: Updates documentation to remove the deprecated --connectionString flag
Comments suppressed due to low confidence (1)

tests/integration/tools/mongodb/create/createIndex.test.ts:1

  • The index names are being evaluated at test definition time rather than test execution time. If getSearchIndexName() and getVectorIndexName() depend on beforeEach setup, these calls will execute before the setup runs, potentially returning undefined or stale values. Wrap these in functions: { description: "search", indexName: () => getSearchIndexName() } and update the test to call the function.
import { describeWithMongoDB, validateAutoConnectBehavior, waitUntilSearchIsReady } from "../mongodbHelpers.js";
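The difference the suggestion describes can be seen in isolation with a minimal sketch. currentSearchIndexName and the inline assignment standing in for a beforeEach hook are hypothetical; the real test helpers are not shown here.

```typescript
// Hypothetical stand-in for state that a beforeEach hook would assign.
let currentSearchIndexName: string | undefined;
const getSearchIndexName = (): string | undefined => currentSearchIndexName;

// Eager: evaluated when the test table is defined, before any setup has run.
const eagerCases = [{ description: "search", indexName: getSearchIndexName() }];

// Lazy: a thunk defers the lookup until the test body calls it.
const lazyCases = [{ description: "search", indexName: () => getSearchIndexName() }];

// Simulated beforeEach.
currentSearchIndexName = "search-index-1";

console.log(eagerCases[0].indexName); // undefined
console.log(lazyCases[0].indexName()); // search-index-1
```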

@coveralls
Collaborator

coveralls commented Oct 27, 2025

Pull Request Test Coverage Report for Build 19331710057

Details

  • 165 of 168 (98.21%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 80.397%

Changes Missing Coverage
  • src/tools/args.ts: 9 of 12 changed/added lines covered (75.0%)

Totals
  • Change from base Build 19331702101: +0.2%
  • Covered Lines: 6581
  • Relevant Lines: 8073

💛 - Coveralls

import { quantizationEnum, similarityEnum } from "../../../common/search/vectorSearchEmbeddingsManager.js";

export class CreateIndexTool extends MongoDBToolBase {
private vectorSearchIndexDefinition = z.object({
Collaborator Author

This is best viewed with the "Hide whitespace" option; it's just Prettier reformatting the indentation.

});

- const args = [MCP_SERVER_CLI_SCRIPT, "--connectionString", mdbConnectionString, ...additionalArgs];
+ const args = [MCP_SERVER_CLI_SCRIPT, mdbConnectionString, ...additionalArgs];
Collaborator Author

--connectionString is deprecated, so this uses the positional argument instead.
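The change amounts to moving the connection string from a flag value to the first positional slot after the script. A hypothetical helper that rewrites the old argument form into the new one shows the mapping; it is an illustration only, not part of the MCP server or the test client, and it assumes the script path comes first.

```typescript
// Hypothetical helper: rewrites [script, "--connectionString", uri, ...rest]
// into the positional form [script, uri, ...rest].
function toPositionalConnectionString(args: readonly string[]): string[] {
  const flagIndex = args.indexOf("--connectionString");
  // Expects the script path before the flag and a value after it.
  if (flagIndex < 1 || flagIndex + 1 >= args.length) {
    return [...args]; // nothing to migrate
  }
  const connectionString = args[flagIndex + 1];
  const remaining = [...args.slice(0, flagIndex), ...args.slice(flagIndex + 2)];
  const [script, ...rest] = remaining;
  return [script, connectionString, ...rest];
}

console.log(toPositionalConnectionString(["cli.js", "--connectionString", "mongodb://localhost:27017"]));
```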

@nirinchev nirinchev changed the title feat: add ability to create atlas search indexes feat: add ability to create atlas search indexes MCP-275 Oct 28, 2025
private atlasSearchIndexDefinition = z
.object({
type: z.literal("search"),
analyzer: z
Collaborator

Probably this should be an enum of the analyzers.

Collaborator Author

After discussing with the Search team, we concluded that searchAnalyzer and analyzers are not something we need to support based on current usage patterns. We can add support in the future if we see customer demand for it.
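For reference, the enum suggestion could look roughly like this plain TypeScript sketch (the PR itself uses zod). The four built-in names come from the tool's analyzer description; treating any other lucene.* name as a language analyzer is a simplifying assumption, and classifyAnalyzer is a hypothetical helper. It also shows why a closed enum alone is awkward: language-specific and custom analyzers fall outside it.

```typescript
// Built-in analyzers listed in the tool description.
const BUILT_IN_ANALYZERS = ["lucene.standard", "lucene.simple", "lucene.whitespace", "lucene.keyword"] as const;
type BuiltInAnalyzer = (typeof BUILT_IN_ANALYZERS)[number];

function isBuiltInAnalyzer(name: string): name is BuiltInAnalyzer {
  return (BUILT_IN_ANALYZERS as readonly string[]).includes(name);
}

// Assumption: any other "lucene." name is a language analyzer (e.g. lucene.czech);
// everything else is treated as a custom analyzer defined in the Atlas UI.
function classifyAnalyzer(name: string): "built-in" | "language" | "custom" {
  if (isBuiltInAnalyzer(name)) return "built-in";
  if (name.startsWith("lucene.")) return "language";
  return "custom";
}

console.log(classifyAnalyzer("lucene.cjk"));
```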

"The analyzer to use for the index. Can be one of the built-in lucene analyzers (`lucene.standard`, `lucene.simple`, `lucene.whitespace`, `lucene.keyword`), a language-specific analyzer, such as `lucene.cjk` or `lucene.czech`, or a custom analyzer defined in the Atlas UI."
),
mappings: z
.object({
Collaborator

This lacks support for:

  • numPartitions
  • searchAnalyzer vs. analyzer
  • custom analyzers
  • storedSource
  • synonyms
  • typeSets

Collaborator

https://www.mongodb.com/docs/atlas/atlas-search/index-definitions/?deployment-type=atlas&interface=driver&language=nodejs#std-label-ref-index-definitions

We could say that custom analyzers are not that important, but storedSource is relevant most of the time.

Collaborator Author

The args shape is based on the POC the Search team did for index support, and I was going off the assumption that they had selected the fields they see the most value in exposing to LLMs. I realize much more configuration is possible; I'm just not sure how much of it we expect agents to configure versus an actual human who wants to fine-tune the index.

Collaborator

A POC built to test the feasibility of creating search indexes is likely to have different requirements than production code.

Collaborator Author

  • typeSets is still in preview, so I'm leaving it out for now.
  • numPartitions was added
  • searchAnalyzer and analyzers are out of scope per the Search team's recommendation

I'm leaving the following out of this PR for now, as they have a more complex schema and have been more error-prone in testing:

  • synonyms
  • storedSource

We can add them as we move toward GA and build out a more comprehensive testing suite.
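To give a sense of why synonyms and storedSource were deferred, here is a sketch of their shapes as we understand the public Atlas Search index-definition documentation; it is not part of this PR. validateStoredSource and the sample names are hypothetical, and the rule that exactly one of include or exclude may be given is an assumption to verify against the docs.

```typescript
// Shapes as understood from the Atlas Search index-definition docs (verify before use).
type StoredSource = boolean | { include: string[] } | { exclude: string[] };

interface SynonymMapping {
  name: string; // name that queries reference
  analyzer: string; // analyzer applied to the synonym documents
  source: { collection: string }; // collection holding the synonym documents
}

// Example synonyms array (hypothetical names).
const synonyms: SynonymMapping[] = [
  { name: "mySynonyms", analyzer: "lucene.standard", source: { collection: "synonymsCollection" } },
];

// Hypothetical validator: a boolean, or an object with exactly one of
// include/exclude holding an array of field paths.
function validateStoredSource(value: unknown): StoredSource {
  if (typeof value === "boolean") return value;
  if (typeof value !== "object" || value === null) {
    throw new Error("storedSource must be a boolean or an object");
  }
  const obj = value as Record<string, unknown>;
  const hasInclude = "include" in obj;
  const hasExclude = "exclude" in obj;
  if (hasInclude === hasExclude) {
    throw new Error("storedSource needs exactly one of include or exclude");
  }
  const fields = hasInclude ? obj.include : obj.exclude;
  if (!Array.isArray(fields) || !fields.every((f) => typeof f === "string")) {
    throw new Error("storedSource field list must be an array of strings");
  }
  return hasInclude ? { include: fields } : { exclude: fields };
}
```

Even this partial model needs nested objects and a mutual-exclusion rule, which is the kind of structure that has been harder for models to produce reliably.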

Collaborator

@kmruiz kmruiz left a comment

The current implementation lacks support for multiple important fields, and we should discuss whether we want to support them.

z.string().describe("The field name"),
z
.object({
type: z
Collaborator

Objects will require additional fields depending on the type. I know passthrough will keep them, but we should document them so the agent knows which ones to use and how. For example, autocomplete supports defining a custom analyzer, the tokenization strategy (which is really important), and similarity functions.

Collaborator Author

The exact shape is extremely complex to represent in a JSON schema. I'm worried that being overly specific will make this more harmful than helpful, especially if we expect most use cases to revolve around just specifying the type.

Collaborator

Yep, the schema is complicated; it has a lot of options that are not even compatible with each other. We should have proper documentation of which ones we want to expose and which ones we don't, something we haven't discussed yet because supporting the most-used bits of Atlas Search is already a substantial effort.

Collaborator Author

Leaving this out of scope for now and we can decide how much detail to provide as we build out a more comprehensive testing suite.
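The per-type options raised in this thread can be made concrete with a hypothetical discriminated union for three field types. The option names and the minGrams/maxGrams defaults of 2 and 15 reflect our reading of the Atlas Search docs and should be checked; checkField is illustrative only. Even this small subset invites a plausible cross-field check (minGrams not exceeding maxGrams) that a type-only schema with passthrough cannot express.

```typescript
// Hypothetical discriminated union over a few Atlas Search field types.
type AutocompleteField = {
  type: "autocomplete";
  analyzer?: string;
  tokenization?: "edgeGram" | "rightEdgeGram" | "nGram";
  minGrams?: number;
  maxGrams?: number;
  foldDiacritics?: boolean;
};
type TokenField = { type: "token"; normalizer?: "lowercase" | "none" };
type StringField = { type: "string"; analyzer?: string; searchAnalyzer?: string };
type FieldDefinition = AutocompleteField | TokenField | StringField;

// Returns a list of problems; the switch narrows `field` by its `type` tag.
function checkField(field: FieldDefinition): string[] {
  const problems: string[] = [];
  switch (field.type) {
    case "autocomplete": {
      const min = field.minGrams ?? 2; // assumed default
      const max = field.maxGrams ?? 15; // assumed default
      if (min > max) problems.push(`minGrams (${min}) exceeds maxGrams (${max})`);
      break;
    }
    case "token":
    case "string":
      break; // no cross-field rules modeled for these types
  }
  return problems;
}

console.log(checkField({ type: "autocomplete", tokenization: "nGram", minGrams: 5, maxGrams: 3 }));
```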


Labels

None yet

Projects

None yet


5 participants