fix(mcp): fix GetDocumentByUrl and AnalyzeDocumentStructure returning 'Document not found' #2727

Merged
theletterf merged 2 commits into main from fix/mcp-document-url-lookup
Feb 18, 2026

@theletterf
Contributor

Issue

Closes #2722.

GetDocumentByUrl and AnalyzeDocumentStructure returned "Document not found" for every URL, including paths returned by other MCP tools such as SemanticSearch.

Root cause

Two independent bugs in DocumentGateway:

1. Non-existent Elasticsearch field (url.keyword)

The query used .Suffix("keyword") to construct the field path url.keyword. However, the index mapping defines url as type: keyword directly — its only sub-fields are url.match and url.prefix. The field url.keyword does not exist, so the Term query produced zero hits for every input.

Fix: remove .Suffix("keyword") and query the url field directly.
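
To illustrate why the old field path could never match, here is a minimal Python sketch (not the PR's C# code). The mapping fragment is hypothetical: only the field names `url`, `url.match`, and `url.prefix` come from the description above, and the sub-field types are assumptions. `field_exists` and `term_query` are illustrative helpers, not part of any Elasticsearch client.

```python
# Hypothetical mapping fragment. Only the field names come from the PR;
# the sub-field types ("text") are assumptions made for illustration.
MAPPING = {
    "properties": {
        "url": {
            "type": "keyword",
            "fields": {
                "match": {"type": "text"},
                "prefix": {"type": "text"},
            },
        }
    }
}


def field_exists(mapping: dict, path: str) -> bool:
    """Return True if a dotted field path resolves in the mapping.

    Handles a top-level field plus at most one multi-field sub-field,
    which is all this illustration needs.
    """
    head, *rest = path.split(".")
    field = mapping.get("properties", {}).get(head)
    if field is None:
        return False
    if not rest:
        return True
    return len(rest) == 1 and rest[0] in field.get("fields", {})


def term_query(field: str, value: str) -> dict:
    """Build the body of an exact-match term query on the given field."""
    return {"query": {"term": {field: {"value": value}}}}


# The old query targeted a path that is absent from the mapping.
old_field_is_mapped = field_exists(MAPPING, "url.keyword")  # False
# The fixed query targets the keyword field itself.
fixed_body = term_query("url", "/docs/deploy-manage/api-keys")
```

A term query against an unmapped field does not raise an error; it simply returns no hits, which is consistent with the "Document not found" symptom.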

2. No URL normalization

The URL was passed to the Term query unchanged. The index stores path-only values like /docs/deploy-manage/api-keys, so full URLs such as https://www.elastic.co/docs/deploy-manage/api-keys never matched, nor did bare paths without a leading slash.

Fix: add NormalizeUrl, which:

  • Parses absolute URLs and extracts the path component via Uri.AbsolutePath.
  • Ensures a leading slash on relative paths.
  • Strips trailing slashes.
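
The three steps above can be sketched in Python as follows. This is an illustration of the described behavior, not the PR's C# implementation: `normalize_url` is a hypothetical name, `urlsplit` stands in for `Uri.AbsolutePath`, and trimming whitespace, dropping query strings on relative paths, and mapping an empty path to `/` are assumptions.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL or path to the path-only form the index is said to store."""
    # Assumption: surrounding whitespace is not significant.
    parts = urlsplit(url.strip())
    # .path excludes the scheme, host, query string, and fragment.
    path = parts.path
    # Ensure a leading slash on bare relative paths.
    if not path.startswith("/"):
        path = "/" + path
    # Strip trailing slashes; keep "/" for the root (an assumption).
    return path.rstrip("/") or "/"
```

Under these assumptions, `https://www.elastic.co/docs/deploy-manage/api-keys`, `docs/deploy-manage/api-keys`, and `/docs/deploy-manage/api-keys/` all reduce to `/docs/deploy-manage/api-keys`.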

Changes

  • DocumentGateway.GetByUrlAsync / GetStructureAsync: fix field reference, add normalization call.
  • DocumentGateway.NormalizeUrl: new private static helper.
  • DocumentTools: updated parameter descriptions to document accepted URL formats.

Trade-offs

  • The normalizer is deliberately minimal: it does not validate that the path starts with /docs/, because path prefixes may change across deployments and the gateway should not encode that assumption.
  • Fragment identifiers (#heading) in absolute URLs are silently dropped by Uri.AbsolutePath; this is the correct behavior since the index does not store fragment-level granularity.
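
The fragment trade-off follows from how URL parsers split a URL into components: the fragment is a separate part and never appears in the path. A small Python sketch of the same idea, using `urlsplit` as a stand-in for .NET's `Uri` (`path_and_fragment` is a hypothetical helper):

```python
from typing import Tuple
from urllib.parse import urlsplit


def path_and_fragment(url: str) -> Tuple[str, str]:
    """Return the path and fragment of a URL as separate components."""
    parts = urlsplit(url)
    return parts.path, parts.fragment


# The fragment is parsed out separately, so a lookup keyed on the path
# never sees it.
path, fragment = path_and_fragment(
    "https://www.elastic.co/docs/deploy-manage/api-keys#heading"
)
```

Here `path` is `/docs/deploy-manage/api-keys` and `fragment` is `heading`, so taking only the path discards the fragment without any extra handling.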

LLM usage

This fix was developed with Claude 4.6 Sonnet and Cursor.

Made with Cursor

… "Document not found"

Two bugs caused all URL lookups to fail:

1. The Elasticsearch query used `url.keyword` via `.Suffix("keyword")`, but the
   index mapping defines `url` as `type: keyword` directly (with sub-fields
   `url.match` and `url.prefix`). The `url.keyword` path does not exist, so
   the term query matched nothing. Fixed by querying `url` directly.

2. No URL normalization was applied before querying, so full URLs such as
   `https://www.elastic.co/docs/deploy-manage/api-keys` never matched the
   path-only values stored in the index. Added a `NormalizeUrl` helper that
   extracts the path from absolute URLs, ensures a leading slash, and strips
   trailing slashes.

The tool parameter descriptions are updated to document the accepted formats.

Closes #2722

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@theletterf theletterf requested a review from a team as a code owner February 18, 2026 08:52
@theletterf theletterf requested a review from reakaleek February 18, 2026 08:52
@theletterf theletterf self-assigned this Feb 18, 2026
@theletterf theletterf added the fix label Feb 18, 2026
Extracts NormalizeUrl as internal static and adds InternalsVisibleTo so it
can be tested without an Elasticsearch connection. Covers all URL formats
reported in issue #2722: path-only, bare path, full https URL, preview URL,
trailing slash, query string, fragment, and leading/trailing whitespace.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@theletterf theletterf enabled auto-merge (squash) February 18, 2026 09:03
@theletterf theletterf merged commit 5fef306 into main Feb 18, 2026
30 checks passed
@theletterf theletterf deleted the fix/mcp-document-url-lookup branch February 18, 2026 09:11

Development

Successfully merging this pull request may close these issues.

Elastic Docs MCP: GetDocumentByUrl and AnalyzeDocumentStructure return 'Document not found' for all URL formats
