fix(mcp): fix GetDocumentByUrl and AnalyzeDocumentStructure returning 'Document not found' by theletterf · Pull Request #2727 · elastic/docs-builder

theletterf · 2026-02-18T08:52:29Z

Issue

Closes #2722.

GetDocumentByUrl and AnalyzeDocumentStructure returned "Document not found" for every URL, including paths returned by other MCP tools such as SemanticSearch.

Root cause

Two independent bugs in DocumentGateway:

1. Non-existent Elasticsearch field (url.keyword)

The query used .Suffix("keyword") to construct the field path url.keyword. However, the index mapping defines url as type: keyword directly — its only sub-fields are url.match and url.prefix. The field url.keyword does not exist, so the Term query produced zero hits for every input.

Fix: remove .Suffix("keyword") and query the url field directly.
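
For illustration, the corrected lookup corresponds roughly to the Query DSL body below. This is a sketch only: the value is the example path used elsewhere in this description, and the request the .NET client actually generates may contain additional clauses. Before the fix, the same term query targeted `url.keyword`, a field the mapping does not define, so it could not match any document.

```json
{
  "query": {
    "term": {
      "url": "/docs/deploy-manage/api-keys"
    }
  }
}
```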

2. No URL normalization

The URL was passed to the Term query unchanged. The index stores path-only values like /docs/deploy-manage/api-keys, so full URLs such as https://www.elastic.co/docs/deploy-manage/api-keys never matched, nor did bare paths without a leading slash.

Fix: add NormalizeUrl, which:

Parses absolute URLs and extracts the path component via Uri.AbsolutePath.
Ensures a leading slash on relative paths.
Strips trailing slashes.
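
The three steps above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the class name `UrlNormalizerSketch` is hypothetical, and the http/https scheme check, the manual removal of query strings and fragments on relative input, and the handling of a bare root path are all assumptions.

```csharp
using System;

// Hypothetical sketch of the normalization described above; not the merged implementation.
public static class UrlNormalizerSketch
{
    public static string NormalizeUrl(string url)
    {
        var value = url.Trim();

        // Assumption: only http(s) inputs count as absolute URLs. On Unix-like
        // systems Uri.TryCreate can parse a rooted path such as "/docs/x" as a
        // file URI, so the scheme is checked explicitly.
        if (Uri.TryCreate(value, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            // AbsolutePath excludes scheme, host, query string, and fragment.
            value = uri.AbsolutePath;
        }
        else
        {
            // Assumption: relative input may still carry a query string or fragment.
            var cut = value.IndexOfAny(new[] { '?', '#' });
            if (cut >= 0)
                value = value[..cut];
        }

        // Ensure a leading slash on relative paths.
        if (!value.StartsWith('/'))
            value = "/" + value;

        // Strip trailing slashes; assumption: the root path stays "/".
        value = value.TrimEnd('/');
        return value.Length == 0 ? "/" : value;
    }
}
```

With this sketch, `https://www.elastic.co/docs/deploy-manage/api-keys/`, `docs/deploy-manage/api-keys`, and `/docs/deploy-manage/api-keys` all normalize to `/docs/deploy-manage/api-keys`, the path-only form the index stores.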

Changes

DocumentGateway.GetByUrlAsync / GetStructureAsync: fix field reference, add normalization call.
DocumentGateway.NormalizeUrl: new private static helper.
DocumentTools: updated parameter descriptions to document accepted URL formats.

Trade-offs

The normalizer is deliberately minimal: it does not validate that the path starts with /docs/, because path prefixes may change across deployments and the gateway should not encode that assumption.
Fragment identifiers (#heading) in absolute URLs are silently dropped by Uri.AbsolutePath; this is the correct behavior since the index does not store fragment-level granularity.
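
The fragment behavior can be checked in isolation with the standard `System.Uri` type. The snippet below is a small demonstration; the class and method names are hypothetical, and the URL is the example from this description with `#heading` appended.

```csharp
using System;

// Hypothetical demo: shows what Uri.AbsolutePath keeps and what it leaves out.
public static class FragmentDemo
{
    public static (string Path, string Fragment) Split(string absoluteUrl)
    {
        var uri = new Uri(absoluteUrl);
        // AbsolutePath carries only the path; the fragment is exposed separately.
        return (uri.AbsolutePath, uri.Fragment);
    }

    public static void Main()
    {
        var (path, fragment) = Split("https://www.elastic.co/docs/deploy-manage/api-keys#heading");
        Console.WriteLine(path);     // /docs/deploy-manage/api-keys
        Console.WriteLine(fragment); // #heading
    }
}
```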

LLM usage

This fix was developed with Claude 4.6 Sonnet and Cursor.

Made with Cursor

… "Document not found"

Two bugs caused all URL lookups to fail:

1. The Elasticsearch query used `url.keyword` via `.Suffix("keyword")`, but the index mapping defines `url` as `type: keyword` directly (with sub-fields `url.match` and `url.prefix`). The `url.keyword` path does not exist, so the term query matched nothing. Fixed by querying `url` directly.

2. No URL normalization was applied before querying, so full URLs such as `https://www.elastic.co/docs/deploy-manage/api-keys` never matched the path-only values stored in the index. Added a `NormalizeUrl` helper that extracts the path from absolute URLs, ensures a leading slash, and strips trailing slashes.

The tool parameter descriptions are updated to document the accepted formats.

Closes #2722

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Extracts NormalizeUrl as internal static and adds InternalsVisibleTo so it can be tested without an Elasticsearch connection.

Covers all URL formats reported in issue #2722: path-only, bare path, full https URL, preview URL, trailing slash, query string, fragment, and leading/trailing whitespace.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

theletterf requested a review from a team as a code owner February 18, 2026 08:52

theletterf requested a review from reakaleek February 18, 2026 08:52

theletterf self-assigned this Feb 18, 2026

theletterf added the fix label Feb 18, 2026

reakaleek approved these changes Feb 18, 2026

theletterf enabled auto-merge (squash) February 18, 2026 09:03

theletterf merged commit 5fef306 into main Feb 18, 2026
30 checks passed

theletterf deleted the fix/mcp-document-url-lookup branch February 18, 2026 09:11

theletterf merged 2 commits into main from fix/mcp-document-url-lookup