Continuing tweaking search relevance #2300

Mpdreamz · 2025-12-02T19:27:32Z

Summary

- Add rank_feature scoring for navigation structure: landing and overview pages now score better based on their navigation depth and position. The navigation_depth and navigation_table_of_contents fields are indexed as rank_feature with negative impact (shallower = better). Reference and getting-started sections get slight boosts.
- Improve synonym handling: split synonyms into index-time and search-time. Key terms like esql, data-streams, and machine-learning now use explicit contraction rules (e.g., es|ql => esql, data stream => data-streams) applied at index time for consistent matching.
- Tune lexical query scoring: wrap the title completion match in ConstantScoreQuery to prevent high TF/IDF from dominating. Reduce the body match boost. Add phrase matching for queries of 3+ tokens. Remove redundant title tokens from search_title to avoid inflating TF.
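To make the three summary items concrete, here is a minimal sketch of what the index body and query could look like, written as Python dicts in Elasticsearch's JSON DSL. Only the rank_feature field names, the search_title field, and the two synonym rules come from this PR; the filter and analyzer names, the whitespace tokenizer, the body field, and all boost values are assumptions, and the search-time analyzer is left out. This is not the PR's actual C# implementation.

```python
import json

# Index-side sketch. Only the two rank_feature field names, search_title and
# the two synonym rules come from the PR summary; every other name is assumed.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "index_time_synonyms": {  # hypothetical filter name
                    "type": "synonym",
                    "synonyms": [
                        "es|ql => esql",
                        "data stream => data-streams",
                    ],
                }
            },
            "analyzer": {
                "docs_index_analyzer": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "whitespace",  # assumed; keeps es|ql as one token
                    "filter": ["lowercase", "index_time_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # positive_score_impact=False: larger values lower the score,
            # so shallower pages rank better.
            "navigation_depth": {"type": "rank_feature", "positive_score_impact": False},
            "navigation_table_of_contents": {"type": "rank_feature", "positive_score_impact": False},
            "search_title": {"type": "text", "analyzer": "docs_index_analyzer"},
        }
    },
}


def build_query(user_query: str) -> dict:
    """Query-side sketch: lexical clauses must match, rank features only add score."""
    lexical = [
        # constant_score: a title match contributes a fixed score (boost assumed)
        {"constant_score": {"filter": {"match": {"search_title": user_query}}, "boost": 3.0}},
        # "body" is a hypothetical field name; the reduced boost value is assumed
        {"match": {"body": {"query": user_query, "boost": 0.5}}},
    ]
    if len(user_query.split()) >= 3:
        # phrase matching only for queries of three or more tokens
        lexical.append({"match_phrase": {"body": {"query": user_query}}})
    return {"query": {"bool": {
        "must": [{"bool": {"should": lexical, "minimum_should_match": 1}}],
        "should": [
            {"rank_feature": {"field": "navigation_depth"}},
            {"rank_feature": {"field": "navigation_table_of_contents"}},
        ],
    }}}


if __name__ == "__main__":
    print(json.dumps(build_query("create data stream"), indent=2))
```

Keeping the rank_feature clauses in the outer should means they can only reorder documents that already matched lexically; they never pull in unrelated pages.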

Test plan

- Run existing search relevance tests; 14 new test cases were added, covering single-term product searches (elasticsearch, kibana, logstash, etc.), synonyms (ml, esql), and longer queries
- Verify landing pages like /docs/reference/elasticsearch rank first for "elasticsearch"
- Verify synonym searches like "ml" and "machine learning" return the same top result
- Confirm deeper nested pages don't outrank overview pages for generic terms
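The checks in this test plan could be expressed as a small table-driven harness. The sketch below is hypothetical: `search` stands in for whatever callable returns ranked URLs, and the `FAKE_INDEX` entries are invented placeholders for illustration (only /docs/reference/elasticsearch appears in this PR), not output from the real index or from the repository's actual test suite.

```python
from typing import Callable, Dict, List

# A search function maps a query string to ranked URLs, best first.
Search = Callable[[str], List[str]]


def top_result_is(search: Search, query: str, expected_url: str) -> bool:
    """True when the first hit for `query` is exactly `expected_url`."""
    results = search(query)
    return bool(results) and results[0] == expected_url


def same_top_result(search: Search, query_a: str, query_b: str) -> bool:
    """True when two queries (e.g. synonyms) share the same first hit."""
    a, b = search(query_a), search(query_b)
    return bool(a) and bool(b) and a[0] == b[0]


# Invented stand-in data so the sketch runs without a cluster.
FAKE_INDEX: Dict[str, List[str]] = {
    "elasticsearch": ["/docs/reference/elasticsearch", "/docs/placeholder/nested/page"],
    "ml": ["/docs/placeholder/machine-learning"],
    "machine learning": ["/docs/placeholder/machine-learning"],
}


def fake_search(query: str) -> List[str]:
    """Placeholder for the real search call; returns canned results."""
    return FAKE_INDEX.get(query, [])


if __name__ == "__main__":
    print(top_result_is(fake_search, "elasticsearch", "/docs/reference/elasticsearch"))
    print(same_top_result(fake_search, "ml", "machine learning"))
```

Swapping `fake_search` for a function that queries the real index would turn the same two helpers into the landing-page and synonym checks listed above.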

This is not complete but is ready to be merged; it is another step toward better relevance, in continuation of #2279.

Not all relevance tests pass yet:

q=datastreams

Expected: /docs/manage-data/data-store/data-streams
  - Score: 7.5496
  - Matched: True

Actual: /docs/manage-data/lifecycle/data-stream
  - Score: 7.5838
  - Matched: True

It's close but the top result is not quite what I want.

q=logstash

Expected: /docs/reference/logstash
  - Score: 7.8765
  - Matched: True

Actual: /docs/release-notes/logstash
  - Score: 7.9206
  - Matched: True

Similarly, it's close, but because the logstash reference is nested deeper than its release notes, it does not take the number 1 spot.

Will continue to follow up with this.
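For a sense of how small the remaining gaps are, the reported scores can be compared directly. The snippet below only does arithmetic on the four scores quoted above; the helper names are made up for this illustration.

```python
# Scores copied from the two failing cases reported above.
FAILING_CASES = {
    "datastreams": {"expected": 7.5496, "actual": 7.5838},
    "logstash": {"expected": 7.8765, "actual": 7.9206},
}


def score_margin(case: dict) -> float:
    """Absolute gap between the actual top hit and the expected page."""
    return round(case["actual"] - case["expected"], 4)


def relative_margin(case: dict) -> float:
    """Gap as a fraction of the actual top hit's score."""
    return score_margin(case) / case["actual"]


if __name__ == "__main__":
    for query, case in FAILING_CASES.items():
        print(query, score_margin(case))
    # datastreams 0.0342
    # logstash 0.0441
```

Both gaps are under 1% of the top score, so a small change to the navigation-depth or section boosts could plausibly flip the order.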

reakaleek · 2025-12-03T08:53:57Z

src/api/Elastic.Documentation.Api.Infrastructure/Adapters/Search/ElasticsearchGateway.cs

 			{
-				Query = "plugin client integration", Operator = Operator.Or, Fields = new[] { "search_title", "headings", "url.match" }
-			}
+				Query = "plugin client integration glossary", Operator = Operator.Or, Fields = new[] { "search_title", "url.match" }


This is interesting.

Can you explain why this is needed? What does "plugin client integration glossary" mean? Why these terms?

Mpdreamz · 2025-12-03

This is to ensure that a query for logstash X penalizes logstash plugin documentation relative to regular logstash docs, unless plugin is part of the query. The same applies to client and integration.

Glossary gets penalized because, if it is part of the title, the page contains a lot of interesting keywords that we don't necessarily want to always take the top spot.

This list will be configurable; I am refactoring config/synonyms.yml to config/search.yml as we speak :)
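One way to picture the penalty described here is an Elasticsearch boosting query whose negative clause matches the penalized terms. The sketch below is an assumption-heavy illustration: the diff only shows the term string and the fields, so the use of a boosting query, the negative_boost value, and the rule for dropping terms that appear in the user's query are all guesses, and the function names are hypothetical.

```python
from typing import Dict, List

# Terms and fields copied from the diff above.
PENALIZED_TERMS = ["plugin", "client", "integration", "glossary"]
PENALTY_FIELDS = ["search_title", "url.match"]


def active_penalty_terms(user_query: str, penalized: List[str] = PENALIZED_TERMS) -> List[str]:
    """Keep only the penalized terms the user did not ask for explicitly."""
    query_tokens = set(user_query.lower().split())
    return [term for term in penalized if term not in query_tokens]


def with_penalty(positive: Dict, user_query: str, negative_boost: float = 0.8) -> Dict:
    """Wrap `positive` in a boosting query; negative_boost is an assumed value."""
    terms = active_penalty_terms(user_query)
    if not terms:
        return positive  # nothing left to penalize
    return {
        "boosting": {
            "positive": positive,
            "negative": {
                "multi_match": {
                    "query": " ".join(terms),
                    "operator": "or",
                    "fields": PENALTY_FIELDS,
                }
            },
            "negative_boost": negative_boost,
        }
    }


if __name__ == "__main__":
    base = {"match": {"search_title": "logstash plugin"}}
    print(with_penalty(base, "logstash plugin"))
```

With this approach a query such as "logstash plugin" would still demote client, integration, and glossary pages but leave plugin pages alone.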

Continuing tweaking search relevance

0e0fe5a

Mpdreamz requested review from a team as code owners December 2, 2025 19:27

Mpdreamz added the fix label Dec 2, 2025

Mpdreamz self-assigned this Dec 2, 2025

Mpdreamz requested a review from cotti December 2, 2025 19:27

update test assertion

7b3077c

reakaleek approved these changes Dec 3, 2025

reakaleek reviewed Dec 3, 2025

Mpdreamz merged commit 7c5ec51 into main Dec 3, 2025
28 checks passed

Mpdreamz deleted the fix/search-relevance-continued branch December 3, 2025 09:03
