@Mpdreamz Mpdreamz commented Dec 2, 2025

Summary

  • Add rank_feature scoring for navigation structure - Landing/overview pages now score better based on their navigation depth and position. navigation_depth and
    navigation_table_of_contents fields are indexed as rank_feature with negative impact (shallower = better). Reference and getting-started sections get slight boosts.
  • Improve synonym handling - Split synonyms into index-time and search-time. Key terms like esql, data-streams, and machine-learning now use explicit contraction rules (e.g.,
    es|ql => esql, data stream => data-streams) applied at index time for consistent matching.
  • Tune lexical query scoring - Wrap title completion match in ConstantScoreQuery to prevent high TF/IDF from dominating. Reduce body match boost. Add phrase matching for 3+
    token queries. Remove redundant title tokens from search_title to avoid inflating TF.
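
To illustrate the "shallower = better" behaviour of the navigation fields, here is a minimal sketch. It assumes the saturation function is used; the summary does not say which rank_feature function the query applies. For a field mapped with negative score impact, saturation behaves like pivot / (value + pivot). The pivot value and the depths are hypothetical.

```python
def negative_impact_saturation(value: float, pivot: float) -> float:
    """Saturation-style score for a feature where smaller values should score higher.

    Assumption: this mirrors the effect of a rank_feature field with negative
    score impact queried with a saturation function. Returns a score in (0, 1).
    """
    if value <= 0 or pivot <= 0:
        raise ValueError("value and pivot must be strictly positive")
    return pivot / (value + pivot)


PIVOT = 2.0  # hypothetical pivot; the score is exactly 0.5 when value == pivot

# Hypothetical navigation depths: a landing page versus a deeply nested page.
landing_page_score = negative_impact_saturation(2, PIVOT)
nested_page_score = negative_impact_saturation(5, PIVOT)

print(f"landing page: {landing_page_score:.3f}")
print(f"nested page:  {nested_page_score:.3f}")
```

Because the contribution is bounded, a shallow page gets a nudge rather than an unconditional win, which fits the near-tie results reported further down.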

Test plan

  • Run existing search relevance tests - 14 new test cases added covering single-term product searches (elasticsearch, kibana, logstash, etc.), synonyms (ml, esql), and longer
    queries
  • Verify landing pages like /docs/reference/elasticsearch rank first for "elasticsearch"
  • Verify synonym searches like "ml" and "machine learning" return the same top result
  • Confirm deeper nested pages don't outrank overview pages for generic terms
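
The synonym check above relies on contraction rules collapsing variants into a single term. The sketch below shows the idea at the string level only; in the real index this would be done by an analyzer, not by regular expressions. The first two rules are quoted from the summary, while the two machine-learning rules are assumptions about what the configuration might contain.

```python
import re

# Contraction rules: source phrase => canonical term.
CONTRACTIONS = {
    "es|ql": "esql",                         # quoted in the summary
    "data stream": "data-streams",           # quoted in the summary
    "machine learning": "machine-learning",  # assumed rule
    "ml": "machine-learning",                # assumed rule
}


def normalize(text: str) -> str:
    """Lowercase the text and apply each contraction rule on word boundaries."""
    out = text.lower()
    # Apply longer phrases first so multi-word rules win over shorter ones.
    for source in sorted(CONTRACTIONS, key=len, reverse=True):
        pattern = r"(?<![\w-])" + re.escape(source) + r"(?![\w-])"
        out = re.sub(pattern, CONTRACTIONS[source], out)
    return out


print(normalize("ML"))                # machine-learning
print(normalize("Machine Learning"))  # machine-learning
print(normalize("ES|QL syntax"))      # esql syntax
```

With both "ml" and "machine learning" reduced to the same term, the two queries hit the same indexed tokens, which is what makes an identical top result plausible.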

This is not complete but is ready to be merged; it is another step toward improving relevance, continuing #2279.

Not all relevance tests pass yet:

q=datastreams

Expected: /docs/manage-data/data-store/data-streams
  - Score: 7.5496
  - Matched: True

Actual: /docs/manage-data/lifecycle/data-stream
  - Score: 7.5838
  - Matched: True

It's close but the top result is not quite what I want.

q=logstash

Expected: /docs/reference/logstash
  - Score: 7.8765
  - Matched: True

Actual: /docs/release-notes/logstash
  - Score: 7.9206
  - Matched: True

Similarly, it's close, but because the Logstash reference is nested deeper than its release notes, it's not in the number 1 spot.

Will continue to follow up with this.
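
For reference, the two failing cases can be expressed as a small check that reports how far the expected page is from the top spot. This is a sketch only: the `Hit` type and `evaluate` helper are hypothetical and are not the test harness in the repository. The URLs and scores are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    url: str
    score: float


def evaluate(expected_url, hits):
    """Return (passed, gap): whether the expected URL ranks first, and the
    score difference between the top hit and the expected hit."""
    if not hits:
        return False, None
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    top = ranked[0]
    expected = next((h for h in ranked if h.url == expected_url), None)
    gap = None if expected is None else round(top.score - expected.score, 4)
    return top.url == expected_url, gap


datastreams_hits = [
    Hit("/docs/manage-data/data-store/data-streams", 7.5496),
    Hit("/docs/manage-data/lifecycle/data-stream", 7.5838),
]
logstash_hits = [
    Hit("/docs/reference/logstash", 7.8765),
    Hit("/docs/release-notes/logstash", 7.9206),
]

print(evaluate("/docs/manage-data/data-store/data-streams", datastreams_hits))
print(evaluate("/docs/reference/logstash", logstash_hits))
```

Both gaps are under 0.05, so a small change to the depth or section boosts could flip either result.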

@Mpdreamz Mpdreamz requested review from a team as code owners December 2, 2025 19:27
@Mpdreamz Mpdreamz added the fix label Dec 2, 2025
@Mpdreamz Mpdreamz self-assigned this Dec 2, 2025
@Mpdreamz Mpdreamz requested a review from cotti December 2, 2025 19:27
{
Query = "plugin client integration", Operator = Operator.Or, Fields = new[] { "search_title", "headings", "url.match" }
}
Query = "plugin client integration glossary", Operator = Operator.Or, Fields = new[] { "search_title", "url.match" }
Member

This is interesting.

Can you explain why this is needed? What does "plugin client integration glossary" mean? Why these terms?

Member Author

This is to ensure that a query for logstash X penalizes Logstash plugin documentation relative to regular Logstash docs, unless plugin is part of the query. The same applies to client and integration.

Glossary gets penalized because, if it's part of the title, the page contains a lot of interesting keywords, and we don't necessarily want that page to always take the top spot.

This list will be configurable; I'm refactoring config/synonyms.yml into config/search.yml as we speak :)
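
A rough sketch of the penalty logic described in this reply, under stated assumptions: the term list comes from the reviewed snippet, but the multiplier, the helper names, and the idea of dropping a term from the penalty set when it appears in the query are illustrative guesses at the mechanism, not the merged implementation. Matching here is exact-token only, so "plugins" does not count as "plugin".

```python
import re

PENALIZED_TERMS = ("plugin", "client", "integration", "glossary")  # from the reviewed snippet
NEGATIVE_BOOST = 0.8  # hypothetical multiplier applied to penalized pages


def active_penalty_terms(query: str) -> list:
    """Penalty terms still in effect: those the user did not type."""
    tokens = set(query.lower().split())
    return [term for term in PENALIZED_TERMS if term not in tokens]


def adjusted_score(score: float, query: str, title: str, url: str) -> float:
    """Demote a page whose title or URL contains an active penalty term."""
    page_tokens = set(re.findall(r"[a-z0-9]+", f"{title} {url}".lower()))
    if any(term in page_tokens for term in active_penalty_terms(query)):
        return score * NEGATIVE_BOOST
    return score


title = "Kafka input plugin"
url = "/docs/reference/logstash/plugins-inputs-kafka"  # hypothetical path

print(adjusted_score(10.0, "logstash kafka", title, url))         # demoted
print(adjusted_score(10.0, "logstash kafka plugin", title, url))  # unchanged
```

Keeping the terms in a tuple mirrors the stated plan to move the list into configuration: only the data would change, not the scoring logic.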

@Mpdreamz Mpdreamz merged commit 7c5ec51 into main Dec 3, 2025
28 checks passed
@Mpdreamz Mpdreamz deleted the fix/search-relevance-continued branch December 3, 2025 09:03