Introduce config/search.yml and ensure all relevance tests pass. #2302
Renames `config/synonyms.yml` to `config/search.yml` and expands it to be the central place for all search-related configuration.

This consolidates all search configuration into a single `search.yml` file: synonyms, negative boosting terms, and a new query rules system. Having everything in one place makes it easier to reason about and tune search behavior.

The big addition here is query rules support. Sometimes our relevance scoring is almost right but not quite. For example, searching for "logstash" returns the Logstash extensions page instead of the main Logstash reference. Both are equally valid from a scoring perspective, but users expect the reference page first. Rather than constantly tweaking boost values and hoping for the best, we can now explicitly pin specific results for specific queries.
We're also moving the `diminish_terms` config (the terms that get negative-boosted, like "plugin", "client", and "glossary") out of the code and into the config file where it belongs.

Query rules are sent to Elasticsearch (`PUT /_query_rules/docs-ruleset-{env}`) during indexing, then applied at search time via `RuleQuery`.

The rules support `exact`, `fuzzy`, and `prefix` matching on the query string, with `pinned` or `exclude` actions. We're starting conservatively with just two pinned rules, for data-streams and logstash. The goal is to use these sparingly, only when relevance tuning alone can't solve the problem.
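
To make the flow above concrete, here is a minimal Python sketch of how one pinned rule might be turned into an Elasticsearch query ruleset body and then wrapped around an organic query at search time. It is an illustration only, not the code in this PR: the simplified rule shape, the `query_string` metadata key, the `dev` environment name, and the document id are all assumptions, and the `rule` / `ruleset_ids` field names follow recent Elasticsearch versions (older versions used different names).

```python
import json


def build_ruleset(rules):
    """Translate simplified rule entries into a query ruleset request body.

    The input shape (id/match/action/queries/ids) is hypothetical; it is not
    the schema used by config/search.yml.
    """
    body = {"rules": []}
    for rule in rules:
        body["rules"].append({
            "rule_id": rule["id"],
            "type": rule["action"],          # "pinned" or "exclude"
            "criteria": [{
                "type": rule["match"],       # "exact", "fuzzy", or "prefix"
                "metadata": "query_string",  # assumed metadata key
                "values": rule["queries"],
            }],
            "actions": {"ids": rule["ids"]},
        })
    return body


def wrap_in_rule_query(organic, ruleset_id, user_query):
    """Wrap an organic query so matching rules are applied at search time."""
    return {
        "rule": {
            "organic": organic,
            "ruleset_ids": [ruleset_id],
            "match_criteria": {"query_string": user_query},
        }
    }


# Hypothetical rule: pin the main Logstash reference for the query "logstash".
example_rules = [{
    "id": "pin-logstash",
    "match": "exact",
    "action": "pinned",
    "queries": ["logstash"],
    "ids": ["/reference/logstash"],  # hypothetical document id
}]

ruleset_body = build_ruleset(example_rules)
search_query = wrap_in_rule_query(
    organic={"match": {"title": "logstash"}},
    ruleset_id="docs-ruleset-dev",   # "dev" stands in for {env}
    user_query="logstash",
)

if __name__ == "__main__":
    # Body that would accompany PUT /_query_rules/docs-ruleset-dev
    print(json.dumps(ruleset_body, indent=2))
    print(json.dumps(search_query, indent=2))
```

Building the request bodies as plain dictionaries keeps the sketch free of any client library; sending them to a cluster is left out on purpose.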