@cotti commented Nov 17, 2025

Closes #2159

This PR makes the following changes to lexical search:

  • `abstract` and `stripped_body` now use the `synonyms_analyzer`;
  • A `MatchPhrasePrefixQuery` on `title` was added to the lexical retriever so that title searches go through synonyms. It is boosted highly (9.0), but below direct matches (the 10.0 case-insensitive prefix query on `title.keyword`);
  • A custom tokenizer now handles whitespace and most other token-splitting cases, while keeping the existing entries in synonyms.yml working;
  • Document headings are now indexed as a `headings` field;
  • `abstract`, `stripped_body`, and `headings` each have explicit AND and OR variants, and the AND variants get the higher boost.
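A minimal sketch of index settings and mappings along these lines, written as the JSON body of an index-creation request. The names `docs_tokenizer`, `docs_analyzer`, `docs_synonyms`, and the synonyms set `docs-synonyms` are hypothetical, not what the PR ships; the synonyms set would need to exist beforehand (created through the synonyms API). It assumes `synonyms_analyzer` runs at search time, which is where the `synonym_graph` filter is meant to be used, and that the index-time analyzer shares the same tokenizer so both sides split text identically. The split characters are a guess: anything not listed, such as a trailing period, stays attached to its token.

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "docs_tokenizer": {
          "type": "char_group",
          "tokenize_on_chars": ["whitespace", ",", ";", ":", "(", ")", "[", "]"]
        }
      },
      "filter": {
        "docs_synonyms": {
          "type": "synonym_graph",
          "synonyms_set": "docs-synonyms",
          "updateable": true
        }
      },
      "analyzer": {
        "docs_analyzer": {
          "type": "custom",
          "tokenizer": "docs_tokenizer",
          "filter": ["lowercase"]
        },
        "synonyms_analyzer": {
          "type": "custom",
          "tokenizer": "docs_tokenizer",
          "filter": ["lowercase", "docs_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "docs_analyzer",
        "search_analyzer": "synonyms_analyzer",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "abstract": { "type": "text", "analyzer": "docs_analyzer", "search_analyzer": "synonyms_analyzer" },
      "stripped_body": { "type": "text", "analyzer": "docs_analyzer", "search_analyzer": "synonyms_analyzer" },
      "headings": { "type": "text", "analyzer": "docs_analyzer", "search_analyzer": "synonyms_analyzer" }
    }
  }
}
```

Setting `"updateable": true` lets the synonyms change without reindexing, but it also restricts the filter to search analyzers, which is why it only appears in `synonyms_analyzer`.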

@cotti self-assigned this on Nov 17, 2025
@cotti requested a review from a team as a code owner on November 17, 2025 21:17
@cotti requested a review from Mpdreamz on November 17, 2025 21:17
@cotti added the fix label on Nov 17, 2025
@cotti linked an issue that may be closed by this pull request on Nov 17, 2025

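```csharp
var lexicalSearchRetriever =
    // Case-insensitive prefix match on the untokenized title.keyword: highest boost
    ((Query)new PrefixQuery(Infer.Field<DocumentDto>(f => f.Title.Suffix("keyword")), searchQuery) { Boost = 10.0f, CaseInsensitive = true }
    // Phrase-prefix match on the analyzed title, so synonyms apply: just below
    || new MatchPhrasePrefixQuery(Infer.Field<DocumentDto>(f => f.Title), searchQuery) { Boost = 9.0f }
    // (review excerpt: the expression continues beyond these lines)
```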
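For readers more familiar with the query DSL, here is a rough sketch of the search body this retriever would send. It assumes the `||` operator combines the clauses into `bool.should` and that `title` and `abstract` use `synonyms_analyzer` as their search analyzer. The first two clauses mirror the C# excerpt; the AND/OR `match` pair illustrates the variants described in the PR summary, and its boost values (4.0 and 2.0) are hypothetical, as is the `<searchQuery>` placeholder. Only `abstract` is shown; `stripped_body` and `headings` would follow the same pattern.

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "prefix": {
            "title.keyword": { "value": "<searchQuery>", "boost": 10.0, "case_insensitive": true }
          }
        },
        {
          "match_phrase_prefix": {
            "title": { "query": "<searchQuery>", "boost": 9.0 }
          }
        },
        {
          "match": {
            "abstract": { "query": "<searchQuery>", "operator": "and", "boost": 4.0 }
          }
        },
        {
          "match": {
            "abstract": { "query": "<searchQuery>", "operator": "or", "boost": 2.0 }
          }
        }
      ]
    }
  }
}
```

Because every clause sits in `should`, a document that matches both the AND and the OR variant collects both scores, so documents containing all query terms rank above those containing only some of them.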

This is for a follow-up, but jotting my thoughts down here: we probably do not want all of these prefix queries to score so high.

I think we should set up explicit completion fields:

For general-purpose completion, a `search_as_you_type` field: https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/search-as-you-type

And explicit `.prefix` multi-fields on `title`, and maybe on headings, using an edge n-gram tokenizer (or reusing the prefix subfield that the `search_as_you_type` field already provides).
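A minimal mapping sketch of that suggestion, with hypothetical names: `title.prefix` is an edge n-gram multi-field on `title`, and `title_completion` is a separate `search_as_you_type` field that the indexer would have to populate with the title. The gram sizes (2 to 15) are guesses. Setting `search_analyzer` to `standard` on `title.prefix` keeps the query text itself from being split into n-grams.

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "title_prefix_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "title_prefix_analyzer": {
          "type": "custom",
          "tokenizer": "title_prefix_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" },
          "prefix": {
            "type": "text",
            "analyzer": "title_prefix_analyzer",
            "search_analyzer": "standard"
          }
        }
      },
      "title_completion": { "type": "search_as_you_type" }
    }
  }
}
```

With the default shingle size, `search_as_you_type` creates `title_completion._2gram`, `title_completion._3gram`, and `title_completion._index_prefix` subfields, which are typically queried with a `multi_match` of type `bool_prefix` over the root field and its shingle subfields. Either option turns prefix matching into an ordinary indexed-term lookup whose boost can be tuned separately from exact title matches, rather than relying on high-boost `prefix` queries.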

@cotti merged commit c1c32ba into main on Nov 18, 2025
25 checks passed
@cotti deleted the fix/synonym_tweaks branch on November 18, 2025 12:39

Successfully merging this pull request may close this issue: Fix synonyms in search

4 participants