using synonym_graph force elastic to double score the document #28982

Open
ahmadazimi opened this issue Mar 11, 2018 · 9 comments
Labels
>enhancement · :Search/Search (Search-related issues that do not fall into other categories) · Team:Search (Meta label for search team)

Comments

@ahmadazimi

ahmadazimi commented Mar 11, 2018

Elasticsearch version: 6.2.2, Build: 10b1edd/2018-02-16T19:01:30.685723Z

Plugins installed: []

JVM version: 1.8.0_144

OS version: Ubuntu, Linux 4.4.0-104-generic

When I use synonym_graph in a search-time analyzer, a synonym made of more than one token (for example, coffee shop) is treated as two separate words, and the score is doubled!

I defined coffee shop as a synonym of cafe. When I search for cafe, all documents that have coffee shop in their titles get higher scores (about 2 times higher) than otherwise identical documents that have cafe in their titles.

I used the Explain API and found these scores returned by Elasticsearch:

For a document with coffee shop in its title, sum of:
59.249336 weight(search:coffee in 9429) [PerFieldSimilarity]
63.80951 weight(search:shop in 9429) [PerFieldSimilarity]

And for another document with cafe in its title:
34.8931 weight(search:cafe in 4409) [PerFieldSimilarity]

Is this a bug in synonym_graph, or did I make a mistake?

PS: all other keywords for these two documents are the same.
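
For anyone trying to reproduce this, the setup described above might look roughly like the following sketch, written as Python dictionaries that could be sent as JSON request bodies. The field name `search` comes from the Explain output; the filter name, analyzer name, mapping type name, and the exact form of the synonym rule are assumptions, since the original settings were not posted.

```python
import json

# Hypothetical index body for Elasticsearch 6.x (mapping types still required).
# "cafe_synonyms", "search_with_synonyms" and the type name "doc" are made-up names.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "cafe_synonyms": {
                    "type": "synonym_graph",
                    # Assumed rule: the issue only says "coffee shop" is a synonym of "cafe".
                    "synonyms": ["cafe, coffee shop"],
                }
            },
            "analyzer": {
                "search_with_synonyms": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "cafe_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "search": {
                    "type": "text",
                    "analyzer": "standard",
                    # Synonyms are applied at search time only.
                    "search_analyzer": "search_with_synonyms",
                }
            }
        }
    },
}

# A plain match query with explain enabled, to inspect per-term weights.
search_body = {"query": {"match": {"search": "cafe"}}, "explain": True}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

With a setup like this, comparing the explanation for a "coffee shop" document against a "cafe" document should show whether the two term weights are being added.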

@ahmadazimi ahmadazimi changed the title from "using synonym_graph force elastic to double score doc!" to "using synonym_graph force elastic to double score the document" on Mar 11, 2018
@cbuescher cbuescher added the :Search/Search Search-related issues that do not fall into other categories label Mar 15, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@ahmadazimi
Author

Any update on this issue?

@ahmadazimi
Author

Yesterday I found another serious problem that seems related to the issue above (synonym_graph).
Imagine you have two documents, Emmy Cafe and Emmy Coffee Wholesale Shop, and you define coffee shop as a synonym of cafe via synonym_graph.
Now when you search for cafe, the second document, which has coffee and shop in its title, gets a score about two times greater than the first document and is always the first result in the result set.
PS: norms is set to false in the mapping for the search field.
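
A toy sketch in Python of why the second title can match at all. It assumes the multi-word variant is expanded into a query that only requires both terms to be present, which is what the separate `coffee` and `shop` weights in the Explain output suggest; this is not Elasticsearch code, and the helper names are made up for illustration.

```python
def tokens(text):
    """Lowercase whitespace tokenization, a stand-in for a real analyzer."""
    return text.lower().split()

def matches_all_terms(title, terms):
    """True if every term occurs somewhere in the title, in any position."""
    title_tokens = set(tokens(title))
    return all(term in title_tokens for term in terms)

def matches_phrase(title, terms):
    """True only if the terms occur adjacent to each other and in order."""
    toks = tokens(title)
    n = len(terms)
    return any(toks[i:i + n] == list(terms) for i in range(len(toks) - n + 1))

synonym_variant = ["coffee", "shop"]

for title in ["Emmy Cafe", "Emmy Coffee Wholesale Shop"]:
    print(title,
          "| all terms:", matches_all_terms(title, synonym_variant),
          "| phrase:", matches_phrase(title, synonym_variant))
```

Under that assumption, "Emmy Coffee Wholesale Shop" satisfies the all-terms check even though "coffee shop" never appears as a phrase, and it then collects a score contribution from each of the two terms.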

@colings86
Contributor

@romseygeek could you take a look at this?

@jimczi
Contributor

jimczi commented Mar 19, 2018

I think it's reasonable to use a max disjunction for multi-term synonyms. Currently the scores of the matching synonyms are simply added, but we should select the max score.
As @ahmadazimi reported in his last comment, this is not enough, since the scoring also depends on the number of terms in each variant. We have something in place for single-term synonyms with the SynonymQuery, but it would be difficult to generalize the idea to multi-term synonyms.
Changing the query to use a max disjunction is trivial, so we should start with that; this will already improve things. In the meantime we can think of a more general solution that would produce a single score per synonym rule, but that's not low-hanging fruit.
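
To make the difference concrete, here is a small Python sketch of the arithmetic, using the weights from the Explain output quoted at the top of this issue. It only compares summing against taking the max across synonym variants; it is not how Elasticsearch or Lucene implement either, and the function names are hypothetical.

```python
# Term weights copied from the Explain API output in the original report.
COFFEE_WEIGHT = 59.249336
SHOP_WEIGHT = 63.80951
CAFE_WEIGHT = 34.8931

# Score of each synonym variant when it matches a document.
coffee_shop_variant = COFFEE_WEIGHT + SHOP_WEIGHT  # two terms, two weights added
cafe_variant = CAFE_WEIGHT                         # one term, one weight

def sum_disjunction(variant_scores):
    """Behaviour described above: scores of all matching variants are added."""
    return sum(variant_scores)

def max_disjunction(variant_scores):
    """Proposed first step: keep only the best-scoring matching variant."""
    return max(variant_scores)

# A hypothetical document that matches both variants.
both = [coffee_shop_variant, cafe_variant]
print("sum:", sum_disjunction(both))
print("max:", max_disjunction(both))

# Documents that match only one variant score the same under both rules.
print("coffee shop only:", max_disjunction([coffee_shop_variant]))
print("cafe only:", max_disjunction([cafe_variant]))
```

Taking the max stops a document from being rewarded for matching several variants at once, but a document matching only coffee shop still outscores one matching only cafe, because the two-term variant adds two weights. That is the remaining gap described in the second paragraph above.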

@ahmadazimi
Author

So is there any easy way to handle it in the current version (6.2.2)?

@jimczi
Contributor

jimczi commented Mar 19, 2018

No, there is no workaround in the current version. We'll need a patch: first to select the best synonym score per document, which as I said should be trivial to do, and then to work on a solution that produces similar scores for documents that match cafe and documents that match coffee shop. The latter is not something we can do easily, though, so I wouldn't expect a solution anytime soon.

@pierremalletneo9

Hello, will the first step of the solution explained by jimczi be implemented in Elasticsearch in the current 7.x version? Currently, scoring with multi-word synonyms is a bit hard to work with.
Thanks!

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@BenjD90

BenjD90 commented Jun 21, 2022

Hello @jimczi,

Is there any room in your roadmap for this improvement?

Thanks
