odd scoring behaviour / inconsistent scoring #3521

Closed
lmenezes opened this Issue Aug 16, 2013 · 11 comments

@lmenezes
Contributor

lmenezes commented Aug 16, 2013

unfortunately i wasn't able to reproduce this in the usual way, but here is the case:

I have a query such as

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        { "match_all": {} },
        { "constant_score": { "filter": { "terms": { "id": [...] } } } }
      ]
    }
  },
  "fields": "",
  "sort": [
    { "_score": {} },
    { "id": { "order": "desc" } }
  ]
}

to the terms filter, i pass a list of ids (anywhere between 1 and 200k unique ids).

when executing this query multiple times i get different results. investigating a little, i traced it to the score sometimes not being constant (not what i expected at all).

i ran the same query a few times with explain set to true, fetching only the last document, and here is what i got:

_explanation: {
value: 1.264911
description: sum of:
details: [
{
value: 0.94868326
description: ConstantScore(*:*)^3.0, product of:
details: [
{
value: 3
description: boost
}
{
value: 0.31622776
description: queryNorm
}
]
}
{
value: 0.31622776
description: ConstantScore...

and then

_explanation: {
value: 1.4142135
description: sum of:
details: [
{
value: 0.70710677
description: ConstantScore(*:*), product of:
details: [
{
value: 1
description: boost
}
{
value: 0.70710677
description: queryNorm
}
]
}
{
value: 0.70710677
description: ConstantScore...

so, here i would expect this query to ALWAYS have the same score, and also that every document scores exactly the same.
it might even be ok if the score wasn't constant across requests, but not that documents within one request score differently.
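
The numbers in the two explain outputs above can be checked by hand. The sketch below assumes the default Lucene similarity of that era, where queryNorm is 1 / sqrt(sum of squared clause boosts) and each constant-score clause contributes boost * queryNorm. The function names are made up for illustration and are not part of any Elasticsearch or Lucene API.

```python
import math


def query_norm(boosts):
    """Assumed norm: 1 / sqrt(sum of squared clause boosts)."""
    return 1.0 / math.sqrt(sum(b * b for b in boosts))


def bool_score(boosts):
    """Score of a bool query whose clauses are all constant-score clauses."""
    norm = query_norm(boosts)
    return sum(b * norm for b in boosts)


# First explain output: match_all carries a boost of 3, the terms clause 1.
stray_norm = query_norm([3.0, 1.0])
stray_score = bool_score([3.0, 1.0])

# Second explain output: both clauses have the default boost of 1.
default_norm = query_norm([1.0, 1.0])
default_score = bool_score([1.0, 1.0])

print(stray_norm, stray_score)
print(default_norm, default_score)
```

Under that assumption, a boost of 3 on the match_all clause reproduces the first output (queryNorm 0.31622776, score 1.264911) and default boosts reproduce the second (queryNorm 0.70710677, score 1.4142135), so the only thing that differs between the two runs is the boost on `*:*`.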

  • i do know i don't need the "match_all" query, but that's the way i managed to reproduce it on our cluster. without that, the score for the documents would always be 1 and i could not reproduce this behaviour.

** hope that's clear enough... but let me know if you need more info, or even the complete output for the explain (it's pretty big)

@ghost ghost assigned s1monw Aug 16, 2013

@s1monw

Contributor

s1monw commented Aug 16, 2013

hey @lmenezes, I'm afraid this is the expected behavior. there is a lot going on in this boolean query construct that depends on a number of factors. the only guarantee here is that it will be the same score for all docs. Yet the eventual score and the query norm depend on your similarity; with a similarity that does not modify the query norm I guess it'd be 1.0 across the board.
One thing you can do is to wrap the top level query in a constant score; that should give you a score of 1.0
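
A minimal sketch of the wrapping suggested above, written as a Python dict that serializes to the request body. It assumes the `constant_score` query of that Elasticsearch generation accepts a nested `query` (later versions only accept `filter`). The helper name `wrap_in_constant_score` and the ids `[1, 2, 3]` are placeholders, not taken from the original report.

```python
import json


def wrap_in_constant_score(query, boost=1.0):
    """Wrap an arbitrary query so every matching doc gets the same score."""
    return {"constant_score": {"query": query, "boost": boost}}


# The bool query from the report, with placeholder ids in the terms filter.
inner = {
    "bool": {
        "must": [
            {"match_all": {}},
            {"constant_score": {"filter": {"terms": {"id": [1, 2, 3]}}}},
        ]
    }
}

body = {
    "from": 0,
    "size": 10,
    "query": wrap_in_constant_score(inner),
    "sort": [{"_score": {}}, {"id": {"order": "desc"}}],
}

print(json.dumps(body, indent=2))
```

With the outer wrapper in place, the scores of the inner clauses no longer reach the final score, so the sort should fall through to `id` for every document.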

@s1monw s1monw closed this Aug 16, 2013

@lmenezes

Contributor Author

lmenezes commented Aug 16, 2013

@s1monw that's the thing... i don't get a constant score for all documents. i do understand (and don't mind, since i'm not interested in the score for this case) that running the same query multiple times might result in different scores. but within a single execution the documents should all score the same, right? if so, then there is something wrong here.

@s1monw

Contributor

s1monw commented Aug 16, 2013

I just wrote a test for this and I actually get back scores that are consistent for all docs. I might not understand your problem here. you run this query and two docs get different scores?

@lmenezes

Contributor Author

lmenezes commented Aug 16, 2013

i tried pretty hard myself to write an example that worked (got 2 documents with different scores) here and wasn't able to.
anyway, here is the response from a single query with the behavior i'm trying to describe: https://gist.github.com/lmenezes/6249787

i could not reproduce that in my staging environment, only on live. the difference between staging and live at the moment is that staging is not getting updates and has no replicas (in case this info helps).

would setting explain -> true and getting all the results help? it's a pretty big response...

@s1monw

Contributor

s1monw commented Aug 16, 2013

@lmenezes any idea where the boost comes from that is shown in your response?

@lmenezes

Contributor Author

lmenezes commented Aug 16, 2013

you mean this: ConstantScore(*:*)^3 right? if so, no idea. actually i'm executing the query from a file, so i know it's always the same.

@clintongormley

Member

clintongormley commented Aug 16, 2013

@lmenezes check your email

s1monw added a commit to s1monw/elasticsearch that referenced this issue Aug 16, 2013

Removed static versions of MatchAllDocsQuery
If a static cached version of MatchAllDocsQuery is used through, for
instance, the `query_string` together with a boost like `*:*^2.0`, the
globally used version is modified, since queries are not immutable and its
boost variable can change at any time. Holding on to queries that are modifiable
is risky and should not be done in a global scope.
This commit also adds tests for constant scores from `constant_score` query.

Closes elastic#3521

@s1monw s1monw reopened this Aug 16, 2013

@s1monw

Contributor

s1monw commented Aug 16, 2013

this is more sneaky than I thought... we are modifying a cached version of match all docs in the query parser... a fix is attached
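
The failure mode described here can be illustrated with a small sketch. It is not the Elasticsearch parser code: `MatchAllQuery`, `parse_shared` and `parse_fresh` are hypothetical names that only model a mutable query object held in a global, compared with creating a new instance per parse.

```python
class MatchAllQuery:
    """Hypothetical stand-in for a mutable query object that carries a boost."""

    def __init__(self):
        self.boost = 1.0


# Buggy pattern: one instance shared by every request in the process.
SHARED_MATCH_ALL = MatchAllQuery()


def parse_shared(boost=None):
    """Return the shared instance, applying the boost to it in place."""
    query = SHARED_MATCH_ALL
    if boost is not None:
        query.boost = boost  # mutates the global, so later requests see it
    return query


def parse_fresh(boost=None):
    """Return a new instance per call, so a boost stays local to the request."""
    query = MatchAllQuery()
    if boost is not None:
        query.boost = boost
    return query


# One request asks for a boosted match_all ...
parse_shared(boost=3.0)
# ... and an unrelated later request inherits that boost.
leaked_boost = parse_shared().boost

parse_fresh(boost=3.0)
clean_boost = parse_fresh().boost

print(leaked_boost, clean_boost)
```

If the shared instance lives once per node, shards on different nodes could plausibly hold different leftover boosts at the same time, which would be consistent with documents in one response getting different scores.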

@lmenezes

Contributor Author

lmenezes commented Aug 16, 2013

looks good. for now we've worked around it by using a default boost != 1.0 for the match_all queries, and we'll remove that when we update to the next version.

@s1monw s1monw closed this in b11f81d Aug 16, 2013

@s1monw

Contributor

s1monw commented Aug 16, 2013

FYI - this can be triggered from a user query via query_string, i.e. (*:*)^3.0 will leave a boost behind on the shards it hits.

s1monw added a commit that referenced this issue Aug 16, 2013

@lmenezes

Contributor Author

lmenezes commented Aug 17, 2013

great :)

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
