Support multiple rescores #4749

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
3 participants
@nik9000
Contributor

nik9000 commented Jan 16, 2014

Support multiple rescores

Detects if rescores arrive as an array instead of a plain object. If so
then parse each element of the array as a separate rescore to be executed
one after another. It looks like this:

   "rescore" : [ {
      "window_size" : 100,
      "query" : {
         "rescore_query" : {
            "match" : {
               "field1" : {
                  "query" : "the quick brown",
                  "type" : "phrase",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }, {
      "window_size" : 10,
      "query" : {
         "score_mode": "multiply",
         "rescore_query" : {
            "function_score" : {
               "script_score": {
                  "script": "log10(doc['numeric'].value + 2)"
               }
            }
         }
      }
   } ]

Rescores as a single object are still supported.

Also add documentation on score_mode when adding documentation about multiple
rescores.

Closes #4748
Closes #4742

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 16, 2014

Contributor

This isn't quite ready but it is worth reviewing I think. TODO:

  1. Test the explanation of multiple rescores.
  2. Documentation.
  3. Rest testing?
Contributor

nik9000 commented Jan 16, 2014

This isn't quite ready but it is worth reviewing I think. TODO:

  1. Test the explanation of multiple rescores.
  2. Documentation.
  3. Rest testing?
@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 16, 2014

Contributor

Added test for explanation. It caught that I was building the explanations backwards so the first rescore looked like it processed output from the second when in fact the opposite is true.

Contributor

nik9000 commented Jan 16, 2014

Added test for explanation. It caught that I was building the explanations backwards so the first rescore looked like it processed output from the second when in fact the opposite is true.

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 16, 2014

Contributor

Wait! I had it backwards! The code was right and the test was wrong. Fixed.

Contributor

nik9000 commented Jan 16, 2014

Wait! I had it backwards! The code was right and the test was wrong. Fixed.

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 16, 2014

Contributor

Added documentation and while I was in there documentation for score_mode.

Contributor

nik9000 commented Jan 16, 2014

Added documentation and while I was in there documentation for score_mode.

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 16, 2014

Contributor

I don't think I have anything to do for the rest testing/rest spec because it just describes the body of the search request as "The search definition using the Query DSL" which I think the asciidoc covers.

Contributor

nik9000 commented Jan 16, 2014

I don't think I have anything to do for the rest testing/rest spec because it just describes the body of the search request as "The search definition using the Query DSL" which I think the asciidoc covers.

@jpountz

View changes

docs/reference/search/request/rescore.asciidoc
-linearly to produce the final `_score` for each document. The relative
-importance of the original query and of the rescore query can be
-controlled with the `query_weight` and `rescore_query_weight`
+By defaulthe scores from the original query and the rescore query are

This comment has been minimized.

@jpountz

jpountz Jan 16, 2014

Contributor

space missing between 'default' and 'the'

@jpountz

jpountz Jan 16, 2014

Contributor

space missing between 'default' and 'the'

@jpountz

View changes

docs/reference/search/request/rescore.asciidoc
-linearly to produce the final `_score` for each document. The relative
-importance of the original query and of the rescore query can be
-controlled with the `query_weight` and `rescore_query_weight`
+By defaul the scores from the original query and the rescore query are

This comment has been minimized.

@jpountz

jpountz Jan 16, 2014

Contributor

"default" :)

@jpountz

jpountz Jan 16, 2014

Contributor

"default" :)

This comment has been minimized.

@nik9000

nik9000 Jan 16, 2014

Contributor

I can't type. Fixed....

@nik9000

nik9000 Jan 16, 2014

Contributor

I can't type. Fixed....

@jpountz

View changes

src/main/java/org/elasticsearch/action/search/SearchRequestBuilder.java
+ public SearchRequestBuilder clearRescorers() {
+ sourceBuilder().clearRescore();
+ return this;
+ }

This comment has been minimized.

@jpountz

jpountz Jan 16, 2014

Contributor

Should it be setRescorer(Rescorer...) instead of set/add/clear? (just wondering)

@jpountz

jpountz Jan 16, 2014

Contributor

Should it be setRescorer(Rescorer...) instead of set/add/clear? (just wondering)

This comment has been minimized.

@nik9000

nik9000 Jan 16, 2014

Contributor

Hmmmmmm.... I'm not sure setRescorer(Rescorer...) works because you also need to be able to set the window. I tried to come up with something that is compatible with what we have now but I'm not in love with it.

@nik9000

nik9000 Jan 16, 2014

Contributor

Hmmmmmm.... I'm not sure setRescorer(Rescorer...) works because you also need to be able to set the window. I tried to come up with something that is compatible with what we have now but I'm not in love with it.

@jpountz

This comment has been minimized.

Show comment
Hide comment
@jpountz

jpountz Jan 16, 2014

Contributor

This looks good in general. However it seems to me that the second rescorer would be applied to all top docs but not only the first 10? (QueryRescorer.rescore rescores all top docs which was probably ok when there could be a single rescorer but now that there should be several ones, I think the TopDocsFilter should take the window size into account?)

Contributor

jpountz commented Jan 16, 2014

This looks good in general. However it seems to me that the second rescorer would be applied to all top docs but not only the first 10? (QueryRescorer.rescore rescores all top docs which was probably ok when there could be a single rescorer but now that there should be several ones, I think the TopDocsFilter should take the window size into account?)

@s1monw

This comment has been minimized.

Show comment
Hide comment
@s1monw

s1monw Jan 16, 2014

Contributor

cool stuff I like the feature!

Contributor

s1monw commented Jan 16, 2014

cool stuff I like the feature!

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 16, 2014

Contributor

@jpountz I'll have a look at that in a bit. I thought I had a test that checked that if the second window is smaller then the first and the first doesn't pull the match into the window then the second one doesn't see it.

Contributor

nik9000 commented Jan 16, 2014

@jpountz I'll have a look at that in a bit. I thought I had a test that checked that if the second window is smaller then the first and the first doesn't pull the match into the window then the second one doesn't see it.

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 16, 2014

Contributor

@jpountz you were right of course. My test was actually backwards. It was making sure that the second rescore took effect when I wanted the opposite.

I've pushed a fix. I also did some reworking on QueryRescorer#rescore because it was a little twisted. The only real change is that TopDocsFilter now takes a maximum number of docs to filter and I set it to the rescore window. I also set the maximum number of docs returned by the searcher to the rescore window rather than the size of topdocs.

Contributor

nik9000 commented Jan 16, 2014

@jpountz you were right of course. My test was actually backwards. It was making sure that the second rescore took effect when I wanted the opposite.

I've pushed a fix. I also did some reworking on QueryRescorer#rescore because it was a little twisted. The only real change is that TopDocsFilter now takes a maximum number of docs to filter and I set it to the rescore window. I also set the maximum number of docs returned by the searcher to the rescore window rather than the size of topdocs.

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 23, 2014

Contributor

Rebased. Is there anything else I should change in this?

Contributor

nik9000 commented Jan 23, 2014

Rebased. Is there anything else I should change in this?

@ghost ghost assigned jpountz Jan 23, 2014

@jpountz

This comment has been minimized.

Show comment
Hide comment
@jpountz

jpountz Jan 23, 2014

Contributor

This looks very good . My only concern right now is about the client API. Now that it is possible to have several rescorers per request, it feels wrong to me to have the setRescoreWindow method on SearchRequestBuilder. @s1monw what do you think?

Something that would be nice also would be to validate that rescore window sizes are in strictly descending order. Otherwise applying a rescorer that is followed by a rescorer with a greater window size would be useless I think?

Contributor

jpountz commented Jan 23, 2014

This looks very good . My only concern right now is about the client API. Now that it is possible to have several rescorers per request, it feels wrong to me to have the setRescoreWindow method on SearchRequestBuilder. @s1monw what do you think?

Something that would be nice also would be to validate that rescore window sizes are in strictly descending order. Otherwise applying a rescorer that is followed by a rescorer with a greater window size would be useless I think?

@s1monw

This comment has been minimized.

Show comment
Hide comment
@s1monw

s1monw Jan 23, 2014

Contributor

IMO the rescore window should be max(default_window_size, max_rescorer_window where we can setRescoreWindow as a default so we don't need to specify it everywhere? and I agree we should sort the rescorer by windows size!

Contributor

s1monw commented Jan 23, 2014

IMO the rescore window should be max(default_window_size, max_rescorer_window where we can setRescoreWindow as a default so we don't need to specify it everywhere? and I agree we should sort the rescorer by windows size!

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 23, 2014

Contributor

Something that would be nice also would be to validate that rescore window sizes are in strictly descending order. Otherwise applying a rescorer that is followed by a rescorer with a greater window size would be useless I think?

I was thinking it might be nice to have a multiply rescore with a big window after a total rescore with a smaller window. The multiply must come after so you multiply the totalled score. The total has a smaller window because it is more expensive then the multiply.

setRescoreWindow as a default

I'll make this change and see what it looks like.

Contributor

nik9000 commented Jan 23, 2014

Something that would be nice also would be to validate that rescore window sizes are in strictly descending order. Otherwise applying a rescorer that is followed by a rescorer with a greater window size would be useless I think?

I was thinking it might be nice to have a multiply rescore with a big window after a total rescore with a smaller window. The multiply must come after so you multiply the totalled score. The total has a smaller window because it is more expensive then the multiply.

setRescoreWindow as a default

I'll make this change and see what it looks like.

@jpountz

This comment has been minimized.

Show comment
Hide comment
@jpountz

jpountz Jan 23, 2014

Contributor

I was thinking it might be nice to have a multiply rescore with a big window after a total rescore with a smaller window. The multiply must come after so you multiply the totalled score. The total has a smaller window because it is more expensive then the multiply.

Agreed, let's not check the window sizes in order to allow for this kind of usage.

Contributor

jpountz commented Jan 23, 2014

I was thinking it might be nice to have a multiply rescore with a big window after a total rescore with a smaller window. The multiply must come after so you multiply the totalled score. The total has a smaller window because it is more expensive then the multiply.

Agreed, let's not check the window sizes in order to allow for this kind of usage.

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 23, 2014

Contributor

Added another commit to make setRescoreWindow set a default and to use set/add instead of set/next. If you like it I'll squash the changes together. I didn't want to have to do git backflips to get back to the old api in case this one doesn't make sense.

Contributor

nik9000 commented Jan 23, 2014

Added another commit to make setRescoreWindow set a default and to use set/add instead of set/next. If you like it I'll squash the changes together. I didn't want to have to do git backflips to get back to the old api in case this one doesn't make sense.

@jpountz

This comment has been minimized.

Show comment
Hide comment
@jpountz

jpountz Jan 23, 2014

Contributor

Looks good to me. I'm going to merge this PR if there are no objections.

Contributor

jpountz commented Jan 23, 2014

Looks good to me. I'm going to merge this PR if there are no objections.

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 23, 2014

Contributor

Cool. Want me to squash the commits or will you handle it?

Contributor

nik9000 commented Jan 23, 2014

Cool. Want me to squash the commits or will you handle it?

@s1monw

This comment has been minimized.

Show comment
Hide comment
@s1monw

s1monw Jan 23, 2014

Contributor

LGTM - would be awesome if we could have an issue for this as well to mark the versions etc. otherwise +1 to the feature thanks nik!

Contributor

s1monw commented Jan 23, 2014

LGTM - would be awesome if we could have an issue for this as well to mark the versions etc. otherwise +1 to the feature thanks nik!

@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 23, 2014

Contributor

awesome if we could have an issue for this as well to mark the versions

Is #4748 what you need?

Contributor

nik9000 commented Jan 23, 2014

awesome if we could have an issue for this as well to mark the versions

Is #4748 what you need?

@s1monw

This comment has been minimized.

Show comment
Hide comment
@s1monw

s1monw Jan 23, 2014

Contributor

yeah @jpountz made see it too :) sorry for the noise! ;)

Contributor

s1monw commented Jan 23, 2014

yeah @jpountz made see it too :) sorry for the noise! ;)

@jpountz

This comment has been minimized.

Show comment
Hide comment
@jpountz

jpountz Jan 23, 2014

Contributor

Want me to squash the commits or will you handle it?

Actually what would be nice would be to split the change into one commit for documentation of score mode that I'll merge into 1.0,1.x and master and the rest that I'll merge into 1.x and master.

Contributor

jpountz commented Jan 23, 2014

Want me to squash the commits or will you handle it?

Actually what would be nice would be to split the change into one commit for documentation of score mode that I'll merge into 1.0,1.x and master and the rest that I'll merge into 1.x and master.

Support multiple rescores
Detects if rescores arrive as an array instead of a plain object.  If so
then parse each element of the array as a separate rescore to be executed
one after another.  It looks like this:
   "rescore" : [ {
      "window_size" : 100,
      "query" : {
         "rescore_query" : {
            "match" : {
               "field1" : {
                  "query" : "the quick brown",
                  "type" : "phrase",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }, {
      "window_size" : 10,
      "query" : {
         "score_mode": "multiply",
         "rescore_query" : {
            "function_score" : {
               "script_score": {
                  "script": "log10(doc['numeric'].value + 2)"
               }
            }
         }
      }
   } ]

Rescores as a single object are still supported.

Closes #4748
@nik9000

This comment has been minimized.

Show comment
Hide comment
@nik9000

nik9000 Jan 23, 2014

Contributor

Done.

Contributor

nik9000 commented Jan 23, 2014

Done.

@jpountz

This comment has been minimized.

Show comment
Hide comment
@jpountz

jpountz Jan 23, 2014

Contributor

Merged, thanks again Nik!

Contributor

jpountz commented Jan 23, 2014

Merged, thanks again Nik!

@jpountz jpountz closed this Jan 23, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment