Rescore collapsed documents #28521

Merged
merged 14 commits into from Mar 4, 2018

4 participants
@fred84
Contributor

commented Feb 5, 2018

Add support for rescoring collapsed docs (#27243). Documents are first collapsed and then rescored.

@jimczi please take a look

fred84 added some commits Jan 30, 2018

@elasticmachine

Collaborator

commented Feb 5, 2018

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@jimczi

Member

commented Feb 14, 2018

@elasticmachine ok to test

@jimczi

jimczi approved these changes Feb 14, 2018

Member

left a comment

It looks good to me.
I'll merge if the build passes with the changes, thanks @fred84 !

@fred84

Contributor Author

commented Feb 14, 2018

@jimczi
Member

left a comment

Thanks @fred84, you can remove the failing tests; they are no longer needed. I also left some comments on the integration test: I think it needs to change so that scoring and ordering are consistent.


SearchResponse searchResponse = client().prepareSearch("test")
    .setTypes("type1")
    .setQuery(new MatchQueryBuilder("name", "one"))

@jimczi

jimczi Feb 14, 2018

Member

The score of this query depends on the number of shards, the default similarity, and so on. To get consistent scoring, you can use a function_score query like the following:

QueryBuilder query = functionScoreQuery(
        termQuery("name", "one"),
        ScoreFunctionBuilders.fieldValueFactorFunction("my_static_doc_score")
    ).boostMode(CombineFunction.REPLACE);

... and add the my_static_doc_score field at indexing time.

@fred84

fred84 Feb 27, 2018

Author Contributor

fixed

SearchResponse searchResponse = client().prepareSearch("test")
    .setTypes("type1")
    .setQuery(new MatchQueryBuilder("name", "one"))
    .addRescorer(new QueryRescorerBuilder(new MatchQueryBuilder("name", "two")))

@jimczi

jimczi Feb 14, 2018

Member

You can use the same approach for the rescore query, with another field for instance.

@fred84

fred84 Feb 27, 2018

Author Contributor

fixed
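
As a side note on why static scores make the test easy to assert, here is a small sketch in plain Java. It is not Elasticsearch code. It assumes the query rescorer's default "total" score mode with both weights left at 1, and it assumes each pass returns a static per-document value (as with the function_score suggestion above). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

final class StaticScoreSketch {

    /** Score after a rescore in "total" mode: the weighted sum of both passes. */
    static float rescoredScore(float queryWeight, float firstPass,
                               float rescoreQueryWeight, float secondPass) {
        return queryWeight * firstPass + rescoreQueryWeight * secondPass;
    }

    /** Document indices ordered by descending rescored score, both weights at 1 (assumed). */
    static Integer[] expectedOrder(float[] firstPass, float[] secondPass) {
        Integer[] order = new Integer[firstPass.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Negate the score so that the natural ascending sort yields descending scores.
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> -rescoredScore(1f, firstPass[i], 1f, secondPass[i])));
        return order;
    }

    public static void main(String[] args) {
        float[] first = {3f, 2f, 1f};   // static first-pass scores of docs 0..2
        float[] second = {0f, 5f, 1f};  // static rescore values of docs 0..2
        System.out.println(Arrays.toString(expectedOrder(first, second)));
    }
}
```

Because every input is a fixed field value, the expected order does not depend on the number of shards or on the similarity.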

@fred84

Contributor Author

commented Feb 15, 2018

@jimczi Thanks for reviewing. I'll update the PR next week.

@fred84

Contributor Author

commented Feb 27, 2018

@jimczi PR updated; the integration test now uses static scoring.

@jimczi

Member

commented Feb 28, 2018

@elasticmachine ok to test

@jimczi

jimczi approved these changes Mar 4, 2018

Member

left a comment

Thanks @fred84

@jimczi jimczi merged commit f057fc2 into elastic:master Mar 4, 2018

2 checks passed

CLA: Commit author has signed the CLA
elasticsearch-ci: Build finished.

jimczi added a commit that referenced this pull request Mar 4, 2018

Rescore collapsed documents (#28521)
This change adds the ability to rescore collapsed documents.

@fred84 fred84 deleted the fred84:27243_collapse_with_rescore branch Mar 5, 2018

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Mar 7, 2018

Merge branch 'master' into unknown-or-invalid-settings-updates
* master:
  [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  Decouple XContentType from StreamInput/Output (elastic#28927)
  Remove BytesRef usage from XContentParser and its subclasses (elastic#28792)
  [DOCS] Correct typo in configuration (elastic#28903)
  Fix incorrect datemath example (elastic#28904)
  Add a usage example of the JLH score (elastic#28905)
  Wrap stream passed to createParser in try-with-resources (elastic#28897)
  Rescore collapsed documents (elastic#28521)
  Fix (simple)_query_string to ignore removed terms (elastic#28871)
  [Docs] Fix typo in composite aggregation (elastic#28891)
  Try if tombstone is eligable for pruning before locking on it's key (elastic#28767)

jimczi added a commit that referenced this pull request Mar 8, 2018

Revert "Rescore collapsed documents (#28521)"
This reverts commit f057fc2.
The rescorer does not resort the collapsed values inside the top docs
during rescoring. For this reason the Lucene rescorer is not compatible
with collapsing.
Relates #27243

@jimczi

Member

commented Mar 8, 2018

I had to revert this change since it doesn't work as expected. I forgot that the collapsed values would also need to be re-sorted by the rescorer. We use these values on the coordinating node to collapse the results of each shard, but the rescorer in Lucene cannot access them.

I am really sorry I missed that, but since it would require rewriting the rescorer in Lucene, and the collapsing code exists only in es, I don't think it is worth the effort.
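
The mismatch described in this comment can be illustrated with a toy model in plain Java. This is not the actual Lucene or Elasticsearch code. It assumes the collapsed top docs are held as two parallel arrays, the hits and their collapse values, and that the rescorer only sees the hits. `Hit`, `rescore`, and `demo` are hypothetical names.

```java
import java.util.Arrays;

final class ParallelArraySketch {

    /** A toy hit: a document id and its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Assigns new scores and re-sorts the hits by descending score, ignoring any side arrays. */
    static Hit[] rescore(Hit[] hits, float[] newScores) {
        Hit[] out = new Hit[hits.length];
        for (int i = 0; i < hits.length; i++) {
            out[i] = new Hit(hits[i].doc, newScores[i]);
        }
        Arrays.sort(out, (a, b) -> Float.compare(b.score, a.score));
        return out;
    }

    /** Shows which collapse value ends up next to the top hit after rescoring. */
    static String demo() {
        Hit[] hits = {new Hit(0, 3f), new Hit(1, 2f)};
        String[] collapseValues = {"red", "blue"}; // collapseValues[i] belongs to hits[i]

        Hit[] rescored = rescore(hits, new float[] {1f, 5f});

        // The top hit is now doc 1 (group "blue"), but position 0 of the
        // untouched collapse values array still holds "red".
        return "doc " + rescored[0].doc + " -> " + collapseValues[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the coordinating node would read the wrong group key for the top hit, which is the kind of inconsistency that led to the revert.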

@jimczi jimczi added won't fix and removed v6.3.0 v7.0.0 labels Mar 8, 2018

@rpedela


commented Mar 8, 2018

Doesn't Solr support collapse + rescore (rerank)? The claim that Lucene's rescorer needs a rewrite seems dubious.

@jimczi jimczi removed the won't fix label Mar 8, 2018

@jimczi

Member

commented Mar 8, 2018

I agree that we should be able to rescore collapsed documents, but this is higher-hanging fruit than I thought, which is why I reverted and closed the issue for now (sorry @fred84 ).
The current design of collapsing in es is not compatible with the rescorer and will require some internal refactoring. I've started to work on this refactoring, and when it's ready we'll reevaluate this pr if @fred84 still wants to work on it ;).

@fred84

Contributor Author

commented Mar 8, 2018

@jimczi let me know when I can start on this issue again :)

martijnvg added a commit that referenced this pull request Mar 8, 2018

Merge remote-tracking branch 'es/master' into ccr
* es/master: (48 commits)
  Update bucket-sort-aggregation.asciidoc (#28937)
  [Docs] REST high-level client: Fix code for most basic search request (#28916)
  Improved percolator's random candidate query duel test and fixed bugs that were exposed by this:
  Revert "Rescore collapsed documents (#28521)"
  Build: Fix test logger NPE when no tests are run (#28929)
  [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  Decouple XContentType from StreamInput/Output (#28927)
  Remove BytesRef usage from XContentParser and its subclasses (#28792)
  [DOCS] Correct typo in configuration (#28903)
  Fix incorrect datemath example (#28904)
  Add a usage example of the JLH score (#28905)
  Wrap stream passed to createParser in try-with-resources (#28897)
  Rescore collapsed documents (#28521)
  Fix (simple)_query_string to ignore removed terms (#28871)
  [Docs] Fix typo in composite aggregation (#28891)
  Try if tombstone is eligable for pruning before locking on it's key (#28767)
  Limit analyzed text for highlighting (improvements) (#28808)
  Missing `timeout` parameter from the REST API spec JSON files (#28328)
  Clarifies how query_string splits textual part (#28798)
  Update outdated java version reference (#28870)
  ...

martijnvg added a commit that referenced this pull request Mar 8, 2018

Merge remote-tracking branch 'es/6.x' into ccr-6.x
* es/6.x: (48 commits)
  Update bucket-sort-aggregation.asciidoc (#28937)
  [Docs] REST high-level client: Fix code for most basic search request (#28916)
  Improved percolator's random candidate query duel test and fixed bugs that were exposed by this:
  Revert "Rescore collapsed documents (#28521)"
  Build: Fix test logger NPE when no tests are run (#28929)
  [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  Decouple XContentType from StreamInput/Output (#28927)
  Remove BytesRef usage from XContentParser and its subclasses (#28792)
  Add doc note for -server flag on Windows service
  [DOCS] Correct typo in configuration (#28903)
  Fix incorrect datemath example (#28904)
  Add a usage example of the JLH score (#28905)
  Limit analyzed text for highlighting (improvements) (#28907)
  Wrap stream passed to createParser in try-with-resources (#28897)
  [Docs] Fix typo in composite aggregation (#28891)
  Rescore collapsed documents (#28521)
  Fix (simple)_query_string to ignore removed terms (#28871)
  Missing `timeout` parameter from the REST API spec JSON files (#28328)
  Clarifies how query_string splits textual part (#28798)
  Update outdated java version reference (#28870)
  ...

sebasjm pushed a commit to sebasjm/elasticsearch that referenced this pull request Mar 10, 2018

Rescore collapsed documents (elastic#28521)
This change adds the ability to rescore collapsed documents.

sebasjm pushed a commit to sebasjm/elasticsearch that referenced this pull request Mar 10, 2018

Revert "Rescore collapsed documents (elastic#28521)"
This reverts commit f057fc2.
The rescorer does not resort the collapsed values inside the top docs
during rescoring. For this reason the Lucene rescorer is not compatible
with collapsing.
Relates elastic#27243

@jimczi jimczi added the stalled label Mar 14, 2018

@jimczi

Member

commented Mar 30, 2018

Sorry it took me some time to come back to this. I checked why Solr is able to rescore collapsed documents seamlessly and found that it forces each group to be routed to a single shard. All the documents belonging to a group are therefore on the same shard, so the rescoring is always done on the final head of the group.

In es we don't enforce the routing, so a group can be spread over multiple shards. This complicates the rescoring, since it is always applied at the shard level and, in this case, to the temporary head of each group (within a shard we don't know the final head, because another shard can contain a better document for that group).

For this reason I am reluctant to add this functionality: it might be surprising to see a head in a group that is not the best document of that group in the final response. This can happen if the rescoring gives a document in one shard a score that is better than the score of the best document in the group, which is in another shard. I don't see how we could avoid this unless we force the routing of the groups.
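
The scenario in this comment can be sketched as a small simulation in plain Java. It is a simplified model, not Elasticsearch code. It assumes each shard keeps one head per group by first-pass score, that the rescorer adds a bonus to that head, and that the coordinating node keeps the shard head with the best rescored score. All names and scores are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CollapseRescoreSketch {

    /** A toy document: group key, first-pass score, and the bonus a rescorer would add. */
    static final class Doc {
        final String id;
        final String group;
        final double queryScore;
        final double rescoreBonus;

        Doc(String id, String group, double queryScore, double rescoreBonus) {
            this.id = id;
            this.group = group;
            this.queryScore = queryScore;
            this.rescoreBonus = rescoreBonus;
        }

        double rescored() {
            return queryScore + rescoreBonus;
        }
    }

    /** Shard-level collapse: keep the best document per group by first-pass score. */
    static Map<String, Doc> collapse(List<Doc> shardDocs) {
        Map<String, Doc> heads = new HashMap<>();
        for (Doc doc : shardDocs) {
            heads.merge(doc.group, doc, (a, b) -> a.queryScore >= b.queryScore ? a : b);
        }
        return heads;
    }

    /** Coordinating-node merge: per group, keep the shard head with the best rescored score. */
    static Map<String, Doc> mergeRescoredHeads(List<Map<String, Doc>> perShardHeads) {
        Map<String, Doc> merged = new HashMap<>();
        for (Map<String, Doc> heads : perShardHeads) {
            for (Doc head : heads.values()) {
                merged.merge(head.group, head, (a, b) -> a.rescored() >= b.rescored() ? a : b);
            }
        }
        return merged;
    }

    // a1 is the best document of group "g" on the first pass; b1 gets a large rescore bonus.
    static final Doc A1 = new Doc("a1", "g", 5.0, 1.0);
    static final Doc B1 = new Doc("b1", "g", 3.0, 9.0);

    /** Group routed to one shard: the head is chosen before rescoring, so it stays a1. */
    static String headWhenRouted() {
        Map<String, Doc> heads = collapse(Arrays.asList(A1, B1));
        return mergeRescoredHeads(Collections.singletonList(heads)).get("g").id;
    }

    /** Group spread over two shards: each temporary head is rescored, and b1 wins the merge. */
    static String headWhenSpread() {
        Map<String, Doc> shardA = collapse(Collections.singletonList(A1));
        Map<String, Doc> shardB = collapse(Collections.singletonList(B1));
        return mergeRescoredHeads(Arrays.asList(shardA, shardB)).get("g").id;
    }

    public static void main(String[] args) {
        System.out.println("routed: " + headWhenRouted() + ", spread: " + headWhenSpread());
    }
}
```

With the same two documents, the head of the group depends on how they are distributed across shards, which is the surprising behavior the comment is worried about.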
