Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve explanation in rescore #30629

Merged

Conversation

mayya-sharipova
Copy link
Contributor

@mayya-sharipova mayya-sharipova commented May 15, 2018

Currently in a rescore request if window_size is smaller than
the top N documents returned (N=size), explanation of scores could be incorrect
for documents that were a part of topN and not part of rescoring.
This PR corrects this by saving in RescoreContext docIDs of documents
for which rescoring was applied, and adding rescoring explanation
only for these docIDs.

Closes #28725

Currently in a rescore request if window_size is smaller than
the top N documents returned (N=size), explanation of scores could be incorrect
for documents that were a part of topN and not part of rescoring.
This PR corrects this, but saving in RescoreContext docIDs of documents
for which rescoring was applied, and adding rescoring explanation
only for these docIDs.

Closes elastic#28725
@mayya-sharipova
Copy link
Contributor Author

@elasticmachine run gradle build tests

@colings86 colings86 added >enhancement :Search/Search Search-related issues that do not fall into other categories labels May 16, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mayya-sharipova
I left a small comment regarding the creation of the rescore explanation.

@@ -92,7 +97,7 @@ public Explanation explain(int topLevelDocId, IndexSearcher searcher, RescoreCon

// NOTE: we don't use Lucene's Rescorer.explain because we want to insert our own description with which ScoreMode was used. Maybe
// we should add QueryRescorer.explainCombine to Lucene?
if (rescoreExplain != null && rescoreExplain.isMatch()) {
if (rescoreContext.isRescored(topLevelDocId) && rescoreExplain != null && rescoreExplain.isMatch()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always build the rescore explanation even when rescoreContext.isRescored(topLevelDocId) is false. We could avoid this by building the primary explanation first and return it directly if rescoreContext.isRescored(topLevelDocId) is false ?

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jimczi jimczi added the v6.4.0 label May 16, 2018
@mayya-sharipova
Copy link
Contributor Author

@elasticmachine run gradle build tests

@@ -27,6 +29,7 @@
public class RescoreContext {
private final int windowSize;
private final Rescorer rescorer;
private Set<Integer> recroredDocs; //doc Ids for which rescoring was applied
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/recrored/rescored/

@mayya-sharipova mayya-sharipova merged commit 3dfa93e into elastic:master May 17, 2018
mayya-sharipova added a commit that referenced this pull request May 17, 2018
Currently in a rescore request if window_size is smaller than
the top N documents returned (N=size), explanation of scores could be incorrect
for documents that were a part of topN and not part of rescoring.
This PR corrects this by saving in RescoreContext docIDs of documents
for which rescoring was applied, and adding rescoring explanation
only for these docIDs.

Closes #28725
mayya-sharipova added a commit that referenced this pull request May 17, 2018
Currently in a rescore request if window_size is smaller than
the top N documents returned (N=size), explanation of scores could be incorrect
for documents that were a part of topN and not part of rescoring.
This PR corrects this by saving in RescoreContext docIDs of documents
for which rescoring was applied, and adding rescoring explanation
only for these docIDs.

Closes #28725
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 17, 2018
…ngs-to-true

* elastic/master: (25 commits)
  [DOCS] Replace X-Pack terms with attributes
  [ML] Clean left behind model state docs (elastic#30659)
  Correct typos
  filters agg docs duplicated 'bucket' word removal (elastic#30677)
  top_hits doc example description update (elastic#30676)
  [Docs] Replace InetSocketTransportAddress with TransportAdress (elastic#30673)
  [TEST] Account for increase in ML C++ memory usage (elastic#30675)
  User proper write-once semantics for GCS repository (elastic#30438)
  Remove bogus file accidentally added
  Add detailed assert message to IndexAuditUpgradeIT (elastic#30669)
  Adjust fast forward for token expiration test  (elastic#30668)
  Improve explanation in rescore (elastic#30629)
  Deprecate `nGram` and `edgeNGram` names for ngram filters (elastic#30209)
  Watcher: Fix watch history template for dynamic slack attachments (elastic#30172)
  Fix _cluster/state to always return cluster_uuid (elastic#30656)
  [Tests] Add debug information to CorruptedFileIT
  Preserve REST client auth despite 401 response (elastic#30558)
  [test] packaging: add windows boxes (elastic#30402)
  Make xpack modules instead of a meta plugin (elastic#30589)
  Mute ShrinkIndexIT
  ...
@mayya-sharipova mayya-sharipova deleted the improve-explanation-in-rescore branch May 17, 2018 19:45
dnhatn added a commit that referenced this pull request May 19, 2018
* 6.x:
  Mute testCorruptFileThenSnapshotAndRestore
  Plugins: Remove meta plugins (#30670)
  Upgrade to Lucene-7.4.0-snapshot-59f2b7aec2 (#30726)
  Docs: Add uptasticsearch to list of clients (#30738)
  [TEST] Reduce forecast overflow to disk test memory limit (#30727)
  [DOCS] Removes redundant index.asciidoc files (#30707)
  [DOCS] Moves X-Pack configurationg pages in table of contents (#30702)
  [ML][TEST] Fix bucket count assertion in ModelPlotsIT (#30717)
  [ML][TEST] Make AutodetectMemoryLimitIT less fragile (#30716)
  [Build] Add test admin when starting gradle run with trial license and
  [ML] provide tmp storage for forecasting and possibly any ml native jobs #30399
  Tests: Fail if test watches could not be triggered (#30392)
  Watcher: Prevent duplicate watch triggering during upgrade (#30643)
  [ML] add version information in case of crash of native ML process (#30674)
  Add detailed assert message to IndexAuditUpgradeIT (#30669)
  Preserve REST client auth despite 401 response (#30558)
  Make TransportClusterStateAction abide to our style (#30697)
  [DOCS] Fixes edit URLs for stack overview (#30583)
  [DOCS] Add missing callout in IndicesClientDocumentationIT
  Backport get settings API changes to 6.x (#30494)
  Silence sleep based watcher test
  [DOCS] Replace X-Pack terms with attributes
  Improve explanation in rescore (#30629)
  [test] packaging: add windows boxes (#30402)
  [ML] Clean left behind model state docs (#30659)
  filters agg docs duplicated 'bucket' word removal (#30677)
  top_hits doc example description update (#30676)
  MovingFunction Pipeline agg backport to 6.x (#30658)
  [Docs] Replace InetSocketTransportAddress with TransportAdress (#30673)
  [TEST] Account for increase in ML C++ memory usage (#30675)
  User proper write-once semantics for GCS repository (#30438)
  Deprecate `nGram` and `edgeNGram` names for ngram filters (#30209)
  Watcher: Fix watch history template for dynamic slack attachments (#30172)
  Fix _cluster/state to always return cluster_uuid (#30656)
dnhatn added a commit that referenced this pull request May 19, 2018
* master:
  Scripting: Remove getDate methods from ScriptDocValues (#30690)
  Upgrade to Lucene-7.4.0-snapshot-59f2b7aec2 (#30726)
  [Docs] Fix single page :docs:check invocation (#30725)
  Docs: Add uptasticsearch to list of clients (#30738)
  [DOCS] Removes out-dated x-pack/docs/en/index.asciidoc
  [DOCS] Removes redundant index.asciidoc files (#30707)
  [TEST] Reduce forecast overflow to disk test memory limit (#30727)
  Plugins: Remove meta plugins (#30670)
  [DOCS] Moves X-Pack configurationg pages in table of contents (#30702)
  TEST: Add engine log to testCorruptFileThenSnapshotAndRestore
  [ML][TEST] Fix bucket count assertion in ModelPlotsIT (#30717)
  [ML][TEST] Make AutodetectMemoryLimitIT less fragile (#30716)
  Default copy settings to true and deprecate on the REST layer (#30598)
  [Build] Add test admin when starting gradle run with trial license and
  This implementation lazily (on 1st forecast request) checks for available diskspace and creates a subfolder for storing data outside of Lucene indexes, but as part of the ES data paths.
  Tests: Fail if test watches could not be triggered (#30392)
  [ML] add version information in case of crash of native ML process (#30674)
  Make TransportClusterStateAction abide to our style (#30697)
  Change required version for Get Settings transport API changes to 6.4.0 (#30706)
  [DOCS] Fixes edit URLs for stack overview (#30583)
  Silence sleep based watcher test
  [TEST] Adjust version skips for movavg/movfn tests
  [DOCS] Replace X-Pack terms with attributes
  [ML] Clean left behind model state docs (#30659)
  Correct typos
  filters agg docs duplicated 'bucket' word removal (#30677)
  top_hits doc example description update (#30676)
  [Docs] Replace InetSocketTransportAddress with TransportAdress (#30673)
  [TEST] Account for increase in ML C++ memory usage (#30675)
  User proper write-once semantics for GCS repository (#30438)
  Remove bogus file accidentally added
  Add detailed assert message to IndexAuditUpgradeIT (#30669)
  Adjust fast forward for token expiration test  (#30668)
  Improve explanation in rescore (#30629)
  Deprecate `nGram` and `edgeNGram` names for ngram filters (#30209)
  Watcher: Fix watch history template for dynamic slack attachments (#30172)
  Fix _cluster/state to always return cluster_uuid (#30656)
  [Tests] Add debug information to CorruptedFileIT

# Conflicts:
#	test/framework/src/main/java/org/elasticsearch/indices/analysis/AnalysisFactoryTestCase.java
ywelsch pushed a commit to ywelsch/elasticsearch that referenced this pull request May 23, 2018
Currently in a rescore request if window_size is smaller than
the top N documents returned (N=size), explanation of scores could be incorrect
for documents that were a part of topN and not part of rescoring.
This PR corrects this, but saving in RescoreContext docIDs of documents
for which rescoring was applied, and adding rescoring explanation
only for these docIDs.

Closes elastic#28725
@jpountz jpountz added the v6.3.0 label Jun 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v6.3.0 v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants