Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant terms can throw error on index with deleted docs. #7960

Closed
wants to merge 2 commits into from

Conversation

Projects
None yet
3 participants
@markharwood
Copy link
Contributor

markharwood commented Oct 2, 2014

If an index contains a highly popular term and many deleted documents it can cause an error as reported in issue #7951

The background count for docs should include deleted docs otherwise a term’s docFreq (which includes deleted docs) can exceed the number of docs reported in the index and cause an exception.

The randomisation that deletes documents is also removed from tests as this doc-accounting change would mean the specific scores being expected in tests would now be subject to random variability and so fail.

Closes #7951

Aggs fix - background count for docs should include deleted docs othe…
…rwise a term’s docFreq (which includes deleted docs) can exceed the number of docs reported in the index and cause an exception.

The randomisation that deletes documents is also removed from tests as this doc-accounting change would mean the specific scores being expected in tests would now be subject to random variability and so fail.

Closes #7951
@@ -71,7 +71,7 @@ public FilterableTermsEnum(IndexReader reader, String field, int docsEnumFlag, @
}
this.docsEnumFlag = docsEnumFlag;
if (filter == null) {
numDocs = reader.numDocs();
numDocs = reader.maxDoc();

This comment has been minimized.

Copy link
@jpountz

jpountz Oct 2, 2014

Contributor

maybe just leave a comment that we get maxDoc on purpose instead of numDocs, so that someone doesn't rollback that change in the future if numDocs feels more logical to him?

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Oct 2, 2014

The change looks good, I think it would be good to have a test too?

@jpountz jpountz removed the review label Oct 2, 2014

@markharwood

This comment has been minimized.

Copy link
Contributor Author

markharwood commented Oct 3, 2014

Committed in f878f40
Removed the labels on this PR and added them to the issue.

@markharwood markharwood closed this Oct 3, 2014

@clintongormley clintongormley changed the title Aggs fix - significant terms can throw error on index with deleted docs. Aggregations: Significant terms can throw error on index with deleted docs. Nov 3, 2014

@clintongormley clintongormley changed the title Aggregations: Significant terms can throw error on index with deleted docs. Significant terms can throw error on index with deleted docs. Jun 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.