Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of multiple buckets being emitted for the same parent doc id in nested aggregation #9346

Conversation

martijnvg
Copy link
Member

This bug was introduced by #8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the score of the current root document, so the amount of child doc ids buffered is small.

Closes #9317

if (currentRootDoc == rootDocId) {
final IntArrayList childDocIdBuffer = childDocIdBuffers.get(parentDocId);
if (childDocIdBuffer != null) {
return new ChildDocIterator() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could childDocIdBuffers store a reusable iterator? Would avoid the object creation overhead every time a parent is re-collected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering that it might be even easier to return an IntList instead of an iterator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1I like that idea. I'll update the PR with that change.

@markharwood
Copy link
Contributor

Looks good and tests passing here

@@ -88,6 +96,7 @@ public void setNextReader(AtomicReaderContext reader) {
} else {
childDocs = childDocIdSet.iterator();
}
rootDocs = SearchContext.current().fixedBitSetFilterCache().getFixedBitSetFilter(NonNestedDocsFilter.INSTANCE).getDocIdSet(reader, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe pull the search context from the aggregation context instead? It does not make any difference today but I think it would help if we ever have aggregation unit tests.

@martijnvg
Copy link
Member Author

@markharwood @jpountz I updated the PR and use IntArrayList directly now.

// 1) nested agg wrapped is by terms agg and multiple buckets per document are emitted
// 2) Multiple nested fields are defined. A nested agg joins back to another nested agg via the reverse_nested agg.
// For each child in the first nested agg the second nested agg gets invoked with the same buckets / docids
private IntArrayList getChildIterator(final int parentDocId) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/getChildIterator/getChildren/ ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good, point, will fix this and push it.

@jpountz
Copy link
Contributor

jpountz commented Jan 26, 2015

+1 Thanks @martijnvg

One concern could be that we keep on creating new object in collect() which is called in a tight loop, but correctness over speed, we can take time to think about it one this change is in.

@martijnvg
Copy link
Member Author

@jpountz Thanks, I was thinking about adding a different impl for when we know that that the child docs of root docs are only evaluated once. (similar to how we push to a different impl for singe values values / multi values in the global ordinals terms aggregator)

… the same parent doc id.

This bug was introduced by elastic#8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the scope of the current root document, so the amount of child doc ids buffered is small.

Closes elastic#9317
Closes elastic#9346
@martijnvg martijnvg force-pushed the bugs/nested_agg/support_same_parent_being_handles_multiple_times branch from 0b011c9 to fc64e4f Compare January 26, 2015 16:04
martijnvg added a commit that referenced this pull request Jan 26, 2015
… the same parent doc id.

This bug was introduced by #8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the scope of the current root document, so the amount of child doc ids buffered is small.

Closes #9317
Closes #9346
@martijnvg martijnvg closed this in 7ca2ef9 Jan 26, 2015
martijnvg added a commit that referenced this pull request Jan 26, 2015
… the same parent doc id.

This bug was introduced by #8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the scope of the current root document, so the amount of child doc ids buffered is small.

Closes #9317
Closes #9346
@martijnvg martijnvg removed the review label Jan 26, 2015
@clintongormley clintongormley changed the title Nested aggregator: Fix handling of multiple buckets being emitted for the same parent doc id Aggs: Fix handling of multiple buckets being emitted for the same parent doc id in nested aggregation Feb 11, 2015
@martijnvg martijnvg deleted the bugs/nested_agg/support_same_parent_being_handles_multiple_times branch May 18, 2015 23:26
@clintongormley clintongormley changed the title Aggs: Fix handling of multiple buckets being emitted for the same parent doc id in nested aggregation Fix handling of multiple buckets being emitted for the same parent doc id in nested aggregation Jun 7, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
… the same parent doc id.

This bug was introduced by elastic#8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the scope of the current root document, so the amount of child doc ids buffered is small.

Closes elastic#9317
Closes elastic#9346
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Nested Docs labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v1.4.3 v1.5.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants