New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling of multiple buckets being emitted for the same parent doc id in nested aggregation #9346
Conversation
if (currentRootDoc == rootDocId) { | ||
final IntArrayList childDocIdBuffer = childDocIdBuffers.get(parentDocId); | ||
if (childDocIdBuffer != null) { | ||
return new ChildDocIterator() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could childDocIdBuffers store a reusable iterator? Would avoid the object creation overhead every time a parent is re-collected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering that it might be even easier to return an IntList instead of an iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1I like that idea. I'll update the PR with that change.
Looks good and tests passing here |
@@ -88,6 +96,7 @@ public void setNextReader(AtomicReaderContext reader) { | |||
} else { | |||
childDocs = childDocIdSet.iterator(); | |||
} | |||
rootDocs = SearchContext.current().fixedBitSetFilterCache().getFixedBitSetFilter(NonNestedDocsFilter.INSTANCE).getDocIdSet(reader, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe pull the search context from the aggregation context instead? It does not make any difference today but I think it would help if we ever have aggregation unit tests.
@markharwood @jpountz I updated the PR and use IntArrayList directly now. |
// 1) nested agg wrapped is by terms agg and multiple buckets per document are emitted | ||
// 2) Multiple nested fields are defined. A nested agg joins back to another nested agg via the reverse_nested agg. | ||
// For each child in the first nested agg the second nested agg gets invoked with the same buckets / docids | ||
private IntArrayList getChildIterator(final int parentDocId) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/getChildIterator/getChildren/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good, point, will fix this and push it.
+1 Thanks @martijnvg One concern could be that we keep on creating new object in collect() which is called in a tight loop, but correctness over speed, we can take time to think about it one this change is in. |
@jpountz Thanks, I was thinking about adding a different impl for when we know that that the child docs of root docs are only evaluated once. (similar to how we push to a different impl for singe values values / multi values in the global ordinals terms aggregator) |
… the same parent doc id. This bug was introduced by elastic#8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the scope of the current root document, so the amount of child doc ids buffered is small. Closes elastic#9317 Closes elastic#9346
0b011c9
to
fc64e4f
Compare
… the same parent doc id. This bug was introduced by #8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the scope of the current root document, so the amount of child doc ids buffered is small. Closes #9317 Closes #9346
… the same parent doc id. This bug was introduced by #8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the scope of the current root document, so the amount of child doc ids buffered is small. Closes #9317 Closes #9346
… the same parent doc id. This bug was introduced by elastic#8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the scope of the current root document, so the amount of child doc ids buffered is small. Closes elastic#9317 Closes elastic#9346
This bug was introduced by #8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the score of the current root document, so the amount of child doc ids buffered is small.
Closes #9317