The Max aggregator has an optimization that uses the BKD tree to try to find the max, bypassing an expensive collection of all documents. It does this by checking the largest leaf in the tree to see if the max can be found there. Today this process decodes the packed value for every live doc in the leaf, which is not necessary. We could instead just cache the packed value and decode it once after intersecting.
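To make the "cache, then decode once" idea concrete, here is a minimal self-contained sketch. It does not use the real Lucene or aggregator classes: `PackedMaxSketch`, `encode`, `decode` and `maxOfLiveDocs` are hypothetical names, the leaf is modeled as parallel arrays, and the 8-byte sortable encoding is a stand-in assumed for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.BitSet;

public class PackedMaxSketch {
    // Stand-in sortable encoding (assumption): flip the sign bit and write
    // big-endian, so unsigned byte order matches signed long order.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] packed) {
        return ByteBuffer.wrap(packed).getLong() ^ Long.MIN_VALUE;
    }

    // Returns the max value among live docs in the leaf, or null if none is live.
    // packedValues[i] is the packed value of docIds[i].
    static Long maxOfLiveDocs(int[] docIds, byte[][] packedValues, BitSet liveDocs) {
        byte[] best = null;
        for (int i = 0; i < docIds.length; i++) {
            if (!liveDocs.get(docIds[i])) {
                continue; // deleted doc, ignore
            }
            // Compare the raw bytes; no decoding inside the loop.
            // If the caller reuses the byte[] between visits, copy it here instead.
            if (best == null || Arrays.compareUnsigned(packedValues[i], best) > 0) {
                best = packedValues[i];
            }
        }
        // Decode exactly once, after the whole leaf has been visited.
        return best == null ? null : decode(best);
    }
}
```

The point of the sketch is only that the comparison happens on packed bytes and the decode happens once per leaf rather than once per live doc.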
The Min aggregator works a little differently. Since values are sorted ascending in the BKD tree, we can start at the beginning and iterate until we find a live doc (i.e. a non-deleted document), then exit and use that value.
This is potentially problematic if many or most of the documents are deleted, since we could spend a long time traversing the BKD tree. It might be faster to actually collect the documents normally, since that skips deleted documents. We should probably include some kind of heuristic and revert to the non-BKD approach if we can't find the value within a bounded number of lookups (max 1024?).
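A rough sketch of what such a heuristic could look like, assuming a simplified model where the tree order is just a pair of parallel arrays sorted by value. `BoundedMinSketch` and its methods are hypothetical names, not existing aggregator code, and 1024 is only the number floated above, not a tuned value.

```java
import java.util.BitSet;
import java.util.OptionalLong;

public class BoundedMinSketch {
    // Lookup budget suggested in this issue; an assumption, not a measured value.
    static final int MAX_LOOKUPS = 1024;

    // Fast path: walk values in ascending order and stop at the first live doc.
    // Gives up (returns empty) once maxLookups entries have been inspected.
    static OptionalLong minViaSortedScan(int[] docIds, long[] sortedValues,
                                         BitSet liveDocs, int maxLookups) {
        int limit = Math.min(docIds.length, maxLookups);
        for (int i = 0; i < limit; i++) {
            if (liveDocs.get(docIds[i])) {
                return OptionalLong.of(sortedValues[i]);
            }
        }
        return OptionalLong.empty();
    }

    // Slow path: stand-in for normal collection, which only sees live docs.
    static OptionalLong minViaCollection(int[] docIds, long[] values, BitSet liveDocs) {
        OptionalLong min = OptionalLong.empty();
        for (int i = 0; i < docIds.length; i++) {
            if (liveDocs.get(docIds[i])
                    && (!min.isPresent() || values[i] < min.getAsLong())) {
                min = OptionalLong.of(values[i]);
            }
        }
        return min;
    }

    // Try the bounded scan first, then fall back to the non-BKD approach.
    static OptionalLong min(int[] docIds, long[] sortedValues,
                            BitSet liveDocs, int maxLookups) {
        OptionalLong fast = minViaSortedScan(docIds, sortedValues, liveDocs, maxLookups);
        return fast.isPresent() ? fast : minViaCollection(docIds, sortedValues, liveDocs);
    }
}
```

With mostly deleted documents the scan burns its budget and returns empty, so the caller pays at most `maxLookups` wasted checks before falling back.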
This would be a good first issue for someone wanting to get into the agg framework, or learn how the BKD tree works, or both :)