You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In Facebook workloads, we observed MergingPageOutput.getOutput() was allocating large amount of memory, in which BlockBuilders keep increasing the arrays after appending some rows. It also creates lots of Slices just to read the data from the source. This can be avoided because the positions in the pages being merged is already known and we don't need to grow memory or create insane amount of Slices on the fly. Instead of appending the values one by one, we can make the BlockBuilders be able to append an existing block internally. This should also benefit the performance.
The text was updated successfully, but these errors were encountered:
Just discussed with @oerling about this. Orri's point is the MergingPageOutput cost from the small pages might well over the benefit of merging them to larger pages. Moreover, if the MergingPageOutput is followed by a repartition(which might be most of the cases), the effect of small pages will be gone anyways. Therefore, a more straightforward and time saving way is to remove MergingPageOutput whenever it's followed by repartition. I will do performance testing afterwards.
In Facebook workloads, we observed MergingPageOutput.getOutput() was allocating large amount of memory, in which BlockBuilders keep increasing the arrays after appending some rows. It also creates lots of Slices just to read the data from the source. This can be avoided because the positions in the pages being merged is already known and we don't need to grow memory or create insane amount of Slices on the fly. Instead of appending the values one by one, we can make the BlockBuilders be able to append an existing block internally. This should also benefit the performance.
The text was updated successfully, but these errors were encountered: