New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Decimal Aggregation Performance #16610
Improve Decimal Aggregation Performance #16610
Conversation
24c3670
to
7daa5c1
Compare
"Improve Decimal sum and average aggregation performance" Mainly my own question
...java/com/facebook/presto/operator/aggregation/state/LongDecimalWithOverflowStateFactory.java
Show resolved
Hide resolved
@@ -69,7 +70,7 @@ | |||
|
|||
private static final MethodHandle COMBINE_FUNCTION = methodHandle(DecimalAverageAggregation.class, "combine", LongDecimalWithOverflowAndLongState.class, LongDecimalWithOverflowAndLongState.class); | |||
|
|||
private static final BigInteger TWO = new BigInteger("2"); | |||
private static final BigInteger OVERFLOW_MULTIPLIER = new BigInteger("2").shiftLeft(UNSCALED_DECIMAL_128_SLICE_LENGTH * 8 - 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a brief comment to clarify what UNSCALED_DECIMAL_128_SLICE_LENGTH * 8 - 2
is and why this is overflow multiplier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love to explain it, but I just extracted a constant that was being recomputed on each invocation to the average
method before this. I believe the explanation is something like: 2 << 126
basically 1 << 127
(the highest bit is the sign bit). Why exactly 2 << 126
? I'm not sure- maybe something to do with how BigInteger("1")
works internally?
...o-main/src/main/java/com/facebook/presto/operator/aggregation/DecimalAverageAggregation.java
Show resolved
Hide resolved
"Improve Decimal Aggregation State Serializer Performance"
...a/com/facebook/presto/operator/aggregation/state/LongDecimalWithOverflowStateSerializer.java
Show resolved
Hide resolved
...java/com/facebook/presto/operator/aggregation/state/LongDecimalWithOverflowStateFactory.java
Outdated
Show resolved
Hide resolved
@shixuan-fan, @yuanzhanhku, any other suggestions? Or it's good to go? |
Improves DecimalAverageAggregation, DecimalSumAggregation, and assocated LongDecimalWithOverflowState classes to avoid repeatedly calling BigArray get / set method pairs wherever possible. Specifically modifies GroupedLongDecimalWithOverflowState to lazily allocate the overflows LongBigArray which reduces memory usage and extra memory indirections when no overflows occur.
Avoids using a method handle bound to the specific input type for DecimalSumAggregation/DecimalAverageAggregation input functions, using a static constant inside of the methods instead since the input functions are scale-invariant (all values are unscaled) and method handles carrying arguments bound to specific instances can't be inlined nearly as well by the JIT.
Avoids creating heap allocated Slices and indirecting through SliceInput/SliceOutput to serialize and deserialize LongDecimalWithOverflowState and LongDecimalWithOverflowAndLongState values since the Slice width is not dynamic but rather fixed to some number of long fields followed by a 128 bit decimal.
7daa5c1
to
cd5d58a
Compare
Extracted changes from trinodb/trino#8878
This set of changes includes improvements to
DecimalSumAggregation
andDecimalAverageAggregation
, as well as to their state serializers. Since no benchmarks existed for these aggregation operations before in PrestoDB, I'll have to link to the Trino benchmarking results instead.The overall changes include:
DecimalType
instances instead of binding specificDecimalType
instances to method handles that only manipulate unscaled decimal values