Improve Decimal Aggregation Performance #16610

Merged
merged 3 commits on Aug 30, 2021

@pettyjamesm pettyjamesm commented Aug 13, 2021

Extracted changes from trinodb/trino#8878

This set of changes includes improvements to DecimalSumAggregation and DecimalAverageAggregation, as well as to their state serializers. Since no benchmarks existed for these aggregation operations before in PrestoDB, I'll have to link to the Trino benchmarking results instead.

The overall changes include:

  • Avoiding repeated access through pairs of BigArray get() / set() calls
  • Using constant DecimalType instances instead of binding specific DecimalType instances to method handles that only manipulate unscaled decimal values
  • Leveraging the fixed-size nature of intermediate aggregation states to avoid intermediate serialization / deserialization overhead
  • Lazily allocating the big array for overflow tracking in grouped accumulations, since overflows are rare and the per-group overhead can be significant

== RELEASE NOTES ==

General Changes
* Improve performance of sum and average aggregations on decimals

@pettyjamesm pettyjamesm force-pushed the improve-decimal-aggregations branch 3 times, most recently from 24c3670 to 7daa5c1 on Aug 16, 2021
@pettyjamesm pettyjamesm changed the title from "[WIP] Improve Decimal Aggregation Performance" to "Improve Decimal Aggregation Performance" Aug 16, 2021
@pettyjamesm pettyjamesm requested a review from highker Aug 18, 2021

@shixuan-fan shixuan-fan left a comment

"Improve Decimal sum and average aggregation performance" Mainly my own question

@@ -69,7 +70,7 @@

private static final MethodHandle COMBINE_FUNCTION = methodHandle(DecimalAverageAggregation.class, "combine", LongDecimalWithOverflowAndLongState.class, LongDecimalWithOverflowAndLongState.class);

private static final BigInteger TWO = new BigInteger("2");
private static final BigInteger OVERFLOW_MULTIPLIER = new BigInteger("2").shiftLeft(UNSCALED_DECIMAL_128_SLICE_LENGTH * 8 - 2);

@shixuan-fan shixuan-fan Aug 27, 2021

Can we have a brief comment to clarify what UNSCALED_DECIMAL_128_SLICE_LENGTH * 8 - 2 is and why this is the overflow multiplier?


@pettyjamesm pettyjamesm Aug 27, 2021

I would love to explain it, but I just extracted a constant that was previously recomputed on each invocation of the average method. I believe the explanation is something like: 2 << 126 is basically 1 << 127 (the highest bit is the sign bit). Why exactly 2 << 126? I'm not sure; maybe something to do with how BigInteger("1") works internally?
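
For readers following this thread, the arithmetic can be checked with a small sketch. It assumes UNSCALED_DECIMAL_128_SLICE_LENGTH is 16 bytes (two longs); that value is an assumption, since the constant's definition is not shown in this diff. Under that assumption the shift amount is 16 * 8 - 2 = 126, and 2 << 126 equals 1 << 127, i.e. 2^127. The `withOverflow` helper is hypothetical and only illustrates how such a multiplier could fold an overflow counter back into a sum.

```java
import java.math.BigInteger;

final class OverflowMultiplierSketch
{
    // Assumption: a 128-bit unscaled decimal is stored in 16 bytes (two longs).
    static final int UNSCALED_DECIMAL_128_SLICE_LENGTH = 16;

    // Same expression as in the diff: 2 << (16 * 8 - 2) = 2 << 126 = 2^127.
    static final BigInteger OVERFLOW_MULTIPLIER =
            new BigInteger("2").shiftLeft(UNSCALED_DECIMAL_128_SLICE_LENGTH * 8 - 2);

    private OverflowMultiplierSketch() {}

    // Hypothetical helper: fold an overflow counter back into the stored part of a sum.
    static BigInteger withOverflow(BigInteger storedPart, long overflow)
    {
        return storedPart.add(OVERFLOW_MULTIPLIER.multiply(BigInteger.valueOf(overflow)));
    }
}
```

Writing the constant as `BigInteger.ONE.shiftLeft(127)` would produce the same value, so the choice between the two forms appears to be purely stylistic.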


@shixuan-fan shixuan-fan left a comment

"Use unbound method references for input functions" LGTM


@shixuan-fan shixuan-fan left a comment

"Improve Decimal Aggregation State Serializer Performance"


@yuanzhanhku yuanzhanhku left a comment

Looks good to me

@highker highker commented Aug 30, 2021

@shixuan-fan, @yuanzhanhku, any other suggestions? Or is it good to go?

pettyjamesm added 3 commits Aug 30, 2021
Improves DecimalAverageAggregation, DecimalSumAggregation, and
associated LongDecimalWithOverflowState classes to avoid repeatedly
calling BigArray get / set method pairs wherever possible.
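
As a rough illustration of why a get / set pair costs more than a single in-place update, here is a minimal sketch of a segmented array. `SegmentedLongArray` is a hypothetical stand-in, not the actual BigArray implementation; the segment size and method names are assumptions made for the example.

```java
final class SegmentedLongArray
{
    private static final int SEGMENT_SHIFT = 10;
    private static final int SEGMENT_SIZE = 1 << SEGMENT_SHIFT;
    private static final int SEGMENT_MASK = SEGMENT_SIZE - 1;

    private final long[][] segments;

    SegmentedLongArray(int capacity)
    {
        int segmentCount = Math.max(1, (capacity + SEGMENT_SIZE - 1) >>> SEGMENT_SHIFT);
        segments = new long[segmentCount][SEGMENT_SIZE];
    }

    // Each call resolves the segment and the offset within it.
    long get(long index)
    {
        return segments[(int) (index >>> SEGMENT_SHIFT)][(int) (index & SEGMENT_MASK)];
    }

    void set(long index, long value)
    {
        segments[(int) (index >>> SEGMENT_SHIFT)][(int) (index & SEGMENT_MASK)] = value;
    }

    // Read-modify-write with a single segment/offset resolution,
    // instead of set(index, get(index) + delta), which resolves it twice.
    void add(long index, long delta)
    {
        segments[(int) (index >>> SEGMENT_SHIFT)][(int) (index & SEGMENT_MASK)] += delta;
    }
}
```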

Specifically modifies GroupedLongDecimalWithOverflowState to lazily
allocate the overflows LongBigArray which reduces memory usage and
extra memory indirections when no overflows occur.
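
The lazy allocation described above might look roughly like the following. This is a simplified sketch under stated assumptions: `LazyOverflowState` is a hypothetical class, a plain `long[]` stands in for LongBigArray, and the group count is fixed up front.

```java
final class LazyOverflowState
{
    private final int groupCount;
    // Stays null until some group records a non-zero overflow.
    private long[] overflows;

    LazyOverflowState(int groupCount)
    {
        this.groupCount = groupCount;
    }

    long getOverflow(int groupId)
    {
        // No array means no group has overflowed yet.
        return overflows == null ? 0 : overflows[groupId];
    }

    void addOverflow(int groupId, long overflow)
    {
        if (overflow == 0) {
            // Common case: nothing to record, so nothing is allocated.
            return;
        }
        if (overflows == null) {
            overflows = new long[groupCount];
        }
        overflows[groupId] += overflow;
    }

    boolean isOverflowAllocated()
    {
        return overflows != null;
    }
}
```

The trade-off is a null check on every access in exchange for skipping the per-group array entirely when no overflow ever occurs.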

Avoids using a method handle bound to the specific input type for
DecimalSumAggregation/DecimalAverageAggregation input functions,
using a static constant inside of the methods instead since the
input functions are scale-invariant (all values are unscaled) and
method handles carrying arguments bound to specific instances can't
be inlined nearly as well by the JIT.
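
To make the difference concrete, here is a small sketch contrasting a handle with a bound leading argument against one that relies on a static constant. All names are hypothetical: `UnscaledReader` stands in for DecimalType and a `long[]` stands in for a block of unscaled values. The sketch only shows the two shapes; it does not measure JIT behavior.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class InputFunctionSketch
{
    // Hypothetical stand-in for DecimalType; it only reads unscaled values.
    static final class UnscaledReader
    {
        long read(long[] block, int position)
        {
            return block[position];
        }
    }

    // One shared constant is enough because reading unscaled values ignores the scale.
    private static final UnscaledReader READER = new UnscaledReader();

    private InputFunctionSketch() {}

    // Before: the reader is a leading parameter that gets bound into the handle.
    static long inputWithBoundType(UnscaledReader reader, long sum, long[] block, int position)
    {
        return sum + reader.read(block, position);
    }

    // After: no bound argument; the method uses the static constant directly.
    static long input(long sum, long[] block, int position)
    {
        return sum + READER.read(block, position);
    }

    static MethodHandle boundHandle(UnscaledReader reader)
    {
        MethodType type = MethodType.methodType(
                long.class, UnscaledReader.class, long.class, long[].class, int.class);
        return find("inputWithBoundType", type).bindTo(reader);
    }

    static MethodHandle unboundHandle()
    {
        return find("input", MethodType.methodType(long.class, long.class, long[].class, int.class));
    }

    private static MethodHandle find(String name, MethodType type)
    {
        try {
            return MethodHandles.lookup().findStatic(InputFunctionSketch.class, name, type);
        }
        catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Both handles expose the same (long, long[], int) -> long shape to callers.
    static long invoke(MethodHandle handle, long sum, long[] block, int position)
    {
        try {
            return (long) handle.invokeExact(sum, block, position);
        }
        catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```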

Avoids creating heap allocated Slices and indirecting through
SliceInput/SliceOutput to serialize and deserialize
LongDecimalWithOverflowState and LongDecimalWithOverflowAndLongState
values since the Slice width is not dynamic but rather fixed to
some number of long fields followed by a 128 bit decimal.
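
A minimal sketch of the fixed-width idea, using `java.nio.ByteBuffer` in place of Slice. The layout (one long overflow counter followed by a 128-bit decimal stored as two longs) and the class name `FixedWidthStateCodec` are assumptions for illustration; the actual serializers may order or size the fields differently.

```java
import java.nio.ByteBuffer;

final class FixedWidthStateCodec
{
    // Assumed layout: overflow (8 bytes), decimal low word (8 bytes), decimal high word (8 bytes).
    static final int SERIALIZED_SIZE = 3 * Long.BYTES;

    private FixedWidthStateCodec() {}

    // Writes at fixed offsets with absolute puts: no temporary buffer, no stream wrapper.
    static void write(ByteBuffer out, int offset, long overflow, long low, long high)
    {
        out.putLong(offset, overflow);
        out.putLong(offset + Long.BYTES, low);
        out.putLong(offset + 2 * Long.BYTES, high);
    }

    static long readOverflow(ByteBuffer in, int offset)
    {
        return in.getLong(offset);
    }

    static long readLow(ByteBuffer in, int offset)
    {
        return in.getLong(offset + Long.BYTES);
    }

    static long readHigh(ByteBuffer in, int offset)
    {
        return in.getLong(offset + 2 * Long.BYTES);
    }
}
```

Because every entry has the same size, the n-th state starts at `n * SERIALIZED_SIZE` and each field can be read directly without first copying the entry into an intermediate object.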
@pettyjamesm pettyjamesm force-pushed the improve-decimal-aggregations branch from 7daa5c1 to cd5d58a on Aug 30, 2021
@pettyjamesm pettyjamesm merged commit 6425b6c into prestodb:master Aug 30, 2021
40 checks passed
@pettyjamesm pettyjamesm deleted the improve-decimal-aggregations branch Aug 30, 2021
@aweisberg aweisberg mentioned this pull request Aug 31, 2021
2 tasks