Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two's complement representation for decimals #10051

Merged
merged 3 commits into from Dec 23, 2021

Conversation

martint
Copy link
Member

@martint martint commented Nov 23, 2021

No description provided.

@cla-bot cla-bot bot added the cla-signed label Nov 23, 2021
@martint
Copy link
Member Author

martint commented Nov 23, 2021

Some preliminary benchmark results for decimal addition:

before

Benchmark                                             (expression)   Mode  Cnt  Score   Error   Units
BenchmarkDecimalOperators.additionBenchmark                l1 + l2  thrpt   15  1.055 ± 0.002  ops/ms
BenchmarkDecimalOperators.additionBenchmark      l1 + l2 + l3 + l4  thrpt   15  0.429 ± 0.006  ops/ms
BenchmarkDecimalOperators.additionBenchmark      s2 + l3 + l1 + s4  thrpt   15  0.433 ± 0.007  ops/ms
BenchmarkDecimalOperators.additionBenchmark              lz1 + lz2  thrpt   15  1.121 ± 0.008  ops/ms
BenchmarkDecimalOperators.additionBenchmark  lz1 + lz2 + lz3 + lz4  thrpt   15  0.469 ± 0.007  ops/ms
BenchmarkDecimalOperators.additionBenchmark    s2 + lz3 + lz1 + s4  thrpt   15  0.311 ± 0.022  ops/ms


after

Benchmark                                             (expression)   Mode  Cnt  Score   Error   Units
BenchmarkDecimalOperators.additionBenchmark                l1 + l2  thrpt   15  2.270 ± 0.011  ops/ms
BenchmarkDecimalOperators.additionBenchmark      l1 + l2 + l3 + l4  thrpt   15  1.103 ± 0.003  ops/ms
BenchmarkDecimalOperators.additionBenchmark      s2 + l3 + l1 + s4  thrpt   15  1.015 ± 0.014  ops/ms
BenchmarkDecimalOperators.additionBenchmark              lz1 + lz2  thrpt   15  3.599 ± 0.011  ops/ms
BenchmarkDecimalOperators.additionBenchmark  lz1 + lz2 + lz3 + lz4  thrpt   15  2.571 ± 0.216  ops/ms
BenchmarkDecimalOperators.additionBenchmark    s2 + lz3 + lz1 + s4  thrpt   15  1.068 ± 0.017  ops/ms

@findepi
Copy link
Member

findepi commented Nov 24, 2021

If this changes long decimal representation, how do we address connectors compatibility?

@martint
Copy link
Member Author

martint commented Nov 25, 2021

If this changes long decimal representation, how do we address connectors compatibility?

I don't think there's a good way around that. We'll probably have to make this a backward compatibility breaking change. It's unfortunate that we chose the current representation when we first introduced decimals, and that we didn't have the facilities to support arbitrary object types other than Slice back then (which seems to introduce non-trivial performance issues).

On the positive side, since we're changing the stack type, any connector that doesn't support the new representation will just fail loudly instead of producing incorrect results.

There may be ways to do it in a backward compatible way, though -- I haven't spent too much time thinking about it yet, though. If you have any thoughts on how we might do it, please comment.

@kokosing
Copy link
Member

How about exposing method like io.trino.spi.connector.Connector#isUsingLegacyDecimalRepresentation, if so during planning we could add a projection that would call a scalar function that would translate decimal between representations. That way we could migrate one connector after the other.

@martint
Copy link
Member Author

martint commented Nov 25, 2021

Not sure how that would work. The type can only support one “stack” and block type

@findepi
Copy link
Member

findepi commented Nov 26, 2021

@kokosing actually i am in favor of backwards incompatible approach.
desire to maintain backwards compatibility is what was preventing us from fixing timestamp semantics for so long.
We were able to fix (most of) #37 not because we found smarter ways to achieve the goals, but rather: we redefined the goals.
OTOH, i would be surprised if "others" were OK with backwards incompatible approach as well. I remember the intense drama related to #9765 which was not a real problem and in a new API implemented by no-one, especially compared to big-bang approach proposed here, for type representation used by nearly every connector.

@kokosing
Copy link
Member

Not sure how that would work. The type can only support one “stack” and block type

I am sorry I haven't read the code yet. I was under impression that "stack" type didn't change but only encoding.

@martint martint force-pushed the decimal-twos-complement branch 15 times, most recently from 4af2e9d to 62e2b15 Compare November 30, 2021 22:40
@martint
Copy link
Member Author

martint commented Nov 30, 2021

Update benchmarks:

*** Before

Benchmark                                                                                                            (expression)  (precision)   Mode  Cnt  Score    Error   Units
BenchmarkDecimalOperators.additionBenchmark                                                                               s1 + s2          N/A  thrpt   15  6.304 ±  0.038  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                     s1 + s2 + s3 + s4          N/A  thrpt   15  1.136 ±  0.004  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                               l1 + l2          N/A  thrpt   15  1.558 ±  0.021  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                     l1 + l2 + l3 + l4          N/A  thrpt   15  0.399 ±  0.014  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                     s2 + l3 + l1 + s4          N/A  thrpt   15  0.331 ±  0.003  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                             lz1 + lz2          N/A  thrpt   15  1.602 ±  0.007  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                 lz1 + lz2 + lz3 + lz4          N/A  thrpt   15  0.466 ±  0.001  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                   s2 + lz3 + lz1 + s4          N/A  thrpt   15  0.317 ±  0.021  ops/ms
BenchmarkDecimalOperators.castDecimalToDoubleBenchmark                                                                        N/A           15  thrpt   15  9.778 ±  0.040  ops/ms
BenchmarkDecimalOperators.castDecimalToDoubleBenchmark                                                                        N/A           35  thrpt   15  0.035 ±  0.001  ops/ms
BenchmarkDecimalOperators.castDecimalToVarcharBenchmark                                                                       N/A           15  thrpt   15  0.256 ±  0.003  ops/ms
BenchmarkDecimalOperators.castDecimalToVarcharBenchmark                                                                       N/A           35  thrpt   15  0.057 ±  0.001  ops/ms
BenchmarkDecimalOperators.castDoubleToDecimalBenchmark                                                                        N/A           10  thrpt   15  0.103 ±  0.008  ops/ms
BenchmarkDecimalOperators.castDoubleToDecimalBenchmark                                                                        N/A           35  thrpt   15  0.047 ±  0.005  ops/ms
BenchmarkDecimalOperators.castDoubleToDecimalBenchmark                                                                        N/A       BIGINT  thrpt   15  7.244 ±  0.014  ops/ms
BenchmarkDecimalOperators.decimalToShortDecimalCastBenchmark                                       cast(l_38_30 as decimal(8, 0))          N/A  thrpt   15  0.798 ±  0.004  ops/ms
BenchmarkDecimalOperators.decimalToShortDecimalCastBenchmark                                       cast(l_26_18 as decimal(8, 0))          N/A  thrpt   15  1.890 ±  0.010  ops/ms
BenchmarkDecimalOperators.decimalToShortDecimalCastBenchmark                                       cast(l_20_12 as decimal(8, 0))          N/A  thrpt   15  1.133 ±  0.016  ops/ms
BenchmarkDecimalOperators.decimalToShortDecimalCastBenchmark                                        cast(l_20_8 as decimal(8, 0))          N/A  thrpt   15  1.146 ±  0.059  ops/ms
BenchmarkDecimalOperators.decimalToShortDecimalCastBenchmark                                        cast(s_17_9 as decimal(8, 0))          N/A  thrpt   15  8.692 ±  0.056  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               s1 / s2          N/A  thrpt   15  6.152 ±  0.048  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                     s1 / s2 / s2 / s2          N/A  thrpt   15  0.888 ±  0.007  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               s1 / s3          N/A  thrpt   15  0.275 ±  0.002  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               s2 / l1          N/A  thrpt   15  0.302 ±  0.003  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               l1 / s2          N/A  thrpt   15  0.673 ±  0.006  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               s3 / l1          N/A  thrpt   15  0.346 ±  0.005  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               l2 / l3          N/A  thrpt   15  0.180 ±  0.004  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                     l2 / l4 / l4 / l4          N/A  thrpt   15  0.108 ±  0.001  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                     l2 / s4 / s4 / s4          N/A  thrpt   15  0.117 ±  0.002  ops/ms
BenchmarkDecimalOperators.inequalityBenchmark                                                                             s1 < s2          N/A  thrpt   15  6.730 ±  0.076  ops/ms
BenchmarkDecimalOperators.inequalityBenchmark                 s1 < s2 AND s1 < s3 AND s1 < s4 AND s2 < s3 AND s2 < s4 AND s3 < s4          N/A  thrpt   15  4.354 ±  0.034  ops/ms
BenchmarkDecimalOperators.inequalityBenchmark                                                                             l1 < l2          N/A  thrpt   15  6.056 ±  0.059  ops/ms
BenchmarkDecimalOperators.inequalityBenchmark                 l1 < l2 AND l1 < l3 AND l1 < l4 AND l2 < l3 AND l2 < l4 AND l3 < l4          N/A  thrpt   15  2.753 ±  0.052  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                                 s1 % s2          N/A  thrpt   15  1.071 ±  0.003  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                       s1 % s2 % s2 % s2          N/A  thrpt   15  0.133 ±  0.003  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                                 s2 % l2          N/A  thrpt   15  0.764 ±  0.005  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                                 l3 % s3          N/A  thrpt   15  0.719 ±  0.005  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                                 s4 % l3          N/A  thrpt   15  0.852 ±  0.009  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                                 l2 % l3          N/A  thrpt   15  0.403 ±  0.003  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                       l2 % l3 % l4 % l1          N/A  thrpt   15  0.173 ±  0.001  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                               s1 * s2          N/A  thrpt   15  7.809 ±  0.024  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                     s1 * s2 * s5 * s6          N/A  thrpt   15  2.557 ±  0.021  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                               s3 * s4          N/A  thrpt   15  2.048 ±  0.036  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                               l2 * s2          N/A  thrpt   15  1.716 ±  0.010  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                     l2 * s2 * s5 * s6          N/A  thrpt   15  1.042 ±  0.003  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                               s1 * l2          N/A  thrpt   15  1.609 ±  0.017  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                               l1 * l2          N/A  thrpt   15  1.634 ±  0.016  ops/ms

*** Int128

Benchmark                                                                                                            (expression)  (precision)   Mode  Cnt  Score    Error   Units
BenchmarkDecimalOperators.additionBenchmark                                                                               s1 + s2          N/A  thrpt   15  6.160 ±  0.108  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                     s1 + s2 + s3 + s4          N/A  thrpt   15  5.067 ±  0.038  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                               l1 + l2          N/A  thrpt   15  2.154 ±  0.026  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                     l1 + l2 + l3 + l4          N/A  thrpt   15  0.510 ±  0.003  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                     s2 + l3 + l1 + s4          N/A  thrpt   15  1.012 ±  0.002  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                             lz1 + lz2          N/A  thrpt   15  4.932 ±  0.013  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                 lz1 + lz2 + lz3 + lz4          N/A  thrpt   15  2.884 ±  0.007  ops/ms
BenchmarkDecimalOperators.additionBenchmark                                                                   s2 + lz3 + lz1 + s4          N/A  thrpt   15  1.095 ±  0.006  ops/ms
BenchmarkDecimalOperators.castDecimalToDoubleBenchmark                                                                        N/A           15  thrpt   15  9.883 ±  0.040  ops/ms
BenchmarkDecimalOperators.castDecimalToDoubleBenchmark                                                                        N/A           35  thrpt   15  0.060 ±  0.001  ops/ms
BenchmarkDecimalOperators.castDecimalToVarcharBenchmark                                                                       N/A           15  thrpt   15  0.266 ±  0.003  ops/ms
BenchmarkDecimalOperators.castDecimalToVarcharBenchmark                                                                       N/A           35  thrpt   15  0.102 ±  0.001  ops/ms
BenchmarkDecimalOperators.castDoubleToDecimalBenchmark                                                                        N/A           10  thrpt   15  0.115 ±  0.010  ops/ms
BenchmarkDecimalOperators.castDoubleToDecimalBenchmark                                                                        N/A           35  thrpt   15  0.056 ±  0.003  ops/ms
BenchmarkDecimalOperators.castDoubleToDecimalBenchmark                                                                        N/A       BIGINT  thrpt   15  7.106 ±  0.033  ops/ms
BenchmarkDecimalOperators.decimalToShortDecimalCastBenchmark                                       cast(l_38_30 as decimal(8, 0))          N/A  thrpt   15  2.510 ±  0.006  ops/ms
BenchmarkDecimalOperators.decimalToShortDecimalCastBenchmark                                       cast(l_26_18 as decimal(8, 0))          N/A  thrpt   15  3.616 ±  0.005  ops/ms
BenchmarkDecimalOperators.decimalToShortDecimalCastBenchmark                                       cast(l_20_12 as decimal(8, 0))          N/A  thrpt   15  5.530 ±  0.026  ops/ms
BenchmarkDecimalOperators.decimalToShortDecimalCastBenchmark                                        cast(l_20_8 as decimal(8, 0))          N/A  thrpt   15  5.389 ±  0.023  ops/ms
BenchmarkDecimalOperators.decimalToShortDecimalCastBenchmark                                        cast(s_17_9 as decimal(8, 0))          N/A  thrpt   15  8.677 ±  0.077  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               s1 / s2          N/A  thrpt   15  6.093 ±  0.020  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                     s1 / s2 / s2 / s2          N/A  thrpt   15  2.473 ±  0.005  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               s1 / s3          N/A  thrpt   15  0.646 ±  0.007  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               s2 / l1          N/A  thrpt   15  1.104 ±  0.005  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               l1 / s2          N/A  thrpt   15  0.832 ±  0.003  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               s3 / l1          N/A  thrpt   15  0.649 ±  0.005  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                               l2 / l3          N/A  thrpt   15  0.443 ±  0.002  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                     l2 / l4 / l4 / l4          N/A  thrpt   15  0.264 ±  0.001  ops/ms
BenchmarkDecimalOperators.divisionBenchmark                                                                     l2 / s4 / s4 / s4          N/A  thrpt   15  0.322 ±  0.001  ops/ms
BenchmarkDecimalOperators.inequalityBenchmark                                                                             s1 < s2          N/A  thrpt   15  6.674 ±  0.028  ops/ms
BenchmarkDecimalOperators.inequalityBenchmark                 s1 < s2 AND s1 < s3 AND s1 < s4 AND s2 < s3 AND s2 < s4 AND s3 < s4          N/A  thrpt   15  4.346 ±  0.011  ops/ms
BenchmarkDecimalOperators.inequalityBenchmark                                                                             l1 < l2          N/A  thrpt   15  6.055 ±  0.055  ops/ms
BenchmarkDecimalOperators.inequalityBenchmark                 l1 < l2 AND l1 < l3 AND l1 < l4 AND l2 < l3 AND l2 < l4 AND l3 < l4          N/A  thrpt   15  2.499 ±  0.011  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                                 s1 % s2          N/A  thrpt   15  1.383 ±  0.006  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                       s1 % s2 % s2 % s2          N/A  thrpt   15  0.682 ±  0.002  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                                 s2 % l2          N/A  thrpt   15  1.278 ±  0.007  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                                 l3 % s3          N/A  thrpt   15  0.407 ±  0.018  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                                 s4 % l3          N/A  thrpt   15  0.473 ±  0.005  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                                 l2 % l3          N/A  thrpt   15  0.735 ±  0.003  ops/ms
BenchmarkDecimalOperators.moduloBenchmark                                                                       l2 % l3 % l4 % l1          N/A  thrpt   15  0.260 ±  0.001  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                               s1 * s2          N/A  thrpt   15  7.696 ±  0.051  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                     s1 * s2 * s5 * s6          N/A  thrpt   15  5.113 ±  0.022  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                               s3 * s4          N/A  thrpt   15  1.856 ±  0.018  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                               l2 * s2          N/A  thrpt   15  1.939 ±  0.043  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                     l2 * s2 * s5 * s6          N/A  thrpt   15  1.041 ±  0.008  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                               s1 * l2          N/A  thrpt   15  2.073 ±  0.023  ops/ms
BenchmarkDecimalOperators.multiplyBenchmark                                                                               l1 * l2          N/A  thrpt   15  1.995 ±  0.013  ops/ms

@martint martint marked this pull request as ready for review November 30, 2021 23:59
@martint martint changed the title WIP - two's complement representation for decimals Two's complement representation for decimals Dec 1, 2021
@martint martint requested a review from dain December 1, 2021 04:03
@martint martint force-pushed the decimal-twos-complement branch 2 times, most recently from c83cc0b to a902aeb Compare December 1, 2021 22:41
Copy link
Member

@skrzypo987 skrzypo987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed Int128 and Int128Math so far.

core/trino-spi/src/main/java/io/trino/spi/type/Int128.java Outdated Show resolved Hide resolved
core/trino-spi/src/main/java/io/trino/spi/type/Int128.java Outdated Show resolved Hide resolved
public static int compare(long leftHigh, long leftLow, long rightHigh, long rightLow)
{
int comparison = Long.compare(leftHigh, rightHigh);
if (comparison == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it branchless?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like this?

int high = Long.compare(leftHigh, rightHigh);
int low = Long.compareUnsigned(leftLow, rightLow);
return high == 0 ? low : high;

It's not clear that that's actually generates branchless code (need to check the assembly), and it has extra cos of computing low unconditionally if the VM is not smart enough to move the computation inside the high == 0 check if it's not able to generate branchless code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather something like:

int highComparison = Long.compare(leftHigh, rightHigh);
int lowComparison = Long.compareUnsigned(leftLow, rightLow);
comparison = highComparison + ~(highComparison & 1) * lowComparison;

But I guess you can put it on the todo list and benchmark it later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

highComparison & 1

That may not work. Long.compare is not guaranteed to return -1, 0 or 1. According to its javadocs, it will return "a value less than 0 if x < y", "a value greater than 0 if x > y".

It would need to be adjusted to something like:

int high = Long.compare(aHigh, bHigh);
int low = Long.compareUnsigned(aLow, bLow);

int mask = ~(high | -high) >> 31;
int comparison = high + (mask & low);

@martint martint force-pushed the decimal-twos-complement branch 3 times, most recently from 81a6a9a to 692eab1 Compare December 3, 2021 00:15
Copy link
Member

@skrzypo987 skrzypo987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through all of it. Looks good.

@martint martint force-pushed the decimal-twos-complement branch 5 times, most recently from d281075 to a07e645 Compare December 14, 2021 00:31
Both branches call the same method, so one of them is not needed.
Copy link
Member

@dain dain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor comments

Also, use a dedicated class to represent long decimals instead of Slice.
Avoid using intermediate BigIntegers when decoding long
decimals from Parquet and RCBinary
@martint martint merged commit 48581bd into trinodb:master Dec 23, 2021
@github-actions github-actions bot added this to the 368 milestone Dec 23, 2021
@martint martint mentioned this pull request Dec 23, 2021
@beliefer
Copy link

@martint Excuse me. I have a question. According to you provided micro benchmark. In some cases, the addition, multiply and division operators have regression. Do the trino fixed these issue? or how to fix them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

6 participants