perf(sql): support uuid and long256 in parallel GROUP BY #4140

Merged
merged 16 commits into questdb:master on Jan 22, 2024

Conversation

nwoolmer
Contributor

@nwoolmer nwoolmer commented Jan 17, 2024

Corresponds to this issue: #4120

Ported GroupByLongHashSet to Long128 (GroupByLong128HashSet) and Long256 (GroupByLong256HashSet).
Added tests and benchmarks for the above new hash sets.
Added QuaternaryFunction (4 arg equivalent of Unary/Binary/Ternary classes).
Added to_long256(LLLL) function (equivalent to to_uuid(LL) or to_long128(LL)) as LongsToLong256Function.
Updated count_distinct() for both UUID and Long256 to use the new hash sets.
Added tests for CountDistinctLong256GroupBy.
Updated Long, Long128 and Long256 benchmarks to align with these updates: #4134
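
To illustrate the general idea behind a hash set keyed on 128-bit values, here is a minimal on-heap sketch using open addressing with linear probing. It is not the code from this PR: the class name `Long128SetSketch`, the all-zero empty-slot marker, the 0.5 load factor and the hash function are all assumptions chosen for illustration, and the real `GroupByLong128HashSet` works on off-heap memory.

```java
/**
 * Illustrative on-heap sketch only (hypothetical class, not QuestDB code).
 * Each key occupies two adjacent longs; an all-zero pair marks an empty slot.
 */
final class Long128SetSketch {
    private long[] slots;        // two longs per slot: [lo, hi]
    private int capacity;        // number of slots, always a power of two
    private int size;
    private boolean hasZeroKey;  // the (0, 0) key is tracked out of band

    Long128SetSketch(int initialCapacity) {
        capacity = Integer.highestOneBit(Math.max(4, initialCapacity) - 1) << 1;
        slots = new long[capacity * 2];
    }

    /** Returns true if the key was not present before. */
    boolean add(long lo, long hi) {
        if (lo == 0 && hi == 0) {
            if (hasZeroKey) {
                return false;
            }
            hasZeroKey = true;
            size++;
            return true;
        }
        if ((size + 1) * 2 > capacity) { // keep the load factor at or below 0.5
            rehash();
        }
        return insert(lo, hi);
    }

    int size() {
        return size;
    }

    private boolean insert(long lo, long hi) {
        int mask = capacity - 1;
        int index = hash(lo, hi) & mask;
        while (true) {
            int base = index * 2; // slot offset computed once per probe
            long slotLo = slots[base];
            long slotHi = slots[base + 1];
            if (slotLo == 0 && slotHi == 0) { // empty slot: claim it
                slots[base] = lo;
                slots[base + 1] = hi;
                size++;
                return true;
            }
            if (slotLo == lo && slotHi == hi) { // already present
                return false;
            }
            index = (index + 1) & mask; // linear probing
        }
    }

    private void rehash() {
        long[] old = slots;
        capacity *= 2;
        slots = new long[capacity * 2];
        size = hasZeroKey ? 1 : 0;
        for (int i = 0; i < old.length; i += 2) {
            if (old[i] != 0 || old[i + 1] != 0) {
                insert(old[i], old[i + 1]);
            }
        }
    }

    // Assumed mixing function, picked only for the sketch.
    private static int hash(long lo, long hi) {
        long h = (lo ^ Long.rotateLeft(hi, 32)) * 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32));
    }
}
```

A `count_distinct()`-style aggregate would call `add(lo, hi)` for every non-null value in a group and report `size()` at the end.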

Microbenchmarks:

Long

Benchmark                                            (size)  Mode  Cnt    Score    Error  Units
GroupByLongHashSetBenchmark.testGroupByLongHashSet     5000  avgt    3   10.319 ±  0.965  ns/op
GroupByLongHashSetBenchmark.testGroupByLongHashSet    50000  avgt    3   14.073 ±  1.312  ns/op
GroupByLongHashSetBenchmark.testGroupByLongHashSet   500000  avgt    3   33.237 ±  1.161  ns/op
GroupByLongHashSetBenchmark.testGroupByLongHashSet  5000000  avgt    3   60.773 ±  7.917  ns/op
GroupByLongHashSetBenchmark.testOrderedMap             5000  avgt    3   27.105 ±  8.215  ns/op
GroupByLongHashSetBenchmark.testOrderedMap            50000  avgt    3   27.707 ± 11.362  ns/op
GroupByLongHashSetBenchmark.testOrderedMap           500000  avgt    3  134.037 ± 93.003  ns/op
GroupByLongHashSetBenchmark.testOrderedMap          5000000  avgt    3  183.419 ± 32.628  ns/op
Old benches
Long128

Benchmark                                                  (size)  Mode  Cnt    Score    Error  Units
GroupByLong128HashSetBenchmark.testGroupByLong128HashSet     5000  avgt    3  133.559 ± 24.516  ns/op
GroupByLong128HashSetBenchmark.testGroupByLong128HashSet    50000  avgt    3  184.387 ± 11.863  ns/op
GroupByLong128HashSetBenchmark.testGroupByLong128HashSet   500000  avgt    3  180.291 ± 38.545  ns/op
GroupByLong128HashSetBenchmark.testGroupByLong128HashSet  5000000  avgt    3  180.389 ± 21.280  ns/op
GroupByLong128HashSetBenchmark.testOrderedMap                5000  avgt    3  244.329 ± 29.777  ns/op
GroupByLong128HashSetBenchmark.testOrderedMap               50000  avgt    3  194.403 ±  6.899  ns/op
GroupByLong128HashSetBenchmark.testOrderedMap              500000  avgt    3  191.533 ±  6.748  ns/op
GroupByLong128HashSetBenchmark.testOrderedMap             5000000  avgt    3  191.723 ±  2.794  ns/op

Long256

Benchmark                                                  (size)  Mode  Cnt    Score     Error  Units
GroupByLong256HashSetBenchmark.testGroupByLong256HashSet     5000  avgt    3  310.210 ± 135.663  ns/op
GroupByLong256HashSetBenchmark.testGroupByLong256HashSet    50000  avgt    3  256.012 ±  32.347  ns/op
GroupByLong256HashSetBenchmark.testGroupByLong256HashSet   500000  avgt    3  258.856 ±  56.371  ns/op
GroupByLong256HashSetBenchmark.testGroupByLong256HashSet  5000000  avgt    3  275.527 ± 326.510  ns/op
GroupByLong256HashSetBenchmark.testOrderedMap                5000  avgt    3  205.034 ±  38.691  ns/op
GroupByLong256HashSetBenchmark.testOrderedMap               50000  avgt    3  205.936 ±  94.791  ns/op
GroupByLong256HashSetBenchmark.testOrderedMap              500000  avgt    3  199.510 ±  38.080  ns/op
GroupByLong256HashSetBenchmark.testOrderedMap             5000000  avgt    3  215.477 ± 159.120  ns/op

New benches
Long128

Benchmark                                                  (size)  Mode  Cnt    Score    Error  Units
GroupByLong128HashSetBenchmark.testGroupByLong128HashSet     2500  avgt    3   76.840 ±  3.901  ns/op
GroupByLong128HashSetBenchmark.testGroupByLong128HashSet    25000  avgt    3  175.917 ± 55.816  ns/op
GroupByLong128HashSetBenchmark.testGroupByLong128HashSet   250000  avgt    3  168.134 ± 78.253  ns/op
GroupByLong128HashSetBenchmark.testGroupByLong128HashSet  2500000  avgt    3  164.173 ± 27.133  ns/op
GroupByLong128HashSetBenchmark.testOrderedMap                2500  avgt    3  229.978 ± 22.202  ns/op
GroupByLong128HashSetBenchmark.testOrderedMap               25000  avgt    3  204.169 ±  4.719  ns/op
GroupByLong128HashSetBenchmark.testOrderedMap              250000  avgt    3  196.680 ± 32.682  ns/op
GroupByLong128HashSetBenchmark.testOrderedMap             2500000  avgt    3  196.723 ± 27.970  ns/op

Long256

Benchmark                                                  (size)  Mode  Cnt    Score    Error  Units
GroupByLong256HashSetBenchmark.testGroupByLong256HashSet     1250  avgt    3  231.235 ± 61.326  ns/op
GroupByLong256HashSetBenchmark.testGroupByLong256HashSet    12500  avgt    3  232.874 ± 47.436  ns/op
GroupByLong256HashSetBenchmark.testGroupByLong256HashSet   125000  avgt    3  227.261 ± 11.838  ns/op
GroupByLong256HashSetBenchmark.testGroupByLong256HashSet  1250000  avgt    3  235.956 ± 28.635  ns/op
GroupByLong256HashSetBenchmark.testOrderedMap                1250  avgt    3  206.105 ± 34.569  ns/op
GroupByLong256HashSetBenchmark.testOrderedMap               12500  avgt    3  203.992 ± 12.676  ns/op
GroupByLong256HashSetBenchmark.testOrderedMap              125000  avgt    3  204.173 ±  3.784  ns/op
GroupByLong256HashSetBenchmark.testOrderedMap             1250000  avgt    3  203.260 ±  2.665  ns/op

Currently microbenching at (YMMV):

Benchmark                                                 Mode  Cnt  Score   Error  Units
GroupByLong128HashSetBenchmark.baseline                   avgt    3  0.009 ± 0.002  us/op
GroupByLong128HashSetBenchmark.testFastMap                avgt    3  0.197 ± 0.056  us/op
GroupByLong128HashSetBenchmark.testGroupByLong128HashSet  avgt    3  0.146 ± 0.009  us/op
Includes port of tests from similar CountDistinctLongGroupByFunctionFactory.
Currently benches at:

Benchmark                                                 Mode  Cnt  Score   Error  Units
GroupByLong256HashSetBenchmark.baseline                   avgt    3  0.015 ± 0.001  us/op
GroupByLong256HashSetBenchmark.testFastMap                avgt    3  0.214 ± 0.023  us/op
GroupByLong256HashSetBenchmark.testGroupByLong256HashSet  avgt    3  0.251 ± 0.039  us/op
Currently debugging issues with the tests. Added to_long256 to help align the tests across the long/long128/long256 count distinct group by functions. Inner expressions aren't parsing as expected.
…nchmark.

@puzpuzpuz puzpuzpuz self-requested a review January 17, 2024 16:09
@puzpuzpuz puzpuzpuz added the SQL (Issues or changes relating to SQL execution) and Performance (Performance improvements) labels Jan 18, 2024
CountDistinctLong256GroupByFunction/CountDistinctUUIDGroupByFunction:
Adjusted zero -> null mapping to only map nulls when key is entirely zero.
Swapped isParallelismSupported() to use the superclass version for Long/Long128/Long256
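
As a rough illustration of the zero -> null mapping above, the sketch below assumes that the hash set reserves the all-zero bit pattern for empty slots and that a null UUID has both halves equal to `Long.MIN_VALUE`. Under those assumptions a genuine all-zero key is stored under the otherwise unused null pattern, and only when every component is zero. The class and method names are hypothetical, and a `java.util.HashSet` stands in for the real set.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; names and the null representation are assumptions. */
final class ZeroNullMappingSketch {
    static final long NULL_HALF = Long.MIN_VALUE; // assumed null marker for each 64-bit half

    static boolean isNull(long lo, long hi) {
        return lo == NULL_HALF && hi == NULL_HALF;
    }

    /**
     * Maps a non-null key to the form kept in the set. Only a key that is
     * zero in every component is remapped; a key such as (0, 5) is untouched.
     */
    static long[] toStoredKey(long lo, long hi) {
        if (lo == 0 && hi == 0) {
            return new long[]{NULL_HALF, NULL_HALF};
        }
        return new long[]{lo, hi};
    }

    /** Counts distinct non-null keys; each key is a {lo, hi} pair. */
    static int countDistinct(long[][] keys) {
        Set<List<Long>> seen = new HashSet<>(); // stand-in for the real hash set
        for (long[] key : keys) {
            if (isNull(key[0], key[1])) {
                continue; // nulls are never counted, so their pattern is free to reuse
            }
            long[] stored = toStoredKey(key[0], key[1]);
            seen.add(List.of(stored[0], stored[1]));
        }
        return seen.size();
    }
}
```

With the input keys (0, 0), (0, 5), (7, 0), (3, 4) and one null, this sketch counts four distinct values, because the partially zero keys are left as they are.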

GroupByLong128HashSet/GroupByLong256HashSet
Adjusted ascii table comments
Swapped keyAt functions to keyAddrAt and inlined the offsets, so keyAddr is only calculated once.
Calculate address once in setKeyAt
Adjusted benchmark values to half or quarter sizes to account for larger size of inserts (aiming to get cache performance impacting the benchmark)
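
The `keyAddrAt` change can be pictured roughly as follows: the slot address is derived from the index once, and the individual 64-bit parts are then read or written at fixed offsets from it. This is only a sketch under assumptions. It uses a direct `ByteBuffer` in place of the raw off-heap memory, and the class `KeyAddrSketch` and method `keyEquals` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of computing a slot address once per access. */
final class KeyAddrSketch {
    private static final int KEY_BYTES = 16; // two 64-bit halves per key
    private final ByteBuffer mem;

    KeyAddrSketch(int capacity) {
        // A direct buffer stands in for off-heap memory; it starts zero-filled.
        mem = ByteBuffer.allocateDirect(capacity * KEY_BYTES).order(ByteOrder.nativeOrder());
    }

    /** Byte offset of the slot; callers compute it once and reuse it. */
    int keyAddrAt(int index) {
        return index * KEY_BYTES;
    }

    void setKeyAt(int index, long lo, long hi) {
        int addr = keyAddrAt(index); // single address computation
        mem.putLong(addr, lo);
        mem.putLong(addr + 8, hi);   // inlined offset of the high half
    }

    boolean keyEquals(int index, long lo, long hi) {
        int addr = keyAddrAt(index); // single address computation
        return mem.getLong(addr) == lo && mem.getLong(addr + 8) == hi;
    }
}
```

The alternative of separate per-component accessors would repeat the index-to-address multiplication for every part of the key, which matters more as keys grow to four longs.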

Hash
Correct algorithm for hashLong256.
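
For readers unfamiliar with hashing multi-word keys, the sketch below shows one way to fold four longs into a single hash so that every word influences the result. It is not the algorithm used by `hashLong256` in QuestDB; it chains the well-known SplitMix64 finalizer purely for illustration, and the class name `Long256HashSketch` is hypothetical.

```java
/** Hypothetical sketch; this is not QuestDB's hashLong256. */
final class Long256HashSketch {
    /** SplitMix64 finalizer: an invertible 64-bit mixing step. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Folds the four words in order, so swapping words changes the chain. */
    static long hash(long l0, long l1, long l2, long l3) {
        long h = mix(l0);
        h = mix(h ^ l1);
        h = mix(h ^ l2);
        h = mix(h ^ l3);
        return h;
    }
}
```

Because each step is invertible, two keys that differ in exactly one word always hash differently in this sketch. A hash that ignores or weakly mixes some of the words tends to cluster keys and lengthen probe sequences in an open-addressing set.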

Long256Impl
Added isNull(LLLL) overload.

LongToLong256FunctionFactory
Adjust whitespace and remove unnecessary 'this'.
Use new Long256Impl.isNull(LLLL)

QuaternaryFunction
Adjust whitespace and naming.

RecordSinkSPI and Unordered16Map
Adjust names.
UUID
-------------------------------------------------------------------------------
Test set: io.questdb.test.griffin.engine.functions.groupby.CountDistinctUuidGroupByFunctionFactoryTest
-------------------------------------------------------------------------------
Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.419 sec <<< FAILURE! - in io.questdb.test.griffin.engine.functions.groupby.CountDistinctUuidGroupByFunctionFactoryTest
testMappingZeroToNulls(io.questdb.test.griffin.engine.functions.groupby.CountDistinctUuidGroupByFunctionFactoryTest)  Time elapsed: 0.046 sec  <<< FAILURE!
java.lang.AssertionError: expected:<a	s
a	4
> but was:<a	s
a	3
>

Long256
-------------------------------------------------------------------------------
Test set: io.questdb.test.griffin.engine.functions.groupby.CountDistinctLong256GroupByFunctionFactoryTest
-------------------------------------------------------------------------------
Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.288 sec <<< FAILURE! - in io.questdb.test.griffin.engine.functions.groupby.CountDistinctLong256GroupByFunctionFactoryTest
testMappingZeroToNulls(io.questdb.test.griffin.engine.functions.groupby.CountDistinctLong256GroupByFunctionFactoryTest)  Time elapsed: 0.054 sec  <<< FAILURE!
java.lang.AssertionError: expected:<a	s
a	4
> but was:<a	s
a	3
>
UUID
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0

Long256
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0
Added oracles to the hash set test.

Added extra 0/null mapping cases to the griffin functions.
Contributor

@puzpuzpuz puzpuzpuz left a comment

Many thanks for your contribution!

@bluestreak01 bluestreak01 merged commit cd11bdd into questdb:master Jan 22, 2024
16 checks passed