WIP Change ARRAY_INTERSECT to use TypedSet #11984
This commit partly fixes the wrong results bug on primitive type arrays that
It also improves performance compared to the original sort-merge
I'm concerned about the GC pressure this function will create. The use of TypedSet effectively makes the pageBuilder unused in this class. Maybe TypedSet could be modified to take a constructor that says "build me a set from this existing block". This would eliminate all of the copying in this class. Alternatively, we could introduce a new class that does this. @haozhun thoughts?
We already have this class as
I did some JMH benchmarks on the two possible changes to TypedSet:
This table lists all usages of TypedSet. Five of them require the ability to add more than one Block, so this feature is needed. There are also 6 callers that can directly use the current TypedSet internal elementBlock as the result. However, if we use approach 1) and just pass in the input block, the memory saving from returning the elementBlock directly goes away.
Without knowing that JsonUtil#HashTable existed, I implemented a class NoCopyTypedSet and ran a few benchmarks on array_distinct. The logic is similar to JsonUtil#HashTable, except NoCopyTypedSet handles nulls while JsonUtil#HashTable doesn't. If we used JsonUtil#HashTable directly to implement functions like array_distinct, it would give wrong results on 0 and null. As a result, NoCopyTypedSet is about 10% slower than HashTable. The tests were done on arrays with 10000 varchar values.
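To make the "no copy, but null-aware" idea concrete, here is a minimal sketch. It is not the NoCopyTypedSet from this PR: the class name `NoCopyPositionSet` is hypothetical, a plain `String[]` stands in for the input Block, and the empty string versus null plays the role of the 0 versus null case mentioned above. The table stores only positions into the input, so no element is copied, and null is tracked outside the table so it can never be confused with a real value.

```java
import java.util.Arrays;

// Hypothetical illustration only; a String[] stands in for a Presto Block.
final class NoCopyPositionSet
{
    private final String[] values;   // the input "block"; referenced, never copied
    private final int[] table;       // open-addressing table of positions, -1 = empty slot
    private final int mask;
    private final int[] distinct;    // positions of first occurrences, in input order
    private int distinctCount;
    private int nullPosition = -1;   // position of the first null, tracked outside the table

    NoCopyPositionSet(String[] values)
    {
        this.values = values;
        int capacity = 4;
        while (capacity < values.length * 2) {
            capacity <<= 1;          // power of two, load factor at most 0.5
        }
        this.table = new int[capacity];
        Arrays.fill(table, -1);
        this.mask = capacity - 1;
        this.distinct = new int[values.length];
        for (int position = 0; position < values.length; position++) {
            add(position);
        }
    }

    private void add(int position)
    {
        String value = values[position];
        if (value == null) {
            // null is handled separately so it is never compared with a real value
            if (nullPosition < 0) {
                nullPosition = position;
                distinct[distinctCount++] = position;
            }
            return;
        }
        int slot = value.hashCode() & mask;
        while (table[slot] != -1) {
            if (values[table[slot]].equals(value)) {
                return;              // already present
            }
            slot = (slot + 1) & mask;
        }
        table[slot] = position;
        distinct[distinctCount++] = position;
    }

    boolean contains(String value)
    {
        if (value == null) {
            return nullPosition >= 0;
        }
        int slot = value.hashCode() & mask;
        while (table[slot] != -1) {
            if (values[table[slot]].equals(value)) {
                return true;
            }
            slot = (slot + 1) & mask;
        }
        return false;
    }

    // Positions of the distinct elements; a caller could build the result from these.
    int[] distinctPositions()
    {
        return Arrays.copyOf(distinct, distinctCount);
    }
}
```

A caller would pass the returned positions to whatever builds the output, which is roughly the role Block.getPositions() plays in the benchmarks discussed here.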
The first test shows that directly returning elementBlock yields little performance gain:
TypedSet, caller builds the result Block in a separate block builder:
TypedSet, directly return elementBlock
The next test uses NoCopyTypedSet to compare 1) returning a block from NoCopyTypedSet.getBlock() vs. 2) the caller using a separate block builder:
NoCopyTypedSet, returning the result Block from NoCopyTypedSet.getBlock() by calling Block.getPositions():
NoCopyTypedSet, caller uses a separate blockBuilder
Again there's no obvious difference. NoCopyTypedSet is faster than TypedSet because I changed the internal hashtable to int instead of IntArrayList.
The JsonUtil#HashTable results:
JsonUtil#HashTable, caller uses a separate blockBuilder
Based on the above, I think the best way is to use approach 1), i.e., make JsonUtil#HashTable a top-level class but make it a) able to handle nulls correctly and b) able to add more than one block. This way we can remove the 4MB limit that users complain about. MultimapAggregationFunction can still use the original TypedSet if we observe memory pressure. I can make this work for array_intersect first and watch for any performance penalty from adding more than one block. @dain @wenleix @haozhun what do you think?
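As a rough sketch of what "handle nulls and accept more than one block" could look like, the class below keeps references to every added block and stores each entry as an encoded (block index, position) pair. Everything here is an assumption for illustration: `MultiBlockSet` and its methods are hypothetical names, `String[]` stands in for a Block, and the encoding is just one possible choice, not the design adopted in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; String[] stands in for a Presto Block.
final class MultiBlockSet
{
    private final List<String[]> blocks = new ArrayList<>(); // added blocks, referenced, not copied
    private long[] table = new long[16];                     // (blockIndex << 32 | position), -1 = empty
    private int nonNullCount;
    private boolean containsNull;                            // null is tracked outside the table

    MultiBlockSet()
    {
        Arrays.fill(table, -1L);
    }

    void addBlock(String[] block)
    {
        int blockIndex = blocks.size();
        blocks.add(block);
        for (int position = 0; position < block.length; position++) {
            if (block[position] == null) {
                containsNull = true;
                continue;
            }
            if ((nonNullCount + 1) * 2 > table.length) {
                rehash();            // keep the load factor at or below 0.5
            }
            if (insert(table, ((long) blockIndex << 32) | position)) {
                nonNullCount++;
            }
        }
    }

    boolean contains(String value)
    {
        if (value == null) {
            return containsNull;
        }
        int mask = table.length - 1;
        int slot = value.hashCode() & mask;
        while (table[slot] != -1L) {
            if (valueAt(table[slot]).equals(value)) {
                return true;
            }
            slot = (slot + 1) & mask;
        }
        return false;
    }

    int size()
    {
        return nonNullCount + (containsNull ? 1 : 0);
    }

    // Returns true if the encoded entry was new, false if an equal value was already present.
    private boolean insert(long[] target, long encoded)
    {
        String value = valueAt(encoded);
        int mask = target.length - 1;
        int slot = value.hashCode() & mask;
        while (target[slot] != -1L) {
            if (valueAt(target[slot]).equals(value)) {
                return false;
            }
            slot = (slot + 1) & mask;
        }
        target[slot] = encoded;
        return true;
    }

    private String valueAt(long encoded)
    {
        return blocks.get((int) (encoded >>> 32))[(int) encoded];
    }

    private void rehash()
    {
        long[] bigger = new long[table.length * 2];
        Arrays.fill(bigger, -1L);
        for (long entry : table) {
            if (entry != -1L) {
                insert(bigger, entry);
            }
        }
        table = bigger;
    }
}
```

One trade-off of this sketch is that every added block stays reachable for the lifetime of the set, which is the kind of memory pressure the comment above suggests watching for.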
The benchmark result at the top looks great. Note that the bigger the input arrays are, the larger the improvement we will see. Can we therefore look at the production workload and generate a benchmark with input arrays of that size? Given the stack depth in the flame chart, I suspect the speed-up is likely closer to the upper end of your benchmark.
Given the complicated trade-offs between different proposals on
The blocking issue (#11984 (review)) is to avoid new
Also note, for heavy use cases, we can always use a specialized implementation (for example, we can use