WIP Change ARRAY_INTERSECT to use TypedSet #11984

@yingsu00
Contributor

yingsu00 commented Nov 27, 2018

Resolves:
#11493
#11978

This commit partially fixes the wrong-results bug on primitive-type arrays that
contain both NULL and a 0-value. It depends on PR #11979 to solve the
issue on composite types involving NULL, and a rare case where the
0-value and NULL land on the same hash position.

It also improves performance compared to the original sort-merge
solution. The speedup ranges from 2.3x to 11x depending on the data type.

Before

Benchmark                                        (name)   (type)  Mode  Cnt    Score   Error  Units
BenchmarkArrayIntersect.arrayIntersect  array_intersect   BIGINT  avgt   20  193.023 ± 2.840  ns/op
BenchmarkArrayIntersect.arrayIntersect  array_intersect  VARCHAR  avgt   20  336.516 ± 5.301  ns/op
BenchmarkArrayIntersect.arrayIntersect  array_intersect   DOUBLE  avgt   20  219.812 ± 6.615  ns/op
BenchmarkArrayIntersect.arrayIntersect  array_intersect  BOOLEAN  avgt   20   23.913 ± 0.553  ns/op

After

Benchmark                                (type)  Mode  Cnt    Score   Error  Units
BenchmarkArrayIntersect.arrayIntersect   BIGINT  avgt   20   30.175 ± 5.077  ns/op
BenchmarkArrayIntersect.arrayIntersect  VARCHAR  avgt   20  146.365 ± 5.405  ns/op
BenchmarkArrayIntersect.arrayIntersect   DOUBLE  avgt   20   63.752 ± 5.329  ns/op
BenchmarkArrayIntersect.arrayIntersect  BOOLEAN  avgt   20    2.133 ± 0.243  ns/op
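
For context, a minimal sketch of the hash-based approach (a simplification, not the PR's actual code; it assumes TypedSet's add(Block, int) and contains(Block, int) methods, and the constructor arguments and BlockBuilder creation shown here are illustrative):

import com.facebook.presto.operator.aggregation.TypedSet;
import com.facebook.presto.spi.block.Block;
import com.facebook.presto.spi.block.BlockBuilder;
import com.facebook.presto.spi.type.Type;

// Simplified sketch only: constructor arguments and BlockBuilder creation are
// illustrative assumptions, not the PR's exact code.
public final class ArrayIntersectSketch
{
    private ArrayIntersectSketch() {}

    public static Block intersect(Type elementType, Block left, Block right)
    {
        // Build a hash set over the right array once, instead of sorting both sides.
        TypedSet rightSet = new TypedSet(elementType, right.getPositionCount());
        for (int i = 0; i < right.getPositionCount(); i++) {
            rightSet.add(right, i);
        }

        // Emit each left element at most once if it also appears on the right.
        TypedSet emitted = new TypedSet(elementType, left.getPositionCount());
        BlockBuilder result = elementType.createBlockBuilder(null, left.getPositionCount());
        for (int i = 0; i < left.getPositionCount(); i++) {
            if (rightSet.contains(left, i) && !emitted.contains(left, i)) {
                emitted.add(left, i);
                elementType.appendTo(left, i, result);
            }
        }
        return result.build();
    }
}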

@yingsu00 yingsu00 requested review from dain and wenleix and removed request for dain and wenleix Nov 27, 2018

@yingsu00 yingsu00 changed the title from Change ARRAY_INTERSECT to use TypedSet to WIP Change ARRAY_INTERSECT to use TypedSet Nov 27, 2018


@yingsu00 yingsu00 force-pushed the yingsu00:array_intersect branch from 5ab41f5 to 55f6511 Nov 27, 2018

}
// private static IntComparator intBlockCompare(Type type, Block block)


@findepi

findepi Nov 27, 2018

Member

nit: you probably want to remove this

@dain

I'm concerned about the GC pressure this function will create. The use of TypedSet effectively makes the pageBuilder unused in this class. Maybe TypedSet could be modified to take a constructor that says "build me a set from this existing block". This would eliminate all of the copying in this class. Alternatively, we could introduce a new class that does this. @haozhun thoughts?

}
}
return blockBuilder.getRegion(blockBuilder.getPositionCount() - intersectCount, intersectCount);


@dain

dain Nov 27, 2018

Contributor

I believe that intersectTypedSet already contains the output data, so maybe modify TypedSet to return elementBlock.build().

@wenleix


Contributor

wenleix commented Dec 2, 2018

I'm concerned about the GC pressure this function will create. The use of TypedSet effectively makes the pageBuilder unused in this class. Maybe TypedSet could be modified to take a constructor that says "build me a set from this existing block". This would eliminate all of the copying in this class. Alternatively, we could introduce a new class that does this. @haozhun thoughts?

We already have this class as JsonUtil#HashTable for the JSON-to-map cast. We could make it a top-level class and use it in other functions to avoid the extra data copy.

@yingsu00


Contributor

yingsu00 commented Dec 6, 2018

I did some JMH benchmarks on the two possible changes to TypedSet:

  1. Instead of building an elementBlock inside TypedSet, we can directly pass in the block and only build a hashtable of block positions. This is what JsonUtil#HashTable does. This way we avoid one data copy and block building, and also remove the 4MB limit on the elementBlock size. However, as @haozhun pointed out, this may create GC pressure for aggregation functions like MultimapAggregationFunction, because we would have to hold the input block until we're done with the TypedSet; for the other array/map functions it doesn't matter much because the life cycles of the input blocks and the TypedSet are not very different. But since there is only one aggregation function, this can probably work.
    Another change needed to make this work is the ability to add more than one block to the TypedSet. A few use cases (listed in the table below) require this, e.g. ArrayExceptFunction. This is doable: we just need to keep track of the added blocks inside TypedSet, and maybe use an additional internal hashtable to record which block a value is in. There is a space penalty for this, of course, but because we no longer need to build the internal elementBlock, the space requirement should not exceed that of the existing implementation.

  2. There are a number of use cases where the internal elementBlock can be DIRECTLY returned and used as the result. Such cases are also listed in the table below. Theoretically this would save the cost of building a separate result blockBuilder in the caller (e.g. distinctElementBlockBuilder in ArrayDistinctFunction); however, in the JMH benchmarks I'm not seeing any obvious performance gains, although it should still save memory. If we do 1) we can also make TypedSet return a result block based on an array of positions, but again this doesn't show any performance gain or loss compared to using a separate result block builder in the caller.

This table lists all usages of TypedSet. We can see that 5 of them require the ability to add more than one block, so this feature is needed. Also, there are 6 callers that can directly use the current TypedSet internal elementBlock as the result. However, if we use approach 1) and just pass in the input block, the memory saving from returning the elementBlock directly goes away.

Function                     Aggregation function  Input blocks  Handle > 1 blocks add  Can directly use the elementBlock as output
MultimapAggregationFunction  Y                     1             N                      Y
ArrayDistinctFunction        N                     1             N                      Y
ArrayExceptFunction          N                     2             Y                      N
ArrayIntersectFunction       N                     2             N                      Y
ArrayUnionFunction           N                     2             Y                      Y
MapConcatFunction            N                     n             Y                      Y
MapFromEntriesFunction       N                     1             Y                      Y (for keyBlock)
MapToMapCast                 N                     1             N                      Y (needs to apply valueProcessFunction.invokeExact)
mapDotProduct                N                     2             N                      N
MultimapFromEntriesFunction  N                     1             N                      N

Not knowing that JsonUtil#HashTable existed, I implemented a class NoCopyTypedSet and ran a few benchmarks on array_distinct. The logic is similar to JsonUtil#HashTable, except that NoCopyTypedSet handles nulls while JsonUtil#HashTable doesn't. If we used JsonUtil#HashTable directly to implement functions like array_distinct, it would give wrong results on 0 and null. As a result, NoCopyTypedSet is about 10% slower than HashTable. The tests were done on arrays with 10000 varchar values.
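
For reference, a rough reconstruction of what such a position-based set can look like (my own sketch of the idea, not the actual NoCopyTypedSet or JsonUtil#HashTable code; nulls are tracked separately so NULL can never be confused with a 0-value):

import com.facebook.presto.spi.block.Block;
import com.facebook.presto.spi.type.Type;

import java.util.Arrays;

// Sketch only: the real implementations differ in sizing, rehashing, and error handling.
public class PositionHashSet
{
    private static final int EMPTY = -1;

    private final Type elementType;
    private final Block block;
    private final int[] hashTable;   // open-addressed table of positions into 'block'
    private final int mask;
    private boolean containsNull;    // tracked separately so NULL never collides with a 0-value

    public PositionHashSet(Type elementType, Block block)
    {
        this.elementType = elementType;
        this.block = block;
        // Power-of-two capacity that keeps the load factor under 0.5
        int capacity = Integer.highestOneBit(Math.max(block.getPositionCount(), 1)) * 4;
        this.hashTable = new int[capacity];
        this.mask = capacity - 1;
        Arrays.fill(hashTable, EMPTY);
        for (int position = 0; position < block.getPositionCount(); position++) {
            add(position);
        }
    }

    private void add(int position)
    {
        if (block.isNull(position)) {
            containsNull = true;
            return;
        }
        int slot = (int) (elementType.hash(block, position) & mask);
        while (hashTable[slot] != EMPTY) {
            if (elementType.equalTo(block, hashTable[slot], block, position)) {
                return;   // already present
            }
            slot = (slot + 1) & mask;   // linear probing
        }
        hashTable[slot] = position;
    }

    public boolean contains(Block probeBlock, int position)
    {
        if (probeBlock.isNull(position)) {
            return containsNull;
        }
        int slot = (int) (elementType.hash(probeBlock, position) & mask);
        while (hashTable[slot] != EMPTY) {
            if (elementType.equalTo(block, hashTable[slot], probeBlock, position)) {
                return true;
            }
            slot = (slot + 1) & mask;
        }
        return false;
    }
}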

The first test shows that directly returning the elementBlock doesn't gain much in performance:

TypedSet, caller builds the result Block in a separate block builder:
BenchmarkArrayDistinct.arrayDistinct array_distinct avgt 20 35.440 ± 2.055 ns/op

TypedSet, directly return the elementBlock:
BenchmarkArrayDistinct.arrayDistinct array_distinct avgt 20 34.890 ± 0.916 ns/op

The next test, using NoCopyTypedSet, compares 1) returning a block from NoCopyTypedSet.getBlock() vs. 2) the caller using a separate block builder:

NoCopyTypedSet, return the result Block from NoCopyTypedSet.getBlock(), which calls Block.getPositions():
BenchmarkArrayDistinct.arrayDistinct array_distinct avgt 20 29.871 ± 2.498 ns/op

NoCopyTypedSet, caller uses a separate blockBuilder:
BenchmarkArrayDistinct.arrayDistinct array_distinct avgt 20 28.951 ± 1.027 ns/op

Again, there's no obvious difference. NoCopyTypedSet is faster than TypedSet because I changed the internal hashtable to an int[] instead of an IntArrayList.

The JsonUtil#HashTable results:
JsonUtil#HashTable, return the result Block from getBlock():
BenchmarkArrayDistinct.arrayDistinct array_distinct avgt 20 26.472 ± 0.411 ns/op

JsonUtil#HashTable, caller uses a separate blockBuilder:
BenchmarkArrayDistinct.arrayDistinct array_distinct avgt 20 27.062 ± 0.794 ns/op

Based on the above, I think the best way is to use approach 1), i.e. make JsonUtil#HashTable a top-level class, but make it a) handle nulls correctly and b) able to add more than one block. This way we can remove the 4MB limit that users complain about. MultimapAggregationFunction can still use the original TypedSet if we observe memory pressure. I can make this work for array_intersect first and watch for any performance penalty from adding more than one block. @dain @wenleix @haozhun, what do you think?
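
To make the idea concrete, the proposed surface could look roughly like this (names and signatures are purely illustrative, not an agreed design):

import com.facebook.presto.spi.block.Block;

// Hypothetical API surface for the proposal above; names and signatures are illustrative only.
public interface BlockPositionSet
{
    // May be called with positions from more than one block (e.g. ArrayExceptFunction).
    void add(Block block, int position);

    boolean contains(Block block, int position);

    // NULL is tracked explicitly so it is never confused with a 0-value.
    boolean containsNullElement();
}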

@wenleix


Contributor

wenleix commented Dec 11, 2018

The benchmark result at the top looks great. Note that the bigger the input arrays are, the bigger the improvement we will see. So can we look at the production workload and generate a benchmark with input arrays of that size? Given the stack depth in the flame chart, I suspect the speedup is likely closer to the upper end of your benchmark range.

@wenleix


Contributor

wenleix commented Dec 11, 2018

Given the complicated trade-offs between the different TypedSet proposals, I would suggest taking a step back and focusing on impact: switching from a sort-based solution to a hash-based solution.

The blocking issue (#11984 (review)) is avoiding the new BlockBuilder that TypedSet creates on each invocation. We can just change TypedSet to accept a caller-provided container BlockBuilder to achieve this goal. All the follow-up optimizations and proposals are great, but let's focus on impact and get rid of the sort-based array_intersect first ;). Does that make sense? @haozhun, @yingsu00

Also note that for heavy use cases we can always use a specialized implementation (for example, LongSet instead of TypedSet for BIGINT).
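
As an illustration of that last point, a BIGINT-specialized path might look roughly like this (a sketch only; it assumes fastutil's LongOpenHashSet, which Presto already depends on, skips null handling, and simplifies BlockBuilder creation):

import com.facebook.presto.spi.block.Block;
import com.facebook.presto.spi.block.BlockBuilder;
import it.unimi.dsi.fastutil.longs.LongOpenHashSet;

import static com.facebook.presto.spi.type.BigintType.BIGINT;

// Sketch only: nulls are skipped here for brevity, and BlockBuilder creation is simplified.
public final class BigintIntersectSketch
{
    private BigintIntersectSketch() {}

    public static Block intersect(Block left, Block right)
    {
        LongOpenHashSet rightValues = new LongOpenHashSet(right.getPositionCount());
        for (int i = 0; i < right.getPositionCount(); i++) {
            if (!right.isNull(i)) {
                rightValues.add(BIGINT.getLong(right, i));
            }
        }

        LongOpenHashSet emitted = new LongOpenHashSet(left.getPositionCount());
        BlockBuilder result = BIGINT.createBlockBuilder(null, left.getPositionCount());
        for (int i = 0; i < left.getPositionCount(); i++) {
            if (!left.isNull(i)) {
                long value = BIGINT.getLong(left, i);
                // add() returns true only the first time a value is inserted
                if (rightValues.contains(value) && emitted.add(value)) {
                    BIGINT.writeLong(result, value);
                }
            }
        }
        return result.build();
    }
}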
