
Add data columns to join hash table payload #387

Closed
wants to merge 4 commits into from

Conversation

ienkovich
Copy link
Contributor

Analyzing profiles for TPC-H queries, I found that there are many cache misses when we access columns with scan index > 0, and this is the main performance issue for join queries. I tried to fight those LLC misses by storing the required columns in join hash tables as a payload. This essentially gives us a row representation for the joined tables, which can significantly reduce the number of LLC misses. When a join produces few matches, or none at all, this patch can hurt performance, because the stored payload is not used and therefore the increased hash table size and payload build time are not paid off.
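To illustrate the idea, here is a rough sketch contrasting the two probe paths. The types and names are hypothetical and much simpler than the actual JoinHashTable; the point is only that a fused payload turns two dependent memory accesses into one:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Baseline: the hash table stores only row ids. Every probe hit must then
// dereference back into the build-side column -- a second, cache-unfriendly
// memory access, which is where the LLC misses come from.
struct RowIdTable {
  std::unordered_map<int64_t, int32_t> key_to_row;  // join key -> row id
  const std::vector<int64_t>* column;               // build-side column
  int64_t probe(int64_t key) const {
    auto it = key_to_row.find(key);
    return it == key_to_row.end() ? -1 : (*column)[it->second];
  }
};

// Patched idea: the needed column value is fused into the entry as payload,
// so a probe hit returns it directly with no second access. The table is
// bigger, which only pays off when matches are actually consumed.
struct PayloadTable {
  std::unordered_map<int64_t, int64_t> key_to_payload;  // key -> column value
  int64_t probe(int64_t key) const {
    auto it = key_to_payload.find(key);
    return it == key_to_payload.end() ? -1 : it->second;
  }
};
```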

The patch adds payload support only for JoinHashTable, which is used by most of the TPC-H queries. Here are the performance results (SF=30):

| TPC-H | Ref (ms) | Patched (ms) | Gain |
|-------|---------:|-------------:|-----:|
| Q1    | 914      | 920          | 0.99347826 |
| Q5    | 4029     | 2992         | 1.34659091 |
| Q8    | 453      | 455          | 0.9956044  |
| Q7    | 4552     | 3029         | 1.50280621 |
| Q8    | 8124     | 5929         | 1.3702142  |
| Q9    | 21860    | 15748        | 1.38811278 |
| Q10   | 7342     | 5849         | 1.25525731 |
| Q11   | 818      | 830          | 0.98554217 |
| Q12   | 428      | 455          | 0.94065934 |
| Q18   | 24406    | 25404        | 0.96071485 |
| Q19   | 18278    | 9407         | 1.94302115 |

As expected, we get a significant gain when the stored payload is frequently used, and some performance loss due to the increased hash table size when the payload is not used much.

The current implementation still has some work to do:

  • Row ids can actually be completely removed from the table if all required columns are stored as a payload. For one-to-one tables this requires a new way to detect empty hash entries; if the payload has a NOT NULL column, a NULL value can be used for that. This should decrease the table size and hopefully reduce performance losses in the bad scenarios.
  • The hash table cache can be improved to better re-use tables in case of a partial table key match (e.g. we could re-use only a subset of columns from a cached table).

Still, I believe this patch is enough to start a discussion of the approach and to collect some performance data. I put the feature under a flag to simplify performance measurements. For now it is disabled by default, but I hope we can eventually enable it by default.

@asuhan
Copy link
Contributor

asuhan commented Sep 11, 2019

Hi @ienkovich, thank you for the patch. This is something we've thought about quite a bit in the past; there is a case to be made for it, and the benchmarks prove it. As a default policy, however, it has at least two drawbacks:

  1. It decreases the cache hit rate. In the current world, we can reuse the same hash table with different payloads. If we fuse the payload into the hash table, that's no longer true.
  2. In certain cases (joins on dictionary strings with different encodings) we build the hash table on the host and use it on the GPU. With this approach, we force the payload to be fetched on the host, which happens at a lower bandwidth than on the GPU.

I don't think we'd be fine with taking any regression, so the right approach would be to collect runtime statistics about the workload and decide whether doing this would be advantageous. In other words, we'd have this as an additional policy which kicks in selectively, based on the workload.
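Such a selective policy might look something like the toy sketch below. All names and the threshold are invented for illustration; the real statistics and decision logic would need to be designed as part of the follow-up work:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-join runtime statistics gathered from past executions.
struct JoinStats {
  uint64_t probes;   // total probe attempts against the hash table
  uint64_t matches;  // probes that found a match (payload actually used)
};

// Enable payload fusion only when a large enough fraction of probes hit,
// so the extra table size and build time are likely to pay off. The 0.5
// threshold is an arbitrary placeholder, not a tuned value.
bool should_fuse_payload(const JoinStats& s, double min_match_rate = 0.5) {
  if (s.probes == 0) return false;  // no history: keep the default (off)
  return static_cast<double>(s.matches) / s.probes >= min_match_rate;
}
```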

On the other hand, I see no obvious blocker to merging this behind the flag you've added, as long as all tests pass with the flag set. The smart policy that activates it can be follow-up work; there's no requirement, as far as I'm concerned, to bundle it with this change. We'll still need to review this carefully, since it's a rather big change, but as long as the changes are well isolated we should be able to take it. This has great potential, thanks!

@ienkovich
Copy link
Contributor Author

I understand the initial version is not good enough to become the default strategy. For now it can be a tool to find cases where the payload really helps. Hopefully we will come up with some heuristics to enable it in the most profitable cases.

I agree that cache reuse becomes an issue. This first implementation simply checks that the payload is the same. This should definitely be relaxed to allow better reuse (probably even when the required payload is only partly cached). But in any case, the payload is basically an additional index to speed up queries, and it has to come with additional maintenance cost.
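The relaxed cache check could look roughly like this. The key structure and `can_reuse` are hypothetical, simplified stand-ins for the real hash table cache key; the idea is only to allow a subset match on payload columns instead of requiring an exact match:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical cache key: which join condition the table was built for,
// and which payload columns were fused into it.
struct CachedTableKey {
  int64_t join_key_hash;             // identifies the join condition
  std::vector<int> payload_col_ids;  // columns stored as payload
};

// A cached table is reusable if the join condition matches and every
// requested payload column is already present in it -- a subset match,
// rather than the exact-match check the first implementation does.
bool can_reuse(const CachedTableKey& cached, int64_t key_hash,
               const std::vector<int>& requested_cols) {
  if (cached.join_key_hash != key_hash) return false;
  return std::all_of(requested_cols.begin(), requested_cols.end(),
                     [&](int c) {
                       const auto& ids = cached.payload_col_ids;
                       return std::find(ids.begin(), ids.end(), c) != ids.end();
                     });
}
```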

I'm not sure a fused payload can be as profitable on GPU as on CPU, so I enabled it for CPU only.

@alexbaden
Copy link
Contributor

Hi @ienkovich

I am having some problems building this with ENABLE_CUDA=on:

```
[ 47%] Linking CXX static library libthrift_handler.a
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntime.cpp(124): error: identifier "fixed_width_int_decode_noinline" is undefined
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntimeGpu.cu(40): error: no suitable conversion function from "const JoinColumn" to "int32_t" exists
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntimeGpu.cu(41): error: no suitable user-defined conversion from "const JoinColumnTypeInfo" to "JoinColumn" exists
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntimeGpu.cu(42): error: no suitable constructor exists to convert from "long" to "JoinColumnTypeInfo"
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntimeGpu.cu(44): warning: integer conversion resulted in a change of sign
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntimeGpu.cu(45): error: argument of type "int" is incompatible with parameter of type "const void *"
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntimeGpu.cu(46): error: argument of type "int64_t" is incompatible with parameter of type "const void *"
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntimeGpu.cu(46): error: too few arguments in function call
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntimeGpu.cu(140): error: too few arguments in function call
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntimeGpu.cu(230): error: identifier "fill_row_ids_gpu" is undefined
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntimeGpu.cu(274): error: identifier "fill_row_ids_bucketized_gpu" is undefined
[ 47%] Built target thrift_handler
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntime.cpp(838): error: identifier "payload" is undefined
/home/alexb/Projects/omniscidb/QueryEngine/HashJoinRuntime.cpp(838): error: identifier "payload_count" is undefined
12 errors detected in the compilation of "/tmp/tmpxft_000441b4_00000000-6_HashJoinRuntimeGpu.cpp1.ii".
make[2]: *** [QueryEngine/CMakeFiles/QueryEngine.dir/build.make:96: QueryEngine/HashJoinRuntimeGpu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 48%] Linking CXX static library libChunk.a
[ 48%] Built target Chunk
make[1]: *** [CMakeFiles/Makefile2:1379: QueryEngine/CMakeFiles/QueryEngine.dir/all] Error 2
make: *** [Makefile:163: all] Error 2
```

Were you able to build with CUDA enabled?

@ienkovich
Copy link
Contributor Author

The patch was tested on a machine without CUDA enabled. I'll fix the CUDA build.

@alexbaden
Copy link
Contributor

If you have the CUDA drivers but no GPUs, you should still be able to build -- we need to make some modifications to our unit test framework to pass on a CUDA build with no GPUs available. If you don't want to pollute your env, nvidia-docker is a good option -- we have pre-built containers here: https://hub.docker.com/r/omnisci/core-build-centos7-cuda10.0

@ienkovich
Copy link
Contributor Author

The CUDA build should be fixed now, but I found another issue: in some cases the wrong payload is loaded into a hash table. I haven't found the reason yet. Will look into it.

@ienkovich
Copy link
Contributor Author

Fixed the issue with the wrong payload being loaded.

@alexbaden
Copy link
Contributor

This looks good. I've merged all the commits from our internal repo over, and this repo is now fully up to date. Do you mind rebasing and fixing the conflicts? After that we can get it merged.

@ienkovich
Copy link
Contributor Author

Thanks for the review! I fixed the conflicts.

@asuhan
Copy link
Contributor

asuhan commented Sep 25, 2019

@alexbaden Let's check that the IR is unchanged (if the flag is not set) before merging.

@ienkovich
Copy link
Contributor Author

I compared the generated IRs (using the IR log channel) for the TPC-H queries and found no difference.

@intel-ai intel-ai closed this by deleting the head repository Apr 1, 2024