Hash probe side spilling support #8894

xiaoxmeng · 2024-02-28T18:52:14Z

Add spilling support on the hash probe side to handle memory arbitration requests that arrive
after the build operators have built the hash table and it is being processed by the
probe side. We leverage the existing spilling facility built into the hash join bridge to support
this, with the following extensions made to the probe side (build side support and the join bridge
extension have already landed):
(1) make hash probe operators wait for their peers when they finish processing the
current probe inputs (either from source or previously spilled input), regardless of whether the join has
more spilled data to process. This handles the edge case where spilling is
triggered at some slow probe operators and we need all the probe operators to be present
to handle the split hash table and the remaining steps. This is due to a limitation of the current
allPeersFinished implementation, which expects all the drivers to be present in the pipeline to
function;
(2) add a reclaim() method to interface with memory arbitration. It checks whether a probe operator
is spillable: the table must have been set and have data, and the input spiller must not already be set to spill the input,
as we don't support recursive input spill (which will never be the case for now: if build has triggered
a spill, it spills all the partitions, so the probe side always has an empty table if it
needs to spill the input);
(3) add an output spiller to spill the output produced by the current pending input. We parallelize the
output spill with one thread per probe operator;
(4) if any of the probe operators still has input to process (it hasn't received the no-more-input
signal), then we have to spill the built hash table, and we parallelize this with one thread per
sub-hash table;
(5) free the memory held by the spilled hash table;
(6) set up the input spiller for the rest of the probe inputs.
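
The reclaim flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ProbeOp`, `JoinState`, `Step`, `canReclaim` and `planReclaim` are hypothetical names, not Velox APIs. It also assumes that the hash table is spilled only when some probe operator has not yet received the no-more-input signal, and that freeing the table and setting up the input spiller happen only in that case.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names throughout; none of these are Velox types.
enum class Step { SpillOutput, SpillTable, FreeTable, SetupInputSpiller };

// Simplified per-operator state.
struct ProbeOp {
  bool inputSpillerSet = false;   // already spilling probe input
  bool hasPendingOutput = false;  // current input still has output to produce
  bool noMoreInput = false;       // received the no-more-input signal
};

// Simplified state shared by all probe operators of one join.
struct JoinState {
  bool tableSet = false;      // hash table handed over by the build side
  bool tableHasData = false;  // table holds at least one row
  std::vector<ProbeOp> ops;
};

// Spillable only if there is a non-empty table and no operator is already
// spilling its input (recursive input spill is not supported).
bool canReclaim(const JoinState& join) {
  if (!join.tableSet || !join.tableHasData) {
    return false;
  }
  for (const ProbeOp& op : join.ops) {
    if (op.inputSpillerSet) {
      return false;
    }
  }
  return true;
}

// Returns the ordered steps a reclaim would take; empty if not spillable.
std::vector<Step> planReclaim(const JoinState& join) {
  assert(!join.ops.empty());  // a join has at least one probe operator
  std::vector<Step> steps;
  if (!canReclaim(join)) {
    return steps;
  }
  bool anyPendingOutput = false;
  bool anyMoreInput = false;
  for (const ProbeOp& op : join.ops) {
    anyPendingOutput = anyPendingOutput || op.hasPendingOutput;
    anyMoreInput = anyMoreInput || !op.noMoreInput;
  }
  if (anyPendingOutput) {
    steps.push_back(Step::SpillOutput);
  }
  if (anyMoreInput) {
    // Assumption: these three steps only apply when the table is spilled.
    steps.push_back(Step::SpillTable);
    steps.push_back(Step::FreeTable);
    steps.push_back(Step::SetupInputSpiller);
  }
  return steps;
}
```

The sketch only plans the steps. In the PR itself, the output spill is parallelized per probe operator and the table spill per sub-hash table.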

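The peer-wait requirement in the first item can be pictured with a small counting barrier. This is a simplified stand-in, not Velox's allPeersFinished: `PeerBarrier` and `arrive()` are hypothetical names, and here the non-last peers just get `false` back rather than being blocked until the last peer arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical barrier; not the Velox implementation.
class PeerBarrier {
 public:
  explicit PeerBarrier(std::size_t numPeers) : numPeers_(numPeers) {
    assert(numPeers > 0);
  }

  // Called by a probe operator when it finishes its current probe input.
  // Returns true only for the last peer to arrive, which would then do the
  // shared follow-up work. The counter resets for the next round.
  bool arrive() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (++arrived_ < numPeers_) {
      return false;
    }
    arrived_ = 0;
    return true;
  }

 private:
  const std::size_t numPeers_;
  std::mutex mutex_;
  std::size_t arrived_ = 0;
};
```

A barrier of this shape only fires if every peer arrives, which is why each probe operator has to wait here even when the join has no more spilled data to process.
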
Unit tests are added to cover different spilling scenarios, and we will run the join fuzzer with spilling,
OOM injection and query abort injection.

netlify · 2024-02-28T18:52:35Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`a614026`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/66066f46d5cec70008b8ab00

Reviewed By: bikramSingh91, oerling
Differential Revision: D55054964
Pulled By: xiaoxmeng

facebook-github-bot · 2024-03-28T06:06:03Z