Hash probe side spilling support #8894
Summary: Add spilling support on the hash probe side to handle memory arbitration requests that arrive after the build operators have built the hash table and while the probe side is processing it. This reuses the existing spilling facility in the hash join bridge and extends the probe side as follows (build side support and the join bridge extension have already landed):
(1) Make each hash probe operator wait for its peers when it finishes processing the current probe input (either from the source or from previously spilled input), whether or not the join has more spilled data to process. This handles the edge case where spilling is triggered while some probe operators are slow: all probe operators must be present to handle the spilled hash table and the remaining steps. The wait is needed because the current allPeersFinished implementation expects all drivers to be present in the pipeline.
(2) Add a reclaim() method to interface with memory arbitration. It checks whether a probe operator is spillable: the table must have been set and must contain data, and the input spiller must not already be set, since recursive input spill is not supported. (The latter never happens for now: if the build side has triggered spill, it spills all partitions, so the probe side always has an empty table when it needs to spill input.)
(3) Add an output spiller to spill the output produced by the current pending input. Output spilling is parallelized with one thread per probe operator.
(4) If any probe operator may still have input to process (it has not received the no-more-input signal), spill the built hash table, parallelized with one thread per sub-hash table.
(5) Free the memory held by the spilled hash table.
(6) Set up the input spiller for the rest of the probe input.
Unit tests cover different spilling scenarios. The join fuzzer will be run with spilling, OOM injection, and query abort injection.
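The reclaim decision described in the summary can be sketched roughly as follows. This is a standalone illustration, not the Velox implementation: `ProbeState`, `Step`, `canReclaim`, and `planReclaim` are hypothetical names, and the sketch assumes that the table spill, memory release, and input spiller setup only happen when some operator has not yet received the no-more-input signal.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified state of one probe operator. The field names are
// illustrative only and are not taken from the Velox source.
struct ProbeState {
  bool tableSet = false;          // the build side has handed over a table
  bool tableHasData = false;      // the table holds at least one row
  bool inputSpillerSet = false;   // probe input is already being spilled
  bool hasPendingOutput = false;  // current input still has output to produce
  bool noMoreInput = false;       // the no-more-input signal was received
};

// Hypothetical labels for the actions a reclaim may take, in order.
enum class Step { kSpillOutput, kSpillTable, kFreeTable, kSetupInputSpiller };

// An operator is spillable only if it has a non-empty table and has not
// started spilling input (recursive input spill is not supported).
bool canReclaim(const ProbeState& op) {
  return op.tableSet && op.tableHasData && !op.inputSpillerSet;
}

// Returns the ordered actions for one reclaim across all probe operators,
// or an empty plan if any operator is not spillable (an assumption of this
// sketch).
std::vector<Step> planReclaim(const std::vector<ProbeState>& ops) {
  assert(!ops.empty() && "expects at least one probe operator");
  bool anyPendingOutput = false;
  bool anyMoreInput = false;
  for (const ProbeState& op : ops) {
    if (!canReclaim(op)) {
      return {};
    }
    anyPendingOutput = anyPendingOutput || op.hasPendingOutput;
    anyMoreInput = anyMoreInput || !op.noMoreInput;
  }
  std::vector<Step> steps;
  if (anyPendingOutput) {
    // In the PR this runs with one thread per probe operator.
    steps.push_back(Step::kSpillOutput);
  }
  if (anyMoreInput) {
    // In the PR the table spill runs with one thread per sub-hash table.
    steps.push_back(Step::kSpillTable);
    steps.push_back(Step::kFreeTable);
    steps.push_back(Step::kSetupInputSpiller);
  }
  return steps;
}
```

With two operators where one still expects input, the plan contains all four steps in order; if any operator already has an input spiller set, the plan is empty and nothing is reclaimed.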
Reviewed By: bikramSingh91, oerling
Differential Revision: D55054964
Pulled By: xiaoxmeng
This pull request was exported from Phabricator. Differential Revision: D55054964
@xiaoxmeng merged this pull request in 2ea66c6.
Conbench analyzed the 1 benchmark run on commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Pull Request resolved: facebookincubator#8894
Reviewed By: bikramSingh91, oerling
Differential Revision: D55054964
Pulled By: xiaoxmeng
fbshipit-source-id: 8ad361c2e0e5bf3e88b5b719bcc323e8e7d4f276