Use Predicates with Uncorrelated Subqueries for Dynamic Pruning #2588

Merged
merged 40 commits into from
Aug 25, 2023
Changes from 6 commits
34bc947
[WIP] subquery pruning
dey4ss Apr 3, 2023
e63f675
ensure deep_copy works for LQPNodes and Operators
dey4ss Apr 3, 2023
eed2c18
set lqp_node for deep copied operators
dey4ss Apr 3, 2023
ec47d98
only count additionally pruned chunks in description
dey4ss Apr 3, 2023
0346490
move visit_tasks to utility
dey4ss Apr 4, 2023
b58657b
refactor
dey4ss Apr 4, 2023
b387197
add some tests
dey4ss Apr 4, 2023
df2b05b
ensure acyclic task graphs
dey4ss Apr 4, 2023
4d828c6
minor
dey4ss Apr 4, 2023
414b92b
add some tests
dey4ss Apr 5, 2023
95163af
more tests
dey4ss Apr 5, 2023
28ca64b
test subquery not executed for subquery pruning
dey4ss Apr 5, 2023
dc6776e
polish
dey4ss Apr 5, 2023
6d9f719
add prunable subquery predicates to hash and equality of StoredTableNode
dey4ss May 11, 2023
2017f3e
Merge branch 'master' into dey4ss/subquery_pruning
dey4ss Jun 15, 2023
ec9eb5e
add include
dey4ss Jun 15, 2023
455465e
merge
dey4ss Jun 28, 2023
2c15006
add include
dey4ss Jun 28, 2023
247966d
simplify StoredTableNode equality check
dey4ss Jun 28, 2023
18950bc
Merge branch 'dey4ss/subquery_pruning' of https://github.com/hyrise/h…
dey4ss Jun 28, 2023
83d6b84
unused var
dey4ss Jun 30, 2023
f3d482d
merge, merge, merge
dey4ss Jul 17, 2023
e2a6533
-.-
dey4ss Jul 17, 2023
7aa6f08
merge
dey4ss Jul 27, 2023
79b443a
some feedback
dey4ss Jul 27, 2023
697f688
helper for prunable subquery mapping
dey4ss Jul 27, 2023
567e81d
should not code late
dey4ss Jul 28, 2023
67c0b1e
wtf gcc o.O
dey4ss Jul 28, 2023
dd39b51
I think for my next paper, I just need to copy Hyrise comments.
dey4ss Aug 1, 2023
f759622
minor
dey4ss Aug 9, 2023
bf26aa8
merge
dey4ss Aug 11, 2023
5416ede
review
dey4ss Aug 14, 2023
3bc22b7
sacre bleu
dey4ss Aug 14, 2023
00f8538
merge
dey4ss Aug 15, 2023
196f3c5
remove code paths for potential task cycles
dey4ss Aug 15, 2023
d9c59c2
where is my mind?
dey4ss Aug 16, 2023
e514e72
Merge branch 'master' into dey4ss/subquery_pruning
dey4ss Aug 17, 2023
914b027
Merge branch 'master' into dey4ss/subquery_pruning
dey4ss Aug 17, 2023
634e979
memory leak? who said memory leak...?
dey4ss Aug 23, 2023
c40c8b0
trigger
dey4ss Aug 23, 2023
4 changes: 2 additions & 2 deletions scripts/test/hyriseBenchmarkStarSchema_test.py
@@ -6,8 +6,8 @@
def main():
build_dir = initialize()

# Run SSB and validate its output using pexpect and check if all queries were successfully verified with sqlite.
arguments = dict()
# Run SSB, validate its output using pexpect, and check if all queries were successfully verified with sqlite.
arguments = {}
arguments["--queries"] = "'1.1,1.2,2.2,3.3'"
arguments["--scale"] = "0.01"
arguments["--time"] = "10"
1 change: 1 addition & 0 deletions src/lib/CMakeLists.txt
@@ -611,6 +611,7 @@ set(
utils/lossless_predicate_cast.cpp
utils/lossless_predicate_cast.hpp
utils/make_bimap.hpp
utils/map_prunable_subquery_predicates.hpp
utils/meta_table_manager.cpp
utils/meta_table_manager.hpp
utils/meta_tables/abstract_meta_table.cpp
5 changes: 3 additions & 2 deletions src/lib/expression/lqp_subquery_expression.cpp
@@ -82,8 +82,9 @@ bool LQPSubqueryExpression::_shallow_equals(const AbstractExpression& expression
}

size_t LQPSubqueryExpression::_shallow_hash() const {
// Return 0, thus forcing a hash collision for LQPSubqueryExpressions and triggering a full equality check. Since we
// often hash full query plans (that do not contain many LQPSubqueryExpressions), this should be fine.
// Return AbstractExpression::_shallow_hash() (a.k.a. 0), thus forcing a hash collision for LQPSubqueryExpressions and
// triggering a full equality check. Though we often hash entire query plans, we expect most plans to contain only few
// LQPSubqueryExpressions. Thus, these hash collisions should be fine.
return AbstractExpression::_shallow_hash();
}
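The forced-collision idea in the comment above can be illustrated with a minimal, self-contained sketch. It does not use Hyrise's classes: `ToyExpression`, `ConstantHash`, and `count_distinct` are hypothetical names made up for this illustration. A hash functor that always returns 0 puts every element into the same bucket, so the container has to rely on `operator==` to tell elements apart.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an expression that is expensive to hash (not a Hyrise class).
struct ToyExpression {
  std::string description;

  bool operator==(const ToyExpression& other) const {
    return description == other.description;
  }
};

// Constant hash: all ToyExpressions collide, so every lookup falls back to operator==.
struct ConstantHash {
  std::size_t operator()(const ToyExpression& /*expression*/) const {
    return 0;
  }
};

// Counts distinct expressions. Correctness depends only on operator==, never on the hash value.
std::size_t count_distinct(const std::vector<ToyExpression>& expressions) {
  auto unique_expressions = std::unordered_set<ToyExpression, ConstantHash>{};
  for (const auto& expression : expressions) {
    unique_expressions.insert(expression);
  }
  assert(unique_expressions.size() <= expressions.size());
  return unique_expressions.size();
}
```

The result stays correct, but each lookup degrades to a linear scan over the colliding elements. That trade-off is only acceptable when such elements are rare, which is the assumption the comment states.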

33 changes: 5 additions & 28 deletions src/lib/logical_query_plan/abstract_lqp_node.cpp
@@ -17,6 +17,7 @@
#include "predicate_node.hpp"
#include "update_node.hpp"
#include "utils/assert.hpp"
#include "utils/map_prunable_subquery_predicates.hpp"
#include "utils/print_utils.hpp"

namespace {
@@ -242,34 +243,10 @@ size_t AbstractLQPNode::output_count() const {
std::shared_ptr<AbstractLQPNode> AbstractLQPNode::deep_copy(LQPNodeMapping node_mapping) const {
const auto copy = _deep_copy_impl(node_mapping);

// Predicates that contain uncorrelated subqueries cannot be used for chunk pruning in the optimization phase since we
// do not know the predicate value yet. However, the ChunkPruningRule attaches the corresponding PredicateNodes to the
// StoreTableNode of the table the predicates are performed on. We attach the translated Predicates (i.e., TableScans)
// to the GetTable operators so they can use them for pruning during execution, when the subqueries might have already
// been executed and the predicate value is known. During a deep_copy, we must set the copied PredicateNodes as
// prunable subquery predicates of the StoredTableNode after copying: Due to the recursion into the inputs of each
// LQP node, the PredicateNodes are copied after the StoredTableNodes.
for (const auto& [node, node_copy] : node_mapping) {
if (node->type != LQPNodeType::StoredTable) {
continue;
}

const auto& stored_table_node = static_cast<const StoredTableNode&>(*node);
const auto& prunable_subquery_predicates = stored_table_node.prunable_subquery_predicates();
if (prunable_subquery_predicates.empty()) {
continue;
}

// Find the copies of the original PredicateNodes and set them as prunable subquery predicates of the
// StoredTableNode copy.
auto prunable_subquery_predicates_copy = std::vector<std::weak_ptr<AbstractLQPNode>>{};
prunable_subquery_predicates_copy.reserve(prunable_subquery_predicates.size());
for (const auto& predicate_node : prunable_subquery_predicates) {
DebugAssert(node_mapping.contains(predicate_node), "Could not find referenced node. LQP is invalid.");
prunable_subquery_predicates_copy.emplace_back(node_mapping.at(predicate_node));
}
static_cast<StoredTableNode&>(*node_copy).set_prunable_subquery_predicates(prunable_subquery_predicates_copy);
}
// StoredTableNodes can store references to PredicateNodes as prunable subquery predicates (see get_table.hpp for
// details). We must assign the copies of these PredicateNodes after copying the entire LQP (see
// map_prunable_subquery_predicates.hpp).
map_prunable_subquery_predicates(node_mapping);

return copy;
}
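The body of `map_prunable_subquery_predicates` is not part of this excerpt. As a rough sketch of the pattern the comment describes (copy the whole plan first, then re-point the non-owning references at the copies), the following uses a hypothetical `ToyNode` type rather than Hyrise's LQP classes. All names here are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical plan node: owns its input and may weakly reference nodes further up the plan.
struct ToyNode {
  std::shared_ptr<ToyNode> input;
  std::vector<std::weak_ptr<ToyNode>> referenced_nodes;
};

using ToyNodeMapping = std::unordered_map<std::shared_ptr<const ToyNode>, std::shared_ptr<ToyNode>>;

// Copies inputs first, so a leaf is copied before the nodes above it that it references.
std::shared_ptr<ToyNode> copy_recursively(const std::shared_ptr<const ToyNode>& node, ToyNodeMapping& mapping) {
  if (!node) {
    return nullptr;
  }
  if (const auto iter = mapping.find(node); iter != mapping.end()) {
    return iter->second;
  }
  auto copy = std::make_shared<ToyNode>();
  copy->input = copy_recursively(node->input, mapping);
  mapping.emplace(node, copy);
  return copy;
}

// Second pass: every referenced original now has a copy, so the references can be translated.
void remap_references(const ToyNodeMapping& mapping) {
  for (const auto& [original, copy] : mapping) {
    for (const auto& reference : original->referenced_nodes) {
      const auto referenced = reference.lock();
      assert(referenced && mapping.count(referenced) && "Referenced node is not part of the copied plan.");
      copy->referenced_nodes.emplace_back(mapping.at(referenced));
    }
  }
}

std::shared_ptr<ToyNode> deep_copy(const std::shared_ptr<const ToyNode>& root) {
  auto mapping = ToyNodeMapping{};
  auto copy = copy_recursively(root, mapping);
  remap_references(mapping);
  return copy;
}
```

The references are `std::weak_ptr`s because the referenced node already owns the referencing node through `input`. An owning pointer in the other direction would create a reference cycle.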
32 changes: 5 additions & 27 deletions src/lib/logical_query_plan/lqp_translator.cpp
@@ -67,39 +67,17 @@
#include "union_node.hpp"
#include "update_node.hpp"
#include "utils/column_pruning_utils.hpp"
#include "utils/map_prunable_subquery_predicates.hpp"

namespace hyrise {

std::shared_ptr<AbstractOperator> LQPTranslator::translate_node(const std::shared_ptr<AbstractLQPNode>& node) const {
const auto pqp = _translate_node_recursively(node);

// Predicates that contain uncorrelated subqueries cannot be used for chunk pruning in the optimization phase since we
// do not know the predicate value yet. However, the ChunkPruningRule attaches the corresponding PredicateNodes to the
// StoreTableNode of the table the predicates are performed on. We attach the translated Predicates (i.e., TableScans)
// to the GetTable operators so they can use them for pruning during execution, when the subqueries might have already
// been executed and the predicate value is known. We must set the PredicateNodes after translation: Due to the
// recursion into the inputs of each LQP node, the PredicateNodes are translated after the StoredTableNodes.
for (const auto& [_, op] : _operator_by_lqp_node) {
if (op->type() != OperatorType::GetTable) {
continue;
}

DebugAssert(op->lqp_node->type == LQPNodeType::StoredTable, "Traslated GetTable operator from wrong LQP node.");
const auto& stored_table_node = static_cast<const StoredTableNode&>(*op->lqp_node);
const auto& prunable_lqp_predicates = stored_table_node.prunable_subquery_predicates();
if (prunable_lqp_predicates.empty()) {
continue;
}

auto prunable_pqp_predicates = std::vector<std::weak_ptr<const AbstractOperator>>{};
prunable_pqp_predicates.reserve(prunable_lqp_predicates.size());
for (const auto& predicate : prunable_lqp_predicates) {
DebugAssert(_operator_by_lqp_node.contains(predicate), "Could not find referenced node. LQP/PQP is invalid.");
prunable_pqp_predicates.emplace_back(_operator_by_lqp_node.at(predicate));
}

static_cast<GetTable&>(*op).set_prunable_subquery_scans(prunable_pqp_predicates);
}
// StoredTableNodes can store references to PredicateNodes as prunable subquery predicates (see get_table.hpp for
// details). We must assign the TableScans translated from these PredicateNodes after translating the entire LQP (see
// map_prunable_subquery_predicates.hpp).
map_prunable_subquery_predicates(_operator_by_lqp_node);

return pqp;
}
10 changes: 5 additions & 5 deletions src/lib/logical_query_plan/stored_table_node.cpp
@@ -181,18 +181,18 @@ size_t StoredTableNode::_on_shallow_hash() const {
for (const auto& pruned_column_id : _pruned_column_ids) {
boost::hash_combine(hash, static_cast<size_t>(pruned_column_id));
}
// We intentionally firce a hash collision for StoredTableNodes with the same number of (but different) prunable
// subquery predicates. Since we assume that (i) these predicates are not often set and (ii) we do hash LQPs often,
// this reduces the hash overhead, makes the code simpler, and triggers an in-depth equality check for the rare cases
// with prunable subquery predicates.
// We intentionally force a hash collision for StoredTableNodes with the same number of prunable subquery predicates
// even though these predicates are different. Since we assume that (i) these predicates are not often set and (ii) we
// hash LQPs often, this reduces the hash overhead, makes the code simpler, and triggers an in-depth equality check
// for the rare cases with (the same number of) prunable subquery predicates.
boost::hash_combine(hash, _prunable_subquery_predicates.size());
return hash;
}

std::shared_ptr<AbstractLQPNode> StoredTableNode::_on_shallow_copy(LQPNodeMapping& /*node_mapping*/) const {
// We cannot copy _prunable_subquery_predicated here since deep_copy() recurses into the input nodes and the
// StoredTableNodes are the first ones to be copied. Instead, AbstractLQPNode::deep_copy() sets the copied
// PredicateNodes after the whole LQP has been copied.
// PredicateNodes after the entire LQP has been copied.
const auto copy = make(table_name);
copy->set_pruned_chunk_ids(_pruned_chunk_ids);
copy->set_pruned_column_ids(_pruned_column_ids);
33 changes: 5 additions & 28 deletions src/lib/operators/abstract_operator.cpp
@@ -13,6 +13,7 @@
#include "utils/assert.hpp"
#include "utils/format_bytes.hpp"
#include "utils/format_duration.hpp"
#include "utils/map_prunable_subquery_predicates.hpp"
#include "utils/print_utils.hpp"
#include "utils/timer.hpp"

@@ -201,34 +202,10 @@ std::shared_ptr<AbstractOperator> AbstractOperator::deep_copy() const {
auto copied_ops = std::unordered_map<const AbstractOperator*, std::shared_ptr<AbstractOperator>>{};
const auto copy = deep_copy(copied_ops);

// Predicates that contain uncorrelated subqueries cannot be used for chunk pruning in the optimization phase since we
// do not know the predicate value yet. However, the ChunkPruningRule attaches the corresponding PredicateNodes to the
// StoreTableNode of the table the predicates are performed on. We attach the translated Predicates (i.e., TableScans)
// to the GetTable operators so they can use them for pruning during execution, when the subqueries might have already
// been executed and the predicate value is known. During a deep_copy, we must set the copied TableScan operators as
// prunable subquery scans of the GetTable operator after copying: Due to the recursion into the inputs of each
// operator, the TableSans are copied after the GetTable operators.
for (const auto& [op, op_copy] : copied_ops) {
if (op->type() != OperatorType::GetTable) {
continue;
}

const auto& get_table = static_cast<const GetTable&>(*op);
const auto& prunable_subquery_scans = get_table.prunable_subquery_scans();
if (prunable_subquery_scans.empty()) {
continue;
}

// Find the copies of the original TableScans and set them as prunable subquery scans of the GetTable copy.
auto prunable_subquery_scans_copy = std::vector<std::weak_ptr<const AbstractOperator>>{};
prunable_subquery_scans_copy.reserve(prunable_subquery_scans.size());
for (const auto& table_scan : prunable_subquery_scans) {
DebugAssert(copied_ops.contains(table_scan.get()), "Could not find referenced operator. PQP is invalid.");
prunable_subquery_scans_copy.emplace_back(copied_ops.at(table_scan.get()));
}

static_cast<GetTable&>(*op_copy).set_prunable_subquery_scans(prunable_subquery_scans_copy);
}
// GetTable operators can store references to TableScans as prunable subquery predicates (see get_table.hpp for
// details). We must assign the copies of these TableScans after copying the entire PQP (see
// map_prunable_subquery_predicates.hpp).
map_prunable_subquery_predicates(copied_ops);

return copy;
}
41 changes: 28 additions & 13 deletions src/lib/operators/get_table.cpp
@@ -81,15 +81,16 @@ const std::vector<ColumnID>& GetTable::pruned_column_ids() const {
return _pruned_column_ids;
}

void GetTable::set_prunable_subquery_scans(std::vector<std::weak_ptr<const AbstractOperator>> subquery_scans) const {
void GetTable::set_prunable_subquery_predicates(
const std::vector<std::weak_ptr<const AbstractOperator>>& subquery_scans) const {
DebugAssert(std::all_of(subquery_scans.cbegin(), subquery_scans.cend(),
[](const auto& op) { return op.lock() && op.lock()->type() == OperatorType::TableScan; }),
"No TableScan set as prunable predicate.");

_prunable_subquery_scans = subquery_scans;
}

std::vector<std::shared_ptr<const AbstractOperator>> GetTable::prunable_subquery_scans() const {
std::vector<std::shared_ptr<const AbstractOperator>> GetTable::prunable_subquery_predicates() const {
auto subquery_scans = std::vector<std::shared_ptr<const AbstractOperator>>{};
subquery_scans.reserve(_prunable_subquery_scans.size());
for (const auto& subquery_scan_ref : _prunable_subquery_scans) {
@@ -138,10 +139,10 @@ std::shared_ptr<const Table> GetTable::_on_execute() {
// flag, too, it needs to be forwarded here; otherwise it would be completely invisible in the PQP.
DebugAssert(stored_table->value_clustered_by().empty(), "GetTable does not forward value_clustered_by");
auto overall_pruned_chunk_ids = _prune_chunks_dynamically();
auto excluded_chunk_ids = std::vector<ChunkID>{};
overall_pruned_chunk_ids.insert(_pruned_chunk_ids.cbegin(), _pruned_chunk_ids.cend());
auto pruned_chunk_ids_iter = overall_pruned_chunk_ids.begin();
for (ChunkID stored_chunk_id{0}; stored_chunk_id < chunk_count; ++stored_chunk_id) {
auto excluded_chunk_ids = std::vector<ChunkID>{};
for (auto stored_chunk_id = ChunkID{0}; stored_chunk_id < chunk_count; ++stored_chunk_id) {
// Check whether the Chunk is pruned
if (pruned_chunk_ids_iter != overall_pruned_chunk_ids.end() && *pruned_chunk_ids_iter == stored_chunk_id) {
++pruned_chunk_ids_iter;
@@ -315,18 +316,20 @@ std::set<ChunkID> GetTable::_prune_chunks_dynamically() {
return {};
}

// Create a dummy PredicateNode for each predicate containing a subquery that has already been executed. We do not use
// the original predicate to ignore all other nodes between the StoredTableNode and the PredicateNodes. Since the
// ChunkPruningRule already took care to add only predicates that are safe to prune with, we can act as if there were
// no other LQP nodes.
auto prunable_predicate_nodes = std::vector<std::shared_ptr<PredicateNode>>{};
prunable_predicate_nodes.reserve(_prunable_subquery_scans.size());

// Create a dummy StoredTableNode from the table to retrieve. `compute_chunk_exclude_list` modifies the node's
// statistics and we want to avoid that. We cannot use `deep_copy()` here since it would complain that the referenced
// prunable PredicateNodes are not part of the LQP.
const auto& stored_table_node = static_cast<const StoredTableNode&>(*lqp_node);
const auto dummy_stored_table_node = StoredTableNode::make(_name);

// Create a dummy PredicateNode for each predicate containing a subquery that has already been executed. We do not use
// the original predicate to ignore any other nodes in between. Since the ChunkPruningRule already took care to add
// only predicates that are safe to prune with, we can act as if there were no other LQP nodes.
auto prunable_predicate_nodes = std::vector<std::shared_ptr<PredicateNode>>{};
prunable_predicate_nodes.reserve(_prunable_subquery_scans.size());
for (const auto& op : prunable_subquery_scans()) {
for (const auto& op : prunable_subquery_predicates()) {
const auto& table_scan = static_cast<const TableScan&>(*op);
const auto& operator_predicate_arguments = table_scan.predicate()->arguments;
const auto& predicate_node = static_cast<const PredicateNode&>(*table_scan.lqp_node);
@@ -340,7 +343,7 @@ std::set<ChunkID> GetTable::_prune_chunks_dynamically() {
// Replace any column with the respective column from our dummy StoredTableNode.
if (const auto lqp_column = std::dynamic_pointer_cast<LQPColumnExpression>(argument)) {
Assert(*lqp_column->original_node.lock() == stored_table_node,
"Predicate is performed on wrong StoredTableNode");
"Predicate is performed on wrong StoredTableNode.");
argument = lqp_column_(dummy_stored_table_node, lqp_column->original_column_id);
continue;
}
@@ -350,15 +353,27 @@ std::set<ChunkID> GetTable::_prune_chunks_dynamically() {
continue;
}
Assert(operator_predicate_arguments[expression_idx]->type == ExpressionType::PQPSubquery,
"Cannot resolve PQPSubqueryExpression");
"Cannot resolve PQPSubqueryExpression.");
const auto& subquery = static_cast<PQPSubqueryExpression&>(*operator_predicate_arguments[expression_idx]);
if (subquery.is_correlated()) {
continue;
}

// Ignore the subquery if it has not been executed yet. A reason might be that scheduling the subquery before the
// GetTable operator would create a cycle. For instance, this can happen for a query like this:
// SELECT ... FROM a_table WHERE x > (SELECT AVG(x) FROM a_table);
// SELECT * FROM a_table WHERE x > (SELECT AVG(x) FROM a_table);
// The PQP of the query looks like the following:
//
// [TableScan] x > SUBQUERY
// | *
// | * uncorrelated subquery
// | *
// | [AggregateHash] AVG(x)
// | /
// [GetTable] a_table
//
// We cannot schedule the AggregateHash operator before the GetTable operator to obtain the subquery result for
// pruning: the OperatorTasks wrapping both operators would be in a circular wait for each other.
if (subquery.pqp->state() != OperatorState::ExecutedAndAvailable) {
continue;
}
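To make the "skip subqueries that have not been executed" step concrete, here is a simplified sketch of dynamic pruning for a predicate of the form `x > (subquery)`. It is not the code of `_prune_chunks_dynamically()`: `ToyChunkStatistics`, `ToySubquery`, and the min/max check are assumptions made for illustration, and the sketch handles only the greater-than case.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Hypothetical per-chunk min/max statistics of column x.
struct ToyChunkStatistics {
  std::int32_t min;
  std::int32_t max;
};

// Hypothetical uncorrelated scalar subquery. The result is only set once the subquery has been executed.
struct ToySubquery {
  std::optional<std::int32_t> result;
};

// Returns the IDs of chunks that cannot contain a row satisfying x > result for some executed subquery.
std::set<std::uint32_t> prune_chunks_dynamically(const std::vector<ToyChunkStatistics>& chunk_statistics,
                                                 const std::vector<ToySubquery>& subqueries) {
  auto pruned_chunk_ids = std::set<std::uint32_t>{};
  for (const auto& subquery : subqueries) {
    // Not executed yet (e.g., executing it first would create a task cycle): ignore this predicate.
    if (!subquery.result) {
      continue;
    }
    for (auto chunk_id = std::uint32_t{0}; chunk_id < chunk_statistics.size(); ++chunk_id) {
      const auto& statistics = chunk_statistics[chunk_id];
      assert(statistics.min <= statistics.max);
      // If even the largest value does not exceed the subquery result, no row of the chunk can match.
      if (statistics.max <= *subquery.result) {
        pruned_chunk_ids.insert(chunk_id);
      }
    }
  }
  return pruned_chunk_ids;
}
```

Skipping an unexecuted subquery is always safe in this sketch: fewer chunks are pruned, and the scan itself still filters the rows, so the query result does not change.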
12 changes: 7 additions & 5 deletions src/lib/operators/get_table.hpp
@@ -28,11 +28,13 @@ class GetTable : public AbstractReadOnlyOperator {
const std::vector<ChunkID>& pruned_chunk_ids() const;
const std::vector<ColumnID>& pruned_column_ids() const;

// We cannot use predicates with uncorrelated subqueries to get pruned ChunkIDs during optimization. However, we can
// reference these predicates and keep track of them in the plan. Once we execute the plan, the subqueries might have
// already been executed, so we can use them for pruning during execution.
void set_prunable_subquery_scans(std::vector<std::weak_ptr<const AbstractOperator>> subquery_scans) const;
std::vector<std::shared_ptr<const AbstractOperator>> prunable_subquery_scans() const;
// Predicates that contain uncorrelated subqueries cannot be used for chunk pruning in the optimization phase since we
// do not know the predicate value yet. However, the ChunkPruningRule attaches the corresponding PredicateNodes to the
// StoredTableNode of the table the predicates are performed on. We attach the translated predicates (i.e.,
// TableScans) to the GetTable operators so they can use them for pruning during execution ("dynamic pruning"), when
// the subqueries might have already been executed and the predicate value is known.
void set_prunable_subquery_predicates(const std::vector<std::weak_ptr<const AbstractOperator>>& subquery_scans) const;
std::vector<std::shared_ptr<const AbstractOperator>> prunable_subquery_predicates() const;

protected:
std::shared_ptr<AbstractOperator> _on_deep_copy(