Use Predicates with Uncorrelated Subqueries for Dynamic Pruning #2588

Merged
merged 40 commits into from
Aug 25, 2023
Changes from 23 commits
34bc947
[WIP] subquery pruning
dey4ss Apr 3, 2023
e63f675
ensure deep_copy works for LQPNodes and Operators
dey4ss Apr 3, 2023
eed2c18
set lqp_node for deep copied operators
dey4ss Apr 3, 2023
ec47d98
only count additionally pruned chunks in description
dey4ss Apr 3, 2023
0346490
move visit_tasks to utility
dey4ss Apr 4, 2023
b58657b
refactor
dey4ss Apr 4, 2023
b387197
add some tests
dey4ss Apr 4, 2023
df2b05b
ensure acyclic task graphs
dey4ss Apr 4, 2023
4d828c6
minor
dey4ss Apr 4, 2023
414b92b
add some tests
dey4ss Apr 5, 2023
95163af
more tests
dey4ss Apr 5, 2023
28ca64b
test subquery not executed for subquery pruning
dey4ss Apr 5, 2023
dc6776e
polish
dey4ss Apr 5, 2023
6d9f719
add prunable subquery predicates to hash and equality of StoredTableNode
dey4ss May 11, 2023
2017f3e
Merge branch 'master' into dey4ss/subquery_pruning
dey4ss Jun 15, 2023
ec9eb5e
add include
dey4ss Jun 15, 2023
455465e
merge
dey4ss Jun 28, 2023
2c15006
add include
dey4ss Jun 28, 2023
247966d
simplify StoredTableNode equality check
dey4ss Jun 28, 2023
18950bc
Merge branch 'dey4ss/subquery_pruning' of https://github.com/hyrise/h…
dey4ss Jun 28, 2023
83d6b84
unused var
dey4ss Jun 30, 2023
f3d482d
merge, merge, merge
dey4ss Jul 17, 2023
e2a6533
-.-
dey4ss Jul 17, 2023
7aa6f08
merge
dey4ss Jul 27, 2023
79b443a
some feedback
dey4ss Jul 27, 2023
697f688
helper for prunable subquery mapping
dey4ss Jul 27, 2023
567e81d
should not code late
dey4ss Jul 28, 2023
67c0b1e
wtf gcc o.O
dey4ss Jul 28, 2023
dd39b51
I think for my next paper, I just need to copy Hyrise comments.
dey4ss Aug 1, 2023
f759622
minor
dey4ss Aug 9, 2023
bf26aa8
merge
dey4ss Aug 11, 2023
5416ede
review
dey4ss Aug 14, 2023
3bc22b7
sacre bleu
dey4ss Aug 14, 2023
00f8538
merge
dey4ss Aug 15, 2023
196f3c5
remove code paths for potential task cycles
dey4ss Aug 15, 2023
d9c59c2
where is my mind?
dey4ss Aug 16, 2023
e514e72
Merge branch 'master' into dey4ss/subquery_pruning
dey4ss Aug 17, 2023
914b027
Merge branch 'master' into dey4ss/subquery_pruning
dey4ss Aug 17, 2023
634e979
memory leak? who said memory leak...?
dey4ss Aug 23, 2023
c40c8b0
trigger
dey4ss Aug 23, 2023
4 changes: 2 additions & 2 deletions scripts/test/hyriseBenchmarkStarSchema_test.py
@@ -6,8 +6,8 @@
def main():
build_dir = initialize()

# RunSSB and validate its output using pexpect and check if all queries were successfully verified with sqlite.
arguments = {}
# Run SSB and validate its output using pexpect and check if all queries were successfully verified with sqlite.
arguments = dict()
arguments["--queries"] = "'1.1,1.2,2.2,3.3'"
arguments["--scale"] = "0.01"
arguments["--time"] = "10"
5 changes: 2 additions & 3 deletions src/lib/expression/lqp_subquery_expression.cpp
@@ -82,9 +82,8 @@ bool LQPSubqueryExpression::_shallow_equals(const AbstractExpression& expression
}

size_t LQPSubqueryExpression::_shallow_hash() const {
// Return 0, thus forcing a hash collision for LQPSubqueryExpressions and triggering a full equality check.
// TODO(moritz) LQP hashing will be introduced with the JoinOrdering optimizer, until then we live with these
// collisions
// Return 0, thus forcing a hash collision for LQPSubqueryExpressions and triggering a full equality check. Since we
// often hash full query plans (that do not contain many LQPSubqueryExpressions), this should be fine.
return AbstractExpression::_shallow_hash();
}
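
Note on the forced hash collision described in the comment above: the idea can be shown in isolation. This is a minimal sketch and not Hyrise code; `SubqueryLike`, `ConstantHash`, and `count_distinct` are hypothetical names made up for illustration. A hash function that maps every object to the same value remains correct for hash containers because lookups fall back to the equality check. It only costs performance when many such objects meet in one container.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an expression whose full hash would be expensive to compute.
struct SubqueryLike {
  std::string plan;  // Placeholder for the subquery's plan.

  bool operator==(const SubqueryLike& other) const { return plan == other.plan; }
};

// Every object hashes to the same value, so all objects land in the same bucket.
struct ConstantHash {
  std::size_t operator()(const SubqueryLike& /*expression*/) const { return 0; }
};

// Counts distinct elements. Correctness relies solely on operator== because the hash never discriminates.
inline std::size_t count_distinct(const std::vector<SubqueryLike>& expressions) {
  auto distinct = std::unordered_set<SubqueryLike, ConstantHash>{};
  for (const auto& expression : expressions) {
    distinct.insert(expression);
  }
  return distinct.size();
}
```

With few subquery expressions per plan, the single shared bucket stays short, which matches the reasoning given in the comment.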

33 changes: 32 additions & 1 deletion src/lib/logical_query_plan/abstract_lqp_node.cpp
@@ -240,7 +240,38 @@ size_t AbstractLQPNode::output_count() const {
}

std::shared_ptr<AbstractLQPNode> AbstractLQPNode::deep_copy(LQPNodeMapping node_mapping) const {
return _deep_copy_impl(node_mapping);
const auto copy = _deep_copy_impl(node_mapping);

// Predicates that contain uncorrelated subqueries cannot be used for chunk pruning in the optimization phase since we
// do not know the predicate value yet. However, the ChunkPruningRule attaches the corresponding PredicateNodes to the
// StoredTableNode of the table the predicates are performed on. We attach the translated predicates (i.e.,
// TableScans) to the GetTable operators so they can use them for pruning during execution, when the subqueries might
// have already been executed and the predicate value is known. During a deep_copy, we must set the copied
// PredicateNodes as prunable subquery predicates of the StoredTableNode after copying: Due to the recursion into the
// inputs of each LQP node, the PredicateNodes are copied after the StoredTableNodes.
for (const auto& [node, node_copy] : node_mapping) {
if (node->type != LQPNodeType::StoredTable) {
continue;
}

const auto& stored_table_node = static_cast<const StoredTableNode&>(*node);
const auto& prunable_subquery_predicates = stored_table_node.prunable_subquery_predicates();
if (prunable_subquery_predicates.empty()) {
continue;
}

// Find the copies of the original PredicateNodes and set them as prunable subquery predicates of the
// StoredTableNode copy.
auto prunable_subquery_predicates_copy = std::vector<std::weak_ptr<AbstractLQPNode>>{};
prunable_subquery_predicates_copy.reserve(prunable_subquery_predicates.size());
for (const auto& predicate_node : prunable_subquery_predicates) {
DebugAssert(node_mapping.contains(predicate_node), "Could not find referenced node. LQP is invalid.");
prunable_subquery_predicates_copy.emplace_back(node_mapping.at(predicate_node));
}
static_cast<StoredTableNode&>(*node_copy).set_prunable_subquery_predicates(prunable_subquery_predicates_copy);
}

return copy;
}
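
Note on the two-pass copy above: the pattern of copying the structure first and redirecting back references afterwards can be sketched independently of Hyrise. The `Node` type, `copy_structure`, and this `deep_copy` are hypothetical simplifications (one input per node, raw-pointer keys in the mapping) and do not reflect the actual `LQPNodeMapping` API.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified plan node: one optional input plus non-owning references to nodes further up the plan.
struct Node {
  std::shared_ptr<Node> input;
  std::vector<std::weak_ptr<Node>> references;
};

using NodeMapping = std::unordered_map<const Node*, std::shared_ptr<Node>>;

// Pass 1: copy the structure, inputs first. References stay empty because their targets sit above the current node
// and have not been copied yet.
inline std::shared_ptr<Node> copy_structure(const Node& node, NodeMapping& mapping) {
  auto input_copy = std::shared_ptr<Node>{};
  if (node.input) {
    input_copy = copy_structure(*node.input, mapping);
  }
  auto copy = std::make_shared<Node>();
  copy->input = input_copy;
  mapping.emplace(&node, copy);
  return copy;
}

// Pass 2: every node has a copy now, so each reference can be redirected to the copy of its target.
inline std::shared_ptr<Node> deep_copy(const Node& root) {
  auto mapping = NodeMapping{};
  const auto root_copy = copy_structure(root, mapping);
  for (const auto& [original, copy] : mapping) {
    for (const auto& reference : original->references) {
      const auto target = reference.lock();
      assert(target && mapping.count(target.get()) == 1 && "Referenced node is not part of the plan.");
      copy->references.emplace_back(mapping.at(target.get()));
    }
  }
  return root_copy;
}
```

The fix-up loop only reads the mapping, so the order in which it visits the nodes does not matter.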

bool AbstractLQPNode::shallow_equals(const AbstractLQPNode& rhs, const LQPNodeMapping& node_mapping) const {
32 changes: 31 additions & 1 deletion src/lib/logical_query_plan/lqp_translator.cpp
@@ -71,7 +71,37 @@
namespace hyrise {

std::shared_ptr<AbstractOperator> LQPTranslator::translate_node(const std::shared_ptr<AbstractLQPNode>& node) const {
return _translate_node_recursively(node);
const auto pqp = _translate_node_recursively(node);

// Predicates that contain uncorrelated subqueries cannot be used for chunk pruning in the optimization phase since we
// do not know the predicate value yet. However, the ChunkPruningRule attaches the corresponding PredicateNodes to the
// StoredTableNode of the table the predicates are performed on. We attach the translated predicates (i.e.,
// TableScans) to the GetTable operators so they can use them for pruning during execution, when the subqueries might
// have already been executed and the predicate value is known. We must set them after translation: Due to the
// recursion into the inputs of each LQP node, the PredicateNodes are translated after the StoredTableNodes.
for (const auto& [_, op] : _operator_by_lqp_node) {
if (op->type() != OperatorType::GetTable) {
continue;
}

DebugAssert(op->lqp_node->type == LQPNodeType::StoredTable, "Translated GetTable operator from wrong LQP node.");
const auto& stored_table_node = static_cast<const StoredTableNode&>(*op->lqp_node);
const auto& prunable_lqp_predicates = stored_table_node.prunable_subquery_predicates();
if (prunable_lqp_predicates.empty()) {
continue;
}

auto prunable_pqp_predicates = std::vector<std::weak_ptr<const AbstractOperator>>{};
prunable_pqp_predicates.reserve(prunable_lqp_predicates.size());
for (const auto& predicate : prunable_lqp_predicates) {
DebugAssert(_operator_by_lqp_node.contains(predicate), "Could not find referenced node. LQP/PQP is invalid.");
prunable_pqp_predicates.emplace_back(_operator_by_lqp_node.at(predicate));
}

static_cast<GetTable&>(*op).set_prunable_subquery_scans(prunable_pqp_predicates);
}

return pqp;
}
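
Note on what "pruning during execution" could look like once a subquery result is available: the following is a rough sketch under stated assumptions, not the GetTable implementation from this PR. It assumes per-chunk min/max statistics and an equality predicate `column = (subquery)`. `ChunkRange` and `prunable_chunks` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical per-chunk statistics: the smallest and largest value of the filtered column.
struct ChunkRange {
  int min;
  int max;
};

// Returns the indexes of chunks that cannot contain rows with `column = value`. If the subquery result is not
// available yet (std::nullopt), nothing can be pruned and all chunks must be read.
inline std::vector<std::size_t> prunable_chunks(const std::vector<ChunkRange>& chunks,
                                                const std::optional<int>& subquery_result) {
  auto pruned = std::vector<std::size_t>{};
  if (!subquery_result) {
    return pruned;
  }
  for (auto chunk_id = std::size_t{0}; chunk_id < chunks.size(); ++chunk_id) {
    const auto& range = chunks[chunk_id];
    if (*subquery_result < range.min || *subquery_result > range.max) {
      pruned.emplace_back(chunk_id);
    }
  }
  return pruned;
}
```

The `std::nullopt` branch mirrors the "might have already been executed" wording: pruning is an optimization and must degrade to a full read when the value is unknown.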

std::shared_ptr<AbstractOperator> LQPTranslator::_translate_node_recursively(
57 changes: 54 additions & 3 deletions src/lib/logical_query_plan/stored_table_node.cpp
@@ -69,6 +69,25 @@ const std::vector<ColumnID>& StoredTableNode::pruned_column_ids() const {
return _pruned_column_ids;
}

void StoredTableNode::set_prunable_subquery_predicates(
const std::vector<std::weak_ptr<AbstractLQPNode>>& predicate_nodes) {
DebugAssert(std::all_of(predicate_nodes.cbegin(), predicate_nodes.cend(),
[](const auto& node) { return node.lock() && node.lock()->type == LQPNodeType::Predicate; }),
"No PredicateNode set as prunable predicate.");
_prunable_subquery_predicates = predicate_nodes;
}

std::vector<std::shared_ptr<AbstractLQPNode>> StoredTableNode::prunable_subquery_predicates() const {
auto subquery_predicates = std::vector<std::shared_ptr<AbstractLQPNode>>{};
subquery_predicates.reserve(_prunable_subquery_predicates.size());
for (const auto& subquery_predicate_ref : _prunable_subquery_predicates) {
const auto& subquery_predicate = subquery_predicate_ref.lock();
Assert(subquery_predicate, "Referenced PredicateNode expired. LQP is invalid.");
subquery_predicates.emplace_back(subquery_predicate);
}
return subquery_predicates;
}
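
Note on the `std::weak_ptr` storage used above: a StoredTableNode sits below the PredicateNodes it references, so an owning pointer back up the plan would presumably form an ownership cycle. The lock-and-check pattern can be sketched as follows. `PlanNode` and `lock_all` are hypothetical names, and the sketch throws where the PR uses `Assert`.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

struct PlanNode {};  // Hypothetical placeholder for a plan node.

// Upgrades non-owning references to owning ones for the duration of the caller's work. Throws if a referenced node
// was destroyed in the meantime, which would indicate an invalid plan.
inline std::vector<std::shared_ptr<PlanNode>> lock_all(const std::vector<std::weak_ptr<PlanNode>>& references) {
  auto nodes = std::vector<std::shared_ptr<PlanNode>>{};
  nodes.reserve(references.size());
  for (const auto& reference : references) {
    auto node = reference.lock();
    if (!node) {
      throw std::logic_error{"Referenced node expired."};
    }
    nodes.emplace_back(std::move(node));
  }
  return nodes;
}
```

Returning `std::shared_ptr`s by value keeps the referenced nodes alive while the caller iterates over them, at the cost of one reference-count increment per node.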

std::string StoredTableNode::description(const DescriptionMode /*mode*/) const {
const auto stored_table = Hyrise::get().storage_manager.get_table(table_name);

@@ -183,20 +202,52 @@ size_t StoredTableNode::_on_shallow_hash() const {
for (const auto& pruned_column_id : _pruned_column_ids) {
boost::hash_combine(hash, static_cast<size_t>(pruned_column_id));
}
// We intentionally force a hash collision for StoredTableNodes with the same number of (but different) prunable
// subquery predicates. Since we assume that (i) these predicates are not often set and (ii) we do hash LQPs often,
// this reduces the hash overhead, makes the code simpler, and triggers an in-depth equality check for the rare cases
// with prunable subquery predicates.
boost::hash_combine(hash, _prunable_subquery_predicates.size());
return hash;
}

std::shared_ptr<AbstractLQPNode> StoredTableNode::_on_shallow_copy(LQPNodeMapping& /*node_mapping*/) const {
// We cannot copy _prunable_subquery_predicates here since deep_copy() recurses into the input nodes and the
// StoredTableNodes are the first ones to be copied. Instead, AbstractLQPNode::deep_copy() sets the copied
// PredicateNodes after the whole LQP has been copied.
const auto copy = make(table_name);
copy->set_pruned_chunk_ids(_pruned_chunk_ids);
copy->set_pruned_column_ids(_pruned_column_ids);
return copy;
}

bool StoredTableNode::_on_shallow_equals(const AbstractLQPNode& rhs, const LQPNodeMapping& /*node_mapping*/) const {
bool StoredTableNode::_on_shallow_equals(const AbstractLQPNode& rhs, const LQPNodeMapping& node_mapping) const {
const auto& stored_table_node = static_cast<const StoredTableNode&>(rhs);
return table_name == stored_table_node.table_name && _pruned_chunk_ids == stored_table_node._pruned_chunk_ids &&
_pruned_column_ids == stored_table_node._pruned_column_ids;
if (table_name != stored_table_node.table_name || _pruned_chunk_ids != stored_table_node._pruned_chunk_ids ||
_pruned_column_ids != stored_table_node._pruned_column_ids) {
return false;
}

// Check equality of prunable subquery predicates. For now, the order of the predicates matters. Though this is a
// missed opportunity for LQP deduplication, we do not consider this a problem for now.
const auto& prunable_subquery_predicates = this->prunable_subquery_predicates();
const auto& rhs_prunable_subquery_predicates = stored_table_node.prunable_subquery_predicates();
const auto subquery_predicate_count = prunable_subquery_predicates.size();

if (subquery_predicate_count != rhs_prunable_subquery_predicates.size()) {
return false;
}

for (auto predicate_idx = size_t{0}; predicate_idx < subquery_predicate_count; ++predicate_idx) {
// We cannot check that the PredicateNodes are equal since this equality check recurses into the inputs and would
// not terminate. We have to compare the predicate expressions.
if (!expressions_equal_to_expressions_in_different_lqp(
prunable_subquery_predicates[predicate_idx]->node_expressions,
rhs_prunable_subquery_predicates[predicate_idx]->node_expressions, node_mapping)) {
return false;
}
}

return true;
}
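
Note on the order sensitivity mentioned in the comment above: combining only the predicate count into the hash and then comparing predicates position by position has a visible consequence, which the following sketch illustrates. `TableLike`, `cheap_hash`, and `ordered_equals` are hypothetical, and predicates are modeled as plain strings rather than expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in: a table node that remembers its prunable predicates as plain strings.
struct TableLike {
  std::string table_name;
  std::vector<std::string> predicates;
};

// Only the number of predicates enters the hash, so nodes that differ only in their predicates collide.
inline std::size_t cheap_hash(const TableLike& node) {
  return std::hash<std::string>{}(node.table_name) ^ (node.predicates.size() << 1);
}

// Position-wise comparison: the same predicates in a different order are reported as unequal.
inline bool ordered_equals(const TableLike& lhs, const TableLike& rhs) {
  return lhs.table_name == rhs.table_name && lhs.predicates == rhs.predicates;
}
```

Two nodes with the predicates `{a, b}` and `{b, a}` hash identically but compare unequal, which is the missed deduplication opportunity the comment accepts.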

} // namespace hyrise
9 changes: 8 additions & 1 deletion src/lib/logical_query_plan/stored_table_node.hpp
@@ -32,6 +32,12 @@ class StoredTableNode : public EnableMakeForLQPNode<StoredTableNode>, public Abs

void set_pruned_column_ids(const std::vector<ColumnID>& pruned_column_ids);
const std::vector<ColumnID>& pruned_column_ids() const;

// We cannot use predicates with uncorrelated subqueries to get pruned ChunkIDs during optimization. However, we can
// reference these predicates and keep track of them in the plan. Once we execute the plan, the subqueries might have
// already been executed, so we can use them for pruning during execution.
void set_prunable_subquery_predicates(const std::vector<std::weak_ptr<AbstractLQPNode>>& predicate_nodes);
std::vector<std::shared_ptr<AbstractLQPNode>> prunable_subquery_predicates() const;
/** @} */

std::vector<ChunkIndexStatistics> chunk_indexes_statistics() const;
@@ -52,12 +58,13 @@ class StoredTableNode : public EnableMakeForLQPNode<StoredTableNode>, public Abs
protected:
size_t _on_shallow_hash() const override;
std::shared_ptr<AbstractLQPNode> _on_shallow_copy(LQPNodeMapping& /*node_mapping*/) const override;
bool _on_shallow_equals(const AbstractLQPNode& rhs, const LQPNodeMapping& /*node_mapping*/) const override;
bool _on_shallow_equals(const AbstractLQPNode& rhs, const LQPNodeMapping& node_mapping) const override;

private:
mutable std::optional<std::vector<std::shared_ptr<AbstractExpression>>> _output_expressions;
std::vector<ChunkID> _pruned_chunk_ids;
std::vector<ColumnID> _pruned_column_ids;
std::vector<std::weak_ptr<AbstractLQPNode>> _prunable_subquery_predicates;
};

} // namespace hyrise
33 changes: 32 additions & 1 deletion src/lib/operators/abstract_operator.cpp
@@ -199,7 +199,38 @@ std::string AbstractOperator::description(DescriptionMode /*description_mode*/)

std::shared_ptr<AbstractOperator> AbstractOperator::deep_copy() const {
auto copied_ops = std::unordered_map<const AbstractOperator*, std::shared_ptr<AbstractOperator>>{};
return deep_copy(copied_ops);
const auto copy = deep_copy(copied_ops);

// Predicates that contain uncorrelated subqueries cannot be used for chunk pruning in the optimization phase since we
// do not know the predicate value yet. However, the ChunkPruningRule attaches the corresponding PredicateNodes to the
// StoredTableNode of the table the predicates are performed on. We attach the translated predicates (i.e.,
// TableScans) to the GetTable operators so they can use them for pruning during execution, when the subqueries might
// have already been executed and the predicate value is known. During a deep_copy, we must set the copied TableScan
// operators as prunable subquery scans of the GetTable operator after copying: Due to the recursion into the inputs
// of each operator, the TableScans are copied after the GetTable operators.
for (const auto& [op, op_copy] : copied_ops) {
if (op->type() != OperatorType::GetTable) {
continue;
}

const auto& get_table = static_cast<const GetTable&>(*op);
const auto& prunable_subquery_scans = get_table.prunable_subquery_scans();
if (prunable_subquery_scans.empty()) {
continue;
}

// Find the copies of the original TableScans and set them as prunable subquery scans of the GetTable copy.
auto prunable_subquery_scans_copy = std::vector<std::weak_ptr<const AbstractOperator>>{};
prunable_subquery_scans_copy.reserve(prunable_subquery_scans.size());
for (const auto& table_scan : prunable_subquery_scans) {
DebugAssert(copied_ops.contains(table_scan.get()), "Could not find referenced operator. PQP is invalid.");
prunable_subquery_scans_copy.emplace_back(copied_ops.at(table_scan.get()));
}

static_cast<GetTable&>(*op_copy).set_prunable_subquery_scans(prunable_subquery_scans_copy);
}

return copy;
}

std::shared_ptr<AbstractOperator> AbstractOperator::deep_copy(