Merge 'cql3: convert the SELECT clause evaluation phase to expressions' from Avi Kivity

SELECT clause components (selectors) are currently evaluated during query execution
using a stateful class hierarchy. The state holds intermediate results while
aggregating over multiple rows. Because the selectors are stateful, we must re-create
them for each query using a selector_factory hierarchy.

We'd like to convert all of this to the unified expression evaluation machinery, so we can
have just one grammar for expressions, and just one way to evaluate expressions, but
the statefulness makes this complex.

In commit 59ab9aa ("Merge 'functions: reframe aggregate functions in terms
of scalar functions' from Avi Kivity"), we made aggregate functions stateless, moving
their state to aggregate_function_selector::_accumulator, and therefore into the
class hierarchy we're addressing now. Another reason for keeping state is that selectors
that aren't aggregated capture the first value they see in a GROUP BY group.

Since expressions can't contain state directly, we break apart expressions that contain
aggregate functions into two: an inner expression that processes incoming rows within
a group, and an outer expression that generates the group's output. The two expressions
communicate via a newly introduced expression element: a temporary.
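
The inner/outer split can be pictured with a small self-contained model. This is
only a sketch under stated assumptions: `value`, `row`, `split_query`,
`evaluate_group` and `make_sum_plus_count` are hypothetical names invented for
illustration and are not the types or functions this series adds. Temporaries are
modelled as a plain vector that the inner steps update once per row and the outer
step reads once per group.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for illustration; not ScyllaDB's real types.
using value = std::int64_t;
using row = std::vector<value>;

struct split_query {
    // One inner step per temporary: (current temporary, input row) -> new temporary.
    std::vector<std::function<value(value, const row&)>> inner_loop;
    // Starting value of each temporary; same size as inner_loop.
    std::vector<value> initial_values;
    // Produces the group's output from the temporaries alone.
    std::function<value(const std::vector<value>&)> outer_loop;
};

// Evaluate one group: seed the temporaries, run every inner step for every row,
// then run the outer step exactly once.
value evaluate_group(const split_query& q, const std::vector<row>& group) {
    std::vector<value> temporaries = q.initial_values;
    for (const row& r : group) {
        for (std::size_t i = 0; i < q.inner_loop.size(); ++i) {
            temporaries[i] = q.inner_loop[i](temporaries[i], r);
        }
    }
    return q.outer_loop(temporaries);
}

// Toy counterpart of "sum(a) + count(a)": two temporaries (running sum and
// running count) and an outer step that adds them.
split_query make_sum_plus_count() {
    split_query q;
    q.inner_loop.push_back([](value acc, const row& r) { return acc + r[0]; });
    q.inner_loop.push_back([](value acc, const row&) { return acc + 1; });
    q.initial_values = {0, 0};
    q.outer_loop = [](const std::vector<value>& t) { return t[0] + t[1]; };
    return q;
}
```

In this model all mutable state lives in `temporaries`, outside the expressions
themselves, which is the property the split is meant to provide.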

The problem of non-aggregated columns requiring state is solved by encapsulating
those columns in an internal aggregate function, called the "first" function.
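
As a rough illustration of the idea (not the actual implementation, whose interface
is not shown in this message), a "first"-style aggregate can be modelled as an
accumulator that stores the first value it is given and ignores the rest.
`first_aggregate` and `first_of` are hypothetical names used only for this sketch.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy aggregate, assumed for illustration only.
template <typename T>
struct first_aggregate {
    std::optional<T> state;  // empty until the first row of the group arrives

    // Called once per row: store the value only if nothing is stored yet.
    void accumulate(const T& v) {
        if (!state) {
            state = v;
        }
    }

    // Called once per group: return the captured value (empty for an empty group).
    std::optional<T> finalize() const { return state; }
};

// Convenience driver: run the aggregate over all values of one group.
template <typename T>
std::optional<T> first_of(const std::vector<T>& group) {
    first_aggregate<T> agg;
    for (const T& v : group) {
        agg.accumulate(v);
    }
    return agg.finalize();
}
```

Treating a plain column this way lets it go through the same per-row and per-group
steps as any other aggregate, so no separate stateful selector is needed for it.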

In terms of performance, this series has little effect, since the common case of selectors
that only contain direct column references without transformations is evaluated via a fast
path (`simple_selection`). This fast-path is preserved with almost no changes.

While the series makes it possible to start to extend the grammar and unify expression
syntaxes, it does not do so. The grammar is unchanged. There is just one breaking change:
the `SELECT JSON` statement generates JSON object field names based on the input selectors.
In one case the name of the field has changed, but it is an esoteric case (where a function call
is selected as part of `SELECT JSON`), and the new behavior is compatible with Cassandra.

Closes #14467

* github.com:scylladb/scylladb:
  cql3: selection: drop selector_factories, selectables, and selectors
  cql3: select_statement: stop using selector_factories in SELECT JSON
  cql3: selection: don't create selector_factories any more
  cql3: selection: collect column_definitions using expressions
  cql3: selection: reimplement selection::is_aggregate()
  cql3: selection: evaluate aggregation queries via expr::evaluate()
  cql3: selection, select_statement: fine tune add_column_for_post_processing() usage
  cql3: selection: evaluate non-aggregating complex selections using expr::evaluate()
  cql3: selection: store primary key in result_set_builder
  cql3: expression: fix field_selection::type interpretation by evaluate()
  cql3: selection: make result_set_builder::current non-optional<>
  cql3: selection: simplify row/group processing
  cql3: selection: convert requires_thread to expressions
  cql: selection: convert used_functions() to expressions
  cql3: selection: convert is_reducible/get_reductions to expressions
  cql3: selection: convert is_count() to expressions
  cql3: selection convert contains_ttl/contains_writetime to work on expressions
  cql3: selection: make simple_selectors stateless
  cql3: expression: add helper to split expressions with aggregate functions
  cql3: selection: short-circuit non-aggregations
  cql3: selection: drop validate_selectors
  cql3: select_statement: force aggregation if GROUP BY is used
  cql3: select_statement: levellize aggregation depth
  cql3: selection: skip first_function when collecting metadata
  cql3: select_statement: explicitly disable automatic parallelization with no aggregates
  cql3: expression: introduce temporaries
  cql3: select_statement: use prepared selectors
  cql3: selection: avoid selector_factories in collect_metadata()
  cql3: expressions: add "metadata mode" formatter for expressions
  cql3: selection: convert collect_metadata() to the prepared expression domain
  cql3: selection: convert processes_selection to work on prepared expressions
  cql3: selection: prepare selectors earlier
  cql3: raw_selector: deinline
  cql3: expression: reimplement verify_no_aggregate_functions()
  cql3: expression: add helpers to manage an expression's aggregation depth
  cql3: expression: improve printing of prepared function calls
  cql3: functions: add "first" aggregate function
nyh committed Jul 3, 2023
2 parents 94bf6bb + 66c47d4 commit ec77172
Showing 41 changed files with 906 additions and 1,663 deletions.
3 changes: 0 additions & 3 deletions configure.py
@@ -916,10 +916,7 @@ def find_headers(repodir, excluded_dirs):
'cql3/query_options.cc',
'cql3/user_types.cc',
'cql3/untyped_result_set.cc',
'cql3/selection/abstract_function_selector.cc',
'cql3/selection/simple_selector.cc',
'cql3/selection/selectable.cc',
'cql3/selection/selector_factories.cc',
'cql3/selection/selection.cc',
'cql3/selection/selector.cc',
'cql3/restrictions/statement_restrictions.cc',
3 changes: 0 additions & 3 deletions cql3/CMakeLists.txt
@@ -98,10 +98,7 @@ target_sources(cql3
query_options.cc
user_types.cc
untyped_result_set.cc
selection/abstract_function_selector.cc
selection/simple_selector.cc
selection/selectable.cc
selection/selector_factories.cc
selection/selection.cc
selection/selector.cc
restrictions/statement_restrictions.cc
3 changes: 1 addition & 2 deletions cql3/Cql.g
@@ -16,7 +16,6 @@ options {
}

@parser::includes {
#include "cql3/selection/writetime_or_ttl.hh"
#include "cql3/statements/raw/parsed_statement.hh"
#include "cql3/statements/raw/select_statement.hh"
#include "cql3/statements/alter_keyspace_statement.hh"
@@ -67,8 +66,8 @@ options {
#include "cql3/statements/index_target.hh"
#include "cql3/statements/ks_prop_defs.hh"
#include "cql3/selection/raw_selector.hh"
#include "cql3/selection/selectable-expr.hh"
#include "cql3/keyspace_element_name.hh"
#include "cql3/selection/selectable_with_field_selection.hh"
#include "cql3/constants.hh"
#include "cql3/operation_impl.hh"
#include "cql3/error_listener.hh"
4 changes: 0 additions & 4 deletions cql3/column_identifier.cc
@@ -86,10 +86,6 @@ column_identifier_raw::prepare_column_identifier(const schema& schema) const {
return ::make_shared<column_identifier>(schema.regular_column_name_type()->from_string(_raw_text), _text);
}

bool column_identifier_raw::processes_selection() const {
return false;
}

bool column_identifier_raw::operator==(const column_identifier_raw& other) const {
return _text == other._text;
}
3 changes: 0 additions & 3 deletions cql3/column_identifier.hh
@@ -83,9 +83,6 @@ public:

::shared_ptr<column_identifier> prepare_column_identifier(const schema& s) const;

// for selectable::with_expression::raw:
bool processes_selection() const;

bool operator==(const column_identifier_raw& other) const;

const sstring& text() const;
2 changes: 2 additions & 0 deletions cql3/expr/evaluate.hh
@@ -11,6 +11,7 @@
namespace cql3 {

class query_options;
class raw_value;

}

@@ -26,6 +27,7 @@ struct evaluation_inputs {
const query_options* options = nullptr;
std::span<const api::timestamp_type> static_and_regular_timestamps; // indexes match `selection` member
std::span<const int32_t> static_and_regular_ttls; // indexes match `selection` member
std::span<const cql3::raw_value> temporaries; // indexes match temporary::index
};

// Takes a prepared expression and calculates its value.
38 changes: 36 additions & 2 deletions cql3/expr/expr-utils.hh
@@ -352,5 +352,39 @@ bool has_only_eq_binops(const expression&);
data_type column_mutation_attribute_type(const column_mutation_attribute& e);



}
// How deep aggregations are nested. e.g. sum(avg(count(col))) == 3
unsigned aggregation_depth(const cql3::expr::expression& e);

// Make sure every column_value is nested in exactly `depth` aggregations, by adding
// first() calls at the deepest level. e.g. if depth=3, then
//
// my_agg(sum(x), y)
//
// becomes
//
// my_agg(sum(first(x)), first(first(y)))
//
cql3::expr::expression levellize_aggregation_depth(const cql3::expr::expression& e, unsigned depth);


struct aggregation_split_result {
std::vector<expression> inner_loop;
std::vector<expression> outer_loop;
std::vector<cql3::raw_value> initial_values_for_temporaries; // same size as inner_loop
};

// Given a vector of aggregation expressions, split them into an inner loop that
// calls the aggregating function on each input row, and an outer loop that calls
// the final function on temporaries and generates the result.
//
// inner_loop should be evaluated for each input row in a group, and its
// results stored in temporaries seeded from initial_values_for_temporaries
//
// outer_loop should be evaluated once for each group, just with temporaries
// as input.
//
// If the expressions don't contain aggregates, inner_loop and initial_values_for_temporaries
// are empty, and outer_loop should be evaluated for each row.
aggregation_split_result split_aggregation(std::span<const expression> aggregation);

}
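
The comments on `aggregation_depth()` and `levellize_aggregation_depth()` in the hunk
above can be read as the following toy model. It is a sketch under assumptions, not
the code from this commit: the `expr` struct, the `column()`, `aggregate()` and
`to_string()` helpers, and the rule of throwing when `depth` is too small are all
invented here. The toy handles only columns and aggregate calls, whereas real
expressions have many more node kinds.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy expression: either a column or an aggregate call.
struct expr {
    std::string name;
    bool is_aggregate = false;  // false means "plain column reference"
    std::vector<expr> args;
};

expr column(std::string name) { return expr{std::move(name), false, {}}; }
expr aggregate(std::string name, std::vector<expr> args) {
    return expr{std::move(name), true, std::move(args)};
}

// How deep aggregations are nested, e.g. sum(avg(count(col))) == 3.
unsigned aggregation_depth(const expr& e) {
    unsigned deepest = 0;
    for (const expr& a : e.args) {
        deepest = std::max(deepest, aggregation_depth(a));
    }
    return deepest + (e.is_aggregate ? 1u : 0u);
}

// Wrap columns in first() calls so each one sits under exactly `depth` aggregations.
expr levellize_aggregation_depth(const expr& e, unsigned depth) {
    if (!e.is_aggregate) {
        expr wrapped = e;
        for (unsigned i = 0; i < depth; ++i) {
            wrapped = aggregate("first", std::vector<expr>{wrapped});
        }
        return wrapped;
    }
    if (depth == 0) {
        // Assumed precondition for this toy: depth >= aggregation_depth(e).
        throw std::invalid_argument("depth is smaller than the existing nesting");
    }
    expr out = e;
    for (expr& a : out.args) {
        a = levellize_aggregation_depth(a, depth - 1);
    }
    return out;
}

// Render the toy expression in function-call syntax.
std::string to_string(const expr& e) {
    if (!e.is_aggregate) {
        return e.name;
    }
    std::string out = e.name + "(";
    for (std::size_t i = 0; i < e.args.size(); ++i) {
        out += (i ? ", " : "") + to_string(e.args[i]);
    }
    return out + ")";
}
```

With this model, levellizing `my_agg(sum(x), y)` to depth 3 renders as
`my_agg(sum(first(x)), first(first(y)))`, matching the example in the header comment.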
