Skip to content

Commit

Permalink
guardrails: restrict replication factor (RF)
Browse files Browse the repository at this point in the history
Replacing `minimum_keyspace_rf` config option with 4 config options:
`{minimum,maximum}_replication_factor_{warn,fail}_threshold`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on RF when creating/altering a keyspace.
The reason to rather replace than extend `minimum_keyspace_rf` config
option is to be aligned with Cassandra, which did the same, and has the
same parameters' names.
Only min soft limit is enabled by default and it is set to 3, which means
that we'll generate a CQL warning whenever RF is set to either 1 or 2.
RF's value of 0 is always allowed and means that there will not be any
replicas on a given DC. This was agreed with PM.
Because we don't allow to change guardrails' values when scylla is
running (per PM), there're no tests provided with this PR, and dtests will be
provided separately.
Exceeding guardrails' thresholds will be tracked by metrics.

Resolves #8619
Refs #8892 (the RF part, not the replication-strategy part)

Closes #14262
  • Loading branch information
ptrsmrn authored and nyh committed Sep 4, 2023
1 parent 06c0c6a commit eb46f1b
Show file tree
Hide file tree
Showing 8 changed files with 99 additions and 17 deletions.
14 changes: 14 additions & 0 deletions conf/scylla.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -605,3 +605,17 @@ strict_is_not_null_in_views: true
# configuration parameters in runtime, e.g. sending SIGHUP or using API. This option should be set to false
# e.g. for cloud users, for whom scylla's configuration should be changed only by support engineers.
# live_updatable_config_params_changeable_via_cql: true

# ****************
# * GUARDRAILS *
# ****************

# Guardrails to warn or fail when Replication Factor is smaller/greater than the threshold.
# Please note that the value of 0 is always allowed,
# which means that having no replication at all, i.e. RF = 0, is always valid.
# A guardrail value smaller than 0, e.g. -1, means that the guardrail is disabled.
# Commenting out a guardrail also means it is disabled.
# minimum_replication_factor_fail_threshold: -1
# minimum_replication_factor_warn_threshold: 3
# maximum_replication_factor_warn_threshold: -1
# maximum_replication_factor_fail_threshold: -1
25 changes: 24 additions & 1 deletion cql3/query_processor.cc
Original file line number Diff line number Diff line change
Expand Up @@ -452,8 +452,31 @@ query_processor::query_processor(service::storage_proxy& proxy, data_dictionary:
"unpaged_select_queries_per_ks",
_cql_stats.unpaged_select_queries(ks_selector::SYSTEM),
sm::description("Counts the number of unpaged CQL SELECT requests against particular keyspaces."),
{system_ks_label_instance})
{system_ks_label_instance}),

sm::make_counter(
"minimum_replication_factor_fail_violations",
_cql_stats.minimum_replication_factor_fail_violations,
sm::description("Counts the number of minimum_replication_factor_fail_threshold guardrail violations, "
"i.e. attempts to create a keyspace with RF on one of the DCs below the set guardrail.")),

sm::make_counter(
"minimum_replication_factor_warn_violations",
_cql_stats.minimum_replication_factor_warn_violations,
sm::description("Counts the number of minimum_replication_factor_warn_threshold guardrail violations, "
"i.e. attempts to create a keyspace with RF on one of the DCs below the set guardrail.")),

sm::make_counter(
"maximum_replication_factor_warn_violations",
_cql_stats.maximum_replication_factor_warn_violations,
sm::description("Counts the number of maximum_replication_factor_warn_threshold guardrail violations, "
"i.e. attempts to create a keyspace with RF on one of the DCs above the set guardrail.")),

sm::make_counter(
"maximum_replication_factor_fail_violations",
_cql_stats.maximum_replication_factor_fail_violations,
sm::description("Counts the number of maximum_replication_factor_fail_threshold guardrail violations, "
"i.e. attempts to create a keyspace with RF on one of the DCs above the set guardrail.")),
});

_mnotifier.register_listener(_migration_subscriber.get());
Expand Down
2 changes: 1 addition & 1 deletion cql3/statements/alter_keyspace_statement.cc
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ static logging::logger mylogger("alter_keyspace");

future<::shared_ptr<cql_transport::messages::result_message>>
cql3::statements::alter_keyspace_statement::execute(query_processor& qp, service::query_state& state, const query_options& options, std::optional<service::group0_guard> guard) const {
std::optional<sstring> warning = check_restricted_replication_strategy(qp, keyspace(), *_attrs);
std::optional<sstring> warning = check_restricted_replication_strategy(qp, keyspace(), *_attrs, qp.get_cql_stats());
return schema_altering_statement::execute(qp, state, options, std::move(guard)).then([warning = std::move(warning)] (::shared_ptr<messages::result_message> msg) {
if (warning) {
msg->add_warning(*warning);
Expand Down
56 changes: 44 additions & 12 deletions cql3/statements/create_keyspace_statement.cc
Original file line number Diff line number Diff line change
Expand Up @@ -142,13 +142,15 @@ future<> cql3::statements::create_keyspace_statement::grant_permissions_to_creat
std::optional<sstring> check_restricted_replication_strategy(
query_processor& qp,
const sstring& keyspace,
const ks_prop_defs& attrs)
const ks_prop_defs& attrs,
cql_stats& stats)
{
if (!attrs.get_replication_strategy_class()) {
return std::nullopt;
}
sstring replication_strategy = locator::abstract_replication_strategy::to_qualified_class_name(
*attrs.get_replication_strategy_class());
auto& topology = qp.proxy().get_token_metadata_ptr()->get_topology();
// SimpleStrategy is not recommended in any setup which already has - or
// may have in the future - multiple racks or DCs. So depending on how
// protective we are configured, let's prevent it or allow with a warning:
Expand All @@ -170,14 +172,14 @@ std::optional<sstring> check_restricted_replication_strategy(
case db::tri_mode_restriction_t::mode::FALSE:
// Scylla was configured to allow SimpleStrategy, but let's warn
// if it's used on a cluster which *already* has multiple DCs:
if (qp.proxy().get_token_metadata_ptr()->get_topology().get_datacenter_endpoints().size() > 1) {
if (topology.get_datacenter_endpoints().size() > 1) {
return "Using SimpleStrategy in a multi-datacenter environment is not recommended.";
}
break;
}
}
// The minimum_keyspace_rf configuration option can be used to forbid
// a lower replication factor. We assume that all numeric replication
// The {minimum,maximum}_replication_factor_{warn,fail}_threshold configuration option can be used to forbid
// a smaller/greater replication factor. We assume that all numeric replication
// options are replication factors - this is true for SimpleStrategy and
// NetworkTopologyStrategy but in the future if we add more strategies,
// we may need to limit this test only to specific options.
Expand All @@ -188,13 +190,43 @@ std::optional<sstring> check_restricted_replication_strategy(
for (auto opt : attrs.get_replication_options()) {
try {
auto rf = std::stol(opt.second);
if (rf > 0 && rf < qp.proxy().data_dictionary().get_config().minimum_keyspace_rf()) {
throw exceptions::configuration_exception(format(
"Replication factor {}={} is forbidden by the current "
"configuration setting of minimum_keyspace_rf={}. Please "
"increase replication factor, or lower minimum_keyspace_rf "
"set in the configuration.", opt.first, opt.second,
qp.proxy().data_dictionary().get_config().minimum_keyspace_rf()));
if (rf > 0) {
if (auto min_fail = qp.proxy().data_dictionary().get_config().minimum_replication_factor_fail_threshold();
min_fail >= 0 && rf < min_fail) {
++stats.minimum_replication_factor_fail_violations;
throw exceptions::configuration_exception(format(
"Replication Factor {}={} is forbidden by the current "
"configuration setting of minimum_replication_factor_fail_threshold={}. Please "
"increase replication factor, or lower minimum_replication_factor_fail_threshold "
"set in the configuration.", opt.first, rf,
qp.proxy().data_dictionary().get_config().minimum_replication_factor_fail_threshold()));
}
else if (auto max_fail = qp.proxy().data_dictionary().get_config().maximum_replication_factor_fail_threshold();
max_fail >= 0 && rf > max_fail) {
++stats.maximum_replication_factor_fail_violations;
throw exceptions::configuration_exception(format(
"Replication Factor {}={} is forbidden by the current "
"configuration setting of maximum_replication_factor_fail_threshold={}. Please "
"decrease replication factor, or increase maximum_replication_factor_fail_threshold "
"set in the configuration.", opt.first, rf,
qp.proxy().data_dictionary().get_config().maximum_replication_factor_fail_threshold()));
}
else if (auto min_warn = qp.proxy().data_dictionary().get_config().minimum_replication_factor_warn_threshold();
min_warn >= 0 && rf < min_warn)
{
++stats.minimum_replication_factor_warn_violations;
return format("Using Replication Factor {}={} lower than the "
"minimum_replication_factor_warn_threshold={} is not recommended.",
opt.first, rf, qp.proxy().data_dictionary().get_config().minimum_replication_factor_warn_threshold());
}
else if (auto max_warn = qp.proxy().data_dictionary().get_config().maximum_replication_factor_warn_threshold();
max_warn >= 0 && rf > max_warn)
{
++stats.maximum_replication_factor_warn_violations;
return format("Using Replication Factor {}={} greater than the "
"maximum_replication_factor_warn_threshold={} is not recommended.",
opt.first, rf, qp.proxy().data_dictionary().get_config().maximum_replication_factor_warn_threshold());
}
}
} catch (std::invalid_argument&) {
} catch (std::out_of_range& ) {
Expand All @@ -205,7 +237,7 @@ std::optional<sstring> check_restricted_replication_strategy(

future<::shared_ptr<messages::result_message>>
create_keyspace_statement::execute(query_processor& qp, service::query_state& state, const query_options& options, std::optional<service::group0_guard> guard) const {
std::optional<sstring> warning = check_restricted_replication_strategy(qp, keyspace(), *_attrs);
std::optional<sstring> warning = check_restricted_replication_strategy(qp, keyspace(), *_attrs, qp.get_cql_stats());
return schema_altering_statement::execute(qp, state, options, std::move(guard)).then([warning = std::move(warning)] (::shared_ptr<messages::result_message> msg) {
if (warning) {
msg->add_warning(*warning);
Expand Down
3 changes: 2 additions & 1 deletion cql3/statements/create_keyspace_statement.hh
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,8 @@ public:
std::optional<sstring> check_restricted_replication_strategy(
query_processor& qp,
const sstring& keyspace,
const ks_prop_defs& attrs);
const ks_prop_defs& attrs,
cql_stats& stats);

}

Expand Down
5 changes: 5 additions & 0 deletions cql3/stats.hh
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,11 @@ struct cql_stats {
int64_t select_partition_range_scan_no_bypass_cache = 0;
int64_t select_parallelized = 0;

uint64_t minimum_replication_factor_fail_violations = 0;
uint64_t minimum_replication_factor_warn_violations = 0;
uint64_t maximum_replication_factor_warn_violations = 0;
uint64_t maximum_replication_factor_fail_violations = 0;

private:
uint64_t _unpaged_select_queries[(size_t)ks_selector::SIZE] = {0ul};
uint64_t _query_cnt[(size_t)source_selector::SIZE]
Expand Down
5 changes: 4 additions & 1 deletion db/config.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1015,7 +1015,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
, wasm_udf_memory_limit(this, "wasm_udf_memory_limit", value_status::Used, 2*1024*1024, "How much memory each WASM UDF can allocate at most")
, relabel_config_file(this, "relabel_config_file", value_status::Used, "", "Optionally, read relabel config from file")
, object_storage_config_file(this, "object_storage_config_file", value_status::Used, "", "Optionally, read object-storage endpoints config from file")
, minimum_keyspace_rf(this, "minimum_keyspace_rf", liveness::LiveUpdate, value_status::Used, 0, "The minimum allowed replication factor when creating or altering a keyspace.")
, live_updatable_config_params_changeable_via_cql(this, "live_updatable_config_params_changeable_via_cql", liveness::MustRestart, value_status::Used, true, "If set to true, configuration parameters defined with LiveUpdate can be updated in runtime via CQL (by updating system.config virtual table), otherwise they can't.")
, auth_superuser_name(this, "auth_superuser_name", value_status::Used, "",
"Initial authentication super username. Ignored if authentication tables already contain a super user")
Expand All @@ -1024,6 +1023,10 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"Ignored if authentication tables already contain a super user password.")
, auth_certificate_role_queries(this, "auth_certificate_role_queries", value_status::Used, { { { "source", "SUBJECT" }, {"query", "CN=([^,]+)" } } },
"Regular expression used by CertificateAuthenticator to extract role name from an accepted transport authentication certificate subject info.")
, minimum_replication_factor_fail_threshold(this, "minimum_replication_factor_fail_threshold", liveness::LiveUpdate, value_status::Used, -1, "")
, minimum_replication_factor_warn_threshold(this, "minimum_replication_factor_warn_threshold", liveness::LiveUpdate, value_status::Used, 3, "")
, maximum_replication_factor_warn_threshold(this, "maximum_replication_factor_warn_threshold", liveness::LiveUpdate, value_status::Used, -1, "")
, maximum_replication_factor_fail_threshold(this, "maximum_replication_factor_fail_threshold", liveness::LiveUpdate, value_status::Used, -1, "")
, error_injections_at_startup(this, "error_injections_at_startup", error_injection_value_status, {}, "List of error injections that should be enabled on startup.")
, default_log_level(this, "default_log_level", value_status::Used)
, logger_log_level(this, "logger_log_level", value_status::Used)
Expand Down
6 changes: 5 additions & 1 deletion db/config.hh
Original file line number Diff line number Diff line change
Expand Up @@ -441,7 +441,6 @@ public:
// configuring the Seastar memory subsystem.
static constexpr size_t wasm_udf_reserved_memory = 50 * 1024 * 1024;

named_value<unsigned> minimum_keyspace_rf;
named_value<bool> live_updatable_config_params_changeable_via_cql;
bool are_live_updatable_config_params_changeable_via_cql() const override {
return live_updatable_config_params_changeable_via_cql();
Expand All @@ -453,6 +452,11 @@ public:

named_value<std::vector<std::unordered_map<sstring, sstring>>> auth_certificate_role_queries;

named_value<int> minimum_replication_factor_fail_threshold;
named_value<int> minimum_replication_factor_warn_threshold;
named_value<int> maximum_replication_factor_warn_threshold;
named_value<int> maximum_replication_factor_fail_threshold;

seastar::logging_settings logging_settings(const log_cli::options&) const;

const db::extensions& extensions() const;
Expand Down

1 comment on commit eb46f1b

@annastuchlik
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit removes the minimum_keyspace_rf option, but the documentation is not updated. This PR will fix it (the option needs to be removed from the CREATE KEYSPACE section): #18760

Please sign in to comment.