Skip to content

Commit

Permalink
storage_proxy: Make split_stats resilient to being called from differ…
Browse files Browse the repository at this point in the history
…ent scheduling group

Fixes scylladb#11017

When doing writes, storage proxy creates types deriving from abstract_write_response_handler.
These are created in the various scheduling groups executing the write inducing code. They
pick up a group-local reference to the various metrics used by SP. Normally all code
using (and esp. modifying) these metrics are executed in the same scheduling group.
However, if gossip sees a node go down, it will notify listeners, which eventually
calls get_ep_stat and register_metrics.
This code (before this patch) uses _active_ scheduling group to eventually add
metrics, using a local dict as guard against double regs. If, as described above,
we're called in a different sched group than the original one however, this
can cause double registrations.

Fixed here by keeping a reference to creating scheduling group and using this, not
active one, when/if creating new metrics.
  • Loading branch information
Calle Wilund committed Jul 11, 2023
1 parent e7a5c13 commit fd78f4f
Show file tree
Hide file tree
Showing 2 changed files with 10 additions and 4 deletions.
13 changes: 9 additions & 4 deletions service/storage_proxy.cc
Original file line number Diff line number Diff line change
Expand Up @@ -116,8 +116,12 @@ static const seastar::metrics::label op_type_label("op_type");
static const seastar::metrics::label scheduling_group_label("scheduling_group_name");
static const seastar::metrics::label rejected_by_coordinator_label("rejected_by_coordinator");

seastar::metrics::label_instance make_scheduling_group_label(const scheduling_group& sg) {
return scheduling_group_label(sg.name());
}

seastar::metrics::label_instance current_scheduling_group_label() {
return scheduling_group_label(current_scheduling_group().name());
return make_scheduling_group_label(current_scheduling_group());
}

}
Expand Down Expand Up @@ -1509,7 +1513,8 @@ storage_proxy_stats::split_stats::split_stats(const sstring& category, const sst
, _long_description_prefix(long_description_prefix)
, _category(category)
, _op_type(op_type)
, _auto_register_metrics(auto_register_metrics) { }
, _auto_register_metrics(auto_register_metrics)
, _sg(current_scheduling_group()) { }

storage_proxy_stats::write_stats::write_stats()
: writes_attempts(COORDINATOR_STATS_CATEGORY, "total_write_attempts", "total number of write requests", "mutation_data")
Expand Down Expand Up @@ -1817,7 +1822,7 @@ void storage_proxy_stats::split_stats::register_metrics_local() {

_metrics.add_group(_category, {
sm::make_counter(_short_description_prefix + sstring("_local_node"), [this] { return _local.val; },
sm::description(_long_description_prefix + "on a local Node"), {storage_proxy_stats::current_scheduling_group_label(), op_type_label(_op_type)})
sm::description(_long_description_prefix + "on a local Node"), {storage_proxy_stats::make_scheduling_group_label(_sg), op_type_label(_op_type)})
});
}

Expand All @@ -1829,7 +1834,7 @@ void storage_proxy_stats::split_stats::register_metrics_for(sstring dc, gms::ine
if (auto [ignored, added] = _dc_stats.try_emplace(dc); added) {
_metrics.add_group(_category, {
sm::make_counter(_short_description_prefix + sstring("_remote_node"), [this, dc] { return _dc_stats[dc].val; },
sm::description(seastar::format("{} when communicating with external Nodes in DC {}", _long_description_prefix, dc)), {storage_proxy_stats::current_scheduling_group_label(), datacenter_label(dc), op_type_label(_op_type)})
sm::description(seastar::format("{} when communicating with external Nodes in DC {}", _long_description_prefix, dc)), {storage_proxy_stats::make_scheduling_group_label(_sg), datacenter_label(dc), op_type_label(_op_type)})
});
}
}
Expand Down
1 change: 1 addition & 0 deletions service/storage_proxy_stats.hh
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ private:
// whether to register per-endpoint metrics automatically
bool _auto_register_metrics;

scheduling_group _sg;
public:
/**
* @param category a statistics category, e.g. "client" or "replica"
Expand Down

0 comments on commit fd78f4f

Please sign in to comment.