storage_proxy: Make split_stats resilient to being called from different scheduling group #14636

elcallio · 2023-07-11T12:14:51Z

When doing writes, storage proxy creates types deriving from abstract_write_response_handler. These are created in the various scheduling groups executing the write-inducing code, and they pick up a group-local reference to the various metrics used by SP. Normally, all code using (and especially modifying) these metrics is executed in the same scheduling group. However, if gossip sees a node go down, it notifies listeners, which eventually call get_ep_stat and register_metrics.
Before this patch, this code used the active scheduling group when eventually adding metrics, with a local dict as a guard against double registrations. If, as described above, it is called in a different scheduling group than the original one, this can cause double registrations.

Fixed here by keeping a reference to the creating scheduling group and using it, not the active one, when/if creating new metrics.
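To illustrate the failure mode, here is a minimal, Seastar-free sketch. It is not the actual storage_proxy code: `sched_group`, `active_group`, `metric_registry`, `split_stats_sketch` and `double_registration_occurs` are hypothetical stand-ins, and the scenario (two per-group stats objects both invoked from a third, gossip-like group) is one assumed way the collision described above could arise.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a scheduling group identifier.
using sched_group = int;

// Hypothetical stand-in for "the scheduling group currently running".
static sched_group active_group = 0;

// Hypothetical global metric registry: a (group, endpoint) pair may be added once.
struct metric_registry {
    std::set<std::pair<sched_group, std::string>> entries;

    void add(sched_group g, const std::string& endpoint) {
        if (!entries.emplace(g, endpoint).second) {
            throw std::runtime_error("double registration");
        }
    }
};

// Hypothetical per-group stats object with a local guard against re-registering.
struct split_stats_sketch {
    metric_registry& reg;
    sched_group creator;               // group active when this object was created
    bool use_creator;                  // false: pre-patch behaviour, true: patched
    std::set<std::string> registered;  // local guard, only covers this instance

    split_stats_sketch(metric_registry& r, bool patched)
        : reg(r), creator(active_group), use_creator(patched) {}

    void register_metrics_for(const std::string& endpoint) {
        if (!registered.insert(endpoint).second) {
            return;  // this instance already registered the endpoint
        }
        // Pre-patch: label with whatever group happens to be active now.
        // Patched: label with the group that created this object.
        reg.add(use_creator ? creator : active_group, endpoint);
    }
};

// Returns true if the assumed scenario triggers a double registration.
bool double_registration_occurs(bool patched) {
    metric_registry reg;

    active_group = 1;
    split_stats_sketch stats_a(reg, patched);  // created in group 1
    active_group = 2;
    split_stats_sketch stats_b(reg, patched);  // created in group 2

    // A gossip-like notification runs in a third group and touches both objects.
    active_group = 3;
    try {
        stats_a.register_metrics_for("10.0.0.1");
        stats_b.register_metrics_for("10.0.0.1");
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

In this sketch the local guard cannot help in the pre-patch mode, because each object only tracks its own registrations while both end up labelled with the same active group. Labelling with the creating group keeps the two registrations distinct.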

5.2 backport version


scylladb-promoter · 2023-07-11T14:22:37Z

CI state SUCCESS - https://jenkins.scylladb.com/job/scylla-5.2/job/scylla-ci/27/

…ent scheduling group

Fixes #11017

When doing writes, storage proxy creates types deriving from abstract_write_response_handler. These are created in the various scheduling groups executing the write inducing code. They pick up a group-local reference to the various metrics used by SP. Normally all code using (and esp. modifying) these metrics are executed in the same scheduling group. However, if gossip sees a node go down, it will notify listeners, which eventually calls get_ep_stat and register_metrics.
This code (before this patch) uses _active_ scheduling group to eventually add metrics, using a local dict as guard against double regs. If, as described above, we're called in a different sched group than the original one however, this can cause double registrations.

Fixed here by keeping a reference to creating scheduling group and using this, not active one, when/if creating new metrics.

Closes #14636

denesb · 2023-07-12T06:26:08Z

Queued.

denesb · 2023-07-12T12:26:41Z

Promoted.

elcallio requested a review from denesb July 11, 2023 12:15

denesb closed this Jul 12, 2023
