Unsynchronized cross-shard memory operations caused by incorrectly used updateable_value #7310
Comments
Recently, the cql_server_config::max_concurrent_requests field was changed to be an updateable_value, so that it is updated when the corresponding option in Scylla's configuration is live-reloaded. Unfortunately, due to how cql_server is constructed, this caused cql_server instances on all shards to store an updateable_value which pointed to an updateable_value_source on shard 0. Unsynchronized cross-shard memory operations ensue. The fix changes the cql_server_config so that it holds a function which creates an updateable_value appropriate for the given shard. This pattern is similar to another, already existing option in the config: get_service_memory_limiter_semaphore. This fix can be reverted if updateable_value becomes safe to use across shards. Tests: unit(dev) Fixes: scylladb#7310
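The factory-function pattern described in the fix can be sketched roughly as follows. This is a minimal, single-threaded model and not the actual patch: `shard_source`, `bound_value`, `toy_server_config` and `shards_bound_by_factory` are hypothetical names, and the shard id is passed in explicitly, where real code would presumably ask the runtime which shard it is running on.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a shard-local updateable_value_source.
struct shard_source {
    unsigned shard; // which shard owns this source
    int value;      // the configured value
};

// Hypothetical stand-in for an updateable_value: it remembers its source.
struct bound_value {
    const shard_source* src;
    int get() const { return src->value; }
};

// Instead of a ready-made value (which would be bound to shard 0's source),
// the config carries a function that builds a value for the calling shard.
struct toy_server_config {
    std::function<bound_value(unsigned)> get_max_concurrent_requests;
};

// Copies the config once per "shard", lets each copy call the factory, and
// reports which shard's source each resulting value ended up bound to.
std::vector<unsigned> shards_bound_by_factory(unsigned nshards) {
    std::vector<shard_source> sources;
    for (unsigned s = 0; s < nshards; ++s) {
        sources.push_back(shard_source{s, 100});
    }
    toy_server_config cfg{
        [&sources](unsigned shard) { return bound_value{&sources[shard]}; }
    };
    std::vector<unsigned> bound_to;
    for (unsigned s = 0; s < nshards; ++s) {
        toy_server_config local = cfg; // the copy a shard would receive
        bound_value v = local.get_max_concurrent_requests(s);
        bound_to.push_back(v.src->shard);
    }
    return bound_to;
}
```

In this model every copy of the config ends up with a value bound to its own shard's source, because only the function is copied and the value itself is created at the destination.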
@avikivity About the comment above
There's also another series of mine, already queued in next, which adds the same mistake to alternator. I'll temporarily dequeue it.
updateable_value was explicitly written to support this use case (I know the author personally). It should be fixed, so it is easy and cheap to pass live configuration values across shards.
@avikivity I made an issue: #7316
Not in any release branch, so no backports needed.
The bug was introduced in 4b856cf.
In transport/controller.cc, there is a function named controller::do_start_server() which creates a cql_server service. That function runs on shard 0, where it creates a cql_server_config object. This object is then used as an argument to start the cql_server service:

cserver->start(std::ref(cql3::get_query_processor()), std::ref(_auth_service), std::ref(_mnotifier), cql_server_config).get();

Calling distributed<cql_server>::start means that the configuration is copied to all shards and used as an argument for cql_server's constructor.

The commit mentioned at the beginning converted one of cql_server_config's parameters into an updateable_value. The updateable_value is an object which tracks the value of an updateable_value_source (which can be a config parameter, for example) and is updated with the source's value when it changes.

The problem is that the updateable_values sent to other shards are still associated with the source on shard 0. Because they are tracked through a std::vector of pointers which is a member of the source, this can cause unsynchronized cross-shard reads and writes, which is unsafe and may lead to strange bugs.

The implementation of updateable_value is definitely not safe to have cross-shard references like that - unlike what the comment above updateable_value's definition would suggest.
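The aliasing described above can be reproduced in a small single-threaded model. This is only a sketch under assumptions: `toy_source`, `toy_value`, `toy_config` and `refs_after_copying_to_shards` are hypothetical names, and the classes are simplified stand-ins, not Scylla's real updateable_value implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T> class toy_value;

// Simplified stand-in for updateable_value_source (hypothetical).
template <typename T>
class toy_source {
    T _value;
    std::vector<toy_value<T>*> _refs; // back-pointers to every tracking value
    friend class toy_value<T>;
public:
    explicit toy_source(T v) : _value(std::move(v)) {}
    toy_source(const toy_source&) = delete;
    toy_source& operator=(const toy_source&) = delete;
    void set(T v);
    std::size_t tracked() const { return _refs.size(); }
};

// Simplified stand-in for updateable_value (hypothetical).
template <typename T>
class toy_value {
    T _cached;
    toy_source<T>* _src;
    friend class toy_source<T>;
public:
    explicit toy_value(toy_source<T>& src) : _cached(src._value), _src(&src) {
        _src->_refs.push_back(this); // writes to the source's vector
    }
    // A copy registers itself with the SAME source as the original.
    toy_value(const toy_value& o) : _cached(o._cached), _src(o._src) {
        _src->_refs.push_back(this);
    }
    toy_value& operator=(const toy_value&) = delete;
    ~toy_value() {
        auto& r = _src->_refs;
        r.erase(std::remove(r.begin(), r.end(), this), r.end());
    }
    const T& get() const { return _cached; }
};

template <typename T>
void toy_source<T>::set(T v) {
    _value = std::move(v);
    for (auto* ref : _refs) {
        ref->_cached = _value; // writes into every registered value
    }
}

struct toy_config {
    toy_value<int> max_concurrent_requests;
};

// Builds a config on "shard 0", copies it once per shard, and returns how
// many values the single source ends up tracking.
std::size_t refs_after_copying_to_shards(std::size_t shards) {
    toy_source<int> src(100);
    toy_config cfg{toy_value<int>(src)};
    std::vector<toy_config> per_shard;
    per_shard.reserve(shards);
    for (std::size_t i = 0; i < shards; ++i) {
        per_shard.push_back(cfg); // copy registers with src again
    }
    return src.tracked();
}
```

In this model the single source tracks the original value plus one copy per shard. With real shards, each of those registrations, deregistrations and update writes would run on a different thread than the one owning the source, with nothing synchronizing access to the vector.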