-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System.peers: enforce host_id #16376
System.peers: enforce host_id #16376
Conversation
🟢 CI State: SUCCESS✅ - Build Build Details:
|
7965a1a
to
de4a404
Compare
In v2 (de4a404):
|
🔴 CI State: FAILURE✅ - Build Failed Tests (6/23439):
Build Details:
|
de4a404
to
1636d77
Compare
In v3 (1636d77):
|
🟢 CI State: SUCCESS✅ - Build Build Details:
|
How does this work during upgrade?
|
cql3/query_processor.cc
Outdated
future<::shared_ptr<untyped_result_set>> | ||
query_processor::execute_with_params( | ||
statements::prepared_statement::checked_weak_ptr p, | ||
db::consistency_level cl, | ||
service::query_state& query_state, | ||
const std::initializer_list<data_value>& values) { | ||
const Range& values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be done more cleanly by having a data_value_list
type that's a wrapper over std::vector<raw_value>
. Give that two non-explicit constructors - from std::initializer_list<data_value>
and your range. Then execute_with_params() passes data_value_list
to make_internal_options. This limits the templating to a small area.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds promising. I'll try that
types/types.hh
Outdated
|
||
class data_values_map { | ||
std::unordered_map<std::string, data_value> _map; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why std::string and not sstring?
Why a wrapper over a map? Can use the map type directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why sstring? I thought we're trying to stick with the standard when there's no special reason to use sstring (like the uninitialized construct case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a wrapper over a map? Can use the map type directly.
To make the update case simpler.
With bare map m["foo"] = "bar"
doesn't work because data_value can't be constructed this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unordered_map::insert_or_assign()
If you're solving a common problem, look for a common solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why sstring? I thought we're trying to stick with the standard when there's no special reason to use sstring (like the uninitialized construct case)
We're using sstring over std::string to avoid conversions when we do need sstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the main argument for sstring
over std::string
is performance, since the former doesn't have to be thread safe. Isn't this the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recent (C++11) std::string is also not using copy-on-write. But it's still slower than sstring (sstring doesn't have a separate capacity word).
So we use sstring where performance is important, but also generally to avoid conversions.
types/types.hh
Outdated
return true; | ||
} | ||
it->second = std::move(value); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this operator[]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish, but the compiler is playing hard to get (or actually hard to set in this case :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's because the value is not default constructible.
There's insert_or_assign() though.
db/system_keyspace.cc
Outdated
|
||
auto req = fmt::format("INSERT INTO system.{} ({}) VALUES ({})", PEERS, fmt::join(column_names, ","), values_ph_str); | ||
slogger.debug("INSERT INTO system.{} ({}) VALUES ({})", PEERS, fmt::join(column_names, ","), fmt::join(data_values, ",")); | ||
co_await execute_cql_query(req, data_values).discard_result(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's generally bad practice to generate CQL on the fly. CQL represents updates to structured data.
If you need to merge two updates, use BATCH.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would involve batchlog, wouldn't it?
The whole point of this series is to update system.peers with a single commitlog write.
Would we lose that if we use a batch insert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, single partition batch doesn't touch batchlog. It's just a way to generate a mutation.
insert_string("supported_features", value); | ||
break; | ||
default: | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use BATCH, or the ugly UNSET feature thing.
service/storage_service.cc
Outdated
info.update("rack", rs.rack); | ||
info.update("host_id", id.uuid()); | ||
info.update("release_version", rs.release_version); | ||
co_await _sys_ks.local().update_peer_info(ip, std::move(info)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a single fixed CQL statement (apart from merging the table name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true
We can pass CQL to uodate_peer_info rather than a column/value map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing CQL strings is also not the cleanest.
future<std::unordered_map<gms::inet_address, std::unordered_set<dht::token>>> system_keyspace::load_tokens() { | ||
sstring req = format("SELECT peer, tokens FROM system.{}", PEERS); | ||
sstring req = format("SELECT peer, host_id, tokens FROM system.{}", PEERS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this handle mixed clusters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All supported versions already store host_id in system.peers. It shouldn't be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When did we start writing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4311662 called update_peer_info
from storage_service::handle_state_normal
back in 0.10
.
Note the use of gossiper::uses_host_id
which was true from inception (See 6e83e54 where I got rid of it)
The pretty-much initial implementation of prepare_to_join
at efe067f already populated the HOST_ID
in the endpoint state, and it is sent in gossip also basically from inception (eed2795).
e86d39f moved the call to update_peer_info
a bit later in handle_state_normal
, when the node becomes a normal token owner (in branch 4.6)
but it doesn't matter since we've wrote the host_id to system.peers even before that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so no compatibility issues.
We're already populating the host_id in system.peers in the normal path. This series just enforces it. |
Since which version? |
See #16376 (comment) |
Please note it in the cover letter. |
1636d77
to
0e20709
Compare
When adding a peer via update_peer_info, insert all columns in a single query using system_keyspace::peer_info. This ensures that `host_id` is inserted along with all other app states, so we can rely on it when loading the peer info after restart. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is unused now after the previous patch to update_peer_info in one call. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Skip rows that have no host_id to make sure the node state we load always has a valid host_id. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
None of the subscribers is doing anything before_change. This is done before changing `on_change` in the following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than calling on_change for each particular application_state, pass an endpoint_state::map_type with all changed states, to be processed as a batch. In particular, thise allows storage_service::on_change to update_peer_info once for all changed states. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is no longer used after previous patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
They are no longer used. Instead, all callers now pass peer_info. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
State changes are processed as a batch and there is no reason to maintain them as an ordered map. Instead, use a std::unordered_map that is more efficient. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
a6cff77
to
cdd5605
Compare
In v6 (cdd5605):
|
std::visit(overloaded_functor { | ||
[&] (const data_value& v) { | ||
if (v.type() == bytes_type) { | ||
bound_values.values.emplace_back(cql3::raw_value::make_value(value_cast<bytes>(v))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this special case is probably a pessimization, since it forces linearization. It's not critical since I don't expect huge values here, but it would be nice to remove it, separately.
} | ||
}; | ||
|
||
using data_value_list = std::initializer_list<data_value_or_unset>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm impressed the compiler is able to implicitly convert through so many layers.
}, | ||
[&ctx] (const unset_value& u) { | ||
return fmt::format_to(ctx.out(), "{}", u); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two could have been merged ([&] (const auto& v) { ... }). But no matter.
🟢 CI State: SUCCESS✅ - Build Build Details:
|
@@ -19,7 +19,7 @@ | |||
|
|||
namespace gms { | |||
|
|||
using application_state_map = std::map<application_state, versioned_value>; | |||
using application_state_map = std::unordered_map<application_state, versioned_value>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if idl supports it.
It's a matter of just copy-pasting (or templatizing) this std::map serializer. There is already absl::btree_set
support there.
Note: there is no compatibility issue here since std::map
and std::unordered_map
are serialised/deserialized in the same way. The same could be true for absl::flat_hash_map
.
@@ -763,7 +763,7 @@ std::pair<std::reference_wrapper<struct query_processor::remote>, gate::holder> | |||
|
|||
query_options query_processor::make_internal_options( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: commit message:
data_value_list is a wrapper around std::initializer_list<data_value>.
doesn't correspond to the code (data_value_list
is defined to be equal to std::initializer_list<data_value>;
in this commit)
@@ -771,27 +771,37 @@ query_options query_processor::make_internal_options( | |||
format("Invalid number of values. Expecting {:d} but got {:d}", p->bound_names.size(), values.size())); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: commit message:
Add overloads for execute_internal and friends
accepting a vector of optional<data_value>.
doesn't correspond to the code
@@ -485,8 +486,9 @@ future<> storage_service::topology_state_load() { | |||
if (rs.ring.has_value()) { | |||
if (!is_me(ip)) { | |||
// Save ip -> id mapping in peers table because we need it on restart, but do not save tokens until owned | |||
co_await _sys_ks.local().update_tokens(ip, {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we lost co_await _sys_ks.local().update_tokens(ip, {});
here. I'm not sure if it's important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if no tokens are stored yet' then storing an empty set is equivalent (yet I think it's better to store no tokens than to store an empty set).
If there were tokens than storing an empty set here is definitely wrong. The mode should be decommissioned or removed to lose its tokens.
It's really out of scope for this PR. |
When loading endpoint_state from system.peers, pass the loaded nodes dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following scylladb#16376. The reason to do so is that the other states: tokens, dc, and rack are useless with the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers, pass the loaded nodes dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following scylladb#16376. The reason to do so is that the other states: tokens, dc, and rack are useless with the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers, pass the loaded nodes dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following scylladb#16376. The reason to do so is that the other states: tokens, dc, and rack are useless with the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers, pass the loaded nodes dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following scylladb#16376. The reason to do so is that the other states: tokens, dc, and rack are useless with the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers, pass the loaded nodes dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following scylladb#16376. The reason to do so is that the other states: tokens, dc, and rack are useless with the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers, pass the loaded nodes dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following scylladb#16376. The reason to do so is that the other states: tokens, dc, and rack are useless with the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers, pass the loaded nodes dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following scylladb#16376. The reason to do so is that the other states: tokens, dc, and rack are useless with the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers, pass the loaded nodes dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following scylladb#16376. The reason to do so is that the other states: tokens, dc, and rack are useless with the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers, pass the loaded nodes dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following scylladb#16376. The reason to do so is that the other states: tokens, dc, and rack are useless with the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers, pass the loaded nodes dc/rack info from storage_service::join_token_ring to gossiper::add_saved_endpoint. Load the endpoint DC/RACK information to the endpoint_state, if available so they can propagate to bootstrapping nodes via gossip, even if those nodes are DOWN after a full cluster-restart. Note that this change makes the host_id presence mandatory following scylladb#16376. The reason to do so is that the other states: tokens, dc, and rack are useless with the host_id. This change is backward compatible since the HOST_ID application state was written to system.peers since inception in scylla and it would be missing only due to potential exception in older versions that failed to write it. In this case, manual intervention is needed and the correct HOST_ID needs to be manually updated in system.peers. Refs scylladb#15787 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The HOST_ID is already written to system.peers since inception pretty much (See #16376 (comment) for details).
However, it is written to the table using an individual CQL query and so it is not set atomically with other columns.
If scylla crashes or even hits an exception before updating the host_id, then system.peers might be left in an inconsistent state, and in particular without no HOST_ID value.
This series makes sure that HOST_ID is written to system.peers and use it to "seal" the record by upserting it in a single CQL BATCH query when adding the state for new nodes.
On the read side, skip rows that have no HOST_ID state in system.peers, assuming they are incomplete, i.e. scylla got an exception or crashed while writing them, so they can't be trusted.
With that change we can assume that endpoint state loaded from system.peers will always have a valid host_id.
Refs #15903