Skip to content

Commit

Permalink
Merge 'raft_group0: trigger snapshot if existing snapshot index is 0'…
Browse files Browse the repository at this point in the history
… from Kamil Braun

The persisted snapshot index may be 0 if the snapshot was created in
older version of Scylla, which means snapshot transfer won't be
triggered to a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade upgrade procedure, on a cluster with existing schema. So a
deployment with index=0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that.

Also add a test.

Fixes scylladb#16683

Closes scylladb#17072

* github.com:scylladb/scylladb:
  test: add test for fixing a broken group 0 snapshot
  raft_group0: trigger snapshot if existing snapshot index is 0

(cherry picked from commit 181f68f)

Backport node: test_raft_fix_broken_snapshot had to be removed because
the "error injections enabled at startup" feature does not yet exist in
5.2.
  • Loading branch information
denesb authored and kbr-scylla committed Jan 31, 2024
1 parent 95009f4 commit c254ef0
Show file tree
Hide file tree
Showing 2 changed files with 23 additions and 1 deletion.
20 changes: 19 additions & 1 deletion service/raft/raft_group0.cc
Original file line number Diff line number Diff line change
Expand Up @@ -381,8 +381,26 @@ future<> raft_group0::start_server_for_group0(raft::group_id group0_id) {
// we ensure we haven't missed any IP update in the map.
load_initial_raft_address_map();
group0_log.info("Server {} is starting group 0 with id {}", my_id, group0_id);
co_await _raft_gr.start_server_for_group(create_server_for_group0(group0_id, my_id));
auto srv_for_group0 = create_server_for_group0(group0_id, my_id);
auto& persistence = srv_for_group0.persistence;
auto& server = *srv_for_group0.server;
co_await _raft_gr.start_server_for_group(std::move(srv_for_group0));
_group0.emplace<raft::group_id>(group0_id);

// Fix for scylladb/scylladb#16683:
// If the snapshot index is 0, trigger creation of a new snapshot
// so bootstrapping nodes will receive a snapshot transfer.
auto snap = co_await persistence.load_snapshot_descriptor();
if (snap.idx == raft::index_t{0}) {
group0_log.info("Detected snapshot with index=0, id={}, triggering new snapshot", snap.id);
bool created = co_await server.trigger_snapshot(&_abort_source);
if (created) {
snap = co_await persistence.load_snapshot_descriptor();
group0_log.info("New snapshot created, index={} id={}", snap.idx, snap.id);
} else {
group0_log.warn("Could not create new snapshot, there are no entries applied");
}
}
}

future<> raft_group0::join_group0(std::vector<gms::inet_address> seeds, bool as_voter) {
Expand Down
4 changes: 4 additions & 0 deletions service/raft/raft_sys_table_storage.cc
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
#include "cql3/untyped_result_set.hh"
#include "db/system_keyspace.hh"
#include "utils/UUID.hh"
#include "utils/error_injection.hh"

#include "serializer.hh"
#include "idl/raft_storage.dist.hh"
Expand Down Expand Up @@ -303,6 +304,9 @@ future<> raft_sys_table_storage::execute_with_linearization_point(std::function<

future<> raft_sys_table_storage::bootstrap(raft::configuration initial_configuation, bool nontrivial_snapshot) {
auto init_index = nontrivial_snapshot ? raft::index_t{1} : raft::index_t{0};
utils::get_local_injector().inject("raft_sys_table_storage::bootstrap/init_index_0", [&init_index] {
init_index = raft::index_t{0};
});
raft::snapshot_descriptor snapshot{.idx{init_index}};
snapshot.id = raft::snapshot_id::create_random_id();
snapshot.config = std::move(initial_configuation);
Expand Down

0 comments on commit c254ef0

Please sign in to comment.