[Backport 5.4] Raft snapshot fixes #17123

kbr-scylla · 2024-02-02T08:46:12Z

Backports required to fix #16683 in 5.4:

add an API to trigger Raft snapshot
use the API when we restart and see that the existing snapshot is at index 0, to trigger a new one --- in order to fix broken deployments that already bootstrapped with index-0 snapshot (we may get such deployments by upgrading from 5.2)

scylladb-promoter · 2024-02-02T11:11:13Z

🔴 CI State: FAILURE

✅ - Build
✅ - Sanity Tests
❌ - Unit Tests

Failed Tests (36/21125):

Build Details:

Duration: 2 hr 24 min
Builder: spider3.cloudius-systems.com

This allows the user of `raft::server` to cause it to create a snapshot and truncate the Raft log (leaving no trailing entries; in the future we may extend the API to specify number of trailing entries left if needed). In a later commit we'll add a REST endpoint to Scylla to trigger group 0 snapshots. One use case for this API is to create group 0 snapshots in Scylla deployments which upgraded to Raft in version 5.2 and started with an empty Raft log with no snapshot at the beginning. This causes problems, e.g. when a new node bootstraps to the cluster, it will not receive a snapshot that would contain both schema and group 0 history, which would then lead to inconsistent schema state and trigger assertion failures as observed in scylladb#16683. In 5.4 the logic of initial group 0 setup was changed to start the Raft log with a snapshot at index 1 (ff386e7) but a problem remains with these existing deployments coming from 5.2, we need a way to trigger a snapshot in them (other than performing 1000 arbitrary schema changes). Another potential use case in the future would be to trigger snapshots based on external memory pressure in tablet Raft groups (for strongly consistent tables). The PR adds the API to `raft::server` and a HTTP endpoint that uses it. In a follow-up PR, we plan to modify group 0 server startup logic to automatically call this API if it sees that no snapshot is present yet (to automatically fix the aforementioned 5.2 deployments once they upgrade.) Closes scylladb#16816 * github.com:scylladb/scylladb: raft: remove `empty()` from `fsm_output` test: add test for manual triggering of Raft snapshots api: add HTTP endpoint to trigger Raft snapshots raft: server: add `trigger_snapshot` API raft: server: track last persisted snapshot descriptor index raft: server: framework for handling server requests raft: server: inline `poll_fsm_output` raft: server: fix indentation raft: server: move `io_fiber`'s processing of `batch` to a separate function raft: move `poll_output()` from `fsm` to `server` raft: move `_sm_events` from `fsm` to `server` raft: fsm: remove constructor used only in tests raft: fsm: move trace message from `poll_output` to `has_output` raft: fsm: extract `has_output()` raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor` raft: server: pass `*_aborted` to `set_exception` call (cherry picked from commit d202d32) Backport note: the HTTP API is only started if raft_group_registry is started.

… from Kamil Braun The persisted snapshot index may be 0 if the snapshot was created in older version of Scylla, which means snapshot transfer won't be triggered to a bootstrapping node. Commands present in the log may not cover all schema changes --- group 0 might have been created through the upgrade upgrade procedure, on a cluster with existing schema. So a deployment with index=0 snapshot is broken and we need to fix it. We can use the new `raft::server::trigger_snapshot` API for that. Also add a test. Fixes scylladb#16683 Closes scylladb#17072 * github.com:scylladb/scylladb: test: add test for fixing a broken group 0 snapshot raft_group0: trigger snapshot if existing snapshot index is 0 (cherry picked from commit 181f68f)

Add workaround for scylladb/python-driver#295. Also an assert made at the end of the test was false, it is fixed with appropriate comment added. (cherry picked from commit 74bf60a)

At the end of the test, we wait until a restarted node receives a snapshot from the leader, and then verify that the log has been truncated. To check the snapshot, the test used the `system.raft_snapshots` table, while the log is stored in `system.raft`. Unfortunately, the two tables are not updated atomically when Raft persists a snapshot (scylladb#9603). We first update `system.raft_snapshots`, then `system.raft` (see `raft_sys_table_storage::store_snapshot_descriptor`). So after the wait finishes, there's no guarantee the log has been truncated yet -- there's a race between the test's last check and Scylla doing that last delete. But we can check the snapshot using `system.raft` instead of `system.raft_snapshots`, as `system.raft` has the latest ID. And since 1640f83, storing that ID and truncating the log in `system.raft` happens atomically. Closes scylladb#17106 (cherry picked from commit c911bf1)

kbr-scylla · 2024-02-02T12:03:12Z

Fixed some imports in backported test

scylladb-promoter · 2024-02-02T14:39:33Z

🟢 CI State: SUCCESS

✅ - Build
✅ - Sanity Tests
✅ - Unit Tests

Build Details:

Duration: 2 hr 35 min
Builder: spider6.cloudius-systems.com

kbr-scylla · 2024-02-05T09:03:32Z

@gleb-cloudius please also approve this one (it's much simpler than the 5.2 one)

kbr-scylla · 2024-02-05T09:14:13Z

@denesb can you please look at this too?

kbr-scylla · 2024-02-05T15:08:15Z

or @tgrabiec

kbr-scylla · 2024-02-07T09:34:24Z

@scylladb/scylla-maint please review/merge

denesb · 2024-02-07T09:52:08Z

@scylladb/scylla-maint please review/merge

You are allowed to merge your own backports.

denesb · 2024-02-07T09:53:16Z

Queued.

kbr-scylla · 2024-02-07T10:23:23Z

Hm, you're right... Thanks.

kbr-scylla requested a review from gleb-cloudius February 2, 2024 08:46

kbr-scylla requested review from nyh, tgrabiec and vladzcloudius as code owners February 2, 2024 08:46

kbr-scylla changed the base branch from master to branch-5.4 February 2, 2024 08:46

kbr-scylla mentioned this pull request Feb 2, 2024

Bootstrap fails due to inconsistent schema after group0 catch up when table was recreated in the past #16683

Closed

denesb and others added 4 commits February 2, 2024 12:35

test_raft_snapshot_request: fix flakiness

6a6a4fd

Add workaround for scylladb/python-driver#295. Also an assert made at the end of the test was false, it is fixed with appropriate comment added. (cherry picked from commit 74bf60a)

kbr-scylla force-pushed the backport-snapshot-fixes-5.4 branch from 1adfaff to 311e31b Compare February 2, 2024 12:02

mykaul added the area/raft label Feb 4, 2024

mykaul added this to the 5.4.2 milestone Feb 4, 2024

gleb-cloudius approved these changes Feb 5, 2024

View reviewed changes

dani-tweig modified the milestones: 5.4.2, 5.4.3 Feb 6, 2024

kbr-scylla added the Field-Tier1 Urgent issues as per FieldEngineering request label Feb 6, 2024

scylladb-promoter merged commit 8080c15 into scylladb:branch-5.4 Feb 7, 2024
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Backport 5.4] Raft snapshot fixes #17123

[Backport 5.4] Raft snapshot fixes #17123

kbr-scylla commented Feb 2, 2024

scylladb-promoter commented Feb 2, 2024

kbr-scylla commented Feb 2, 2024

scylladb-promoter commented Feb 2, 2024

kbr-scylla commented Feb 5, 2024

kbr-scylla commented Feb 5, 2024

kbr-scylla commented Feb 5, 2024

kbr-scylla commented Feb 7, 2024

denesb commented Feb 7, 2024

denesb commented Feb 7, 2024

kbr-scylla commented Feb 7, 2024

[Backport 5.4] Raft snapshot fixes #17123

[Backport 5.4] Raft snapshot fixes #17123

Conversation

kbr-scylla commented Feb 2, 2024

scylladb-promoter commented Feb 2, 2024

🔴 CI State: FAILURE

Failed Tests (36/21125):

Build Details:

kbr-scylla commented Feb 2, 2024

scylladb-promoter commented Feb 2, 2024

🟢 CI State: SUCCESS

Build Details:

kbr-scylla commented Feb 5, 2024

kbr-scylla commented Feb 5, 2024

kbr-scylla commented Feb 5, 2024

kbr-scylla commented Feb 7, 2024

denesb commented Feb 7, 2024

denesb commented Feb 7, 2024

kbr-scylla commented Feb 7, 2024