Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport 5.4] Raft snapshot fixes #17123

Merged

Conversation

kbr-scylla
Copy link
Contributor

Backports required to fix #16683 in 5.4:

  • add an API to trigger Raft snapshot
  • use the API when we restart and see that the existing snapshot is at index 0, to trigger a new one --- in order to fix broken deployments that already bootstrapped with index-0 snapshot (we may get such deployments by upgrading from 5.2)

denesb and others added 4 commits February 2, 2024 12:35
This allows the user of `raft::server` to cause it to create a snapshot
and truncate the Raft log (leaving no trailing entries; in the future we
may extend the API to specify number of trailing entries left if
needed). In a later commit we'll add a REST endpoint to Scylla to
trigger group 0 snapshots.

One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb#16683.

In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7)
but a problem remains with these existing deployments coming from 5.2,
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).

Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).

The PR adds the API to `raft::server` and a HTTP endpoint that uses it.

In a follow-up PR, we plan to modify group 0 server startup logic to automatically
call this API if it sees that no snapshot is present yet (to automatically
fix the aforementioned 5.2 deployments once they upgrade.)

Closes scylladb#16816

* github.com:scylladb/scylladb:
  raft: remove `empty()` from `fsm_output`
  test: add test for manual triggering of Raft snapshots
  api: add HTTP endpoint to trigger Raft snapshots
  raft: server: add `trigger_snapshot` API
  raft: server: track last persisted snapshot descriptor index
  raft: server: framework for handling server requests
  raft: server: inline `poll_fsm_output`
  raft: server: fix indentation
  raft: server: move `io_fiber`'s processing of `batch` to a separate function
  raft: move `poll_output()` from `fsm` to `server`
  raft: move `_sm_events` from `fsm` to `server`
  raft: fsm: remove constructor used only in tests
  raft: fsm: move trace message from `poll_output` to `has_output`
  raft: fsm: extract `has_output()`
  raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor`
  raft: server: pass `*_aborted` to `set_exception` call

(cherry picked from commit d202d32)

Backport note: the HTTP API is only started if raft_group_registry is
started.
… from Kamil Braun

The persisted snapshot index may be 0 if the snapshot was created in
older version of Scylla, which means snapshot transfer won't be
triggered to a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade upgrade procedure, on a cluster with existing schema. So a
deployment with index=0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that.

Also add a test.

Fixes scylladb#16683

Closes scylladb#17072

* github.com:scylladb/scylladb:
  test: add test for fixing a broken group 0 snapshot
  raft_group0: trigger snapshot if existing snapshot index is 0

(cherry picked from commit 181f68f)
Add workaround for scylladb/python-driver#295.

Also an assert made at the end of the test was false, it is fixed with
appropriate comment added.

(cherry picked from commit 74bf60a)
At the end of the test, we wait until a restarted node receives a
snapshot from the leader, and then verify that the log has been
truncated.

To check the snapshot, the test used the `system.raft_snapshots` table,
while the log is stored in `system.raft`.

Unfortunately, the two tables are not updated atomically when Raft
persists a snapshot (scylladb#9603). We first update
`system.raft_snapshots`, then `system.raft` (see
`raft_sys_table_storage::store_snapshot_descriptor`). So after the wait
finishes, there's no guarantee the log has been truncated yet -- there's
a race between the test's last check and Scylla doing that last delete.

But we can check the snapshot using `system.raft` instead of
`system.raft_snapshots`, as `system.raft` has the latest ID. And since
1640f83, storing that ID and truncating
the log in `system.raft` happens atomically.

Closes scylladb#17106

(cherry picked from commit c911bf1)
@kbr-scylla
Copy link
Contributor Author

Fixed some imports in backported test

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Sanity Tests
✅ - Unit Tests

Build Details:

  • Duration: 2 hr 35 min
  • Builder: spider6.cloudius-systems.com

@mykaul mykaul added this to the 5.4.2 milestone Feb 4, 2024
@kbr-scylla
Copy link
Contributor Author

@gleb-cloudius please also approve this one (it's much simpler than the 5.2 one)

@kbr-scylla
Copy link
Contributor Author

@denesb can you please look at this too?

@kbr-scylla
Copy link
Contributor Author

or @tgrabiec

@dani-tweig dani-tweig modified the milestones: 5.4.2, 5.4.3 Feb 6, 2024
@kbr-scylla kbr-scylla added the Field-Tier1 Urgent issues as per FieldEngineering request label Feb 6, 2024
@kbr-scylla
Copy link
Contributor Author

@scylladb/scylla-maint please review/merge

@denesb
Copy link
Contributor

denesb commented Feb 7, 2024

@scylladb/scylla-maint please review/merge

You are allowed to merge your own backports.

@denesb
Copy link
Contributor

denesb commented Feb 7, 2024

Queued.

@kbr-scylla
Copy link
Contributor Author

Hm, you're right... Thanks.

@scylladb-promoter scylladb-promoter merged commit 8080c15 into scylladb:branch-5.4 Feb 7, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/raft Field-Tier1 Urgent issues as per FieldEngineering request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bootstrap fails due to inconsistent schema after group0 catch up when table was recreated in the past
6 participants