Merged
Summary
This PR expands single-machine raft failure simulation coverage around transport-style faults and hardens one leader-side reply handling edge case that the new tests exposed.
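One way to express the hardened reply handling is to clamp the leader's per-follower bookkeeping. The following is a minimal Zig sketch, not the code from this PR. The `Progress` struct, its method names, and the follower-supplied retry `hint` are all hypothetical. A successful reply only ever moves `match_index` and `next_index` forward. A rejection steps `next_index` back, but never to or below `match_index`.

```zig
const std = @import("std");

// Hypothetical per-follower progress tracked by the leader. Field and method
// names are illustrative; they are not the project's actual API.
// Invariant assumed by this sketch: next_index >= match_index + 1 >= 1 (1-based log).
const Progress = struct {
    next_index: u64, // next log index the leader will send
    match_index: u64, // highest index known to be replicated on the follower

    /// Successful append_entries_reply acknowledging entries up to `acked_index`.
    /// Taking the maximum means a stale or reordered reply cannot regress state.
    fn onSuccess(self: *Progress, acked_index: u64) void {
        self.match_index = @max(self.match_index, acked_index);
        self.next_index = @max(self.next_index, self.match_index + 1);
    }

    /// Rejected append_entries_reply. `hint` is a hypothetical follower-supplied
    /// retry index. Backtracking is clamped to just above the matched prefix.
    fn onReject(self: *Progress, hint: u64) void {
        const lowest_allowed = self.match_index + 1;
        // Step back by one entry, or jump further if the hint is lower.
        const candidate = @min(hint, self.next_index - 1);
        self.next_index = @max(candidate, lowest_allowed);
    }
};

test "late rejection cannot pull next_index below the matched prefix" {
    var p = Progress{ .next_index = 10, .match_index = 5 };
    p.onReject(2); // would jump to 2 without the clamp
    try std.testing.expectEqual(@as(u64, 6), p.next_index);
}
```

Clamping at `match_index + 1` is safe because everything up to `match_index` has already been acknowledged by the follower. Resending those entries cannot repair a later inconsistency.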
What Changed
- `append_entries_reply` not regressing `next_index`
- `append_entries_reply` backtracking only above the matched prefix
- `install_snapshot_reply` forcing leader stepdown
- `append_entries_reply` cannot backtrack below the follower's known matched prefix
- `append_entries_reply` recovering on heartbeat retry without duplicating log entries
- `append_entries` delivery being idempotent
- `install_snapshot_reply` recovering on retry without regressing follower state
- `install_snapshot` delivery being idempotent
- `request_vote_reply` messages recovering on the next election round

Why
The earlier simulation work covered dropped delivery and restart continuity, but it still left important transport-fault classes under-tested:
These tests improve confidence in raft behavior on a single machine without needing real distributed fault injection.
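The single-machine approach can be illustrated with a small in-process model. This Zig sketch is hypothetical: the `Log` type, `Fault` enum, and `roundTrip` helper are not the project's simulator. Faults are injected deterministically around a direct function call instead of a network. The follower skips entries it already holds with a matching term, so a dropped reply followed by a duplicated heartbeat retry still leaves exactly one copy of each entry.

```zig
const std = @import("std");

// Hypothetical follower log used only for this sketch: terms[i] is the term of
// entry i (1-based), slot 0 is the empty-prefix sentinel, capacity is fixed.
const Log = struct {
    terms: [64]u64 = [_]u64{0} ** 64,
    len: usize = 0, // index of the last entry

    /// Apply an append_entries request; false means the consistency check failed.
    /// Entries already present with a matching term are skipped, so applying the
    /// same request twice leaves the log unchanged the second time.
    fn appendEntries(self: *Log, prev_index: usize, prev_term: u64, entries: []const u64) bool {
        if (prev_index > self.len or self.terms[prev_index] != prev_term) return false;
        for (entries, 0..) |term, i| {
            const index = prev_index + 1 + i;
            if (index <= self.len) {
                if (self.terms[index] == term) continue; // duplicate: keep as-is
                self.len = index - 1; // conflicting suffix: truncate it
            }
            self.terms[index] = term;
            self.len = index;
        }
        return true;
    }
};

// Hypothetical fault applied to one request/reply round trip in the simulator.
const Fault = enum { none, drop_reply, duplicate_request };

/// Run one append_entries round trip in-process. Returns true only if the
/// leader would observe a successful reply.
fn roundTrip(follower: *Log, fault: Fault, prev_index: usize, prev_term: u64, entries: []const u64) bool {
    const ok = follower.appendEntries(prev_index, prev_term, entries);
    if (fault == .duplicate_request) {
        _ = follower.appendEntries(prev_index, prev_term, entries);
    }
    return ok and fault != .drop_reply;
}

test "dropped reply then duplicated retry stores each entry once" {
    var f = Log{};
    const batch = [_]u64{ 1, 1, 2 };
    // First attempt reaches the follower, but the leader never sees the reply.
    try std.testing.expect(!roundTrip(&f, .drop_reply, 0, 0, &batch));
    // Heartbeat retry resends the same request, and the transport duplicates it.
    try std.testing.expect(roundTrip(&f, .duplicate_request, 0, 0, &batch));
    try std.testing.expectEqual(@as(usize, 3), f.len);
}
```

Only a conflicting entry triggers truncation. A late copy of an older request therefore cannot erase entries that a newer request already appended.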
Validation
All test runs were executed serially with `YOQ_SKIP_SLOW_TESTS=1`:
- `zig build test -Doptimize=ReleaseSafe -Dtest-filter='append_entries_reply'`
- `zig build test -Doptimize=ReleaseSafe -Dtest-filter='leader steps down on higher term in install_snapshot_reply'`
- `zig build test-sim -Doptimize=ReleaseSafe -Dtest-filter='dropped append_entries replies recover on heartbeat retry without duplicating log entries'`
- `zig build test-sim -Doptimize=ReleaseSafe -Dtest-filter='duplicate append_entries delivery is idempotent'`
- `zig build test-sim -Doptimize=ReleaseSafe -Dtest-filter='dropped install_snapshot reply recovers on retry without regressing follower state'`
- `zig build test-sim -Doptimize=ReleaseSafe -Dtest-filter='duplicate install_snapshot delivery is idempotent'`
- `zig build test-sim -Doptimize=ReleaseSafe -Dtest-filter='dropped request_vote replies recover on the next election round'`
- `zig build test-sim -Doptimize=ReleaseSafe -Dtest-filter='5-node mixed transport faults still commit with quorum and repair laggards later'`
- `zig build test-sim -Doptimize=ReleaseSafe -Dtest-filter='snapshot retry after follower restart resumes replication cleanly'`
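Two of the properties exercised by the snapshot filters above can be sketched the same way. In this hypothetical Zig example, the `Follower` and `Leader` types are illustrative and not the project's. A follower ignores an `install_snapshot` that does not extend past its committed prefix, so a duplicate or a late retry cannot regress its state. The leader steps down when a reply carries a higher term.

```zig
const std = @import("std");

// Hypothetical follower snapshot state; names are illustrative only.
const Follower = struct {
    current_term: u64 = 0,
    snapshot_index: u64 = 0,
    snapshot_term: u64 = 0,
    commit_index: u64 = 0,

    /// Handle install_snapshot and return the term to put in the reply.
    /// A snapshot that does not extend past the committed prefix (a duplicate,
    /// or a late retry of an older one) is acknowledged without changing state.
    fn installSnapshot(self: *Follower, term: u64, last_index: u64, last_term: u64) u64 {
        if (term < self.current_term) return self.current_term; // stale leader
        self.current_term = term;
        if (last_index <= self.commit_index) return self.current_term;
        self.snapshot_index = last_index;
        self.snapshot_term = last_term;
        self.commit_index = last_index;
        return self.current_term;
    }
};

// Hypothetical leader state: a reply carrying a higher term forces stepdown.
const Leader = struct {
    term: u64,
    is_leader: bool = true,

    fn onInstallSnapshotReply(self: *Leader, reply_term: u64) void {
        if (reply_term > self.term) {
            self.term = reply_term;
            self.is_leader = false;
        }
    }
};

test "duplicate snapshot is a no-op and a higher reply term forces stepdown" {
    var f = Follower{};
    _ = f.installSnapshot(2, 10, 2);
    _ = f.installSnapshot(2, 10, 2); // duplicate delivery
    try std.testing.expectEqual(@as(u64, 10), f.commit_index);

    var l = Leader{ .term = 1 };
    l.onInstallSnapshotReply(f.installSnapshot(1, 10, 1)); // follower replies with term 2
    try std.testing.expect(!l.is_leader);
}
```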