
test: preserve lease primitives across replication and failover #94

Merged
skel84 merged 1 commit into main from issue-87-m9-replication-lease
Mar 19, 2026


skel84 (Owner) commented Mar 19, 2026

Summary

Validation

  • cargo test -p allocdb-node replicated_simulation -- --nocapture
  • cargo test -p allocdb-node replica -- --nocapture
  • ./scripts/preflight.sh

Closes #87

coderabbitai bot commented Mar 19, 2026

Warning

Rate limit exceeded

@skel84 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 9 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 999d43e1-6b3e-4b44-81a5-502afc285f09

📥 Commits

Reviewing files that changed from the base of the PR and between 6f04a5a and 77937b4.

📒 Files selected for processing (2)
  • crates/allocdb-node/src/replicated_simulation_tests.rs
  • docs/status.md

Walkthrough

The PR adds bundle reservation test infrastructure and a new failover simulation test that verifies bundle membership survives quorum replication and a view change, extends the revoke failover test to reject stale-holder releases, and updates the documentation to mark M9-T09 complete and shift the active focus to M9-T10.

Changes

  • Bundle Failover Simulation Tests (crates/allocdb-node/src/replicated_simulation_tests.rs): Introduced bundle_core_config() to configure the max bundle size; added the request-encoding helpers reserve_bundle_payload() and release_payload(); added replica_reservation_member_ids() to read reservation state; introduced the new test committed_bundle_membership_survives_failover_and_suffix_rejoin(), which verifies bundle reservation and membership preservation across a primary crash, view change, and rejoin replay; extended committed_revoke_stays_non_reusable_across_failover_until_reclaim() to assert that a stale-holder release yields ResultCode::StaleEpoch.
  • Project Status Documentation (docs/status.md): Updated the "Current Focus" section to reflect M9-T09 completion and shift the active implementation focus to issue #87 / M9-T10, reframing validation targets toward replicated-path test coverage (replicated_simulation, replica, and preflight.sh).
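The failover scenario summarized above can be modeled in miniature: every replica applies the same committed log strictly in order, so a promoted backup already holds the bundle membership, and a replica that missed the entry only needs to replay the missing suffix when it rejoins. A minimal sketch, assuming a toy `Replica` type and `Cmd` enum that are hypothetical and are not allocdb's `Command` or replicated_simulation harness API:

```rust
use std::collections::BTreeMap;

/// Hypothetical committed-log entry (not allocdb's `Command` enum).
#[derive(Clone, Debug)]
enum Cmd {
    ReserveBundle { reservation: u64, members: Vec<u64> },
    Release { reservation: u64 },
}

/// Toy replica that applies committed log entries strictly in order.
#[derive(Default, Clone, Debug, PartialEq)]
struct Replica {
    applied: usize,             // number of log entries applied so far
    holder: BTreeMap<u64, u64>, // resource id -> holding reservation id
}

impl Replica {
    /// Applies every entry in `log[self.applied..commit]`, so a rejoining
    /// replica replays only the suffix it has not applied yet.
    fn apply_through(&mut self, log: &[Cmd], commit: usize) {
        while self.applied < commit {
            match &log[self.applied] {
                Cmd::ReserveBundle { reservation, members } => {
                    for member in members {
                        self.holder.insert(*member, *reservation);
                    }
                }
                Cmd::Release { reservation } => {
                    self.holder.retain(|_, held_by| *held_by != *reservation);
                }
            }
            self.applied += 1;
        }
    }

    /// Member resource ids held by `reservation`, sorted by resource id.
    fn members_of(&self, reservation: u64) -> Vec<u64> {
        self.holder
            .iter()
            .filter(|(_, held_by)| **held_by == reservation)
            .map(|(resource, _)| *resource)
            .collect()
    }
}
```

Because both the promoted replica and the rejoined one derive state only from the committed prefix, their bundle memberships must match; the real test checks this property against the actual harness rather than a toy model.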

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • PR #89: Introduces Command::ReserveBundle, bundle encoding/decoding, and max_bundle_size config that the new tests directly exercise.
  • PR #92: Extends the same test file with revoke/reclaim payloads and directly touches the test being modified in this PR.
  • PR #61: Introduces the replicated simulation harness and test infrastructure that these new bundle tests build upon.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 23.53%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
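In Rust, a docstring is a `///` outer doc comment placed on the item it documents. A minimal sketch of a documented test helper; `CoreConfig` and `bundle_core_config_sketch` are hypothetical stand-ins and do not reflect the PR's actual `bundle_core_config()` signature or config type:

```rust
/// Hypothetical core configuration; only the field relevant here is modeled.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CoreConfig {
    /// Maximum number of resources a single bundle reservation may name.
    pub max_bundle_size: u32,
}

/// Builds a test configuration that admits bundle reservations of up to
/// `max_bundle_size` members.
pub fn bundle_core_config_sketch(max_bundle_size: u32) -> CoreConfig {
    CoreConfig { max_bundle_size }
}
```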
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title clearly and specifically describes the main change: adding test coverage for preserving lease primitives across replication and failover.
  • Description check: ✅ Passed. The description covers the summary, validation steps, and linked issue. All key sections from the template are adequately addressed with concrete details about the changes.
  • Linked Issues check: ✅ Passed. The code changes directly address all acceptance criteria from issue #87: added bundle membership failover test, extended revoke coverage for stale-holder rejection, and verified no second apply path was introduced.
  • Out of Scope Changes check: ✅ Passed. All changes are directly related to issue #87 scope: test coverage additions in replicated_simulation_tests.rs and status.md updates to mark the active implementation slice.


skel84 (Owner, Author) commented Mar 19, 2026

@coderabbitai review

coderabbitai bot commented Mar 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

skel84 (Owner, Author) commented Mar 19, 2026

@coderabbitai summary

coderabbitai bot commented Mar 19, 2026

✅ Actions performed

Summary regeneration triggered.

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/allocdb-node/src/replicated_simulation_tests.rs (1)

1467-1481: Avoid hardcoded reservation identity/slot in assertions.

ReservationId(3) and current_slot=3 make this test brittle to unrelated sequencing changes. Prefer deriving the reservation id from the published result of "bundle-reserve" and using that in membership checks.

Suggested refactor
```diff
-    assert_eq!(first_member.current_reservation_id, Some(ReservationId(3)));
+    let bundle_result = harness
+        .published_result(bundle.lsn)
+        .expect("bundle reserve should publish after quorum commit");
+    let reservation_id = bundle_result
+        .outcome
+        .reservation_id
+        .expect("bundle reserve should allocate a reservation id");
+    assert_eq!(first_member.current_reservation_id, Some(reservation_id));

-    assert_eq!(second_member.current_reservation_id, Some(ReservationId(3)));
+    assert_eq!(second_member.current_reservation_id, Some(reservation_id));

-    assert_eq!(
-        replica_reservation_member_ids(&harness, 2, 3, 3),
+    assert_eq!(
+        replica_reservation_member_ids(&harness, 2, reservation_id.get(), 3),
         vec![ResourceId(81), ResourceId(82)]
     );

-    assert_eq!(
-        replica_reservation_member_ids(&harness, 3, 3, 3),
+    assert_eq!(
+        replica_reservation_member_ids(&harness, 3, reservation_id.get(), 3),
         vec![ResourceId(81), ResourceId(82)]
     );
```

Also applies to: 1487-1489
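The brittleness comes from sequential id allocation: any extra committed command earlier in the test shifts the id the bundle receives. A minimal sketch, assuming a toy harness whose `published_result`, `outcome.reservation_id`, and `ReservationId::get` only mirror the names used in the suggested refactor; their real types and signatures in allocdb are not shown here, and `ToyHarness`, `commit_reserve`, and `reservation_id_at` are hypothetical:

```rust
use std::collections::BTreeMap;

/// Stand-in for a reservation id newtype (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ReservationId(u64);

impl ReservationId {
    fn get(self) -> u64 {
        self.0
    }
}

struct Outcome {
    reservation_id: Option<ReservationId>,
}

struct PublishedResult {
    outcome: Outcome,
}

/// Toy harness: every committed reserve allocates the next id and
/// publishes its result under a fresh LSN.
#[derive(Default)]
struct ToyHarness {
    last_id: u64,
    last_lsn: u64,
    published: BTreeMap<u64, PublishedResult>,
}

impl ToyHarness {
    fn commit_reserve(&mut self) -> u64 {
        self.last_id += 1;
        self.last_lsn += 1;
        let outcome = Outcome { reservation_id: Some(ReservationId(self.last_id)) };
        self.published.insert(self.last_lsn, PublishedResult { outcome });
        self.last_lsn
    }

    fn published_result(&self, lsn: u64) -> Option<&PublishedResult> {
        self.published.get(&lsn)
    }
}

/// Reads the reservation id from the published result instead of assuming it.
fn reservation_id_at(harness: &ToyHarness, lsn: u64) -> ReservationId {
    harness
        .published_result(lsn)
        .expect("reserve should publish after commit")
        .outcome
        .reservation_id
        .expect("reserve should allocate a reservation id")
}
```

With two prior reserves the derived id is 3; with one fewer prior command it becomes 2, which a hardcoded `ReservationId(3)` assertion would reject even though membership is still correct.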

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/allocdb-node/src/replicated_simulation_tests.rs` around lines 1467-1481: the assertions hardcode ReservationId(3) and slot 3, which makes the test brittle. Instead, extract the actual reservation id/slot from the bundle-reserve publish result (where bundle.lsn is produced), capture it in a variable (e.g. reserve_id or reservation_id), and substitute it for ReservationId(3) and the literal slot 3 in the comparisons for first_member.current_reservation_id, second_member.current_reservation_id, and the calls to replica_reservation_member_ids, so the membership checks reference the dynamically obtained reservation id/slot.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/status.md`:
- Around lines 211-220: Update the earlier milestone mention that still reads "M9 ... T08 implementation in progress" in docs/status.md so that the milestone state matches the new block (which marks PR `#93` as M9-T09 merged and `#87` as M9-T10 active). Locate the inconsistent milestone text, change the T08 reference to T10, and adjust any adjacent phrasing that implies T08 work, so the single-file snapshot consistently reflects M9-T10 as the active implementation slice and M9-T09 as merged.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7b977214-2a7d-4921-8d76-c4279cccdcdb

📥 Commits

Reviewing files that changed from the base of the PR and between 6f04a5a and 77937b4.

📒 Files selected for processing (2)
  • crates/allocdb-node/src/replicated_simulation_tests.rs
  • docs/status.md
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Keep documentation up to date with the code and design. If a change affects behavior, invariants, failure modes, operational semantics, testing strategy, or implementation sequencing, update the relevant docs in the same task or PR.

Files:

  • docs/status.md
docs/status.md

📄 CodeRabbit inference engine (AGENTS.md)

Keep docs/status.md current as the single-file progress snapshot for the repository. Update it whenever milestone state, implementation coverage, or the recommended next step materially changes.

Files:

  • docs/status.md
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level: error for invariant breaks, corruption, and failed operations that require intervention; warn for degraded but expected conditions such as overload, lag, or rejected requests; info for meaningful lifecycle and state-transition events; debug for detailed execution traces useful in development; trace only for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.

Files:

  • crates/allocdb-node/src/replicated_simulation_tests.rs
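The level guidance above amounts to a mapping from event kind to severity. A stdlib-only sketch of that mapping with key=value structured output; the `Level` and `Event` types and the event names are hypothetical, and a real crate would more likely emit through a logging facade such as `log` or `tracing` than format strings by hand:

```rust
use std::fmt;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

impl fmt::Display for Level {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Level::Error => "error",
            Level::Warn => "warn",
            Level::Info => "info",
            Level::Debug => "debug",
            Level::Trace => "trace",
        };
        f.write_str(name)
    }
}

/// Hypothetical event kinds, one per guideline category.
enum Event {
    InvariantBroken { detail: &'static str },
    RequestRejected { reason: &'static str },
    ViewChanged { view: u64 },
    EntryApplied { lsn: u64 },
    MessageReceived { bytes: usize },
}

fn level_for(event: &Event) -> Level {
    match event {
        Event::InvariantBroken { .. } => Level::Error, // requires intervention
        Event::RequestRejected { .. } => Level::Warn,  // degraded but expected
        Event::ViewChanged { .. } => Level::Info,      // lifecycle/state transition
        Event::EntryApplied { .. } => Level::Debug,    // detailed execution trace
        Event::MessageReceived { .. } => Level::Trace, // very high volume
    }
}

/// Renders one structured record as `level=... event=... key=value`.
fn render(event: &Event) -> String {
    let fields = match event {
        Event::InvariantBroken { detail } => format!("event=invariant_broken detail={detail}"),
        Event::RequestRejected { reason } => format!("event=request_rejected reason={reason}"),
        Event::ViewChanged { view } => format!("event=view_changed view={view}"),
        Event::EntryApplied { lsn } => format!("event=entry_applied lsn={lsn}"),
        Event::MessageReceived { bytes } => format!("event=message_received bytes={bytes}"),
    };
    format!("level={} {}", level_for(event), fields)
}
```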
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Treat CodeRabbit as part of the required review path when it is enabled on the repository. Wait for its status to complete before merge. If it completes without a visible review comment or review thread, request visible output with `coderabbitai summary`. Address every substantive CodeRabbit comment explicitly before merge by either applying the change or documenting why it is not being applied. Apply correctness, safety, recovery, test, and docs-alignment feedback by default; document why you reject suggestions that would weaken determinism, boundedness, or trusted-core discipline.
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to docs/status.md : Keep [`docs/status.md`](./docs/status.md) current as the single-file progress snapshot for the repository. Update it whenever milestone state, implementation coverage, or the recommended next step materially changes.

Applied to files:

  • docs/status.md
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to **/*.md : Keep documentation up to date with the code and design. If a change affects behavior, invariants, failure modes, operational semantics, testing strategy, or implementation sequencing, update the relevant docs in the same task or PR.

Applied to files:

  • docs/status.md
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Use the GitHub Project `AllocDB` as the operational work board. Keep planned work on the board, not only in milestone pages or local docs.

Applied to files:

  • docs/status.md
🧬 Code graph analysis (1)
crates/allocdb-node/src/replicated_simulation_tests.rs (4)
crates/allocdb-core/src/command_codec.rs (2)
  • encode_client_request (12-19)
  • new (184-186)
crates/allocdb-node/src/api_codec.rs (1)
  • new (671-673)
crates/allocdb-node/src/replicated_simulation.rs (3)
  • new (481-525)
  • replica (535-540)
  • current_slot (527-529)
crates/allocdb-core/src/state_machine_metrics.rs (1)
  • reservation (48-68)
🔇 Additional comments (3)
crates/allocdb-node/src/replicated_simulation_tests.rs (3)

32-37: Good helper extraction for bundle/failover test setup.

These additions keep the new replication tests focused and readable while reusing the existing request-encoding path.

Also applies to: 73-83, 102-117, 276-282


1423-1504: Strong failover regression coverage for bundle semantics.

This test meaningfully exercises commit, promotion, suffix-only rejoin, and post-failover conflict handling for bundle members.


1572-1588: Nice extension of revoke-failover semantics with stale release rejection.

The added stale-holder Release assertion plus operation-id progression improves negative-path confidence for M9-T10 without widening scope.

Also applies to: 1593-1593, 1610-1610
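The stale-holder rejection can be pictured as an epoch check: revoking a lease advances its epoch, so a release carrying the old holder's epoch no longer matches, and the resource stays unusable until it is reclaimed. A minimal sketch, assuming a hypothetical single-resource `Lease` model; only the `StaleEpoch` result name comes from the PR, and allocdb's actual revoke and reclaim rules may differ:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ResultCode {
    Released,
    StaleEpoch,
    NotHeld,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LeaseState {
    Free,
    Held,
    Revoked, // not reusable until reclaimed
}

#[derive(Debug)]
struct Lease {
    epoch: u64,
    state: LeaseState,
}

impl Lease {
    /// Revocation fences out the current holder by advancing the epoch.
    fn revoke(&mut self) {
        if self.state == LeaseState::Held {
            self.epoch += 1;
            self.state = LeaseState::Revoked;
        }
    }

    /// Reclaim is the only path from Revoked back to Free.
    fn reclaim(&mut self) {
        if self.state == LeaseState::Revoked {
            self.state = LeaseState::Free;
        }
    }

    /// A release must present the epoch the holder was granted.
    fn release(&mut self, holder_epoch: u64) -> ResultCode {
        if holder_epoch != self.epoch {
            return ResultCode::StaleEpoch;
        }
        match self.state {
            LeaseState::Held => {
                self.state = LeaseState::Free;
                ResultCode::Released
            }
            LeaseState::Free | LeaseState::Revoked => ResultCode::NotHeld,
        }
    }
}
```

Note that the stale release leaves the state untouched, which matches the "non-reusable until reclaim" property the extended test asserts across failover.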

skel84 merged commit b81a5d9 into main on Mar 19, 2026
3 checks passed
skel84 deleted the issue-87-m9-replication-lease branch on March 19, 2026 at 17:19

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

M9-T10 Preserve lease primitives across replication and failover

1 participant