Skip to content

feat: add revoke and reclaim lease transitions#92

Merged
skel84 merged 3 commits intomainfrom
issue-85-m9-revoke-impl
Mar 19, 2026
Merged

feat: add revoke and reclaim lease transitions#92
skel84 merged 3 commits intomainfrom
issue-85-m9-revoke-impl

Conversation

@skel84
Copy link
Owner

@skel84 skel84 commented Mar 19, 2026

Summary

  • add core Revoke and Reclaim commands with active -> revoking -> revoked semantics and lease-epoch bumps on revoke
  • keep resources non-reusable until explicit reclaim, with snapshot/recovery/api codec support for the new states
  • add regression coverage for duplicate revoke/reclaim, stale holders after revoke, snapshot/recovery preservation, and failover-safe non-reuse until reclaim

Closes #85.

Validation

  • ./scripts/preflight.sh

@coderabbitai
Copy link

coderabbitai bot commented Mar 19, 2026

Warning

Rate limit exceeded

@skel84 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 45 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 45a30472-9ca7-4183-ba57-85011963dc25

📥 Commits

Reviewing files that changed from the base of the PR and between 1ccad80 and a9bc1b1.

📒 Files selected for processing (18)
  • crates/allocdb-core/src/command.rs
  • crates/allocdb-core/src/command_codec.rs
  • crates/allocdb-core/src/recovery.rs
  • crates/allocdb-core/src/recovery_revoke_tests.rs
  • crates/allocdb-core/src/snapshot_codec.rs
  • crates/allocdb-core/src/snapshot_tests.rs
  • crates/allocdb-core/src/state_machine.rs
  • crates/allocdb-core/src/state_machine_apply.rs
  • crates/allocdb-core/src/state_machine_invariants.rs
  • crates/allocdb-core/src/state_machine_reservation_invariants.rs
  • crates/allocdb-core/src/state_machine_revoke_apply.rs
  • crates/allocdb-core/src/state_machine_revoke_tests.rs
  • crates/allocdb-core/src/state_machine_slots.rs
  • crates/allocdb-node/src/api_codec.rs
  • crates/allocdb-node/src/api_tests.rs
  • crates/allocdb-node/src/bin/allocdb-jepsen/events.rs
  • crates/allocdb-node/src/replicated_simulation_tests.rs
  • docs/status.md

Walkthrough

Adds explicit Revoke and Reclaim commands with new Revoking/Revoked states, updates fingerprints and wire/snapshot codecs, implements state-machine handlers and helpers for revoke/reclaim, extends invariants and slot checks, and adds recovery/replication/tests and API/replicated-simulation coverage.

Changes

Cohort / File(s) Summary
Command Definition & Codec
crates/allocdb-core/src/command.rs, crates/allocdb-core/src/command_codec.rs
Added Command::Revoke and Command::Reclaim, updated fingerprint() discriminants, added wire tags/decoding for new commands and legacy-expire compatibility, and added round-trip tests.
State Machine Surface
crates/allocdb-core/src/state_machine.rs
Added ResourceState::Revoking, ReservationState::Revoking/Revoked, and dispatch for Revoke/Reclaim.
Revoke/Reclaim Implementation
crates/allocdb-core/src/state_machine_revoke_apply.rs
New handlers apply_revoke/apply_reclaim, helpers mark_member_resources/release_member_resources, retirement enqueueing, lease_epoch/releases/retire_after_slot handling, and result-code paths.
Apply-path Refactors
crates/allocdb-core/src/state_machine_apply.rs
Replaced per-member inline resource mutations with helper calls; expanded invalid/terminal checks to include revoking/revoked.
Invariants & Reservation Validation
crates/allocdb-core/src/state_machine_invariants.rs, crates/allocdb-core/src/state_machine_reservation_invariants.rs
Treat Revoking as active (resource↔reservation consistency) and Revoked as terminal; added explicit checks for revoking/revoked mappings.
Slot Validation
crates/allocdb-core/src/state_machine_slots.rs
Applied retire-after-slot overflow validation to Reclaim; kept Revoke in no-op arm.
Snapshot & Recovery
crates/allocdb-core/src/snapshot_codec.rs, crates/allocdb-core/src/recovery.rs, crates/allocdb-core/src/recovery_revoke_tests.rs
Snapshot codec extended for Revoking/Revoked; added WAL-backed recovery tests for revoke/reclaim sequences.
State Machine Tests & Snapshots
crates/allocdb-core/src/snapshot_tests.rs, crates/allocdb-core/src/state_machine_revoke_tests.rs
Added snapshot round-trip and unit tests exercising revoke→reclaim flows, epoch bumping, stale rejection, resource-busy behavior, reclaim semantics, and idempotency.
Node API & Tests
crates/allocdb-node/src/api_codec.rs, crates/allocdb-node/src/api_tests.rs
API codec mappings extended: ResourceState::Revoking = 4, ReservationState::Revoking = 5, Revoked = 6; tests updated to include new response cases.
Replication / Simulation Tests
crates/allocdb-node/src/bin/allocdb-jepsen/events.rs, crates/allocdb-node/src/replicated_simulation_tests.rs
Jepsen mapping updated and a failover integration test added to ensure revoke persists across failover until reclaim.
Docs
docs/status.md
Updated M9-T08 status and documented implemented revoke/reclaim scope and validation targets.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant SM as StateMachine
    participant RS as Reservation
    participant R as Resources
    participant Q as RetirementQueue

    Client->>SM: Command::Revoke(reservation_id)
    SM->>RS: validate exists && state == Confirmed
    Note right of RS: Confirmed → Revoking\n(lease_epoch++)
    SM->>R: mark_member_resources(reservation_id, Revoking)
    SM-->>Client: ResultCode::Ok

    Client->>SM: Command::Reclaim(reservation_id)
    SM->>RS: validate state == Revoking && not retired
    Note right of RS: Revoking → Revoked\n(set released_lsn, retire_after_slot)
    SM->>R: release_member_resources(reservation_id) -> Available
    SM->>Q: enqueue_retirement(reservation_id, retire_after_slot)
    SM-->>Client: ResultCode::Ok
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 43.94% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and accurately summarizes the main change: addition of revoke and reclaim lease state transitions.
Description check ✅ Passed The description covers summary, linked issue, changes via validation script, and docs updates; follows the template structure and provides sufficient context.
Linked Issues check ✅ Passed All acceptance criteria from #85 are satisfied: revoke transitions are logged/replayable, duplicates are deterministic, revoked resources stay non-reusable across crash/failover/retry.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing revoke/reclaim transitions with supporting codec, state, and recovery logic per #85; no unrelated work detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch issue-85-m9-revoke-impl
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
crates/allocdb-core/src/state_machine_revoke_apply.rs (1)

35-38: Invariant check only validates anchor resource, not all bundle members.

The assert! on line 36 only validates that reservation.resource_id (the anchor resource) exists. For bundle reservations with multiple members, if any non-anchor member resource is unexpectedly missing, the panic will occur later in mark_member_resources with a less specific error message.

This is likely acceptable since all member resources should always exist if the anchor exists (structural invariant), but consider adding a comment explaining that mark_member_resources will also assert member presence, or expanding validation here for clearer diagnostics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/allocdb-core/src/state_machine_revoke_apply.rs` around lines 35 - 38,
The current assert only checks that self.resources contains
reservation.resource_id (the anchor) which can hide missing non-anchor bundle
members until mark_member_resources runs; either extend this validation here to
iterate reservation.members (or the bundle member list used by
mark_member_resources) and assert each member resource exists in self.resources,
or add a concise comment on the assert mentioning that mark_member_resources
will perform/assume per-member presence checks and will panic with a more
specific message if a non-anchor member is missing (reference the assert! on
reservation.resource_id and the mark_member_resources function to locate the
logic).
crates/allocdb-core/src/state_machine_revoke_tests.rs (1)

1-306: Consider adding edge-case tests for additional negative paths.

The test coverage is solid for the core acceptance criteria. For completeness, consider adding tests for:

  • revoke on a non-existent reservation (expect ReservationNotFound)
  • revoke on already Revoking or Revoked states (expect InvalidState)
  • Double reclaim on same reservation (expect ReservationRetired or InvalidState)

These would strengthen regression coverage. Based on learnings: "Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/allocdb-core/src/state_machine_revoke_tests.rs` around lines 1 - 306,
Add negative-path tests in state_machine_revoke_tests.rs that call
AllocDb::apply_client with Command::Revoke and Command::Reclaim to exercise
these edge cases: (1) revoke a non-existent reservation id (use
ReservationId(999) or similar) and assert ResultCode::ReservationNotFound; (2)
revoke a reservation already in ReservationState::Revoking and one already in
ReservationState::Revoked (drive to those states using existing
reserve/confirm/revoke/reclaim flows) and assert ResultCode::InvalidState; and
(3) call Reclaim twice on the same reservation and assert the second call
returns ReservationRetired or ResultCode::InvalidState as appropriate. Use the
same helpers (create, reserve_bundle, context, AllocDb::new, ClientRequest,
apply_client) and existing symbols (Command::Revoke, Command::Reclaim,
ReservationId, ResultCode, ReservationState) so tests integrate consistently
with the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/allocdb-core/src/command_codec.rs`:
- Around line 112-125: The wire tag for Command::Expire was changed causing old
WAL frames to mis-decode; revert/ensure the Expire variant uses the legacy tag
value (6) in the encoder/decoder where Command::Revoke and Command::Reclaim are
serialized (the push byte values around the Command::Revoke/Reclaim/Expire match
in the encoder) and update the decoder branch in decode_internal_command so tag
6 maps to Command::Expire (consuming reservation_id and deadline_slot) while
Revoke uses its original tag; add a fixed-input regression test that feeds a raw
legacy tag-6 Expire payload (reservation_id + deadline_slot bytes) into your
decoder to assert it produces Command::Expire and does not leave unread bytes or
return InvalidLayout.

In `@crates/allocdb-node/src/bin/allocdb-jepsen/events.rs`:
- Around line 350-358: Add a focused unit test that exercises the mapping of
ReservationState->JepsenReservationState: create a Reservation (or the minimal
input expected by the function that contains the shown match/convert logic) with
state=ReservationState::Revoking and assert the converter returns
JepsenReservationState::Active with confirmed == true and correct
resource_id/holder_id/deadline_slot; also create a Reservation with
state=ReservationState::Revoked and assert the converter returns
JepsenReservationState::Released with the expected released_lsn. Locate and call
the function that performs the match (the code containing
ReservationState::Confirmed | ReservationState::Revoking => ... and
ReservationState::Released | ReservationState::Expired |
ReservationState::Revoked => ...) and add these two assertions as unit tests to
the same module so regression changes fail fast.

---

Nitpick comments:
In `@crates/allocdb-core/src/state_machine_revoke_apply.rs`:
- Around line 35-38: The current assert only checks that self.resources contains
reservation.resource_id (the anchor) which can hide missing non-anchor bundle
members until mark_member_resources runs; either extend this validation here to
iterate reservation.members (or the bundle member list used by
mark_member_resources) and assert each member resource exists in self.resources,
or add a concise comment on the assert mentioning that mark_member_resources
will perform/assume per-member presence checks and will panic with a more
specific message if a non-anchor member is missing (reference the assert! on
reservation.resource_id and the mark_member_resources function to locate the
logic).

In `@crates/allocdb-core/src/state_machine_revoke_tests.rs`:
- Around line 1-306: Add negative-path tests in state_machine_revoke_tests.rs
that call AllocDb::apply_client with Command::Revoke and Command::Reclaim to
exercise these edge cases: (1) revoke a non-existent reservation id (use
ReservationId(999) or similar) and assert ResultCode::ReservationNotFound; (2)
revoke a reservation already in ReservationState::Revoking and one already in
ReservationState::Revoked (drive to those states using existing
reserve/confirm/revoke/reclaim flows) and assert ResultCode::InvalidState; and
(3) call Reclaim twice on the same reservation and assert the second call
returns ReservationRetired or ResultCode::InvalidState as appropriate. Use the
same helpers (create, reserve_bundle, context, AllocDb::new, ClientRequest,
apply_client) and existing symbols (Command::Revoke, Command::Reclaim,
ReservationId, ResultCode, ReservationState) so tests integrate consistently
with the file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 50dce838-cf6b-4275-ab51-32e3c3a7b71d

📥 Commits

Reviewing files that changed from the base of the PR and between 1ccad80 and 20081cb.

📒 Files selected for processing (18)
  • crates/allocdb-core/src/command.rs
  • crates/allocdb-core/src/command_codec.rs
  • crates/allocdb-core/src/recovery.rs
  • crates/allocdb-core/src/recovery_revoke_tests.rs
  • crates/allocdb-core/src/snapshot_codec.rs
  • crates/allocdb-core/src/snapshot_tests.rs
  • crates/allocdb-core/src/state_machine.rs
  • crates/allocdb-core/src/state_machine_apply.rs
  • crates/allocdb-core/src/state_machine_invariants.rs
  • crates/allocdb-core/src/state_machine_reservation_invariants.rs
  • crates/allocdb-core/src/state_machine_revoke_apply.rs
  • crates/allocdb-core/src/state_machine_revoke_tests.rs
  • crates/allocdb-core/src/state_machine_slots.rs
  • crates/allocdb-node/src/api_codec.rs
  • crates/allocdb-node/src/api_tests.rs
  • crates/allocdb-node/src/bin/allocdb-jepsen/events.rs
  • crates/allocdb-node/src/replicated_simulation_tests.rs
  • docs/status.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: semgrep-cloud-platform/scan
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level: error for invariant breaks, corruption, and failed operations that require intervention; warn for degraded but expected conditions such as overload, lag, or rejected requests; info for meaningful lifecycle and state-transition events; debug for detailed execution traces useful in development; trace only for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.

Files:

  • crates/allocdb-node/src/bin/allocdb-jepsen/events.rs
  • crates/allocdb-core/src/state_machine_invariants.rs
  • crates/allocdb-core/src/state_machine_slots.rs
  • crates/allocdb-core/src/snapshot_codec.rs
  • crates/allocdb-core/src/snapshot_tests.rs
  • crates/allocdb-core/src/recovery.rs
  • crates/allocdb-core/src/command.rs
  • crates/allocdb-node/src/api_codec.rs
  • crates/allocdb-core/src/state_machine.rs
  • crates/allocdb-core/src/state_machine_reservation_invariants.rs
  • crates/allocdb-core/src/command_codec.rs
  • crates/allocdb-node/src/api_tests.rs
  • crates/allocdb-core/src/recovery_revoke_tests.rs
  • crates/allocdb-core/src/state_machine_apply.rs
  • crates/allocdb-node/src/replicated_simulation_tests.rs
  • crates/allocdb-core/src/state_machine_revoke_tests.rs
  • crates/allocdb-core/src/state_machine_revoke_apply.rs
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Keep documentation up to date with the code and design. If a change affects behavior, invariants, failure modes, operational semantics, testing strategy, or implementation sequencing, update the relevant docs in the same task or PR.

Files:

  • docs/status.md
docs/status.md

📄 CodeRabbit inference engine (AGENTS.md)

Keep docs/status.md current as the single-file progress snapshot for the repository. Update it whenever milestone state, implementation coverage, or the recommended next step materially changes.

Files:

  • docs/status.md
🧠 Learnings (2)
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to **/*.rs : Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.

Applied to files:

  • crates/allocdb-core/src/recovery.rs
  • crates/allocdb-core/src/recovery_revoke_tests.rs
  • crates/allocdb-core/src/state_machine_revoke_tests.rs
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to docs/status.md : Keep [`docs/status.md`](./docs/status.md) current as the single-file progress snapshot for the repository. Update it whenever milestone state, implementation coverage, or the recommended next step materially changes.

Applied to files:

  • docs/status.md
🧬 Code graph analysis (4)
crates/allocdb-node/src/bin/allocdb-jepsen/events.rs (1)
crates/allocdb-core/src/state_machine_metrics.rs (1)
  • reservation (48-68)
crates/allocdb-node/src/replicated_simulation_tests.rs (5)
crates/allocdb-core/src/command_codec.rs (1)
  • encode_client_request (12-19)
crates/allocdb-node/src/engine_checkpoint_tests.rs (2)
  • reserve (55-65)
  • confirm (67-77)
crates/allocdb-node/src/engine_tests.rs (2)
  • reserve (62-77)
  • confirm (79-89)
crates/allocdb-node/src/simulation_tests.rs (1)
  • reserve (51-66)
crates/allocdb-node/src/engine_issue_31_tests.rs (1)
  • reserve (65-80)
crates/allocdb-core/src/state_machine_revoke_tests.rs (4)
crates/allocdb-core/src/recovery_revoke_tests.rs (1)
  • config (24-36)
crates/allocdb-core/src/snapshot_tests.rs (2)
  • config (11-23)
  • context (25-30)
crates/allocdb-core/src/state_machine.rs (1)
  • new (160-199)
crates/allocdb-core/src/state_machine_metrics.rs (2)
  • reservation (48-68)
  • resource (36-38)
crates/allocdb-core/src/state_machine_revoke_apply.rs (2)
crates/allocdb-core/src/state_machine_metrics.rs (2)
  • reservation (48-68)
  • resource (36-38)
crates/allocdb-core/src/state_machine_slots.rs (1)
  • slot_overflow_outcome (38-49)
🔇 Additional comments (25)
crates/allocdb-core/src/snapshot_tests.rs (1)

125-204: LGTM! Comprehensive snapshot round-trip test for the new revoke/reclaim lifecycle.

The test correctly validates that both Revoking and Revoked states survive snapshot encode/decode cycles. The progression through CreateResourceReserveBundleConfirmRevoke → snapshot/restore → Reclaim → snapshot/restore ensures the full lifecycle is exercised.

crates/allocdb-core/src/state_machine.rs (2)

63-79: LGTM! State enum extensions follow the existing pattern.

The new Revoking state for resources and Revoking/Revoked states for reservations are correctly positioned in the enums, maintaining the logical state progression.


224-225: LGTM! Command dispatch correctly delegates to new apply methods.

The Revoke command correctly omits context (since it doesn't need slot-derived fields), while Reclaim passes context for retirement scheduling. This aligns with the pattern used by Release and Expire.

crates/allocdb-node/src/api_tests.rs (2)

178-222: LGTM! API codec test coverage for new revoke/reclaim states.

The test cases correctly model the expected field states:

  • Revoking reservation: released_lsn: None, retire_after_slot: None (not yet terminal)
  • Revoked reservation: released_lsn: Some(...), retire_after_slot: Some(...) (terminal state with retirement scheduled)

This aligns with the lifecycle semantics.


271-282: LGTM! Clean refactoring improves test organization.

Splitting response cases into submit_response_cases, read_response_cases, and maintenance_response_cases improves maintainability without changing test semantics.

crates/allocdb-core/src/state_machine_reservation_invariants.rs (2)

13-15: LGTM! Invariant classification correctly models the revoke lifecycle.

Revoking is correctly treated as active (no retire_after_slot) since the resource is still held pending explicit reclaim. Revoked is correctly terminal with required retire_after_slot since the reservation is complete and awaiting history window expiry.

Also applies to: 24-24


161-173: LGTM! Resource-reservation state coupling invariant for Revoking.

The invariant correctly requires that when a reservation is in Revoking state, all member resources must be in ResourceState::Revoking with matching current_reservation_id. This prevents state inconsistencies during the revoke phase.

docs/status.md (1)

36-40: LGTM! Documentation correctly tracks implementation progress.

The status file accurately reflects:

  • New revoke/reclaim operations in the allocator surface
  • Active branch validation scope
  • Clear T08/T09 boundary for deferred work

Based on learnings: "Keep docs/status.md current as the single-file progress snapshot for the repository."

Also applies to: 50-50, 209-219

crates/allocdb-core/src/state_machine_apply.rs (3)

176-186: LGTM! Release correctly rejects reservations in revoke lifecycle.

Adding Revoking and Revoked to the invalid-state rejection is correct: once revocation has begun (or completed), the holder cannot release since they've already been fenced out by the revoke operation.


244-258: LGTM! Expire correctly treats revoke states as noop.

Revoking and Revoked reservations are correctly excluded from TTL expiration since:

  1. Revoking transitioned from Confirmed which already unscheduled the expiration
  2. Revoked is terminal and shouldn't be expired

114-114: LGTM! Helper method refactoring improves maintainability.

Extracting mark_member_resources and release_member_resources helpers reduces duplication between confirm/release/expire paths and the new revoke/reclaim paths.

Also applies to: 208-208, 285-285

crates/allocdb-core/src/recovery_revoke_tests.rs (2)

47-104: LGTM! Recovery test correctly validates revoking state preservation.

The test verifies critical revoking-state invariants after WAL replay:

  • lease_epoch bumped to 2 (revoke increments epoch)
  • released_lsn is None (revoking is not terminal)
  • Resource remains in Revoking state with reservation link intact

This ensures the "late-not-early" reuse guarantee survives recovery.


106-172: LGTM! Recovery test correctly validates revoked state preservation.

The test verifies terminal revoked-state semantics after WAL replay:

  • released_lsn set to the reclaim LSN
  • retire_after_slot = Slot(7) (request_slot=3 + history_window=4)
  • Resource transitions to Available with cleared reservation link

Based on learnings: "Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage."

crates/allocdb-node/src/replicated_simulation_tests.rs (3)

54-101: LGTM! Payload helpers follow established patterns.

The new reserve_payload, confirm_payload, revoke_payload, and reclaim_payload helpers mirror the existing create_payload pattern and provide clean test construction for reservation lifecycle commands.


343-351: LGTM! Helper for full backup commit.

commit_to_all_backups correctly chains the prepare/ack/commit delivery to backup 2 and the prepare/commit delivery to backup 3. This ensures majority-committed state before failover scenarios.


1359-1460: Excellent regression test for the core safety property.

This test directly validates the PR acceptance criteria: "Revoked ownership must not be reusable early under crash, retry, or failover scenarios." The test flow:

  1. Establishes confirmed reservation across all replicas
  2. Commits revoke to majority (backup 2), prepares to backup 3
  3. Crashes primary, completes view change
  4. Verifies ResourceState::Revoking persists with ReservationId link
  5. Critically: early reserve attempt returns ResourceBusy
  6. Only after explicit Reclaim does resource become Available

This is exactly the "late-not-early" reuse guarantee the issue requires.

crates/allocdb-core/src/state_machine_revoke_tests.rs (6)

7-53: LGTM! Well-structured test helpers.

The helper functions provide clean abstractions for creating test fixtures. The config() function appropriately sets max_bundle_size: 4 to support bundle tests, differing from other test modules that use max_bundle_size: 1.


55-105: LGTM! Comprehensive revoke transition test.

Good invariant coverage: verifies epoch bump (1→2), state transitions for both reservation and member resources, and confirms released_lsn/retire_after_slot remain None until reclaim.


107-162: LGTM! Critical negative-path coverage for stale epoch rejection.

This test directly validates the acceptance criteria that stale holder commands are rejected after revoke. Both Confirm and Release with the pre-revoke lease_epoch correctly return StaleEpoch.


164-234: LGTM! Validates the core "no early reuse" safety guarantee.

This test directly addresses the acceptance criteria: "Revoked ownership must not be reusable early." The sequence demonstrates that resources in Revoking state correctly return ResourceBusy, and only Reclaim transitions them to Available.


236-265: LGTM! Negative-path test for state ordering.

Validates that the state machine enforces the required Confirmed → Revoking → Revoked transition order.


267-306: LGTM! Validates deterministic duplicate handling.

Directly addresses the acceptance criteria: "Duplicate revoke requests must be handled deterministically." The test confirms that replaying the same operation_id with different LSN contexts returns cached results.

crates/allocdb-core/src/state_machine_revoke_apply.rs (3)

57-114: LGTM! Correct reclaim implementation.

The function properly:

  • Validates preconditions (exists, not retired, state is Revoking)
  • Handles slot overflow gracefully
  • Sets released_lsn from context for WAL recovery support
  • Queues retirement for eventual cleanup
  • Follows logging guidelines (warn! for rejections, debug! for success)

116-138: LGTM! Clean reusable helper for member resource state updates.

The version increment on each resource update is important for change detection. The parameterized state allows reuse across different transitions (Revoking, etc.).


140-159: LGTM! Correctly releases all member resources.

The function properly clears current_reservation_id and sets state to Available, enabling safe reuse after reclaim.

@skel84
Copy link
Owner Author

skel84 commented Mar 19, 2026

Addressed CodeRabbit feedback:

  • restored the legacy internal Expire wire tag and added a fixed-input decode regression for legacy WAL payloads
  • added focused Jepsen mapping tests for Revoking and Revoked reservation states

Validation rerun: ./scripts/preflight.sh

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Mar 19, 2026

@skel84: Thank you for the update! I'll kick off a fresh review now, covering the restored legacy Expire wire tag with its decode regression test and the new Jepsen mapping tests for Revoking/Revoked states alongside the rest of the changes.


🧠 Learnings used
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Treat CodeRabbit as part of the required review path when it is enabled on the repository. Wait for its status to complete before merge. If it completes without a visible review comment or review thread, request visible output with `coderabbitai summary`. Address every substantive CodeRabbit comment explicitly before merge by either applying the change or documenting why it is not being applied. Apply correctness, safety, recovery, test, and docs-alignment feedback by default; document why you reject suggestions that would weaken determinism, boundedness, or trusted-core discipline.
✅ Actions performed

Full review triggered.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
crates/allocdb-core/src/command_codec.rs (1)

273-286: ⚠️ Potential issue | 🟡 Minor

Pin the new wire tags with fixed-byte tests.

This only proves encoder/decoder self-consistency. If 7 and 8 are renumbered together later, the test still passes while WAL frames written by this release become unreadable after upgrade. Please add raw-byte assertions for both variants.

🔧 Suggested test tightening
 #[test]
 fn revoke_and_reclaim_round_trip() {
-    for command in [
-        Command::Revoke {
-            reservation_id: ReservationId(41),
-        },
-        Command::Reclaim {
-            reservation_id: ReservationId(42),
-        },
-    ] {
-        let decoded = decode_internal_command(&encode_internal_command(&command)).unwrap();
+    for (command, reservation_id, tag) in [
+        (
+            Command::Revoke {
+                reservation_id: ReservationId(41),
+            },
+            ReservationId(41),
+            7,
+        ),
+        (
+            Command::Reclaim {
+                reservation_id: ReservationId(42),
+            },
+            ReservationId(42),
+            8,
+        ),
+    ] {
+        let encoded = encode_internal_command(&command);
+        let mut expected = vec![tag];
+        expected.extend_from_slice(&reservation_id.get().to_le_bytes());
+        assert_eq!(encoded, expected);
+        let decoded = decode_internal_command(&expected).unwrap();
         assert_eq!(decoded, command);
     }
 }

As per coding guidelines, **/*.rs: "Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/allocdb-core/src/command_codec.rs` around lines 273 - 286, The
existing test revoke_and_reclaim_round_trip only checks encoder/decoder
round-trip; add deterministic raw-byte assertions to pin the wire tags so future
renumbering breaks tests. For Command::Revoke and Command::Reclaim (the variants
used in the test) compute their encoded bytes via encode_internal_command and
assert equality against hard-coded byte slices that reflect the current wire
tags (e.g., the exact bytes that encode reservation_id and the tag values
currently used, including the tag bytes for 7 and 8); keep the round-trip decode
assertions but add assert_eq!(encoded, expected_bytes) for each variant to lock
in the on-wire format for encode_internal_command and ensure
decode_internal_command still parses those exact bytes. Use the same helper
functions encode_internal_command/decode_internal_command and update test
names/comments accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/allocdb-core/src/command_codec.rs`:
- Around line 273-286: The existing test revoke_and_reclaim_round_trip only
checks encoder/decoder round-trip; add deterministic raw-byte assertions to pin
the wire tags so future renumbering breaks tests. For Command::Revoke and
Command::Reclaim (the variants used in the test) compute their encoded bytes via
encode_internal_command and assert equality against hard-coded byte slices that
reflect the current wire tags (e.g., the exact bytes that encode reservation_id
and the tag values currently used, including the tag bytes for 7 and 8); keep
the round-trip decode assertions but add assert_eq!(encoded, expected_bytes) for
each variant to lock in the on-wire format for encode_internal_command and
ensure decode_internal_command still parses those exact bytes. Use the same
helper functions encode_internal_command/decode_internal_command and update test
names/comments accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7429367c-2df1-4fb0-a901-35a657fc7545

📥 Commits

Reviewing files that changed from the base of the PR and between 20081cb and b0ffc6c.

📒 Files selected for processing (2)
  • crates/allocdb-core/src/command_codec.rs
  • crates/allocdb-node/src/bin/allocdb-jepsen/events.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/allocdb-node/src/bin/allocdb-jepsen/events.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: semgrep-cloud-platform/scan
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level: error for invariant breaks, corruption, and failed operations that require intervention; warn for degraded but expected conditions such as overload, lag, or rejected requests; info for meaningful lifecycle and state-transition events; debug for detailed execution traces useful in development; trace only for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.

Files:

  • crates/allocdb-core/src/command_codec.rs
🔇 Additional comments (2)
crates/allocdb-core/src/command_codec.rs (2)

112-119: Good call on preserving the legacy tag layout.

Appending Revoke and Reclaim after Expire keeps existing tag-6 frames decodable during replay.

Also applies to: 164-173


288-303: Nice regression guard for legacy Expire payloads.

Feeding a raw tag-6 frame through the decoder is exactly the right coverage for replay compatibility.

@skel84
Copy link
Owner Author

skel84 commented Mar 19, 2026

Addressed the remaining CodeRabbit feedback by pinning the Revoke / Reclaim internal wire tags with raw-byte assertions in command_codec tests.

Validation rerun: ./scripts/preflight.sh

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Mar 19, 2026

@skel84: Thank you for the follow-up! Pinning the Revoke/Reclaim internal wire tags with raw-byte assertions is a solid correctness anchor — it ensures those codec contracts remain stable and any accidental tag drift will be caught immediately. Kicking off a fresh full review now.


🧠 Learnings used
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Treat CodeRabbit as part of the required review path when it is enabled on the repository. Wait for its status to complete before merge. If it completes without a visible review comment or review thread, request visible output with `coderabbitai summary`. Address every substantive CodeRabbit comment explicitly before merge by either applying the change or documenting why it is not being applied. Apply correctness, safety, recovery, test, and docs-alignment feedback by default; document why you reject suggestions that would weaken determinism, boundedness, or trusted-core discipline.

Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: After review-driven edits, rerun the relevant validation commands before considering the work ready to merge.
✅ Actions performed

Full review triggered.

@skel84 skel84 merged commit efaf307 into main Mar 19, 2026
2 checks passed
@skel84 skel84 deleted the issue-85-m9-revoke-impl branch March 19, 2026 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

M9-T08 Implement explicit revoke and safe reuse rules

1 participant