diff --git a/docs/README.md b/docs/README.md
index a6dcbb6..8f10bf0 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -21,6 +21,7 @@
 - [Jepsen Refactor Plan](./jepsen-refactor-plan.md)
 - [Lease Kernel Design Decisions](./lease-kernel-design.md)
 - [Lease Kernel Follow-On](./lease-kernel-follow-on.md)
+- [Revoke Safety Slice](./revoke-safety-slice.md)
 - [Operator Runbook](./operator-runbook.md)
 - [KubeVirt Jepsen Report](./kubevirt-jepsen-report.md)
 - [Replication Notes](./replication.md)
diff --git a/docs/design.md b/docs/design.md
index 337b91c..46c4f81 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -55,6 +55,8 @@ Current implementation anchor:
   first-class leases, bundle ownership, fencing, and revoke
 - [lease-kernel-follow-on.md](./lease-kernel-follow-on.md): post-`M8` planning for minimal generic
   lease-kernel extensions such as bundle ownership, fencing, and revoke
+- [revoke-safety-slice.md](./revoke-safety-slice.md): exact `M9-T08` scope for revoke, reclaim,
+  and safe-reuse behavior in the current implementation
 - [replication.md](./replication.md): deferred distributed design areas and boundaries
 - [roadmap.md](./roadmap.md): high-level roadmap and exit criteria
 - [work-breakdown.md](./work-breakdown.md): concrete units of work for the first implementation
diff --git a/docs/revoke-safety-slice.md b/docs/revoke-safety-slice.md
new file mode 100644
index 0000000..717b58d
--- /dev/null
+++ b/docs/revoke-safety-slice.md
@@ -0,0 +1,194 @@
+# Revoke Safety Slice
+
+## Status
+
+Draft local planning baseline for `M9-T08` / issue `#85`.
+
+This document narrows the next implementation slice before code work starts. It is not a new
+authoritative API surface; it is the execution plan for bringing the accepted revoke/reclaim
+semantics into the current trusted-core implementation.
+
+## Purpose
+
+`M9-T07` established fencing and stale-holder rejection. The next kernel step is to withdraw
+holder authority explicitly without permitting early reuse.
+
+The implementation question for `M9-T08` is narrower than the full lease-model transition:
+
+- how revoke enters the current core cleanly
+- how reclaim becomes the only point where reuse is allowed
+- how to keep the reservation-era compatibility surface from drifting away from the accepted
+  lease-centric semantics
+
+## Slice Goal
+
+Implement the minimum revoke/reclaim behavior needed to preserve the late-not-early reuse rule in
+the current execution path, including the already-approved crash/retry/failover safety contract for
+`M9-T08`.
+
+For this slice, "done" means:
+
+- the core can log and apply `revoke` and `reclaim`
+- stale holders lose authority as soon as revoke commits
+- resources remain unavailable until reclaim commits
+- exact retries stay deterministic
+
+## In Scope
+
+`M9-T08` should include:
+
+1. one explicit revoke command in the trusted core
+2. one explicit reclaim command in the trusted core
+3. one live non-terminal state for revoked-but-not-yet-reusable ownership
+4. one terminal revoked outcome that preserves history after reclaim
+5. the minimum executor, persistence, and replay plumbing needed so committed revoke/reclaim
+   outcomes survive live apply, restart, and the existing failover contract
+6. invariant, negative-path, retry, and crash-recovery tests for the new safety rule
+
+## Out Of Scope
+
+`M9-T08` should not expand into:
+
+- new replication protocol design, failover refactors, or replicated-surface expansion beyond what
+  is needed to preserve committed revoke/reclaim outcomes under the existing path
+- WAL or snapshot reshaping beyond the exact command/state support required for revoke/reclaim
+- broader public API and transport cleanup beyond the narrow compatibility bridge already required
+  by this slice
+- heartbeat ingestion or wall-clock reclaim logic inside the state machine
+- policy reasons or operator metadata attached to revoke/reclaim
+- holder transfer or shared-resource semantics
+
+Those belong to later slices, primarily `M9-T09` through `M9-T11`.
+
+## Compatibility Rule
+
+The accepted model is lease-centric, but the current implementation is still reservation-centric in
+spelling and data layout.
+
+For `M9-T08`, that bridge is allowed under one rule:
+
+- reservation-era names may remain temporarily, but revoke/reclaim behavior must match the
+  authoritative lease semantics exactly
+
+That means:
+
+- the current `reservation_id` may continue to serve as the implementation anchor for `lease_id`
+- the existing `confirmed` state may continue as the compatibility spelling for authoritative
+  `active`
+- the slice must not introduce reservation-era shortcuts that would be invalid in the final lease
+  model
+
+## Exact Command Semantics For This Slice
+
+### Revoke
+
+Implementation intent:
+
+```text
+revoke(lease_id)
+```
+
+Current compatibility spelling may still route this through the reservation-era implementation, but
+the effect must be:
+
+- precondition: live lease exists and is currently `active`
+- success: lease moves to `revoking`
+- success: `lease_epoch` increments immediately
+- success: member resources stay unavailable and keep pointing at the same live owner
+- success: resource state becomes `revoking`
+- success: no retirement is scheduled yet
+
+Failure behavior:
+
+- `lease_not_found` if the lease never existed
+- `lease_retired` if retained history says the live record is already gone
+- `invalid_state` for `reserved`, `revoking`, `released`, `expired`, or `revoked`
+
+Duplicate behavior:
+
+- exact retry with the same `operation_id` must return the cached original result
+- a later distinct revoke with a different `operation_id` must not invent a second success; once a
+  lease is already `revoking` or terminal, the answer is `invalid_state`
+
+### Reclaim
+
+Implementation intent:
+
+```text
+reclaim(lease_id)
+```
+
+Effect:
+
+- precondition: live lease exists and is currently `revoking`
+- success: lease moves to terminal `revoked`
+- success: member resources return to `available`
+- success: per-resource current owner pointers clear
+- success: retirement is scheduled through the normal bounded history path
+
+Failure behavior:
+
+- `lease_not_found`
+- `lease_retired`
+- `invalid_state` for `reserved`, `active`, `released`, `expired`, or already `revoked`
+
+Duplicate behavior:
+
+- exact retry with the same `operation_id` must return the cached original result
+- a later distinct reclaim on an already terminal record must not produce a second success
+
+## Required Safety Properties
+
+`M9-T08` must preserve these invariants:
+
+1. revoke removes holder authority before reuse is possible
+2. reclaim is the only transition that makes a revoked resource reusable
+3. active or revoking leases are never freed by timer
+4. late external reclaim is acceptable; early reclaim is not
+5. holder-authorized commands that arrive after revoke with the old epoch fail deterministically
+6. replay of committed revoke/reclaim commands yields the same resource availability outcome
+
+## Implementation Boundaries
+
+The slice should be built in this order:
+
+1. add core state and command variants for revoke/reclaim
+2. apply revoke/reclaim through the same executor path already used by reserve/confirm/release
+3. add only the exact codec, snapshot, and recovery support required so live apply and replay
+   preserve committed revoke/reclaim outcomes
+4. preserve the current no-early-reuse contract under crash, retry, and failover without broadening
+   the replication surface in this slice
+5. add resource-state and lease-state invariants for `revoking` and `revoked`
+6. add retry and stale-holder regression coverage
+
+Important boundary:
+
+- if broader WAL/snapshot cleanup, transport normalization, or replication-surface redesign becomes
+  necessary, keep only the revoke/reclaim unblocker here and defer the broader cleanup to `M9-T09`
+  and `M9-T10`
+
+## Tests This Slice Should Add
+
+Minimum test set:
+
+- revoke on active lease moves the lease to `revoking` and bumps `lease_epoch`
+- revoke does not free member resources
+- holder `release` or `confirm` with the old epoch after revoke fails deterministically
+- reclaim from `revoking` returns resources to `available` and records terminal `revoked` history
+- reclaim before revoke is `invalid_state`
+- exact duplicate revoke and reclaim requests return cached committed results
+- reserved, active, and revoking resources cannot be reused early
+- crash/restart replay preserves `revoking` vs `revoked` outcomes
+- committed revoke/reclaim outcomes preserve the same no-early-reuse behavior across the current
+  failover path
+
+## Exit Condition
+
+`M9-T08` is ready to hand off when:
+
+- the exact revoke/reclaim behavior above is implemented or explicitly mapped to narrower code
+  tasks
+- `docs/status.md` points at `#85` instead of stale `#84` language
+- the slice still satisfies the existing `M9-T08` crash/retry/failover acceptance criteria without
+  silently expanding into broader `M9-T09` or `M9-T10` cleanup
+- later work is cleanly reserved for `M9-T09` through `M9-T11`
diff --git a/docs/status.md b/docs/status.md
index c5a070f..eb7726c 100644
--- a/docs/status.md
+++ b/docs/status.md
@@ -14,7 +14,7 @@
   - `M6` replication design: implemented
   - `M7` replicated core prototype: in progress
   - `M8` external cluster validation: in progress
-  - `M9` generic lease-kernel follow-on: `T06` in progress on issue branch
+  - `M9` generic lease-kernel follow-on: `T07` merged, `T08` planning in progress on issue branch
 - Latest completed implementation chunks:
   - `4156a80` `Bootstrap AllocDB core and docs`
   - `f84a641` `Add WAL file and snapshot recovery primitives`
@@ -32,8 +32,9 @@
   with divergent prepared-suffix discard; promoted partition and primary-crash scenarios that
   preserve fail-closed behavior and retry/read continuity after failover; the local three-replica
   cluster runner, fault-control harness, and QEMU testbed around the real replica
-  daemon; and the first trusted-core bundle-commit slice with bundle membership, bundle-aware
-  confirm/release/expire, and bundle regression coverage
+  daemon; the first trusted-core bundle-commit slice with bundle membership, bundle-aware
+  confirm/release/expire, and bundle regression coverage; and the first fencing slice with
+  lease-epoch propagation, stale-holder rejection, and epoch-aware retry/read coverage
 
 ## What Exists
 
@@ -85,9 +86,11 @@
   - one draft lease-kernel design-decision document that chooses a first-class lease authority
     object, bundle size `1` as the single-resource semantic special case, a lease-scoped fencing
     token, and a two-stage `revoke -> reclaim` safety model
-  - one active authoritative-docs pass under issue `#80` that is rewriting semantics, API,
+  - one merged authoritative-docs pass under issue `#80` that rewrote semantics, API,
     architecture, and fault-model docs to the approved lease-centric contract while keeping the
     current reservation-centric implementation explicitly marked as compatibility surface
+  - one active `M9-T08` planning note that narrows revoke/reclaim implementation scope before the
+    next code-bearing lease-kernel branch
 - Replication design draft:
   - VSR-style primary/backup replicated log with fixed membership and majority quorums
   - primary-only reads in the first replicated release
@@ -198,13 +201,15 @@
 - PR `#89` merged `M9-T06` on `main`: the trusted core now supports atomic bundle reservation,
   explicit bundle membership records, bundle-aware confirm/release/expire, and bundle-aware
   snapshot/codec coverage while preserving the existing reservation compatibility surface
-- issue `#84` / `M9-T07` is the active implementation slice on the current branch: lease epochs
-  now flow through holder-authorized commands and command outcomes, the core rejects stale holder
-  epochs deterministically, and read/retry surfaces expose the current authority token for active
-  reservations
-- validation for the active `#84` branch currently includes
-  `cargo test -p allocdb-core -- --nocapture`, `cargo test -p allocdb-node api -- --nocapture`,
-  `cargo test -p allocdb-node engine -- --nocapture`, and
-  `cargo test --workspace --no-run`
-- the next planned slices after `#84` remain `M9-T08` revoke/safe reuse, `M9-T09` persistence and
-  transport extension, `M9-T10` replication preservation, and `M9-T11` broader regression coverage
+- PR `#90` merged `M9-T07` on `main`: lease epochs now flow through holder-authorized commands and
+  command outcomes, the core rejects stale holder epochs deterministically, and read/retry
+  surfaces expose the current authority token for active reservations
+- issue `#85` / `M9-T08` is the active planning slice on the current branch: the local scope is
+  being narrowed to explicit `revoke` and `reclaim`, `revoking` as the only non-reusable
+  post-authority state, and the minimum replay-safe bridge from the reservation-era implementation
+  to the accepted lease-centric semantics
+- the active `#85` planning branch is defining exactly what belongs in revoke/reclaim now versus
+  what stays deferred to `M9-T09` through `M9-T11`, especially around WAL/snapshot broadening,
+  transport cleanup, and replication preservation
+- the next planned code-bearing slices after `#85` remain `M9-T09` persistence and transport
+  extension, `M9-T10` replication preservation, and `M9-T11` broader regression coverage
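
The revoke and reclaim semantics that `docs/revoke-safety-slice.md` specifies can be sketched as a small in-memory state machine. This is a minimal illustration under stated assumptions, not AllocDB code: `Core`, `Lease`, `Outcome`, `grant_active`, `is_available`, and `holder_epoch_is_current` are hypothetical names, the starting epoch of `1` is arbitrary, and `lease_retired`, persistence, and replay are left out.

```rust
use std::collections::HashMap;

/// Lease states named in the slice; `Active` stands in for the `confirmed` spelling.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseState { Reserved, Active, Revoking, Released, Expired, Revoked }

/// Hypothetical command outcome; `lease_retired` is omitted from this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome { Ok { lease_epoch: u64 }, LeaseNotFound, InvalidState }

#[derive(Debug)]
pub struct Lease { pub state: LeaseState, pub epoch: u64, pub resources: Vec<u64> }

#[derive(Debug, Default)]
pub struct Core {
    pub leases: HashMap<u64, Lease>,
    /// resource_id -> owning lease_id; a missing entry means the resource is available.
    pub owners: HashMap<u64, u64>,
    /// operation_id -> first outcome, so exact retries return the cached result.
    pub results: HashMap<u64, Outcome>,
}

impl Core {
    /// Test helper (assumption): install an already-active lease at epoch 1.
    pub fn grant_active(&mut self, lease_id: u64, resources: Vec<u64>) {
        for r in &resources { self.owners.insert(*r, lease_id); }
        self.leases.insert(lease_id, Lease { state: LeaseState::Active, epoch: 1, resources });
    }

    pub fn is_available(&self, resource_id: u64) -> bool {
        !self.owners.contains_key(&resource_id)
    }

    /// Holder-authorized commands would check this before doing any work.
    pub fn holder_epoch_is_current(&self, lease_id: u64, epoch: u64) -> bool {
        self.leases.get(&lease_id).map_or(false, |l| l.epoch == epoch)
    }

    /// active -> revoking: bump the epoch, keep resources owned and unavailable.
    pub fn revoke(&mut self, operation_id: u64, lease_id: u64) -> Outcome {
        if let Some(cached) = self.results.get(&operation_id) { return *cached; }
        let outcome = match self.leases.get_mut(&lease_id) {
            None => Outcome::LeaseNotFound,
            Some(lease) => {
                if lease.state == LeaseState::Active {
                    lease.state = LeaseState::Revoking;
                    lease.epoch += 1;
                    Outcome::Ok { lease_epoch: lease.epoch }
                } else {
                    Outcome::InvalidState
                }
            }
        };
        self.results.insert(operation_id, outcome);
        outcome
    }

    /// revoking -> revoked: the only transition that clears owner pointers.
    pub fn reclaim(&mut self, operation_id: u64, lease_id: u64) -> Outcome {
        if let Some(cached) = self.results.get(&operation_id) { return *cached; }
        let outcome = match self.leases.get_mut(&lease_id) {
            None => Outcome::LeaseNotFound,
            Some(lease) => {
                if lease.state == LeaseState::Revoking {
                    lease.state = LeaseState::Revoked;
                    for r in &lease.resources { self.owners.remove(r); }
                    Outcome::Ok { lease_epoch: lease.epoch }
                } else {
                    Outcome::InvalidState
                }
            }
        };
        self.results.insert(operation_id, outcome);
        outcome
    }
}
```

In this sketch, only `reclaim` removes entries from `owners`, which is how the late-not-early reuse rule shows up in code: a revoked lease blocks reuse indefinitely until an explicit reclaim commits.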