1 change: 1 addition & 0 deletions docs/README.md
@@ -21,6 +21,7 @@
- [Jepsen Refactor Plan](./jepsen-refactor-plan.md)
- [Lease Kernel Design Decisions](./lease-kernel-design.md)
- [Lease Kernel Follow-On](./lease-kernel-follow-on.md)
- [Revoke Safety Slice](./revoke-safety-slice.md)
- [Operator Runbook](./operator-runbook.md)
- [KubeVirt Jepsen Report](./kubevirt-jepsen-report.md)
- [Replication Notes](./replication.md)
2 changes: 2 additions & 0 deletions docs/design.md
@@ -55,6 +55,8 @@ Current implementation anchor:
first-class leases, bundle ownership, fencing, and revoke
- [lease-kernel-follow-on.md](./lease-kernel-follow-on.md): post-`M8` planning for minimal
generic lease-kernel extensions such as bundle ownership, fencing, and revoke
- [revoke-safety-slice.md](./revoke-safety-slice.md): exact `M9-T08` scope for revoke, reclaim,
and safe-reuse behavior in the current implementation
- [replication.md](./replication.md): deferred distributed design areas and boundaries
- [roadmap.md](./roadmap.md): high-level roadmap and exit criteria
- [work-breakdown.md](./work-breakdown.md): concrete units of work for the first implementation
194 changes: 194 additions & 0 deletions docs/revoke-safety-slice.md
@@ -0,0 +1,194 @@
# Revoke Safety Slice

## Status

Draft local planning baseline for `M9-T08` / issue `#85`.

This document narrows the next implementation slice before code work starts. It is not a new
authoritative API surface; it is the execution plan for bringing the accepted revoke/reclaim
semantics into the current trusted-core implementation.

## Purpose

`M9-T07` established fencing and stale-holder rejection. The next kernel step is to withdraw
holder authority explicitly without permitting early reuse.

The implementation question for `M9-T08` is narrower than the full lease-model transition:

- how revoke enters the current core cleanly
- how reclaim becomes the only point where reuse is allowed
- how to keep the reservation-era compatibility surface from drifting away from the accepted
lease-centric semantics

## Slice Goal

Implement the minimum revoke/reclaim behavior needed to preserve the late-not-early reuse rule in
the current execution path (a resource may be reused later than strictly necessary, but never
before reclaim commits), including the already-approved crash/retry/failover safety contract for
`M9-T08`.

For this slice, "done" means:

- the core can log and apply `revoke` and `reclaim`
- stale holders lose authority as soon as revoke commits
- resources remain unavailable until reclaim commits
- exact retries stay deterministic

## In Scope

`M9-T08` should include:

1. one explicit revoke command in the trusted core
2. one explicit reclaim command in the trusted core
3. one live non-terminal state for revoked-but-not-yet-reusable ownership
4. one terminal revoked outcome that preserves history after reclaim
5. the minimum executor, persistence, and replay plumbing needed so committed revoke/reclaim
outcomes survive live apply, restart, and the existing failover contract
6. invariant, negative-path, retry, and crash-recovery tests for the new safety rule

## Out Of Scope

`M9-T08` should not expand into:

- new replication protocol design, failover refactors, or replicated-surface expansion beyond what
is needed to preserve committed revoke/reclaim outcomes under the existing path
- WAL or snapshot reshaping beyond the exact command/state support required for revoke/reclaim
- broader public API and transport cleanup beyond the narrow compatibility bridge already required
by this slice
- heartbeat ingestion or wall-clock reclaim logic inside the state machine
- policy reasons or operator metadata attached to revoke/reclaim
- holder transfer or shared-resource semantics

Those belong to later slices, primarily `M9-T09` through `M9-T11`.

## Compatibility Rule

The accepted model is lease-centric, but the current implementation is still reservation-centric in
spelling and data layout.

For `M9-T08`, that bridge is allowed under one rule:

- reservation-era names may remain temporarily, but revoke/reclaim behavior must match the
authoritative lease semantics exactly

That means:

- the current `reservation_id` may continue to serve as the implementation anchor for `lease_id`
- the existing `confirmed` state may continue as the compatibility spelling for authoritative
`active`
- the slice must not introduce reservation-era shortcuts that would be invalid in the final lease
model
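
As a minimal sketch of that bridge, the reservation-era spelling can be mapped onto the lease
spelling in one place so that revoke/reclaim logic only ever reasons about lease states. The enum
and newtype names below, and the `u64` identifier width, are assumptions for illustration; they
are not taken from the current code.

```rust
/// Reservation-era state spelling (hypothetical variant names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReservationState {
    Reserved,
    Confirmed,
    Revoking,
    Released,
    Expired,
    Revoked,
}

/// Authoritative lease-centric spelling of the same states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseState {
    Reserved,
    Active,
    Revoking,
    Released,
    Expired,
    Revoked,
}

impl From<ReservationState> for LeaseState {
    fn from(state: ReservationState) -> Self {
        match state {
            ReservationState::Reserved => LeaseState::Reserved,
            // `confirmed` is only the compatibility spelling for `active`.
            ReservationState::Confirmed => LeaseState::Active,
            ReservationState::Revoking => LeaseState::Revoking,
            ReservationState::Released => LeaseState::Released,
            ReservationState::Expired => LeaseState::Expired,
            ReservationState::Revoked => LeaseState::Revoked,
        }
    }
}

/// Reservation-era identifier (the `u64` width is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ReservationId(pub u64);

/// Lease identifier anchored on the existing `reservation_id`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct LeaseId(pub u64);

impl From<ReservationId> for LeaseId {
    fn from(id: ReservationId) -> Self {
        LeaseId(id.0)
    }
}
```

Keeping the mapping total and one-directional makes it harder for a reservation-era shortcut to
slip into the revoke/reclaim paths unnoticed.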

## Exact Command Semantics For This Slice

### Revoke

Implementation intent:

```text
revoke(lease_id)
```

Current compatibility spelling may still route this through the reservation-era implementation, but
the effect must be:

- precondition: live lease exists and is currently `active`
- success: lease moves to `revoking`
- success: `lease_epoch` increments immediately
- success: member resources stay unavailable and keep pointing at the same live owner
- success: resource state becomes `revoking`
- success: no retirement is scheduled yet

Failure behavior:

- `lease_not_found` if the lease never existed
- `lease_retired` if retained history says the live record is already gone
- `invalid_state` for `reserved`, `revoking`, `released`, `expired`, or `revoked`
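
The success and failure rules above can be sketched as a single apply function. This is an
illustration under stated assumptions, not the trusted-core implementation: the `Core`, `Lease`,
and `Resource` types, the `u64` identifiers, and the `retired` set standing in for retained
history are all hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseState { Reserved, Active, Revoking, Released, Expired, Revoked }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceState { Available, Reserved, Active, Revoking }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RevokeError { LeaseNotFound, LeaseRetired, InvalidState }

#[derive(Debug)]
pub struct Lease {
    pub state: LeaseState,
    pub lease_epoch: u64,
    pub members: Vec<u64>,
}

#[derive(Debug)]
pub struct Resource {
    pub state: ResourceState,
    pub owner: Option<u64>,
}

/// Hypothetical in-memory core; `retired` stands in for retained history.
#[derive(Debug, Default)]
pub struct Core {
    pub leases: BTreeMap<u64, Lease>,
    pub retired: BTreeSet<u64>,
    pub resources: BTreeMap<u64, Resource>,
}

impl Core {
    /// Applies `revoke(lease_id)` and returns the new `lease_epoch` on success.
    pub fn revoke(&mut self, lease_id: u64) -> Result<u64, RevokeError> {
        let lease = match self.leases.get_mut(&lease_id) {
            Some(lease) => lease,
            None if self.retired.contains(&lease_id) => return Err(RevokeError::LeaseRetired),
            None => return Err(RevokeError::LeaseNotFound),
        };
        // Only an `active` lease may be revoked.
        if lease.state != LeaseState::Active {
            return Err(RevokeError::InvalidState);
        }
        lease.state = LeaseState::Revoking;
        lease.lease_epoch += 1; // stale holders lose authority immediately
        for resource_id in &lease.members {
            if let Some(resource) = self.resources.get_mut(resource_id) {
                // Still unavailable, still pointing at the same live owner.
                resource.state = ResourceState::Revoking;
            }
        }
        // No retirement is scheduled here; that happens only on reclaim.
        Ok(lease.lease_epoch)
    }
}
```

Returning the bumped epoch is a design choice of this sketch; the source only requires that the
epoch increments as part of a successful revoke.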

Duplicate behavior:

- exact retry with the same `operation_id` must return the cached original result
- a later distinct revoke with a different `operation_id` must not invent a second success; once a
lease is already `revoking` or terminal, the answer is `invalid_state`
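
The two duplicate cases can be illustrated with a small result cache keyed by `operation_id`. This
is a sketch only: `OperationCache`, `ToyLease`, the `Outcome` enum, and the `u128` key width are
assumptions, and the real retry path may bound or persist its cache differently.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Revoking { lease_epoch: u64 },
    InvalidState,
}

/// Hypothetical cache of committed results, keyed by `operation_id`.
#[derive(Debug, Default)]
pub struct OperationCache {
    results: HashMap<u128, Outcome>,
}

impl OperationCache {
    /// Runs `apply` only for an unseen `operation_id`; an exact retry gets
    /// the cached original result without touching state again.
    pub fn apply_once(&mut self, operation_id: u128, apply: impl FnOnce() -> Outcome) -> Outcome {
        *self.results.entry(operation_id).or_insert_with(apply)
    }
}

/// Toy lease with just enough state to show the behavior.
#[derive(Debug)]
pub struct ToyLease {
    pub active: bool,
    pub lease_epoch: u64,
}

/// Toy revoke: succeeds once from `active`, otherwise `invalid_state`.
pub fn revoke(lease: &mut ToyLease) -> Outcome {
    if !lease.active {
        return Outcome::InvalidState;
    }
    lease.active = false;
    lease.lease_epoch += 1;
    Outcome::Revoking { lease_epoch: lease.lease_epoch }
}
```

With this shape, replaying the same `operation_id` returns the original success and leaves the
epoch untouched, while a revoke under a new `operation_id` reaches the state machine and gets
`invalid_state`.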

### Reclaim

Implementation intent:

```text
reclaim(lease_id)
```

Effect:

- precondition: live lease exists and is currently `revoking`
- success: lease moves to terminal `revoked`
- success: member resources return to `available`
- success: per-resource current owner pointers clear
- success: retirement is scheduled through the normal bounded history path

Failure behavior:

- `lease_not_found` if the lease never existed
- `lease_retired` if retained history says the live record is already gone
- `invalid_state` for `reserved`, `active`, `released`, `expired`, or already `revoked`

Duplicate behavior:

- exact retry with the same `operation_id` must return the cached original result
- a later distinct reclaim on an already terminal record must not produce a second success; while
  the terminal `revoked` record is still retained, the answer is `invalid_state`
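
A matching sketch for reclaim, under the same caveats as any illustration here: the `Core` layout,
the `owners` map (where absence of an entry is taken to mean `available`), and the `retire_queue`
standing in for the bounded history path are all hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseState { Reserved, Active, Revoking, Released, Expired, Revoked }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReclaimError { LeaseNotFound, LeaseRetired, InvalidState }

#[derive(Debug)]
pub struct Lease {
    pub state: LeaseState,
    pub members: Vec<u64>,
}

/// Hypothetical in-memory core for illustration only.
#[derive(Debug, Default)]
pub struct Core {
    pub leases: BTreeMap<u64, Lease>,
    pub retired: BTreeSet<u64>,
    /// resource id -> current owner lease; no entry means `available`.
    pub owners: BTreeMap<u64, u64>,
    /// Leases waiting for retirement through the history path.
    pub retire_queue: Vec<u64>,
}

impl Core {
    /// Applies `reclaim(lease_id)`; the only transition that frees resources.
    pub fn reclaim(&mut self, lease_id: u64) -> Result<(), ReclaimError> {
        let lease = match self.leases.get_mut(&lease_id) {
            Some(lease) => lease,
            None if self.retired.contains(&lease_id) => return Err(ReclaimError::LeaseRetired),
            None => return Err(ReclaimError::LeaseNotFound),
        };
        // Reclaim before revoke, or after a terminal outcome, is rejected.
        if lease.state != LeaseState::Revoking {
            return Err(ReclaimError::InvalidState);
        }
        lease.state = LeaseState::Revoked; // terminal, history is preserved
        for resource_id in &lease.members {
            // Clearing the owner pointer makes the resource available again.
            self.owners.remove(resource_id);
        }
        self.retire_queue.push(lease_id);
        Ok(())
    }
}
```

The terminal `revoked` record stays in `leases` until retirement runs, which is why a later
distinct reclaim in this sketch sees `invalid_state` rather than `lease_not_found`.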

## Required Safety Properties

`M9-T08` must preserve these invariants:

1. revoke removes holder authority before reuse is possible
2. reclaim is the only transition that makes a revoked resource reusable
3. active or revoking leases are never freed by a timer
4. late external reclaim is acceptable; early reclaim is not
5. holder-authorized commands that arrive after revoke with the old epoch fail deterministically
6. replay of committed revoke/reclaim commands yields the same resource availability outcome
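
Two of these properties lend themselves to small, checkable predicates. The sketch below shows a
stale-epoch check for holder-authorized commands (property 5) and a no-early-reuse check over
resources (properties 1 and 2). The type names, the `StaleEpoch` error, and the `available` flag
are assumptions made for illustration; the source does not name the error a stale holder receives.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseState { Reserved, Active, Revoking, Released, Expired, Revoked }

#[derive(Debug)]
pub struct Lease {
    pub state: LeaseState,
    pub lease_epoch: u64,
}

#[derive(Debug)]
pub struct Resource {
    pub available: bool,
    pub owner: Option<u64>,
}

/// Hypothetical error for a holder command carrying an old epoch.
#[derive(Debug, PartialEq, Eq)]
pub struct StaleEpoch {
    pub presented: u64,
    pub current: u64,
}

/// A holder command is only authorized with the lease's current epoch.
pub fn check_holder_epoch(lease: &Lease, presented: u64) -> Result<(), StaleEpoch> {
    if presented == lease.lease_epoch {
        Ok(())
    } else {
        Err(StaleEpoch { presented, current: lease.lease_epoch })
    }
}

/// True when every owned resource is unavailable and its owner is still a
/// live (reserved, active, or revoking) lease.
pub fn no_early_reuse(leases: &BTreeMap<u64, Lease>, resources: &BTreeMap<u64, Resource>) -> bool {
    resources.values().all(|resource| match resource.owner {
        Some(lease_id) => {
            let owner_is_live = leases.get(&lease_id).map_or(false, |lease| {
                matches!(
                    lease.state,
                    LeaseState::Reserved | LeaseState::Active | LeaseState::Revoking
                )
            });
            !resource.available && owner_is_live
        }
        // Unowned resources are outside this particular check.
        None => true,
    })
}
```

Because the epoch comparison depends only on committed state, the same stale command fails the
same way on every replica and on every replay.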

## Implementation Boundaries

The slice should be built in this order:

1. add core state and command variants for revoke/reclaim
2. apply revoke/reclaim through the same executor path already used by reserve/confirm/release
3. add only the exact codec, snapshot, and recovery support required so live apply and replay
preserve committed revoke/reclaim outcomes
4. preserve the current no-early-reuse contract under crash, retry, and failover without broadening
the replication surface in this slice
5. add resource-state and lease-state invariants for `revoking` and `revoked`
6. add retry and stale-holder regression coverage

Important boundary:

- if broader WAL/snapshot cleanup, transport normalization, or replication-surface redesign becomes
necessary, keep only the revoke/reclaim unblocker here and defer the broader cleanup to `M9-T09`
and `M9-T10`

## Tests This Slice Should Add

Minimum test set:

- revoke on active lease moves the lease to `revoking` and bumps `lease_epoch`
- revoke does not free member resources
- holder `release` or `confirm` with the old epoch after revoke fails deterministically
- reclaim from `revoking` returns resources to `available` and records terminal `revoked` history
- reclaim before revoke is `invalid_state`
- exact duplicate revoke and reclaim requests return cached committed results
- reserved, active, and revoking resources cannot be reused early
- crash/restart replay preserves `revoking` vs `revoked` outcomes
- committed revoke/reclaim outcomes preserve the same no-early-reuse behavior across the current
failover path
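
The replay-oriented cases can be prototyped against a toy state machine before they are wired to
the real executor. The sketch below is an assumption-heavy illustration: `ToyCore`, `Command`, and
`replay` are hypothetical stand-ins for a single lease with a single member resource, and they do
not model the WAL, snapshots, or failover.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    Revoke,
    Reclaim,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseState {
    Active,
    Revoking,
    Revoked,
}

/// One lease with one member resource, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ToyCore {
    pub state: LeaseState,
    pub lease_epoch: u64,
    pub resource_available: bool,
}

impl ToyCore {
    pub fn new() -> Self {
        ToyCore { state: LeaseState::Active, lease_epoch: 1, resource_available: false }
    }

    /// Applies one command; returns false for an `invalid_state` rejection.
    pub fn apply(&mut self, command: Command) -> bool {
        match (self.state, command) {
            (LeaseState::Active, Command::Revoke) => {
                self.state = LeaseState::Revoking;
                self.lease_epoch += 1;
                true
            }
            (LeaseState::Revoking, Command::Reclaim) => {
                self.state = LeaseState::Revoked;
                self.resource_available = true;
                true
            }
            // Reclaim before revoke, or any repeat, changes nothing.
            _ => false,
        }
    }
}

/// Rebuilds state from a committed command log, as a restart would.
pub fn replay(log: &[Command]) -> ToyCore {
    let mut core = ToyCore::new();
    for command in log {
        core.apply(*command);
    }
    core
}
```

A test built on this shape applies commands live while appending them to a log, then asserts that
`replay` of the log equals the live state both after revoke (still `revoking`, resource
unavailable) and after reclaim (terminal `revoked`, resource available).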

## Exit Condition

`M9-T08` is ready to hand off when:

- the exact revoke/reclaim behavior above is implemented or explicitly mapped to narrower code
tasks
- `docs/status.md` references issue `#85` rather than the stale `#84` wording
- the slice still satisfies the existing `M9-T08` crash/retry/failover acceptance criteria without
silently expanding into broader `M9-T09` or `M9-T10` cleanup
- later work is cleanly reserved for `M9-T09` through `M9-T11`
33 changes: 19 additions & 14 deletions docs/status.md
@@ -14,7 +14,7 @@
- `M6` replication design: implemented
- `M7` replicated core prototype: in progress
- `M8` external cluster validation: in progress
- `M9` generic lease-kernel follow-on: `T06` in progress on issue branch
- `M9` generic lease-kernel follow-on: `T07` merged, `T08` planning in progress on issue branch
- Latest completed implementation chunks:
- `4156a80` `Bootstrap AllocDB core and docs`
- `f84a641` `Add WAL file and snapshot recovery primitives`
@@ -32,8 +32,9 @@
with divergent prepared-suffix discard; promoted partition and primary-crash scenarios that
preserve fail-closed behavior and retry/read continuity after failover; the local
three-replica cluster runner, fault-control harness, and QEMU testbed around the real replica
daemon; and the first trusted-core bundle-commit slice with bundle membership, bundle-aware
confirm/release/expire, and bundle regression coverage
daemon; the first trusted-core bundle-commit slice with bundle membership, bundle-aware
confirm/release/expire, and bundle regression coverage; and the first fencing slice with
lease-epoch propagation, stale-holder rejection, and epoch-aware retry/read coverage

## What Exists

@@ -85,9 +86,11 @@
- one draft lease-kernel design-decision document that chooses a first-class lease authority
object, bundle size `1` as the single-resource semantic special case, a lease-scoped fencing
token, and a two-stage `revoke -> reclaim` safety model
- one active authoritative-docs pass under issue `#80` that is rewriting semantics, API,
- one merged authoritative-docs pass under issue `#80` that rewrote semantics, API,
architecture, and fault-model docs to the approved lease-centric contract while keeping the
current reservation-centric implementation explicitly marked as compatibility surface
- one active `M9-T08` planning note that narrows revoke/reclaim implementation scope before the
next code-bearing lease-kernel branch
- Replication design draft:
- VSR-style primary/backup replicated log with fixed membership and majority quorums
- primary-only reads in the first replicated release
@@ -198,13 +201,15 @@
- PR `#89` merged `M9-T06` on `main`: the trusted core now supports atomic bundle reservation,
explicit bundle membership records, bundle-aware confirm/release/expire, and bundle-aware
snapshot/codec coverage while preserving the existing reservation compatibility surface
- issue `#84` / `M9-T07` is the active implementation slice on the current branch: lease epochs
now flow through holder-authorized commands and command outcomes, the core rejects stale holder
epochs deterministically, and read/retry surfaces expose the current authority token for active
reservations
- validation for the active `#84` branch currently includes
`cargo test -p allocdb-core -- --nocapture`, `cargo test -p allocdb-node api -- --nocapture`,
`cargo test -p allocdb-node engine -- --nocapture`, and
`cargo test --workspace --no-run`
- the next planned slices after `#84` remain `M9-T08` revoke/safe reuse, `M9-T09` persistence and
transport extension, `M9-T10` replication preservation, and `M9-T11` broader regression coverage
- PR `#90` merged `M9-T07` on `main`: lease epochs now flow through holder-authorized commands and
command outcomes, the core rejects stale holder epochs deterministically, and read/retry
surfaces expose the current authority token for active reservations
- issue `#85` / `M9-T08` is the active planning slice on the current branch: the local scope is
being narrowed to explicit `revoke` and `reclaim`, `revoking` as the only non-reusable
post-authority state, and the minimum replay-safe bridge from the reservation-era implementation
to the accepted lease-centric semantics
- the active `#85` planning branch is defining exactly what belongs in revoke/reclaim now versus
what stays deferred to `M9-T09` through `M9-T11`, especially around WAL/snapshot broadening,
transport cleanup, and replication preservation
- the next planned code-bearing slices after `#85` remain `M9-T09` persistence and transport
extension, `M9-T10` replication preservation, and `M9-T11` broader regression coverage