Skip to content

feat(jepsen): complete kubevirt release gate#71

Merged
skel84 merged 4 commits intomainfrom
issue-less-kubevirt-jepsen-backend
Mar 15, 2026
Merged

feat(jepsen): complete kubevirt release gate#71
skel84 merged 4 commits intomainfrom
issue-less-kubevirt-jepsen-backend

Conversation

@skel84
Copy link
Owner

@skel84 skel84 commented Mar 15, 2026

Summary

  • complete and document the green KubeVirt Jepsen release gate on the rebuilt 3-lane profile
  • stabilize the Jepsen/KubeVirt runner and expiration scenarios for long-lived lanes
  • keep the collected Jepsen evidence repo-local under .artifacts/ while leaving it ignored by git

Validation

  • ./scripts/preflight.sh
  • full fresh 15-run KubeVirt Jepsen matrix on the rebuilt longhorn-strict-local-wffc profile

Follow-up

skel84 added 2 commits March 14, 2026 13:44
Stabilize the KubeVirt Jepsen runner, add the final gate report, and keep the generated artifacts repo-local but ignored while preserving the fully green 15-run matrix evidence.
@coderabbitai
Copy link

coderabbitai bot commented Mar 15, 2026

Warning

Rate limit exceeded

@skel84 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 54 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d7b889ac-0785-45de-9410-2c1c809e4b4f

📥 Commits

Reviewing files that changed from the base of the PR and between bdd3660 and 4257dc7.

📒 Files selected for processing (5)
  • crates/allocdb-node/src/bin/allocdb-jepsen.rs
  • crates/allocdb-node/src/bin/allocdb-local-cluster.rs
  • crates/allocdb-node/src/kubevirt_testbed.rs
  • docs/kubevirt-jepsen-report.md
  • docs/testing.md

Walkthrough

This pull request enhances observability and introduces KubeVirt testbed infrastructure. Changes include: adding comprehensive logging throughout protocol handling, replica state transitions, and expiration processing in the allocdb-node binary; introducing a new public kubevirt_testbed module for KubeVirt cluster configuration and layout management; expanding documentation with KubeVirt-specific testing workflows and Jepsen gate reports; and refactoring protocol request handling into dedicated functions.

Changes

Cohort / File(s) Summary
Build Configuration
.gitignore, crates/allocdb-node/Cargo.toml
Added .artifacts/ ignore rule and env_logger dependency for logging support.
Protocol and Replica Instrumentation
crates/allocdb-node/src/bin/allocdb-local-cluster.rs, crates/allocdb-node/src/replica.rs, crates/allocdb-node/src/engine.rs
Extensive logging and diagnostics added to protocol request handlers (Prepare/Commit), replica state transitions, and expiration processing. Refactored protocol handling into dedicated functions (handle_prepare_protocol_request, handle_commit_protocol_request). Introduced result structures (PrepareQuorumResult, CommitBroadcastResult, TickCommitResult) and formatting utilities to track quorum/broadcast outcomes and emit structured logs. Added diagnostic helpers (format_replica_debug_summary, log_metadata_transition) and extended LoadedMetadata with Faulted variant.
KubeVirt Testbed Infrastructure
crates/allocdb-node/src/kubevirt_testbed.rs, crates/allocdb-node/src/lib.rs
New public module for KubeVirt testbed configuration, including KubevirtTestbedConfig, KubevirtGuestConfig, KubevirtTestbedLayout structs and KubevirtTestbedLayoutError enum. Provides layout loading/persistence with atomic writes, replica validation (exactly 3 with sequential IDs), workspace preparation, and text-based serialization with versioning.
Documentation
docs/README.md, docs/kubevirt-jepsen-report.md, docs/operator-runbook.md, docs/status.md, docs/testing.md
Added Jepsen gate report for KubeVirt homelab profile documenting execution profile, results, and artifact policy. Extended operator runbook and testing guides with KubeVirt-specific commands (capture-kubevirt-layout, verify-kubevirt-surface, run-kubevirt, etc.), fault-window override guidance, and multi-lane execution model. Updated status document with KubeVirt surface verification and enhanced observability notes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(jepsen): complete kubevirt release gate' is directly related to the main objectives of the PR, clearly summarizing the primary change.
Description check ✅ Passed The PR description covers Summary, Validation, and Follow-up sections with sufficient detail. However, it omits the Linked Issue and Changes sections from the template.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch issue-less-kubevirt-jepsen-backend
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (5)
crates/allocdb-node/src/replica.rs (1)

1676-1678: Consider consolidating duplicate format_optional_lsn implementations.

This function has the same signature and implementation as the one in crates/allocdb-node/src/bin/allocdb-jepsen.rs (lines 3826-3828 per relevant code snippets). Consider extracting this to a shared utility module to avoid duplication, though this is a minor concern given the function's simplicity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/allocdb-node/src/replica.rs` around lines 1676 - 1678, The helper
function format_optional_lsn (fn format_optional_lsn(lsn: Option<Lsn>) ->
String) is duplicated; extract it into a shared utility (e.g., a new module or
existing utils lib) and replace the local implementations with a single
use/import of that function; ensure the new shared function has the same
signature and uses Lsn::get().to_string(), update both references (the one in
replica.rs and the one in allocdb-jepsen.rs) to call the consolidated function,
and add any needed pub(crate) visibility and module imports so the callers
compile.
docs/kubevirt-jepsen-report.md (1)

114-124: Consider clarifying that artifact links are local-only references.

The links to ../.artifacts/kubevirt-jepsen-20260315/ paths reference git-ignored local artifacts. These links will not resolve for users who clone the repository fresh. Consider adding a note that these are representative path examples for local development, not clickable links in the repository.

📝 Suggested clarification
 ## Representative Evidence

+Note: The paths below reference local `.artifacts/` directories that are git-ignored. They serve as
+representative examples for developers who have run the gate locally.
+
 Representative final status files:

 - [`reservation_contention-crash-restart-status.txt`](../.artifacts/kubevirt-jepsen-20260315/lane-a/reservation_contention-crash-restart-status.txt)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/kubevirt-jepsen-report.md` around lines 114 - 124, The artifact links
under the representative final artifact bundles section point to
../.artifacts/kubevirt-jepsen-20260315/... which are local, git-ignored build
artifacts and won't resolve for fresh clones; update the
docs/kubevirt-jepsen-report.md text near those bullet lists to add a short
clarifying note that these are local-only representative paths (e.g., "These
links point to local, git-ignored artifacts in .artifacts and will not be
accessible from a fresh clone; see instructions below to reproduce or where to
find uploaded bundles."), and optionally suggest how readers can obtain the
artifacts (reproduce the run, or link to external storage) so users know why the
links may be broken.
crates/allocdb-node/src/kubevirt_testbed.rs (1)

147-152: Keep load() read-only.

Right now load() goes through new(), which creates the workspace on disk and re-absolutizes relative paths against the current working directory. For a read path, that’s a surprising side effect and it can mask a typoed or stale archived layout. I’d split validation/normalization from on-disk workspace preparation.

Also applies to: 271-284, 324-368

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/allocdb-node/src/kubevirt_testbed.rs` around lines 147 - 152, load()
currently triggers on-disk workspace creation and path re-absolutization via
new(); change load() to be pure/read-only by having it only read the file and
call decode_layout(&bytes) (or a new pure parse function) without creating the
workspace or mutating paths; extract the validation/normalization logic
currently embedded in new() into a separate pure method (e.g.,
normalize_or_validate_layout or normalize_paths) that new() will call before
performing on-disk workspace preparation, and update the code paths in the other
affected areas (the blocks around the original new() usage at the ranges noted:
271-284 and 324-368) to use the new pure normalization function where they
should not touch disk, reserving on-disk creation exclusively for a clearly
named method like create_workspace_on_disk().
crates/allocdb-node/src/bin/allocdb-local-cluster.rs (2)

811-899: Please add direct regression coverage for the extracted protocol helpers.

This refactor moves substantive behavior into handle_prepare_protocol_request, handle_commit_protocol_request, and commit_tick_entry, but the new tests only pin ACK matching and oversized reads. A few focused negative-path tests around view mismatch, append/commit rejection, and quorum-unavailable tick commits would make this much safer to evolve.

As per coding guidelines **/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.

Also applies to: 1065-1110, 2304-2357

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/allocdb-node/src/bin/allocdb-local-cluster.rs` around lines 811 - 899,
Add regression tests that directly exercise the new protocol helper functions to
cover negative paths: write unit/integration tests that call
handle_prepare_protocol_request, handle_commit_protocol_request, and
commit_tick_entry (or the public API that triggers them) to assert behavior on
view mismatch, append_prepared_entry rejection, commit_prepared_through
rejection, and tick-commit when quorum is unavailable; for each test, construct
a ReplicaNode/LocalClusterReplicaConfig in the failing state, invoke the helper
(or simulate the incoming protocol message), assert the returned Err (and/or
ReplicaProtocolResponse) matches the expected protocol_error_message or
view-mismatch string, and verify side-effects/logs/metadata (e.g., no commit
applied, commit_lsn unchanged) so these negative-paths are explicitly covered
alongside the existing ACK/oversize tests.

118-121: Use leveled logging here, but keep the replication hot path out of info.

init_logging() defaults to info, so every prepare/commit/quorum/tick trace below becomes always-on during Jepsen runs, while several rejection paths still bypass the logger with eprintln!. That will make long-lane artifacts noisy and hard to filter. I’d keep startup/role/view transitions at info, move per-request traces to debug, and route expected rejections/quorum loss through warn!/error!.

As per coding guidelines **/*.rs: Use the right log level: info for meaningful lifecycle and state-transition events; debug for detailed execution traces useful in development. Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.

Also applies to: 828-847, 862-893, 908-923, 958-969, 983-998, 1021-1049, 1098-1108, 1274-1289, 1329-1407

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/allocdb-node/src/bin/allocdb-local-cluster.rs` around lines 118 - 121,
init_logging currently sets a global default that makes hot-path traces
always-on; update the init_logging function (the env_logger::Builder::from_env
call) to use a global default of "info" but add module-level filters so
per-request/replication hot-path logs are at "debug" (use Builder::filter_module
or parse_filters to set e.g. "allocdb_node::replication=debug" and any
request-handling modules to debug) and keep lifecycle/startup paths at info;
also ensure expected rejection paths in the codebase use warn!/error! (adjust
those call sites) rather than eprintln! so logs follow the leveled
configuration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/allocdb-node/src/bin/allocdb-local-cluster.rs`:
- Around line 1122-1130: The structs PrepareQuorumResult and
CommitBroadcastResult currently populate acked_peers by pre-seeding with
replica.replica_id (so acked_peers/acked_prepare_peers/commit_ack_peers include
the local replica), causing over-reporting of remote acknowledgements; fix by
initializing the acked_peers Vecs empty (or if semantics require including local
replica, rename the fields to something like all_acked_replicas) and ensure code
paths that push acknowledgements (the logic around replica.replica_id) only add
remote replicas so that acked_replicas and acked_peers accurately represent
remote ack counts and peers (check usages at the
PrepareQuorumResult/CommitBroadcastResult construction sites and the consumers
acked_prepare_peers and commit_ack_peers).

In `@crates/allocdb-node/src/kubevirt_testbed.rs`:
- Around line 114-139: new() currently validates replica_guests but allows a
control_guest that may carry a replica_id; add a guard in KubevirtTestbed::new
(and the other constructors/creators that mirror this pattern, e.g., the
functions around the other affected ranges) to reject any KubevirtGuestConfig
where control_guest.replica_id.is_some(), returning a
KubevirtTestbedLayoutError; reuse or mirror the validation logic from
validate_replica_guests (or extract a helper) so that control_guest with a
non-None replica_id is treated as invalid and the error prevents creating the
layout, ensuring persist() / load() round-trips remain consistent.
- Around line 185-245: The cluster builder maps fixed ports onto each guest IP
causing duplicate socket addresses when multiple replicas share the same IP;
update validation to detect and reject overlapping addresses by checking
LocalClusterLayout.replicas' LocalClusterReplicaConfig.control_addr, client_addr
and protocol_addr for duplicates (in validate_replica_guests or immediately
after build_guest_local_cluster_layout) and return an error (or panic) with a
clear message listing the conflicting ReplicaId(s) and address(es) so callers
cannot create an invalid LocalClusterLayout.

In `@docs/testing.md`:
- Around line 447-455: Update the documented command examples (the jepsen CLI
listings that include flags like the --output-root <artifacts> usage such as in
the allocdb-jepsen commands shown and the other occurrences around lines
referenced) to explicitly state and recommend the repo-local output root
`.artifacts/` (e.g., replace or annotate `--output-root <artifacts>` with
`--output-root .artifacts/` and add a short note that evidence should be
archived under the repo-local `.artifacts/` directory). Ensure you update every
occurrence (including the other block noted) of `--output-root <artifacts>` in
this section so docs match the PR-standardized location and avoid archiving
outside the ignored path.

---

Nitpick comments:
In `@crates/allocdb-node/src/bin/allocdb-local-cluster.rs`:
- Around line 811-899: Add regression tests that directly exercise the new
protocol helper functions to cover negative paths: write unit/integration tests
that call handle_prepare_protocol_request, handle_commit_protocol_request, and
commit_tick_entry (or the public API that triggers them) to assert behavior on
view mismatch, append_prepared_entry rejection, commit_prepared_through
rejection, and tick-commit when quorum is unavailable; for each test, construct
a ReplicaNode/LocalClusterReplicaConfig in the failing state, invoke the helper
(or simulate the incoming protocol message), assert the returned Err (and/or
ReplicaProtocolResponse) matches the expected protocol_error_message or
view-mismatch string, and verify side-effects/logs/metadata (e.g., no commit
applied, commit_lsn unchanged) so these negative-paths are explicitly covered
alongside the existing ACK/oversize tests.
- Around line 118-121: init_logging currently sets a global default that makes
hot-path traces always-on; update the init_logging function (the
env_logger::Builder::from_env call) to use a global default of "info" but add
module-level filters so per-request/replication hot-path logs are at "debug"
(use Builder::filter_module or parse_filters to set e.g.
"allocdb_node::replication=debug" and any request-handling modules to debug) and
keep lifecycle/startup paths at info; also ensure expected rejection paths in
the codebase use warn!/error! (adjust those call sites) rather than eprintln! so
logs follow the leveled configuration.

In `@crates/allocdb-node/src/kubevirt_testbed.rs`:
- Around line 147-152: load() currently triggers on-disk workspace creation and
path re-absolutization via new(); change load() to be pure/read-only by having
it only read the file and call decode_layout(&bytes) (or a new pure parse
function) without creating the workspace or mutating paths; extract the
validation/normalization logic currently embedded in new() into a separate pure
method (e.g., normalize_or_validate_layout or normalize_paths) that new() will
call before performing on-disk workspace preparation, and update the code paths
in the other affected areas (the blocks around the original new() usage at the
ranges noted: 271-284 and 324-368) to use the new pure normalization function
where they should not touch disk, reserving on-disk creation exclusively for a
clearly named method like create_workspace_on_disk().

In `@crates/allocdb-node/src/replica.rs`:
- Around line 1676-1678: The helper function format_optional_lsn (fn
format_optional_lsn(lsn: Option<Lsn>) -> String) is duplicated; extract it into
a shared utility (e.g., a new module or existing utils lib) and replace the
local implementations with a single use/import of that function; ensure the new
shared function has the same signature and uses Lsn::get().to_string(), update
both references (the one in replica.rs and the one in allocdb-jepsen.rs) to call
the consolidated function, and add any needed pub(crate) visibility and module
imports so the callers compile.

In `@docs/kubevirt-jepsen-report.md`:
- Around line 114-124: The artifact links under the representative final
artifact bundles section point to ../.artifacts/kubevirt-jepsen-20260315/...
which are local, git-ignored build artifacts and won't resolve for fresh clones;
update the docs/kubevirt-jepsen-report.md text near those bullet lists to add a
short clarifying note that these are local-only representative paths (e.g.,
"These links point to local, git-ignored artifacts in .artifacts and will not be
accessible from a fresh clone; see instructions below to reproduce or where to
find uploaded bundles."), and optionally suggest how readers can obtain the
artifacts (reproduce the run, or link to external storage) so users know why the
links may be broken.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 721952da-9385-4193-9f72-c02be491cb17

📥 Commits

Reviewing files that changed from the base of the PR and between 0ad54b9 and bdd3660.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • .gitignore
  • crates/allocdb-node/Cargo.toml
  • crates/allocdb-node/src/bin/allocdb-jepsen.rs
  • crates/allocdb-node/src/bin/allocdb-local-cluster.rs
  • crates/allocdb-node/src/engine.rs
  • crates/allocdb-node/src/kubevirt_testbed.rs
  • crates/allocdb-node/src/lib.rs
  • crates/allocdb-node/src/replica.rs
  • docs/README.md
  • docs/kubevirt-jepsen-report.md
  • docs/operator-runbook.md
  • docs/status.md
  • docs/testing.md
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level: error for invariant breaks, corruption, and failed operations that require intervention; warn for degraded but expected conditions such as overload, lag, or rejected requests; info for meaningful lifecycle and state-transition events; debug for detailed execution traces useful in development; trace only for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.

Files:

  • crates/allocdb-node/src/lib.rs
  • crates/allocdb-node/src/engine.rs
  • crates/allocdb-node/src/bin/allocdb-local-cluster.rs
  • crates/allocdb-node/src/replica.rs
  • crates/allocdb-node/src/kubevirt_testbed.rs
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Keep documentation up to date with the code and design. If a change affects behavior, invariants, failure modes, operational semantics, testing strategy, or implementation sequencing, update the relevant docs in the same task or PR.

Files:

  • docs/status.md
  • docs/kubevirt-jepsen-report.md
  • docs/README.md
  • docs/operator-runbook.md
  • docs/testing.md
docs/status.md

📄 CodeRabbit inference engine (AGENTS.md)

Keep docs/status.md current as the single-file progress snapshot for the repository. Update it whenever milestone state, implementation coverage, or the recommended next step materially changes.

Files:

  • docs/status.md
🧠 Learnings (4)
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to **/*.rs : Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level: `error` for invariant breaks, corruption, and failed operations that require intervention; `warn` for degraded but expected conditions such as overload, lag, or rejected requests; `info` for meaningful lifecycle and state-transition events; `debug` for detailed execution traces useful in development; `trace` only for very high-volume diagnostic detail.

Applied to files:

  • crates/allocdb-node/src/engine.rs
  • crates/allocdb-node/src/bin/allocdb-local-cluster.rs
  • crates/allocdb-node/src/replica.rs
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Use the GitHub Project `AllocDB` as the operational work board. Keep planned work on the board, not only in milestone pages or local docs.

Applied to files:

  • docs/status.md
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to docs/status.md : Keep [`docs/status.md`](./docs/status.md) current as the single-file progress snapshot for the repository. Update it whenever milestone state, implementation coverage, or the recommended next step materially changes.

Applied to files:

  • docs/status.md
📚 Learning: 2026-03-12T15:18:53.086Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to **/*.rs : Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.

Applied to files:

  • crates/allocdb-node/src/bin/allocdb-local-cluster.rs
🧬 Code graph analysis (4)
crates/allocdb-node/src/engine.rs (4)
crates/allocdb-node/src/replica.rs (1)
  • get (36-38)
crates/allocdb-node/src/bounded_queue.rs (1)
  • len (26-28)
crates/allocdb-core/src/state_machine_metrics.rs (2)
  • config (31-33)
  • last_applied_lsn (21-23)
crates/allocdb-core/src/state_machine.rs (1)
  • deadline_slot (225-230)
crates/allocdb-node/src/bin/allocdb-local-cluster.rs (4)
crates/allocdb-node/src/local_cluster.rs (7)
  • replica (303-307)
  • encode_role (1261-1269)
  • from (69-71)
  • from (119-121)
  • from (188-190)
  • from (233-235)
  • from (1440-1462)
crates/allocdb-node/src/jepsen.rs (2)
  • encode_role (1408-1416)
  • kind (514-521)
crates/allocdb-core/src/state_machine_metrics.rs (1)
  • last_applied_lsn (21-23)
crates/allocdb-node/src/engine_observe.rs (1)
  • active_snapshot_lsn (43-45)
crates/allocdb-node/src/replica.rs (3)
crates/allocdb-node/src/replica_tests.rs (1)
  • identity (61-66)
crates/allocdb-node/src/bin/allocdb-jepsen.rs (1)
  • format_optional_lsn (3827-3829)
crates/allocdb-node/src/engine_observe.rs (1)
  • active_snapshot_lsn (43-45)
crates/allocdb-node/src/kubevirt_testbed.rs (1)
crates/allocdb-node/src/local_cluster.rs (2)
  • default_local_cluster_core_config (457-468)
  • default_local_cluster_engine_config (471-477)
🔇 Additional comments (15)
.gitignore (1)

17-17: LGTM! Correctly excludes repo-local Jepsen artifacts.

The pattern /.artifacts/ appropriately ignores the root-level artifacts directory while keeping Jepsen evidence repo-local. The placement in the "Test and benchmark artifacts" section is logical.

crates/allocdb-node/src/engine.rs (3)

655-662: LGTM: Structured logging for expiration tick planning.

The info level is appropriate for this lifecycle event per coding guidelines. The log includes all relevant diagnostic fields (current_slot, request_slot, due_candidates, max_per_tick, queue_depth) in a structured format.


682-688: LGTM: Completion log complements the planning log.

Logging both the planning and completion of expiration ticks provides good observability for diagnosing expiration processing issues.


1082-1095: LGTM: Per-command logging for internal expiration application.

Logging individual expiration commands at info level is appropriate given the operational significance of expiration state transitions. The conditional logging only for Command::Expire variants keeps the log focused.

docs/README.md (1)

22-22: LGTM: Documentation index updated with new report link.

The KubeVirt Jepsen Report link is correctly placed in the Engineering Docs section, maintaining consistency with the existing documentation structure.

docs/kubevirt-jepsen-report.md (1)

1-74: LGTM: Comprehensive and well-structured Jepsen gate report.

The report clearly documents:

  • Execution profile and topology
  • What the gate proved (and importantly, what it does not prove)
  • Artifact policy rationale

This provides valuable context for understanding the validation performed in this PR.

crates/allocdb-node/src/replica.rs (3)

1064-1072: LGTM: Appropriate warn-level logging for protocol rejections.

Using warn for rejected role transitions aligns with coding guidelines (degraded but expected conditions). The structured format includes all relevant diagnostic context.


1637-1664: LGTM: Well-designed transition logging helper.

The log_metadata_transition helper consolidates transition logging with before/after state snapshots, reducing duplication across configure_normal_role, record_durable_vote, and enter_view_uncertain. The format provides clear visibility into state changes.


1498-1530: LGTM: Commit-through logging provides good operational visibility.

The logging before and after commit operations captures the full state transition, which is valuable for debugging replication issues. The early-return log when commit is already satisfied helps trace no-op paths.

docs/status.md (1)

206-219: LGTM: Status accurately reflects the completed KubeVirt release gate.

The status update correctly captures:

  • The completed 15-run Jepsen matrix across 3 lanes
  • Isolation improvements and debugging enhancements
  • Clear articulation of the next decision point

This maintains docs/status.md as the authoritative progress snapshot per coding guidelines.

docs/operator-runbook.md (3)

124-132: LGTM: New KubeVirt commands documented with consistent formatting.

The new commands (capture-kubevirt-layout, verify-kubevirt-surface, run-kubevirt, watch-kubevirt, watch-kubevirt-fleet, archive-kubevirt) follow the existing documentation patterns and provide operators with clear command surfaces.


134-136: LGTM: Fault window override documented with appropriate caveats.

The documentation correctly warns that ALLOCDB_JEPSEN_FAULT_WINDOW_SECS_OVERRIDE is for debugging only and that release-gate evidence requires the full 1800s minimum.


227-242: LGTM: Comprehensive KubeVirt operational guidance.

The documentation covers:

  • Watch commands for monitoring runs
  • Fleet aggregation for multi-lane batches
  • Storage and anti-affinity configuration
  • Lane-scoped temp path handling
  • Debugging guidance with fault window override

This provides operators with actionable guidance for the KubeVirt testbed.

crates/allocdb-node/Cargo.toml (1)

10-10: LGTM: env_logger dependency addition supports the logging infrastructure.

The env_logger crate enables the structured logging introduced throughout the replica and engine code paths. This is appropriate for the operational observability goals of the PR.

crates/allocdb-node/src/lib.rs (1)

5-5: LGTM: New public module for KubeVirt testbed support.

The kubevirt_testbed module is correctly exported as public, following the same pattern as qemu_testbed and local_cluster. The alphabetical ordering with other modules is maintained.

@skel84 skel84 merged commit 65a8953 into main Mar 15, 2026
2 checks passed
@skel84 skel84 deleted the issue-less-kubevirt-jepsen-backend branch March 15, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant