test: promote replicated simulation scenarios by skel84 · Pull Request #65 · skel84/allocdb

skel84 · 2026-03-13T17:39:10Z

Summary

- promote the missing replicated partition and primary-crash scenario families into deterministic harness tests
- add minimal retry-aware harness support for ambiguous client outcomes across failover
- align replicated testing and status docs with the new executable coverage

Validation

cargo test -p allocdb-node replicated_simulation -- --nocapture
./scripts/preflight.sh

Closes #54

Add deterministic partition and primary-crash regression coverage to the real three-replica harness, plus the minimal retry-aware harness support needed to replay ambiguous client outcomes after failover. Closes #54

coderabbitai · 2026-03-13T17:39:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d4dfe091-4977-4a60-acd9-cf42ca51c692

📥 Commits

Reviewing files that changed from the base of the PR and between d8348da and bbc3d22.

📒 Files selected for processing (4)

crates/allocdb-node/src/replicated_simulation.rs
crates/allocdb-node/src/replicated_simulation_tests.rs
docs/status.md
docs/testing.md

📜 Recent review details

🧰 Additional context used

📓 Path-based instructions (3)

**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level: error for invariant breaks, corruption, and failed operations that require intervention; warn for degraded but expected conditions such as overload, lag, or rejected requests; info for meaningful lifecycle and state-transition events; debug for detailed execution traces useful in development; trace only for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.

Files:

crates/allocdb-node/src/replicated_simulation_tests.rs
crates/allocdb-node/src/replicated_simulation.rs

**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

Keep documentation up to date with the code and design. If a change affects behavior, invariants, failure modes, operational semantics, testing strategy, or implementation sequencing, update the relevant docs in the same task or PR.

Files:

docs/testing.md
docs/status.md

docs/status.md

📄 CodeRabbit inference engine (AGENTS.md)

Keep docs/status.md current as the single-file progress snapshot for the repository. Update it whenever milestone state, implementation coverage, or the recommended next step materially changes.

Files:

docs/status.md

🧠 Learnings (2)

📚 Learning: 2026-03-12T15:18:53.086Z

Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to **/*.rs : Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.

Applied to files:

crates/allocdb-node/src/replicated_simulation_tests.rs

📚 Learning: 2026-03-12T15:18:53.086Z

Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.086Z
Learning: Applies to docs/status.md : Keep [`docs/status.md`](./docs/status.md) current as the single-file progress snapshot for the repository. Update it whenever milestone state, implementation coverage, or the recommended next step materially changes.

Applied to files:

docs/status.md

🧬 Code graph analysis (2)

crates/allocdb-node/src/replicated_simulation_tests.rs (4)

crates/allocdb-node/src/replica.rs (1)

engine (719-721)

crates/allocdb-node/src/simulation.rs (1)

engine (341-343)

crates/allocdb-node/src/replicated_simulation.rs (3)

replica (535-540)

published_result (682-684)

configured_primary (578-580)

crates/allocdb-core/src/state_machine.rs (1)

deadline_slot (225-230)

crates/allocdb-node/src/replicated_simulation.rs (2)

crates/allocdb-node/src/replica.rs (1)

engine (719-721)

crates/allocdb-node/src/simulation.rs (1)

engine (341-343)

🔇 Additional comments (9)

docs/testing.md (1)

261-263: LGTM!

The documentation updates accurately reflect the new executable test coverage. The scenario descriptions in lines 261-263 now correctly emphasize retry recovery and commit propagation semantics, and the expanded coverage list in lines 324-341 aligns with the newly promoted test cases.

Also applies to: 324-341

crates/allocdb-node/src/replicated_simulation_tests.rs (4)

745-794: LGTM!

This test correctly validates that retry-cache hits do not overwrite canonical published results. The assertions on lines 758, 767, 773, 786, and 793 comprehensively verify:

Original commit produces a non-retry-cache result

Retry returns a from_retry_cache result while preserving the original

Conflicting payload returns OperationConflict without corrupting the canonical entry

948-1010: LGTM!

This regression test directly validates the fix for the primary-only prepared promotion bug raised in past reviews. The test correctly:

Creates a prepare that only exists on the primary (lines 961-984 verify backups have no prepared entry)

Forces view change and verifies the entry is NOT promoted (lines 988-996)

Confirms retry creates a fresh Prepared entry in the new view (lines 1002-1009)

1012-1071: LGTM!

This test validates the positive case for prepared-suffix recovery during view change. It correctly demonstrates that when a backup (replica 3) holds a prepared entry that proves majority append, the new primary (replica 2) can reconstruct the committed prefix even though it never directly received the prepare.

1073-1297: LGTM!

The partition and primary-crash scenario tests comprehensively cover the failure modes documented in docs/testing.md:

minority_partition (lines 1073-1108): Validates quorum preservation and stale backup catch-up

split_cluster (lines 1110-1164): Validates fail-closed behavior and later rejoin convergence

crash_before_quorum (lines 1166-1203): Validates pre-quorum retry creates fresh prepare

crash_after_quorum (lines 1205-1249): Validates retry resolves from reconstructed commit cache

crash_after_reply (lines 1251-1297): Validates read and retry preservation on new primary

The tests correctly use ReplicatedClientRequestOutcome to distinguish between Prepared and Published outcomes and verify the from_retry_cache flag appropriately.
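
The partition scenarios listed above turn on majority arithmetic. A minimal sketch of that arithmetic, assuming a simple majority quorum over a fixed replica count; `quorum_size` and `side_can_commit` are hypothetical helper names for illustration, not harness APIs:

```rust
/// Smallest number of replicas that forms a majority.
/// Assumption: the harness uses simple majority quorums.
pub fn quorum_size(replica_count: usize) -> usize {
    replica_count / 2 + 1
}

/// A partition side may keep committing only while it can reach a quorum.
/// A side below quorum is expected to fail closed rather than serve requests.
pub fn side_can_commit(replica_count: usize, reachable_replicas: usize) -> bool {
    reachable_replicas >= quorum_size(replica_count)
}
```

With three replicas, a side holding two of them can still commit (the minority_partition case), while a side that can reach only itself cannot (the fail-closed behavior asserted by split_cluster).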

docs/status.md (1)

50-57: LGTM!

The status updates accurately reflect the PR's contributions:

Lines 50-57: New promoted scenarios (partition, crash, retry) are documented

Lines 172-173: Retry-aware client submit helper matches the new client_submit_or_retry API

Lines 185-189: Regression coverage list now includes all newly promoted test families

The validation commands reorganization (lines 191-195) improves clarity by grouping related tests.

Also applies to: 172-173, 185-189

crates/allocdb-node/src/replicated_simulation.rs (3)

60-64: LGTM!

The ReplicatedClientRequestOutcome enum and client_submit_or_retry method correctly implement retry-aware submission. Crucially, when a retry cache hit occurs (line 674-676), the method returns the result directly without writing to published_results, preserving the canonical per-LSN result. This addresses the previously raised concern about overwriting canonical published results.

Also applies to: 659-680

1207-1244: LGTM!

The lookup_retry_result helper correctly implements the retry cache lookup:

Returns None when the operation is not found (line 1227), allowing a fresh prepare

Uses fingerprint comparison to detect conflicts and return OperationConflict (lines 1230-1238)

Properly sets from_retry_cache: true to distinguish from canonical commits
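
The three behaviors above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code under review: the record layout, the `u64` fingerprints, and the `HashMap` keyed by operation id are hypothetical stand-ins for the real harness types, and only the names `lookup_retry_result`, `OperationConflict`, and `from_retry_cache` come from the review.

```rust
use std::collections::HashMap;

/// Hypothetical result codes; the real enum lives in the allocdb crates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResultCode {
    Ok,
    OperationConflict,
}

/// Hypothetical shape of the result handed back to the client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SubmissionResult {
    pub applied_lsn: u64,
    pub result_code: ResultCode,
    pub from_retry_cache: bool,
}

/// What the primary remembers about an operation it has already applied.
#[derive(Debug, Clone, Copy)]
pub struct OperationRecord {
    pub command_fingerprint: u64,
    pub applied_lsn: u64,
    pub result_code: ResultCode,
}

/// Returns `None` when the operation is unknown, so the caller can issue a
/// fresh prepare. A known operation with a different fingerprint is reported
/// as a conflict. Every hit is flagged `from_retry_cache: true`.
pub fn lookup_retry_result(
    records: &HashMap<u64, OperationRecord>,
    operation_id: u64,
    command_fingerprint: u64,
) -> Option<SubmissionResult> {
    let record = records.get(&operation_id)?;
    let result_code = if record.command_fingerprint == command_fingerprint {
        record.result_code
    } else {
        ResultCode::OperationConflict
    };
    Some(SubmissionResult {
        applied_lsn: record.applied_lsn,
        result_code,
        from_retry_cache: true,
    })
}
```

Comparing fingerprints rather than raw payloads keeps the lookup cheap and lets a reused operation id with a different command surface as a conflict instead of silently returning the earlier result.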

1315-1337: LGTM!

The fix for the primary-only prepared promotion bug is correctly implemented. The view_change_target_commit_lsn method now:

Takes old_primary as a parameter (line 1318)

Excludes the old primary's prepared suffix from the target commit calculation (lines 1329-1331)

Only considers backup-held prepared suffixes as proof of majority append

The inline comment (lines 1326-1328) clearly documents the rationale. This addresses the critical issue raised in past reviews about incorrectly promoting uncommitted entries that only existed on the primary.
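
A rough sketch of the selection rule described above, assuming a three-replica cluster where the old primary plus one backup already form a majority. The `VoterState` struct and its fields are hypothetical; only the function name and the `old_primary` parameter come from the review.

```rust
/// Hypothetical per-voter snapshot gathered during a view change.
#[derive(Debug, Clone, Copy)]
pub struct VoterState {
    pub replica_id: u64,
    pub commit_lsn: Option<u64>,
    pub highest_prepared_lsn: Option<u64>,
}

/// Picks the commit target for the new view.
///
/// Commit LSNs from every voter count. Prepared LSNs count only when held by
/// a voter other than the old primary: a prepare that exists solely on the
/// old primary was never acknowledged by anyone else, so it proves nothing
/// about majority append.
pub fn view_change_target_commit_lsn(voters: &[VoterState], old_primary: u64) -> Option<u64> {
    let highest_commit = voters.iter().filter_map(|v| v.commit_lsn).max();
    let highest_backup_prepared = voters
        .iter()
        .filter(|v| v.replica_id != old_primary)
        .filter_map(|v| v.highest_prepared_lsn)
        .max();
    // `Option<u64>` orders `None` below any `Some`, so this keeps the larger
    // of the two candidates when either is present.
    highest_commit.max(highest_backup_prepared)
}
```

In this sketch a prepare at LSN 5 held only by old primary 1 leaves the target at the committed LSN, while the same prepare held by backup 3 raises the target to 5.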

Summary by CodeRabbit

New Features
- Retry cache mechanism for client request recovery after primary failures
- Improved prepared suffix handling during failover and view changes
- Multi-process local cluster runner with durable workspace persistence across restarts
Tests
- Comprehensive test coverage for primary crash scenarios, partition healing, and recovery workflows
- New validation for retry cache behavior and commit reconstruction
Documentation
- Updated status documentation reflecting deterministic partition handling and durability guarantees
- Enhanced testing scenarios covering failover, minority partitions, and rejoin strategies

Walkthrough

Adds a retry-aware client submit flow to the replicated simulation, exposes ReplicaNode::highest_prepared_lsn, updates view-change commit selection to consider prepared suffixes, and adds many tests and docs covering partition, crash, and rejoin scenarios.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Replica Prepared LSN Tracking `crates/allocdb-node/src/replica.rs` | Adds `pub fn highest_prepared_lsn(&self) -> Option<Lsn>` to return the highest LSN among prepared entries. |
| Replicated Simulation Core `crates/allocdb-node/src/replicated_simulation.rs` | Adds `ReplicatedClientRequestOutcome` enum, `client_submit_or_retry()` method, and `lookup_retry_result()` helper; integrates retry-cache lookup into client submit flow; updates `view_change_target_commit_lsn` to consider `highest_prepared_lsn`; expands imports for request decoding and result handling. |
| Replicated Simulation Tests `crates/allocdb-node/src/replicated_simulation_tests.rs` | Adds extensive tests validating retry-cache semantics, prepared-suffix recovery across view changes, partition/crash/rejoin scenarios, and exposes `ReplicatedClientRequestOutcome` for assertions. |
| Documentation `docs/status.md`, `docs/testing.md` | Updates status and testing docs to describe retry-aware submit behavior, primary-crash/rejoin/partition scenarios, and expanded test coverage and scenarios. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Simulation as Replicated Simulation
    participant RetryCache as Retry Lookup
    participant Engine as Primary Engine State

    Client->>Simulation: client_submit_or_retry(primary, slot, payload)
    activate Simulation

    Simulation->>RetryCache: lookup_retry_result(primary, slot, payload)
    activate RetryCache

    RetryCache->>Engine: fetch active primary engine state
    Engine-->>RetryCache: engine state / prepared & published records

    alt Retry Found (Fingerprint match -> published)
        RetryCache-->>Simulation: SubmissionResult (published)
        Simulation-->>Client: Published(SubmissionResult)
    else No usable retry result
        RetryCache-->>Simulation: None
        deactivate RetryCache
        Simulation->>Simulation: client_submit() -> produce Prepared entry
        Simulation-->>Client: Prepared(ReplicaPreparedEntry)
    end

    deactivate Simulation
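
The flow in the diagram can be approximated in a few lines. This is a toy model under stated assumptions rather than the harness itself: the `Harness` struct, its maps, and the `commit` helper are hypothetical, and operations are identified by a bare `u64` instead of a decoded request.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SubmissionResult {
    pub applied_lsn: u64,
    pub from_retry_cache: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PreparedEntry {
    pub lsn: u64,
    pub operation_id: u64,
}

/// Mirrors the two branches of the diagram.
#[derive(Debug, PartialEq, Eq)]
pub enum ClientRequestOutcome {
    Prepared(PreparedEntry),
    Published(SubmissionResult),
}

/// Hypothetical toy harness state.
#[derive(Default)]
pub struct Harness {
    applied: BTreeMap<u64, u64>,                        // operation_id -> applied LSN
    published_results: BTreeMap<u64, SubmissionResult>, // LSN -> canonical result
    next_lsn: u64,
}

impl Harness {
    /// Known operation: answer from the retry lookup and leave
    /// `published_results` untouched. Unknown operation: start a fresh prepare.
    pub fn client_submit_or_retry(&mut self, operation_id: u64) -> ClientRequestOutcome {
        if let Some(&applied_lsn) = self.applied.get(&operation_id) {
            return ClientRequestOutcome::Published(SubmissionResult {
                applied_lsn,
                from_retry_cache: true,
            });
        }
        self.next_lsn += 1;
        ClientRequestOutcome::Prepared(PreparedEntry { lsn: self.next_lsn, operation_id })
    }

    /// Stand-in for quorum append completing: records the canonical result.
    pub fn commit(&mut self, entry: &PreparedEntry) {
        self.applied.insert(entry.operation_id, entry.lsn);
        self.published_results.insert(
            entry.lsn,
            SubmissionResult { applied_lsn: entry.lsn, from_retry_cache: false },
        );
    }

    pub fn published_result(&self, lsn: u64) -> Option<&SubmissionResult> {
        self.published_results.get(&lsn)
    }
}
```

The point of returning the retry result directly is that the canonical per-LSN entry keeps `from_retry_cache: false` no matter how many times the client retries.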

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

M7-T01 Add replicated node state and durable protocol metadata #60: Modifies ReplicaNode/replica.rs surface; related to the new highest_prepared_lsn addition.
feat(simulation): add deterministic replicated cluster harness #61: Introduced the replicated simulation harness and types that this PR extends with retry/cache and client submit changes.
Add view change and fail-closed reads on quorum loss #63: Overlaps on prepared-entry and view-change handling that interacts with the new prepared-suffix logic.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'test: promote replicated simulation scenarios' accurately summarizes the main change: promoting replicated simulation scenarios into executable tests. |
| Description check | ✅ Passed | The description covers the main changes (promoting scenarios, adding retry support, aligning docs), includes validation steps, and references the closed issue. |
| Linked Issues check | ✅ Passed | All requirements from issue `#54` are met: partition/primary-crash/rejoin scenarios are promoted into executable tests [`#54`], retry-aware support is added, docs are aligned, and transcripts remain replayable. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to issue `#54`: new test scenarios, retry cache implementation for ambiguous outcomes, documentation updates, and a new method for accessing prepared LSN state. |


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/allocdb-node/src/replicated_simulation_tests.rs`:
- Around line 1029-1045: The test currently observes the outcome via
published_result(entry.lsn) which is set by handle_prepare_ack() as soon as
quorum append completes, so it does not model a distinct “reply not yet
published” boundary; either change the test to explicitly model pre-reply by
removing/avoiding the published_result check and instead assert quorum-append
via a lower-level indicator (e.g., check a quorum-appended helper or persisted
log state) before crashing, or explicitly publish the reply before the crash if
you intend the “reply exposed” scenario (call the harness method that publishes
replies, e.g., harness.publish_reply(entry.lsn) or equivalent), and update the
test name/docs to match the chosen behavior; locate changes around
client_submit, deliver_protocol_message, published_result(entry.lsn), and
handle_prepare_ack() to implement this fix.

In `@crates/allocdb-node/src/replicated_simulation.rs`:
- Around line 1325-1329: The patch missed a regression test where a newly
elected primary must fetch a prepared suffix from another voter: add a test in
replicated_simulation.rs that simulates a 3-replica view where only replica 3
holds the prepared entry and replica 2 wins the new view, forcing the path that
calls highest_prepared_lsn() and then find_prepared_entry() to reconstruct the
missing entries; exercise the failover/recovery flow, assert that replica 2
successfully copies the prepared suffix (verify commit LSNs and that the
prepared entry appears on replica 2), and ensure the test covers the
negative/recovery case so this regression cannot recur.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8665918b-a8fd-42d1-80f6-de232bca5797

📥 Commits

Reviewing files that changed from the base of the PR and between bac464b and b97b5d4.

📒 Files selected for processing (5)

crates/allocdb-node/src/replica.rs
crates/allocdb-node/src/replicated_simulation.rs
crates/allocdb-node/src/replicated_simulation_tests.rs
docs/status.md
docs/testing.md

📜 Review details

🧰 Additional context used

🧬 Code graph analysis (2)

crates/allocdb-node/src/replicated_simulation_tests.rs (1)

crates/allocdb-node/src/replicated_simulation.rs (2)

replica (535-540)

configured_primary (578-580)

crates/allocdb-node/src/replicated_simulation.rs (2)

crates/allocdb-node/src/replica.rs (5)

engine (719-721)

new (210-220)

new (452-456)

new (539-543)

new (1376-1378)

crates/allocdb-node/src/simulation.rs (4)

engine (341-343)

new (86-88)

new (114-119)

new (166-185)

crates/allocdb-node/src/replicated_simulation_tests.rs

crates/allocdb-node/src/replicated_simulation.rs

skel84 · 2026-03-13T18:19:57Z

Addressed CodeRabbit feedback in commit d8348da.

Changes:

- kept retry-cache hits from overwriting canonical published results
- renamed the majority-appended crash regression to match the boundary the harness actually models and asserted the committed prefix via replica state
- added regressions for prepared-suffix recovery from another voter during takeover and for retry-cache hits preserving canonical published results
- updated replicated testing/status docs to match the executable coverage

Validation rerun:

cargo test -p allocdb-node replicated_simulation -- --nocapture
./scripts/preflight.sh

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

docs/testing.md (1)
261-263: ⚠️ Potential issue | 🟡 Minor

Don't describe an unmodeled pre-reply crash window.

The harness still publishes the client outcome as soon as quorum append completes in handle_prepare_ack(), so this scenario only models interrupted commit propagation. Please remove the “reply delivery” wording here to keep the docs aligned with the executable behavior.

As per coding guidelines, **/*.md: Keep documentation up to date with the code and design. If a change affects behavior, invariants, failure modes, operational semantics, testing strategy, or implementation sequencing, update the relevant docs in the same task or PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/testing.md` around lines 261 - 263, Update the test description to
remove the unmodeled "reply delivery" wording so it matches the harness
behavior: change the second bullet to state that the primary is crashed after
quorum append (and after reply) modeling only interrupted commit propagation,
and explicitly reference the existing behavior in handle_prepare_ack() which
publishes client outcome as soon as quorum append completes; remove any claim
that reply delivery can be interrupted because that pre-reply crash window is
not modeled.

🧹 Nitpick comments (1)

crates/allocdb-node/src/replicated_simulation.rs (1)
1206-1243: Log retry-cache hits and misses in this lookup path.

This is the new ambiguity-resolution surface, but hit/miss/conflict outcomes are silent today. A structured debug! here with primary, operation_id, request_slot, applied_lsn, and result_code would make failover regressions much easier to triage.

As per coding guidelines, **/*.rs: Add extensive logging where it materially improves debuggability or operational clarity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/allocdb-node/src/replicated_simulation.rs` around lines 1206 - 1243,
The lookup_retry_result path lacks observability for retry-cache behavior; add a
structured debug log in lookup_retry_result (after decoding request and after
retrieving the record) that logs primary, request.operation_id, request_slot,
and whether the record was found (hit/miss); when found include applied_lsn,
record.result_code and whether the outcome was a conflict (compare
record.command_fingerprint vs request.command.fingerprint()), and indicate
from_retry_cache=true in the log so operators can trace retry-cache hits/misses
and conflict outcomes for SubmissionResult.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/allocdb-node/src/replicated_simulation.rs`:
- Around line 1323-1327: The code is incorrectly using
node.highest_prepared_lsn() from every voter when computing target_commit_lsn,
which allows a prepare that exists only on the currently-configured primary to
be promoted into committed history; change the logic that sets target_commit_lsn
(the two max(...) lines) to ignore the still-configured primary (i.e., skip the
node whose metadata().is_configured_primary or matching view/primary identifier)
when calling highest_prepared_lsn(), and ensure
find_prepared_entry()/complete_view_change() logic only considers prepared LSNs
from non-primary voters; add a regression test that simulates a "primary-only
prepared entry" (force complete_view_change() while primary is still in config)
and asserts that such an LSN is not copied/committed on the new primary.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c6085148-432e-4d28-aace-28b83c28e380

📥 Commits

Reviewing files that changed from the base of the PR and between b97b5d4 and d8348da.

📒 Files selected for processing (4)

crates/allocdb-node/src/replicated_simulation.rs
crates/allocdb-node/src/replicated_simulation_tests.rs
docs/status.md
docs/testing.md

📜 Review details

🧰 Additional context used

🧬 Code graph analysis (1)

crates/allocdb-node/src/replicated_simulation.rs (2)

crates/allocdb-node/src/simulation.rs (1)

engine (341-343)

crates/allocdb-node/src/replica.rs (1)

engine (719-721)

crates/allocdb-node/src/replicated_simulation.rs

skel84 · 2026-03-13T18:35:47Z

Addressed the latest CodeRabbit finding in bbc3d22.

Changes:

- exclude the previous primary when view-change recovery derives a commit target from prepared suffixes
- add a regression that forces a view change while only the old primary holds the prepared entry, and assert the new primary does not promote it

Reran:

cargo test -p allocdb-node replicated_simulation -- --nocapture
./scripts/preflight.sh

test(replication): promote replicated simulation scenarios

b97b5d4

Add deterministic partition and primary-crash regression coverage to the real three-replica harness, plus the minimal retry-aware harness support needed to replay ambiguous client outcomes after failover. Closes #54

coderabbitai bot reviewed Mar 13, 2026

crates/allocdb-node/src/replicated_simulation_tests.rs (outdated, resolved)

crates/allocdb-node/src/replicated_simulation.rs (resolved)

fix(replication): address CodeRabbit review feedback

d8348da

coderabbitai bot reviewed Mar 13, 2026

crates/allocdb-node/src/replicated_simulation.rs (outdated, resolved)

skel84 added 2 commits March 13, 2026 19:31

Merge remote-tracking branch 'origin/main' into issue-54-promote-repl…

85c6b5d

…icated-tests # Conflicts: # docs/status.md

fix(replication): reject primary-only prepared promotion

bbc3d22

skel84 merged commit 481b868 into main Mar 13, 2026
2 checks passed

coderabbitai bot mentioned this pull request Mar 19, 2026

core: add lease epoch fencing for holder commands #90

Merged

skel84 merged 4 commits into main from issue-54-promote-replicated-tests
