M7-T01 Add replicated node state and durable protocol metadata#60
M7-T01 Add replicated node state and durable protocol metadata#60
Conversation
Summary by CodeRabbit
WalkthroughAdds a new public Changes
Sequence DiagramssequenceDiagram
participant User as User
participant ReplicaNode as ReplicaNode
participant ReplicaMetadataFile as ReplicaMetadataFile
participant SingleNodeEngine as SingleNodeEngine
participant Filesystem as Filesystem
User->>ReplicaNode: open(core_config, engine_config, identity, paths)
ReplicaNode->>ReplicaMetadataFile: load_metadata()
ReplicaMetadataFile->>Filesystem: read metadata file
alt metadata missing
ReplicaNode->>SingleNodeEngine: open(engine_config)
ReplicaNode->>ReplicaMetadataFile: write_metadata(metadata.bootstrap(...))
ReplicaMetadataFile->>Filesystem: write temp file
ReplicaMetadataFile->>Filesystem: fsync temp file
ReplicaMetadataFile->>Filesystem: rename temp -> metadata
ReplicaMetadataFile->>Filesystem: sync parent dir
else metadata exists
ReplicaNode->>ReplicaNode: validate(metadata vs engine state)
alt validation success
ReplicaNode->>SingleNodeEngine: open(engine_config)
else validation failure
ReplicaNode-->>User: Err(Faulted)
end
end
ReplicaNode-->>User: Result<ReplicaNode>
sequenceDiagram
participant User as User
participant ReplicaNode as ReplicaNode
participant ReplicaMetadataFile as ReplicaMetadataFile
participant SingleNodeEngine as SingleNodeEngine
participant Filesystem as Filesystem
User->>ReplicaNode: recover(core_config, engine_config, identity, paths)
ReplicaNode->>ReplicaMetadataFile: load_metadata()
ReplicaMetadataFile->>Filesystem: read metadata file
alt metadata indicates fault
ReplicaNode-->>User: Err(Faulted)
else metadata present
ReplicaNode->>SingleNodeEngine: recover(engine_config)
ReplicaNode->>ReplicaNode: validate(recovered engine state)
alt validation success
ReplicaNode->>ReplicaMetadataFile: persist_metadata()
else validation failure
ReplicaNode-->>User: Err(Faulted)
end
end
ReplicaNode-->>User: Result<ReplicaNode>
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
|
@coderabbitai summary |
✅ Actions performedSummary regeneration triggered. |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/allocdb-node/src/replica_tests.rs`:
- Around line 105-325: The test suite is missing coverage for several failure
branches in ReplicaMetadata::validate(); add focused unit/integration tests that
assert ReplicaNode::open or ReplicaNode::recover faults for each invariant:
shard mismatch (identity.shard_id mismatch vs identity()),
snapshot-without-commit (active_snapshot_lsn set but commit_lsn None),
snapshot-beyond-commit (active_snapshot_lsn > commit_lsn), last_normal_view
ordering (last_normal_view > current_view or inconsistent), durable_vote
ordering (durable_vote.view < current_view or inconsistent voted_for),
active-snapshot mismatch (active_snapshot_lsn not present in engine snapshot
state), and metadata rewrite/overwrite behavior (writing new metadata that
violates invariants or concurrent overwrite). Use
ReplicaMetadataFile::new(...).write_metadata(...) to seed on-disk metadata, then
call ReplicaNode::open or ::recover and assert ReplicaNodeStatus::Faulted with
the specific ReplicaStartupValidationError variants produced by
ReplicaMetadata::validate() to cover each branch.
In `@crates/allocdb-node/src/replica.rs`:
- Around line 551-553: The method engine_mut() exposes mutable access to
SingleNodeEngine and must not be public because callers can advance allocator
state without updating ReplicaMetadata; make engine_mut non-public (remove the
pub visibility) or restrict it to crate/internal use (e.g., change to pub(crate)
or rename to engine_mut_internal) and keep direct mutable access hidden until
you implement coordinated mutation APIs that update ReplicaMetadata
(commit_lsn/active_snapshot_lsn) alongside engine changes; update any call sites
or tests that relied on the public API to use the new internal name or the
future coordinated methods.
- Around line 358-379: The current write_metadata() (and its
temp_path()/sync_parent_dir() usage) advertises Unix-level durable replace but
on non-Unix builds the directory-sync branch is a no-op and std::fs::rename does
not provide the same replace-safety; to fix this either implement a
platform-specific safe-replace for non-Unix targets or make the stronger
guarantee explicit by gating the implementation to Unix-only: add #[cfg(unix)]
to the write_metadata()/temp_path()/sync_parent_dir() implementation and add a
#[cfg(not(unix))] compile_error or documented stub that clearly states non-Unix
targets are unsupported for durable replace, referencing the write_metadata,
temp_path, and sync_parent_dir symbols so reviewers can locate the
platform-specific change.
- Around line 346-355: The load_metadata function currently uses
file.read_to_end which can OOM on a huge/corrupt sidecar; instead bound the read
before decoding by checking the file size and/or using
File::take(MAX_METADATA_BYTES) (or reading into a fixed-size buffer) and
erroring if the sidecar exceeds a small MAX_METADATA_BYTES constant; update the
logic around File::open + read_to_end/bytes to use file.metadata().len() (or
file.take(...).read_to_end) and return an appropriate ReplicaMetadataFileError
when the file is too large, then pass the safely-sized bytes into
decode_replica_metadata to decode the ReplicaMetadata.
In `@docs/replication.md`:
- Around line 152-155: Update the docs text to state that startup faults not
only on decode/metadata inconsistency but also on unreadable metadata due to I/O
errors (e.g., permission denied) as implemented in
crates/allocdb-node/src/replica.rs; mention that the replica startup validation
in the replica initialization path (see replica.rs startup/validation logic that
treats metadata read errors other than NotFound as faulted) will leave the
replica in a `faulted` state until repaired, and ensure the paragraph explicitly
lists "unreadable or permission-denied sidecars" alongside decode failures and
persisted `faulted` roles.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4a8b9cb2-64e8-4434-9105-b16dfc6acfe9
📒 Files selected for processing (6)
crates/allocdb-node/src/engine_observe.rscrates/allocdb-node/src/lib.rscrates/allocdb-node/src/replica.rscrates/allocdb-node/src/replica_tests.rsdocs/replication.mddocs/status.md
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Keep documentation up to date with the code and design. If a change affects behavior, invariants, failure modes, operational semantics, testing strategy, or implementation sequencing, update the relevant docs in the same task or PR.
Files:
docs/replication.mddocs/status.md
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level:errorfor invariant breaks, corruption, and failed operations that require intervention;warnfor degraded but expected conditions such as overload, lag, or rejected requests;infofor meaningful lifecycle and state-transition events;debugfor detailed execution traces useful in development;traceonly for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.
Files:
crates/allocdb-node/src/lib.rscrates/allocdb-node/src/engine_observe.rscrates/allocdb-node/src/replica_tests.rscrates/allocdb-node/src/replica.rs
docs/status.md
📄 CodeRabbit inference engine (AGENTS.md)
Keep
docs/status.mdcurrent as the single-file progress snapshot for the repository. Update it whenever milestone state, implementation coverage, or the recommended next step materially changes.
Files:
docs/status.md
🧠 Learnings (2)
📚 Learning: 2026-03-12T15:18:53.046Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.046Z
Learning: Applies to docs/status.md : Keep [`docs/status.md`](./docs/status.md) current as the single-file progress snapshot for the repository. Update it whenever milestone state, implementation coverage, or the recommended next step materially changes.
Applied to files:
docs/status.md
📚 Learning: 2026-03-12T15:18:53.046Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.046Z
Learning: Applies to **/*.rs : Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Applied to files:
crates/allocdb-node/src/replica_tests.rs
🧬 Code graph analysis (1)
crates/allocdb-node/src/replica_tests.rs (3)
crates/allocdb-node/src/replica.rs (11)
path(337-339)engine(547-549)metadata_path(557-559)paths(562-564)new(183-193)new(330-334)new(758-760)open(412-461)status(542-544)metadata(537-539)bootstrap(67-81)crates/allocdb-node/src/engine_observe.rs (1)
active_snapshot_lsn(43-45)crates/allocdb-core/src/state_machine_metrics.rs (1)
last_applied_lsn(21-23)
Fixes Applied SuccessfullyApplied the substantive CodeRabbit fixes in commit Files modified:
Applied
Not applied
Validation rerun:
The latest autofix changes are on the |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/allocdb-node/src/replica_tests.rs`:
- Around line 128-605: The tests call ReplicaMetadataFile::write_metadata (which
returns an Unsupported error on non-Unix) and unwrap its result, causing panics
off Unix; fix by gating the whole test module to Unix with #[cfg(unix)] (and
#[cfg(test)] as appropriate) or, if you need cross-platform tests, update each
test that calls write_metadata / ReplicaMetadataFile to handle the Unsupported
error path (assert that write_metadata returns
Err(ReplicaMetadataFileError::Unsupported) on non-unix) and skip the unix-only
assertions when not on unix; locate symbols ReplicaMetadataFile::write_metadata,
replica_* tests (e.g., replica_open_bootstraps_missing_metadata,
replica_metadata_file_round_trips) and apply the cfg or conditional assertions
accordingly.
In `@crates/allocdb-node/src/replica.rs`:
- Around line 154-169: The current validation only errors when
local_last_applied_lsn > commit_lsn, which allows a stale local state when
commit_lsn is ahead; change the check to require the local applied LSN to
exactly match the committed LSN: compare self.commit_lsn (commit_lsn) to
local_last_applied_lsn (last_applied_lsn) and if they differ return a validation
error (create or reuse an appropriate ReplicaStartupValidationError variant,
e.g. CommitLsnMismatch or extend AppliedLsnExceedsCommitLsn semantics) that
includes both self.commit_lsn and local_last_applied_lsn so startup is rejected
when commit_lsn != last_applied_lsn, handling Option cases (None vs Some)
consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c0218093-9dcb-4d43-8fd9-47bee161e9ed
📒 Files selected for processing (3)
crates/allocdb-node/src/replica.rscrates/allocdb-node/src/replica_tests.rsdocs/replication.md
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level:errorfor invariant breaks, corruption, and failed operations that require intervention;warnfor degraded but expected conditions such as overload, lag, or rejected requests;infofor meaningful lifecycle and state-transition events;debugfor detailed execution traces useful in development;traceonly for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.
Files:
crates/allocdb-node/src/replica_tests.rscrates/allocdb-node/src/replica.rs
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Keep documentation up to date with the code and design. If a change affects behavior, invariants, failure modes, operational semantics, testing strategy, or implementation sequencing, update the relevant docs in the same task or PR.
Files:
docs/replication.md
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.046Z
Learning: Treat CodeRabbit as part of the required review path when it is enabled on the repository. Wait for its status to complete before merge. If it completes without a visible review comment or review thread, request visible output with `coderabbitai summary`. Address every substantive CodeRabbit comment explicitly before merge by either applying the change or documenting why it is not being applied. Apply correctness, safety, recovery, test, and docs-alignment feedback by default; document why you reject suggestions that would weaken determinism, boundedness, or trusted-core discipline.
📚 Learning: 2026-03-12T15:18:53.046Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.046Z
Learning: Applies to **/*.rs : Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Applied to files:
crates/allocdb-node/src/replica_tests.rs
🧬 Code graph analysis (2)
crates/allocdb-node/src/replica_tests.rs (3)
crates/allocdb-node/src/replica.rs (13)
path(340-342)engine(568-570)metadata_path(573-575)paths(578-580)status(563-565)new(184-194)new(333-337)new(781-783)open(433-482)metadata(558-560)bootstrap(68-82)load_metadata(350-376)recover(493-555)crates/allocdb-node/src/engine_observe.rs (1)
active_snapshot_lsn(43-45)crates/allocdb-core/src/state_machine_metrics.rs (1)
last_applied_lsn(21-23)
crates/allocdb-node/src/replica.rs (3)
crates/allocdb-node/src/replica_tests.rs (4)
identity(58-63)metadata_path(25-27)snapshot_path(29-31)wal_path(33-35)crates/allocdb-node/src/engine_observe.rs (1)
active_snapshot_lsn(43-45)crates/allocdb-core/src/state_machine_metrics.rs (1)
last_applied_lsn(21-23)
🔇 Additional comments (1)
docs/replication.md (1)
140-155: Good docs/code alignment on startup faults.This now mirrors the sidecar fields and fail-closed startup cases implemented in
crates/allocdb-node/src/replica.rs, including unreadable sidecars and persistedfaultedstate.
Fixes Applied SuccessfullyApplied the remaining substantive CodeRabbit fixes in commit Files modified:
Applied
Validation rerun:
The latest autofix changes are on the |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/allocdb-node/src/non_unix_tests.rs (1)
42-43: Minor: Temp file cleanup path construction may be incorrect.The
with_extension("replica.tmp")call on line 43 replaces the entire extension, producing a path likefoo.replica.tmp. However, thetemp_pathmethod inReplicaMetadataFile(lines 426-434 in replica.rs) formats the extension as{ext}.tmp, which for a.replicaextension would produce.replica.tmp. In this case the behavior happens to match, but the cleanup logic relies on implementation details.Since write actually fails before creating the temp file on non-Unix (the error occurs in
replace_file_durably), the temp file may not exist anyway—but the parent directory and partial temp file fromFile::createat line 417 could exist.💡 Suggestion: Consider using the same temp_path logic
- let _ = fs::remove_file(file.path().with_extension("replica.tmp")); + // Temp file uses the pattern: {original}.tmp + let mut temp_path = path.clone(); + temp_path.set_extension("replica.tmp"); + let _ = fs::remove_file(&temp_path);Or expose a method to get the temp path for testing purposes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/allocdb-node/src/non_unix_tests.rs` around lines 42 - 43, The cleanup currently uses file.path().with_extension("replica.tmp") which relies on implementation detail; instead construct the temp path using the same logic as ReplicaMetadataFile::temp_path (i.e., take the original file path, read its extension and set the extension to "{ext}.tmp") or call/expose ReplicaMetadataFile::temp_path from the test, then remove that exact temp path (and ensure the parent directory is removed if needed) so the test mirrors the real temp-file naming used by replace_file_durably/File::create.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/allocdb-node/src/non_unix_tests.rs`:
- Around line 42-43: The cleanup currently uses
file.path().with_extension("replica.tmp") which relies on implementation detail;
instead construct the temp path using the same logic as
ReplicaMetadataFile::temp_path (i.e., take the original file path, read its
extension and set the extension to "{ext}.tmp") or call/expose
ReplicaMetadataFile::temp_path from the test, then remove that exact temp path
(and ensure the parent directory is removed if needed) so the test mirrors the
real temp-file naming used by replace_file_durably/File::create.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ac36fa1b-5120-4af5-bf2d-ae155fdfc56a
📒 Files selected for processing (4)
crates/allocdb-node/src/non_unix_tests.rscrates/allocdb-node/src/replica.rscrates/allocdb-node/src/replica_tests.rsdocs/replication.md
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Add extensive logging where it materially improves debuggability or operational clarity. Use the right log level:errorfor invariant breaks, corruption, and failed operations that require intervention;warnfor degraded but expected conditions such as overload, lag, or rejected requests;infofor meaningful lifecycle and state-transition events;debugfor detailed execution traces useful in development;traceonly for very high-volume diagnostic detail.
Logging must be structured and purposeful. Do not add noisy logs that obscure signal or hide bugs.
Files:
crates/allocdb-node/src/non_unix_tests.rscrates/allocdb-node/src/replica_tests.rscrates/allocdb-node/src/replica.rs
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Keep documentation up to date with the code and design. If a change affects behavior, invariants, failure modes, operational semantics, testing strategy, or implementation sequencing, update the relevant docs in the same task or PR.
Files:
docs/replication.md
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.046Z
Learning: Treat CodeRabbit as part of the required review path when it is enabled on the repository. Wait for its status to complete before merge. If it completes without a visible review comment or review thread, request visible output with `coderabbitai summary`. Address every substantive CodeRabbit comment explicitly before merge by either applying the change or documenting why it is not being applied. Apply correctness, safety, recovery, test, and docs-alignment feedback by default; document why you reject suggestions that would weaken determinism, boundedness, or trusted-core discipline.
📚 Learning: 2026-03-12T15:18:53.046Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.046Z
Learning: Applies to **/*.md : Keep documentation up to date with the code and design. If a change affects behavior, invariants, failure modes, operational semantics, testing strategy, or implementation sequencing, update the relevant docs in the same task or PR.
Applied to files:
docs/replication.md
📚 Learning: 2026-03-12T15:18:53.046Z
Learnt from: CR
Repo: skel84/allocdb PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-12T15:18:53.046Z
Learning: Applies to **/*.rs : Write extensive tests for every meaningful behavior change. Favor invariant tests, negative-path tests, recovery tests, and regression tests over shallow happy-path coverage.
Applied to files:
crates/allocdb-node/src/replica_tests.rs
🧬 Code graph analysis (1)
crates/allocdb-node/src/replica_tests.rs (1)
crates/allocdb-node/src/replica.rs (13)
path(363-365)engine(591-593)metadata_path(596-598)paths(601-603)status(586-588)new(203-213)new(356-360)new(804-806)open(456-505)metadata(581-583)bootstrap(72-86)load_metadata(373-399)recover(516-578)
🔇 Additional comments (23)
docs/replication.md (1)
140-156: LGTM!The M7-T01 documentation accurately describes the persisted metadata fields and startup validation behavior. The fault conditions (decode failure, metadata inconsistency, unreadable/permission-denied sidecars, applied state mismatch, persisted faulted role) correctly reflect the implementation in
replica.rs. The documentation is well-aligned with the code.crates/allocdb-node/src/replica.rs (11)
1-22: LGTM!Imports are minimal and appropriate. The metadata size bound (256 bytes) is reasonable for the fixed-size format, and the magic/version scheme supports forward compatibility. Test module gating (
#[cfg(all(test, unix))]and#[cfg(all(test, not(unix)))]) correctly separates platform-specific test behavior.
23-68: LGTM!Type definitions are clean and well-structured. Derives are appropriate (
Clone, Copy, Debug, Eq, PartialEq), and the public fields onReplicaMetadataallow straightforward construction and inspection. The types align with the documented persisted fields indocs/replication.md.
70-192: LGTM!The
bootstrapconstructor correctly initializes a replica inRecoveringstate with view 0 and no vote. Thevalidatemethod is comprehensive—it checks identity match, snapshot-versus-commit consistency, view/vote ordering invariants, active snapshot anchor match, and the full spectrum of applied-LSN-versus-commit-LSN states (none/none, behind, equal, ahead). The match arms at lines 165-188 correctly handle all combinations.
194-347: LGTM!Error types are well-designed with appropriate granularity.
ReplicaMetadataLoadErrorwisely storesstd::io::ErrorKindrather thanstd::io::Errorto supportCopy. TheFromimplementations enable ergonomic error propagation. Validation error variants capture all relevant context fields for debugging.
349-435: LGTM!The bounded read in
load_metadatacorrectly guards against oversized files with both the pre-read metadata length check and post-read length verification. Thewrite_metadataflow (create parent → temp write →sync_data→ durable rename) follows best practices for atomic durable writes. The temp-path naming scheme is deterministic, which is acceptable since aReplicaMetadataFileis typically owned by a singleReplicaNodeinstance.
437-578: LGTM!The
ReplicaNode::openandReplicaNode::recoverconstructors correctly implement the documented startup paths. The three-way handling viaLoadedMetadata(Missing → bootstrap, Faulted → skip engine, Ready → validate) is clean. Logging levels are appropriate:infofor successful bootstrap,warnfor pre-engine faults,errorfor post-validation faults. Both paths correctly persist bootstrapped metadata before returning.
580-643: LGTM!Accessors are appropriately read-only. The removal of
engine_mut(per past review feedback) prevents callers from advancing allocator state without updating replica metadata. Thepersist_metadatamethod enables explicit metadata sync for future protocol-level operations.
645-710: LGTM!The
load_or_fault_existing_metadatahelper correctly handles all load outcomes: missing file, persisted faulted role, successful load, and load errors. On load failure, it appropriately constructs synthetic faulted metadata using the expected identity (not trusting corrupted file contents). The error logging includes sufficient context for debugging.
712-796: LGTM!The encoding functions are straightforward and consistent. All integers use little-endian encoding. Optional fields use a tag byte (0 for None, 1 for Some) followed by the payload. The maximum encoded size (~74 bytes) is well under the 256-byte bound.
727-859: LGTM!The
MetadataCursorabstraction provides clean bounds-checked reading. Decoding validates magic, version, and option tags, and thefinish()call ensures no trailing bytes. Error variants (BufferTooShort,InvalidMagic,UnsupportedVersion,InvalidRole,InvalidOptionTag,InvalidLayout) cover all malformed input cases.
861-882: LGTM!The Unix durability path correctly performs
renamefollowed by parent directorysync_allto ensure the rename is durable. The non-Unix stub returnsUnsupportedas documented. Thesync_parent_dirgracefully handles edge cases whereparent()returnsNone.crates/allocdb-node/src/non_unix_tests.rs (1)
19-41: LGTM!The test correctly verifies that
write_metadatareturnsReplicaMetadataFileError::IowithErrorKind::Unsupportedon non-Unix platforms, matching the documented behavior and thereplace_file_durablyimplementation.crates/allocdb-node/src/replica_tests.rs (10)
1-127: LGTM!Test helpers are well-designed and reduce boilerplate. The
test_pathfunction ensures unique paths via nanosecond timestamps. Config values are minimal but sufficient for the test scenarios. Theassert_faultedhelper cleanly verifies both status and engine absence.
128-151: LGTM!The round-trip test exercises all metadata fields including optional ones with
Somevalues, verifying the complete encode/decode path.
153-178: LGTM!Tests the missing-metadata bootstrap path: fresh engine, bootstrapped metadata (view 0, Recovering role, no LSNs), and metadata persistence. Verification that the metadata file was created confirms durability.
180-204: LGTM!Tests recover-path bootstrap: prepares local state via engine + checkpoint, then verifies that
ReplicaNode::recoverbootstraps metadata from the recovered engine state (commit_lsn=2, active_snapshot_lsn=2).
206-298: LGTM!Fault tests cover corrupt metadata (decode failure), replica ID mismatch, and shard ID mismatch. Each correctly constructs the fault condition and verifies the expected
ReplicaFaultReasonvariant.
300-352: LGTM!Tests cover the snapshot-versus-commit invariants: snapshot without commit and snapshot exceeding commit. Both correctly trigger the expected validation errors.
354-447: LGTM!Tests verify view ordering invariants:
last_normal_view ≤ current_view,vote.view ≥ current_view, andvote.view ≥ last_normal_view. Each constructs the violation and asserts the correct error variant.
449-475: LGTM!Tests the active snapshot anchor validation: metadata claims a snapshot at LSN 3, but the fresh engine has no snapshot. The test correctly expects
ActiveSnapshotMismatch.
477-580: LGTM!Tests comprehensively cover the applied-versus-commit LSN validation: exceeds (lines 478-519), behind on fresh open (lines 521-546), and behind on recover (lines 548-580). Both directions and both open/recover paths are exercised.
582-666: LGTM!Tests cover: oversized sidecar rejection (verifies the 256-byte bound), overwrite semantics (ensures writes replace rather than append), and persisted faulted role (verifies that a
Faultedrole in metadata triggersPersistedFaultStatewithout further validation). Good negative-path coverage.
Summary
allocdb-node::replicawrapper with durable replica metadata, explicit faulted startup state, and bootstrap paths for fresh-open and recover flowsLinked Issue
Changes
ReplicaNode,ReplicaMetadata,ReplicaMetadataFile, and startup fault-reason types incrates/allocdb-nodeSingleNodeEngine::active_snapshot_lsn()for wrapper validation and export the new replica surface from the crate rootdocs/replication.mdanddocs/status.mdfor the persisted metadata file and faulted-startup behaviorValidation
scripts/preflight.shcargo test -p allocdb-nodecargo clippy -p allocdb-node --all-targets --all-features -- -D warningsDocs
CodeRabbit Triage
@coderabbitai summaryif no visible review comment or thread appeared