cold: unified handle architecture (supersedes #57) #58
Open
…body iterator deadline Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `metrics` module under `crates/cold/src/metrics.rs` with const metric names, help strings, a `LazyLock` describe block, and `pub(crate)` helper functions for recording:

- `cold.reads_in_flight`, `cold.writes_in_flight`, `cold.streams_active` (gauges)
- `cold.op_duration_us` (histogram, labeled by op)
- `cold.permit_wait_us` (histogram, labeled by sem: read/write/drain/stream)
- `cold.op_errors_total` (counter, labeled by op and error kind)
- `cold.stream_lifetime_ms` (histogram)

Wires the helpers into every `ColdStorage<B>` handle method: `spawn_read` and `spawn_write` time permit acquisition, bump in-flight, measure op duration, record errors, and dec in-flight after the backend call. Cache hits in `get_header`/`get_transaction`/`get_receipt` record op duration only (no permit wait, no in-flight). `stream_logs` instruments stream permit wait and records stream lifetime + gauge in the spawned producer.

Adds `ColdStorageError::kind()` for the error metric label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
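As an illustration of the in-flight accounting described in this commit, here is a minimal std-only sketch, not the PR's code. It assumes a hypothetical `InFlightGuard` type and a plain `AtomicI64` standing in for the `cold.reads_in_flight` gauge; `timed_read` is likewise an invented name, and the actual module records through its own `pub(crate)` helpers.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the `cold.reads_in_flight` gauge.
pub static READS_IN_FLIGHT: AtomicI64 = AtomicI64::new(0);

/// RAII guard (an assumption of this sketch): increments the gauge on
/// creation and decrements it on drop, so the gauge is restored even if
/// the wrapped operation returns early.
pub struct InFlightGuard(&'static AtomicI64);

impl InFlightGuard {
    pub fn new(gauge: &'static AtomicI64) -> Self {
        gauge.fetch_add(1, Ordering::Relaxed);
        Self(gauge)
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Runs `op` while it is counted as in flight and returns its result
/// together with the elapsed time, which a caller could feed into a
/// duration histogram.
pub fn timed_read<T>(op: impl FnOnce() -> T) -> (T, Duration) {
    let _guard = InFlightGuard::new(&READS_IN_FLIGHT);
    let start = Instant::now();
    let out = op();
    (out, start.elapsed())
}
```

A guard is only one way to pair the increment with the decrement; explicit calls before and after the backend call, as the commit message describes, give the same gauge values on the success path.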
[Claude Code] Code review

Went deep on the concurrency model. One real regression worth addressing, plus a resource-lifetime nit. Found 2 issues:

storage/crates/cold/src/handle.rs, lines 223 to 240 in cb06741
storage/crates/cold/src/handle.rs, lines 588 to 614 in cb06741
storage/crates/cold/src/handle.rs, lines 99 to 110 in cb06741

🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.
The coordinator task previously moved `Arc<Inner<B>>` into its body and awaited the user's cancel token. If callers dropped all `ColdStorage` clones without firing cancel, `Inner` (and the backend's file/DB handles) stayed pinned until process exit.

Switch the coordinator to `Weak<Inner>`, and put a `DropGuard` on `Inner` that fires a child cancel token. shutdown now fires on either user-side cancel OR `Inner` drop; in the drop case `upgrade()` returns `None` and the coordinator exits without pinning anything.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
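The lifetime fix in this commit can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the code in `handle.rs`: a thread stands in for the coordinator task, and an `mpsc` sender stored on a hypothetical `Inner` stands in for the `DropGuard` and child cancel token, so dropping `Inner` disconnects the channel and wakes the coordinator. Only the drop path is modeled; the user-side cancel path is left out.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Weak};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for `Inner<B>`.
pub struct Inner {
    pub name: String,
    // Plays the role of the drop guard: when `Inner` is dropped, this
    // sender is dropped too, which disconnects the channel.
    _shutdown: Sender<()>,
}

/// Spawns a coordinator that holds only a `Weak<Inner>`, so it does not
/// keep `Inner` alive. The join handle yields `true` if `Inner` was still
/// alive when the coordinator woke up, `false` otherwise.
pub fn spawn_coordinator() -> (Arc<Inner>, JoinHandle<bool>) {
    let (tx, rx) = channel::<()>();
    let inner = Arc::new(Inner {
        name: "cold".to_string(),
        _shutdown: tx,
    });
    let weak: Weak<Inner> = Arc::downgrade(&inner);
    let handle = thread::spawn(move || {
        // Blocks until a message arrives or every sender has been dropped.
        let _ = rx.recv();
        // In the drop case the strong count is already zero, so the
        // upgrade fails and the coordinator exits without pinning anything.
        weak.upgrade().is_some()
    });
    (inner, handle)
}
```

Had the coordinator captured the `Arc<Inner>` itself, the strong count would never reach zero while the thread was blocked, which is the pinning behavior the commit message describes.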
`ColdStorageError::backend` unconditionally wraps as `Backend(Box<_>)`, which hid `MdbxColdError::TooManyLogs` behind the generic backend variant and broke the conformance suite's `max_logs` assertion.

The `From<MdbxColdError> for ColdStorageError` impl already translates `TooManyLogs` correctly and wraps the rest. Route all `spawn_blocking` result conversions through `::from` so the translation runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
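The difference between the two conversion paths can be shown with a small sketch. The enum shapes below are assumptions made for illustration (the real `MdbxColdError` and `ColdStorageError` have more variants, and the `limit` field and `Io` variant are invented here); only the names `TooManyLogs`, `Backend`, and `backend` come from the commit message.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical, trimmed-down backend error.
#[derive(Debug)]
pub enum MdbxColdError {
    TooManyLogs { limit: usize },
    Io(String),
}

impl fmt::Display for MdbxColdError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::TooManyLogs { limit } => write!(f, "too many logs (limit {limit})"),
            Self::Io(msg) => write!(f, "io error: {msg}"),
        }
    }
}

impl Error for MdbxColdError {}

/// Hypothetical, trimmed-down handle-level error.
#[derive(Debug)]
pub enum ColdStorageError {
    TooManyLogs { limit: usize },
    Backend(Box<dyn Error + Send + Sync>),
}

impl ColdStorageError {
    /// Unconditional wrap: any structured variant ends up inside `Backend`.
    pub fn backend<E: Error + Send + Sync + 'static>(e: E) -> Self {
        Self::Backend(Box::new(e))
    }
}

impl From<MdbxColdError> for ColdStorageError {
    /// Translates `TooManyLogs` to the matching variant and wraps the rest.
    fn from(e: MdbxColdError) -> Self {
        match e {
            MdbxColdError::TooManyLogs { limit } => Self::TooManyLogs { limit },
            other => Self::backend(other),
        }
    }
}
```

With these definitions, `ColdStorageError::backend(MdbxColdError::TooManyLogs { limit: 10 })` yields `Backend(_)`, while `ColdStorageError::from(...)` on the same value yields `TooManyLogs { limit: 10 }`, which is the variant a `max_logs` assertion would match on.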
Summary
Collapse cold storage from channels+dispatcher+writer-task into a single `ColdStorage<B: ColdStorageBackend>` handle over `Arc<Inner<B>>`. All concurrency primitives (`read_sem(64)`, `write_sem(1)`, `stream_sem(8)`, `TaskTracker`, `CancellationToken`) live on `Inner`; all backend work spawns into the shared tracker; spawned tasks hold permits for the real duration of the backend call.

Supersedes #57. Tracks Linear ENG-2198.
Fixes three PR #57 review issues:

- `StreamLogs` setup held the outer read permit across unbounded awaits: streams now acquire only `stream_sem`; setup reads are isolated from the read pool by design (a stream requesting "latest" should observe latest at setup).
- `timeout` did not cancel backend work so permits released early: timeouts are now mandatory in the `ColdStorageBackend` trait and enforced at the backend layer, so permits honestly reflect in-flight work.

Shape:
- `ColdStorage<B>` is clone-cheap (one Arc refcount bump); replaces `ColdStorageHandle` + `ColdStorageReadHandle` + `ColdStorageTask`.
- `ColdStorageWrite` now takes `&self` across all backends (Mem / MDBX / SQL updated in lockstep).
- … `write_sem` then the drain barrier (`read_sem.acquire_many_owned(64)`); destructive writes invalidate the cache inside the spawned body while holding the drain.
- … `stream_sem`; producer runs in the shared tracker.
- … `TaskTerminated`.

Backend timeouts (mandatory in trait contract):
- `tokio::task::spawn_blocking` for reads, `tokio::task::block_in_place` for writes, in-body deadline check between per-block iterations. Post-commit WARN on overrun (advisory; MDBX commits are uninterruptible).
- `SET LOCAL statement_timeout = <ms>` at the start of every Postgres transaction; SQLite skips (no equivalent).
- `with_read_timeout` / `with_write_timeout` builders on backend and connector. Defaults 500ms / 2s.

Observability:
- `crates/cold/src/metrics.rs` following the ajj pattern.
- Gauges: `cold.reads_in_flight`, `cold.writes_in_flight`, `cold.streams_active`.
- Histograms: `cold.op_duration_us`, `cold.permit_wait_us`, `cold.stream_lifetime_ms`.
- Counter: `cold.op_errors_total`.
- `#[tracing::instrument]` on every handle method with stable `op` field; spans propagate into spawned tasks via `.in_current_span()`.

Error variants removed:
`Timeout`, `Backpressure`, `Cancelled`. Backend timeouts now map to `Backend`. `TaskTerminated` remains and is returned from every handle method when the semaphore is closed.

Rollout: 10 commits, one per phase, each building + testing clean in isolation:
Spec: `docs/superpowers/specs/2026-04-20-cold-unified-architecture-design.md` (local — `docs/` is gitignored).
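To make the drain barrier from the Shape list concrete, here is a small synchronous sketch. It is a model under assumptions, not the PR's implementation: `Permits`, `Handle`, and `try_destructive_write` are hypothetical names, the permit counter is a non-blocking stand-in for an async semaphore, and a `Vec<u64>` plays the cache. The real handle awaits the permits instead of failing fast.

```rust
use std::sync::Mutex;

/// Size of the read pool, matching `read_sem(64)` in the summary.
pub const READ_PERMITS: usize = 64;

/// Minimal non-blocking permit counter (stand-in for an async semaphore).
pub struct Permits {
    available: Mutex<usize>,
}

impl Permits {
    pub fn new(n: usize) -> Self {
        Self { available: Mutex::new(n) }
    }

    /// Takes `n` permits if all of them are free right now; otherwise takes none.
    pub fn try_acquire_many(&self, n: usize) -> bool {
        let mut free = self.available.lock().unwrap();
        if *free >= n {
            *free -= n;
            true
        } else {
            false
        }
    }

    /// Returns `n` permits to the pool.
    pub fn release(&self, n: usize) {
        *self.available.lock().unwrap() += n;
    }
}

/// Hypothetical handle state: the two pools plus a toy cache.
pub struct Handle {
    pub read_sem: Permits,
    pub write_sem: Permits,
    pub cache: Mutex<Vec<u64>>,
}

impl Handle {
    pub fn new() -> Self {
        Self {
            read_sem: Permits::new(READ_PERMITS),
            write_sem: Permits::new(1),
            cache: Mutex::new(Vec::new()),
        }
    }

    /// Destructive write: take the single write permit, then drain every
    /// read permit, and invalidate the cache only while the drain is held.
    pub fn try_destructive_write(&self) -> bool {
        if !self.write_sem.try_acquire_many(1) {
            return false;
        }
        if !self.read_sem.try_acquire_many(READ_PERMITS) {
            // A reader is still in flight; give the write permit back.
            self.write_sem.release(1);
            return false;
        }
        self.cache.lock().unwrap().clear();
        self.read_sem.release(READ_PERMITS);
        self.write_sem.release(1);
        true
    }
}
```

Because the write takes all 64 read permits, it can only proceed once no read holds a permit, and no new read can start until the permits are released. That is why invalidating the cache inside the drained section cannot race with an in-flight read in this model.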
Test plan
🤖 Generated with Claude Code