fix: durable clock race + dir fsync, cluster backoff/restart bounds, transport+wasm hardening #149

Merged
joshua-temple merged 5 commits into main from fix/runtime-fixes
Jun 5, 2026

@joshua-temple
Collaborator

What this change does

Remediates the durable / cluster / transport / wasm review findings (correctness + perf + docs; no feature behavior change):

  • durable: the recording-clock buffer drain + index reads are now behind the clock mutex (closes the Tick-goroutine-vs-drain race, with a concurrent test); FileStore.writeAtomic fsyncs the parent directory after rename to honor the crash-durability claim; replayActorData surfaces a decode error instead of swallowing it; the inert WithRetainTail option is removed (superseded by WithHistory).
  • cluster: backoffDelay clamps before the time.Duration conversion (no overflow at maxDelay==0); Tick guards a nil respawner; added Supervisor.Forget(actorID) to bound the restarts map (additive, chosen over eviction so the restart-storm budget is preserved); migration error-path tests added; Capture documents its non-atomic boundary.
  • transport: dropped the redundant global init() codec registration (both sides force the codec); StateAt wraps the wire error; added a README and client error-path tests.
  • wasm (host safety): runtime built with WithCloseOnContextDone(true) so a runaway guest is cancelled (with a loopguest timeout test); len()-guarded the result-slice indexing that could panic the host; added a variadic CompileOption/WithRuntimeConfig (additive); README allocator note corrected.

Skipped with reasoning: durable Dispatched-set caching (risks the exactly-once invariant against out-of-tree Stores) and tagging cluster/integration_test.go (would drop an in-process test from the default run and break the coverage gate).

Checklist

  • Signed off (DCO), conventional commits; feature/ambiguous items skipped with reasons
  • All four modules build/test/vet (-race) + lint clean; e2e green

…actor replay error, drop inert WithRetainTail

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
…upervisor.Forget, cover migration error paths

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
…dd README and client error-path tests

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
…ompile options, honest allocator docs

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
@joshua-temple joshua-temple merged commit 7b347ba into main Jun 5, 2026
121 checks passed
@joshua-temple joshua-temple deleted the fix/runtime-fixes branch June 5, 2026 16:03