Skip to content

chore(rivetkit): impl follow up review#4711

Draft
NathanFlurry wants to merge 42 commits into04-22-chore_rivetkit_core_napi_typescript_follow_up_reviewfrom
04-23-chore_rivetkit_impl_follow_up_review
Draft

chore(rivetkit): impl follow up review#4711
NathanFlurry wants to merge 42 commits into04-22-chore_rivetkit_core_napi_typescript_follow_up_reviewfrom
04-23-chore_rivetkit_impl_follow_up_review

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app
Copy link
Copy Markdown

railway-app Bot commented Apr 23, 2026

🚅 Deployed to the rivet-pr-4711 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-inspector ❌ Build Failed (View Logs) Web Apr 24, 2026 at 1:26 am
frontend-cloud ❌ Build Failed (View Logs) Web Apr 24, 2026 at 1:25 am
kitchen-sink ❌ Build Failed (View Logs) Web Apr 24, 2026 at 1:24 am
website 😴 Sleeping (View Logs) Web Apr 23, 2026 at 11:42 pm
mcp-hub ✅ Success (View Logs) Web Apr 23, 2026 at 11:27 pm
ladle ✅ Success (View Logs) Web Apr 23, 2026 at 11:26 pm

Copy link
Copy Markdown
Member Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 23, 2026

PR #4711 Review: chore(rivetkit): impl follow up review

Status: DRAFT | +5480 / -3130 across 88 files


Overview

This PR consolidates fixes for the driver-test DT-008 through DT-057 cascade. Main behavioral changes:

  1. Queue waiter registration race fix — register completion waiters before persisting the KV message, so a fast consumer cannot complete before the waiter exists
  2. WebSocket setup timeout — bound actor-connect setup at the registry boundary via tokio::time::timeout
  3. actionId null encoding — always emit actionId (as null for connection-level errors) in CBOR/JSON to match the protocol contract; fixes actionId: 0 being mistaken for a valid action
  4. Cancellation token migration — replaced BigInt-keyed SccHashMap polling registry with event-driven CancellationToken objects end-to-end, removing the setInterval(..., 50) poll
  5. Activity signal draining — process pending acknowledge_activity_dirty signals during shutdown grace periods to avoid premature sleep/stop transitions
  6. Sleep state — expose sleep_ready() getter and add idle_notify to wait_for_sleep_idle_window so it no longer busy-waits

Issues and Observations

Substantive

Queue cleanup on enqueue failure (queue.rs)

The waiter-before-persist fix is correct. One edge case: the cleanup path awaits remove_completion_waiter after a persist failure. If the actor is under shutdown pressure at that moment, the awaited removal could race with teardown. Persist failure is rare, but a comment noting this dependency would help.

Structured logging fields still use inline format strings (napi/queue.rs)

The new has_cancel_token={} logging still formats parameters inline in the message string. Per CLAUDE.md, structured fields belong outside the message: tracing::info!(actor_id=..., has_cancel_token=..., "msg"). This is a pre-existing pattern, but since the line is being touched it is an opportunity to fix it.

New .expect() in queue.rs

.expect("panic path should observe the dispatch token");

CLAUDE.md's fail-by-default guidance treats required lifecycle operations as paths that must return explicit structured errors instead of panicking. If this invariant can be violated outside of tests, convert to .ok_or_else(...) returning a RivetError.

wait_for_names_available passes None for cancel signal (napi/queue.rs)

// wait_for_names uses signal:
.wait_for_names(names, queue_wait_opts(options, signal)?)
// wait_for_names_available ignores signal:
.wait_for_names_available(names, queue_wait_opts(options, None)?)

If omitting the signal is intentional (availability checks have no cancel semantics), a one-line comment would make the asymmetry clear. If it is an oversight, wire the signal through.

anyhow! macro in shutdown path (task.rs)

Err(_) => Err(anyhow!("shutdown panicked during {reason:?}")),

CLAUDE.md prefers .context() over the anyhow! macro. Minor, but this line is in a touched area.

Removed debug_assert!(waiter.is_finished()) guards (queue.rs)

Two early-check guards were removed while the .await + assert! remains. The removed debug_assert! verified the waiter was already finished before the await, catching ordering bugs during test development. If removal is intentional, a short comment explaining why would help.


Minor / Style

  • cbor_write_null helper — A comment like // CBOR null (RFC 8949 section 3.3) would explain the magic byte 0xf6 to future readers.
  • on_connect_timeout config field — Make sure this is surfaced in public ActorConfig documentation/defaults so users can tune it for slower cold-start environments.
  • wait_for_sleep_idle_window import cleanup — Removing timeout_at is correct. Verify the test-only #[cfg(test)] path that previously imported it still compiles after the change.

What is Working Well

  • The cancellation token migration from BigInt polling to event-driven CancellationToken is a solid improvement and aligns with the Tokio-shaped API convention in CLAUDE.md.
  • The actionId: null fix correctly separates connection-level errors (null) from action-0 errors (0); this is an important protocol correctness fix.
  • The WebSocket setup timeout is placed at the right boundary (registry, not inside the actor task) so it fires even when connection_open never replies.
  • Test progress tracking in .agent/notes/ is thorough and the CHANGELOG clearly documents the breaking error-class removal.
  • All DT-008 through DT-057 driver tests now pass.

Summary

The core fixes are sound. Three items worth addressing before merge: (1) the wait_for_names_available cancel-signal omission — intentional or bug?, (2) the new .expect() on a lifecycle path that should return a structured error, and (3) inline logging format strings on touched lines. The queue waiter race fix and the actionId null-encoding fix are correct and well-tested.

Reviewed by claude[bot]

@github-actions
Copy link
Copy Markdown
Contributor

Preview packages published to npm

Install with:

npm install rivetkit@pr-4711

All packages published as 0.0.0-pr.4711.7747b75 with tag pr-4711.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-7747b75
docker pull rivetdev/engine:full-7747b75
Individual packages
npm install rivetkit@pr-4711
npm install @rivetkit/react@pr-4711
npm install @rivetkit/rivetkit-napi@pr-4711
npm install @rivetkit/workflow-engine@pr-4711

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant