
fix(api): make actor destroy wait for ack#4327

Open
MasterPtato wants to merge 1 commit into 02-27-chore_flatten_runtime_config from 02-27-fix_api_make_actor_destroy_wait_for_ack
Conversation

@MasterPtato
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app

railway-app bot commented Feb 27, 2026

🚅 Deployed to the rivet-pr-4327 environment in rivet-frontend

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| frontend-cloud | 😴 Sleeping (View Logs) | Web | Feb 28, 2026 at 8:35 pm |
| website | 😴 Sleeping (View Logs) | Web | Feb 28, 2026 at 8:24 pm |
| frontend-inspector | 😴 Sleeping (View Logs) | Web | Feb 27, 2026 at 11:58 pm |
| mcp-hub | ✅ Success (View Logs) | Web | Feb 27, 2026 at 11:49 pm |
| ladle | ❌ Build Failed (View Logs) | Web | Feb 27, 2026 at 11:49 pm |

Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Feb 27, 2026

Review: fix(api): make actor destroy wait for ack

The approach here is correct and the subscribe-before-fetch ordering is the right pattern to prevent race conditions. A few things worth discussing:

No timeout on destroy_sub.next().await?

The most significant concern is that destroy_sub.next().await? will block indefinitely if the destroy workflow hangs or stalls. Looking at analogous patterns in the codebase (e.g., pegboard_gateway.rs:283), timeouts are used for similar wait scenarios:

_ = tokio::time::sleep(ACTOR_READY_TIMEOUT) => {
    return Err(errors::ActorReadyTimeout { actor_id }.build());
}

The destroy workflow itself (destroy.rs) runs activities that retry on failure, so DestroyComplete should eventually arrive. But if something goes wrong in the workflow engine, or the activity retries back off indefinitely, the HTTP request will hang with no escape hatch. Consider wrapping the wait in a timeout:

tokio::time::timeout(ACTOR_DESTROY_TIMEOUT, destroy_sub.next())
    .await
    .map_err(|_| errors::ActorDestroyTimeout { actor_id }.build())??;
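
To make the intended behavior concrete without pulling in an async runtime, here is a minimal sketch of the same bounded wait using a std channel as a stand-in for the subscription. `DestroyComplete`, `WaitError`, and `wait_for_destroy_ack` are hypothetical names for illustration, not types from this codebase; the real handler would use the `tokio::time::timeout` form above and its own error builders.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical stand-in for the `DestroyComplete` message.
#[derive(Debug, PartialEq)]
pub struct DestroyComplete;

/// Hypothetical error type; the real handler would build its own API error.
#[derive(Debug, PartialEq)]
pub enum WaitError {
    /// No ack arrived before the deadline.
    Timeout,
    /// The publishing side went away without acking.
    Closed,
}

/// Wait for the destroy ack, but give up after `timeout` instead of
/// blocking the request forever.
pub fn wait_for_destroy_ack(
    sub: &Receiver<DestroyComplete>,
    timeout: Duration,
) -> Result<DestroyComplete, WaitError> {
    sub.recv_timeout(timeout).map_err(|err| match err {
        RecvTimeoutError::Timeout => WaitError::Timeout,
        RecvTimeoutError::Disconnected => WaitError::Closed,
    })
}
```

The point of distinguishing `Timeout` from `Closed` is that the caller can map a stalled workflow to a dedicated timeout error rather than a generic failure.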

Behavior change for already-destroyed actors

Previously, calling DELETE on an already-destroyed actor (where destroy_ts.is_some()) would:

  1. Send the Destroy signal (handled gracefully via graceful_not_found())
  2. Return 200 OK

Now it returns NotFound. This is arguably more correct semantically, but it's a breaking API behavior change. If clients are written to be idempotent and call DELETE regardless, they'll now get errors on retry. Worth confirming this is intentional and that the API contract change is acceptable.
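
If the change is kept, clients that retry DELETE would likely need to treat NotFound as "already gone". A minimal client-side sketch, assuming the endpoint answers with plain HTTP status codes (200 on success, 404 for an already-destroyed actor); `delete_with_retry` and the `send_delete` closure are hypothetical and stand in for whatever HTTP client is actually used:

```rust
/// Retry a DELETE until it succeeds or `max_attempts` is exhausted.
///
/// `send_delete` is a hypothetical closure that performs one DELETE call
/// and returns the HTTP status code. A 404 is treated as success because
/// it means the actor is already destroyed.
/// With `max_attempts == 0` no call is made and `Err(0)` is returned.
pub fn delete_with_retry<F>(mut send_delete: F, max_attempts: u32) -> Result<(), u16>
where
    F: FnMut() -> u16,
{
    let mut last_status = 0;
    for _ in 0..max_attempts {
        match send_delete() {
            // Destroyed now, or destroyed by an earlier attempt.
            200 | 404 => return Ok(()),
            // Anything else is remembered and retried.
            code => last_status = code,
        }
    }
    Err(last_status)
}
```

For example, a first attempt that fails with 503 followed by a retry that sees 404 ends in `Ok(())`, which restores idempotent behavior on the client side.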

Subscription resource usage on invalid requests

The subscription is created unconditionally before validation. If the actor doesn't exist or the namespace doesn't match, the subscription is created and then dropped unused when validation fails. This is fine from a correctness standpoint (Rust's ownership model handles cleanup), but it means every invalid delete request creates a NATS subscription. This is a minor concern and acceptable given the simplicity it buys.
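
The cleanup argument can be illustrated with a toy type. This is only a sketch: `Subscription` and `handle_delete` are hypothetical, and the counter stands in for whatever the real subscription releases when it is dropped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Toy subscription that tracks how many subscriptions are currently open.
pub struct Subscription {
    active: Arc<AtomicUsize>,
}

impl Subscription {
    pub fn open(active: &Arc<AtomicUsize>) -> Self {
        active.fetch_add(1, Ordering::SeqCst);
        Subscription { active: Arc::clone(active) }
    }
}

impl Drop for Subscription {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns.
        self.active.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Subscribe first, validate second. On validation failure the early
/// return drops `_sub`, so the subscription does not leak.
pub fn handle_delete(active: &Arc<AtomicUsize>, actor_exists: bool) -> Result<(), &'static str> {
    let _sub = Subscription::open(active);
    if !actor_exists {
        return Err("not found");
    }
    Ok(())
}
```

The cost on the invalid-request path is therefore one short-lived subscribe and unsubscribe, not a leak.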

Positive aspects

  • Correct race condition prevention: Subscribing before fetching state ensures no window where the DestroyComplete message could be missed between the state check and the signal send.
  • Parallel fetch: tokio::try_join! for the actor and namespace ops is a nice improvement over the sequential fetches in the original code.
  • The workflow-not-found path is correctly handled: If the workflow is already gone (res.is_none()), we don't wait on the subscription — avoiding a hang in the common "already stopped" case.
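
The ordering point in the first bullet can be shown with a toy broadcast bus. This sketch assumes the pub/sub layer does not replay messages to late subscribers; `Bus` and both functions are hypothetical and only model the ordering, not the actual handler.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Toy broadcast bus: a message reaches only the subscribers that are
/// registered at the moment it is published.
#[derive(Default)]
pub struct Bus {
    subscribers: Mutex<Vec<Sender<&'static str>>>,
}

impl Bus {
    pub fn subscribe(&self) -> Receiver<&'static str> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    pub fn publish(&self, msg: &'static str) {
        // Deliver to live subscribers and forget the ones that were dropped.
        self.subscribers
            .lock()
            .unwrap()
            .retain(|tx| tx.send(msg).is_ok());
    }
}

/// Subscribe before triggering the work: the completion message is seen.
pub fn subscribe_then_signal(bus: &Bus) -> Option<&'static str> {
    let sub = bus.subscribe();
    bus.publish("destroy_complete"); // stands in for the workflow finishing
    sub.try_recv().ok()
}

/// Trigger the work first: a fast completion is published before the
/// subscription exists, so the waiter never sees it.
pub fn signal_then_subscribe(bus: &Bus) -> Option<&'static str> {
    bus.publish("destroy_complete");
    let sub = bus.subscribe();
    sub.try_recv().ok()
}
```

With the second ordering a blocking wait would hang, which is exactly the window the subscribe-before-fetch pattern closes.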

Summary

The core logic is sound. The main ask is to add a timeout to the destroy_sub.next().await? call, and to confirm the behavior change for already-destroyed actors is intentional.

@pkg-pr-new

pkg-pr-new bot commented Feb 27, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4327

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4327

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4327

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4327

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4327

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4327

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4327

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4327

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4327

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4327

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4327

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4327

commit: d8e20e3

@MasterPtato
Contributor Author


1 participant