You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
PR Review: fix: add global envoy to envoy client, remove crash policy from actors v2
Summary
This PR does two things: (1) introduces a global singleton envoy via OnceLock, and (2) removes per-actor crash_policy from actors v2, hardcoding Sleep behavior. The code is generally clean but there are a few issues worth addressing.
Issues
started() regression — potential hang after reconnect
File:engine/sdks/rust/envoy-client/src/handle.rs
The simplification from the poll-loop to a single changed().await changes semantics in a way that can hang:
clone() on a watch::Receiver marks the current value as already seen. If started() is called after the envoy has already connected (e.g. on a handle obtained from GLOBAL_ENVOY after startup), changed() will wait for the nextsend() — which only happens on reconnect — and block indefinitely. The old loop checked the boolean value first:
while !*rx.borrow_and_update(){ ...}
With the () type this cannot be done the same way, but the intent should be preserved. One option: use has_changed() + changed() on the original receiver (not a clone), or restructure the channel to be a oneshot.
If start_envoy_sync is called a second time with a different config (e.g. different endpoint, pool name), the new config is silently discarded and the original global handle is returned. This is a footgun. At minimum, add a debug-level log warning that the config was ignored, or document this contract clearly (a debug_assert! comparing key fields could help catch misconfiguration in tests).
Signal handler spawned per non-global envoy
File:engine/sdks/rust/envoy-client/src/envoy.rs
let handle2 = handle.clone();
tokio::spawn(asyncmove{let _ = tokio::signal::ctrl_c().await;
handle2.shutdown(false);});
This is spawned inside start_envoy_sync_inner, which is called for every not_global: true envoy. If a test or process creates multiple non-global envoys, it accumulates N Ctrl+C listeners. Each will call shutdown on its respective handle, which is probably benign, but the extra tasks are noise. Consider only installing the signal handler once (e.g. guard with its own OnceLock<()> or scope it to the global path only).
Additionally, libraries generally should not intercept process signals. The signal handler here would interfere with any application-level Ctrl+C handling (e.g. graceful HTTP server shutdown). Consider exposing a shutdown_on_signal helper that the caller can opt into instead.
not_global is a negative boolean flag
File:engine/sdks/rust/envoy-client/src/config.rs
not_global: bool is harder to read than a positive name. not_global: false (the common case) reads as a double negative. Consider renaming to something like isolated: bool or per_instance: bool where true means "don't share the global instance".
is_shutdown is ineffective in the serverless SSE path
File:engine/sdks/rust/test-envoy/src/server.rs
In start_serverless, create_envoy is called with not_global: false, so it returns the global envoy handle (with the original callbacks from startup). The new DefaultTestCallbacks and its is_shutdown arc are created fresh but never wired to the global envoy's actual shutdown. The SSE loop polling is_shutdown.load() will never see true via this mechanism. This is a test server so it's not critical, but the dead code is misleading.
Minor Notes
SleepIntent { rewake_after_stop: false } behavior change: In the old code both true and false mapped to Decision::Reallocate. The new code maps false to Decision::Sleep. This looks intentional (the field name implies it), but it's a behavioral change worth calling out explicitly in the PR description.
CrashPolicy import in util.rs: CrashPolicy is now imported only to hardcode CrashPolicy::Sleep. If this is a permanent change, consider whether the type is still meaningful in the public API or if it should be phased out.
rivet-util-serde split: Clean factoring — no concerns.
events.rs refactor: The two-pass cleanup loop elimination is an improvement. Looks correct.
Overall
The changes are well-scoped. The global envoy pattern is a reasonable optimization, but the started() hang regression and signal handler scope are the two items I'd want resolved before merge.
MasterPtato
deleted the
04-08-fix_add_global_envoy_to_envoy_client_remove_crash_policy_from_actors_v2
branch
April 8, 2026 23:25
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Please include a summary of the changes and the related issue. Please also include relevant motivation and context.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist: