You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more
This PR adds runtime diagnostics across the stack: active actor count, SQLite VFS metrics, SSE stopping event when an actor shuts down, a malloc_trim call after actor shutdown on glibc targets, and debug HTTP endpoints (/debug/memory, /debug/heap-snapshot) in the kitchen-sink example. It also refactors onSleep/onDestroy native callbacks to always run cleanup regardless of whether the user supplied them.
Issues
1. Hardcoded zeros in JsSqliteVfsMetrics — misleading diagnostic data
These fields are in the public JsSqliteVfsMetrics struct and returned from metrics(), but they always return zero. Consumers will silently get wrong data. Either populate them from the underlying struct or remove them from the interface until the data is available.
2. Race condition in wait_actor_registered_then_stopped
engine/sdks/rust/envoy-client/src/handle.rs:
If the actor registers and deregisters before wait_actor_registered_then_stopped is called, the function will wait forever:
notified is created — no pending wakeup because both events already fired.
registered is false, so registered && !actor_is_registered is false.
The function enters select! and waits indefinitely (unless the envoy stops).
In practice the caller is an SSE handler spawned right after decoding the start payload, but the actor could theoretically be very short-lived. A simpler fix: if registered is still false when the state shows the actor is not present, treat that as "never registered" and exit early (or add a timeout arm).
path is user-controlled and passed directly to v8.writeHeapSnapshot, allowing a caller to write a heap dump to any location writable by the process (e.g. overwriting config files, secrets, or binary artifacts). The env-flag guard is a soft gate. At minimum, restrict the output path to a fixed temp directory (/tmp/, or better, a pre-configured dir), and reject paths containing .. or absolute paths outside that directory.
4. Test uses a retry poll loop — violates CLAUDE.md testing policy
for _ in0..10{if ctx.internal_keep_awake_count() == 0{break;}yield_now().await;}assert_eq!(ctx.internal_keep_awake_count(),0);
CLAUDE.md: "Never paper over flakes with retry loops or bumped waits." The keep-awake guard drop is async, but the right approach is to add a Notify (or watch channel) that fires when the count drops to zero, then wait on that directly instead of polling with yield_now.
This function is defined but never called in this PR (or apparently anywhere in the diff). It should be removed or used.
6. Double-decode of serverless start payload
handle.rs / serverless.rs: The caller decodes the payload via handle.decode_serverless_actor_start(&payload) to get the actor info, then later start_serverless_actor(&payload) re-decodes the same bytes. Consider returning the parsed (message, actor_start) from a single decode call and threading the message through instead of decoding twice.
7. HTTP server timeouts set to 0 — needs a comment
Setting all timeouts to 0 (infinite) is intentional here (long-running SSE streams), but there's no comment explaining why. A reader will assume these are bugs. Add a one-line comment per CLAUDE.md conventions.
8. diagnostics() can hang if the registry is still initialising
for(constcandidateofcandidates){const{ runtime, registry }=awaitcandidate;// may block indefinitely
...
}
If buildConfiguredRegistry is in progress (e.g. compiling WASM), this will block the caller until it resolves. For a diagnostics endpoint this is surprising. Either add a Promise.race with a short timeout or check if the promise is settled before awaiting.
9. malloc_trim(0) called after every actor shutdown — potential cost concern
malloc_trim is a glibc system call that walks the heap — its cost scales with heap size and fragmentation. Calling it after every single actor shutdown may add measurable tail latency in high-throughput scenarios (many short-lived actors). Consider rate-limiting (e.g. only trim when idle actor count drops to 0) or only calling it from a periodic background task.
10. Split impl EnvoyHandle blocks — style
engine/sdks/rust/envoy-client/src/handle.rs: The free function decode_serverless_actor_start_payload is sandwiched between two separate impl EnvoyHandle blocks. This is valid Rust but unusual. Move the free function before or after all impl blocks so the impl block is contiguous.
11. normalizeNativeMetrics snake_case fallback — dead code
NAPI always returns camelCase keys. The snake_case fallback will never be hit and creates confusion about whether the format is stable. Remove the snake fallback parameter.
Minor / Nits
SSE_PING_FRAME / SSE_STOPPING_FRAME constants — good cleanup, no issues.
onSleep/onDestroy always registered — the semantic change (always run cleanup) is safer and intentional. The comment inside the wasm branch is helpful.
actors_notify placement and pairing — the notify_waiters() calls at actor add and remove sites look correct.
active_actor_count filtering closed handles — reasonable; worth a comment explaining what "closed" means in this context.
Process lifecycle logging in server.ts guarded by SQLITE_MEMORY_SOAK_DIAGNOSTICS — appropriate scoping.
Summary
The main concerns before merging are the hardcoded zeros in the metrics struct (silent misinformation), the path traversal in the heap snapshot endpoint (security), and the race in wait_actor_registered_then_stopped (correctness for very short-lived actors). The retry loop in the test and the dead normalizeExecuteRoute function are small but violate project conventions. Everything else is style or advisory.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Please include a summary of the changes and the related issue. Please also include relevant motivation and context.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist: