Conversation

@NathanFlurry
Member

No description provided.

@vercel

vercel bot commented Nov 6, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
rivetkit-serverless | Ready | Preview | Comment | Nov 6, 2025 5:37am

3 Skipped Deployments

Project | Deployment | Preview | Comments | Updated (UTC)
rivet-cloud | Ignored | | | Nov 6, 2025 5:37am
rivet-inspector | Ignored | Preview | | Nov 6, 2025 5:37am
rivet-site | Ignored | Preview | | Nov 6, 2025 5:37am

Member Author

NathanFlurry commented Nov 6, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new

pkg-pr-new bot commented Nov 6, 2025

More templates

@rivetkit/actor

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/actor@3384

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3384

@rivetkit/core

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/core@3384

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3384

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3384

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3384

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3384

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3384

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3384

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3384

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3384

commit: f5907de

@claude

claude bot commented Nov 6, 2025

Code Review

Summary

This PR fixes a race condition where actors could attempt to sleep after a stop operation has already started. The fix adds proper guards to prevent sleep operations when #stopCalled is true.

Positive Observations

Correct Guard Placement: The checks for #stopCalled are placed at appropriate entry points:

  • sleepActor() in runner (engine/sdks/typescript/runner/src/mod.ts:175)
  • #resetSleepTimer() in ActorInstance (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1924)
  • _startSleep() in ActorInstance (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1985)

Defensive Programming: Multiple layers of protection ensure the race condition is caught at different levels

Improved Logging: Log messages are now more descriptive (e.g., "cannot call _startSleep twice, actor already sleeping" vs "already sleeping actor")
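
A minimal sketch of how the two instance-level guards fit together, assuming a simplified stand-in class rather than the real ActorInstance. `SketchActor`, `sleepsStarted`, and `hasPendingSleepTimer` are hypothetical names, and TypeScript `private` fields stand in for the `#` private fields named in the review:

```typescript
// Hypothetical stand-in for ActorInstance. The TypeScript `private` fields
// below play the role of #stopCalled, #sleepCalled and #sleepTimeout.
type TimerHandle = ReturnType<typeof setTimeout>;

class SketchActor {
  private stopCalled = false;
  private sleepCalled = false;
  private sleepTimeout: TimerHandle | undefined = undefined;
  private sleepTimeoutMs: number;
  sleepsStarted = 0;

  constructor(sleepTimeoutMs: number) {
    this.sleepTimeoutMs = sleepTimeoutMs;
  }

  resetSleepTimer(): void {
    // Guard 1: once stop has begun, never schedule another sleep timer.
    if (this.stopCalled) return;
    if (this.sleepTimeout !== undefined) clearTimeout(this.sleepTimeout);
    this.sleepTimeout = setTimeout(() => {
      this.sleepTimeout = undefined;
      this.startSleep();
    }, this.sleepTimeoutMs);
  }

  startSleep(): void {
    // Guard 2: a timer that was already in flight must not start a sleep.
    if (this.stopCalled) return;
    // Pre-existing guard: sleep at most once.
    if (this.sleepCalled) return;
    this.sleepCalled = true;
    this.sleepsStarted++;
  }

  onStop(): void {
    this.stopCalled = true;
    // Cancel whatever timer was pending when stop began.
    if (this.sleepTimeout !== undefined) {
      clearTimeout(this.sleepTimeout);
      this.sleepTimeout = undefined;
    }
  }

  hasPendingSleepTimer(): boolean {
    return this.sleepTimeout !== undefined;
  }
}
```

In this sketch either guard alone would stop a post-stop sleep in the common case; keeping both covers a timer callback that was already queued when stop began.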

Issues & Concerns

1. Debug Code Left in Production ⚠️

Location: engine/sdks/typescript/runner/src/mod.ts:870

console.trace("send actor intent", actorId, intentType);

This appears to be debug code that should be removed before merging. It:

  • Uses console.trace instead of the structured logger
  • Will produce noisy stack traces in production
  • Appears to be temporary debugging for this specific issue

Recommendation: Remove this line or convert to a proper debug log if needed.
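
If the information is worth keeping, the conversion could look roughly like the sketch below. The `Logger` and `DebugRecord` types, the free-standing `sendActorIntent` function, and the in-memory logger are assumptions for illustration; the runner's real logger type is not shown in this thread.

```typescript
// Hypothetical logger shape; the runner's actual logger type is not shown in this PR.
interface DebugRecord {
  msg: string;
  [key: string]: unknown;
}

interface Logger {
  debug(record: DebugRecord): void;
}

// Stand-in for the runner method that contained the console.trace call.
function sendActorIntent(
  log: Logger | undefined,
  actorId: string,
  intentType: string
): void {
  // Structured and lowercase, and the logger decides whether to emit it.
  // console.trace, by contrast, always prints the message plus a stack trace.
  log?.debug({ msg: "send actor intent", actorId, intentType });
  // ... build and send the intent here ...
}

// In-memory logger used only to show what gets recorded.
const records: DebugRecord[] = [];
const memoryLogger: Logger = {
  debug: (record) => {
    records.push(record);
  },
};

sendActorIntent(memoryLogger, "actor-1", "sleep");
sendActorIntent(undefined, "actor-2", "stop"); // no logger configured: nothing recorded
```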

2. Potential Race Condition with Sleep Timeout 🤔

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:2084

The PR removes the clearing of #sleepTimeout from _onStop():

-if (this.#sleepTimeout) clearTimeout(this.#sleepTimeout);

However, the timeout is still cleared earlier at line 2038. Question: Is there a scenario where a sleep timeout could be scheduled between line 2038 and line 2084?

Looking at the code flow:

  1. _onStop() sets #stopCalled = true (line 2034)
  2. Sleep timeout is cleared (line 2038-2040)
  3. #resetSleepTimer() now checks #stopCalled and returns early (line 1924)

This appears safe, but I'd recommend verifying that no code path between these lines could schedule a new sleep timeout. The removal seems intentional but lacks explanation.

3. Missing Test Coverage

Concern: No tests were added to verify this race condition fix.

Recommendations:

  • Add a test that simulates the race condition: stop an actor while a sleep timer is pending
  • Test that sleepActor() is a no-op when called after shutdown
  • Test that #resetSleepTimer() doesn't schedule new timers when #stopCalled is true

Given this is a race condition fix, having regression tests would be valuable.
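
A deterministic version of the first and third recommendations can be sketched with a hand-rolled clock instead of real timers. Everything here is hypothetical scaffolding (`FakeClock`, `ActorUnderTest`, and the public method names); a real test would go through the repository's driver test helpers, which this sketch does not attempt to reproduce.

```typescript
// Hypothetical manual clock so the scenario is deterministic; not part of rivetkit.
class FakeClock {
  private now = 0;
  private nextId = 1;
  private timers: { id: number; at: number; fn: () => void }[] = [];

  setTimeout(fn: () => void, ms: number): number {
    const id = this.nextId++;
    this.timers.push({ id: id, at: this.now + ms, fn: fn });
    return id;
  }

  clearTimeout(id: number): void {
    this.timers = this.timers.filter((t) => t.id !== id);
  }

  // Advance time and run every timer that has come due.
  advance(ms: number): void {
    this.now += ms;
    const due = this.timers.filter((t) => t.at <= this.now);
    this.timers = this.timers.filter((t) => t.at > this.now);
    due.forEach((t) => t.fn());
  }
}

// Simplified actor with the same guards the PR describes.
class ActorUnderTest {
  private stopCalled = false;
  private sleepTimeout: number | undefined = undefined;
  private clock: FakeClock;
  private sleepTimeoutMs: number;
  sleepCount = 0;

  constructor(clock: FakeClock, sleepTimeoutMs: number) {
    this.clock = clock;
    this.sleepTimeoutMs = sleepTimeoutMs;
  }

  resetSleepTimer(): void {
    if (this.stopCalled) return;
    if (this.sleepTimeout !== undefined) this.clock.clearTimeout(this.sleepTimeout);
    this.sleepTimeout = this.clock.setTimeout(() => this.startSleep(), this.sleepTimeoutMs);
  }

  startSleep(): void {
    if (this.stopCalled) return;
    this.sleepCount++;
  }

  onStop(): void {
    this.stopCalled = true;
    if (this.sleepTimeout !== undefined) {
      this.clock.clearTimeout(this.sleepTimeout);
      this.sleepTimeout = undefined;
    }
  }
}

const clock = new FakeClock();
const actor = new ActorUnderTest(clock, 1000);
actor.resetSleepTimer(); // sleep timer pending
clock.advance(900); // not yet due
actor.onStop(); // stop wins the race
clock.advance(200); // past the original deadline
const sleptAfterStop = actor.sleepCount;

actor.resetSleepTimer(); // must not schedule anything once stopped
clock.advance(5000);
const sleptAfterLateReset = actor.sleepCount;

// Control: without a stop, the same timer does trigger a sleep.
const control = new ActorUnderTest(clock, 1000);
control.resetSleepTimer();
clock.advance(1000);
const controlSleepCount = control.sleepCount;
```

The control case matters: without it, the assertions would also pass against an actor that never sleeps at all.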

Minor Suggestions

4. Consistency in Log Messages

Per CLAUDE.md guidelines, log messages should be lowercase. Most are correct, but verify consistency:

  • ✅ "runner is shut down, cannot sleep actor" (lowercase)
  • ✅ "cannot call _startSleep if actor already stopping" (lowercase)
  • Consider: "Runner is shut down" → "runner is shut down" at line 872 for consistency

5. Additional Context in Shutdown Log

Location: rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:605

Good addition of the immediate parameter to the log! This will help with debugging shutdown scenarios.

Performance Considerations

No performance concerns. The added checks are simple boolean comparisons with negligible overhead.

Security Considerations

No security issues identified. This is a defensive fix that improves reliability.

Action Items

Must Fix:

  • Remove console.trace() debug statement at engine/sdks/typescript/runner/src/mod.ts:870

Should Consider:

  • Add test coverage for the race condition scenario
  • Verify the removal of #sleepTimeout clearing from _onStop() is intentional and document why
  • Consider adding a code comment explaining why #sleepTimeout doesn't need to be cleared in _onStop() (since #stopCalled prevents new timers)

Overall Assessment

The core fix is sound and addresses a real race condition. The main blocker is the debug code that needs removal. Otherwise, this is a good defensive improvement to actor lifecycle management.


Review generated using repository conventions from CLAUDE.md

@claude

claude bot commented Nov 6, 2025

Code Review

I've reviewed PR #3384 which fixes actors trying to sleep after stop has started. Overall, this is a solid bug fix that addresses a race condition in the actor lifecycle. Here's my detailed feedback:


✅ Strengths

  1. Correct Problem Identification: The fix properly addresses the race condition where a sleep timeout could trigger _startSleep() after _onStop() has already been called.

  2. Defense in Depth: The fix adds guards at multiple levels:

    • In #resetSleepTimer() - prevents scheduling new sleep timers (line 1924 in instance.ts)
    • In _startSleep() - prevents executing sleep logic (line 1985 in instance.ts)
    • In Runner.sleepActor() - prevents sending sleep intents when runner is shut down (line 175 in mod.ts)
  3. Follows Existing Patterns: The #stopCalled check is consistent with how it's already used in other parts of the code (e.g., saveState() at line 1308).

  4. Improved Logging: Better log messages help with debugging ("cannot call _startSleep twice, actor already sleeping" is clearer than "already sleeping actor").


⚠️ Issues & Concerns

1. Critical: Removed #sleepTimeout Cleanup in _onStop()

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:2083-2084

- if (this.#sleepTimeout) clearTimeout(this.#sleepTimeout);

Problem: This removal creates a potential resource leak. While the new guard in _startSleep() prevents the callback from executing, the timeout object itself is never cleared if _onStop() is called while a sleep timeout is pending.

Impact:

  • Memory leak: timeout objects remain in memory until they fire
  • The callback will still execute (but return early due to the guard)
  • Unnecessary code execution during shutdown

Recommendation: Keep the cleanup in _onStop(). The guard in _startSleep() is good defense-in-depth, but you should still clean up the timeout:

if (this.#sleepTimeout) {
    clearTimeout(this.#sleepTimeout);
    this.#sleepTimeout = undefined;
}

This is also consistent with how you clear #pendingSaveTimeout and #checkConnLivenessInterval immediately after.


2. Console.trace() Left in Production Code

Location: engine/sdks/typescript/runner/src/mod.ts:870

console.trace("send actor intent", actorId, intentType);

Problem: This appears to be debug code that shouldn't be in production.

Recommendation: Either:

  • Remove it entirely if it was just for debugging this issue
  • Convert to structured logging: this.log?.debug({ msg: "send actor intent", actorId, intentType });

3. Missing Additional Logging Context

Location: rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:605

The addition of immediate to the log is good, but consider also logging:

  • Number of active actors being shut down
  • Whether this is a graceful vs forced shutdown

🧪 Test Coverage Concerns

Missing Test Case: There's no test that specifically validates the fix for this race condition. The existing sleep tests in actor-sleep.ts don't cover the scenario where:

  1. An actor has a pending sleep timeout
  2. Stop is called before the timeout fires
  3. Verify that sleep doesn't execute after stop

Recommendation: Add a test like:

test("stop prevents pending sleep from executing", async (c) => {
    const { client } = await setupDriverTest(c, driverTestConfig);
    
    const sleepActor = client.sleep.getOrCreate().connect();
    
    // Verify initial state
    const { startCount, sleepCount } = await sleepActor.getCounts();
    expect(sleepCount).toBe(0);
    expect(startCount).toBe(1);
    
    // Disconnect to allow sleep timer to start
    await sleepActor.dispose();
    
    // Wait most of the sleep timeout but not all
    await waitFor(driverTestConfig, SLEEP_TIMEOUT - 100);
    
    // Stop the actor before sleep fires
    await sleepActor.stop();
    
    // Wait for what would have been the sleep timeout
    await waitFor(driverTestConfig, 200);
    
    // Actor should be stopped, not sleeping
    // (verify appropriate state - exact assertions depend on your API)
});

🔒 Security Considerations

No security concerns identified. This is an internal lifecycle fix.


⚡ Performance Considerations

Minor positive impact:

  • Early returns prevent unnecessary work when shutdown/stop is in progress
  • Logging additions are minimal overhead

The removed #sleepTimeout cleanup could have a minor negative impact (see Issue #1).


📝 Code Style & Convention Notes

  1. Logging Style ✅: Follows the CLAUDE.md convention of lowercase messages with structured parameters
  2. Error Handling ✅: Appropriate use of early returns
  3. Comments ✅: Good explanatory comments in _startSleep()

🎯 Summary

This PR fixes a real bug, but needs two changes before merging:

Required:

  • ✅ Restore the clearTimeout(this.#sleepTimeout) in _onStop() to prevent resource leaks
  • ✅ Remove or convert the console.trace() debug statement

Recommended:

  • Add a test case for the stop-during-pending-sleep scenario
  • Consider logging more context in shutdownRunner()

The core logic is sound and the defense-in-depth approach is good. Once the #sleepTimeout cleanup is restored, this will be ready to merge.

@claude

claude bot commented Nov 6, 2025

Code Review

Summary

This PR fixes a race condition where actors attempt to sleep after stop has been initiated. The changes add appropriate guards to prevent sleep operations during shutdown.

Positive Aspects

  1. Good defensive programming: The added guards in #resetSleepTimer() and _startSleep() prevent race conditions
  2. Clear intent: The log messages clearly explain why operations are being skipped
  3. Consistent pattern: The #stopCalled check follows the existing pattern of #sleepCalled

Issues & Concerns

1. Debug Code Left in Production (Critical)

// engine/sdks/typescript/runner/src/mod.ts:870
console.trace("send actor intent", actorId, intentType);

Issue: This console.trace() appears to be debug code that should be removed before merging.

Recommendation: Remove this line or convert it to a proper debug log:

this.log?.debug({
    msg: "send actor intent",
    actorId,
    intentType,
});

2. Removed Sleep Timeout Cleanup (Potential Issue)

The PR removes this line from _onStop():

if (this.#sleepTimeout) clearTimeout(this.#sleepTimeout);

Analysis:

  • This cleanup now happens at line 2038-2040 (earlier in _onStop)
  • However, the new guards in #resetSleepTimer() (line 1924) check #stopCalled BEFORE clearing existing timeouts
  • This means if a sleep timeout is already scheduled when stop is called, it won't be cleared by #resetSleepTimer()

Question: Is this intentional? The _onStop method still clears the timeout at line 2038, so this appears safe, but the logic flow is subtle.

3. Potential Race Condition

// #resetSleepTimer at line 1924
if (this.#stopCalled) return;  // Early return before clearing timeout

// Later at line 1935
if (this.#sleepTimeout) {
    clearTimeout(this.#sleepTimeout);
    this.#sleepTimeout = undefined;
}

Issue: If #stopCalled becomes true, this function returns early without clearing any existing #sleepTimeout. While _onStop() does clear it, there's a small window where the timeout could fire between checking #stopCalled and the timeout being cleared in _onStop().

Recommendation: Consider clearing the timeout before the early return:

#resetSleepTimer() {
    if (this.#config.options.noSleep || !this.#sleepingSupported) return;

    // Clear any existing timeout first
    if (this.#sleepTimeout) {
        clearTimeout(this.#sleepTimeout);
        this.#sleepTimeout = undefined;
    }

    // Don't sleep if already stopping
    if (this.#stopCalled) return;

    // ... rest of the function
}

4. Missing Test Coverage

Issue: There don't appear to be tests covering the sleep-during-stop race condition this PR addresses.

Recommendation: Add a test case that:

  1. Starts an actor with a short sleep timeout
  2. Initiates stop before sleep timeout fires
  3. Verifies that sleep is not initiated
  4. Verifies no errors or warnings are logged (besides the expected "cannot sleep" debug message)

Minor Observations

  1. Log message improvements: The updated log messages are clearer (e.g., "cannot call _startSleep twice, actor already sleeping" is more descriptive)

  2. Consistent shutdown checks: Good that both sleepActor() and #sendActorIntent() check for shutdown state

Questions for Author

  1. Was the console.trace() intentional for debugging, or should it be removed?
  2. Can you clarify the reasoning for removing the #sleepTimeout cleanup from the end of _onStop()?
  3. Have you tested the scenario where a sleep timeout is about to fire when stop is called?

Recommendations

Must Fix:

  • Remove or properly log the console.trace() call

Should Consider:

  • Add test coverage for the race condition
  • Review the timeout clearing logic in #resetSleepTimer() for edge cases

Overall Assessment

The core fix is sound and addresses a real race condition. The main concern is the debug code that should be removed, and it would benefit from test coverage to prevent regressions.

@NathanFlurry NathanFlurry force-pushed the 11-05-chore_runner_rename_websocketid_-_requestid branch from 8d4492f to f9b3de2 Compare November 6, 2025 02:16
@NathanFlurry NathanFlurry force-pushed the 11-05-fix_rivetkit_fix_actors_trying_to_sleep_after_stop_started branch from 30a829e to f612c0f Compare November 6, 2025 02:16
@claude

claude bot commented Nov 6, 2025

PR Review: Fix actors trying to sleep after stop started

Summary

This PR addresses a race condition where actors would attempt to sleep after the stop sequence has already begun. The fix adds proper guards to prevent sleep operations when #stopCalled is true, and adds early-exit checks in the runner to prevent operations during shutdown.


Code Quality & Best Practices

✅ Positive Observations:

  1. Proper guard placement: The checks for this.#stopCalled in #resetSleepTimer() (line 1924) and _startSleep() (line 1985) correctly prevent race conditions.

  2. Clear logging: The debug messages provide good context for understanding why operations are being skipped:

    • "cannot call _startSleep if actor already stopping"
    • "cannot call _startSleep twice, actor already sleeping"
  3. Defensive programming: Adding the shutdown check in sleepActor() at the runner level (mod.ts:175-180) provides defense in depth.

⚠️ Areas for Improvement:

  1. Debugging code left in: There's a console.trace() call in mod.ts:870 that appears to be debugging code:

    console.trace("send actor intent", actorId, intentType);

    This should be removed before merging or converted to proper structured logging using the this.log logger.

  2. Inconsistent sleep timeout clearing: In instance.ts:2083, the removal of the sleep timeout clearing is concerning:

    - if (this.#sleepTimeout) clearTimeout(this.#sleepTimeout);

    Why was this removed? The sleep timeout is already being cleared in _onStop() at line 2038-2041, so this might be redundant. However, it's worth documenting why this was removed to prevent future confusion. If the timeout is cleared earlier in _onStop(), this change is safe, but it should be verified that there's no code path where _onStop() completes without clearing the timeout first.


Potential Bugs & Issues

Critical:

None identified. The logic appears sound.

Medium Priority:

  1. Race condition window: Between checking this.#shutdown and actually sending the intent, there's a small window where shutdown could occur. However, this is likely acceptable given the checks are at multiple levels (runner + instance).

  2. Timeout lifecycle verification needed: Verify that removing the sleep timeout clearing at line 2083 doesn't leave any edge cases where the timeout could fire after actor cleanup. The clearing at line 2038 should handle this, but a code inspection of all paths through _onStop() would be prudent.


Performance Considerations

No significant performance impact. The added conditional checks are minimal and occur in code paths that are not performance-critical (shutdown sequences).


Security Concerns

No security issues identified. The changes properly prevent operations during invalid states, which actually improves robustness.


Test Coverage

✅ Excellent test coverage exists:

The existing test suite in actor-sleep.ts has comprehensive coverage:

  • Basic sleep functionality
  • Sleep with connections
  • Automatic sleep after timeout
  • RPC calls keeping actors awake
  • Alarms preventing/triggering sleep
  • Long-running operations
  • Raw WebSocket handling
  • Raw HTTP request handling
  • noSleep option

📋 Recommendation: Consider adding a specific test case for the race condition this PR fixes:

test("actor does not sleep when stop is called during sleep timer", async (c) => {
  const { client } = await setupDriverTest(c, driverTestConfig);
  
  const sleepActor = client.sleep.getOrCreate();
  
  // Wait almost until sleep timeout
  await waitFor(driverTestConfig, SLEEP_TIMEOUT - 100);
  
  // Manually trigger stop (this would need to be exposed for testing)
  // Verify that sleep does not occur
  
  // Verify actor stopped cleanly without attempting to sleep
});

This would require exposing some testing hooks to manually trigger the stop sequence, which may not be worth it given the complexity. The existing tests likely cover this implicitly.


Minor Issues

  1. Log message wording: The log message at line 1997 could be more concise:

    msg: "cannot call _startSleep twice, actor already sleeping",

    Consider: msg: "actor already sleeping, skipping _startSleep"

  2. Additional logging parameter: The immediate parameter added to the shutdown log at actor-driver.ts:605 is good for debugging. No issue here.


Recommendations

Before merging:

  1. Remove or convert the console.trace() at mod.ts:870
  2. Document why sleep timeout clearing was removed at instance.ts:2083 (add a comment explaining it's cleared earlier in _onStop())
  3. ⚠️ Verify all code paths through _onStop() to ensure the sleep timeout is always cleared before reaching the removed clearing code

Optional improvements:

  1. Consider adding JSDoc comments to _startSleep() explaining the guard conditions
  2. Consider extracting the shutdown check pattern into a helper method if this pattern is repeated elsewhere
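
One possible shape for such a helper, sketched against a hypothetical `RunnerSketch` class. The field names, the `isShutDown` helper, the `sent` array, and the log message text are all assumptions; the real runner's internals are not shown in this thread.

```typescript
// Hypothetical runner sketch; names and log text are illustrative only.
interface Logger {
  debug(record: { msg: string; [key: string]: unknown }): void;
}

class RunnerSketch {
  private shutdown = false;
  private log: Logger | undefined;
  sent: string[] = [];

  constructor(log?: Logger) {
    this.log = log;
  }

  // Single place for the check; returns true when the caller should bail out.
  private isShutDown(operation: string, actorId: string): boolean {
    if (!this.shutdown) return false;
    this.log?.debug({ msg: "runner is shut down, skipping operation", operation, actorId });
    return true;
  }

  sleepActor(actorId: string): void {
    if (this.isShutDown("sleep actor", actorId)) return;
    this.sendActorIntent(actorId, "sleep");
  }

  private sendActorIntent(actorId: string, intentType: string): void {
    if (this.isShutDown("send actor intent", actorId)) return;
    // Stand-in for actually sending the intent over the wire.
    this.sent.push(actorId + ":" + intentType);
  }

  shutdownRunner(): void {
    this.shutdown = true;
  }
}

const skipped: { msg: string; [key: string]: unknown }[] = [];
const runner = new RunnerSketch({ debug: (r) => { skipped.push(r); } });
runner.sleepActor("actor-a"); // sent
runner.shutdownRunner();
runner.sleepActor("actor-b"); // skipped and logged once, by the outer check
```

Whether this is worth it depends on how many call sites repeat the check; with only two, inline guards may read just as clearly.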

Overall Assessment

Status: Approve with minor changes

This is a solid fix for a legitimate race condition bug. The implementation is clean and follows the codebase patterns. The main blocker is removing the debugging console.trace() call. The removal of the sleep timeout clearing should be verified as safe (which it likely is), and ideally documented with a comment.

Severity of bug fixed: Medium - Could cause unexpected behavior and error logs during shutdown sequences
Risk of changes: Low - Well-guarded defensive checks with existing comprehensive test coverage


Great work on identifying and fixing this race condition! 🎯

@claude

claude bot commented Nov 6, 2025

Code Review

Summary

This PR fixes a race condition where actors could attempt to sleep after the stop process has already started, which could lead to inconsistent state or errors during shutdown.

Positive Changes ✅

1. Race condition prevention in #resetSleepTimer (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1924)

  • Good: Adding if (this.#stopCalled) return; prevents scheduling new sleep timers during shutdown
  • This is the right guard at the right place - prevents the root cause

2. Early exit in _startSleep (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1985-1990)

  • Good: Guards against calling sleep after stop has been initiated
  • Provides helpful debug logging for troubleshooting

3. Runner shutdown check (engine/sdks/typescript/runner/src/mod.ts:175-180)

  • Good: Prevents attempting to sleep actors when the runner is already shutting down
  • Consistent with existing pattern in #sendActorIntent

Concerns & Questions ⚠️

1. Removed sleep timeout cleanup (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:2083)

  • CRITICAL: The line if (this.#sleepTimeout) clearTimeout(this.#sleepTimeout); was removed from _onStop
  • This seems problematic because:
    • Even though #stopCalled prevents new timers from being scheduled, an existing timer scheduled before stop was called could still fire
    • If such a timer fires during shutdown, it calls _startSleep(), which now returns early at the guard but still emits a debug log
    • The timeout clearing was already present at line 2038-2041 in the same _onStop method, so this appears to be a duplicate cleanup that was removed
    • However, removing defensive cleanup could be risky if the earlier cleanup is somehow bypassed

Recommendation: Verify that the sleep timeout is always cleared before reaching the removed line. If this is a duplicate cleanup, a comment explaining why it's not needed would be helpful. If it's not a duplicate, consider restoring it.

2. Debug console.trace in production code (engine/sdks/typescript/runner/src/mod.ts:870)

  • ⚠️ The line console.trace("send actor intent", actorId, intentType); appears to be debug code
  • Recommendation: Remove this before merging, or wrap it in a debug flag check

3. Improved log message clarity (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1996-1997)

  • ✅ Good: Changed "already sleeping actor" to "cannot call _startSleep twice, actor already sleeping"
  • More descriptive and helps with debugging

Testing Recommendations 🧪

Consider adding tests for:

  1. Race condition scenario: Verify that if _onStop is called while a sleep timer is pending, the timer doesn't fire or has no effect
  2. Concurrent sleep/stop: Test calling stop immediately after scheduling a sleep timeout
  3. Runner shutdown with pending sleep: Verify actors don't attempt sleep operations during runner shutdown

Code Quality

Follows repository conventions: ✅

  • Uses structured logging with tracing patterns
  • Consistent error handling
  • Follows naming conventions

Performance: ✅ No concerns

  • Changes are guard checks with minimal overhead

Security: ✅ No concerns

  • No security implications

Final Recommendation

Approve with minor changes:

  1. Remove the debug console.trace statement (line 870 in runner/src/mod.ts)
  2. Clarify or add comment about the removed sleep timeout cleanup in _onStop
  3. Consider adding test coverage for the race condition this fixes

The core fix is sound and addresses a real race condition in the shutdown flow. The main concern is the removed timeout cleanup which may or may not be needed depending on the execution flow.


Review generated by Claude Code

@claude

claude bot commented Nov 6, 2025

Code Review

Summary

This PR addresses a race condition where actors attempt to sleep after the stop process has already begun. The fix adds proper guards to prevent sleep operations during shutdown.

Positive Aspects ✅

  1. Correct Problem Identification: The PR properly identifies and addresses the race condition between sleep and stop operations.

  2. Defense in Depth: Multiple layers of protection are added:

    • Early return in sleepActor() when runner is shut down (runner/src/mod.ts:175-180)
    • Guard in #resetSleepTimer() to prevent scheduling new sleep timers (instance.ts:1924)
    • Guard in _startSleep() to prevent initiating sleep (instance.ts:1985-1990)
  3. Improved Logging: Better log messages differentiate between different error conditions (e.g., "cannot call _startSleep if actor already stopping" vs "cannot call _startSleep twice").

Issues & Concerns 🔴

Critical: Sleep Timeout Not Cleared in _onStop()

Location: instance.ts:2083

The PR removes the line that clears #sleepTimeout from _onStop():

// Removed from _onStop():
if (this.#sleepTimeout) clearTimeout(this.#sleepTimeout);

Problem: While the early return guards prevent _startSleep() from executing its body, the timeout itself is still scheduled and will fire. This means:

  1. The timeout callback at line 1944-1946 will still execute
  2. It will call _startSleep()
  3. _startSleep() will early-return due to the #stopCalled check
  4. This wastes CPU cycles and creates unnecessary log messages
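
The four steps above can be modelled with a tiny hand-rolled timer queue to compare the two outcomes. This is a sketch under stated assumptions: `schedule`, `fireAll`, and `runScenario` are hypothetical helpers, and the model only describes what happens if no other code in _onStop() clears the timer.

```typescript
// Hypothetical stand-in for the timer system: callbacks wait in a queue
// until fireAll() runs them, and cancelled entries are skipped.
interface PendingTimer {
  fn: () => void;
  cancelled: boolean;
}

const queue: PendingTimer[] = [];

function schedule(fn: () => void): PendingTimer {
  const timer: PendingTimer = { fn: fn, cancelled: false };
  queue.push(timer);
  return timer;
}

// Runs every non-cancelled callback and reports how many actually ran.
function fireAll(): number {
  let fired = 0;
  queue.splice(0).forEach((t) => {
    if (!t.cancelled) {
      fired++;
      t.fn();
    }
  });
  return fired;
}

interface ScenarioResult {
  callbacksFired: number;
  guardHits: number;
  sleeps: number;
}

function runScenario(clearOnStop: boolean): ScenarioResult {
  let stopCalled = false;
  let guardHits = 0;
  let sleeps = 0;

  // Models _startSleep() with the new stop guard.
  const startSleep = (): void => {
    if (stopCalled) {
      guardHits++; // early return; the real code would also emit a debug log
      return;
    }
    sleeps++;
  };

  const timer = schedule(startSleep); // sleep timer pending
  stopCalled = true; // stop begins
  if (clearOnStop) timer.cancelled = true; // models clearTimeout in _onStop()

  const callbacksFired = fireAll(); // the original deadline passes
  return { callbacksFired: callbacksFired, guardHits: guardHits, sleeps: sleeps };
}

const withoutClear = runScenario(false);
const withClear = runScenario(true);
```

In both scenarios no sleep starts, so the guard alone preserves correctness; clearing the timer only removes the extra callback invocation and its log line.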

Why this matters: The sleep timeout is set at instance.ts:1944 and can be a significant duration (sleepTimeout config option). If an actor stops before this timeout fires, the timeout should be cancelled to avoid unnecessary work.

Solution: The line should not be removed. It should remain in _onStop():

// In _onStop(), around line 2038:
if (this.#sleepTimeout) {
    clearTimeout(this.#sleepTimeout);
    this.#sleepTimeout = undefined;
}

The existing code at lines 2038-2041 already does this correctly. The removal appears to be accidental.

Debug Code Left In Production

Location: runner/src/mod.ts:870

console.trace("send actor intent", actorId, intentType);

Problem: This debug trace should be removed before merging:

  • Uses console.trace() instead of structured logging
  • Will spam logs with stack traces in production
  • Appears to be debugging code that helped identify the issue

Solution: Remove this line entirely, or convert to proper structured logging if the information is valuable:

this.log?.debug({
    msg: "runner shut down during actor intent",
    actorId,
    intentType,
});

Minor Observations 💡

  1. Consistent Logging Style: The new log message "cannot call _startSleep if actor already stopping" follows the repository's lowercase convention for log messages. Good! ✅

  2. Log Message Improvement: Changed "already sleeping actor" to "cannot call _startSleep twice, actor already sleeping" which is more descriptive. Good improvement! ✅

  3. Added Logging Context: The immediate parameter is now logged in shutdownRunner() (actor-driver.ts:602), which improves observability. ✅

Recommendations

Before Merging:

  1. Keep the sleep timeout clearing in _onStop() - this line should not have been removed
  2. Remove the console.trace() debug code from runner/src/mod.ts:870
  3. Add a test case for this race condition to prevent regression (see test suggestion below)

Test Coverage Suggestion

Consider adding a test that verifies actors don't attempt to sleep after stop is called:

// Test: Should not sleep after stop initiated
test('actor should not sleep after stop called', async () => {
  const actor = createTestActor({
    options: { sleepTimeout: 100 }
  });
  
  // Trigger conditions that would schedule sleep
  actor.resetSleepTimer();
  
  // Start stop process
  await actor._onStop();
  
  // Sleep timeout should have been cleared and sleep should not occur
  // Verify via logs or internal state
});

Security & Performance

  • ✅ No security concerns identified
  • ⚠️ Minor performance impact if sleep timeout isn't cleared (unnecessary timer firing)

Overall Assessment: Good fix for the race condition, but needs two changes before merging: keep the sleep timeout clearing and remove debug code.

@graphite-app
Contributor

graphite-app bot commented Nov 6, 2025

Merge activity

  • Nov 6, 5:47 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 6, 5:48 AM UTC: CI is running for this pull request on a draft pull request (#3395) due to your merge queue CI optimization settings.
  • Nov 6, 5:50 AM UTC: Merged by the Graphite merge queue via draft PR: #3395.

@graphite-app graphite-app bot closed this Nov 6, 2025
@graphite-app graphite-app bot deleted the 11-05-fix_rivetkit_fix_actors_trying_to_sleep_after_stop_started branch November 6, 2025 05:50