Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

@vercel
Copy link

vercel bot commented Oct 23, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Oct 23, 2025 10:54pm
rivet-inspector Ignored Ignored Preview Oct 23, 2025 10:54pm
rivet-site Ignored Ignored Preview Oct 23, 2025 10:54pm

Copy link
Contributor Author

MasterPtato commented Oct 23, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Oct 23, 2025

Code Review

This PR adds logging statements and renames WorkflowStopped to WorkflowEvicted.

Positive Aspects

  1. Semantic Clarity: Renaming WorkflowStopped to WorkflowEvicted better describes the behavior where workflows are evicted from memory rather than permanently stopped
  2. Consistent Rename: The change is applied consistently across all occurrences
  3. Follows CLAUDE.md: Logging follows structured logging patterns with field prefixes

Issues and Concerns

1. Missing Context in Rust Logs (Moderate)

In engine/packages/pegboard-serverless/src/lib.rs:220, the message "scaling" is vague. Per CLAUDE.md, messages should be more descriptive. Consider "runner scaling adjusted" or "scaling runners to meet demand"

2. Missing Context in Outbound Handler Log (Moderate)

In engine/packages/pegboard-serverless/src/lib.rs:310:

  • Abbreviated "req" should be "request" for clarity
  • Missing important context: runner_name, namespace_name (available as parameters)

3. Inconsistent Logging in TypeScript (IMPORTANT)

Problem: Removing runnerId from logs makes debugging harder, especially with multiple concurrent runners:

  • Line 249: Removed from "starting runner"
  • Line 283: Removed from "starting shutdown"
  • Line 430: Removed from "Connected"

Recommendation: Keep runnerId in logs. It is crucial for correlating logs across distributed runner instances.

4. Debug Noise (Low Priority)

Lines 275-278 log process.listeners("SIGINT"). This appears to be debugging code that should likely be removed before merge.

Testing Concerns

  1. No test changes: Are there tests that verify wake_immediate() and is_recoverable() work correctly for WorkflowEvicted?
  2. Breaking Change: If any external code matches on the error string "workflow stopped", this is a breaking change

Recommendations Summary

Must Fix:

  1. Restore runnerId in TypeScript logs (lines 249, 283, 430) - critical for debugging
  2. Consider removing the "current listeners" debug log (lines 275-278)

Should Fix:

  1. Add more context to Rust logs (runner_name, namespace_name where available)
  2. Use full word "request" instead of "req"
  3. Make log messages more descriptive

Nice to Have:

  1. Add/update tests for WorkflowEvicted error behavior
  2. Document the semantic difference between "stopped" and "evicted"

Overall Assessment

The error rename is a good semantic improvement, but the logging changes have issues, particularly removing runnerId from TypeScript logs which will hurt debuggability.

Recommendation: Request changes to restore runnerId before merging.

@MasterPtato MasterPtato force-pushed the 10-21-chore_engine_runner_improvements branch from c1830eb to d78cc5f Compare October 23, 2025 22:54
@MasterPtato MasterPtato marked this pull request as ready for review October 23, 2025 22:54
@claude
Copy link

claude bot commented Oct 23, 2025

PR Review: Add logs

Summary

This PR adds debug logging to improve observability and renames WorkflowStopped to WorkflowEvicted for better semantic clarity.

Code Quality & Best Practices

✅ Positive Changes:

  1. Terminology improvement: Renaming WorkflowStoppedWorkflowEvicted is more semantically accurate and better reflects that workflows are being evicted from memory rather than permanently stopped.
  2. Logging additions follow CLAUDE.md conventions: The new debug logs properly use structured logging with field prefixes (e.g., %namespace_name, %runner_name, ?x).
  3. Logging messages are lowercase: Correctly follows the convention in CLAUDE.md (e.g., "scaling", "sending outbound req").

⚠️ Minor Issues:

  1. Log message in pegboard-serverless/src/lib.rs:310:

    tracing::debug!(%url, "sending outbound req");

    The message uses abbreviation "req" instead of the full word "request". For consistency and clarity, consider using the full word:

    tracing::debug!(%url, "sending outbound request");
  2. TypeScript logging inconsistency (engine/sdks/typescript/runner/src/mod.ts:275-278):

    logger()?.debug({
        msg: "current listeners",
        listeners: process.listeners("SIGINT"),
    });

    This debug log seems like temporary debugging code that may have been added for troubleshooting. Questions:

    • Is this intended to remain in production?
    • Should this be at a more appropriate log level or removed?
    • The listeners output could be verbose and not particularly useful in production
  3. Missing context in scaling log: The scaling log at pegboard-serverless/src/lib.rs:220 is good, but it might be helpful to also log the current runner count:

    tracing::debug!(
        %namespace_name, 
        %runner_name, 
        current_count = curr.len(),
        %desired_count, 
        %drain_count, 
        %start_count, 
        "scaling"
    );

Potential Bugs or Issues

✅ No bugs identified - The changes are straightforward:

  • The rename is complete and consistent across all usages
  • The error variant is properly updated in all match statements and error handling

Performance Considerations

✅ No performance concerns:

  • Debug logs have minimal overhead when not enabled
  • The changes don't introduce any new allocations or expensive operations

Security Concerns

✅ No security issues:

  • The logs don't expose sensitive information
  • URL logging in outbound_handler is acceptable as it's debug-level

Test Coverage

⚠️ Recommendation:

  • Consider adding a test to verify that WorkflowEvicted errors are properly classified as recoverable (via is_recoverable()) and wake immediately (via wake_immediate())
  • This would ensure the semantic behavior of the renamed error variant is maintained

Additional Notes

  1. The comment at engine/sdks/typescript/runner/src/mod.ts:270-272 says "added SIGTERM listeners" but actually both SIGTERM and SIGINT listeners were added. Minor documentation inconsistency.

  2. The removal of runnerId from some log messages (starting runner, Connected) - is this intentional? It might reduce traceability if multiple runners are active simultaneously.

Verdict

Approve with minor suggestions

The changes are clean and improve both terminology and observability. The suggestions above are minor and non-blocking. The core changes (error renaming and debug logging) are solid.

@claude
Copy link

claude bot commented Oct 23, 2025

PR Review: chore: add logs

Summary

This PR makes two main changes:

  1. Renames error variant: Changes WorkflowStopped to WorkflowEvicted in the gasoline workflow engine
  2. Adds debug logging: Adds debug logs in pegboard-serverless and the TypeScript runner SDK

Code Quality & Best Practices

✅ Strengths

  • The rename from WorkflowStopped to WorkflowEvicted is more semantically accurate and aligns with the actual behavior
  • All references to the error variant are updated consistently across the codebase
  • New logging follows the project's structured logging conventions (e.g., tracing::debug!(%namespace_name, ...))
  • Uses existing reqwest client pool correctly: rivet_pools::reqwest::client_no_timeout().await?

⚠️ Considerations

1. Logging Consistency (TypeScript runner)

In engine/sdks/typescript/runner/src/mod.ts, some logs were changed to remove runnerId:

// Before: logger()?.info({ msg: "starting runner", runnerId: this.runnerId });
// After:  logger()?.info({ msg: "starting runner" });

However, the shutdown log still includes runnerId:

logger()?.info({ msg: "starting shutdown", runnerId: this.runnerId, immediate });

Recommendation: For consistency and debugging purposes, consider keeping runnerId in all logs or removing it from all logs. The runner ID is valuable context for distributed tracing.

2. Debugging Logs May Be Too Verbose

The added logs at lines 275-278 in mod.ts:

logger()?.debug({
    msg: "current listeners",
    listeners: process.listeners("SIGINT"),
});

This appears to be debugging code that logs the entire listener array. This might be:

  • Too verbose for production environments
  • Potentially exposing internal implementation details
  • A temporary debugging aid that should be removed

Recommendation: If this is temporary debugging code, consider removing it or wrapping it in a feature flag. If it's meant to stay, add a comment explaining why this information is useful.

3. Log Message Case Consistency

In mod.ts line 271, the log message uses title case:

msg: "added SIGTERM listeners"

Per the CLAUDE.md guidelines for Rust ("Log messages should be lowercase unless mentioning specific code symbols"), consider whether this applies to TypeScript as well for consistency.

Potential Bugs or Issues

Possible Breaking Change

The rename from WorkflowStopped to WorkflowEvicted could be a breaking change if:

  • Any external code matches on this error variant
  • Error messages are parsed or matched in logs/monitoring
  • The error string is used in user-facing messages

Recommendation: Verify that no external consumers depend on the WorkflowStopped error string or variant name.

Performance Considerations

✅ No significant performance concerns. The added debug logs are appropriately scoped and won't impact performance in production if debug logging is disabled.

Security Concerns

✅ No security concerns identified. The logs don't expose sensitive information.

Test Coverage

⚠️ Missing Test Updates

The codebase has a test file engine/packages/gasoline/tests/workflows/eviction_test.rs, but this PR doesn't update it. While the test might still pass (since it doesn't explicitly check error messages), consider:

Recommendation: Verify that tests still pass with the renamed error variant. Run:

cargo test -p gasoline eviction_test

Overall Assessment

This is a straightforward chore PR with mostly positive changes. The main improvements needed are:

  1. Decide on consistent runnerId logging in TypeScript
  2. Remove or justify the debug listener logging
  3. Verify no breaking changes from the error rename
  4. Confirm tests pass

Verdict: ✅ Approve with minor suggestions for improvement

@graphite-app
Copy link
Contributor

graphite-app bot commented Oct 24, 2025

Merge activity

  • Oct 24, 6:47 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Oct 24, 6:47 AM UTC: CI is running for this pull request on a draft pull request (#3257) due to your merge queue CI optimization settings.
  • Oct 24, 6:48 AM UTC: Merged by the Graphite merge queue via draft PR: #3257.

graphite-app bot pushed a commit that referenced this pull request Oct 24, 2025
@graphite-app graphite-app bot closed this Oct 24, 2025
@graphite-app graphite-app bot deleted the 10-22-chore_add_logs branch October 24, 2025 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants