Skip to content

Conversation

@NathanFlurry
Copy link
Member

No description provided.

Copy link
Member Author

NathanFlurry commented Jan 22, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Jan 22, 2026

PR Review: feat(rivetkit): integrate workflows into actors

Summary

This PR integrates the workflow engine into RivetKit actors, allowing developers to use actor({ run: workflow(async ctx => { ... }) }) syntax. The implementation is well-architected with proper separation of concerns, comprehensive testing, and good adherence to the project's conventions.

Positive Highlights

Architecture & Design

  • Clean abstraction layers: ActorWorkflowDriver and ActorWorkflowContext provide excellent separation between workflow engine and actor runtime
  • Deterministic execution: The guard system preventing state access outside steps is clever and prevents non-deterministic bugs
  • Live mode integration: Using mode: "live" keeps workflows responsive to messages and alarms without constant polling
  • Comprehensive test coverage: Test fixtures cover basic operations, queue integration, and state guards

Code Quality

  • Type safety: Proper generic constraints maintain type safety across the stack
  • Error handling: Graceful error handling with proper cleanup (e.g., consumeGuardViolation())
  • Resource management: Good use of keepAwake() to prevent premature actor sleep during active operations

Issues & Concerns

Critical Issues

1. Inconsistent indentation in workflow/context.ts:59-64

// Current (mixed tabs/spaces):
	const stepConfig = nameOrConfig as StepConfig<T>;
	const config: StepConfig<T> = {
		...stepConfig,
		run: () => this.#withActorAccess(stepConfig.run),
	};

Issue: Line 59 uses spaces while the rest uses tabs. According to CLAUDE.md, hard tabs are required.

2. Potential race condition in QueueManager message deletion

queue-manager.ts:296-305 - Between loading messages and removal, new messages could arrive, making the index-based removal in #removeMessages potentially unsafe if other operations interleave.
Recommendation: Add a comment explaining the thread safety assumptions.

3. Empty catch block with unclear intent

workflow/context.ts:276-278 - While the comment explains intent, silently catching all errors could hide bugs (e.g., network errors during KV operations).
Recommendation: At minimum, log the error at debug level.

Major Issues

4. Memory leak potential in QueueManager listeners

queue-manager.ts:256-283 - The waitForNames implementation adds listeners but only cleans them up on resolution/rejection. If the promise never settles due to a bug, listeners accumulate.

5. Missing KV key validation

driver.ts:114-126 - The workflow driver directly constructs keys with makeWorkflowKey(key) but doesn't validate that keys stay within reasonable bounds or don't contain invalid characters.

6. Unbounded loop in sleep implementation

workflow-engine/index.ts:434-469 - If Date.now() never advances (clock issues, debugger), this could loop infinitely.
Recommendation: Add iteration limit or additional exit condition.

Minor Issues

  1. Magic number for worker poll interval (driver.ts:98) - Extract 100 to a named constant with documentation
  2. TODO.md in workflow-engine package - Should be tracked in issue tracker instead
  3. Linear search in message consumption (workflow-engine/storage.ts:379-384) - O(n) for each message, consider indexing by name for large queues

Performance Considerations

  1. Frequent KV operations - Workflow engine persists state after every operation. For high-throughput workflows, consider batching flushes.
  2. Queue message loading (queue-manager.ts:326-358) - Loads and decodes all messages every time. Consider caching for large queues.
  3. Message consumption - Linear search could be optimized with indexing for actors with many queued messages.

Security Concerns

  1. No size limits on workflow history - Could cause memory/storage exhaustion. Add configurable limits or pruning.
  2. CBOR deserialization - There is message size validation on enqueue (good), but additional runtime limits on decode could help.

Test Coverage Assessment

Strengths: Basic operations, queue integration, state guards, sleep/alarm behavior all tested

Gaps:

  • No tests for workflow rollback/error recovery
  • No tests for concurrent queue operations
  • No tests for very large queue sizes
  • Limited edge case coverage (clock skew, very long sleeps)

Recommendations by Priority

Must Fix Before Merge

  1. ✅ Fix indentation inconsistency (tabs vs spaces)
  2. ⚠️ Add error logging to empty catch blocks
  3. ⚠️ Document thread-safety assumptions in QueueManager

Should Fix Before Merge

  1. Add JSDoc to public APIs (workflow(), ActorWorkflowContext)
  2. Review and improve test coverage for rollback scenarios
  3. Add validation for workflow KV keys

Consider for Follow-up

  1. Performance optimization for large queues
  2. Workflow history pruning/limits
  3. Enhanced observability (metrics, traces)

Overall Assessment

Quality: High ⭐⭐⭐⭐☆

This is a well-designed and well-implemented feature that follows most of the project's conventions. The architecture is clean, the code is generally well-written, and the test coverage is reasonable. The critical issues are minor and easily fixable. The main areas for improvement are around error handling, performance optimization for scale, and additional test coverage for edge cases.

Recommendation: Approve with minor changes requested.


Review generated by Claude Code. Please verify all suggestions in context.

@NathanFlurry NathanFlurry force-pushed the 01-02-feat_rivetkit_workflows branch from 246d677 to a3fbdd7 Compare January 22, 2026 17:45
@NathanFlurry NathanFlurry force-pushed the 01-21-feat_rivetkit_integrate_workflows_in_to_actors branch from 2428e7f to d2ae4eb Compare January 22, 2026 17:45
@claude
Copy link

claude bot commented Jan 22, 2026

PR Review: feat(rivetkit): integrate workflows in to actors

This is a substantial and well-architected addition that integrates the workflow engine into RivetKit actors. The implementation follows the design spec closely and introduces a clean abstraction layer.


✅ Strengths

  1. Clean Architecture: The separation of concerns between ActorWorkflowDriver, ActorWorkflowContext, and the workflow() helper is well thought out.

  2. Deterministic Guards: The runtime guards preventing state access outside steps (via #allowActorAccess and #actorAccessDepth) are a good safety mechanism for maintaining workflow determinism.

  3. Comprehensive Test Coverage: The PR includes extensive test suites covering run handlers, workflows, queue integration, and edge cases.

  4. Documentation: The specs/workflow.md provides excellent context for the design decisions.


🔴 Critical Issues

1. Indentation Inconsistency (context.ts:59-64)

Lines 59-64 have incorrect indentation - missing else keyword.

2. Unsafe Type Casting (workflow.ts:304, 348, 369)

The fixtures use as any to access actor context properties which bypasses TypeScript type safety and makes tests fragile.

3. Missing Error Handling (workflow.ts:305-309)

Empty catch blocks silently swallow ALL errors, not just the expected guard violations. This could hide real bugs.

4. Race Condition in keepAwake (mod.ts:784-805)

If the actor is destroyed while a keepAwake promise is pending, the counter may not decrement properly, leading to leaked references.


⚠️ Significant Issues

5. Workflow Never Completes (mod.ts:80-82)

The never-resolving promise creates a memory leak if the workflow engine completes. While this matches the design intent, it should be well documented.

6. Queue Message Filtering (driver.ts:1790-1791)

The __workflow: prefix convention is fragile. Users could accidentally create conflicting queue names. Add validation to reject user queue names starting with __workflow:.

7. Incomplete Alarm Cleanup (driver.ts:1949-1952)

Alarms are set but never cleared, which could lead to spurious wake-ups after workflow completion.

8. Message Listener Memory Management (queue-manager.ts:910-912)

The #messageListeners set is never cleaned up when actors are destroyed, causing memory leaks. Add cleanup in the actor destroy/sleep lifecycle.


💡 Code Quality Suggestions

9. Guard Violation Tracking (context.ts:1703-1732)

The #markGuardTriggered() mechanism appears to be test infrastructure leaking into production code. Consider removing it or making it debug-only.

10. Unclear Abort Signal Semantics

The workflow context exposes both the inner workflow abort signal AND the actor abort signal. Document which one to use in which scenarios.

11. Inconsistent Error Types

Both actor abort and external abort signals reject with the same ActorAborted error, making it impossible to distinguish the cause.

12. Type Safety Violation (mod.ts:43-46)

Using symbols with type casting is fragile. Consider adding a typed helper method on RunContext to get the actor instance safely.


📋 Testing Observations

Positive:

  • Comprehensive coverage of basic workflows, loops, queues, and sleep
  • Good use of fixtures to separate test actors from production code
  • Tests verify both happy paths and error cases

Concerns:

  • Tests rely heavily on timing (waitFor) which can be flaky
  • No tests for concurrent workflow operations
  • Missing tests for workflow migration/versioning (mentioned in TODO.md)
  • No tests for memory cleanup during actor lifecycle transitions

🔧 Performance Considerations

  1. KV Operations: Every workflow step involves multiple KV reads/writes. This could become a bottleneck for workflows with many small steps.
  2. Queue Polling: The workerPollInterval = 100 means workflows check for messages every 100ms, which may waste resources.
  3. State Serialization: The workflow engine serializes/deserializes state on every operation. For large actor states, this could be expensive.

🛡️ Security Considerations

  1. Queue Name Injection: The WORKFLOW_QUEUE_PREFIX convention could allow users to inject malicious queue names.
  2. State Access Guards: While the guards prevent accidental non-determinism, a determined user could bypass them with type casts.
  3. Resource Exhaustion: No limits on workflow history size, which could grow unbounded for long-running workflows.

📚 Documentation Needs

  1. Add API documentation for the workflow() function
  2. Document the reserved __workflow: queue prefix
  3. Explain the relationship between run and onWake hooks
  4. Add migration guide for existing actors wanting to adopt workflows
  5. Document memory/performance characteristics

Summary

This is a solid foundation for workflow integration with good architectural decisions. The critical issues around indentation, type safety, and error handling should be addressed before merge. The design trade-offs (like never-resolving promises) are intentional but should be well documented.

Recommendation: Address critical issues, then merge. The warning-level issues can be tracked as follow-up improvements.


Review generated with context from CLAUDE.md and specs/workflow.md

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 22, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4006

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@4006

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4006

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4006

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4006

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4006

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4006

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4006

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4006

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4006

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4006

commit: 2bdce1c

@NathanFlurry NathanFlurry force-pushed the 01-02-feat_rivetkit_workflows branch from a3fbdd7 to b4b061f Compare January 24, 2026 23:35
@NathanFlurry NathanFlurry force-pushed the 01-21-feat_rivetkit_integrate_workflows_in_to_actors branch from d2ae4eb to 2bdce1c Compare January 24, 2026 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants