fix(pb): rewrite runner wf to handle batch signals #3480
base: 11-17-fix_gas_fix_batch_listen_fix_history_for_graceful_signal_send_in_workflows
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Code Review: Rewrite runner workflow to handle batch signals

Summary
This PR introduces […]

Positive Aspects
Architecture & Design
Error Handling
Code Quality

Issues & Concerns

Critical Issues

1. Unused Import (line 11)
use vbare::OwnedVersionedData;
This import is not used anywhere in the file and should be removed. The project uses workspace dependencies, so unnecessary imports should be cleaned up.
Fix: Remove line 11

2. Potential Division by Zero (lines 620, 1057)
let remaining_millislots = (remaining_slots * 1000) / input.total_slots;
If […]
Recommendation: Add validation or use […]

3. Missing Signal Handler?
The […]
Recommendation: Consider if this is intentional or if the signal handling should be refactored for better compile-time safety

Performance Considerations

4. Sequential Signal Processing (lines 239-256)
// NOTE: This should not be parallelized because signals should be sent in order
// Forward to actor workflows
// Process events
for event in &events {
    // ... sends signal to actor workflow
}
While the comment explains this must be sequential, this could become a bottleneck with many events. Each signal send is an async operation that must complete before the next begins. Consider:
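
For the division flagged in issue 2 above, a minimal sketch of a guarded version is shown here. The helper name, the `u32` types, and the choice to return `Option` are assumptions made for illustration; the PR's actual types and error handling are not visible in this review.

```rust
/// Hypothetical helper: remaining capacity expressed in thousandths of the
/// total ("millislots"). Returns `None` instead of panicking when
/// `total_slots` is zero or when the multiplication would overflow.
fn remaining_millislots(remaining_slots: u32, total_slots: u32) -> Option<u32> {
    remaining_slots
        // Guard the multiplication against overflow.
        .checked_mul(1000)?
        // `checked_div` yields `None` for a zero divisor.
        .checked_div(total_slots)
}
```

A caller would then decide what a zero-slot runner means (skip it, treat it as full, or return an error) rather than letting the workflow panic.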

5. Sequential Allocation Signals (lines 315-321)
for alloc in res.allocations {
    ctx.signal(alloc.signal)
        .to_workflow::<crate::workflows::actor::Workflow>()
        .tag("actor_id", alloc.actor_id)
        .send()
        .await?;
}
Similar to #4, these allocations are sent sequentially but could potentially be parallelized since they're going to different actors.
Recommendation: Use […]

6. Message-by-Message Publishing (lines 1140-1147)
for message in &input.messages {
    let message_serialized = versioned::ToClient::wrap_latest(message.clone())
        .serialize_with_embedded_version(PROTOCOL_VERSION)?;
    ctx.ups()?
        .publish(&receiver_subject, &message_serialized, PublishOpts::one())
        .await?;
}
Each message is published individually in a loop. If the pubsub system supports batch publishing, this could be optimized.
Recommendation: Check if […]

Code Quality Issues

7. Commented Out Code (lines 36, 819-825)
// events: Vec<EventRow>,
// TODO: Storing events is disabled for now, otherwise state will grow indefinitely
While the TODO explains why events aren't stored, the commented code should either be removed or the decision should be finalized.
Recommendation: Either implement a proper solution (e.g., event truncation/archival) or remove the dead code

8. Deprecated Field (lines 455-456)
#[derive(Debug, Serialize, Deserialize)]
struct InitOutput {
    /// Deprecated.
    evict_workflow_id: Option<Id>,
}
If this field is deprecated, there should be a plan to remove it. Is this for backwards compatibility? Should it have a timeline for removal?
Recommendation: Add context on when this can be removed or if it's needed for backward compatibility

9. Clone Overhead (lines 70, 284)
let input = input.clone(); // line 70
commands: commands.clone(), // line 284
The input is cloned on every loop iteration. While […]
Recommendation: Consider if these clones are necessary or if references could be used

Testing & Documentation

10. No Tests Included
This is a significant rewrite (1176 lines) that changes core behavior from single-signal to batch-signal processing, but no tests are included in the PR.
Critical Recommendation: Add tests covering:
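
On the clone overhead raised in issue 9, one common option when a loop body needs owned access to immutable data is to wrap the data in an `Arc` once and clone the handle per iteration. The sketch below is only an illustration of that technique: `Input`, its fields, `handle_iteration`, and `run_loop` are hypothetical stand-ins, and whether this fits the workflow framework's closure requirements is not something the review shows.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the workflow input; the real type is not shown.
struct Input {
    runner_name: String,
    total_slots: u32,
}

/// Stand-in for per-iteration work that needs an owned handle to the input.
fn handle_iteration(input: Arc<Input>, iteration: u32) -> String {
    format!(
        "{} iteration {} of {} slots",
        input.runner_name, iteration, input.total_slots
    )
}

/// Wraps the input once, then hands each iteration a cheap `Arc` clone.
fn run_loop(input: Input, iterations: u32) -> Vec<String> {
    let shared = Arc::new(input);
    (0..iterations)
        // `Arc::clone` copies a pointer and bumps a reference count;
        // the `String` inside `Input` is not reallocated.
        .map(|i| handle_iteration(Arc::clone(&shared), i))
        .collect()
}
```

If the loop body only needs to read the input, passing `&Input` avoids even the reference-count update.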

11. Missing Migration Path
The file is named […]
Questions:
Recommendation: Document the migration strategy

Minor Issues

12. Inconsistent Naming (lines 427-442)
Recommendation:
impl Default for LifecycleState {
    fn default() -> Self { ... }
}

13. Magic Number (line 16)
const EVENT_ACK_BATCH_SIZE: i64 = 500;
Good use of a constant, but consider if this should be configurable via the config system like other thresholds.

Security Concerns

14. Unbounded Batch Size
While the batch size is limited to 1024 signals (line 77), each signal could contain variable-sized data (e.g., […]
Recommendation: Consider adding size-based limits in addition to count-based limits

15. Error Message Information Disclosure (lines 130-133)
bail!(
    "received message that should not be sent to runner workflow: {:?}",
    sig.inner
)
Logging the full message content in an error could potentially expose sensitive information.
Recommendation: Log message type only, not full content, or ensure sensitive fields are redacted

Performance Metrics
The batch processing approach should provide significant performance improvements:
Recommendation: Add metrics to track:
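
The size-based limit suggested in issue 14 could look roughly like the sketch below, which stops a batch at either a signal count or a byte budget. Everything here is hypothetical: `take_batch`, its parameters, and the representation of each signal as its serialized bytes are assumptions, since the review does not show how the workflow accumulates signals.

```rust
use std::collections::VecDeque;

/// Takes signals from the front of `pending` until adding another would
/// exceed `max_count` signals or `max_bytes` total payload bytes.
/// At least one signal is taken when any is pending and `max_count > 0`,
/// so a single oversized signal cannot stall the queue.
fn take_batch(
    pending: &mut VecDeque<Vec<u8>>,
    max_count: usize,
    max_bytes: usize,
) -> Vec<Vec<u8>> {
    let mut batch = Vec::new();
    let mut bytes = 0usize;
    while batch.len() < max_count {
        // Peek first so a signal that does not fit stays queued.
        let Some(next) = pending.front() else { break };
        let next_len = next.len();
        if !batch.is_empty() && bytes + next_len > max_bytes {
            break;
        }
        bytes += next_len;
        batch.push(pending.pop_front().expect("front was Some"));
    }
    batch
}
```

With the count limit of 1024 cited in the review passed as `max_count`, the byte budget would bound memory use per batch regardless of how large individual signals are.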

Summary & Recommendations
Must Fix Before Merge
Should Fix Before Merge
Consider for Future

Conclusion
This is a well-architected improvement to the runner workflow that should significantly improve performance through batch processing. The code quality is generally good with clear separation of concerns and proper error handling. However, the lack of tests for such a critical component is concerning, and the migration strategy needs clarification. With the critical issues addressed, this will be a solid improvement to the system. Great work on the batch processing implementation! The signal accumulation logic is particularly well done.

No description provided.