Conversation

@MasterPtato (Contributor)

No description provided.

@vercel

vercel bot commented Nov 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments

Project               Deployment   Preview   Comments   Updated (UTC)
rivet-cloud           Ignored      Ignored              Nov 18, 2025 2:30am
rivet-inspector       Ignored      Ignored   Preview    Nov 18, 2025 2:30am
rivet-site            Ignored      Ignored   Preview    Nov 18, 2025 2:30am
rivetkit-serverless   Skipped      Skipped              Nov 18, 2025 2:30am

@MasterPtato (Contributor, Author) commented Nov 18, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Nov 18, 2025

Pull Request Review: Batch Listen Feature

This PR introduces batch signal listening to the workflow engine, allowing workflows to receive multiple signals in a single operation. This is a significant enhancement that can improve throughput and reduce per-signal overhead.

Summary

Changes: 1,215 additions, 732 deletions across 16 files
Core Feature: Transforms single-signal listening into batch-based signal processing


Positive Aspects

Architecture & Design

  • Well-structured approach: The batch listening is implemented at multiple layers (trait, context, database) with proper abstraction
  • Backwards compatibility: The single listen() method is preserved and now delegates to listen_n() with limit=1, maintaining existing API contracts
  • Consistent pattern: Both listen_with_timeout and listen_until variants support batch operations

Code Quality

  • Type safety: The Listen trait now returns Vec<Self>, enforcing that all signal types support batch operations
  • History tracking: New SignalsEvent properly tracks multiple signals with coordinated signal_ids, names, and bodies vectors
  • Error handling: Maintains existing error patterns while adapting to batch scenarios

Critical Issues

1. Breaking Change: Macro-generated Listen Implementation

Location: engine/packages/gasoline-macros/src/lib.rs:375-381

The #[signal] macro now generates a Listen implementation that returns Vec<Self>:

```rust
async fn listen(ctx: &mut gas::prelude::ListenCtx, limit: usize) -> gas::prelude::WorkflowResult<Vec<Self>> {
    ctx
        .listen_any(&[<Self as gas::signal::Signal>::NAME], limit)
        .await?
        .into_iter()
        .map(|signal| Self::parse(&signal.signal_name, &signal.body))
        .collect()
}
```

Issue: For a single signal type, this returns a Vec containing at most limit signals of that type. If only one signal is available, the caller receives a one-element Vec, so consumers must now handle the vector shape explicitly.

Impact: All workflows using ctx.listen::<SomeSignal>() will need updates, or you need migration logic.

Recommendation: Document this breaking change clearly and consider:

  • Providing a migration guide for existing workflows (a sketch follows below)
  • Adding deprecation warnings if keeping old single-signal methods temporarily
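
For illustration, a hedged call-site sketch of the migration (MySignal and the limit value 10 are hypothetical; listen_n is the batch method this PR adds):

```rust
// before: single-signal listen (still supported; delegates to listen_n(1))
let sig: MySignal = ctx.listen::<MySignal>().await?;

// after: opt in to batching; a Vec comes back even when only one signal is available
let sigs: Vec<MySignal> = ctx.listen_n::<MySignal>(10).await?;
for sig in sigs {
    // handle each signal in arrival order
}
```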

2. Inconsistent Limit Behavior

Location: engine/packages/gasoline/src/ctx/listen.rs:40-49

```rust
pub async fn listen_any(
    &mut self,
    signal_names: &[&'static str],
    limit: usize,
) -> WorkflowResult<Vec<SignalData>> {
    // ...
    if signals.is_empty() {
        return Err(WorkflowError::NoSignalFound(Box::from(signal_names)));
    }
    // ...
}
```

Issue: The function fetches up to limit signals across ALL signal names, but the limit is applied per signal name in the database layer (line 1953 in db/kv/mod.rs). This creates ambiguity:

  • Does limit=5 mean 5 total signals, or 5 per signal type?
  • The current implementation fetches up to limit per signal type, then flattens them

Example Problem:

```rust
// Listening for 2 signal types with limit=5
// could return up to 10 signals (5 per type)!
ctx.listen_any(&["signal_a", "signal_b"], 5).await?;
```

Recommendation:

  • Clarify the limit semantics in documentation
  • Consider renaming the parameter to limit_per_type, or enforce a global limit by taking only the first N signals after flattening (see the sketch below)
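
A minimal sketch of the second option. SignalData is stubbed here, and using create_ts as the cross-type ordering key is an assumption, not the real gasoline field:

```rust
// sketch: enforce a global cap after flattening the per-type batches
#[derive(Clone)]
pub struct SignalData {
    pub create_ts: i64, // assumed ordering key, following the codebase's timestamp naming
}

pub fn flatten_with_global_limit(
    per_type_batches: Vec<Vec<SignalData>>,
    limit: usize,
) -> Vec<SignalData> {
    let mut all: Vec<SignalData> = per_type_batches.into_iter().flatten().collect();
    all.sort_by_key(|s| s.create_ts); // keep oldest-first ordering across signal types
    all.truncate(limit); // global cap: at most `limit` signals total
    all
}
```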

3. Missing Signal ID Tracking

Location: engine/packages/gasoline/src/history/event.rs:210-214

```rust
pub struct SignalsEvent {
    pub names: Vec<String>,
    pub bodies: Vec<Box<serde_json::value::RawValue>>,
}
```

Issue: The public SignalsEvent struct in the event history doesn't include signal_ids, but the debug version does (line 1535 in db/kv/debug.rs). This makes it impossible to:

  • Correlate signals to their original IDs in user code
  • Debug which specific signals were processed
  • Implement proper idempotency checks

Recommendation: Add pub signal_ids: Vec<Id> to the public SignalsEvent struct.
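
A sketch of the suggested shape, keeping the three vectors index-aligned (mirroring the debug struct; field placement is illustrative):

```rust
pub struct SignalsEvent {
    pub signal_ids: Vec<Id>, // parallel to `names` and `bodies`
    pub names: Vec<String>,
    pub bodies: Vec<Box<serde_json::value::RawValue>>,
}
```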

4. Data Consistency Risk in History

Location: engine/packages/gasoline/src/db/kv/debug.rs:1542-1577

The TryFrom<WorkflowHistoryEventBuilder> for SignalsEvent checks that vectors are non-empty but doesn't validate they have the same length:

```rust
Ok(SignalsEvent {
    signal_ids: if value.indexed_signal_ids.is_empty() {
        return Err(WorkflowError::MissingEventData("signal_id"));
    } else {
        value.indexed_signal_ids
    },
    names: if value.indexed_names.is_empty() {
        return Err(WorkflowError::MissingEventData("name"));
    } else {
        value.indexed_names
    },
    bodies: if value.indexed_input_chunks.is_empty() {
        // ...
    }
})
```

Issue: If the vectors have different lengths (due to corruption or partial writes), this could cause panics or incorrect signal matching.

Recommendation: Add validation:

```rust
ensure!(
    value.indexed_signal_ids.len() == value.indexed_names.len()
        && value.indexed_names.len() == value.indexed_input_chunks.len(),
    "signal event data vectors must have equal length"
);
```

Moderate Issues

5. Error Context Loss

Location: engine/packages/gasoline/src/ctx/workflow.rs:707

```rust
pub async fn listen<T: Listen>(&mut self) -> Result<T> {
    let signals = self.listen_n::<T>(1).in_current_span().await?;
    signals
        .into_iter()
        .next()
        .context("must return at least 1 signal")
}
```

Issue: The error message "must return at least 1 signal" is generic and doesn't indicate which signal type was expected. This makes debugging harder.

Recommendation:

```rust
.with_context(|| format!("must return at least 1 signal for type {}", std::any::type_name::<T>()))
```

6. Metrics Removed Without Explanation

Location: engine/packages/gasoline/src/ctx/listen.rs (removed lines 28-42)

The SIGNAL_PULL_DURATION metric was removed, but it provided valuable insight into database performance.

Recommendation: Either restore the metric with batch-aware labels, or document why it was removed in the commit message.
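
If restoring it, a sketch of a batch-aware variant assuming a prometheus-style facade (the metric name, label, and registration macro are illustrative; the real codebase may use a different metrics layer):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};

// batch-aware replacement for the removed metric (names are illustrative)
static SIGNAL_PULL_DURATION: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "signal_pull_duration_seconds",
        "duration of signal pulls from the database",
        &["batch_size_bucket"] // label batches so large pulls are visible
    )
    .expect("failed to register metric")
});
```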

7. Unused Loop Location Parameter

Location: engine/packages/gasoline/src/db/kv/mod.rs:1922

```rust
_loop_location: Option<&Location>,
```

Issue: The parameter is prefixed with _, indicating it's unused. If it's not needed, remove it from the trait definition.

Recommendation: Clean up unused parameters or document why they're reserved for future use.


Minor Issues & Suggestions

8. Display Implementation Inefficiency

Location: engine/packages/gasoline/src/history/event.rs:77-83

```rust
EventData::Signals(signals) => {
    let mut unique_names = signals.names.clone();
    unique_names.sort();
    unique_names.dedup();
    write!(f, "signals {:?}", unique_names.join(", "))
}
```

Suggestion: This clones the full names vector, then sorts and dedups it, on every Display call. Consider collecting into a BTreeSet (swapped in here for a HashSet so the sorted output is preserved) or caching the result if it is displayed frequently.
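
A minimal sketch of the BTreeSet approach as a standalone helper (unique_signal_names is a hypothetical name):

```rust
use std::collections::BTreeSet;

// sorted, deduped names over borrowed &str, avoiding the clone + sort + dedup
fn unique_signal_names(names: &[String]) -> String {
    let unique: BTreeSet<&str> = names.iter().map(String::as_str).collect();
    unique.into_iter().collect::<Vec<_>>().join(", ")
}
```

The match arm would then become write!(f, "signals {:?}", unique_signal_names(&signals.names)).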

9. Naming: last_attempt vs last_try

Location: engine/packages/gasoline/src/ctx/listen.rs:17

The parameter was renamed from last_try to last_attempt. The new name is clearer; just make sure all documentation references are updated to match.

10. Test Coverage

Observation: No new test files were added for the batch listening functionality.

Recommendation: Add tests for the following (a minimal sketch follows the list):

  • Batch signal reception with multiple signals
  • Limit behavior with various signal counts
  • History replay with SignalsEvent
  • Edge cases (empty results, limit=0, etc.)
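
As a starting point, a self-contained sketch of a limit-semantics test. The types are simplified stubs echoing the global-limit sketch from issue 2, not the real gasoline API:

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct SignalData {
    pub name: &'static str,
}

// stub of the global-limit flattening proposed in issue 2
pub fn flatten_with_global_limit(batches: Vec<Vec<SignalData>>, limit: usize) -> Vec<SignalData> {
    let mut all: Vec<SignalData> = batches.into_iter().flatten().collect();
    all.truncate(limit);
    all
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn limit_is_global_across_signal_types() {
        let batches = vec![
            vec![SignalData { name: "signal_a" }; 5],
            vec![SignalData { name: "signal_b" }; 5],
        ];
        // with a global limit of 5, two 5-element batches must not yield 10 signals
        assert_eq!(flatten_with_global_limit(batches, 5).len(), 5);
    }

    #[test]
    fn zero_limit_yields_empty_result() {
        let batches = vec![vec![SignalData { name: "signal_a" }; 3]];
        assert!(flatten_with_global_limit(batches, 0).is_empty());
    }
}
```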

Security Considerations

11. Resource Exhaustion Risk

Location: Database signal fetching logic

Issue: The limit parameter is user-controlled (via listen_n()). If a workflow requests limit=1000000, it could:

  • Consume excessive memory
  • Cause database performance issues
  • Enable DoS attacks

Recommendation:

  • Add a maximum limit constant (e.g., MAX_BATCH_SIZE = 100); a sketch follows
  • Document the reasonable limit range
  • Consider adding metrics for large batch requests
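
A minimal guard along those lines (MAX_BATCH_SIZE and the clamp site are illustrative assumptions):

```rust
/// illustrative upper bound; the right value depends on signal body sizes
const MAX_BATCH_SIZE: usize = 100;

/// clamp user-provided limits before they reach the database layer
fn clamp_batch_limit(requested: usize) -> usize {
    // raise 0 to 1 and cap runaway values like limit=1000000
    requested.clamp(1, MAX_BATCH_SIZE)
}
```

listen_n could apply this before building the database query, turning an oversized request into a bounded fetch.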

Performance Considerations

Positive

  • Batch operations reduce database round-trips
  • Parallel signal stream fetching (lines 1941-1972 in db/kv/mod.rs) is well designed

Potential Concerns

  • Multiple signal types with large limits could cause memory spikes
  • No pagination for very large signal backlogs

Documentation & Style

Per CLAUDE.md:

  • ✅ Uses structured logging correctly (e.g., tracing::info!(?x, "foo"))
  • ✅ Lowercase log messages
  • ✅ Proper error handling with anyhow::Result
  • ✅ Follows timestamp naming (create_ts)
  • ❌ Could use more inline documentation for complex functions like pull_next_signals

Recommendations

High Priority

  1. Clarify limit semantics - Document whether limit is per-signal-type or total
  2. Validate vector lengths in SignalsEvent deserialization
  3. Add signal_ids to public API for better debugging and idempotency
  4. Add integration tests for batch signal handling
  5. Document breaking changes for workflows using the old API

Medium Priority

  1. Restore or replace removed metrics
  2. Add maximum limit validation to prevent resource exhaustion
  3. Improve error messages with signal type information
  4. Clean up unused parameters

Low Priority

  1. Optimize Display implementation
  2. Add inline documentation for complex database operations

Conclusion

This is a well-architected feature that adds valuable batch processing capabilities to the workflow engine. The implementation is mostly solid with good separation of concerns and proper abstraction layers.

However, there are critical issues around limit semantics and data consistency that should be addressed before merging. The breaking API changes also need clear documentation and migration guidance.

Overall Assessment: Approve with requested changes

Recommendation: Address critical issues #1-4, then merge; the remaining issues can be handled in follow-up PRs.


🤖 Generated with Claude Code
