Conversation

@MasterPtato (Contributor)

No description provided.

@vercel

vercel bot commented Nov 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments

Project               Deployment   Preview   Comments   Updated (UTC)
rivet-cloud           Ignored      Ignored              Nov 18, 2025 2:30am
rivet-inspector       Ignored      Ignored   Preview    Nov 18, 2025 2:30am
rivet-site            Ignored      Ignored   Preview    Nov 18, 2025 2:30am
rivetkit-serverless   Skipped      Skipped              Nov 18, 2025 2:30am

@MasterPtato (Contributor, Author) commented Nov 18, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Nov 18, 2025

Pull Request Review: Batch Listen Feature

This PR introduces batch signal listening to the workflow engine, allowing workflows to receive multiple signals in a single operation. This is a significant enhancement that can improve throughput and reduce per-signal overhead.

Summary

Changes: 1,215 additions, 732 deletions across 16 files
Core Feature: Transforms single-signal listening into batch-based signal processing


Positive Aspects

Architecture & Design

  • Well-structured approach: The batch listening is implemented at multiple layers (trait, context, database) with proper abstraction
  • Backwards compatibility: The single listen() method is preserved and now delegates to listen_n() with limit=1, maintaining existing API contracts
  • Consistent pattern: Both listen_with_timeout and listen_until variants support batch operations

Code Quality

  • Type safety: The Listen trait now returns Vec<Self>, enforcing that all signal types support batch operations
  • History tracking: New SignalsEvent properly tracks multiple signals with coordinated signal_ids, names, and bodies vectors
  • Error handling: Maintains existing error patterns while adapting to batch scenarios

Critical Issues

1. Breaking Change: Macro-generated Listen Implementation

Location: engine/packages/gasoline-macros/src/lib.rs:375-381

The #[signal] macro now generates a Listen implementation that returns Vec<Self>:

```rust
async fn listen(ctx: &mut gas::prelude::ListenCtx, limit: usize) -> gas::prelude::WorkflowResult<Vec<Self>> {
    ctx
        .listen_any(&[<Self as gas::signal::Signal>::NAME], limit)
        .await?
        .into_iter()
        .map(|signal| Self::parse(&signal.signal_name, &signal.body))
        .collect()
}
```

Issue: For a single signal type, this returns a Vec containing at most limit signals of that type. If only one signal is available, the caller receives a one-element Vec, so consumers must now handle the vector shape explicitly.

Impact: All workflows using ctx.listen::<SomeSignal>() will need updates, or you need migration logic.

Recommendation: Document this breaking change clearly and consider:

  • Providing a migration guide for existing workflows (a sketch follows below)
  • Adding deprecation warnings if keeping old single-signal methods temporarily
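
For illustration, a hedged call-site sketch of the migration (MySignal and the limit value 10 are hypothetical; listen_n is the batch method this PR adds):

```rust
// before: single-signal listen (still supported; delegates to listen_n(1))
let sig: MySignal = ctx.listen::<MySignal>().await?;

// after: opt in to batching; a Vec comes back even when only one signal is available
let sigs: Vec<MySignal> = ctx.listen_n::<MySignal>(10).await?;
for sig in sigs {
    // handle each signal in arrival order
}
```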

2. Inconsistent Limit Behavior

Location: engine/packages/gasoline/src/ctx/listen.rs:40-49

```rust
pub async fn listen_any(
    &mut self,
    signal_names: &[&'static str],
    limit: usize,
) -> WorkflowResult<Vec<SignalData>> {
    // ...
    if signals.is_empty() {
        return Err(WorkflowError::NoSignalFound(Box::from(signal_names)));
    }
    // ...
}
```

Issue: The function fetches up to limit signals across ALL signal names, but the limit is applied per signal name in the database layer (line 1953 in db/kv/mod.rs). This creates ambiguity:

  • Does limit=5 mean 5 total signals, or 5 per signal type?
  • The current implementation fetches up to limit per signal type, then flattens them

Example Problem:

```rust
// Listening for 2 signal types with limit=5
// could return up to 10 signals (5 per type)!
ctx.listen_any(&["signal_a", "signal_b"], 5).await?;
```

Recommendation:

  • Clarify the limit semantics in documentation
  • Consider renaming the parameter to limit_per_type, or enforce a global limit by taking only the first N signals after flattening (see the sketch below)
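
A minimal sketch of the second option. SignalData is stubbed here, and using create_ts as the cross-type ordering key is an assumption, not the real gasoline field:

```rust
// sketch: enforce a global cap after flattening the per-type batches
#[derive(Clone)]
pub struct SignalData {
    pub create_ts: i64, // assumed ordering key, following the codebase's timestamp naming
}

pub fn flatten_with_global_limit(
    per_type_batches: Vec<Vec<SignalData>>,
    limit: usize,
) -> Vec<SignalData> {
    let mut all: Vec<SignalData> = per_type_batches.into_iter().flatten().collect();
    all.sort_by_key(|s| s.create_ts); // keep oldest-first ordering across signal types
    all.truncate(limit); // global cap: at most `limit` signals total
    all
}
```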

3. Missing Signal ID Tracking

Location: engine/packages/gasoline/src/history/event.rs:210-214

```rust
pub struct SignalsEvent {
    pub names: Vec<String>,
    pub bodies: Vec<Box<serde_json::value::RawValue>>,
}
```

Issue: The public SignalsEvent struct in the event history doesn't include signal_ids, but the debug version does (line 1535 in db/kv/debug.rs). This makes it impossible to:

  • Correlate signals to their original IDs in user code
  • Debug which specific signals were processed
  • Implement proper idempotency checks

Recommendation: Add pub signal_ids: Vec<Id> to the public SignalsEvent struct.
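
A sketch of the suggested shape, keeping the three vectors index-aligned (mirroring the debug struct; field placement is illustrative):

```rust
pub struct SignalsEvent {
    pub signal_ids: Vec<Id>, // parallel to `names` and `bodies`
    pub names: Vec<String>,
    pub bodies: Vec<Box<serde_json::value::RawValue>>,
}
```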

4. Data Consistency Risk in History

Location: engine/packages/gasoline/src/db/kv/debug.rs:1542-1577

The TryFrom<WorkflowHistoryEventBuilder> for SignalsEvent checks that vectors are non-empty but doesn't validate they have the same length:

```rust
Ok(SignalsEvent {
    signal_ids: if value.indexed_signal_ids.is_empty() {
        return Err(WorkflowError::MissingEventData("signal_id"));
    } else {
        value.indexed_signal_ids
    },
    names: if value.indexed_names.is_empty() {
        return Err(WorkflowError::MissingEventData("name"));
    } else {
        value.indexed_names
    },
    bodies: if value.indexed_input_chunks.is_empty() {
        // ...
    }
})
```

Issue: If the vectors have different lengths (due to corruption or partial writes), this could cause panics or incorrect signal matching.

Recommendation: Add validation:

```rust
ensure!(
    value.indexed_signal_ids.len() == value.indexed_names.len()
        && value.indexed_names.len() == value.indexed_input_chunks.len(),
    "signal event data vectors must have equal length"
);
```

Moderate Issues

5. Error Context Loss

Location: engine/packages/gasoline/src/ctx/workflow.rs:707

```rust
pub async fn listen<T: Listen>(&mut self) -> Result<T> {
    let signals = self.listen_n::<T>(1).in_current_span().await?;
    signals
        .into_iter()
        .next()
        .context("must return at least 1 signal")
}
```

Issue: The error message "must return at least 1 signal" is generic and doesn't indicate which signal type was expected. This makes debugging harder.

Recommendation:

```rust
.with_context(|| format!("must return at least 1 signal for type {}", std::any::type_name::<T>()))
```

6. Metrics Removed Without Explanation

Location: engine/packages/gasoline/src/ctx/listen.rs (removed lines 28-42)

The SIGNAL_PULL_DURATION metric was removed, but it provided valuable insight into database performance.

Recommendation: Either restore the metric with batch-aware labels, or document why it was removed in the commit message.
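
If restoring it, a sketch of a batch-aware variant assuming a prometheus-style facade (the metric name, label, and registration macro are illustrative; the real codebase may use a different metrics layer):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};

// batch-aware replacement for the removed metric (names are illustrative)
static SIGNAL_PULL_DURATION: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "signal_pull_duration_seconds",
        "duration of signal pulls from the database",
        &["batch_size_bucket"] // label batches so large pulls are visible
    )
    .expect("failed to register metric")
});
```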

7. Unused Loop Location Parameter

Location: engine/packages/gasoline/src/db/kv/mod.rs:1922

```rust
_loop_location: Option<&Location>,
```

Issue: The parameter is prefixed with _, indicating it's unused. If it's not needed, remove it from the trait definition.

Recommendation: Clean up unused parameters or document why they're reserved for future use.


Minor Issues & Suggestions

8. Display Implementation Inefficiency

Location: engine/packages/gasoline/src/history/event.rs:77-83

```rust
EventData::Signals(signals) => {
    let mut unique_names = signals.names.clone();
    unique_names.sort();
    unique_names.dedup();
    write!(f, "signals {:?}", unique_names.join(", "))
}
```

Suggestion: This clones the full names vector, then sorts and dedups it, on every Display call. Consider collecting into a BTreeSet (swapped in here for a HashSet so the sorted output is preserved) or caching the result if it is displayed frequently.
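
A minimal sketch of the BTreeSet approach as a standalone helper (unique_signal_names is a hypothetical name):

```rust
use std::collections::BTreeSet;

// sorted, deduped names over borrowed &str, avoiding the clone + sort + dedup
fn unique_signal_names(names: &[String]) -> String {
    let unique: BTreeSet<&str> = names.iter().map(String::as_str).collect();
    unique.into_iter().collect::<Vec<_>>().join(", ")
}
```

The match arm would then become write!(f, "signals {:?}", unique_signal_names(&signals.names)).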

9. Naming: last_attempt vs last_try

Location: engine/packages/gasoline/src/ctx/listen.rs:17

The parameter was renamed from last_try to last_attempt. The new name is clearer; just make sure all documentation references are updated to match.

10. Test Coverage

Observation: No new test files were added for the batch listening functionality.

Recommendation: Add tests for the following (a minimal sketch follows the list):

  • Batch signal reception with multiple signals
  • Limit behavior with various signal counts
  • History replay with SignalsEvent
  • Edge cases (empty results, limit=0, etc.)
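
As a starting point, a self-contained sketch of a limit-semantics test. The types are simplified stubs echoing the global-limit sketch from issue 2, not the real gasoline API:

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct SignalData {
    pub name: &'static str,
}

// stub of the global-limit flattening proposed in issue 2
pub fn flatten_with_global_limit(batches: Vec<Vec<SignalData>>, limit: usize) -> Vec<SignalData> {
    let mut all: Vec<SignalData> = batches.into_iter().flatten().collect();
    all.truncate(limit);
    all
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn limit_is_global_across_signal_types() {
        let batches = vec![
            vec![SignalData { name: "signal_a" }; 5],
            vec![SignalData { name: "signal_b" }; 5],
        ];
        // with a global limit of 5, two 5-element batches must not yield 10 signals
        assert_eq!(flatten_with_global_limit(batches, 5).len(), 5);
    }

    #[test]
    fn zero_limit_yields_empty_result() {
        let batches = vec![vec![SignalData { name: "signal_a" }; 3]];
        assert!(flatten_with_global_limit(batches, 0).is_empty());
    }
}
```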

Security Considerations

11. Resource Exhaustion Risk

Location: Database signal fetching logic

Issue: The limit parameter is user-controlled (via listen_n()). If a workflow requests limit=1000000, it could:

  • Consume excessive memory
  • Cause database performance issues
  • Enable DoS attacks

Recommendation:

  • Add a maximum limit constant (e.g., MAX_BATCH_SIZE = 100); a sketch follows
  • Document the reasonable limit range
  • Consider adding metrics for large batch requests
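
A minimal guard along those lines (MAX_BATCH_SIZE and the clamp site are illustrative assumptions):

```rust
/// illustrative upper bound; the right value depends on signal body sizes
const MAX_BATCH_SIZE: usize = 100;

/// clamp user-provided limits before they reach the database layer
fn clamp_batch_limit(requested: usize) -> usize {
    // raise 0 to 1 and cap runaway values like limit=1000000
    requested.clamp(1, MAX_BATCH_SIZE)
}
```

listen_n could apply this before building the database query, turning an oversized request into a bounded fetch.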

Performance Considerations

Positive

  • Batch operations reduce database round-trips
  • Parallel signal stream fetching (lines 1941-1972 in db/kv/mod.rs) is well designed

Potential Concerns

  • Multiple signal types with large limits could cause memory spikes
  • No pagination for very large signal backlogs

Documentation & Style

Per CLAUDE.md:

  • ✅ Uses structured logging correctly (e.g., tracing::info!(?x, "foo"))
  • ✅ Lowercase log messages
  • ✅ Proper error handling with anyhow::Result
  • ✅ Follows timestamp naming (create_ts)
  • ❌ Could use more inline documentation for complex functions like pull_next_signals

Recommendations

High Priority

  1. Clarify limit semantics - Document whether limit is per-signal-type or total
  2. Validate vector lengths in SignalsEvent deserialization
  3. Add signal_ids to public API for better debugging and idempotency
  4. Add integration tests for batch signal handling
  5. Document breaking changes for workflows using the old API

Medium Priority

  1. Restore or replace removed metrics
  2. Add maximum limit validation to prevent resource exhaustion
  3. Improve error messages with signal type information
  4. Clean up unused parameters

Low Priority

  1. Optimize Display implementation
  2. Add inline documentation for complex database operations

Conclusion

This is a well-architected feature that adds valuable batch processing capabilities to the workflow engine. The implementation is mostly solid with good separation of concerns and proper abstraction layers.

However, there are critical issues around limit semantics and data consistency that should be addressed before merging. The breaking API changes also need clear documentation and migration guidance.

Overall Assessment: Approve with requested changes

Recommendation: Address critical issues #1-4, then merge; the remaining issues can be handled in follow-up PRs.


🤖 Generated with Claude Code
