fix: add ready_chunks to worker bumps, bumps per tick metric #3855
Base branch: 01-12-fix_epoxy_add_metrics
Conversation
@rivetkit/cloudflare-workers
@rivetkit/db
@rivetkit/framework-base
@rivetkit/next-js
@rivetkit/react
rivetkit
@rivetkit/sql-loader
@rivetkit/virtual-websocket
@rivetkit/engine-runner
@rivetkit/engine-runner-protocol
commit:
PR Review: Add ready_chunks to worker bumps, bumps per tick metric

Summary

This PR introduces two improvements to the Gasoline worker's bump handling: bump notifications are now batched with ready_chunks so that a burst of bumps wakes the worker once instead of once per message, and a new histogram metric records how many bumps are processed per tick.
Positive Observations

✅ Good performance optimization: Using ready_chunks is an excellent way to coalesce multiple bump notifications into a single wake event. This prevents unnecessary tick processing when many bumps arrive in quick succession.

✅ Useful observability: The new histogram metric will provide valuable insight into bump batching behavior and help identify whether the chunk size of 1024 is appropriate.

✅ Proper histogram buckets: The bucket distribution (1, 2, 3, 4, 8, 16, 32, 64, 128, 256, 512, 1024) is well suited for tracking bump counts, with good granularity at lower values.

✅ Follows project conventions: Code style matches the existing patterns, structured logging is used properly, and the metric is registered correctly.

Issues and Concerns

1. Logic Error: Metric only recorded on Some, not None (Minor)

In worker.rs:112-118, the metric is only observed when bumps are received (Some(bumps)). The tick still runs when the subscription ends (the None case), after which the loop breaks immediately. This is likely fine, but worth noting that the metric won't capture the final state before shutdown.

2. Missing Logging (Minor)

The original code didn't log anything about bumps, and neither does the new code. Consider adding a debug log when a large batch is received to help with diagnostics.

3. Magic Number: 1024 chunk size (Minor)

The chunk size of 1024 is hardcoded without explanation. While this is likely fine for most cases, consider extracting it to a named constant and documenting why 1024 was chosen.
4. Ready Chunks Behavior Clarification (Documentation)

The comment on line 77 states "multiple bumps in a row should be processed as 1 bump," which is accurate. However, it's worth understanding how ready_chunks(1024) actually behaves:

- It returns as soon as at least one item is available, yielding whatever is immediately ready (up to 1024) without waiting for the chunk to fill.
- It only yields None once the underlying subscription stream has ended.
The comment is correct, but could be clearer about this immediate-return behavior.

Recommendations

Priority: Low
Code Quality Assessment
Verdict

LGTM with minor suggestions - This is a solid improvement that adds both performance optimization and observability. The issues noted are minor and mostly documentation-related. The code is ready to merge as-is, though the suggestions above would make it even better.

Review generated by Claude Code
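To make the bucket discussion above concrete, here is a minimal sketch of how such a histogram could be registered and observed with the prometheus crate. The metric name, the worker_id label, and the use of lazy_static are assumptions for illustration; the repository presumably wires this up through its own metrics module.

```rust
use lazy_static::lazy_static;
use prometheus::{register_histogram_vec, HistogramVec};

lazy_static! {
    // Assumed metric name and label; the buckets mirror the distribution quoted above.
    pub static ref WORKER_BUMPS_PER_TICK: HistogramVec = register_histogram_vec!(
        "worker_bumps_per_tick",
        "Number of bump messages coalesced into a single worker tick.",
        &["worker_id"],
        vec![1.0, 2.0, 3.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0, 1024.0]
    )
    .unwrap();
}

fn main() {
    // Record a tick that coalesced three bumps for a hypothetical worker.
    WORKER_BUMPS_PER_TICK
        .with_label_values(&["worker-123"])
        .observe(3.0);
}
```

Ticks that hit the 1024 cap land in the top finite bucket, which is exactly the signal that would justify revisiting the chunk size.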
Pull Request Review: Worker Bumps Optimization

Summary

This PR adds ready_chunks batching to the worker's bump subscription and a histogram metric that tracks how many bumps are processed per tick.

✅ Positive Aspects
🔍 Issues & Concerns1. Hard-coded Magic Number (Medium Priority).ready_chunks(1024);The chunk size of 1024 is hard-coded without explanation. Consider: Recommendation: Extract to a named constant with documentation: /// Maximum number of bump messages to batch per tick.
/// Larger batches reduce overhead but may increase latency for early bumps.
const WORKER_BUMP_BATCH_SIZE: usize = 1024;
let mut bump_sub = self
.db
.bump_sub(BumpSubSubject::Worker)
.await?
.ready_chunks(WORKER_BUMP_BATCH_SIZE);Rationale needed: Why 1024? Is this based on:
2. Unclear Comment (Low Priority)

```rust
// We use ready_chunks because multiple bumps in a row should be processed as 1 bump
```

This comment could be more helpful. Suggested improvement:

```rust
// Batch bump notifications to reduce redundant workflow pulls. When many workflows
// complete simultaneously, we receive multiple bump messages, but pull_workflows()
// fetches all awake workflows regardless. Batching up to 1024 bumps (or until none
// are immediately ready) reduces unnecessary wake-ups and database queries.
```

This explains what is being batched, why batching is safe, and where the 1024 limit comes from.
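As a side note on the "or until none are immediately ready" wording, the following self-contained sketch (using the futures crate directly rather than the worker's actual types) shows that ready_chunks drains whatever is already buffered into a single chunk instead of waking the consumer once per item:

```rust
use futures::executor::block_on;
use futures::stream::{self, StreamExt};

fn main() {
    block_on(async {
        // All five items are immediately ready, so ready_chunks(1024) delivers
        // them as one chunk rather than producing five separate wake-ups.
        let mut chunks = stream::iter(0..5).ready_chunks(1024);
        assert_eq!(chunks.next().await, Some(vec![0, 1, 2, 3, 4]));

        // Once the underlying stream ends, the chunked stream yields None,
        // which is the case the worker loop treats as shutdown.
        assert_eq!(chunks.next().await, None);
    });
}
```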
3. Potential Latency Consideration (Info/Question)
The behavior seems correct, but it is worth understanding the edge case where bumps arrive faster than ticks can process them.

4. Metric Allocation (Low Priority)

```rust
metrics::WORKER_BUMPS_PER_TICK
    .with_label_values(&[self.worker_id.to_string().as_str()])
    .observe(bumps.len() as f64);
```

Minor optimization opportunity: the worker ID is converted to a String on every observation; it could be converted once and reused:

```rust
// In the Worker struct or earlier in run()
let worker_id_str = self.worker_id.to_string();

// Then use:
metrics::WORKER_BUMPS_PER_TICK
    .with_label_values(&[&worker_id_str])
    .observe(bumps.len() as f64);
```

This is very minor since bumps likely aren't that frequent, but worth noting.

🧪 Testing Recommendations

While this is a straightforward change, consider exercising it under a burst of bumps to confirm that batches larger than one are actually produced and recorded by the new metric.
📊 Observability

The histogram will show how many bumps are typically coalesced into a single tick and whether the 1024 cap is ever reached.
Consider adding a dashboard panel for this metric to monitor the optimization's effectiveness.

🎯 Summary Rating

Code Quality: 8/10 - Solid implementation, minor magic number issue

Recommended Actions Before Merge

Must fix: none; the change is functionally correct.

Should consider: extracting the 1024 chunk size to a documented constant and expanding the batching comment, per issues 1 and 2 above.
Overall: This is a good optimization with proper observability. The main improvements needed are documentation clarity around the magic number and the batching rationale. The code is functionally correct and the optimization strategy is sound. ✅
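For readers without the diff open, here is a rough, self-contained sketch of the kind of loop the reviews appear to be describing, assuming the futures and tokio crates (with the time and macros features). Every name in it (run_worker, the five-second tick interval, the plain unit bump items) is illustrative rather than taken from the actual Gasoline worker; it only shows where ready_chunks, the per-tick metric observation, and the None shutdown case fit together.

```rust
use std::time::Duration;

use futures::stream::{Stream, StreamExt};
use tokio::time::interval;

// Illustrative stand-in: the real worker subscribes to bump messages from its
// database layer; here the bumps are plain unit values.
async fn run_worker(bump_sub: impl Stream<Item = ()> + Unpin) {
    // Coalesce bursts of bump notifications into a single wake-up.
    let mut bump_chunks = bump_sub.ready_chunks(1024);
    let mut tick_interval = interval(Duration::from_secs(5));

    loop {
        tokio::select! {
            // Periodic tick regardless of whether bumps arrived.
            _ = tick_interval.tick() => {}
            chunk = bump_chunks.next() => {
                match chunk {
                    // One wake-up for the whole batch; this is where the
                    // bumps-per-tick histogram would be observed.
                    Some(bumps) => println!("bumps this tick: {}", bumps.len()),
                    // Subscription closed: stop the worker loop.
                    None => break,
                }
            }
        }

        // Pull and run awake workflows here (elided).
    }
}
```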
No description provided.