fix(gas): add large txn retries for pulling signals, tighten commit interval for actors#4712

Open
MasterPtato wants to merge 1 commit into 04-22-fix_improve_epoxy_propose_errors_add_gcp_logging_fmt from 04-22-fix_gas_add_large_txn_retries_for_pulling_signals_tighten_commit_interval_for_actors

@MasterPtato
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app

railway-app Bot commented Apr 23, 2026

🚅 Deployed to the rivet-pr-4712 environment in rivet-frontend

Service | Status | Web | Updated (UTC)
kitchen-sink | ❌ Build Failed (View Logs) | Web | Apr 25, 2026 at 1:08 am
frontend-cloud | 😴 Sleeping (View Logs) | Web | Apr 24, 2026 at 12:14 am
website | 😴 Sleeping (View Logs) | Web | Apr 24, 2026 at 12:04 am
frontend-inspector | ❌ Build Failed (View Logs) | Web | Apr 23, 2026 at 11:51 pm
mcp-hub | ✅ Success (View Logs) | Web | Apr 23, 2026 at 11:50 pm
ladle | ❌ Build Failed (View Logs) | Web | Apr 23, 2026 at 11:50 pm

@claude

claude Bot commented Apr 23, 2026

PR #4712 Review: fix(gas): add large txn retries for pulling signals, tighten commit interval for actors

Overview

This PR makes two independent changes:

  1. Signal pull retry on large transaction: Wraps the FDB transaction in pull_next_signals in a retry loop that halves the signal fetch limit when a "transaction too large" error occurs.
  2. Tighter commit interval for actor loops: Switches pegboard_actor and pegboard_actor2 from the old ctx.loope(...) API to the builder-based ctx.lupe().commit_interval(5) API, reducing the checkpoint interval from the default 20 down to 5.

Issues

Critical: Infinite loop when limit = 1

In db/kv/mod.rs retry loop:

limit = (limit / 2).max(1);

When limit is already 1, integer division gives 0, then .max(1) clamps back to 1 — the limit never decreases further and the loop spins forever if the transaction is still too large at limit = 1. A guard is needed:

if limit <= 1 {
    // Cannot shrink any further; surface the error instead of retrying forever.
    return Err(WorkflowError::Udb(err.context("failed to pull signals even at limit=1")));
}
limit /= 2; // always >= 1 after the check above
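
For context, here is a minimal, self-contained sketch of a halving retry loop that is guaranteed to terminate. It is not the code from this PR: `PullError` and `pull_with_backoff` are hypothetical names, and the closure stands in for the real transaction body in pull_next_signals.

```rust
/// Hypothetical error type standing in for the real database error.
#[derive(Debug, PartialEq)]
pub enum PullError {
    /// The transaction exceeded the backend's size limit.
    TxnTooLarge,
    /// Any other failure; never retried by this helper.
    Other(String),
}

/// Calls `pull` with a shrinking limit until it succeeds, fails with a
/// non-size error, or fails with `TxnTooLarge` at a limit of 1.
pub fn pull_with_backoff<T>(
    initial_limit: usize,
    mut pull: impl FnMut(usize) -> Result<T, PullError>,
) -> Result<T, PullError> {
    // Treat a limit of 0 as 1 so the loop always starts from a valid value.
    let mut limit = initial_limit.max(1);

    loop {
        match pull(limit) {
            // Only retry while the limit can still strictly decrease.
            Err(PullError::TxnTooLarge) if limit > 1 => {
                limit /= 2; // limit > 1 here, so the result is >= 1
            }
            // Success, a non-retryable error, or TxnTooLarge at limit == 1.
            other => return other,
        }
    }
}
```

Because the limit strictly decreases on every retry and the retry arm is skipped once it reaches 1, the closure runs only about log2(initial_limit) + 1 times in the worst case, and the final "too large" error is returned to the caller rather than swallowed.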

Stub implementation renders the retry logic dead code

error_is_transaction_too_large always returns false:

pub fn error_is_transaction_too_large(_err: &anyhow::Error) -> bool {
    // Only implemented with fdb
    false
}

The entire retry-on-large-txn path in pull_next_signals is unreachable until the FDB variant is wired in. If the FDB implementation is coming in a follow-up PR, that should be called out explicitly (e.g., a .agent/todo/ entry). If the KV backend can never hit this error, the comment should say so, not just "only implemented with fdb".
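
As a rough illustration of what a non-stub classifier could look like, the sketch below walks an error's source chain looking for a driver error with a specific code. Everything here is an assumption for illustration: `DbError`, `Context`, and `chain_has_txn_too_large` are hypothetical, the real function takes an `anyhow::Error`, and the code 2101 is what FoundationDB documents for transaction_too_large but should be confirmed against the driver actually in use.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for a database driver error carrying a numeric code.
#[derive(Debug)]
pub struct DbError {
    pub code: i32,
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "database error {}", self.code)
    }
}

impl Error for DbError {}

/// Hypothetical wrapper that attaches a message to an underlying error.
#[derive(Debug)]
pub struct Context {
    pub msg: &'static str,
    pub source: Box<dyn Error + 'static>,
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Assumed error code for "transaction too large"; verify for your backend.
pub const TXN_TOO_LARGE: i32 = 2101;

/// Returns true if any error in the source chain is a `DbError` with the
/// assumed "transaction too large" code.
pub fn chain_has_txn_too_large(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(db) = e.downcast_ref::<DbError>() {
            if db.code == TXN_TOO_LARGE {
                return true;
            }
        }
        current = e.source();
    }
    false
}
```

Walking the whole chain matters because the retry loop sees the error after context has been attached, so checking only the outermost error would miss the driver error underneath.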

Wrong noun in the warning log

tracing::warn!(?limit, "failed pulling workflows due to large txn, trying again with lower limit");

This is inside pull_next_signals — "workflows" should be "signals".


Minor

  • The warning log omits workflow_id, making it hard to trace which workflow triggered the retry. Adding %workflow_id would be helpful for on-call debugging.
  • The loope → lupe().commit_interval(5) change is clean. Reducing the interval from the default of 20 to 5 makes sense for long-lived actor workflows, where minimising replay work on restart matters. It is worth a one-line comment (e.g., // checkpoint frequently to reduce replay on restart), since the magic number 5 would otherwise be opaque to reviewers.
  • The log level for the retry warning looks correct (warn), but a tracing::debug! on each successful retry (with the final limit used) would help measure how often this fires in production.
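
To make the replay argument concrete, here is a toy model, not the workflow engine's actual behaviour. It assumes that loop state is checkpointed once every `commit_interval` iterations and that a restart replays every iteration completed since the last checkpoint. Both function names are hypothetical.

```rust
/// Iterations that must be replayed after a restart, under the simplifying
/// assumption that a checkpoint is written every `commit_interval` iterations.
pub fn iterations_to_replay(completed: u64, commit_interval: u64) -> u64 {
    assert!(commit_interval > 0, "commit interval must be positive");
    // Everything after the most recent checkpoint has to be replayed.
    completed % commit_interval
}

/// Worst-case replay length: a crash just before the next checkpoint.
pub fn worst_case_replay(commit_interval: u64) -> u64 {
    assert!(commit_interval > 0, "commit interval must be positive");
    commit_interval - 1
}
```

Under this model, a crash after 47 completed iterations replays 7 of them at an interval of 20 and 2 at an interval of 5, and the worst case drops from 19 to 4. The trade-off is more frequent checkpoint writes.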

Summary

The commit-interval tightening is straightforward and looks correct. The signal-pull retry logic has the right shape but has two issues that need to be fixed before merging: the missing exit condition when limit == 1 (infinite loop) and the stub error_is_transaction_too_large that keeps the whole path unreachable. The minor log wording nit and missing workflow_id field are low-priority but worth fixing.

MasterPtato force-pushed the 04-22-fix_improve_epoxy_propose_errors_add_gcp_logging_fmt branch from 4249ab0 to c3a491c on April 24, 2026 18:20
MasterPtato force-pushed the 04-22-fix_gas_add_large_txn_retries_for_pulling_signals_tighten_commit_interval_for_actors branch from cd88cb0 to a50d021 on April 24, 2026 18:20
MasterPtato force-pushed the 04-22-fix_improve_epoxy_propose_errors_add_gcp_logging_fmt branch from c3a491c to 98c339a on April 24, 2026 23:02
MasterPtato force-pushed the 04-22-fix_gas_add_large_txn_retries_for_pulling_signals_tighten_commit_interval_for_actors branch from a50d021 to 5d39087 on April 24, 2026 23:02
MasterPtato force-pushed the 04-22-fix_gas_add_large_txn_retries_for_pulling_signals_tighten_commit_interval_for_actors branch from 5d39087 to e204537 on April 25, 2026 01:07
MasterPtato force-pushed the 04-22-fix_improve_epoxy_propose_errors_add_gcp_logging_fmt branch from 98c339a to 161b673 on April 25, 2026 01:07