chore(pegboard): allow configuring runner_eligible_threshold and runner_lost_threshold #3372

NathanFlurry · 2025-11-05T05:21:53Z

No description provided.

…er_lost_threshold

vercel · 2025-11-05T05:21:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
rivetkit-serverless	Ready	Preview	Comment	Nov 5, 2025 5:23am

3 Skipped Deployments

Project	Deployment	Preview	Updated (UTC)
rivet-cloud	Ignored		Nov 5, 2025 5:23am
rivet-inspector	Ignored	Preview	Nov 5, 2025 5:23am
rivet-site	Ignored	Preview	Nov 5, 2025 5:23am

NathanFlurry · 2025-11-05T05:22:11Z

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2025-11-05T05:23:29Z

PR Review: Allow Configuring Runner Thresholds

Summary

This PR makes the runner eligibility and lost thresholds configurable via the Pegboard config, converting them from hardcoded constants to configurable parameters. This improves flexibility for different deployment environments, particularly for development where faster iteration is beneficial.

Code Quality ✅

Strengths:

Clear separation of concerns with well-documented config getters
Consistent pattern follows existing config methods (e.g., actor_start_threshold, base_retry_timeout)
Good documentation with units clearly specified (milliseconds)
Properly marked as Experimental which is appropriate for infrastructure tuning parameters

Adherence to CLAUDE.md conventions:

✅ Uses hard tabs for formatting (as per rustfmt.toml)
✅ Follows existing patterns in neighboring code
✅ Config accessors properly use unwrap_or() with sensible defaults
✅ Uses util::duration::seconds() helper for readability

Implementation Review 📝

Config Structure (engine/packages/config/src/config/pegboard.rs:38-84)

✅ Properly adds Option<i64> fields with clear documentation
✅ Default values maintained (10s and 15s) preserve existing behavior
✅ Accessor methods follow established patterns

Usage Updates:
All 5 usages of the constants have been correctly migrated:

✅ runner.rs:84 - Timeout for listen operation
✅ runner.rs:115 - Protocol metadata for client
✅ runner.rs:940 - Check expired activity
✅ runtime.rs:151 - Actor allocation ping threshold
✅ update_alloc_idx.rs:48 - Runner eligibility check

Development Configuration (rivetkit-typescript):

✅ Sets 5s/7s thresholds for faster development iteration
✅ Good comment explaining the rationale (default ping interval is 3s)
⚠️ Minor concern: The 2s grace period (5s threshold - 3s ping) is quite tight. If a ping is delayed by network issues, runners might be marked ineligible prematurely.

Potential Issues & Concerns

1. ⚠️ Tight Development Thresholds

RIVET__PEGBOARD__RUNNER_ELIGIBLE_THRESHOLD: "5000",  // 5s
RIVET__PEGBOARD__RUNNER_LOST_THRESHOLD: "7000",      // 7s

With a 3s ping interval:

Eligible threshold allows only ~1.67 missed pings (5s / 3s)
Lost threshold allows only ~2.33 missed pings (7s / 3s)

Recommendation: Consider using 6s/9s (2x and 3x ping interval) for more robustness in development environments with variable network conditions.

2. ✅ No Configuration Validation

While the code doesn't validate that runner_lost_threshold > runner_eligible_threshold, this is acceptable because:

The system gracefully handles any ordering
These are experimental tuning parameters
Over-validation can be restrictive

3. ⚠️ Missing Test Coverage

No unit tests verify the new config methods
No integration tests validate behavior with different threshold values
Recommendation: Consider adding tests that verify:
- Config getters return correct defaults
- Runner expiration logic works with custom thresholds
- Runner eligibility checks respect the configured threshold

Performance Considerations ✅

Positive impacts:

No performance regression - same computation paths, just parameterized
Config values are read from ctx.config() which is typically cached
Shorter development thresholds reduce wait times for local testing

No concerns about:

Additional memory overhead (negligible - two i64 values)
Runtime cost (config access is lightweight)

Security Concerns ✅

No security issues identified:

Threshold values don't expose sensitive information
No risk of injection attacks (values are integers)
Configuration is environment-based, not user-controlled
Worst case: misconfiguration leads to aggressive runner eviction (availability issue, not security)

Architecture & Design ✅

Well-aligned with system design:

Follows the established pattern for Pegboard configuration
Maintains backward compatibility via default values
Separates configuration from implementation logic
Properly threads config through all necessary callsites

Recommendations

High Priority

None - the PR is fundamentally sound.

Medium Priority

Consider adjusting development thresholds to 6000ms/9000ms for better resilience
Add basic unit tests for the config getters

Low Priority

Add a comment in the config explaining the relationship between these thresholds and the default ping interval
Consider adding a validation warning if runner_lost_threshold <= runner_eligible_threshold

Final Verdict ✅ APPROVE WITH MINOR SUGGESTIONS

This is a clean, well-implemented change that follows project conventions. The core functionality is solid, and the only concerns are around the aggressiveness of the development environment thresholds.

Strengths:

Clean refactoring from constants to config
Maintains backward compatibility
Good documentation
Enables faster development iteration

Minor improvements suggested:

Adjust development thresholds for more robustness
Add test coverage

Great work! 🚀

chore(pegboard): allow configuring runner_eligible_threshold and runn…

8dadce9

…er_lost_threshold

NathanFlurry requested a review from MasterPtato November 5, 2025 05:21

NathanFlurry merged commit d5ee786 into main Nov 5, 2025
8 of 11 checks passed

NathanFlurry deleted the 11-05-chore_pegboard_allow_configuring_runner_eligible_threshold_and_runner_lost_threshold branch November 5, 2025 05:22

vercel bot deployed to Preview – rivetkit-serverless November 5, 2025 05:23 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore(pegboard): allow configuring runner_eligible_threshold and runner_lost_threshold #3372

chore(pegboard): allow configuring runner_eligible_threshold and runner_lost_threshold #3372

Uh oh!

NathanFlurry commented Nov 5, 2025

Uh oh!

vercel bot commented Nov 5, 2025 •

edited

Loading

Uh oh!

Uh oh!

NathanFlurry commented Nov 5, 2025

Uh oh!

claude bot commented Nov 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chore(pegboard): allow configuring runner_eligible_threshold and runner_lost_threshold #3372

chore(pegboard): allow configuring runner_eligible_threshold and runner_lost_threshold #3372

Uh oh!

Conversation

NathanFlurry commented Nov 5, 2025

Uh oh!

vercel bot commented Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

NathanFlurry commented Nov 5, 2025

How to use the Graphite Merge Queue

Uh oh!

claude bot commented Nov 5, 2025

PR Review: Allow Configuring Runner Thresholds

Summary

Code Quality ✅

Implementation Review 📝

Potential Issues & Concerns

1. ⚠️ Tight Development Thresholds

2. ✅ No Configuration Validation

3. ⚠️ Missing Test Coverage

Performance Considerations ✅

Security Concerns ✅

Architecture & Design ✅

Recommendations

High Priority

Medium Priority

Low Priority

Final Verdict ✅ APPROVE WITH MINOR SUGGESTIONS

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel bot commented Nov 5, 2025 •

edited

Loading