-
Couldn't load subscription status.
- Fork 130
fix(pegboard): revise actor rescheduling algorithm, add client metrics #2531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This PR enhances Pegboard's actor management and monitoring capabilities by revising the rescheduling algorithm and adding detailed client resource metrics. The changes focus on improving reliability and observability of the system.
- Added client resource metrics (
CLIENT_MEMORY_TOTAL,CLIENT_CPU_TOTAL) with proper units (MiB/millicores) in/packages/edge/services/pegboard/src/metrics.rs - Implemented retry backoff reset with
RETRY_RESET_DURATION_MS(10 min) in/packages/edge/services/pegboard/src/workflows/actor/mod.rs - Introduced structured
RescheduleStatefor better retry tracking in/packages/edge/services/pegboard/src/workflows/actor/runtime.rs - Optimized metrics collection with single transaction fetch in
/packages/edge/services/pegboard/src/workflows/client/mod.rs
4 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile
| let now = util::timestamp::now(); | ||
| state.retry_count = | ||
| if state.last_retry_ts < now - i64::try_from(2 * backoff.current_duration())? { | ||
| 0 | ||
| } else { | ||
| state.retry_count + 1 | ||
| }; | ||
| state.retry_count = if state.last_retry_ts < now - RETRY_RESET_DURATION_MS { | ||
| 0 | ||
| } else { | ||
| state.retry_count + 1 | ||
| }; | ||
| state.last_retry_ts = now; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Potential race condition - state.last_retry_ts is updated after the check, which could lead to incorrect retry count calculations if multiple retries happen very close together
| .set(cpu.try_into()?); | ||
| .set(total_cpu.try_into()?); | ||
|
|
||
| let alllocated_mem = total_mem.saturating_sub(remaining_mem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax: Typo in variable name 'alllocated_mem' (three 'l's)
| let alllocated_mem = total_mem.saturating_sub(remaining_mem); | |
| let allocated_mem = total_mem.saturating_sub(remaining_mem); |
Deploying rivet with
|
| Latest commit: |
bbbf37e
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://45988c62.rivet.pages.dev |
| Branch Preview URL: | https://06-04-fix-pegboard-revise-ac.rivet.pages.dev |
51a568c to
f9ef2e7
Compare
f9ef2e7 to
a52399a
Compare
31b6aa7 to
b7c048a
Compare
Deploying rivet-studio with
|
| Latest commit: |
bbbf37e
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://19017757.rivet-studio.pages.dev |
| Branch Preview URL: | https://06-04-fix-pegboard-revise-ac.rivet-studio.pages.dev |
Deploying rivet-hub with
|
| Latest commit: |
bbbf37e
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://416232b6.rivet-hub-7jb.pages.dev |
| Branch Preview URL: | https://06-04-fix-pegboard-revise-ac.rivet-hub-7jb.pages.dev |
4184d72 to
e53a26f
Compare
6f7c837 to
ec71a65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
Enhanced resource tracking and rescheduling logic in Pegboard service, with improved state management and metrics collection.
- Reorganized metric definitions in
packages/edge/services/pegboard/src/metrics.rsto group related CPU and memory metrics for better maintainability - Improved metric collection in
packages/edge/services/pegboard/src/workflows/client/mod.rswith explicit draining state handling - Enhanced state persistence for actor rescheduling in
packages/edge/services/pegboard/src/workflows/actor/runtime.rswith global state tracking
4 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile
| let (total_mem, total_cpu, remaining_mem, remaining_cpu) = | ||
| ctx.fdb() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Tuple order in destructuring doesn't match database query results. total_mem and total_cpu are swapped compared to query result order.
| let (total_mem, total_cpu, remaining_mem, remaining_cpu) = | |
| ctx.fdb() | |
| let (total_mem, remaining_mem, total_cpu, remaining_cpu) = | |
| ctx.fdb() |
| total_mem, | ||
| remaining_mem, | ||
| total_cpu, | ||
| remaining_cpu, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Tuple return order doesn't match variable names above. Should be (total_mem, total_cpu, remaining_mem, remaining_cpu) to match destructuring.
| total_mem, | |
| remaining_mem, | |
| total_cpu, | |
| remaining_cpu, | |
| total_mem, | |
| total_cpu, | |
| remaining_mem, | |
| remaining_cpu, |
ec71a65 to
bbbf37e
Compare
e53a26f to
b9dcd49
Compare
Merge activity
|
#2531) <!-- Please make sure there is an issue that this PR is correlated to. --> ## Changes <!-- If there are frontend changes, please include screenshots. -->

Changes