Conversation

@MasterPtato
Contributor

Changes

Contributor Author

MasterPtato commented Jun 4, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@greptile-apps greptile-apps bot left a comment

PR Summary

This PR enhances Pegboard's actor management and monitoring capabilities by revising the rescheduling algorithm and adding detailed client resource metrics. The changes focus on improving reliability and observability of the system.

  • Added client resource metrics (CLIENT_MEMORY_TOTAL, CLIENT_CPU_TOTAL) with proper units (MiB/millicores) in /packages/edge/services/pegboard/src/metrics.rs
  • Implemented retry backoff reset with RETRY_RESET_DURATION_MS (10 min) in /packages/edge/services/pegboard/src/workflows/actor/mod.rs
  • Introduced structured RescheduleState for better retry tracking in /packages/edge/services/pegboard/src/workflows/actor/runtime.rs
  • Optimized metrics collection with single transaction fetch in /packages/edge/services/pegboard/src/workflows/client/mod.rs

4 file(s) reviewed, 2 comment(s)

Comment on lines 794 to 801
  let now = util::timestamp::now();
- state.retry_count =
-     if state.last_retry_ts < now - i64::try_from(2 * backoff.current_duration())? {
-         0
-     } else {
-         state.retry_count + 1
-     };
+ state.retry_count = if state.last_retry_ts < now - RETRY_RESET_DURATION_MS {
+     0
+ } else {
+     state.retry_count + 1
+ };
  state.last_retry_ts = now;

logic: Potential race condition - state.last_retry_ts is updated after the check, which could lead to incorrect retry count calculations if multiple retries happen very close together
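
For reference, the reset rule in the new lines can be sketched in isolation. This is a minimal sketch, not the PR's actual code: the `RescheduleState` struct here is a hypothetical stand-in that only borrows the two field names from the diff, the field types are assumptions, and `record_retry` is an illustrative helper name. The 10 minute value comes from the PR summary.

```rust
/// Hypothetical stand-in for the PR's reschedule state.
/// Field names follow the diff; the types are assumptions.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RescheduleState {
    pub last_retry_ts: i64,
    pub retry_count: usize,
}

/// 10 minutes in milliseconds, as described in the PR summary.
pub const RETRY_RESET_DURATION_MS: i64 = 10 * 60 * 1000;

/// Applies the reset rule for a single retry (illustrative helper).
/// The comparison reads the previous timestamp before it is overwritten,
/// so the count resets only when the last retry is older than the window.
pub fn record_retry(state: &mut RescheduleState, now: i64) {
    state.retry_count = if state.last_retry_ts < now - RETRY_RESET_DURATION_MS {
        0
    } else {
        state.retry_count + 1
    };
    state.last_retry_ts = now;
}
```

With a single `&mut` borrow, the check and the timestamp update run as one sequential step. Whether the concern above applies depends on whether the real state can be mutated concurrently, which this sketch does not model.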

- .set(cpu.try_into()?);
+ .set(total_cpu.try_into()?);

let alllocated_mem = total_mem.saturating_sub(remaining_mem);

syntax: Typo in variable name 'alllocated_mem' (three 'l's)

Suggested change
- let alllocated_mem = total_mem.saturating_sub(remaining_mem);
+ let allocated_mem = total_mem.saturating_sub(remaining_mem);
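
As a side note on the line being renamed, `saturating_sub` clamps at zero instead of underflowing when the remaining value exceeds the total. A minimal sketch, assuming unsigned 64-bit MiB values (the actual types in the PR are not shown here) and using a hypothetical helper name:

```rust
/// Allocated memory in MiB (illustrative helper; u64 is an assumption).
/// `saturating_sub` returns 0 rather than wrapping or panicking if
/// `remaining_mem` is ever reported as larger than `total_mem`.
pub fn allocated_mem(total_mem: u64, remaining_mem: u64) -> u64 {
    total_mem.saturating_sub(remaining_mem)
}
```

For example, `allocated_mem(4096, 1024)` yields 3072, while `allocated_mem(1024, 4096)` yields 0 instead of underflowing.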

@cloudflare-workers-and-pages
cloudflare-workers-and-pages bot commented Jun 4, 2025

Deploying rivet with Cloudflare Pages

Latest commit: bbbf37e
Status: ✅  Deploy successful!
Preview URL: https://45988c62.rivet.pages.dev
Branch Preview URL: https://06-04-fix-pegboard-revise-ac.rivet.pages.dev

@MasterPtato MasterPtato force-pushed the 06-04-fix_pegboard_revise_actor_rescheduling_algorithm_add_client_metrics branch from 51a568c to f9ef2e7 Compare June 4, 2025 01:46
@MasterPtato MasterPtato force-pushed the 06-04-fix_pegboard_revise_actor_rescheduling_algorithm_add_client_metrics branch from f9ef2e7 to a52399a Compare June 4, 2025 19:17
@MasterPtato MasterPtato force-pushed the 06-04-fix_fix_build_cache_key branch from 31b6aa7 to b7c048a Compare June 4, 2025 19:17
@cloudflare-workers-and-pages
cloudflare-workers-and-pages bot commented Jun 4, 2025

Deploying rivet-studio with Cloudflare Pages

Latest commit: bbbf37e
Status: ✅  Deploy successful!
Preview URL: https://19017757.rivet-studio.pages.dev
Branch Preview URL: https://06-04-fix-pegboard-revise-ac.rivet-studio.pages.dev

@cloudflare-workers-and-pages
cloudflare-workers-and-pages bot commented Jun 4, 2025

Deploying rivet-hub with Cloudflare Pages

Latest commit: bbbf37e
Status: ✅  Deploy successful!
Preview URL: https://416232b6.rivet-hub-7jb.pages.dev
Branch Preview URL: https://06-04-fix-pegboard-revise-ac.rivet-hub-7jb.pages.dev

@MasterPtato MasterPtato force-pushed the 06-04-fix_fix_build_cache_key branch from 4184d72 to e53a26f Compare June 5, 2025 02:10
@MasterPtato MasterPtato force-pushed the 06-04-fix_pegboard_revise_actor_rescheduling_algorithm_add_client_metrics branch from 6f7c837 to ec71a65 Compare June 5, 2025 02:10
Contributor Author

@greptileai

@greptile-apps greptile-apps bot left a comment

PR Summary

Enhanced resource tracking and rescheduling logic in Pegboard service, with improved state management and metrics collection.

  • Reorganized metric definitions in packages/edge/services/pegboard/src/metrics.rs to group related CPU and memory metrics for better maintainability
  • Improved metric collection in packages/edge/services/pegboard/src/workflows/client/mod.rs with explicit draining state handling
  • Enhanced state persistence for actor rescheduling in packages/edge/services/pegboard/src/workflows/actor/runtime.rs with global state tracking

4 file(s) reviewed, 2 comment(s)

Comment on lines +953 to 954
let (total_mem, total_cpu, remaining_mem, remaining_cpu) =
ctx.fdb()

logic: Tuple order in destructuring doesn't match the order of the returned query results. The second and third elements (total_cpu and remaining_mem) are swapped relative to the returned tuple.

Suggested change
- let (total_mem, total_cpu, remaining_mem, remaining_cpu) =
-     ctx.fdb()
+ let (total_mem, remaining_mem, total_cpu, remaining_cpu) =
+     ctx.fdb()

Comment on lines +996 to +999
total_mem,
remaining_mem,
total_cpu,
remaining_cpu,

logic: Tuple return order doesn't match variable names above. Should be (total_mem, total_cpu, remaining_mem, remaining_cpu) to match destructuring.

Suggested change
- total_mem,
- remaining_mem,
- total_cpu,
- remaining_cpu,
+ total_mem,
+ total_cpu,
+ remaining_mem,
+ remaining_cpu,
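
Both suggestions in this review address the same mismatch from opposite ends, so only one of them should be applied. An alternative that avoids positional ordering altogether is to return a struct with named fields. The following is a minimal sketch under stated assumptions: `ClientResources`, `fetch_resources`, and `allocated` are hypothetical names, the `u64` types and the fixed values are made up for illustration, and the real code reads these values inside an FDB transaction that is not reproduced here.

```rust
/// Hypothetical named replacement for the 4-tuple discussed above.
/// Units follow the PR summary (MiB and millicores); u64 is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ClientResources {
    pub total_mem: u64,
    pub remaining_mem: u64,
    pub total_cpu: u64,
    pub remaining_cpu: u64,
}

/// Stand-in for the transaction that reads the values (fixed sample data).
pub fn fetch_resources() -> ClientResources {
    ClientResources {
        total_mem: 8192,
        remaining_mem: 6144,
        total_cpu: 4000,
        remaining_cpu: 2500,
    }
}

/// Returns (allocated memory, allocated CPU), clamping each at zero.
pub fn allocated(res: &ClientResources) -> (u64, u64) {
    (
        res.total_mem.saturating_sub(res.remaining_mem),
        res.total_cpu.saturating_sub(res.remaining_cpu),
    )
}
```

Because struct patterns bind by field name, a caller can destructure the fields in any order without silently swapping same-typed values, which is the failure mode flagged in this review.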

@graphite-app
Contributor

graphite-app bot commented Jun 9, 2025

Merge activity

  • Jun 9, 4:46 PM UTC: MasterPtato added this pull request to the Graphite merge queue.
  • Jun 9, 4:48 PM UTC: CI is running for this pull request on a draft pull request (#2574) due to your merge queue CI optimization settings.
  • Jun 9, 4:49 PM UTC: Merged by the Graphite merge queue via draft PR: #2574.

graphite-app bot pushed a commit that referenced this pull request Jun 9, 2025
#2531)

## Changes

@graphite-app graphite-app bot closed this Jun 9, 2025
@graphite-app graphite-app bot deleted the 06-04-fix_pegboard_revise_actor_rescheduling_algorithm_add_client_metrics branch June 9, 2025 16:49