@MasterPtato

Changes

MasterPtato commented May 30, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@greptile-apps greptile-apps bot left a comment

PR Summary

This PR introduces server state tracking and usage metrics functionality across the Rivet platform. The changes span both core and edge services, with significant modifications to server management and metrics collection.

  • Added new ServerState enum with 7 states (Provisioning through Destroyed) in cluster/src/types.rs for tracking server lifecycle
  • Introduced new Prometheus metrics in pegboard service for tracking CPU/memory usage per environment and client flavor
  • Created new standalone pegboard-usage-metrics-publish service to handle periodic collection and publishing of resource usage metrics
  • Moved client usage metrics from usage_get.rs to the new standalone service for better separation of concerns
  • Critical: Integration tests are missing for the new functionality, with only a TODO placeholder in integration.rs

13 file(s) reviewed, 12 comment(s)

Comment on lines 62 to 70
CASE
WHEN s.cloud_destroy_ts IS NOT NULL THEN 6 -- Destroyed
WHEN s.taint_ts IS NOT NULL AND s.drain_ts IS NOT NULL THEN 5 -- TaintedDraining
WHEN s.drain_ts IS NOT NULL THEN 4 -- Draining
WHEN s.taint_ts IS NOT NULL THEN 3 -- Tainted
WHEN s.install_complete_ts IS NOT NULL THEN 2 -- Running
WHEN s.provision_complete_ts IS NOT NULL THEN 1 -- Installing
ELSE 0 -- Provisioning
END AS state

syntax: The CASE expression references the table alias 's', but the FROM clause does not define it. Add 'AS s' to the table in the FROM clause.

Suggested change
CASE
WHEN s.cloud_destroy_ts IS NOT NULL THEN 6 -- Destroyed
WHEN s.taint_ts IS NOT NULL AND s.drain_ts IS NOT NULL THEN 5 -- TaintedDraining
WHEN s.drain_ts IS NOT NULL THEN 4 -- Draining
WHEN s.taint_ts IS NOT NULL THEN 3 -- Tainted
WHEN s.install_complete_ts IS NOT NULL THEN 2 -- Running
WHEN s.provision_complete_ts IS NOT NULL THEN 1 -- Installing
ELSE 0 -- Provisioning
END AS state
FROM db_cluster.servers AS s

Comment on lines +34 to +42
CASE
WHEN s.cloud_destroy_ts IS NOT NULL THEN 6 -- Destroyed
WHEN s.taint_ts IS NOT NULL AND s.drain_ts IS NOT NULL THEN 5 -- TaintedDraining
WHEN s.drain_ts IS NOT NULL THEN 4 -- Draining
WHEN s.taint_ts IS NOT NULL THEN 3 -- Tainted
WHEN s.install_complete_ts IS NOT NULL THEN 2 -- Running
WHEN s.provision_complete_ts IS NOT NULL THEN 1 -- Installing
ELSE 0 -- Provisioning
END AS state

style: State values should be defined as constants in the codebase to avoid magic numbers and ensure consistency across queries
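
One way to act on this comment is to keep the numbers in a single Rust enum and convert query results through it. The sketch below is an assumption, not the PR's actual code: the variant names and numbers come from the SQL comments above, while the `TryFrom<i64>` conversion and its error type are hypothetical.

```rust
use std::convert::TryFrom;

/// Server lifecycle states, numbered to match the integers in the SQL CASE above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i64)]
pub enum ServerState {
    Provisioning = 0,
    Installing = 1,
    Running = 2,
    Tainted = 3,
    Draining = 4,
    TaintedDraining = 5,
    Destroyed = 6,
}

/// Hypothetical conversion: maps the integer returned by the query to a variant.
impl TryFrom<i64> for ServerState {
    /// An unknown state number is handed back unchanged as the error.
    type Error = i64;

    fn try_from(value: i64) -> Result<Self, Self::Error> {
        Ok(match value {
            0 => ServerState::Provisioning,
            1 => ServerState::Installing,
            2 => ServerState::Running,
            3 => ServerState::Tainted,
            4 => ServerState::Draining,
            5 => ServerState::TaintedDraining,
            6 => ServerState::Destroyed,
            other => return Err(other),
        })
    }
}
```

With this in place, a query could bind `ServerState::Destroyed as i64` instead of a literal 6, so the numbers live in one definition.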

Comment on lines 62 to 70
CASE
WHEN s.cloud_destroy_ts IS NOT NULL THEN 6 -- Destroyed
WHEN s.taint_ts IS NOT NULL AND s.drain_ts IS NOT NULL THEN 5 -- TaintedDraining
WHEN s.drain_ts IS NOT NULL THEN 4 -- Draining
WHEN s.taint_ts IS NOT NULL THEN 3 -- Tainted
WHEN s.install_complete_ts IS NOT NULL THEN 2 -- Running
WHEN s.provision_complete_ts IS NOT NULL THEN 1 -- Installing
ELSE 0 -- Provisioning
END AS state

style: State transitions appear to be mutually exclusive, but consider adding explicit priority with AND NOT conditions to prevent edge cases where multiple timestamps are set.
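
A searched CASE returns the result of the first WHEN branch that matches, so the branch order already acts as a priority when several timestamps are set. The Rust mirror below is a sketch for checking those edge cases. The `ServerTimestamps` struct and `state_number` function are hypothetical names, and only the column names are taken from the query.

```rust
/// Hypothetical subset of a server row: only the timestamps the CASE reads.
#[derive(Debug, Default, Clone, Copy)]
pub struct ServerTimestamps {
    pub provision_complete_ts: Option<i64>,
    pub install_complete_ts: Option<i64>,
    pub taint_ts: Option<i64>,
    pub drain_ts: Option<i64>,
    pub cloud_destroy_ts: Option<i64>,
}

/// Returns the same state number as the SQL CASE, testing branches in the same order.
/// The first branch that matches wins, exactly as in SQL.
pub fn state_number(ts: &ServerTimestamps) -> i64 {
    if ts.cloud_destroy_ts.is_some() {
        6 // Destroyed
    } else if ts.taint_ts.is_some() && ts.drain_ts.is_some() {
        5 // TaintedDraining
    } else if ts.drain_ts.is_some() {
        4 // Draining
    } else if ts.taint_ts.is_some() {
        3 // Tainted
    } else if ts.install_complete_ts.is_some() {
        2 // Running
    } else if ts.provision_complete_ts.is_some() {
        1 // Installing
    } else {
        0 // Provisioning
    }
}
```

Under this ordering, a row with every timestamp set resolves to Destroyed, and a row with both taint_ts and drain_ts set resolves to TaintedDraining rather than Draining or Tainted.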

pub lan_ip: Option<IpAddr>,
pub wan_ip: Option<IpAddr>,
pub cloud_destroy_ts: Option<i64>,
pub state: ServerState,

style: The Server struct doesn't derive Serialize/Deserialize traits but contains a serializable state field. This may cause issues if Server needs to be serialized

Running = 2,
Tainted = 3,
Draining = 4,
TaintedDraining = 5,

style: TaintedDraining combines two states - consider if this should be a separate flag instead of a combined state for better state management
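
For comparison, the alternative this comment hints at might look like the sketch below, where tainted and draining are independent flags next to a smaller lifecycle enum. All names here (`Lifecycle`, `ServerStatus`, `legacy_state_number`) are hypothetical and do not appear in the PR. The numeric mapping reuses the values from the SQL comments.

```rust
/// Hypothetical lifecycle phase without the tainted/draining combinations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lifecycle {
    Provisioning,
    Installing,
    Running,
    Destroyed,
}

/// Hypothetical alternative to a combined `TaintedDraining` variant:
/// the two conditions are tracked as separate flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ServerStatus {
    pub lifecycle: Lifecycle,
    pub tainted: bool,
    pub draining: bool,
}

impl ServerStatus {
    /// Collapses the flags back into the single numeric state used by the query,
    /// for callers that still expect one value.
    pub fn legacy_state_number(&self) -> i64 {
        match (self.lifecycle, self.tainted, self.draining) {
            (Lifecycle::Destroyed, _, _) => 6,
            (_, true, true) => 5,
            (_, false, true) => 4,
            (_, true, false) => 3,
            (Lifecycle::Running, false, false) => 2,
            (Lifecycle::Installing, false, false) => 1,
            (Lifecycle::Provisioning, false, false) => 0,
        }
    }
}
```

The trade-off is that flags avoid a variant per combination, while the single enum in the PR maps directly onto one integer column or metric label.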

Comment on lines 41 to 46
pub static ref ENV_CPU_USAGE: IntGaugeVec = register_int_gauge_vec_with_registry!(
"pegboard_env_cpu_usage",
"Total percent of CPU (per core) used by an environment.",
&["env_id", "flavor"],
*REGISTRY
).unwrap();

style: Missing trailing comma after *REGISTRY to maintain consistency with other metric definitions

Suggested change
pub static ref ENV_CPU_USAGE: IntGaugeVec = register_int_gauge_vec_with_registry!(
"pegboard_env_cpu_usage",
"Total percent of CPU (per core) used by an environment.",
&["env_id", "flavor"],
*REGISTRY,
).unwrap();

Comment on lines 48 to 53
pub static ref ENV_MEMORY_USAGE: IntGaugeVec = register_int_gauge_vec_with_registry!(
"pegboard_env_memory_usage",
"Total MiB of memory used by an environment.",
&["env_id", "flavor"],
*REGISTRY
).unwrap();

style: Missing trailing comma after *REGISTRY to maintain consistency with other metric definitions

Suggested change
pub static ref ENV_MEMORY_USAGE: IntGaugeVec = register_int_gauge_vec_with_registry!(
"pegboard_env_memory_usage",
"Total MiB of memory used by an environment.",
&["env_id", "flavor"],
*REGISTRY,
).unwrap();

logic: Complete removal of usage metrics tracking code without a clear replacement could break existing monitoring. Ensure equivalent functionality exists elsewhere before removing this file.

Comment on lines +18 to +19
tracing = "0.1"
tracing-subscriber = { version = "0.3", default-features = false, features = ["fmt", "json", "ansi"] }

style: Consider using workspace version for tracing and tracing-subscriber to maintain consistency with other dependencies

Comment on lines +95 to +97
if actor.start_ts.is_none() || actor.destroy_ts.is_some() {
continue;
}

style: Consider logging skipped actors with debug level to help with monitoring and debugging
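
A minimal sketch of what this suggestion could look like follows. It assumes a hypothetical `Actor` struct with an `actor_id` field and a hypothetical `running_actors` helper. To stay free of external crates it uses `eprintln!` as a stand-in for `tracing::debug!`, which the service would presumably use given its `tracing` dependency.

```rust
/// Hypothetical subset of an actor row: only the fields the filter reads.
pub struct Actor {
    pub actor_id: u64,
    pub start_ts: Option<i64>,
    pub destroy_ts: Option<i64>,
}

/// Returns the actors that are currently running, plus how many were skipped.
pub fn running_actors(actors: &[Actor]) -> (Vec<&Actor>, usize) {
    let mut running = Vec::new();
    let mut skipped = 0;

    for actor in actors {
        // Same condition as the reviewed code: not started yet, or already destroyed.
        if actor.start_ts.is_none() || actor.destroy_ts.is_some() {
            // Stand-in for a debug-level log such as `tracing::debug!`.
            eprintln!(
                "skipping actor {}: started={}, destroyed={}",
                actor.actor_id,
                actor.start_ts.is_some(),
                actor.destroy_ts.is_some()
            );
            skipped += 1;
            continue;
        }
        running.push(actor);
    }

    (running, skipped)
}
```

Returning the skipped count also lets the caller emit one summary log per collection cycle instead of one line per actor, which may be preferable if many actors are skipped.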

@cloudflare-workers-and-pages
cloudflare-workers-and-pages bot commented May 30, 2025

Deploying rivet with Cloudflare Pages

Latest commit: 39fc637
Status: ✅  Deploy successful!
Preview URL: https://eefe696a.rivet.pages.dev
Branch Preview URL: https://05-30-feat-add-pb-usage-metr.rivet.pages.dev

@MasterPtato force-pushed the 05-30-feat_add_pb_usage_metrics_server_state branch from 3de4613 to 464160e on May 31, 2025 00:36
@NathanFlurry force-pushed the 05-29-fix_cache_add_traces branch from d69ce87 to ecbaa48 on May 31, 2025 00:39
@NathanFlurry force-pushed the 05-30-feat_add_pb_usage_metrics_server_state branch from 464160e to 1eac431 on May 31, 2025 00:39
@MasterPtato force-pushed the 05-29-fix_cache_add_traces branch from ecbaa48 to 47e00c0 on May 31, 2025 00:54
@MasterPtato force-pushed the 05-30-feat_add_pb_usage_metrics_server_state branch from 1eac431 to 2f17dc8 on May 31, 2025 00:54
@cloudflare-workers-and-pages
cloudflare-workers-and-pages bot commented May 31, 2025

Deploying rivet-studio with Cloudflare Pages

Latest commit: 39fc637
Status: ✅  Deploy successful!
Preview URL: https://6d9ce416.rivet-studio.pages.dev
Branch Preview URL: https://05-30-feat-add-pb-usage-metr.rivet-studio.pages.dev

@NathanFlurry force-pushed the 05-30-feat_add_pb_usage_metrics_server_state branch from 2f17dc8 to 1eac431 on May 31, 2025 01:01
@NathanFlurry force-pushed the 05-29-fix_cache_add_traces branch from 47e00c0 to ecbaa48 on May 31, 2025 01:01
@MasterPtato force-pushed the 05-30-feat_add_pb_usage_metrics_server_state branch from 1eac431 to 2f17dc8 on May 31, 2025 01:04
@MasterPtato force-pushed the 05-29-fix_cache_add_traces branch from ecbaa48 to 47e00c0 on May 31, 2025 01:04
@cloudflare-workers-and-pages
cloudflare-workers-and-pages bot commented May 31, 2025

Deploying rivet-hub with Cloudflare Pages

Latest commit: 39fc637
Status: ✅  Deploy successful!
Preview URL: https://639f7abc.rivet-hub-7jb.pages.dev
Branch Preview URL: https://05-30-feat-add-pb-usage-metr.rivet-hub-7jb.pages.dev

@NathanFlurry force-pushed the 05-29-fix_cache_add_traces branch from 47e00c0 to ecbaa48 on May 31, 2025 01:52
@NathanFlurry force-pushed the 05-30-feat_add_pb_usage_metrics_server_state branch from 2f17dc8 to 1eac431 on May 31, 2025 01:52
@MasterPtato force-pushed the 05-29-fix_cache_add_traces branch from ecbaa48 to 47e00c0 on May 31, 2025 02:05
@MasterPtato force-pushed the 05-30-feat_add_pb_usage_metrics_server_state branch from 1eac431 to 2f17dc8 on May 31, 2025 02:05
@MasterPtato force-pushed the 05-29-fix_cache_add_traces branch from 47e00c0 to 90e20fd on June 2, 2025 18:13
@MasterPtato force-pushed the 05-30-feat_add_pb_usage_metrics_server_state branch from 2f17dc8 to 39fc637 on June 2, 2025 18:13
@graphite-app bot closed this on Jun 3, 2025
@graphite-app bot deleted the 05-30-feat_add_pb_usage_metrics_server_state branch on June 3, 2025 07:03