WIP: counter latency monitor #5093
🚅 Deployed to the rivet-pr-5093 environment in rivet-frontend
This stack of pull requests is managed by Graphite.
9a2f692 to 49e17ec
Code Review: PR #5093 - WIP: counter latency monitor
State: Draft. The PR is explicitly marked WIP; this review is advisory.

Summary of Changes
This is a large, multi-concern PR (~28K additions) that bundles:

Critical Issues
1.
The same applies in the NAPI The In practice, Prometheus metrics are scraped from internal network paths, but the endpoint is exposed on the same port as user traffic and actor requests. Any HTTP request to
2. Renamed Prometheus metrics are a breaking change for deployed monitoring
The PR renames ~20 metrics:
Additionally, the These are breaking changes for any deployed Grafana dashboards, alerting rules, or Prometheus recording rules that reference the old names. The diff updates

Code Quality Issues
3. The three warm-PIDX hazard repro tests use
4. Per-page-fetch logs promoted from In
5.
    request.expected_generation.get_or_insert(self.generation);
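
For reference on the line quoted in item 5: `Option::get_or_insert` writes the value only when the option is `None` and otherwise leaves the existing value untouched. The sketch below illustrates that behavior in isolation. The `Request` and `Handle` types and the `stamp` method are hypothetical stand-ins; only the `expected_generation` and `generation` field names come from the quoted line, and their real types in the PR are not shown here (a `u64` is assumed).

```rust
/// Hypothetical request type; only the field name is taken from the review.
#[derive(Debug, Default)]
struct Request {
    expected_generation: Option<u64>,
}

/// Hypothetical owner of the current generation value.
struct Handle {
    generation: u64,
}

impl Handle {
    /// Fills in `expected_generation` only if the caller left it unset,
    /// then returns whichever value the request now carries.
    fn stamp(&self, request: &mut Request) -> u64 {
        *request.expected_generation.get_or_insert(self.generation)
    }
}

fn main() {
    let handle = Handle { generation: 7 };

    // Unset field: the handle's generation is inserted.
    let mut unset = Request::default();
    assert_eq!(handle.stamp(&mut unset), 7);
    assert_eq!(unset.expected_generation, Some(7));

    // Pre-set field: the caller's value is kept, even if it differs.
    let mut preset = Request { expected_generation: Some(3) };
    assert_eq!(handle.stamp(&mut preset), 3);
    assert_eq!(preset.expected_generation, Some(3));
}
```

The second case is the one worth checking during review: a caller-supplied generation that differs from the handle's is passed through unchanged and is not overwritten.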
6. When
7. The PR migrates

Potential Bugs
8. Previously,
9. Shutdown grace period fallback of 30 minutes
    let stop_threshold = handle
        .get_protocol_metadata()
        .await
        .map(|x| x.actor_stop_threshold)
        .unwrap_or(30 * 60 * 1000);
If
10. VFS generation parsed from name string after registration
In

Design / Architecture Notes
11. Warm-PIDX hazard tests are reproduction attempts, not regression guards
All three warm-PIDX tests end with
12. Committed build artifact
The PR adds
13. The metrics refactor changes label set from

Minor Issues

Description
Please include a summary of the changes and the related issue. Please also include relevant motivation and context.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist: