-
Couldn't load subscription status.
- Fork 130
fix: add future/fdb metrics #2377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add future/fdb metrics #2377
Conversation
Deploying rivet with
|
| Latest commit: |
2ffbbd6
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://1790567b.rivet.pages.dev |
| Branch Preview URL: | https://04-23-fix-add-future-fdb-met.rivet.pages.dev |
9c38e32 to
19edb7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This PR adds comprehensive metrics and tracing capabilities across the Rivet codebase, focusing on FoundationDB operations and future durations.
- Added new Grafana dashboard
futures.jsonfor monitoring instrumented future durations with heatmap visualization - Introduced
CustomInstrumentExttrait infuture.rsfor tracking future durations with Prometheus metrics, though has potential safety issues with unsafe code - Added process-exporter installation script and configuration in
/packages/core/services/cluster/src/workflows/server/install/install_scripts/but lacks security measures like checksum verification - Replaced
foundationdb::tuple::Subspacewithfdb_util::Subspaceacross multiple files for better metrics tracking and consistency - Added custom instrumentation spans to various FDB transactions throughout the codebase for improved tracing and monitoring
28 file(s) reviewed, 10 comment(s)
Edit PR Review Bot Settings | Greptile
| "filterValues": { | ||
| "le": 1e-9 | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: filterValues.le is set to 1e-9 which is an extremely small value that may filter out relevant data points. Consider adjusting or removing this filter.
| "yAxis": { | ||
| "axisPlacement": "left", | ||
| "max": "60", | ||
| "min": 0, | ||
| "reverse": false, | ||
| "unit": "s" | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: yAxis max value is hardcoded to 60s which may not be suitable for all future durations. Consider making this dynamic or configurable via variables.
| res = &mut gc_handle => { | ||
| tracing::error!(?res, "metrics task unexpectedly stopped"); | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Error message incorrectly states 'metrics task unexpectedly stopped' when it's actually the GC task that stopped
| res = &mut gc_handle => { | |
| tracing::error!(?res, "metrics task unexpectedly stopped"); | |
| break; | |
| res = &mut gc_handle => { | |
| tracing::error!(?res, "gc task unexpectedly stopped"); | |
| break; |
| fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> { | ||
| let this = unsafe { self.get_unchecked_mut() }; | ||
| let inner = unsafe { Pin::new_unchecked(&mut this.inner) }; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Unsafe code can be replaced with safe alternatives. Use Pin::as_mut().get_mut() and Pin::as_mut() instead of get_unchecked_mut and new_unchecked.
| fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> { | |
| let this = unsafe { self.get_unchecked_mut() }; | |
| let inner = unsafe { Pin::new_unchecked(&mut this.inner) }; | |
| fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> { | |
| let this = self.as_mut().get_mut(); | |
| let inner = Pin::new(&mut this.inner); |
| let this = unsafe { self.get_unchecked_mut() }; | ||
| let inner = unsafe { Pin::new_unchecked(&mut this.inner) }; | ||
|
|
||
| let metadata = inner.span().metadata().clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Cloning metadata on every poll is inefficient. Consider storing the formatted location string in the struct during construction.
| User=process-exporter | ||
| Group=process-exporter | ||
| Type=simple | ||
| ExecStart=/usr/bin/process-exporter --config.path /etc/process-exporter/config.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Missing security-related systemd directives like ProtectSystem=strict and NoNewPrivileges=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The entire file has been deleted. This file contains critical routing logic for actors and should not be removed. Please restore this file and add the metrics changes separately.
| Ok(None) | ||
| } | ||
| }) | ||
| .custom_instrument(tracing::info_span!("prewarm_fetch_tx")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The tracing span should be placed before the transaction starts (line 25) rather than after it completes to properly capture the full transaction duration
| .custom_instrument(tracing::info_span!("prewarm_fetch_tx")) | |
| .custom_instrument(tracing::info_span!("prewarm_fetch_tx")) | |
| .run(|tx, _mc| async move { |
| .try_collect::<Vec<_>>() | ||
| .await | ||
| }) | ||
| .custom_instrument(tracing::info_span!("actor_list_wf_tx")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Consider moving the custom_instrument call before the run() to capture the entire FDB operation including setup
| .await | ||
| } | ||
| }) | ||
| .custom_instrument(tracing::info_span!("actor_destroy_tx")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The span should be added before line 131 where the transaction begins, not after the transaction completes. This ensures the entire transaction is properly instrumented.
19edb7d to
2ffbbd6
Compare
Deploying rivet-studio with
|
| Latest commit: |
2ffbbd6
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://306c3dae.rivet-studio.pages.dev |
| Branch Preview URL: | https://04-23-fix-add-future-fdb-met.rivet-studio.pages.dev |
2ffbbd6 to
8cd9d0a
Compare
Merge activity
|
<!-- Please make sure there is an issue that this PR is correlated to. --> ## Changes <!-- If there are frontend changes, please include screenshots. -->

Changes