chore: Telemetry updates, add prometheus listener #2413
base: master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #2413 +/- ##
==========================================
- Coverage 54.72% 54.71% -0.01%
==========================================
Files 584 584
Lines 78531 78550 +19
==========================================
+ Hits 42974 42982 +8
- Misses 32348 32361 +13
+ Partials 3209 3207 -2
I've left a few minor comments regarding the prometheus init, otherwise good to go 🙏
If you've cleared this with the standards warden @ajnavarro, I'm good with adding this feature 💯
tm2/pkg/telemetry/metrics/metrics.go
server := &http.Server{
	Addr:              config.PrometheusAddr,
	ReadHeaderTimeout: 5 * time.Second,
}
http.Handle("/metrics", promhttp.Handler())
I'm not sure we should be creating a new server and exposing /metrics here.
A better place would be the central server we actually use for all endpoints in tm2/pkg/bft/node.
You can just expose a func in telemetry to register the handler with the node's mux.
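A minimal sketch of that suggestion, using only the standard library. `RegisterMetrics`, `newNodeMux`, `stubMetrics`, and the `/status` route are hypothetical names for illustration, not the actual telemetry or node API; in the PR the handler passed in would presumably be `promhttp.Handler()`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// RegisterMetrics attaches a metrics handler to an existing mux under /metrics.
// Hypothetical helper: the telemetry package would expose something like this
// so the node can mount metrics on the mux it already owns.
func RegisterMetrics(mux *http.ServeMux, h http.Handler) {
	mux.Handle("/metrics", h)
}

// stubMetrics stands in for promhttp.Handler() so the sketch has no
// third-party dependencies.
func stubMetrics() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# stub metrics output")
	})
}

// newNodeMux imitates the node's shared mux: one ordinary endpoint
// (assumed here to be /status) plus the registered metrics handler.
func newNodeMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/status", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	RegisterMetrics(mux, stubMetrics())
	return mux
}

// statusOf sends an in-memory GET to the handler and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newNodeMux()
	fmt.Println("/metrics:", statusOf(mux, "/metrics"))
	fmt.Println("/status:", statusOf(mux, "/status"))
}
```

With this shape no second `http.Server` is created; the trade-off is that /metrics becomes reachable wherever the node's mux is exposed.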
FYI: This is what Tendermint was doing.
At first I liked the idea, but I just changed my mind,
because you want to be able to expose your RPC port publicly without also exposing your metrics publicly.
It makes sense to have two listeners: one on 0.0.0.0 and another on 127.0.0.1.
The alternative is to have a single listener, then either configure flags to disable certain endpoints, such as /metrics, or document how to use a firewall or an HTTP proxy for advanced usage patterns.
Need feedback: Do you know if it's common for validators to have a hybrid approach with metrics enabled, but only for localhost? I suspect they usually choose between one of two extremes: "minimal runtime with metrics disabled completely" or a "fully tooled setup with metrics enabled and verbose mode."
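A rough sketch of the two-listener layout, assuming a public RPC server and a loopback-only metrics server. `newListeners`, the `/status` route, and the RPC address `0.0.0.0:26657` are assumptions for illustration; only the metrics port `:26660` and the 5-second `ReadHeaderTimeout` come from this PR. The servers are built but not started, so the example probes their handlers in memory.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// newServer builds an http.Server with the same header timeout as the PR diff.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{Addr: addr, Handler: h, ReadHeaderTimeout: 5 * time.Second}
}

// newListeners returns two servers with separate muxes, so that /metrics is
// only routable on the metrics server. Hypothetical helper for illustration.
func newListeners(rpcAddr, metricsAddr string) (rpc, metrics *http.Server) {
	rpcMux := http.NewServeMux()
	rpcMux.HandleFunc("/status", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	metricsMux := http.NewServeMux()
	// Stand-in for promhttp.Handler(), to keep the sketch stdlib-only.
	metricsMux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# stub metrics output")
	})

	return newServer(rpcAddr, rpcMux), newServer(metricsAddr, metricsMux)
}

// statusOf sends an in-memory GET to the handler and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	rpc, metrics := newListeners("0.0.0.0:26657", "127.0.0.1:26660")
	fmt.Println(rpc.Addr, "GET /metrics ->", statusOf(rpc.Handler, "/metrics"))
	fmt.Println(metrics.Addr, "GET /metrics ->", statusOf(metrics.Handler, "/metrics"))
	// A real node would start each one, e.g. go rpc.ListenAndServe().
}
```

Replacing the loopback address with a VPN interface address would cover the "metrics on the VPN only" case without touching the RPC listener.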
Yes, it makes total sense. To give you an example, I could run the RPC on 0.0.0.0 for public access and the Prometheus endpoint on the VPN IP address.
Or they have their small Prometheus in send mode on the server, and read from it on localhost.
@zivkovicmilos rebased :) please @ajnavarro review
Could you add some tests please? Thanks.
This PR is stale because it has been open 3 months with no activity. Remove stale label or comment or this will be closed in 3 months.
This PR continues the work on #2408
Adding multiple features:
Add prefixes to all exported names. Since they all end up in Prometheus, it's a monitoring norm to have them all prefixed; it makes it simpler to create dashboards and check what's exposed.
I used tm2 and gno as prefixes depending on which one I believed each name was linked to. If I made a mistake, don't hesitate to tell me :)
The service instance was forced to gno-node-1. Since we now have multiple node instances, it's better to have the possibility to set it.
It brings back exposing a Prometheus endpoint on /metrics on port ":26660".
Contributors' checklist...
BREAKING CHANGE: xxx message was included in the description
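To illustrate the prefixing convention from the PR description, here is a small sketch. The `prefixed` helper and both metric names are hypothetical and are not taken from the PR's code; only the `tm2` and `gno` prefixes come from the description.

```go
package main

import "fmt"

// Prefixes named in the PR description.
const (
	prefixTM2 = "tm2"
	prefixGno = "gno"
)

// prefixed joins a prefix and a metric name with an underscore, which is the
// usual separator in Prometheus metric names. Hypothetical helper.
func prefixed(prefix, name string) string {
	return prefix + "_" + name
}

func main() {
	// Both metric names below are made up for the example.
	fmt.Println(prefixed(prefixTM2, "build_block_duration"))
	fmt.Println(prefixed(prefixGno, "vm_exec_count"))
}
```

With a shared prefix, a dashboard query can select everything a component exports by matching on the start of the metric name.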