chore: Telemetry updates, add prometheus listener #2413

albttx · 2024-06-21T09:09:33Z

This PR continues the work on #2408.

Adding multiple features:

Add prefixes to all exported names. Since they are all collected by Prometheus, it's a monitoring norm to prefix them; it makes it simpler to create dashboards and check what's exposed.
I used tm2 and gno as prefixes depending on what I believed each name was linked to. If I made a mistake, don't hesitate to tell me :)
The service instance was hardcoded to gno-node-1. Since we now have multiple node instances, it's better to be able to set it.
It brings back a Prometheus endpoint exposed at /metrics on port ":26660".
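
To illustrate the prefixing described above, here is a minimal sketch. The helper `metricName` and the base names `inbound_peers` and `vm_exec_msg` are hypothetical and chosen only for illustration; they are not taken from this PR's diff. Only the `tm2` and `gno` prefixes come from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// Prefixes mentioned in the PR description.
const (
	prefixTM2 = "tm2"
	prefixGno = "gno"
)

// metricName is a hypothetical helper that joins a subsystem prefix and a
// base name with an underscore. A leading underscore on the base name is
// trimmed so the result never contains a doubled separator.
func metricName(prefix, base string) string {
	return prefix + "_" + strings.TrimPrefix(base, "_")
}

func main() {
	// Base names below are made up for the example.
	fmt.Println(metricName(prefixTM2, "inbound_peers"))
	fmt.Println(metricName(prefixGno, "vm_exec_msg"))
}
```

With a shared prefix, every series from one subsystem can be selected in a dashboard by matching on the start of the metric name.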

Contributors' checklist...

Added new tests, or not needed, or not feasible
Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory
Updated the official documentation or not needed
No breaking changes were made, or a BREAKING CHANGE: xxx message was included in the description
Added references to related issues and PRs
Provided any useful hints for running manual tests
Added new benchmarks to generated graphs, if any. More info here.

codecov · 2024-06-21T09:22:19Z

Codecov Report

Attention: Patch coverage is 12.50000% with 28 lines in your changes missing coverage. Please review.

Project coverage is 54.71%. Comparing base (608ca30) to head (fd27778).
Report is 260 commits behind head on master.

Files with missing lines	Patch %	Lines
tm2/pkg/telemetry/metrics/metrics.go	0.00%	26 Missing ⚠️
tm2/pkg/telemetry/config/config.go	66.66%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2413      +/-   ##
==========================================
- Coverage   54.72%   54.71%   -0.01%     
==========================================
  Files         584      584              
  Lines       78531    78550      +19     
==========================================
+ Hits        42974    42982       +8     
- Misses      32348    32361      +13     
+ Partials     3209     3207       -2

Flag	Coverage Δ
contribs/gnodev	`23.81% <ø> (ø)`
contribs/gnofaucet	`15.31% <ø> (ø)`
contribs/gnokeykc	`0.00% <ø> (ø)`
contribs/gnomd	`0.00% <ø> (ø)`
gno.land	`62.68% <ø> (ø)`
gnovm	`59.97% <ø> (ø)`
misc/autocounterd	`0.00% <ø> (ø)`
misc/genproto	`0.00% <ø> (ø)`
misc/genstd	`73.90% <ø> (ø)`
misc/goscan	`0.00% <ø> (ø)`
misc/logos	`17.68% <ø> (+0.30%)`	⬆️
misc/loop	`0.00% <ø> (ø)`
tm2	`54.43% <12.50%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.


zivkovicmilos

I've left a few minor comments regarding the prometheus init, otherwise good to go 🙏

If you've cleared this with the standards warden @ajnavarro, I'm good with adding this feature 💯

tm2/pkg/telemetry/metrics/metrics.go

tm2/pkg/telemetry/config/config.go

tm2/pkg/telemetry/metrics/metrics.go

zivkovicmilos · 2024-06-24T11:12:32Z

tm2/pkg/telemetry/metrics/metrics.go

+			server := &http.Server{
+				Addr:              config.PrometheusAddr,
+				ReadHeaderTimeout: 5 * time.Second,
+			}
+			http.Handle("/metrics", promhttp.Handler())


I'm not sure we should be creating a new server and exposing the /metrics here

A better place would be the central one we actually use for all endpoints in tm2/pkg/bft/node -- you can just expose a func in telemetry to register the handle with the mux (node's mux)

FYI: This is what Tendermint was doing

At first I liked the idea, but I changed my mind.

You want to be able to expose your RPC port publicly without also exposing your metrics publicly.

It makes sense to have two listeners: one on 0.0.0.0 and another on 127.0.0.1.

The alternative is to have a single listener, then either configure flags to disable certain endpoints, such as /metrics, or document how to use a firewall or an HTTP proxy for advanced usage patterns.

Need feedback: Do you know if it's common for validators to have a hybrid approach with metrics enabled, but only for localhost? I suspect they usually choose between one of two extremes: "minimal runtime with metrics disabled completely" or a "fully tooled setup with metrics enabled and verbose mode."

Yes, it makes total sense. To give you an example, I could run the RPC on 0.0.0.0 for public access and the Prometheus endpoint on the VPN IP address.
Or they run a small Prometheus in send mode on the server and read from it on localhost.

albttx · 2024-07-02T14:00:15Z

@zivkovicmilos rebased :)

please @ajnavarro review

ajnavarro · 2024-07-03T16:00:41Z

Could you add some tests please? thanks.

github-actions · 2024-11-10T02:01:38Z

This PR is stale because it has been open 3 months with no activity. Remove stale label or comment or this will be closed in 3 months.

albttx self-assigned this Jun 21, 2024

albttx requested review from a team, jaekwon, moul, piux2 and zivkovicmilos as code owners June 21, 2024 09:09

github-actions bot added the 📦 🌐 tendermint v2 Issues or PRs tm2 related label Jun 21, 2024

albttx force-pushed the telemetry-prom branch from 76e5cfb to 3a91c30 Compare June 21, 2024 10:07

albttx requested a review from gfanton as a code owner June 21, 2024 10:10

albttx requested a review from ajnavarro June 21, 2024 10:19

albttx mentioned this pull request Jun 21, 2024

chore: tm2 telemetry add prefixes and service instance name #2408

Closed

7 tasks

zivkovicmilos requested changes Jun 24, 2024


albttx force-pushed the telemetry-prom branch from bcb5b9c to 3c2abe3 Compare June 24, 2024 19:05

Kouteki mentioned this pull request Jun 26, 2024

Minutes: Core Staff Weekly Syncs [every Monday] gnolang/meetings#36

Open

albttx added 11 commits July 2, 2024 14:48

chore(tm2/telemetry): add prefix to keys

b041cd0

chore: telemetry service instance

1136a5e

chore: add config test for valid hostname

8aaf04d

feat: add prometheus_laddr listener on :26660

65587ff

chore: gofmt

26412c8

fix: config.ValidateBasic

3d6f986

chore: better setup of providerOptions

0b67cc7

chore: make tidy

b595c98

chore: add ctx to prometheus goroutine

cf850ed

chore: move promexp in if

c20a123

chore: update dev-env for telemetry

d08244b

albttx force-pushed the telemetry-prom branch from 3c2abe3 to d08244b Compare July 2, 2024 13:55

chore: rebases fixes

fd27778

github-actions bot added Stale and removed Stale labels Nov 10, 2024
