Allow for multiple metrics endpoints #25

VladLazar · 2022-06-20T16:35:24Z

Previously, seastar collected all metrics (and metrics related metadata) in a single object (metrics::impl::impl). The consequence of this is that there was no way to have multiple metrics namespaces and expose each namespace through its own prometheus endpoint. If an application published a large volume of metrics, the user had to filter them by using by using the name request parameter of the prometheus endpoint. The downside of this approach is that the user must have knowledge of the metrics set and, even in that case, some queries cannot be expressed in terms of substring matching (i.e. the existing filtering implementation).

A more flexible approach is to allow applications to categorise metrics at the time of registering them and expose the different metrics set on different prometheus endpoints if required. This is the approach taken for this PR. The metrics registration api
now allows applications to specify the internal metrics implementation (think of it as a metrics bucket) via an integer handle. Internally, whenever a shard needs to handle a metric related operation for a handle it hasn't seen before, a new metrics::impl::impl object is created on the fly. This leads to a simple api, as applications don't need to pre-register metric implementation before adding groups.

Here is an example of how to add metric groups to a specific "bucket" and expose each "bucket" on a prometheus endpoint.
Metrics in the foo group will be exposed on the default metric prometheus endpoint, while the baz group will be exposed
on the extra_metrics endpoint.

auto extra_metrics_handle = 100;

ss::metrics::metric_groups metrics;
metrics.add_group(
  "foo", {sm::make_gauge("bar", [] { return 1; }, sm::description("foobar metric")}); 
  
ss::metrics::metric_groups metrics2(extra_metrics_handle);
metrics2.add_group(
  "baz", {sm::make_gauge("daz", [] { return 1; }, sm::description("bazdaz metric")});

...
ss::prometheus::config default;
default.prefix = "default metrics bucket";

ss::prometheus::config extra;
extra.prefix = "extra metrics bucket";
extra.handle = extra_metrics_handle;
extra.route = "metrics_v100";

seastar::prometheus::add_prometheus_routes(your_server, default).get();
seastar::prometheus::add_prometheus_routes(your_server, extra).get();

NB: Previous PR here

force push:

Use a std::unordered_map for storing the metric implementations. This allows for the removal of
the handle from the implementation object itself.

force push:

reduced number of lookups in impl map by using try_emplace
added a default_handle function to the public metrics interface
fixed comment typos in prometheus.cc
removed subsequent calls to get_local_impl

force push:

Moved the handle to the metric_groups level. Previously, each group within a metric_groups could be
associated with its own metrics implementation. It makes more sense for all the groups within a metric_groups
object to push metrics into the same internal object.

force push:

explicit constructors

force push:

Use seastar::metrics::default_handle instead of seastar::metrics::impl::default_handle
Explicitly call metric_groups constructor where it was called implicitly

force push:

Rebased on top of v22.2.x and fixed merge conflicts. This mostly consisted of moving the handle
argument, that was added to much of the metrics interface, behind the skip_when_empty argument.

BenPope

Very nice

src/core/metrics.cc

include/seastar/core/metrics_api.hh

src/core/metrics.cc

include/seastar/core/metrics.hh

include/seastar/core/prometheus.hh

BenPope · 2022-06-23T10:38:19Z

src/core/reactor.cc

+thread_local metrics::impl::metric_implementations metric_impls;
+
+metrics::impl::metric_implementations& metrics::impl::get_metric_implementations() {
+    return metric_impls;
+}
+


I don't understand why a static thread_local has initialisation order problems.

You mean you don't understand why it's declared here, or you don't understand why it's not static?

It is declared here because metric_impls needs to be alive during the reactor (declared here) cleanup as metrics are de-registered at that point. thread_local destructors are called in reverse order of initialisation, so metric_impls needs to initialise ahead of the reactor_holder.

It's not static as metric_impls needs external linkage.

I mean I don't know why its previous location as a static local had a problem.

But if it's an uninitialization order fiasco, then ok.

include/seastar/core/metrics_api.hh

src/core/metrics.cc

include/seastar/core/metrics_registration.hh

BenPope

LGTM

Would be nice to have another reviewer, and maybe send it to the mailing list.

VladLazar · 2022-06-24T11:20:06Z

Would be nice to have another reviewer, and maybe send it to the mailing list.

I've submitted it to the mailing list: https://groups.google.com/g/seastar-dev/c/xtZnIU14u5c.

Lazin · 2022-06-27T12:28:23Z

include/seastar/core/metrics_api.hh

@@ -358,7 +363,7 @@ using values_reference = shared_ptr<values_copy>;

 foreign_ptr<values_reference> get_values();

-shared_ptr<impl> get_local_impl();
+shared_ptr<impl> get_local_impl(int handle = default_handle());


The handle can be a strong type instead of int

Does seastar have any utility for named types? I went looking for one, but didn't find anything.

Prior to this patch seastar only exposes one global metrics::impl::impl object which holds all metric related data for one application. This patch changes the implementation details such that multiple metrics::impl::impl objects can exist for any given application. Said objects are stored into a map on each shard and created dinamically whenever requested. A metrics::impl::impl is identified by an integer handle that acts as the key for the storage map. Implementation note: in order to avoid issues caused by the ordering of static thread_local objects I had to declare the storage in reactor.cc. (cherry picked from commit 585a8af)

This patch extends the metrics internal apis to use a specific metrics::impl::impl object identified by its integer handle. (cherry picked from commit 6ee4af7)

This patch extends the metrics user facing apis to use a specific metrics::impl::impl object identified by its integer handle. Note that the constructor of 'metric_groups' is marked explicit in this patch and updates two call sites where the constructor was used implicitly.

This patch removes two subsequent calls to `get_local_impl` and reuses the returned handle in that scope.

This patch extends the user facing prometheus apis allowing the user to specify the internal metrics implementation to be used through a handle. Additionally, 'add_prometheus_routes' now takes an argument that specifies the route on which to advertise the metrics. This enables different metrics "namespaces" to be served by different endpoints in isolation. (cherry picked from commit 6189522)

This patch extends the scollectd apis with the ability to select the internal metrics implementation to be used by providing a handle. (cherry picked from commit d4331d1)

VladLazar mentioned this pull request Jun 20, 2022

Minimal primary metrics endpoint redpanda-data/redpanda#5165

Merged

BenPope requested review from BenPope, dotnwat and jcsp June 20, 2022 18:14

BenPope reviewed Jun 20, 2022

View reviewed changes

src/core/metrics.cc Outdated Show resolved Hide resolved

VladLazar force-pushed the multiple-metrics-endpoints branch from db7d98a to ccbdea2 Compare June 21, 2022 13:14

BenPope reviewed Jun 22, 2022

View reviewed changes

VladLazar force-pushed the multiple-metrics-endpoints branch 2 times, most recently from 0298ea2 to 020c9ba Compare June 22, 2022 17:39

VladLazar requested a review from BenPope June 23, 2022 09:39

BenPope reviewed Jun 23, 2022

View reviewed changes

include/seastar/core/metrics_registration.hh Outdated Show resolved Hide resolved

include/seastar/core/metrics_registration.hh Outdated Show resolved Hide resolved

VladLazar force-pushed the multiple-metrics-endpoints branch from 020c9ba to ae95909 Compare June 23, 2022 12:14

BenPope requested a review from mmaslankaprv June 23, 2022 12:47

VladLazar force-pushed the multiple-metrics-endpoints branch from ae95909 to d91b995 Compare June 23, 2022 13:07

BenPope approved these changes Jun 24, 2022

View reviewed changes

Lazin reviewed Jun 27, 2022

View reviewed changes

Vlad Lazar added 6 commits June 27, 2022 17:08

metrics: expose metric impl handle to internal api

180e72f

This patch extends the metrics internal apis to use a specific metrics::impl::impl object identified by its integer handle. (cherry picked from commit 6ee4af7)

metrics: Use handle for impl object

76f2aa9

This patch removes two subsequent calls to `get_local_impl` and reuses the returned handle in that scope.

scollectd: select internal metrics implementation

91129e2

This patch extends the scollectd apis with the ability to select the internal metrics implementation to be used by providing a handle. (cherry picked from commit d4331d1)

VladLazar force-pushed the multiple-metrics-endpoints branch from d91b995 to 91129e2 Compare June 27, 2022 16:43

VladLazar merged commit 16d4456 into redpanda-data:v22.2.x Jun 27, 2022

VladLazar mentioned this pull request Jul 8, 2022

Metric replication accross internal implementations #27

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow for multiple metrics endpoints #25

Allow for multiple metrics endpoints #25

VladLazar commented Jun 20, 2022 •

edited

Loading

BenPope left a comment

BenPope Jun 23, 2022

VladLazar Jun 23, 2022

BenPope Jun 23, 2022

BenPope left a comment •

edited

Loading

VladLazar commented Jun 24, 2022

Lazin Jun 27, 2022

VladLazar Jun 27, 2022

Allow for multiple metrics endpoints #25

Allow for multiple metrics endpoints #25

Conversation

VladLazar commented Jun 20, 2022 • edited Loading

BenPope left a comment

Choose a reason for hiding this comment

BenPope Jun 23, 2022

Choose a reason for hiding this comment

VladLazar Jun 23, 2022

Choose a reason for hiding this comment

BenPope Jun 23, 2022

Choose a reason for hiding this comment

BenPope left a comment • edited Loading

Choose a reason for hiding this comment

VladLazar commented Jun 24, 2022

Lazin Jun 27, 2022

Choose a reason for hiding this comment

VladLazar Jun 27, 2022

Choose a reason for hiding this comment

VladLazar commented Jun 20, 2022 •

edited

Loading

BenPope left a comment •

edited

Loading