Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for multiple metrics endpoints #25

Merged
merged 6 commits into from
Jun 27, 2022

Conversation

VladLazar
Copy link

@VladLazar VladLazar commented Jun 20, 2022

Previously, seastar collected all metrics (and metrics related metadata) in a single object (metrics::impl::impl). The consequence of this is that there was no way to have multiple metrics namespaces and expose each namespace through its own prometheus endpoint. If an application published a large volume of metrics, the user had to filter them by using by using the name request parameter of the prometheus endpoint. The downside of this approach is that the user must have knowledge of the metrics set and, even in that case, some queries cannot be expressed in terms of substring matching (i.e. the existing filtering implementation).

A more flexible approach is to allow applications to categorise metrics at the time of registering them and expose the different metrics set on different prometheus endpoints if required. This is the approach taken for this PR. The metrics registration api
now allows applications to specify the internal metrics implementation (think of it as a metrics bucket) via an integer handle. Internally, whenever a shard needs to handle a metric related operation for a handle it hasn't seen before, a new metrics::impl::impl object is created on the fly. This leads to a simple api, as applications don't need to pre-register metric implementation before adding groups.

Here is an example of how to add metric groups to a specific "bucket" and expose each "bucket" on a prometheus endpoint.
Metrics in the foo group will be exposed on the default metric prometheus endpoint, while the baz group will be exposed
on the extra_metrics endpoint.

auto extra_metrics_handle = 100;

ss::metrics::metric_groups metrics;
metrics.add_group(
  "foo", {sm::make_gauge("bar", [] { return 1; }, sm::description("foobar metric")}); 
  
ss::metrics::metric_groups metrics2(extra_metrics_handle);
metrics2.add_group(
  "baz", {sm::make_gauge("daz", [] { return 1; }, sm::description("bazdaz metric")});

...
ss::prometheus::config default;
default.prefix = "default metrics bucket";

ss::prometheus::config extra;
extra.prefix = "extra metrics bucket";
extra.handle = extra_metrics_handle;
extra.route = "metrics_v100";

seastar::prometheus::add_prometheus_routes(your_server, default).get();
seastar::prometheus::add_prometheus_routes(your_server, extra).get();

NB: Previous PR here

force push:

  • Use a std::unordered_map for storing the metric implementations. This allows for the removal of
    the handle from the implementation object itself.

force push:

  • reduced number of lookups in impl map by using try_emplace
  • added a default_handle function to the public metrics interface
  • fixed comment typos in prometheus.cc
  • removed subsequent calls to get_local_impl

force push:

  • Moved the handle to the metric_groups level. Previously, each group within a metric_groups could be
    associated with its own metrics implementation. It makes more sense for all the groups within a metric_groups
    object to push metrics into the same internal object.

force push:

  • explicit constructors

force push:

  • Use seastar::metrics::default_handle instead of seastar::metrics::impl::default_handle
  • Explicitly call metric_groups constructor where it was called implicitly

force push:

  • Rebased on top of v22.2.x and fixed merge conflicts. This mostly consisted of moving the handle
    argument, that was added to much of the metrics interface, behind the skip_when_empty argument.

Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice

src/core/metrics.cc Outdated Show resolved Hide resolved
src/core/metrics.cc Outdated Show resolved Hide resolved
include/seastar/core/metrics_api.hh Outdated Show resolved Hide resolved
src/core/metrics.cc Outdated Show resolved Hide resolved
include/seastar/core/metrics.hh Outdated Show resolved Hide resolved
include/seastar/core/prometheus.hh Outdated Show resolved Hide resolved
@VladLazar VladLazar force-pushed the multiple-metrics-endpoints branch 2 times, most recently from 0298ea2 to 020c9ba Compare June 22, 2022 17:39
@VladLazar VladLazar requested a review from BenPope June 23, 2022 09:39
Comment on lines +3547 to +3552
thread_local metrics::impl::metric_implementations metric_impls;

metrics::impl::metric_implementations& metrics::impl::get_metric_implementations() {
return metric_impls;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why a static thread_local has initialisation order problems.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean you don't understand why it's declared here, or you don't understand why it's not static?

It is declared here because metric_impls needs to be alive during the reactor (declared here) cleanup as metrics are de-registered at that point. thread_local destructors are called in reverse order of initialisation, so metric_impls needs to initialise ahead of the reactor_holder.

It's not static as metric_impls needs external linkage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I don't know why its previous location as a static local had a problem.

But if it's an uninitialization order fiasco, then ok.

include/seastar/core/metrics_api.hh Outdated Show resolved Hide resolved
src/core/metrics.cc Show resolved Hide resolved
Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Would be nice to have another reviewer, and maybe send it to the mailing list.

@VladLazar
Copy link
Author

Would be nice to have another reviewer, and maybe send it to the mailing list.

I've submitted it to the mailing list: https://groups.google.com/g/seastar-dev/c/xtZnIU14u5c.

@@ -358,7 +363,7 @@ using values_reference = shared_ptr<values_copy>;

foreign_ptr<values_reference> get_values();

shared_ptr<impl> get_local_impl();
shared_ptr<impl> get_local_impl(int handle = default_handle());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handle can be a strong type instead of int

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does seastar have any utility for named types? I went looking for one, but didn't find anything.

Vlad Lazar added 6 commits June 27, 2022 17:08
Prior to this patch seastar only exposes one global metrics::impl::impl
object which holds all metric related data for one application.

This patch changes the implementation details such that multiple
metrics::impl::impl objects can exist for any given application.
Said objects are stored into a map on each shard and created
dinamically whenever requested. A metrics::impl::impl is identified
by an integer handle that acts as the key for the storage map.

Implementation note: in order to avoid issues caused by the ordering
of static thread_local objects I had to declare the storage in
reactor.cc.

(cherry picked from commit 585a8af)
This patch extends the metrics internal apis to use a specific
metrics::impl::impl object identified by its integer handle.

(cherry picked from commit 6ee4af7)
This patch extends the metrics user facing apis to use a specific
metrics::impl::impl object identified by its integer handle.

Note that the constructor of 'metric_groups' is marked explicit
in this patch and updates two call sites where the constructor was used
implicitly.
This patch removes two subsequent calls to `get_local_impl` and reuses
the returned handle in that scope.
This patch extends the user facing prometheus apis allowing the user to
specify the internal metrics implementation to be used through a handle.
Additionally, 'add_prometheus_routes' now takes an argument that
specifies the route on which to advertise the metrics. This enables
different metrics "namespaces" to be served by different endpoints in
isolation.

(cherry picked from commit 6189522)
This patch extends the scollectd apis with the ability to select the
internal metrics implementation to be used by providing a handle.

(cherry picked from commit d4331d1)
@VladLazar VladLazar merged commit 16d4456 into redpanda-data:v22.2.x Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants