
Consider exporting unaggregated per-pid metrics by default #107

Closed
Sinjo opened this issue May 1, 2019 · 8 comments · Fixed by #131
Sinjo commented May 1, 2019

We've had a couple of comments on this point, and it's a fair one. The argument that there's "no obvious right way to aggregate gauges in all cases" is solid.

We need to decide on a stance here, both for gauges specifically and for metrics in general (i.e. should gauges be a special case that we don't aggregate by default?).

This issue should be renamed and updated once we decide that stance.

Sinjo commented May 31, 2019

Right, I've been digging through all the relevant code and doing some thinking on which approach to take.

Looking at the aggregations supported by the Python client (scroll down to 'Three: Instrumentation'), I think we should start by supporting the 'all' mode. The 'liveall' and 'livesum' modes sound interesting and potentially useful, but I'm confident we can add them later with no breaking changes (i.e. I'm not making them part of the path to 1.0).
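To make the modes concrete, here's a rough plain-Ruby sketch of the semantics (the hash layouts are illustrative, not DirectFileStore's actual storage format):

```ruby
# Sketch of how different aggregation modes combine the values reported
# by each process. Keys of the inner hashes are label sets; an empty
# hash means "no extra labels".
def aggregate(per_pid_values, mode)
  case mode
  when :sum
    # Collapse all processes into a single series
    { {} => per_pid_values.values.sum }
  when :max
    { {} => per_pid_values.values.max }
  when :all
    # Keep one series per process, tagged with a "pid" label
    per_pid_values.map { |pid, value| [{ "pid" => pid.to_s }, value] }.to_h
  end
end

values = { 101 => 3.0, 102 => 5.0 }
aggregate(values, :sum) # => { {} => 8.0 }
aggregate(values, :all) # => { { "pid" => "101" } => 3.0, { "pid" => "102" } => 5.0 }
```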

My plan is to:

  • Add :all as a new aggregation mode in DirectFileStore. When specified, it will add a "pid" label to the metric and populate it with the process ID that exported that value.
  • Make :all the default aggregation mode for gauges only.
  • Make "pid" a reserved label for all metric types. While it will only be the default for gauges, my planned approach and current implementation (Support :all as an aggregation mode in DirectFileStore #127) support :all for any type of metric, and I don't want to overwrite people's label values silently. This will mean changing the subclasses of Prometheus::Client::Metric to append their reserved labels to those from the base class, rather than overriding them completely.
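As a sketch of that last bullet (simplified, not the gem's real code — only the class names and the :le label for histograms mirror the gem):

```ruby
# Subclasses append to the base class's reserved labels via `super`
# instead of overriding them wholesale.
class Metric
  def reserved_labels
    [:pid] # reserved for every metric type once :all aggregation exists
  end
end

class Histogram < Metric
  def reserved_labels
    # A plain override (`[:le]`) would silently drop :pid;
    # appending keeps both reserved.
    super + [:le]
  end
end

Histogram.new.reserved_labels # => [:pid, :le]
```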

I'm happy to discuss any of those choices and change tack if there's a good reason to.

@lawrencejones commented

Fwiw, providing a pid label has been the only legitimate way I've found to use gauges with the multi-process library. Reckon it's the only semantically valid approach!

SuperQ commented Jun 1, 2019

We use a worker label and assign the unicorn worker ID to our per-process metrics. This avoids some metric name churn on worker restarts.

Sinjo commented Jun 11, 2019

@SuperQ That does sound way better! Out of curiosity, do you grab the worker number in after_fork? Just been taking a quick look around the docs and I see that method passes the worker object into the block, which has a nr attribute, which I'm guessing is the worker number?

I don't think we can do that for people in the gem as it would require us to hook into Unicorn (and Puma?) somehow. We could support changing the default aggregation for a given metric type and adding some docs for Unicorn/Puma users.

If I've missed an approach that would be better, please shout!
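For reference, my reading of the Unicorn docs suggests the worker-number approach would look roughly like this (a sketch — only after_fork and worker.nr are Unicorn's API; the global variable and label wiring are illustrative):

```ruby
# config/unicorn.rb (sketch)
after_fork do |server, worker|
  # worker.nr is the worker number (0, 1, 2, ...). It is reused when a
  # worker restarts, unlike the pid, so series names don't churn.
  $worker_id = worker.nr.to_s

  # When registering per-process metrics here, use $worker_id as the
  # value of a "worker" label instead of (or alongside) the pid.
end
```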

SuperQ commented Jun 13, 2019

Sinjo commented Jun 13, 2019

Status update: the first and third bullet points from this comment are done. I'm going to try and get the second one done tomorrow, but my day is pretty packed so it might be next week now.

@SuperQ Thanks for sharing those links! I'm going to have a think about whether there's an approach we can reasonably give people for free in this gem. The constraints I'm trying to make it fit into are:

  • We should support any multi-process server, not just Unicorn (i.e. we should have the generic pid-based version)
  • We should make the behaviour as clear as possible to users (i.e. we should weigh up always using pid as the label vs changing to a different label when we're in a Unicorn process)
  • We shouldn't enable any new automatic behaviour after 1.0 without going to 2.0 (i.e. if we want to automatically detect we're in a Unicorn process and export a worker_id label instead of pid, I think that's a breaking change as it would mess up people's saved queries)

@brian-brazil commented

I don't think it's wise to ever change the label for the pid/worker, as not only would you have to change your own code that happens to conflict but also that of every library that you use. Python always uses pid, it may be best to follow that.

Sinjo commented Jun 14, 2019

Yeah that makes total sense. Globally reserving more and more label names feels a tad ridiculous. Let's always use pid.

In which case, adding the Unicorn worker ID support later isn't a problem. I'm going to put together a PR that makes unaggregated per-process gauges the default, and once that's merged this issue is done.
