
Metric (observer) collections should be independent from metric export #1432

Open
Oberon00 opened this issue Feb 11, 2021 · 9 comments
Labels
area:sdk Related to the SDK release:after-ga Not required before GA release, and not going to work on before GA spec:metrics Related to the specification/metrics directory triage:deciding:tc-inbox

Comments

@Oberon00
Member

Oberon00 commented Feb 11, 2021

(the metric spec is still very unfinished, so this might already be included in one of the more general metric spec issues)

Consider this use case:

  • I want to collect JVM memory metrics (e.g. currently committed memory bytes) every 10 seconds
  • I want to export this (and all other metrics) aggregated (min/max/avg) every six collections, i.e. every minute.
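This use case can be sketched in plain Python (hypothetical helper names, not the OpenTelemetry SDK API): buffer each collection and compute the aggregate only at export time, so the two intervals stay independent.

```python
# Sketch only: decoupling a 10 s collection interval from a 60 s export
# interval by buffering samples and exporting a min/max/avg aggregate of the
# buffered collections. The class name is invented for illustration.

class MinMaxAvgAggregator:
    """Accumulates point-in-time observations between exports."""

    def __init__(self):
        self._samples = []

    def collect(self, value):
        # Called on every collection tick (e.g. every 10 seconds).
        self._samples.append(value)

    def export(self):
        # Called on every export tick (e.g. every 60 seconds, i.e. after six
        # collections); drains the buffer and returns the aggregate.
        samples, self._samples = self._samples, []
        if not samples:
            return None
        return {
            "min": min(samples),
            "max": max(samples),
            "avg": sum(samples) / len(samples),
        }

agg = MinMaxAvgAggregator()
for committed_bytes in [100, 120, 110, 130, 90, 150]:  # six 10 s collections
    agg.collect(committed_bytes)
print(agg.export())  # one 60 s export covering all six collections
```

The point of the split is that `collect` and `export` are driven by two different timers, rather than one combined "collect + export" step.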

I think the spec currently assumes (or assumed before it was deleted) that metric collection and export happens in the same step, but this seems overly restrictive.

What is more, it would make sense to let each observer instrument potentially run on its own schedule. For example, some metrics benefit more from finer granularity than others; some are expensive to retrieve, others cheap.

(This paragraph might warrant its own issue:) A good implementation would also ensure that collections whose intervals are a proper fraction or multiple of the export interval do not nondeterministically drift between exports. For example, in the JVM use case above, an implementation that does no synchronization between export and collection has a race between the export and every sixth collection, both of which are supposed to happen at T=60s. Such an implementation would sometimes export an aggregate of only 5 collections, and the next time one of up to 7. This is undesirable and should be avoidable without much performance overhead.
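The alignment concern can be illustrated with a small simulation (plain Python, no SDK types; an assumed scheduling rule, not spec text): if a collection tick that coincides with an export tick is always processed first, every 60-second export covers exactly six 10-second collections.

```python
# Deterministic tie-breaking sketch: events at the same timestamp are ordered
# by a priority field so that collection (0) always precedes export (1).

def simulate(collection_interval, export_interval, horizon):
    events = []
    for t in range(collection_interval, horizon + 1, collection_interval):
        events.append((t, 0, "collect"))  # priority 0: before any export at t
    for t in range(export_interval, horizon + 1, export_interval):
        events.append((t, 1, "export"))   # priority 1: after collection at t
    events.sort()

    collections_per_export = []
    pending = 0
    for _, _, kind in events:
        if kind == "collect":
            pending += 1
        else:
            collections_per_export.append(pending)
            pending = 0
    return collections_per_export

print(simulate(10, 60, 120))  # [6, 6]: never 5, never 7
```

Without the tie-break (i.e. with two unsynchronized timers), the count per export would depend on thread scheduling, which is exactly the race described above.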

@Oberon00 Oberon00 added area:sdk Related to the SDK spec:metrics Related to the specification/metrics directory labels Feb 11, 2021
@bogdandrutu
Member

@Oberon00 who is "I" in this example: the "instrumentation author" or the "application owner"?

@Oberon00
Member Author

@bogdandrutu very good question, actually 🤔 The two "I"s might not even be the same person. For the collection interval, we are interested in:

  1. How much does this metric actually fluctuate (CPU usage vs. number of loaded classes)? Usually the instrumentation author has a good grasp of this, but the application owner might want to influence it, e.g. to debug specific problems.
  2. How expensive is the metric to collect? Usually the instrumentation author has a good estimate here too, but it may depend on the deployment situation, etc. (e.g. if we are deployed to a machine with 256 cores, collecting CPU usage might become unexpectedly expensive).
  3. How interesting is this metric for the application owner's use case?

For the export interval, the instrumentation author has no clue. I think this is more for the application owner to decide (a typical environment variable setting). The backend's current/usual load could also be a factor here, since you can't export so often that you overwhelm it.

The SDK, which knows all registered observers, could also compute a recommended minimum export interval as the greatest common divisor of all collection intervals. However, that would often be 1 if we do not provide some guidance on selecting the intervals.

One way might be to allow instrumentation authors only coarse control, like HIGH_FREQUENCY, FREQUENT, NORMAL, INFREQUENT, SEMI_STATIC.
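These two ideas can be sketched together. The frequency class names come from the comment above; the concrete second values per class, and the function name, are assumptions for illustration only.

```python
# Illustrative sketch: map coarse frequency classes to concrete collection
# intervals (values are assumed, not spec), then let the SDK derive a
# recommended minimum export interval as the GCD of the collection intervals
# of all registered observers.
from functools import reduce
from math import gcd

FREQUENCY_CLASS_SECONDS = {
    "HIGH_FREQUENCY": 1,
    "FREQUENT": 10,
    "NORMAL": 60,
    "INFREQUENT": 300,
    "SEMI_STATIC": 3600,
}

def recommended_export_interval(registered_classes):
    intervals = [FREQUENCY_CLASS_SECONDS[c] for c in registered_classes]
    return reduce(gcd, intervals)

print(recommended_export_interval(["FREQUENT", "NORMAL", "INFREQUENT"]))  # 10

# With unconstrained per-instrument intervals (e.g. 7 s and 13 s) the GCD
# degrades to 1, which is why guidance on selecting intervals matters:
print(reduce(gcd, [7, 13]))  # 1
```

Restricting authors to classes whose intervals divide each other keeps the GCD, and therefore the recommended export interval, usefully large.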

@victlu
Contributor

victlu commented Feb 17, 2021

I think we have/want the ability to create multiple instances of a "pipeline" (aka SDK instance), so that each pipeline can be configured differently. This includes setting the push intervals (currently SetPushIntervals is one setting for both collection and export), aggregators, exporters, etc. We should also allow pipelines to filter (include/exclude) specific metrics.

I think this issue may be related to #1437 as well. We need the ability to let each library set its "scope" so that it can be "attached" to an SDK / pipeline instance.

@Oberon00
Member Author

I think we have/want the ability to create multiple instances of a "pipeline" (aka SDK instance), so that each pipeline can be configured differently

This is something different from this issue. Here I propose to decouple the first pipeline stage (collection) from the second (export).

@jmacd
Contributor

jmacd commented Feb 24, 2021

I know that @bogdandrutu himself has called for making the collection interval variable on a per-instrument basis. As you say @Oberon00, I believe this is about configuration in the "Accumulator", which flushes synchronous and asynchronous data into a processor and then exporter. I think you'll find 100% agreement from the group, it's just more work to spec out how this configuration should be done.

@Oberon00
Member Author

I agree that it is more spec work, that's why I also filed #1433 😃

@reyang
Member

reyang commented Sep 3, 2021

We've discussed this during the 9/2/2021 Metrics SIG meeting and decided to move this out of the initial metrics spec release scope.
This can be supported with #1888 (comment).

@pellared
Member

pellared commented Aug 2, 2023

Some thoughts...

Having "collect + export" done at the same time covers most of the use cases and should be the behavior of a periodic exporting reader. I would avoid adding the decoupled behavior described in this issue to the periodic exporting reader.

Could such functionality be provided via another reader implementation?
Maybe the SDK could provide an additional reader implementation to handle this use case.
If so, do we need to specify it?
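One shape such an additional reader could take (entirely hypothetical; the class and method names are invented and the real SDK MetricReader API differs): a reader driven by two independent timers, modeled here as explicit tick methods to keep the sketch self-contained.

```python
class DecoupledReader:
    """Hypothetical reader whose collection and export run on separate timers.

    Timer callbacks are modeled as explicit tick methods so the sketch stays
    runnable; a real implementation would schedule them on background threads.
    """

    def __init__(self, observe, export):
        self._observe = observe  # callable returning the current observation
        self._export = export    # callable receiving a batch of observations
        self._buffer = []

    def on_collect_tick(self):
        # Driven by the faster timer (e.g. every 10 s).
        self._buffer.append(self._observe())

    def on_export_tick(self):
        # Driven by the slower timer (e.g. every 60 s); drains the buffer.
        batch, self._buffer = self._buffer, []
        self._export(batch)


exported = []
values = iter([100, 120])  # stand-in for successive JVM memory observations
reader = DecoupledReader(lambda: next(values), exported.append)
reader.on_collect_tick()   # t = 10 s
reader.on_collect_tick()   # t = 20 s
reader.on_export_tick()    # t = 60 s (after six ticks in the real schedule)
print(exported)  # [[100, 120]]
```

Keeping this behavior out of the periodic exporting reader, as suggested above, would leave the common "collect + export in one step" path untouched.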

@Leonardo-Ferreira

Having "collect + export" done at the same time covers most of the use cases and should be the behavior of a periodic exporting reader.

I strongly disagree here. As an app owner, responsible for several critical applications subjected to request bursts, I can tell you that collecting CPU/MEM once a minute is not good enough for any of my scenarios.

For new applications, or applications whose functionality changed recently, once-a-minute collection is too little because A LOT can happen in that 1 min. E.g.: my app reports ZERO HTTP queue length, but at the WAF I see several 503s because a burst of requests came in at a very unfortunate time and the APIM has a hard timeout of 10 seconds.

For old applications, where nothing has changed in the past 2 years or more and everything else is stable, once a minute is basically a waste; you could capture 99% of the metrics once every 10 minutes or more without any actual impact.

In my point of view, the person who says "this one-size-fits-all metrics collection every X amount of time is OK for all my applications" is not really hands-on involved in any kind of critical/relevant service. I've seen this behavior again and again in IT managers who will never have to actually troubleshoot anything themselves.
