
Delta to Cumulative Processor (metrics) #29300

Closed
0x006EA1E5 opened this issue Nov 16, 2023 · 20 comments
Labels: Accepted Component (New component has been sponsored)


@0x006EA1E5 commented Nov 16, 2023

The purpose and use-cases of the new component

Convert metric data from monotonic delta to monotonic cumulative.

We can currently convert from cumulative to delta, but not delta to cumulative.

One concrete use case is metrics produced by the count connector, which are deltas (in fact, a simple stream of monotonic delta 1s, with no start_time_unix_nano to mark the period start; this design decision appears to have been made so that the count connector can be stateless).

Metrics produced by the count connector cannot be exported correctly via the Prometheus exporter due to the missing start_time_unix_nano; the Prometheus exporter appears to treat such data points as breaks in the sequence.

Metrics produced by the count connector are also not suitable for export with the Prometheus remote write exporter. Instead, we should be sending the Prometheus remote write exporter periodic aggregates, as we would if the metrics had been scraped (for example, every 30 seconds).

Ideally, we should be able to receive the stream of delta data points from the count connector (with missing start_time_unix_nano), and periodically emit a cumulative datapoint, suitable for reception by the Prometheus remote write exporter.

Note that, due to the monotonic nature of the metrics, users should be aware that in a load-balanced configuration different instances will maintain different cumulative totals and will therefore likely send incorrect, non-monotonic data downstream, unless care is taken to ensure that only one instance is ever responsible for a given metric (for example, by adding an identifying attribute such as the collector's unique instance id).

This processor will need to be stateful to maintain the cumulative total by metric.

The use case outlined above could be addressed in a single processor, which maintains the cumulative sum and periodically emits this value.

Alternatively, this could be implemented as two processors: a simple delta-to-cumulative processor which emits a cumulative sum for each delta data point, and a periodic aggregator which could also be used for deltas.

This issue proposes the creation of a simple delta-to-cumulative processor which emits a cumulative sum for each delta data point.
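
To make the idea concrete, here is a minimal Go sketch of that accumulation step, assuming a plain string stands in for the metric identity and ignoring the collector's pdata API entirely; names like deltaToCumulative and consume are illustrative only, not a proposed implementation:

```go
package main

import "fmt"

// identity is a hypothetical stand-in for whatever uniquely identifies a
// time series (metric name, resource, scope, attributes).
type identity string

// deltaToCumulative keeps one running total per time series and rewrites
// each incoming delta value into the corresponding cumulative value.
type deltaToCumulative struct {
	totals map[identity]float64
}

func newDeltaToCumulative() *deltaToCumulative {
	return &deltaToCumulative{totals: make(map[identity]float64)}
}

// consume adds a delta data point's value to the running total and returns
// the cumulative value that would be emitted downstream in its place.
func (p *deltaToCumulative) consume(id identity, delta float64) float64 {
	p.totals[id] += delta
	return p.totals[id]
}

func main() {
	p := newDeltaToCumulative()
	// A stream of monotonic delta 1s, as produced by the count connector
	// (the identity string here is purely illustrative).
	for i := 0; i < 3; i++ {
		fmt.Println(p.consume("log.record.count|service=checkout", 1)) // 1, 2, 3
	}
}
```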

Another issue has been created for the periodic aggregation: #29461

Example configuration for the component

  • Metric names to match, similar to the cumulative-to-delta processor (see the config sketch below)
  • Publish interval
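
Purely as an illustration of those two options, a config sketch in the usual collector style might look like this (the package, field names and mapstructure keys are all assumptions, not a settled schema):

```go
package deltatocumulativeprocessor // hypothetical package name

import "time"

// Config is an illustrative sketch only; the real schema would be agreed
// during implementation.
type Config struct {
	// Include lists the metric names (or patterns) to convert, similar in
	// spirit to the cumulativetodelta processor's matching options.
	Include []string `mapstructure:"include"`

	// PublishInterval is how often the periodic-aggregation step would emit
	// the latest cumulative value (relevant only if the second, periodic
	// processor is used).
	PublishInterval time.Duration `mapstructure:"publish_interval"`
}
```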

Telemetry data types supported

Metric

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

No response

Sponsor (optional)

@djaglowski

Additional context

No response

0x006EA1E5 added the needs triage and Sponsor Needed labels on Nov 16, 2023
@djaglowski (Member)

I'm sponsoring this component.

A few initial thoughts/questions on the design:

Convert metric data from monotonic delta to monotonic cumulative.

I think we could also support nonmonotonic delta to nonmonotonic cumulative without any substantial changes. Or am I missing something?

Alternatively, this could be implemented in two processors, a simple delta to cumulative which emits a cumulative sum for each delta datapoint, and a periodic aggregator which could also be used for deltas

This seems likely to be problematic, since every data point in a time series would be inaccurate in between the two processors. e.g. delta timeseries with values 3, 2, 1 becomes cumulative time series 3, 2, 1 temporarily and then properly 3, 5, 6 only after the second processor.


One note on the broader use case for this - the metrics data model specifically describes this functionality and rationale:
Delta-to-Cumulative: Metrics that are input and output with Delta temporality unburden the client from keeping high-cardinality state. The use of deltas allows downstream services to bear the cost of conversion into cumulative timeseries...

Since our data model is explicitly designed for this use case, I think it is important that the Collector provides a solution for it, with sensible limitations.

djaglowski added the Accepted Component label and removed the Sponsor Needed label on Nov 16, 2023
@0x006EA1E5 (Author)

This seems likely to be problematic, since every data point in a time series would be inaccurate in between the two processors. e.g. delta timeseries with values 3, 2, 1 becomes cumulative time series 3, 2, 1 temporarily and then properly 3, 5, 6 only after the second processor.

The intention would be that the output of the first processor would be correct, but there would be a one-to-one correspondence of data points. So if we had 1000 deltas/sec, we would transform them into 1000 cumulatives/sec.

The second processor would kind of just emit the latest value every interval. So we would just get 1 data point per interval (e.g. every 30 seconds).

So, a delta time series with values 3, 2, 1 becomes the cumulative time series 3, 5, 6, but the second processor would just output 6 (and then output 6 again at the next publish interval, and keep doing that forever, or until a new data point is received, or the time series becomes "stale" and stops being tracked).
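
To illustrate that behaviour, a rough Go sketch of the periodic step (again with plain types instead of pdata, and purely illustrative names):

```go
package main

import "fmt"

type identity string

// periodicEmitter remembers the latest cumulative value seen for each time
// series and republishes it on every publish interval until the series is
// considered stale.
type periodicEmitter struct {
	latest map[identity]float64
}

func newPeriodicEmitter() *periodicEmitter {
	return &periodicEmitter{latest: make(map[identity]float64)}
}

// observe records the most recent cumulative value for a series.
func (e *periodicEmitter) observe(id identity, cumulative float64) {
	e.latest[id] = cumulative
}

// flush is what the publish-interval tick would call: it returns the latest
// value for every tracked series, whether or not new points arrived.
func (e *periodicEmitter) flush() map[identity]float64 {
	out := make(map[identity]float64, len(e.latest))
	for id, v := range e.latest {
		out[id] = v
	}
	return out
}

func main() {
	e := newPeriodicEmitter()
	for _, v := range []float64{3, 5, 6} { // cumulative stream from the first processor
		e.observe("requests|serviceA", v)
	}
	fmt.Println(e.flush()) // first interval: emits 6
	fmt.Println(e.flush()) // next interval, no new data: emits 6 again
}
```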

We can achieve something like the second processor already, by publishing to the Prometheus exporter, and scraping ourselves with a Prometheus receiver. But it's a bit ugly, no?

@djaglowski (Member)

So, a delta time series with values 3, 2, 1 becomes the cumulative time series 3, 5, 6, but the second processor would just output 6 (and then output 6 again at the next publish interval, and keep doing that forever, or until a new data point is received, or the time series becomes "stale" and stops being tracked).

Thanks for clarifying, I get the idea now. I actually find this composable design quite attractive. The implementation of each would be simpler and users may find a need for only one or the other (e.g. only use the first in combination with the prometheus exporter, or only use the second for reducing frequency of counter updates).

@0x006EA1E5 (Author) commented Nov 17, 2023

I actually find this composable design quite attractive

Yes, me too :D
But just to be clear, as I see it this would mean that both processors would need to be stateful, and essentially hold the same state - a map of metric "identities" to the current value.
Although I don't think this state will be very large
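
For illustration, that shared state could be keyed by something like the following sketch; the fields and the string-based key are assumptions, and a real implementation would derive the identity from pdata and likely hash it rather than concatenate strings:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesIdentity is a hypothetical sketch of the "identity" both processors
// would key their state on: metric name plus resource, scope and data point
// attributes.
type seriesIdentity struct {
	metric     string
	resource   string
	scope      string
	attributes map[string]string
}

// key produces a deterministic map key by sorting attribute names.
func (s seriesIdentity) key() string {
	names := make([]string, 0, len(s.attributes))
	for k := range s.attributes {
		names = append(names, k)
	}
	sort.Strings(names)
	parts := []string{s.metric, s.resource, s.scope}
	for _, k := range names {
		parts = append(parts, k+"="+s.attributes[k])
	}
	return strings.Join(parts, "|")
}

func main() {
	id := seriesIdentity{
		metric:     "log.record.count",
		resource:   "service.name=checkout",
		scope:      "countconnector",
		attributes: map[string]string{"env": "prod"},
	}
	fmt.Println(id.key())
}
```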

@djaglowski (Member)

both processors would need to be stateful, and essentially hold the same state

This will ensure we design a reusable state-tracking mechanism, which I think is needed for many similar use cases. For example, aggregating metrics over dimensions.

@djaglowski (Member)

@0x006EA1E5, given that we agree on splitting this into two components, do you mind opening two new issues in favor of this one? I will sponsor both.

@crobert-1 (Member)

I'm going to remove the needs triage label with the understanding that two issues will be created (one for each of the two proposed processors) and this issue will be closed.

crobert-1 removed the needs triage label on Nov 20, 2023
@0x006EA1E5 (Author)

@0x006EA1E5, given that we agree on splitting this into two components, do you mind opening two new issues in favor of this one? I will sponsor both.

Do we really need two new issues, rather than just scoping the current one to delta-to-cumulative and creating one new issue for whatever we call the other component (the one that periodically outputs updates)?

@0x006EA1E5 (Author)

I created #29461

@djaglowski (Member)

@0x006EA1E5, I apologize for the slow/missed responses on this. I still think we need these components and with the new year can be much more responsive, if you wish to renew your efforts on them.

@0x006EA1E5 (Author)

Yes, no worries, hopefully I will get some more time to look at this too 😁

@0x006EA1E5 (Author)

I have some questions regarding the spec for these processors, sorry if this is documented already somewhere.

We have the concept of identifying properties, which as I understand it includes AggregationTemporality.

The Metrics data model docs mention AggregationTemporality conflicts:

  1. If the potential conflict involves an AggregationTemporality property, an implementation MAY convert temporality using a Cumulative-to-Delta or a Delta-to-Cumulative transformation; otherwise, an implementation SHOULD inform the user of a semantic error and pass through conflicting data.

What should we do here?

  • We could be tracking a metric where we are receiving a stream of deltas, and then we see a cumulative datapoint which is otherwise for the same metric (name, resource and scope).
    • Do we ignore the cumulative datapoint as a "different" metric?
    • Do we reset the tracked metric?
  • Similarly, do we just ignore cumulative datapoints for metrics we are not tracking (that we have never seen a delta datapoint for)?
  • Same questions for inconsistent monotonic values
  • Delta datapoints can have a start_time_unix_nano. What do we do in the various ways this can go wrong?
    • where the start_time_unix_nano is missing (as in the original case, as produced by the stateless count connector)
    • where the start_time_unix_nano is before the current tracked "latest" timestamp (out of order datapoint?)
    • where the start_time_unix_nano is before the current tracked start_time_unix_nano (out of order datapoint?)
    • Where the start_time_unix_nano is after the current tracked "latest" timestamp (out of order datapoint?)
    • In some of the above it would normally be a reset, but I guess there is a use-case where we just want to keep cumulating regardless, so maybe this should be configurable?
  • How should we determine the start_time_unix_nano of the cumulative?
    • Take it from the delta, if available
    • If not available, is it the instant when we start to track, or the timestamp of when the collector started? Or just omit it?

Anyone have any ideas regarding other things that need to be defined?

@djaglowski (Member)

Great questions. Focusing on the timestamp edge cases first.

Delta datapoints can have a start_time_unix_nano. What do we do in the various ways this can go wrong?

  • where the start_time_unix_nano is missing (as in the original case, as produced by the stateless count connector)
  • where the start_time_unix_nano is before the current tracked "latest" timestamp (out of order datapoint?)
  • where the start_time_unix_nano is before the current tracked start_time_unix_nano (out of order datapoint?)
  • Where the start_time_unix_nano is after the current tracked "latest" timestamp (out of order datapoint?)

I think the general principle is that we have two behaviors:

  1. Prior to emitting the first data point for a time series, we freely expand the time window to reflect the earliest start_time_unix_nano and the latest time_unix_nano (end time). Out-of-order or overlapping points just update the time window as necessary.
  2. After emitting the first data point for a time series, we lock the start time but can still update the end time. Data points with a start time prior to the accumulated start time are a bit of a problem though. Perhaps there is a configurable behavior to either include them in the current aggregation or drop them. (The user effectively chooses to prioritize accuracy over the long term or the short term.)

Does this make sense or am I oversimplifying it?
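
To make that concrete, here is a minimal Go sketch of the two-phase window handling described above; all names are assumptions, and the drop-vs-include choice in phase 2 is shown as just one of the options being discussed:

```go
package main

import (
	"fmt"
	"time"
)

// window sketches the two-phase behaviour: before the first data point for a
// series is emitted, the window may expand freely in both directions; after
// that, the start time is locked and only the end time may advance.
type window struct {
	start, end time.Time
	emitted    bool // true once the first cumulative point has been sent
}

// observe folds a data point's (start, end) interval into the window and
// reports whether the point was accepted.
func (w *window) observe(start, end time.Time) bool {
	if !w.emitted {
		// Phase 1: freely expand to the earliest start and latest end seen.
		if w.start.IsZero() || start.Before(w.start) {
			w.start = start
		}
		if end.After(w.end) {
			w.end = end
		}
		return true
	}
	// Phase 2: the start time is locked. Points starting before it are the
	// problematic case; dropping them here is just one of the two options
	// (drop vs. include) that could be made configurable.
	if start.Before(w.start) {
		return false
	}
	if end.After(w.end) {
		w.end = end
	}
	return true
}

func main() {
	var w window
	t0 := time.Now()
	fmt.Println(w.observe(t0, t0.Add(10*time.Second)))  // phase 1: accepted
	fmt.Println(w.observe(t0.Add(-30*time.Second), t0)) // phase 1: expands the start
	w.emitted = true
	fmt.Println(w.observe(t0.Add(-60*time.Second), t0)) // phase 2: rejected (before locked start)
}
```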

@0x006EA1E5 (Author)

Does this make sense or am I oversimplifying it?

It does make sense, but I'm a bit concerned with how the complexity can explode considering how many things can vary 🤔

And how do we make these behaviours configurable without it becoming a mess?

One important constraint will be how the typical downstream consumers will behave, for example if we expand the time window's start time, especially the prometheus and prometheusremotewrite exporters.

@verejoel

This is something that is on our radar, and we would like to support it as much as possible. Our use case is to enable remote writing of metrics from the count connector to Thanos.

@djaglowski (Member)

One important constraint will be how the typical downstream consumers will behave, for example if we expand the time window's start time, especially the prometheus and prometheusremotewrite exporters.

This certainly could prove to be tricky but I think it may be worth trying in a simple form, and then iterating based on feedback.

@RichieSams (Contributor) commented Jan 22, 2024

Hi @djaglowski and @0x006EA1E5

I'd like to help contribute my time to the implementation of this issue and/or #29461

I can be available to work semi-full time on it. @0x006EA1E5 I see above you're working through many of the edge cases. Do you have a fork already? Would you be willing to work together?

@djaglowski (Member)

Thanks very much @RichieSams. I think we'll gladly take the help unless @0x006EA1E5 is already working on it or just about to start. I haven't been working on any code for it, just trying to help design it ahead of time. I think it's fine to start development and we'll work through it as necessary.

We have a contributing guide which articulates a strategy for splitting a new component into multiple PRs. This helps keep the complexity in each PR at a reasonable level so we can review them.

@djaglowski (Member)

A duplicate issue was opened for this component and appears to have started development. I think we should consolidate to one issue but we need to reestablish whether we are splitting the component. Currently waiting for feedback from those involved with the other proposal.

@djaglowski (Member)

It seems #30479 is very close to an exact duplicate of this issue and specifically does not include the functionality proposed in #29461. It has a detailed design doc and a reference implementation already too. Therefore, I will close this issue and suggest that anyone interested take a closer look at #30479 and the associated PRs.

@RichieSams, @0x006EA1E5, or anyone else interested in #29461, I think we can parallelize efforts by moving focus to that processor.
