OTLP metrics export partial success - structure, semantic conventions and beyond #404

Closed
joaopgrassi opened this issue Jun 21, 2022 · 0 comments

In #390 we introduced the concept of partial success in OTLP export responses. The initial version only contained fields to indicate the number of accepted data points and a string error message. During the PR review it was brought up that these fields alone might not be enough, and that we may need more structure to achieve more with it.

This issue is to continue the discussion, to focus on what we want to achieve, and to agree on how we want to do it.

A couple of things that came up were:

Derive telemetry for consumed/dropped metrics for OTLP exporters

Exporters would record metrics based on the partial_success response. Metrics such as otlp.metrics.consumed and otlp.metrics.dropped can be derived from the number of accepted data points. This is already possible with the structure we defined in #390, but we need to define semantic conventions for these metrics so that OTLP exporters report them consistently (if desired).

Action items:

  • Define use-cases and which metrics we want/need
  • Define semantic conventions for such metrics
  • Define how to turn this behavior off on exporters?
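
To make the derivation concrete, here is a minimal sketch of how an exporter could turn the number of accepted data points into the two counters mentioned above. Only the metric names otlp.metrics.consumed and otlp.metrics.dropped come from this issue; `ExportCounts`, `derive_counts`, `record`, and the plain dict standing in for real instruments are hypothetical, and nothing here is an agreed semantic convention.

```python
from dataclasses import dataclass


@dataclass
class ExportCounts:
    """Hypothetical container for the outcome of one export request."""
    consumed: int
    dropped: int


def derive_counts(sent_data_points: int, accepted_data_points: int) -> ExportCounts:
    """Split one exported batch into consumed and dropped data points.

    Assumption: the exporter knows how many data points it sent, and the
    response reports how many of them the receiver accepted.
    """
    if sent_data_points < 0:
        raise ValueError("sent_data_points must be non-negative")
    # Clamp so that a misbehaving receiver cannot produce negative counts.
    consumed = max(0, min(accepted_data_points, sent_data_points))
    return ExportCounts(consumed=consumed, dropped=sent_data_points - consumed)


def record(counts: ExportCounts, totals: dict) -> None:
    """Add one request's counts to running totals (a stand-in for counters)."""
    totals["otlp.metrics.consumed"] = totals.get("otlp.metrics.consumed", 0) + counts.consumed
    totals["otlp.metrics.dropped"] = totals.get("otlp.metrics.dropped", 0) + counts.dropped


totals: dict = {}
record(derive_counts(sent_data_points=10, accepted_data_points=5), totals)
record(derive_counts(sent_data_points=4, accepted_data_points=4), totals)
print(totals)
```

Turning the behavior off could then be as simple as skipping the `record` call, although how that would be configured is one of the open questions listed above.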

Possibility to retry

With only the number of accepted data points and a string error message, as we have today, no level of retry is possible. A retry might be feasible in some cases, but for that the partial_success result needs more structure.

From the PR, an example of such a response would be:

{"accepted_data_points": 5, "error": [
  {"index": 0, "status_code": 400, "message": "the timestamp xyz is invalid"},
  {"index": 3, "status_code": 502, "message": "the tenant does not exist"}
]}

Another important point was that any extra details should be optional, so as not to overload receivers and senders that are under pressure.
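
As an illustration of what a sender could do with such a response, the sketch below parses the example above and picks out the data points that might be worth resending. The field names are taken from the example in the PR, which is not an agreed format. The function name `retryable_indexes` is hypothetical, and treating 5xx status codes as retryable and everything else as permanent is purely an assumption for the sake of the example. The per-item list is read as optional, so a response without it simply yields nothing to retry.

```python
import json


def retryable_indexes(response_body: str) -> list:
    """Return the indexes of rejected data points that look retryable.

    Assumptions (not defined by OTLP): errors live in an optional "error"
    list, each entry has "index" and "status_code", and only 5xx codes are
    treated as transient.
    """
    response = json.loads(response_body)
    # The detailed error list is optional; fall back to an empty list.
    errors = response.get("error") or []
    return [
        entry["index"]
        for entry in errors
        if 500 <= entry.get("status_code", 0) < 600
    ]


example = """
{"accepted_data_points": 5, "error": [
  {"index": 0, "status_code": 400, "message": "the timestamp xyz is invalid"},
  {"index": 3, "status_code": 502, "message": "the tenant does not exist"}
]}
"""

print(retryable_indexes(example))
print(retryable_indexes('{"accepted_data_points": 5}'))
```

Whether a per-item status code is the right signal for retries at all is part of what the action items below are meant to settle.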

Action items:

  • Discuss if retries really make sense for a partial success response.
  • If we want some level of retry, identify which kind and come up with a structure to represent it
  • Make sure such extra fields are optional