Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the default buckets for Explicit Bucket Histogram #2770

Merged
merged 11 commits into from
Sep 21, 2022

Conversation

reyang
Copy link
Member

@reyang reyang commented Sep 2, 2022

Fixes #2759

Changes

@jmacd and I did some research offline after we received #2759, it seems that Prometheus official clients don't have consistency regarding the default buckets. This PR tries to find a good balance that would cover all the official prom-clients.

@reyang reyang added area:sdk Related to the SDK spec:metrics Related to the specification/metrics directory labels Sep 6, 2022
@jack-berg
Copy link
Member

Is changing bucket boundaries not a breaking change? There was discussion in the spec related to this here.

Java was not aligned with the spec (issue here) until recently. We changed to align with the spec, but took special care to provide tools for users to revert to the previous behavior. Note that these changes are unreleased. Although our last released bucket boundaries are aligned with this PR, we definitely don't want to release code that changes the defaults then have to flip flop on users again.

It's important to resolve whether changes to the default bucket boundaries is a breaking change. I say that it is a breaking change.

@reyang
Copy link
Member Author

reyang commented Sep 7, 2022

Is changing bucket boundaries not a breaking change? There was discussion in the spec related to this here.

Good question - here is my personal thinking: introducing more buckets is not a breaking change (it might cause more resource, which is fine), removing buckets (reducing the number of buckets) could be a breaking change since it drops the accuracy (which could cause existing alerts being unavailable/wrong).

@jack-berg
Copy link
Member

It's difficult to say whether a particular backend or dashboard is operating in a way that is relying on the existing bucket boundaries in a strict way. My employer New Relic would not be impacted by this change. However I'm inclined to take the conservative approach and consider any change to the default values as a breaking change, even adding new buckets. I don't feel strongly enough to block this PR.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this breaking change?

Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is a breaking change. I agree with it, but can we give SDKs guidance on how to accept this change?

@jmacd
Copy link
Contributor

jmacd commented Sep 20, 2022

Quoting the OpenMetrics specification on histogram buckets,

Buckets are cumulative to allow monitoring systems to drop any non-+Inf bucket for performance/anti-denial-of-service reasons in a way that loses granularity but is still a valid Histogram.

This to me clarifies that adding buckets is a non-breaking change.

@jack-berg
Copy link
Member

If there is any doubt about the backwards compatibility of this change, one option is follow the pattern used when changing OTLP exporters default enpdoint from https://localhost:4317 to http://localhost:4317, where it was phrased as:

SDKs SHOULD default endpoint variables to use http scheme unless they have good reasons to choose
https scheme for the default (e.g., for backward compatibility reasons in a stable SDK release).

In a similar manner, we could say that SDKs should use the default bucket boundaries defined in this PR unless they have good reason to use the bucket boundaries previously defined, such as maintaining backwards compatibility.

This would allow most SDKs to be aligned on these bucket boundaries while leaving the door open for SDK implementations to use the old values if they must. Taking this approach does codify that we do think of changing bucket boundaries as a breaking change, which is reasonable and conservative. The stable metrics SDKs are java, dotnet, and python. Java wouldn't have to do anything to be compatible with this change since because of an error, these new bounds match our current defaults. It looks like dotnet is in favor of this change and would make the change. @open-telemetry/python-maintainers how would this change impact you?

@reyang
Copy link
Member Author

reyang commented Sep 20, 2022

If there is any doubt about the backwards compatibility of this change, one option is follow the pattern used when changing OTLP exporters default enpdoint from https://localhost:4317 to http://localhost:4317, where it was phrased as:

SDKs SHOULD default endpoint variables to use http scheme unless they have good reasons to choose
https scheme for the default (e.g., for backward compatibility reasons in a stable SDK release).

I've updated the PR, PTAL.

@reyang reyang closed this Sep 21, 2022
@reyang reyang reopened this Sep 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK spec:metrics Related to the specification/metrics directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default boundaries in explicit buckets histograms are not optmized for Latency
7 participants