Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define guideline for components to deprecate configuration #5686

Closed
codeboten opened this issue Jul 14, 2022 · 9 comments · Fixed by #8285
Closed

Define guideline for components to deprecate configuration #5686

codeboten opened this issue Jul 14, 2022 · 9 comments · Fixed by #8285
Assignees
Labels
discussion-needed Community discussion needed priority:p1 High

Comments

@codeboten
Copy link
Contributor

The following change in contrib caused a breaking change for a component identified as stable. This would cause existing users of the component to break when upgrading their collector if they were using the configuration options removed.

Currently the policy for breaking changes is defined for changes to the API, but it's unclear how components should apply this policy for configuration changes.

This issue is a discussion to determine:

  1. what the guidelines should be for configuration changes/deprecation
  2. how components can be marked stable and if they even can be stable while the collector is still in beta
  3. if there should be separate versioning for individual components. Currently all components within a release have the same version. Separating component versions could allow components to change major versions and follow semver, although it's still unclear if this would be enough considering the overall version of the collector wouldn't change.

Pinging the folks that were involved in a discussion in a private channel to add their comments here @Aneurysm9 @jpkrohling @dashpole @mx-psi

Note: I understand the issue was caused by a change in the contrib repo, but since the stability definition is in this repo, I figured it made more sense to open the issue here.

@codeboten codeboten added discussion-needed Community discussion needed priority:p1 High labels Jul 14, 2022
@jpkrohling
Copy link
Member

Here's a first idea to debate, although I think it might go against @Aneurysm9's idea.

  1. we will do our best to not add new required options, nor remove existing options, from components, but we recognize that we might need to, either because the components we interact with have changed, or because we evolved
  2. if we add something, we'll do our best to set a reasonable default. We will also consider not enabling features by default that might cause considerable performance impact to the application (memory, cpu, pipeline latency). I know this is too abstract, but what "considerable" means might differ from component to component.
  3. if we are replacing a property, we'll do it in at least two steps: first, we'll add the new property and convert the old one into the new one, logging an INFO-level message. On at least N-2 to the version we intend to remove this property, we increase the level of the message to WARN if we see the old one is used. On at least N-1 of the version we intend to remove, we log an ERROR if we see the old one is used.
  4. if we are removing property without replacement, we should follow the same as above regarding the log messages.

@dashpole
Copy link
Contributor

Small note: The PR above didn't make any breaking changes. It just added warning messages when using the "old" fields. If we want to impose additional requirements for removal, we still can.

My personal opinion is that logs are not very useful for announcing deprecations. I don't generally look at logs unless I'm debugging a particular problem, at which point I am usually not looking at the logs printed at startup. The better pattern is to fail on startup with an easy way to revert to the previous behavior. I would prefer either using feature gates for all breaking changes, or introducing semvar for components with the ability for users to choose a major version.

@TylerHelmuth
Copy link
Member

Getting to a place where we can use semvar is the ultimate solution, but does it requires Core to have an official 1.0.0 release? It would feel weird to say a component is at version 1.0.0 while completely depending on the Core/Contrib libraries, which would be at 0.55.0 and could cause breaking changes to the component at any moment.

@mx-psi
Copy link
Member

mx-psi commented Jul 15, 2022

what the guidelines should be for configuration changes/deprecation

A combination of @jpkrohling's and @dashpole's points seems reasonable here. Some people do check logs, so we want to have logs for this set of people before we switch to failing on startup. To get to a compromise where we all feel okay with the guidelines, we should maybe have a longer minimum period of deprecation (I think @Aneurysm9 suggested 6 months, maybe that's okay for 'stable' components).

I would also like to suggest requiring a tracking issue for breaking changes, that has migration instructions and is referenced in logs, errors and feature gate descriptions. I tried to do this for the Datadog exporter (e.g. see open-telemetry/opentelemetry-collector-contrib/issues/8845), we can take that as a starting point.

We should also clarify to what stability level these rules apply (they probably shouldn't apply to alpha or in development, they may apply to beta with a smaller deprecation period).

how components can be marked stable and if they even can be stable while the collector is still in beta

Should we use a different wording instead of stable? Maybe 'generally available' or 'ready for production' are acceptable terms. IMO it's not reasonable to have a stability level that says 'no breaking changes ever' at this point, so if it's confusing we can just change the name.

if there should be separate versioning for individual components

My opinion is that we must not make 1.0 releases of components for now. The Go API is nowhere near ready to be 1.0, and we don't want to convey that we follow semver when we don't. We need to have a Core 1.0 for that, and then carefully review the Go API of each component to ensure we don't expose anything we don't want to.

If we don't have 1.0, I don't think having a different versioning schema for stable components makes a lot of sense, so I would remove versioning from the discussion.

@mx-psi
Copy link
Member

mx-psi commented Jan 23, 2023

As mentioned on open-telemetry/opentelemetry-collector-contrib/issues/4055, our current versioning guidelines require that stable components on both core and contrib have a stable public Go API, I think we should mention this explicitly as a requirement for the 'stable' stability level

@jpkrohling
Copy link
Member

@mx-psi, @codeboten, @TylerHelmuth, if you haven't started working on this, I could work on a PR based on the current state of this discussion.

@Aneurysm9, @dashpole, could you please have another look at this issue and let me know if you have concerns not addressed yet?

@dashpole
Copy link
Contributor

dashpole commented Jul 5, 2023

My request is that if/when breaking changes to configuration or behavior take effect, users have an easy way to revert to the previous config or behavior for a period of time. That is what using a feature gate provides, since beta feature gates can still be disabled by users. I would add to the Stable component definition:

When making breaking changes to configuration or behavior, the change MUST be gated by a feature gate, and MUST spend at least 6 months in beta.

@mx-psi
Copy link
Member

mx-psi commented Jul 6, 2023

if you haven't started working on this, I could work on a PR based on the current state of this discussion.

Sounds good to me, I would be interested in reviewing :)

@jpkrohling jpkrohling self-assigned this Jul 10, 2023
jpkrohling added a commit to jpkrohling/opentelemetry-collector that referenced this issue Aug 25, 2023
Fixes open-telemetry#5686

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling
Copy link
Member

I added a first draft for this. I took both the current stability levels for collector components and the maturity levels OTEP to write this one. I think it reflects what was discussed in this issue as well.

codeboten added a commit that referenced this issue Aug 29, 2023
Fixes #5686
---------

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Co-authored-by: Alex Boten <alex@boten.ca>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion-needed Community discussion needed priority:p1 High
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants