Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide Conditions on Symbolic Stability to Declare OTLP 1.0 #400

Closed
tigrannajaryan opened this issue Jun 7, 2022 · 18 comments · Fixed by #432
Closed

Decide Conditions on Symbolic Stability to Declare OTLP 1.0 #400

tigrannajaryan opened this issue Jun 7, 2022 · 18 comments · Fixed by #432
Assignees

Comments

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Jun 7, 2022

Data models and binary Protobuf for all 3 signals (traces,metrics,logs) is now declared Stable in OTLP.

We need to decide what else is remaining to declare OTLP as version 1.0.

We had a discussion in the Spec SIG meeting, here is the summary:

  1. OTLP/JSON. We believe OTLP/JSON must be also declared Stable before OTLP is marked 1.0.0. This work is happening here Make OTLP http/json stable opentelemetry-specification#1957

  2. Symbolic name stability in the proto files. This has a few components:

  • (a) Field name stability. I believe these must be declared stable and be part of our stability guarantees. The reason is that these names are visible on the wire for OTLP/JSON when using the default Protobuf/JSON implementation (while it is possible to decouple JSON wire and symbolic names in proto files using custom implementations, we do not think we want to go this route).
  • (b) Enum value names for enums which are field types (e.g. SPAN_KIND_CLIENT). These are also visible on the wire for OTLP/JSON and I believe these must be declared stable.
  • (c) Enum names (e.g. SpanKind). These are not visible on the wire anywhere (is this correct?).
  • (d) Helper enum names and enum value names (e.g. LogRecordFlags). These are not visible on the wire anywhere.
  • (e) The location of messages and enums, i.e. whether they are declared at the top lexical scope or nested inside another message.
  • (f) Message names, package names and directory structure.
  • Note that (c),(d),(e),(f) have no impact on the wire format. However, they impact the generated code. There is currently no agreement whether these should be part of our stability guarantees.
  1. Generated code stability, assuming no changes in proto files. This is dependent on the generator and we are NOT going to provide guarantees about generator behavior.

I believe 1, 2a and 2b are the absolute minimum before declaring OTLP 1.0.

We need to decide what we do about 2c-f. This decision needs to be made before declaring OTLP 1.0.

@tigrannajaryan
Copy link
Member Author

@bogdandrutu @Aneurysm9 @dyladan @MrAlias @carlosalberto @ocelotl please see above and correct me if I missed anything.

@tigrannajaryan tigrannajaryan added this to the Declare OTLP 1.0 milestone Jun 7, 2022
@tigrannajaryan tigrannajaryan changed the title Declare OTLP 1.0 Decide Conditions to Declare OTLP 1.0 Jun 7, 2022
@Aneurysm9
Copy link
Member

3. Generated code stability, assuming no changes in proto files. This is dependent on the generator and we are NOT going to provide guarantees about generator behavior.

I agree that this is not a guarantee we can make as it would be outside our control. What I do think we should guarantee is the inverse: that we will not change the .proto files in ways that break stability of generated code assuming no change in code generator. In other words, if generated code changes in a breaking way it will be because of the generator changing and not the protocol. This much should be within our control.

@dyladan
Copy link
Member

dyladan commented Jun 13, 2022

We talked about this in the spec meeting last week but I'll record my thoughts here for the record. I broadly agree with @tigrannajaryan's summary.

  1. Generated code stability, assuming no changes in proto files. This is dependent on the generator and we are NOT going to provide guarantees about generator behavior.

I agree that this is not a guarantee we can make as it would be outside our control. What I do think we should guarantee is the inverse: that we will not change the .proto files in ways that break stability of generated code assuming no change in code generator. In other words, if generated code changes in a breaking way it will be because of the generator changing and not the protocol. This much should be within our control.

I agree with this in principle, but I think it would be better for us to identify which types of changes that may affect the code generation are and document them as specifically stable. Otherwise it may be difficult for someone outside the core maintainer group to understand why something was accepted or rejected.

@ocelotl
Copy link

ocelotl commented Jun 13, 2022

I think that if we introduce breaking changes, we should increase the major version number, only that. Introducing a breaking change may be inconvenient, but we should at least let the user know.

@bogdandrutu
Copy link
Member

For 2b, we can make the case to use only the "numbers/identifiers" and to not support the "string". I think currently only JS uses it, so if that is the case I would propose to see if that makes sense.

I would propose to remove the "helper" enums in 2d. The reason is they are 2 (logs, metrics) of them already and if I am not mistaken they are inconsistent already.

I would also want to make sure we document the guarantees for the generated code:

  • You commented something about protobuf generated code not being in our power.
  • Do we want to give this guarantee at all? Should we say that generated code MUST not be exposed publicly and if you are consuming this, you must generate the code and consume that internally?

@aabmass
Copy link
Member

aabmass commented Jun 15, 2022

If this helps move things forward, I had a draft PR adding a linter to detect wire (binary and json) breaking changes to this repo. I am happy to revive it.

It uses Buf breaking change detection which can also detect generated source code breaking changes. Here are all the rules it supports: https://docs.buf.build/breaking/rules

@jsuereth
Copy link
Contributor

2c + 2f -> This actually has wire level impact if we allow our messages to be written inside proto.Any (where a type_url + bytes field get written).

I think we exclude that case and go with (Tigran's original suggestion) of 1, 2a and 2b for OTLP stability.

When it comes to generated code, enforcing that ALL ways of generating code remains stable is definitely out-of-scope. We should allow clients to optimise/tweak on their own release schedules independently of the protocol itself.

E.g.

java_otlp_v1_client : 1.2.x makes sense. It speaks OTLP v1, but you can evolve the actual generated code as needed.

In many languages there's more than one tool to generate clients for proto as well.

While I think we should aim to provide stable client libraries, I don't think that's part of the discussion around stable OTLP.

@Aneurysm9
Copy link
Member

If this helps move things forward, I had a draft PR adding a linter to detect wire (binary and json) breaking changes to this repo. I am happy to revive it.

It uses Buf breaking change detection which can also detect generated source code breaking changes. Here are all the rules it supports: https://docs.buf.build/breaking/rules

This is excellent and could really help take some of the guesswork and bikeshedding out of this. I would propose that we require WIRE_JSON compatibility and scan for FILE and PACKAGE compatibility, having a high bar to justify changes that would break generated code based on those rulesets. I think it is important that we recognize that there will be many consumers of this protocol and breaking changes, even if they are "only" to the generated code, can have a high cost both in terms of the effort required of our consumers and our reputation with them.

@tigrannajaryan
Copy link
Member Author

If this helps move things forward, I had a draft PR adding a linter to detect wire (binary and json) breaking changes to this repo. I am happy to revive it.

It uses Buf breaking change detection which can also detect generated source code breaking changes. Here are all the rules it supports: https://docs.buf.build/breaking/rules

@aabmass This is definitely very useful and can help prevent accidental mistakes when modifying the proto.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jun 21, 2022
This PR captures the current understanding of stability guarnatees.
I want to make sure it is explicitly visible until we make a decision
on open-telemetry#400

The most likely outcome of open-telemetry#400
is going to be stronger than current definition of "Stable.
@tigrannajaryan
Copy link
Member Author

tigrannajaryan commented Jun 21, 2022

The following PR clarifies the current state of stability guarantees: #406.

When this issue is resolved I will adjust the stability definition, until then I want it to be there to make sure people don't rely on guarantees that don't yet exist.

@dyladan
Copy link
Member

dyladan commented Jun 21, 2022

For 2b, we can make the case to use only the "numbers/identifiers" and to not support the "string". I think currently only JS uses it, so if that is the case I would propose to see if that makes sense.

I would rather support string stability for enum values. In my mind the value of being able to change the strings later is outweighed by breaking user expectations that string enums can be used. Most if not all receivers of OTLP JSON do and will support string value enums because that is the default for the proto3. If a user is emitting JSON which uses those string values and the proto has a minor update they will be confused why their wire compatibility is broken.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jun 21, 2022
This PR captures the current understanding of stability guarnatees.
I want to make sure it is explicitly visible until we make a decision
on open-telemetry#400

The most likely outcome of open-telemetry#400
is going to be stronger than current definition of "Stable.
@tigrannajaryan
Copy link
Member Author

For 2b, we can make the case to use only the "numbers/identifiers" and to not support the "string". I think currently only JS uses it, so if that is the case I would propose to see if that makes sense.

I would rather support string stability for enum values. In my mind the value of being able to change the strings later is outweighed by breaking user expectations that string enums can be used. Most if not all receivers of OTLP JSON do and will support string value enums because that is the default for the proto3. If a user is emitting JSON which uses those string values and the proto has a minor update they will be confused why their wire compatibility is broken.

I agree. Another argument in favour of that: JSON is often used for its human readability and using symbolic names instead of numbers helps with readability.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jun 21, 2022
This PR captures the current understanding of stability guarnatees.
I want to make sure it is explicitly visible until we make a decision
on open-telemetry#400

The most likely outcome of open-telemetry#400
is going to be stronger than current definition of "Stable.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jun 21, 2022
This PR captures the current understanding of stability guarnatees.
I want to make sure it is explicitly visible until we make a decision
on open-telemetry#400

The most likely outcome of open-telemetry#400
is going to be stronger than current definition of "Stable.
@tigrannajaryan
Copy link
Member Author

Having spent some time thinking about this I believe we should do this:

  • OTLP 1.0 should provide stronger stability guarantees than currently, including providing guarantees about symbolic names of messages, fields, enums and packages even if such symbolic names don't appear on the wire. All items in (1) and (2) I listed above should be part of our stability guarantee. This is a stronger guarantee than is strictly necessary just for wire compatibility, but I believe it will help to have happier developers who implement OTLP and as a result increase OTLP adoption in the industry.
  • OTLP 1.0 will NOT provide guarantees about code generator behaviors, but we will not knowingly make changes to proto files that will result in breaking changes in the generated code.

Before OTLP 1.0 is declared we must do the last batch of changes, with the following open PRs either accepted or rejected and issues resolved:

While resolving these open PRs/issues, before OTLP 1.0 is declared we MAY make breaking changes that are allowed by the current stability guarantees (cc @Aneurysm9 since you have a strong opinion about this particular clause).

@Aneurysm9
Copy link
Member

While resolving these open PRs/issues, before OTLP 1.0 is declared we MAY make breaking changes that are allowed by the current stability guarantees (cc @Aneurysm9 since you have a strong opinion about this particular clause).

I support this approach. I think that there should still be a high bar for breaking changes at this point - we should not change things simply because we can, but because the changes are necessary - but now is the time to make these changes if they're needed. If none are needed then it is time to declare stability.

@tigrannajaryan
Copy link
Member Author

@bogdandrutu we discussed this in the Spec SIG and the people present agreed with #400 (comment)
Please post what you think about it.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jul 5, 2022
This PR captures the current understanding of stability guarnatees.
I want to make sure it is explicitly visible until we make a decision
on open-telemetry#400

The most likely outcome of open-telemetry#400
is going to be stronger than current definition of "Stable.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Jul 5, 2022
This PR captures the current understanding of stability guarnatees.
I want to make sure it is explicitly visible until we make a decision
on open-telemetry#400

The most likely outcome of open-telemetry#400
is going to be stronger than current definition of "Stable.
tigrannajaryan added a commit that referenced this issue Jul 5, 2022
This PR captures the current understanding of stability guarnatees.
I want to make sure it is explicitly visible until we make a decision
on #400

The most likely outcome of #400
is going to be stronger than current definition of "Stable.
@carlosalberto
Copy link
Contributor

@bogdandrutu Any opinion on #400 (comment) ?

@dyladan
Copy link
Member

dyladan commented Jul 8, 2022

@tigrannajaryan I would add #412 to the list of issues to be resolved before JSON stability

@tigrannajaryan
Copy link
Member Author

@tigrannajaryan I would add #412 to the list of issues to be resolved before JSON stability

@dyladan Yes, I added it to the list here open-telemetry/opentelemetry-specification#1957

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Oct 18, 2022
@tigrannajaryan tigrannajaryan self-assigned this Oct 18, 2022
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Apr 20, 2023
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Apr 20, 2023
@tigrannajaryan tigrannajaryan changed the title Decide Conditions to Declare OTLP 1.0 Decide Conditions on Symbolic Stability to Declare OTLP 1.0 Apr 20, 2023
carlosalberto pushed a commit that referenced this issue Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants