-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce Mandatory Unique Identifier For Telemetry Sources #194
Closed
Closed
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
69 changes: 69 additions & 0 deletions
69
text/0194-mandatory-unique-identifier-for-sdk-based-telemetry-sources.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# Mandatory unique identifier for sdk-based telemetry sources | ||
|
||
Provide an explicit mandatory unique identifier for sdk-based telemetry sources. | ||
|
||
## Motivation | ||
|
||
Having a way to uniquely identify a telemetry source is helpful in many ways, like in processing and storing data from that source, visualizing them in a backend UI or debugging issues with that source and it's data. | ||
|
||
For sdk-based telemetry sources, as of now `service.name` (and related attributes `service.namespace` and `service.instance_id`) are the implicit standard for that due to `service.name` being enforced as mandatory by the [Resource SDK specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md#sdk-provided-resource-attributes) and [Resource Semantic Conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/README.md#semantic-attributes-with-sdk-provided-default-value). | ||
|
||
But, because those attributes are not **explicitly** available to uniquely identify a sdk-based telemetry source, multiple issues are calling out problems with the current state: | ||
|
||
* [opentelemetry-specification/issues#1034](https://github.com/open-telemetry/opentelemetry-specification/issues/1034) calls out that `service.instance.id` is poorly defined right now and should be replaced by something more meaningful, that can help to uniquely identify a SDK-based telemetry source. | ||
* [open-telemetry/opentelemetry-specification#2111](https://github.com/open-telemetry/opentelemetry-specification/pull/2111) calls out that there is no proper definition of what a `Service` is and that a proper definition is important since `service.name` is such an important attribute | ||
* [open-telemetry/opentelemetry-specification#2115](https://github.com/open-telemetry/opentelemetry-specification/pull/2115) asks for introducing `app.name` and others alongside `service.name` since client-side applications (browser, mobile) are **not** services and end-users might be confused by calling them a _Service_. | ||
* [open-telemetry/opentelemetry-specification#2192](https://github.com/open-telemetry/opentelemetry-specification/pull/2192)) is providing a middle ground between `app.name` and `service.name` by suggesting `telemetry.source.name` as broader term. | ||
|
||
To address all requirements outlined in those approaches, we are proposing the following combined approach for uniquely identifying a SDK-based telemetry source: | ||
|
||
* Introduce an `telemetry.sdk.source.id` attribute, which MUST either be autogenerated by the SDK at application start or be supplied via an environment variable to the SDK. This will be the unique identifier for an SDK-based telemetry.source. | ||
* Remove `service.instance.id` as attribute, since it is superseded by the `telemetry.sdk.source.id` | ||
* Replace `service.name` and `service.namespace` with attributes `telemetry.sdk.source.name` and `telemetry.sdk.source.namespace`, to have a more broad term for identification. | ||
* Make `telemetry.sdk.source.name` the attribute which MUST be provided by the SDK. | ||
* Provide backward compatibility with service.name by adopting [open-telemetry/oteps#161](https://github.com/open-telemetry/oteps/pull/161) | ||
* Backend specific exporters who rely on `service.name` should set a default value themselves if the attribute is missing | ||
* Add a term definition for `Service` and `App` to the specification glossary, which are non-overlapping. | ||
* Introduce further attributes to describe the telemetry source where needed, e.g. `telemetry.sdk.source.version`, `app.bundle`, `app.short_version`, ... | ||
|
||
## Explanation | ||
|
||
With those changes in place, the following use cases will be covered: | ||
|
||
* If one of many instances of a SDK-based telemetry source is in an erroneous state, the user can quickly identify that instance using the `telemetry.sdk.source.id` and fix the issue. This will improve observability of OTel SDKs themselves. | ||
* With replacing `service.name` with `telemetry.sdk.source.name` frontend applications and other sources, which are not seen as _Services_ by their application owners can be named in a more user-expected way. They also can use different scopes like `app` for additional attributes which might not be reasonable for a backend `service` | ||
* Collectors & backends can use `telemetry.sdk.source.id` (or the combination of `id`, `name` and `namespace`) as unique identifier for storing data, processing data & displaying data. | ||
|
||
## Internal details | ||
|
||
Replacing `service.instance.(id|name|namespace)` with `telemetry.sdk.source.(id|name|namespace)` will require a mechanism to provide backward-compatibility. For this we are suggesting to adopt [open-telemetry/oteps#161](https://github.com/open-telemetry/oteps/pull/161). | ||
|
||
Language specific implementations of the SDK used for instrumenting backend services will need to update their code to expect `telemetry.sdk.*` where `service.*` was used so far. This requires significant effort, although we believe that going down this route earlier is better than going on with a less-invasive change which has different drawbacks (see alternatives below). | ||
|
||
Language specific implementations of the SDK for other kinds of telemetry sources, like client side applications, gain the flexibility to use a different scope like `app` for additional attributes of their telemetry source. | ||
|
||
Implementations of the SDK need to add a mechanism to either load the `telemetry.sdk.source.id` from an environment variable or to autogenerate a value at application start. For the auto-generated ID the existing recommendation for `service.instance.id`, to use a random Version 1 or Version 4 RFC 4122 UUID, can be used. | ||
|
||
Different modules in the collector and implementations of the backend will need to adopt this change. The solution for those backend-specific exporters would be to set some default value for `service.name`, to satisfy their particular backends. | ||
|
||
## Alternatives | ||
|
||
We think that the proposed approach is the best among many. The following list provides existing alternatives and reasons why they have been rejected: | ||
|
||
1. Provide a broad definition for the term `Service`, which then would also cover client-side applications. With that `service.(instance_id|name|namespace)` could be used as unique identifier. It is possible to extend the definition of `Service` to cover that ([open-telemetry/opentelemetry-specification#2111](https://github.com/open-telemetry/opentelemetry-specification/pull/2111)), but frontend application developers & owners do not think about their applications as services and might be confused by this broad definition. Additionally it is not forseeable if other future SDK-based telemetry sources might need a different name which could not be covered by this definition. | ||
|
||
2. Introduce `app.(instance_id|name|namespace)` alongside `service.(instance_id|name|namespace)` and require that either `app.name` or `service.name` MUST be provided by the SDK. While this approach addresses the issues of (1), it comes with the disadvantage that a processor like the collector or backend needs to check multiple attributes to identify the type of the telemetry source. This creates additional unnecessary overhead. Also, this _may_ lead to attribute explosion if further SDK-based telemetry sources are introduced and are looking into providing attributes for an id, name, namespace, version or other similar attributes. | ||
|
||
## Open questions | ||
|
||
* Is the namespace `telemetry.sdk.source.*` suitable? Alternative names could be used | ||
* `telemetry.source.*` as suggested by [open-telemetry/opentelemetry-specification#2192](https://github.com/open-telemetry/opentelemetry-specification/pull/2192). The difference is that it does state explicitly that only SDK-based telemetry sources are covered. This is not necessarily bad, since other telemetry sources _could_ decide to use it as well. | ||
* `telemetry.instance.*` | ||
* `source.*` | ||
* `telemetry.sdk.*` can not be used since `telemetry.sdk.name` is already used | ||
* Should duplication of attributes be allowed, e.g. that `telemetry.sdk.source.name` and `service.name`and `app.name` are specified and possible to be set, or should an attribute that exists in `telemetry.sdk.source.*` not be allowed in `service.*` and `app.*`? | ||
* How should additional attributes like `version`, `bundle`, `firmware_version`, `short_name`, `short_version` be treated? Does it make sense to provide a rule, that attributes common to all sources (like `version`) should also be part of `telemetry.sdk.source.*` and only specific attributes like `bundle` or `firmware_version` should live in a different namespace? | ||
|
||
## Future possibilities | ||
|
||
While the discussion right now is between backend and frontend services, in the future additional SDK-based telemetry sources like different kinds of devices could be introduced without the need to re-use `service.name` as a mandatory attribute and with the possibility to simply introduce their own scope of additional specific attributes. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why
telemetry.sdk.source.id
and nottelemetry.source.id
? How is "sdk" relevant to the description of the workload that is producing telemetry?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's one of my open questions:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to drop the word "source" and have
telemetry.sdk.id
(ortelemetry.sdk.instance.id
) to identify the SDK. This will match othertelemetry.sdk.*
attributes. This also fits the requirements of possible future remote SDK management capabilities, where the SDK is actually the entity that is being managed.To identify the Service we will (continue to) use
service.*
attributes, to identify other types of sources which are not Services we can introduce new sets of attributes (e.g. for client-side apps as was discussed earlier).If we call this attribute
telemetry.source.id
it is not clear what does it identify. Does it identify the Service? If yes then do we removeservice.instance.id
? If we remove it then do we use a mix oftelemetry.source.id
+service.*
attributes to identify the Service? This does not look very consistent to me.Generally, for any particular kind of entity that we want to identify I think we should have a set of attributes
<kind>.*
that are defined in semantic conventions specifically for that particular kind. This has been the practice so far andtelemetry.source.id
would be a conceptual deviation from the practice which I don't see why we need.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tigrannajaryan personally I am not completely sold on
telemetry.source.id
and I think it should be a different OTEP (i.e. assigning an opaque unique ID to a source vs. identifying the source via a collection of domain-specific attributes, like k8s.pod etc.) from the question of how to generalizeservice.name
to different types of workloads.I also find it more intuitive to have
service.name
andapp.name
vs. a single (and ambiguous)telemetry.source.name
. The issue is how that affects existing backends like Jaeger whereservice.name
is a core concept andapp.name
means nothing at all. Should the exporters to such backends translateapp.name
(OTEL) ->service.name
(Jaeger)? This translation becomes easier / more consistent if we use a singletelemetry.source.name
instead of domain-specific attributes, but I agree that it diverges from most of the existing conventions.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the right approach. Each exporter should be responsible for specialized needs of the particular backend they target.
I don't think we should try to reduce Otel conceptually to the lowest common denominator of features supported by all backends we want to support. We should probably accept the fact that some backends may have different requirements and if a particular Otel capability is not exactly representable in a particular backend that's fine as long as there is a reasonable mapping. So the logic in Jaeger exporter can be for example:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine if this was only Jaeger, but from the comments in different threads it seems a lot of backends would prefer a single attribute. Maybe a fair question is: are there (how many) backends that would be completely fine with not having a single attribute identifying the source by low cardinality name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question, we probably want vendors to chime in.
Just one data point: the AWS Xray exporter in Collector for example probes a series of different attributes (including but not limited to service.name) to arrive at a "name", see: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/af3d9b1dc9f1da2e4f71944a32e388368fb0f5e6/exporter/awsxrayexporter/internal/translator/segment.go#L131
I would expect each vendor to either have an equivalent logic that makes sense for their backend or perhaps we extract this as a common logic somewhere for all vendors to use.
Note that vendors likely need to be ready for this anyway since Collector doesn't necessarily set service.name for all data it collects/produces on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a very good example, why there should be clarity on how to identify a telemetry source and why I opened my issues initially and now this OTEP:
As of now it is really hard to say "this telemetry source is called "?
That common logic is what I am looking for. My initial assumption was that
service.namespace
,service.name
andservice.instance.id
are doing the job, but as we know now this has the following issues:service.instance.id
is filled (see Guidance for filling (or consider removing?) service.instance.id opentelemetry-specification#1034)The intend of
telemetry.(sdk?).source.(id|name|namespace)
was to avoid that kind of lengthy checks we see in the AWS Xray exporter (@martinkuba brought up this argument here as well: open-telemetry/opentelemetry-specification#2192)An alternative, that was suggested at some point, is a
<whatever>.type
field which is thenapp
orservice
or ... and can help the implementation to jump to the correct attributes immediately.