From 61dd26529b0b712a067479f431d0ce781a8ac6ea Mon Sep 17 00:00:00 2001
From: svrnm
Date: Thu, 9 Dec 2021 15:37:32 +0100
Subject: [PATCH 1/2] Introduce Mandatory Unique Identifier For Telemetry Sources

---
 ...unique-identifier-for-telemetry-sources.md | 66 +++++++++++++++++++
 1 file changed, 66 insertions(+)
 create mode 100644 text/0000-mandatory-unique-identifier-for-telemetry-sources.md

diff --git a/text/0000-mandatory-unique-identifier-for-telemetry-sources.md b/text/0000-mandatory-unique-identifier-for-telemetry-sources.md
new file mode 100644
index 000000000..739abde6f
--- /dev/null
+++ b/text/0000-mandatory-unique-identifier-for-telemetry-sources.md
@@ -0,0 +1,66 @@
+# Mandatory unique identifier for telemetry sources
+
+Provide an explicit mandatory unique identifier for telemetry sources.
+
+## Motivation
+
+Having a way to uniquely identify a telemetry source is helpful in many ways, like in processing and storing data from that source, visualizing them in a backend UI or debugging issues with that source and it's data.
+
+As of now `service.name` (and related attributes `service.namespace` and `service.instance_id`) are the implicit standard for that due to `service.name` being enforced as mandatory by the [Resource SDK specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md#sdk-provided-resource-attributes) and [Resource Semantic Conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/README.md#semantic-attributes-with-sdk-provided-default-value).
+
+Due to the fact that those attributes are not **explicitly** available to uniquely identify a telemetry source, multiple approaches have been suggested:
+
+1. [opentelemetry-specification/issues#1034](
+https://github.com/open-telemetry/opentelemetry-specification/issues/1034) is suggesting that `service.instance.id`is poorly defined and should be removed and be replaced by something different like an `telemetry.sdk.instance_id`. An attribute like `telemetry.sdk.instance_id` could serve as the sole unique identifier.
+
+2. [open-telemetry/opentelemetry-specification#2111](https://github.com/open-telemetry/opentelemetry-specification/pull/2111) is proposing to provide a broad definition for the term _Service_, which would mean that (almost) every telemetry source is a service and `service.name` (and `namespace` and `instance_id`) could be used as unique identifier.
+
+3. [open-telemetry/opentelemetry-specification#2115](https://github.com/open-telemetry/opentelemetry-specification/pull/2115) is proposing to introduce `app.name` as mandatory attribute for client side telemetry sources like browser apps or mobile apps, which then would not be treated as service (and with that would not have a `service.name`). `(app|service).name` (and `namespace` and `instance_id`) could be used as unique identifier.
+
+4. [open-telemetry/opentelemetry-specification#2192](https://github.com/open-telemetry/opentelemetry-specification/pull/2192) is proposing to introduce `telemetry.source.*` attributes as a super-set to `service.*` and `app.*`.
+
+This OTEP is proposing to choose from those approaches to uniquely identifying a telemetry source, or to find a unifying approach, since not all proposals are mutually exclusive.)
+
+## Explanation
+
+As stated in the Motivation with that unique identifier in place, it can be used at different places:
+
+* Backend developers will have certainty which attributes they can use as unique identifier for the source when storing telemetry data.
+* An UI can use it for visualization, especially as fallback if no other attribute is provided for that.
+* The collector (and other processors) can use that identifier while processing traces, metrics, logs.
+* An end-user could use that identifier for error handling and debugging, e.g. when a telemetry source is mis-configured, it's easier to identify it among others.
+
+## Internal details
+
+As stated above, there are multiple approaches to obtain that common unique identifier. Depending on the approach, there are different ways to accomplish it:
+
+1. Introduce `telemetry.sdk.instance_id` (or similar) and make it mandatory. Make `service.name` only mandatory for backend services. Other telemetry sources can make different attributes mandatory, like `app.name`. Optionally, remove `service.instance_id` from `service.*`
+
+2. Introduce a broad definition of the term _Service_ in the glossary. Unique identification could be achieved by (1) or making `service.name`, `service.namespace`, `service.instance_id` mandatory for all telemetry sources.
+
+3. Narrow down the definition for the term _Service_ to backend services. Make `service.name` only mandatory for backend services. Other telemetry sources can make different attributes mandatory, like `app.name` and provide a definition for their term, like `App` in the glossary. Unique identification could be achieved by (1) or having `(service|app).instance_id` and `(service|app).namespace` made mandatory as well.
+
+4. Introduce `telemetry.source.name`, `telemetry.source.namespace` and `telemetry.source.instance_id`. Make some or all of them mandatory for all telemetry sources. Different telemetry sources can add additional attributes in namespaces like `service.*` and `app.*`.
+
+## Trade-offs and mitigations
+
+All potential approaches provide different trade-offs:
+
+1. This will not introduce any breaking changes.
+
+2. This will not introduce any breaking changes, but end-users might get confused by calling their telemetry a service while they think of it as an app or different (see future possibilities)
+
+3. This may introduce a breaking change with `service.name` being not mandatory anymore in that broad sense. This would need further investigation. Also, this approach might lead to further additional sets of attributes which will be used by different telemetry sources for unique identification (devices, cronjobs, bots, ...)
+
+4. This will introduce a breaking change because `service.name` will be replaced with `telemetry.source.name`. This could be mitigated by a fallback mechanism, e.g. if `telemetry.source.name` is not provided check `service.name`.
+
+This list is not exhaustive, There are potentially more trade-offs per approach.
+
+## Open questions
+
+* What approach provides the most benefit and the least breaking changes to the current specification?
+* Are there further approaches missed by the author?
+
+## Future possibilities
+
+While the discussion right now is between backend and frontend services, in the future additional telemetry sources like different kinds of devices could be introduced and run into a similar situation that `service` is not the appropriate term.

From 60199d4a33e4fb490838d9c7626909ff05e91c61 Mon Sep 17 00:00:00 2001
From: svrnm
Date: Mon, 13 Dec 2021 16:36:38 +0100
Subject: [PATCH 2/2] Rename proposal file, anticipate all suggested changes.
---
 ...unique-identifier-for-telemetry-sources.md | 66 ------------------
 ...ntifier-for-sdk-based-telemetry-sources.md | 69 +++++++++++++++++++
 2 files changed, 69 insertions(+), 66 deletions(-)
 delete mode 100644 text/0000-mandatory-unique-identifier-for-telemetry-sources.md
 create mode 100644 text/0194-mandatory-unique-identifier-for-sdk-based-telemetry-sources.md

diff --git a/text/0000-mandatory-unique-identifier-for-telemetry-sources.md b/text/0000-mandatory-unique-identifier-for-telemetry-sources.md
deleted file mode 100644
index 739abde6f..000000000
--- a/text/0000-mandatory-unique-identifier-for-telemetry-sources.md
+++ /dev/null
@@ -1,66 +0,0 @@
-# Mandatory unique identifier for telemetry sources
-
-Provide an explicit mandatory unique identifier for telemetry sources.
-
-## Motivation
-
-Having a way to uniquely identify a telemetry source is helpful in many ways, like in processing and storing data from that source, visualizing them in a backend UI or debugging issues with that source and it's data.
-
-As of now `service.name` (and related attributes `service.namespace` and `service.instance_id`) are the implicit standard for that due to `service.name` being enforced as mandatory by the [Resource SDK specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md#sdk-provided-resource-attributes) and [Resource Semantic Conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/README.md#semantic-attributes-with-sdk-provided-default-value).
-
-Due to the fact that those attributes are not **explicitly** available to uniquely identify a telemetry source, multiple approaches have been suggested:
-
-1. [opentelemetry-specification/issues#1034](
-https://github.com/open-telemetry/opentelemetry-specification/issues/1034) is suggesting that `service.instance.id`is poorly defined and should be removed and be replaced by something different like an `telemetry.sdk.instance_id`. An attribute like `telemetry.sdk.instance_id` could serve as the sole unique identifier.
-
-2. [open-telemetry/opentelemetry-specification#2111](https://github.com/open-telemetry/opentelemetry-specification/pull/2111) is proposing to provide a broad definition for the term _Service_, which would mean that (almost) every telemetry source is a service and `service.name` (and `namespace` and `instance_id`) could be used as unique identifier.
-
-3. [open-telemetry/opentelemetry-specification#2115](https://github.com/open-telemetry/opentelemetry-specification/pull/2115) is proposing to introduce `app.name` as mandatory attribute for client side telemetry sources like browser apps or mobile apps, which then would not be treated as service (and with that would not have a `service.name`). `(app|service).name` (and `namespace` and `instance_id`) could be used as unique identifier.
-
-4. [open-telemetry/opentelemetry-specification#2192](https://github.com/open-telemetry/opentelemetry-specification/pull/2192) is proposing to introduce `telemetry.source.*` attributes as a super-set to `service.*` and `app.*`.
-
-This OTEP is proposing to choose from those approaches to uniquely identifying a telemetry source, or to find a unifying approach, since not all proposals are mutually exclusive.)
-
-## Explanation
-
-As stated in the Motivation with that unique identifier in place, it can be used at different places:
-
-* Backend developers will have certainty which attributes they can use as unique identifier for the source when storing telemetry data.
-* An UI can use it for visualization, especially as fallback if no other attribute is provided for that.
-* The collector (and other processors) can use that identifier while processing traces, metrics, logs.
-* An end-user could use that identifier for error handling and debugging, e.g. when a telemetry source is mis-configured, it's easier to identify it among others.
-
-## Internal details
-
-As stated above, there are multiple approaches to obtain that common unique identifier. Depending on the approach, there are different ways to accomplish it:
-
-1. Introduce `telemetry.sdk.instance_id` (or similar) and make it mandatory. Make `service.name` only mandatory for backend services. Other telemetry sources can make different attributes mandatory, like `app.name`. Optionally, remove `service.instance_id` from `service.*`
-
-2. Introduce a broad definition of the term _Service_ in the glossary. Unique identification could be achieved by (1) or making `service.name`, `service.namespace`, `service.instance_id` mandatory for all telemetry sources.
-
-3. Narrow down the definition for the term _Service_ to backend services. Make `service.name` only mandatory for backend services. Other telemetry sources can make different attributes mandatory, like `app.name` and provide a definition for their term, like `App` in the glossary. Unique identification could be achieved by (1) or having `(service|app).instance_id` and `(service|app).namespace` made mandatory as well.
-
-4. Introduce `telemetry.source.name`, `telemetry.source.namespace` and `telemetry.source.instance_id`. Make some or all of them mandatory for all telemetry sources. Different telemetry sources can add additional attributes in namespaces like `service.*` and `app.*`.
-
-## Trade-offs and mitigations
-
-All potential approaches provide different trade-offs:
-
-1. This will not introduce any breaking changes.
-
-2. This will not introduce any breaking changes, but end-users might get confused by calling their telemetry a service while they think of it as an app or different (see future possibilities)
-
-3. This may introduce a breaking change with `service.name` being not mandatory anymore in that broad sense. This would need further investigation. Also, this approach might lead to further additional sets of attributes which will be used by different telemetry sources for unique identification (devices, cronjobs, bots, ...)
-
-4. This will introduce a breaking change because `service.name` will be replaced with `telemetry.source.name`. This could be mitigated by a fallback mechanism, e.g. if `telemetry.source.name` is not provided check `service.name`.
-
-This list is not exhaustive, There are potentially more trade-offs per approach.
-
-## Open questions
-
-* What approach provides the most benefit and the least breaking changes to the current specification?
-* Are there further approaches missed by the author?
-
-## Future possibilities
-
-While the discussion right now is between backend and frontend services, in the future additional telemetry sources like different kinds of devices could be introduced and run into a similar situation that `service` is not the appropriate term.
diff --git a/text/0194-mandatory-unique-identifier-for-sdk-based-telemetry-sources.md b/text/0194-mandatory-unique-identifier-for-sdk-based-telemetry-sources.md
new file mode 100644
index 000000000..accd78cb2
--- /dev/null
+++ b/text/0194-mandatory-unique-identifier-for-sdk-based-telemetry-sources.md
@@ -0,0 +1,69 @@
+# Mandatory unique identifier for SDK-based telemetry sources
+
+Provide an explicit mandatory unique identifier for SDK-based telemetry sources.
+
+## Motivation
+
+Having a way to uniquely identify a telemetry source is helpful in many ways, such as processing and storing data from that source, visualizing it in a backend UI, or debugging issues with that source and its data.
+
+For SDK-based telemetry sources, `service.name` (and the related attributes `service.namespace` and `service.instance.id`) is currently the implicit standard for this, because `service.name` is enforced as mandatory by the [Resource SDK specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md#sdk-provided-resource-attributes) and the [Resource Semantic Conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/README.md#semantic-attributes-with-sdk-provided-default-value).
+
+However, because those attributes are not **explicitly** designated to uniquely identify an SDK-based telemetry source, multiple issues and pull requests call out problems with the current state:
+
+* [opentelemetry-specification/issues#1034](https://github.com/open-telemetry/opentelemetry-specification/issues/1034) calls out that `service.instance.id` is poorly defined right now and should be replaced by something more meaningful that can help to uniquely identify an SDK-based telemetry source.
+* [open-telemetry/opentelemetry-specification#2111](https://github.com/open-telemetry/opentelemetry-specification/pull/2111) calls out that there is no proper definition of what a `Service` is and that a proper definition is important since `service.name` is such an important attribute.
+* [open-telemetry/opentelemetry-specification#2115](https://github.com/open-telemetry/opentelemetry-specification/pull/2115) asks for introducing `app.name` and others alongside `service.name`, since client-side applications (browser, mobile) are **not** services and end-users might be confused by calling them a _Service_.
+* [open-telemetry/opentelemetry-specification#2192](https://github.com/open-telemetry/opentelemetry-specification/pull/2192) provides a middle ground between `app.name` and `service.name` by suggesting `telemetry.source.name` as a broader term.
+
+To address all requirements outlined in those approaches, we are proposing the following combined approach for uniquely identifying an SDK-based telemetry source:
+
+* Introduce a `telemetry.sdk.source.id` attribute, which MUST either be autogenerated by the SDK at application start or be supplied via an environment variable to the SDK. This will be the unique identifier for an SDK-based telemetry source.
+* Remove the `service.instance.id` attribute, since it is superseded by `telemetry.sdk.source.id`.
+* Replace `service.name` and `service.namespace` with the attributes `telemetry.sdk.source.name` and `telemetry.sdk.source.namespace`, to have a broader term for identification.
+* Make `telemetry.sdk.source.name` the attribute which MUST be provided by the SDK.
+* Provide backward compatibility with `service.name` by adopting [open-telemetry/oteps#161](https://github.com/open-telemetry/oteps/pull/161).
+* Backend-specific exporters that rely on `service.name` should set a default value themselves if the attribute is missing.
+* Add non-overlapping definitions of the terms `Service` and `App` to the specification glossary.
+* Introduce further attributes to describe the telemetry source where needed, e.g. `telemetry.sdk.source.version`, `app.bundle`, `app.short_version`, ...
+
+## Explanation
+
+With those changes in place, the following use cases will be covered:
+
+* If one of many instances of an SDK-based telemetry source is in an erroneous state, the user can quickly identify that instance using the `telemetry.sdk.source.id` and fix the issue. This will improve observability of OTel SDKs themselves.
+* By replacing `service.name` with `telemetry.sdk.source.name`, frontend applications and other sources that are not seen as _Services_ by their owners can be named in a way users expect. They can also use different scopes like `app` for additional attributes that might not be reasonable for a backend `service`.
+* Collectors & backends can use `telemetry.sdk.source.id` (or the combination of `id`, `name` and `namespace`) as a unique identifier for storing, processing, and displaying data.
+
+## Internal details
+
+Replacing `service.(instance.id|name|namespace)` with `telemetry.sdk.source.(id|name|namespace)` will require a mechanism to provide backward compatibility. For this we suggest adopting [open-telemetry/oteps#161](https://github.com/open-telemetry/oteps/pull/161).
+
+Language-specific implementations of the SDK used for instrumenting backend services will need to update their code to expect `telemetry.sdk.source.*` where `service.*` was used so far. This requires significant effort, but we believe that taking this route early is better than continuing with a less invasive change that has different drawbacks (see Alternatives below).
+
+Language-specific implementations of the SDK for other kinds of telemetry sources, like client-side applications, gain the flexibility to use a different scope like `app` for additional attributes of their telemetry source.
+
+Implementations of the SDK need to add a mechanism to either load the `telemetry.sdk.source.id` from an environment variable or to autogenerate a value at application start. For the autogenerated ID, the existing recommendation for `service.instance.id` (a random Version 1 or Version 4 RFC 4122 UUID) can be reused.
+
+Different modules in the collector and implementations of the backend will need to adopt this change. Backend-specific exporters can set a default value for `service.name` to satisfy their particular backends.
+
+## Alternatives
+
+We think that the proposed approach is the best among many. The following list describes existing alternatives and the reasons why they were rejected:
+
+1. Provide a broad definition for the term `Service`, which then would also cover client-side applications. With that, `service.(instance.id|name|namespace)` could be used as the unique identifier. It is possible to extend the definition of `Service` to cover that ([open-telemetry/opentelemetry-specification#2111](https://github.com/open-telemetry/opentelemetry-specification/pull/2111)), but frontend application developers & owners do not think about their applications as services and might be confused by this broad definition. Additionally, it is not foreseeable whether future SDK-based telemetry sources might need a different name that this definition could not cover.
+
+2. Introduce `app.(instance_id|name|namespace)` alongside `service.(instance.id|name|namespace)` and require that either `app.name` or `service.name` MUST be provided by the SDK. While this approach addresses the issues of (1), it comes with the disadvantage that a processor like the collector or backend needs to check multiple attributes to identify the type of the telemetry source. This creates additional unnecessary overhead. Also, this _may_ lead to attribute explosion if further SDK-based telemetry sources are introduced and are looking into providing attributes for an id, name, namespace, version or other similar attributes.
+
+## Open questions
+
+* Is the namespace `telemetry.sdk.source.*` suitable? Alternative names that could be used:
+  * `telemetry.source.*` as suggested by [open-telemetry/opentelemetry-specification#2192](https://github.com/open-telemetry/opentelemetry-specification/pull/2192). The difference is that it does not state explicitly that only SDK-based telemetry sources are covered. This is not necessarily bad, since other telemetry sources _could_ decide to use it as well.
+  * `telemetry.instance.*`
+  * `source.*`
+  * `telemetry.sdk.*` cannot be used, since `telemetry.sdk.name` is already in use.
+* Should duplication of attributes be allowed, e.g. that `telemetry.sdk.source.name`, `service.name`, and `app.name` are all specified and can be set, or should an attribute that exists in `telemetry.sdk.source.*` not be allowed in `service.*` and `app.*`?
+* How should additional attributes like `version`, `bundle`, `firmware_version`, `short_name`, `short_version` be treated? Does it make sense to provide a rule that attributes common to all sources (like `version`) should also be part of `telemetry.sdk.source.*` and only specific attributes like `bundle` or `firmware_version` should live in a different namespace?
+
+## Future possibilities
+
+While the discussion right now is between backend and frontend services, in the future additional SDK-based telemetry sources like different kinds of devices could be introduced without the need to reuse `service.name` as a mandatory attribute and with the possibility to simply introduce their own scope of additional specific attributes.
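
Note on the "Internal details" section of the second patch: the mechanism it describes for `telemetry.sdk.source.id` (load the value from an environment variable, otherwise autogenerate it at application start) could look roughly like the Python sketch below. This is only an illustration under stated assumptions. The environment variable name `OTEL_SDK_SOURCE_ID` and both function names are hypothetical, since the proposal does not define them. The sketch picks a Version 4 UUID out of the two UUID versions the proposal allows, and it also shows one possible form of the suggested exporter behaviour of setting a default `service.name` when the attribute is missing.

```python
import os
import uuid

# Hypothetical variable name: the proposal only says "an environment variable".
SOURCE_ID_ENV_VAR = "OTEL_SDK_SOURCE_ID"


def resolve_source_id(environ=None):
    """Return the configured source id, or a random Version 4 UUID if unset."""
    env = os.environ if environ is None else environ
    configured = env.get(SOURCE_ID_ENV_VAR, "").strip()
    if configured:
        return configured
    return str(uuid.uuid4())


def with_default_service_name(attributes, default):
    """Return a copy of the attributes with service.name filled in if missing."""
    result = dict(attributes)
    result.setdefault("service.name", default)
    return result


if __name__ == "__main__":
    resource = {
        "telemetry.sdk.source.id": resolve_source_id(),
        "telemetry.sdk.source.name": "checkout-frontend",  # illustrative value
    }
    # Assumption: an exporter that still needs service.name reuses the source name.
    print(with_default_service_name(resource, resource["telemetry.sdk.source.name"]))
```

Resolving the id once at start-up and keeping it for the lifetime of the process matches the proposal's "at application start" wording. Reusing the source name as the default `service.name` is one choice among several; the proposal only asks that exporters set some default.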