diff --git a/text/0194-mandatory-unique-identifier-for-sdk-based-telemetry-sources.md b/text/0194-mandatory-unique-identifier-for-sdk-based-telemetry-sources.md
new file mode 100644
index 000000000..accd78cb2
--- /dev/null
+++ b/text/0194-mandatory-unique-identifier-for-sdk-based-telemetry-sources.md
@@ -0,0 +1,69 @@
+# Mandatory unique identifier for SDK-based telemetry sources
+
+Provide an explicit mandatory unique identifier for SDK-based telemetry sources.
+
+## Motivation
+
+Having a way to uniquely identify a telemetry source is helpful in many ways, such as processing and storing data from that source, visualizing that data in a backend UI, or debugging issues with the source and its data.
+
+For SDK-based telemetry sources, `service.name` (together with the related attributes `service.namespace` and `service.instance.id`) is currently the implicit standard for this, because `service.name` is enforced as mandatory by the [Resource SDK specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md#sdk-provided-resource-attributes) and the [Resource Semantic Conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/README.md#semantic-attributes-with-sdk-provided-default-value).
+
+However, because those attributes are not **explicitly** designated to uniquely identify an SDK-based telemetry source, multiple issues call out problems with the current state:
+
+* [opentelemetry-specification/issues#1034](https://github.com/open-telemetry/opentelemetry-specification/issues/1034) calls out that `service.instance.id` is poorly defined right now and should be replaced by something more meaningful that can help to uniquely identify an SDK-based telemetry source.
+* [open-telemetry/opentelemetry-specification#2111](https://github.com/open-telemetry/opentelemetry-specification/pull/2111) calls out that there is no proper definition of what a `Service` is, and that such a definition matters because `service.name` is such an important attribute.
+* [open-telemetry/opentelemetry-specification#2115](https://github.com/open-telemetry/opentelemetry-specification/pull/2115) asks for introducing `app.name` and others alongside `service.name`, since client-side applications (browser, mobile) are **not** services and end users might be confused by calling them a _Service_.
+* [open-telemetry/opentelemetry-specification#2192](https://github.com/open-telemetry/opentelemetry-specification/pull/2192) proposes a middle ground between `app.name` and `service.name` by suggesting `telemetry.source.name` as a broader term.
+
+To address all requirements outlined in those approaches, we propose the following combined approach for uniquely identifying an SDK-based telemetry source:
+
+* Introduce a `telemetry.sdk.source.id` attribute, which MUST either be autogenerated by the SDK at application start or be supplied to the SDK via an environment variable. This will be the unique identifier for an SDK-based telemetry source.
+* Remove `service.instance.id` as an attribute, since it is superseded by `telemetry.sdk.source.id`.
+* Replace `service.name` and `service.namespace` with the attributes `telemetry.sdk.source.name` and `telemetry.sdk.source.namespace`, to have a broader term for identification.
+* Make `telemetry.sdk.source.name` the attribute that MUST be provided by the SDK.
+* Provide backward compatibility with `service.name` by adopting [open-telemetry/oteps#161](https://github.com/open-telemetry/oteps/pull/161).
+* Backend-specific exporters that rely on `service.name` should set a default value themselves if the attribute is missing.
+* Add non-overlapping definitions of the terms `Service` and `App` to the specification glossary.
+* Introduce further attributes to describe the telemetry source where needed, e.g. `telemetry.sdk.source.version`, `app.bundle`, `app.short_version`, ...
+
+## Explanation
+
+With those changes in place, the following use cases will be covered:
+
+* If one of many instances of an SDK-based telemetry source is in an erroneous state, the user can quickly identify that instance using the `telemetry.sdk.source.id` and fix the issue. This will improve the observability of the OTel SDKs themselves.
+* By replacing `service.name` with `telemetry.sdk.source.name`, frontend applications and other sources that are not seen as _Services_ by their owners can be named in a way users expect. They can also use different scopes like `app` for additional attributes that might not be reasonable for a backend `service`.
+* Collectors and backends can use `telemetry.sdk.source.id` (or the combination of `id`, `name` and `namespace`) as the unique identifier for storing, processing and displaying data.
+
+## Internal details
+
+Replacing `service.(instance.id|name|namespace)` with `telemetry.sdk.source.(id|name|namespace)` will require a mechanism to provide backward compatibility. For this we suggest adopting [open-telemetry/oteps#161](https://github.com/open-telemetry/oteps/pull/161).
+
+Language-specific implementations of the SDK used for instrumenting backend services will need to update their code to expect `telemetry.sdk.source.*` where `service.*` was used so far. This requires significant effort, but we believe that going down this route early is better than continuing with a less invasive change that has different drawbacks (see alternatives below).
+
+Language-specific implementations of the SDK for other kinds of telemetry sources, like client-side applications, gain the flexibility to use a different scope like `app` for additional attributes of their telemetry source.
+
+Implementations of the SDK need to add a mechanism to either load the `telemetry.sdk.source.id` from an environment variable or autogenerate a value at application start. For the autogenerated ID, the existing recommendation for `service.instance.id`, to use a random Version 1 or Version 4 RFC 4122 UUID, can be used.
+
+Different modules in the collector and implementations of the backend will need to adopt this change. For backend-specific exporters that rely on `service.name`, the solution would be to set some default value for `service.name` to satisfy their particular backends.
+
+## Alternatives
+
+We think that the proposed approach is the best among many. The following list describes existing alternatives and the reasons why they were rejected:
+
+1. Provide a broad definition for the term `Service`, which would then also cover client-side applications. With that, `service.(instance.id|name|namespace)` could be used as the unique identifier. It is possible to extend the definition of `Service` to cover that ([open-telemetry/opentelemetry-specification#2111](https://github.com/open-telemetry/opentelemetry-specification/pull/2111)), but frontend application developers and owners do not think of their applications as services and might be confused by this broad definition. Additionally, it is not foreseeable whether other future SDK-based telemetry sources might need a different name that could not be covered by this definition.
+
+2. Introduce `app.(instance.id|name|namespace)` alongside `service.(instance.id|name|namespace)` and require that either `app.name` or `service.name` MUST be provided by the SDK. While this approach addresses the issues of (1), it comes with the disadvantage that a processor like the collector or a backend needs to check multiple attributes to identify the type of the telemetry source. This creates unnecessary overhead. Also, this _may_ lead to attribute explosion if further SDK-based telemetry sources are introduced and need attributes for an id, name, namespace, version or similar.
+
+## Open questions
+
+* Is the namespace `telemetry.sdk.source.*` suitable? Alternative names could be used:
+  * `telemetry.source.*` as suggested by [open-telemetry/opentelemetry-specification#2192](https://github.com/open-telemetry/opentelemetry-specification/pull/2192). The difference is that it does not state explicitly that only SDK-based telemetry sources are covered. This is not necessarily bad, since other telemetry sources _could_ decide to use it as well.
+  * `telemetry.instance.*`
+  * `source.*`
+  * `telemetry.sdk.*` cannot be used since `telemetry.sdk.name` is already used.
+* Should duplication of attributes be allowed, e.g. so that `telemetry.sdk.source.name`, `service.name` and `app.name` can all be set, or should an attribute that exists in `telemetry.sdk.source.*` not be allowed in `service.*` and `app.*`?
+* How should additional attributes like `version`, `bundle`, `firmware_version`, `short_name`, `short_version` be treated? Does it make sense to provide a rule that attributes common to all sources (like `version`) should also be part of `telemetry.sdk.source.*`, and that only specific attributes like `bundle` or `firmware_version` should live in a different namespace?
+
+## Future possibilities
+
+While the discussion right now is about backend services and frontend applications, in the future additional SDK-based telemetry sources, like different kinds of devices, could be introduced without the need to reuse `service.name` as a mandatory attribute, and with the possibility to simply introduce their own scope of additional specific attributes.