Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guidance for filling (or consider removing?) service.instance.id #1034

Open
anuraaga opened this issue Sep 29, 2020 · 13 comments
Open

Guidance for filling (or consider removing?) service.instance.id #1034

anuraaga opened this issue Sep 29, 2020 · 13 comments
Labels
area:semantic-conventions Related to semantic conventions priority:p2 Medium priority level release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:resource Related to the specification/resource directory

Comments

@anuraaga
Copy link
Contributor

Resource defines service.instance.id as a unique ID among horizontally scaled services

https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource/semantic_conventions#service

But we have many similar IDs in other places
host.id
k8s.pod.name or uid
container.id
faas.instance

Should one of these be copied into service.instance.id when it's available? One issue is ordering, host.id is the instance ID for non-container workloads but not container workloads, for example.

Or can we just remove service.instance.id since we would expect an ID of horizontal scaling to be present in another convention?

@anuraaga anuraaga added the spec:resource Related to the specification/resource directory label Sep 29, 2020
@Oberon00 Oberon00 added the area:semantic-conventions Related to semantic conventions label Sep 29, 2020
@Oberon00
Copy link
Member

Oberon00 commented Sep 29, 2020

A host can host multiple services. A POD might host multiple (related) services. A container might host multiple services (even though that's unusual). I can't think of a scenario wher FaaS would have more than one service instance.

I agree though that most of the time, except for host, everything should map 1:1 to service.instance.id, so that would be reasonable choices.

However, since only the triplet service.(namespace, name, id) is supposed to be unique, I think that we should not auto-fill service.instance.id by default with anything -- it seems to be predestined for an "inherent unique ID" as the spec puts it. I don't think we should remove it, it is fine for that use case.

I think it may make a lot of sense to add a telemetry.instance_id to be able to group resources that came from the very same initialization of OpenTelemetry, i.e. something that is usable as primary key for resources. Note that the service namespace/name/id triplet is not usable for that as it stays (or should stay at least) the same over restarts of the same service instance.

@anuraaga
Copy link
Contributor Author

is not usable for that as it stays (or should stay at least) the same over restarts of the same service instance.

Hmm - in that case should we consider removing the recommendation to "generate a random Version 1 or Version 4 RFC 4122 UUID"? I don't think it can ever be possible for the UUID to be the same over restarts. I guess a user needs to set it appropriately in that case.

@Oberon00
Copy link
Member

Oberon00 commented Sep 29, 2020

Spec says:

It is preferable for the ID to be persistent and stay the same for the lifetime of the service instance, however it is acceptable that the ID is ephemeral and changes during important lifetime events for the service (e.g. service restarts).

And also later:

services aiming for reproducible UUIDs may also use Version 5, see RFC 4122 for more recommendations

It would be cool if the spec could also write the "nicknames" for these versions, e.g. V 5 requires a name and namespace as input, e.g. Python's implementation has this:

uuid.uuid5(namespace, name)
Generate a UUID based on the SHA-1 hash of a namespace identifier (which is a UUID) and a name (which is a string).

@andrewhsu andrewhsu added the release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs label Sep 29, 2020
@anuraaga
Copy link
Contributor Author

I was thinking a bit more on this - @Oberon00 just wondering, how would a backend use this value with the current definition? Isn't it a bit too ambiguous? If the value is always a useful, persistent identifier for the instance among a group of them, that's useful, but if we also allow a fallback to a random UUID V1 / V5 then isn't the attribute too unreliable to do anything meaningful?

@Oberon00
Copy link
Member

I don't think the current definition is suitable for using it much generically. It probably requires user-side knowledge to interpret. Still, grouping by instance ID groups at least groups all the spans by one running OpenTelemetry SDK instance together.

@andrewhsu andrewhsu added this to Required for GA, needs action. Add label:release:required-for-ga to issues and PRs when moving them to this column. in GA Spec Burndown Oct 6, 2020
@andrewhsu andrewhsu added the priority:p3 Lowest priority level label Oct 9, 2020
@andrewhsu
Copy link
Member

from issue triage mtg today, setting as p3 for editorial change if it comes in before ga

@Oberon00
Copy link
Member

Oberon00 commented Dec 11, 2020

Since service.instance.id is "required" in the service resource, this issue does not interact well with #1269 "Define the fallback case for service.name": If the SDK sets a value for service.name automatically, service.instance.id may be missing, resulting in an invalid resource produced by default. See #1269 (comment)

@carlosalberto carlosalberto added priority:p2 Medium priority level and removed priority:p3 Lowest priority level labels Dec 15, 2020
@carlosalberto
Copy link
Contributor

Changed priority to P2 to have this issue as a follow up.

@Oberon00
Copy link
Member

Oberon00 commented Dec 15, 2020

Repeating my comment from #1269 (comment)

IMHO this attribute is poorly defined right now as it may or may not be the same across service restarts, which IMHO can make quite a difference. It would be easiest if it MUST be the different for each restart, that way it could be used as primary key for all resources (not only service.*) sent by the same telemetry instance. On the other hand, maybe such an attribute would better be named telemetry.sdk.instance.id.

(BTW, why instance.id instead of instance_id?)


To sum it up, I think we should split this attribute up into (names are just ideas):

  • service.deployment_id: An unique ID of that particular deployed service instance. Stays the same across restarts, but only one service with that deployment_id can ever be running at the same time (if there are, they are parallel instances running with the exact same configuration+filesystem+host). Such an ID will often not be available. If it is, the ID is typically created when installing/deploying the service and stored on the filesystem.
  • telemetry.sdk.instance_id: A unique ID generated at startup by the SDK. Can be used as primary key for resources.

@carlosalberto
Copy link
Contributor

To sum it up, I think we should split this attribute up into (names are just ideas)

Thanks @Oberon00, I think this makes sense. Not sure on the names themselves but the idea of having two attributes (one optional, set by the user, and another one generated per-restart by the SDK) seems like the way to go here.

@aabmass
Copy link
Member

aabmass commented Sep 16, 2021

@Oberon00 @anuraaga any updates on this issue? Looking at what we have in the Python SDK right now, the default behavior violates the requirement that "service.namespace,service.name,service.instance.id triplet MUST be globally unique".

We are prototyping metrics and this presents the problem that multiple instances of a service are likely to violate the single-writer requirement because of the identical resources. I was considering doing the same as open-telemetry/opentelemetry-java#1726 to solve this, but it doesn't seem like we are settled on a solution.

@Oberon00
Copy link
Member

Oberon00 commented Sep 17, 2021

I don't really have much stake in this issue (anymore). I think it would be best if you create a PR since you seem to have an actual use case.

@svrnm
Copy link
Member

svrnm commented Nov 12, 2021

I stumbled across this issue recently as well, so I would be curious to get this conversation re-started.

A few thoughts:

Reg. service.deployment_id. That's an interesting addition. However, I am wondering if this should be part of Deployment, i.e. deployment.id or similar? Or would it make more sense to add deployment.environment to service, i.e. service.deployment.environment and service.deployment.id?

Rg. telemetry.sdk.instance_id: would it be possible to reload/replace the SDK at runtime? Let's say I use an auto instrumentation agent and make a hot update? Would that id change in that case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions priority:p2 Medium priority level release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:resource Related to the specification/resource directory
Projects
No open projects
GA Spec Burndown
  
Required/Allowed for GA, todo.
Development

No branches or pull requests

6 participants