Add oci.manifest.digest, container.image.repo_digests and make container.image.tag array #159

ChrsMark · 2023-07-04T08:26:31Z

This PR adds the ~~container.image.digest~~ oci.manifest.digest and container.image.repo_digests fields and makes container.image.tag an array of strings (renamed to container.image.tags).
This is to cover #48.

Also related to #72.

More analysis can be found at #48 (comment).

cc: @AlexanderWert @kaiyan-sheng @mlunadia

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

lmolkova

I still think OCI terminology makes more sense here and suggest using oci.manifest.digest and oci.manifest.tag. This would cover images and artifacts and would also rely on well-known, standard terminology.

ChrsMark · 2023-07-11T10:05:28Z

Hey @lmolkova thanks for the feedback here!

From my perspective, a more generic naming would give a better user experience. In a use case where a collector ships data to a data store, an end user searching that data would find it easier to have all the related container fields under the container.* namespace. I suspect that the average user would struggle to understand what oci.manifest.digest stands for even if they know what they are looking for.

In addition, I don't see any tag-specific attribute in the OCI spec: https://github.com/opencontainers/image-spec/blob/main/manifest.md. Do you see something there that fully corresponds to container.image.tag? If only .digest is reflected in OCI directly, then that's an extra reason for not splitting the fields between container.* and oci.*.

To my mind we should follow what runtimes/orchestrators provide, which themselves follow the OCI under the hood. Following the OCI looks more like an implementation detail that should not be exposed to the end user, since it slightly changes the scope of interest. You can also have a look at my comment at #48 (comment) where I analyse how the various runtimes/orchestrators define the specific fields. There I also mention that k8s is based on the Container Runtime Interface (CRI), which follows the OCI spec, but this detail is nowhere exposed to the end user. Indeed, the CRI reports RepoDigests, while in OCI it's just digest from what we can see.

Let me know what you think :).

docs/resource/container.md

lmolkova · 2023-07-18T01:14:27Z

@ChrsMark I agree on the tag part - there is no formal tag definition in the OCI.

I don't, however, agree that the OCI manifest digest should be recorded with container.image.digest.
An OCI manifest is a wider, more general-purpose thing than a container image.

It describes not just container images, but non-runnable artifacts, VM images, helm charts, or anything else.

As an owner of Azure Container Registry SDK, when I report client calls, I would not even know what users are pulling or pushing, or how they intend to use the image/artifact, but I know how it's represented with manifests. Distribution platforms such as image/artifact registries don't know either - they just implement OCI/docker v2 APIs on blobs of arbitrary data.

container.image.digest focuses on the containers and the execution side only. Introducing it would cause the following problem:

container/artifact registries and their client libraries will have to use container.image attributes for non-containerized and not runnable things.
OR we'll have to introduce vm.image, helm.chart, oci.manifest namespaces, etc

OCI manifest is a standard thing defined by the spec, while container image is a vague thing. k8s docs refer to the OCI spec.

TL;DR: OCI manifest digest covers a wider set of use cases and allows consistent attributes across client libraries, artifact registries, and container environments. It provides common, unambiguous terminology.

The only downside is that there is a tiny learning curve to discover OCI. From my point of view, the benefits of oci.manifest.digest outweigh it.

ChrsMark · 2023-07-19T08:42:13Z

Thanks for the detailed explanation @lmolkova !

I see the point for your use case; however, I would avoid using a generic name like oci.image.digest to report a Container's Image Digest specifically. Based on your example, oci.image.digest might be populated for other entities, not just containers, so I think it would be better to use a specific field for the container entity. I would prefer using container.image.digest, and then registry.image.digest to refer to Registry Images in general. I would see value in a query like container.image.digest==registry.image.digest, since the second field creates a superset that includes the first one.

Also, I did some extra research to try and get things together, and I think that for container.image.digest we need to clarify some things along with container.image.id. Let me try to summarize my findings:

So the image ID is the equivalent of the .config.digest from
https://github.com/opencontainers/image-spec/blob/main/manifest.md
We already depict this at container.image.id since https://github.com/open-telemetry/semantic-conventions/pull/39/files#r1202644093.

The current PR intends to add the digest information that can be used for downloading an image.
In Docker and the CRI it is called RepoDigest, and it is not part of the Image Manifest at https://github.com/opencontainers/image-spec/blob/main/manifest.md.
Indeed, it seems to be a different field that is part of the Manifest List, or "fat manifest" as it is called (as opposed to the Image Manifest). Still, the ID and the Digest are described in different parts of the spec.

Example:

➜  ~ docker pull prom/prometheus:v2.16.0@sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a
docker.io/prom/prometheus@sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a: Pulling from prom/prometheus
Digest: sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a
Status: Image is up to date for prom/prometheus@sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a
docker.io/prom/prometheus:v2.16.0@sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a
➜  ~ docker manifest inspect --verbose prom/prometheus:v2.16.0 | jq '.[0].Descriptor'
{
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "digest": "sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a",
  "size": 2824,
  "platform": {
    "architecture": "amd64",
    "os": "linux"
  }
}

➜  ~ docker inspect prom/prometheus:v2.16.0 --format 'Id: {{.Id}}
Repo Digests: {{index .RepoDigests }}'
Id: sha256:e935122ab143a64d92ed1fbb27d030cf6e2f0258207be1baf1b509c466aeeb42
Repo Digests: [prom/prometheus@sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a prom/prometheus@sha256:e4ca62c0d62f3e886e684806dfe9d4e0cda60d54986898173c1083856cfda0f4]

➜  ~ docker manifest inspect --verbose prom/prometheus:v2.16.0 | jq '.[0].SchemaV2Manifest.config'
{
  "mediaType": "application/vnd.docker.container.image.v1+json",
  "size": 6669,
  "digest": "sha256:e935122ab143a64d92ed1fbb27d030cf6e2f0258207be1baf1b509c466aeeb42"
}

So to summarize it, we already have the container.image.id which is the equivalent of Image ID in Docker, Kubernetes and CRI as it is also mentioned at https://github.com/open-telemetry/semantic-conventions/blob/main/model/resource/container.yaml#L35-L47. In terms of OCI it is the digest of the .config section of the manifest.
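To make the distinction concrete, here is a minimal sketch (not part of the PR) that separates the two identifiers using the values from the `docker inspect` output above. The helper name `split_repo_digest` and the dictionary shape are assumptions made purely for illustration.

```python
def split_repo_digest(ref: str) -> tuple:
    """Split a 'repository@algorithm:hex' reference into (repository, digest)."""
    repository, sep, digest = ref.partition("@")
    if not sep or ":" not in digest:
        raise ValueError(f"not a digest reference: {ref!r}")
    return repository, digest


# Values copied from the `docker inspect` output shown in this comment.
inspect_output = {
    "Id": "sha256:e935122ab143a64d92ed1fbb27d030cf6e2f0258207be1baf1b509c466aeeb42",
    "RepoDigests": [
        "prom/prometheus@sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a",
        "prom/prometheus@sha256:e4ca62c0d62f3e886e684806dfe9d4e0cda60d54986898173c1083856cfda0f4",
    ],
}

# The image ID (the config digest in Docker's case).
image_id = inspect_output["Id"]

# The digest part of every repo digest reference.
digests_only = [split_repo_digest(ref)[1] for ref in inspect_output["RepoDigests"]]
```

In this example the image ID does not appear among the repo digests, which is the point being made: they identify different objects.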

So here we are talking about adding the Repo Digest identifier. Thus, if my claims above are correct, we already skipped using the oci.-specific namespace, and to my mind it would be better if we continue doing so. It's easier to use a mapping that is friendlier to users of the container runtimes (like ImageID, RepoDigest) rather than going deep into the OCI spec, which took me significant time to analyse and correlate with the fields provided by the runtimes. Plus, the intent to use oci would extend the scope.

I would propose something like container.image.digest and registry.image.digest (for registry use cases), as I already mentioned.

@lmolkova out of curiosity which part of the OCI spec would you be willing to use in your use-case specifically? The one that is depicted as Repo Digest and is used for downloading an image or the Image ID one?

Also I think having input from more people here to get more opinions would help a lot :).

lmolkova · 2023-07-25T00:34:43Z

> To your use-case I see the point, however I would avoid using a generic naming like oci.image.digest to report a Container's Image Digest specifically. Based on your example the oci.image.digest might be populated for other entities not just containers so I think it would better to use a specific field for the container entity.

@ChrsMark , this is where we disagree. Attribute-based correlation is one of the features OTel provides. If we give the same thing multiple names we would not be able to correlate using attributes.

E.g. when using an artifact, I want to:

record how the artifact is obtained, regardless of container environment
record how it's pushed to the registry on the client side
record how it's pushed on the registry side
record how it's pulled on both sides
record how it's used (e.g. as a resource attribute on an application)

Assuming a bright future where I can get access to all this telemetry, using oci.manifest.digest I can find everything that has ever happened to this specific artifact digest.

Querying gets more complicated with container.image.digest and registry.artifact.digest: now you need to know that both are available and that they are the same thing.
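A rough sketch of that correlation problem follows. The record shape, the `source` and `event` values, and the `find_by_attribute` helper are hypothetical; only the attribute names and the digest value come from this thread.

```python
DIGEST = "sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a"

# Hypothetical telemetry records from three different emitters.
records = [
    {"source": "registry-client", "event": "push",
     "attributes": {"oci.manifest.digest": DIGEST}},
    {"source": "registry-server", "event": "pull",
     "attributes": {"oci.manifest.digest": DIGEST}},
    # This emitter uses a different name for the same value.
    {"source": "container-runtime", "event": "run",
     "attributes": {"container.image.digest": DIGEST}},
]


def find_by_attribute(records, keys, value):
    """Return the records where any of the given attribute keys equals value."""
    return [
        record for record in records
        if any(record["attributes"].get(key) == value for key in keys)
    ]


# Querying a single well-known key misses the record that used another name.
single_key = find_by_attribute(records, ["oci.manifest.digest"], DIGEST)

# Finding everything requires knowing every alias in advance.
all_aliases = find_by_attribute(
    records, ["oci.manifest.digest", "container.image.digest"], DIGEST
)
```

With one shared attribute name, the first query alone would be enough.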

Once you add config digests and layer digests into the picture, the need for an unambiguous and externally defined image id (which the OCI manifest digest provides) becomes even bigger.

I'm still struggling to understand the problem with oci.manifest.digest except a small learning curve. Can you explain if there are other reasons to make attribute-based correlation more complicated and use ambiguous and vague definitions while we have a spec that defines id?

ChrsMark · 2023-07-25T07:49:27Z

> I'm still struggling to understand the problem with oci.manifest.digest except a small learning curve. Can you explain if there are other reasons to make attribute-based correlation more complicated and use ambiguous and vague definitions while we have a spec that defines id?

Thanks @lmolkova! Let me try to collect my concerns below:

1. The learning curve is not that small; see my explanation at Add oci.manifest.digest, container.image.repo_digests and make container.image.tag array #159 (comment) of how the OCI information is depicted in runtimes. There was also a struggle to clearly define those at https://github.com/open-telemetry/semantic-conventions/pull/39/files#r1202644093. And the schema still needs to be validated again, see point 4 below. All of this makes me feel that the learning curve would not be negligible.
2. Coming from an Infrastructure Observability background, I would assume that developers crafting Observability tools/collectors would find it more straightforward to use terminology closer to container runtimes (ImageDigest, ImageID). Otherwise they also need to go through the learning curve.
3. If we go with oci.image.digest we might hit an issue with this field containing a superset of objects/entities: a container image is an OCI image, but an OCI image is not necessarily a container image. So if I plot the top 10 OCI images in a pie chart, this might be misleading if I don't know that the dataset contains more than container image information. This is an example illustrating points 1 and 2. It's a clear issue of being generic vs. being specific.
4. Last but not least, what is the plan for the container.image.id field that we have already introduced with https://github.com/open-telemetry/semantic-conventions/pull/39/files#r1202644093? As I explained at Add oci.manifest.digest, container.image.repo_digests and make container.image.tag array #159 (comment), we would need to validate it again since this is also part of the OCI spec. @joaopgrassi @marcsanmi what are your thoughts on this?

Still an open question:
A) which part of the OCI spec would you be willing to use in your use-case specifically? The one that is depicted as Repo Digest and is used for downloading an image or the Image ID one?

I'm not totally against the oci.* proposal but I would like to address the above first.

lmolkova · 2023-07-26T06:46:31Z

@ChrsMark

Thank you for the update!

1. I'm not sure I understand the concern; please take a look at point 4 below.
2. My take (coming from an observability perspective) is that using standard, unambiguous terminology and being able to correlate telemetry using attributes is quite important.
3. It probably means that the pie chart needs to take other information into account, such as the manifest media type. BTW, for the registry scenario, it does not matter what type the manifest has, since the registry pulls and pushes arbitrary data and does not care.
4. The container.image.id, as we discussed before, is a container-runtime-specific thing that represents the image id: in the case of docker it's (as you figured out) the config digest; in the case of k8s it's something else. The same image has different ids on docker and k8s.
   If there are other environments, container.image.id would potentially be different in each of them, whereas oci.manifest.digest would be a unified digest common across environments.

I.e. in your Prometheus example above:

container.image.id=sha256:e935122ab143a64d92ed1fbb27d030cf6e2f0258207be1baf1b509c466aeeb42
oci.manifest.digest=sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a.

I don't see how calling it container.image.digest would make it any easier to understand or use (on the contrary, we'll have yet another attribute to correlate with registry telemetry). Plus, if I google for container.image.digest I get nothing definitive, whereas oci.manifest.digest brings me right to the spec.

> A) which part of the OCI spec would you be willing to use in your use-case specifically? The one that is depicted as Repo Digest and is used for downloading an image or the Image ID one?

If I defined an oci namespace for my SDK, I'd start with:

oci.manifest.digest
oci.manifest.media_type
oci.schema_version (2 by default)

Thinking more about it, it'd be useful for me to record config and layer digests on telemetry, so I'd also consider:

oci.blob.digest (blob is coming from docker v2 API, but maybe descriptor would be a better term)
oci.blob.media_type - this can also be used to distinguish configs from layers

Since the SDK (or registry) has no knowledge of the container environment the image/artifact will be used in (or whether it will be used in a container environment at all), it would not know anything about containers or their ids; it will only be able to use the manifest digest.
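As an illustration only, the attribute names listed above could be derived from a parsed manifest roughly as follows. The function names are hypothetical, the `oci.manifest.media_type`, `oci.schema_version` and `oci.blob.*` names are the ones floated in this comment rather than agreed conventions, and the manifest digest is assumed to be supplied by the caller.

```python
def manifest_attributes(manifest: dict, digest: str) -> dict:
    """Attributes describing the manifest itself."""
    return {
        "oci.manifest.digest": digest,
        "oci.manifest.media_type": manifest["mediaType"],
        # The comment above suggests 2 as the default schema version.
        "oci.schema_version": manifest.get("schemaVersion", 2),
    }


def blob_attributes(manifest: dict) -> list:
    """One attribute set per descriptor (config first, then layers)."""
    descriptors = [manifest["config"], *manifest.get("layers", [])]
    return [
        {"oci.blob.digest": d["digest"], "oci.blob.media_type": d["mediaType"]}
        for d in descriptors
    ]


# Manifest fields taken from the Prometheus example earlier in the thread.
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 6669,
        "digest": "sha256:e935122ab143a64d92ed1fbb27d030cf6e2f0258207be1baf1b509c466aeeb42",
    },
    # Layer descriptors are not shown in this thread, so none are invented here.
    "layers": [],
}

attrs = manifest_attributes(
    manifest,
    "sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a",
)
blobs = blob_attributes(manifest)
```

Nothing in this sketch needs to know whether the manifest describes a container image, a helm chart, or some other artifact.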

ChrsMark · 2023-07-31T08:30:56Z

Hey @lmolkova and thank you for the feedback! I see how container.image.id is different and now things are clear to me.
I think we can decide on using the oci.manifest.digest, I will update this PR accordingly :).

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

docs/resource/oci.md

lmolkova

LGTM, and thank you for the great discussion, @ChrsMark !

One small comment from me on the container.image.id - the note on it seems out of date - it says "OCI defines a digest of manifest ." (I can't leave a comment on it)

I'd either remove this sentence completely or change it to something along the following lines:

"The container.image.id of the same image running in different environments does not always match. The oci.manifest.digest attribute, however, is the same for a given image in all container runtimes that follow the OCI specification."

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

ChrsMark · 2023-08-03T07:53:47Z

Thanks for reviewing this, folks! In adacd83, I changed the fields to plural form since I think it's more accurate based on https://opentelemetry.io/docs/specs/otel/common/attribute-naming/#name-pluralization-guidelines. Runtimes also report those in plural form.

I also tuned the descriptions accordingly.
@lmolkova let me know what you think about the oci.manifest.digests description specifically.

joaopgrassi · 2023-08-23T09:50:53Z

Sounds reasonable to me, but we would have duplication of data right? As container.image.digests would be a superset of oci.manifest.digest. In cases where @lmolkova described where it deals with "non container" scenarios we would only populate oci.manifest.digest if I got it right?

ChrsMark · 2023-08-23T11:43:29Z

I think it depends on the perspective/method by which we collect this data:

> but we would have duplication of data right? As container.image.digests would be a superset of oci.manifest.digest.

In cases where we only have access to the already stored container images of our observed system, we can retrieve the information from the CRI's API and hence populate container.image.digests accordingly. So far we don't have a straightforward way to retrieve the single manifest's digest in that particular use case. In some cases we might be able to populate both fields, but that data duplication is a fair compromise for now. Maybe in the long run the CRI definition will change (to singular) and we can deprecate the digests in order to only use the oci one.

> In cases where @lmolkova described where it deals with "non container" scenarios we would only populate oci.manifest.digest

In cases where we have direct access to the specific manifest's digest, we will use oci.manifest.digest (for example at download/execution time).

ChrsMark · 2023-08-24T08:10:20Z

@lmolkova what are your thoughts on this? It would be nice if we could move this one forward and reach a conclusion soon :). Thanks!

lmolkova · 2023-08-29T21:24:10Z

There is a 1:1 relationship between a container and an image. One container can run only one image.

There is also a 1:1 relationship between an image and its manifest.

So having multiple digests for one container does not make sense.

The same image can be pushed to multiple repositories, but if it's the same image, it will have the same oci digest anywhere (as it's a sha256 of manifest json and uniquely identifies a specific version of image). Check out answers on this thread: https://stackoverflow.com/questions/45533005/why-digests-are-different-depend-on-registry
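A small sketch of why that holds: the digest is computed over the manifest bytes only, so the repository name never enters the hash. The manifest body and the repository names below are made up for illustration; `hashlib.sha256` is the standard-library hash.

```python
import hashlib


def manifest_digest(manifest_bytes: bytes) -> str:
    """Digest of a manifest: sha256 over its exact bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


# A made-up, minimal manifest body.
manifest = b'{"schemaVersion":2,"mediaType":"application/vnd.oci.image.manifest.v1+json"}'

# The same bytes pushed to two hypothetical repositories.
pushed = {
    "registry-a.example/app": manifest,
    "registry-b.example/team/app": manifest,
}
digests = {repo: manifest_digest(body) for repo, body in pushed.items()}

# Identical bytes give an identical digest, whatever the repository.
same_everywhere = len(set(digests.values())) == 1

# Any change to the bytes, even whitespace, yields a different digest.
reserialized = manifest.replace(b",", b", ")
changed = manifest_digest(reserialized) != manifest_digest(manifest)
```

The second check is the flip side: a registry or tool that rewrites the manifest in any way ends up with a different digest, because it is then a different manifest.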

If we need to support docker v1 or something else where the same image can have multiple manifest digests, let's create container.docker.repo_digests or something similar.

ChrsMark · 2023-08-30T08:13:04Z

All right, based on the above discussions I have changed the oci.manifest.digests field back to singular and introduced a new one to reflect what the CRI's API provides:

oci.manifest.digest
container.image.repo_digests to reflect what CRI provides (https://github.com/kubernetes/cri-api/blob/c75ef5b473bbe2d0a4fc92f82235efd665ea8e9f/pkg/apis/runtime/v1/api.proto#L1238) for infra observability and security use-cases.
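Put together, the resulting attribute set might be assembled as in the sketch below. The `image_attributes` helper is hypothetical and not part of any SDK; the values are the Prometheus ones from earlier in the thread. The manifest digest is optional here because, as discussed above, it may not be retrievable from the runtime alone.

```python
from typing import Dict, List, Optional, Union

AttributeValue = Union[str, List[str]]


def image_attributes(
    image_id: str,
    repo_digests: List[str],
    tags: List[str],
    manifest_digest: Optional[str] = None,
) -> Dict[str, AttributeValue]:
    """Build image-related attributes; the OCI digest is set only when known."""
    attrs: Dict[str, AttributeValue] = {
        "container.image.id": image_id,
        "container.image.repo_digests": list(repo_digests),
        "container.image.tags": list(tags),
    }
    if manifest_digest is not None:
        attrs["oci.manifest.digest"] = manifest_digest
    return attrs


# What a collector reading only runtime data could report.
from_runtime = image_attributes(
    image_id="sha256:e935122ab143a64d92ed1fbb27d030cf6e2f0258207be1baf1b509c466aeeb42",
    repo_digests=[
        "prom/prometheus@sha256:efd99a6be65885c07c559679a0df4ec709604bcdd8cd83f0d00a1a683b28fb6a",
        "prom/prometheus@sha256:e4ca62c0d62f3e886e684806dfe9d4e0cda60d54986898173c1083856cfda0f4",
    ],
    tags=["v2.16.0"],
)
```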

I hope that covers all that we have discussed so far.

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

joaopgrassi

A small consideration is that the PR does "3" things now: Change tag to array, introduce OCI and a new image attribute for the digests.

I'd probably split this, at least the tag array in a separate PR, but given this has been open for a while and had extensive discussions, I'm approving to avoid even more work on @ChrsMark side.

ChrsMark · 2023-08-31T08:30:21Z

@lmolkova I think the requested changes are now covered :) . Are we good to go with this one?

model/resource/container.yaml

lmolkova

Left one small comment, otherwise LGTM. Thank you!

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

ChrsMark · 2023-09-07T12:47:38Z

@open-telemetry/specs-semconv-maintainers this one should be ready for merge?

Add container.image.digest and make container.image.tag array

8dbfebe

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

ChrsMark requested review from a team as code owners July 4, 2023 08:26

github-actions bot assigned arminru Jul 4, 2023

ChrsMark mentioned this pull request Jul 4, 2023

Container semantic conventions: follow Open Container Initiative spec #48

Open

ChrsMark added 3 commits July 4, 2023 12:35

Merge remote-tracking branch 'upstream/main' into container_image

ba71df5

lint

e758406

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

Merge branch 'main' into container_image

a7caf37

ChrsMark force-pushed the container_image branch from b3de8a7 to a7caf37 Compare July 6, 2023 07:12

Merge remote-tracking branch 'upstream/main' into container_image

94807f5

lmolkova requested changes Jul 10, 2023

kaiyan-sheng reviewed Jul 17, 2023

ChrsMark added 3 commits July 31, 2023 11:47

Add oci.manifest.digest

d098873

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

Merge remote-tracking branch 'upstream/main' into container_image

7515c62

fixup

e1d68bb

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

ChrsMark changed the title ~~Add container.image.digest and make container.image.tag array~~ Add oci.manifest.digest and make container.image.tag array Jul 31, 2023

jsuereth reviewed Aug 2, 2023

lmolkova approved these changes Aug 3, 2023

AlexanderWert approved these changes Aug 3, 2023

ChrsMark added 3 commits August 3, 2023 09:59

Merge remote-tracking branch 'upstream/main' into container_image

e5cccbf

review changes

3581275

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

Make fields plurals and enhance descriptions

adacd83

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

Merge branch 'main' into container_image

ebd2737

ChrsMark requested review from lmolkova, AlexanderWert and joaopgrassi August 30, 2023 08:13

ChrsMark force-pushed the container_image branch from 49a0e82 to 47fb33f Compare August 30, 2023 08:16

Make oci digest singular and add runtime repo_digests

c883557

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

ChrsMark force-pushed the container_image branch from 47fb33f to c883557 Compare August 30, 2023 08:18

joaopgrassi approved these changes Aug 30, 2023

ChrsMark changed the title ~~Add oci.manifest.digest and make container.image.tag array~~ Add oci.manifest.digest, container.image.repo_digests and make container.image.tag array Aug 30, 2023

AlexanderWert approved these changes Aug 31, 2023

Merge branch 'main' into container_image

8ef4c95

lmolkova reviewed Sep 6, 2023

lmolkova approved these changes Sep 6, 2023

Fix repo_digests examples

e5a3293

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

ChrsMark force-pushed the container_image branch from 83f35c4 to e5a3293 Compare September 6, 2023 20:03

Merge branch 'main' into container_image

3e667b8

joaopgrassi removed the request for review from a team September 11, 2023 09:16

joaopgrassi merged commit 9e3ac90 into open-telemetry:main Sep 11, 2023
9 checks passed

ChrsMark mentioned this pull request Sep 11, 2023

Tune container.image.* fields to follow OCI spec elastic/ecs#2230

Open

ChrsMark mentioned this pull request Oct 2, 2023

Tune container image fields to align with Otel SemConv elastic/ecs#2282

Closed

joaopgrassi mentioned this pull request Oct 4, 2023

[CONTRIBUTING.md] Add section about merging ECS conventions #333

Merged

3 tasks

ChrsMark mentioned this pull request Oct 19, 2023

Request to create semconv-{container,k8s}-approvers #427

Closed

ChrsMark mentioned this pull request Apr 12, 2024

[processor/k8sattributes] Implement container.image.id for k8sattributes processor open-telemetry/opentelemetry-collector-contrib#32314

Open
