Proposal: Standard header to attribute requests #328

Open
imjasonh opened this issue Jun 29, 2022 · 3 comments

@imjasonh
Member

I'd like to propose a new recommended HTTP header that clients can send to registries, when pushing and pulling blobs and manifests, to indicate what image ref it's being pushed/pulled for.

As an example, today, when a client pushes registry.biz/hello:image, it first pushes all the constituent blobs (POST https://registry.biz/v2/hello/blobs/uploads/, followed by a PUT that names the digest sha256:abc..., etc.), then pushes the manifest (PUT https://registry.biz/v2/hello/manifests/image). When the registry receives the blob request, though, it has no idea what image that blob belongs to.

On pull, clients pull the manifest (GET https://registry.biz/v2/hello/manifests/image), resolving the manifest's digest, then pull all the referenced blobs (GET https://registry.biz/v2/hello/blobs/sha256:abc...), but when the registry gets the blob request, it has no idea what image the blob belongs to.

If multiple pulls from the same repo are happening concurrently, as is often the case, attributing a blob request to a manifest request can be very difficult. You could try to align user agents, or client IP addresses, or auth credentials, or what blobs a manifest contains, but none of those are very reliable signals.
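
To illustrate why "what blobs a manifest contains" is a weak signal, here is a minimal sketch. The manifest contents and shortened digests are hypothetical, made up for illustration; the point is only that a shared base layer maps back to more than one ref.

```python
# Hypothetical manifests in the same repo; digests are shortened placeholders.
MANIFESTS = {
    "registry.biz/hello:image": {"sha256:base", "sha256:app-v1"},
    "registry.biz/hello:image2": {"sha256:base", "sha256:app-v2"},
}


def candidate_refs(blob_digest, manifests=MANIFESTS):
    """Return every ref whose manifest lists the given blob, sorted by name."""
    return sorted(ref for ref, blobs in manifests.items() if blob_digest in blobs)


# A shared layer is ambiguous: both refs are candidates for this request.
print(candidate_refs("sha256:base"))
# A layer unique to one image can be attributed, but that is not guaranteed.
print(candidate_refs("sha256:app-v1"))
```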

The client knows what image it's pulling when it pulls a blob; why doesn't it share this with the registry?

Advantages

Tying a blob request to a manifest request has some benefits related to resource usage attribution: as a registry operator, I may want to track bandwidth usage per image reference -- how many bytes am I serving for registry.biz/hello:image vs registry.biz/hello:image2? When I roll out :image3, how are blob requests and bandwidth usage and latency affected?
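
As a rough sketch of the bandwidth accounting described above, a registry operator could group bytes served by the value of the header. The log record shape (`oci_ref`, `bytes_sent`) is an assumption for illustration, not something any registry is known to emit.

```python
from collections import Counter

# Bucket for requests from clients that did not send the header.
UNATTRIBUTED = "(no ref sent)"


def bytes_by_ref(records):
    """Sum bytes served per image reference from hypothetical access-log records."""
    totals = Counter()
    for record in records:
        ref = record.get("oci_ref") or UNATTRIBUTED
        totals[ref] += record["bytes_sent"]
    return dict(totals)


# Made-up log records, for illustration only.
log = [
    {"oci_ref": "registry.biz/hello:image", "bytes_sent": 1000},
    {"oci_ref": "registry.biz/hello:image2", "bytes_sent": 250},
    {"oci_ref": "registry.biz/hello:image", "bytes_sent": 500},
    {"oci_ref": None, "bytes_sent": 75},
]
print(bytes_by_ref(log))
```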

It can also have benefits for garbage collection of aborted pushes. If I'm pushing registry.biz/hello:image and first push its blob, then give up or fail, the blob may exist without being referenced by an image. When the blob is pushed, the registry can note that it's meant to be a part of hello:image, and after some time, if hello:image doesn't exist, it can delete the orphaned blob. Today, I believe Quay associates blobs with the bare repo until an image manifest exists that references it; this provides an avenue to have finer-grained attribution of these blobs.
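
A minimal sketch of that garbage-collection idea, assuming the registry records the intended ref and upload time for each unreferenced blob. The data shapes, the function name, and the 24-hour grace period are all hypothetical.

```python
def orphaned_blobs(pending, existing_refs, now, grace_seconds=24 * 3600):
    """Return digests of blobs whose intended image never appeared.

    pending maps blob digest -> (intended ref from the header, upload time in
    seconds). A blob qualifies only if its intended ref still does not exist
    and the grace period has elapsed.
    """
    return sorted(
        digest
        for digest, (ref, uploaded_at) in pending.items()
        if ref not in existing_refs and now - uploaded_at >= grace_seconds
    )


# Made-up state: one push completed, one was abandoned, one is still recent.
pending = {
    "sha256:aaa": ("registry.biz/hello:image", 0),
    "sha256:bbb": ("registry.biz/hello:image2", 0),
    "sha256:ccc": ("registry.biz/hello:image3", 90_000),
}
existing = {"registry.biz/hello:image"}
print(orphaned_blobs(pending, existing, now=100_000))
```

Since the header is client-supplied, a real registry would probably treat this as one input to cleanup rather than the only one.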

Proposal

I'd like to add to distribution-spec an optional-but-recommended OCI-Ref header (open to bikeshedding) that the client should send when requesting a blob or manifest, with the image reference the request is being made for.

If I'm pulling registry.biz/my-image:latest, any blob requests that happen as a result would include the request header OCI-Ref: registry.biz/my-image:latest.
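
On the client side, this could look roughly like the sketch below, which builds (but does not send) a blob request with the proposed header. The helper name and the placeholder digest are assumptions for illustration; the header name is the one proposed above and is still open to bikeshedding.

```python
from urllib.request import Request


def blob_request(registry, repo, digest, ref):
    """Build, without sending, a blob GET that carries the proposed OCI-Ref header."""
    url = f"https://{registry}/v2/{repo}/blobs/{digest}"
    return Request(url, headers={"OCI-Ref": ref}, method="GET")


# Placeholder digest, for illustration only.
req = blob_request(
    "registry.biz", "my-image", "sha256:" + "ab" * 32, "registry.biz/my-image:latest"
)
# urllib stores header names capitalized as "Oci-ref". HTTP header names are
# case-insensitive, so the registry would see the same header.
print(req.get_method(), req.full_url)
print(req.get_header("Oci-ref"))
```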

This extends to manifests as well. If I'm pulling an index registry.biz/hello:index, any subsequent manifest request being made as a result would include OCI-Ref: registry.biz/hello:index. Subsequent blob requests would have OCI-Ref: registry.biz/hello@sha256:def..., the image manifest's reference by digest, as referenced by the index.
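
The index case can be sketched as a plan of (request path, header value) pairs. This is only an illustration: the function name and shortened digests are hypothetical, and sending the tag ref on the initial index request is an assumption, since the text above only covers the requests that follow it.

```python
def plan_pull(registry, repo, tag, child_manifest_digest, blob_digests):
    """List (request path, OCI-Ref value) pairs for pulling an index by tag."""
    tag_ref = f"{registry}/{repo}:{tag}"
    digest_ref = f"{registry}/{repo}@{child_manifest_digest}"
    plan = [
        # Assumption: the initial index request carries the ref the user asked for.
        (f"/v2/{repo}/manifests/{tag}", tag_ref),
        # The platform-specific manifest is fetched on behalf of the index.
        (f"/v2/{repo}/manifests/{child_manifest_digest}", tag_ref),
    ]
    # Blobs are fetched on behalf of the image manifest, referenced by digest.
    plan += [(f"/v2/{repo}/blobs/{digest}", digest_ref) for digest in blob_digests]
    return plan


# Shortened placeholder digests, for illustration only.
for path, header in plan_pull(
    "registry.biz", "hello", "index", "sha256:def", ["sha256:abc"]
):
    print(path, "->", "OCI-Ref:", header)
```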

I have an open PR in go-containerregistry to implement this behavior (albeit with a different header value), and some example output showing the behavior.

Disadvantages

This is a client-sent header, and clients can lie, or simply omit the header. Registries shouldn't make important decisions based on the value of the header, and should use it only for informational purposes.
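
One way a registry might treat the value as an untrusted hint is sketched below. The checks are assumptions: the length cap is arbitrary, and requiring the ref to name the same repository as the request path is a simplification. The prefix check also stands in for real reference parsing, which is not currently specified.

```python
MAX_REF_LENGTH = 512  # arbitrary cap, chosen for illustration


def ref_hint(header_value, registry_host, request_repo):
    """Return the header value if it looks plausible for this request, else None.

    The result is only a hint for logs and metrics, never proof of intent.
    """
    if not header_value or len(header_value) > MAX_REF_LENGTH:
        return None
    if not header_value.isprintable() or " " in header_value:
        return None
    prefix = f"{registry_host}/{request_repo}"
    for separator in (":", "@"):
        head = prefix + separator
        # Require a non-empty tag or digest after the separator.
        if header_value.startswith(head) and len(header_value) > len(head):
            return header_value
    return None


print(ref_hint("registry.biz/hello:image", "registry.biz", "hello"))
print(ref_hint("registry.biz/other:image", "registry.biz", "hello"))
```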

At first, no clients will set the header. Only as clients adopt the behavior will the signal coming in to registries improve enough to be useful for anything.

This would likely require OCI to more clearly specify what a "reference" is, and what its format is, and what it represents. Currently, image-spec only deals in manifests and descriptors and digests, and distribution-spec only deals in HTTP paths, and neither specifies the form of a "reference" as we normally think of it -- e.g., registry.biz/hello:image. This came up a bit during opencontainers/image-spec#822 as well.

cc @jonjohnsonjr

@sajayantony
Member

This is a nice suggestion.

  • Should we consider the digest of the manifest that triggered this, since for index and mutating tags this value might not be useful looking back at logs or other data?
  • Should we also consider recommending a unique pull/push correlation ID which corresponds to a logical operation from the client?

From a registry observability standpoint it would be nice to see that your manifest pull/push and blob operations all correspond to one logical operation.

@jonjohnsonjr
Contributor

> Should we also consider recommending a unique pull/push correlation ID which corresponds to a logical operation from the client?

I think non-standard headers like X-Correlation-ID serve this purpose already. It's unfortunate that there's not a standard header for this, otherwise we could link to an RFC or something :/
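
For illustration, a client could generate one ID per logical pull or push and attach it to every request in that operation. This is a minimal sketch: the class name is hypothetical, and X-Correlation-ID is the non-standard header mentioned above, not something the spec defines.

```python
import uuid


class Operation:
    """One logical pull or push; every request in it shares a correlation ID."""

    def __init__(self):
        self.correlation_id = str(uuid.uuid4())

    def headers(self):
        """Headers to attach to each manifest or blob request in this operation."""
        return {"X-Correlation-ID": self.correlation_id}


pull = Operation()
manifest_headers = pull.headers()
blob_headers = pull.headers()
# Both requests carry the same ID, so the registry can group them in its logs.
print(manifest_headers == blob_headers)
```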

@imjasonh
Member Author

  • Should we consider the digest of the manifest that triggered this, since for index and mutating tags this value might not be useful looking back at logs or other data?

Good question. We could include both, in separate headers, if we want. The example PR linked above sends the ref-by-tag the user requested, then the ref-by-digest of the platform-specific manifest pulled out of the index manifest (if any) when pulling blobs.

  • Should we also consider recommending a unique pull/push correlation ID which corresponds to a logical operation from the client?

I'd be okay with that too, especially if there's some prior art we can draft off of. If not, and we invent our own, I'm not sure what else we'd use besides some form of the manifest reference that's being pushed.

A client that pushes or pulls many unassociated things at once could come up with some unique string for that multi-pull operation I guess, but how common is that, and how useful is that to correlate?
