New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add registry proxying section #66
base: main
Are you sure you want to change the base?
Conversation
42589cd
to
e5df360
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments & questions.
spec.md
Outdated
@@ -901,6 +902,20 @@ If the image had already been deleted or did not exist, a `404 Not Found` respon | |||
|
|||
> for more details, see: [compatibility.md](https://github.com/docker/distribution/blob/master/docs/spec/manifest-v2-2.md#backward-compatibility) | |||
|
|||
### Registry Mirroring | |||
|
|||
A registry MAY operate as a mirror for an upstream registry to support pull-through caching or proxying of fetch operations (such as fetching tags, manifest, or blobs). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/s/manifest/manifests/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first time for the term fetch operations in this document
Maybe http GET requests/operations instead? If agree need to also change below.. otherwise maybe introduce the term earlier on in the spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec.md
Outdated
|
||
A client SHOULD be aware of whether a registry host being used is a mirror. | ||
A client SHOULD avoid sending `ns` query parameters to non-mirrored registries. | ||
A client SHOULD NOT send authorization credentials for an upstream registry to a mirrored registry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/s/mirrored/mirror/
Why do clients need to know anything about pull-through caching if its implemented server-side? |
e5df360
to
29fdf25
Compare
The clients should know how the registry host was resolved from a given image reference. The clients don't care how the server is implemented, but they SHOULD provide information to the server which indicates what the reference being asked for is. Just as when an HTTP client connects with a PROXY server it must communicate what the upstream server is, the same is true here. Today the protocol doesn't define anyway to communicate what the upstream is and proxies end up be hardcoded to a single upstream. In a few cases you can see proxies use custom domains per upstream and require users to change the name of their images in order to use them. |
Right... isn't that the point? If I encode that "myregistry/mynamespace/myrepo" goes to "upstream/foo/bar", that's a detail for the maintainer of If the goal is to allow the client to specify "upstream/foo/bar", then I'd that the target is not really a repository anymore, but simply a working proxy, and thus, a different protocol parameter might be useful, but registries should therefore have the option to not support said parameter. |
That is one use case that will still work. In the example you mentioned, when a repository is proxied in that fashion, the puller often does know of this detail as they must explicitly provide myregistry with the intent of getting some upstream content. The use case where myregistry is some sort of blessed version of upstream is reasonable, but not the intent of the namespace parameter here.
This is the use case here and proxy may be better terminology here, but that is really a detail of the registry. The registry may act as a proxy, proxy-cache, or active mirror, that is out of scope for definition here. This parameter just enables all of those features to work across multiple namespaces. For example if you want public images from both |
Or configure two repositories, one for each? (especially since combining them could lead to merge conflicts). I'm concerned we're adding quite a bit of complexity to address a use case that has simpler solutions when configured on the registry side. |
Can you elaborate here? Configuring a repository for each mirror is non-trivial. Configuring a domain for each upstream and routing to the upstream based on the domain is not easier, that would still requires the same routing on the server side that an implementation of this would require. The client side implementation to support per-registry configuration is not simple and inherently requires catch -all conditions when trying to enforce proxying through a gateway. I did do a client side implementation of this to demonstrate the feature and allow server side implementations a client to test against. On the client side, it is not complex at all since clients should already know how to handle 404s when multiple registry endpoints are configured. On the server side, the complexity to support this isn't much more than existing proxy-cache support. |
Its non-trivial, but its not that difficult either :) My concern remains around complexity: the document as outlined, for example, says that the If we feel that pass-through proxying of other registries is, in and of itself, a feature of the protocol (rather than something configured on the registry side), then I suspect we need to give significantly more thought to the end-to-end user experience. For example, I could imagine some paths supporting proxying and others not. |
The clients have the most context and really does not need to be defined here, only that a client SHOULD make that distinction to avoid sending unnecessary redundant information. The clients themselves have both the configuration and endpoint resolution logic, so it has multiple options for determining this. In the implementation I sent I just simply did this by checking whether the endpoint was configured without push support, as this could indicate the registry being communicated with may not be the upstream source. However, I will probably add a check there for
No, the registry can simply ignore it. This is like asking a registry today which was configured to mirror docker.io to return an error if the client actually meant quay.io, the registry just isn't expected to have the same amount of context as a client in regards to the intent of the entire pull process. If the registry chooses to be handle the
They aren't expected to check for it, but rather be explicitly configured for it. A client will know if it is configured to always use a specific mirror or a mirror for multiple namespaces. I think what you are suggesting here though is the idea of registry discovery. That is a much larger topic that I would still love to see happen, in that feature a client could start with zero knowledge (except of course the domain quay.io, docker.io, etc) and discover registry capabilities and endpoints. |
discussion ensues on the call today. |
pinging @thomasmckay and @kurtismullins who are implementing mirroring on Quay -- they probably have feedback and want to track this thread |
is `ns` already used, so best to continue with that mnemonic? or could
it be something to not collide with the outdated concept that images
would on be named "transport/namespace/name:tag"?
|
@vbatts I use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I think this is a great addition. It might be a good idea to add a few combinatoric examples of how ns and the the repo name are combined to calculate the upstream and local mirroring location.
@dmcgowan one thing i'm unclear on here is: can i have a single registry mirror that will be usable for more than one remote registry (i.e. remote of |
Notes from today's call:
Please lets find a way to classify this language (whether client or server side). So we can close out or merge this |
Reiterating our convo from the meeting: I actually see a lot of value in adding this query parameter, but removing any connotation that it is the blessed solution for repository mirroring. I think that by including this value, a proxy could implement lots of different behavior for the client that need not be directly related to repository mirroring. |
I note that harbor uses the term "replication" rather than proxy-cache or mirroring, which I quite like https://goharbor.io/docs/1.10/administration/configuring-replication/ |
658a97c
to
1e3a9a4
Compare
Updated section to remove use of the term "mirror". A mirror is just a special case of a proxy or could be a completely different configuration. For example, the most common "mirroring" case which is used by Docker is actually a pull-through cache. Using the more generic terminology here instead of continuing the confusion :) |
1e3a9a4
to
7050de8
Compare
Define repository namespace query parameter for proxying. Signed-off-by: Derek McGowan <derek@mcg.dev>
7050de8
to
291f0e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
apart from a curious question
| `Host` | header | Standard HTTP Host Header. SHOULD be set to the registry host. | | ||
| `Authorization` | header | An RFC7235 compliant authorization header. | | ||
| `name` | path | Name of the target repository. | | ||
| `ns` | query | (OPTIONAL) Namespace of repository. SHOULD be set to source host. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we clarify that if ns
is not provided, there is a default assumption for the source host? or is that implied in the configuration of registry-proxy?
|
||
A client SHOULD be aware of whether a registry host is being used is a proxy, such as when the `ns` query parameter differs from the `Host` header. | ||
A client SHOULD avoid sending `ns` query parameters to non-proxy registries. | ||
A client SHOULD NOT unintentionally send authorization credentials for an upstream registry to a proxy registry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop "unintentionally", replace with "... except when explicitly indicated (e.g. via configuration)"
Security sensitive requirements being labeled as SHOULD / SHOULD NOT, SHOULD also give you pause. Can this be upgraded to a MUST?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative wording:
Clients SHOULD NOT assume that proxy registries can handle authentication credentials issued by an upstream registry. Authentication credentials MUST NOT be sent to different hostnames except when prior knowledge or a configuration signal of authentication support is present.
big time rebase needed @dmcgowan |
| `Host` | header | Standard HTTP Host Header. SHOULD be set to the registry host. | | ||
| `Authorization` | header | An RFC7235 compliant authorization header. | | ||
| `name` | path | Name of the target repository. | | ||
| Name | Kind | Description | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we lost these tables in the spec.md rewrite
An example of delegating functionality is proxying pull operations to another registry. | ||
An example of adding functionality is implementing a pull-through cache of pulls to another registry. | ||
When operating as a proxy, the `Host` header passed to the registry will be the host of the PROXY and NOT the host in the repository name used by the client. | ||
A `ns` query parameter on pull operations is OPTIONAL, but when used specifies the host in a repository name used by a client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the key piece - we should document at least this query param
Pushing this out to 1.2 while we figure out how to properly word this |
Hi team, just checking to see when this is likely to be merged in? Thanks |
@dcarrion87 it's going to be a while. Effort is still continuing on the 1.1 release. After that, we'll revisit this. |
I don't mind rebasing this after 1.1 to target 1.2. It really is a pretty light change, it has been included by |
| `Range` | header | HTTP Range header specifying blob chunk. | | ||
| `name` | path | Name of the target repository. | | ||
| `digest` | path | Digest of desired blob. | | ||
| `ns` | query | (OPTIONAL) Namespace of repository. SHOULD be set to source host. | | ||
|
||
###### On Success: Partial Content | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quickly please!
I just wanted to check in if this proposed change has just stalled or is abandoned? I know that these things take a long time, so it can at times be hard to tell. I have been working in depth with mirroring OCI artifacts over the last year now. The addition of an I just wanted to add my input that I think that adding this to the spec would be very useful and could potentially help pain points that exist today when mirroring. If there is any additional work required before merging this, I would be happy to help. |
Define repository namespace query parameter for proxying.
Closes #12
Giving time for registry operators to weigh in
Maintainer approval