Proposal: JSON Registry API V2.1 #9015
Comments
At a first glance, looks good! Will have a proper re-read at a later stage. One thing I noticed are the proposed JSON error messages; perhaps they could be "namespaced" as well, by reversing the parts, ie
Perhaps a more "rich" approach could be taken by combining "global" error-types with the namespace / object they affect, so that it is easier to handle. (e.g. Finally; the proposal describes returning a single error-code, which can be limiting. Being able to return multiple errors could offer more flexibility. |
@thaJeztah Thanks for the suggestion about namespacing the errors. I'll play around with it. Do you have examples for the registry API use case where we'd like to see multiple errors returned? Or are you suggesting this as a measure of future-proofing? Either way, it's a good suggestion. |
Regarding the end points (first impression, will add more suggestions in a later stage); for consistency, a different approach could be taken;
This will make |
@stevvooe I think the Twitter API does this, but it's not the best example of a good API https://dev.twitter.com/overview/api/response-codes I can try to find better examples, I know I saw some when doing some research for an API I was working on. |
upload progress Wondering if a separate endpoint/request is required to check upload progress. I'll need to dig a bit deeper into this for the technical side, but I think it would be possible to have the server respond with the current upload progress while uploading? edit I wasn't thinking when adding those links, because those are pure client-side Additionally, as @wking pointed out (#9015 (comment)), an endpoint is required for resuming |
Errors
that is simple enough and a valid addition.
Tag API Layout
The current desire for this api is to continue to support the notion of a default tag for a repository, aka "latest", so we won't be able to repurpose the tag-less URI to list tags. We may want to reconsider that but it might be better not to overload that API method. Based on your response, it also seems the contents of
As far as I understand, all images have a tag but if the tag is not specified, the default tag of "latest" is used. We may need to clarify the relationship between image, tag and repository.
Upload Progress
Upload progress is served via the GET method, while uploads are using PUT, so grabbing progress concurrently should be supported with separate requests. The progress will only be reported on the server after a chunk is accepted and only at the granularity of the chunk size. Otherwise, maintaining backend consistency of upload state would be problematic. Keep in mind, the purpose of this feature is not to broadcast the upload progress to other consumers. Rather, the goal is to manage resumable uploads. There is no reason one could not use this feature with the progress monitoring capability of |
On Thu, Nov 06, 2014 at 03:39:35PM -0800, Stephen Day wrote:
If I understand correctly, this is just the (possibly compressed? The only places where you'd want to limit caching via ETags, It would be nice if there was a way to upload descriptions for the |
On Thu, Nov 06, 2014 at 05:16:34PM -0800, Stephen Day wrote:
I don't know what the Git wire protocol looks like for this, but we PUT /v2//image which would upload the image (if it wasn't already uploaded) like: PUT /v2//image/ but it would also set the default tag (to whichever tag was in the |
On Thu, Nov 06, 2014 at 08:34:24PM -0800, W. Trevor King wrote:
And if you wanted to get really radical, you could have everyone do
So folks performing an unqualified: $ docker pull debian would get library/debian:7.7 (or whatever the default tag was) without On the other hand, you'd have to have separate names if you wanted |
On Thu, Nov 06, 2014 at 05:16:34PM -0800, Stephen Day wrote:
I haven't used XMLHttpRequest's ProgressEvents myself, but looking |
Yes, you are right. I realised after posting that the examples Basically, what I wanted to link to, is an example where I'm not really sure if that's possible and the "resume" reason |
But should this be a default that the repository uses, or the client that calls the repository? if I'm correct, currently the docker client automatically requests the |
I don't like that |
Have you thought at all about how to implement quotas? What if I'm an admin and I want to limit each namespace to e.g. 1GB of unique layer content? Here's an example: my namespace is currently at 700MB and I'm pushing a new image/tag that has 400MB of unique layer content split between 2 layers (299MB and 101MB). First my client would push the 299MB layer (which keeps me under quota), then my client would attempt to push the 101MB layer (which should not be allowed because that would put me over quota). At this point, there's an orphaned 299MB layer that should be deleted. In an ideal case, the registry should never have allowed that layer to have been uploaded in the first place. Would it be possible to take the overall size of the new layers into account at the beginning of a push? |
On Fri, Nov 07, 2014 at 06:57:32AM -0800, Andy Goldstein wrote:
I agree, unless we want folks to be able to configure the default tag |
That sounds fine to me. Re [1], I'm not clear why you'd need multiple repos to support debian:6.0 and debian:7.0? And what's the motivation for immutable tags? [1] #9015 (comment) |
On Fri, Nov 07, 2014 at 08:37:19AM -0800, Andy Goldstein wrote:
You don't need immutable tags with a configurable default branch, but
Predictable results for a given tag, no need for alias fetching, easy |
Ah, I see what you're saying. But, I can definitely think of use cases where multiple mutable tags would be useful. For example, using tags to signify when an image is "QA-ready", what should be deployed to "staging" or "production", etc. I wouldn't want separate repos just to be able to have sliding tags for these different targets. /cc @smarterclayton |
On Fri, Nov 07, 2014 at 08:53:53AM -0800, Andy Goldstein wrote:
Right. Hence my 1: “I'm not sure how many docker repositories use multiple mutable You could certainly have foo/bar-QA-ready, foo/bar-staging, Still, immutable tags aren't that big a win. Folks who care about “no need for alias fetching” 2 is actually a feature of having a |
Tags
We are going to drop the notion of default "latest" from the registry API and will leave that "sugar" to the client to resolve. Concretely, if a user only specifies "stevvooe/foo", it would be up to the client to fill in "latest" as the default tag. While the rest of the discussion about tags (aliases, immutable vs mutable) is constructive and you all make excellent points, changes to the tagging scheme are outside of the scope of this proposal.
HTTP Caching
This section is indicating that the immutable nature of the layer files should be leveraged at the HTTP caching layer, allowing docker clients to make a quick determination about the existence of the layer with a 304 response. Everything that can be done will be done to ensure that HTTP standard clients (read: proxies) will cache the content. Any other caching support for tags and manifests will be implemented as needed, depending on the nature of the resource.
Quotas
This is an interesting request but it's outside of the scope for this first revision. Could you file a feature request issue in docker/docker-registry with the prefix "NG:"? |
On Fri, Nov 07, 2014 at 11:15:15AM -0800, Stephen Day wrote:
In that case I agree with @thaJeztah 1 and @ncdc 2 that we should
Fair enough ;).
Right. Are you going to use all of that for caching immutable Expires: Fri, 1 Jan 2038 03:14:07 GMT and be done with it? I don't see why you'd also want to set ETags, |
Changes:
|
Added docker-archive/docker-registry#698 for tracking the quota request. |
@ncdc Thank you for filing the issue! I don't think there is anything within this proposal that prevents quotas from being implemented. Enforcement can be within these core API methods, but a management API could be added such that the client can avoid hitting those quotas before making uploads. We'll take the full discussion to docker-archive/docker-registry#698. |
On Fri, Nov 07, 2014 at 12:00:20PM -0800, Stephen Day wrote:
You have to precompute the tarsum, so you might as well precompute Also, the spec now has: PUT PUT /v2//image/ which should just be: PUT /v2//image/ Also, I think: POST /v2//layer/ should be a PUT call, because you're pushing to the same URI you'll be “The fundamental difference between the POST and PUT requests is |
@wking Thank you again for your careful feedback! I'll make sure the typos are corrected.
This allows the client to be as lazy as possible.
From section 9.5:
POST is used here because the resulting creation is subordinate to the layer
POST is also used here because the request is not idempotent, in that I'll add the "/upload/" suffix. |
Re: searching for images by tag etc:
Thanks. The issues you reference are closed, but it isn't clear to me if the resolutions include the ability to search by tag. Does anyone have an update on this that responds e.g. to this query? http://stackoverflow.com/questions/24481564/how-can-i-find-docker-image-with-specific-tag-in-docker-registry-in-docker-comma |
Hi @nealmcb Ideas/proposals are definitely welcome over there (docker/distribution). |
@jfrazelle Should we close this as accepted? The api spec lives in distribution now: https://github.com/docker/distribution/blob/master/doc/spec/api.md. Should we backport this into the docker core docs? |
yes yayyyyy! |
The link was broken; new link is https://github.com/docker/distribution/blob/master/docs/spec/api.md Or (future proof); https://github.com/docker/distribution/blob/636a19b2126ffe78d209eb6a7aedef857abd2539/docs/spec/api.md |
I could not find an official (not even an unofficial;) JSON Schema for the new v2 Registry API. |
@grexe could you open a feature request for that in the https://github.com/docker/distribution issue tracker? I know the specs are actually generated from code, perhaps there's even something there already. If in doubt, you can ask in the #docker-dev or #docker-distribution IRC channels |
wow @thaJeztah that was really fast! Did so, see distribution/distribution#1060. Now let's hope the implementation is also as fast,-) |
@grexe : documentation and specs for the v2 registry live here: https://github.com/docker/distribution/tree/master/docs/spec There is a code generator but it is written in go. |
thanks @RichardScothern but there is still no reference to a JSON schema, only canonicalization (which is not so important to me, personally,-). This would allow me to create separate repositories (e.g. per customer/realm/...) in my private registry from code, without having to shut down the entire registry and alter configuration by hand just to add a new repository... |
You can create repositories by uploading an image and its layers using the REST API |
I realized about a year ago that "JSON Schema" is actually a draft-standard for specifying the structure of JSON objects/types used by your API and is not just examples of JSON forms/responses. https://en.wikipedia.org/wiki/JSON#JSON_Schema |
Thanks again @RichardScothern it was not obvious to me that a new repository can be created just by specifying a nonexistent name in the PUSH URI, but it's really there: completed upload specifically says that
Seems to be exactly what I need, perfect, thanks! |
exactly @jlhawn, just stumbled over another snag where a Boolean was not correctly identified by a mapper because my sample output was not sufficient (String vs. Boolean (even vs.Integer)) was not possible to distinguish from the output). |
@jlhawn @grexe @RichardScothern docker compose recently added a schema for validation (docker/compose#2089). Plans are to use the same schema in libcompose docker/libcompose#34. Just linking these to prevent duplicated work / research :-) |
Please don't comment on closed tickets. |
As a baseline for the new registry API specification, we are checking in the proposal as currently covered in moby/moby#9015. This will allow us to trace the process of transforming the proposal into a specification. The goal is to use api descriptors to generate templated documentation into SPEC.md. The resulting product will be submitted into docker core as part of the client PR.
👆 reported account for abuse (spam activity) Let me lock the conversation on this ticket |
Proposal: JSON Registry API V2.1
Abstract
The docker registry is a service to manage information about docker images and enable their distribution. While the current registry is usable, there are several problems with the architecture that have led to this proposal. For relevant details, please see the following issues:
The main driver of this proposal is the set of changes to the docker image format, covered in #8093. The new, self-contained image manifest simplifies the image definition and the underlying backend layout. To reduce bandwidth usage, the new registry will be architected to avoid uploading existing layers and will support resumable layer uploads.
While out of scope for this specification, the URI layout of the new API will be structured to support a rich Authentication and Authorization model by leveraging namespaces.
Furthermore, to bring docker registry in line with docker core, the registry is written in Go.
Scope
This proposal covers the URL layout and protocols of the Docker Registry V2 JSON API. This will affect the docker core registry API and the rewrite of docker-registry.
This includes the following features:
While authentication and authorization support will influence this specification, details of the protocol will be left to a future specification. Other features marked as next generation will be incorporated when the initial support is complete. Please see the road map for details.
Use Cases
For the most part, the use cases of the former registry API apply to the new version. Differentiating use cases are covered below.
Resumable Push
Company X's build servers lose connectivity to docker registry before completing an image layer transfer. After connectivity returns, the build server attempts to re-upload the image. The registry notifies the build server that the upload has already been partially attempted. The build server responds by only sending the remaining data to complete the image file.
Resumable Pull
Company X is having more connectivity problems but this time in their deployment datacenter. When downloading an image, the connection is interrupted before completion. The client keeps the partial data and uses HTTP `Range` requests to avoid downloading repeated data.
Layer Upload De-duplication
Company Y's build system creates two identical docker layers from build processes A and B. Build process A completes uploading the layer before B. When process B attempts to upload the layer, the registry indicates that it's not necessary because the layer is already known.
If process A and B upload the same layer at the same time, both operations will proceed and the first to complete will be stored in the registry (Note: we may modify this to prevent dogpile with some locking mechanism).
Access Control
Company X would like to control which developers can push to which repositories. By leveraging the URI format of the V2 registry, they can control who is able to access which repository, who can pull images and who can push layers.
Dependencies
Initially, a V2 client will be developed in conjunction with the new registry service to facilitate rich testing and verification. Once this is ready, the new client will be used in docker to communicate with V2 registries.
Proposal
This section covers proposed client flows and details of the proposed API endpoints. All endpoints will be prefixed by the API version and the repository name, in the form `/v2/<name>/`.
For example, for an API endpoint that works with the `library/ubuntu` repository, the URI prefix will be `/v2/library/ubuntu/`.
This scheme will provide rich access control over various operations and methods using the URI prefix and HTTP methods that can be controlled in a variety of ways.
Classically, repository names have always been two path components where each path component is less than 30 characters. The V2 registry API does not enforce this. The rules for a repository name are as follows:
`[a-z0-9]+(?:[._-][a-z0-9]+)*` and the matched result must be 2 or more characters in length.
These name requirements only apply to the registry API and should accept a superset of what is supported by other docker community components.
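As a rough illustration of the naming rule, the following sketch checks a name against the quoted regular expression and the two-character minimum. Applying the pattern to each slash-separated path component is an assumption of this sketch, and the helper names `is_valid_component` and `is_valid_name` are hypothetical.

```python
import re

# Pattern quoted from the proposal. Applying it per path component is an
# assumption of this sketch, not something the surviving text states.
COMPONENT_RE = re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)*")


def is_valid_component(component):
    """True if the whole component matches and is at least 2 characters."""
    return len(component) >= 2 and COMPONENT_RE.fullmatch(component) is not None


def is_valid_name(name):
    """Check every slash-separated component of a repository name."""
    return all(is_valid_component(part) for part in name.split("/"))
```

For example, `is_valid_name("library/ubuntu")` passes, while a name with uppercase letters or a one-character component does not.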
API Methods
A detailed list of methods and URIs are covered in the table below:
/v2/
/v2/<name>/tags/list: [...] `name`.
/v2/<name>/manifests/<tag>: [...] `name` and `tag`.
/v2/<name>/manifests/<tag>: [...] `name` and `tag`.
/v2/<name>/manifests/<tag>: [...] `name` and `tag`.
/v2/<name>/blobs/<digest>: [...] `digest`.
/v2/<name>/blobs/<digest>
/v2/<name>/blobs/uploads/: [...] `digest` parameter is present, the request body will be used to complete the upload in a single request.
/v2/<name>/blobs/uploads/<uuid>: [...] `uuid`. The primary purpose of this endpoint is to resolve the current status of a resumable upload.
/v2/<name>/blobs/uploads/<uuid>: [...] `uuid`. This is identical to the GET request.
/v2/<name>/blobs/uploads/<uuid>
/v2/<name>/blobs/uploads/<uuid>: [...] `uuid`, optionally appending the body as the final chunk.
/v2/<name>/blobs/uploads/<uuid>
All endpoints should support aggressive http caching, compression and range headers, where appropriate. Details of each method are covered in the following sections.
The new API will attempt to leverage HTTP semantics where possible but may break from standards to implement targeted features.
Errors
Actionable failure conditions, covered in detail in their relevant sections, will be reported as part of 4xx responses, in a json response body. One or more errors will be returned in the following format:
The `code` field will be a unique identifier, all caps with underscores by convention. The `message` field will be a human readable string. The optional `detail` field may contain arbitrary json data providing information the client can use to resolve the issue.
The error codes encountered via the API are enumerated in the following table:
UNKNOWN
DIGEST_INVALID
SIZE_INVALID
NAME_INVALID
TAG_INVALID
NAME_UNKNOWN
MANIFEST_UNKNOWN
MANIFEST_INVALID
MANIFEST_UNVERIFIED
BLOB_UNKNOWN
BLOB_UPLOAD_UNKNOWN
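A client might decode an error body along the following lines, mapping any code it does not recognize to `UNKNOWN`. This is only a sketch: the top-level `errors` key is an assumption (the text names only the `code`, `message` and `detail` fields), and `parse_errors` is a hypothetical helper.

```python
import json

# Error codes enumerated in the proposal.
KNOWN_CODES = {
    "UNKNOWN", "DIGEST_INVALID", "SIZE_INVALID", "NAME_INVALID",
    "TAG_INVALID", "NAME_UNKNOWN", "MANIFEST_UNKNOWN", "MANIFEST_INVALID",
    "MANIFEST_UNVERIFIED", "BLOB_UNKNOWN", "BLOB_UPLOAD_UNKNOWN",
}


def parse_errors(body):
    """Parse a JSON error body into a list of normalized error dicts."""
    payload = json.loads(body)
    parsed = []
    # Assumption: errors are wrapped in a top-level "errors" list.
    for err in payload.get("errors", []):
        code = err.get("code", "UNKNOWN")
        if code not in KNOWN_CODES:
            # Codes added by newer registries are treated as UNKNOWN.
            code = "UNKNOWN"
        parsed.append({
            "code": code,
            "message": err.get("message", ""),
            "detail": err.get("detail"),  # optional, arbitrary JSON
        })
    return parsed
```

Normalizing at the parsing boundary keeps the rest of the client independent of which codes a particular registry version emits.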
While the client can take action on certain error codes, the registry may add new error codes over time. All client implementations should treat unknown error codes as `UNKNOWN`, allowing future error codes to be added without breaking API compatibility. For the purposes of the specification, error codes will only be added and never removed.
API Version Check
A minimal endpoint, mounted at `/v2/`, will provide version support information based on its response statuses. The request format is as follows:
If a `200 OK` response is returned, the registry implements the V2(.1) registry API and the client may proceed safely with other V2 operations. Optionally, the response may contain information about the supported paths in the response body. The client should be prepared to ignore this data.
If a `401 Unauthorized` response is returned, the client should take action based on the contents of the "WWW-Authenticate" header and try the endpoint again. Depending on access control setup, the client may still have to authenticate against different resources, even if this check succeeds.
If a `404 Not Found` response status, or other unexpected status, is returned, the client should proceed with the assumption that the registry does not implement V2 of the API.
Pulling An Image
An "image" is a combination of a JSON manifest and individual layer files. The process of pulling an image centers around retrieving these two components.
The first step in pulling an image is to retrieve the manifest. For reference, the relevant manifest fields for the registry are the following:
For more information about the manifest format, please see docker/docker#8093.
When the manifest is in hand, the client must verify the signature to ensure the names and layers are valid. Once confirmed, the client will then use the tarsums to download the individual layers. Layers are stored as blobs in the V2 registry API, keyed by their tarsum digest.
The API details follow.
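The order of requests in a pull can be sketched as below: the manifest path first, then one blob path per layer. The path templates follow the endpoint list in this proposal; the helper names and the idea of passing the layer digests in as a plain list are assumptions made for illustration, since the manifest layout itself is defined in #8093.

```python
def manifest_path(name, tag):
    """Path of the manifest for a repository name and tag."""
    return "/v2/{}/manifests/{}".format(name, tag)


def blob_path(name, digest):
    """Path of a layer blob, keyed by its digest."""
    return "/v2/{}/blobs/{}".format(name, digest)


def pull_paths(name, tag, layer_digests):
    """Request paths for a pull: the manifest first, then each layer.

    layer_digests is assumed to have been read from a manifest whose
    signature the caller has already verified.
    """
    return [manifest_path(name, tag)] + [
        blob_path(name, digest) for digest in layer_digests
    ]
```

A real client would fetch the manifest, verify it, and only then derive the layer list, rather than computing all paths up front.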
Pulling an Image Manifest
The image manifest can be fetched with the following url:
The "name" and "tag" parameters identify the image and are required.
A `404 Not Found` response will be returned if the image is unknown to the registry. If the image exists and the response is successful, the image manifest will be returned, with the following format (see #8093 for details):
The client should verify the returned manifest signature for authenticity before fetching layers.
Pulling a Layer
Layers are stored in the blob portion of the registry, keyed by tarsum digest. Pulling a layer is carried out by a standard http request. The URL is as follows:
Access to a layer will be gated by the `name` of the repository but is identified uniquely in the registry by `tarsum`. The `tarsum` parameter is an opaque field, to be interpreted by the tarsum library.
This endpoint may issue a 307 (302 for <HTTP 1.1) redirect to another service for downloading the layer and clients should be prepared to handle redirects.
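For the Resumable Pull use case, a client that has kept partial layer data could ask only for the remainder with a standard HTTP `Range` request header. The sketch below builds that header; `resume_headers` is a hypothetical helper, and it assumes the client knows exactly how many bytes it already holds.

```python
def resume_headers(bytes_already_downloaded):
    """Build request headers to resume a layer download.

    With N bytes on disk (offsets 0..N-1), the next byte needed is at
    offset N, so the open-ended range "bytes=N-" requests the rest.
    """
    if bytes_already_downloaded < 0:
        raise ValueError("byte count must be non-negative")
    if bytes_already_downloaded == 0:
        return {}  # nothing downloaded yet, fetch the whole layer
    return {"Range": "bytes={}-".format(bytes_already_downloaded)}
```

If the download was redirected, the same header would be sent to the redirect target.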
This endpoint should support aggressive HTTP caching for image layers. Support for Etags, modification dates and other cache control headers should be included. To allow for incremental downloads, `Range` requests should be supported, as well.
Pushing An Image
Pushing an image works in the opposite order as a pull. After assembling the image manifest, the client must first push the individual layers. When the layers are fully pushed into the registry, the client should upload the signed manifest.
The details of each step of the process are covered in the following sections.
Pushing a Layer
All layer uploads use two steps to manage the upload process. The first step starts the upload in the registry service, returning a url to carry out the second step. The second step uses the upload url to transfer the actual data. Uploads are started with a POST request which returns a url that can be used to push data and check upload status.
The `Location` header will be used to communicate the upload location after each request. While it won't change in this specification, clients should use the most recent value returned by the API.
Starting An Upload
To begin the process, a POST request should be issued in the following format:
The parameters of this request are the image namespace under which the layer will be linked. Responses to this request are covered below.
Existing Layers
The existence of a layer can be checked via a `HEAD` request to the blob store API. The request should be formatted as follows:
If the layer with the tarsum specified in `digest` is available, a 200 OK response will be received, with no actual body content (this is according to http specification). The response will look as follows:
When this response is received, the client can assume that the layer is already available in the registry under the given name and should take no further action to upload the layer. Note that the binary digests may differ for the existing registry layer, but the tarsums will be guaranteed to match.
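The existence check can drive the decision to skip an upload, as in the sketch below. The `head` argument is a hypothetical caller-supplied function that performs the `HEAD` request and returns the status code, which keeps the example independent of any HTTP library. Treating `404` as "not present" is an assumption here; the text above only states the `200 OK` case.

```python
def should_upload_layer(name, digest, head):
    """Decide whether a layer still needs to be uploaded.

    head(path) must issue a HEAD request for the path and return the
    HTTP status code as an int (hypothetical, caller-supplied).
    """
    status = head("/v2/{}/blobs/{}".format(name, digest))
    if status == 200:
        return False  # layer already known under this name, skip upload
    if status == 404:
        return True   # assumption: 404 means the layer is not present
    raise RuntimeError("unexpected status {} for HEAD".format(status))
```

Any other status is raised to the caller instead of being guessed at, since the proposal does not define it for this request.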
Uploading the Layer
If the POST request is successful, a `202 Accepted` response will be returned with the upload URL in the `Location` header:
The rest of the upload process can be carried out with the returned url, called the "Upload URL" from the `Location` header. All responses to the upload url, whether sending data or getting status, will be in this format. Though the URI format (`/v2/<name>/blobs/uploads/<uuid>`) for the `Location` header is specified, clients should treat it as an opaque url and should never try to assemble it. While the `uuid` parameter may be an actual UUID, this proposal imposes no constraints on the format and clients should never impose any.
Upload Progress
The progress and chunk coordination of the upload process will be coordinated through the `Range` header. While this is a non-standard use of the `Range` header, there are examples of similar approaches in APIs with heavy use. For an upload that has just started, taking a 1000 byte layer file as an example, the `Range` header would be as follows:
To get the status of an upload, issue a GET request to the upload URL:
The response will be similar to the above, except that it will return a 204 status:
Note that the HTTP `Range` header byte ranges are inclusive and that will be honored, even in non-standard use cases.
Monolithic Upload
A monolithic upload is simply a chunked upload with a single chunk and may be favored by clients that would like to avoid the complexity of chunking. To carry out a "monolithic" upload, one can simply put the entire content blob to the provided URL:
The "digest" parameter must be included with the PUT request. Please see the Completed Upload section for details on the parameters and expected responses.
Additionally, the upload can be completed with a single `POST` request to the uploads endpoint, including the "size" and "digest" parameters:
On the registry service, this should allocate an upload, accept and verify the data and return the same response as the final chunk of an upload. If the POST request fails collecting the data in any way, the registry should attempt to return an error response to the client with the `Location` header providing a place to continue the upload.
The single `POST` method is provided for convenience and most clients should implement `POST` + `PUT` to support reliable resume of uploads.
Chunked Upload
To carry out an upload of a chunk, the client can specify a range header and only include that part of the layer file:
There is no enforcement on layer chunk splits other than that the server must receive them in order. The server may enforce a minimum chunk size. If the server cannot accept the chunk, a `416 Requested Range Not Satisfiable` response will be returned and will include a `Range` header indicating the current status:
If this response is received, the client should resume from the "last valid range" and upload the subsequent chunk. A 416 will be returned under the following conditions:
When a chunk is accepted as part of the upload, a `202 Accepted` response will be returned, including a `Range` header with the current upload status:
Completed Upload
For an upload to be considered complete, the client must submit a `PUT` request on the upload endpoint with a digest parameter. If it is not provided, the upload will not be considered complete. The format for the final chunk will be as follows:
Optionally, if all chunks have already been uploaded, a `PUT` request with a `digest` parameter and zero-length body may be sent to complete and validate the upload. Multiple "digest" parameters may be provided with different digests. The server may verify none or all of them but must notify the client if the content is rejected.
When the last chunk is received and the layer has been validated, the client will receive a `201 Created` response:
The `Location` header will contain the registry URL to access the accepted layer file.
Digest Parameter
The "digest" parameter is designed as an opaque parameter to support verification of a successful transfer. The initial version of the registry API will support a tarsum digest, in the standard tarsum format. For example, an HTTP URI parameter might be as follows:
Given this parameter, the registry will verify that the provided content does result in this tarsum. Optionally, the registry can support other digest parameters for non-tarfile content stored as a layer. A regular hash digest might be specified as follows:
Such a parameter would be used to verify the binary content (as opposed to the tar content) at the end of the upload process.
For the initial version, registry servers are only required to support the tarsum format.
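To make the regular hash digest case concrete, the sketch below computes a SHA-256 digest over raw bytes and appends it to an upload URL as a `digest` URI parameter. The `<algorithm>:<hex>` string layout and the helper names are assumptions of this sketch; computing a tarsum is a separate algorithm and is not shown. The upload URL is treated as opaque and is only extended with a query parameter.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sha256_digest(content):
    """Digest string for binary content, assumed to be '<algorithm>:<hex>'."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def with_digest(upload_url, digest):
    """Return upload_url with a 'digest' query parameter appended.

    Existing query parameters are preserved; the rest of the URL is
    left exactly as the registry returned it.
    """
    parts = urlsplit(upload_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("digest", digest))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Note that `urlencode` percent-encodes the colon in the digest, so `sha256:ab` is sent as `sha256%3Aab`.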
Canceling an Upload
An upload can be cancelled by issuing a DELETE request to the upload endpoint. The format will be as follows:
After this request is issued, the upload uuid will no longer be valid and the registry server will dump all intermediate data. While uploads will time out if not completed, clients should issue this request if they encounter a fatal error but still have the ability to issue an http request.
Errors
If a 502, 503 or 504 error is received, the client should assume that the upload can proceed due to a temporary condition, honoring the appropriate retry mechanism. Other 5xx errors should be treated as terminal.
If there is a problem with the upload, a 4xx error will be returned indicating the problem. After receiving a 4xx response (except 416, as called out above), the upload will be considered failed and the client should take appropriate action.
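The status handling rules above can be summarized in a small decision function. This is a sketch of one reading of those rules; the function name and the returned labels are hypothetical.

```python
def classify_upload_response(status):
    """Map an HTTP status from an upload request to a client action."""
    if status in (502, 503, 504):
        return "retry"     # temporary condition, retry with backoff
    if 500 <= status <= 599:
        return "terminal"  # other 5xx errors are terminal
    if status == 416:
        return "resume"    # resume from the last valid range
    if 400 <= status <= 499:
        return "failed"    # upload is considered failed
    return "ok"
```

Checking the retryable 5xx codes and the 416 exception before the general ranges keeps the special cases from being swallowed by the broader checks.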
The following table covers the various error conditions that may be returned after completing a layer upload:
Note that the upload url will not be available forever. If the upload uuid is unknown to the registry, a `404 Not Found` response will be returned and the client must restart the upload process.
Pushing an Image Manifest
Once all of the layers for an image are uploaded, the client can upload the image manifest. An image can be pushed using the following request formats:
The `name` and `tag` fields of the response body must match those specified in the URL.
If there is a problem with pushing the manifest, a relevant 4xx response will be returned with a JSON error message. The following table covers the various error conditions and their corresponding codes:
For the `UNKNOWN_LAYER` error, the `detail` field of the error response will have an "unknown" field with information about the missing layer. For now, that will just be the tarsum. There will be an error returned for each unknown blob. The response format will be as follows:
Listing Image Tags
It may be necessary to list all of the tags under a given repository. The tags for an image repository can be retrieved with the following request:
The response will be in the following format:
For repositories with a large number of tags, this response may be quite large, so care should be taken by the client when parsing the response to reduce copying.
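A client-side view of the tag listing might look like the sketch below. The response shape used here, an object with a `name` string and a `tags` list, is an assumption of this example, and `tags_path` and `parse_tags` are hypothetical helpers.

```python
import json


def tags_path(name):
    """Path of the tag listing for a repository."""
    return "/v2/{}/tags/list".format(name)


def parse_tags(body, expected_name):
    """Return the tag list from a response body.

    Assumes a JSON object with 'name' and 'tags' keys. The name is
    checked so a response for another repository is not silently used.
    """
    payload = json.loads(body)
    if payload.get("name") != expected_name:
        raise ValueError("response is for a different repository")
    # Return the decoded list directly instead of copying it again.
    return payload.get("tags", [])
```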
Deleting an Image
An image may be deleted from the registry via its `name` and `tag`. A delete may be issued with the following request format:
If the image exists and has been successfully deleted, the following response will be issued:
If the image had already been deleted or did not exist, a `404 Not Found` response will be issued instead.
Roadmap
Reviewers