Proposal: JSON Registry API V2.1 #9015

stevvooe · 2014-11-06T23:39:27Z

Proposal: JSON Registry API V2.1

Abstract

The docker registry is a service to manage information about docker images and enable their distribution. While the current registry is usable, there are several problems with the architecture that have led to this proposal. For relevant details, please see the following issues:

The main drivers of this proposal are changes to the docker image format, covered in #8093. The new, self-contained image manifest simplifies the image definition and the underlying backend layout. To reduce bandwidth usage, the new registry will be architected to avoid uploading existing layers and will support resumable layer uploads.

While out of scope for this specification, the URI layout of the new API will be structured to support a rich Authentication and Authorization model by leveraging namespaces.

Furthermore, to bring the docker registry in line with docker core, the registry is written in Go.

Scope

This proposal covers the URL layout and protocols of the Docker Registry V2 JSON API. This will affect the docker core registry API and the rewrite of docker-registry.

This includes the following features:

Namespace-oriented URI Layout
PUSH/PULL registry server for V2 image manifest format
Resumable layer PUSH support
V2 Client library implementation

While authentication and authorization support will influence this specification, details of the protocol will be left to a future specification. Other features marked as next generation will be incorporated when the initial support is complete. Please see the road map for details.

Use Cases

For the most part, the use cases of the former registry API apply to the new version. Differentiating use cases are covered below.

Resumable Push

Company X's build servers lose connectivity to the docker registry before completing an image layer transfer. After connectivity returns, the build server attempts to re-upload the image. The registry notifies the build server that the upload has already been partially attempted. The build server responds by only sending the remaining data to complete the image file.

Resumable Pull

Company X is having more connectivity problems, but this time in their deployment datacenter. When downloading an image, the connection is interrupted before completion. The client keeps the partial data and uses HTTP Range requests to avoid downloading repeated data.

Layer Upload De-duplication

Company Y's build system creates two identical docker layers from build processes A and B. Build process A completes uploading the layer before B. When process B attempts to upload the layer, the registry indicates that it's not necessary because the layer is already known.

If process A and B upload the same layer at the same time, both operations will proceed and the first to complete will be stored in the registry (Note: we may modify this to prevent dogpile with some locking mechanism).

Access Control

Company X would like to control which developers can push to which repositories. By leveraging the URI format of the V2 registry, they can control who is able to access which repository, who can pull images and who can push layers.

Dependencies

Initially, a V2 client will be developed in conjunction with the new registry service to facilitate rich testing and verification. Once this is ready, the new client will be used in docker to communicate with V2 registries.

Proposal

This section covers proposed client flows and details of the proposed API endpoints. All endpoints will be prefixed by the API version and the repository name:

/v2/<name>/

For example, for an API endpoint that works with the library/ubuntu repository, the URI prefix will be:

/v2/library/ubuntu/

This scheme will provide rich access control over various operations and methods, using the URI prefix and HTTP methods, which can be controlled in a variety of ways.

Classically, repository names have always been two path components where each path component is less than 30 characters. The V2 registry API does not enforce this. The rules for a repository name are as follows:

A repository name is broken up into path components. A component of a repository name must be at least two characters, optionally separated by periods, dashes or underscores. More strictly, it must match the regular expression [a-z0-9]+(?:[._-][a-z0-9]+)* and the matched result must be 2 or more characters in length.
The name of a repository must have at least two path components, separated by a forward slash.
The total length of a repository name, including slashes, must be less than 256 characters.

These name requirements only apply to the registry API and should accept a superset of what is supported by other docker community components.
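The three naming rules can be checked mechanically. The Go sketch below is illustrative only: `validRepositoryName` is a hypothetical helper, and it assumes the regular expression is meant to match an entire path component (the proposal does not say whether it is anchored) and reads "less than 256 characters" as a strict bound.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// componentRE is the component pattern from the proposal, anchored here with
// ^ and $ (an assumption) so it must match the whole component.
var componentRE = regexp.MustCompile(`^[a-z0-9]+(?:[._-][a-z0-9]+)*$`)

// validRepositoryName applies the proposal's three rules: every path
// component matches componentRE and is at least 2 characters long, there
// are at least two components, and the full name is under 256 characters.
func validRepositoryName(name string) bool {
	if len(name) >= 256 {
		return false
	}
	components := strings.Split(name, "/")
	if len(components) < 2 {
		return false
	}
	for _, c := range components {
		if len(c) < 2 || !componentRE.MatchString(c) {
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"library/ubuntu", "stevvooe/hot-new-thing", "ubuntu", "a/b"} {
		fmt.Printf("%-24s %v\n", name, validRepositoryName(name))
	}
}
```

Splitting on "/" also rejects empty components such as "library//ubuntu", since an empty string fails the two-character minimum.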

API Methods

A detailed list of methods and URIs is covered in the table below:

Method	Path	Entity	Description
GET	`/v2/`	Check	Check that the endpoint implements Docker Registry API V2.
GET	`/v2/<name>/tags/list`	Tags	Fetch the tags under the repository identified by `name`.
GET	`/v2/<name>/manifests/<tag>`	Manifest	Fetch the manifest identified by `name` and `tag`.
PUT	`/v2/<name>/manifests/<tag>`	Manifest	Put the manifest identified by `name` and `tag`.
DELETE	`/v2/<name>/manifests/<tag>`	Manifest	Delete the manifest identified by `name` and `tag`.
GET	`/v2/<name>/blobs/<digest>`	Blob	Retrieve the blob from the registry identified by `digest`.
HEAD	`/v2/<name>/blobs/<digest>`	Blob	Check if the blob is known to the registry.
POST	`/v2/<name>/blobs/uploads/`	Blob Upload	Initiate a resumable blob upload. If successful, an upload location will be provided to complete the upload. Optionally, if the `digest` parameter is present, the request body will be used to complete the upload in a single request.
GET	`/v2/<name>/blobs/uploads/<uuid>`	Blob Upload	Retrieve status of upload identified by `uuid`. The primary purpose of this endpoint is to resolve the current status of a resumable upload.
HEAD	`/v2/<name>/blobs/uploads/<uuid>`	Blob Upload	Retrieve status of upload identified by `uuid`. This is identical to the GET request.
PATCH	`/v2/<name>/blobs/uploads/<uuid>`	Blob Upload	Upload a chunk of data for the specified upload.
PUT	`/v2/<name>/blobs/uploads/<uuid>`	Blob Upload	Complete the upload specified by `uuid`, optionally appending the body as the final chunk.
DELETE	`/v2/<name>/blobs/uploads/<uuid>`	Blob Upload	Cancel outstanding upload processes, releasing associated resources. If this is not called, the unfinished uploads will eventually timeout.

All endpoints should support aggressive HTTP caching, compression and range headers, where appropriate. Details of each method are covered in the following sections.

The new API will attempt to leverage HTTP semantics where possible but may break from standards to implement targeted features.

Errors

Actionable failure conditions, covered in detail in their relevant sections, will be reported as part of 4xx responses, in a JSON response body. One or more errors will be returned in the following format:

{
    "errors": [{
            "code": <error identifier>,
            "message": <message describing condition>,
            "detail": <unstructured>
        },
        ...
    ]
}

The code field will be a unique identifier, all caps with underscores by convention. The message field will be a human-readable string. The optional detail field may contain arbitrary JSON data providing information the client can use to resolve the issue.
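A client might model this error body with a pair of Go types, as in the sketch below. The type and function names are hypothetical; `detail` is kept as raw JSON because its shape depends on the error code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors one entry of the "errors" array.
// Detail stays raw because its structure varies by error code.
type apiError struct {
	Code    string          `json:"code"`
	Message string          `json:"message"`
	Detail  json.RawMessage `json:"detail,omitempty"`
}

// errorEnvelope is the top-level body of a 4xx response.
type errorEnvelope struct {
	Errors []apiError `json:"errors"`
}

// decodeErrors parses a 4xx response body into its list of errors.
func decodeErrors(body []byte) ([]apiError, error) {
	var env errorEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, err
	}
	return env.Errors, nil
}

func main() {
	body := []byte(`{"errors": [{"code": "DIGEST_INVALID",
		"message": "provided digest did not match uploaded content",
		"detail": {"digest": "sha256:abc"}}]}`)
	errs, err := decodeErrors(body)
	if err != nil {
		panic(err)
	}
	for _, e := range errs {
		fmt.Printf("%s: %s (detail %s)\n", e.Code, e.Message, e.Detail)
	}
}
```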

The error codes encountered via the API are enumerated in the following table:

Code	Message	Description	HTTPStatusCodes
`UNKNOWN`	unknown error	Generic error returned when the error does not have an API classification.	Any
`DIGEST_INVALID`	provided digest did not match uploaded content	When a blob is uploaded, the registry will check that the content matches the digest provided by the client. The error may include a detail structure with the key "digest", including the invalid digest string. This error may also be returned when a manifest includes an invalid layer digest.	400, 404
`SIZE_INVALID`	provided length did not match content length	When a layer is uploaded, the provided size will be checked against the uploaded content. If they do not match, this error will be returned.	400
`NAME_INVALID`	manifest name did not match URI	During a manifest upload, if the name in the manifest does not match the uri name, this error will be returned.	400, 404
`TAG_INVALID`	manifest tag did not match URI	During a manifest upload, if the tag in the manifest does not match the uri tag, this error will be returned.	400, 404
`NAME_UNKNOWN`	repository name not known to registry	This is returned if the name used during an operation is unknown to the registry.	404
`MANIFEST_UNKNOWN`	manifest unknown	This error is returned when the manifest, identified by name and tag is unknown to the repository.	404
`MANIFEST_INVALID`	manifest invalid	During upload, manifests undergo several checks ensuring validity. If those checks fail, this error may be returned, unless a more specific error is included. The detail will contain information about the failed validation.	400
`MANIFEST_UNVERIFIED`	manifest failed signature verification	During manifest upload, if the manifest fails signature verification, this error will be returned.	400
`BLOB_UNKNOWN`	blob unknown to registry	This error may be returned when a blob is unknown to the registry in a specified repository. This can be returned with a standard GET or if a manifest references an unknown layer during upload.	400, 404
`BLOB_UPLOAD_UNKNOWN`	blob upload unknown to registry	If a blob upload has been cancelled or was never started, this error code may be returned.	404

While the client can take action on certain error codes, the registry may add new error codes over time. All client implementations should treat unknown error codes as UNKNOWN, allowing future error codes to be added without breaking API compatibility. For the purposes of this specification, error codes will only be added and never removed.
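One way a client could follow the "treat unknown codes as UNKNOWN" rule is sketched below in Go. The set of known codes is copied from the table above; `normalizeCode` is a hypothetical helper name.

```go
package main

import "fmt"

// knownCodes lists the error codes enumerated in the table above.
var knownCodes = map[string]bool{
	"UNKNOWN": true, "DIGEST_INVALID": true, "SIZE_INVALID": true,
	"NAME_INVALID": true, "TAG_INVALID": true, "NAME_UNKNOWN": true,
	"MANIFEST_UNKNOWN": true, "MANIFEST_INVALID": true,
	"MANIFEST_UNVERIFIED": true, "BLOB_UNKNOWN": true,
	"BLOB_UPLOAD_UNKNOWN": true,
}

// normalizeCode maps any code this client does not recognize to UNKNOWN,
// so codes added by newer registries do not break older clients.
func normalizeCode(code string) string {
	if knownCodes[code] {
		return code
	}
	return "UNKNOWN"
}

func main() {
	for _, c := range []string{"BLOB_UNKNOWN", "SOME_FUTURE_CODE"} {
		fmt.Println(c, "->", normalizeCode(c))
	}
}
```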

API Version Check

A minimal endpoint, mounted at /v2/, will provide version support information based on its response statuses. The request format is as follows:

GET /v2/

If a 200 OK response is returned, the registry implements the V2(.1) registry API and the client may proceed safely with other V2 operations. Optionally, the response may contain information about the supported paths in the response body. The client should be prepared to ignore this data.

If a 401 Unauthorized response is returned, the client should take action based on the contents of the "WWW-Authenticate" header and try the endpoint again. Depending on access control setup, the client may still have to authenticate against different resources, even if this check succeeds.

If a 404 Not Found response, or another unexpected status, is returned, the client should proceed with the assumption that the registry does not implement V2 of the API.

Pulling An Image

An "image" is a combination of a JSON manifest and individual layer files. The process of pulling an image centers around retrieving these two components.

The first step in pulling an image is to retrieve the manifest. For reference, the relevant manifest fields for the registry are the following:

field	description
name	The name of the image.
tag	The tag for this version of the image.
fsLayers	A list of layer descriptors (including tarsum)
signature	A JWS used to verify the manifest content

For more information about the manifest format, please see docker/docker#8093.

When the manifest is in hand, the client must verify the signature to ensure the names and layers are valid. Once confirmed, the client will then use the tarsums to download the individual layers. Layers are stored as blobs in the V2 registry API, keyed by their tarsum digest.

The API details follow.

Pulling an Image Manifest

The image manifest can be fetched with the following URL:

GET /v2/<name>/manifests/<tag>

The "name" and "tag" parameters identify the image and are required.

A 404 Not Found response will be returned if the image is unknown to the registry. If the image exists and the response is successful, the image manifest will be returned, with the following format (see #8093 for details):

{
   "name": <name>,
   "tag": <tag>,
   "fsLayers": [
      {
         "blobSum": <tarsum>
      },
      ...
   ],
   "history": <v1 images>,
   "signature": <JWS>
}

The client should verify the returned manifest signature for authenticity before fetching layers.

Pulling a Layer

Layers are stored in the blob portion of the registry, keyed by tarsum digest. Pulling a layer is carried out by a standard HTTP request. The URL is as follows:

GET /v2/<name>/blobs/<tarsum>

Access to a layer will be gated by the name of the repository but is identified uniquely in the registry by tarsum. The tarsum parameter is an opaque field, to be interpreted by the tarsum library.

This endpoint may issue a 307 redirect (302 for clients older than HTTP/1.1) to another service for downloading the layer, and clients should be prepared to handle redirects.

This endpoint should support aggressive HTTP caching for image layers. Support for Etags, modification dates and other cache control headers should be included. To allow for incremental downloads, Range requests should be supported, as well.

Pushing An Image

Pushing an image works in the opposite order from a pull. After assembling the image manifest, the client must first push the individual layers. When the layers are fully pushed into the registry, the client should upload the signed manifest.

The details of each step of the process are covered in the following sections.

Pushing a Layer

All layer uploads use two steps to manage the upload process. The first step starts the upload in the registry service, returning a URL to carry out the second step. The second step uses the upload URL to transfer the actual data. Uploads are started with a POST request, which returns a URL that can be used to push data and check upload status.

The Location header will be used to communicate the upload location after each request. While it won't change in this specification, clients should use the most recent value returned by the API.

Starting An Upload

To begin the process, a POST request should be issued in the following format:

POST /v2/<name>/blobs/uploads/

The parameters of this request are the image namespace under which the layer will be linked. Responses to this request are covered below.

Existing Layers

The existence of a layer can be checked via a HEAD request to the blob store API. The request should be formatted as follows:

HEAD /v2/<name>/blobs/<digest>

If the layer with the tarsum specified in digest is available, a 200 OK response will be received, with no actual body content (as the HTTP specification requires for HEAD responses). The response will look as follows:

200 OK
Content-Length: <length of blob>

When this response is received, the client can assume that the layer is already available in the registry under the given name and should take no further action to upload the layer. Note that the binary digests may differ for the existing registry layer, but the tarsums will be guaranteed to match.

Uploading the Layer

If the POST request is successful, a 202 Accepted response will be returned with the upload URL in the Location header:

202 Accepted
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<offset>
Content-Length: 0

The rest of the upload process can be carried out with the URL returned in the Location header, called the "Upload URL". All responses to the upload URL, whether sending data or getting status, will be in this format. Though the URI format (/v2/<name>/blobs/uploads/<uuid>) for the Location header is specified, clients should treat it as an opaque URL and should never try to assemble it. While the uuid parameter may be an actual UUID, this proposal imposes no constraints on the format and clients should never impose any.
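Treating the Location value as opaque still leaves the client one job: resolving a possibly relative value against the URL of the request that produced it. A minimal Go sketch using the standard `net/url` resolver; `nextUploadURL` and the registry.example.com host are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// nextUploadURL resolves the latest Location header against the URL the
// request was sent to. The client never builds this URL from <name> and <uuid>.
func nextUploadURL(requestURL, location string) (string, error) {
	base, err := url.Parse(requestURL)
	if err != nil {
		return "", err
	}
	loc, err := url.Parse(location)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(loc).String(), nil
}

func main() {
	u, err := nextUploadURL("https://registry.example.com/v2/library/ubuntu/blobs/uploads/",
		"/v2/library/ubuntu/blobs/uploads/3f2a")
	fmt.Println(u, err)
}
```

Any query string the registry places in the Location value survives resolution, which is one practical reason not to reassemble the URL by hand.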

Upload Progress

Progress and chunk coordination for the upload process will be managed through the Range header. While this is a non-standard use of the Range header, there are examples of similar approaches in APIs with heavy use. For example, for a 1000-byte layer file whose upload has just started, the Range header would be as follows:

Range: bytes=0-0

To get the status of an upload, issue a GET request to the upload URL:

GET /v2/<name>/blobs/uploads/<uuid>
Host: <registry host>

The response will be similar to the above, except that it will return a 204 status:

204 No Content
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<offset>

Note that HTTP Range header byte ranges are inclusive, and this will be honored even in non-standard use cases.
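Resuming an upload means turning the returned Range value into the next byte to send. The hypothetical Go sketch below parses "bytes=<start>-<end>" and, following the inclusive reading, returns end+1. Taken literally, the "bytes=0-0" example for a freshly started upload would then mean one byte was received. The proposal leaves that case open, and this sketch does not special-case it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseUploadRange parses a progress header such as "bytes=0-1023" into its
// inclusive start and end offsets.
func parseUploadRange(h string) (start, end int64, err error) {
	if !strings.HasPrefix(h, "bytes=") {
		return 0, 0, fmt.Errorf("missing bytes= prefix in %q", h)
	}
	parts := strings.SplitN(strings.TrimPrefix(h, "bytes="), "-", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("malformed range %q", h)
	}
	if start, err = strconv.ParseInt(parts[0], 10, 64); err != nil {
		return 0, 0, err
	}
	if end, err = strconv.ParseInt(parts[1], 10, 64); err != nil {
		return 0, 0, err
	}
	if end < start {
		return 0, 0, fmt.Errorf("end before start in %q", h)
	}
	return start, end, nil
}

// nextOffset returns the first byte to send next. Because the range is
// inclusive, "bytes=0-1023" means bytes 0 through 1023 are stored.
func nextOffset(h string) (int64, error) {
	_, end, err := parseUploadRange(h)
	if err != nil {
		return 0, err
	}
	return end + 1, nil
}

func main() {
	for _, h := range []string{"bytes=0-1023", "bytes=0-999"} {
		n, err := nextOffset(h)
		fmt.Println(h, "-> next offset", n, err)
	}
}
```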

Monolithic Upload

A monolithic upload is simply a chunked upload with a single chunk and may be favored by clients that would like to avoid the complexity of chunking. To carry out a "monolithic" upload, one can simply PUT the entire content blob to the provided URL:

PUT /v2/<name>/blobs/uploads/<uuid>?digest=<tarsum>[&digest=sha256:<hex digest>]
Content-Length: <size of layer>
Content-Type: application/octet-stream

<Layer Binary Data>

The "digest" parameter must be included with the PUT request. Please see the Completed Upload section for details on the parameters and expected responses.

Additionally, the upload can be completed with a single POST request to the uploads endpoint, including the "size" and "digest" parameters:

POST /v2/<name>/blobs/uploads/?digest=<tarsum>[&digest=sha256:<hex digest>]
Content-Length: <size of layer>
Content-Type: application/octet-stream

<Layer Binary Data>

On the registry service, this should allocate an upload, accept and verify the data, and return the same response as the final chunk of an upload. If the POST request fails while collecting the data, the registry should attempt to return an error response to the client with the Location header providing a place to continue the upload.

The single POST method is provided for convenience and most clients should implement POST + PUT to support reliable resume of uploads.

Chunked Upload

To carry out an upload of a chunk, the client can specify a Content-Range header and only include that part of the layer file:

PATCH /v2/<name>/blobs/uploads/<uuid>
Content-Length: <size of chunk>
Content-Range: <start of range>-<end of range>
Content-Type: application/octet-stream

<Layer Chunk Binary Data>

There is no enforcement on layer chunk splits other than that the server must receive them in order. The server may enforce a minimum chunk size. If the server cannot accept the chunk, a 416 Requested Range Not Satisfiable response will be returned and will include a Range header indicating the current status:

416 Requested Range Not Satisfiable
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<last valid range>
Content-Length: 0

If this response is received, the client should resume from the "last valid range" and upload the subsequent chunk. A 416 will be returned under the following conditions:

Invalid Content-Range header format
Out of order chunk: the range of the next chunk must start after the "last valid range" from the last response.

When a chunk is accepted as part of the upload, a 202 Accepted response will be returned, including a Range header with the current upload status:

202 Accepted
Location: /v2/<name>/blobs/uploads/<uuid>
Range: bytes=0-<offset>
Content-Length: 0
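Because chunks must arrive in order and Content-Range offsets are inclusive, a client can precompute its chunk boundaries, starting from the offset recovered from the upload status when resuming. In this hypothetical Go sketch (`planChunks` and the 1000-byte chunk size are illustrative), the final chunk would be sent with the completing PUT rather than a PATCH. A server-imposed minimum chunk size would constrain the size chosen.

```go
package main

import "fmt"

// chunk describes one request body within the layer.
type chunk struct {
	Start, End int64 // inclusive byte offsets
}

// ContentRange formats the header value in the "<start>-<end>" form used above.
func (c chunk) ContentRange() string {
	return fmt.Sprintf("%d-%d", c.Start, c.End)
}

// planChunks splits bytes [offset, size) into consecutive, in-order chunks
// of at most chunkSize bytes.
func planChunks(offset, size, chunkSize int64) []chunk {
	if chunkSize <= 0 {
		return nil
	}
	var out []chunk
	for start := offset; start < size; start += chunkSize {
		end := start + chunkSize - 1
		if end > size-1 {
			end = size - 1
		}
		out = append(out, chunk{Start: start, End: end})
	}
	return out
}

func main() {
	for _, c := range planChunks(0, 2500, 1000) {
		fmt.Println("Content-Range:", c.ContentRange())
	}
}
```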

Completed Upload

For an upload to be considered complete, the client must submit a PUT request on the upload endpoint with a digest parameter. If it is not provided, the upload will not be considered complete. The format for the final chunk will be as follows:

PUT /v2/<name>/blobs/uploads/<uuid>?digest=<tarsum>[&digest=sha256:<hex digest>]
Content-Length: <size of chunk>
Content-Range: <start of range>-<end of range>
Content-Type: application/octet-stream

<Last Layer Chunk Binary Data>

Optionally, if all chunks have already been uploaded, a PUT request with a digest parameter and zero-length body may be sent to complete and validate the upload. Multiple "digest" parameters may be provided with different digests. The server may verify none or all of them but must notify the client if the content is rejected.

When the last chunk is received and the layer has been validated, the client will receive a 201 Created response:

201 Created
Location: /v2/<name>/blobs/<tarsum>
Content-Length: 0

The Location header will contain the registry URL to access the accepted layer file.

Digest Parameter

The "digest" parameter is designed as an opaque parameter to support verification of a successful transfer. The initial version of the registry API will support a tarsum digest, in the standard tarsum format. For example, an HTTP URI parameter might be as follows:

tarsum.v1+sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b

Given this parameter, the registry will verify that the provided content does result in this tarsum. Optionally, the registry can support other digest parameters for non-tarfile content stored as a layer. A regular hash digest might be specified as follows:

sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b

Such a parameter would be used to verify the binary content (as opposed to the tar content) at the end of the upload process.
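The regular hash form is straightforward to compute and check with Go's standard library. The hypothetical sketch below covers only the "sha256:<hex digest>" form. Computing a tarsum requires the tarsum library and is deliberately not attempted, so any other algorithm prefix is reported as unsupported.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sha256Digest formats content as "sha256:<hex digest>".
func sha256Digest(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// verifyDigest checks binary content against a "sha256:" digest parameter.
// Tarsum and other algorithms are rejected rather than guessed at.
func verifyDigest(digest string, content []byte) (bool, error) {
	if !strings.HasPrefix(digest, "sha256:") {
		return false, fmt.Errorf("unsupported digest algorithm in %q", digest)
	}
	return sha256Digest(content) == digest, nil
}

func main() {
	layer := []byte("hello layer")
	d := sha256Digest(layer)
	ok, err := verifyDigest(d, layer)
	fmt.Println(d, ok, err)
}
```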

For the initial version, registry servers are only required to support the tarsum format.

Canceling an Upload

An upload can be cancelled by issuing a DELETE request to the upload endpoint. The format will be as follows:

DELETE /v2/<name>/blobs/uploads/<uuid>

After this request is issued, the upload uuid will no longer be valid and the registry server will discard all intermediate data. While uploads will time out if not completed, clients should issue this request if they encounter a fatal error but still have the ability to issue an HTTP request.

Errors

If a 502, 503 or 504 error is received, the client should assume that the failure is due to a temporary condition and that the upload can proceed, honoring the appropriate retry mechanism. Other 5xx errors should be treated as terminal.

If there is a problem with the upload, a 4xx error will be returned indicating the problem. After receiving a 4xx response (except 416, as called out above), the upload will be considered failed and the client should take appropriate action.

The following table covers the various error conditions that may be returned after completing a layer upload:

Code	Message
DIGEST_INVALID	provided digest did not match uploaded content
SIZE_INVALID	provided size did not match content size

Note that the upload url will not be available forever. If the upload uuid is unknown to the registry, a 404 Not Found response will be returned and the client must restart the upload process.
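The status-handling rules for uploads can be collected into one decision function. This Go sketch is a hypothetical summary of the rules above: 2xx continues, 502/503/504 are retried, 416 resumes from the server's Range header, 404 restarts the upload, and everything else is terminal. The retry mechanism itself (for example, backoff) is left to the client.

```go
package main

import (
	"fmt"
	"net/http"
)

// uploadAction is what a client does next after a response during upload.
type uploadAction int

const (
	proceed uploadAction = iota // 2xx: continue with the next step
	retry                       // 502, 503, 504: temporary condition, retry later
	resume                      // 416: resume from the Range header's last valid offset
	restart                     // 404: upload uuid unknown, start a new upload
	fail                        // anything else: treat the upload as failed
)

var actionNames = [...]string{"proceed", "retry", "resume", "restart", "fail"}

func (a uploadAction) String() string { return actionNames[a] }

// classifyUploadStatus maps an HTTP status from the upload URL to an action.
func classifyUploadStatus(status int) uploadAction {
	switch {
	case status >= 200 && status < 300:
		return proceed
	case status == http.StatusBadGateway, status == http.StatusServiceUnavailable,
		status == http.StatusGatewayTimeout:
		return retry
	case status == http.StatusRequestedRangeNotSatisfiable:
		return resume
	case status == http.StatusNotFound:
		return restart
	default:
		return fail
	}
}

func main() {
	for _, s := range []int{202, 503, 416, 404, 400, 500} {
		fmt.Println(s, classifyUploadStatus(s))
	}
}
```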

Pushing an Image Manifest

Once all of the layers for an image are uploaded, the client can upload the image manifest. An image can be pushed using the following request format:

PUT /v2/<name>/manifests/<tag>

{
   "name": <name>,
   "tag": <tag>,
   "fsLayers": [
      {
         "blobSum": <tarsum>
      },
      ...
   ],
   "history": <v1 images>,
   "signature": <JWS>,
   ...
}

The name and tag fields of the request body must match those specified in the URL.

If there is a problem with pushing the manifest, a relevant 4xx response will be returned with a JSON error message. The following table covers the various error conditions and their corresponding codes:

Code	Message
NAME_INVALID	Manifest name did not match URI
TAG_INVALID	Manifest tag did not match URI
MANIFEST_INVALID	Returned when an invalid manifest is received
MANIFEST_UNVERIFIED	Manifest failed signature validation
BLOB_UNKNOWN	Referenced layer not available

For the BLOB_UNKNOWN error, the detail field of the error response will have an "unknown" field with information about the missing layer. For now, that will just be the tarsum. There will be an error returned for each unknown blob. The response format will be as follows:

{
    "errors": [{
            "code": "BLOB_UNKNOWN",
            "message": "Referenced layer not available",
            "detail": {
                "unknown": {
                    "blobSum": <tarsum>
                 }
            }
        },
        ...
    ]
}
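After a rejected manifest push, a client can collect the missing tarsums, upload those layers, and retry the PUT. This hypothetical Go sketch keys on the BLOB_UNKNOWN code from the error tables and keeps each `detail` raw until it knows the code, because other errors may carry details of a different shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// missingBlobs returns the tarsum of every BLOB_UNKNOWN error in a manifest
// PUT error response, in the order the errors appear.
func missingBlobs(body []byte) ([]string, error) {
	var resp struct {
		Errors []struct {
			Code   string          `json:"code"`
			Detail json.RawMessage `json:"detail"`
		} `json:"errors"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var out []string
	for _, e := range resp.Errors {
		if e.Code != "BLOB_UNKNOWN" || len(e.Detail) == 0 {
			continue
		}
		var d struct {
			Unknown struct {
				BlobSum string `json:"blobSum"`
			} `json:"unknown"`
		}
		if err := json.Unmarshal(e.Detail, &d); err != nil {
			return nil, err
		}
		if d.Unknown.BlobSum != "" {
			out = append(out, d.Unknown.BlobSum)
		}
	}
	return out, nil
}

func main() {
	body := []byte(`{"errors": [{"code": "BLOB_UNKNOWN", "message": "Referenced layer not available",
		"detail": {"unknown": {"blobSum": "tarsum.v1+sha256:aaa"}}}]}`)
	blobs, err := missingBlobs(body)
	fmt.Println(blobs, err)
}
```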

Listing Image Tags

It may be necessary to list all of the tags under a given repository. The tags for an image repository can be retrieved with the following request:

GET /v2/<name>/tags/list

The response will be in the following format:

200 OK
Content-Type: application/json

{
    "name": <name>,
    "tags": [
        <tag>,
        ...
    ]
}

For repositories with a large number of tags, this response may be quite large, so care should be taken by the client when parsing the response to reduce copying.

Deleting an Image

An image may be deleted from the registry via its name and tag. A delete may be issued with the following request format:

DELETE /v2/<name>/manifests/<tag>

If the image exists and has been successfully deleted, the following response will be issued:

202 Accepted
Content-Length: 0

If the image had already been deleted or did not exist, a 404 Not Found response will be issued instead.

Roadmap

Write Registry REST API V2 proposal
- Solicit feedback
Implement V2 API server
- Basic Layer API
- Basic Image API
- Resumable upload support
Implement V2 API client
Implement API compliance tests
Port docker core to use client from registry project for v2 pushes

Reviewers

@dmp42
@dmcgowan
@jlhawn
Docker Community


thaJeztah · 2014-11-07T00:05:51Z

At a first glance, looks good! Will have a proper re-read at a later stage.

One thing I noticed is the proposed JSON error messages; perhaps they could be "namespaced" as well, by reversing the parts, i.e.

INVALID_TAG would become TAG_INVALID

Perhaps a more "rich" approach could be taken by combining "global" error-types with the namespace / object they affect, so that it is easier to handle. (e.g. format-error and tag)

Finally; the proposal describes returning a single error-code, which can be limiting. Being able to return multiple errors could offer more flexibility.

stevvooe · 2014-11-07T00:28:27Z

@thaJeztah Thanks for the suggestion about namespacing the errors. I'll play around with it.

Do you have examples for the registry API use case where we'd like to see multiple errors returned? Or are you suggesting this as a measure of future-proofing? Either way, it's a good suggestion.

thaJeztah · 2014-11-07T00:29:07Z

Regarding the endpoints (first impression, will add more suggestions at a later stage); for consistency, a different approach could be taken:

endpoint	description
`/v2/<name>/`	returns a list of all images in `<name>`
`/v2/<name>/<image>/`	returns a list of all tags available for `<image>`
`/v2/<name>/<image>/<tag>/`	returns the manifest of `<tag>`

This will make <tag> a "required" part of the URL to fetch a manifest. I think that's actually a good thing, because some images don't have a :latest tag. Making the <tag> required, will more clearly state what the intended manifest is.

thaJeztah · 2014-11-07T00:32:32Z

@stevvooe I think the Twitter API does this, but it's not the best example of a good API https://dev.twitter.com/overview/api/response-codes I can try to find better examples, I know I saw some when doing some research for an API I was working on.

thaJeztah · 2014-11-07T00:39:54Z

upload progress

Wondering if a separate endpoint/request is required to check upload progress. I'll need to dig a bit deeper into this for the technical side, but I think it would be possible to have the server respond with the current upload progress while uploading? ~~For reference, see https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/Using_XMLHttpRequest#Monitoring_progress and https://dvcs.w3.org/hg/progress/raw-file/tip/Overview.html~~

edit: I wasn't thinking when adding those links, because those are purely client-side implementations and require no feedback from the server.

Additionally, as @wking pointed out (#9015 (comment)), an endpoint is required for resuming partial uploads.

stevvooe · 2014-11-07T01:16:27Z

@thaJeztah

Errors

I think the Twitter API does this, but it's not the best example of a good API https://dev.twitter.com/overview/api/response-codes I can try to find better examples, I know I saw some when doing some research for an API I was working on.

that is simple enough and a valid addition.

Tag API Layout

The current desire for this api is to continue to support the notion of a default tag for a repository, aka "latest", so we won't be able to repurpose the tag-less URI to list tags. We may want to reconsider that but it might be better not to overload that API method.

Based on your response, it also seems the contents of <name> should be clarified in the specification or the specification for image name needs to be referenced (does anyone know where?). The <name> component, for the purpose of this API, represents the full "image name" and the contents of the "name" field in the new image manifest format. For a user, it would be something like stevvooe/hot-new-thing and for an image listed as ubuntu, it would be library/ubuntu.

This will make <tag> a "required" part of the URL to fetch a manifest. I think that's actually a good thing, because some images don't have a :latest tag. Making the <tag> required, will more clearly state what the intended manifest is.

As far as I understand, all images have a tag but if the tag is not specified, the default tag of "latest" is used. We may need to clarify the relationship between image, tag and repository.

Upload Progress

Upload progress is served via the GET method, while uploads are using PUT, so grabbing progress concurrently should be supported with separate requests. The progress will only be reported on the server after a chunk is accepted and only at the granularity of the chunk size. Otherwise, maintaining backend consistency of upload state would be problematic.

Keep in mind, the purpose of this feature is not to broadcast the upload progress to other consumers. Rather, the goal is to manage resumable uploads. There is no reason one could not use this feature with the progress monitoring capability of XMLHttpRequest but extra work would be required if uploading multiple chunks.

wking · 2014-11-07T04:23:41Z

On Thu, Nov 06, 2014 at 03:39:35PM -0800, Stephen Day wrote:

Pulling a Layer

Pulling a layer is carried out by a standard http request. The URL is as follows:
GET /v2/<name>/layer/<tarsum>
…
This endpoint should support aggresive HTTP caching for image
layers. Support for etags, modification dates and other cache
control headers should be included. To allow for incremental
downloads, Range requests should be supported, as well.

If I understand correctly, this is just the (possibly compressed?
docker-archive/docker-registry#694) tarball with a layer's filesystem changes.
I don't see why you need etags, modification dates, etc. while caching
that. It should be immutable, content-addressable data, so anyone can
cache it wherever they like for as long as they want without fear of
their cached value going stale.

The only places where you'd want to limit caching via ETags,
Last-Modified, and the like would be for mutable data, and that's just
manifests, tag values, and tag lists.

It would be nice if there was a way to upload descriptions for the
search engine too, but maybe that's part of a different spec? Or part
of the image metadata?

wking · 2014-11-07T04:34:28Z

On Thu, Nov 06, 2014 at 05:16:34PM -0800, Stephen Day wrote:

This will make <tag> a "required" part of the URL to fetch a
manifest. I think that's actually a good thing, because some
images don't have a :latest tag. Making the <tag> required, will
more clearly state what the intended manifest is.

As far as I understand, all images have a tag but if the tag is not
specified, the default tag of "latest" is used. We may need to
clarify the relationship between image, tag and repository.

I don't know what the Git wire protocol looks like for this, but we
could follow their lead and have a configurable, per-repository
default tag that just defaults to ‘latest’ (like ‘HEAD’ defaulting to
‘master’). Then folks without a ‘latest’ tag can:

PUT /v2//image

which would upload the image (if it wasn't already uploaded) like:

PUT /v2//image/

but it would also set the default tag (to whichever tag was in the
uploaded image's metadata).
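A minimal Go sketch of a per-repository default tag modeled on Git's `HEAD`. The `repository` type, its fields and `pushDefault` are hypothetical: an unqualified pull resolves through `defaultTag`, which starts out as `latest` and is repointed when an image is pushed to the default-tag endpoint.

```go
package main

import "fmt"

// repository is a hypothetical registry-side record for one name.
type repository struct {
	defaultTag string            // analogous to Git's HEAD
	tags       map[string]string // tag -> manifest digest
}

func newRepository() *repository {
	return &repository{defaultTag: "latest", tags: map[string]string{}}
}

// pushDefault stores an image under the tag from its metadata and makes
// that tag the repository default.
func (r *repository) pushDefault(tag, digest string) {
	r.tags[tag] = digest
	r.defaultTag = tag
}

// resolve returns the manifest digest for tag, falling back to the
// default tag when tag is empty (an unqualified pull).
func (r *repository) resolve(tag string) (string, error) {
	if tag == "" {
		tag = r.defaultTag
	}
	digest, ok := r.tags[tag]
	if !ok {
		return "", fmt.Errorf("unknown tag %q", tag)
	}
	return digest, nil
}

func main() {
	repo := newRepository()
	repo.pushDefault("7.7", "sha256:1111")
	d, _ := repo.resolve("")
	fmt.Println(d) // sha256:1111
}
```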

wking · 2014-11-07T04:50:43Z

On Thu, Nov 06, 2014 at 08:34:24PM -0800, W. Trevor King wrote:

I don't know what the Git wire protocol looks like for this, but we
could follow their lead and have a configurable, per-repository
default tag that just defaults to ‘latest’ (like ‘HEAD’ defaulting
to ‘master’). Then folks without a ‘latest’ tag can:

PUT /v2//image

And if you wanted to get really radical, you could have everyone do
this and drop explicit ‘latest’ tags entirely. Then you could have
immutable tags, and restrict the mutable information to just:

The default tag
The list of available tags

So folks performing an unqualified:

$ docker pull debian

would get library/debian:7.7 (or whatever the default tag was) without
the need for aliases (#8141).

On the other hand, you'd have to have separate names if you wanted
multiple mutable references (e.g. library/debian6 for the newest 6.x
release, and library/debian7 for the newest 7.x release). I'm not
sure how many docker repositories use multiple mutable references, so
I don't know whether the benefits to having immutable tags (where
library/debian6:6.0 always points to the same image, even if someone
pushes a library/debian6:6.0.10) outweigh the annoyance of names like
debian6 for those repositories.

wking · 2014-11-07T05:04:30Z

On Thu, Nov 06, 2014 at 05:16:34PM -0800, Stephen Day wrote:

Keep in mind, the purpose of this feature is not to broadcast the
upload progress to other consumers. Rather, the goal is manage
resumable uploads. There is no reason one could not use this feature
with the progress monitoring capability of XMLHttpRequest but
extra work would be required if uploading multiple chunks.

I haven't used XMLHttpRequest's ProgressEvents myself, but looking
over the suggested names 1, I get the impression that these are
purely client-side (loadstart = “I've started sending”, progress = “I
sent a chunk”), not server-generated progress updates. I don't think
that helps resumable uploads at all, because putting a chunk on the
wire doesn't mean the registry actually gets it. I agree with the
original spec that you need an independent way to request how much of
your previous upload (for session ) the registry received, so
you know where to start the next upload attempt.

thaJeztah · 2014-11-07T07:27:05Z

I get the impression that these are purely client-side

Yes, you are right. I realised after posting that the examples
I gave were completely bogus.

Basically, what I wanted to link to, is an example where
the server would "stream" progress information back during
the upload, so that "polling" the server or opening a second
request to get that information wouldn't be necessary.

I'm not really sure if that's possible, and the "resume" reason
is something I didn't take into consideration.

thaJeztah · 2014-11-07T07:44:05Z

The current desire for this api is to continue to support the notion of a default tag for a repository, aka "latest"

But should this be a default that the repository applies, or one applied by the client that calls the repository? If I'm correct, the docker client currently requests the :latest tag automatically if no tag is specified; doing this in both places (client and repository) seems wrong. (And, as mentioned, not all images have a :latest tag?)

ncdc · 2014-11-07T14:57:25Z

I don't like that GET /v2/<name>/image gives you back the information about the latest tag while GET /v2/<name>/image/<tag> gives you information about a specific tag. I am strongly in favor of eliminating the former route that defaults to latest and only having the latter where you must explicitly specify the tag you want to retrieve. If you want to default things to latest, that can be done in the clients.

ncdc · 2014-11-07T15:02:29Z

Have you thought at all about how to implement quotas? What if I'm an admin and I want to limit each namespace to, e.g., 1GB of unique layer content? Here's an example: my namespace is currently at 700MB, and I'm pushing a new image/tag that has 400MB of unique layer content split between 2 layers (299MB and 101MB). First my client would push the 299MB layer (which keeps me under quota, at 999MB), then it would attempt to push the 101MB layer (which should not be allowed, because that would put me over quota at 1100MB). At this point, there's an orphaned 299MB layer that should be deleted. Ideally, the registry should never have allowed that layer to be uploaded in the first place.

Would it be possible to take the overall size of the new layers into account at the beginning of a push?
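A hedged Go sketch of checking the whole push before any layer is accepted. It assumes the client (or a hypothetical management API) declares the sizes of all new layers up front; `checkPushQuota` is an illustrative name, sizes are in MB, and the 1GB limit is taken as 1000MB to match the example's arithmetic.

```go
package main

import "fmt"

// checkPushQuota rejects a push whose new layers, taken together, would
// exceed the namespace limit. All sizes share one unit (MB here).
func checkPushQuota(used, limit int64, newLayers []int64) error {
	total := used
	for _, size := range newLayers {
		total += size
	}
	if total > limit {
		return fmt.Errorf("push needs %d, limit is %d", total, limit)
	}
	return nil
}

func main() {
	// 700MB used, 1000MB limit, two new layers of 299MB and 101MB.
	fmt.Println(checkPushQuota(700, 1000, []int64{299, 101})) // push needs 1100, limit is 1000
	// A layer-by-layer check would have admitted the first layer alone.
	fmt.Println(checkPushQuota(700, 1000, []int64{299})) // <nil>
}
```

Checking the sum first is what avoids the orphaned 299MB layer: the push is refused before anything is written.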

wking · 2014-11-07T16:30:36Z

On Fri, Nov 07, 2014 at 06:57:32AM -0800, Andy Goldstein wrote:

If you want to default things to latest, that can be done in the clients.

I agree, unless we want folks to be able to configure the default tag
that gets pulled on a per-repo basis (setting it to things other than
‘latest’ 1, like you can set Git's HEAD). That would have to
happen in the registry repository.

ncdc · 2014-11-07T16:37:11Z

I agree, unless we want folks to be able to configure the default tag
that gets pulled on a per-repo basis (setting it to things other than
‘latest’ [1], like you can set Git's HEAD). That would have to
happen in the registry repository.

That sounds fine to me.

Re [1], I'm not clear why you'd need multiple repos to support debian:6.0 and debian:7.0? And what's the motivation for immutable tags?

[1] #9015 (comment)

wking · 2014-11-07T16:45:03Z

On Fri, Nov 07, 2014 at 08:37:19AM -0800, Andy Goldstein wrote:

Re 1, I'm not clear why you'd need multiple repos to support
debian:6.0 and debian:7.0?

You don't need immutable tags with a configurable default branch, but
my next comment 1 explains how I think having a single mutable
default-tag reference with immutable tags covers most of the use-cases
I can think of for ‘latest’. However, if you only have one mutable
tag-reference per repo, you can't have something sliding for 6.x and
something else sliding for 7.x unless you have two repositories.

And what's the motivation for immutable tags?

Predictable results for a given tag, no need for alias fetching, easy
caching, and content-addressable storage with a fixed address.

ncdc · 2014-11-07T16:53:44Z

Ah, I see what you're saying. But, I can definitely think of use cases where multiple mutable tags would be useful. For example, using tags to signify when an image is "QA-ready", what should be deployed to "staging" or "production", etc. I wouldn't want separate repos just to be able to have sliding tags for these different targets.

/cc @smarterclayton

wking · 2014-11-07T17:04:05Z

On Fri, Nov 07, 2014 at 08:53:53AM -0800, Andy Goldstein wrote:

But, I can definitely think of use cases where multiple mutable tags
would be useful. For example, using tags to signify when an image is
"QA-ready", what should be deployed to "staging" or "production",
etc. I wouldn't want separate repos just to be able to have sliding
tags for these different targets.

Right. Hence my 1:

“I'm not sure how many docker repositories use multiple mutable
references, so I don't know whether the benefits to having immutable
tags … outweigh the annoyance of names like debian6 for those
repositories.”

You could certainly have foo/bar-QA-ready, foo/bar-staging,
foo/bar-production, …. I don't even think it would be that hard to
maintain. You'd lose easy mass-push, but I doubt you'd be releasing
to multiple streams simultaneously unless you were populating a fresh
repository. What else would be more difficult with that workflow?

Still, immutable tags aren't that big a win. Folks who care about
predictable results from a given tag can just use the patch-level tags
and trust the maintainers not to mess with those ;). And a bit of
extra cache-checking to make sure you had a fairly recent version of
the tag isn't that hard to do.

“no need for alias fetching” 2 is actually a feature of having a
default-tag reference, so strike that from the list of benefits to
immutable tags.

stevvooe · 2014-11-07T19:15:07Z

HTTP Caching

@wking

The only places where you'd want to limit caching via ETags,
Last-Modified, and the like would be for mutable data, and that's just
manifests, tag values, and tag lists.

This section indicates that the immutable nature of the layer files should be leveraged at the HTTP caching layer, allowing docker clients to quickly determine whether a layer exists from a 304 response. Everything possible will be done to ensure that standard HTTP clients (read: proxies) cache the content.

Any other caching support for tags and manifests will be implemented as needed, depending on the nature of the resource.

Quotas

@ncdc

This is an interesting request, but it's outside the scope of this first revision. Could you file a feature request issue in docker/docker-registry with the prefix "NG:"?

wking · 2014-11-07T19:33:50Z

On Fri, Nov 07, 2014 at 11:15:15AM -0800, Stephen Day wrote:

We are going to drop the notion of default "latest" from the
registry API and will leave that "sugar" to the client to resolve.

In that case I agree with @thaJeztah 1 and @ncdc 2 that we should
probably eliminate any mention of ‘latest’ from the spec.

While the rest of the discussion about tags (aliases, immutable vs
mutable) is constructive and you all make excellent points, changes
to the tagging scheme are outside of the scope of this proposal.

Fair enough ;).

The only places where you'd want to limit caching via ETags,
Last-Modified, and the like would be for mutable data, and that's
just manifests, tag values, and tag lists.

This section indicates that the immutable nature of the layer files
should be leveraged at the HTTP caching layer, allowing docker
clients to quickly determine whether a layer exists from a 304
response. Everything possible will be done to ensure that standard
HTTP clients (read: proxies) cache the content.

Right. Are you going to use all of that for caching immutable
stuff? Can't you just set:

Expires: Fri, 1 Jan 2038 03:14:07 GMT

and be done with it? I don't see why you'd also want to set ETags,
Last-Modified, ….

ncdc · 2014-11-07T19:40:57Z

@stevvooe re quotas: if you don't take them into account up front, I'm worried that it won't be possible going forward, at least not in the ideal case where you disallow any layer push if the sum of the layers in the "transaction" would put you over quota. @dmp42, what are your thoughts on this?

stevvooe · 2014-11-07T20:00:12Z

Changes:

added "errors" envelope to error responses and adjusted unknown layers response accordingly
dropped implicit "latest" tag from all API methods
length and checksum submission moved to end of layer upload so client doesn't have to precompute
added the ability to explicitly cancel an upload
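For illustration, a hedged Go sketch of a client decoding the new "errors" envelope. The per-error fields `code`, `message` and `detail` follow the shape later used in the distribution API spec; treat them as assumptions with respect to this revision of the proposal. The error code and detail in the sample body are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError is one entry in the assumed "errors" envelope.
type apiError struct {
	Code    string          `json:"code"`
	Message string          `json:"message"`
	Detail  json.RawMessage `json:"detail,omitempty"` // shape varies by error code
}

type errorEnvelope struct {
	Errors []apiError `json:"errors"`
}

// decodeErrors extracts the list of errors from a response body.
func decodeErrors(body []byte) ([]apiError, error) {
	var env errorEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, err
	}
	return env.Errors, nil
}

func main() {
	// Hypothetical response for an unknown layer.
	body := []byte(`{"errors":[{"code":"UNKNOWN_LAYER","message":"layer not found","detail":{"tarsum":"tarsum.dev+sha256:abc"}}]}`)
	errs, err := decodeErrors(body)
	if err != nil {
		panic(err)
	}
	for _, e := range errs {
		fmt.Printf("%s: %s\n", e.Code, e.Message)
	}
}
```

Keeping `detail` as raw JSON lets a client decode it only for the codes it understands.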

ncdc · 2014-11-07T20:25:59Z

Added docker-archive/docker-registry#698 for tracking the quota request.

stevvooe · 2014-11-07T21:32:57Z

@ncdc Thank you for filing the issue!

I don't think there is anything within this proposal that prevents quotas from being implemented. Enforcement can be within these core API methods, but a management API could be added such that the client can avoid hitting those quotas before making uploads.

We'll take the full discussion to docker-archive/docker-registry#698.

wking · 2014-11-07T22:24:51Z

On Fri, Nov 07, 2014 at 12:00:20PM -0800, Stephen Day wrote:

length and checksum submission moved to end of layer upload so
client doesn't have to precompute

You have to precompute the tarsum, so you might as well precompute
these while you're at it.

Also, the spec now has:

PUT PUT /v2//image/

which should just be:

PUT /v2//image/

Also, I think:

POST /v2//layer/

should be a PUT call, because you're pushing to the same URI you'll be
fetching from 1:

“The fundamental difference between the POST and PUT requests is
reflected in the different meaning of the Request-URI. The URI in a
POST request identifies the resource that will handle the enclosed
entity. … In contrast, the URI in a PUT request identifies the
entity enclosed with the request…

stevvooe · 2014-11-07T22:36:50Z

@wking Thank you again for your careful feedback! I'll make sure the typos are corrected.

You have to precompute the tarsum, so you might as well precompute these while you're at it.

This allows the client to be as lazy as possible.

Also, I think:

POST /v2//layer/

should be a PUT call, because you're pushing to the same URI you'll be
fetching from [1]:

“The fundamental difference between the POST and PUT requests is
reflected in the different meaning of the Request-URI. The URI in a
POST request identifies the resource that will handle the enclosed
entity. … In contrast, the URI in a PUT request identifies the
entity enclosed with the request…

From section 9.5:

The actual function performed by the POST method is determined by the
server and is usually dependent on the Request-URI. The posted entity
is subordinate to that URI in the same way that a file is subordinate
to a directory containing it, a news article is subordinate to a
newsgroup to which it is posted, or a record is subordinate to a
database

POST is used here because the resulting creation is subordinate to the layer
URI. Arguably, the following would be a better POST URI:

POST /v2/<name>/layer/<tarsum>/upload/

POST is also used here because the request is not idempotent, in that
multiple requests to the same endpoint will result in creating multiple
uploads. Use of PUT would be incorrect.

I'll add the "/upload/" suffix.

nealmcb · 2015-02-25T16:07:10Z

Re: searching for images by tag etc:

@dmp42
search will be implemented as an extension (see docker-archive/docker-registry#613 and docker-archive/docker-registry#687).

Indexing image name, tag, and creation date will certainly be part of it.

Thanks. The issues you reference are closed, but it isn't clear to me if the resolutions include the ability to search by tag. Does anyone have an update on this that responds e.g. to this query? http://stackoverflow.com/questions/24481564/how-can-i-find-docker-image-with-specific-tag-in-docker-registry-in-docker-comma
Cheers.

dmp42 · 2015-02-25T18:34:49Z

Hi @nealmcb
These issues have been moved to their new home @ https://github.com/docker/distribution - specifically distribution/distribution#136 - although there is currently no specification for a new/revised search API.

Ideas/proposals are definitely welcome over there (docker/distribution).

stevvooe · 2015-03-05T02:16:40Z

@jfrazelle Should we close this as accepted? The api spec lives in distribution now: https://github.com/docker/distribution/blob/master/doc/spec/api.md. Should we backport this into the docker core docs?

jessfraz · 2015-03-18T01:39:46Z

yes yayyyyy!

thaJeztah · 2015-04-10T10:52:54Z

The link was broken; new link is https://github.com/docker/distribution/blob/master/docs/spec/api.md

Or (future-proof): https://github.com/docker/distribution/blob/636a19b2126ffe78d209eb6a7aedef857abd2539/docs/spec/api.md

grexe · 2015-10-02T20:29:50Z

I could not find an official (not even an unofficial;) JSON Schema for the new v2 Registry API.
This would make my life as a Java developer so much easier, because right now I have to feed sample output from every REST call to code generators instead of using one canonical schema and deriving all POJOs from it...

thaJeztah · 2015-10-02T20:33:55Z

@grexe could you open a feature request for that in the https://github.com/docker/distribution issue tracker? I know the specs are actually generated from code, perhaps there's even something there already. If in doubt, you can ask in the #docker-dev or #docker-distribution IRC channels

grexe · 2015-10-02T20:58:29Z

wow @thaJeztah that was really fast! Did so, see distribution/distribution#1060. Now let's hope the implementation is also as fast,-)

RichardScothern · 2015-10-02T21:02:20Z

@grexe : documentation and specs for the v2 registry live here: https://github.com/docker/distribution/tree/master/docs/spec

There is a code generator but it is written in go.

grexe · 2015-10-02T21:12:17Z

thanks @RichardScothern but there is still no reference to a JSON schema, only canonicalization (which is not so important to me, personally,-).
But I have another question: is it planned to support creation of repositories (a PUT equivalent to GET _catalog) as mentioned in the spec on listing repositories?

This would allow me to create separate repositories (e.g. per customer/realm/...) in my private registry from code, without having to shut down the entire registry and alter configuration by hand just to add a new repository...

RichardScothern · 2015-10-02T21:18:36Z

You can create repositories by uploading an image and its layers using the REST API
https://github.com/docker/distribution/blob/master/docs/spec/api.md#pushing-an-image

jlhawn · 2015-10-02T21:19:32Z

I realized about a year ago that "JSON Schema" is actually a draft-standard for specifying the structure of JSON objects/types used by your API and is not just examples of JSON forms/responses.

https://en.wikipedia.org/wiki/JSON#JSON_Schema
http://json-schema.org/

grexe · 2015-10-02T21:24:39Z

Thanks again @RichardScothern. It was not obvious to me that a new repository can be created just by specifying a non-existent name in the push URI, but it's really there; the completed upload section specifically says that

The Location header will contain the registry URL to access the accepted layer file.

Seems to be exactly what I need, perfect, thanks!

grexe · 2015-10-02T21:28:23Z

Exactly, @jlhawn. I just stumbled over another snag where a mapper misidentified a Boolean because my sample output was insufficient: String vs. Boolean (or even vs. Integer) was impossible to distinguish from the output.

thaJeztah · 2015-10-03T07:54:25Z

@jlhawn @grexe @RichardScothern docker compose recently added a schema for validation (docker/compose#2089). Plans are to use the same schema in libcompose docker/libcompose#34.

Just linking these to prevent duplicated work / research :-)

stevvooe · 2015-10-14T21:45:09Z

Please don't comment on closed tickets.

As a baseline for the new registry API specification, we are checking in the proposal as currently covered in moby/moby#9015. This will allow us to trace the process of transforming the proposal into a specification. The goal is to use api descriptors to generate templated documentation into SPEC.md. The resulting product will be submitted into docker core as part of the client PR.

thaJeztah · 2021-09-30T08:11:01Z

👆 reported account for abuse (spam activity)

Let me lock the conversation on this ticket

dmp42 added the Distribution label Nov 6, 2014

thaJeztah mentioned this issue Nov 7, 2014

Preserve / improve consistency in CLI UI #8829

Closed

dmp42 mentioned this issue Nov 7, 2014

NG: HTTP rest API docker-archive/docker-registry#634

Closed

ncdc mentioned this issue Feb 12, 2015

Proposal: add support for pull/create/run by immutable identifier #10740

Closed

jessfraz added Proposal kind/feature and removed Proposal labels Feb 28, 2015

jessfraz closed this as completed Mar 18, 2015

hjacobs mentioned this issue Mar 23, 2015

Implement Docker Registry API v2 zalando-stups/pierone#2

Closed

stevvooe mentioned this issue Apr 8, 2015

doc/spec: documentation for the Image Manifest V2 format distribution/distribution#336

Merged

JonathonReinhart mentioned this issue Dec 14, 2015

Prevent overwrite of tags SUSE/Portus#627

Open

ches mentioned this issue May 5, 2016

Support searching tags in a repository on the command line #17238

Open

mhrivnak mentioned this issue Oct 7, 2016

REST API initial documentation and base classes pulp/pulp#2779

Merged

moby locked as spam and limited conversation to collaborators Sep 30, 2021
