Proposal: add support for pull/create/run by immutable identifier #10740

Closed
ncdc opened this issue Feb 12, 2015 · 28 comments
Labels
kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny

Comments

@ncdc
Contributor

ncdc commented Feb 12, 2015

Summary

We'd like to add support for using immutable image identifiers when pulling images from a v2 registry, creating containers, and running containers.

Background

Use case

When I create a container, I may specify an image such as mysql:latest. When the image is pulled, latest is resolved to a particular image at that point in time. If I later want to add more containers (e.g. possible read slaves in the MySQL case), ideally all the new containers would use the exact same image as my first container. Using a tag isn't sufficient as the tag is mutable.
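To make the use case concrete, here is a minimal sketch of why a tag is not enough and a digest is. It assumes a toy in-memory registry; the `registry` type and its `push`, `resolve`, and `pullByDigest` methods are hypothetical names made up for illustration, not part of Docker or the v2 registry.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// registry is a toy in-memory stand-in for an image registry (hypothetical).
type registry struct {
	tags      map[string]string // "repo:tag" -> digest
	manifests map[string][]byte // "repo@digest" -> content
}

func newRegistry() *registry {
	return &registry{tags: map[string]string{}, manifests: map[string][]byte{}}
}

// push stores content under its SHA-256 digest and moves the tag to it.
func (r *registry) push(repo, tag string, content []byte) string {
	digest := fmt.Sprintf("sha256:%x", sha256.Sum256(content))
	r.manifests[repo+"@"+digest] = content
	r.tags[repo+":"+tag] = digest
	return digest
}

// resolve returns the digest the tag points to at this moment.
func (r *registry) resolve(repo, tag string) (string, bool) {
	d, ok := r.tags[repo+":"+tag]
	return d, ok
}

// pullByDigest returns the content stored under an immutable digest.
func (r *registry) pullByDigest(repo, digest string) ([]byte, bool) {
	c, ok := r.manifests[repo+"@"+digest]
	return c, ok
}

func main() {
	reg := newRegistry()
	reg.push("mysql", "latest", []byte("image v1"))

	// First container: resolve the tag once and remember the digest.
	pinned, _ := reg.resolve("mysql", "latest")

	// Later, the tag is moved to a new image.
	reg.push("mysql", "latest", []byte("image v2"))

	current, _ := reg.resolve("mysql", "latest")
	fmt.Println("tag still points at the pinned digest:", current == pinned)

	// New containers created from the pinned digest get the original image.
	content, _ := reg.pullByDigest("mysql", pinned)
	fmt.Printf("pull by pinned digest returns: %s\n", content)
}
```

The tag lookup changes after the second push, while the lookup by digest keeps returning the first image.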

V2 registry support

As part of distribution/distribution#46, the v2 registry will be adding support for retrieving an image manifest for a particular digest. This feature gives us what we need, as long as the Docker CLI and Engine support it too.

Proposed CLI/Engine changes

We'll need to provide a means to reference an image by its digest. One possible example might be

namespace/repository@digest

We'll need to make sure the following commands continue to work as they currently do, as well as with an optional digest:

  • docker pull
  • docker create
  • docker run

When listing images via docker images, we could default to displaying only the "current" values for each image and tag. An optional flag could enable displaying all values for each image and tag; namely, this would show 1 entry for each image/tag/digest combination.
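One way to picture the two listing modes is the sketch below. It is only an illustration of the idea in this proposal: the `entry` and `store` types, the `list` method, and the placeholder digests `sha256:d1` and `sha256:d2` are all hypothetical and do not reflect how the Engine stores images.

```go
package main

import "fmt"

// entry is one locally known image/tag/digest combination (hypothetical).
type entry struct {
	Repository string
	Tag        string
	Digest     string
}

// store records pulls in order; later entries are more recent.
type store struct {
	entries []entry
}

func (s *store) recordPull(repo, tag, digest string) {
	s.entries = append(s.entries, entry{Repository: repo, Tag: tag, Digest: digest})
}

// list returns every entry when all is true. Otherwise it returns only the
// most recent ("current") entry for each repository:tag pair.
func (s *store) list(all bool) []entry {
	if all {
		return append([]entry(nil), s.entries...)
	}
	latest := map[string]int{} // "repo:tag" -> index of the most recent entry
	for i, e := range s.entries {
		latest[e.Repository+":"+e.Tag] = i
	}
	var out []entry
	for i, e := range s.entries {
		if latest[e.Repository+":"+e.Tag] == i {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	s := &store{}
	s.recordPull("foo/bar", "latest", "sha256:d1") // placeholder digest
	s.recordPull("foo/bar", "latest", "sha256:d2") // placeholder digest

	fmt.Println("default listing:")
	for _, e := range s.list(false) {
		fmt.Printf("  %s:%s %s\n", e.Repository, e.Tag, e.Digest)
	}
	fmt.Println("listing with the optional flag:")
	for _, e := range s.list(true) {
		fmt.Printf("  %s:%s %s\n", e.Repository, e.Tag, e.Digest)
	}
}
```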

Questions

What about v1 registry support?
It's unlikely we'll be able to support this.

If I create an image locally via docker tag or docker commit, can I refer to it by tag + digest?
As proposed in distribution/distribution#46, the registry is responsible for determining an image's digest and assigning it to the image. For an image that has not yet been pushed to a v2 registry, it may not be possible to refer to it by tag + digest. This is unlikely to be a significant issue, as the use case for tag + digest is consistent deployments using images pulled from registries. Or, if the community thinks this should be supported, we can revisit what component(s) are responsible for calculating digests.

@miminar
Contributor

miminar commented Feb 16, 2015

I'd suggest adding docker build (the FROM statement).

As for the image specification, if we already request a particular digest, is there a need to specify a tag as well? IMHO the tag part could be optional in the same way as the digest part (namespace/repository[:tag][@digest]). If a particular image ID has multiple tags assigned, they would all be pulled. Supplying both tag and digest would pull just one tag and would imply an additional existence check (that the particular tag exists for the given digest).

@ncdc
Contributor Author

ncdc commented Feb 16, 2015

@miminar while this isn't coded yet to my knowledge in the v2 registry, the proposal re pulling by digest requires specifying repository, tag, and digest, which is why I wrote this proposal the way I did. I'd be fine with repository & digest without tag, but I'll defer to @stevvooe on this.

@stevvooe
Contributor

TL;DR Let's support <name>:<tag>@<digest> but make it easy to start supporting <name>@<digest>.

@miminar @ncdc The original compromise of distribution/distribution#46 required that immutable references include a "tag" and a "digest", which is why these proposals have this requirement. This was due to the fact that the new manifests have a "tag" field. I doubt we'll ever drop the requirement for specifying the namespace.

With distribution/distribution#62 and distribution/distribution#173, we intend to remove the requirement for a "tag" in the manifests. It would no longer be required when pulling manifests by digest.

For upcoming proposals in immutable manifest references, we should consider the following:

  1. All proposals should support the following syntax:

    <name>:<tag>@<digest>
    

    This refers to a specific manifest, with a specified tag and revision.

  2. We should optionally consider the tag-less syntax for referring to manifests:

    <name>@<digest>
    

    This is dependent on at least distribution/distribution#62 (doc/spec: generic distribution content manifests) and distribution/distribution#173 (doc/spec: tags as a first class object). This can be supported without API changes on the server-side with the blob API.

This can be implemented by defining an "image object reference" (working on the nomenclature) to always have three components:

  • name: identifies the collection of image objects (repository) under which the object exists.
  • tag (optional): specifies a named reference to a specific object.
  • digest (optional): identifies the specific object by digest to be referenced.

The goal of the parser would be to identify whatever is required for the level of support specified at implementation time. If the initial implementation (proposed above) requires all three, then it will error out when a tag is missing. If we decide we want to support item 2, then it no longer errors out when the tag is missing and we proceed.

We may also want to define the minimum level of specification for a reference. Under certain cases, only name is required but for other cases, a digest-qualified reference is necessary.
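A parser along these lines might look like the following sketch. Everything here is an assumption made for illustration: the `Reference` type, the `Level` constants, and `ParseReference` are hypothetical names, and the splitting rules (digest after the first `@`, tag after the last `:` that follows the last `/`) are one plausible reading of the syntaxes above rather than a specified grammar.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Reference holds the three components described above (hypothetical type).
type Reference struct {
	Name   string
	Tag    string // optional
	Digest string // optional
}

// Level is the minimum level of specification a caller requires.
type Level int

const (
	NameOnly             Level = iota // only the name is required
	DigestRequired                    // <name>@<digest>, tag optional
	TagAndDigestRequired              // <name>:<tag>@<digest>
)

// ParseReference splits s into name, tag, and digest, then enforces level.
func ParseReference(s string, level Level) (Reference, error) {
	var ref Reference
	rest := s

	// The digest, if any, follows the first '@'.
	if i := strings.Index(rest, "@"); i >= 0 {
		ref.Digest = rest[i+1:]
		rest = rest[:i]
		if ref.Digest == "" {
			return Reference{}, errors.New("empty digest")
		}
	}

	// The tag, if any, follows the last ':' that comes after the last '/',
	// so that a registry host:port is not mistaken for a tag.
	if i := strings.LastIndex(rest, ":"); i > strings.LastIndex(rest, "/") {
		ref.Tag = rest[i+1:]
		rest = rest[:i]
		if ref.Tag == "" {
			return Reference{}, errors.New("empty tag")
		}
	}

	ref.Name = rest
	if ref.Name == "" {
		return Reference{}, errors.New("empty name")
	}

	switch level {
	case DigestRequired:
		if ref.Digest == "" {
			return Reference{}, errors.New("a digest is required")
		}
	case TagAndDigestRequired:
		if ref.Tag == "" || ref.Digest == "" {
			return Reference{}, errors.New("both a tag and a digest are required")
		}
	}
	return ref, nil
}

func main() {
	for _, level := range []Level{TagAndDigestRequired, DigestRequired} {
		ref, err := ParseReference("foo/bar@sha256:abc", level)
		fmt.Printf("level=%d ref=%+v err=%v\n", level, ref, err)
	}
}
```

Moving from the first syntax to the tag-less one then amounts to relaxing the level passed by the caller, without touching the splitting logic.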

@ncdc
Contributor Author

ncdc commented Feb 17, 2015

@stevvooe I'm definitely in favor of <name>@<digest>, as I assume that makes housekeeping in the registry's storage easier (no need to preserve tag history). We do not need name+tag+digest for our use case.

@ncdc
Contributor Author

ncdc commented Feb 18, 2015

@stevvooe what do you think we should be targeting for 1.6? Only name + tag + digest?

@stevvooe
Contributor

@ncdc I think we should target name + digest. We may need to adjust the proposed routes in distribution/distribution#46 to overload the tag routes to support manifest digest, but that is more than reasonable.
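As a rough illustration of what overloading the tag route could mean, the sketch below accepts either a tag or a digest in the same path position and tells them apart by the presence of a `:`. The path shape `/v2/<name>/manifests/<reference>`, the function name `splitManifestPath`, and the rule that tags never contain `:` are assumptions made for this example, not something this thread specifies.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitManifestPath splits a path of the assumed form
// /v2/<name>/manifests/<reference> and reports whether the reference looks
// like a digest ("<algorithm>:<hex>") rather than a tag.
func splitManifestPath(path string) (name, reference string, isDigest bool, err error) {
	const prefix, marker = "/v2/", "/manifests/"
	if !strings.HasPrefix(path, prefix) {
		return "", "", false, errors.New("path does not start with /v2/")
	}
	i := strings.LastIndex(path, marker)
	if i < len(prefix) {
		return "", "", false, errors.New("path has no name or no /manifests/ segment")
	}
	name = path[len(prefix):i]
	reference = path[i+len(marker):]
	if name == "" || reference == "" {
		return "", "", false, errors.New("empty name or reference")
	}
	// Assumption: a tag never contains ':', so its presence marks a digest.
	isDigest = strings.Contains(reference, ":")
	return name, reference, isDigest, nil
}

func main() {
	for _, p := range []string{
		"/v2/foo/bar/manifests/latest",
		"/v2/foo/bar/manifests/sha256:abc",
	} {
		name, ref, isDigest, err := splitManifestPath(p)
		fmt.Println(name, ref, isDigest, err)
	}
}
```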

@ncdc
Contributor Author

ncdc commented Feb 18, 2015

@stevvooe that makes sense to me. It should be easier to implement than name + tag + digest, I would assume.

@miminar
Contributor

miminar commented Feb 20, 2015

@ncdc @stevvooe Will we support shortened digests (7 characters or more), similar to git? If we can get all available manifest digests from the registry, it should be possible. Currently I don't see a way to obtain them with the recent API specification, though. IMHO this would greatly benefit usability. 64 mandatory characters on the command line is way too much:

docker pull registry.access.redhat.com/rhel7@e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

If, by chance, two or more digests matched the given short version, the user could be asked to specify the full digest.
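The git-like lookup suggested here could be sketched as follows. This assumes the client somehow has the full list of known digests, which the comment above notes the API does not currently provide. The function `resolveShort`, the error values, and the second digest in the example (made up to force a collision) are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// minPrefixLen mirrors the "7 characters or more" suggestion above.
const minPrefixLen = 7

var (
	ErrTooShort  = errors.New("short digest must be at least 7 characters")
	ErrNotFound  = errors.New("no digest matches the given prefix")
	ErrAmbiguous = errors.New("prefix matches several digests; specify the full digest")
)

// resolveShort returns the single digest in known that starts with prefix.
func resolveShort(prefix string, known []string) (string, error) {
	if len(prefix) < minPrefixLen {
		return "", ErrTooShort
	}
	match := ""
	for _, d := range known {
		if !strings.HasPrefix(d, prefix) {
			continue
		}
		if match != "" && match != d {
			return "", ErrAmbiguous
		}
		match = d
	}
	if match == "" {
		return "", ErrNotFound
	}
	return match, nil
}

func main() {
	known := []string{
		// Digest taken from the example command above.
		"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
		// Made-up digest sharing the first 7 characters, to show a collision.
		"e3b0c44fffffffffffffffffffffffffffffffffffffffffffffffffffffffff",
	}
	for _, p := range []string{"e3b0c442", "e3b0c44", "e3b"} {
		d, err := resolveShort(p, known)
		fmt.Printf("prefix=%q digest=%q err=%v\n", p, d, err)
	}
}
```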

@ncdc
Contributor Author

ncdc commented Feb 20, 2015

This feature is most likely to be used as part of an automated system I would think, like Kubernetes, where a user says "I want to deploy foo/bar:latest" and the system resolves that to the digest for that tag at that point in time. Subsequent deployments would use the resolved digest.

Having said that, I don't have any problem supporting shortened digests.

Also, just to note, the id you have in your example is a v1 image id; v2 digests include the algorithm as a prefix.

@miminar
Contributor

miminar commented Feb 20, 2015

Also, just to note, the id you have in your example below is a v1 image id; v2 digests include the algorithm as a prefix.

Oh, I see, thanks for the correction. So the command would actually look like this:

docker pull registry.access.redhat.com/rhel7@sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

even more scary...
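For readers unfamiliar with the prefixed form, here is a small sketch of producing and checking an `<algorithm>:<hex>` digest with Go's standard library. The helper names `computeDigest` and `parseDigest` are hypothetical, and only `sha256` is handled. As it happens, the hex string used in the examples above is the SHA-256 of empty input, so the sketch reproduces it from an empty byte slice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// computeDigest returns the SHA-256 of content in "<algorithm>:<hex>" form.
func computeDigest(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// parseDigest splits a digest into algorithm and hex parts and checks that
// the hex part is well formed. Only sha256 is accepted in this sketch.
func parseDigest(s string) (string, string, error) {
	i := strings.Index(s, ":")
	if i < 0 {
		return "", "", errors.New("digest is missing the algorithm prefix")
	}
	algorithm, hexPart := s[:i], s[i+1:]
	if algorithm != "sha256" {
		return "", "", errors.New("unsupported digest algorithm")
	}
	if len(hexPart) != sha256.Size*2 {
		return "", "", errors.New("sha256 digest must be 64 hex characters")
	}
	if _, err := hex.DecodeString(hexPart); err != nil {
		return "", "", errors.New("digest contains non-hex characters")
	}
	return algorithm, hexPart, nil
}

func main() {
	d := computeDigest(nil)
	fmt.Println(d) // sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

	// The bare hex form, without the algorithm prefix, is rejected.
	_, _, err := parseDigest(strings.TrimPrefix(d, "sha256:"))
	fmt.Println("bare hex rejected:", err != nil)
}
```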

@ncdc
Contributor Author

ncdc commented Feb 20, 2015

Right, but again, this probably isn't something a human would regularly invoke.

@stevvooe
Contributor

@miminar The goal of this proposal is to provide a secure, reproducible way of fetching an image by its manifest digest. Syntactic sugar is outside the scope of this proposal.

That said, I agree long id strings are indeed unwieldy. I've filed distribution/distribution#194 in response so we can find a secure, consistent, and simple way of accomplishing this.

@aluzzardi
Member

+1

This would be awesome for Swarm.

/cc @vieux

@stevvooe
Contributor

@icecrime @jfrazelle @crosbymichael Could you take a peek at this?

@icecrime
Contributor

SGTM overall, with a few questions.

I agree that supporting <name>@<digest> makes perfect sense, I'm just worried about the way things will show up in docker images output. The same goes for that sentence:

When listing images via docker images, we could default to displaying only the "current" values for each image and tag. An optional flag could enable displaying all values for each image and tag; namely, this would show 1 entry for each image/tag/digest combination.

Does it mean that I can have multiple entries for REPOSITORY=ubuntu, TAG=latest, and different IMAGE ID? I'd rather not display the tag at all when we did a "digest pull".

@jessfraz jessfraz added kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny and removed kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny labels Feb 26, 2015
@ncdc
Contributor Author

ncdc commented Feb 26, 2015

@icecrime I'm not sure what the best UX for this will be. However, since I originally wrote this proposal, after talking about it with @stevvooe, we think it doesn't make sense to try to support name:tag@digest. I have updated the issue text above to say just name@digest.

@icecrime
Contributor

So I guess this adds even more weight to my "don't even try to show a tag in docker images output" remark ;-)

@ncdc
Contributor Author

ncdc commented Feb 26, 2015

@icecrime here's what I was thinking with docker images, tags, and digests... Let's say you do this:

docker pull foo/bar:latest

And that gives you back image id i1 and digest d1. Time passes... foo/bar:latest gets updated. You do another pull, returning image id i2 and digest d2. So you've done 2 pulls of foo/bar:latest, neither pull was by digest - what should we show in docker images?

And I guess more generally: if we have images that aren't currently assigned to a tag, either from the above case, or just from pulling by digest, how/where should we display them?

@sghosh151

With v2 seeming to have a 256-character limit, will that apply to "name" or to "name@digest"? The SHA sum would take up a number of characters.

@stevvooe
Contributor

@sghosh151 The limit applies only to the name.

@ncdc
Contributor Author

ncdc commented Mar 2, 2015

@stevvooe wrote in #10740 (comment) about having a parser that can return an "image object reference". Right now, parsers.ParseRepositoryTag takes a string, parses it, and returns a repository string and a tag string. In my prototype for this feature, I modified this method to return either the digest or the tag in the 2nd returned string. Doing it this way means that the remaining changes to support referring to images either by tag or by digest are relatively minimal; however, it does muddy the waters a bit, since ParseRepositoryTag is now returning something that is either a digest or a tag. I've thought about a few possibilities for making this cleaner:

  1. Rename ParseRepositoryTag to ParseRepositoryReference, so it's clearer that it's not always a tag that comes back
  2. Do the rename from above, but also modify the signature to return (repository, tag, digest), where only 1 of tag and digest is ever set at a time
  3. Return a type, perhaps called ImageReference, that looks like this:
type ImageReference struct {
	repository string // or possibly registry.RepositoryInfo instead of string
	tag        string // optional tag
	digest     string // optional digest
}

This ImageReference option would be a more invasive change, as anywhere ParseRepositoryTag is called will have to be modified to work with a struct instead of 2 strings.

@icecrime @jfrazelle @crosbymichael what are your thoughts?

@miminar
Contributor

miminar commented Mar 2, 2015

I'm in favor of a new type, since parsing will then be done just once for every request. Loads of checks for the presence of ':' in a tag don't look nice. Passing along more than two values with different semantics encoded in a single string is getting cumbersome.

@stevvooe
Contributor

stevvooe commented Mar 3, 2015

@ncdc Option 3 above is the best approach, even if ImageReference is simply type ImageReference string with a few access methods. Stringly typed data is a no-no. ;)

@ncdc
Contributor Author

ncdc commented Mar 3, 2015

@stevvooe I'll do whatever you guys think makes the most sense. Do you want a type ImageReference string with

Repository() string
Tag() string
Digest() string

or does making it an actual struct make more sense? Should Repository() return a string or a RepositoryInfo?
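For comparison with the struct option, the string-backed variant with accessor methods could be sketched like this. It is a hypothetical illustration, not the code from the prototype or the PR: it assumes a reference is either name:tag or name@digest (never both), and `Repository()` returns a plain string rather than a RepositoryInfo.

```go
package main

import (
	"fmt"
	"strings"
)

// ImageReference is a string-backed reference of the form name, name:tag,
// or name@digest (hypothetical sketch; name:tag@digest is not handled).
type ImageReference string

// split breaks the reference into its parts. At most one of tag and digest
// is non-empty.
func (r ImageReference) split() (repository, tag, digest string) {
	s := string(r)
	if i := strings.Index(s, "@"); i >= 0 {
		return s[:i], "", s[i+1:]
	}
	// Only a ':' after the last '/' is a tag separator; an earlier ':'
	// belongs to a registry host:port.
	if i := strings.LastIndex(s, ":"); i > strings.LastIndex(s, "/") {
		return s[:i], s[i+1:], ""
	}
	return s, "", ""
}

// Repository returns the repository part of the reference.
func (r ImageReference) Repository() string {
	repository, _, _ := r.split()
	return repository
}

// Tag returns the tag, or "" if the reference has none.
func (r ImageReference) Tag() string {
	_, tag, _ := r.split()
	return tag
}

// Digest returns the digest, or "" if the reference has none.
func (r ImageReference) Digest() string {
	_, _, digest := r.split()
	return digest
}

func main() {
	for _, ref := range []ImageReference{
		"ubuntu:latest",
		"foo/bar@sha256:abc",
		"localhost:5000/foo",
	} {
		fmt.Printf("repository=%q tag=%q digest=%q\n",
			ref.Repository(), ref.Tag(), ref.Digest())
	}
}
```

The trade-off is that this variant re-parses the string on every accessor call, whereas a struct is parsed once and then carries its fields around.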

@ncdc
Contributor Author

ncdc commented Mar 3, 2015

Moving discussion of image reference to the PR here #11109 (comment)

@aluzzardi
Member

Was this solved by #11109?

@jessfraz
Contributor

yep should be @ncdc let me know if you disagree

@ncdc
Contributor Author

ncdc commented Apr 10, 2015

All good, thanks!

