Same image ID with different digests #22225

Closed
hustcat opened this Issue Apr 21, 2016 · 16 comments

hustcat commented Apr 21, 2016

docker 1.10.1:

# docker pull xxx/nfsol/logsystem:60063
60063: Pulling from nfsol/logsystem
ee54741ab35b: Pull complete 
424feef11a88: Pull complete 
3dbe0a00a30e: Pull complete 
056dc877c609: Pull complete 
7dfd21aa9c79: Pull complete 
faeff5a9c014: Pull complete 
62d4982d09e4: Pull complete 
Digest: sha256:5ab9da9eb64a0120bb24546851234b955bd94b1fc3cee9ab13a2c62b727e3818
Status: Downloaded newer image for xxx/nfsol/logsystem:60063


# docker pull xxx/logsystem:60063      
60063: Pulling from xxx/logsystem
ee54741ab35b: Already exists 
424feef11a88: Already exists 
3dbe0a00a30e: Already exists 
056dc877c609: Already exists 
7dfd21aa9c79: Already exists 
faeff5a9c014: Already exists 
62d4982d09e4: Already exists 
Digest: sha256:72a362ffedd68d75d65e19b210fa56d6cfa600b2fd8d34ad99a9be0e45347d20
Status: Downloaded newer image for xxx/logsystem:60063

Two tags point to the same image ID:

# docker images|grep logsystem
xxx/nfsol/logsystem   60063   1628895c65c6   2 days ago   186.1 MB
xxx/logsystem         60063   1628895c65c6   2 days ago   186.1 MB

Is this the correct behavior?

Ref to docker@distribution #1654

aaronlehmann commented Apr 28, 2016

This does look correct.

The Digest you see in the docker pull output is the digest of the manifest. Docker Hub is still using old-style manifests, which include the repository name inside the manifest. So the manifests for xxx/logsystem and xxx/nfsol/logsystem will have different digests.

The image ID, on the other hand, doesn't depend on the repository name, so it makes sense that the two entries in docker images would share the same ID.
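
To make the distinction concrete, here is a minimal Python sketch. It assumes a heavily simplified, unsigned stand-in for an old-style manifest (real schema 1 manifests are signed and carry more fields) and a made-up image configuration; the helpers `digest`, `canonical`, and `schema1_like_manifest` are hypothetical. Docker hashes the stored bytes rather than re-serializing JSON, so this only illustrates the principle: a hash over a document that embeds the repository name differs per repository, while a hash over the configuration alone does not.

```python
import hashlib
import json


def digest(payload: bytes) -> str:
    """Return a Docker-style 'sha256:<hex>' digest of the given bytes."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def canonical(obj) -> bytes:
    """Serialize deterministically so equal objects hash equally (an assumption of this sketch)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Made-up layer digest and image configuration, for illustration only.
layer = digest(b"example layer contents")
config = {
    "architecture": "amd64",
    "os": "linux",
    "rootfs": {"type": "layers", "diff_ids": [layer]},
}


def schema1_like_manifest(repo_name: str, tag: str) -> dict:
    """Simplified stand-in for an old-style manifest: note the embedded repository name."""
    return {
        "schemaVersion": 1,
        "name": repo_name,
        "tag": tag,
        "fsLayers": [{"blobSum": layer}],
    }


# The image ID depends only on the configuration, which in turn references the layers.
image_id = digest(canonical(config))

# The manifest digest changes with the repository name, even for identical content.
digest_a = digest(canonical(schema1_like_manifest("xxx/nfsol/logsystem", "60063")))
digest_b = digest(canonical(schema1_like_manifest("xxx/logsystem", "60063")))

print("image ID:", image_id)
print("manifest digests differ:", digest_a != digest_b)
```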

I'll close this issue, but please don't hesitate to ask if you have any followup questions. If you're interested in learning more about manifests, there are specs for both the old-style and new-style manifests at https://github.com/docker/distribution/blob/master/docs/spec/manifest-v2-1.md and https://github.com/docker/distribution/blob/master/docs/spec/manifest-v2-2.md. For more information on image IDs, look at #22264.

duyanghao commented Jun 30, 2016

So, can I take it that the image ID shown in the docker images output uniquely identifies an image in newer Docker versions, just as it does in older versions?
If so, how can I get an image's parent image ID from its image ID or image name using Registry V2? @aaronlehmann

aaronlehmann commented Jun 30, 2016

@duyanghao: I don't think this is possible. The registry doesn't know about the concept of parent images. It handles each image separately.

duyanghao commented Jul 1, 2016

Could you answer the following question:

So, can I take it that the image ID shown in the docker images output uniquely identifies an image in newer Docker versions, just as it does in older versions? @aaronlehmann

aaronlehmann commented Jul 1, 2016

Yes, the image ID is a unique identifier for each image. It will only be the same if the image has exactly the same filesystem and configuration.

duyanghao commented Jul 1, 2016

So, do you mean that the image ID uniquely identifies each image in both newer and older Docker versions?
But as far as I know, in older Docker versions the image ID is randomly generated, so different images could end up sharing the same image ID.
Can that happen in newer Docker versions? @aaronlehmann

aaronlehmann commented Jul 1, 2016

> So, do you mean that the image ID uniquely identifies each image in both newer and older Docker versions?

It does in Docker 1.10 and higher, but in earlier versions, it was not guaranteed to be unique.

> But as far as I know, in older Docker versions the image ID is randomly generated, so different images could end up sharing the same image ID.

Yes, in Docker versions older than 1.10 it's possible to have different images with colliding IDs. This is not possible in Docker 1.10 and newer because the image IDs are based on secure hashes of the image filesystem and configuration.
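
A small Python sketch of the difference, assuming the pre-1.10 scheme can be modelled as a random 256-bit value (as described in the question) and the 1.10+ scheme as a SHA-256 hash over the image configuration bytes. The function names and sample configurations are hypothetical.

```python
import hashlib
import secrets


def legacy_style_id() -> str:
    """Model of a pre-1.10 ID: random, so unrelated to the image's content."""
    return secrets.token_hex(32)


def content_addressed_id(config_bytes: bytes) -> str:
    """Model of a 1.10+ ID: a SHA-256 hash of the image configuration bytes."""
    return "sha256:" + hashlib.sha256(config_bytes).hexdigest()


# Two made-up configurations that differ in a single field.
config = b'{"os":"linux","rootfs":{"type":"layers","diff_ids":[]}}'
changed = b'{"os":"linux","rootfs":{"type":"layers","diff_ids":[]},"author":"x"}'

# The same content always yields the same ID.
same = content_addressed_id(config) == content_addressed_id(config)

# Different content yields a different ID (barring a hash collision).
different = content_addressed_id(config) != content_addressed_id(changed)

# A random ID carries no information about content.
random_id = legacy_style_id()

print(same, different, random_id)
```

With a content-derived ID, two hosts that hold byte-identical configurations arrive at the same ID without coordinating, which a random scheme cannot offer.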

duyanghao commented Jul 1, 2016

Thanks so much for the reply! @aaronlehmann

duyanghao commented Sep 21, 2016

@aaronlehmann Three more questions:
Question 1
What is the difference between Image Manifest Version 2, Schema 2 and Image Manifest Version 2, Schema 1?

I've looked at the following documents:
https://github.com/docker/distribution/blob/master/docs/spec/manifest-v2-1.md
https://github.com/docker/distribution/blob/master/docs/spec/manifest-v2-2.md
#18785

As you put it:

> The new format allows end-to-end content addressability. The existing v2 manifest format puts image configurations in "v1Compatibility" strings that use the same data model and non-content-addressable ID scheme that the legacy v1 protocol uses. Supporting this format with the content addressable image/layer model in Docker 1.10 involves hacks to do things like generate fake v1 IDs. The new format carries the actual image configuration as a blob, so push/pull transfers an exact copy of the image.
> ....
>
> One nice side effect of this PR is that the hacks described above for assembling legacy manifests are moved out of the engine code into vendored distribution code. The distribution APIs now have a ManifestBuilder interface that abstracts away the job of creating a manifest in either the old or new format.

I don't really understand what this means.
Could you explain in detail why you replaced Schema 1 with Schema 2 (leaving aside the fat manifest)?

Question 2
Why is there no image ID field in the Schema 1 manifest, as described below?
https://github.com/docker/distribution/blob/master/docs/spec/manifest-v2-1.md

And how can I get the image ID from Registry V2 if I use Schema 1?

Question 3
As far as I know, there is almost no scenario in which Docker uses the fat manifest.
So why does Schema 2 support the fat manifest?
And in what scenario will Docker use it?

aaronlehmann commented Sep 21, 2016

> Could you explain in detail why you replaced Schema 1 with Schema 2 (leaving aside the fat manifest)?

Schema 1 is designed for a model where every layer of an image is actually a runnable image. That's not how Docker works anymore. Since the switch to content addressability, layers are just filesystem diffs. When the image model changed, it made sense to switch to a manifest format that fits the new model.

https://gist.github.com/aaronlehmann/b42a2eaf633fc949f93b has a lot of detail on this.
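
As a rough illustration of the newer model, the sketch below builds a schema 2 style manifest in Python. The field names and media types follow the manifest-v2-2 spec; the configuration and layer bytes are made up, and `blob_descriptor` is a hypothetical helper. It shows the manifest referencing the configuration as a blob by digest, with layers listed as plain blobs rather than as runnable images.

```python
import hashlib
import json


def blob_descriptor(media_type: str, payload: bytes) -> dict:
    """Describe a blob by media type, size, and content digest."""
    return {
        "mediaType": media_type,
        "size": len(payload),
        "digest": "sha256:" + hashlib.sha256(payload).hexdigest(),
    }


# Made-up contents standing in for a real image configuration and layer tarballs.
config_bytes = json.dumps({"architecture": "amd64", "os": "linux"}).encode("utf-8")
layer_blobs = [b"layer 1 tarball bytes", b"layer 2 tarball bytes"]

manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": blob_descriptor(
        "application/vnd.docker.container.image.v1+json", config_bytes
    ),
    "layers": [
        blob_descriptor("application/vnd.docker.image.rootfs.diff.tar.gzip", blob)
        for blob in layer_blobs
    ],
}

# To my understanding, the digest of the configuration blob is the value
# that Docker 1.10+ reports as the image ID.
image_id = manifest["config"]["digest"]
print(image_id)
print(len(manifest["layers"]), "layers, none of which is an image on its own")
```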

> Why is there no image ID field in the Schema 1 manifest, as described below?

As mentioned above, schema 1 treats every layer like a separate image. Each layer has a v1Compatibility string which includes an id key. That's the pre-1.10 image ID. But this isn't the ID that recent versions of Docker will use. Now that images are content-addressable, the ID is based on a hash of the configuration and layers. Basically, Docker 1.10+ has to migrate schema 1 images to the new format. That's why we replaced schema 1.
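
For illustration, a short Python sketch of reading those legacy IDs out of a schema 1 manifest. The manifest here is a hypothetical, trimmed-down example with made-up IDs and repository name; only the `history` / `v1Compatibility` layout is taken from the manifest-v2-1 spec, and `legacy_ids` is a hypothetical helper.

```python
import json

# Hypothetical, trimmed-down schema 1 manifest; IDs and repository name are made up.
manifest = {
    "schemaVersion": 1,
    "name": "example/repo",
    "tag": "latest",
    "history": [
        {"v1Compatibility": json.dumps({"id": "c" * 64, "parent": "b" * 64})},
        {"v1Compatibility": json.dumps({"id": "b" * 64, "parent": "a" * 64})},
        {"v1Compatibility": json.dumps({"id": "a" * 64})},
    ],
}


def legacy_ids(schema1_manifest: dict) -> list:
    """Return the pre-1.10 style IDs from each history entry, in manifest order."""
    ids = []
    for entry in schema1_manifest["history"]:
        # Each v1Compatibility value is itself a JSON document stored as a string.
        v1 = json.loads(entry["v1Compatibility"])
        ids.append(v1["id"])
    return ids


ids = legacy_ids(manifest)
# These are legacy per-layer IDs, not the content-addressable Docker 1.10+ image ID.
print(ids)
```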

> As far as I know, there is almost no scenario in which Docker uses the fat manifest.
> So why does Schema 2 support the fat manifest?
> And in what scenario will Docker use it?

The plan is to use it in the future to support images that can be pulled on multiple platforms. Have a look at https://github.com/estesp/manifest-tool
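
A minimal Python sketch of how a client might pick a platform-specific manifest out of a fat manifest (manifest list). The overall structure follows the manifest list format in the manifest-v2-2 spec; the digests and sizes are made up, `select_manifest` is a hypothetical helper, and a real client would likely consider more platform fields than the two matched here.

```python
from typing import Optional

# Hypothetical manifest list; digests and sizes are made up.
manifest_list = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 1234,
            "digest": "sha256:" + "1" * 64,
            "platform": {"architecture": "amd64", "os": "linux"},
        },
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 1234,
            "digest": "sha256:" + "2" * 64,
            "platform": {"architecture": "arm64", "os": "linux"},
        },
    ],
}


def select_manifest(mlist: dict, os_name: str, architecture: str) -> Optional[str]:
    """Return the digest of the first entry matching the platform, or None."""
    for entry in mlist["manifests"]:
        platform = entry["platform"]
        if platform["os"] == os_name and platform["architecture"] == architecture:
            return entry["digest"]
    return None


# The same tag can then resolve to a different image manifest per platform.
print(select_manifest(manifest_list, "linux", "arm64"))
print(select_manifest(manifest_list, "windows", "amd64"))
```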

duyanghao commented Oct 31, 2016

@aaronlehmann Another question:
Why is the config.Image field not described in this document:
https://github.com/aaronlehmann/docker/blob/updated-image-spec/image/spec/v1.1.md
Does it hold the parent image ID in Docker 1.10+?
And what about the other fields (such as ContainerConfig, Size, and VirtualSize) that do not appear in that document?

For example:
Docker version: 1.11.0

[
    {
        "Id": "sha256:2b519bd204483370e81176d98fd0c9bc4632e156da7b2cc752fa383b96e7c042",
        "RepoTags": [
            "x.x.x.x/duyanghao/busybox:v0"
        ],
        "RepoDigests": [],
        "Parent": "",
        "Comment": "",
        "Created": "2016-08-18T06:13:28.269459769Z",
        "Container": "7dfa08cb9cbf2962b2362b1845b6657895685576015a8121652872fea56a7509",
        "ContainerConfig": {
            "Hostname": "xxxx",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/bin/sh",
                "-c",
                "dd if=/dev/zero of=file bs=10M count=1"
            ],
            "ArgsEscaped": true,
            "Image": "sha256:9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": [],
            "Labels": {}
        },
        "DockerVersion": "1.11.0-dev",
        "Author": "",
        "Config": {
            "Hostname": "xxxx",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "sh"
            ],
            "ArgsEscaped": true,
            "Image": "sha256:9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": [],
            "Labels": {}
        },
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 374833837,
        "VirtualSize": 374833837,
        "GraphDriver": {
            "Name": "aufs",
            "Data": null
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a",
                "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef",
                "sha256:d13087c084482a01b15c755b55c5401e5514057f179a258b7b48a9f28fde7d06"
            ]
        }
    }
]

duyanghao commented Oct 31, 2016

As far as I know, in Docker 1.10+ there is no parent field in the image configuration JSON.
So why not store the parent image ID?

aaronlehmann commented Oct 31, 2016

> Why is the config.Image field not described in this document:
> https://github.com/aaronlehmann/docker/blob/updated-image-spec/image/spec/v1.1.md
> Does it hold the parent image ID in Docker 1.10+?
> And what about the other fields (such as ContainerConfig, Size, and VirtualSize) that do not appear in that document?

The fields in Config and ContainerConfig are settings the engine uses to run the image. They aren't really related to image distribution.

I think Image is the name or ID of the image that the builder was running when it created this one. So in a sense, it's the parent. But I wouldn't rely on this field at all. It could have a symbolic name that's not an ID at all.

> As far as I know, in Docker 1.10+ there is no parent field in the image configuration JSON.
> So why not store the parent image ID?

We would not be able to trust this information, since an image could claim any other image as its parent.
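
One thing that can be checked locally without trusting a self-declared parent is whether one image's layer list is a prefix of another's. The Python sketch below assumes two RootFS.Layers lists as printed by docker inspect, ordered bottom layer first. The full list reuses the digests from the inspect output earlier in this thread; the shorter one is a made-up truncation, and `shares_base_layers` is a hypothetical helper. A match only shows that the layers are shared, not which image was actually named in FROM.

```python
def shares_base_layers(base_layers: list, child_layers: list) -> bool:
    """True if the candidate base's layers appear, in order, at the bottom of the child."""
    if not base_layers or len(base_layers) > len(child_layers):
        return False
    return child_layers[: len(base_layers)] == base_layers


# RootFS.Layers of the image from the inspect output in this thread.
child = [
    "sha256:ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a",
    "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef",
    "sha256:d13087c084482a01b15c755b55c5401e5514057f179a258b7b48a9f28fde7d06",
]

# Made-up candidates: one sharing the two bottom layers, one unrelated.
candidate_base = child[:2]
unrelated = ["sha256:" + "0" * 64]

print(shares_base_layers(candidate_base, child))
print(shares_base_layers(unrelated, child))
```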

duyanghao commented Nov 1, 2016

Why could an image claim any other image as its parent? There is a created field in config.history, so I guess it can distinguish different images that share the same top layers.

In my opinion, the parent image is the image that is directly inherited from.

For example, with this Dockerfile:

$ docker build -t svendowideit/ambassador .
 FROM alpine:3.2
 MAINTAINER SvenDowideit@home.org.au
...

Here the image alpine:3.2 is the parent image of the svendowideit/ambassador image.

EugeniuZ commented Jun 11, 2018

Hi,

I have two images with the same name, tag, and image ID but different digests. Under what conditions is this possible?

REPOSITORY                                              TAG                 DIGEST                                                                    IMAGE ID            CREATED              SIZE
...
xxx.dkr.ecr.eu-central-1.amazonaws.com/myimage          dev-init            sha256:248f8d24c61686d22f02fb3a1dcd03a8f719d8445eb810e18df2922559c4331b   0ebe1094cc29        6 hours ago          679 MB
xxx.dkr.ecr.eu-central-1.amazonaws.com/myimage          dev-init            sha256:6bd597ef01e44b7218c2c5a593f2e91ef090729dabb145eb22feb4909e93e5c7   0ebe1094cc29        6 hours ago          679 MB
...

In the AWS ECR repository, only the second one is visible (sha256:6bd597...).

Docker version:

docker version
Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   7392c3b/17.03.1-ce
 Built:        Tue May 30 17:59:44 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   7392c3b/17.03.1-ce
 Built:        Tue May 30 17:59:44 2017
 OS/Arch:      linux/amd64
 Experimental: false

tomkcook commented Oct 25, 2018

> This is not possible in Docker 1.10 and newer because the image IDs are based on secure hashes of the image filesystem and configuration.

This is not exactly right; it is still possible for the hash to collide for different inputs, since hash functions take arbitrary inputs and produce a fixed range of outputs. In the case of SHA256, there are 2^256 possible outputs, which is roughly equal to 10^77, or perhaps about one hash value for every ten atoms in the observable universe (the number of atoms is somewhat uncertain; estimates start at about 10^78 up to perhaps 10^82).

Because of the "birthday effect", the likelihood of a collision is a bit higher than you might expect: the probability of a collision between any two given images is one in 2^256, but the probability of at least one collision reaches about 40% among sqrt(2^256) images, that is 2^128, or about 10^38.5. That is still a lot of images you can generate before a collision becomes likely; if you had a trillion computers each generating a trillion random images per second, it would take just over ten million years to generate that many.

But a collision between any two images is still possible, just very, very improbable.
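
The figures above can be reproduced with a few lines of Python. This is a sketch using the standard birthday approximation P ≈ 1 - exp(-n^2 / (2N)); the variable names are arbitrary, and the generation rate is the hypothetical one from the comment (a trillion machines, a trillion items each per second).

```python
import math

HASH_BITS = 256
space = 2 ** HASH_BITS       # number of possible SHA-256 outputs
n = 2 ** (HASH_BITS // 2)    # sqrt(2^256) = 2^128 items

# Birthday approximation: P(collision) ~= 1 - exp(-n^2 / (2 * space)).
exponent = n * n / (2 * space)
p_collision = -math.expm1(-exponent)

# Time to generate n items at 10^12 machines x 10^12 items per second.
rate_per_second = 1e12 * 1e12
seconds_per_year = 365.25 * 24 * 3600
years = n / rate_per_second / seconds_per_year

print(f"possible outputs: about 10^{math.log10(space):.1f}")
print(f"items at the birthday bound: about 10^{math.log10(n):.1f}")
print(f"P(collision among 2^128 items): {p_collision:.3f}")
print(f"years to generate 2^128 items: {years:.3g}")
```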
