[Proposal]: docker diff between image layers #12641

Open
tcurdt opened this issue Apr 22, 2015 · 31 comments
Labels
area/distribution kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny

Comments

@tcurdt

tcurdt commented Apr 22, 2015

As discussed on IRC it would be great to be able to diff two images to see the changes on the file system level. The pragmatic approach would be to integrate it into the history command.

cpuguy83:   tcurdt: No, there's no built in way to get a diff between two layers.
tcurdt: cpuguy83 thx. do you think that would be hard to implement?
cpuguy83:   Would be cool if that was part of the history message
cpuguy83:   tcurdt: I wouldn't think so since we already have to get the diff to create the layer.
vdemeester: tcurdt: that would be easy, yes; as cpuguy83 said, it's already implemented for the diff command
vdemeester: it's just you can't say diff layer1..layer2 in the cli
tcurdt: cpuguy83 vdemeester : shall I open an issue for that and take it from there?
tcurdt: feature request really
cpuguy83:   tcurdt: Sure, sounds cool!
@vdemeester
Member

The most important thing would be how to "design" the command/arguments of the diff command. The implementation should not be that difficult.

Should we model it on the git diff command?

# difference using images
# using tags
$ docker diff ubuntu 14.04..12.04
# using tags and ~
$ docker diff ubuntu 14.04..14.04~1
# using the latest tags
$ docker diff ubuntu 12.04 # same as latest..12.04
# difference between 2 layer using hashes
$ docker diff ubuntu 51a9c7c1f8bb..5f92234dcf1e

This raises some questions:

  • Are we doing a simple two-layer comparison, or, as my example above supposes, should we walk the history and display the diff between every layer in the range?
  • What about containers? Should we be able to pass a container as well as images?
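
A minimal sketch of how the git-like argument syntax in the examples above could be parsed. The names `Ref`, `DiffRange`, `parse_ref`, and `parse_range` are hypothetical, and nothing here reflects an existing Docker CLI feature; it only assumes that `A..B` separates two references, that `~N` means "N parents back", and that a lone reference is shorthand for `latest..REF`.

```python
import re
from typing import NamedTuple


class Ref(NamedTuple):
    name: str       # tag or abbreviated layer ID, e.g. "14.04" or "51a9c7c1f8bb"
    ancestors: int  # how many parents to walk back (from a "~N" suffix)


class DiffRange(NamedTuple):
    left: Ref
    right: Ref


# A name without "~", optionally followed by "~" and a number.
_REF = re.compile(r"^(?P<name>[^~]+)(?:~(?P<count>\d+))?$")


def parse_ref(text: str) -> Ref:
    match = _REF.match(text)
    if match is None:
        raise ValueError(f"invalid reference: {text!r}")
    return Ref(match.group("name"), int(match.group("count") or 0))


def parse_range(spec: str, default_left: str = "latest") -> DiffRange:
    left, separator, right = spec.partition("..")
    if not separator:
        # A single reference is treated as shorthand for "latest..REF".
        return DiffRange(parse_ref(default_left), parse_ref(spec))
    return DiffRange(parse_ref(left), parse_ref(right))


print(parse_range("14.04..14.04~1"))
```

Whether such a range should be expanded into every intermediate layer or compared as two endpoints is exactly the open question in the first bullet; the parser leaves that decision to the caller.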

@tcurdt you might want to add a "Proposal:" prefix to your title, I guess :)
/cc @cpuguy83 @jfrazelle @crosbymichael

@tcurdt tcurdt changed the title docker diff between image layers [Proposal]: docker diff between image layers Apr 23, 2015
@tcurdt
Author

tcurdt commented Apr 23, 2015

@vdemeester true :) done!

@tcurdt
Author

tcurdt commented Apr 23, 2015

@vdemeester maybe we should start from the user stories. I would like to track file-system changes across the different layers and see their impact on the image size. So history-like output across multiple layers would be great; the flexibility to pick a range of layers would be even better. But either would work for my story.

As for the interface, I am wondering whether it should be a different command.

docker diff [OPTIONS] CONTAINER

is on the container level. My proposal was for the image level.

docker diffi [OPTIONS] IMAGE IMAGE

or have it included in the history command via option.

The problem I see with diffi compared to history is that users would expect to be able to diff image layers that share no common ancestor. I assume that's much harder to implement, and that the history route would be much easier. Maybe something like

docker history [OPTIONS] IMAGE [IMAGE..IMAGE]

could work.

@TomasTomecek
Contributor

To be honest, I would also like to see changes in json.

@ashwinphatak

#12919 is an effort in this direction. For now, it always produces a diff with respect to the parent.

root@156c7024a18c:/go/src/github.com/docker/docker# docker run -it busybox
/ # touch hello.txt
/ # exit

root@156c7024a18c:/go/src/github.com/docker/docker# docker commit agitated_hypatia
c1d613fa2117ec46364470fe9fdc44777191d6eebef2b05ed3e387ffc010de9c

root@156c7024a18c:/go/src/github.com/docker/docker# docker diffi c1d6
C /root
A /root/.ash_history
A /hello.txt
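
A rough sketch of how the A/C/D markers in the output above could be derived from two file listings. It assumes each listing maps a path to some fingerprint (a checksum, or size plus mtime); the function name `diff_listings` and the fingerprint strings are illustrative and are not taken from #12919.

```python
from typing import Dict, List, Tuple


def diff_listings(parent: Dict[str, str], child: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (kind, path) pairs: A = added, C = changed, D = deleted."""
    changes = []
    for path in sorted(set(parent) | set(child)):
        if path not in parent:
            changes.append(("A", path))
        elif path not in child:
            changes.append(("D", path))
        elif parent[path] != child[path]:
            changes.append(("C", path))
    return changes


# Fingerprints are placeholders; a directory counts as changed when its
# fingerprint (for example its mtime) differs.
parent = {"/root": "dir@1"}
child = {"/root": "dir@2", "/root/.ash_history": "file@1", "/hello.txt": "file@2"}
for kind, path in diff_listings(parent, child):
    print(kind, path)
```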

@tcurdt
Author

tcurdt commented May 1, 2015

@ashwinphatak Cool!

The really interesting bit for me is the size of each change, though, and that does not seem easy to get.

I also think that if we add a diffi command it should be diffi <image> <image>.
As it stands, it is not much different from the history command.

@ashwinphatak

@tcurdt could you paste mock/sample output, as you envision it, for the proposed command?

@tcurdt
Author

tcurdt commented May 1, 2015

This is not a final proposal for the output, just a way to show the general idea I was after when I opened the feature request:

$ docker history -v tcurdt/registry
IMAGE               CREATED             CREATED BY                                      SIZE
22f80fdc11ff        8 days ago          /bin/sh -c #(nop) EXPOSE 5000/tcp               0 B
9e3be3005c27        8 days ago          /bin/sh -c make PREFIX=/go clean binaries       25.53 MB
C /root 0
A /root/.ash_history +500
A /hello.txt +1200
3f32c1d419f8        8 days ago          /bin/sh -c #(nop) COPY dir:b08f6342f03cace7a7   5.351 MB
A /dir 5351000
d81c9168350b        8 days ago          /bin/sh -c #(nop) ENV GOPATH=/go                0 B
c502fc1d60fb        8 days ago          /bin/sh -c mkdir -p /go/src /go/bin && chmod    0 B
A /go/src 0
A /go/bin 0
4e5daed4ed3a        8 days ago          /bin/sh -c #(nop) ENV PATH=/usr/src/go/bin:/u   0 B
c9588f17f56f        8 days ago          /bin/sh -c cd /usr/src/go/src && ./make.bash    97.4 MB
A /usr/local/go +10000000
C /etc/defaults/bla +100
C /etc/defaults/other -5000
D /etc/defaults/something -50000
731974acc727        8 days ago          /bin/sh -c curl -sSL https://golang.org/dl/go   39.69 MB
A /tmp/bla.tgz 1000000
b520ce7bb832        8 days ago          /bin/sh -c #(nop) ENV GOLANG_VERSION=1.4.2      0 B
daf2888e8aee        8 days ago          /bin/sh -c apt-get update && apt-get install    88.32 MB
A /usr/local/newbin +10000
A /var/lib/apt/packgeindex +10000
9faf01c9b0ef        9 days ago          /bin/sh -c #(nop) ADD file:a50d79b5ca65ccabb5   125.1 MB
A /usr/local/go +10000

The final output should of course be more consistent.
And for a real image diff it would be much more complicated.

@tcurdt
Author

tcurdt commented May 1, 2015

Since the output is actually a two-level hierarchy, maybe we should define the JSON first and then render it as we see fit.
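
One possible shape for that JSON, as a sketch only: a list of layers, each carrying its own list of file changes. The field names (`id`, `created_by`, `size`, `changes`, `kind`, `path`, `delta`) are assumptions, and the values are copied from two rows of the mock-up above with byte counts that are purely illustrative.

```python
import json
from typing import Dict, List

history: List[Dict] = [
    {
        "id": "9e3be3005c27",
        "created_by": "/bin/sh -c make PREFIX=/go clean binaries",
        "size": 25530000,
        "changes": [
            {"kind": "C", "path": "/root", "delta": 0},
            {"kind": "A", "path": "/root/.ash_history", "delta": 500},
            {"kind": "A", "path": "/hello.txt", "delta": 1200},
        ],
    },
    {
        "id": "d81c9168350b",
        "created_by": "/bin/sh -c #(nop) ENV GOPATH=/go",
        "size": 0,
        "changes": [],
    },
]


def render(layers: List[Dict]) -> str:
    """Render the two-level structure as indented text, one layer per block."""
    lines = []
    for layer in layers:
        lines.append(f"{layer['id']}  {layer['created_by']}  {layer['size']} B")
        for change in layer["changes"]:
            lines.append(f"  {change['kind']} {change['path']} {change['delta']:+d}")
    return "\n".join(lines)


print(json.dumps(history, indent=2))
print(render(history))
```

Keeping the data model separate from the renderer means the same structure could feed a table, a tree view, or machine consumers.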

@ashwinphatak

thanks, I'll investigate options to get the size for each change.

@vdemeester
Member

@ashwinphatak related to #12919 you might wanna follow the "Design Proposal" of the Advanced contributing page.

@ashwinphatak

@vdemeester sounds good. I might have jumped the gun a bit on the implementation. Let's discuss any related user stories first and then I'll write up a design proposal.

@thaJeztah thaJeztah added Distribution kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny labels May 4, 2015
@dieend

dieend commented May 13, 2015

+1 for this feature.

Another use case:
Sometimes people create a new image layer without a Dockerfile (for example, with Packer we can use Chef or Puppet). That way they do not have full control over the temporary files the provisioner leaves behind. I would like to be able to see what changed in the new layer, so I can add a cleanup step to the provisioning/build step and reduce the final layer/image size.

Something like this, maybe:

> docker diff layer1 layer2 --path /opt/
+/opt/sometempfolders    +190mb

> docker diff layer1 layer2 --path /opt/sometempfolders
+/opt/sometempfolders/newtempfiles    +150mb
+/opt/sometempfolders/newtempfolder   +40mb

> docker diff layer1 layer2 --path /opt/sometempfolders/newtempfiles
#file difference like git diff

And maybe add a recursive option.
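
A small sketch of the drill-down idea above: given per-file size deltas between two layers, sum them under the immediate children of a path. The `--path` flag is only proposed in this comment and does not exist; `summarize` and the sample numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def summarize(deltas: Dict[str, int], prefix: str) -> Dict[str, int]:
    """Sum size deltas (in bytes) per immediate child of `prefix`."""
    base = prefix.rstrip("/")
    totals: Dict[str, int] = defaultdict(int)
    for path, delta in deltas.items():
        if not path.startswith(base + "/"):
            continue
        rest = path[len(base) + 1:]
        if not rest:
            continue  # the prefix directory itself, nothing below it
        child = rest.split("/", 1)[0]
        totals[base + "/" + child] += delta
    return dict(totals)


deltas = {
    "/opt/sometempfolders/newtempfiles/a": 100,
    "/opt/sometempfolders/newtempfiles/b": 50,
    "/opt/sometempfolders/newtempfolder/c": 40,
    "/usr/bin/tool": 7,
}
print(summarize(deltas, "/opt/"))
print(summarize(deltas, "/opt/sometempfolders"))
```

A recursive option would amount to calling `summarize` again on each returned child.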

@jberkus

jberkus commented Aug 3, 2015

+1 for this. I was just trying to debug why the Postgres 9.5 test container was so large, and this feature would have helped.

dieend's proposal for syntax and output seems fine to me.

@GordonTheTurtle

USER POLL

The best way to get notified when there are changes in this discussion is by clicking the Subscribe button in the top right.

The people listed below have appreciated your meaningful discussion with a random +1:

@doctapp
@henrysher
@curtiszimmerman
@jakirkham
@dragonfax
@chipironcin

@jessfraz jessfraz added kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny and removed kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny kind/proposal labels Sep 8, 2015
@cswarth

cswarth commented Nov 2, 2015

Discussion on this topic seems to have died. I hope that whatever proposal eventually moves forward will support diffing unrelated images. The immediate focus seems to have been on comparing different versions of the same image, or perhaps layers within an image.

Right now, this second, I would like to be able to compare two images that should be identical but are not. One is the result of an automated build on Docker Hub and the other was built locally. One works (runs to completion), the other does not. I'm currently comparing md5 checksums of every file in each image to find where they differ. It would be nice if there were a tool to help with this.
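
The checksum comparison described above can be sketched in a few lines, assuming each image's file system is available as a tar archive (for example the output of `docker export` on a container created from each image). SHA-256 is used here instead of md5, and `file_digests`, `compare`, and the in-memory `make_tar` demo helper are hypothetical names.

```python
import hashlib
import io
import tarfile
from typing import BinaryIO, Dict, List, Tuple


def file_digests(archive: BinaryIO) -> Dict[str, str]:
    """Map each regular file in a tar archive to its SHA-256 digest."""
    digests = {}
    with tarfile.open(fileobj=archive, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            stream = tar.extractfile(member)
            digest = hashlib.sha256()
            for chunk in iter(lambda: stream.read(1 << 20), b""):
                digest.update(chunk)
            digests[member.name] = digest.hexdigest()
    return digests


def compare(a: Dict[str, str], b: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (only in a, only in b, present in both but different)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_a, only_b, differing


def make_tar(files: Dict[str, bytes]) -> io.BytesIO:
    """Demo helper: build a small tar archive in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


hub = file_digests(make_tar({"etc/hosts": b"x", "bin/app": b"v1"}))
local = file_digests(make_tar({"etc/hosts": b"x", "bin/app": b"v2", "tmp/log": b""}))
print(compare(hub, local))
```

This only compares file contents; ownership, permissions, and timestamps would need the `TarInfo` metadata as well.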

@zhuguihua
Contributor

I have the following requirements:

  1. Make the output of "docker diff" resemble "git diff": show all changes between a container's RWLayer and its parent layer, so that we can see every change (not only the names of changed files, but also their changed contents) in a container directly.
  2. Turn all changes in a container into a patch, so that it can be applied to another container anywhere.

I plan to do like this:

  1. docker diff
    • show all changes in one container
  2. docker patch
    • make a patch of all changes
    • apply a patch into another container

Does anyone have other ideas? More comments on this are welcome. Thanks.

@thaJeztah
Member

You can docker commit the container as an image and then start new containers from that, or docker save the committed image and extract the last layer.

@zhuguihua
Contributor

@thaJeztah I know what you mean. But I think "docker patch" has some advantages:

  1. reduce the image size
  2. faster than "docker commit"
  3. faster and more convenient to pass the content to other users

"docker patch" is only a rough idea, though. My focus now is how to see all detailed changes in a container directly. Is "docker diff" a feasible way to do that?

@pavan123k

+1 @zhuguihua

we should be able to:

  1. create a diff and generate a patch
  2. apply the patch

img_B - img_A = patch_B

To generate img_B, I just have to compute img_A + patch_B.

Storing patch_B would take far less space, and transporting it would be faster and lighter on network bandwidth.
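
The arithmetic above can be illustrated with a toy model in which an image is just a flat mapping from path to file contents. That is a strong simplification: it ignores layers, metadata, permissions, and binary deltas, and `make_patch` and `apply_patch` are hypothetical names, not a proposed Docker API.

```python
from typing import Dict


def make_patch(img_a: Dict[str, bytes], img_b: Dict[str, bytes]) -> dict:
    """patch_B = img_B - img_A: files to write plus files to delete."""
    return {
        "put": {path: data for path, data in img_b.items() if img_a.get(path) != data},
        "delete": sorted(set(img_a) - set(img_b)),
    }


def apply_patch(img_a: Dict[str, bytes], patch: dict) -> Dict[str, bytes]:
    """img_B = img_A + patch_B."""
    result = {path: data for path, data in img_a.items() if path not in patch["delete"]}
    result.update(patch["put"])
    return result


img_a = {"bin/app": b"v1", "etc/conf": b"x", "tmp/junk": b"j"}
img_b = {"bin/app": b"v2", "etc/conf": b"x", "lib/new": b"n"}
patch_b = make_patch(img_a, img_b)
print(patch_b)
print(apply_patch(img_a, patch_b) == img_b)
```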

@stevvooe
Contributor

@zhuguihua @pavan123k While patching may seem like an obvious feature, docker images don't lend themselves well to a patch model, given the way they are currently represented.

It might be good to divide this issue into two parts.

  1. The concept of being able to see the differences between two images/containers is very helpful. The output doesn't need to represent a patch, as just giving some visibility into the difference is useful in itself.
  2. Proposal for support of a patch system between images.

Use case 1 above is a much simpler undertaking and I'd hate to hold it up because of the complexity of adding patch/apply. Use case 2 is very valid, but there is a whole lot of plumbing that needs to happen first.

@l3m

l3m commented Apr 22, 2016

I would be very interested to see changes. Doesn't need to be a full diff, but even just a list of touched filenames would be very helpful. For me, the main goal is to reduce image size, so being able to see what exactly contributes to the layer would be a great help.

@wrouesnel
Contributor

We have a slightly different use case for this functionality: tracking expectations when overriding vendor supplied files. The idea is when you do something like apt-get install -y SomePackage and then have subsequent commands which replace bits of it, we want to check that the files we're replacing haven't changed (and thus possibly changed meaning) without noticing.

docker diff functionality would make this a lot easier to implement, though the other missing piece is the ability to mount one container's rootfs into another's.

@Philmod

Philmod commented Jan 10, 2017

+1

@mikhail

mikhail commented Mar 21, 2017

Still interested in this. My use case: with a lot of automation around creating the image, one layer keeps changing and missing the cache even though no change is expected. Investigating why a simple ADD app /app produces a different layer every time it's run is painful without a diff.

@edmorley

edmorley commented Jul 11, 2017

It seems as though several features are being conflated here, which is hindering reaching consensus.

To increase chances of success, I propose limiting scope to a feature that just allows viewing what files each layer adds/modifies/deletes. ie: the equivalent of what would have been output by docker diff, had it been run prior to the temporary container being committed, when that layer was created.

This means for now punting on:

  • diffing across multiple layers / entirely different images
  • diffing file contents / generating patches
  • filtering by directory (downstream tooling can do this)
  • file sizes (since docker diff doesn't even list sizes at present, though that would be a great feature to add too)

I see the new output being surfaced by one or both of:

  1. docker history <IMAGE> when --verbose or, say, --show-file-changes is passed. The output would look like the mock-up in tcurdt's earlier comment on this issue (#12641 (comment)).
  2. docker inspect <LAYER SHA>. The output would look the same, except it would just be for that one layer.

Image optimisation use-cases this would facilitate:

  • Identifying files that were deleted/modified in a separate layer to the one that originally created them and which otherwise add invisible bloat to an image (this would make things like tianon/docker-brew-ubuntu-core#90, "Docker images contain 20MB of deleted /var/lib/apt/lists/ files", easier to spot).
  • Identifying which layer added stray temp/cache/... files that have already been noticed in the final image.
  • Actually making it easier to find those stray files amongst the noise in the first place (since running du or ls in the final image results in lots of noise from the base OS image which is likely already very optimised).
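
As a sketch of the first use case, the following finds files that one layer ships and a later layer hides. It assumes layers follow the whiteout convention of the OCI image layer format (a `.wh.<name>` entry deletes `<name>`, and `.wh..wh..opq` hides everything an earlier layer put in that directory) and that each layer is given as a mapping from tar entry path (no leading or trailing slash) to size in bytes. `hidden_bloat` is a hypothetical helper, not an existing Docker feature.

```python
import posixpath
from typing import Dict, List, Tuple

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"

Finding = Tuple[int, int, str, int]  # (adding layer, hiding layer, path, bytes)


def hidden_bloat(layers: List[Dict[str, int]]) -> List[Finding]:
    """Report non-empty files stored in one layer but hidden by a later one.

    `layers` is ordered oldest first.
    """
    live: Dict[str, Tuple[int, int]] = {}  # path -> (layer index, size)
    findings: List[Finding] = []

    def hide(path: str, by_layer: int) -> None:
        origin, size = live.pop(path)
        if size > 0:
            findings.append((origin, by_layer, path, size))

    for index, entries in enumerate(layers):
        # Whiteouts affect lower layers only, so apply them before this
        # layer's own files are recorded.
        for entry in sorted(entries):
            directory, name = posixpath.split(entry)
            if not name.startswith(WHITEOUT_PREFIX):
                continue
            if name == OPAQUE_MARKER:
                doomed = [p for p in live if directory == "" or p.startswith(directory + "/")]
            else:
                target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
                doomed = [p for p in live if p == target or p.startswith(target + "/")]
            for path in sorted(doomed):
                hide(path, index)
        for entry, size in sorted(entries.items()):
            if posixpath.basename(entry).startswith(WHITEOUT_PREFIX):
                continue
            if entry in live:
                hide(entry, index)  # overwritten: the old copy stays in the lower layer
            live[entry] = (index, size)
    return findings


layers = [
    {"var/lib/apt/lists/index": 20_000_000, "usr/bin/tool": 1000},
    {"var/lib/apt/lists/.wh.index": 0},
]
print(hidden_bloat(layers))
```

The per-layer listings could come from `tar tv`-style output of each layer archive; how those are obtained is left open here.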

Some questions:

  1. Does the general feature sound acceptable?
  2. Should it be part of docker history, docker inspect, both, or somewhere else entirely?
  3. The "Advanced contributing" link above is broken, but I found the page here. However, that guide seems to contradict itself, and I can't see other people following that process. I've filed docker/docs#3857 ("Advanced contributing: Design proposal section refers to two different repos"), if someone could take a look?

Many thanks :-)

@vlk-charles

This is a nice way to list the changed files in a layer:

ls "$(docker inspect --format '{{.GraphDriver.Data.UpperDir}}' <IMAGE>)"

This was tested on the overlay2 backend. Of course you can add -R to ls or do anything you want.

@santeriv

@vlk-charles Thanks. In case anybody else wonders what 'overlay2' is, check your storage driver using:

docker info

Mine said it was using aufs (and naturally the inspect snippet above does not work with that).

On my Ubuntu 16.04 (Docker version 17.05.0-ce, build 89658be) I had to create a file /etc/docker/daemon.json as described in https://docs.docker.com/engine/userguide/storagedriver/selectadriver/#check-and-set-your-current-storage-driver .
Then I restarted the docker service:

sudo service docker restart

If you were using 'aufs' before, do not be alarmed that your previous images and containers have disappeared.
Switch back to 'aufs' in daemon.json, restart docker, and they are back. :)

@dnk8n

dnk8n commented Aug 26, 2019

FYI

Got sent these links today, tools that might be of use to people subscribing to this issue:

https://github.com/wagoodman/dive

A tool for exploring each layer in a docker image

https://github.com/orisano/dlayer

dlayer is docker layer analyzer.

@jcaesar

jcaesar commented Nov 25, 2020

In case you're not using overlay2 but e.g. btrfs, this can get you a list of added files:

docker save $IMAGE | tar xfO - $(docker save $IMAGE | tar xfO - manifest.json | jq -r '.[]|.Layers|.[-1]') | tar tv

bit ugly. :/ Maybe there's a slightly easier way, for this kind of one-off solution?
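
One alternative to the shell pipeline, as a sketch: read the archive produced by `docker save IMAGE -o image.tar` once, look up the last layer in `manifest.json`, and list its entries. It assumes the archive contains a `manifest.json` whose first element has a `Layers` list of member paths; unlike the `jq` expression above it only looks at the first image in the archive, and it loads the whole layer into memory. `last_layer_listing` and `make_tar` are hypothetical names.

```python
import io
import json
import tarfile
from typing import BinaryIO, Dict, List, Tuple


def last_layer_listing(image_archive: BinaryIO) -> List[Tuple[str, int]]:
    """Return (path, size) for every entry in the image's topmost layer."""
    with tarfile.open(fileobj=image_archive, mode="r:*") as image:
        manifest = json.load(image.extractfile("manifest.json"))
        layer_name = manifest[0]["Layers"][-1]
        layer_bytes = image.extractfile(layer_name).read()
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:*") as layer:
        return [(member.name, member.size) for member in layer]


def make_tar(files: Dict[str, bytes]) -> io.BytesIO:
    """Demo helper: build a small tar archive in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


# A fake two-layer image archive standing in for real `docker save` output.
manifest = json.dumps([{"Layers": ["base/layer.tar", "top/layer.tar"]}]).encode()
fake_image = make_tar({
    "manifest.json": manifest,
    "base/layer.tar": make_tar({"old.txt": b""}).getvalue(),
    "top/layer.tar": make_tar({"hello.txt": b"hi"}).getvalue(),
})
print(last_layer_listing(fake_image))
```

With a real image, the call would be `last_layer_listing(open("image.tar", "rb"))`.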

@da-x

da-x commented Sep 19, 2023

If you reached here because you want to compact very similar docker images for which the build process cannot make use of layer caching, you can check da-x/deltaimage: a tool to generate and apply binary deltas between Docker images to optimize registry storage.
