Adds ability to flatten image after build #22641

Merged
merged 1 commit into from Nov 1, 2016

Conversation

@cpuguy83
Collaborator

@cpuguy83 cpuguy83 commented May 10, 2016

- What I did
Allow built images to be squashed to their parent.
Squashing does not destroy any images or layers, and preserves the build cache.

- How I did it
Introduce a new CLI argument, --squash, to docker build.
Introduce a new param, squash, to the build API endpoint.
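
For illustration, here is a minimal Python sketch of how a client might add the new parameter to the build request. The helper name build_endpoint is hypothetical, and sending squash=1 assumes the daemon treats 1 as true for boolean query parameters; the description above only states that a squash param exists.

```python
from urllib.parse import urlencode


def build_endpoint(tag: str, squash: bool) -> str:
    """Return the path and query string for a build request (hypothetical helper)."""
    params = {"t": tag}
    if squash:
        # Assumption: "1" is accepted as a truthy value for the squash param.
        params["squash"] = "1"
    return "/build?" + urlencode(params)


print(build_endpoint("test", squash=True))
```

Leaving squash off simply omits the parameter, so existing clients would keep their current behaviour.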

Once the build is complete, docker creates a new image that loads the diffs from each layer into a single new layer and references all of the parent's layers.
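
The merge step can be pictured with a small Python model. This is only a sketch under simplifying assumptions, not the Go code in this PR: each layer diff is a dict from path to full file contents, None stands in for a deletion (whiteout), and the names squash, parent_paths and build_diffs are invented for the example.

```python
from typing import Dict, List, Optional, Set

# A layer diff maps a path to its full new contents, or to None when the
# layer deletes that path (a stand-in for a whiteout entry).
LayerDiff = Dict[str, Optional[str]]


def squash(parent_paths: Set[str], diffs: List[LayerDiff]) -> LayerDiff:
    """Fold the build's layer diffs, oldest first, into a single diff."""
    merged: LayerDiff = {}
    for diff in diffs:
        for path, contents in diff.items():
            if contents is None and path not in parent_paths:
                # Created and removed during the build: nothing to record.
                merged.pop(path, None)
            else:
                # Later layers win; None records a deletion of a parent file.
                merged[path] = contents
    return merged


parent_paths = {"/bin/sh"}  # files provided by the FROM image (simplified)
build_diffs: List[LayerDiff] = [
    {"/hello": "hello\n"},          # RUN echo hello > /hello
    {"/hello": "hello\nworld\n"},   # RUN echo world >> /hello
    {"/remove_me": ""},             # RUN touch remove_me /remove_me
    {},                             # ENV HELLO world (config only, no files)
    {"/remove_me": None},           # RUN rm /remove_me
]
squashed = squash(parent_paths, build_diffs)
print(squashed)
```

In this model the squashed layer holds only /hello, whose contents are 12 bytes long, which is consistent with the 12 B merged layer in the docker history output below.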

- How to verify it

FROM busybox
RUN echo hello > /hello
RUN echo world >> /hello
RUN touch remove_me /remove_me
ENV HELLO world
RUN rm /remove_me
$ docker build -t test --squash .
...
$ docker history test
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
4e10cb5b4cac        3 seconds ago                                                       12 B                merge sha256:88a7b0112a41826885df0e7072698006ee8f621c6ab99fca7fe9151d7b599702 to sha256:47bcc53f74dc94b1920f0b34f6036096526296767650f223433fe65c35f149eb
<missing>           5 minutes ago       /bin/sh -c rm /remove_me                        0 B
<missing>           5 minutes ago       /bin/sh -c #(nop) ENV HELLO=world               0 B
<missing>           5 minutes ago       /bin/sh -c touch remove_me /remove_me           0 B
<missing>           5 minutes ago       /bin/sh -c echo world >> /hello                 0 B
<missing>           6 minutes ago       /bin/sh -c echo hello > /hello                  0 B
<missing>           7 weeks ago         /bin/sh -c #(nop) CMD ["sh"]                    0 B
<missing>           7 weeks ago         /bin/sh -c #(nop) ADD file:47ca6e777c36a4cfff   1.113 MB

Test the image: check that /remove_me is gone, that /hello contains hello\nworld, and that the HELLO env var's value is world.
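
These checks could be scripted along the following lines. It is a sketch, not part of the PR: verify_squashed_image and docker_cli are hypothetical helper names, and actually running it assumes a local docker CLI and daemon plus the image tagged test from the build above. The command runner is passed in as a parameter so the checks can also be exercised without docker.

```python
import subprocess
from typing import Callable, List


def docker_cli(argv: List[str]) -> str:
    """Run a command and return its stdout (needs a local docker CLI and daemon)."""
    return subprocess.check_output(argv, text=True)


def verify_squashed_image(
    image: str, run: Callable[[List[str]], str] = docker_cli
) -> None:
    """Raise AssertionError if the image fails any of the three checks."""

    def sh(script: str) -> str:
        # Execute a shell snippet inside a throwaway container of the image.
        return run(["docker", "run", "--rm", image, "sh", "-c", script])

    assert sh("cat /hello") == "hello\nworld\n", "/hello has unexpected contents"
    probe = "test -e /remove_me && echo present || echo gone"
    assert sh(probe).strip() == "gone", "/remove_me still exists"
    assert sh('echo "$HELLO"').strip() == "world", "HELLO is not set to world"
```

Calling verify_squashed_image("test") would then run all three checks against the freshly built image.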

- Description for the changelog

Add option to squash image layers to the FROM image after successful builds

- A picture of a cute animal (not mandatory but encouraged)

Some of the implementation is a little rough around the edges but wanted to get this out there.
I also really wanted to make sure that when using the full option the user is prompted to make sure they know what it's really doing.

@cpuguy83 cpuguy83 force-pushed the cpuguy83:build_finalization branch from d79671f to c59905f May 10, 2016
@cpuguy83 cpuguy83 changed the title from "Add image flattening" to "Adds ability to flatten image after build" May 10, 2016
@thaJeztah
Member

@thaJeztah thaJeztah commented May 10, 2016

Hm, Deja vu #4232?

@cpuguy83
Collaborator Author

@cpuguy83 cpuguy83 commented May 10, 2016

No, this is different. This is specifically for build

@cpuguy83 cpuguy83 force-pushed the cpuguy83:build_finalization branch 3 times, most recently from 412920d to edee69e May 10, 2016
@cpuguy83
Collaborator Author

@cpuguy83 cpuguy83 commented May 10, 2016

But also addresses the issue with image history.

Also, maybe not --flatten, that was just the only word I could think of at the time. I like --squash better, maybe be more specific like --squash-rootfs.
Open to suggestion here.

@cpuguy83
Collaborator Author

@cpuguy83 cpuguy83 commented May 10, 2016

Another thing to also consider is that perhaps for security reasons it would be undesirable for the image history to be preserved in the flattened image.

@cpuguy83 cpuguy83 force-pushed the cpuguy83:build_finalization branch from edee69e to 92d26f3 May 11, 2016
@justincormack
Contributor

@justincormack justincormack commented May 12, 2016

I wonder if it would make sense to also allow a number as an arg, so 0 is scratch, 1 is parent etc, so you can leave eg two layers if you commonly use a pair of layers for build (eg alpine plus your app base)?

@cpuguy83
Collaborator Author

@cpuguy83 cpuguy83 commented May 12, 2016

@justincormack But then they can just create that image as the base.

@justincormack
Contributor

@justincormack justincormack commented May 12, 2016

@cpuguy83 yes I guess they would do that.

@cpuguy83
Collaborator Author

@cpuguy83 cpuguy83 commented May 16, 2016

Thinking about this more, I'm wondering if we should make it default to squashing to the parent image.
This is because the only thing that will benefit from having these layers is the build machine itself, and the build machine will continue to have these layers.

With content addressability you can no longer push/pull a build cache, so it doesn't make sense to keep all the extra layers as part of the image.

We can then either provide an option to fully squash (to 1 layer)... or even defer on such a decision.

@thaJeztah
Member

@thaJeztah thaJeztah commented May 26, 2016

We discussed this in the maintainers meetup, and there's no consensus yet. Things discussed:

  • The --flatten=full didn't get much traction; probably best to always flatten to the FROM layer
  • We prefer to keep "layers" an implementation detail where possible, so rename the flag to something more universal (--optimize / --optimise 🇬🇧 , --finalize / --finalise 🇬🇧)
  • Should this be the default? (or make it optional first, with the chance to make it the default in future)
  • Should there be an option to make it the default (cli-config file possibly?)
  • Should there be a FLATTEN Dockerfile instruction, so that the author of the Dockerfile can specify that the image should always be flattened (can also be added in future)

Given that there's no consensus yet, we keep this in design review. We like the idea, however, so it stays open.

@alexpjohnson

@alexpjohnson alexpjohnson commented Jun 3, 2016

This would be nice for us. We have some security concerns around using SSH keys in our Dockerfile, so we're making a base image, using docker run to mount the ssh keys to run bundle install and then committing the result. This is obviously not optimal because we lose the benefit of the layer cache, so this solution might give us a path forward in that area.

@phpdude

@phpdude phpdude commented Jul 7, 2016

@thaJeztah Hi,

I am interested in this feature; it is a must-have for a lot of projects. But last night I got a better idea of how to do it!

We need to add a new instruction like ONBUILD; name it "SQUASH", for example. If an instruction is prefixed with SQUASH, the current command's fs changes must be squashed with the next instruction.

This makes it simple to keep the Dockerfile readable and lets us make "commits" to fs layers when we want. We don't need to squash the whole image, because that is stupid in many situations when you deploy with docker: if you squash everything, you upload and download the whole image every time, even if you already have 90% of the system on your server. We need to keep the ability to have more than one layer, along with the ability to squash them with docker.

Right now I build an intermediate image, squash it, and extend it with "FROM". This way I do double work: I build the fs with docker, download it, unzip, merge layers, zip, and upload to docker. Not a very good way.

I want to try to provide a new PR with this feature, but first I must check the source code to understand whether it is possible or too hard for me (I don't write in Go, but I write in a lot of languages and it could be an interesting task for me :).

What do you think? Before this idea I thought about a CLI flag too, but last night I got this cool idea, which makes the process very simple and manageable.

@docwhat

@docwhat docwhat commented Jul 7, 2016

Just to do an example to be sure I understand...

FROM alpine

COPY foobar.tgz /foobar.tgz
SQUASH
RUN tar xf /foobar.tgz && rm -f /foobar.tgz

So the COPY and RUN would be one layer and the /foobar.tgz file would not end up in any layer?

@cpuguy83
Collaborator Author

@cpuguy83 cpuguy83 commented Jul 7, 2016

@docwhat SQUASH has been proposed before and is generally undesirable.
Really even this is probably not going to be accepted because it is further exposing the concept of layers to the user which we don't want to do.

@phpdude

@phpdude phpdude commented Jul 7, 2016

@docwhat I think about something like

FROM alpine # will use all layers from parent image

SQUASH ADD /config/requirements.txt /requirements.txt

SQUASH RUN apt-get install -y python
RUN pip install /requirements.txt # new layer here

SQUASH EXPOSE 80
SQUASH RUN /install-all.sh
SQUASH COPY foobar.tgz /foobar.tgz
RUN tar xf /foobar.tgz && rm -f /foobar.tgz # new layer here
@phpdude

@phpdude phpdude commented Jul 7, 2016

@cpuguy83 you already did it with the ONBUILD command, which 99% of users don't need :)

My solution would let us make clean images without extra entries in history (we can join the history messages into one in the resulting layer) and get a very clean Dockerfile, without 3rd-party tools for a feature that should be in docker. Right now you can optimize your layers only by combining multiple RUN commands; for EXPOSE, ADD, COPY, etc. we can't, and we can squash those only with 3rd-party tools, which is not cool :(

One big starting point here is that we already DO IT. I just want to give the community a better (and much faster) way to do it.

@cpuguy83
Collaborator Author

@cpuguy83 cpuguy83 commented Jul 7, 2016

@phpdude ONBUILD is a builder instruction, nothing to do with layers.

@vincentwoo
Contributor

@vincentwoo vincentwoo commented Dec 9, 2016

You must start the daemon with the --experimental flag to test this feature
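
As a hedged aside: besides passing --experimental on the daemon command line, the same setting can be placed in the daemon configuration file. The Python below only renders that file's content; the path /etc/docker/daemon.json is the usual Linux default and is an assumption about your setup.

```python
import json

# Content for the daemon configuration file (assumed path on Linux:
# /etc/docker/daemon.json). The daemon must be restarted after changing it.
daemon_config = {"experimental": True}
rendered = json.dumps(daemon_config, indent=2)
print(rendered)
```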

@yellowmegaman

@yellowmegaman yellowmegaman commented Dec 10, 2016

@vincentwoo Oh thanks a bunch! Quick search yielded nothing on this flag before your reply.

@krm1

@krm1 krm1 commented Feb 15, 2017

Is there an indication when this feature can be used in production?

@cpuguy83
Collaborator Author

@cpuguy83 cpuguy83 commented Feb 15, 2017

@krm1 It's only in experimental because we are not sure that it is the right interface to expose. With it in experimental we can change it between versions (or replace it with something else).

@alanbrent

@alanbrent alanbrent commented Feb 15, 2017

@krm1 What's keeping you from using it now? As far as I can tell the resulting artifact is a Docker image like all other Docker images.

@cpuguy83 It really seems like the core concern for a --squash feature was nailed. Ultimately, the concerns around secrets are better solved in your build (CI) pipeline.

@krm1

@krm1 krm1 commented Feb 16, 2017

@cpuguy83 @alanbrent
I understood that in order to use the --squash feature I would have to run the Docker daemon with the --experimental flag. I would think that activating the --experimental flag would also enable other experimental features that could impact or influence Docker performance or behaviour in unknown ways.

@thaJeztah
Member

@thaJeztah thaJeztah commented Feb 16, 2017

I would think that activating the --experimental flag would also enable other experimental features that could impact or influence Docker performance or behaviour in unknown ways.

No, experimental features would not be in the main code path; in addition, those features go through the same process as regular features. The difference is that the design of the feature may change, and we're accepting feedback to change the feature based on that. Experimental just gives more freedom in that respect, knowing that we can change without causing a breaking change.

@goldmann
Contributor

@goldmann goldmann commented Feb 16, 2017

I think the question was about something else -- does the --experimental switch enable anything by default, or is every experimental feature disabled by default and merely made available once we set --experimental?

@vdemeester
Member

@vdemeester vdemeester commented Feb 16, 2017

For now, the --experimental flag is a feature flag for all experimental features, true. We could, in the future, have a configuration part where you could specify what part of experimental you want to be enabled ("experimental": { "squash", … }) but that's not the case for now.

@JonathonReinhart

@JonathonReinhart JonathonReinhart commented Feb 16, 2017

Considering that one must explicitly opt-in to this experimental squashing feature with --squash, requiring one to also globally enable all other experimental features with --experimental seems illogical.

@vdemeester
Member

@vdemeester vdemeester commented Feb 16, 2017

@JonathonReinhart not really, --squash is not meant to be a default behavior (i.e. when it will not be experimental anymore). The goal to have it in experimental right now is to see if it's useful and detect bugs we might have overlooked. I think it'll be out of experimental in the next release anyway 😉 👼

@buuck

@buuck buuck commented Mar 11, 2017

I just used the new --squash flag for removing several GB of build files after compiling ROOT. Seems to have worked great, thanks a bunch!

Is there a way to use this flag with the automated build system, and/or will that be available once it is out of experimental?

dnephin pushed a commit to dnephin/docker that referenced this pull request Apr 17, 2017
Adds ability to flatten image after build
liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017
When running on a kernel which is not patched for the copy up bug
overlay2 will use the naive diff driver.

cherry-pick from moby#28138
and backport some code from moby#22641

Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Signed-off-by: Lei Jitang <leijitang@huawei.com>
@WhisperingChaos
Contributor

@WhisperingChaos WhisperingChaos commented Dec 5, 2017

@buuck

Have you considered rewriting the Dockerfile to employ multi-stage building?

I certainly understand that you may not want to potentially radically change your Dockerfile and/or that you have a large inventory of pre-existing Dockerfiles that might cause this solution, at this time, to be prohibitively costly. That said, the large reduction you noticed is mostly the result of defining what should be in your image by excluding, through deletion, what shouldn't be there.

Using exclusion can be problematic, especially when you've designed your build, intentionally or accidentally, to seamlessly "adapt" to new versions of tooling that have semantically changed. For example, a compiler's set of exclusionary artifacts may have been altered, extended, and/or relocated to a different path with the introduction of a new version, causing the statically defined delete operation that once eliminated these artifacts to ignore them. This results in these unwanted artifacts remaining in the image. So the several gigabytes that squash removed may at some future point mysteriously reappear. Worse yet, failed exclusionary behavior may preserve an artifact that doesn’t noticeably increase the image’s size but presents a juicy exploit. Therefore, instead of relying on an exclusionary mechanism, I would recommend the encoding of an inclusionary one.

In adopting an inclusionary strategy, you must fully detail what it is you wish to exist in the resultant image. This is not as difficult as it sounds and has many benefits, including improving the security of your image and the resulting running container. In general, it's much easier to identify what needs to be included than excluded, as the desired outputs of build tooling represent its public interface. This interface doesn't change as much as the build tooling's private implementation. For example, in a C++ project there may be hundreds of object files and a small number of precompiled headers. All these artifacts result in producing a single executable file. A developer may change the C++ makefile for this project to incorporate features from libraries, altering the build's implementation, but the end result of creating the desired executable is the same.
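
The contrast between the two strategies can be sketched in a few lines of Python. The file paths and helper names below are made up for illustration, and sets of paths stand in for a real build tree.

```python
from typing import Set


def exclude(build_tree: Set[str], unwanted: Set[str]) -> Set[str]:
    """Exclusionary strategy: ship everything except a fixed delete list."""
    return build_tree - unwanted


def include(build_tree: Set[str], wanted: Set[str]) -> Set[str]:
    """Inclusionary strategy: ship only what is explicitly listed."""
    return build_tree & wanted


# Hypothetical build outputs before and after a toolchain upgrade.
old_toolchain = {"/app/bin/tool", "/app/obj/main.o", "/app/obj/util.o"}
new_toolchain = old_toolchain | {"/app/.cache/headers.pch"}

unwanted = {"/app/obj/main.o", "/app/obj/util.o"}  # written for the old toolchain
wanted = {"/app/bin/tool"}

# The delete list misses the new artifact; the include list is unaffected.
print(sorted(exclude(new_toolchain, unwanted)))
print(sorted(include(new_toolchain, wanted)))
```

With the old toolchain both strategies ship the same single file; they diverge only once the tooling starts producing an artifact the delete list never anticipated.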

The recently introduced multi-stage feature allows the resultant image to be isolated from the other steps that build the final artifacts you wish to include in your image (separation of concerns). It also provides a copy mechanism to transfer these desired final artifacts, like the executable mentioned above, from the other build-polluted steps to construct the resultant image.

Although I’ve personally been a proponent of multi-stage builds, I find the current implementation problematic. However, even in its current form, it offers better facilities for building secure and minimally sized images than squash can ever achieve. Given my assessment of squash and my desire for “simplicity” by offering a single way to realize a solution, I would like to see squash - squashed, as multi-stage builds are much more capable at performing the same operation. Perhaps, if you read the tea leaves, you’ll notice the accelerated development and deployment timeline of multi-stage support, as well as its speedy inclusion as a standard docker feature while squash lingers in its experimental state. Please remember that I’m not a Docker Maintainer and the musings above are my own, not those of Docker Inc.

@draeath

@draeath draeath commented Jun 6, 2018

Are there any indications if this will come out of experimental - or that the opt-in will be made more granular?

I would really like to utilize this functionality, but I've no desire to enable any other experimental features.

(I also have no interest in multi-stage builds at this time)
