Adds ability to flatten image after build #22641

Merged
merged 1 commit into from Nov 1, 2016

@cpuguy83
Contributor

cpuguy83 commented May 10, 2016

- What I did
Allow built images to be squashed to their parent.
Squashing does not destroy any images or layers, and preserves the build cache.

- How I did it
Introduce a new CLI argument --squash to docker build.
Introduce a new param to the build API endpoint squash.

Once the build is complete, docker creates a new image: it loads the diffs from each layer into a single new layer, and the new image references all of the parent's layers.
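
As a rough mental model of that merge step, each layer can be treated as a set of file changes, with deletions recorded as markers. The Python sketch below is only an illustration and is not how Docker stores or merges layers; `DELETED`, `squash`, `apply_diff`, and the dict-based layer format are all made up for the example.

```python
# Toy model of squashing: each layer diff maps a path to its new content,
# or to DELETED when the layer removes the file. Not Docker's real format.
DELETED = object()


def squash(diffs):
    """Merge an ordered list of layer diffs into one diff (later layers win)."""
    merged = {}
    for diff in diffs:
        merged.update(diff)
    return merged


def apply_diff(rootfs, diff):
    """Return a new filesystem dict with one diff applied on top of rootfs."""
    result = dict(rootfs)
    for path, content in diff.items():
        if content is DELETED:
            result.pop(path, None)
        else:
            result[path] = content
    return result


# Diffs loosely modeled on a build that writes /hello twice and
# creates, then removes, /remove_me.
parent_rootfs = {"/bin/sh": "busybox"}
build_diffs = [
    {"/hello": "hello\n"},
    {"/hello": "hello\nworld\n"},
    {"/remove_me": ""},
    {"/remove_me": DELETED},
]

single_layer = squash(build_diffs)
final_rootfs = apply_diff(parent_rootfs, single_layer)
print(sorted(final_rootfs))  # ['/bin/sh', '/hello']
```

The merged diff keeps the deletion marker for /remove_me, so the result is still correct if the parent image happened to contain that path.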

- How to verify it

FROM busybox
RUN echo hello > /hello
RUN echo world >> /hello
RUN touch remove_me /remove_me
ENV HELLO world
RUN rm /remove_me
$ docker build -t test --squash .
...
$ docker history test
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
4e10cb5b4cac        3 seconds ago                                                       12 B                merge sha256:88a7b0112a41826885df0e7072698006ee8f621c6ab99fca7fe9151d7b599702 to sha256:47bcc53f74dc94b1920f0b34f6036096526296767650f223433fe65c35f149eb
<missing>           5 minutes ago       /bin/sh -c rm /remove_me                        0 B
<missing>           5 minutes ago       /bin/sh -c #(nop) ENV HELLO=world               0 B
<missing>           5 minutes ago       /bin/sh -c touch remove_me /remove_me           0 B
<missing>           5 minutes ago       /bin/sh -c echo world >> /hello                 0 B
<missing>           6 minutes ago       /bin/sh -c echo hello > /hello                  0 B
<missing>           7 weeks ago         /bin/sh -c #(nop) CMD ["sh"]                    0 B
<missing>           7 weeks ago         /bin/sh -c #(nop) ADD file:47ca6e777c36a4cfff   1.113 MB

Test the image: check that /remove_me is gone, that /hello contains hello\nworld, and that the HELLO envvar's value is world.
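
The three checks above could be scripted along these lines. This is a sketch under assumptions, not part of the PR: `docker_run` and `verify_squashed_image` are hypothetical helper names, the image tag `test` comes from the build command above, and the only Docker command relied on is `docker run --rm`.

```python
import subprocess


def docker_run(image, *cmd):
    """Run a command in a throwaway container; return (exit code, stdout)."""
    proc = subprocess.run(
        ["docker", "run", "--rm", image, *cmd],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def verify_squashed_image(image, run=docker_run):
    """Return a list of failed checks; an empty list means all passed."""
    failures = []

    # `test -e` exits 0 when the path exists, so 0 is the failure case here.
    code, _ = run(image, "test", "-e", "/remove_me")
    if code == 0:
        failures.append("/remove_me still exists")

    _, out = run(image, "cat", "/hello")
    if out != "hello\nworld\n":
        failures.append("/hello has unexpected content")

    _, out = run(image, "sh", "-c", "echo $HELLO")
    if out.strip() != "world":
        failures.append("HELLO is not 'world'")

    return failures


# Example call (needs a running Docker daemon and the image built above):
# print(verify_squashed_image("test"))
```

The `run` parameter exists only so the checks can be exercised with a stand-in function when no daemon is available.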

- Description for the changelog

Add option to squash image layers to the FROM image after successful builds

- A picture of a cute animal (not mandatory but encouraged)

Some of the implementation is a little rough around the edges, but I wanted to get this out there.
I also really wanted to make sure that when using the full option the user is prompted to make sure they know what it's really doing.

@cpuguy83 cpuguy83 changed the title from Add image flattening to Adds ability to flatten image after build May 10, 2016

Member

thaJeztah commented May 10, 2016

Hm, Deja vu #4232?

Contributor

cpuguy83 commented May 10, 2016

No, this is different. This is specifically for build.

Contributor

cpuguy83 commented May 10, 2016

But it also addresses the issue with image history.

Also, maybe not --flatten; that was just the only word I could think of at the time. I like --squash better, or maybe something more specific like --squash-rootfs.
Open to suggestions here.

Contributor

cpuguy83 commented May 10, 2016

Another thing to consider is that, for security reasons, it may be undesirable for the image history to be preserved in the flattened image.

Contributor

justincormack commented May 12, 2016

I wonder if it would make sense to also allow a number as an arg, so 0 is scratch, 1 is parent, etc., so you can leave e.g. two layers if you commonly use a pair of layers for builds (e.g. alpine plus your app base)?
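
One way to read this suggestion, sketched in Python: the number says how many leading layers to leave untouched, and everything after them is merged into one. The function `squash_keeping` and its `keep` parameter are hypothetical and exist only for this illustration; the PR does not implement such an option.

```python
def squash_keeping(layers, keep):
    """Keep the first `keep` layers as-is and merge the rest into one.

    `layers` is an ordered list of layer names, base first. keep=0 models
    squashing to scratch. Purely illustrative; not an option in the PR.
    """
    if keep < 0 or keep > len(layers):
        raise ValueError("keep must be between 0 and len(layers)")
    kept, merged = layers[:keep], layers[keep:]
    if not merged:
        return list(kept)
    return list(kept) + ["+".join(merged)]


layers = ["alpine", "app-base", "deps", "app"]
print(squash_keeping(layers, 0))  # ['alpine+app-base+deps+app']
print(squash_keeping(layers, 2))  # ['alpine', 'app-base', 'deps+app']
```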

Contributor

cpuguy83 commented May 12, 2016

@justincormack But then they can just create that image as the base.

Contributor

justincormack commented May 12, 2016

@cpuguy83 yes I guess they would do that.

Contributor

cpuguy83 commented May 16, 2016

Thinking about this more, I'm wondering if we should make it default to squashing to the parent image.
This is because the only thing that will benefit from having these layers is the build machine itself, and the build machine will continue to have these layers.

With content addressability you can no longer push/pull a build cache, so it doesn't make sense to keep all the extra layers as part of the image.

We can then either provide an option to fully squash (to 1 layer)... or even defer on such a decision.

Member

thaJeztah commented May 26, 2016

We discussed this in the maintainers meetup, and there's no consensus yet. Things discussed:

  • The --flatten=full didn't get much traction; probably best to always flatten to the FROM layer
  • We prefer to keep "layers" an implementation detail where possible, so rename the flag to something more universal (--optimize / --optimise 🇬🇧 , --finalize / --finalise 🇬🇧)
  • Should this be the default? (or make it optional first, with the chance to make it the default in future)
  • Should there be an option to make it the default (cli-config file, possibly)?
  • Should there be a FLATTEN Dockerfile instruction, so that the author of the Dockerfile can specify that the image should always be flattened? (can also be added in future)

Given that there's no consensus yet, we keep this in design review. We like the idea, however, so it stays open.

alexpjohnson commented Jun 3, 2016

This would be nice for us. We have some security concerns around using SSH keys in our Dockerfile, so we're making a base image, using docker run to mount the ssh keys to run bundle install and then committing the result. This is obviously not optimal because we lose the benefit of the layer cache, so this solution might give us a path forward in that area.

phpdude commented Jul 7, 2016

@thaJeztah Hi,

I am interested in this feature; it is a must-have for a lot of projects. But last night I had a better idea for how to do it!

We need to add a new instruction prefix like ONBUILD, named "SQUASH" for example. If an instruction is prefixed with SQUASH, the filesystem changes of the current command must be squashed with those of the next instruction.

This will make it simple to keep the Dockerfile readable and will allow us to make "commits" to fs layers when we want. We don't need to squash the whole image, because that is a bad idea in many situations when you deploy with docker: if you squash everything, you will upload and download the entire image every time, even if you already have 90% of the system on your server. We need to keep the ability to have more than one layer, and the ability to squash them with docker.

Right now I build an intermediate image for this and extend it with "FROM"; the intermediate image is squashed. This way I do double work: I build the fs with docker, download it, unzip, merge layers, zip, and upload to docker. Not a very good way.

I want to try to provide a new PR with this feature, but first I must check the source code to understand whether it is possible or too hard for me (I don't write Go, but I write in a lot of languages and it could be an interesting task for me :).

What do you think? Before this idea I thought about a CLI flag too, but last night I got this cool idea, which makes the process very simple and manageable.

docwhat commented Jul 7, 2016

Just to give an example to be sure I understand...

FROM alpine

COPY foobar.tgz /foobar.tgz
SQUASH
RUN tar xf /foobar.tgz && rm -f /foobar.tgz

So the COPY and RUN would be one layer and the /foobar.tgz file would not end up in any layer?
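
The reason the question matters: without any squashing, a file added by one instruction and deleted by a later one still occupies space in the earlier layer. The Python sketch below models that with invented sizes; `stored_bytes` and `merge` are hypothetical helpers, and nothing here reflects the proposed SQUASH syntax or Docker's actual layer format.

```python
# Each layer records bytes added per path; None marks a deletion.
# Sizes are invented for illustration.
copy_layer = {"/foobar.tgz": 50_000_000}
run_layer = {"/foobar/": 120_000_000, "/foobar.tgz": None}


def stored_bytes(layers):
    """Total bytes stored across layers; a deletion adds no bytes and frees none."""
    return sum(
        size for layer in layers for size in layer.values() if size is not None
    )


def merge(layers):
    """Collapse layers into one, dropping files that end up deleted.

    Dropping the deletion marker is fine here only because the base image
    never contained /foobar.tgz.
    """
    merged = {}
    for layer in layers:
        merged.update(layer)
    return {path: size for path, size in merged.items() if size is not None}


separate = stored_bytes([copy_layer, run_layer])
combined = stored_bytes([merge([copy_layer, run_layer])])
print(separate, combined)  # 170000000 120000000
```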

Contributor

cpuguy83 commented Jul 7, 2016

@docwhat SQUASH has been proposed before and is generally undesirable.
Really, even this PR is probably not going to be accepted, because it further exposes the concept of layers to the user, which we don't want to do.

phpdude commented Jul 7, 2016

@docwhat I'm thinking of something like

FROM alpine # will use all layers from parent image

SQUASH ADD /config/requirements.txt /requirements.txt

SQUASH RUN apt-get install -y python
RUN pip install /requirements.txt # new layer here

SQUASH EXPOSE 80
SQUASH RUN /install-all.sh
SQUASH COPY foobar.tgz /foobar.tgz
RUN tar xf /foobar.tgz && rm -f /foobar.tgz # new layer here

phpdude commented Jul 7, 2016

@cpuguy83 you already did it with the ONBUILD command, which 99% of users don't need :)

My solution will make it possible to build clean images without extra entries in the history (we can join the history messages into one in the resulting layer) and to keep a very clean Dockerfile, without 3rd party tools for a feature that ought to be in docker. Right now you can optimize your layers only by combining multiple RUN commands; for EXPOSE, ADD, COPY, etc. we can't, and we can squash them only with 3rd party tools, which is not cool :(

One big point here is that we already DO IT. I just want to give the community a better (and much faster) way to do it.

Contributor

cpuguy83 commented Jul 7, 2016

@phpdude ONBUILD is a builder instruction, nothing to do with layers.

phpdude commented Jul 7, 2016

@cpuguy83 but for the user it makes no difference whether it is a builder instruction or a docker one :)

I mean docker already supports the format "[PREFIX ...] INSTRUCTION", so it would be great to have an instruction to squash a layer with the next one.

phpdude commented Jul 7, 2016

There are already 3rd party builder projects that handle layers better.

https://github.com/jlhawn/dockramp

Most build instructions are only used to specify metadata and other options for how to run your container, but docker build creates a new filesystem layer for each of these instructions - requiring you to wait while a potentially expensive filesystem commit is performed and an unnecessary filesystem layer is created.

Dockramp differentiates these instructions from others and only performs filesystem commits after instructions which typically do modify the filesystem: COPY, EXTRACT, and RUN. All other instructions may be combined together and expensive commits are only performed when needed.

Cache lookups are also more efficient: build cache data is stored locally rather than on the Docker daemon so that the client can decide what set of changes map to a specific image ID. These lookups can be done in constant time, while docker build cache lookups iterate over all image layers and get noticeably slower the more images you have installed.

Contributor

cpuguy83 commented Jul 7, 2016

As stated before, there will not be a SQUASH command; we've been through this before.
Now please stop spamming this PR or I will be forced to lock it.

If you wish to discuss this, you should open a new issue.

phpdude commented Jul 7, 2016

@cpuguy83 ok, do you have a URL where we can read about the SQUASH command/prefix? Why was it denied?

Member

thaJeztah commented Jul 7, 2016

https://github.com/docker/docker/issues?utf8=✓&q=squash%20in%3Atitle%20%20is%3Aclosed%20

I'm going to remove some random "+1" comments here. PLEASE don't hijack a pull request with a different proposal; it greatly complicates the review process.

phpdude commented Jul 7, 2016

@thaJeztah thanks! I'll shut up :)

Collaborator

shykes commented Jul 14, 2016

The lack of flattening has been too big of a problem, and has gone on for too long. If there is no consensus then let's make a non-consensual decision. Here it is:

  • There must be a way to produce a flattened image from docker build
  • Adding a Dockerfile keyword for this is a bad idea.
  • Adding a CLI flag is a good idea.
  • --squash is my favorite. --flatten is fine too. I don't like the abstract ones. Yes this flag is layer-specific, ideally layers would be an implementation detail, but this topic hasn't made any progress in years. Waiting for the ideal architecture is no longer an option, too many users need this.
  • Let's keep it simple and make it a boolean flag. When true it squashes everything (rebases on scratch). The rest is a nice-to-have, we can add another flag later.
  • I really like the idea of squashing by default, but worry about edge cases and unexpected regressions. Let's keep the default unchanged in the next release, then consider switching when we have more real-world usage data?

Thanks all. Let's please implement this.

Contributor

justincormack commented Jul 14, 2016

@cpuguy83 needs a rebase

Member

thaJeztah commented Jul 14, 2016

We've discussed this (again); current thoughts:

We prefer to not enable this by default, and make it optional: the current layered model is very efficient (storage wise), due to a lot of images sharing the same layers. Enabling it by default would throw that away.

Perhaps this should still be a separate command from build (docker squash), so that you can "optimize" images the moment you are ready to sign them and push them to a registry; it's a distribution problem.

We can keep the option open to later add docker build --squash to do it all in one go.
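
The storage argument can be made concrete with a small model in which identical layers are stored only once. The Python sketch below uses invented layer names and sizes, and `unique_storage` is a hypothetical helper; it compares three images that share a base layer with the same three images fully flattened.

```python
def unique_storage(images, sizes):
    """Bytes needed to store all images when identical layers are stored once."""
    unique_layers = {layer for layers in images.values() for layer in layers}
    return sum(sizes[layer] for layer in unique_layers)


# Invented sizes in MB: one large shared base, three small app layers,
# and the three fully flattened equivalents.
sizes = {
    "base": 900, "app-a": 40, "app-b": 40, "app-c": 40,
    "flat-a": 940, "flat-b": 940, "flat-c": 940,
}

layered = {"a": ["base", "app-a"], "b": ["base", "app-b"], "c": ["base", "app-c"]}
flattened = {"a": ["flat-a"], "b": ["flat-b"], "c": ["flat-c"]}

print(unique_storage(layered, sizes))    # 1020
print(unique_storage(flattened, sizes))  # 2820
```

In this model, flattening everything nearly triples the storage because the shared base is duplicated into each image.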

@thaJeztah thaJeztah added this to the 1.13.0 milestone Jul 14, 2016

Member

thaJeztah commented Jul 14, 2016

Let's see if we can get this into 1.13 experimental

Contributor

cpuguy83 commented Jul 14, 2016

This is now rebased.
I've changed the build flag to a bool called --squash, and made it experimental only.
With --squash it will currently squash to scratch.

TODO: add docker squash subcommand. I'm not 100% sold on this one b/c I'm not sure why people would need to squash images they didn't build.

Contributor

cpuguy83 commented Jul 14, 2016

Also not 100% sold on squashing all the way vs. squashing to the parent, but this implements what @shykes requested.

Contributor

cpuguy83 commented Jul 14, 2016

And sorry for the spam...

The implementation is really no different either way: it can squash to any image within the ancestor chain, as well as to scratch.

joaovieira commented Nov 9, 2016

Good stuff! Looking forward to this feature for private NPM modules in (smaller) base images - https://docs.npmjs.com/private-modules/docker-and-private-modules.

What was the final decision on the history? Is it persisted in the squashed image or cleared as well (as hinted in #22641 (comment))?

Any idea when v1.13 is expected to be released?

Contributor

cpuguy83 commented Nov 11, 2016

@joaovieira The entire history is kept.
But we are looking at adding build secrets for 1.14.

Code freeze for 1.13 is happening today.

Contributor

vincentwoo commented Dec 6, 2016

I like the direction of this change. Unfortunately, this feature is not usable for me until the option to squash to parent instead of scratch lands. As it stands, I have many large images that are all based on another rather large image.

Member

dmcgowan commented Dec 6, 2016

@vincentwoo this feature squashes all the newly built layers into a single layer; it is not squashing to scratch.

Contributor

cpuguy83 commented Dec 6, 2016

Yep, it is squashing up to parent, not scratch.

@cpuguy83

cpuguy83 Dec 6, 2016

Contributor

Because it's image history with no layer.

@yellowmegaman

yellowmegaman Dec 9, 2016

Great feature! It's merged, but I can't try it.
$ docker build -t test --squash .
Error response from daemon: squash is only supported with experimental mode

$ dpkg -l|grep docker
ii  docker-engine                         1.13.0~rc3-0~debian-stretch

Any ideas?

@vincentwoo

vincentwoo Dec 9, 2016

Contributor

You must start the daemon with the --experimental flag to test this feature
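
For anyone else hitting the same error, here is a minimal sketch of turning experimental mode on through the daemon config file instead of the command line. It assumes a Linux host where the daemon reads /etc/docker/daemon.json and is managed by systemd; both are assumptions, so adjust for your setup. The script only writes an example file (daemon.experimental.json, a name chosen for this sketch) into the current directory so it can be reviewed first; applying it is left as commented-out steps.

```shell
#!/bin/sh
set -eu

# Example file name chosen for this sketch; the live file is normally
# /etc/docker/daemon.json (assumed default location on Linux).
conf_file="daemon.experimental.json"

# "experimental": true is the config-file equivalent of `dockerd --experimental`.
cat > "$conf_file" <<'EOF'
{
  "experimental": true
}
EOF

echo "wrote $conf_file"

# To apply (needs root; merge by hand if a daemon.json already exists):
#   sudo cp daemon.experimental.json /etc/docker/daemon.json
#   sudo systemctl restart docker
# Then confirm the daemon picked it up and retry the build:
#   docker version --format '{{.Server.Experimental}}'
#   docker build --squash -t test .
```

Note that this switches on every experimental feature in the daemon, not only squash.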

@yellowmegaman

yellowmegaman Dec 10, 2016

@vincentwoo Oh, thanks a bunch! A quick search yielded nothing on this flag before your reply.

@krm1

krm1 Feb 15, 2017

Is there an indication when this feature can be used in production?

@cpuguy83

cpuguy83 Feb 15, 2017

Contributor

@krm1 It's only in experimental because we are not sure that it is the right interface to expose. With it in experimental we can change it between versions (or replace it with something else).

@alanbrent

alanbrent Feb 15, 2017

@krm1 What's keeping you from using it now? As far as I can tell the resulting artifact is a Docker image like all other Docker images.

@cpuguy83 It really seems like the core concern for a --squash feature was nailed. Ultimately, the concerns around secrets are better solved in your build (CI) pipeline.

@krm1

krm1 Feb 16, 2017

@cpuguy83 @alanbrent
I understood that in order to use the --squash feature I would have to run the Docker daemon with the --experimental flag. I would think that activating the --experimental flag would also enable other experimental features that could affect Docker's performance or behaviour in unknown ways.

@thaJeztah

thaJeztah Feb 16, 2017

Member

"I would think that activating the --experimental flag would also enable other experimental features that could affect Docker's performance or behaviour in unknown ways."

No, experimental features would not be in the main code path; in addition, those features go through the same process as regular features. The difference is that the design of the feature may change, and we're accepting feedback to change the feature based on that. Experimental just gives more freedom in that respect, knowing that we can change the feature without causing a breaking change.

@goldmann

goldmann Feb 16, 2017

Contributor

I think the question was about something else: does the --experimental switch enable anything by default, or is every experimental feature disabled by default and merely made available once we set --experimental?

@vdemeester

vdemeester Feb 16, 2017

Member

For now, the --experimental flag is a feature flag for all experimental features, true. We could, in the future, have a configuration part where you could specify what part of experimental you want to be enabled ("experimental": { "squash", … }) but that's not the case for now.

@JonathonReinhart

JonathonReinhart Feb 16, 2017

Considering that one must explicitly opt in to this experimental squashing feature with --squash, requiring one to also globally enable all other experimental features with --experimental seems illogical.

@vdemeester

vdemeester Feb 16, 2017

Member

@JonathonReinhart not really, --squash is not meant to be a default behavior (i.e. when it will not be experimental anymore). The goal to have it in experimental right now is to see if it's useful and detect bugs we might have overlooked. I think it'll be out of experimental in the next release anyway 😉 👼

@buuck

buuck Mar 11, 2017

I just used the new --squash flag for removing several GB of build files after compiling ROOT. Seems to have worked great, thanks a bunch!

Is there a way to use this flag with the automated build system, and/or will that be available once it is out of experimental?

dnephin pushed a commit to dnephin/docker that referenced this pull request Apr 17, 2017

Merge pull request #22641 from cpuguy83/build_finalization
Adds ability to flatten image after build

liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017

Use naive diff for overlay2 when opaque copy up bug present
When running on a kernel which is not patched for the copy up bug
overlay2 will use the naive diff driver.

cherry-pick from moby#28138
and backport some code from moby#22641

Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Signed-off-by: Lei Jitang <leijitang@huawei.com>
@WhisperingChaos

WhisperingChaos Dec 5, 2017

Contributor

@buuck

Have you considered rewriting the Dockerfile to employ multi-stage building?

I understand that you may not want to radically change your Dockerfile, or that a large inventory of existing Dockerfiles may make this solution prohibitively costly for now. That said, the large reduction you noticed is mostly the result of defining what should be in your image by excluding, through deletion, what shouldn't be there.

Exclusion can be problematic, especially when your build is designed, intentionally or accidentally, to seamlessly "adapt" to new versions of tooling whose semantics have changed. For example, a new compiler version may alter, extend, or relocate the artifacts you exclude, so that the statically defined delete operation that once eliminated them now misses them. The unwanted artifacts then remain in the image, and the several gigabytes that squash removed may at some future point mysteriously reappear. Worse yet, failed exclusionary behavior may preserve an artifact that doesn’t noticeably increase the image’s size but presents a juicy exploit. Therefore, instead of relying on an exclusionary mechanism, I would recommend encoding an inclusionary one.

In adopting an inclusionary strategy, you must fully detail what you wish to exist in the resultant image. This is not as difficult as it sounds and has many benefits, including improved security for your image and the resulting running container. In general, it's much easier to identify what needs to be included than what needs to be excluded, as the desired outputs of build tooling represent its public interface. This interface doesn't change as much as the build tooling's private implementation. For example, in a C++ project there may be hundreds of object files and a small number of precompiled headers. All these artifacts go into producing a single executable file. A developer may change the C++ makefile for this project to incorporate features from libraries, altering the build's implementation, but the end result of creating the desired executable is the same.

The recently introduced multi-stage build feature allows the resultant image to be isolated from the other steps that build the final artifacts you wish to include in your image (separation of concerns). It also provides a copy mechanism to transfer these desired final artifacts, like the executable mentioned above, from the other build-polluted stages into the resultant image.

Although I’ve personally been a proponent of multi-stage builds, I find their current implementation problematic. However, even in its current form, the feature offers better facilities for building secure and minimally sized images than squash can ever achieve. Given my assessment of squash and my desire for “simplicity” by offering a single way to realize a solution, I would like to see squash - squashed, as multi-stage builds are much more capable of performing the same operation. Perhaps, if you read the tea leaves, you’ll notice the accelerated development and deployment timeline of multi-stage support, as well as its speedy inclusion as a standard docker feature, while squash lingers in its experimental state. Please remember that I’m not a Docker Maintainer and the musings above are my own, not those of Docker Inc.
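
To make the inclusionary approach concrete, here is a rough sketch rather than a recommended layout. The script below writes an illustrative two-stage Dockerfile for the single-executable C++ case described above. The file name Dockerfile.multistage, the source file main.cpp, the stage name build, the output path /out/app, and the choice of the gcc:7 and busybox images are all assumptions made for the example.

```shell
#!/bin/sh
set -eu

# Write an illustrative multi-stage Dockerfile into the current directory.
# main.cpp, /out/app and the stage name "build" are hypothetical.
cat > Dockerfile.multistage <<'EOF'
# Stage 1: full toolchain; object files and other intermediates stay here.
FROM gcc:7 AS build
WORKDIR /src
COPY main.cpp .
# Link statically so the binary does not depend on libraries from this stage.
RUN mkdir /out && g++ -O2 -static -o /out/app main.cpp

# Stage 2: the image that ships; it contains only what is copied in explicitly.
FROM busybox
COPY --from=build /out/app /usr/local/bin/app
CMD ["/usr/local/bin/app"]
EOF

echo "wrote Dockerfile.multistage"

# Build it with a daemon that supports multi-stage builds (17.05 or later):
#   docker build -f Dockerfile.multistage -t app .
```

Nothing from the first stage reaches the final image unless a COPY --from line names it, so a new compiler version that leaves extra files behind cannot leak them into the result.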

@tt tt referenced this pull request Apr 19, 2018

Closed

Squash newly built layers #123

@draeath

draeath Jun 6, 2018

Are there any indications whether this will come out of experimental, or whether the opt-in will be made more granular?

I would really like to utilize this functionality, but I've no desire to enable any other experimental features.

(I also have no interest in multi-stage builds at this time)
