flatten images - merge multiple layers into a single one #332

Closed
unclejack opened this Issue Apr 4, 2013 · 231 comments

@unclejack
Contributor

unclejack commented Apr 4, 2013

There's no way to flatten images right now. When performing a build in multiple steps, several images can be generated and a large number of layers is produced. When these are pushed to the registry, a lot of data and a large number of layers have to be transferred.

There are some cases where one starts from a base image (or another image), changes some large files in one step, changes them again in the next step, and deletes them at the end. This means those files would be stored in two separate layers and then deleted by whiteout files in the final image.

These intermediary layers aren't necessarily useful to others or to the final deployment system.

Image flattening should work like this:

  • the history of the build steps needs to be preserved
  • the flattening can be done up to a target image (for example, up to a base image)
  • the flattening should also be allowed to be done completely (as if exporting the image)
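The layer-merging behavior described above can be sketched in a few lines of Python. This is a toy model, not Docker's actual implementation; the `.wh.` prefix is the AUFS whiteout convention, and the `flatten` helper name is made up for illustration:

```python
def flatten(layers):
    """Merge an ordered list of layers (oldest first) into a single one.

    Each layer is modeled as a dict of path -> content; an AUFS
    whiteout entry '.wh.<name>' deletes <name> from lower layers.
    """
    merged = {}
    for layer in layers:
        for path, content in layer.items():
            name = path.rsplit("/", 1)[-1]
            if name.startswith(".wh."):
                victim = path.replace(".wh.", "", 1)
                merged.pop(victim, None)   # drop the shadowed file
            else:
                merged[path] = content     # upper layer wins
    return merged

step1 = {"tmp/big.bin": "500MB of build data"}   # file created
step2 = {"tmp/big.bin": "changed again"}         # file rewritten
step3 = {"tmp/.wh.big.bin": ""}                  # whiteout: file deleted

# Flattening everything above the base into one layer:
print(flatten([step1, step2, step3]))  # {} -- the big file never appears
```

This illustrates why the intermediary layers waste space: the two copies of the big file are stored and shipped even though the flattened result contains neither.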

@justone justone referenced this issue May 1, 2013

Merged

Builder #472

@unclejack

Contributor

unclejack commented Jun 7, 2013

@shykes How would you like this to work? Could you provide an example of how this should work, please?

It looks like AUFS has a limit of around 39-41 layers. We really should have image flattening in order to allow commit->run->commit to be used after deployment as well.

@bortels

bortels commented Aug 1, 2013

Ping.

My dockerfiles grow as I find neat stuff like this https://gist.github.com/jpetazzo/6127116

and IIRC, each RUN line makes a new AUFS layer, no?

I'm basically ignorant about many things, happy to admit it - if a "docker flatten" isn't coming down the pipe soon, does anyone have a reference for how to do it by hand? Or a reason it can't be done?

(I guess I could work around it by moving all of the RUN lines into a single shell script, so it's not vital; but I can't do that with someone else's image. Hmm. Is there a way to "decompile" an image, recreating the Dockerfile used for it, assuming it was built entirely from a Dockerfile, of course?)

@dqminh

Contributor

dqminh commented Aug 5, 2013

I encountered this recently too when building images. Would something like http://aufs.sourceforge.net/aufs2/shwh/README.txt help here?

@vieux

Collaborator

vieux commented Aug 5, 2013

I made a small tool to flatten images: https://gist.github.com/vieux/6156567

You have to use full IDs; for example, to flatten dhrp/sshd: sudo python flatten.py 2bbfe079a94259b229ae66962d9d06b97fcdce7a5449775ef738bb619ff8ce73

@mhennings

Contributor

mhennings commented Aug 11, 2013

+1

I see the need too.
If possible, I would like a command that allows both: flattening everything and squashing selected layers.

If a container is flattened, we should think about what happens when it is pushed. The registry/index could remove unneeded/duplicated layers if enough information is sent during the push,
like "replaces Xxxxxxxxx, yyyyyyy, zzzzzzz"

@jpetazzo

Contributor

jpetazzo commented Aug 13, 2013

FWIW, the "aubrsync" tool (in the aufs-tools package) might be useful for that, since it aims at synchronizing and merging AUFS branches.

@shykes

Collaborator

shykes commented Aug 21, 2013

From my answer in a different thread:

Currently the only way to "squash" the image is to create a container from it, export that container into a raw tarball, and re-import that as an image. Unfortunately that will cause all image metadata to be lost, including its history but also ports, env, default command, maintainer info etc. So it's really not great.

There are 2 things we can do to improve the situation:

  1. A short-term solution is to implement a "lossless export" function, which would allow exporting an image to a tarball with all its metadata preserved, so that it can be re-imported on the other side without loss. This would preserve everything except history, because an image config does not currently carry all of its history. We could try to plan this for 0.7 which is scheduled for mid-September. That is, if our 0.7 release manager @vieux decides we have time to fit it in the release :)

  2. A 2nd step would be to add support for history as well. This is a little more work because we need to start storing an image's full history in each image, instead of spreading it out across all the aufs layers. This is planned for 0.8.
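The "lossless export" idea in step 1 can be sketched as a toy round-trip (hypothetical helper names; Docker's real on-disk format differs): bundle the flattened filesystem together with the image config (ports, env, default command, etc.) into one tarball, so re-importing it on the other side loses nothing.

```python
import io
import json
import tarfile

def lossless_export(rootfs: dict, config: dict) -> bytes:
    """Bundle a flattened rootfs plus the image config into one tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in rootfs.items():
            blob = data.encode()
            info = tarfile.TarInfo("rootfs/" + path)
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))
        meta = json.dumps(config).encode()
        info = tarfile.TarInfo("config.json")
        info.size = len(meta)
        tar.addfile(info, io.BytesIO(meta))
    return buf.getvalue()

def lossless_import(archive: bytes):
    """Reverse of lossless_export: recover both rootfs and config."""
    rootfs, config = {}, None
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        for member in tar.getmembers():
            data = tar.extractfile(member).read()
            if member.name == "config.json":
                config = json.loads(data)
            else:
                rootfs[member.name[len("rootfs/"):]] = data.decode()
    return rootfs, config

image = ({"bin/sh": "v1"}, {"Env": ["PATH=/usr/bin"], "Cmd": ["/bin/sh"]})
assert lossless_import(lossless_export(*image)) == image
```

The point of the sketch is only that metadata travels inside the same archive as the filesystem, unlike the plain export/import workaround described above.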

@ykumar6

ykumar6 commented Aug 21, 2013

Hey guys, here's an idea we are prototyping. Let's say an image consists of 4 layers

L1<-L2<-L3<-L4

When we start a container off L4, we make changes in L5. Once the changes are complete, we commit back to get a new image

L1<-L2<-L3<-L4<-L5

At this point, we do a post-commit merge step where we start a new container, L4A, from L3. We copy L4 & L5 into L4A and create a new image like this

L1<-L2<-L3<-L4A

This way, we preserve the immutable nature of the image but can compress layers when necessary to create new images
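Under the toy assumption that a layer is just a map of path -> content with upper entries shadowing lower ones (whiteouts ignored, `squash_top` a made-up name), the post-commit merge of L4 and L5 into L4A looks like:

```python
def squash_top(chain, n=2):
    """Replace the top n layers of an oldest-first chain with one merged layer."""
    merged = {}
    for layer in chain[-n:]:       # apply L4, then L5, so L5's entries win
        merged.update(layer)
    return chain[:-n] + [merged]

L1, L2, L3 = {"a": "1"}, {"b": "2"}, {"c": "3"}
L4 = {"app": "v1"}
L5 = {"app": "v2", "cfg": "x"}

chain = squash_top([L1, L2, L3, L4, L5])
# chain is now [L1, L2, L3, L4A] with L4A == {"app": "v2", "cfg": "x"}
```

Note that L1-L3 keep their original identities, so existing images sharing those layers are untouched; only the top of the chain is rewritten.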

@dqminh

Contributor

dqminh commented Aug 22, 2013

@shykes @ykumar6 I did some experiments last night on exporting an image while trying to preserve its metadata: https://github.com/dqminh/docker-flatten . Would love to know if the approach is reasonable.

It compresses all of the image's layers into a tarfile, generates a Dockerfile with as much metadata as possible, and creates a new image from that.

@jpetazzo

Contributor

jpetazzo commented Sep 10, 2013

Question: do we really want to flatten existing images, or to reduce the number of layers created by a Dockerfile?

If we want to flatten existing images, it could be the job of an external tool, which would download layers, merge them, upload a new image.

If we want to reduce the number of layers, we could have some syntactic sugar in Dockerfiles meaning "don't commit between these steps", either because I want fewer layers or because the first steps create lots of intermediary files that I clean up later and don't want to include in my layers.

@unclejack

Contributor

unclejack commented Sep 10, 2013

@jpetazzo Removing commits done between two steps of a Dockerfile would be useful, but we might still want to be able to flatten images. Some use cases require "-privileged" during a run, and that's not possible with a Dockerfile, so you have to script a Dockerfile build, some docker run -privileged steps, and then a commit.
We might also want to craft custom images which have one layer on top of a single parent layer (a common image such as ubuntu, centos, etc.).

@dkulchenko

dkulchenko commented Sep 10, 2013

@jpetazzo I would say both, as they address separate issues.

Flattening existing images lets you work around the AUFS branch limit (you can only stack so many layers): if you're building on someone else's image, and someone else builds on yours, your stack ends up hitting the limit pretty quickly.

The syntactic sugar in the Dockerfile would allow building images that need a large toolchain but produce a comparatively small result (which I would argue is the more pressing of the two issues). Without it, a 2GB toolchain building a 10MB image results in a 2058MB image.

@bortels

bortels commented Sep 11, 2013

I second the syntactic sugar - but I'd flip-flop it, in that I do a bunch of stuff (package building), and I really only want to commit the last step.

Maybe simply having an explicit "COMMIT imagename" in the Dockerfile? And an implicit one right at the end? (I actually think a commit at the end is sufficient - I'm not sure what use I'd have for an intermediate image where I wouldn't just build it with a separate Dockerfile...)

I'll admit the AUFS limit was floating around in the back of my brain, but being able to flatten an arbitrary dockerfile is perfectly adequate for me there. (Doing so AND keeping history would be even nicer).

@jeffutter

jeffutter commented Sep 11, 2013

I'm somewhat fond of @bortels's idea. I can see use cases where you would want the intermediate steps while building from the Dockerfile (in case something fails, like apt-get due to networking); you'd want to be able to resume at that step. However, it would be nice to say "when this is done" or "when you get to point A", squash the previous layers.

@tomgruner

tomgruner commented Sep 25, 2013

An idea and script by Maciej Pasternacki:
http://3ofcoins.net/2013/09/22/flat-docker-images/

Docker looks really exciting, but the limit of 42 layers could cause some issues if an image needs to be updated over a few years. Flattening every now and then doesn't sound so bad, though.

@a7rk6s

a7rk6s commented Sep 26, 2013

When I started using Docker I soon wished for a "graft" command for image maintenance. Something like this:

$ docker graft d093370af24f 715eaaea0588
67deb2aef0e0

$ docker graft d093370af24f none
e4e168807d31

$ docker graft -t repo:8080/ubuntu12 d093370af24f 715eaaea0588
67deb2aef0e0

In other words it would basically change the parent of an image, or make it into a parent-less base image, and then return the new ID (possibly tagging/naming it). Would it be really slow because it'd have to bring both images into existence and compare them?

I like the "COMMIT" idea too. Or better, a "make a flattened image" flag when building, since this is really is more of a build option.

(Confession: I love Docker but the concept of the Dockerfile never clicked with me. Why add extra syntax just to run some shell commands? Why commit intermediate steps? So I've been making containers 100% with shell scripts. It's nice because it forces me to create build/setup scripts for my code, which is useful outside of Docker).

@jpetazzo

Contributor

jpetazzo commented Sep 26, 2013

Re "why commit intermediate steps": I find it very convenient when I have longer Dockerfiles; when I modify one line, it only re-executes from that line, thanks to the caching system. That saves me time, bandwidth, and disk space, since the first steps are usually those big "apt-get install" steps. Of course, I could do the apt-get install and other big steps in a separate Dockerfile, then commit that, then start another Dockerfile "FROM" the previous image; but the Dockerfile caching system makes the whole thing way easier. At least, to me :-)


@shykes

Collaborator

shykes commented Sep 26, 2013

On Wed, Sep 25, 2013 at 5:25 PM, a7rk6s notifications@github.com wrote:

> When I started using Docker I soon wished for a "graft" command for image maintenance. Something like this:
>
> $ docker graft d093370af24f 715eaaea0588
> 67deb2aef0e0
>
> $ docker graft d093370af24f none
> e4e168807d31
>
> $ docker graft -t repo:8080/ubuntu12 d093370af24f 715eaaea0588
> 67deb2aef0e0
>
> In other words it would basically change the parent of an image, or make it into a parent-less base image, and then return the new ID (possibly tagging/naming it). Would it be really slow because it'd have to bring both images into existence and compare them?
>
> I like the "COMMIT" idea too. Or better, a "make a flattened image" flag when building, since this really is more of a build option.

This problem will go ahead on its own once each image carries its full history (currently history is encoded in the chain of aufs layers, which avoids duplication of data, but means you can't get rid of one without getting rid of the other, hence the problem we're discussing).

Once that's in place, whether you commit at each build step or only at the end will be entirely up to you (the person running the build), depending on the granularity you want. More granularity = more opportunities to re-use past build steps and save bandwidth and disk space on upgrades. Less granularity = you can remove build dependencies from the final image, export to a single tarball without losing context, etc. I doubt we'll add any syntax to the Dockerfile to control that.

> (Confession: I love Docker but the concept of the Dockerfile never clicked with me. Why add extra syntax just to run some shell commands? Why commit intermediate steps? So I've been making containers 100% with shell scripts. It's nice because it forces me to create build/setup scripts for my code, which is useful outside of Docker.)

That's a common misunderstanding. Dockerfiles are not a replacement for shell scripts. They provide context for running shell scripts (or any other kind of script) from a known starting point (hence the FROM keyword) and a known source code repository (hence the ADD keyword).

@shykes

Collaborator

shykes commented Sep 26, 2013

s/the problem will go ahead/the problem will go away/


@a7rk6s

a7rk6s commented Sep 26, 2013

They provide context

Makes sense. Though, the Dockerfiles I've seen in the wild have been all over the place (as are the ones I've created, since I'm still trying to find the best way to lay things out so it's easy to develop / maintain / repurpose chunks to make different images).

once each image carries its full history

Out of curiosity, will it be possible to do, e.g., "apt-get clean" after the image has been built, and end up with less disk space used?


@mattwallington

mattwallington commented Dec 3, 2013

I'm assuming this didn't make it into 0.7 as previously mentioned. Any plans for the next release?


@vmadman

vmadman commented Dec 28, 2013

Am I understanding this correctly? An image can only have a maximum of ~40 RUN/ADD statements in its entire lifetime.. including inheritance?


@shykes

Collaborator

shykes commented Dec 28, 2013

The limit is now 127 layers. That's the hardcoded maximum layers in an aufs mount.

We are working on lifting this by separating commit history from logical history.


@vmadman

vmadman commented Dec 29, 2013

Ah... great. Any limit in that regard would be really bad, I think, as it completely eliminates the ability to put anything that updates inside of Docker, e.g. a website deployment.


@shykes

Collaborator

shykes commented Dec 29, 2013

You can easily update your website any number of times without increasing the number of layers. Just re-run "docker build" for each version of the website. Build caching will make sure it's fast and doesn't waste disk space.


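As a concrete sketch of what shykes describes (paths and package names here are purely illustrative): if the site content is added in the last build step, a rebuild after a content change re-uses every cached layer above it, so the image keeps the same number of layers across releases.

```dockerfile
FROM ubuntu
# System dependencies change rarely, so these layers come straight from the
# build cache on every rebuild.
RUN apt-get update && apt-get install -y nginx
# Only this layer is rebuilt when the site content changes; the total layer
# count stays constant no matter how often you release.
ADD site/ /var/www/site/
```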

@monokrome

monokrome commented Jan 10, 2014

See #3116 for another potential user interface suggestion for solving this same problem.


@garo

garo commented Jan 21, 2014

I'm fine with the Dockerfile build committing each RUN/ADD command into another layer; it makes developing the Dockerfile really fast. But after the build command completes without error, I'd very much like it to flatten all the steps, so that the end result is one image added on top of the FROM image, instead of having to ship and push all the images in between.


@daviddyball

daviddyball commented Jan 23, 2014

@garo, I think it should be optional how much you squash the image. That way people could choose whether or not to inherit from base images. Being able to choose which image ID to compress down to would be a benefit.

e.g. Given the following tree:

```
└─ad18ff9f83df Virtual Size: 484.7 MB Tags: myimage:latest
    └─f45f88e50248 Virtual Size: 552.7 MB
        └─3e5747d65960 Virtual Size: 552.8 MB
            └─8c381ae7a086 Virtual Size: 563.6 MB
                └─13d909f018b8 Virtual Size: 563.6 MB Tags: ubuntu:12.04
```

```docker squash [IMAGE] [FROM] [TO]```

```docker squash myimage:latest 13d909f018b8 ad18ff9f83df```

The above example would result in there only being two images at the end, the base image `13d909f018b8` and `ad18ff9f83df` (myimage:latest).

Just an idea. I've not even looked into the way AUFS works, so this is purely an idea from an end-user perspective.

EDIT: Fixing tree formatting

@monokrome

monokrome commented Jan 24, 2014

@garo @davidrobertwhite Given the syntax that I recommended in #3116, you can easily perform this explicitly by asking for only one COMMIT at the end of the Dockerfile. Without any COMMIT rules, it will use the current solution (one layer per RUN), and if you provide multiple COMMIT rules then you will be explicitly saying "I want these pieces as separate layers".

The benefits of this approach is that it's more forward compatible in some ways. For instance, it's possible that Docker might allow for (assuming it doesn't already) parallelized downloading of layers from the index. If you make everything into one giant layer, then you've effectively reduced the usefulness of such a feature. It's better for those kinds of things to be explicitly requested rather than implicitly. Even in the case without parallel downloads from the index, it's nice to have a few smaller layers than one giant one. That way, if a download fails, you don't have to re-download everything again.

Furthermore, providing this also allows people to say "This layer updates the system", "This layer is where I installed Java", "This is where I installed the services for this machine" by putting a separate COMMIT rule at each point in the Dockerfile. The objection that this makes development more difficult doesn't really hold, because there could be a flag like --commit-all to ignore the COMMIT rules during development. Manually ignoring vs. manually requesting is better in this case, because the Dockerfile should represent what it does by default unless explicitly asked not to.

TLDR: It's important to have some way of specifically asking for AUFS to commit at specific points, because the number of changes that one RUN command can make is very arbitrary, and squashing everything can cause problems as easily as not allowing a user to squash anything.

monokrome commented Jan 24, 2014

@garo @davidrobertwhite Given the syntax that I recommended in #3116, you can easily perform this explicitly by asking for only one COMMIT at the end of the Dockerfile. Without any COMMIT rules, it will use the current solution (one layer per RUN), and if you provide multiple COMMIT rules then you will be explicitly saying "I want these pieces as separate layers".

The benefits of this approach is that it's more forward compatible in some ways. For instance, it's possible that Docker might allow for (assuming it doesn't already) parallelized downloading of layers from the index. If you make everything into one giant layer, then you've effectively reduced the usefulness of such a feature. It's better for those kinds of things to be explicitly requested rather than implicitly. Even in the case without parallel downloads from the index, it's nice to have a few smaller layers than one giant one. That way, if a download fails, you don't have to re-download everything again.

Furthermore, providing this also allows for people to say "This layer updates the system", "This layer is where I installed Java", "This is where I installed the services for this machine" by putting a separate COMMIT rule at each point in the Dockerfile process. The suggestion that makes development more difficult is a bit NIL, because there could be a flag to --commit-all or something similar. This could effectively allow someone to ignore the COMMIT rules. Manually ignoring vs manually requesting in this case is better, because the Dockerfile should represent what it is doing by default unless explicitly requested not to.

TLDR: It's important to have some way of specifically asking for AUFS to commit at specific points, because the number of changes that one RUN command can make is very arbitrary, and squashing everything can cause problems as easily as not allowing a user to squash anything.

@mattwallington

mattwallington commented Jan 24, 2014

What about the case where you aren't using a Dockerfile, but instead have multiple commits on an image that update the same files? You'd have multiple layers of edits to the same file and need to compact them into a single layer, without keeping a history of all of the prior changes/layers.



@monokrome

monokrome commented Jan 30, 2014

@mattwallington For that case, you could have a command-line flag to list specific commits to merge down or something similar. Some possible usages could be:

--commits=layer1,layer5,layer7
--commits=all

Once again, this emphasizes explicitly squashing everything rather than making it a (potentially harmful) default. When the flag is not provided, we can assume all commits are kept.


@thaJeztah

Member

thaJeztah commented Dec 30, 2015

Flattening images / nested builds is still on the radar, but changes in that area are on hold until the builder is split from the daemon; see https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax. That split is being actively worked on, but requires a lot of refactoring.


@TomasTomecek

Contributor

TomasTomecek commented Jan 8, 2016

@tiborvass wrote:

For relieving the immediate pain, we also suggest updating the documentation on how to manually squash layers if people really need it, with external tools. @crosbymichael volunteered on writing such a tool.

Did @crosbymichael write such tool? Will it be even possible to write such tool after 1.10 is out?


@bmeng


bmeng commented Jan 19, 2016

@foxx cool!

@foxx

foxx commented Jan 19, 2016

@TomasTomecek The other third party tool was docker-squash, but it's /very/ unstable. The best you can hope for right now is the approach I mentioned earlier.

Sadly, the docker export command does not support ranged extraction either, so you can only merge from the start up to X, rather than from X to Y. This means that if you perform a merge on every build release, you'll have to re-upload the entire image rather than a small portion of it.

Another option is to use Docker in combination with pip, where you package up your application as a pypi package and then only push new Docker containers when you need to update your system libs/deps (which ideally should be at least once a day, to ensure you're getting security patches). However, this means having a bootstrapper inside your container which is then running on your production boxes, a particularly nasty devops pattern.

You can reduce the impact of slow uploads slightly by using something like quay.io or Amazon ECS, and so long as you're building the containers on a reliable and speedy CI service such as CircleCI or Travis CI, then you can /just about/ achieve "github to production" in ~10 mins. This will require you to use a lot of provisioning optimizations (such as apt-fast) as well as segregated Dockerfile builds so that you can push application updates without having to re-build the system container.

None of this advice changes the fact that Docker is fundamentally flawed, but you at least have some knowledge on how to workaround these problems if you so choose to put Docker into production. As for development usage, you'll have to put up with the slow speeds or move to a different solution such as Vagrant.

I'll be touching on this topic more in my next blog post, see here for previous discussion.

tl;dr - It's unlikely that you'll see any improvement in this area for at least another 18 months, by which time Rocket should have reached production maturity.
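
For reference, the export/import approach mentioned above looks roughly like this (the image and container names are made up for illustration). Note that `docker import` produces a single-layer image but discards image metadata such as `CMD`, `ENV` and `EXPOSE`, which then has to be re-applied:

```
# Create (but don't start) a throwaway container from the fat image,
# stream its flattened filesystem out, and re-import it as one layer.
docker create --name flatten-tmp myrepo/app:latest
docker export flatten-tmp | docker import - myrepo/app:flat
docker rm flatten-tmp
```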


@TomasTomecek

Contributor

TomasTomecek commented Jan 20, 2016

@TomasTomecek The other third party tool was docker-squash, but it's /very/ unstable. The best you can hope for right now is the approach I mentioned earlier.

The instability of docker-squash was the reason we wrote our own tool: https://github.com/goldmann/docker-scripts#squashing and we are using it in production now.

Sadly the docker export command does not support ranged extraction either, therefore you can only merge from start to X, rather than X to X. This means that if you perform a merge on every build release, you'll have to re-upload the entire image, rather than a small portion of it.

This is the exact reason we discarded the solution.


@foxx

foxx commented Jan 20, 2016

Interesting, your library seems to have a decent amount of tests as well. I'll give this a mention in the upcoming article, as it looks to serve as a decent workaround.


@beorn

beorn commented Jan 31, 2016

I think flattening of layers should be a standard part of Docker, via syntax in the Dockerfile, so that all of the vendor-provided Docker images on Docker Hub can use it. It's important to provide small base images, and it'd be a shame if that had to be done through external import/export tools, losing the transparency Dockerfiles give us into how images came to be.

Personally, I think a simple syntax that should work is just to allow any Dockerfile command (or any where it makes sense) to be preceded with an AND keyword, a slightly generalized version of the above feature request, e.g.,

WORKDIR /app
AND COPY requirements.txt /app/
AND RUN apt-get update
AND RUN apt-get install some-build-dependencies
# Install app
# (comments are ignored, so following commands are still in the same layer)
AND RUN pip install -f requirements.txt
AND RUN apt-get purge --auto-remove -y some-build-dependencies
AND RUN apt-get autoremove && apt-get clean

# following command not part of above layer
RUN apt-get install

It's easy enough to use # comments to document each layer and what's going on - that's what comments are for anyways.

Also, for when you're troubleshooting builds, it'd probably be good to have a way to tell Docker to ignore the AND keyword and create layers anyways (through a command line option or similar). So you can develop docker images with the full power of the per-line cache, but production/distributed images are not built that way.
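
For comparison, the closest you can get today without new syntax is chaining commands inside a single RUN with &&, which collapses them into one layer (so the purged build dependencies never persist in any layer), at the cost of losing the per-line cache:

```dockerfile
WORKDIR /app
COPY requirements.txt /app/
# One RUN = one layer: the build dependencies installed here are purged
# before the layer is committed, so they never end up in the image.
RUN apt-get update && \
    apt-get install -y some-build-dependencies && \
    pip install -r requirements.txt && \
    apt-get purge --auto-remove -y some-build-dependencies && \
    apt-get autoremove && apt-get clean
```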


@cgrandsjo

cgrandsjo commented Feb 2, 2016

My opinion is that when working with a Dockerfile the default behaviour should be to only create one additional layer on top of the base image, independent of how many commands the Dockerfile contains.

If you for some reason want to add an extra layer then there should be an ADDLAYER command available that you could insert at any line in the Dockerfile to separate the layers.

Default behaviour, creates one layer on top of the Ubuntu image:

FROM ubuntu
MAINTAINER Me me@host.com
INCLUDE Dockerfile.dependencies
INCLUDE Dockerfile.base
INCLUDE Dockerfile.web

Using ADDLAYER to create two layers on top of the Ubuntu image:

FROM ubuntu
MAINTAINER Me me@host.com
INCLUDE Dockerfile.dependencies
ADDLAYER
INCLUDE Dockerfile.base
INCLUDE Dockerfile.web


@justincampbell

justincampbell commented Feb 2, 2016

@cgrandsjo That would cause the cache to not be used by default.


@cgrandsjo

cgrandsjo commented Feb 2, 2016

@justincampbell: Sorry, could you elaborate on your answer? Omitting the ADDLAYER command would effectively mean that an ADDLAYER is added "silently" at the end of the file, and the next time you build with the Dockerfile the cache would be used, because an additional layer was created.

Update:
Actually I just realized that it depends on how you modify the Dockerfile, whether the cache will be used or not. Maybe the default behaviour should be as it is right now and for those who know what they are doing, there should be a docker build option to "squash" intermediate layers if that is desired.


mishunika commented Feb 17, 2016

Yay, now I have images of around 30 gigs in size, and their actual size should not be more than 10 GB!

So yeah, just stepped into the same difficulty, and I was thinking that Dockerfile is really lacking some kind of COMMIT action (inspired by DB transactions) to decide where a layer should end.
Then I found this issue and I've read other commit-related ideas that are in fact the same. I think that the user should be able to specify explicitly what the layers should contain and when they should start/end.

Furthermore, in my opinion, the issue with caching is not a big one. It can be the same as now, but delimited by the commit/addlayer levels: if something has changed in such a block, then no cache is used at all for this level. And even more, the commit thing can be optional, and the default behavior can be maintained as it is now.

foxx commented Feb 17, 2016

@mishunika See my previous answer, and also @TomasTomecek, for a workaround. Don't bother trying to push this proposal with Docker, it ain't going to happen any time soon (see previous comments from core devs)

campbel commented Feb 19, 2016

@TomasTomecek is https://github.com/goldmann/docker-scripts#squashing a suitable tool for reducing image size?

For instance given a docker file:

FROM baseimage

RUN apt-get install buildtools
ADD / /src

RUN buildtools build /src

RUN apt-get remove buildtools
RUN rm -rf /src

After building and squashing, would the resulting image lose the size of the src and buildtools?

Contributor

goldmann commented Feb 19, 2016

@campbel That's correct. This tool will remove unnecessary files. I haven't tested it with ADDing the root filesystem (/) to the image (it's generally a very bad idea), but I understand that this is just an example.

Please note that Docker 1.10 support is still in the works (see the v2 branch). Feel free to open any issues.

yoshiwaan commented Mar 18, 2016

I think the AND and ADDLAYER options mentioned above are useful for certain situations (such as controlling what to and what not to cache), but if you are chaining builds from images you control and later builds are removing things from the upstream builds then they don't help with the size problem.

Something as simple as a --squash option to docker build which looks through the layers and removes whiteout files along with the underlying files they delete from lower layers (correct me if I'm wrong, but that's my understanding of how it works) would be extremely useful.

It's the same as when you use git really: sometimes you want to rebase, sometimes you want full commit history, and sometimes you just want to squash all that noise out of there.
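The whiteout mechanics described above can be sketched in miniature: treat each layer as a map of path → contents, record a deletion as a `.wh.`-prefixed entry, and fold the layers bottom to top. This is only an illustration of the concept (real layers are tar archives with metadata), not Docker's implementation:

```python
WHITEOUT = ".wh."  # AUFS marks a deleted file with a ".wh." prefix

def squash(layers):
    """Fold a bottom-to-top stack of layers into a single layer,
    dropping whiteout entries and the shadowed files they delete."""
    merged = {}
    for layer in layers:
        for path, content in layer.items():
            dirname, _, name = path.rpartition("/")
            if name.startswith(WHITEOUT):
                # a whiteout removes the file from below instead of adding one
                target = (dirname + "/" if dirname else "") + name[len(WHITEOUT):]
                merged.pop(target, None)
            else:
                merged[path] = content
    return merged

layers = [
    {"bin/app": "v1", "tmp/src.tar": "<400 MB>"},  # build inputs
    {"bin/app": "v2"},                             # file changed again
    {"tmp/.wh.src.tar": ""},                       # tarball deleted on top
]
print(squash(layers))  # {'bin/app': 'v2'} -- only the live file survives
```

Without squashing, all three layers ship; after squashing, the 400 MB entry and its whiteout simply vanish.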

sivang commented May 26, 2016

So, is this going to be a feature in docker or already solved in the stable release somehow?

Contributor

cpuguy83 commented May 26, 2016

@sivang Maybe, now that the image format has been changed: #22641.

Please don't spam the PR, though.

foxx commented May 26, 2016

@sivang I'll be surprised if you see this feature in a release before 2017. If you need a quick fix, read previous suggestions or check out the far superior option, rkt

Member

thaJeztah commented May 26, 2016

Thanks for the commercial break, @foxx

sivang commented May 26, 2016

Well, I used jwilder's docker-squash, it seemed to have done the flatten job but loading the image back doesn't show it on the docker images list...

JonathonReinhart commented Jul 1, 2016

Yet another disappointment from the Docker team; not because some feature doesn't exist, but because of a dismissive attitude by the maintainers.

@tiborvass said (#332 (comment)):

The problem with this issue is that it provides a solution to a problem that yet has to be defined
...
We're closing this issue. Would love to continue the debate on more focused issues.

Perhaps he didn't read the original issue (which was opened over three years ago), which very clearly stated:

There are some cases where one starts with a base image (or another image), changes some large files in one step, changes them again in the next and deletes them in the end. This means those files would be stored in 2 separate layers and deleted by whiteout files in the final image.

These intermediary layers aren't necessarily useful to others or to the final deployment system.

I don't understand what is "yet to be defined" or not focused about that, but in case you need something concrete:

FROM debian

# This line produces an intermediate layer 400 MB in size
ADD local_400MB_tarball_im_about_to_install.tar /tmp

# This line installs some software, and removes the tarball.
# Let's say it produces a layer with 20 MB of binaries
RUN cd /tmp && tar xf local_400MB_tarball_im_about_to_install.tar && cd foo && make install && cd /tmp && rm local_400MB_tarball_im_about_to_install.tar

The end result is that this image is sizeof(debian) + 420 MB in size, when 400 MB of it was removed.

Perhaps if issues were addressed instead of dismissed, this project wouldn't have nearly as many issues in its history as it does commits.
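The usual workaround for exactly this pattern, short of squashing, is to make the tarball come and go within a single RUN instruction, e.g. by fetching it over the network instead of ADDing it, so that no layer ever contains it. A sketch only; the URL and directory names are hypothetical stand-ins:

```dockerfile
FROM debian

# Fetch, unpack, build, and delete in one instruction: the resulting
# layer holds only the ~20 MB of installed binaries, never the 400 MB
# tarball. (The URL below is a placeholder.)
RUN apt-get update && apt-get install -y curl \
 && curl -fsSL https://example.com/software.tar -o /tmp/software.tar \
 && tar -xf /tmp/software.tar -C /tmp \
 && make -C /tmp/foo install \
 && rm -rf /tmp/software.tar /tmp/foo /var/lib/apt/lists/*
```

The trade-off is a long, cache-unfriendly instruction, which is precisely the ergonomics complaint running through this thread.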

Contributor

cpuguy83 commented Jul 1, 2016

@JonathonReinhart The problem is that this issue discusses a particular solution rather than the underlying problems.
In reality, squashing is a stop-gap for a particular problem that is an implementation detail of the current storage subsystem... i.e., we don't need squashing if/when the storage subsystem is replaced with a better solution.

Thank you for your kind and thoughtful comments.

JonathonReinhart commented Jul 1, 2016

@cpuguy83 Sarcasm isn't necessary when someone is expressing frustration.

Regardless of whether or not this is the right solution, people will find this issue when looking for a solution to a very common problem. When you see that the issue is closed, you'll immediately wonder "Why was this closed? Was it fixed?", and when you see that it was closed with a message essentially stating, "Sorry, too vague, try again", that is a good way to frustrate and alienate users.

I am a big supporter of Docker, and advocate many different types of projects to use it. I think that it would greatly help the project if issues like this were handled better. Specifically, I think when @tiborvass closed this issue, it should have been locked (so the "resolution" of the issue didn't get buried in the middle of the page), and included a reference to other issue(s) where the problem(s) could be discussed in the "more focused" fashion he was advocating for.

foxx commented Jul 1, 2016

to a particular problem that is an implementation detail of the current storage subsystem

@cpuguy83 The entire implementation of Docker is fundamentally flawed, and this issue is just one of many such issues. So unless you are planning on rewriting the entire Docker platform from scratch, then flattening images is the best you're going to get.

The problem with this issue is that it provides a solution to a problem that yet has to be defined

@tiborvass I think it's pretty clear what the problem is, don't you?

monokrome commented Jul 1, 2016

Problem: We have n layers when the results of actions performed in order to create each one only need to be in 1 layer.
Solution: ?!?!?!?!?

ohjames commented Jul 14, 2016

when you see that it was closed with a message essentially stating, "Sorry, too vague, try again", that is a good way to frustrate and alienate users.

I see hundreds of people defining a very, very clear problem... Hundreds of users all in unanimous agreement that docker handles layers in a way that doesn't make sense to them. Yet the people on the inside actually developing it are the only ones who feel that hundreds of community members all agreeing with each other and stating the same thing haven't "defined" the problem.

Even if I did agree that the problem wasn't clearly well defined (and I definitely don't), the way the core developers have responded to the community basically shows contempt. As for the solutions, how docker-squash manages to be so slow and delay our build time for so long on such a tiny set of layers, I don't know... can't wait for rkt.

zerthimon commented Jul 14, 2016

@ohjames +1
I feel the same thing. I asked for a few features before, and they all were rejected with the following reasons:

  1. It will hurt portability
  2. It will hurt security

When will this project realize users don't like to be FORCED to have portability and security at the price of productivity?
How about adding the features users ask for, so the USER HAS THE CHOICE and DECIDES FOR HIMSELF if he wants to use them even if they hurt portability and security.

Can't wait for someone to fork this project and make it more friendly to its users.

Contributor

justincormack commented Jul 14, 2016

There is an open PR for flattening: #22641

Member

vdemeester commented Jul 14, 2016

It took me a while to decide to answer something here, but I feel I need to pinpoint some stuff.

First, as @justincormack said, there is a PR for flattening (#22641) — thus maybe we could reopen that issue as we are trying to, maybe, have it built-in.

How about adding the features users ask for, so the USER HAS THE CHOICE and DECIDES FOR HIMSELF if he wants to use them even if they hurt portability and security.

I'm gonna quote Nathan Leclaire here (from The Dockerfile is not the source of truth for your image):

The Dockerfile is a tool for creating images, but it is not the only weapon in your arsenal.

Dockerfiles and docker build are only one way to build Docker images. It is the default, built-in one, but you definitely have the choice to build your image with other tooling (and there is some: packer, rocker, dockramp, s2i… to only list a few). One of the focuses of Dockerfile is portability, and thus this is one of the main concerns when discussing features of Dockerfile and docker.

If you don't care about portability, or if the Dockerfile possibilities are too limited for your use cases, again, repeating myself, you are free to use other tooling to build images. Docker does not force you to use Dockerfiles to build images — it's just the default, built-in way to do it.

On the Dockerfile and image-building subject, I highly recommend people watch a talk from Gareth Rushgrove at DockerCon16: The Dockerfile Explosion and the need for higher level tools.

rvs commented Jul 14, 2016

@justincormack Justin, what I really would love to see is very similar to #22641 but with the full power of git rebase (especially git rebase --interactive). Do you think it is feasible?

docbill commented Jul 15, 2016

It is important to distinguish what is needed per this request, and what is desired. What is desired is to refactor docker images into a git or git-like repository view, so all the cool features one does with git would be possible in docker. For example, I have a docker image for plex. It is basically a fedora base, with a download of the plex build on top. I have the container set to autobuild on the docker hub. So every time the Fedora base changes, it rebuilds, even though most of the time the plex download does not change. What annoys me about that one is that when I pull the update, the layer for adding the plex download is treated as brand new, even though it is byte-for-byte identical to the original delta. With a git-like repository that could be handled, making a docker pull a much, much more efficient operation.

That is the desired...

However, the ask is simply to have a standard way to flatten an image. So let's say as part of my build I downloaded the plex source, installed the developer dependencies, compiled it, and then deleted everything except the actual build. All the tools for the build would still be layers in the image, even though they would be inaccessible from my final image. It is a huge waste... If the container could be flattened to remove unneeded layers, it would be a much, much more efficient use of space and bandwidth.

That is the required...

Don't say you can't do the required because the desired is too much work... Just hit the low-hanging fruit first, and everyone will be much happier.

jdmarshall commented Jul 19, 2016

@vdemeester

| The Dockerfile is not the source of truth for your image

I think some of the people asking for features like this one are more comfortable with the truth in this statement than some of the people with 'Docker member' after their names. There are, for instance, a number of security-related issues that have been closed-won't-fix with the reason that docker build should be repeatable.

Contributor

cpuguy83 commented Nov 2, 2016

For those interested, we just merged --squash on docker build.
This will squash the final result of the build down to its parent image (i.e. the FROM).
#22641
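For reference, the flag shipped as an experimental feature, so the daemon has to be started in experimental mode for docker build to accept it. Typical usage looks roughly like this (the image name is a placeholder):

```shell
# requires the daemon to run with --experimental
# (or "experimental": true in /etc/docker/daemon.json)
docker build --squash -t myapp:latest .

# inspect the result: the layers produced by the build are
# collapsed into a single layer on top of the FROM image
docker history myapp:latest
```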

rtyler pushed a commit to rtyler/docker that referenced this issue Feb 23, 2018

Merge pull request #332 from jeanlouisboudart/master
Fix #331 backport  JENKINS_UC_DOWNLOAD feature in install-plugins.sh