Proposal: Nested builds #7115

Closed
shykes opened this Issue Jul 19, 2014 · 57 comments

Comments

@shykes
Collaborator

shykes commented Jul 19, 2014

Some images require not just one base image, but the contents of multiple base images combined as part of the build process. A common example is an image with an elaborate build environment (base image #1), but a minimal runtime environment (base image #2) on top of which the binary output of the build is added (typically a very small set of binaries and libraries, or even a single static binary). See, for example, "create lightweight containers with buildroot" and "create the smallest possible container".

1. New Dockerfile keywords: IN and PUBLISH

IN defines a scope in which a subset of a Dockerfile can be executed. The scope is like a new build, nested within the primary build. It is anchored in a directory of the primary build. For example:

FROM ubuntu

RUN apt-get install -y build-essential
ADD . /src
RUN cd /src && make

IN /var/build {
    FROM busybox
    EXPOSE 80
    ENTRYPOINT /usr/local/bin/app
}

RUN cp /src/build/app /var/build/usr/local/bin/app

PUBLISH /var/build

PUBLISH changes the path of the filesystem tree to use as the root of the image at the end of the build. The default value is / (i.e. "publish the entire filesystem tree"). If it is set to, e.g., /foo/bar, then the contents of /foo/bar are published as the root filesystem of the image. All filesystem contents outside of that directory are discarded at the end of the build.

Behavior of RUN

When executing a RUN command in an inner build, the runtime uses the inner build directory as the sandbox to execute the command. So for example: IN /foo { touch /hello.txt } will create /foo/hello.txt.

Behavior of ADD

When executing ADD in an inner build, the original source context does not change. In other words, ADD . /dest will always result in the same content being copied, regardless of where in the Dockerfile it is invoked. Note: the destination of the ADD will change in a nested build, since the destination path is scoped to the current inner build.
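A minimal sketch of this scoping rule (hypothetical, since IN is only proposed here):

```dockerfile
FROM ubuntu
ADD . /src            # source context copied to /src of the outer build

IN /var/build {
    FROM busybox
    ADD . /src        # same source context, but the destination is scoped:
                      # the files land in /var/build/src of the outer tree
}
```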

The outer build can access the inner build

Note that filesystem changes caused by the inner build are visible from the outer build. For example, /usr/local/bin was created by FROM busybox and is therefore accessible to the final RUN command in the build.

Behavior of PUBLISH

Also note that PUBLISH /var/build causes the result of the inner build (the busybox image) to be published. Everything else (including the outer Ubuntu-based build environment) is discarded and not included in the image.

@SvenDowideit

Contributor

SvenDowideit commented Jul 21, 2014

I asked if we could invert the syntax and achieve the same function - and after lots of IRC discussion I think the answer is not really.

This Proposal has some interesting possible effects that we should list:

  • you can use IN and PUBLISH entirely independently.
  • there may be a third parameter to PUBLISH to give it a subname (perhaps registry/image/subname:tag when you docker build -t registry/image:tag)
  • you could PUBLISH more than once
  • you could overlay more than one IN / {FROM app} to do image mixins - and PUBLISH any dir you like, including leaving it as default

some of these may be bad, some may just need more info in the proposal :)
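For illustration, the subname idea might look like this (entirely speculative syntax; the subname parameter and resulting tags are not part of the proposal text):

```dockerfile
FROM ubuntu
RUN apt-get install -y build-essential
ADD . /src
RUN cd /src && make

IN /var/build-server { FROM busybox }
IN /var/build-cli    { FROM busybox }

RUN cp /src/build/server /var/build-server/usr/local/bin/server
RUN cp /src/build/cli    /var/build-cli/usr/local/bin/cli

PUBLISH /var/build-server server    # hypothetical: registry/image/server:tag
PUBLISH /var/build-cli    cli       # hypothetical: registry/image/cli:tag
```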

@timthelion

Contributor

timthelion commented Jul 21, 2014

Hm, @shykes' version makes more technical sense, whereas @SvenDowideit's version seems more logical. I'm +1 for @SvenDowideit's version.

@erikh erikh added the Proposal label Jul 21, 2014

@srlochen

srlochen Jul 21, 2014

+1 Having the ability to inject build/test dependencies and discard them at publishing time would simplify a lot for our docker build/release pipelines.


@erikh erikh removed the Proposal label Jul 21, 2014

@vmarmol

Contributor

vmarmol commented Jul 21, 2014

It would also potentially make the final images much smaller :)

@wyaeld

wyaeld commented Jul 21, 2014

Can someone elaborate on where/how layer caching would fit into either use case? Given the stated goal of minimizing overall size, is the inner build cached completely as a separate image, with only the result added to the parent layer?

The build process is typically the most time consuming, and benefits the most from caching.

@proppy

Contributor

proppy commented Jul 22, 2014

I'm not sure if the context needs to be implicitly added/bound in the inner image fs (this could maybe be introduced later and separately from this proposal).

I deleted my earlier syntax change suggestion and created a separate proposal to discuss a more explicit way to bind the context, as per IRC discussion, see #7149.

@shykes

Collaborator

shykes commented Jul 22, 2014

Guys I ask that you focus on criticizing the proposal instead of pushing completely different proposals in the comments. By all means create a separate issue if you have a proposal of your own!

Thanks.

@proppy


Contributor

proppy commented Jul 22, 2014

@shykes, agreed - switching to constructive criticism mode.

IN defines a scope in which a subset of a Dockerfile can be executed

Please specify which subset (are ADD and COPY available?)
Also specify what is the context of an inner build (inside IN{}).

@shykes

Collaborator

shykes commented Jul 22, 2014

@proppy

Please specify which subset (are ADD and COPY available?)

I didn't mean a subset of available instructions (all instructions should be available), but a subset of the Dockerfile content - in other words, whatever is enclosed in the curly braces. Happy to change the wording to something more clear.

Also specify what is the context of an inner build (inside IN{}).

The source context would be the same in all images. In other words, ADD . /dest will always result in the same content being copied, regardless of where in the Dockerfile it is invoked. Note: the destination of the ADD will change in a nested build, since the destination path is scoped to the current inner build.

@proppy

Contributor

proppy commented Jul 22, 2014

@shykes, thanks! I suggest adding this to your original proposal description, as those were the first questions I had while reading it.

@proppy

Contributor

proppy commented Jul 22, 2014

It is anchored in a directory of the primary build

What happens if a file exists in both the anchored directory and the fs of the base image used in the FROM of the inner build? Does the anchored directory have to be empty, or will IN fail otherwise? Are multiple INs with the same anchored directory forbidden?

@fiadliel

fiadliel Jul 22, 2014

I have a possible use case for nested builds which doesn't seem to be covered (yet) by this proposal.

In some cases, the information written into a Dockerfile is duplicated information from an existing build system, which could have been auto-generated instead.

It would be nice if (optionally) the nested build would look for a Dockerfile at the root of the filesystem for the nested build, at that point in the build process. This means that previous steps could generate the Dockerfile and build context used to create the image.

More concretely, http://www.scala-sbt.org/sbt-native-packager/DetailedTopics/docker.html#tasks shows an example where a build system can create a Dockerfile and context, ready to use with Docker.

One example implementation here could be to look for a second Dockerfile if IN /var/build included no commands to execute.


@vbatts

Contributor

vbatts commented Jul 22, 2014

@shykes after looking over this proposal, it satisfies the use-case that #4933 was targeting.

Also, to further this functionality, the path argument to IN ought to expand ENV variables declared in the parent Dockerfile. This way something like $DESTDIR would be a natural flow from build image to runtime image.

Another topic: how will this relationship be tracked in the stored image metadata? Will the IN image track the outer image or its FROM as the parent? Will there need to be an additional field for this, or perhaps a no-op record layer indicating where the image came from or which image copied bits into it?
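The $DESTDIR flow described above might look like this (hypothetical sketch, assuming IN expands ENV variables the way other instructions do):

```dockerfile
FROM ubuntu
ENV DESTDIR /var/build    # declared once in the parent Dockerfile

RUN apt-get install -y build-essential
ADD . /src
RUN cd /src && make DESTDIR=$DESTDIR install

IN $DESTDIR {             # hypothetical: expands to /var/build
    FROM busybox
    ENTRYPOINT /usr/local/bin/app
}

PUBLISH $DESTDIR
```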

@shykes

Collaborator

shykes commented Jul 22, 2014

@proppy updated

@SvenDowideit

Contributor

SvenDowideit commented Jul 23, 2014

@shykes on irc, you mentioned the possibility of having more than one PUBLISH instruction in a single Dockerfile. until the subname functionality lands, can you please define what happens when there are multiple PUBLISH instructions, possibly with different paths, and possibly in different places in the Dockerfile.

Similarly, can you define what happens with multiple INs?

Oh - and nesting: can I have an IN inside an IN (and how deep), and can I have a PUBLISH inside an IN - what does that do?

I'm curious how IN will work - will the outer build create a context, upload that to the Daemon to build fresh, then download it and insert the result, or will it happen in the same build, thus possibly have access to the original context?

Can we define what happens when the IN /dir is not empty? (1: error; 2: contents discarded before we enter; 3: the new image starts from there and magically mixes its FROM fs in.)

I'm thinking I could use this as a build pipeline for boot2docker, with the final PUBLISHed image containing the docker and boot2docker binaries and the installers - each of which is built IN separate inner sections, and all the working is discarded. (or better, each is PUBLISHed separately). Is that a useful use-case?

@ibuildthecloud

Contributor

ibuildthecloud commented Jul 24, 2014

I very much like (and need) this functionality. My main comment is that when I first read the Dockerfile, I didn't understand what was going on. It took me a bit to get it. So a couple of comments:

  1. I think IN is a bit too abstract of a keyword. What about BUILDIN, to indicate you are doing a build in that directory?

  2. If we go with this feature, I think people will immediately want to externalize the Dockerfile of the inner build. So a syntax like BUILDIN /var/build Dockerfile, where Dockerfile is interpreted the same as the SRC in an ADD command.

  3. PUBLISH directory seems a bit problematic. It seems you should only be able to publish a directory that was first specified by IN. You wouldn't want to allow publishing any random folder, because then the resulting image would have to be a full cp/tar of the directory. We would lose the image layering (unless there's a clever approach I don't know). I wonder if we can invent a syntax in which the IN context is named, like IN /var/build BINARIES { ... } and then PUBLISH BINARIES. The name should be optional, because people may not always want to publish the inner context.

A final general comment is how we are going to layer the inner context. It seems that with each ADD or RUN command in the outer Dockerfile context you could be modifying the contents of /var/build. So (assuming we're bind-mounting /var/build) you would need to create a new layer for the parent context and then all inner contexts for every Dockerfile directive. The implementation of this could get messy.

It would be cleaner to implement if we explicitly knew, for each Dockerfile instruction, whether it was going to modify one of the contexts. For example, the syntax below would be easier to implement IMO, but it is uglier.

FROM ubuntu

RUN apt-get install -y build-essential
ADD . /src
RUN cd /src && make

BUILD BINARIES {
    FROM busybox
    EXPOSE 80
    ENTRYPOINT /usr/local/bin/app
}

WITH ["BINARIES:/var/build"] RUN cp /src/build/app /var/build/usr/local/bin/app

PUBLISH BINARIES
@icecrime

Contributor

icecrime commented Jul 25, 2014

I can see how the proposal elegantly solves the issue of complex build workflows, but don't you fear it'll be misused as a means to "sum" images? For example in:

FROM busybox
IN /redis/ { FROM redis }
IN /python/ { FROM python }

Perhaps IN and PUBLISH should be merged in a single keyword which does both (run a nested build and publish its result as output of the outer build), which would in effect restrict the feature to a way of defining "build steps" rather than a way of combining images.

@tianon

Member

tianon commented Jul 25, 2014

Honestly, I see that use as a cool bonus feature, especially since the two images are placed neatly in separate directories. The image size will likely balloon in that case, but I don't think that's really avoidable with this feature unless it's implemented very very cleverly (which is obviously possible :P).

@ibuildthecloud

Contributor

ibuildthecloud commented Jul 26, 2014

@tianon I don't think this needs to be implemented by actually copying the contents of the inner build to the outer layer. Instead, set up two rootfs directories for the outer and inner contexts and mount the inner one into /var/build. This means that if you don't publish the inner context, the resulting image will contain none of its contents, because it was bind-mounted.

This approach also means this feature would not be able to "sum" up a bunch of images (which is not something we want to allow).

@SvenDowideit

Contributor

SvenDowideit commented Jul 26, 2014

just to note - I would like to be able to sum up a bunch of images.

Doing so makes Docker interesting from a 'replacement for packages' perspective.

It's basically making a way to turn off (or make a shared space in) the FS namespace.

so @ibuildthecloud @icecrime could you perhaps expand on your opinion - as it doesn't sound like we all have the same fear of doing it :)

@ibuildthecloud

Contributor

ibuildthecloud commented Jul 26, 2014

@SvenDowideit I can't say I'm totally opposed to it in general, but it is a separate topic. This proposal is meant to address the very real issue of separating your build and runtime environments in an elegant way. Anytime a new feature is proposed, you must consider how it might be used in unexpected ways and what that impact would be.

Allowing one to sum up a bunch of images will fundamentally change the nature of images. As you indicated, you move from an image essentially being a "full OS image" to an image being a "package." If we were to go in this direction we will need to invent new concepts and technology to describe, manage, and create images. At this point in time I don't think it would be helpful to bifurcate the nascent image ecosystem. Instead we should focus on the specific issue at hand and not focus on changing the nature of images.

@icecrime

Contributor

icecrime commented Jul 26, 2014

@SvenDowideit Don't give my opinion too much credit, I'm a beginner with Docker ;-) TBH I'm not sure I understand how the 'replacement for packages' perspective relates to images combination.

I just have the impression that "how can I get both X and Y in my Docker image" is a recurring beginner question (that I've been asking myself): there's no easy way to do this today, which is probably a good thing as it encourages the "one process for one container" approach.

To sum up: using IN without PUBLISH as in my previous comment seems to me like providing an accessible way to do a discouraged thing (both technically, by resulting in a bloated image, and functionally, by facilitating multiple-responsibility containers). Thus my question: should we be able to use them independently?

@proppy

Contributor

proppy commented Jul 28, 2014

What makes me uncomfortable with the proposal in its current form is the tight coupling between the inner instructions and the outer ones.

In the example of the description, the outer RUN cp has to know about .../usr/local/bin to match the inner ENTRYPOINT /usr/local/bin/....

And the inner instructions don't need to ADD the binary, unlike a regular Dockerfile used with a binary context.

This creates a model where the inner Dockerfile instructions and the outer ones are unlikely to be composable across images, even more so if this is later combined with something like INCLUDE: /me imagines Dockerfiles only suitable for use in an IN block.

With the existing build model this is nicely abstracted by the context notion, and some docker users already compose builds today, by chaining multiple docker build with external scripts: where the output of the previous build is passed as the context of the next one.

Maybe the description could expand a little more on the methods used today, and which tradeoffs (if any) the proposal has to make to simplify and improve them.
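The chaining pattern described above was a common workaround before any of this landed. A hedged sketch of it (the file names, image tags, and paths are illustrative, and it assumes a local Docker daemon):

```shell
#!/bin/sh
# Build-and-extract pattern: compile in a fat builder image, then feed
# only the artifacts to a second, minimal build as its context.
set -e

# Step 1: build the heavyweight compiler environment and produce the binary.
docker build -t myapp-builder -f Dockerfile.build .

# Step 2: copy the build output out via a temporary (never started) container.
mkdir -p artifacts
docker create --name myapp-extract myapp-builder
docker cp myapp-extract:/src/build/app ./artifacts/app
docker rm myapp-extract

# Step 3: build the small runtime image with only the artifacts as context.
docker build -t myapp -f Dockerfile.runtime ./artifacts
```

The runtime Dockerfile then only needs an ADD/COPY of the binary, which keeps the inner and outer halves composable at the cost of an external script.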


aigarius Jul 29, 2014

This has quite a heavy syntax in the file. I would prefer the combination of #7277 with #6906 (comment) to solve such issues.

It does miss a few nice things from this issue, namely the ability to build in one environment (such as full Debian) and then run in something completely different (like busybox), and to combine multiple outputs, but the syntax is much simpler. And #7277 could actually be merged with this ticket in a way that allows the use of separate images and separate Dockerfiles to define the sub-images.

So the example here could be reformulated as:
INCLUDE Dockerfile.runtime IN /var/build
or even as:
INCLUDE Dockerfile.runtime IN /var/build AS runtime
thus removing the need for the PUBLISH directive altogether.



Contributor

rhatdan commented Sep 10, 2014

Has anyone ever attempted implementing one of these?


Contributor

chancez commented Sep 10, 2014

This would be an amazing thing to have. +1


Member

tonistiigi commented Sep 13, 2014

To me, the syntax/behavior proposed by @proppy in #7149 makes much more sense.

I (like many others here) have trouble understanding how the layering/caching would work under this proposal. I assume the inner build gets its own layers, because otherwise the downloaded image size would still be huge. Are the contents of the inner layers then also copied to the outer layers? Or is it possible that the same layers are used by multiple images at different mountpoints?

Even if only a subdirectory is published, the outside layers still have to be kept around for the caching to work. I don't see a requirement that the published directory has to be used in an IN block beforehand, but then how does the builder know to ignore the contents outside of this directory in the parent layers preceding the PUBLISH step?

I think that the ability to combine different Docker images into one, as suggested by @SvenDowideit, is not related to the original smaller-build problem and would have a better solution with an update to the ADD/COPY command.


Contributor

muayyad-alsadi commented Aug 5, 2015

I find IN ... PUBLISH ... redundant compared to my proposal #15271,
which looks like this: SWITCH_ROOT <other_image> <new_root> ...
where other_image can be scratch or busybox,
and new_root is the directory to be copied from the previous stage before the reset (which typically would be the target/destination of the previous build steps).

Since #13171 is merged (we can cp between containers), the process would be: cp <new_root> from the build container to some local/host tmp, then dump everything and start over with other_image (scratch or busybox in the example) as if it were a new Dockerfile with a new FROM; the only thing that is supposed to be inherited is the maintainer.

The Zen of Python says "flat is better than nested". There is no need for two instructions that mark two positions.
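To visualize that proposal, the original example from the issue description might read like this under the hypothetical (never merged) SWITCH_ROOT syntax — everything here is a sketch of the suggested semantics, not real Dockerfile syntax:

```dockerfile
FROM ubuntu
RUN apt-get install build-essentials
ADD . /src
RUN cd /src && make

# Hypothetical: discard the build filesystem, restart from busybox, and
# copy /src/build from the previous stage in as the new root's contents.
SWITCH_ROOT busybox /src/build
EXPOSE 80
ENTRYPOINT /usr/local/bin/app
```

Compared to IN/PUBLISH, a single instruction marks the one transition point instead of bracketing a nested scope.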


alunduil Aug 20, 2015

+1 for @muayyad-alsadi's idea. It's far better to have a single transition point than a checkpoint/release pair, unless there are other use cases that I missed while skimming this issue.



Contributor

muayyad-alsadi commented Aug 21, 2015

@alunduil @jlhawn the only use case (I can think of) that my proposal won't cover is having a common code base that takes forever to build, from which we want to extract more than one image. Think of it like the libreoffice package: you build one package that takes hours, then you get multiple sub-packages like Writer and Impress, etc.

A real-world example would be having both a server and a client coming from the same source package, where you want to build an image for the server and another one for the client.

I do have an idea. My vision of a Dockerfile is like a spec file in the RPM world: just like a Dockerfile, but instead of building container images it builds a binary package. RPM has a subpackages feature whose syntax looks like this:

Name: foobar
%files server
# foobar-server goes here
%files client
# foobar-client goes here
%files -n python-foobar
# python-foobar goes here (not foobar-python-foobar)

So we need to build sub-images that are auto-prefixed with the tag, like those: we have only one build root and multiple published images.

# the same tag passed to docker build
SWITCH_ROOT <other_image> <new_root> ...
# add -<tag-suffix> to the tag up to :
SWITCH_ROOT_N_PUBLISH <tag-suffix> <other_image> <new_root> ...

For example, docker build -t foobar:2.5 would result in foobar:2.5, foobar-monitor:2.5 and foobar-client:2.5. I have concerns about having a syntax for a full tag, because someone building an image for eggs could end up with an image for spam.


alunduil Aug 21, 2015

I'm very biased (my opinion is that the spec format is horrendous and shouldn't be emulated). I personally don't need or want to publish multiple images from one build, but I do see the utility of it. I really like the idea of two commands (one for just doing a shear and one for a shear with multiple images). This lets me keep my Dockerfile nice and simple while accomplishing the goal (out of published image builds), with the flexibility to do more if need be.


@jessfraz jessfraz removed the kind/proposal label Sep 8, 2015


Contributor

jimmycuadra commented Oct 4, 2015

Any updates on this? Not being able to easily separate the build environment from the final image is one of the biggest pain points in Docker for me. (And yes, I know about https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax.)

jakirkham Oct 14, 2015

I'd also be really interested in seeing something like this, or some variant. From my understanding, it would be very helpful for testing a layer without including testing artifacts in the final tagged commit.



netroby commented Nov 6, 2015

+1 , would like this feature. really useful.


sleaze commented Mar 7, 2016

+1 for docker multiple inheritance functionality


koliyo commented Apr 1, 2016

👍


ionelmc commented Apr 14, 2016

Does this https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax mean this proposal won't be implemented any time soon? (if ever)

srikanthNutigattu May 5, 2016

The proposal needs to be split so that each part can be discussed and closed independently.


mercuriete Jul 26, 2016

👍
Another use case:
a Maven image to build Java artifacts (jars),
then put those artifacts inside a smaller runtime image (a Java JRE).

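For the record, the multi-stage build syntax that eventually shipped covers exactly this Maven case. A minimal sketch — the image tags, paths, and jar name are illustrative:

```dockerfile
# Stage 1: fat JDK + Maven image used only for compiling.
FROM maven:3-jdk-8 AS build
COPY . /usr/src/app
WORKDIR /usr/src/app
RUN mvn -q package

# Stage 2: slim JRE image; only the jar crosses over.
FROM openjdk:8-jre-alpine
COPY --from=build /usr/src/app/target/app.jar /app/app.jar
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
```

Only the final stage is tagged and published, so the Maven repository cache and source tree never reach the runtime image.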


hiroshi commented Oct 16, 2016

Hi, I'm working on a small tool. It can build small Docker images in multiple steps. Some may find it useful.

Contributor

graingert commented Oct 18, 2016

fletcher91 Dec 1, 2016

Since we're posting utilities: I've built one to build minimal Golang images in two steps, based on the scratch image.


xenoterracide Dec 23, 2016

I think multiple inheritance is a bad idea (see the diamond problem), but composable traits are a good one; I wrote on the multiple-inheritance ticket how I think it could be accomplished safely, syntactically.

That said, glancing at this, the issue I'm interested in is syntactic sugar around temporary build layers for multiple && commands.

For example, this nasty piece of code:

# oracle hackery that lies to its bad installer
RUN mv /usr/bin/free /usr/bin/free.bak \
    && printf "#!/bin/sh\necho Swap - - 2048" > /usr/bin/free \
    && chmod +x /usr/bin/free \
    && mv /sbin/sysctl /sbin/sysctl.bak \
    && printf "#!/bin/sh" > /sbin/sysctl \
    && chmod +x /sbin/sysctl \
    && rpm --install /tmp/oracle-xe-$VERSION-1.0.x86_64.rpm \
    && rm /tmp/oracle-xe-$VERSION-1.0.x86_64.rpm* \
    && mv /usr/bin/free.bak /usr/bin/free \
    && mv /sbin/sysctl.bak /sbin/sysctl

The rpm command is actually expensive and takes a while during the build, so if something fails after it (while developing the image) I have to do the whole thing again. What'd be nice is a way to denote layers that are to be flattened in the final build:

RUN mv ... 
FLT curl
FLT tar 
FLT rm tar

or something like that, where if, say, the tar failed (because I typoed the path) I wouldn't necessarily have to run the curl again while developing the file. In the final image these would just look like one layer.



Member

AkihiroSuda commented Apr 4, 2017

Given that we have multistage build now, can we update the status of this "roadmap" issue?
cc @tonistiigi
#32063 #31257


Member

tonistiigi commented Apr 10, 2017

Thanks for the ping, @AkihiroSuda. Let's close this, as #32063, which addresses this problem, is merged.
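For anyone landing on this thread later: the merged multi-stage syntax expresses the original IN/PUBLISH example from the issue description roughly as follows (a sketch — the stage name and artifact path are illustrative):

```dockerfile
# Build stage: full toolchain, corresponds to the outer build above.
FROM ubuntu AS builder
RUN apt-get update && apt-get install -y build-essential
ADD . /src
RUN cd /src && make

# Runtime stage: replaces IN /var/build { ... } plus PUBLISH /var/build.
FROM busybox
COPY --from=builder /src/build/app /usr/local/bin/app
EXPOSE 80
ENTRYPOINT ["/usr/local/bin/app"]
```

The last FROM starts a fresh filesystem, COPY --from pulls artifacts across stages, and only the final stage's layers are published — no PUBLISH keyword needed.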

mercuriete Apr 28, 2017

Thank you very much,
@tonistiigi
I was waiting for this for sooo long
👍

