How do I combine several images into one via Dockerfile #3378

Closed
anentropic opened this issue Dec 29, 2013 · 99 comments
Comments

@anentropic anentropic commented Dec 29, 2013

I have several Dockerfiles to build images which, e.g., set up a PostgreSQL client or set up a generic Python app environment.

I want to make a Dockerfile for my python webapp which combines both those images and then runs some more commands

If I understood the docs correctly, if I use FROM a second time I start creating a new image instead of adding to the current one?

Contributor

@SvenDowideit SvenDowideit commented Dec 29, 2013

You chain them :)

So for example, if you have one Dockerfile that sets up your generic postgres client and generic python app env, you tag the result of that build (e.g. mygenericenv), and then your subsequent Dockerfiles use FROM mygenericenv.

For example:

## Dockerfile.genericwebapp might have FROM ubuntu
docker build -t genericwebapp - < Dockerfile.genericwebapp
## Dockerfile.genericpython-web would have FROM genericwebapp
docker build -t genericpython-web - < Dockerfile.genericpython-web
## and then the specific app I'm testing might have a Dockerfile that contains FROM genericpython-web
docker build -t thisapp .

Author

@anentropic anentropic commented Dec 29, 2013

I can see how to do that, i.e. genericA --> specificA but is there any way to do something like:

genericA --
            \
             ---> specificAB
            /
genericB --

?

Member

@tianon tianon commented Dec 29, 2013

Not through any official means, but some people have had luck manually modifying the image hierarchy to achieve this (but if you do this, you do so at your own risk, and you get to keep all the pieces).

The reason this won't be supported officially: imagine I want to take "ubuntu" and graft "centos" on top. There would be lots of really fun conflicts, causing a support nightmare, so if you want to do things like that, you're on your own.

Author

@anentropic anentropic commented Dec 29, 2013

Ok I see why. I was looking for composable blocks of functionality but maybe this isn't the Docker use case... seems like I should be using it to set up the raw containers then run something like ansible or saltstack on top to configure the software in them.

Contributor

@shykes shykes commented Dec 30, 2013

The idea behind containers is that the smallest unit of real composition is the container. That is, a container is the smallest thing you can produce in advance, not knowing what else it will be combined with, and have strong guarantees of how it will behave and interact with other components.

Therefore, any unit smaller than a container - be it a ruby or shell script, a c++ source tree, a binary on its own, a set of configuration files, a system package, etc. - cannot be safely composed, because it will behave very differently depending on its build dependencies, runtime dependencies, and what other components are part of the composition.

That reality can be partially masked by brute force. Such brute force can be pragmatic and "good enough" (giant Makefile which auto-detects everything for a more portable build of your app) or overly grandiose ("let's model in advance every possible permutation of every dependency and interference between components, and express them in a high-level abstraction!")

When you rely on Ansible, Chef or any other configuration management to create "composable components" you are relying on a leaky abstraction: these components are not, in fact, composable. From one system to the next they will produce builds which behave differently in a million ways. All the extra abstraction in the end will buy you very little.

My advice is to focus on 2 things: 1) the source code, and 2) the runnable container. These are the only 2 reliable points of composition.

Author

@anentropic anentropic commented Dec 30, 2013

Thanks for giving more perspective.

So you're saying that for reusing parts of Dockerfiles the only tool available is copy and paste? Coming from more of a 'dev' than 'ops' point of view it feels a bit wrong.

Maybe it's a mistake to have the public index of images. It makes it seem like you can share reusable building blocks vaguely analogous to Chef recipes, but my experience so far is that it is not useful because:
a) for most images there's no info about what it does and what's inside
b) the docs encourage committing your work to the index (so you can later pull it) even though what you made is probably not useful to others; I'm guessing most of what's in there is probably not worth sharing

I feel like the docs don't really guide you to use Docker in a sensible way at the moment.

Contributor

@unclejack unclejack commented Jan 10, 2014

@anentropic The right way to do this with Dockerfiles is by building multiple images with multiple Dockerfiles.
Here's an example: Dockerfile 1 builds a generic image on top of an Ubuntu base image, Dockerfile 2 uses the resulting image of Dockerfile 1 to build an image for a database server, and Dockerfile 3 uses the database server image and configures it for a special role.

docker build should be quite easy to run and unnecessary complexity shouldn't be added.

The public index of images is extremely useful. Docker images are usually meant to run one service or a bunch of services which can't run in separate containers. You can usually pull an image, run it and get some useful software up and running without much effort.

@unclejack unclejack closed this Jan 10, 2014
Author

@anentropic anentropic commented Jan 10, 2014

Understood... so in the scenario I outlined with ascii art above, the Docker way would be:

  • start with Dockerfiles for independent images GenericA and GenericB
  • to make an image SpecificAB I would copy and paste the contents of the GenericB Dockerfile into a new Dockerfile that starts with: FROM GenericA

The problem I see is that if the 'recipe' (to borrow a Chef term) for GenericB is quite complex and has many steps, there is no way I can share this info except by publishing the Dockerfile to GitHub so that others can copy and paste the relevant parts into their own Dockerfile.

Have you tried using the public index? For example, I did a search for "postgres"... how do I judge the usefulness of (or distinguish in any way between) images such as these:

?

What value do these provide when the only way to be sure I have got a Postgres server set up the way I want, on a particular base image, with nothing dodgy hidden in there, is going to be to create it myself from scratch?

I can see the value of some 'officially blessed' base images in a public index. I can see the value of having a private index of my own custom images ready to pull from.

But it seems a shame that there's no way (apart from copy & paste) to share the series of commands in the Dockerfile as a recipe... such as the suggestion for an 'include' command that was rejected here #2108

Contributor

@unclejack unclejack commented Jan 10, 2014

@anentropic You can use a trusted image and you can also find a postgres Dockerfile to build the image yourself.

Images are usually more useful when you customize the Dockerfile to ensure they fit your exact needs. That's why you've found that several users have uploaded an image for the same piece of software to the registry.

Existing specific images like the postgres images might not meet your particular needs, but there are also base images and these can be used right away to build something which is useful for you.

Base images like ubuntu, centos and some images from stackbrew/* are images you can use to build what you need.

An example of a great ready-to-use image is stackbrew/registry. This image lets you play around with a private Docker registry as soon as docker pull stackbrew/registry and docker run -p 5000:5000 stackbrew/registry are done executing.

Docker's goal is to help with deployment and with preparing the environment where your software runs. This means that builds are linear and done only during the initial build, but you will run the exact same software every single time.

Configuration management systems may allow you to do something more or employ some other tricks, but they're not as "immutable" and you can end up having two hosts which have subtle differences which aren't picked up by the configuration management software.

@jakirkham jakirkham commented Jun 27, 2015

Hate to necro an old thread, but I wanted to offer something that IMHO helps resolve the original poster's problem and may help others looking for a similar solution.

Let us assume for simplicity that they all use the same base image R. Imagine I have service A and service B. I want each in its own Docker image, and also both together in a single Docker image.

Write a script to install service A and write a separate script to install service B. Then have a git repo with the script for A and another one for script B. Create git repos for all three Docker images that will be built. Each contains git submodules with the install script(s) that will be used. Each Dockerfile will simply ADD an install script and then RUN the install script and do this for one or both scripts. If you wish to remove the script(s) from the image, tack that on after running it.

This way there is one copy of each install script and any docker images you want using them. This avoids unnecessary copying of code and keeps the maintenance burden minimal. The only duplication of effort is moving up the commit used by the submodules, which is significantly better than the alternative and probably could be automated.
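A minimal sketch of that layout, written in Python only to show the three Dockerfiles side by side. The script names `install_a.sh` and `install_b.sh` and the helper `render_dockerfile` are hypothetical; the base image is simply called `R` as in the description above.

```python
def render_dockerfile(base_image, install_scripts):
    """Return Dockerfile text that ADDs and then RUNs each shared install script."""
    lines = [f"FROM {base_image}"]
    for script in install_scripts:
        target = f"/tmp/{script}"
        lines.append(f"ADD {script} {target}")
        # Run the script, then delete it from the resulting filesystem
        # (the earlier ADD layer still contains the file).
        lines.append(f"RUN sh {target} && rm {target}")
    return "\n".join(lines) + "\n"


# Hypothetical names: base image "R" and one install script per service.
BASE = "R"
dockerfile_a = render_dockerfile(BASE, ["install_a.sh"])
dockerfile_b = render_dockerfile(BASE, ["install_b.sh"])
dockerfile_ab = render_dockerfile(BASE, ["install_a.sh", "install_b.sh"])

if __name__ == "__main__":
    print(dockerfile_ab)
```

Each install script exists once (in its own repo, pulled in as a submodule), and only the short list of scripts differs between the three images.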

@rjurney rjurney commented Dec 9, 2015

I think I misunderstand how this works, so I'm replying to get clarification. I want to use Ubuntu 11 with the official selenium docker images. They use Ubuntu 15.

https://github.com/SeleniumHQ/docker-selenium/blob/master/Base/Dockerfile

What is the correct way for me to do this? To clone that repo and edit all the files to say Ubuntu 11 and not 15? This can't be right, can it? This would mean that everyone with any disagreement with any aspect of official images can't make use of them without duplicating the code for them. I think I have it wrong, can someone explain? What is the right way to use the official selenium image with Ubuntu 11?

Member

@thaJeztah thaJeztah commented Dec 9, 2015

@rjurney yes, that's how that would work; in your example, the whole Dockerfile is developed with ubuntu:15.04 in mind; are those packages available on ubuntu:11? Do they work? Does selenium run on them? Chances are that modifications need to be made in the Dockerfile to make it work on another version of Ubuntu.

"swapping" the base image of an existing image also wouldn't work, because Docker only stores the differences between the base image and the image. Using a different base image therefore leads to unpredictable results (e.g., "remove file X", where "file X" exists in the original base image, but not in the base image you selected). Also, the packages/binaries in images built "on top" of a base image are built for that version; those binaries may not be compatible with a different base image.

> This would mean that everyone with any disagreement with any aspect of official images can't make use of them without duplicating the code for them

Yes. The official images are supported by the maintainers of those images (which in this case, are the maintainers of Selenium). If you think changes are needed to those images, the best way is to open a feature request in their repository. If that feature request is not accepted, you should probably build your own version.

(Also note that there is no official ubuntu:11 image)
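The point about layers being stored as differences can be illustrated with a toy model. This is only a sketch, not how Docker stores images: filesystems are plain dicts, and the paths, the `Layer` class and the helper names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """A diff against whatever filesystem happens to lie underneath it."""
    writes: dict = field(default_factory=dict)  # path -> content added or changed
    deletes: set = field(default_factory=set)   # paths this layer removes


def apply_layers(base_files, layers):
    """Apply each layer's deletions and writes, in order, on top of a base."""
    files = dict(base_files)
    for layer in layers:
        for path in layer.deletes:
            files.pop(path, None)  # removing a path the base never had is a silent no-op
        files.update(layer.writes)
    return files


def missing_paths(files, required):
    """List the required paths that are absent from the resulting filesystem."""
    return sorted(path for path in required if path not in files)


# Hypothetical bases, and a layer that was built against base_old.
base_old = {"/lib/libexample.so.1": "v1", "/etc/legacy.conf": "old settings"}
base_new = {"/lib/libexample.so.2": "v2"}
app_layer = Layer(
    writes={"/usr/bin/app": "binary linked against libexample.so.1"},
    deletes={"/etc/legacy.conf"},
)
app_needs = ["/usr/bin/app", "/lib/libexample.so.1"]

on_original_base = apply_layers(base_old, [app_layer])
on_swapped_base = apply_layers(base_new, [app_layer])
```

On the original base everything the app needs is present; on the swapped base the same layer applies without any error, yet the library it was built against is missing, which is the "unpredictable results" case described above.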

@rjurney rjurney commented Dec 9, 2015

In the rest of the software world, single inheritance is not seen as adequate to reasonably express needed semantics. It leads to much code duplication, which would be considered a bug. Why is this seen as acceptable for docker? Even if you're building one service at a time, composition is needed at the operating system level. I don't mean to beat a dead horse, but this limit seems a little extreme. Might it be better expressed as a best practice? As a result of the strictness of this decision, someone will build a tool that does composition or multiple inheritance and expresses them through single inheritance and duplication. Having this be outside docker proper will not serve the docker community.

Member

@cpuguy83 cpuguy83 commented Dec 9, 2015

@rjurney multiple inheritance is also extremely complex and not something you just add in without thought for consequences, corner cases, and incompatibilities.

#12749 was the latest attempt to add such functionality -- ultimately declined because there is other work to be done first.
There's a lot of work being done on the builder, including enabling client-driven builds which can open this up quite a bit.

Single-inheritance Dockerfiles work for the (vast) majority of use cases; as such, there is no rush to enhance this. It needs to be done correctly and deliberately.
And based on your comments above I'd say you don't actually need multiple inheritance, just a way to specify a base image that the Dockerfile is run against without duplicating the existing code.

@rjurney rjurney commented Dec 9, 2015

That would satisfy my needs, yes. Being able to modify some property of the chain of dockerfiles.

Ok, glad to hear you are on top of this. Thanks for your patience :)

@docbill docbill commented Dec 9, 2015

@rjurney Where do you get your information? To my knowledge Java has never had multiple inheritance, and never will. I'm sure the same is true for many languages. Many consider multiple inheritance extremely harmful, as it can result in code that is almost impossible to predict. The same would be true for a docker container.

As I see it, what we need for docker is not the concept of multiple inheritance, but the concept of an include or external dependencies. E.g., you can mount containers at run time. What is truly needed is a way to do the equivalent with images. So you could, for example, have an image that was defined to be based on Fedora 22, and mount an oracle image to add database functionality.

This can be done quite successfully when running containers, but there is just no syntax for specifying it with images. So until run time there is no way docker can know about these dependencies or in any way manage them for you.

@rjurney rjurney commented Dec 9, 2015

Please note that I mentioned multiple inheritance and composition.
Composition is the preferred way to do this, definitely.

I agree with everything else you said, so +1.

@rjurney rjurney commented Dec 10, 2015

I'm going to shut up after this, but I put this rant in the aforementioned pull request instead of this ticket, by mistake. So I'm putting it here.

Someone is going to build this. Not accepting a pull that adds INCLUDE will delay and externalize this feature. This should be the basis of the decision here: should this be inside docker or outside docker?

An example comes to mind. In Apache Pig, the team made the decision not to include loops, despite many requests for them, because it was decided that Pig should be great for DAG dataflows and that is it. Instead, an integration was created to script pig scripts, so you could loop through scripts from any JVM language. Note that this was a conscious decision and that alternatives were pursued. This is the model process in my opinion.

Another Pig example comes to mind... Pig Macros. They didn't exist and were 'un pig' until someone (ok, me) started a thread about how incredibly ugly their large pig project was and that there was no way to fix this problem without generating Pig from an external tool, which was undesirable. Many people chimed in, and the Pig team added macros. Macros make clean pig possible, and the community benefitted.

I suggest that you address the decision head on and have a discussion around it, which hasn't occurred here yet, and for findability probably belongs here. This will exist. Duplicating scripts in domain specific languages is terrible. The people will demand it. Will this feature be inside Docker or outside Docker? How will you facilitate this behavior outside of docker?

Sorry, I'm probably missing lots of context on the mailing list, but as a new Docker user... I feel very hesitant to do much with Docker without the ability to compose dockerfiles from existing recipes. I went down this road with Pig, and it nearly killed me. I think many people will feel this way.

In case anyone cares...

The half-adopted presentation about loops and macros in Pig: http://wiki.apache.org/pig/TuringCompletePig
Pig Macro JIRA: https://issues.apache.org/jira/browse/PIG-1793
API Interface to Pig JIRA: https://issues.apache.org/jira/browse/PIG-1333
One that was outright rejected to respect Apache Hive... add SQL to Pig: https://issues.apache.org/jira/browse/PIG-824

Finally, I had an idea that might make this change easy... what if INCLUDE'd files can't inherit? i.e. you would avoid objections by keeping things super simple. Deal with the rest later as more is learned. There could be a simple Dockerfile for instance that installs the pre-req's and binaries, and sets up daemons for MySQL on Ubuntu. If need be, this could be versioned by version of Ubuntu and MySQL. Personally, I'm going to hack a utility to do these simple INCLUDEs and use it to organize my dockerfiles in this way. I can't wait to order and re-use my code.

@DJGummikuh DJGummikuh commented Dec 18, 2015

+1 for the INCLUDE idea. Though I believe prohibiting inheritance will only shift the issue, since you would then be able to modify the mainstream image you're inheriting from but not the other images you include. What would make sense is being able to mark an image as "includable", meaning it does not deliver any operating system stuff that might break the existing base image. This flag would have to be set by the docker build process, and images without it could not be included.

And let's face it: if you're playing with Dockerfiles, you're probably not someone seeing their machine for the first time. So while it makes sense to prevent the end user of docker from doing stupid things, there should be a little more freedom for the people who actually create those images. Seriously, being able to select a base image and include all the stuff I want in it to provision my app would be pretty damn awesome.

@parliament718 parliament718 commented Jan 24, 2016

+1 for INCLUDE. I simply need nginx and ssh images combined in one. Why does this have to be so hard?

@rjurney rjurney commented Jan 25, 2016

The idea that this isn't needed is frankly confusing to the point of being disingenuous. Most users will use this, if it is created. "Add ssh to ubuntu" and "add nginx to ubuntu" are pretty common tasks that everyone need not repeat. What docker HQ really seems to be saying on this is, "Obviously needed, but we think it will get too ugly. So we pretend." It would be better if you could actually just be honest and open about this. Sorry if I'm cranky.

Member

@vdemeester vdemeester commented Jan 25, 2016

@rjurney let's wait for the build spin-out, because that way there will be more than one way to build images (and thus a custom builder could appear that does this). One of the reasons docker maintainers (working or not working for Docker) are wary about it is that it would add complexity where we want to add flexibility and simplicity. By extracting the builder, we'll have better separation of concerns (between building images and running them), and lots of use cases can be more freely implemented in custom builders.

@rjurney rjurney commented Jan 25, 2016

Here again, are you pushing this out of the project? Custom sounds... not the default, included way. When in fact, includes are a simple need that most everyone has. Repeating yourself is complexity. Inheritance only is complexity. Includes match a need everyone has in the simplest way possible.

@mcraveiro mcraveiro commented Feb 4, 2016

+1, combining images would be extremely useful. Imagine a (god forbid) C++ use case. I build an image with boost, another with say Qt, all with the same compiler, etc. Now say I want to build an app with both boost and Qt: I just need to combine the two and presto, a dev environment is ready. This would be incredibly useful.

@jakirkham jakirkham commented Feb 4, 2016

Personally, I feel this is too important of an issue not to tackle. That being said we need to get a good understanding of what the problems and scope are regardless of where it is implemented.

So, I see these problems presented by merging.

  1. Handling merge conflicts.
  2. Resolving different bases (Ubuntu and CentOS).

With the first one I think the simple answer is don't. To me it sounds too complicated and potentially problematic; it would require a suite of tools to solve and still might be too magical. So, if this were added, merge conflicts should just fail. I suppose it could be revisited later, but that seems like more trouble than it is worth.

As for the second case, it seems like you could add a constraint that they share some base layers. Now the question becomes how many is enough. I think the correct answer when starting would be that the two images being merged must have the same FROM image. There might need to be more constraints here, but it isn't clear to me that those cases wouldn't fall under problem 1, which we have resolved by simply disallowing it.

Are there some other problems I am missing here?
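The two constraints proposed above (same FROM image, and fail on any conflict) could be sketched as follows. This is a toy model under stated assumptions, not an existing Docker feature: an image is a dict with a `base` name and the `files` it adds, and `merge_images` and `MergeError` are hypothetical names.

```python
class MergeError(Exception):
    """Raised when two images cannot be merged under the strict rules."""


def merge_images(first, second):
    """Merge two toy images, refusing different bases and conflicting files."""
    if first["base"] != second["base"]:
        raise MergeError(
            f"different FROM images: {first['base']} vs {second['base']}"
        )
    a, b = first["files"], second["files"]
    # A conflict is a path both images add with different content.
    conflicts = sorted(path for path in a.keys() & b.keys() if a[path] != b[path])
    if conflicts:
        raise MergeError("conflicting paths: " + ", ".join(conflicts))
    return {"base": first["base"], "files": {**a, **b}}


# Hypothetical images, for illustration only.
image_a = {"base": "ubuntu", "files": {"/opt/a/bin": "service A"}}
image_b = {"base": "ubuntu", "files": {"/opt/b/bin": "service B"}}
merged = merge_images(image_a, image_b)
```

Anything that touches the same path in two different ways, or starts from a different base, is rejected outright rather than resolved.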

Author

@anentropic anentropic commented Feb 4, 2016

I think there should be no attempt to merge... I can't see that happening

A more realistic approach might be a templating type of solution, i.e. allow a real Dockerfile to INCLUDE a Dockerfile fragment (which has no FROM clause, just a list of commands)... the fragments can be shared, reused, and included against any compatible base image Dockerfile.
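A minimal sketch of such a templating step, run as a preprocessor before `docker build`. Note that `INCLUDE` is not a Dockerfile instruction; the keyword, the `expand_includes` helper and the fragment name are assumptions made for illustration.

```python
def expand_includes(dockerfile_text, fragments):
    """Replace each `INCLUDE <name>` line with the lines of fragments[name]."""
    expanded = []
    for line in dockerfile_text.splitlines():
        words = line.split()
        if len(words) == 2 and words[0].upper() == "INCLUDE":
            for fragment_line in fragments[words[1]].splitlines():
                first = fragment_line.split()[:1]
                # A fragment is only a list of commands; it must not pick its own base.
                if first and first[0].upper() == "FROM":
                    raise ValueError(f"fragment {words[1]!r} must not contain FROM")
                expanded.append(fragment_line)
        else:
            expanded.append(line)
    return "\n".join(expanded) + "\n"


# Hypothetical fragment and Dockerfile contents, for illustration only.
fragments = {
    "postgres-client": "RUN apt-get update && apt-get install -y postgresql-client",
}
source = 'FROM ubuntu\nINCLUDE postgres-client\nCMD ["python", "app.py"]\n'
result = expand_includes(source, fragments)
```

The expanded text is an ordinary single-FROM Dockerfile, so nothing changes for the builder; the fragment is simply pasted in where it was requested.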

@rainabba rainabba commented May 2, 2018

Amazing this is still an issue and topic. How hard is it to "INCLUDE someimage", then when parsing it, check the base is compatible (in the FROM chain) and if so, execute the rest of THAT file at that point (as if I had copied the Dockerfile from the project and pasted it into mine)?

The whole "people will do bad things they don't realize" excuse is absurd in this context. This is already insanely complex, and that's why we need this to help simplify it.

Member

@cpuguy83 cpuguy83 commented May 2, 2018

@rainabba This is an entirely unhelpful comment.
There are basically two reasons why it's not done:

  1. It's not so easy
  2. No one has taken the time to do the work.

In reality, it is usually both.

@rainabba rainabba commented May 2, 2018

  1. It's a parsing and string-replace problem that any new coder could accomplish in all of 10 minutes IF they knew where in the code. I'm not saying it would be usable in all cases, but for the limited cases I'm seeing suggested here over and over (where bases are effectively common), it's a dead-ringer.

  2. Of course not, this thread provides ~102 reasons it can't or shouldn't be done, so why would anyone think to do it regardless?

On the other hand, my comment serves (like SO many others here) to demonstrate that there is a need and with the hope to influence either the obstructing attitudes or to at least act as a reminder. If that's "entirely unhelpful", then you've just explained why this issue (ignored feature request) is still here and active and it's not a technical one.

Member

@cpuguy83 cpuguy83 commented May 2, 2018

It's way more than parsing a string.
Docker and the Dockerfile are used by millions of people. Adding APIs is a significant thing... even outside of that, the underlying implementation is not "parsing a string".

In any case there are many proposals to solve the problem, and this is a very old and closed issue.

@kenahoo kenahoo commented May 3, 2018

I do think that if Docker doesn't figure out a clean solution to this scenario, it will probably be replaced by whatever tool does figure it out.

I noticed one of my colleagues using the following pattern, which might be a decent workaround:

ARG from
FROM $from
... rest of dockerfile

I haven't tried it myself though, so I'm not sure how it would work in practice, e.g. how it behaves with caching, etc.

@alexreg alexreg commented May 3, 2018

Indeed, this is a very important problem, and hasn't been addressed properly. I'm amazed a company as big as Docker haven't tackled it yet.

@cosminonea cosminonea commented Aug 8, 2018

Just my two cents... I am just learning more about Docker at the moment and I feel something like INCLUDE would be very useful. I liked the multiple inheritance example above and wanted to address the comments about possible problems and conflicts with it.

Multiple inheritance is hard in any language that supports it, but when a conflict occurs it's the responsibility of the Dockerfile creator to rethink what they are doing and start again. Docker should just build the image and not try to prove the build has no issues.

@larytet larytet commented Aug 8, 2018

@cosminonea

> I feel something like INCLUDE would be very useful

I have support for macros in https://github.com/larytet/dockerfile-generator/ and could support "include" too.

@docbill docbill commented Aug 9, 2018

Member

@thaJeztah thaJeztah commented Aug 9, 2018

That last one is possible already; COPY --from accepts either a build stage or an image, so for example:

FROM busybox

COPY --from=alpine:latest / /
COPY --from=docker:latest /usr/local/bin/docker /usr/local/bin/

Edit: or, to take the actual example:

FROM fedora

COPY --from=ubuntu:latest / /ubuntu/
COPY --from=debian:latest / /debian/

@docbill docbill commented Aug 9, 2018

@reitzig reitzig commented Jan 2, 2019

@thaJeztah Using multi-stage builds for this still requires you to know which files exactly to copy from each image; that's even harder to maintain than copy-pasting the setup code from another image.

Of course, merging Docker images is not trivial. Since arbitrary scripts can be run during builds, the build process resists any general attempt at automatic conflict detection; the halting problem says hi! The best you can do (short of significantly limiting what builds can do) is to define precise semantics: say the last FROM/INCLUDE wins (e.g. if they "write" the same file) or fail on file-system-level conflict or ....
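To make the two candidate semantics concrete, here is a toy Python sketch. It is not how Docker or BuildKit represent images: a "layer" is just a hypothetical dict from path to content, and the policy names "last-wins" and "fail" are made up for illustration.

```python
from typing import Dict, List

Layer = Dict[str, str]  # path -> file content (a toy stand-in for an image's filesystem)


class MergeConflict(Exception):
    """Raised when two layers disagree on a path under the strict policy."""


def merge_layers(layers: List[Layer], policy: str = "last-wins") -> Layer:
    """Merge layers in order under one of two explicit policies.

    "last-wins": a later layer silently overwrites earlier ones.
    "fail":      differing content for the same path aborts the merge.
    """
    if policy not in ("last-wins", "fail"):
        raise ValueError(f"unknown policy: {policy}")
    merged: Layer = {}
    for layer in layers:
        for path, content in layer.items():
            if policy == "fail" and path in merged and merged[path] != content:
                raise MergeConflict(path)
            merged[path] = content
    return merged


if __name__ == "__main__":
    first = {"/opt/a/bin/tool": "a", "/etc/shared.conf": "written by a"}
    second = {"/opt/b/bin/tool": "b", "/etc/shared.conf": "written by b"}
    print(merge_layers([first, second])["/etc/shared.conf"])  # the later layer wins
```

Either policy is easy to state and to debug; neither says anything about whether the merged result actually works, which is the garbage-in-garbage-out caveat.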

The sometimes stated issue of different "base" images (stretch vs ubuntu vs alpine vs ...), however, is simple: require that the DAG of image dependencies not only has a single source (the current image) but also a single sink (the shared "ancestor" of all images in the "hierarchy").
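The single-sink requirement is cheap to check once the dependency graph is known. A minimal Python sketch, assuming a hypothetical `parents` mapping from each image name to the images it is built FROM or INCLUDEs (the image names in the example are placeholders, not real tags):

```python
from typing import Dict, List, Set


def base_images(image: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the sinks reachable from `image`: ancestors that have no parents."""
    sinks: Set[str] = set()
    seen: Set[str] = set()
    stack = [image]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        ancestors = parents.get(current, [])
        if not ancestors:
            sinks.add(current)
        stack.extend(ancestors)
    return sinks


def has_single_sink(image: str, parents: Dict[str, List[str]]) -> bool:
    """True if every FROM/INCLUDE path from `image` ends at the same base image."""
    return len(base_images(image, parents)) == 1


if __name__ == "__main__":
    graph = {
        "myapp": ["tomcat-img", "postgres-img"],
        "tomcat-img": ["stretch"],
        "postgres-img": ["stretch"],
    }
    print(has_single_sink("myapp", graph))
```

A builder could reject the build when this check fails, which rules out mixing, say, a Debian-based and an Alpine-based image before any files are merged.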

Ultimately, of course, you'd get garbage-in-garbage-out -- is it ever different, really?

FWIW, my use cases are:

  1. Running a Tomcat web application with a PostgreSQL database and an S3 object store.
    While this can be solved by using Docker Compose, a single container may be nicer.
  2. Multi-language builds run in Docker containers (e.g. on Jenkins, Circle CI, ...).
    There are official images for most popular toolchains, but getting a single container equipped to handle more than one runs into exactly the issue discussed here.

@larytet larytet commented Jan 2, 2019

@rjurney rjurney commented Jan 3, 2019

@reitzig This is not the only option. The right option is to constrain INCLUDEs to avoid big problems. INCLUDEs can't inherit. There it is. Simple. Still incredibly useful.

This feature request is popular but Docker is Free as in Beer but not by any means Free as in Freedom.

Member

@cpuguy83 cpuguy83 commented Jan 5, 2019

@rjurney With the inclusion of buildkit support since 18.06, users can provide their own frontend parser for the builder. There is already an official (from Docker Inc) experimental Dockerfile parser that includes lots of new features (support for secrets for starters).

You can of course also add your own "INCLUDE" behavior in a custom Dockerfile frontend, or you can do something totally different that's not Dockerfile at all (there's an example for buildpacks).

To use a custom frontend, you just need to point Docker at an image that can handle it. Do this with a comment on the first line of your Dockerfile (or whatever file it is): # syntax = myCustomFrontendImage

More details here:
https://docs.docker.com/develop/develop-images/build_enhancements/#overriding-default-frontends

With buildkit enabled, Docker can build whatever you want it to (doesn't even have to be a Dockerfile format) with whatever features you need.

@reitzig reitzig commented Jan 16, 2019

> This feature request is popular but Docker is Free as in Beer but not by any means Free as in Freedom.

As offtopic as that note is, I think it should be noted that you are wrong. Thanks to Docker's Apache licensing, everybody has the freedom to fork and develop their own interpreter for Dockerfiles that provides the features developed here. If they are careful, the resulting images will be compatible with existing Docker runtimes/tools.
Of course, the maintainers of the Docker project are similarly free to not merge such a feature into their fork (the original?).

@FranklinYu FranklinYu commented Jan 16, 2019

@reitzig That is obviously just a meaningless rant that doesn't actually refer to what free software is. Moby is free software, of course.

@rjurney rjurney commented Jan 16, 2019

@rjurney rjurney commented Jan 16, 2019

@FranklinYu FranklinYu commented Jan 16, 2019

> Free as in Beer means Apache.

Disagree. Freeware can be proprietary software.

> Free as in Freedom means community control.

What's community control? Projects run by a foundation? So you would consider VS Code, Atom editor, and Ubuntu as non-free software? Then your definition is significantly different from the one proposed by FSF, EFF, and many other organizations.

I agree that Docker Inc is not actively discussing with community in this issue, but this has nothing to do with "Free as in Freedom".

Member

@cpuguy83 cpuguy83 commented Jan 16, 2019

Sorry folks, let's not have these sorts of discussions on the issue tracker.

> I agree that Docker Inc is not actively discussing with community in this issue

We have made it possible to support any build format you want to have via docker build. The "official" Dockerfile format does not support this option, but that doesn't mean that docker build can't make use of it.
Check out https://matt-rickard.com/building-a-new-dockerfile-frontend/ as an example of building a custom frontend that works with docker build.
Note that this frontend is an example of how you can do something completely different from the Dockerfile format, but that is not necessary. You can take the existing Dockerfile format and add your own functionality if you like.

As far as adding something into the official Dockerfile format.... I will say proposals are always welcome, the format is maintained in https://github.com/moby/buildkit.
Bear in mind, though, every new feature means new burden of maintainership, including often limiting what can be done in the future.

I think it's likely that many of the use cases for combining multiple Dockerfiles can actually be solved with new functionality in Dockerfile... specifically the ability to COPY --from and RUN --mount from arbitrary images.

@dejarp dejarp commented Jun 9, 2019

If this hypothetical INCLUDE could just create the extra containers as an implementation detail, with me NOT having to give a @#$%, it would greatly reduce the amount of frustration surrounding the implicit and dodgy sales pitch of composable containers. I really just want to get back to the application and delivering functionality. Sorry for the bad vibes, but I am a Docker/container noob and ran into the same confusion that a lot of other posters have already expressed.

@bergkvist bergkvist commented Jun 21, 2020

What if you could do this:

              /--- python:3.8.3-alpine3.12 ---\
             /                                 \
alpine:3.12.0                                   custom image (with both python and rust)
             \                                 /
              \----- rust:1.44-alpine3.12 ----/

Notice that both images are descendants of the same image. This is key!

As easily as this:

FROM alpine:3.12.0
INCLUDE rust:1.44-alpine3.12
INCLUDE python:3.8.3-alpine3.12

Compared to when using the "COPY --from image"-instruction (multi-stage builds), you won't have to think about the implementation details (which files/environment variables to copy over).

What it looks like right now if you want to combine the images:

FROM alpine:3.12.0

# INCLUDE rust:1.44-alpine3.12
COPY --from=rust:1.44-alpine3.12 / /
ENV RUSTUP_HOME=/usr/local/rustup \
    CARGO_HOME=/usr/local/cargo \
    PATH=/usr/local/cargo/bin:$PATH \
    RUST_VERSION=1.44.1

# INCLUDE python:3.8.3-alpine3.12
COPY --from=python:3.8.3-alpine3.12 / /
ENV PATH /usr/local/bin:$PATH
ENV LANG C.UTF-8
ENV GPG_KEY E3FF2839C048B25C084DEBE9B26995E310250568
ENV PYTHON_VERSION 3.8.3
ENV PYTHON_PIP_VERSION 20.1.1
ENV PYTHON_GET_PIP_URL https://github.com/pypa/get-pip/raw/eff16c878c7fd6b688b9b4c4267695cf1a0bf01b/get-pip.py
ENV PYTHON_GET_PIP_SHA256 b3153ec0cf7b7bbf9556932aa37e4981c35dc2a2c501d70d91d2795aa532be79

ENV-instructions are copy-pasted from the Dockerfiles of these images.


This would also allow for much better container reuse, and make it extremely easy to throw things together that could otherwise take ages to compile or build yourself!

Consider that with this approach, a program only needs to be compiled once per platform/base image version - and it is easier to reuse, rather than implement it yourself. Just think about how many times the "wheel has been reimplemented" in C++ due to the lack of a good/universal package manager. Do we want a similar situation to arise for Docker?

@eine eine commented Jun 25, 2020

@bergkvist, see #3378 (comment) and #3378 (comment).

It feels to me that none of the solutions you propose corresponds to the diagram. Instead, you are doing:

              /--- python:3.8.3-alpine3.12 ---\
             /                                 \
alpine:3.12.0                                   \
             \                                   \
              \----- rust:1.44-alpine3.12 --------\ custom image 

So, any file which was modified in rust is overwritten by python. Combining them without copying one over the other would require some merging.

@bergkvist bergkvist commented Jun 25, 2020

@eine Yes, in case of conflicts, files will be overwritten. That's true. So the figure being symmetric would be a special case of when no (relevant) files overlap. Your version of the figure is more general.

My point about having both images inherit from the same exact image, is that the chance of critical conflicts might be slim.

I imagine that there could arise some conflicts related to the package manager files. If both images used the package manager to install different things. I'm not sure if there are any other "common conflicts" like that which could be handled with some kind of special case.

Merging two files is anything but straightforward. I think in the general case, it might be better to just overwrite than trying to be smart. At least then it is easier to debug when things don't work.
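The point about a shared base can be made a bit more concrete: a path only matters as a conflict if both images changed it relative to that base and ended up with different content. A rough Python sketch under that assumption, with filesystems modelled as hypothetical dicts from path to content digest (the paths and digests in the example are made up, and deletions are ignored):

```python
from typing import Dict, Set

Files = Dict[str, str]  # path -> content digest (toy model of an image filesystem)


def changed_paths(base: Files, image: Files) -> Set[str]:
    """Paths the image added or modified relative to the base (deletions ignored)."""
    return {path for path, digest in image.items() if base.get(path) != digest}


def overlay_conflicts(base: Files, first: Files, second: Files) -> Set[str]:
    """Paths both images changed, with different results, relative to the base."""
    both = changed_paths(base, first) & changed_paths(base, second)
    return {path for path in both if first[path] != second[path]}


if __name__ == "__main__":
    base = {"/bin/sh": "s0", "/var/pkgdb": "db0"}
    rust = {**base, "/usr/local/cargo/bin/rustc": "r1", "/var/pkgdb": "db-rust"}
    python = {**base, "/usr/local/bin/python3": "p1", "/var/pkgdb": "db-python"}
    print(sorted(overlay_conflicts(base, rust, python)))
```

In this toy example only the package database stand-in shows up, which matches the guess above that package manager files are the likely trouble spot. Whatever is listed here is exactly what the second COPY or INCLUDE would overwrite.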

@bergkvist bergkvist commented Jun 25, 2020

Since I commented here 4 days ago, I decided to learn Golang and look into the frontend code in moby/buildkit.

I have now created a custom frontend that accepts INCLUDE-statements as I discussed above.

#syntax=bergkvist/includeimage
FROM alpine:3.12.0
INCLUDE rust:1.44-alpine3.12
INCLUDE python:3.8.3-alpine3.12

To use the custom syntax, remember to set DOCKER_BUILDKIT=1 when building.

DOCKER_BUILDKIT=1 docker build -t myimage .

The code is available here: https://github.com/bergkvist/includeimage
And image on Docker Hub: https://hub.docker.com/r/bergkvist/includeimage

@bergkvist bergkvist commented Jun 14, 2021

As a side-note; if you want truly composable Docker builds, I recommend checking out dockerTools in nixpkgs. This will also result in more reproducible (and typically very small) images.

$ docker load < $(nix-build docker-image.nix)

# docker-image.nix
let
  pkgs = import <nixpkgs> {};
  python = pkgs.python38;
  rustc = pkgs.rustc;
in pkgs.dockerTools.buildImage {
  name = "myimage";
  tag = "latest";
  contents = [ python rustc ];
}

https://nix.dev/tutorials/building-and-running-docker-images

@rjurney rjurney commented Jun 18, 2021

This thread is funny because of what it says about Docker and the people that run the company. Users overwhelmingly hate copying code around and find single inheritance without composition unworkable. It creates lots of technical debt. There is a simple proposal on the table to bypass all the complexity and not allow INCLUDEs to inherit. It would save many thousands of people from copying snippets of code around.

The community's opinion doesn't mean anything in this project. This is not that model of open source. Move along, now. This ticket is dead. We should close it so everyone is clear.
