How do I combine several images into one via Dockerfile #3378

Closed
anentropic opened this Issue Dec 29, 2013 · 82 comments

@anentropic

anentropic commented Dec 29, 2013

I have several Dockerfiles for building images that, e.g., set up a PostgreSQL client or a generic Python app environment.

I want to make a Dockerfile for my Python webapp that combines both of those images and then runs some more commands.

If I understood the docs correctly, using FROM a second time starts creating a new image instead of adding to the current one?

@SvenDowideit

Contributor

SvenDowideit commented Dec 29, 2013

You chain them :)

So, for example, if you have one Dockerfile that sets up your generic postgres client and generic python app env, you tag the result of that build (e.g. mygenericenv), and then your subsequent Dockerfiles use FROM mygenericenv.

For example:

## Dockerfile.genericwebapp might have FROM ubuntu
cat Dockerfile.genericwebapp | docker build -t genericwebapp -
## Dockerfile.genericpython-web would have FROM genericwebapp
cat Dockerfile.genericpython-web | docker build -t genericpython-web -
## and then this specific app I'm testing might have a Dockerfile that contains FROM genericpython-web
docker build -t thisapp .
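
As a concrete sketch, the middle Dockerfile in that chain could be as small as this (the package names are just an illustration):

## Dockerfile.genericpython-web
FROM genericwebapp
RUN apt-get update && apt-get install -y python python-pip
RUN pip install virtualenv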
@anentropic

anentropic commented Dec 29, 2013

I can see how to do that, i.e. genericA --> specificA, but is there any way to do something like:

genericA --
            \
             ---> specificAB
            /
genericB --

?

@tianon

Member

tianon commented Dec 29, 2013

Not through any official means, but some people have had luck manually modifying the image hierarchy to achieve this (but if you do this, you do so at your own risk, and you get to keep all the pieces).

The reason this won't be supported officially is because imagine I want to take "ubuntu" and graft "centos" on top. There will be lots of really fun conflicts causing a support nightmare, so if you want to do things like that, you're on your own.

@anentropic

anentropic commented Dec 29, 2013

OK, I see why. I was looking for composable blocks of functionality, but maybe this isn't the Docker use case... seems like I should be using it to set up the raw containers, then run something like Ansible or Saltstack on top to configure the software in them.

@shykes

Collaborator

shykes commented Dec 30, 2013

The idea behind containers is that the smallest unit of real composition is the container. That is, a container is the smallest thing you can produce in advance, not knowing what else it will be combined with, and have strong guarantees of how it will behave and interact with other components.

Therefore, any unit smaller than a container - be it a ruby or shell script, a c++ source tree, a binary on its own, a set of configuration files, a system package, etc. - cannot be safely composed, because it will behave very differently depending on its build dependencies, runtime dependencies, and what other components are part of the composition.

That reality can be partially masked by brute force. Such brute force can be pragmatic and "good enough" (giant Makefile which auto-detects everything for a more portable build of your app) or overly grandiose ("let's model in advance every possible permutation of every dependency and interference between components, and express them in a high-level abstraction!")

When you rely on Ansible, Chef or any other configuration management to create "composable components" you are relying on a leaky abstraction: these components are not, in fact, composable. From one system to the next they will produce builds which behave differently in a million ways. All the extra abstraction in the end will buy you very little.

My advice is to focus on 2 things: 1) the source code, and 2) the runnable container. These are the only 2 reliable points of composition.


@anentropic

anentropic commented Dec 30, 2013

Thanks for giving more perspective.

So you're saying that for reusing parts of Dockerfiles the only tool available is copy and paste? Coming from more of a 'dev' than 'ops' point of view it feels a bit wrong.

Maybe it's a mistake having the public index of images; it makes it seem like you can share reusable building blocks vaguely analogous to Chef recipes, but my experience so far is that it is not useful because:
a) for most images there's no info about what they do or what's inside
b) the docs encourage committing your work to the index (so you can later pull it) even though what you made is probably not useful to others; I'm guessing most of what's in there is not worth sharing

I feel like the docs don't really guide you to use Docker in a sensible way at the moment.

@unclejack

Contributor

unclejack commented Jan 10, 2014

@anentropic The right way to do this with Dockerfiles is by building multiple images with multiple Dockerfiles.
Here's an example: Dockerfile 1 builds a generic image on top of an Ubuntu base image, Dockerfile 2 uses the resulting image of Dockerfile 1 to build an image for database servers, and Dockerfile 3 uses the database server image and configures it for a special role.

docker build should be quite easy to run and unnecessary complexity shouldn't be added.

The public index of images is extremely useful. Docker images are usually meant to run one service or a bunch of services which can't run in separate containers. You can usually pull an image, run it and get some useful software up and running without much effort.

@unclejack unclejack closed this Jan 10, 2014

@anentropic

anentropic commented Jan 10, 2014

Understood... so in the scenario I outlined with ascii art above, the Docker way would be:

  • start with Dockerfiles for independent images GenericA and GenericB
  • to make an image SpecificAB I would copy and paste the contents of the GenericB Dockerfile into a new Dockerfile that starts with: FROM GenericA

The problem I see is that if the 'recipe' (to borrow a Chef term) for GenericB is quite complex and has many steps there is no way I can share this info, except by publishing the Dockerfile to Github so that others can copy and paste the relevant parts into their own Dockerfile.

Have you tried using the public index? For example, I did a search for "postgres"... how do I judge the usefulness of (or distinguish in any way between) the images that come up?

What value do these provide, when the only way to be sure I have a Postgres server set up the way I want, on a particular base image, with nothing dodgy hidden in there, is to create it myself from scratch?

I can see the value of some 'officially blessed' base images in a public index. I can see the value of having a private index of my own custom images ready to pull from.

But it seems a shame that there's no way (apart from copy & paste) to share the series of commands in the Dockerfile as a recipe... such as the suggestion for an 'include' command that was rejected here #2108

@unclejack

Contributor

unclejack commented Jan 10, 2014

@anentropic You can use a trusted image and you can also find a postgres Dockerfile to build the image yourself.

Images are usually more useful when you customize the Dockerfile to ensure they fit your exact needs. That's why you've found that multiple users have uploaded images of the same piece of software to the registry.

Existing specific images like the postgres images might not meet your particular needs, but there are also base images and these can be used right away to build something which is useful for you.

Base images like ubuntu, centos and some images from stackbrew/* are images you can use to build what you need.

An example of a great ready-to-use image is stackbrew/registry. This image lets you play around with a private Docker registry as soon as docker pull stackbrew/registry and docker run -p 5000:5000 stackbrew/registry are done executing.

Docker's goal is to help with deployment and with preparing the environment where your software runs. This means that builds are linear and done only during the initial build, but you will run the exact same software every single time.

Configuration management systems may allow you to do something more or employ some other tricks, but they're not as "immutable", and you can end up with two hosts that have subtle differences which aren't picked up by the configuration management software.

@jakirkham

jakirkham commented Jun 27, 2015

Hate to necro an old thread, but I wanted to offer something that IMHO helps resolve the original poster's problem, and may help others looking for a similar solution.

Imagine I have service A and service B. I want each in its own Docker image, and also both together in a single Docker image. Assume for simplicity that all of these use the same base image R.

Write a script to install service A, and a separate script to install service B. Then have a git repo with the script for A, and another one with the script for B. Create git repos for all three Docker images that will be built. Each contains git submodules with the install script(s) it will use. Each Dockerfile simply ADDs an install script and then RUNs it, doing this for one or both scripts. If you wish to remove the script(s) from the image, tack that on after running them.

This way there is one copy of each install script, and any number of Docker images using them. This avoids unnecessary copying of code and keeps the maintenance burden minimal. The only duplicated effort is bumping the commit used by the submodules, which is significantly better than the alternative and could probably be automated.
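
As a sketch, the Dockerfile for the combined image could then be as small as this (paths and base image are hypothetical, standing in for the submodule layout described above):

FROM ubuntu:14.04
# install scripts pulled in as git submodules
ADD service-a/install_a.sh /tmp/install_a.sh
ADD service-b/install_b.sh /tmp/install_b.sh
# run each installer, then remove the script from the final filesystem
RUN /tmp/install_a.sh && rm /tmp/install_a.sh
RUN /tmp/install_b.sh && rm /tmp/install_b.sh

The images for A alone and B alone would each ADD and RUN just their own script.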

@rjurney

rjurney commented Dec 9, 2015

I think I misunderstand how this works, so I'm replying to get clarification. I want to use Ubuntu 11 with the official Selenium Docker images. They use Ubuntu 15.

https://github.com/SeleniumHQ/docker-selenium/blob/master/Base/Dockerfile

What is the correct way for me to do this? To clone that repo and edit all the files to say Ubuntu 11 and not 15? This can't be right, can it? It would mean that anyone with any disagreement with any aspect of the official images can't make use of them without duplicating the code for them. I think I have it wrong; can someone explain? What is the right way to use the official Selenium image with Ubuntu 11?

@thaJeztah

Member

thaJeztah commented Dec 9, 2015

@rjurney yes, that's how that would work; in your example, the whole Dockerfile is developed with ubuntu:15.04 in mind; are those packages available on ubuntu:11? Do they work? Does selenium run on them? Chances are that modifications need to be made in the Dockerfile to make it work on another version of Ubuntu.

"swapping" the base image of an existing image also wouldn't work, because Docker only stores the differences between the base-image and the image. Using a different base-image therefore leads to unpredictable results (e.g., "remove file X", where "file X" exists in the original base image, but not in the base image you selected). Also, the packages/binaries in images building "on top" of a base images, are packages that are built for that version, those binaries may not be compatible with a different base image.

This would mean that everyone with any disagreement with any aspect of official images can't make use of them without duplicating the code for them

Yes. The official images are supported by the maintainers of those images (which in this case, are the maintainers of Selenium). If you think changes are needed to those images, the best way is to open a feature request in their repository. If that feature request is not accepted, you should probably build your own version.

(Also note that there is no official ubuntu:11 image.)

@rjurney

rjurney commented Dec 9, 2015

In the rest of the software world, single inheritance is not seen as adequate to reasonably express needed semantics. It leads to much code duplication, which would be considered a bug. Why is this seen as acceptable for Docker? Even if you're building one service at a time, composition is needed at the operating system level. I don't mean to beat a dead horse, but this limit seems a little extreme. Might it be better expressed as a best practice? As a result of the strictness of this decision, someone will build a tool that does composition or multiple inheritance and expresses them through single inheritance and duplication. Having this be outside Docker proper will not serve the Docker community.


@cpuguy83

Contributor

cpuguy83 commented Dec 9, 2015

@rjurney multiple inheritance is also extremely complex, and not something you just add in without thought for consequences, corner cases, and incompatibilities.

#12749 was the latest attempt to add such functionality -- ultimately declined because there is other work to be done first.
There's a lot of work being done on the builder, including enabling client-driven builds, which can open this up quite a bit.

Single-inheritance Dockerfiles work for the (vast) majority of use cases, so there is no rush to enhance this. It needs to be done correctly and deliberately.
And based on your comments above I'd say you don't actually need multiple inheritance, just a way to specify a base image that the Dockerfile is run against without duplicating the existing code.

@rjurney

rjurney commented Dec 9, 2015

That would satisfy my needs, yes: being able to modify some property of the chain of Dockerfiles.

OK, glad to hear you are on top of this. Thanks for your patience :)


@docbill

docbill commented Dec 9, 2015

@rjurney Where do you get your information? To my knowledge Java has never had multiple inheritance, and never will. I'm sure the same is true for many languages. Many consider multiple inheritance extremely harmful, as it can result in almost impossible-to-predict code. The same would be true for a Docker container.

As I see it, what we need for Docker is not the concept of multiple inheritance, but the concept of an include, or of external dependencies. E.g. you can mount containers at run time; what is truly needed is a way to do the equivalent with images. So you could, for example, have an image that was defined to be based on Fedora 22, and mount an Oracle image to add database functionality.

This can be done quite successfully when running containers, but there is just no syntax for specifying it with images. So until run time there is no way Docker can know about these dependencies or in any way manage them for you.
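
The runtime half of that already exists as --volumes-from; a sketch (image and path names are hypothetical, and assume the image declares a VOLUME for the data):

# a container whose only job is to carry the database files
docker create --name oracle-data my-oracle-image
# mount that container's volumes into a Fedora 22 container at run time
docker run --rm --volumes-from oracle-data fedora:22 ls /opt/oracle

It is the build-time equivalent of this that is missing.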

@rjurney

rjurney commented Dec 9, 2015

Please note that I mentioned multiple inheritance and composition. Composition is the preferred way to do this, definitely.

I agree with everything else you said, so +1.


@rjurney

rjurney commented Dec 10, 2015

I'm going to shut up after this, but I put this rant in the aforementioned pull request instead of this ticket, by mistake. So I'm putting it here.

Someone is going to build this. Not accepting a pull that adds INCLUDE will delay and externalize this feature. This should be the basis of the decision here: should this be inside docker or outside docker?

An example comes to mind. In Apache Pig, the team made the decision not to include loops, despite many requests for them, because it was decided that Pig should be great for DAG dataflows and that is it. Instead, an integration was created to script pig scripts, so you could loop through scripts from any JVM language. Note that this was a conscious decision and that alternatives were pursued. This is the model process in my opinion.

Another Pig example comes to mind... Pig Macros. They didn't exist and were 'un pig' until someone (ok, me) started a thread about how incredibly ugly their large pig project was and that there was no way to fix this problem without generating Pig from an external tool, which was undesirable. Many people chimed in, and the Pig team added macros. Macros make clean pig possible, and the community benefitted.

I suggest that you address the decision head on and have a discussion around it, which hasn't occurred here yet, and for findability probably belongs here. This will exist. Duplicating scripts in domain specific languages is terrible. The people will demand it. Will this feature be inside Docker or outside Docker? How will you facilitate this behavior outside of docker?

Sorry, I'm probably missing lots of context on the mailing list, but as a new Docker user... I feel very hesitant to do much with Docker without the ability to compose dockerfiles from existing recipes. I went down this road with Pig, and it nearly killed me. I think many people will feel this way.

In case anyone cares...

The half-adopted presentation about loops and macros in Pig: http://wiki.apache.org/pig/TuringCompletePig
Pig Macro JIRA: https://issues.apache.org/jira/browse/PIG-1793
API Interface to Pig JIRA: https://issues.apache.org/jira/browse/PIG-1333
One that was outright rejected to respect Apache Hive... add SQL to Pig: https://issues.apache.org/jira/browse/PIG-824

Finally, I had an idea that might make this change easy... what if INCLUDE'd files can't inherit? i.e. you would avoid objections by keeping things super simple. Deal with the rest later as more is learned. There could be a simple Dockerfile for instance that installs the pre-req's and binaries, and sets up daemons for MySQL on Ubuntu. If need be, this could be versioned by version of Ubuntu and MySQL. Personally, I'm going to hack a utility to do these simple INCLUDEs and use it to organize my dockerfiles in this way. I can't wait to order and re-use my code.
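
To make that concrete, the hypothetical syntax might look like this (INCLUDE is not real Dockerfile syntax, and mysql.dockerinc is an invented name):

# mysql.dockerinc -- a fragment with no FROM line, so it cannot inherit
RUN apt-get update && apt-get install -y mysql-server
EXPOSE 3306

# Dockerfile
FROM ubuntu:14.04
INCLUDE mysql.dockerinc
CMD ["mysqld_safe"]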

@DJGummikuh

DJGummikuh commented Dec 18, 2015

+1 for the INCLUDE idea. Though I believe prohibiting inheritance will only shift the issue, since you would now be able to modify the mainstream image you're inheriting from, but not the other images you include.

Basically, what would make sense is being able to mark an image as "includable", meaning it does not deliver any operating-system stuff that might break the existing base image. That flag would have to be set by the docker build process, and would prevent images that aren't flagged appropriately from being included.

And let's face it: if you're playing with Dockerfiles, you're probably not seeing a machine for the first day. While it makes sense to prevent the end user of Docker from doing stupid things, there should be a little more freedom for the people who actually create those images. Seriously, being able to select a base image and include all the stuff I want into it to provision my app would be pretty damn awesome.

@parliament718

parliament718 commented Jan 24, 2016

+1 for INCLUDE. I simply need nginx and ssh image combined in one. Why does this have to be so hard?

@rjurney

rjurney commented Jan 25, 2016

The idea that this isn't needed is frankly confusing to the point of being disingenuous. Most users will use this, if it is created. "Add ssh to ubuntu" and "add nginx to ubuntu" are pretty common tasks that everyone need not repeat. What Docker HQ really seems to be saying on this is, "Obviously needed, but we think it will get too ugly. So we pretend." It would be better if you could actually just be honest and open about this. Sorry if I'm cranky.


@vdemeester

Member

vdemeester commented Jan 25, 2016

@rjurney let's wait for the build spin-out; that way, there will be more than one way to build images (and thus a custom builder could appear that does that). One of the reasons docker maintainers (working or not working for Docker) are frisky about it is that it would add complexity where we want to add flexibility and simplicity. By extracting the builder, we'll have better separation of concerns (between building images and running them), and lots of use cases will be more freely implemented in custom builders.

@rjurney

rjurney commented Jan 25, 2016

Here again, are you pushing this out of the project? Custom sounds... not the default, included way. When in fact, includes are a simple need that most everyone has. Repeating yourself is complexity. Inheritance-only is complexity. Includes match a need everyone has in the simplest way possible.


@mcraveiro

mcraveiro commented Feb 4, 2016

+1, combining images would be extremely useful. Imagine a (god forbid) C++ use case: I build an image with Boost, another with, say, Qt, all with the same compiler, etc. Now say I want to build an app with both Boost and Qt; I just combine the two and presto, a dev environment is ready. This would be incredibly useful.

@jakirkham

jakirkham commented Feb 4, 2016

Personally, I feel this is too important an issue not to tackle. That being said, we need a good understanding of what the problems and the scope are, regardless of where it is implemented.

So, I see these problems presented by merging:

  1. Handling merge conflicts.
  2. Resolving different bases (Ubuntu and CentOS).

With the first one, I think the simple answer is: don't. To me it sounds too complicated, potentially problematic, and likely to require a suite of tools to solve while still being too magical. So if this were added, merge conflicts should simply fail the build. I suppose it could be revisited later, but that seems like more trouble than it is worth.

As for the second case, it seems like you could add a constraint that the images share some base layers. Now the question becomes how many is enough. I think the correct answer when starting out would be that the two images being merged must have the same FROM image. There might need to be more constraints here, but it isn't clear to me that those cases wouldn't fall under problem 1, which we have resolved by simply disallowing it.

Are there some other problems I am missing here?

@anentropic

anentropic commented Feb 4, 2016

I think there should be no attempt to merge... I can't see that happening.

A more realistic approach might be a templating type of solution, i.e. allow an INCLUDE of a Dockerfile fragment (which has no FROM clause, just a list of commands) into a real Dockerfile... the fragments can be shared, reused, and included against any compatible base-image Dockerfile.
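
In the meantime, something close can be faked by concatenating a FROM-less fragment onto a base Dockerfile and piping the result to docker build, which accepts a Dockerfile on stdin (file names here are hypothetical; note that a build from stdin has no build context, so the fragment cannot use COPY or ADD):

cat Dockerfile.base postgres-client.inc | docker build -t myapp -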

@emanuil-tolev

emanuil-tolev commented Dec 5, 2016

So how do I use both ruby-2.3 and the java-8 images? They use the same debian jessie image as the base (I read the dockerfiles). I just want to execute the commands present in both of them. As it stands I had to copy/paste the Java Dockerfile into the Ruby Dockerfile. The app needs both, there's absolutely no way I'm getting around that.

I did take the opportunity to remove some Dockerfile commands while I was pasting them in - they were not harmful, but simply superfluous, since the "base" Dockerfile (that I was pasting commands into) already did those steps. I can thus sort of see the argument that I didn't really want a "ruby" & "java" image, I was actually building a 3rd "ruby+java all-in-one" image.

However, in this particular case, the commands in those two images seem to be fully compatible - if I did simply concatenate them, they should work. It would be useful to be able to specify such circumstances. I'm not a huge fan of the copy/paste approach - in my case the Java and Ruby Dockerfiles were simple enough, but some Dockerfiles are much more complex.

However, to everybody else like me who wants this feature - I can see lots of situations where this would be problematic. So it's not just a question of providing the capability to run "nginx" and then "ssh" on the same docker image - the same functionality would also enable you to run "debian" and "centos", which definitely won't produce a workable image. If it is ever introduced it seems like it would have to be an experimental option, off by default, which has loads of warnings attached to it.

So whatever the interface is to this feature, it would have to make it very clear that the onus on getting the reusable behaviours (sets of commands) right is on the Dockerfile developer.

EDIT: Ah, I missed the INCLUDE discussion.

Why don't we simplify the issue and start by implementing INCLUDE so that it does not allow inheritance? In other words, you can only include files that have no FROM.

That would handle many use cases, and the onus would be on the files people INCLUDE to work on any reasonable operating system. uname exists for a reason. This would be a first step, and feedback on this implementation would help define anything further.

That seems like an easy decision to make. It would not be a ton of work. It would not be complex. Right?

👍

@rjurney that's basically what #12749 does

👍 perfect, looking forward to seeing what that will be able to do in its final form.

@kenahoo

kenahoo commented Jan 19, 2017

Very interested in this concept too. An "INCLUDE" mechanism is a very crude solution, but honestly it would represent a pretty big step forward in the maintainability of a set of Dockerfiles.

Personally I wouldn't want it to fail when it encounters a FROM, I'd want it to ignore the FROM and just apply the rest of the commands in sequence.

@cloutiertyler

cloutiertyler commented Apr 19, 2017

Here's the thing: I don't necessarily need a merge. I think a lot of the problems could be solved with a rebase. My normal use case is:

A (ubuntu) -> B (e.g. nginx)
A (ubuntu) -> C (e.g. node)

And I want a combined B & C image. Usually they don't have anything to do with each other, so it would be sufficient just to rebase all the diffs between A and C onto B, i.e.:

A -> B -> C'

That seems like a simpler problem to solve.

@FranklinYu

FranklinYu commented Apr 19, 2017

@cloutiertyler Typically Node.js applications don't need this feature to work with Nginx (in my opinion). The Docker way would be two containers, one for Nginx, the other for Node. We configure the Node container to expose its port only to the Nginx container, and let Nginx container listen to the public port (like 80). Any reason why Nginx needs to be in the same container as Node?

A sample Docker Compose file might be:

version: "2.1"

services:
  app: # Node.js application image
    build: .
  nginx: # Nginx container who can make request to app container
    image: nginx:1.10-alpine
    depends_on:
      - app
    ports:
      - "${PORT:-8080}:80" # public port, you may want PORT=80
@cloutiertyler

cloutiertyler commented Apr 22, 2017

@FranklinYu I appreciate the reply. I actually just used two random services as an example. My usual use case would be starting with a generic service (e.g. node based off of ubuntu) and a custom image of my own (also based off of ubuntu) and wanting to combine them.

@cpuguy83

Contributor

cpuguy83 commented Apr 22, 2017

btw, it's not exactly rebasing, but it opens up a lot of use cases for Dockerfiles.
Dockerfiles now support multi-stage builds. Example:

FROM golang AS myapp
COPY . /myapp
RUN cd /myapp && go build

FROM scratch
COPY --from=myapp /myapp/myapp /usr/bin/myapp

You can have as many stages as you like.
The --from parameter basically switches the context to the specified build target name.
When you docker build -t myapp ., the resulting image called myapp:latest will be from the last stage.
You can also build specific stages with docker build --target=myapp, for example.

There are a few other very nice Dockerfile enhancements in 17.05 (currently available as RC1), give it a try!

@cloutiertyler

cloutiertyler commented Apr 23, 2017

Now that is interesting! I didn't know you could do that. I'll have to give that a try to see if it solves my common use cases.

@cloutiertyler

cloutiertyler commented May 25, 2017

While this is a great feature, having tried it out it doesn't really solve my most common problem. I just ran into it again today.

I would like a Jenkins image that has Docker installed so that I can build from within the container. The fact of the matter is that there's no way to do this without replicating the install process of one or the other in my Dockerfile.

This is a case where the blinkered arguments about this not being necessary since each container should only be one service obviously don't apply. My "one service" combines the functionality of Docker and Jenkins.
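
One partial escape hatch, sketched here as an assumption rather than an official recipe: with 17.05's COPY --from you can graft just the Docker CLI onto the Jenkins image and hand it the host's engine via the socket at run time (image names and paths below are the usual ones, but verify them):

FROM jenkins/jenkins:lts
USER root
# take only the static docker client binary from the official docker image
COPY --from=docker:latest /usr/local/bin/docker /usr/local/bin/docker
USER jenkins

# then run with the host's Docker socket mounted:
# docker run -v /var/run/docker.sock:/var/run/docker.sock my-jenkins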

@cpuguy83

Contributor

cpuguy83 commented May 26, 2017

The fact of the matter is that there's no way to do this without replicating the install process of one or the other in my Dockerfile.

So you want to smash two dockerfiles into one so you don't have to copy/paste stuff?

@cloutiertyler

cloutiertyler commented May 26, 2017

Copy/paste is the equivalent of forking in this case. What I want to do is avoid forking a Dockerfile so I don't miss out on bug/security improvements or other changes when it invariably changes later on.

@dbazhal

dbazhal commented Jul 13, 2017

Can't just pass by. I'm looking for a way to distribute changes over a long chain of image inheritance (deeper than two). Multi-stage builds don't seem to be the thing that solves this. An entity that could contain just a block of directives, which I could include in all my derived images alongside the base image's functionality, looks like the rational evolution here.

@AwokeKnowing

AwokeKnowing commented Jul 29, 2017

For those wondering the right way to do this, from a Docker perspective, take a few minutes to review:
https://github.com/floydhub/dockerfiles

Here he creates an entire tree of Dockerfiles. As you go down the tree, you find different combinations of dependencies, each FROM the level above in the tree. So if you followed the tree from
-ubuntu->common-deps->python3->deepLearningBase->pyTorch
and you really wanted

-ubuntu->common-deps->python3->deepLearningBase->pyTorch 
+
-ubuntu->common-deps->python3->deepLearningBase->TensorFlow 

All you would do is add a node (folder) under deepLearningBase for both, eg
-ubuntu->common-deps->python3->deepLearningBase->TensorFlow-pyTorch

Now, you still have to make a Dockerfile that combines the pyTorch and TensorFlow Dockerfiles, but the key is that those files will be VERY SIMPLE, just a couple of lines of install on top of deepLearningBase.

So what is really needed is several larger-scale GitHub repositories like this, for different "worlds", such as web development, deep learning, embedded software, etc.

Then you would just follow the tree to your required build, and if no one else made it yet, just add a node and combine 2 or 3 apt-get lines and make your new environment.
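
In other words, a combined leaf's Dockerfile stays tiny; something like this sketch (image and package names are illustrative, following the tree above):

FROM deepLearningBase
# only the delta for this node: both frameworks on top of the shared base
RUN pip install tensorflow torch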

@chambm

chambm commented Jul 31, 2017

That looks like the "choose-your-own-adventure" style of composition. INCLUDE would be a lot simpler. Hell, I just want to compose a specified gcc image with nano so I don't have to install nano from apt-get every time!

@alexreg

alexreg commented Oct 30, 2017

I concur with @chambm in his above comment. There's no reason this shouldn't be possible in most cases (conflicts should be fairly rare, as they are on manually-managed OSes).

@1138-4EB

1138-4EB commented Apr 16, 2018

This is a use case quite similar to the one @cloutiertyler described, where neither @FranklinYu's solution nor the multi-stage builds mentioned by @cpuguy83 apply:

where:

  • The steps from A to C are exactly the same as those from B to D (dockerfileAC).
  • The development team of B knows nothing about C, D or E.
  • The development team of C knows nothing about B, D or E.

A user who wants to build D (and/or E) must have access to dockerfileAC, but is not required to know about dockerfileAB. Therefore, the user must have a better understanding of one dependency (C) than of the other (B). Ideally, it should be possible to rely on teams A and B, and just build D as either A + (diff(B,A) + diff(C,A)) (merge) or A + diff(B,A) + diff(C,A) (rebase).

Because GHDL is not a web service and VUnit is not a web client, both tools need to be installed in the same image/container (E). Multi-stage builds are not useful, because we need to build a (probably unknown) Dockerfile with two FROM instructions; it is not a single forward chain.

I find this use case similar to the merge/rebase of two git branches: sometimes there are no conflicts, sometimes the conflicts are easily resolvable, sometimes it cannot be merged at all because there is no common history... Is there any tool, either official or external, that provides this feature? Note that it is OK if the tool just exports two images to git branches and actually uses git for the merge.

@rainabba

rainabba commented May 2, 2018

Amazing that this is still an issue and a topic. How hard is it to "INCLUDE someimage", then, when parsing it, check that the base is compatible (in the FROM chain) and, if so, execute the rest of THAT file at that point (as if I had copied the Dockerfile from the project and pasted it into mine)?

The whole "people will do bad things they don't realize" excuse is absurd in this context. This is already insanely complex, and that's why we need this to help simplify it.

@cpuguy83

Contributor

cpuguy83 commented May 2, 2018

@rainabba This is an entirely unhelpful comment.
There are basically two reasons why it's not done:

  1. It's not so easy.
  2. No one has taken the time to do the work.

In reality, it is usually both.

@rainabba

rainabba commented May 2, 2018

  1. It's a parsing and string-replace problem that any new coder could accomplish in all of 10 minutes IF they knew where in the code. I'm not saying it would be usable in all cases, but for the limited cases I'm seeing suggested here over and over (where bases are effectively common), it's a dead ringer.

  2. Of course not: this thread provides ~102 reasons it can't or shouldn't be done, so why would anyone think to do it regardless?

On the other hand, my comment serves (like SO many others here) to demonstrate that there is a need, in the hope of influencing the obstructing attitudes, or at least acting as a reminder. If that's "entirely unhelpful", then you've just explained why this issue (an ignored feature request) is still here and active, and the reason is not a technical one.

@cpuguy83

Contributor

cpuguy83 commented May 2, 2018

It's way more than parsing a string.
Docker and the Dockerfile are used by millions of people. Adding APIs is a significant thing... and even outside of that, the underlying implementation is not "parsing a string".

In any case, there are many proposals to solve the problem, and this is a very old and closed issue.

@kenahoo

kenahoo commented May 3, 2018

I do think that if Docker doesn't figure out a clean solution to this scenario, it will probably be replaced by whatever tool does figure it out.

I noticed one of my colleagues using the following pattern, which might be a decent workaround:

ARG from
FROM $from
... rest of dockerfile

I haven't tried it myself though, so I'm not sure how it would work in practice, e.g. how it behaves with caching, etc.
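
For the record, the pattern does work: an ARG declared before FROM is supported as of Docker 17.05, and the base image is then chosen at build time:

docker build --build-arg from=ubuntu:16.04 -t myimage .

Caching behaves as usual per base: each distinct value of from produces its own chain of cached layers, since every layer is cached against its parent.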

@alexreg

alexreg commented May 3, 2018

Indeed, this is a very important problem, and it hasn't been addressed properly. I'm amazed a company as big as Docker hasn't tackled it yet.

@cosminonea

cosminonea commented Aug 8, 2018

Just my two cents... I am just learning more about Docker at the moment, and I feel something like INCLUDE would be very useful. I liked the multiple-inheritance example above and wanted to address the comments about possible problems and conflicts with it.

Multiple inheritance is hard in any language that supports it, but when a conflict occurs it's the responsibility of the Dockerfile creator to rethink what they are doing and start again. Docker should just build the image and not try to prove the build has no issues.

@larytet

larytet commented Aug 8, 2018

@cosminonea

I feel something like INCLUDE would be very useful

I have support for macros in https://github.com/larytet/dockerfile-generator/ I could support "include" too.

@docbill

docbill commented Aug 9, 2018

@thaJeztah

Member

thaJeztah commented Aug 9, 2018

That last one is possible already; COPY --from accepts either a build stage or an image, so for example:

FROM busybox

COPY --from=alpine:latest / /
COPY --from=docker:latest /usr/local/bin/docker /usr/local/bin/

Edit: or, to take the actual example:

FROM fedora

COPY --from=ubuntu:latest / /ubuntu/
COPY --from=debian:latest / /debian/