
build time only -v option #14080

Open
zrml opened this issue Jun 21, 2015 · 256 comments
@zrml zrml commented Jun 21, 2015

As suggested by @cpuguy83 in #3156
here is the use case for a flexible -v option at build time.

When building a Docker image I need to install a database and an app. It's all wrapped up in two tarballs: one for the DB and one for the app that needs to be installed in it (schema, objects, static data, credentials etc.). The whole solution is then run via a shell script that handles several shell variables and tunes OS credentials and other things accordingly.
When I explode the above tarballs (or use the Dockerfile ADD directive) the whole thing bloats up to about 1.5GB(!). Not ideal, as you can imagine.

I would like to have this '-v /distrib/ready2installApp:/distrib' directive still possible (as it is today in the Dockerfile)
but

I would like to disassociate the declarative build process (infrastructure as code) from the container run-time deployable artifact. I do not want to have to deal with the dead weight of 1.5GB that I do not need.

Could we have an --unmount-volume option that I can run at the end of the Dockerfile?
or
Given how Volume works right now in a Dockerfile, maybe we need a new Dockerfile directive for a temporary volume that people use while installing? I think the Puppet example supplied by @fatherlinux was on a similar line...
or
Whatever you guys can think of.
The objective is avoiding having to carry around all that dead weight, which is useless for a deployed app or service but necessary at install time. Not everybody has a simple "yum install" from the official repositories. :)

thank you very much

@tpires tpires commented Jun 25, 2015

I'm looking for a similar solution.

Problem

Recently the enterprise I work for enabled the Zscaler proxy with SSL inspection, which implies having certificates installed and some environment variables set during the build.

A temporary solution was to create a new Dockerfile with the certificates and environment variables set. But that doesn't seem reasonable in the long term.

So, my first thought was to set up a transparent proxy with HTTP and HTTPS, but again I need to pass a certificate during the build.

The ideal scenario is that, with the same Dockerfile, I would be able to build my image on my laptop, at home, and at the enterprise.

Possible solution

# Enterprise
$ docker build -v /etc/ssl:/etc/ssl -t myimage .

# Home
$ docker build -t myimage .
@yngndrw yngndrw commented Jul 7, 2015

I have a slightly different use case for this feature - caching packages which are downloaded / updated by the ASP.NET 5 package manager. The package manager manages its own cache folder, so ultimately I just need a folder which I can re-use between builds.

I.e.:

docker build -v /home/dokku/cache/dnx/packages:/opt/dnx/packages -t "dokku/aspnettest" .
@zrml zrml commented Jul 16, 2015

@yngndrw what you propose would be OK for me too, i.e., we need to mount extra resources at build time that would not be necessary at run time, as by then they have been installed in the container.

FWIW I saw somewhere in these pages somebody saying something along the lines of (and I hope I'm paraphrasing it right) "resolve your compilation issue on a similar host machine, then just install the deployable artifact or exe in the container".
I'm afraid it's not that simple, guys. At times I need to install in /usr/bin, but I also need to edit some config files. I check the OS I'm running on, the kernel params I need to tune, the files I need to create depending on variables or manifest build files. There are many dependencies that are just not satisfied by a simple copy of a compiled product.

I re-state what I said when I opened the issue: there is a difference between a manifest declaration file and its process, and the run time of an artifact.
If we truly believe in infrastructure-as-code, and furthermore in immutable infrastructure (which Docker itself is promoting, and I like it, btw), then this needs to be seriously considered IMO (see the bloating in post 1 herewith).

Thank you again

@fatherlinux fatherlinux commented Aug 16, 2015

Another use case that is really interesting is upgrading software. There are times, as with FreeIPA, when you should really test with a copy of the production data to make sure that all of the different components can cleanly upgrade. You still want to do the upgrade in a "build" environment. You want the production copy of the data to live somewhere else, so that when you move the newly upgraded versions of the containers into production, they can mount the exact data that you did the upgrade on.

Another example would be Satellite/Spacewalk, which changes schema often and even changed databases from Oracle to PostgreSQL at version 5.6 (IIRC).

There are many, many scenarios when I temporarily need access to data while doing an upgrade of software in a containerized build, especially with distributed/micro services....

@fatherlinux fatherlinux commented Aug 16, 2015

Essentially, I am now forced to do a manual upgrade by running a regular container with a -v bind mount, then doing a "docker commit". I cannot understand why the same capability wouldn't be available in an automated Dockerfile build.

@stevenschlansker stevenschlansker commented Aug 19, 2015

Seconding @yngndrw on caching: the exact same reasoning applies to many popular tools such as Maven, npm, apt, rpm -- allowing a shared cache can dramatically speed up builds, but it must not make it into the final image.

@NikonNLG NikonNLG commented Aug 19, 2015

I agree with @stevenschlansker. There can be many reasons to attach a cache volume, or to attach a few gigabytes of data that must be present (in parsed form) in the final image, but not as raw data.

@wjordan wjordan commented Aug 20, 2015

I've also been bitten by the consistent resistance to extending docker build to support the volumes that can be used by docker run. I have not found the 'host-independent builds' mantra to be very convincing, as it only seems to make developing and iterating on Docker images more difficult and time-consuming when you need to re-download the entire package repository every time you rebuild an image.

My initial use case was a desire to cache OS package repositories to speed up development iteration. A workaround I've been using with some success is similar to the approach suggested by @fatherlinux, which is to just give up wrestling with docker build and the Dockerfile altogether, and start from scratch using docker run on a standard shell script followed by docker commit.

As a bit of an experiment, I extended my technique into a full-fledged replacement for docker build using a bit of POSIX shell scripting: dockerize.

If anyone wants to test out this script or the general approach, please let me know if it's interesting or helpful (or if it works at all for you). To use, put the script somewhere in your PATH and add it as a shebang for your build script (the #! thing), then set relevant environment variables before a second shebang line marking the start of your Docker installation script.

FROM, RUNDIR, and VOLUME variables will be automatically passed as arguments to docker run.
TAG, EXPOSE, and WORKDIR variables will be automatically passed as arguments to docker commit.

All other variables will be evaluated in the shell and passed as environment arguments to docker run, making them available within your build script.

For example, this script will cache and reuse Alpine Linux packages between builds (the VOLUME mounts a home directory to CACHE, which is then used as a symlink for the OS's package repository cache in the install script):

#!/usr/bin/env dockerize
FROM=alpine
TAG=${TAG:-wjordan/my-image}
WORKDIR=/var/cache/dockerize
CACHE=/var/cache/docker
EXPOSE=3001
VOLUME="${HOME}/.docker-cache:${CACHE} ${PWD}:${WORKDIR}:ro /tmp"
#!/bin/sh
ln -s ${CACHE}/apk /var/cache/apk
ln -s ${CACHE}/apk /etc/apk/cache
set -e
apk --update add gcc g++ make libc-dev python
[...etc etc build...]
@zrml zrml commented Aug 24, 2015

So, after meeting the French contingent :) from Docker at MesosCon last week (it was a pleasure, guys) I was made aware they have the same issue in-house, and they developed a hack that copies what they need over to a new slim image.
I'd say that hacks are not welcome in the enterprise world ;) and this request should be properly handled.
Thank you for listening guys...

@raine raine commented Sep 17, 2015

I'm also in favor of adding build-time -v flag to speed up builds by sharing a cache directory between them.

@zrml zrml commented Sep 17, 2015

@yngndrw I don't understand why you closed two related issues. I read your #59 issue and I don't see how it relates to this one. In some cases containers become super-bloated with content that is not needed at run time. Please read the 1st post.
I hope I'm not missing something here... as it has been a long day :-o

@yngndrw yngndrw commented Sep 17, 2015

@zrml Issue aspnet/aspnet-docker#59 was related to the built-in per-layer caching that docker provides during a build to all docker files, but this current issue is subtly different as we are talking about using host volumes to provide dockerfile-specific caching which is dependent on the dockerfile making special use of the volume. I closed issue aspnet/aspnet-docker#59 as it is not specifically related to the aspnet-docker project / repository.

The other issue that I think you're referring to is issue dokku/dokku#1231, which was regarding the Dokku processes explicitly disabling the built-in docker layer caching. Michael made a change to Dokku in order to allow this behaviour to be configurable and this resolved the issue in regards to the Dokku project / repository, so that issue was also closed.

There is possibly still a Docker-related issue that is outstanding (I.e. Why was Docker not handling the built-in layer caching as I expected in issue aspnet/aspnet-docker#59), but I haven't had a chance to work out why that is and confirm if it's still happening. If it is still an issue, then a new issue for this project / repository should be raised for it as it is distinct from this current issue.

@zrml zrml commented Sep 18, 2015

@yngndrw exactly, so we agree this is different and known @docker.com, so I'm re-opening it if you don't mind... well, I cannot. Do you mind, please?
I'd like to see some comments from our colleagues in SF at least before we close it

BTW I was asked by @cpuguy83 to open a use case and explain it all, from log #3156

@yngndrw yngndrw commented Sep 18, 2015

@zrml I'm not sure I follow - is it aspnet/aspnet-docker#59 that you want to re-open? It isn't an /aspnet/aspnet-docker issue, so I don't think it's right to re-open that issue. It should really be a new issue on /docker/docker, but it would need to be verified and would need reproducible steps generated first.

@zrml zrml commented Sep 18, 2015

no, no.. this one #14080 that you closed yesterday.

@yngndrw yngndrw commented Sep 18, 2015

This issue is still open?

@zrml zrml commented Sep 21, 2015

@yngndrw I believe I misread the red "closed" icon. Apologies.

@lukaso lukaso commented Sep 22, 2015

Heartily agree that build time -v would be a huge help.

Build caching is one use case.

Another use case is using ssh keys at build time for building from private repos without them being stored in the layer, eliminating the need for hacks (though well engineered) such as this one: https://github.com/dockito/vault
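For reference, BuildKit's experimental front-end later addressed exactly this use case with `--mount=type=ssh`, which exposes the host's SSH agent to a single RUN step without storing any key material in a layer. A minimal sketch (the repository URL is a placeholder, not from this thread):

```dockerfile
# syntax=docker/dockerfile:experimental
FROM alpine:3.12
RUN apk add --no-cache git openssh
# Record github.com's host key so the clone can verify the server
RUN mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
# The SSH agent socket is available only during this step; nothing persists
RUN --mount=type=ssh git clone git@github.com:example/private-repo.git /src
```

Built with `DOCKER_BUILDKIT=1 docker build --ssh default .`, so the keys never leave the host.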

@btrepp btrepp commented Oct 8, 2015

I'm commenting here because this is hell in the corporate world.
We have an SSL-intercepting proxy; while I can direct traffic through it, heaps of projects assume they have good SSL connections, so they die horribly.

Even though my machine (and thus the docker builder) trusts the proxy, docker images don't.
Worse still, the best practice is now to use curl inside the container, so that is painful; I have to modify Dockerfiles to make them even build. I could mount the certificates with a -v option and be happy.

That being said, it's less the fault of docker and more the fault of package managers using https when they should be using a system similar to how apt-get works, as that is still secure and verifiable, and also cacheable by an http proxy.

@zrml zrml commented Oct 8, 2015

@btrepp thank you for another good use case.

@btrepp btrepp commented Oct 9, 2015

I can think of another situation.

One of the things I would like to do with my Dockerfiles is not ship the build tools with the "compiled" docker image. There's no reason a C app needs gcc in the image, nor a ruby app bundler, but using docker build you currently will have this.

An idea I've had is specifying a Dockerfile that runs multiple docker commands when building. Pseudo-ish Dockerfiles below.

Dockerfile that builds the others:

FROM dockerbuilder
RUN docker build -t docker/builder myapp/builder/Dockerfile
RUN docker run -v /app:/app docker/builder
RUN docker build -t btrepp/myapplication myapp/Dockerfile

btrepp/myapplication Dockerfile:

FROM debian:jessie+sayrubyruntime
# this is code that has been built using the builder Dockerfile
ADD . /app
ENTRYPOINT ["rails", "s"]

Here we have a temporary container that does all the bundling/package management and any build scripts, but produces only the files that the runtime container needs.

The runtime container then just adds the results, meaning it shouldn't need much more than ruby installed. In the case of, say, GCC, or even better statically linked Go, we may not need anything other than the core OS files to run.

That would keep the docker images super light.

The issue here is that the temporary builder container would go away at the end, meaning it would be super expensive without the ability to load a cache of sorts; we would be grabbing debian:jessie a whole heap of times.

I've seen people do similar techniques, but using external http servers to add the build files. I would prefer to keep it all built by docker. Though there is possibly a way of using a docker image to do this properly, using run and thus being able to mount volumes.

@fatherlinux fatherlinux commented Oct 14, 2015

Here is another example. Say I want to build a container for systemtap that has all of the debug symbols for the kernel in it (which are Yuuuuge). I have to mount the underlying /lib/modules so that the yum command knows which RPMs to install.

Furthermore, maybe I would rather have these live somewhere other than in the 1.5GB image (1.5GB coming from the debug symbols).

I went to write a Dockerfile, then realized it was impossible :-(

docker run --privileged -v /lib/modules:/lib/modules --tty=true --interactive=true rhel7/rhel-tools /bin/bash
yum --enablerepo=rhel-7-server-debug-rpms install kernel-debuginfo-$(uname -r) kernel-devel-$(uname -r)
docker ps -a
CONTAINER ID        IMAGE                     COMMAND             CREATED             STATUS                        PORTS               NAMES
52dac30dc495        rhel7/rhel-tools:latest   "/bin/bash"         34 minutes ago      Exited (0) 15 minutes ago                         dreamy_thompson
docker commit dreamy_thompson stap:latest

https://access.redhat.com/solutions/1420883

@jeremyherbert jeremyherbert commented Nov 19, 2015

I'd like to repeat my use case here from #3949 as that bug has been closed for other reasons.

I'd really like to sandbox proprietary software in docker. It's illegal for me to host it anywhere, and the download process cannot realistically (or legally) be automated. In total, the installers come to about 22GB (and they're getting bigger with each release). I think it's silly to expect that this should be copied into the docker image at build time.

@zrml zrml commented Nov 19, 2015

Any news in this needed feature?
thank you


@kris-nova kris-nova commented Jun 12, 2020

I bet if someone dropped a few links and an example we could convince our friends to press the shiny close button.


(I would also benefit from them)

@thisismydesign thisismydesign commented Jun 12, 2020

The use case of caching isn't solved for me and many others, as the build-time volumes with buildkit are not present in the final image.

@kris-nova kris-nova commented Jun 12, 2020

So I was able to pull all my build artifacts out of the temporary volume used at build time and reconstruct the image with the previous cache using the bash I mentioned above.

I was also able to rebuild my image on top of itself, such that the overlay filesystem only grabbed a small delta.

I was even able to re-use the volume for other images at build time.

Are other folks not able to do this?

@thaJeztah thaJeztah commented Jun 12, 2020

(cache) mounts are in the "experimental" front-end; described in https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md (about to head into a meeting, but I can link more extended examples)
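A cache mount from that experimental front-end looks roughly like this (a sketch; the image tag and paths are illustrative, not from this thread):

```dockerfile
# syntax=docker/dockerfile:experimental
FROM node:12
WORKDIR /app
COPY package.json package-lock.json ./
# The cache directory persists across builds on the same daemon,
# but is NOT part of the resulting image
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
```

Enabled with `DOCKER_BUILDKIT=1 docker build .`; only npm's download cache lives in the mount, so node_modules still lands in a normal layer.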

@kris-nova kris-nova commented Jun 12, 2020

thanks @thaJeztah LMK if I can help here in any way :)

@thisismydesign thisismydesign commented Jun 12, 2020

#14080 (comment)

@thisismydesign sorry to ruin your excitement, but you can't --cache node_modules, it will not be present in the final image, so your app is broken.

@thaJeztah I don't believe the issue above is solved. Would love to take a look at some examples where it's possible to cache e.g. npm install during build time that will also allow the resulting image to use the cached installation.

@kris-nova I didn't solve this problem but then again I'm not looking to use bash scripts. Perhaps we need a new issue but this is a pretty common use case that AFAIK isn't solved yet.

@thisismydesign thisismydesign commented Jun 12, 2020

@thaJeztah Here are some examples using cache mounts, showing that the final image won't contain the mount and that it therefore doesn't cover many use cases of build-time caching:

@ankon ankon commented Jun 12, 2020

For npm: Wouldn't one use the cache mounts for the npm cache directory (see https://docs.npmjs.com/cli-commands/cache.html, usually ~/.npm)?

@thisismydesign thisismydesign commented Jun 12, 2020

@ankon That could work, thanks, I'll give it a try. Another use case I'm not sure about is Bundler and Ruby.

@thisismydesign thisismydesign commented Jun 12, 2020

So I think (haven't tested yet) that for Bundler you can at least get rid of the network dependency by using a build volume at $BUNDLE_PATH and then, during the build:

bundle install
bundle package
bundle install --standalone --local

This basically means you have a cached bundle install directory; from there you package the gems into ./vendor/cache and re-install into ./bundle. But this doesn't spare the time spent installing and building gems; it might actually make the build step longer.

@cpuguy83 cpuguy83 commented Jun 12, 2020

If you want to save the cached data into the image, then copy it into the image from the cache.
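That pattern might look like this (a hypothetical sketch; the URL and paths are placeholders): fetch into the cache mount once, then copy into a real layer so the final image keeps the data:

```dockerfile
# syntax=docker/dockerfile:experimental
FROM alpine:3.12
# The download lands in the cache mount and is skipped on rebuilds;
# the cp then writes a copy into a normal image layer
RUN --mount=type=cache,target=/cache \
    ( [ -f /cache/dataset.tar.gz ] || \
      wget -O /cache/dataset.tar.gz https://example.com/dataset.tar.gz ) && \
    mkdir -p /opt && cp /cache/dataset.tar.gz /opt/dataset.tar.gz
```

The copy costs some image size, but rebuilds skip the expensive fetch entirely.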

@thisismydesign thisismydesign commented Jun 12, 2020

Thanks; however, it is still more of a workaround because

  • you have to do an additional copy
  • I assume you have to have different directories between build and run environments (you cannot use the directory where you mounted a volume during the build, right?), so it requires additional setup

I don't know how much effort it would be to simply have a native option for mounting the same volume into the final image, but I'm pretty sure it'd make usage easier. These are just 2 examples from scripting languages where the way to use this cache wasn't obvious to me. I can most certainly imagine this will come up in other contexts as well.

@cpuguy83 cpuguy83 commented Jun 12, 2020

@thisismydesign It seems like what you want is to be able to share a cache between build and run?

@isanych isanych commented Jun 12, 2020

buildkit is a linux only solution, what do we do on windows?

@Bessonov Bessonov commented Jun 12, 2020

@thisismydesign I'm not sure why you expect a (cache) mount to stay in the final image. I wouldn't expect this, and I don't want to have ~1GB in my image just because of using a download cache mount.

@nigelgbanks nigelgbanks commented Jun 13, 2020

buildkit is a linux only solution, what do we do on windows?

You can use buildkit on Windows.

https://docs.docker.com/develop/develop-images/build_enhancements/

You may find it easier to set the daemon setting through the Docker for Windows UI rather than setting the environment variable before executing.

@isanych isanych commented Jun 13, 2020

@nigelgbanks at the top of your link:

Limitations
Only supported for building Linux containers
@nigelgbanks nigelgbanks commented Jun 13, 2020

Oh sorry, I just assumed you were building Linux containers on Windows.

@thisismydesign thisismydesign commented Jun 13, 2020

@thisismydesign It seems like what you want is to be able to share a cache between build and run?

That would solve my use case around caching, yes.

@westurner westurner commented Jun 13, 2020

@unilynx unilynx commented Jun 14, 2020

Do any CI services support experimental buildkit features?

Do they have to explicitly support it? I'm using gitlab-ci with buildkit and it just works. After all, it's just a different way of invoking 'docker build'.

Of course, unless you bring your own runners to gitlab, odds of getting a cache hit during build are low anyway.

@westurner westurner commented Jun 14, 2020

Copying from a named stage of a multi-stage build is another solution:

FROM golang:1.7.3 AS builder
...
# (illustrative paths)
COPY --from=builder /go/src/app /app

But then container image locality is still a mostly-unsolved issue for CI job scheduling.

Runners would need to be more sticky and share (intermediate) images in a common filesystem in order to minimize unnecessary requests to (perennially-underfunded) package repos.

@ppenguin ppenguin commented Jun 22, 2020

I just tried buildkit but it only marginally improves my workflow, which would be 100% helped by "real" volume or bind mounts to the host.

I am using docker build to cross-compile old glibc versions which should then be part of new build containers providing these glibcs to build under and link against.

Now the repeated glibc source download is solved by a bind mount (from buildkit), and the archive can be read-only, no problem. But I have no way to access the build dir for analysis after failed builds, since the container bombs out on error. (If I restart it to access it, it restarts the build, so that doesn't help.)

Also, I fail to see why I should be jumping through hoops like building a new container from an old one just to get rid of my build dir, when if the build dir had been a mount in the first place it would have been so easy. (Just do make install after the build, and I have a clean container without the build dir and without the downloaded sources.)

So I still believe this is a very valid feature request that would make our lives a lot easier. Just because a feature could be abused, and could break other functionality if misused, does not mean it should not be implemented at all. Just consider it an extra use for a more powerful tool.

@cpuguy83 cpuguy83 commented Jun 22, 2020

But I have no way to access the build dir for analysis after failed builds

Sounds like a feature request for buildkit. This is definitely a known missing piece.

One could do this today by having a target for fetching the "build dir": you'd just run that target after a failed build; everything before the failure should still be cached, so you only need a last step to grab the data.
I understand this is a bit of a workaround, though.
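That target-based workaround might be sketched like this (stage names, image, and paths are illustrative; the `|| true` is a temporary tolerance so the stage completes and its build tree can be exported for inspection):

```dockerfile
# syntax=docker/dockerfile:1
FROM gcc:9 AS build
WORKDIR /build
COPY . .
# Temporarily tolerate failure so the stage (and its build tree) survives
RUN make || true

# Export-only target: contains nothing but the build tree
FROM scratch AS builddir
COPY --from=build /build /
```

Then `docker build --target builddir --output type=local,dest=./debug .` drops the build tree into ./debug on the host for analysis.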

Also, I fail to see why I should be jumping through hoops like building a new container from an old one just to get rid of my build dir

Can you explain more what you are wanting/expecting here?

@ppenguin ppenguin commented Jun 22, 2020

Can you explain more what you are wanting/expecting here?

In this case it's just wanting to kill 2 birds with 1 stone:

  • have an easy way to access intermediate results from the host (here "build dir analysis")
  • be sure that this storage space is not polluting the newly built image

Since this, and all the other cases where the build container (as well as the "container build") needs to make building as painless as possible, would be solved so much more elegantly by just providing -v functionality, I have a hard time understanding the resistance to this feature. Apart from the "cache-aware" functionality buildkit apparently offers, I can only see it as a convoluted and cumbersome way to achieve exactly this functionality, and only partially at that. (And in many cases where caching is the main goal, it would also be solved by -v, at the cost of having to lock the mounted volume to a specific container while it runs; but the buildkit cache has the same restriction, afaict.)

@mcattle mcattle commented Jun 27, 2020

Can you explain more what you are wanting/expecting here?

I'm using a multi-stage build process, where the build environment itself is containerized, and the end result is an image containing only the application and the runtime environment (without the build tools).

What I'd like is some way for the interim Docker build container to output unit-test and code-coverage result files to the host system, on both successful and failed builds, without having to pass them into the build output image for extraction. (The whole build process is short-circuited if the unit tests don't pass in an earlier step, so there won't be an output image in that situation -- and that's when we need the unit test results the most.) I figure that if a host volume could be mounted into the Docker build process, the internal test commands could direct their output to the mounted folder.
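One way to get close to this with BuildKit today is a dedicated results stage plus `--output` (a sketch, not the poster's actual setup; the image, test command, and paths are illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS test
WORKDIR /src
COPY . .
# `; exit 0` keeps the stage alive on test failure so results still export
RUN dotnet test --logger trx --results-directory /testresults; exit 0

FROM scratch AS testresults
COPY --from=test /testresults /
```

`docker build --target testresults --output type=local,dest=./testresults .` writes the result files to the host whether or not the tests passed; the real image is then built from the default target only when they did.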

@ppenguin ppenguin commented Jun 27, 2020

@mcattle
Indeed, very similar to (one of) the functionalities I need.
Since moving to buildah a few days ago I got every function I needed and more. Debugging my build container would have been utterly impossible without the ability to flexibly enter the exited container and link to the host. Now I'm a happy camper. (I'm sorry to crash the party with a "competitor"; I'd happily remove this comment if offence is taken, but it was such an effective solution for the use cases presented in this thread that I thought I should mention it.)

@cpuguy83 cpuguy83 commented Jun 27, 2020

@westurner westurner commented Jun 27, 2020

without having to resort to bind mounts from the client.

Here I explain why a build time -v option is not resorting to or sacrificing reproducibility any more than depending on network resources at build time.

#14080 (comment) :

COPY || REMOTE_FETCH || read()

  • Which of these are most reproducible?

I'm going with buildah for build time -v (and cgroupsv2) as well.

@xellsys xellsys commented Jun 28, 2020

@mcattle I have had the same requirement. I solved it with labeling.
