
build time only -v option #14080

Open
zrml opened this issue Jun 21, 2015 · 270 comments
Labels
area/builder, kind/feature

Comments

@zrml

zrml commented Jun 21, 2015

As suggested by @cpuguy83 in #3156
here is the use case for a flexible -v option at build time.

When building a Docker image I need to install a database and an app. It's all wrapped up in two tarballs: one for the DB and one for the app that needs to be installed in it (schema, objects, static data, credentials, etc.). The whole solution is then run via a shell script that handles several shell variables and tunes OS credentials and other things accordingly.
When I explode the above tarballs (or use the Dockerfile ADD directive) the whole thing bloats up to about 1.5 GB(!). Not ideal, as you can imagine.

I would like this '-v /distrib/ready2installApp:/distrib' directive to still be possible (as it is today at run time),
but

I would like to disassociate the declarative build process (infrastructure as code) from the container run-time deployable artifact. I do not want to have to deal with the dead weight of 1.5GB that I do not need.

Could we have an --unmount-volume option that I can run at the end of the Dockerfile?
or
Given how VOLUME works right now in a Dockerfile, maybe we need a new Dockerfile directive for a temporary volume that people use while installing? I think the Puppet example supplied by @fatherlinux was along similar lines...
or
Whatever you guys can think of.
The objective is to avoid having to carry around all that dead weight, which is useless for a deployed app or service but necessary at install time. Not everybody has a simple "yum install" from the official repositories. :)

thank you very much
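For reference, the RUN --mount=type=bind syntax that BuildKit later introduced (and that comes up further down this thread) covers this install-media case; a minimal sketch, where the base image, the distrib/ directory, and the install scripts are only illustrative:

# syntax=docker/dockerfile:1
FROM centos:7
# The install media is bind-mounted from the build context for this step only
# (read-only by default); nothing under /distrib is written into an image layer.
RUN --mount=type=bind,source=distrib,target=/distrib \
    /distrib/install_db.sh && /distrib/install_app.sh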

@tpires

tpires commented Jun 25, 2015

I'm looking for a similar solution.

Problem

Recently the enterprise I work for enabled a Zscaler proxy with SSL inspection, which implies having certificates installed and some environment variables set during the build.

A temporary solution was to create a new Dockerfile with the certificates and environment variables set, but that doesn't seem reasonable in the long term.

So my first thought was to set up a transparent proxy for HTTP and HTTPS, but again I need to pass a certificate during the build.

The ideal scenario is that, with the same Dockerfile, I would be able to build my image on my laptop at home and at the enterprise.

Possible solution

# Enterprise
$ docker build -v /etc/ssl:/etc/ssl -t myimage .

# Home
$ docker build -t myimage .
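For reference, BuildKit's build secrets (mentioned later in this thread) can serve this certificate case without a build-time -v; a rough sketch, where the base image and file paths are only placeholders:

# syntax=docker/dockerfile:1
FROM buildpack-deps:bookworm-curl
# If the corporate CA secret is supplied (enterprise build), trust it; at home
# the secret is simply absent and this step is a no-op. The raw secret file is
# only visible during this RUN step.
RUN --mount=type=secret,id=corp_ca \
    if [ -f /run/secrets/corp_ca ]; then \
        cp /run/secrets/corp_ca /usr/local/share/ca-certificates/corp.crt && \
        update-ca-certificates; \
    fi

# Enterprise: docker build --secret id=corp_ca,src=/etc/ssl/corp.crt -t myimage .
# Home:       docker build -t myimage .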

@yngndrw

yngndrw commented Jul 7, 2015

I have a slightly different use case for this feature - Caching packages which are downloaded / updated by the ASP.Net 5 package manager. The package manager manages its own cache folder so ultimately I just need a folder which I can re-use between builds.

I.e:

docker build -v /home/dokku/cache/dnx/packages:/opt/dnx/packages -t "dokku/aspnettest" .

@jessfraz added the kind/feature label Jul 10, 2015
@zrml
Author

zrml commented Jul 16, 2015

@yngndrw what you propose would be OK for me too, i.e., we need to mount extra resources at build time that would not be necessary at run time, as they have been installed in the container.

FWIW I saw somewhere in these pages somebody saying something along the lines of (and I hope I'm paraphrasing it right) "resolve your compilation issue on a similar host machine, then just install the deployable artifact or exe in the container".
I'm afraid it's not that simple, guys. At times I need to install in /usr/bin, but I also need to edit some config file. I check the OS I'm running on, the kernel params I need to tune, and the files I need to create depending on variables or manifest/build files. There are many dependencies that are just not satisfied by a simple copy of a compiled product.

I re-state what I said when I opened the issue: there is a difference between a manifest/declaration file and its build process, and the runtime of an artifact.
If we truly believe in infrastructure-as-code, and furthermore in immutable infrastructure (which Docker itself promotes, and which I like, by the way), then this needs to be seriously considered IMO (see the bloat described in the first post).

Thank you again

@fatherlinux

Another use case that is really interesting is upgrading software. There are times, like with FreeIPA, when you should really test with a copy of the production data to make sure that all of the different components can cleanly upgrade. You still want to do the upgrade in a "build" environment, and you want the production copy of the data to live somewhere else, so that when you move the newly upgraded versions of the containers into production, they can mount the exact data that you did the upgrade on.

Another example would be Satellite/Spacewalk, which changes its schema often and even switched databases from Oracle to PostgreSQL at version 5.6 (IIRC).

There are many, many scenarios when I temporarily need access to data while doing an upgrade of software in a containerized build, especially with distributed/micro services....

@fatherlinux

Essentially, I am now forced to do a manual upgrade by running a regular container with a -v bind mount and then doing a "docker commit". I cannot understand why the same capability wouldn't be available in an automated Dockerfile build.

@stevenschlansker

Seconding @yngndrw pointing out caching: the exact same reasoning applies to many popular projects such as Maven, npm, apt, rpm -- allowing a shared cache can dramatically speed up builds, but must not make it into the final image.
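For reference, the cache mounts that BuildKit later added aim at exactly this pattern; a minimal apt sketch (Debian/Ubuntu images normally delete downloaded packages, so the docker-clean hook is disabled first; the image and package names are illustrative):

# syntax=docker/dockerfile:1
FROM ubuntu:22.04
# Keep apt's package cache in a named build cache instead of in any image layer.
RUN rm -f /etc/apt/apt.conf.d/docker-clean
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y build-essential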

@NikonNLG

I agree with @stevenschlansker. There can be many reasons to attach a cache volume, or to have a few gigabytes of data that must be present (in parsed form) in the final image, but not as raw data.

@wjordan

wjordan commented Aug 20, 2015

I've also been bitten by the consistent resistance to extending docker build to support the volumes that can be used by docker run. I have not found the 'host-independent builds' mantra to be very convincing, as it only seems to make developing and iterating on Docker images more difficult and time-consuming when you need to re-download the entire package repository every time you rebuild an image.

My initial use case was a desire to cache OS package repositories to speed up development iteration. A workaround I've been using with some success is similar to the approach suggested by @fatherlinux, which is to just give up wrestling with docker build and the Dockerfile altogether, and start from scratch using docker run on a standard shell script followed by docker commit.

As a bit of an experiment, I extended my technique into a full-fledged replacement for docker build using a bit of POSIX shell scripting: dockerize.

If anyone wants to test out this script or the general approach, please let me know if it's interesting or helpful (or if it works at all for you). To use, put the script somewhere in your PATH and add it as a shebang for your build script (the #! thing), then set relevant environment variables before a second shebang line marking the start of your Docker installation script.

FROM, RUNDIR, and VOLUME variables will be automatically passed as arguments to docker run.
TAG, EXPOSE, and WORKDIR variables will be automatically passed as arguments to docker commit.

All other variables will be evaluated in the shell and passed as environment arguments to docker run, making them available within your build script.

For example, this script will cache and reuse Alpine Linux packages between builds (the VOLUME mounts a home directory to CACHE, which is then used as a symlink for the OS's package repository cache in the install script):

#!/usr/bin/env dockerize
FROM=alpine
TAG=${TAG:-wjordan/my-image}
WORKDIR=/var/cache/dockerize
CACHE=/var/cache/docker
EXPOSE=3001
VOLUME="${HOME}/.docker-cache:${CACHE} ${PWD}:${WORKDIR}:ro /tmp"
#!/bin/sh
ln -s ${CACHE}/apk /var/cache/apk
ln -s ${CACHE}/apk /etc/apk/cache
set -e
apk --update add gcc g++ make libc-dev python
[...etc etc build...]

@zrml
Author

zrml commented Aug 24, 2015

So, after meeting the French contingent :) from Docker at MesoCon last week (it was a pleasure, guys), I was made aware they have the same issue in-house and have developed a hack that copies what they need over to a new slim image.
I'd say that hacks are not welcome in the enterprise world ;) and this request should be properly handled.
Thank you for listening guys...

@raine

raine commented Sep 17, 2015

I'm also in favor of adding build-time -v flag to speed up builds by sharing a cache directory between them.

@zrml
Author

zrml commented Sep 17, 2015

@yngndrw I don't understand why you closed two related issues. I read your #59 issue and I don't see how it relates to this one. In some cases containers become super-bloated with content that's not needed at run time. Please read the first post.
I hope I'm not missing something here... as it has been a long day :-o

@yngndrw

yngndrw commented Sep 17, 2015

@zrml Issue aspnet/aspnet-docker#59 was related to the built-in per-layer caching that docker provides during a build to all docker files, but this current issue is subtly different as we are talking about using host volumes to provide dockerfile-specific caching which is dependent on the dockerfile making special use of the volume. I closed issue aspnet/aspnet-docker#59 as it is not specifically related to the aspnet-docker project / repository.

The other issue that I think you're referring to is issue dokku/dokku#1231, which was regarding the Dokku processes explicitly disabling the built-in docker layer caching. Michael made a change to Dokku in order to allow this behaviour to be configurable and this resolved the issue in regards to the Dokku project / repository, so that issue was also closed.

There is possibly still a Docker-related issue that is outstanding (I.e. Why was Docker not handling the built-in layer caching as I expected in issue aspnet/aspnet-docker#59), but I haven't had a chance to work out why that is and confirm if it's still happening. If it is still an issue, then a new issue for this project / repository should be raised for it as it is distinct from this current issue.

@zrml
Author

zrml commented Sep 18, 2015

@yngndrw exactly, so we agree this is different and known at docker.com, so I'm re-opening it if you don't mind... well, I cannot. Would you mind doing it, please?
I'd like to see some comments from our colleagues in SF at least before we close it.

BTW, I was asked by @cpuguy83 to open a use case and explain it all; see #3156.

@yngndrw

yngndrw commented Sep 18, 2015

@zrml I'm not sure I follow. Is it aspnet/aspnet-docker#59 that you want to re-open? It isn't an /aspnet/aspnet-docker issue, so I don't think it's right to re-open that issue. It should really be a new issue on /docker/docker, but it would need to be verified and would need reproducible steps generated first.

@zrml
Author

zrml commented Sep 18, 2015

no, no.. this one #14080 that you closed yesterday.

@yngndrw

yngndrw commented Sep 18, 2015

This issue is still open?

@zrml
Author

zrml commented Sep 21, 2015

@yngndrw I believe I misread the red "closed" icon. Apologies.

@lukaso

lukaso commented Sep 22, 2015

Heartily agree that build time -v would be a huge help.

Build caching is one use case.

Another use case is using ssh keys at build time for building from private repos without them being stored in the layer, eliminating the need for hacks (though well engineered) such as this one: https://github.com/dockito/vault
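For reference, the SSH agent forwarding that BuildKit later added covers the private-repo case without storing keys in any layer; a minimal sketch, where the repository URL is a placeholder:

# syntax=docker/dockerfile:1
FROM alpine:3.19
RUN apk add --no-cache git openssh
# Trust github.com's host key so the clone is non-interactive.
RUN mkdir -p -m 0700 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
# The host's ssh-agent socket is forwarded for this step only; no key lands in the image.
RUN --mount=type=ssh git clone git@github.com:example/private-repo.git /src

# docker build --ssh default -t myimage .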

@btrepp

btrepp commented Oct 8, 2015

I'm commenting here because this is hell in a corporate world.
We have an SSL-intercepting proxy; while I can direct traffic through it, heaps of projects assume they have good SSL connections, so they die horribly.

Even though my machine (and thus the docker builder) trusts the proxy, docker images don't.
Worse still, the best practice is now to use curl inside the container, so that is painful: I have to modify Dockerfiles just to make them build. I could mount the certificates with a -v option and be happy.

That being said, it's less the fault of Docker and more the fault of package managers using HTTPS when they should be using a system similar to how apt-get works, as that is still secure and verifiable, and also cacheable by an HTTP proxy.

@zrml
Author

zrml commented Oct 8, 2015

@btrepp thank you for another good use case.

@btrepp

btrepp commented Oct 9, 2015

I can think of another situation.

One of the things I would like to do with my Dockerfiles is not ship the build tools with the "compiled" Docker image. There's no reason a C app needs gcc in the image, nor a Ruby app bundler, but using docker build you currently get this.

An idea I've had is specifying a Dockerfile that runs multiple docker commands when building. Pseudo-ish Dockerfiles below.

Dockerfile that builds the others:

FROM dockerbuilder
RUN docker build -t docker/builder myapp/builder/Dockerfile
RUN docker run -v /app:/app builder
RUN docker build -t btrepp/myapplication myapp/Dockerfile

The btrepp/myapplication Dockerfile:

FROM debian:jessie+sayrubyruntime
# (this is code that's been built using the builder Dockerfile)
ADD . /app
ENTRYPOINT ["rails", "s"]

Here we have a temporary container that does all the bundling install/package management and any build scripts, but it produces the files that the runtime container needs.

The runtime container then just adds the results of this, meaning it shouldn't need much more than Ruby installed. In the case of, say, GCC, or even better statically linked Go, we may not need anything other than the core OS files to run.

That would keep the docker images super light.

The issue here is that the temporary builder container would go away at the end, meaning it would be super expensive without the ability to load a cache of sorts; we would be grabbing debian:jessie a whole heap of times.

I've seen people use certain techniques like this, but with external HTTP servers to add the build files. I would prefer to keep it all being built by Docker, though there is possibly a way of using a docker image to do this properly, using run and thus being able to mount volumes.
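The builder/runtime split sketched above is essentially what multi-stage builds later provided natively; a minimal Ruby-flavoured sketch, where the stage names and paths are only illustrative:

# Build stage: bundler and build tools live only here.
FROM ruby:3.2 AS builder
WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .

# Runtime stage: only the app and its installed gems are copied across.
FROM ruby:3.2-slim
WORKDIR /app
COPY --from=builder /usr/local/bundle /usr/local/bundle
COPY --from=builder /app /app
CMD ["rails", "server", "-b", "0.0.0.0"]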

@fatherlinux

Here is another example. Say I want to build a container for systemtap that has all of the debug symbols for the kernel in it (which are Yuuuuge). I have to mount the underlying /lib/modules so that the yum command knows which RPMs to install.

Furthermore, maybe I would rather have these live somewhere other than in the 1.5GB image (from the debug symbols)

I went to write a Dockerfile, then realized it was impossible :-(

docker run --privileged -v /lib/modules:/lib/modules --tty=true --interactive=true rhel7/rhel-tools /bin/bash
yum --enablerepo=rhel-7-server-debug-rpms install kernel-debuginfo-$(uname -r) kernel-devel-$(uname -r)
docker ps -a
CONTAINER ID        IMAGE                     COMMAND             CREATED             STATUS                        PORTS               NAMES
52dac30dc495        rhel7/rhel-tools:latest   "/bin/bash"         34 minutes ago      Exited (0) 15 minutes ago                         dreamy_thompson
docker commit dreamy_thompson stap:latest

https://access.redhat.com/solutions/1420883

@jeremyherbert

I'd like to repeat my use case here from #3949 as that bug has been closed for other reasons.

I'd really like to sandbox proprietary software in docker. It's illegal for me to host it anywhere, and the download process is not realistically (or legally) able to be automated. In total, the installers come to about 22GB (and they are getting bigger with each release). I think it's silly to expect that this should be copied into the docker image at build time.

@zrml
Author

zrml commented Nov 19, 2015

Any news on this needed feature?
thank you


@isanych

isanych commented Jun 13, 2020

@nigelgbanks at the top of your link:

Limitations
Only supported for building Linux containers

@nigelgbanks

Oh sorry, I just assumed you were building Linux containers on Windows.

@thisismydesign

@thisismydesign It seems like what you want is to be able to share a cache between build and run?

That would solve my use case around caching, yes.

@westurner

westurner commented Jun 13, 2020 via email

@unilynx

unilynx commented Jun 14, 2020

Do any CI services support experimental buildkit features?

Do they have to explicitly support it? I'm using gitlab-ci with buildkit and it just works. After all, it's just a different way of invoking 'docker build'.

Of course, unless you bring your own runners to gitlab, odds of getting a cache hit during build are low anyway.

@westurner

westurner commented Jun 14, 2020

Copying from a named stage of a multi-stage build is another solution

FROM golang:1.7.3 AS builder
# ... build steps in the builder stage ...
FROM alpine:latest
COPY --from=builder /go/bin/app /usr/local/bin/app
# (paths here are illustrative)

But then container image locality is still a mostly-unsolved issue for CI job scheduling

Runners would need to be more sticky and share (intermediate) images in a common filesystem in order to minimize unnecessary requests to (perennially underfunded) package repos.

@ppenguin

I just tried buildkit but it only marginally improves my workflow, which would be 100% helped by "real" volume or bind mounts to the host.

I am using docker build to cross-compile old glibc versions which should then be part of new build containers providing these glibcs to build under and link against.

Now the repeated glibc source download is solved by a bind mount (from BuildKit), and the archive can be read-only, no problem. But I have no way to access the build dir for analysis after failed builds, since the container bombs out on error. (If I restart it to access it, it restarts the build, so that doesn't help.)

Also, I fail to see why I should be jumping through hoops like building a new container from an old one just to get rid of my build dir, whereas if the build dir had been a mount in the first place it would have been so easy (just do make install after the build and I have a clean container without the build dir and without the downloaded sources).

So I still believe this is a very valid feature request that would make our lives a lot easier. Just because a feature could be abused, and could break other functionality if misused, does not mean implementing it should be avoided at all costs. Just consider it an extra use of a more powerful tool.

@cpuguy83
Member

But I have no way to access the build dir for analysis after failed builds

Sounds like a feature request for buildkit. This is definitely a known missing piece.

One could do this today by having a target for fetching the "build dir": you'd just run that target after a failed run; everything should still be cached, so you only need the last step to grab the data (a sketch follows at the end of this comment).
Understand this is a bit of a work-around, though.

Also, I fail to see why I should be jumping through hoops like building a new container from an old one just to get rid of my build dir

Can you explain more what you are wanting/expecting here?
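A rough sketch of that work-around, assuming a make-based build (the image, paths, and stage names are placeholders): let the build step succeed even when make fails so its output is cached, then export the tree with a throwaway target.

# syntax=docker/dockerfile:1
FROM gcc:12 AS build
WORKDIR /work
COPY . .
# "|| true" keeps this step cacheable even when the build fails, so the
# partially built tree and the log can still be pulled out afterwards.
RUN make > build.log 2>&1 || true

# On-demand target used only to extract the build tree for inspection.
FROM scratch AS builddir
COPY --from=build /work /

# docker build --target builddir --output type=local,dest=./build-dump .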

@ppenguin

Can you explain more what you are wanting/expecting here?

In this case it's just wanting to kill 2 birds with 1 stone:

  • have an easy way to access intermediate results from the host (here "build dir analysis")
  • be sure that this storage space is not polluting the newly built image

Since this, and all the other cases where the build container (as well as the "container build") needs to make building as painless as possible, would be solved so much more elegantly by just providing -v functionality, I have a hard time understanding the resistance to providing this feature. Apart from the "cache-aware" functionality BuildKit apparently offers, I can only see it as a convoluted and cumbersome way to achieve exactly this functionality, and only partially at that. (In many cases where caching is the main goal, it would also be solved by -v, at the cost of having to lock the mounted volume to a specific container as long as it runs, but the BuildKit cache has the same restriction AFAICT.)

@mcattle

mcattle commented Jun 27, 2020

Can you explain more what you are wanting/expecting here?

I'm using a multi-stage build process, where the build environment itself is containerized, and the end result is an image containing only the application and the runtime environment (without the build tools).

What I'd like is some way for the interim Docker build container to output unit test and code coverage results files to the host system in the event of both a successful and a failed build, without having to pass them into the build output image for extraction (because the whole build process is short-circuited if the unit tests don't pass in the earlier step, so there won't be an output image in that situation, and that's when we need the unit test results the most). I figure that if a host volume could be mounted into the Docker build process, the internal test commands could direct their output to the mounted folder.
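One hedged way to get that today with BuildKit is to run the tests in their own stage and export only its results directory to the host via --output, which works even when the final image is never produced; the SDK image, paths, and dotnet flags below are assumptions:

# syntax=docker/dockerfile:1
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY . .
RUN dotnet build -c Release

FROM build AS test
# "|| true" lets this stage complete so the results can be exported even on failure.
RUN dotnet test -c Release --logger trx --results-directory /testresults || true

FROM scratch AS testresults
COPY --from=test /testresults /

# docker build --target testresults --output type=local,dest=./testresults .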

@ppenguin

@mcattle
Indeed, very similar to (one of) the functionalities I need.
Since moving to buildah a few days ago I have every function I needed and more. Debugging my build container would have been utterly impossible without the ability to flexibly enter the exited container and link to the host. Now I'm a happy camper. (I'm sorry to crash the party with a "competitor"; I'd happily remove this comment if offence is taken, but it was such an effective solution for the use cases presented in this thread that I thought I should mention it.)

@cpuguy83
Member

cpuguy83 commented Jun 27, 2020 via email

@westurner

without having to resort to bind mounts from the client.

Here I explain why a build time -v option is not resorting to or sacrificing reproducibility any more than depending on network resources at build time.

#14080 (comment) :

COPY || REMOTE_FETCH || read()

  • Which of these are most reproducible?

I'm going with buildah for build time -v (and cgroupsv2) as well.

@xellsys

xellsys commented Jun 28, 2020

@mcattle I have had the same requirement. I solved it with labeling.

@eero-t

eero-t commented Aug 27, 2020

I'm going with buildah for build time -v (and cgroupsv2) as well.

I'm seriously considering switching from Ubuntu (which has just Docker) to Fedora (which has replaced Docker with podman/buildah) on our build server because of "-v" support.

Btw, Podman also supports rootless mode, and so far it has seemed fully Docker-compatible (except for differences in --user/USER impact and image caching that come from using rootless mode instead of running as root like the Docker daemon does).

PS. While cgroups v2 is needed for rootless operation, support for that is more about the container runtime than Docker. If you use crun instead of runc (like Fedora does), you get cgroups v2 support. runc does have some v2 & rootless support in Git, but I had some problems when testing it on Fedora 31 a few months ago.

EDIT: Ubuntu has podman/buildah/etc. in Groovy (imported from Debian unstable, I think), just not in the latest 20.04 LTS. It hasn't been backported to the LTS, at least not yet, whereas it's been in Fedora since 2018, I think.

@thaJeztah
Member

@eero-t perhaps you could describe your use case, and what's missing in the options that BuildKit currently provides to address it.

@Jean-Daniel

I have a simple use case. I want to install a local .deb package.

Currently I have to ADD/COPY it into the image, and then RUN apt install ./package.deb && rm package.deb

With the ability to mount a volume, it would be possible to install the package without having to create a layer that includes the .deb itself.

@cpuguy83
Member

@Jean-Daniel You can use

RUN --mount=source=package.deb,target=/tmp/package.deb dpkg -i /tmp/package.deb

In order to achieve this you need to be building with buildkit AND using at least v1.2 of the dockerfile spec. (1.1-experimental also has it).

You can specify which Dockerfile syntax version to use at the head of your Dockerfile:

# syntax=docker/dockerfile:1.2

Or, I believe, this is the default in Docker 20.10 (if you are building with BuildKit).
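Put together, a minimal sketch of that (the base image and package name are placeholders; apt-get is used here so dependencies get resolved):

# syntax=docker/dockerfile:1.2
FROM debian:bullseye
# package.deb is taken straight from the build context for this step;
# it is never copied into an image layer.
RUN --mount=source=package.deb,target=/tmp/package.deb \
    apt-get update && apt-get install -y /tmp/package.deb

# docker build -t myimage .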

@Jean-Daniel

Thanks a lot for the reminder.

I'm already using it for caching, but I missed the fact that it can be used to mount a local dir and, even better, a dir from another build stage (using the from parameter) :-)

@brunoais

brunoais commented Apr 18, 2021

RUN --mount=source=package.deb,target=/tmp/package.deb dpkg -i /tmp/package.deb

@cpuguy83 Where can I find this in the manual? I can only find the --mount (vs -v) page and a page about using the RUN --mount but only for secrets.

@glensc
Contributor

glensc commented Apr 18, 2021

@brunoais https://docs.docker.com/develop/develop-images/build_enhancements/

@brunoais

brunoais commented Apr 18, 2021

@glensc It doesn't mention sources and targets, if I read it right. Did I miss anything?
(Meanwhile I found the links again and added them to my post.)

@Bessonov

Wow, I feel quite old commenting 4 years after my previous comment 😺

RUN --mount=type=cache

is an amazing feature, and I use it daily to speed up local as well as CI builds. 👍

However, my use case involves security. I am in a project where we develop inside containers, and it works really well. Considering recent supply-chain vulnerability events, this provides great isolation and protects the system and files outside of the root of the project. You know, the usual stuff like confidential documents and kompromats, passphrase-less SSH keys, .thunderbird and OneDrive folders, plain-text passwords, bitcoin wallets, my nude photos, etc. The Dockerfile involves the installation of great tools like (shameless advertising of tools I love!) fnm, pnpm, and playwright. However, because there is no possibility to mount folders like ~/.local and ~/.cache, I am forced to implement workarounds like copying from/to a cache and packaging them into the image. Even worse, I can't just run pnpm install while building the image to populate the node_modules and node_modules/.pnpm folders. Therefore, I need an init script that performs these steps inside the container, where I can use read-write bind mounts.

Please give us the option to use our disks during the build in an appropriate way, not only as a cache 🙏

The readwrite option of --mount=type=bind is highly misleading and reserves the name of a really helpful option. Can you please deprecate it and rename it to something like readnobackwrite until we have a real readwrite option?
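As an aside, the cache-mount pattern praised above, applied to pnpm, looks roughly like this; the store path and image tag are assumptions, and this only covers the build speed-up, not the host-visible node_modules asked for above:

# syntax=docker/dockerfile:1
FROM node:20-slim
RUN npm install -g pnpm
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
# The pnpm store lives in a BuildKit cache between builds; it never enters the image.
RUN --mount=type=cache,target=/pnpm-store \
    pnpm install --frozen-lockfile --store-dir /pnpm-store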

@cpuguy83
Member

@Bessonov PTAL at build secrets: https://docs.docker.com/build/building/secrets/
There is built-in support for passing through an ssh agent, adding secrets, etc.

@Bessonov

@cpuguy83 Thanks! However, I am not sure I understand your comment. I would love to:
Host:

ls -la
package.json
src/

Dockerfile:

... snip ...
WORKDIR /home/dev/app
RUN \
	--mount=type=bind,source=./,target=/home/dev/app,uid=1000,gid=1000 \
<<EOF

pnpm install

EOF

Host:

docker compose build my-glamour-dev-container

... snip ...

ls -la
package.json
src/
node_modules/

How can SSH agent or secrets help me?

@cpuguy83
Member

I may have misread your comment re: ssh keys and secrets.

From your example, I don't think we'd have any means of doing exactly what you specified, however...

... snip ...
WORKDIR /home/dev/app
RUN \
	--mount=type=bind,source=./,target=/home/dev/app,uid=1000,gid=1000 \
<<EOF

pnpm install

EOF

FROM scratch
COPY --from=0 /home/dev/app/node_modules /

Couple that with --output=./node_modules

@cpuguy83
Member

Not sure if compose will let you set an output dir like that since it is container-focused, but docker build will.
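For reference, the export step for that flow might look roughly like this (stage layout taken from the sketch above):

# Exports the filesystem of the final (scratch) stage - i.e. the node_modules
# contents - to the host instead of producing an image:
docker build --output type=local,dest=./node_modules .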

@Bessonov

@cpuguy83 Thanks again, it's a very interesting approach and I see other use cases where I can use it! However, even if compose supported it, it's not the same as just docker compose up -d for preparing and running the dev environment.
