New feature request: Selectively disable caching for specific RUN commands in Dockerfile #1996

Closed
mohanraj-r opened this issue Sep 24, 2013 · 237 comments

@mohanraj-r

mohanraj-r commented Sep 24, 2013

Branching off the discussion from #1384:

I understand --no-cache will disable caching for the entire Dockerfile. But it would be useful if I could disable the cache for a specific RUN command, for example when updating repos or downloading a remote file. As I understand it, right now RUN apt-get update, if cached, wouldn't actually update the repo. Wouldn't this cause the results to differ from those of a VM?

If disabling caching for specific commands in the Dockerfile is made possible, would the subsequent commands in the file then not use the cache? Or would they do something a bit more intelligent, e.g. use the cache if the previous command produced the same result (filesystem layer) as in a previous run?

@tianon

Member

tianon commented Sep 24, 2013

I think the way to combat this is to take the point in the Dockerfile up to which you do want caching and tag that as an image to use in your future Dockerfile's FROM. That Dockerfile can then be built with --no-cache without consequence, since the base image would not be rebuilt.
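A minimal sketch of this two-image approach as a bash wrapper. The file names Dockerfile.base and Dockerfile and the image tags myapp-base and myapp are hypothetical; the only assumption about their contents is that Dockerfile starts with FROM myapp-base.

```shell
#!/usr/bin/env bash
# Sketch of the "tag a cached base image" workaround.
# Hypothetical layout:
#   Dockerfile.base  - slow, stable steps (package installs, etc.)
#   Dockerfile       - begins with "FROM myapp-base", holds volatile steps
# DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.

DOCKER="${DOCKER:-docker}"

# Rebuild the base image with the normal layer cache; only steps whose
# instruction or inputs changed are re-run.
build_base() {
  "$DOCKER" build -t myapp-base -f Dockerfile.base .
}

# Rebuild the app image with the cache disabled. --no-cache applies only to
# the steps in this Dockerfile; the existing local myapp-base image named in
# FROM is reused as-is.
build_app() {
  "$DOCKER" build --no-cache -t myapp -f Dockerfile .
}
```

Run build_base only when the stable steps change and build_app on every build. The limitation is that everything after the FROM line is rebuilt every time, so cached and uncached steps cannot be freely interleaved.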

@mohanraj-r

Author

mohanraj-r commented Oct 3, 2013

But wouldn't this make it hard to interleave cached and non-cached commands?

For example, let's say I want to update my repo and wget files from a server, and perform a bunch of steps in between: e.g. install software from the repo (which could have been updated), perform operations on the downloaded file (which could have changed on the server), etc.

What would be ideal is a way to tell Docker, in the Dockerfile, to run specific commands without the cache every time, and to reuse the previous image only if nothing has changed (e.g. no update in the repo).

Wouldn't this be useful to have?

@joelreymont

joelreymont commented Oct 18, 2013

What about CACHE ON and CACHE OFF in the Dockerfile? Each instruction would affect subsequent commands.

@konklone

konklone commented Oct 29, 2013

Yeah, I'm using git clone commands in my Dockerfile, and if I want it to re-clone with updates, I need to, like, add a comment at the end of the line to trigger a rebuild from that line. I shouldn't need to create a whole new base container for this step.

@githart

githart commented Nov 6, 2013

Can a container ID be passed to 'docker build' as a "do not cache past this ID" instruction? Similar to the way in which 'docker build' will cache all steps up to a changed line in a Dockerfile?

@shykes

Collaborator

shykes commented Jan 6, 2014

I agree we need more powerful and fine-grained control over the build cache. Currently I'm not sure exactly how to expose this to the user.

I think this will become easier with the upcoming API extensions, specifically naming and introspection.

@timruffles

Contributor

timruffles commented Feb 6, 2014

Would be a great feature. Currently I'm using silly things like RUN a=a some-command, then RUN a=b some-command to break the cache

@rogernolan

rogernolan commented Feb 7, 2014

Getting better control over the cache would make using docker from CI a lot happier.

@crosbymichael

Member

crosbymichael commented Feb 7, 2014

@shykes

What about changing --no-cache from a bool to a string and having it take a regex for where in the Dockerfile we want to bust the cache?

docker build --no-cache "apt-get install" .

@shykes

Collaborator

shykes commented Feb 7, 2014

I agree and suggested this exact feature on IRC.

Except I think that, to preserve backward compatibility, we should create a new flag (say "--uncache") so we can keep --no-cache as a (deprecated) bool flag that resolves to "--uncache .*"

@crosbymichael

Member

crosbymichael commented Feb 7, 2014

What does everyone else think about this? Anyone up for implementing the feature?

@timruffles

Contributor

timruffles commented Feb 8, 2014

I'm up for having a stab at implementing this today if nobody else has started?

@timruffles

Contributor

timruffles commented Feb 9, 2014

I've started work on it - wanted to validate the approach looks good.

  • The noCache field of buildfile becomes a *regexp.Regexp.
    • A nil value there means what utilizeCache = true used to.
  • Passing a string to docker build --no-cache now sends a validated regex string to the server.
  • Just calling --no-cache results in a default of .*
  • The regex is then used in a new method buildfile.utilizeCache(cmd []string) bool to decide which commands should ignore the cache

One thing: as far as I can see, the flag/mflag package doesn't support string flags without a value, so I'll need to do some extra fiddling to support both --no-cache and --no-cache some-regex
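To make the proposed semantics concrete, here is a hypothetical bash model (not Docker code; the flag was never merged in this form) that reads a Dockerfile and reports which instructions a --no-cache <regex> build would rebuild. It assumes one instruction per line (no backslash continuations) and encodes one property of the layer cache: each step is cached relative to its parent layer, so once one matching RUN or ADD is rebuilt, every later step is rebuilt as well.

```shell
#!/usr/bin/env bash
# Hypothetical model of the proposed "--no-cache <regex>" behaviour.
# Prints, for each instruction in a Dockerfile, whether it would be served
# from cache or rebuilt. Assumes one instruction per line.
# Each layer is cached relative to its parent, so once one step is rebuilt
# every later step is rebuilt too.
cache_plan() {
  local regex="$1" file="$2" busted=0 line
  local instr_re='^(RUN|ADD)[[:space:]]'
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines and comments; they are not instructions.
    [ -z "$line" ] && continue
    case "$line" in \#*) continue ;; esac
    # The first RUN/ADD that matches the user's regex starts the rebuild.
    if [ "$busted" -eq 0 ] && [[ $line =~ $instr_re ]] && [[ $line =~ $regex ]]; then
      busted=1
    fi
    if [ "$busted" -eq 1 ]; then
      echo "rebuild: $line"
    else
      echo "cached:  $line"
    fi
  done < "$file"
}
```

For example, cache_plan 'apt-get install' Dockerfile marks the apt-get install step and everything after it as rebuilt. With '.*' as the regex, every step from the first RUN or ADD onward is rebuilt, which corresponds to the bare --no-cache default in the proposal.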

@tianon

Member

tianon commented Feb 25, 2014

I really think this ought to be a separate new flag. The behavior and syntax of --no-cache is already well defined and used in many, many places by many different people. I'd vote for --break-cache or something similar, and have --no-cache do exactly what it does today (since that's very useful behavior that many people rely on and still want).

Anyways, IANTM (I am not the maintainer) so these are just my personal thoughts. :)

@timruffles

Contributor

timruffles commented Feb 25, 2014

@tianon --no-cache is currently bool, so this simply extends the existing behaviour.

  • docker build --no-cache - same behaviour as before: ignores cache
  • docker build --no-cache someRegex - ignores any RUN or ADD commands that match someRegex
@tianon

Member

tianon commented Feb 25, 2014

Right, that's all fine. The problem is that --no-cache is a bool, so the existing behavior is actually:

  • --no-cache=true - explicitly disable cache
  • --no-cache=false - explicitly enable cache
  • --no-cache - shorthand for --no-cache=true

I also think we'd be doing ourselves a disservice by making "true" and "false" special case regex strings to solve this, since that will create potentially surprising behavior for our users in the future. ("When I use --no-cache with a regex of either 'true' or 'false', it doesn't work like it's supposed to!")

@timruffles

Contributor

timruffles commented Mar 1, 2014

@tianon yes you're right. Had a quick look and people are using =true/false.

Happy to modify the PR to add new flag as you suggest, what do the maintainers think (@crosbymichael, @shykes)? This would also mean I could remove the code added to mflag to allow string/bool flags.

@crazyscience

crazyscience commented Mar 13, 2014

+1 for @wagerlabs approach

@marcuslinke

Contributor

marcuslinke commented Apr 11, 2014

@crosbymichael, @timruffles Wouldn't it be better if the author of the Dockerfile decided which build steps should be cached and which should not? The person who creates the Dockerfile is not necessarily the same person who builds the image. Moving the decision to the docker build command demands detailed knowledge from a person who just wants to use a specific Dockerfile.

Consider a corporate environment where someone just wants to rebuild an existing image hierarchy to update some dependencies. The existing Dockerfile tree may have been created years ago by someone else.

@hunterloftis

hunterloftis commented Apr 13, 2014

+1 for @wagerlabs approach

@cressie176

Contributor

cressie176 commented Apr 14, 2014

+1 for @wagerlabs approach, although it would be even nicer if there were also a way to bust the cache on a time interval, e.g.

CACHE [interval | OFF]
RUN apt-get update
CACHE ON

I appreciate this might fly in the face of the idea of containers being deterministic, but it's exactly the sort of thing you want to do in a continuous deployment scenario where your pipeline has good automated testing.

As a workaround, I'm currently generating cache busters in the script I use to run docker build and adding them to the Dockerfile to force a cache bust:

FROM ubuntu:13.10
ADD ./files/cachebusters/per-day /root/cachebuster
...
ADD ./files/cachebusters/per-build /root/cachebuster
RUN git clone git@github.com:cressie176/my-project.git /root/my-project
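A minimal sketch of the build-script side of this workaround in bash. The directory layout mirrors the Dockerfile above; the function name and the use of random bytes for the per-build file are illustrative choices. It relies on ADD and COPY deciding cache hits from a checksum of file contents (modification times are ignored), so rewriting per-day with the same date keeps the cache, while per-build gets new content every time.

```shell
#!/usr/bin/env bash
# Hypothetical helper that writes the cache-buster files ADDed by the
# Dockerfile above. ADD's cache check uses file contents, not timestamps:
#   per-day   - same content all day, busts the cache once per UTC day
#   per-build - random content, busts the cache on every build
write_cachebusters() {
  local dir="${1:-./files/cachebusters}"
  mkdir -p "$dir"
  date -u +%Y-%m-%d > "$dir/per-day"
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$dir/per-build"
}
```

Call write_cachebusters immediately before docker build in the same wrapper script, e.g. write_cachebusters && docker build .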
@tfoote

tfoote commented Apr 19, 2014

I'm looking to use containers for continuous integration and the ability to set timeouts on specific elements in the cache would be really valuable. Without this I cannot deploy. Forcing a full rebuild every time is much too slow.

My current plan to work around this is to dynamically inject commands such as RUN echo 2014-04-17-00:15:00, with the timestamp rounded down to the last 15 minutes, to invalidate cache elements whenever the rounded value jumps, i.e. every 15 minutes. This works for me because I have a script generating the Dockerfile every time, but it won't work without that script.
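The rounding step can be sketched in bash as follows; cache_bucket is a hypothetical helper name. It prints an epoch time rounded down to a multiple of the interval, so a generated RUN echo <value> line only changes, and therefore only busts the cache, when a new interval begins.

```shell
#!/usr/bin/env bash
# Hypothetical helper: round an epoch time down to the start of its interval.
# A generated "RUN echo <bucket>" line then changes at most once per interval.
# Usage: cache_bucket <interval-seconds> [epoch-seconds, default: now]
cache_bucket() {
  local interval="$1" now="${2:-$(date -u +%s)}"
  echo $(( now / interval * interval ))
}
```

For 15-minute buckets, the generator script would emit something like echo "RUN echo $(cache_bucket 900)" into the Dockerfile.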

@amarnus

amarnus commented May 2, 2014

+1 for the feature.

@hiroprotagonist

hiroprotagonist commented May 7, 2014

I also want to vote for this feature. The cache is annoying when building parts of a container from git repositories that only get updated on the master branch.
👍

@amarnus

amarnus commented May 7, 2014

@hiroprotagonist Having a git pull in your ENTRYPOINT might help?

@hiroprotagonist

hiroprotagonist commented May 8, 2014

@amarnus I've solved it similarly to the idea @tfoote had. I am running the build from a Jenkins job, and instead of running the docker build command directly, the job starts a build script which generates the Dockerfile from a template and adds the line 'RUN echo currentMillis' above the git commands. Thanks to sed and pipes this was a matter of minutes. Anyway, I still favor this feature as part of the Dockerfile itself.
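The same template-plus-injection idea, sketched in bash with awk instead of sed; the function name, the template file and the pattern are hypothetical. It copies the template to standard output and inserts RUN echo <token> immediately above the first line matching the pattern, so the steps above it stay cached while the injected line and everything after it re-run whenever the token changes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Dockerfile template, inserting
# "RUN echo <token>" directly above the first line matching an awk regex
# (e.g. the git clone step).
# Usage: inject_cachebust <template> <awk-regex> <token>
inject_cachebust() {
  local template="$1" pattern="$2" token="$3"
  awk -v pat="$pattern" -v tok="$token" '
    !inserted && $0 ~ pat { print "RUN echo " tok; inserted = 1 }
    { print }
  ' "$template"
}
```

In a Jenkins job this might look like inject_cachebust Dockerfile.template 'git clone' "$(date +%s)" > Dockerfile && docker build .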

@mattp-

mattp- commented Aug 4, 2018

Why was this closed with no comment?

@thaJeztah

Member

thaJeztah commented Aug 4, 2018

There was a comment, but it's hidden by GitHub:
#1996 (comment)

@krinsman

krinsman commented Aug 7, 2018

+1

This feature would be a life-saver for me right now.

@csymeonides-mf

csymeonides-mf commented Aug 8, 2018

+1

@Simran-B

Simran-B commented Aug 8, 2018

Closing this as we don't see that many real world use cases

212 comments and counting, but still no use case? Seems pretty ignorant.

@yifeikong

yifeikong commented Aug 12, 2018

+1

@davidCarlos

davidCarlos commented Aug 13, 2018

+1

@privetgit

privetgit commented Aug 19, 2018

+1

@mdasari823

mdasari823 commented Aug 21, 2018

+1

@make-ing

make-ing commented Aug 23, 2018

+1

@chiffa

chiffa commented Aug 23, 2018

The problem is still here and still requires a solution. There are plenty of real-world uses still present.

@jaromil

jaromil commented Sep 1, 2018

+1

I suspect the Docker developers have no incentive to implement this, to protect their centralised building infrastructure from being DDoS'ed by no-cache requests.

I also suspect that a parallel infrastructure that facilitates no-cache builds would be interesting for enterprise users.

Overall this issue is not about a software feature, but a service scaling issue.

@bluzi

bluzi commented Sep 2, 2018

@jaromil That's not entirely true, as this is not possible on self-hosted repositories as well.

@jaromil

jaromil commented Sep 3, 2018

What software is there to run a self-hosted repository? I don't really know what you refer to.
A simple self-hosted solution could be a cron job cloning git repos and running docker build --no-cache. I'm sure this problem cannot occur with open source software: anyone is then able to modify the docker build command line.

@vpedrosa

vpedrosa commented Sep 3, 2018

@jaromil I don't think that's the problem. It would be more efficient to have it for Docker Hub's open source projects (as well as paid ones, since they don't charge for the number of builds). In a CI/CD environment with frequent builds, this gets even worse.

As long as you need to do that (you are using docker and git and don't want to have 5 containers running shared volumes), you must rebuild the container and upload it every time you release a new version. The entire container.
With an in-code no-cache flag, every time you run the build you just build and replace that single layer instead of whole container for updating the version.

About the self-hosted repo, you'd be surprised. I understand @bluzi's comment: there is no DDoS impact if you self-host (or use AWS ECR).

@jaromil

jaromil commented Sep 3, 2018

OK, this is certainly a more complex scenario than I was envisioning. Now I think... uploading with a sort of no-cache single-layer hashes... push and override, you name it. I am not sure.

@PaulSD

PaulSD commented Nov 28, 2018

TLDR: I think some improvements to the Docker documentation might help a lot.

I ended up here after encountering my own problems/confusion with caching. After reading all of the comments here and in #10682, I found a workable solution for my particular use case. Yet somehow I still felt frustrated with Docker's response to this, and it appears that many others feel the same way.

Why? After thinking about this from several different angles, I think the problem here is a combination of vague use cases, overly generalized arguments against the proposed changes (which may be valid but don't directly address the presented use cases), and a lack of documentation for Docker's recommendations for some common use cases. Perhaps I can help clarify things and identify documentation that could be improved to help with this situation.

Reading between the lines, it sounds to me like most of the early commenters on this feature request would be happy with a solution that uses additional arguments to docker image build to disable the cache at a specific point in the Dockerfile. It sounds like Docker's current solution for this (described in #1996 (comment)) should be sufficient in most of these cases, and it sounds like many users are happy with this. (If anyone has a use case where they can provide additional arguments to docker image build but this solution is still inadequate, it would probably help to add a comment explaining why this is inadequate.)

All of the lingering frustration appears to be related to the requirement to pass additional arguments to docker image build to control the caching behavior. However, the use cases related to this have not been described very well.

Reading between the lines again, it appears to me that all of these use cases are either related to services that run docker image build on a user's behalf, or related to Dockerfiles that are distributed to other users who then run docker image build themselves. (If anyone has any other use cases where passing additional arguments to docker image build is a problem, it would probably help to add a comment explaining your use case in detail.)

In many of these cases, it sounds like the use case does not actually require the ability to disable caching at a specific point in the Dockerfile (which was the original point of this feature request). Instead, it sounds like many users would be happy with the ability to disable caching entirely from within the Dockerfile, without using the "--no-cache" argument to docker image build and without requiring manual modifications to the Dockerfile before each build. (When describing use cases, it would probably help to mention whether partial caching is actually required or whether disabling the cache entirely would be sufficient for your use case.)

In cases where a service runs docker image build on a user's behalf, it sounds like Docker is expecting all such services to either unconditionally disable the cache or give the user an option to disable the cache. According to #10682 (comment), Docker Hub unconditionally disables the cache. If a service does not already do this, Docker has suggested (#10682 (comment)) complaining to the service provider about it.

This seems to me to be a reasonable position for Docker to take regarding services that run docker image build. However, this position really needs to be officially documented in a conspicuous place so that both service providers and users know what to expect. It does not appear that this position or the Docker Hub caching behavior are currently documented anywhere other than those off-the-cuff comments buried deep inside that huge/ancient/closed pull request, so it is no surprise that both service providers and users routinely get this wrong. Perhaps adding information to the docker build reference describing Docker's opinion on the use of caching by build services, and adding information to the Docker Hub automated build documentation about the Docker Hub caching behavior might eliminate this problem?

For cases where Dockerfiles are distributed to other users who then run docker image build themselves, some people have argued that the use of the simple docker build . command (with no additional arguments) is so common that it would be unreasonable for Dockerfile builders to require users to add arguments, while other people (for example: #1996 (comment) #10682 (comment) #10682 (comment)) have argued that it would be inappropriate to unconditionally prevent users from using caching by hard-coding cache overrides into the Dockerfile. In the absence of detailed/compelling use cases for this, Docker has made the executive decision to require additional command line arguments to control caching, which seems to be the source of much of the lingering frustration. (If anyone has a compelling use case related to this, it would probably help to add a comment explaining it in detail.)

However, it seems to me that Docker may be able to make everyone happy simply by breaking users' habit of running docker build . without additional arguments. The caching behavior and "--no-cache" argument are not mentioned in any of the relevant Docker tutorials (such as this or this or this). In addition, while the docker build documentation does list the "--no-cache" argument, it doesn't explain its significance or highlight the fact that it is important in many common use cases. (Also note that the docker image build documentation is empty. It should at least reference the docker build documentation.) It appears that only the Dockerfile reference and best practices documentation actually describe the caching behavior and mention the role of the "--no-cache" argument. However, these documents are likely to be read only by advanced Dockerfile writers. So, it is no surprise that only advanced users are familiar with the "--no-cache" argument, and that most users would only ever run docker build . without additional arguments and then be confused when it doesn't behave how they or the Dockerfile writer expect/want. Perhaps updating the tutorials and docker build documentation to mention the "--no-cache" argument and its significance might eliminate this problem?

@fabiomolinar

fabiomolinar commented Dec 11, 2018

+1

@DarrienG

DarrienG commented Jan 2, 2019

+1

Docker's official tool bashbrew doesn't let you add arguments when building images, so the "officially supported" answer does not work.

@jazib

jazib commented Jan 9, 2019

+1

@wreed4

wreed4 commented Jan 28, 2019

+1

@aengelas

aengelas commented Feb 19, 2019

The use case I'm hitting right now is wanting to pass transient, short-lived secrets in as build args for installing private packages. That completely breaks caching because it means that every time the secret changes (basically every build), the cache gets busted and the packages get reinstalled all over again, even though the only change is the secret.

I've tried bypassing this by consuming the ARG in a script that gets COPY'd in prior to specifying the ARG, but Docker appears to invalidate everything after the ARG is declared if the ARG input has changed.

The behavior I'd like to see is to be able to flag an ARG as always caching, either in the Dockerfile or on the CLI when calling build. For use cases like secrets, that's often what you want; the contents of the package list should dictate when the cache is invalidated, not the argument passed to ARG.

I understand the theory that these sections could be pulled out into a second image that's then used as a base image, but that's rather awkward when the packages are used by a project, like in a package.json, requirements.txt, Gemfile, etc. That base image would just be continually rebuilt as well.

@HariSekhon

HariSekhon commented Feb 19, 2019

+1 to CACHE OFF from this line directive - I've been waiting for this for literally years.

I have had to disable cache on docker hub / docker cloud and this would save tonnes of time and builds if I could cache the big layer and then just run a nocache update command near the end of the dockerfile.

@thaJeztah

Member

thaJeztah commented Feb 19, 2019

The behavior I'd like to see is to be able to flag an ARG as always caching, either in the Dockerfile or on the CLI when calling build. For use cases like secrets, that's often what you want; the contents of the package list should dictate when the cache is invalidated, not the argument passed to ARG.

--build-arg PASSWORD=<wrong> could produce a different result than --build-arg PASSWORD=<correct>, so I'm not sure if just looking at the contents of the package list would work for that. The builder cannot anticipate by itself what effect setting/changing an environment variable would have on the steps that are run (are make DEBUG=1 foo and make DEBUG=0 foo the same?). The only exception currently made is for xx_PROXY environment variables, where the assumption is made that a proxy may be needed for network-connections, but switching to a different proxy should produce the same result. So in order for that to work, some way to indicate a specific environment variable (/ build arg) to be ignored for caching would be needed.

Note that BuildKit now has experimental support for RUN --mount=type=secret and RUN --mount=type=ssh, which may be helpful for passing secrets/credentials, but may still invalidate the cache if those secrets change (not sure; this might be something to bring up in the BuildKit issue tracker https://github.com/moby/buildkit/issues).
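A minimal sketch of the secret-mount approach in bash, assuming a BuildKit-enabled Docker recent enough to use the docker/dockerfile:1 frontend (at the time of this comment the feature still needed the experimental Dockerfile syntax). The secret id pkg_token and the files token.txt, packages.txt and install-private-packages.sh are hypothetical. The token is exposed only while the one RUN step executes, at /run/secrets/pkg_token, rather than being passed as a build arg. As far as I know the secret's contents are not part of the cache key, so rotating the token alone should not re-run the step, but that is worth verifying against your BuildKit version.

```shell
#!/usr/bin/env bash
# Sketch, assuming BuildKit and the docker/dockerfile:1 frontend.
# Hypothetical names: secret id "pkg_token", token.txt, packages.txt,
# install-private-packages.sh (must be executable in the build context).
# DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.

# Write a Dockerfile whose install step reads the token from a secret mount
# instead of from a --build-arg.
write_dockerfile() {
  cat > "${1:-Dockerfile}" <<'EOF'
# syntax=docker/dockerfile:1
FROM debian:stable-slim
COPY install-private-packages.sh packages.txt /src/
RUN --mount=type=secret,id=pkg_token \
    /src/install-private-packages.sh /run/secrets/pkg_token /src/packages.txt
EOF
}

# Build with BuildKit, supplying the secret from a local file.
build_with_secret() {
  DOCKER_BUILDKIT=1 "${DOCKER:-docker}" build --secret id=pkg_token,src=token.txt .
}
```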

I have had to disable cache on docker hub / docker cloud

Does Docker Hub / Cloud actually use caching? I think no caching is used there (as in, it's using ephemeral build environments).

@HariSekhon

HariSekhon commented Feb 19, 2019

I remember Docker Hub used to not use build caching, but I had been looking at my automated builds on Docker Cloud just before this ticket, and there is now a Build Caching slider next to each branch's Autobuild slider, although it is off by default.

I dare not enable build caching, because steps like git clone will not fetch the latest repo contents: the cache only compares the directive string, which does not change. When I explained this issue, which has been a thorn in our side for years, to a colleague today, he was surprised, as it seems like a large imperfection for many use cases.

I would much prefer the initial git clone && make build be cached and then just do a NO CACHE on a git pull && make build step to get only a much smaller code update + dependencies not already installed as the last layer, thereby efficiently caching the bulk of the image, not just for builds, but more importantly for all clients who right now must re-download and replace hundreds of MB of layers each time which is extremely inefficient.

The size is because many of the projects have a large number of dependencies, eg. system packages + Perl CPAN modules + Python PyPI modules etc.

Even Alpine isn't much smaller once you add the system package dependencies and the CPAN and PyPI dependencies. I have been using Alpine for years to see whether I could create smaller images, but once you have lots of dependencies it doesn't make much difference that the base starts smaller, since adding system packages adds most of it right back.

Caching the earlier layers which include all the system packages + CPAN + PyPI modules would mean very little should end up changing in the last layer of updates as I won't update working installed modules in most cases (I used scripts from my bash-tools utility submodule repo to only install packages that aren't already installed to avoid installing needless non-bugfix updates)

@HariSekhon

HariSekhon commented Feb 19, 2019

I was looking at using a trick like changing ARG for a while (an idea I got from searching through blogs like http://dev.im-bot.com/docker-select-caching/):

In Dockerfile:

ARG NOCACHE=0

Then run docker build like so:

docker build --build-arg NOCACHE=$(date +%s) ...

but I don't think this is possible in Docker Cloud.

There are environment variables but it seems not possible to use dynamic contents such as epoch above (or at least not documented that I could find), and with environment variables I'm not sure it would invalidate caching for that directive line onwards.
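For a plain docker build (outside Docker Cloud), the trick can be wrapped in a small bash function; the function name and the Dockerfile excerpt in the comments are hypothetical. Per the Dockerfile reference, a changed ARG value causes a cache miss at its first use rather than at its declaration, and RUN instructions use declared ARGs implicitly, so the miss lands on the first RUN after ARG NOCACHE while everything above it stays cached.

```shell
#!/usr/bin/env bash
# Sketch of the ARG cache-bust as a reusable wrapper. Assumes the Dockerfile
# declares the ARG just before the first step that should re-run, e.g.
# (hypothetical excerpt):
#   RUN apt-get update && apt-get install -y git          # stays cached
#   ARG NOCACHE=0
#   RUN git clone https://example.com/project.git /src    # re-runs every build
# Extra arguments (e.g. -t name) are passed through to docker build.
# DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
build_nocache_from_arg() {
  "${DOCKER:-docker}" build --build-arg NOCACHE="$(date +%s)" "$@" .
}
```

Usage: build_nocache_from_arg -t myimage. Two builds started within the same second would share the value and hit the cache.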

@aengelas

aengelas commented Feb 19, 2019

@thaJeztah Yes, this sort of behavior could easily have negative consequences if misunderstood or abused, but it would very nicely solve certain use cases.

--build-arg PASSWORD=<wrong> could produce a different result than --build-arg PASSWORD=<correct>, so I'm not sure if just looking at the contents of the package list would work for that

Although you're correct that it would produce different results, if the package list hasn't changed, I don't really care if the password is right or wrong; the packages are already in the prior image, so the user running this already has access (i.e., it's not a security concern), and if the password was wrong previously, I would expect the burden to be on the Dockerfile author to fail the installation if it's required, which would mean that you'd still get a chance to correctly install packages after fixing the password.

Yes, I was picturing something like docker build --force-cache-build-arg SECRET=supersecret. That's pretty clunky, I'm sure someone could come up with something better.

@HariSekhon It sounds like your use-case is actually the opposite of mine, though, right? You want to selectively force miss the cache, rather than selectively force hit the cache?

@itdependsnetworks

itdependsnetworks commented Feb 20, 2019

Adding this worked for me:

ADD http://date.jsontest.com/ /tmp/bustcache

but that site is down right now. This should work

ADD http://api.geonames.org/timezoneJSON?formatted=true&lat=47.01&lng=10.2&username=demo&style=full /tmp/bustcache
@HariSekhon

HariSekhon commented Feb 21, 2019

@itdependsnetworks

Perfect, that's a good workaround and the site is back up now. It's also useful to record the build date of the image.

I had tried this and other similar special files that should change each time:

COPY /dev/random ...

but that didn't work: even though RUN ls -l -R /etc showed such files were present, they were never found. I suspect there is some protection against using special files.

Now that I think more about it, on Docker Hub / Docker Cloud you could probably also use a pre-build hook to generate a file containing a datestamp and then COPY that into the image just before the layer you want to cache-bust, achieving a similar result, although I think the ADD shown above is more portable across local docker and cloud builds.
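A sketch of that pre-build hook in bash, assuming Docker Hub's automated-build convention of an executable hooks/pre_build file next to the Dockerfile. The output file name build-date.txt, and the assumption that the hook runs with the build context as its working directory, are illustrative. Paired with a COPY build-date.txt /build-date.txt placed just before the layer to refresh, the changing timestamp forces a cache miss at that point and also records the build date in the image.

```shell
#!/usr/bin/env bash
# Sketch of hooks/pre_build: write a UTC timestamp into the build context.
# File name and working-directory assumption are hypothetical.
write_build_date() {
  local dir="${1:-.}"
  date -u +%Y-%m-%dT%H:%M:%SZ > "$dir/build-date.txt"
}

# Run only when executed as the hook, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  write_build_date "$@"
fi
```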
