New feature request: Selectively disable caching for specific RUN commands in Dockerfile #1996

Closed
mohanraj-r opened this Issue Sep 24, 2013 · 228 comments

@mohanraj-r

mohanraj-r commented Sep 24, 2013

Branching off the discussion from #1384:

I understand -no-cache will disable caching for the entire Dockerfile. But wouldn't it be useful if I could disable the cache for a specific RUN command? For example, updating repos or downloading a remote file. From my understanding, right now a cached RUN apt-get update wouldn't actually update the repo? This would cause the results to differ from those on a VM?

If disabling the cache for specific commands in the Dockerfile were made possible, would the subsequent commands in the file then not use the cache? Or would they do something a bit more intelligent, e.g. use the cache if the previous command produced the same results (fs layer) as in a previous run?

@tianon

Member

tianon commented Sep 24, 2013

I think the way to combat this is to tag the point in the Dockerfile up to which you do want caching as an image, and use that image in your future Dockerfile's FROM. That Dockerfile can then be built with -no-cache without consequence, since the base image would not be rebuilt.

@mohanraj-r

mohanraj-r commented Oct 3, 2013

But wouldn't this make it hard to interleave cached and non-cached commands?

For example, let's say I want to update my repo and wget files from a server, and perform a bunch of steps in between, e.g. install software from the repo (which could have been updated) and perform operations on the downloaded file (which could have changed on the server).

Ideally, there would be a way to tell Docker in the Dockerfile to run specific commands without the cache every time, and only reuse the previous image if there is no change (e.g. no update in the repo).

Wouldn't this be useful to have?

@joelreymont

joelreymont commented Oct 18, 2013

What about CACHE ON and CACHE OFF in the Dockerfile? Each instruction would affect subsequent commands.

@konklone

konklone commented Oct 29, 2013

Yeah, I'm using git clone commands in my Dockerfile, and if I want it to re-clone with updates, I need to, like, add a comment at the end of the line to trigger a rebuild from that line. I shouldn't need to create a whole new base container for this step.

@githart

githart commented Nov 6, 2013

Can a container ID be passed to 'docker build' as a "do not cache past this ID" instruction? Similar to the way in which 'docker build' will cache all steps up to a changed line in a Dockerfile?

@shykes

Collaborator

shykes commented Jan 6, 2014

I agree we need more powerful and fine-grained control over the build cache. Currently I'm not sure exactly how to expose this to the user.

I think this will become easier with the upcoming API extensions, specifically naming and introspection.

@timruffles

Contributor

timruffles commented Feb 6, 2014

Would be a great feature. Currently I'm using silly things like RUN a=a some-command, then RUN a=b some-command to break the cache

@rogernolan

rogernolan commented Feb 7, 2014

Getting better control over the cache would make using docker from CI a lot happier.

@crosbymichael

Member

crosbymichael commented Feb 7, 2014

@shykes

What about changing --no-cache from a bool to a string and having it take a regex for where in the Dockerfile we want to bust the cache?

docker build --no-cache "apt-get install" .

@shykes

Collaborator

shykes commented Feb 7, 2014

I agree and suggested this exact feature on IRC.

Except I think that to preserve backward compatibility we should create a new flag (say "--uncache") so we can keep --no-cache as a (deprecated) bool flag that resolves to "--uncache .*".

@crosbymichael

Member

crosbymichael commented Feb 7, 2014

What does everyone else think about this? Anyone up for implementing the feature?

@timruffles

Contributor

timruffles commented Feb 8, 2014

I'm up for having a stab at implementing this today if nobody else has started?

@timruffles

Contributor

timruffles commented Feb 9, 2014

I've started work on it and wanted to validate that the approach looks good.

  • The noCache field of buildfile becomes a *regexp.Regexp.
    • A nil value there means what utilizeCache = true used to.
  • Passing a string to docker build --no-cache now sends a validated regex string to the server.
  • Just calling --no-cache results in a default of .*
  • The regex is then used in a new method, buildfile.utilizeCache(cmd []string) bool, to check which commands should ignore the cache.

One thing: as far as I can see, the flag/mflag package doesn't support string flags without a value, so I'll need to do some extra fiddling to support both --no-cache and --no-cache some-regex.
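
A minimal sketch of the matching rule described in this approach, written as a POSIX shell function rather than Go because it only models the behaviour, not Docker's code. It assumes an empty pattern means "use the cache" (the nil regex case) and that a bare --no-cache is passed as the pattern .*; the function names bypass_cache and plan_build are hypothetical. It also reflects that once one step is rebuilt, every later step must be rebuilt too, since its parent layer changed.

```shell
#!/bin/sh
# Hypothetical model of the proposed `docker build --no-cache <regex>` rule.
# An empty pattern means "use the cache" (the old utilizeCache = true);
# a bare --no-cache would be passed as the pattern '.*'.

# bypass_cache INSTRUCTION PATTERN: succeeds if the step must be rebuilt.
bypass_cache() {
    [ -n "$2" ] && printf '%s\n' "$1" | grep -Eq -- "$2"
}

# plan_build PATTERN < Dockerfile: prints CACHE or BUILD for each RUN/ADD line.
# Once one step is rebuilt, all later steps are rebuilt as well,
# because their parent layer has changed.
plan_build() {
    busted=0
    while IFS= read -r line; do
        case $line in
            RUN\ * | ADD\ *) ;;
            *) continue ;;
        esac
        if [ "$busted" -eq 1 ] || bypass_cache "$line" "$1"; then
            busted=1
            printf 'BUILD %s\n' "$line"
        else
            printf 'CACHE %s\n' "$line"
        fi
    done
}
```

For example, piping FROM ubuntu / RUN apt-get update / RUN apt-get install -y git / ADD . /src into plan_build 'apt-get install' marks the update step CACHE and both later steps BUILD.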

@tianon

Member

tianon commented Feb 25, 2014

I really think this ought to be a separate new flag. The behavior and syntax of --no-cache is already well defined and used in many, many places by many different people. I'd vote for --break-cache or something similar, and have --no-cache do exactly what it does today (since that's very useful behavior that many people rely on and still want).

Anyways, IANTM (I am not the maintainer) so these are just my personal thoughts. :)

@timruffles

Contributor

timruffles commented Feb 25, 2014

@tianon --no-cache is currently bool, so this simply extends the existing behaviour.

  • docker build --no-cache: same behaviour as before, ignores the cache
  • docker build --no-cache someRegex: ignores the cache for any RUN or ADD commands that match someRegex

@tianon

Member

tianon commented Feb 25, 2014

Right, that's all fine. The problem is that --no-cache is a bool, so the existing behavior is actually:

  • --no-cache=true - explicitly disable cache
  • --no-cache=false - explicitly enable cache
  • --no-cache - shorthand for --no-cache=true

I also think we'd be doing ourselves a disservice by making "true" and "false" special case regex strings to solve this, since that will create potentially surprising behavior for our users in the future. ("When I use --no-cache with a regex of either 'true' or 'false', it doesn't work like it's supposed to!")
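
A minimal sketch of the separate-flag idea, assuming a hypothetical --break-cache=REGEX option (the name is only the suggestion made here, not a real docker build flag) that is parsed alongside the existing boolean. Because the two flags never share a value space, a regex of "true" or "false" stays a regex.

```shell
#!/bin/sh
# Hypothetical argument handling: --no-cache keeps its boolean meaning,
# while the regex lives in a separate (hypothetical) --break-cache flag.
parse_build_flags() {
    no_cache=false
    break_cache=
    for arg in "$@"; do
        case $arg in
            --no-cache | --no-cache=true) no_cache=true ;;
            --no-cache=false) no_cache=false ;;
            --break-cache=*) break_cache=${arg#--break-cache=} ;;
        esac
    done
    printf 'no_cache=%s break_cache=%s\n' "$no_cache" "$break_cache"
}
```

With this split, parse_build_flags --break-cache=true prints no_cache=false break_cache=true: the word "true" is treated as a pattern, not as a boolean.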

@timruffles

Contributor

timruffles commented Mar 1, 2014

@tianon yes you're right. Had a quick look and people are using =true/false.

Happy to modify the PR to add new flag as you suggest, what do the maintainers think (@crosbymichael, @shykes)? This would also mean I could remove the code added to mflag to allow string/bool flags.

@crazyscience

crazyscience commented Mar 13, 2014

+1 for @wagerlabs' approach

@marcuslinke

Contributor

marcuslinke commented Apr 11, 2014

@crosbymichael, @timruffles Wouldn't it be better if the author of the Dockerfile decided which build steps should be cached and which should not? The person who creates the Dockerfile is not necessarily the same person who builds the image. Moving the decision to the docker build command demands detailed knowledge from a person who just wants to use a specific Dockerfile.

Consider a corporate environment where someone just wants to rebuild an existing image hierarchy to update some dependencies. The existing Dockerfile tree may have been created years ago by someone else.

@hunterloftis

hunterloftis commented Apr 13, 2014

+1 for @wagerlabs' approach

@cressie176

Contributor

cressie176 commented Apr 14, 2014

+1 for @wagerlabs' approach, although it would be even nicer if there were a way to bust the cache on a time interval too, e.g.

CACHE [interval | OFF]
RUN apt-get update
CACHE ON

I appreciate this might fly in the face of the idea of containers being deterministic; however, it's exactly the sort of thing you want to do in a continuous deployment scenario where your pipeline has good automated testing.

As a workaround, I'm currently generating cache busters in the script I use to run docker build, and adding them to the Dockerfile to force a cache bust:

FROM ubuntu:13.10
ADD ./files/cachebusters/per-day /root/cachebuster
...
ADD ./files/cachebusters/per-build /root/cachebuster
RUN git clone git@github.com:cressie176/my-project.git /root/my-project
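
A sketch of the generator side of this workaround, assuming the files/cachebusters layout used in the Dockerfile above. It relies on Docker checksumming the contents of files copied by ADD (not their modification times), so the per-day file only invalidates the cache when its content changes at midnight UTC. The function name write_cachebusters is hypothetical, and reading /dev/urandom through od assumes a Linux or macOS host.

```shell
#!/bin/sh
# write_cachebusters DIR: refresh the two cache-buster files read by ADD.
write_cachebusters() {
    dir=$1
    mkdir -p "$dir" || return 1
    # Same content for the whole (UTC) day: the ADD stays cached until tomorrow.
    date -u +%Y-%m-%d > "$dir/per-day"
    # Random content on every call: the ADD misses the cache on every build.
    od -An -N8 -tx1 /dev/urandom | tr -d ' \n' > "$dir/per-build"
}
```

The build script would then run something like write_cachebusters ./files/cachebusters before docker build .
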
@tfoote

tfoote commented Apr 19, 2014

I'm looking to use containers for continuous integration and the ability to set timeouts on specific elements in the cache would be really valuable. Without this I cannot deploy. Forcing a full rebuild every time is much too slow.

My current plan to work around this is to dynamically inject commands such as RUN echo 2014-04-17-00:15:00, with the generated timestamp rounded down to the last 15-minute boundary, so that the cache is invalidated whenever the rounded value jumps, i.e. every 15 minutes. This works for me because I have a script generating the Dockerfile every time, but it won't work without that script.
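
The rounding step can be sketched as a small shell function, assuming the generator works with Unix epoch seconds rather than the formatted timestamp in the example (turning an epoch value back into a date differs between GNU and BSD date). The function name cachebust_line and the 900-second default are illustrative.

```shell
#!/bin/sh
# cachebust_line EPOCH_SECONDS [INTERVAL_SECONDS]
# Prints a RUN line whose text only changes when the time crosses an interval
# boundary, so this step (and everything after it) is reused within an
# interval and invalidated once per interval.
cachebust_line() {
    interval=${2:-900}                        # default: 15 minutes
    bucket=$(( $1 / interval * interval ))    # round down to the boundary
    printf 'RUN echo %s\n' "$bucket"
}
```

For example, cachebust_line 1000 and cachebust_line 1799 both print RUN echo 900, while cachebust_line 1800 prints RUN echo 1800. A generator script would call it as cachebust_line "$(date +%s)".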

@amarnus

amarnus commented May 2, 2014

+1 for the feature.

@hiroprotagonist

hiroprotagonist commented May 7, 2014

I also want to vote for this feature. The cache is annoying when building parts of a container from git repositories that are only updated on the master branch.
👍

@amarnus

amarnus commented May 7, 2014

@hiroprotagonist Having a git pull in your ENTRYPOINT might help?

@hiroprotagonist

hiroprotagonist commented May 8, 2014

@amarnus I've solved it in a way similar to the idea @tfoote had. I am running the build from a Jenkins job, and instead of running the docker build command directly, the job starts a build script which generates the Dockerfile from a template and adds the line 'RUN echo currentMillis' above the git commands. Thanks to sed and pipes this was a matter of minutes. Anyway, I still favor this feature as part of the Dockerfile itself.
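
The template step can be sketched with awk instead of sed, since the syntax of sed's insert command differs between GNU and BSD versions. It assumes the git commands start with "RUN git " and inserts a single stamp before the first one, because every later step is rebuilt anyway once that line changes. The function name inject_cachebuster is hypothetical, and the stamp here is epoch seconds rather than milliseconds.

```shell
#!/bin/sh
# inject_cachebuster STAMP < Dockerfile.template > Dockerfile
# Inserts "RUN echo STAMP" before the first "RUN git ..." line.
inject_cachebuster() {
    awk -v stamp="$1" '
        !injected && /^RUN git / { print "RUN echo " stamp; injected = 1 }
        { print }
    '
}
```

A Jenkins job would run something like inject_cachebuster "$(date +%s)" < Dockerfile.template > Dockerfile before calling docker build.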

@roooodcastro

roooodcastro commented May 9, 2018

+1

@tcallahan14

tcallahan14 commented May 13, 2018

+1

@feraudet

feraudet commented May 23, 2018

+1

@zyfdegh

zyfdegh commented May 23, 2018

Currently the simplest way to disable the cache for a layer (and the following ones):

Dockerfile:

ARG CACHE_DATE
RUN wget https://raw.githubusercontent.com/want/lastest-file/master/install.sh -O - | bash

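And when you build the image, --build-arg needs to be added:

docker build --build-arg CACHE_DATE="$(date)" .

Because every RUN after an ARG sees that build argument as an environment variable, a new CACHE_DATE value causes a cache miss at the first RUN that follows it. So the wget command will be executed every time you build the image, rather than using the cache.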

@ORESoftware

ORESoftware commented May 27, 2018

RUNNC or CACHE OFF would be nice.

In the meantime, this looks promising:
http://dev.im-bot.com/docker-select-caching/

that is:

[Image: screenshot 2018-05-26 19 03 09]

@bluzi

bluzi commented Jun 11, 2018

I'm going to keep calm and join the herd:

+1

@shadycuz

shadycuz commented Jun 14, 2018

Yeah I need selective caching on commands. My COPY fails 80% of the time if I only change one word in a config file. I would like to never cache my COPY but cache everything else. Having a CACHE ON and CACHE OFF would be great.

RUN X
RUN X
CACHE OFF
COPY /config /etc/myapp/config
CACHE ON
@curtiszimmerman

curtiszimmerman commented Jun 14, 2018

@shadycuz You will never be able to "re-enable" the cache after disabling/invalidating it using any method. The build will not be able to verify (in a reasonable amount of time with a reasonable amount of resources) that the non-cached layer didn't change something else in the filesystem which it would need to consider in newer layers. In order to minimize the impact of always needing to pull in an external config file, you should put your COPY directive as far down in the Dockerfile as possible (so that Docker can use the build cache for as much of the build process as possible before the cache is invalidated).

To invalidate the cache at a specific point in the build process, you can refer to any of the other comments about using --build-arg and ARG mentioned here previously.
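
Why the cache cannot be "turned back on" can be shown with a toy model, assuming each step's cache key is derived from its parent's key plus its own instruction text. This is a simplification, not Docker's actual key format, and cksum is used only because it is available everywhere. Changing one instruction changes the key of every step after it, even when their text is identical.

```shell
#!/bin/sh
# chain_keys < Dockerfile: prints "<key> <instruction>" for each line, where
# each key is a checksum over the previous key and the instruction text.
chain_keys() {
    key=0
    while IFS= read -r line; do
        key=$(printf '%s\n%s\n' "$key" "$line" | cksum | awk '{ print $1 }')
        printf '%s %s\n' "$key" "$line"
    done
}
```

Running it on two Dockerfiles that differ only in their second line gives the same key for the first line and different keys for every line from the second onward, which is why any CACHE ON after a CACHE OFF would have nothing to match against.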

@zyfdegh

zyfdegh commented Jun 15, 2018

@shadycuz @curtiszimmerman Yes, we could only keep CACHE OFF, not CACHE ON, because the following layers need to be rebuilt if an earlier layer changes.

@Simran-B

Simran-B commented Jul 31, 2018

I agree that CACHE ON makes no sense from a technical point of view. It does, however, express more clearly which layers are actually intended to be invalidated.

A more flexible solution would be a command similar to RUN that allows some shell code to determine whether the cache should be invalidated. An exit code of 0 could mean "use cache" and 1 "invalidate cache". If no shell code is given, the default could be to invalidate the cache from here on. The command could be called INVALIDATE, for instance.
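
The proposed semantics can be sketched as a shell function, assuming INVALIDATE is a hypothetical instruction (Docker has no such instruction) and treating any non-zero exit code, not only 1, as "invalidate". The check's output is discarded so only its exit status matters; the function name cache_decision is also hypothetical.

```shell
#!/bin/sh
# cache_decision [CHECK_SCRIPT]: models the hypothetical INVALIDATE instruction.
#   no script         -> invalidate from here on
#   script exits 0    -> keep using the cache
#   script exits != 0 -> invalidate the cache
cache_decision() {
    if [ -z "$1" ]; then
        echo invalidate
    elif sh -c "$1" > /dev/null 2>&1; then
        echo use-cache
    else
        echo invalidate
    fi
}
```

For example, cache_decision 'test "$(date +%u)" -ne 1' would reuse the cache on every day except Monday.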

@mattp-

mattp- commented Aug 4, 2018

Why was this closed with no comment?

@thaJeztah

Member

thaJeztah commented Aug 4, 2018

There was a comment, but it's hidden by GitHub:
#1996 (comment)

@krinsman

krinsman commented Aug 7, 2018

+1

This feature would be a life-saver for me right now.

@csymeonides-mf

csymeonides-mf commented Aug 8, 2018

+1

@Simran-B

Simran-B commented Aug 8, 2018

Closing this as we don't see that many real world use cases

212 comments and counting, but still no use case? Seems pretty ignorant.

@yifeikong

yifeikong commented Aug 12, 2018

+1

@davidCarlos

davidCarlos commented Aug 13, 2018

+1

@privetgit

privetgit commented Aug 19, 2018

+1

@mdasari823

mdasari823 commented Aug 21, 2018

+1

@make-ing

make-ing commented Aug 23, 2018

+1

@chiffa

chiffa commented Aug 23, 2018

The problem is still here and still requires a solution. There are plenty of real-world uses still present.

@jaromil

jaromil commented Sep 1, 2018

+1

I suspect the Docker developers have no incentive to implement this, to protect their centralised building infrastructure from being DDoS'ed by no-cache requests.

I also suspect that a parallel infrastructure that facilitates no-cache builds would be interesting for enterprise users.

Overall this issue is not about a software feature, but a service scaling issue.

@bluzi

bluzi commented Sep 2, 2018

@jaromil That's not entirely true, as this is not possible on self-hosted repositories either.

@jaromil

jaromil commented Sep 3, 2018

What software is there to run a self-hosted repository? I don't really know what you're referring to.
A simple self-hosted solution could be a cron job cloning git repos and running docker build --no-cache. I'm sure this problem cannot occur with open source software: anyone is then able to modify the docker build command line.

@vpedrosa

vpedrosa commented Sep 3, 2018

@jaromil I don't think that's the problem. It would be more efficient to have it for Docker Hub's open source projects (as well as paid ones; they don't charge for the number of builds). In a CI/CD environment with frequent builds, this gets even worse.

As long as you need to do that (you are using Docker and git and don't want to have 5 containers running shared volumes), you must rebuild the container and upload it every time you release a new version. The entire container.
With an in-code no-cache flag, every time you run the build you would just rebuild and replace that single layer instead of the whole container to update the version.

About the self-hosted repo, you'd be surprised. I understand @bluzi's comment: there is no DDoS impact if you self-host (or use AWS ECR).

@jaromil

jaromil commented Sep 3, 2018

OK, this is certainly a more complex scenario than I was envisioning. Now I think... uploading with a sort of no-cache single-layer hashes... push and override, you name it. I am not sure.

@PaulSD

PaulSD commented Nov 28, 2018

TLDR: I think some improvements to the Docker documentation might help a lot.

I ended up here after encountering my own problems/confusion with caching. After reading all of the comments here and in #10682, I found a workable solution for my particular use case. Yet somehow I still felt frustrated with Docker's response to this, and it appears that many others feel the same way.

Why? After thinking about this from several different angles, I think the problem here is a combination of vague use cases, overly generalized arguments against the proposed changes (which may be valid but don't directly address the presented use cases), and a lack of documentation for Docker's recommendations for some common use cases. Perhaps I can help clarify things and identify documentation that could be improved to help with this situation.

Reading between the lines, it sounds to me like most of the early commenters on this feature request would be happy with a solution that uses additional arguments to docker image build to disable the cache at a specific point in the Dockerfile. It sounds like Docker's current solution for this (described in #1996 (comment)) should be sufficient in most of these cases, and it sounds like many users are happy with this. (If anyone has a use case where they can provide additional arguments to docker image build but this solution is still inadequate, it would probably help to add a comment explaining why this is inadequate.)

All of the lingering frustration appears to be related to the requirement to pass additional arguments to docker image build to control the caching behavior. However, the use cases related to this have not been described very well.

Reading between the lines again, it appears to me that all of these use cases are either related to services that run docker image build on a user's behalf, or related to Dockerfiles that are distributed to other users who then run docker image build themselves. (If anyone has any other use cases where passing additional arguments to docker image build is a problem, it would probably help to add a comment explaining your use case in detail.)

In many of these cases, it sounds like the use case does not actually require the ability to disable caching at a specific point in the Dockerfile (which was the original point of this feature request). Instead, it sounds like many users would be happy with the ability to disable caching entirely from within the Dockerfile, without using the "--no-cache" argument to docker image build and without requiring manual modifications to the Dockerfile before each build. (When describing use cases, it would probably help to mention whether partial caching is actually required or whether disabling the cache entirely would be sufficient for your use case.)

In cases where a service runs docker image build on a user's behalf, it sounds like Docker expects all such services to either unconditionally disable the cache or give the user an option to disable the cache. According to #10682 (comment), Docker Hub unconditionally disables the cache. If a service does not already do this, Docker has suggested (#10682 (comment)) complaining to the service provider about it.

This seems to me to be a reasonable position for Docker to take regarding services that run docker image build. However, this position really needs to be officially documented in a conspicuous place so that both service providers and users know what to expect. It does not appear that this position or the Docker Hub caching behavior are currently documented anywhere other than those off-the-cuff comments buried deep inside that huge/ancient/closed pull request, so it is no surprise that both service providers and users routinely get this wrong. Perhaps adding information to the docker build reference describing Docker's opinion on the use of caching by build services, and adding information to the Docker Hub automated build documentation about the Docker Hub caching behavior might eliminate this problem?

For cases where Dockerfiles are distributed to other users who then run docker image build themselves, some people have argued that the use of the simple docker build . command (with no additional arguments) is so common that it would be unreasonable for Dockerfile builders to require users to add arguments, while other people (for example: #1996 (comment) #10682 (comment) #10682 (comment)) have argued that it would be inappropriate to unconditionally prevent users from using caching by hard-coding cache overrides into the Dockerfile. In the absence of detailed/compelling use cases for this, Docker has made the executive decision to require additional command line arguments to control caching, which seems to be the source of much of the lingering frustration. (If anyone has a compelling use case related to this, it would probably help to add a comment explaining it in detail.)

However, it seems to me that Docker may be able to make everyone happy simply by breaking users' habit of running docker build . without additional arguments. The caching behavior and "--no-cache" argument are not mentioned in any of the relevant Docker tutorials (such as this or this or this). In addition, while the docker build documentation does list the "--no-cache" argument, it doesn't explain its significance or highlight the fact that it is important in many common use cases. (Also note that the docker image build documentation is empty. It should at least reference the docker build documentation.) It appears that only the Dockerfile reference and best practices documentation actually describe the caching behavior and mention the role of the "--no-cache" argument. However, these documents are likely to be read only by advanced Dockerfile writers. So, it is no surprise that only advanced users are familiar with the "--no-cache" argument, and that most users would only ever run docker build . without additional arguments and then be confused when it doesn't behave how they or the Dockerfile writer expect/want. Perhaps updating the tutorials and docker build documentation to mention the "--no-cache" argument and its significance might eliminate this problem?

@fabiomolinar

fabiomolinar commented Dec 11, 2018

+1

@DarrienG

DarrienG commented Jan 2, 2019

+1

Docker's official tool bashbrew doesn't let you add arguments when building images, so the "officially supported" answer does not work.

@jazib

jazib commented Jan 9, 2019

+1
