New feature request: Selectively disable caching for specific RUN commands in Dockerfile #1996
I think the way to combat this is to take the point in the Dockerfile up to which you do want caching, tag that as an image, and use it as the base in your future Dockerfiles.
mohanraj-r
commented
Oct 3, 2013
But wouldn't this make it harder to interleave cached and non-cached commands? For example, let's say I want to update my repo and wget files from a server, and perform a bunch of steps in between, e.g. install software from the repo (which could have been updated) or perform operations on the downloaded file (which could have changed on the server). What would be ideal is a way to tell Docker, in the Dockerfile, to run specific commands without the cache every time, and then only reuse the previous image if nothing has changed (e.g. no update in the repo). Wouldn't this be useful to have?
wagerlabs
commented
Oct 18, 2013
What about CACHE ON and CACHE OFF in the Dockerfile? Each instruction would affect subsequent commands.
konklone
commented
Oct 29, 2013
Yeah, I'm using
githart
commented
Nov 6, 2013
Can a container ID be passed to 'docker build' as a "do not cache past this ID" instruction, similar to the way 'docker build' caches all steps up to a changed line in a Dockerfile?

I agree we need more powerful and fine-grained control over the build cache. Currently I'm not sure exactly how to expose this to the user. I think this will become easier with the upcoming API extensions, specifically naming and introspection.

Would be a great feature. Currently I'm using silly things like
rogernolan
commented
Feb 7, 2014
Getting better control over the cache would make using docker from CI a lot happier.

What about changing

I agree and suggested this exact feature on IRC. Except I think that to preserve backward compatibility we should create a new flag (say "--uncache"), so we can keep --cached as a (deprecated) bool flag that resolves to "--uncache .*".

What does everyone else think about this? Anyone up for implementing the feature?

I'm up for having a stab at implementing this today if nobody else has started.

I've started work on it and wanted to validate that the approach looks good.
One thing: as far as I can see, the flag/mflag package doesn't support string flags without a value, so I'll need to do some extra fiddling to support both

I really think this ought to be a separate new flag. The behavior and syntax of … Anyways, IANTM (I am not the maintainer), so these are just my personal thoughts. :)

@tianon

Right, that's all fine. The problem is that
I also think we'd be doing ourselves a disservice by making "true" and "false" special-case regex strings to solve this, since that will create potentially surprising behavior for our users in the future. ("When I use
crazyscience
commented
Mar 13, 2014
+1 for @wagerlabs approach

@crosbymichael, @timruffles Wouldn't it be better if the author of the Dockerfile decided which build steps should be cached and which should not? The person who creates the Dockerfile is not necessarily the same person who builds the image. Moving the decision to the docker build command demands detailed knowledge from someone who just wants to use a specific Dockerfile. Consider a corporate environment where someone just wants to rebuild an existing image hierarchy to update some dependencies. The existing Dockerfile tree may have been created years ago by someone else.
hunterloftis
commented
Apr 13, 2014
+1 for @wagerlabs approach

+1 for @wagerlabs approach, although it would be even nicer if there was a way to cache-bust on a time interval too, e.g.
I appreciate this might go against the idea of containers being deterministic; however, it's exactly the sort of thing you want to do in a continuous deployment scenario where your pipeline has good automated testing. As a workaround, I'm currently generating cache busters in the script I use to run docker build and adding them to the Dockerfile to force a cache bust.
tfoote
commented
Apr 19, 2014
I'm looking to use containers for continuous integration, and the ability to set timeouts on specific elements in the cache would be really valuable. Without this I cannot deploy: forcing a full rebuild every time is much too slow. My current plan to work around this is to dynamically inject commands such as
cressie176
referenced this issue
May 2, 2014
Closed
Allow files and paths to be ignored when uploading context #2224
amarnus
commented
May 2, 2014
+1 for the feature.
hiroprotagonist
commented
May 7, 2014
I also want to vote for this feature. The cache is annoying when building parts of a container from git repositories that are updated only on the master branch.
amarnus
commented
May 7, 2014
@hiroprotagonist Having a
hiroprotagonist
commented
May 8, 2014
@amarnus I've solved it similarly to the idea @tfoote had. I run the build from a Jenkins job, and instead of running the docker build command directly, the job starts a build script which generates the Dockerfile from a template and adds the line 'RUN echo currentMillis' above the git commands. Thanks to sed and pipes this was a matter of minutes. Anyway, I still favor this feature as part of the Dockerfile itself.
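A minimal sketch of the template approach described above, assuming the template is an ordinary Dockerfile in which every line beginning with "RUN git " should be preceded by a cache-busting echo. It uses awk rather than sed to stay portable between GNU and BSD tools; inject_cachebust and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the "generate the Dockerfile from a template"
# workaround. Assumption: the template is a plain Dockerfile, and each line
# starting with "RUN git " should get a cache-busting echo above it.

# inject_cachebust TOKEN < template > Dockerfile
# Copies stdin to stdout, inserting "RUN echo TOKEN" before every "RUN git"
# line. A new TOKEN changes that instruction's text, so the build cache
# misses there and every later step is rebuilt.
inject_cachebust() {
  awk -v tok="$1" '
    /^RUN git / { print "RUN echo " tok }   # cache-busting line
    { print }                                # original line, unchanged
  '
}

# Example usage (hypothetical file names; any value that changes per build
# works as the token, here seconds since the epoch):
#   inject_cachebust "$(date +%s)" < Dockerfile.template > Dockerfile
#   docker build -t my-image .
```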
dannykansas
commented
May 17, 2014
Adding my +1 for @wagerlabs approach. I'm also having this issue with CI. I'm simply using a dynamic echo RUN statement for the time being, but I would love this feature.
leonardschneider
commented
May 27, 2014
+1 for CACHE ON/OFF. My use case is also CI automation.
stilliard
commented
May 27, 2014
+1, especially the ability to set a RUN command's cache interval like in @cressie176's example
disposable-ksa98
commented
May 27, 2014
"For example updating repos or downloading a remote file" +1
dannykansas
commented
May 27, 2014
If it helps anyone, here's the piece of code I'm using in my Jenkins build:
bfitzsimmons
commented
Jun 19, 2014
+1 for CACHE ON/OFF
claytondaley
commented
Jul 30, 2014
As a possible alternative to the CACHE ON/OFF approach, what about an extra keyword like "ALWAYS"? The keyword would be used in combination with an existing command (e.g. "ALWAYS RUN" or "ALWAYS ADD"). By design, the "ALWAYS" keyword does not go to the cache to complete the adjacent command. However, it compares the result to the cache (implicitly, the cache from other times the same line was executed), linking to the cached image if the result of the ALWAYS command is unchanged. I believe the underlying need is to identify "non-idempotent instructions", and the ALWAYS command does this very explicitly. My impression is that the CACHE ON/OFF approach could work equally well, but could also require lots of switching over blocks of code (which may encourage users to block off more lines than really required).
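The ALWAYS semantics can be roughly approximated with existing features: run the non-idempotent check outside the build, hash what it returns, and pass the hash in as a build argument, so the cache is only invalidated when the result actually changes. This is a sketch, not a Docker feature; it assumes GNU sha256sum, a Dockerfile that declares a hypothetical ARG CONTENT_KEY right before the consuming RUN, and a hypothetical manifest URL.

```shell
#!/usr/bin/env bash
# Approximates "ALWAYS RUN ... but reuse the cache if the result is
# unchanged". Assumptions: sha256sum (GNU coreutils) is available, and the
# Dockerfile declares a hypothetical "ARG CONTENT_KEY" just before the RUN
# that fetches or uses the data.

# content_key < data
# Prints the SHA-256 hex digest of stdin. Identical content yields an
# identical key, so Docker reuses the cached layers; changed content yields
# a new key and a cache miss from the first RUN after the ARG.
content_key() {
  sha256sum | cut -d ' ' -f 1
}

# Example usage (hypothetical URL and image name): hash a small manifest or
# version file rather than re-downloading large data twice.
#   key="$(curl -fsSL https://example.com/latest-version.txt | content_key)"
#   docker build -t my-image --build-arg CONTENT_KEY="$key" .
```

Files brought in with COPY or ADD are already compared by checksum, so when the fetched data itself can be placed in the build context, copying it in gives the same "reuse if unchanged" behavior without a build argument.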

I am also more in favor of a prefix to commands, like ALWAYS or CACHE 1 WEEK ADD ...
CheRuisiBesares
commented
Aug 6, 2014
So I was struggling with this issue for a while, and I just wanted to share my workaround in case it's helpful while this gets sorted out. I really didn't want to add anything outside of the Dockerfile to the build invocation or change the file every time. Anyway, this is a silly example, but it uses the ADD mechanism to bust the cache and doesn't require any file manipulations.
Obviously you can pick your own use case and network random generator. Anyway, maybe it will help some people out, idk.
gzankevich
commented
Aug 6, 2014
Another +1 for @wagerlabs approach
assertrandom
commented
Aug 7, 2014
Another +1 to the feature. Meanwhile I'm using @cruisibesarescondev's workaround.
tcarlyle
commented
Aug 7, 2014
One more +1 for the feature request. And thanks to @cruisibesarescondev for the workaround.
toldjuuso
commented
Aug 7, 2014
Another +1 for the feature. Cheers @cruisibesarescondev for the workaround.
tfoote
commented
Aug 10, 2014
I think the ALWAYS keyword is a good approach, especially as it has simple, clear semantics. A slightly more complicated approach would be to add a minimum time (useful in things like a build farm or continuous integration). For that I'd propose a syntax "EVERY XXX", where XXX is a timeout. If it's been longer than XXX since the cache of that command was built, it must rerun the command and check whether the output has changed. If nothing changed, reuse the cached result, noting the last updated time. This would mean that EVERY 0 would be the same as ALWAYS.

For a workaround at the moment, I generate my Dockerfiles using empy templates in Python and embed the following snippets, which work as above except that they do not detect the same result in two successive runs, but do force a retrigger every XXX seconds. At the top:
Where I want to force a rerun:
Which looks like this in the Dockerfile:
As you can see, it rounds to the nearest 60, so each time 60 seconds pass, the next run will rerun all following commands.
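The empy snippets themselves are not preserved here, but the idea can be sketched in a bash-compatible shell: floor the current Unix time to a multiple of the interval and pass it as a build argument (CACHEBUST is a placeholder name that the Dockerfile would declare with ARG). Unlike "nearest", this sketch always rounds down to the start of the current window, which keeps the value constant for the whole window.

```shell
#!/usr/bin/env bash
# "Rerun at most every N seconds": the value only changes when a new
# N-second window starts, so builds inside the same window hit the cache.
# CACHEBUST in the usage comment is a placeholder build-arg name.

# cachebust_bucket INTERVAL_SECONDS [NOW_EPOCH_SECONDS]
cachebust_bucket() {
  local interval="$1"
  local now="${2:-$(date +%s)}"      # default: current Unix time
  echo $(( now - now % interval ))   # floor to the start of the window
}

# Example usage: rebuild from the ARG onward at most once a day.
#   docker build -t my-image \
#     --build-arg CACHEBUST="$(cachebust_bucket 86400)" .
```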
pikeas
commented
Aug 26, 2014
+1 for ALWAYS syntax. +.5 for CACHE ON/CACHE OFF.
hellais
commented
Sep 2, 2014
+1 for ALWAYS syntax.
kigiri
commented
Sep 3, 2014
Yes, ALWAYS syntax looks very intuitive.

I don't like CACHE ON/OFF because I think lines should be "self-contained", and adding blocks to Dockerfiles would introduce a lot of "trouble" (like having to check "is this line covered by cache?" when merging...).

@kuon I think there are already a number of commands that affect subsequent instructions, e.g.

Yeah, that's true, but I don't use them for the same reason. I always do … I'd prefer a block notation:
This is more explicit and avoids having a … I might be overthinking it; Dockerfiles are not actually run in production (just when building the image), so having the cache disabled when you build won't actually do much harm. But I also feel Dockerfiles are really limiting (having to chain all commands with && in a single RUN to avoid creating a gazillion images, not being able to use variables...). Maybe this issue is the opportunity for a new Dockerfile format.

I'd like to walk back what I just said. I read what @shykes said in another issue, docker#2266, and I agree with him (Dockerfiles need to stay a really simple, assembly-like language). I said I'd like variables or things like that, but those can be covered by some other language. In this case, each line in a Dockerfile should be self-contained, e.g.:
Which would always run the command (no cache), but would also not create an image, and would use the user jon. These kinds of self-contained lines are much easier to generate from any other language. If you have to worry about the context (user, cache, workdir), it's more error-prone.
ghost
commented
Sep 27, 2014
Can it be
abramsm
commented
Nov 19, 2014
Any status update on this one?
orrery
commented
Dec 9, 2014
Selectively disabling the cache would be very useful. I grab files from a remote Amazon S3 repository via the awscli command (from the Amazon AWS toolkit), and I have no easy way to bust the cache via an ADD command (at least I can't think of a way without editing the Dockerfile to trigger it). I believe there is a strong case for giving control back to the user to selectively bust the cache when using RUN. If anyone has a suggestion for me, I'd be happy to hear from you.
hellais
commented
Dec 10, 2014
Wanted to bump this issue up a bit since it's something that we have a big need for. Still convinced

How about a simple
hellais
commented
Dec 10, 2014
@cpuguy83 that would also work for my particular use case. I am not sure if it's technically possible to have only one command not be cached but the rest of them cached. Probably not, since Docker is based on incremental diffs. Having support for
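A simplified model (an illustration, not Docker's actual code) of why one uncached step cannot be sandwiched between cached ones: each step's cache key is derived from the previous step's key plus its own instruction, so changing one step changes every key after it, even for instructions whose text is identical. chain_keys is a hypothetical helper and assumes sha256sum is available.

```shell
#!/usr/bin/env bash
# Toy model of a layered build cache. Each key depends on the key before
# it, mirroring how a cached layer is only reusable on top of the same
# parent layer.

# chain_keys < instructions
# Reads one instruction per line and prints one chained key per line:
#   key_i = sha256(key_{i-1} + "\n" + instruction_i)
chain_keys() {
  local prev="" line
  while IFS= read -r line; do
    prev="$(printf '%s\n%s' "$prev" "$line" | sha256sum | cut -d ' ' -f 1)"
    echo "$prev"
  done
}

# Example: change only step 2 and compare; keys 2 and 3 both differ.
#   printf 'FROM debian\nRUN echo 1\nRUN make\n' | chain_keys
#   printf 'FROM debian\nRUN echo 2\nRUN make\n' | chain_keys
```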
orrery
commented
Dec 10, 2014
Regarding my previous post, it would indeed be sufficient to just bust the cache from that point in the script onwards; the rest would just be down to intelligent script design (and I believe this would address most people's requirements). Is this doable instead of selectively disabling the cache?
RyanHartje
commented
Jan 30, 2016
So many +1s. If you pull the git repo in your Dockerfile, the cache keeps your images from rebuilding. Makes it kind of hard to push builds through CI.
Vingtoft
commented
Feb 1, 2016
+1 cloning git repos (it's very annoying that the image needs to be built from scratch each time a small edit has been made in a git repo)
itsprdp
commented
Feb 1, 2016
@Vingtoft If you are updating the files in the repo then your cache is invalidated.
Vingtoft
commented
Feb 1, 2016
@itsprdp I did not know that, thank you for clarifying.
Vingtoft
commented
Feb 1, 2016
@itsprdp I have just tested. When I update the repo and build the image, Docker still uses the cache.
RyanHartje
commented
Feb 2, 2016
@itsprdp That isn't correct in my experience. I made a new commit to a repo to test, and when building again, it uses the same cache. If I change the Dockerfile before the clone step, of course the cache is busted; however, simply updating a repo does not seem to fix this issue.
itsprdp
commented
Feb 2, 2016
@RyanHartje Sorry for the confusion. It is supposed to invalidate the cache if the repository is updated, and that's something for contributors to consider.
Vingtoft
commented
Feb 2, 2016
@itsprdp Only updating the changed files in a repo would be awesome, but less (or should I say more?) would do as well.
douineauromain
commented
Feb 18, 2016
+1, cache used during git clone :(
shane-axiom
commented
Feb 18, 2016
An integrated solution would be nice, but in the meantime you can bust the cache at a specific Dockerfile instruction using ARG. In the Dockerfile:

    ARG CACHEBUST=1
    RUN git clone https://github.com/octocat/Hello-World.git

On the command line:

    docker build -t your-image --build-arg CACHEBUST=$(date +%s) .

Setting CACHEBUST to the current time gives it a new value on every build, so the cache is invalidated from the first RUN after the ARG onward.

Edit: Which, uh, is just what @thaJeztah said. I'll leave this up as an additional description of his solution.
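The ARG trick above, wrapped in a small script for illustration. Docker's build reference states that a build argument whose value differs from the previous build causes a cache miss at its first use rather than at its definition, and RUN instructions after an ARG use it implicitly, so steps placed above the ARG stay cached. The base image, the package step, and write_dockerfile are assumptions.

```shell
#!/usr/bin/env bash
# Writes a Dockerfile using the ARG cache-busting trick into directory $1.
# Assumptions: debian:stable as the base image and a git install step above
# the ARG; both are illustrative.

write_dockerfile() {
  cat > "$1/Dockerfile" <<'EOF'
FROM debian:stable
# Cached across builds: nothing here depends on CACHEBUST.
RUN apt-get update && apt-get install -y --no-install-recommends git ca-certificates
# A new CACHEBUST value invalidates the cache from the next RUN onward.
ARG CACHEBUST=1
RUN git clone https://github.com/octocat/Hello-World.git
EOF
}

# Example usage:
#   dir="$(mktemp -d)"; write_dockerfile "$dir"
#   docker build -t your-image --build-arg CACHEBUST="$(date +%s)" "$dir"
```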
pulkitsinghal
commented
Mar 3, 2016
@shane-axiom How about using the git commit hash as the value for CACHEBUST?
Based on clues from http://stackoverflow.com/questions/15677439/how-to-get-latest-git-commit-hash-command#answer-15679887
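One way to implement the commit-hash suggestion, assuming the repository is remote and git is installed on the build host: git ls-remote prints the object name and the ref name separated by a tab, and that hash becomes the CACHEBUST value, so the clone step reruns only when the branch moves. The linked answer's exact command is not reproduced here; head_hash is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Cache-bust only when the remote branch actually changes.
# Assumption: git is installed where "docker build" is started.

# head_hash < ls-remote-output
# Prints the object name (first tab-separated field) from the first line of
# "git ls-remote" output.
head_hash() {
  head -n 1 | cut -f 1
}

# Example usage (repository from the example above):
#   repo=https://github.com/octocat/Hello-World.git
#   docker build -t your-image \
#     --build-arg CACHEBUST="$(git ls-remote "$repo" HEAD | head_hash)" .
```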
shane-axiom
commented
Mar 9, 2016
@pulkitsinghal That looks wonderful for busting the cache for git repos. For other uses (such as pulling in SNAPSHOT dependencies, etc.) the always-busting timestamp approach works well.
recursionbane
commented
Apr 18, 2016
+1 for CACHE ON | OFF

+1
KBoehme
commented
Apr 22, 2016
+1
nikow
commented
Apr 23, 2016
Remember @CheRuisiBesares' approach: you can always use
brycereynolds
commented
Apr 27, 2016
To post an additional use case...
In our A
mmobini
commented
Apr 28, 2016
+1 I have a similar issue with npm install, which uses the cache and doesn't pick up my newly published library on npm. It would be great if I could disable the cache per RUN command in the Dockerfile.

@brycereynolds @mmobini see docker#1996 (comment) for manually busting the cache. However, not specifying a specific version of packages that need to be installed may not be best practice, as the end result of your Dockerfile (and source code) is no longer guaranteed to be reproducible (i.e., it builds successfully today, but doesn't tomorrow, because one of the packages was updated). I can see this being "ok" during development, but for production (and automated builds on Docker Hub), the best approach is to explicitly specify a version. Doing so also allows users to verify the exact packages that were used to produce the image.
sukrit007
referenced this issue
in totem/docker-image-factory
Apr 28, 2016
Open
Support for cache busting #31
ctrimble
commented
Apr 28, 2016
I have a use case where not being able to invalidate the cache is causing issues. I am running Dropwizard applications (Java REST services built with Maven) from Docker, and an automated system is doing all of the container builds and deployment for me. I include a Dockerfile in my repo and it does the rest. The system runs a production version and one or more development versions of my application. Development builds are where I am having issues. During development, some of the project's dependencies have SNAPSHOT in their version numbers. This tells Maven that the version is under development and that it should bring down a new version with every build. As a result, an identical file structure can result in two distinct builds. This is the desired behavior, since bugs may have been fixed in a SNAPSHOT dependency. To support this, it would be helpful to force Docker to run a particular command, since there is no way to determine the effect of the command based on the current state of the file system. A majority of Java projects are going to run into this, since Maven-style SNAPSHOT dependencies are used by several different build systems.

@ctrimble You can use
ctrimble
commented
Apr 29, 2016
@cpuguy83 thank you for the reply. I read the thread and understand the current options. I have opened a ticket with the build system I am using to supply a cache-busting argument. Producing two distinct images for a single application seems like a lot of hoops to go through to speed up builds. It would be much easier to be able to specify something like:
This pattern will come up in development builds frequently. It would be nice to have semantics for it in the Dockerfile.

@ctrimble Busting the cache on one step will cause the cache to always be busted for each subsequent step.
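Because a miss on one step carries through every later step, the practical consequence for the SNAPSHOT case is to put the cache-busting ARG as late as possible, directly above the step that must re-resolve dependencies. This is a sketch under assumptions: the maven:3-jdk-8 base image, the /app layout, the SNAPSHOT_BUST name and write_maven_dockerfile are illustrative; -U (--update-snapshots) and -B (batch mode) are standard Maven options.

```shell
#!/usr/bin/env bash
# Ordering sketch for SNAPSHOT dependencies: stable steps first, then the
# cache-busting ARG, then the Maven build. Image tag, paths, and the build
# arg name are assumptions.

write_maven_dockerfile() {
  cat > "$1/Dockerfile" <<'EOF'
FROM maven:3-jdk-8
WORKDIR /app
# Stable steps first, so they stay cached unless these files change.
COPY pom.xml .
COPY src ./src
# Everything from here on is rebuilt whenever SNAPSHOT_BUST changes.
ARG SNAPSHOT_BUST=1
# -U (--update-snapshots) forces Maven to re-check remote SNAPSHOT versions.
RUN mvn -U -B package
EOF
}

# Example usage for a development build:
#   dir=.; write_maven_dockerfile "$dir"
#   docker build -t my-service:dev --build-arg SNAPSHOT_BUST="$(date +%s)" "$dir"
```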
ctrimble
commented
Apr 29, 2016
@cpuguy83 exactly. The semantics of my build system are temporal for development builds. I have to choose correct builds over caching. I would really like to get both.
atrauzzi
commented
Nov 18, 2016
There's been considerable discussion here, apologies if it's already been suggested, but what if there was something like this:
All Docker would do is store the MD5 (or whatever other hash is hip) of the file, and if it changes, all steps thereafter are invalidated. I'd probably be doing something like:
I may also want to enable a check that somehow elapses after a time period. Ansible's … For that, the syntax would be:
Docker would know the last-run time and calculate whether the time had elapsed based on "now".

@atrauzzi We just support
atrauzzi
commented
Nov 18, 2016
@cpuguy83 Are there any docs or explanations about

@atrauzzi yes, in the build reference. Basically,

I don't see why one would need to check that a file cache is still valid individually,
atrauzzi
commented
Nov 18, 2016
@cpuguy83 Good point, I didn't even think of that, and of course I'm already using it. What about the timestamp/duration approach? Is that doable with what's already available?
Through build args: change the build arg to bust the cache.
sarpk
commented
Nov 28, 2016
+1 for a cleaner way
ianseyer
commented
Feb 1, 2017
+1 for a cleaner way
multi-io
commented
Feb 5, 2017
There should also be separate options for disabling reading the cache and for disabling writing to it. For example, you may want to build an image anew from scratch and ignore any cached layers, but still write the resulting new layers to the cache.
benoror
referenced this issue
in shoonoise/cabot-docker
Feb 8, 2017
Closed
Default username/password: docker/docker doesn't work #42
jrusk
commented
Feb 8, 2017
+1
chris13524
commented
Feb 9, 2017
Might I suggest passing the step number to the build command? Something like this: …
It would ignore all caches after and including step 5 during the build.
bupadon
commented
Feb 28, 2017
+1
neoxue
commented
Mar 7, 2017
CACHE ON|OFF +1
chris13524
commented
Mar 7, 2017
The issue with these
dreamcat4
commented
Mar 7, 2017
It is a valid idea / ethos. The command is supposed to coalesce all non-cached layers into a single layer at the point when the cache gets switched back on. Of course, you can still argue about the best naming / correctness of semantics / preferred syntax of the feature.
MrCheater
commented
Mar 14, 2017
+1
CageFox
commented
Mar 16, 2017
+1, a must-have feature
StalkAlex
commented
Mar 25, 2017
Agree, CACHE ON|OFF +1
solsson
referenced this issue
in Yolean/build-contract
Mar 27, 2017
Open
Pick up BUILD_FLAGS from external env #22
BeauAnasson
commented
Mar 29, 2017
+1 Would be amazing.
lxblvs
commented
Apr 7, 2017
I did not really understand how Docker caches steps before, and spent half a day investigating why my system was not building correctly. It was the "git clone" caching. Would love to have the
ChipmunkV
commented
Apr 12, 2017
How is it closed? What is the best workaround?
habeebr
commented
Apr 20, 2017
I tried #1996 (comment) and it worked
On the command line:
stints
commented
Apr 28, 2017
Why not create a new command similar to RUN that never caches? RUNNC, for RUN NO CACHE?
lumannnn
commented
Jun 20, 2017
I can confirm, @habeebr (#1996 (comment)). I use it in combination with #1996 (comment)
naoko
commented
Aug 1, 2017
+1
andrepuschmann
commented
Aug 14, 2017
RUNNC is a great idea!
dolphy01
commented
Sep 21, 2017
Why was this issue closed? Between the myriad duplicates asking for essentially the same thing and the lengthy comment history of more than one of these duplicates, it seems obvious that there is healthy interest in seeing this functionality available. I get that it's hard, and perhaps that no one has suggested a sufficiently elegant solution that both meets the need and is clean enough to be an attractive Docker addition... but that does not mean there is no need. The only other argument I've heard in favor of closing this is that there are other ways to accomplish this... but that argument doesn't really pass muster either. Creating multiple base images for the sole purpose of getting around the lack of cache control is unwieldy, and contriving an invalidation through an ARG is obtuse and unintuitive. I imagine users want to use these "workarounds" about as much as Docker developers want to officially incorporate a sloppy hack into the tool.

It's not hard: #10682
mohanraj-r commented Sep 24, 2013
Branching off the discussion from #1384:
I understand --no-cache will disable caching for the entire Dockerfile. But it would be useful if I could disable the cache for a specific RUN command, for example updating repos or downloading a remote file. From my understanding, right now a cached RUN apt-get update wouldn't actually update the repo? So the results would differ from those on a VM?
If disabling caching for specific commands in the Dockerfile is made possible, would the subsequent commands in the file then not use the cache? Or would they do something a bit more intelligent, e.g. use the cache if the previous command produced the same results (fs layer) when compared to a previous run?
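On the apt-get update question in the original post: a cached RUN apt-get update does reuse the old package index. A common mitigation, recommended in Docker's Dockerfile best-practices guide, is to run update and install in the same RUN so that editing the package list also refreshes the index. It does not force a refresh when nothing in that line changes; that still needs --no-cache or a build-arg cache buster. The base image, package names, and write_apt_dockerfile are assumptions.

```shell
#!/usr/bin/env bash
# Writes a Dockerfile into directory $1 in which "apt-get update" can never
# be served from a stale cache layer on its own, because it shares a RUN
# with the install step. Base image and packages are illustrative.

write_apt_dockerfile() {
  cat > "$1/Dockerfile" <<'EOF'
FROM debian:stable
# update and install share one RUN: editing the package list changes this
# instruction, so the index is refreshed in the same step.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl git \
 && rm -rf /var/lib/apt/lists/*
EOF
}

# Example usage:
#   dir="$(mktemp -d)"; write_apt_dockerfile "$dir"
#   docker build -t my-image "$dir"
```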