Expose clobberTimes as a flag to ADD instruction #4351

Closed
bjaglin opened this Issue Feb 26, 2014 · 23 comments

Comments

Projects
None yet
@bjaglin
Contributor

bjaglin commented Feb 26, 2014

In my scenario, I am ADDing a bunch of libraries packaged as jars, copied over in a target directory during build time. I would like that instruction to be cached across builds (unless new libraries are added obviously, which doesn't happen very often).

When ADDing a URL in a Dockerfile, ctime, atime and mtime are not considered for the hash (see https://github.com/dotcloud/docker/pull/2809/files#diff-d8fbf83f530e600c64298c1b18b04d4aR411). It would be interesting to expose a flag controlling that clobberTimes for local files/directories, to be able to utilize caches in cases where the metadata of a file doesn't matter. I am thinking about adding a --ignore-times or --clober-times flag to the ADD instruction but I can't see any option in the current set of instructions, so it might be considered an anti-pattern. For the effect of that flag to be completely consistent across image builders, the times should also be reset to a default value (0?), so that the use of this option produces the exact same image no matter what the times on the filesystem of the host building the image are.

I am willing to contribute a patch if there is some interest, and some agreement on the exact naming/semantics of that flag.

Running Docker 0.8.1 on Ubuntu 14.04.

@bjaglin

This comment has been minimized.

Show comment
Hide comment
@bjaglin

bjaglin Feb 26, 2014

Contributor

Improvement over #2809

Contributor

bjaglin commented Feb 26, 2014

Improvement over #2809

@tianon

This comment has been minimized.

Show comment
Hide comment
@tianon

tianon Feb 26, 2014

Member

+1 to having it be something optional, that solves the debate very nicely IMO :)

Member

tianon commented Feb 26, 2014

+1 to having it be something optional, that solves the debate very nicely IMO :)

@bjaglin

This comment has been minimized.

Show comment
Hide comment
@bjaglin

bjaglin Mar 31, 2014

Contributor

Duplicate of #3699

Contributor

bjaglin commented Mar 31, 2014

Duplicate of #3699

@bjaglin

This comment has been minimized.

Show comment
Hide comment
@bjaglin

bjaglin Apr 7, 2014

Contributor

Reopening as the issue this was duplicated on was more of a (now fixed) bug than a feature request.

Contributor

bjaglin commented Apr 7, 2014

Reopening as the issue this was duplicated on was more of a (now fixed) bug than a feature request.

@amuino

This comment has been minimized.

Show comment
Hide comment
@amuino

amuino Apr 7, 2014

Contributor

If the maintainers think that the current behavior is the desired one, I'd take the optional flag option (although I'd go beyond times and just use the content sum, ignoring all metadata, specially given there are a number of other bugs related to file metadata normalization)

Contributor

amuino commented Apr 7, 2014

If the maintainers think that the current behavior is the desired one, I'd take the optional flag option (although I'd go beyond times and just use the content sum, ignoring all metadata, specially given there are a number of other bugs related to file metadata normalization)

@amuino

This comment has been minimized.

Show comment
Hide comment
@amuino

amuino Aug 15, 2014

Contributor

The flag should probably apply to COPY too.

Contributor

amuino commented Aug 15, 2014

The flag should probably apply to COPY too.

@bjaglin

This comment has been minimized.

Show comment
Hide comment
@bjaglin

bjaglin Sep 11, 2014

Contributor

Related to #7387

Contributor

bjaglin commented Sep 11, 2014

Related to #7387

@boushley

This comment has been minimized.

Show comment
Hide comment
@boushley

boushley Sep 12, 2014

Any update on what flags might look like to these commands or progress on the implementation?

If they really want to avoid flags we could add new commands that are equivalent to COPY and ADD with the differences in how caching is determined. Can't think of a great name now but maybe ADD-CONTENTS and COPY-CONTENTS to be clear that it's just the contents of the files that matter.

Any update on what flags might look like to these commands or progress on the implementation?

If they really want to avoid flags we could add new commands that are equivalent to COPY and ADD with the differences in how caching is determined. Can't think of a great name now but maybe ADD-CONTENTS and COPY-CONTENTS to be clear that it's just the contents of the files that matter.

@duglin

This comment has been minimized.

Show comment
Hide comment
@duglin

duglin Sep 30, 2014

Contributor

@vbatts @crosbymichael @tianon
Wasn't sure if I should put the comment here or in #7387 but I was wondering if it wouldn't be a good idea to allow the user some level of control here instead of changing the tarsum for everyone. So, let the default tarsum work as it does today (meaning include 'mtime'), but allow people to specify some options on ADD/COPY to tell us to be less picky - e.g.:
ADD --ignore-times foo /tmp/
I don't think this would be too hard to add (I could work on the PR) and doesn't break backwards compat.
The downside to this is that I don't think we have any other Dockerfile commands that takes options like this, but I could be wrong, and does this introduce complexity or open the door for better things in the future?

Contributor

duglin commented Sep 30, 2014

@vbatts @crosbymichael @tianon
Wasn't sure if I should put the comment here or in #7387 but I was wondering if it wouldn't be a good idea to allow the user some level of control here instead of changing the tarsum for everyone. So, let the default tarsum work as it does today (meaning include 'mtime'), but allow people to specify some options on ADD/COPY to tell us to be less picky - e.g.:
ADD --ignore-times foo /tmp/
I don't think this would be too hard to add (I could work on the PR) and doesn't break backwards compat.
The downside to this is that I don't think we have any other Dockerfile commands that takes options like this, but I could be wrong, and does this introduce complexity or open the door for better things in the future?

@techniq techniq referenced this issue in dokku/dokku Feb 19, 2015

Closed

Dockerfile support #791

@jessfraz

This comment has been minimized.

Show comment
Hide comment
@jessfraz

jessfraz Feb 25, 2015

Contributor

@duglin didnt we add mtime

Contributor

jessfraz commented Feb 25, 2015

@duglin didnt we add mtime

@duglin

This comment has been minimized.

Show comment
Hide comment
@duglin

duglin Feb 26, 2015

Contributor

@jfrazelle yes mtime is considered part of the cache check. Once #10775 is merged (hint hint :-) ) then we can look at how to add flags like this - its a very popular request.

Contributor

duglin commented Feb 26, 2015

@jfrazelle yes mtime is considered part of the cache check. Once #10775 is merged (hint hint :-) ) then we can look at how to add flags like this - its a very popular request.

@Silex

This comment has been minimized.

Show comment
Hide comment
@Silex

Silex Mar 15, 2015

I was asked to merge my issue into this one, my proposal was like so:

--cache-strategy timestamp|sha1sum|md5sum

timestamp (default): use the modification time of the file (fast). Only cares about file timestamp.
sha1sum: use sha1sum to determinate if file has changed. Only cares about file contents.
md5sum: use md5sum to determinate if file has changed (slow). Only cares about file contents.

Silex commented Mar 15, 2015

I was asked to merge my issue into this one, my proposal was like so:

--cache-strategy timestamp|sha1sum|md5sum

timestamp (default): use the modification time of the file (fast). Only cares about file timestamp.
sha1sum: use sha1sum to determinate if file has changed. Only cares about file contents.
md5sum: use md5sum to determinate if file has changed (slow). Only cares about file contents.
@wrossmck

This comment has been minimized.

Show comment
Hide comment
@rflynn

This comment has been minimized.

Show comment
Hide comment

rflynn commented Mar 31, 2015

+1

@soupdiver

This comment has been minimized.

Show comment
Hide comment

I also like #4351 (comment)
👍

@UnquietCode

This comment has been minimized.

Show comment
Hide comment
@UnquietCode

UnquietCode May 11, 2015

I'm digging the idea of a cache strategy flag (comment above) but would suggest that the default be set to use the checksum. Also, supporting and exposing different hash algorithms seems unnecessary and confusing.

Thus:

--cache-strategy checksum|timestamp

checksum: (default) use a hashing algorithm to determinate if file has changed. Only cares about file contents.
timestamp:  use the modification time of the file (fast). Only cares about file timestamp.

Where timestamp would probably use mtime and could make use of HTTP headers and other sources of timing data.

I'm digging the idea of a cache strategy flag (comment above) but would suggest that the default be set to use the checksum. Also, supporting and exposing different hash algorithms seems unnecessary and confusing.

Thus:

--cache-strategy checksum|timestamp

checksum: (default) use a hashing algorithm to determinate if file has changed. Only cares about file contents.
timestamp:  use the modification time of the file (fast). Only cares about file timestamp.

Where timestamp would probably use mtime and could make use of HTTP headers and other sources of timing data.

@duglin

This comment has been minimized.

Show comment
Hide comment
@duglin

duglin May 11, 2015

Contributor

I'm +1 to allowing the user to specify some flag to tell the builder to use checksums only and not timestamps. I think enough people have complained that we need to do something to help them.

I'm -1 to the idea of changing the defaults though. Unless we can be 100% sure that no one is counting on today's behavior (which I don't think we can do), I think changing the default on people would be a mistake. Eg. I can see people using 'touch' to force a cache miss - I think we'd break them.

One thing that's less clear to me what the flag should look like. --cache-strategy is certainly the proper technical description of it, but... is it too technical (geeky)? Do we expect to have lots of new caching strategies in the future? If so, then that flag make sense. But if not then I'm leaning more towards something like "--ignore-mtime" which is more restrictive but also more user-friendly.

Contributor

duglin commented May 11, 2015

I'm +1 to allowing the user to specify some flag to tell the builder to use checksums only and not timestamps. I think enough people have complained that we need to do something to help them.

I'm -1 to the idea of changing the defaults though. Unless we can be 100% sure that no one is counting on today's behavior (which I don't think we can do), I think changing the default on people would be a mistake. Eg. I can see people using 'touch' to force a cache miss - I think we'd break them.

One thing that's less clear to me what the flag should look like. --cache-strategy is certainly the proper technical description of it, but... is it too technical (geeky)? Do we expect to have lots of new caching strategies in the future? If so, then that flag make sense. But if not then I'm leaning more towards something like "--ignore-mtime" which is more restrictive but also more user-friendly.

@dreamcat4

This comment has been minimized.

Show comment
Hide comment
@dreamcat4

dreamcat4 May 11, 2015

Hello. Please make sure the solution works with not downloading the github releases every time. That website is very popular souce to ADD tarballs from. Thank you.

Hello. Please make sure the solution works with not downloading the github releases every time. That website is very popular souce to ADD tarballs from. Thank you.

@wrossmck

This comment has been minimized.

Show comment
Hide comment
@wrossmck

wrossmck May 11, 2015

@duglin I'd argue that --cache-strategy is more user friendly than --ignore-mtime, and there are the obvious benefits of not being locked into a specific flip-flop flag.

+1 for --cache-strategy
+1 for not changing the defaults

@duglin I'd argue that --cache-strategy is more user friendly than --ignore-mtime, and there are the obvious benefits of not being locked into a specific flip-flop flag.

+1 for --cache-strategy
+1 for not changing the defaults

@dreamcat4

This comment has been minimized.

Show comment
Hide comment
@dreamcat4

dreamcat4 May 11, 2015

Why not just abbreviate --cache-strategy to --cache for short? Then:

ADD --cache=checksum https://github.com/account/repo/.../archive.tgz archive.tgz
# or
ADD --cache=mtime https://github.com/account/repo/.../archive.tgz archive.tgz

Oh wait. Did I mention that I also need that in conjunction with a --extract flag all on the same line?

ADD --cache=mtime --extract https://github.com/account/repo/.../archive.tgz archive.tgz
# looks better than
ADD --cache-strategy=mtime --extract https://github.com/account/repo/.../archive.tgz archive.tgz

Unnecessarily long / verbose arguments like --cache-strategy will ultimately make the Dokerfile more trouble to read / bit gets scrolled that much extra off the page.

Why not just abbreviate --cache-strategy to --cache for short? Then:

ADD --cache=checksum https://github.com/account/repo/.../archive.tgz archive.tgz
# or
ADD --cache=mtime https://github.com/account/repo/.../archive.tgz archive.tgz

Oh wait. Did I mention that I also need that in conjunction with a --extract flag all on the same line?

ADD --cache=mtime --extract https://github.com/account/repo/.../archive.tgz archive.tgz
# looks better than
ADD --cache-strategy=mtime --extract https://github.com/account/repo/.../archive.tgz archive.tgz

Unnecessarily long / verbose arguments like --cache-strategy will ultimately make the Dokerfile more trouble to read / bit gets scrolled that much extra off the page.

@j0hnsmith

This comment has been minimized.

Show comment
Hide comment
@Silex

This comment has been minimized.

Show comment
Hide comment
@Silex

Silex Jun 15, 2015

Using --cache instead of --cache-strategy looks good. When I suggested it I didn't put that much thought into it, improvements are always welcome.

My original suggestion was docker build --cache=md5 -t foo, but having it tunable for ADD or COPY statements looks even more flexible... still what counts for me is to have the global docker build flag, because it's rare that you need a flag for one copy and a different one for another.

Oh, I'm also fine with options for such flag to be timestamp and checksum instead of specific implementations.

Silex commented Jun 15, 2015

Using --cache instead of --cache-strategy looks good. When I suggested it I didn't put that much thought into it, improvements are always welcome.

My original suggestion was docker build --cache=md5 -t foo, but having it tunable for ADD or COPY statements looks even more flexible... still what counts for me is to have the global docker build flag, because it's rare that you need a flag for one copy and a different one for another.

Oh, I'm also fine with options for such flag to be timestamp and checksum instead of specific implementations.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Jun 15, 2015

Member

We recently merged a pull-request that makes the cache ignore mtime; #12031. After some discussion, it was considered to be a bug that mtime was taken into account to invalidate the cache (#12031 (comment)).

Unfortunately the PR didn't make it in time for the 1.7 release, but it should be in the 1.8 release.

This means that with the 1.8 release, mtime will be ignored for local files.

For people "mis-using" that behavior to deliberately bust the cache, this can be a breaking change, and a different approach should be taken.

I'm going to close this issue, because the "mtime" bug is now resolved in #12031

If there's still a need to have more fine-grained control over cache strategies (as discussed above), please open a new issue.

Member

thaJeztah commented Jun 15, 2015

We recently merged a pull-request that makes the cache ignore mtime; #12031. After some discussion, it was considered to be a bug that mtime was taken into account to invalidate the cache (#12031 (comment)).

Unfortunately the PR didn't make it in time for the 1.7 release, but it should be in the 1.8 release.

This means that with the 1.8 release, mtime will be ignored for local files.

For people "mis-using" that behavior to deliberately bust the cache, this can be a breaking change, and a different approach should be taken.

I'm going to close this issue, because the "mtime" bug is now resolved in #12031

If there's still a need to have more fine-grained control over cache strategies (as discussed above), please open a new issue.

@thaJeztah thaJeztah closed this Jun 15, 2015

@ejholmes ejholmes referenced this issue in remind101/docker-build Jul 8, 2015

Closed

Faster #3

1 of 2 tasks complete

@CaseyLeask CaseyLeask referenced this issue in delfick/harpoon Sep 26, 2016

Open

Retire gitmit #22

@lanoxx lanoxx referenced this issue in spotify/docker-maven-plugin Jan 12, 2017

Closed

File timestamp set to Jan 1, 1970 #58

@thaJeztah thaJeztah added this to the 1.8.0 milestone Jan 13, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment