Make build cache ignore mtime #12031
@jfrazelle @icecrime I'm not sure if someone has already submitted a patch like this. It seems like an optional flag to ADD/COPY is also a popular idea: #4351
ping @erikh @tiborvass
By backwards compat do you mean how it would result in a cache miss the first time people use it?
I'm agreeable to that. I can update this PR with some conditionals for using tarsum.v0 or tarsum.v1.
I know @tianon has given an explanation as to why just disabling it wouldn't be good, but I can never remember the reason. I don't know a single person who likes the fact that mtime is used.
@jlhawn yup, just worried about people who might count on today's behavior.
E.g., I haven't had to do this in a while, but I know in a past life I would just "touch ..." something to force a rebuild of it, so a content-only check would break those cases.
I guess you can currently do something like
@duglin couldn't you just use
Instead of a flag on ADD/COPY, what about a flag on To me, a flag on
I like the proposed solution; I'm not sure I see the point of an extra flag for this. However, this breaks some tests:
@phemmer a flag on
Ping @jlhawn: can you please fix the tests?
@icecrime can we get #10775 behind us first? Then we can explore all of the options available for features like this mtime one. I'm not sure we should change the current default behavior, so I think leveraging CMD flags might be a better choice. So I'm not sure there's a point in tweaking code in PRs like this if we may go in a different direction after #10775.
👍 for the design. The build cache should use the latest version of tarsum. The debate over whether or not mtime should apply belongs in a tarsum issue.
ping @jlhawn, looks like tests are still failing
I'd go with this.
I don't see how we can go with this, since it changes the behavior people will see. For example, if I touch a file in my build context to force a cache miss on a COPY/ADD, this PR will break me. I would prefer if we offered a flag on ADD/COPY to allow people to opt in to the new behavior.
Does Docker have a compatibility guarantee between releases? Are we locked into this behavior until Docker 2.0?
Don't know if it's written down, but it's generally good practice not to change semantics on people in minor (.x) releases.
@phemmer if we decide we want this, then it would be in a minor Docker release (next one being 1.7). EDIT: 1.7 is a minor release
I definitely prefer current behavior over flags to instructions. But I prefer
Note that tarsum v0 also doesn't take xattrs into account, so this could be considered a bug.
Well, you could still
Wasn't there an issue with AUFS not supporting xattrs? Or does my memory fail me here?
Make build cache ignore mtime
\o/ don't know when it happened, lost track of mtime 😄
What release is this scheduled for? [edit] nvm just now noticed the 1.8 milestone at the top |
COPY tends to decide that the file has changed and invalidate all the following steps. Since this COPY handles user creation, it comes very early in the Dockerfile, and even with the fat cache layer before it, it still adds a 20-minute rebuild far too often. It's possible that the logs in docker/logs are causing the cache invalidation, but it happens too often, in all branches. Some references:
* https://stackoverflow.com/questions/48551953/why-does-my-docker-cache-get-invalidated-by-this-copy-command
* Request for reason of cache invalidation: moby/moby#9294
* COPY invalidates cache: moby/moby#21913
* Timestamp part of hash: moby/moby#9391
** Fixed in moby/moby#12031
The build cache uses pkg/tarsum to get a digest of content that is
ADD'd or COPY'd during a build. The builder has always used v0 of
the tarsum algorithm, which includes mtimes. However, since the whole
file is hashed anyway, the mtime doesn't really provide any extra
information about whether the file has changed, and many version
control tools like Git do not preserve mtimes when files are cloned.
This patch updates the build subsystem to use v1 of tarsum, which
explicitly ignores mtime when calculating a digest. Now ADD and
COPY will result in a cache hit if only the mtime, and not the file
contents, has changed.
NOTE: Tarsum is NOT meant to be a cryptographically secure hash
function. It is a best-effort approach to determining whether two sets
of filesystem content are different.