Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make build cache ignore mtime #12031

Merged
merged 2 commits into from
Jun 9, 2015
Merged

Conversation

jlhawn
Copy link
Contributor

@jlhawn jlhawn commented Apr 2, 2015

Build cache uses pgk/tarsum to get a digest of content which is
ADD'd or COPY'd during a build. The builder has always used v0 of
the tarsum algorithm which includes mtimes however since the whole
file is hashed anyway, the mtime doesn't really provide any extra
information about whether the file has changed and many version
control tools like Git strip mtime from files when they are cloned.

This patch updates the build subsystem to use v1 of Tarsum which
explicitly ignores mtime when calculating a digest. Now ADD and
COPY will result in a cache hit if only the mtime and not the file
contents have changed.

NOTE: Tarsum is NOT a meant to be a cryptographically secure hash
function. It is a best-effort approach to determining if two sets of
filesystem content are different.

@jlhawn
Copy link
Contributor Author

jlhawn commented Apr 2, 2015

@jfrazelle @icecrime I'm not sure if someone has already submitted a patch like this. It seems like an optional flag to ADD/COPY is also a popular idea: #4351

@duglin
Copy link
Contributor

duglin commented Apr 2, 2015

ping @erikh @tiborvass
I'm worried about breaking backwards compat on this. So, perhaps we could control this feature via a cmd flag on ADD/COPY once #10775 is merged.

@jlhawn
Copy link
Contributor Author

jlhawn commented Apr 2, 2015

@duglin

I'm worried about breaking backwards compat on this.

By backwards compat do you mean how it would result in a cache miss the first time people use it?

So, perhaps we could control this feature via a cmd flag on ADD/COPY

I'm agreeable to that. I can update this PR with some conditionals for using tarsum.v0 or tarsum.v1

@cpuguy83
Copy link
Member

cpuguy83 commented Apr 2, 2015

I know @tianon has given an explanation as to why just disabling it wouldn't be good, but I can never remember the reason.

I don't know a single person that likes the fact that mtime is used.

@duglin
Copy link
Contributor

duglin commented Apr 2, 2015

@jlhawn yup - just worried about people who might count on today's behavior.

@duglin
Copy link
Contributor

duglin commented Apr 2, 2015

e.g. haven't had to do this in a while but I know in a past life I would just "touch ..." something to force a rebuild of it - so a content-only check would break those cases.

@cpuguy83
Copy link
Member

cpuguy83 commented Apr 2, 2015

I guess you can currently do something like touch cachebreaker to deliberatly break the cache

@jlhawn
Copy link
Contributor Author

jlhawn commented Apr 2, 2015

@duglin couldn't you just use --no-cache? Though it may be a sledge hammer for a small nail if you just wanted to invalidate one cached layer.

@duglin
Copy link
Contributor

duglin commented Apr 2, 2015

@jlhawn yep exactly, its a bit of an overkill hammer. Kind of reminds me of the discussion in #10682

@phemmer
Copy link
Contributor

phemmer commented Apr 3, 2015

Instead of a flag on ADD/COPY, what about a flag on docker build? So far none of the Dockerfile commands have flags. While I'm not opposed to them, I feel like they shouldn't be added unless needed.

To me, a flag on docker build even makes more sense. If you're working in a repo (build context) that doesn't preserve mtimes, it's going to be mtimes for every file in the repo that isn't trustworthy, not just one or two files.

@icecrime
Copy link
Contributor

icecrime commented Apr 3, 2015

I like the proposed solution, I'm not sure I see the point of an extra flag for this.

However, this breaks some tests:

--- FAIL: TestBuildADDCurrentDirWithCache (2.04s)
    docker_cli_build_test.go:3033: The cache should have been invalided but hasn't.
--- FAIL: TestBuildADDRemoteFileMTime (2.59s)
    docker_cli_build_test.go:3201: The cache should not have been used but was

@duglin
Copy link
Contributor

duglin commented Apr 3, 2015

@phemmer a flag on docker build is possible as long as people are comfortable with it applying to all ADD/COPY commands and don't need to be selective. Having said that though, I worry about forcing the person issuing the docker build to remember to include this flag each time. I suspect that many times the author of the Dockerfile knows about the issue and will want some way to control it from within the Dockerfile to ensure that all users of the Dockerfile get the same caching behavior.
Of course having to specify it on each ADD/COPY is annoying (In practice I do wonder whether it would really be needed on all or just a small handful). In some other previous issue/PR, the idea of setting global flags from within the Dockerfile was mentioned, perhaps this might be a good usecase for it?

@icecrime
Copy link
Contributor

Ping @jlhawn: can you please fix the tests?

@duglin
Copy link
Contributor

duglin commented Apr 16, 2015

@icecrime can we get #10775 behind us then we can explore all of the options available for features like this mtime one? I'm not sure we should change the current default behavior, so I think leveraging CMD flags might be a better choice. So, I'm not sure there's a point in tweaking code in PRs like this if we may go a different direction after #10775

@stevvooe
Copy link
Contributor

👍 for the design. The build cache should use the latest version of tarsum. The debate over whether or not mtime should apply belongs in a tarsum issue.

@LK4D4
Copy link
Contributor

LK4D4 commented May 12, 2015

ping @jlhawn looks like tests still failing

@jlhawn
Copy link
Contributor Author

jlhawn commented May 12, 2015

@LK4D4 @icecrime @duglin sorry, I forgot about this PR. Do we want to go with this solution? or should I close it in favor of the cache-strategy flag approach?

@LK4D4
Copy link
Contributor

LK4D4 commented May 12, 2015

I'd go with this.
ping @docker/core-maintainers

@duglin
Copy link
Contributor

duglin commented May 12, 2015

I don't see how we can go with this since it changes the behavior people will see. For example, if I touch a file in my build context to force a cache miss on a COPY/ADD, this PR will break me. I would prefer if we offered a flag on ADD/COPY to allow people to opt-in to the new behavior.

@phemmer
Copy link
Contributor

phemmer commented May 12, 2015

it changes the behavior people will see

Does docker have a compatibility guarantee between releases? Are we locked into this behavior until docker 2.0?

@duglin
Copy link
Contributor

duglin commented May 12, 2015

Don't know if its written down but its generally good practice not to change semantics on people in minor (.x) releases.

@tiborvass
Copy link
Contributor

@phemmer if we decide we want this, then it would be in a minor Docker release (next one being 1.7).

EDIT: 1.7 is a minor release

@LK4D4
Copy link
Contributor

LK4D4 commented May 13, 2015

I definitely prefer current behavior over flags to instructions. But I prefer tarsum.V2 over both.

@stevvooe
Copy link
Contributor

Note that tarsum v0 also doesn't take into account xattrs, so this could be considered a bug.

@thaJeztah
Copy link
Member

if I touch a file in my build context to force a cache miss on a COPY/ADD

Well, you could still date >> foobar in stead of touch. However, IMO the cache should only break if the build-context is actually different. I don't think mtime should be taken into account. Apart from "manually" breaking the cache, do we know an actual use-case where the mtime should break the case (i.e., is important for the resulting image). If there are cases, I think they are very exceptional. We could always add an --use-mtime option :)

Note that tarsum v0 also doesn't take into account xattrs, so this could be considered a bug.

Wasn't there an issue with AUFS not supporting xattrs? Or does my memory fail me here?

thaJeztah added a commit that referenced this pull request Jun 9, 2015
@thaJeztah thaJeztah merged commit 90259fe into moby:master Jun 9, 2015
@cpuguy83
Copy link
Member

cpuguy83 commented Jun 9, 2015

@thaJeztah
Copy link
Member

\o/ don't know when it happened, lost track of mtime 😄

@srcspider
Copy link

What release is this scheduled for?

[edit] nvm just now noticed the 1.8 milestone at the top

seriema added a commit to seriema/retro-cloud that referenced this pull request Mar 27, 2020
COPY has a tendency to think that the file changed and invalidating all
the following steps. Since this COPY is for the user creation it's very
early in the Dockerfile, and even with the fat cache layer before it it
still adds a 20min rebuild way too often.

It's possible that the logs in docker/logs is causing the cache
invalidation, but it's happening too often in all branches.

Some references:
* https://stackoverflow.com/questions/48551953/why-does-my-docker-cache-get-invalidated-by-this-copy-command
* Request for reason of cache invalidation: moby/moby#9294
* COPY invalidates cache: moby/moby#21913
* Timestamp part of hash: moby/moby#9391
** Fixed in moby/moby#12031
seriema added a commit to seriema/retro-cloud that referenced this pull request Mar 27, 2020
COPY has a tendency to think that the file changed and invalidating all
the following steps. Since this COPY is for the user creation it's very
early in the Dockerfile, and even with the fat cache layer before it it
still adds a 20min rebuild way too often.

It's possible that the logs in docker/logs is causing the cache
invalidation, but it's happening too often in all branches.

Some references:
* https://stackoverflow.com/questions/48551953/why-does-my-docker-cache-get-invalidated-by-this-copy-command
* Request for reason of cache invalidation: moby/moby#9294
* COPY invalidates cache: moby/moby#21913
* Timestamp part of hash: moby/moby#9391
** Fixed in moby/moby#12031
seriema added a commit to seriema/retro-cloud that referenced this pull request Mar 27, 2020
COPY has a tendency to think that the file changed and invalidating all
the following steps. Since this COPY is for the user creation it's very
early in the Dockerfile, and even with the fat cache layer before it it
still adds a 20min rebuild way too often.

It's possible that the logs in docker/logs is causing the cache
invalidation, but it's happening too often in all branches.

Some references:
* https://stackoverflow.com/questions/48551953/why-does-my-docker-cache-get-invalidated-by-this-copy-command
* Request for reason of cache invalidation: moby/moby#9294
* COPY invalidates cache: moby/moby#21913
* Timestamp part of hash: moby/moby#9391
** Fixed in moby/moby#12031
seriema added a commit to seriema/retro-cloud that referenced this pull request Mar 28, 2020
COPY has a tendency to think that the file changed and invalidating all
the following steps. Since this COPY is for the user creation it's very
early in the Dockerfile, and even with the fat cache layer before it it
still adds a 20min rebuild way too often.

It's possible that the logs in docker/logs is causing the cache
invalidation, but it's happening too often in all branches.

Some references:
* https://stackoverflow.com/questions/48551953/why-does-my-docker-cache-get-invalidated-by-this-copy-command
* Request for reason of cache invalidation: moby/moby#9294
* COPY invalidates cache: moby/moby#21913
* Timestamp part of hash: moby/moby#9391
** Fixed in moby/moby#12031
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.