Improved build cache invalidation with content hashes #3705
Fixes #3145.
The current build cache invalidation algorithm compares the input
timestamps to the output timestamps, and only triggers a rebuild if the
input timestamps are newer than the output timestamps. However, this
does not appear to be sufficient: as discussed in #3145, we think that
the reason that doing `rm -r output` often fixes weird compile errors is
that we should really be considering the input file to have changed if
its timestamp is different to what it was at the last successful
build, regardless of whether it is before or after the output timestamp.
Essentially, timestamps on input files can't be trusted to the extent
that we do for cache invalidation, because of things like switching
between different versions of dependencies or switching branches;
sometimes you can have an input file's contents and timestamp both
change, but have the timestamp still be older than the output timestamp.
This commit implements a slightly different cache invalidation
algorithm, where we make a note of the timestamps of all input files at
the start of each build, and we consider files to have changed in
subsequent builds if their input timestamps have changed at all
(regardless of whether the new input timestamps are before or after the
output timestamps).
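To make the difference between the two checks concrete, here is a minimal sketch in Python (the compiler itself is written in Haskell, so this is illustrative only). The function names and the numeric timestamps are hypothetical and not taken from this PR.

```python
from typing import Optional


def stale_by_output_comparison(input_mtime: float, output_mtime: float) -> bool:
    """Old check: rebuild only if the input is newer than the output."""
    return input_mtime > output_mtime


def stale_by_recorded_timestamp(input_mtime: float,
                                recorded_mtime: Optional[float]) -> bool:
    """New check: rebuild if the input timestamp differs at all from the one
    noted at the start of the previous build (or if nothing was noted)."""
    return recorded_mtime is None or input_mtime != recorded_mtime


# Hypothetical scenario: after switching branches, the input file has a
# different timestamp, but one that is still older than the output.
recorded_mtime = 100.0   # input timestamp noted at the previous build
output_mtime = 200.0     # when the output was written
new_input_mtime = 150.0  # input timestamp after the branch switch

old_result = stale_by_output_comparison(new_input_mtime, output_mtime)
new_result = stale_by_recorded_timestamp(new_input_mtime, recorded_mtime)
```

In this scenario the old check reports the module as up to date, while the new check flags it for a rebuild.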
The timestamps are stored in a JSON file, `cache-db.json`, in the output
directory; I also considered putting the timestamps in the externs
files, but I think having them stored separately is preferable because
then we don't have to update the module's externs file if its input file
timestamp changes but its hash doesn't, which means that we don't force
a rebuild for downstream modules.
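As a rough illustration of keeping this information in its own file, here is a small Python sketch that reads and writes a `cache-db.json` in an output directory. The JSON layout (module name, then input path, then a `timestamp` field), the helper names, and the sample values are all assumptions made for the example; the actual format used by the compiler may differ.

```python
import json
import os
import tempfile

CACHE_DB_NAME = "cache-db.json"


def read_cache_db(output_dir):
    """Load the recorded timestamps, or an empty database if none exists yet."""
    try:
        with open(os.path.join(output_dir, CACHE_DB_NAME), encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def write_cache_db(output_dir, db):
    """Write the database; no externs file is touched by this step."""
    with open(os.path.join(output_dir, CACHE_DB_NAME), "w", encoding="utf-8") as f:
        json.dump(db, f, indent=2, sort_keys=True)


with tempfile.TemporaryDirectory() as output_dir:
    empty_db = read_cache_db(output_dir)  # first build: nothing recorded yet
    # Hypothetical layout and values, for illustration only.
    db = {"Main": {"src/Main.purs": {"timestamp": "2019-08-01T12:00:00Z"}}}
    write_cache_db(output_dir, db)
    round_tripped = read_cache_db(output_dir)
```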
As an additional enhancement, we also make note of file content hashes
and store them in the `cache-db.json` file. On subsequent builds, if
timestamps have changed, we compare the previous hash to the new hash,
and if they are identical, we can skip rebuilding the module. This means
that e.g. touching a file no longer forces a rebuild. Note that we only
compute hashes in the case where timestamps differ to avoid doing extra
unnecessary work. This scheme of checking timestamps and then hashes was
inspired by Shake, which provides this mechanism as one of its options
for `Change`; see #3145 (comment).
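The timestamp-then-hash check described above can be sketched as follows. This is a Python illustration, not the compiler's Haskell code; the record shape `(mtime_ns, digest)`, the function names, and the choice of SHA-256 are assumptions for the example.

```python
import hashlib
import os
import tempfile
from typing import Optional, Tuple

# Hypothetical record kept per input file: (modification time in ns, hex digest).
Record = Tuple[int, str]


def content_hash(path: str) -> str:
    """Hash the file contents (SHA-256 is an assumption of this sketch)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def input_changed(path: str, recorded: Optional[Record]) -> Tuple[bool, Record]:
    """Return (changed, record to store for the next build)."""
    mtime = os.stat(path).st_mtime_ns
    if recorded is not None and recorded[0] == mtime:
        # Timestamp unchanged: skip hashing entirely.
        return False, recorded
    # Timestamp differs (or nothing recorded): only now compute the hash.
    digest = content_hash(path)
    changed = recorded is None or recorded[1] != digest
    return changed, (mtime, digest)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "Main.purs")
    with open(src, "w", encoding="utf-8") as f:
        f.write("module Main where\n")
    first_changed, record = input_changed(src, None)

    # "touch": new timestamp, same contents.
    os.utime(src, ns=(1_000_000_000, 1_000_000_000))
    touched_changed, record = input_changed(src, record)

    # Real edit: new timestamp and new contents.
    with open(src, "w", encoding="utf-8") as f:
        f.write("module Main where\n\nx = 1\n")
    os.utime(src, ns=(2_000_000_000, 2_000_000_000))
    edited_changed, record = input_changed(src, record)
```

Note that the touched file still gets its stored timestamp updated, so the next build can return early on the timestamp comparison without hashing again.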
I've also added some tests so that we can make changes to this part of
the compiler a little more confidently.
I'm using the latest version of `these` (which is not in our Stack
snapshot) because it doesn't incur a `lens` dependency, whereas earlier
versions do.