Rework cache to key on hash of file contents instead of mtime #3437
Attempt to fix #3403
At present, I'm only adding the ability to validate the cache using a hash when the module file's mtime doesn't match the mtime stored in the meta file.
However, I'm NOT getting rid of the use of
To verify that the approach works, I first create a failing test by changing the test runner to touch all the source files before running mypy for incremental tests. Then, in the second commit, I fix the broken tests.
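For anyone following along, here is a minimal sketch of what "touching" the source files could look like. It is not the code from this PR: `touch_all` and its `offset_seconds` parameter are hypothetical names, and the only real API used is `os.utime`. The idea is to move each file's mtime forward without changing its contents, so that an mtime-only cache check reports the file as stale.

```python
import os
import time
from typing import Iterable


def touch_all(paths: Iterable[str], offset_seconds: float = 1.0) -> None:
    """Bump atime and mtime of each file without modifying its contents.

    Hypothetical helper for illustration; the offset pushes the timestamp
    past the original value even on filesystems with coarse resolution.
    """
    new_time = time.time() + offset_seconds
    for path in paths:
        os.utime(path, (new_time, new_time))
```

After such a call the file bytes are identical, so a content-hash check should still accept the cache while an mtime comparison would reject it.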
How this whole thing should be tested is something that I'd like feedback on (running the tests twice is too time consuming I think?). As it stands, this PR only tests the new approach, and loses the tests for the old approach (with mtime/size). Maybe some kind of combination of the two would be good?
@gvanrossum I added the optimization along the lines you suggested. Now whenever the source code is parsed, we also calculate the source hash (based on the byte representation).
Actually, we need to calculate the source hash in two places: once when parsing the source file, and once when checking whether the cache is in sync with the source. In the first case, we can do it without an extra file read; in the second, we do need to read the file again (assuming mtime/size didn't already give us a positive answer).
The new commit implements this optimization.
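Roughly, the two places look like the sketch below. This is a simplified illustration rather than the actual code in the commit: `CacheMeta`, `compute_hash`, `parse_source`, and `is_cache_fresh` are hypothetical names, and SHA-256 is an arbitrary choice of hash function for the example.

```python
import hashlib
import os
from typing import NamedTuple, Tuple


class CacheMeta(NamedTuple):
    # Hypothetical stand-in for the fields stored in the cache meta file.
    mtime: int
    size: int
    source_hash: str


def compute_hash(data: bytes) -> str:
    # Hash of the byte representation of the source (SHA-256 is assumed here).
    return hashlib.sha256(data).hexdigest()


def parse_source(path: str) -> Tuple[str, CacheMeta]:
    """Place 1: the bytes are already in memory, so hashing needs no extra read."""
    with open(path, "rb") as f:
        data = f.read()
    st = os.stat(path)
    meta = CacheMeta(int(st.st_mtime), st.st_size, compute_hash(data))
    return data.decode("utf-8"), meta


def is_cache_fresh(path: str, meta: CacheMeta) -> bool:
    """Place 2: try the cheap mtime/size check first, then fall back to the hash."""
    st = os.stat(path)
    if int(st.st_mtime) == meta.mtime and st.st_size == meta.size:
        return True  # positive answer without reading the file
    # mtime or size differ: read the file again and compare content hashes.
    with open(path, "rb") as f:
        return compute_hash(f.read()) == meta.source_hash
```

With this shape, a file whose mtime changed but whose bytes did not is still treated as fresh, at the cost of one extra read only in the case where the mtime/size check fails.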
I really like this! But there are a few issues still... Let me know if you have time to work on those, else I will take over the development of this PR.
Almost there! I really just have very small refactoring wishes, and they are optional.