Skip to content

Rework cache to key on hash of file contents instead of mtime #3403

@gvanrossum

Description

@gvanrossum

The key of the cache is currently the mtime (and size) of the source file. A problem here is that it's impossible to share a "pre-warmed" cache between users because the mtimes of files are different for each users. I propose to change the cache to be keyed on a hash instead. It should probably be sha1, or perhaps md5 if it's significantly faster.

Hashing all files in a large tree is pretty efficient. But just to make it more efficient we could add a "meta cache" that is keyed by mtime, where if we've already got a hash for a file and its mtime isn't changed, we bypass computation of the hash. If the mtime changed (but the size matches that in the cache) we re-hash the file and if that hash matches the hash from the cache we update the mtime in the "meta cache".

Once we have all this in place we can think about sharing copies of the cache between users (using a mechanism outside mypy, so mypy won't have to worry about authentication and security).

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions