Force-discard cache if cache format changed #20152
Conversation
Looks good, I only have a few potential improvements and a question.
```python
if meta[0] != cache_version() or meta[1] != CACHE_VERSION:
    manager.log(f"Metadata abandoned for {id}: incompatible cache format")
    return None
data_io = Buffer(meta[2:])
```
The slice operation does an almost full copy of the meta buffer. It's probably not a big deal, but it would be nice if we could avoid it. What about adding a read_byte operation to the internal API (it would just be an alias for read_tag for now, I guess) that could be used to read the initial two bytes? (No need to do this in this PR.)
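The shape of that suggestion could look something like the sketch below. This is not the real librt Buffer API, just a minimal stand-in to illustrate how a read_byte alias would let the caller consume the two version bytes without copying the rest of the buffer:

```python
class Buffer:
    """Minimal sketch of a read buffer; the real librt Buffer differs."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def read_byte(self) -> int:
        # Read a single byte and advance; no slice, so no copy of the tail.
        value = self._data[self._pos]
        self._pos += 1
        return value


# Hypothetical usage: consume the two version bytes in place.
buf = Buffer(bytes([1, 42, 7]))
low, high = buf.read_byte(), buf.read_byte()
```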
Could we also maybe slice a memoryview instead?
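For illustration, slicing a memoryview yields a zero-copy view over the original bytes rather than a new bytes object:

```python
meta = bytes(range(10))
view = memoryview(meta)[2:]  # zero-copy view of the tail

# The view exposes the same data as a real slice would...
tail = bytes(view)
# ...but it is still backed by the original buffer (no copy was made).
backing = view.obj
```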
FWIW I just measured this slice on a "micro-benchmark": it takes around 0.1 microseconds per file (interpreted). So even with 10K files in the build this adds only an extra millisecond. It is probably even less when compiled. I will add a TODO here and below.
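A micro-benchmark along these lines can be reproduced with the stdlib timeit module. The buffer size below is a made-up assumption, not the actual size of a mypy meta blob:

```python
import timeit

# Hypothetical meta blob of ~100 KB (the real size varies per module).
meta = bytes(100 * 1024)

runs = 10_000
per_call = timeit.timeit("meta[2:]", globals={"meta": meta}, number=runs) / runs

# For a build with 10K files, the total extra cost of the slice is
# roughly per_call * 10_000 seconds.
total_for_10k_files = per_call * 10_000
```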
```diff
  meta.write(data_io)
- meta_bytes = data_io.getvalue()
+ # Prefix with both low- and high-level cache format versions for future validation.
+ meta_bytes = bytes([cache_version(), CACHE_VERSION]) + data_io.getvalue()
```
Similar to the above, the concatenation does a full copy of the cache data. We could add a write_byte operation that writes the version bytes into a Buffer object in a future-proof way.
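The write side of that idea could be sketched like this. Again, this is a hypothetical stand-in for the librt write API, not its actual interface; the point is that appending bytes to a growable buffer avoids the full-copy concatenation:

```python
class WriteBuffer:
    """Minimal sketch of a write buffer; the real librt API differs."""

    def __init__(self) -> None:
        self._data = bytearray()

    def write_byte(self, value: int) -> None:
        # Amortized O(1) append; no copy of the existing payload.
        self._data.append(value)

    def write_bytes(self, payload: bytes) -> None:
        self._data += payload

    def getvalue(self) -> bytes:
        return bytes(self._data)


# Hypothetical usage: version bytes first, then the serialized meta.
buf = WriteBuffer()
buf.write_byte(3)  # stand-in for cache_version()
buf.write_byte(7)  # stand-in for CACHE_VERSION
buf.write_bytes(b"payload")
```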
mypy/cache.py
```
conventionally *does not* read the start tag (to simplify logic for unions). Known exceptions
are MypyFile.read() and SymbolTableNode.read(), since those two never appear in a union.
If any of these details change, please bump CACHE_VERSION below.
```
Do we also need to bump CACHE_VERSION if the structure of the meta cache serialization format changes? We might not be able to read the mypy version info field from the meta file.
Yes, we do need to bump it. It will most likely fail with a ValueError and thus will be caught and the cache will be discarded, but the log line will be wrong. I will update this.
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
If either the low-level (i.e. librt) or the high-level cache format changes, discard the cache. Note that I intentionally don't use librt to read/write the first two bytes of the cache meta, so we are 100% sure we can always read them.