Skip to content

Conversation

@ilevkivskyi
Copy link
Member

If either low-level (i.e. librt) or high-level cache format changes, discard the cache. Note I intentionally don't use librt to read/write the first two bytes of cache meta, se we are 100% sure we can always read them.

@ilevkivskyi ilevkivskyi requested a review from JukkaL October 31, 2025 15:21
@github-actions

This comment has been minimized.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I only have a few potential improvements and a question.

if meta[0] != cache_version() or meta[1] != CACHE_VERSION:
manager.log(f"Metadata abandoned for {id}: incompatible cache format")
return None
data_io = Buffer(meta[2:])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slice operation does an almost full copy of the meta buffer. It's probably not a big deal, but it would be nice if we could avoid it. What about adding read_byte operation in the internal API (that would be just an alias to read_tag for now I guess) that could be used to read the initial two bytes? (No need to do this in this PR.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also maybe slice the memoryview?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I just measured this slice on a "micro-benchmark", it takes around 0.1 microsecond per file (interpreted). So even with 10K files in the build this will be an extra millisecond. It is probably even less when compiled. I will add a TODO here and below.

meta.write(data_io)
meta_bytes = data_io.getvalue()
# Prefix with both low- and high-level cache format versions for future validation.
meta_bytes = bytes([cache_version(), CACHE_VERSION]) + data_io.getvalue()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, the concatenate operation does a full copy of the cache data. We could add a write_byte operation to write the version bytes to a Buffer object in a future-proof way.

mypy/cache.py Outdated
conventionally *does not* read the start tag (to simplify logic for unions). Known exceptions
are MypyFile.read() and SymbolTableNode.read(), since those two never appear in a union.
If any of these details change, please bump CACHE_VERSION below.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to bump CACHE_VERSION also if the structure of the meta cache serialization format changes? We might not able to read the mypy version info field from the meta file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do need to bump it. It will most likely fail with ValueError and thus will be caught and cache will be discarded, but the log line will be wrong. I will update this.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 1, 2025

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@ilevkivskyi ilevkivskyi merged commit 92101f3 into python:master Nov 1, 2025
21 checks passed
@ilevkivskyi ilevkivskyi deleted the use-cache-version branch November 1, 2025 00:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants