Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save cache metadata in JSON format #1353

Merged
merged 5 commits into from
Sep 15, 2023

Conversation

ianthomas23
Copy link
Collaborator

This changes the file format used for storing cached metadata from pickle to JSON. It eliminates the problem that pickled files written using one version of Python might not be readable using a different version of python, and an extra benefit is that such files are now human readable so easier to understand/debug.

All loading and saving of cache metadata is now performed in specific low-level functions CacheMetadata._load and CacheMetadata._save.

We still support the reading of cached metadata from pickled files so that pre-existing caches should still be usable. Such files will be saved as JSON the next time they are saved.

The implementation uses ujson if it is installed, falling back to json. This is the same approach used in ReferenceFileSystem.

The only slight complexity here is that we have to test both JSON and pickle formats in CI, so I have added a private boolean attribute CacheMetadata._force_save_pickle. This should only be used in tests, not by end users. If set to True subsequent saves are in pickle format so we can simulate old pickled metadata to confirm it can still be read and is converted to JSON on next save.

Fixes #825.

try:
with open(fn, "r") as f:
return json.load(f)
except Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception should be way more specific.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IOError or JSON decode error. The latter will depend on the JSON library used, I don't think they nicely call it the same thing - but probably both subclass ValueError.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Exception was purposely generic. If anything fails with the JSON load it tried the pickle load. A corner case failure in the first will be repeated in the second.

It is not as simple as IOError or JSON decode errror. It is easy to obtain a UnicodeDecodeError for example, and I expect there are others that I haven't explicitly tested for.

But I don't have a strong opinion either way. I have changed it to a ValueError as suggested.

@martindurant martindurant merged commit d99979d into fsspec:master Sep 15, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leak in CachingFileSystem (possibly caused by pickle.load ?)
3 participants