Publicly expose/document API for retrieving cache file details #857

Open
jwodder opened this issue Dec 14, 2021 · 1 comment

Comments

@jwodder
Contributor

jwodder commented Dec 14, 2021

In an application that uses CachingFileSystem, we would like to implement partial cache cleanup based on cached file size and age. fsspec already has the pieces needed to accomplish this, but such code would depend on currently undocumented implementation details of the cache metadata structure, which would be unwise. We therefore request a public method on CachingFileSystem for listing cached paths and the files that cache them.
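To make the use case concrete, here is a minimal sketch of the cleanup logic such a listing API would enable. It does not call fsspec at all: `CachedFileInfo` and `select_for_eviction` are hypothetical names invented for illustration, and the fields are only a guess at what a public listing method might return.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CachedFileInfo:
    """Hypothetical record for one cached file (not part of fsspec)."""
    original_path: str  # path on the target filesystem
    cache_path: str     # file in the cache that holds the copy
    size: int           # size of the cached copy, in bytes
    cached_at: float    # time the copy was made, seconds since the epoch


def select_for_eviction(
    entries: Iterable[CachedFileInfo],
    max_age: float,
    max_total_size: int,
    now: Optional[float] = None,
) -> List[CachedFileInfo]:
    """Pick entries to delete: everything older than max_age, then the
    oldest remaining entries until the rest fits in max_total_size."""
    now = time.time() if now is None else now
    entries = list(entries)
    expired = [e for e in entries if now - e.cached_at > max_age]
    fresh = sorted(
        (e for e in entries if now - e.cached_at <= max_age),
        key=lambda e: e.cached_at,
    )
    remaining = sum(e.size for e in fresh)
    evict = list(expired)
    for entry in fresh:  # oldest first
        if remaining <= max_total_size:
            break
        evict.append(entry)
        remaining -= entry.size
    return evict
```

The caller would then delete each returned `cache_path` and ask the filesystem to forget the entry; how that second step would work is exactly the part that currently relies on undocumented details.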

@martindurant
Member

I support this, but I think we can also do a better job of cleaning up the expectations of the caching framework in general. For example, we can split apart:

  • how do we turn target path names into cache path names (currently we have just the one option, to hash paths or keep the basename)
  • how parts of a file are stored (this is the one piece we currently implement as separate classes)
  • how any necessary metadata is stored (single JSON file? sidecar files? same place as cache or elsewhere?)
  • how consistency and liveness of cached data is determined (we have a couple of options)
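As an illustration of the first point only, the path-to-cache-name step could be a small pluggable function. This is a sketch under assumptions, not the current implementation: `CacheNameMapper`, `hash_mapper`, `basename_mapper`, and `cache_location` are hypothetical names, and SHA-256 is chosen here purely for illustration.

```python
import hashlib
import posixpath
from typing import Callable

# Hypothetical interface: maps a target path to a file name inside the cache.
CacheNameMapper = Callable[[str], str]


def hash_mapper(path: str) -> str:
    """Name the cache file after a digest of the full target path."""
    return hashlib.sha256(path.encode()).hexdigest()


def basename_mapper(path: str) -> str:
    """Name the cache file after the last component of the target path."""
    return posixpath.basename(path)


def cache_location(cache_dir: str, path: str, mapper: CacheNameMapper) -> str:
    """Join the cache directory with whatever name the mapper produces."""
    return posixpath.join(cache_dir, mapper(path))
```

Keeping the basename gives readable cache names, but two target paths with the same basename map to the same cache file; hashing the full path avoids that at the cost of opaque names.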

I am hoping that, at the very minimum, we can enable caching to backends other than the local filesystem, such as a "local" S3 bucket in the same data centre as the process. But then we cannot assume that we have direct or up-to-date access to the cache metadata, and we have to face the other problems.
