Added support for cloud object storage (S3, GCS, ADLS, etc.) #1164
@contextlib.contextmanager
def download_h5(url):
why download_h5 and not open_h5?
also does this work with remote urls? I guess if it's remote fsspec.open_local downloads it first?
`download_h5` first downloads the remote file to local disk, then opens it read-only. `upload_h5` opens the file for writing, then uploads it to the remote URL. And yes, `open_local` just means the file is downloaded locally before it is opened, as opposed to being streamed. This is primarily for caching (if you intend to re-read the file repeatedly).
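To illustrate the download-then-open versus write-then-upload pattern described here, the following is a minimal standard-library sketch. It is not the PR's implementation: the names `download_then_open` and `write_then_upload` and the `upload` callback are hypothetical, `urllib` stands in for fsspec, and plain binary files stand in for HDF5.

```python
import contextlib
import os
import shutil
import tempfile
import urllib.request


@contextlib.contextmanager
def download_then_open(url, mode="rb"):
    """Copy the whole object at `url` to a temp file, then yield it opened for reading.

    Hypothetical stand-in for the download-first behaviour; nothing is streamed.
    """
    fd, local_path = tempfile.mkstemp()
    try:
        # Full download happens here, before the caller sees any data.
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as remote:
            shutil.copyfileobj(remote, tmp)
        with open(local_path, mode) as f:
            yield f
    finally:
        os.remove(local_path)


@contextlib.contextmanager
def write_then_upload(upload):
    """Yield a local temp file for writing; call upload(local_path) after it is closed.

    `upload` is an assumed callback that pushes the finished file to its destination.
    It is skipped if the body of the `with` block raises.
    """
    fd, local_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            yield f
        upload(local_path)
    finally:
        os.remove(local_path)
```

Because the reader only ever touches a complete local copy, repeated reads are cheap, which is the caching motivation given above. The cost is that nothing is available until the full transfer finishes.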
Also consolidated HDF5 and Parquet caching into a single abstraction layer as part of the backend called `CacheManager`. Now, local backends can optionally cache in either HDF5 or Parquet, and the new `DatasetManager` abstraction will do the work of creating the appropriate dataset based on the cache format.

Depends on uber/petastorm#665.