Use of Zarr on HPC file systems #659
Hi @pbranson, that's really interesting to see other people using Zarr on HPC systems. cc @tinaok @fbriol @apatlpo. First of all:
Can't you just increase the chunk_size to have reasonably sized files? I've not personally used Zarr much on big datasets, but I've seen users (cc'ed above) generate millions of files on our system with an incorrect chunk size (a tens-of-TB dataset with 7 MB chunks...). My first advice was to increase the chunk size so as to have chunks of at least 100 MB, or even more. This recommendation is also valid on cloud object stores. But I imagine you are probably aware of this.
Do you know where this comes from? Is Xarray able to work correctly with chunks from inside zip files, e.g. accessing different parts of one zip file from multiple processes? Sorry about this reply; I'm merely asking more questions rather than answering the major part of your issue.
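The chunk-size advice above can be sketched with xarray/dask. This is an editorial illustration, not code from the thread; the dataset shape and variable name are made up:

```python
# Sketch: estimating chunk size and rechunking to larger chunks.
import numpy as np
import xarray as xr

# Synthetic stand-in for a large model output (names are illustrative).
ds = xr.Dataset(
    {"hs": (("time", "y", "x"), np.zeros((240, 100, 100), dtype="float32"))}
).chunk({"time": 1})  # pathological: one tiny chunk per time step

def chunk_mb(da):
    """Approximate size in MB of one chunk of a dask-backed DataArray."""
    nbytes = da.dtype.itemsize * np.prod([c[0] for c in da.data.chunks])
    return nbytes / 2**20

print(f"before: {chunk_mb(ds.hs):.2f} MB per chunk")

# Rechunk along time so each chunk is MBs, not KBs; in practice aim
# for ~100 MB as suggested above.
ds = ds.chunk({"time": 240})
print(f"after:  {chunk_mb(ds.hs):.2f} MB per chunk")
```

The same `chunk(...)` call before `to_zarr` controls how many files a Zarr DirectoryStore creates, since each chunk becomes one file.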
In my instance it is mainly due to the poor performance of using open_mfdataset on many poorly chunked netCDF files. The dask graph creates a great many file-open requests against the netCDF files, and I think there was some sort of threading/file-locking issue which would cause each of these opens to take many seconds. Working on a file-by-file basis, making 7 GB zips comprised of approximately 50-100 MB chunks could be sent to the scheduler in a job array, with a small dask cluster tackling each. Regarding parallel access to the zip files: yep, it works, and it seems to add only minimal overhead (I haven't quantified it yet). My understanding is that the zip header lists all the chunks and their binary offsets in the file, so a read consults the header and seeks to the chunk location.
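The zip-seek behaviour described above can be demonstrated with the stdlib zipfile module; strictly speaking the offsets live in the zip's central directory, which is read once at open time. This is an editorial sketch (member names mimic zarr chunk keys), not a zarr benchmark:

```python
# Sketch: reading a member of an uncompressed zip is a seek + read.
# The central directory, parsed once when the archive is opened,
# records each member's absolute offset.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("var/0.0", b"\x00" * 1024)  # names mimic zarr chunk keys
    zf.writestr("var/0.1", b"\x01" * 1024)

with zipfile.ZipFile(buf, "r") as zf:
    info = zf.getinfo("var/0.1")
    print(info.header_offset)      # absolute offset of this member
    data = zf.read("var/0.1")      # internally: seek to offset, then read
```

This is why ZIP_STORED (no compression) archives of already-compressed Zarr chunks add so little overhead, whereas tar archives would require scanning.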
I wonder if @martindurant has some suggestions on this?
Yes indeed, reading files from within a zip archive ought to add little overhead, but obviously the system is having to do the same offsetting somewhere along the way. Just don't use tar :)
My first question: is there anything wrong with your current approach and the little code you posted? It would be reasonable to have something like this as an option in intake-xarray, although I'm not sure how specialised it is.
Since ZipFileSystem (https://github.com/intake/filesystem_spec/blob/master/fsspec/implementations/zip.py#L8) is now implemented in fsspec, it would be possible to use URLs only and have Dask/xarray sort things out; I should say *will* be possible in the near future.
I don't know about storing the sets of metadata outside the archives. Obviously, it could be done: something along the lines of the existing consolidation mechanism, or something in Intake, or a new class of dict-store. Effectively, it's like creating a zarr group out of existing data-sets.
However, it would take some planning. From the point of view of the original problem here, perhaps the current two-stage consolidation (write zarr as normal, then create single metadata for the data-set) could be short-circuited to avoid the creation of the many small .z* files, at the cost of making the dataset unreadable without using the consolidated metadata, and possibly hard to change.
Yes, I suspect there are likely performance implications from the stripe size at the storage layer (Lustre in my case) that will influence the performance of parallel seeks within the zip files.
There isn't really anything wrong per se with what I'm doing: reading 480 consolidated metadata files doesn't take too long; rather, the xr.concat actually takes most of the time (30-40 s), which is the step I was hoping to circumvent with a precompiled full set of metadata. Then a user of the dataset doesn't need to have several workers running just to open the dataset, when they may subsequently slice out only a few chunks' worth for their time/region of interest.
I'll raise the idea of a "metadata only" zarr group, which could consolidate the consolidated metadata of its members and record how they are stored, at today's zarr meeting. You could also open an issue at zarr for something like this, or restate your problem there, to see if they have any better ideas.
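A "metadata only" group along these lines could work like the existing consolidation mechanism, one level up. A stdlib-only editorial sketch (key names follow the zarr v2 on-disk layout; the merge function is hypothetical, not an existing zarr API):

```python
# Hypothetical sketch: merge the consolidated metadata (.zmetadata) of
# several member stores into one top-level document, so a reader can
# open all members with a single metadata fetch.
import json

def merge_consolidated(members):
    """members: mapping of member-name -> that store's .zmetadata bytes."""
    merged = {}
    for name, zmeta in members.items():
        meta = json.loads(zmeta)["metadata"]
        for key, value in meta.items():
            merged[f"{name}/{key}"] = value  # prefix keys with member name
    merged[".zgroup"] = {"zarr_format": 2}   # top level acts as a zarr group
    return json.dumps({"zarr_consolidated_format": 1, "metadata": merged})

# Two member stores, each already consolidated (contents illustrative).
members = {
    "197901": json.dumps({"zarr_consolidated_format": 1, "metadata": {
        ".zgroup": {"zarr_format": 2},
        "hs/.zarray": {"shape": [744], "chunks": [744]}}}).encode(),
    "197902": json.dumps({"zarr_consolidated_format": 1, "metadata": {
        ".zgroup": {"zarr_format": 2},
        "hs/.zarray": {"shape": [672], "chunks": [672]}}}).encode(),
}

top = json.loads(merge_consolidated(members))
print(sorted(top["metadata"]))
```

A reader of the merged document would still need a mapping from member name to the underlying ZipStore; that is the part Intake (or a new dict-store class) would supply.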
Hi, what is the stripe count for your Zarr file on your Lustre file system? Depending on your chunk size and the number of chunks in your Zarr file, modifying the stripe count from the system default might improve performance.
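For reference, a hedged sketch of adjusting the stripe layout with Lustre's lfs tool; the path, stripe count, and stripe size here are illustrative and site-dependent, so check your system defaults first:

```shell
# Illustrative only: widen the stripe layout on the directory that will
# hold the zip/Zarr files, so large reads spread across multiple OSTs.
lfs setstripe -c 8 -S 4M /scratch/project/zarr_zips   # 8 OSTs, 4 MiB stripe size
lfs getstripe /scratch/project/zarr_zips              # inspect the resulting layout
```

The layout applies to files created in the directory afterwards, so set it before writing the stores.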
When using Zarr for large datasets on HPC filesystems, a large number of files is frequently created, depending on the chunk_size used.
On HPC systems, inode limits are often imposed, as many small files can adversely affect the stability of the filesystem and place significant load on the metadata servers.
After creating the Zarr datasets in a directory, they can be stored into zip files (without compression, as the chunks should already be compressed) and zarr.storage.ZipStore used to access the files. This works well, i.e.:
ds = xr.open_zarr(zarr.storage.ZipStore(zipfile, mode='r'))
The recently added zarr.consolidate_metadata also works.
Whilst this works well, I have often found it is more efficient to still partition the dataset into years/months, depending on the source files and the size of the dataset.
I am wondering what the recommended way to consolidate metadata across multiple zarr ZipStores might be. Is it possible to create an intake catalog for this? I think that would require some minor alterations to xr.open_zarr to check for a .zip file extension and use ZipStore. Alternatively, is it possible to create some sort of .zmetadata file that consolidates the metadata across stores such as:
ww3.aus_4m.197901.zip
ww3.aus_4m.197902.zip
ww3.aus_4m.197903.zip
ww3.aus_4m.197904.zip
ww3.aus_4m.197905.zip
ww3.aus_4m.197906.zip
At present I use the following boilerplate code to load a multi-ZipStore dataset:
Thanks for any advice