Parallelise writing of memh5 containers #42

Open
jrs65 opened this issue Feb 18, 2017 · 0 comments

jrs65 (Contributor) commented Feb 18, 2017

At the moment, when distributed memh5 datasets are written to disk, this is done serially, with each rank waiting for its turn to write. This is clearly a bit dumb when running on nice parallel filesystems like GPFS (nice might be a bit generous).
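For illustration, here is a minimal sketch of what that serial pattern looks like with h5py and mpi4py. This is not the actual memh5 implementation; the dataset name and shape are made up.

```python
import numpy as np
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.rank, comm.size

# Hypothetical global dataset, distributed over the first axis.
nfreq, ntime = 16, 1024
nlocal = nfreq // size            # assume it divides evenly for brevity
start = rank * nlocal
local_data = np.full((nlocal, ntime), rank, dtype=np.float64)

# Each rank takes its turn to write; everyone else sits at the barrier.
for writer in range(size):
    if rank == writer:
        mode = "w" if rank == 0 else "r+"
        with h5py.File("output_serial.h5", mode) as f:
            dset = f.require_dataset("vis", shape=(nfreq, ntime), dtype=np.float64)
            dset[start:start + nlocal] = local_data
    comm.Barrier()
```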

The way to work around this (a rough sketch is given after the list) is to:

  • Use one rank to create and pre-allocate the dataset with contiguous storage; this is also probably a good time to write out the attributes.
  • Close the file.
  • Redistribute the data across the slowest varying axis.
  • Have each rank open the file, figure out the offset into the file for its chunk of data, and then lock the byte range it needs.
  • Each rank writes its data and closes the file, all in parallel.
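A rough sketch of the parallel scheme, again with h5py and mpi4py. The dataset name, shape and the single-element write used to force allocation are assumptions rather than memh5 code, and byte-range locking (e.g. via fcntl.lockf) is omitted for brevity.

```python
import numpy as np
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.rank, comm.size

fname = "output_parallel.h5"
nfreq, ntime = 16, 1024                       # hypothetical global shape
dtype = np.dtype(np.float64)

# 1. One rank creates and pre-allocates a *contiguous* dataset and writes
#    the attributes, then closes the file.
if rank == 0:
    with h5py.File(fname, "w") as f:
        dset = f.create_dataset("vis", shape=(nfreq, ntime), dtype=dtype)
        dset.attrs["axis"] = ["freq", "time"]
        # Writing one element should force HDF5 to allocate the full
        # contiguous block now, so that its raw-data offset is defined below.
        dset[0, 0] = 0.0
comm.Barrier()

# 2./3. The data is assumed to already be distributed as contiguous blocks
#       along the slowest varying axis (freq here).
nlocal = nfreq // size                        # assume it divides evenly
start = rank * nlocal
local_data = np.full((nlocal, ntime), rank, dtype=dtype)

# 4. Every rank opens the file and finds the byte offset of its block
#    inside the contiguous dataset. (Locking the byte range, e.g. with
#    fcntl.lockf, is left out to keep the sketch short.)
with h5py.File(fname, "r") as f:
    data_offset = f["vis"].id.get_offset()    # file offset of the raw data

row_bytes = ntime * dtype.itemsize
my_offset = data_offset + start * row_bytes

# 5. Each rank writes its block directly at the right offset, in parallel,
#    and closes the file.
with open(fname, "r+b") as fh:
    fh.seek(my_offset)
    fh.write(local_data.tobytes())

comm.Barrier()
```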