[Discussion] HDF5+GDS+multi-threading #295

Closed
madsbk opened this issue Oct 2, 2023 · 5 comments

@madsbk
Member

madsbk commented Oct 2, 2023

In #287, we propose to implement a Virtual File Driver (VFD) that uses KvikIO to accelerate HDF5 IO. However, HDF5 isn’t thread-safe, so a VFD might be of limited interest to projects like Legate that make heavy use of multi-threading.

Note that it is possible to compile HDF5 with --enable-threadsafe, but this effectively turns the entire HDF5 library into one giant critical region. There is an RFC to make HDF5 (or part of it) thread-safe, RFC: Multi-Thread HDF5, but it is not coming soon.
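To make the "giant critical region" point concrete, here is a minimal sketch in plain Python. It does not call HDF5; `library_call` is a hypothetical stand-in for any API call in a library that guards everything with one global lock. However many threads call in, at most one is ever inside the library.

```python
import threading
import time

# Hypothetical stand-in for a library that takes one global lock per API call.
_global_lock = threading.Lock()
_counter_lock = threading.Lock()
_active = 0      # threads currently inside the "library"
max_active = 0   # highest concurrency observed so far


def library_call(duration=0.01):
    """Pretend API call: the whole body runs under the global lock."""
    global _active, max_active
    with _global_lock:
        with _counter_lock:
            _active += 1
            max_active = max(max_active, _active)
        time.sleep(duration)  # simulated I/O or metadata work
        with _counter_lock:
            _active -= 1


def run_threads(n=4):
    """Call the library from n threads and report the peak concurrency."""
    threads = [threading.Thread(target=library_call) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active


if __name__ == "__main__":
    print("peak concurrency inside the library:", run_threads())
```

The calls are safe, but the threads gain nothing from running concurrently, which is the concern for a heavily multi-threaded client.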

Let’s look at some alternative approaches that support both GDS and multi-threading:

1. Use Kerchunk

  • Easy to support GDS through KvikIO
  • We would need to extend Kerchunk to support virtual datasets
  • Read support only
  • Only supports basic HDF5 (no endianness change etc.)

2. Parse the HDF5 metadata and extract contiguous data blocks ourselves

  • Easy to support GDS through KvikIO
  • Support decompression on-the-fly using nvCOMP
  • Support writing by locally creating empty HDF5 files and then filling them with data in parallel.
  • Only supports basic HDF5 (no endianness change etc.)
  • Harder to support compression since we don’t know the size of the data blocks on disk in this case.

3. Wait for multi-thread support in HDF5

  • Supports read and write of any HDF5 file.
  • It might take a long time for something like RFC: Multi-Thread HDF5 to be released.
  • Hard to support GDS, we need to implement a VFD that uses KvikIO.
  • Might be hard to support on-the-fly GPU compression and decompression.
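Options (1) and (2) share the same read path: once the metadata has been reduced to a list of contiguous byte ranges, the data can be read in parallel without going through the HDF5 library at all. The sketch below illustrates this with the standard library only. The reference layout (key mapped to path, offset, length) mirrors the one Kerchunk uses; the file here is a plain binary file rather than real HDF5, and `os.pread` stands in for a GDS-capable read such as one issued through KvikIO. All function names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_blocks(path, blocks):
    """Write blocks back to back and return Kerchunk-style references:
    key -> [path, byte offset, byte length]."""
    refs = {}
    offset = 0
    with open(path, "wb") as f:
        for key, payload in blocks.items():
            f.write(payload)
            refs[key] = [path, offset, len(payload)]
            offset += len(payload)
    return refs


def read_block(ref):
    """Read one block with a positional read, so no file cursor is shared.
    A production reader would also loop on short reads."""
    path, offset, length = ref
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


def read_all(refs, max_workers=4):
    """Read every referenced block concurrently."""
    keys = list(refs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(keys, pool.map(read_block, [refs[k] for k in keys])))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blocks.bin")
        refs = write_blocks(path, {"data/0": b"a" * 10, "data/1": b"b" * 7})
        print(refs["data/1"][1:], read_all(refs)["data/1"])
```

In a real implementation the offsets and lengths would come from the HDF5 metadata, and compressed chunks would additionally need a decompression step (for example with nvCOMP) after the read.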

Any thoughts?

@wence-
Contributor

wence- commented Oct 2, 2023

[...]

Any thoughts?

Note that I do not know much about the details of the VFD interface in HDF5, so I may be being naive.

At what level do you need thread-safety in the VFD interface? It looks to me like you're providing callbacks for read/write that HDF5 can use. If the HDF5 calls are single-threaded, you can presumably do whatever you like internally as long as you expose a "single-thread consistent" interface to HDF5.

Or is it not that easy?

@madsbk
Member Author

madsbk commented Oct 2, 2023

If the HDF5 calls are single-threaded, you can presumably do whatever you like internally as long as you expose a "single-thread consistent" interface to HDF5.

Correct, the VFD itself can be multi-threaded, but Legate uses threads (as opposed to processes) when parallelizing tasks on the same node. E.g., if two Legate tasks run on the same machine, their calls to HDF5 must be serialized.
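A minimal sketch of both halves of this point, using only the standard library. `ParallelReader` is a hypothetical driver-like object, not the HDF5 VFD API: its `read` call blocks like a single-threaded callback but fans out into chunked positional reads internally. `task_read` shows the remaining limitation: calls arriving from different tasks (threads) in the same process still have to take turns.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class ParallelReader:
    """Hypothetical driver-like object; not the HDF5 VFD interface."""

    def __init__(self, path, chunk_size=1 << 20, max_workers=4):
        self._fd = os.open(path, os.O_RDONLY)
        self._chunk_size = chunk_size
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def read(self, offset, size):
        """Blocking read that is parallel internally: split the request into
        chunks, issue positional reads on a pool, return when all are done."""
        end = offset + size
        futures = [
            self._pool.submit(os.pread, self._fd, min(self._chunk_size, end - s), s)
            for s in range(offset, end, self._chunk_size)
        ]
        return b"".join(f.result() for f in futures)

    def close(self):
        self._pool.shutdown()
        os.close(self._fd)


# One lock per process: calls from different tasks are serialized, as they
# would have to be when the library behind them is not thread-safe.
_library_lock = threading.Lock()


def task_read(reader, offset, size):
    with _library_lock:
        return reader.read(offset, size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(bytes(range(256)) * 64)
        reader = ParallelReader(path, chunk_size=1024)
        print(len(task_read(reader, 100, 5000)))
        reader.close()
```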

@manopapad

  1. Supporting writes is pretty important, so I would vote against relying on Kerchunk for the long term.

  2. I am favorable to this one, more comments after (3)

  3. Legate specifically might be OK with single-thread-per-process (or at least serialized access from different threads within the same process), so the VFD approach doesn't need to wait on multi-threaded HDF5, at least for Legate. The reason is that we may have to switch to a rank-per-GPU default anyway (for the benefit of other libraries that just don't work under rank-per-node).

    The more fundamental problem for Legate is that we would have multiple processes trying to read/write the same HDF5 file; can the VFD approach handle that mode? On another thread you linked to https://forum.hdfgroup.org/t/parallel-read-of-a-single-hdf5-file/7960/4, which seems to suggest that the (only?) way to get safe multi-process access is to use an MPI-based VFD, and Legate is trying to move away from depending on MPI (as that throws a wrench into, e.g., the redistributability of builds).

    Implementing a Legate+KvikIO-aware VFD might be even more work than (2), but it would presumably work out-of-the-box with all HDF5 features.

    Also, you possibly have less control over how the underlying file I/O is invoked, so it might not be done in the most performant way possible (this is speculative; possibly this is not an issue, depending on what contract the VFD interface provides to the implementor).

    Note: All of the above is from the point of view of Legate; other clients might be more strict about the need for true multi-threading, and not care about including MPI.

So at this point I believe the question is, is it better to go through the "official" VFD extension interface, or only use the HDF5 API up to the point where we get access to the underlying buffers, and from that point on proceed independently. The latter would be less constrained by the main HDF5 library's quirks, and would have clearer performance characteristics, but wouldn't be as fully-featured. Which alternative is more programming effort is unclear.

I am favorable towards (2), but I am absolutely not an expert here.

@madsbk
Member Author

madsbk commented Oct 3, 2023

The more fundamental problem for Legate is that we would have multiple processes trying to read/write the same HDF5 file; can the VFD approach handle that mode?

In principle, yes. The MPI backend in HDF5 is itself implemented as a VFD. For reading, this should be straightforward, but in order to support writing we would have to implement something similar to the MPIO VFD.

@madsbk
Member Author

madsbk commented Oct 3, 2023

So at this point I believe the question is, is it better to go through the "official" VFD extension interface, or only use the HDF5 API up to the point where we get access to the underlying buffers, and from that point on proceed independently. The latter would be less constrained by the main HDF5 library's quirks, and would have clearer performance characteristics, but wouldn't be as fully-featured.

Very well put.

Which alternative is more programming effort is unclear.

That I can answer: option (2) is significantly less work, particularly if we want to support parallel writes to a single file (uncompressed).
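A rough sketch of why the uncompressed case is the easy one: every block's size is known up front, so its offset in the file is fixed before any data is written, and writers never need to coordinate. The example below uses a plain preallocated file and `os.pwrite` from the standard library; in the approach discussed here, the file would be an empty HDF5 file created locally, the offsets would come from its metadata, and the writes would go through KvikIO. All function names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def preallocate(path, total_size):
    """Create a file of its final size up front (stands in for the empty HDF5 file)."""
    with open(path, "wb") as f:
        f.truncate(total_size)


def write_block(path, offset, payload):
    """Each writer opens its own descriptor and writes at its own offset."""
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        while written < len(payload):  # loop to handle partial writes
            written += os.pwrite(fd, payload[written:], offset + written)
    finally:
        os.close(fd)


def parallel_fill(path, blocks, max_workers=4):
    """blocks: list of (offset, bytes) pairs covering non-overlapping ranges."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(write_block, path, off, data) for off, data in blocks]
        for f in futures:
            f.result()  # re-raise any writer error


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.bin")
        preallocate(path, 12)
        parallel_fill(path, [(8, b"cccc"), (0, b"aaaa"), (4, b"bbbb")])
        with open(path, "rb") as f:
            print(f.read())
```

With compression, the on-disk size of each block is only known after compressing it, so the offsets cannot be assigned ahead of time in this way.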


3 participants