Potential deadlock from some combination of h5py, gc, and fsspec #2019
Specifically, it runs an event loop on another thread, and submits coroutines to it and waits for their completion. Such coroutines will be created every time a read falls outside of the file-like object's cache of bytes, and most of the running code will be within aiohttp.
I built h5py from source and set … I'm not even sure what to look for right now. In theory, we should have two threads trying to acquire the lock, each waiting on the other, right? Or perhaps there are multiple locks in flight?
If you put a … This is where we do the most complicated things with gc (we temporarily turn it off 😱 to get around a race condition): see lines 113 to 153 in e3740aa.
There is also a libhdf5-level lock, but that should be completely protected by h5py's … Given that traceback, it looks like it is in a waiting loop inside of fsspec, waiting to get bits back. If I am understanding this right, fsspec is completely naive to h5py / hdf5 and, aside from the file-like interface, vice versa? My guess is that this is a race condition in fsspec's cache, where something is noted to be there and then evicted before it can be accessed.
...and also explicitly turn it back on again, whatever the calling code had in mind.
This (either inside or outside the context manager) seems to completely avoid the hanging. I'll run it some more times to see if it ever shows up.
Yes, I think that's correct.
To be clear, this only happens in combination with h5. I'm not sure we have proof that the gc is involved, beyond @TomAugspurger's statement that calling a collect might prevent the problem. Collecting garbage might be more complex in the presence of h5netcdf/xarray.
FWIW, it might not be caching specifically. The following should disable fsspec's caching and still reproduces the deadlock:

```python
import fsspec
import h5netcdf
import h5py
import gc

def main():
    url = "https://pangeo.blob.core.windows.net/pangeo-public/hurs_day_ACCESS-CM2_historical_r1i1p1f1_gn_1950.nc"
    for i in range(100):
        with fsspec.open(url, cache_type="none").open() as o:
            h5netcdf.core.File(o, mode="r", decode_vlen_strings=True)
            # gc.collect()
        print("--------------------------------------", i)

if __name__ == "__main__":
    main()
```

So more likely, it's some interaction between the multiple threads / event loops.
ok, a theory:
Another test is to disable / re-enable gc around the read. Another test would be to start putting prints showing the thread name just before grabbing the lock in lines 194 to 206 in e3740aa.
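The disable/re-enable experiment can be wrapped in a small context manager so the gc state is always restored. A minimal sketch (not part of h5py or fsspec; the name `gc_disabled` is mine):

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_disabled():
    """Temporarily disable the cyclic garbage collector, restoring its prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

# Usage: wrap the read so no collection can run mid-read.
with gc_disabled():
    pass  # ... open the file with fsspec/h5netcdf and read here ...
```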
Thanks a ton for helping with this @tacaswell. I'm going to try to return to it later today or tomorrow (going to try the …).
As an aside, I recently learned that referencing …
I added some debug to `h5py/_locks.pxi`:

```diff
diff --git a/h5py/_locks.pxi b/h5py/_locks.pxi
index bc8b2dd9..50063d0f 100644
--- a/h5py/_locks.pxi
+++ b/h5py/_locks.pxi
@@ -51,13 +51,18 @@ cdef class FastRLock:
 
     def __enter__(self):
         # self.acquire()
-        return lock_lock(self, pythread.PyThread_get_thread_ident(), True)
+        print("before locklock", pythread.PyThread_get_thread_ident(), self._owner, self._count, self._pending_requests, self._is_locked)
+        l = lock_lock(self, pythread.PyThread_get_thread_ident(), True)
+        print("after locklock", pythread.PyThread_get_thread_ident(), self._owner, self._count, self._pending_requests, self._is_locked)
+        return l
 
     def __exit__(self, t, v, tb):
         # self.release()
         if self._owner != pythread.PyThread_get_thread_ident():
             raise RuntimeError("cannot release un-acquired lock")
+        print("before unlock", pythread.PyThread_get_thread_ident(), self._owner, self._count, self._pending_requests, self._is_locked)
         unlock_lock(self)
+        print("after unlock", pythread.PyThread_get_thread_ident(), self._owner, self._count, self._pending_requests, self._is_locked)
 
     def _is_owned(self):
         return self._owner == pythread.PyThread_get_thread_ident()
```

Things progress normally for a while, but eventually …
The last line shows thread …
Ugh. I suspect this is hard to totally avoid if you pass in a Python file-like object that uses threads internally. The ability to open a file-like object is... not quite a hack, but I'd hesitate to use it for anything important. I don't think HDF5 is expecting a …

It looks like fsspec is wrapping async code in a thread to present it as a synchronous interface. If you found something like fsspec that just used synchronous code directly, it would probably avoid this problem. All abstractions are leaky. 🙂
It looks like @tacaswell's theory of what's going on matches the theory in this comment on the linked Dask issue.
Taking the …
🤦🏻 failed to read all the links. I think the story is a little bit worse lock-wise, as it is not only our …
Thanks again all. I think I'm starting to get a clearer picture of what's going on. #2020 will be good, but other than that I don't see much else that h5py can do. It seems like a read from h5py / HDF5 triggering Python code is fundamentally problematic if the garbage collector can kick in at any time.

h5netcdf might be able to alleviate the problem slightly by creating fewer circular references that then need to be cleaned up by the gc. But even then, there's probably not a ton that can be done. If fsspec provided an explicit single-threaded, sync-only interface, we might not hit the same issue. And finally, application code can mess with gc directly (disabling, explicit collections) to try to avoid the problem.

One thing I could use some help understanding: why couldn't the gc kick in (and try to acquire a lock) at a time when the main thread already held a lock?

So... I guess we close this issue?
The GC could kick in, but if it runs on the thread that already has the lock, it should be OK (it's a re-entrant lock, so the same thread can acquire it several times). We think it's only when GC happens on another thread that this bites. (There may still be other problems: I doubt HDF5 really expects that something will pause during a read and do some random other HDF5 operations. But I'd guess that so long as the file it's trying to read from isn't affected by the garbage collection, it will probably be OK. 🤞)
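The re-entrancy point can be demonstrated in pure Python with `threading.RLock`: the owning thread can re-acquire freely, while any other thread blocks. A minimal, illustrative sketch (not h5py's actual `FastRLock`, but the same semantics):

```python
import threading

lock = threading.RLock()
results = []

def other_thread():
    # A second thread cannot take a lock the main thread holds; this is
    # the situation when gc finalizers run on a different thread.
    results.append(lock.acquire(timeout=0.1))

lock.acquire()   # main thread takes the lock...
lock.acquire()   # ...and may re-acquire it: an RLock is re-entrant

t = threading.Thread(target=other_thread)
t.start()
t.join()

lock.release()
lock.release()
print(results)   # the other thread's acquire timed out and returned False
```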
Ah right, that's the bit I was missing. @martindurant what are your thoughts on an explicit sync mode? This sounds like a decent amount of work, but maybe worthwhile? Otherwise, I don't see a way to safely use (async) fsspec filesystems with NetCDF files.
That would amount to a completely new backend for http, and each of the async implementations (s3, gcs, adl/b) would need to have sync variants too. Plus, of course, you would lose the benefits of concurrent operations. h5netcdf is in a good place to set the gc state during reads; or even xarray (which is the only place, in practice, that multiple hdf files are in flight at once, so that references become a problem). Might it be possible to explicitly call the h5py cleanup actions in the main thread, without waiting for the gc and deallocation?

One potentially interesting thing, if an added complication, is the new fsspec.parquet module, which shows how you might prospectively fetch all the pieces of a file when you know where those pieces are, so that your reading application accesses them all from cache and makes no reads. This would require dedicated code for hdf (maybe also needing the h5py library to bootstrap), and I don't know how practical that is. The idea is almost like a mini-kerchunk.
I think we should close this issue with a PR to add a warning to the docs on the file-like usage with threads in the implementation. Working on that now.
Yes, if you can find and break all of the circular references / replace some of them with weakrefs etc.

@TomAugspurger If you still have a dev environment set up with this, it might also be worth trying with h5py's locking disabled (I think this is a build-time choice). You may risk a segfault, but it will tell us if this is something we can work around at the Python level (I am imagining a context manager to use as an escape hatch on …).
If you have enough control over the code in question, yes, it should be. You can explicitly close all HDF5 objects: in h5py, this is like …

The downside is it's easy to slip up and do something apparently innocuous which makes the code hang. And the hangs might not appear consistently, and will almost certainly hop around if you change things to try to debug them...
I would just like to chime in to say THANK YOU to the h5py maintainers for responding so quickly and helpfully to this tricky issue, which has become a major stumbling block for a downstream project. I can't claim to understand all of the details being discussed here, but it seems we have several promising leads to investigate. 🙏
+1 to everything Ryan said.
Yeah, that seemed to avoid the hanging. I didn't get a segfault in the 100 reads I did, but maybe I just got lucky.

I'll make a small PR to h5netcdf that avoids the problematic circular references. It seems not to hang in basic testing. I do worry about the fragility of this kind of fix, since it'll be very easy for h5netcdf to reintroduce a circular reference, and it only fixes things for one library in the stack. But it's a start.

edit: h5netcdf/h5netcdf#117
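For readers unfamiliar with the cycle-breaking approach being discussed: the child object holds only a weak reference to its parent, so the pair is freed by reference counting alone, without waiting for the cyclic gc. A sketch of the general idea (these `File`/`Variable` classes are hypothetical stand-ins, not h5netcdf's actual code):

```python
import weakref

class File:
    def __init__(self):
        self.variables = []

class Variable:
    def __init__(self, parent):
        # Weak reference instead of a strong one: no File <-> Variable cycle.
        self._parent = weakref.ref(parent)

    @property
    def parent(self):
        return self._parent()   # None once the File has been freed

f = File()
v = Variable(f)
f.variables.append(v)
assert v.parent is f

del f                      # refcount drops to zero: freed immediately,
assert v.parent is None    # no gc pass was needed to reclaim the pair
```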
Thanks all! To reiterate, if there's any reasonable alternative to passing h5py a Python-level file-like object, I'd say it's worth investigating. The docs already mention alternatives for a couple of cases (in-memory HDF5, secure temp files). There's the optional read-only S3 driver (although we don't build this for the wheels, because it would mean bundling C libraries for network access and encryption, but e.g. conda-forge enables it). But unfortunately I don't know of a direct alternative for reading an HDF5 file over http(s). And if you do need the file-like object support, you want the internals of the 'file' you give h5py to be as simple as possible.
Closing this because there's nothing obvious to do in h5py. We can reopen if people feel this is premature.
Thanks for the link to those docs. Agreed that this is all pretty fragile. Ideally the C HDF5 library would support reading files over http. And thanks again for all the help in debugging this!
Doing it in the C library isn't really ideal either: we'd have to bundle some more C libraries for that (IIRC it uses libcurl & libopenssl for S3), which is then an additional headache for security. I struggle to think of any really good way to do this.
I want to strongly object to this, and the line of reasoning preceding it. The adoption of Python file-like objects into h5py/hdf, and duck-typing via fsspec, has been a big success, and has proven the dynamism and rapid progress that can only reasonably be achieved in a high-level language. As a point of reference, the arrow project, which has immensely more developer manpower than fsspec, has only implemented local, s3 and hdfs after several years (cf here, here for the list of fsspec implementations).

Yes, there are problems, but that's why we are here! Probably no one should be monkey-patching instance attributes, and it shouldn't be that hard to break cycles. Async is hard, but usually worth the cost. It would indeed be easy to make independent sync versions of http et al., if someone thought it necessary (they all used to be sync).

Furthermore, the kerchunk idea shows just how versatile one can be in this framework: you no longer need the hdf5 C library at read time at all, at least in some cases, if you do some work up-front using h5py to find offsets. You get concurrent and parallel reads, and much faster metadata ingest. Conversely, one could follow the fsspec.parquet pattern to prospectively load all the chunks from an hdf5 file into memory cache (concurrently) by having a format-specific interface.
This case was a problem caused by cycles and gc in combination with threads, but I think the general problem here is something we can't really fix. HDF5 doesn't expect a file driver to run Python code, with all the runtime complexities that brings. E.g. if you write a program where more than one thread uses h5py, it's OK: they won't get concurrent access to HDF5, but they will all get access in turn. But if you give h5py a Python file-like object, now the same code might be vulnerable to deadlock: one thread could be holding the HDF5 lock and waiting for the GIL (or some application lock) while another has the opposite situation.

I can't really see a way to avoid this in h5py or in fsspec, so we're left saying to application developers 'this feature exists, but if you want to use it, be really careful with threads and reference cycles'. 😞

Ultimately, the only way to get round it is to reimplement part of HDF5 in a way that fits better with Python code supplying the underlying 'file' data. It sounds like this is part of what kerchunk does: reimplement the parts that read and decompress dataset chunks from an HDF5 file. But HDF5 is a massively complex format, and still growing new features, so reimplementing it completely is largely off the table.
We can get to >90% :). That's especially true for data designed to be archival, which is less likely to use new, cutting-edge features of HDF.
I second everything @takluyver said.
I am very skeptical of this outside of narrow use-cases. That said, I have faith that for any given file layout (filling/missing data, vds, weird float layouts, structured data, multi-file hdf, inter-file links, circular groups, custom filters, target types on the client side, cross/multi-chunk dense slicing, slicing arbitrary hyperslabs, ...), if you have enough (static) data in that layout, then writing a bespoke fast path around hdf5 is both possible and possibly worth it (e.g. kerchunk or http://icesat2sliderule.org/h5coro/, which @takluyver just pointed me at). However, once you start trying to support "any" hdf5 file, the complexity escalates rapidly.
https://github.com/jjhelmus/pyfive might be another option when working with fsspec (assuming the pure-python nature avoids the issues with async...). Maybe I'm missing something, but why not use a FUSE (or equivalent) file system like rclone mount, rather than relying on either the python file wrapper or waiting on HDF5 to add support for your remote protocol? |
One possible long-term idea would be to look at HDF5 gaining a read-only mode which drivers can opt into to avoid requiring locks (as I'm presuming they're there to avoid multiple conflicting writes), if the interest is primarily in read-only usage (rather than read-write, which is where I've used rclone). |
Neat, I'm glad to see someone has had a go at a pure Python HDF5 reader. On the other hand, a glance at the issues suggests it has precisely the kind of difficulties I'd expect ("HDF5 looks to be using "new style" groups when track_order=True and there are more than 8 groups...")
I was wondering about suggesting this. I don't know about FUSE in any detail, so I have some concern that if libhdf5 calling Python code is a bad idea, the kernel calling Python code might be even worse. 🤷 More pragmatically, using FUSE means an application has to make global changes to the system (mounting an additional filesystem), I guess FUSE as a whole needs to be enabled by an admin, and even though it's possible on Windows, I'd be surprised if it's simple to set up. So I imagine there will be lots of cases where FUSE isn't really an option.
There are two kinds of locking going on. The locks we're talking about are to enforce that only one thread is using HDF5 at a time. IIUC, this is to do with the data libhdf5 has in memory, in particular the fact that HDF5 IDs are reference counted: you have to ensure the counts are updated atomically, and that you don't free an object in one thread while another thread is still using it. Reference counting is also a big part of what makes it so hard to get rid of the GIL in Python.

Then there are filesystem locks, as a completely separate matter. They prevent conflicting writes, but also prevent reading and writing at the same time (unless you go into SWMR mode). By default (non-SWMR), writing may temporarily put the file in an inconsistent state which would cause problems if you were trying to read it at just that moment. But these should never result in a deadlock, because HDF5 doesn't try to wait for these locks: it errors straight away if it can't get the appropriate lock on the file.
Even when it is an option, like a hosted JupyterHub set up by an admin, I've never heard great things about performance and reliability. It might be worth re-confirming that, but thus far I've stayed away from it. |
I should have specified: of the data that's of interest to xarray, which means no complex strings, structs, tables, ragged arrays or deep hierarchies. Anecdotally, kerchunk has so far faced few problems, which were generally surmountable without too much effort (fsspec/kerchunk#40, fsspec/kerchunk#35, zarr-developers/numcodecs#273 for interest).
I can definitely believe that the HDF5 spec is an 80/20 proposition, where implementing 20% of the spec lets you read 80% of the data. Though I think we're somewhat talking at cross purposes here: if I've understood correctly, kerchunk uses h5py (and thus libhdf5) to find the chunks, so you're actually reimplementing very little of HDF5, because all you need are the filters to decompress/decode a chunk. You can avoid using libhdf5 only once someone has already used libhdf5 to make an index in another format.

I think @tacaswell is thinking of something more like Pyfive. If that works for the data you're interested in, you can avoid libhdf5 entirely, with no need to create an index beforehand. But it's massively more work even to cover normal scenarios reliably.
Yes, @takluyver, your understanding of kerchunk is correct. The initial scan of a file needs h5py, but it is done serially and has never run afoul of deadlocks (so far). Once someone has done this, others need only the resultant set of references.

On FUSE: fsspec does support this for any of its implementations, so you split the data access into (at least) two processes. That model has been used for HDF5 access, but is probably even more flaky, as well as slow. Of course, there are plenty of FUSE implementations out there.
Version info:
This script creates a deadlock, typically by the 12th or so iteration:
The notable features are:

- Using h5netcdf to open the file. Apologies for not having a completely minimal example, but I haven't been able to reproduce the hang with just h5py.
- The linked issue "Dask cluster hangs while processing file using s3fs" (dask/dask#7547 (comment)) indicates that this might be interacting with CPython's garbage collector, so it's possible that h5netcdf is creating the circular references necessary to trigger the deadlock.

Here's the traceback when I interrupt it:
Happy to provide additional info / do more debugging, but I'm at a bit of a loss right now.
I also tried to dump the gc debug info with `gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE | gc.DEBUG_LEAK)`. It's pretty verbose and I'm not sure what to make of it.

xref dask/dask#7547, pangeo-forge/pangeo-forge-recipes#177
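For anyone digging through that debug output, here's a minimal, self-contained example of what `gc.set_debug` reports when a reference cycle is collected (with `DEBUG_LEAK`, which implies `DEBUG_SAVEALL`, the collected objects are also kept in `gc.garbage` for inspection instead of being freed):

```python
import gc

# Same flags as used above: per-collection stats plus reports of every
# collectable/uncollectable object, all printed to stderr.
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE | gc.DEBUG_LEAK)

class Node:
    pass

a = Node()
b = Node()
a.other, b.other = b, a   # a reference cycle only the cyclic gc can free
del a, b

found = gc.collect()      # prints stats and the collectable objects to stderr
gc.set_debug(0)           # turn diagnostics back off
print("objects found in cycles:", found)
```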