Potential deadlock from some combination of h5py, gc, and fsspec #2019

Closed
TomAugspurger opened this issue Dec 16, 2021 · 38 comments · Fixed by #2021

@TomAugspurger
Contributor

TomAugspurger commented Dec 16, 2021

Version info:

Summary of the h5py configuration
---------------------------------

h5py    3.6.0
HDF5    1.12.1
Python  3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) 
[GCC 9.4.0]
sys.platform    linux
sys.maxsize     9223372036854775807
numpy   1.20.0
cython (built with) 0.29.24
numpy (built against) 1.18.5
HDF5 (built against) 1.12.1

This script creates a deadlock, typically by the 12th or so iteration:

import fsspec
import h5netcdf
import h5py


def main():
    url = "https://pangeo.blob.core.windows.net/pangeo-public/hurs_day_ACCESS-CM2_historical_r1i1p1f1_gn_1950.nc"
    for i in range(100):
        with fsspec.open(url).open() as o:
            h5netcdf.core.File(o, mode="r", decode_vlen_strings=True)

        print("--------------------------------------", i)


if __name__ == "__main__":
    main()

The notable features are:

  1. It uses fsspec / aiohttp to open the URL and provide a file-like object. fsspec does some things with threads and the asyncio event loop that I don't really understand (cc @martindurant)
  2. It uses h5netcdf to open the file. Apologies for not having a completely minimal example, but I haven't been able to reproduce the hang with just h5py. The linked issue "Dask cluster hangs while processing file using s3fs" (dask/dask#7547 (comment)) indicates that this might be interacting with CPython's garbage collector, so it's possible that h5netcdf is creating the circular references necessary to trigger the deadlock

Here's the traceback when I interrupt it:

(notebook) jovyan@jupyter-taugspurger-40microsoft-2ecom:~/cmip6$ python hang.py 
-------------------------------------- 0
-------------------------------------- 1
-------------------------------------- 2
-------------------------------------- 3
-------------------------------------- 4
^CTraceback (most recent call last):
  File "hang.py", line 16, in <module>
    main()
  File "hang.py", line 10, in main
    h5netcdf.core.File(o, mode="r", decode_vlen_strings=True)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/h5netcdf/core.py", line 780, in __init__
    super(File, self).__init__(self, self._h5path)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/h5netcdf/core.py", line 309, in __init__
    self._current_dim_sizes[k] = self._determine_current_dimension_size(
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/h5netcdf/core.py", line 382, in _determine_current_dimension_size
    name = _name_from_dimension(var_d)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/h5netcdf/core.py", line 51, in _name_from_dimension
    return dim[0].name.split("/")[-1]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/h5py/_hl/dims.py", line 74, in __getitem__
    h5ds.iterate(self._id, self._dimension, scales.append, 0)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5ds.pyx", line 167, in h5py.h5ds.iterate
  File "h5py/h5fd.pyx", line 162, in h5py.h5fd.H5FD_fileobj_read
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/spec.py", line 1575, in readinto
    data = self.read(out.nbytes)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/implementations/http.py", line 566, in read
    return super().read(length)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/spec.py", line 1565, in read
    out = self.cache._fetch(self.loc, self.loc + length)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/caching.py", line 391, in _fetch
    self.cache = self.fetcher(start, bend)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/asyn.py", line 59, in sync
    if event.wait(1):
  File "/srv/conda/envs/notebook/lib/python3.8/threading.py", line 558, in wait
    signaled = self._cond.wait(timeout)
  File "/srv/conda/envs/notebook/lib/python3.8/threading.py", line 306, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt

Happy to provide additional info / do more debugging, but I'm at a bit of a loss right now.

I also tried to dump the gc debug info with gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE | gc.DEBUG_LEAK). It's pretty verbose and I'm not sure what to make of it.

gc: collecting generation 0...
gc: objects in each generation: 747 3197 24898
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 731 3641 24889
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 637 4012 24889
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 746 4268 24889
gc: objects in permanent generation: 0
gc: collectable <_CountingAttr 0x7fbe6595b2c0>
gc: collectable <dict 0x7fbe65947dc0>
gc: collectable <type 0x55d947f89c80>
gc: collectable <getset_descriptor 0x7fbe65959340>
gc: collectable <getset_descriptor 0x7fbe65959480>
gc: collectable <tuple 0x7fbe659592c0>
gc: collectable <_CountingAttr 0x7fbe29654f40>
gc: collectable <dict 0x7fbe65959300>
gc: collectable <type 0x55d947f8d2a0>
gc: collectable <getset_descriptor 0x7fbe2972c340>
gc: collectable <getset_descriptor 0x7fbe6599f740>
gc: collectable <tuple 0x7fbe65965900>
gc: collectable <_CountingAttr 0x7fbe6595b4a0>
gc: collectable <_CountingAttr 0x7fbe6595b540>
gc: collectable <_CountingAttr 0x7fbe6595b5e0>
gc: collectable <dict 0x7fbe65959540>
gc: collectable <type 0x55d947f8e590>
gc: collectable <getset_descriptor 0x7fbe29af18c0>
gc: collectable <getset_descriptor 0x7fbe6596a300>
gc: collectable <tuple 0x7fbe65965b40>
gc: collectable <_CountingAttr 0x7fbe6595b860>
gc: collectable <dict 0x7fbe659659c0>
gc: collectable <type 0x55d947f8fe40>
gc: collectable <getset_descriptor 0x7fbe6596ac40>
gc: collectable <getset_descriptor 0x7fbe6596ac80>
gc: collectable <tuple 0x7fbe6596a540>
gc: collectable <_CountingAttr 0x7fbe6595b9a0>
gc: collectable <dict 0x7fbe6596a080>
gc: collectable <type 0x55d947f90de0>
gc: collectable <getset_descriptor 0x7fbe2984f300>
gc: collectable <getset_descriptor 0x7fbe6596f3c0>
gc: collectable <tuple 0x7fbe6596a100>
gc: collectable <_CountingAttr 0x7fbe6595bae0>
gc: collectable <dict 0x7fbe6596a380>
gc: collectable <type 0x55d947f91fd0>
gc: collectable <getset_descriptor 0x7fbe6596fb40>
gc: collectable <getset_descriptor 0x7fbe6596fb80>
gc: collectable <tuple 0x7fbe6596f180>
gc: done, 38 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 1...
gc: objects in each generation: 816 4758 24889
gc: objects in permanent generation: 0
gc: collectable <_CountingAttr 0x7fbe6595bc20>
gc: collectable <_CountingAttr 0x7fbe6595bcc0>
gc: collectable <dict 0x7fbe6596f8c0>
gc: collectable <type 0x55d947f93f90>
gc: collectable <getset_descriptor 0x7fbe29e25840>
gc: collectable <getset_descriptor 0x7fbe2999f500>
gc: collectable <tuple 0x7fbe6596fdc0>
gc: collectable <_CountingAttr 0x7fbe658fa0e0>
gc: collectable <_CountingAttr 0x7fbe658fa180>
gc: collectable <_CountingAttr 0x7fbe658fa220>
gc: collectable <dict 0x7fbe65973240>
gc: collectable <type 0x55d947f98b90>
gc: collectable <getset_descriptor 0x7fbe658fe1c0>
gc: collectable <getset_descriptor 0x7fbe658fe200>
gc: collectable <tuple 0x7fbe65973b40>
gc: collectable <_CountingAttr 0x7fbe658fa7c0>
gc: collectable <_CountingAttr 0x7fbe658fa860>
gc: collectable <_CountingAttr 0x7fbe658fa900>
gc: collectable <_CountingAttr 0x7fbe658fa9a0>
gc: collectable <dict 0x7fbe658fe780>
gc: collectable <type 0x55d947f9c080>
gc: collectable <getset_descriptor 0x7fbe299c2d40>
gc: collectable <getset_descriptor 0x7fbe299c2e80>
gc: collectable <tuple 0x7fbe65901740>
gc: done, 24 unreachable, 0 uncollectable, 0.0009s elapsed
gc: collecting generation 0...
gc: objects in each generation: 662 0 29857
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 750 490 29832
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 694 972 29832
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 724 1249 29832
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 705 1895 29832
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 705 2166 29832
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 683 2421 29832
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 798 2624 29832
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 852 2937 29832
gc: objects in permanent generation: 0
gc: collectable <dict 0x7fbe6568bbc0>
gc: collectable <type 0x55d94802dbe0>
gc: collectable <getset_descriptor 0x7fbe6568bc00>
gc: collectable <getset_descriptor 0x7fbe6568bcc0>
gc: collectable <tuple 0x7fbe65669ec0>
gc: collectable <dict 0x7fbe6561c680>
gc: collectable <type 0x55d948011050>
gc: collectable <getset_descriptor 0x7fbe6561cf00>
gc: collectable <getset_descriptor 0x7fbe6561cf40>
gc: collectable <tuple 0x7fbe6561ca00>
gc: done, 10 unreachable, 0 uncollectable, 0.0002s elapsed
gc: collecting generation 0...
gc: objects in each generation: 777 3456 29832
gc: objects in permanent generation: 0
gc: collectable <dict 0x7fbe65626680>
gc: collectable <type 0x55d94803c5c0>
gc: collectable <getset_descriptor 0x7fbe6562b380>
gc: collectable <getset_descriptor 0x7fbe65621bc0>
gc: collectable <tuple 0x7fbe65621800>
gc: done, 5 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 834 3924 29832
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 1...
gc: objects in each generation: 853 4500 29832
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0011s elapsed
gc: collecting generation 0...
gc: objects in each generation: 912 0 34707
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 652 530 34707
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 815 897 34707
gc: objects in permanent generation: 0
gc: collectable <_CountingAttr 0x7fbe654505e0>
gc: collectable <dict 0x7fbe653e6900>
gc: collectable <type 0x55d9480741f0>
gc: collectable <getset_descriptor 0x7fbe299c2e40>
gc: collectable <getset_descriptor 0x7fbe653e9180>
gc: collectable <tuple 0x7fbe6545d100>
gc: collectable <dict 0x7fbe653e9200>
gc: collectable <type 0x55d948075e60>
gc: collectable <getset_descriptor 0x7fbe653e9b80>
gc: collectable <getset_descriptor 0x7fbe653e9bc0>
gc: collectable <tuple 0x7fbe653e6680>
gc: done, 11 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 742 1406 34699
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 795 1789 34690
gc: objects in permanent generation: 0
gc: collectable <dict 0x7fbe65382d00>
gc: collectable <type 0x55d948082290>
gc: collectable <getset_descriptor 0x7fbe65382d80>
gc: collectable <getset_descriptor 0x7fbe65382dc0>
gc: collectable <tuple 0x7fbe656f18c0>
gc: collectable <dict 0x7fbe65382ec0>
gc: collectable <type 0x55d94808dbb0>
gc: collectable <getset_descriptor 0x7fbe65388980>
gc: collectable <getset_descriptor 0x7fbe653889c0>
gc: collectable <tuple 0x7fbe656f1680>
gc: collectable <dict 0x7fbe65388140>
gc: collectable <type 0x55d94808f000>
gc: collectable <getset_descriptor 0x7fbe6538e200>
gc: collectable <getset_descriptor 0x7fbe6538e240>
gc: collectable <tuple 0x7fbe653708c0>
gc: collectable <dict 0x7fbe653887c0>
gc: collectable <type 0x55d948090450>
gc: collectable <getset_descriptor 0x7fbe6538ea00>
gc: collectable <getset_descriptor 0x7fbe6538ea40>
gc: collectable <tuple 0x7fbe65370400>
gc: done, 20 unreachable, 0 uncollectable, 0.0002s elapsed
gc: collecting generation 0...
gc: objects in each generation: 753 2343 34689
gc: objects in permanent generation: 0
gc: collectable <dict 0x7fbe6538e300>
gc: collectable <type 0x55d948092d30>
gc: collectable <getset_descriptor 0x7fbe65396a40>
gc: collectable <getset_descriptor 0x7fbe65396a80>
gc: collectable <tuple 0x7fbe65372b00>
gc: collectable <type 0x55d9480941a0>
gc: collectable <getset_descriptor 0x7fbe65303480>
gc: collectable <dict 0x7fbe6538eac0>
gc: collectable <getset_descriptor 0x7fbe653034c0>
gc: collectable <tuple 0x7fbe65372b80>
gc: collectable <type 0x55d948095110>
gc: collectable <getset_descriptor 0x7fbe65303c00>
gc: collectable <dict 0x7fbe6538eb00>
gc: collectable <getset_descriptor 0x7fbe65303c40>
gc: collectable <tuple 0x7fbe65303900>
gc: collectable <type 0x55d9480962d0>
gc: collectable <getset_descriptor 0x7fbe6530b340>
gc: collectable <dict 0x7fbe65396280>
gc: collectable <getset_descriptor 0x7fbe6530b380>
gc: collectable <tuple 0x7fbe6530b040>
gc: collectable <type 0x55d948097490>
gc: collectable <getset_descriptor 0x7fbe6530ba40>
gc: collectable <dict 0x7fbe65396b40>
gc: collectable <getset_descriptor 0x7fbe6530ba80>
gc: collectable <tuple 0x7fbe6530b740>
gc: collectable <type 0x55d948098650>
gc: collectable <getset_descriptor 0x7fbe65312180>
gc: collectable <dict 0x7fbe653039c0>
gc: collectable <getset_descriptor 0x7fbe653121c0>
gc: collectable <tuple 0x7fbe6530be40>
gc: collectable <dict 0x7fbe6530b800>
gc: collectable <type 0x55d948099810>
gc: collectable <getset_descriptor 0x7fbe65312880>
gc: collectable <getset_descriptor 0x7fbe653128c0>
gc: collectable <tuple 0x7fbe65312580>
gc: collectable <dict 0x7fbe6530bb40>
gc: collectable <type 0x55d947bf8810>
gc: collectable <getset_descriptor 0x7fbe653151c0>
gc: collectable <getset_descriptor 0x7fbe65315200>
gc: collectable <tuple 0x7fbe65312ac0>
gc: done, 40 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 898 2900 34648
gc: objects in permanent generation: 0
gc: collectable <dict 0x7fbe65312940>
gc: collectable <type 0x55d94809e190>
gc: collectable <getset_descriptor 0x7fbe6531b500>
gc: collectable <getset_descriptor 0x7fbe6531b540>
gc: collectable <tuple 0x7fbe65315e40>
gc: collectable <dict 0x7fbe65315280>
gc: collectable <type 0x55d94809f380>
gc: collectable <getset_descriptor 0x7fbe6531be40>
gc: collectable <getset_descriptor 0x7fbe6531be80>
gc: collectable <tuple 0x7fbe6531b7c0>
gc: collectable <dict 0x7fbe65375080>
gc: collectable <type 0x55d9480a0fe0>
gc: collectable <getset_descriptor 0x7fbe6536ddc0>
gc: collectable <getset_descriptor 0x7fbe65372f80>
gc: collectable <tuple 0x7fbe65629440>
gc: done, 15 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 504 3559 34646
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 788 3925 34646
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 707 4336 34646
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 633 4317 34646
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 1...
gc: objects in each generation: 1002 4639 34646
gc: objects in permanent generation: 0
gc: collectable <type 0x55d948071900>
gc: collectable <tuple 0x7fbe653e6540>
gc: collectable <dict 0x7fbe653e6080>
gc: collectable <getset_descriptor 0x7fbe653e65c0>
gc: collectable <getset_descriptor 0x7fbe653e6780>
gc: collectable <type 0x55d9480918c0>
gc: collectable <tuple 0x7fbe65372a80>
gc: collectable <dict 0x7fbe65388ac0>
gc: collectable <getset_descriptor 0x7fbe65396180>
gc: collectable <getset_descriptor 0x7fbe653961c0>
gc: collectable <type 0x55d94809cfa0>
gc: collectable <tuple 0x7fbe653154c0>
gc: collectable <dict 0x7fbe6530bf80>
gc: collectable <getset_descriptor 0x7fbe65315b40>
gc: collectable <getset_descriptor 0x7fbe65315b80>
gc: done, 15 unreachable, 0 uncollectable, 0.0014s elapsed
gc: collecting generation 0...
gc: objects in each generation: 685 0 39783
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0003s elapsed
gc: collecting generation 0...
gc: objects in each generation: 909 407 39744
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0003s elapsed
gc: collecting generation 0...
gc: objects in each generation: 825 698 39744
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 852 1087 39744
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 826 1371 39730
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 453 1989 39710
gc: objects in permanent generation: 0
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 619 2246 39710
gc: objects in permanent generation: 0
gc: collectable <cell 0x7fbe64f53520>
gc: collectable <tuple 0x7fbe64f69490>
gc: collectable <function 0x7fbe64f52b80>
gc: collectable <File 0x7fbe64f533d0>
gc: collectable <File 0x7fbe64f535e0>
gc: collectable <h5py.h5f.FileID 0x7fbe64f41a40>
gc: collectable <tuple 0x7fbe64f7c380>
gc: collectable <dict 0x7fbe64f75b40>
gc: collectable <dict 0x7fbe64f7c6c0>
gc: collectable <ChainMap 0x7fbe64f53550>
gc: collectable <list 0x7fbe64f7c180>
gc: collectable <dict 0x7fbe64f54880>
gc: collectable <ChainMap 0x7fbe64f69550>
gc: collectable <list 0x7fbe64efb680>
gc: collectable <dict 0x7fbe29b14840>
gc: collectable <ChainMap 0x7fbe64f69580>
gc: collectable <list 0x7fbe64efb780>
gc: collectable <dict 0x7fbe64f75ec0>
gc: collectable <h5py.h5g.GroupID 0x7fbe64eed950>
gc: collectable <Group 0x7fbe64f69460>
gc: collectable <dict 0x7fbe64ef6c80>
gc: collectable <ChainMap 0x7fbe64f69250>
gc: collectable <list 0x7fbe64efb880>
gc: collectable <dict 0x7fbe64eef280>
gc: collectable <_LazyObjectLookup 0x7fbe64f693d0>
gc: collectable <dict 0x7fbe64ef6ac0>
gc: collectable <collections.OrderedDict 0x7fbe64fa4bc0>
gc: collectable <_LazyObjectLookup 0x7fbe64f693a0>
gc: collectable <dict 0x7fbe64f75ac0>
gc: collectable <collections.OrderedDict 0x7fbe64fa4f40>
gc: collectable <cell 0x7fbe64ef5730>
gc: collectable <tuple 0x7fbe64ef56a0>
gc: collectable <function 0x7fbe64f66310>
gc: collectable <File 0x7fbe64f69400>
gc: collectable <File 0x7fbe64f69340>
gc: collectable <h5py.h5f.FileID 0x7fbe64ef3f40>
gc: collectable <tuple 0x7fbe64ef6c40>
gc: collectable <dict 0x7fbe64ef6bc0>
gc: collectable <dict 0x7fbe64eefdc0>
gc: collectable <ChainMap 0x7fbe64f696a0>
gc: collectable <list 0x7fbe64efba00>
gc: collectable <dict 0x7fbe64f7c880>
gc: collectable <ChainMap 0x7fbe64f697c0>
gc: collectable <list 0x7fbe64efb0c0>
gc: collectable <dict 0x7fbe64ef6f00>
gc: collectable <ChainMap 0x7fbe64ef5b20>
gc: collectable <list 0x7fbe64efbd80>
gc: collectable <dict 0x7fbe64ef6b80>
gc: collectable <h5py.h5g.GroupID 0x7fbe64ef3e50>
gc: collectable <Group 0x7fbe64ef58b0>
gc: collectable <dict 0x7fbe64efbb00>
gc: collectable <ChainMap 0x7fbe64ef5880>
gc: collectable <list 0x7fbe64efb300>
gc: collectable <dict 0x7fbe64ef6c00>
gc: collectable <_LazyObjectLookup 0x7fbe64ef59a0>
gc: collectable <dict 0x7fbe64efb280>

xref dask/dask#7547, pangeo-forge/pangeo-forge-recipes#177

@martindurant

It uses fsspec / aiohttp to open the URL and provide a file-like object. fsspec does some things with threads and the asyncio event loop that I don't really understand

Specifically, it runs an event loop on another thread, and submits coroutines to it and waits for their completion. Such coroutines will be created every time a read is outside of the file-like object's cache of bytes, and most of the running code will be within aiohttp.
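
For illustration, here is a minimal, hypothetical sketch of that sync-over-async pattern (not fsspec's actual code): a daemon thread runs an event loop, and synchronous callers submit coroutines to it with asyncio.run_coroutine_threadsafe and block on the result.

import asyncio
import threading

# Hypothetical sketch of the pattern described above; fsspec's real code differs.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, name="IO", daemon=True).start()

async def fetch_range(start, end):
    # stand-in for an aiohttp range request
    await asyncio.sleep(0)
    return b"\x00" * (end - start)

def read_range(start, end):
    # submit the coroutine to the background loop and block until it completes
    future = asyncio.run_coroutine_threadsafe(fetch_range(start, end), loop)
    return future.result()

print(len(read_range(0, 1024)))  # runs on the caller's thread, I/O on "IO"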

@TomAugspurger
Contributor Author

I built h5py from source and set h5py._objects.DEBUG_ID = True. The output is at https://gist.github.com/TomAugspurger/c055a681bb06f30071e71cc24e13efcc, but I'm not really sure it'll be helpful.

I'm not even sure what to look for right now. In theory, we should have two threads trying to acquire the lock, each waiting on the other, right? Or perhaps there are multiple locks in flight?

@tacaswell
Member

If you put a gc.collect() either in the loop or in the context manager does it make things better or worse?

This is where we do the most complicated things with gc (we temporarily turn it off 😱 to get around a race condition)

h5py/h5py/_objects.pyx

Lines 113 to 153 in e3740aa

@with_phil
def nonlocal_close():
    """ Find dead ObjectIDs and set their integer identifiers to 0.
    """
    cdef ObjectID obj
    cdef list reg_ids

    # create a cached list of ids whilst the gc is disabled to avoid hitting
    # the cyclic gc while iterating through the registry dict
    gc.disable()
    try:
        reg_ids = list(registry)
    finally:
        gc.enable()

    for python_id in reg_ids:
        ref = registry.get(python_id)

        # registry dict has changed underneath us, skip to next item
        if ref is None:
            continue

        obj = ref()

        # Object died while walking the registry list, presumably because
        # the cyclic GC kicked in.
        if obj is None:
            continue

        # Locked objects are immortal, as they generally are provided by
        # the HDF5 library itself (property list classes, etc.).
        if obj.locked:
            continue

        # Invalid object; set obj.id = 0 so it doesn't become a zombie
        if not H5Iis_valid(obj.id):
            IF DEBUG_ID:
                print("NONLOCAL - invalidating %d of kind %s HDF5 id %d" %
                      (python_id, type(obj), obj.id))
            obj.id = 0
            continue

There is also a libhdf5-level lock, but that should be completely protected by h5py's phil.

Given that traceback, it looks like it is in a waiting loop inside of fsspec waiting to get bytes back. If I am understanding this right, fsspec is completely naive to h5py / hdf5 and, aside from the file-like interface, vice versa? My guess is that this is a race condition in fsspec's cache where something is noted to be there and then evicted before it can be accessed.

@martindurant

This is where we do the most complicated things with gc (we temporarily turn it off

...and also explicitly turn it back on again, whatever the calling code had in mind.

@TomAugspurger
Contributor Author

If you put a gc.collect() either in the loop or in the context manager does it make things better or worse?

This (either inside or outside the context manager) seems to completely avoid the hanging. I'll run it some more times to see if it ever shows up.

If I am understanding this right, fsspec is completely naive to h5py / hdf5 and, aside from the file-like interface, vice versa?

Yes, I think that's correct.

@martindurant

My guess is that this is race condition in fsspec's cache where something is noted to be there and then evicted before it can be accessed.

To be clear, this only happens in combination with h5. I'm not sure we have proof that the gc is involved, beyond @TomAugspurger's statement that calling a collect might prevent the problem. Collecting garbage might be more complex in the presence of h5netcdf/xarray.

@TomAugspurger
Contributor Author

FWIW, it might not be caching specifically. The following should disable fsspec's caching, and it still reproduces the deadlock:

import fsspec
import h5netcdf
import h5py
import gc


def main():
    url = "https://pangeo.blob.core.windows.net/pangeo-public/hurs_day_ACCESS-CM2_historical_r1i1p1f1_gn_1950.nc"
    for i in range(100):
        with fsspec.open(url, cache_type="none").open() as o:
            h5netcdf.core.File(o, mode="r", decode_vlen_strings=True)
            # gc.collect()

        print("--------------------------------------", i)


if __name__ == "__main__":
    main()

So more likely, it's some interaction between the multiple threads / event loops.

@tacaswell
Member

ok, a theory:

  1. h5netcdf.core.File is not being explicitly closed, so it is not explicitly closing the underlying h5py objects
  2. reading from s3 is very slow, so h5py is spending a very long time holding phil while waiting for bytes
  3. it looks like there is a 1 s poll loop in fsspec (the if event.wait(1): in the traceback), which I assume is some "keep trying for a timeout" logic
  4. because there is a non-trivial amount of Python being processed on both the foreground and background threads, eventually one of them decides to do a gc run
  5. because the gc runs on the thread that triggers it, if it is the foreground thread this is OK because it has phil and the nonlocal close can run, but if it is the background thread we get the deadlock (see the diagnostic sketch after this list)
  6. we do not see this under most circumstances because phil tends to protect things that are not waiting on another thread, so if we do get this conflict the background thread just has to wait 😢 but eventually whatever the main thread is doing finishes, gc finishes, and life moves on (with maybe a surprisingly big latency, as now the gc may be waiting on disk I/O).
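
As a hypothetical way to check which thread a collection actually runs on (not something used in this thread), gc.callbacks can log the triggering thread:

import gc
import threading

def report(phase, info):
    # invoked by the interpreter at the start and end of every collection,
    # on whichever thread happened to trigger it
    if phase == "start":
        print("gc triggered on thread:", threading.current_thread().name)

gc.callbacks.append(report)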

Another test is to disable / re-enable gc around the read.

Another test would be to start putting prints showing the thread name just before grabbing the lock in

h5py/h5py/_objects.pyx

Lines 194 to 206 in e3740aa

def __dealloc__(self):
    with _phil:
        IF DEBUG_ID:
            print("DEALLOC - unregistering %d HDF5 id %d" % (self._pyid, self.id))
        if is_h5py_obj_valid(self) and (not self.locked):
            if H5Idec_ref(self.id) < 0:
                warnings.warn(
                    "Reference counting issue with HDF5 id {}".format(
                        self.id
                    )
                )
        if self._pyid is not None:
            del registry[self._pyid]

@TomAugspurger
Contributor Author

Thanks a ton for helping with this @tacaswell. I'm going to try to return to it later today or tomorrow (going to try the gc.collect() workaround first, but this would be worth solving properly if possible)

@TomAugspurger
Contributor Author

As an aside, I recently learned that referencing self in __dealloc__ is potentially dangerous https://cython.readthedocs.io/en/latest/src/userguide/special_methods.html#finalization-method-dealloc

By the time your dealloc() method is called, the object may already have been partially destroyed and may not be in a valid state as far as Python is concerned, so you should avoid invoking any Python operations which might touch the object. In particular, don’t call any other methods of the object or do anything which might cause the object to be resurrected. It’s best if you stick to just deallocating C data.

@TomAugspurger
Contributor Author

I added some debug to _locks.pxi. This will print out

  • the stage (before / after the lock is acquired / released)
  • the ID of the thread acquiring / releasing the lock, the ID of the thread that owns the lock
  • the re-entry count
  • the pending requests
  • whether the real lock is already acquired
diff --git a/h5py/_locks.pxi b/h5py/_locks.pxi
index bc8b2dd9..50063d0f 100644
--- a/h5py/_locks.pxi
+++ b/h5py/_locks.pxi
@@ -51,13 +51,18 @@ cdef class FastRLock:
 
     def __enter__(self):
         # self.acquire()
-        return lock_lock(self, pythread.PyThread_get_thread_ident(), True)
+        print("before locklock", pythread.PyThread_get_thread_ident(), self._owner, self._count, self._pending_requests, self._is_locked)
+        l = lock_lock(self, pythread.PyThread_get_thread_ident(), True)
+        print("after  locklock", pythread.PyThread_get_thread_ident(), self._owner, self._count, self._pending_requests, self._is_locked)
+        return l
 
     def __exit__(self, t, v, tb):
         # self.release()
         if self._owner != pythread.PyThread_get_thread_ident():
             raise RuntimeError("cannot release un-acquired lock")
+        print("before   unlock", pythread.PyThread_get_thread_ident(), self._owner, self._count, self._pending_requests, self._is_locked)
         unlock_lock(self)
+        print("after    unlock", pythread.PyThread_get_thread_ident(), self._owner, self._count, self._pending_requests, self._is_locked)
 
     def _is_owned(self):
         return self._owner == pythread.PyThread_get_thread_ident()

Things progress normally for a while, but eventually

# stage         caller id       owner id
before locklock 140712351053632 140712351053632 1 0 False
after  locklock 140712351053632 140712351053632 2 0 False
before locklock 140712351053632 140712351053632 2 0 False
after  locklock 140712351053632 140712351053632 3 0 False
before locklock 140711700506368 140712351053632 3 0 False

The last line shows thread 140711700506368 trying to acquire the lock. It's the first time that thread ID shows up in the logs. I suspect that is on the FsspecIO thread, but I need to confirm that.
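
A small, hypothetical helper for confirming that (not part of the patch above): map the printed ident back to a thread name via threading.enumerate().

import threading

def thread_name(ident):
    # map a thread ident (as printed by the debug patch) back to its name
    for t in threading.enumerate():
        if t.ident == ident:
            return t.name
    return "<no live thread with that ident>"

# e.g. thread_name(140711700506368) while the process is hung; if the
# suspicion above is right, it would report fsspec's IO thread.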

@takluyver
Member

Ugh. I suspect this is hard to totally avoid if you pass in a Python file-like object that uses threads internally. The ability to open a file-like object is... not quite a hack, but I'd hesitate to use it for anything important. I don't think HDF5 is expecting a read call to go into a Python VM.

It looks like fsspec is wrapping async code in a thread to present it as a synchronous interface. If you found something like fsspec that just used synchronous code directly, it would probably avoid this problem. All abstractions are leaky. 🙂

@takluyver
Member

It looks like @tacaswell's theory of what's going on matches the theory in this comment on the linked Dask issue.

@takluyver
Member

Taking the gc.collect() workaround one step further: temporarily disable automatic garbage collection (gc.disable() / gc.enable()), and call gc.collect() in the loop. This should ensure that garbage collection never occurs in the fsspec IO thread, no matter what else the code does.
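
Applied to the reproducer from the top of this issue, that suggestion would look roughly like this (a sketch only, assuming the same script and URL as above):

import gc

import fsspec
import h5netcdf


def main():
    url = "https://pangeo.blob.core.windows.net/pangeo-public/hurs_day_ACCESS-CM2_historical_r1i1p1f1_gn_1950.nc"
    gc.disable()  # keep automatic collection off the fsspec IO thread
    try:
        for i in range(100):
            with fsspec.open(url).open() as o:
                h5netcdf.core.File(o, mode="r", decode_vlen_strings=True)

            gc.collect()  # collect deterministically, on the main thread
            print("--------------------------------------", i)
    finally:
        gc.enable()


if __name__ == "__main__":
    main()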

@tacaswell
Member

🤦🏻 failed to read all the links.

I think the story is a little bit worse lock-wise: it is not only our phil; there is also a libhdf5-level lock protecting their caches which I suspect (but cannot prove) will also cause the same deadlock for the same reason.

@TomAugspurger
Contributor Author

Thanks again all. I think I'm starting to get a clearer picture of what's going on. #2020 will be good, but other than that I don't see much else that h5py can do. It seems like a read from h5py / HDF triggering Python code is fundamentally problematic if the garbage collector can kick in at any time.

h5netcdf might be able to alleviate the problem slightly by creating fewer circular references that then need to be cleaned up by the gc. But even then, there's probably not a ton that can be done.

fsspec might be able to provide an explicit single-threaded, sync-only interface, so that we don't hit the same issue? I could use some help understanding this: why couldn't the gc kick in (and try to acquire a lock) at a time when the main thread already held a lock?

And finally, application code can mess with gc directly (disabling, explicit collections) to try to avoid the problem.

So... I guess we close this issue?

@takluyver
Member

fsspec might be able to provide an explicit single-threaded, sync-only interface, we might not hit the same issue? I could use some help on understanding this. Why couldn't the gc kick in (and try to acquire a lock) at a time when the main thread already held a lock?

The GC could kick in, but if it runs on the thread that already has the lock, it should be OK (it's a re-entrant lock, so the same thread can acquire it several times). We think it's only when GC happens on another thread that this bites.

(There may still be other problems - I doubt HDF5 really expects that something will pause during a read and do some random other HDF5 operations. But I'd guess that so long as the file it's trying to read from isn't affected by the garbage collection, it will probably be OK. 🤞 )
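
For reference, the re-entrancy in question is the ordinary threading.RLock behaviour:

import threading

lock = threading.RLock()

with lock:
    with lock:  # the owning thread may re-acquire; a plain Lock would deadlock here
        print("re-entered the lock on the same thread")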

@TomAugspurger
Contributor Author

(it's a re-entrant lock, so the same thread can acquire it several times)

Ah right, that's the bit I was missing.

@martindurant what are your thoughts on an explicit sync mode? This sounds like a decent amount of work, but maybe worthwhile? Otherwise, I don't see a way to safely use (async) fsspec filesystems with NetCDF files.

@martindurant

what are your thoughts on an explicit sync mode?

That would amount to a completely new backend for http, and each of the async implementations (s3, gcs, adl/b) would need to have sync variants too. Plus, of course, you would lose the benefits of concurrent operations.

h5netcdf is in a good place to set the gc state during reads; or even xarray (which is the only place, in practice, that multiple hdf files are in flight at once, so that references become a problem).

Might it be possible to explicitly call the h5py cleanup actions in the main thread without waiting for the gc and deallocation?

One potentially interesting thing (an added complication) is the new fsspec.parquet module, which shows how you might prospectively fetch all the pieces of a file when you know where those pieces are, so that your reading application accesses them all from cache and makes no remote reads. This would require dedicated code for hdf (maybe also needing the h5py library to bootstrap), and I don't know how practical that is. The idea is almost like a mini-kerchunk.

@tacaswell
Member

I think we should close this issue with a PR to add a warning to the docs on the file-like usage with threads in the implementation. Working on that now.

Might it be possible to explicitly call the h5py cleanup actions in the main thread without waiting for the gc and deallocation?

Yes, if you can find and break all of the circular references / replace some of them with weakrefs etc.

@TomAugspurger If you still have a dev environment set up with this, it might also be worth trying with h5py's locking disabled (I think this is a build-time choice). You may risk a segfault, but it will tell us if this is something we can work around at the Python level (I am imagining a context manager to use as an escape hatch on phil to say "I'm releasing this but you have to promise to do no hdf5 work on any other thread". A massive foot-cannon, but might be worth it?).

@takluyver
Member

Might it be possible to explicitly call the h5py cleanup actions in the main thread without waiting for the gc and deallocation?

If you have enough control over the code in question, yes, it should be. You can explicitly close all HDF5 objects - in h5py, this is like file.close() or dataset.id.close(), and I assume it's accessible from h5netcdf somehow. Or if you can ensure there are no reference cycles, objects should be cleaned up deterministically as soon as they go out of scope (reference count hits 0).

The downside is it's easy to slip up and do something apparently innocuous which makes the code hang. And the hangs might not appear consistently, and will almost certainly hop around if you change things to try to debug them...
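
As a concrete illustration of that (a sketch only; it assumes h5netcdf.File can be used as a context manager, which the original reproducer did not do):

import fsspec
import h5netcdf

url = "https://pangeo.blob.core.windows.net/pangeo-public/hurs_day_ACCESS-CM2_historical_r1i1p1f1_gn_1950.nc"

with fsspec.open(url).open() as o:
    # closing the h5netcdf File explicitly releases the underlying h5py/HDF5
    # objects on this thread instead of leaving them to the cyclic gc
    with h5netcdf.File(o, mode="r", decode_vlen_strings=True) as f:
        print(list(f.dimensions))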

@rabernat

I would just like to chime in to say THANK YOU to the h5py maintainers for responding so quickly and helpfully to this tricky issue, which has become a major stumbling block for a downstream project. I can't claim to understand all of the details being discussed here, but it seems we have several promising leads to investigate. 🙏

@TomAugspurger
Contributor Author

TomAugspurger commented Dec 17, 2021

+1 to everything Ryan said.

@TomAugspurger If you still have a dev environment set up with this, it might also be worth trying with h5py's locking disabled

Yeah, that seemed to avoid the hanging. I didn't get a segfault in the 100 reads I did, but maybe I just got lucky.


I'll make a small PR to h5netcdf that avoids the problematic circular references. It seems to not hang in basic testing.

I do worry about the fragility of this kind of fix, since it'll be very easy for h5netcdf to reintroduce a circular reference, and it only fixes things for one library in the stack. But it's a start.

edit: h5netcdf/h5netcdf#117

@takluyver
Copy link
Member

Thanks all!

To reiterate, if there's any reasonable alternative to passing h5py a Python level file-like object, I'd say it's worth investigating. The docs already mention alternatives for a couple of cases (in-memory HDF5, secure temp files). There's the optional read-only S3 driver (although we don't build this for the wheels, because it would mean bundling C libraries for network access and encryption, but e.g. conda-forge enables it). But unfortunately I don't know of a direct alternative for reading an HDF5 file over http(s). And if you do need the file-like object support, you want the internals of the 'file' you give h5py to be as simple as possible.
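
One way to keep the internals of the 'file' handed to h5py as simple as possible (a sketch, under the assumption that the file is small enough to download up front; nothing in this thread requires that) is to fetch the bytes first and give h5py a plain io.BytesIO, so no threads or event loops are involved in the read path:

import io

import fsspec
import h5py

url = "https://pangeo.blob.core.windows.net/pangeo-public/hurs_day_ACCESS-CM2_historical_r1i1p1f1_gn_1950.nc"

with fsspec.open(url).open() as remote:
    buf = io.BytesIO(remote.read())  # all network I/O happens here, up front

with h5py.File(buf, mode="r") as f:  # h5py only ever sees an in-memory file
    print(list(f.keys()))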

@takluyver
Member

Closing this because there's nothing obvious to do in h5py. We can reopen if people feel this is premature.

@TomAugspurger
Contributor Author

TomAugspurger commented Dec 18, 2021

Thanks for the link to those docs. Agreed that this is all pretty fragile. Ideally the C HDF5 library would support reading files over http.

And thanks again for all the help in debugging this!

@takluyver
Member

Doing it in the C library isn't really ideal either - we'd have to bundle some more C libraries for that (IIRC it uses libcurl & libopenssl for S3), which is then an additional headache for security. I struggle to think of any really good way to do this.

@martindurant

Ideally the C HDF5 library would support reading files over http.

I want to strongly object to this, and the line of reasoning preceding it. The adoption of python file-like objects into h5py/hdf, and duck-typing via fsspec, has been a big success, and has proven the dynamism and rapid progress that can only reasonably be achieved in a high-level language. As a point of reference, the arrow project, which has immensely more developer manpower than fsspec, has only implemented local, s3 and hdfs after several years (cf. here and here for the list of fsspec implementations).

Yes, there are problems, but that's why we are here! Probably no one should be monkey-patching instance attributes, and it shouldn't be that hard to break cycles. Async is hard, but usually worth the cost. It would indeed be easy to make independent sync versions of http et al., if someone thought it necessary (they all used to be sync).

Furthermore, the kerchunk idea shows just how versatile one can be in this framework: you no longer need the hdf5 C library at read time at all - at least in some cases, if you do some work up-front, using h5py to find offsets. You get concurrent and parallel reads, and much faster metadata ingest. Conversely, one could follow the fsspec.parquet pattern to prospectively load all the chunks from an hdf5 file into memory cache (concurrently) by having a format-specific interface.

@takluyver
Member

Yes, there are problems, but that's why we are here! Probably no one should be monkey-patching instance attributes, and it shouldn't be that hard to break cycles.

This case was a problem caused by cycles and gc in combination with threads, but I think the general problem here is something we can't really fix. HDF5 doesn't expect a file driver to run Python code, with all the runtime complexities that brings.

E.g. if you write a program where more than one thread uses h5py, it's OK: they won't get concurrent access to HDF5, but they will all get access in turn. But if you give h5py a Python file-like object, now the same code might be vulnerable to deadlock: one thread could be holding the HDF5 lock and waiting for the GIL (or some application lock) while another has the opposite situation. I can't really see a way to avoid this in h5py or in fsspec, so we're left saying to application developers 'this feature exists, but if you want to use it, be really careful with threads and reference cycles'. 😞

Ultimately, the only way to get round it is to reimplement part of HDF5 in a way that fits better with Python code supplying the underlying 'file' data. It sounds like this is part of what kerchunk does - reimplement the parts to read and decompress dataset chunks from an HDF5 file. But HDF5 is a massively complex format, and still growing new features, so reimplementing it completely is largely off the table.

@martindurant

But HDF5 is a massively complex format, and still growing new features, so reimplementing it completely is largely off the table.

We can get to >90% :). That's especially true for data designed to be archival, which is less likely to use new, cutting edge features of HDF.

@tacaswell
Member

I second everything @takluyver said.

We can get to >90%

I am very skeptical of this outside of narrow use-cases.

That said, I have faith that for any given file layout (filling/missing data, vds, weird float layouts, structured data, multi-file hdf, interfile links, circular groups, custom filters, target types on the client side, cross/multi-chunk dense slicing, slicing arbitrary hyperslabs, ...) if you have enough (static) data in that layout then writing a bespoke fast path around hdf5 is both possible and possibly worth it (e.g. kerchunk or http://icesat2sliderule.org/h5coro/ (which @takluyver just pointed me at)). However, once you start trying to support "any" hdf5 file the complexity escalates rapidly.

@aragilar
Copy link
Member

https://github.com/jjhelmus/pyfive might be another option when working with fsspec (assuming the pure-python nature avoids the issues with async...).

Maybe I'm missing something, but why not use a FUSE (or equivalent) file system like rclone mount, rather than relying on either the python file wrapper or waiting on HDF5 to add support for your remote protocol?

@aragilar
Member

One possible long-term idea would be to look at HDF5 gaining a read-only mode which drivers can opt into to avoid requiring locks (as I'm presuming they're there to avoid multiple conflicting writes), if the interest is primarily in read-only usage (rather than read-write, which is where I've used rclone).

@takluyver
Member

https://github.com/jjhelmus/pyfive might be another option

Neat, I'm glad to see someone has had a go at a pure Python HDF5 reader. On the other hand, a glance at the issues suggests it has precisely the kind of difficulties I'd expect ("HDF5 looks to be using "new style" groups when track_order=True and there are more than 8 groups...")

Maybe I'm missing something, but why not use a FUSE (or equivalent) file system

I was wondering about suggesting this. I don't know about FUSE in any detail, so I have some concern that if libhdf5 calling Python code is a bad idea, the kernel calling Python code might be even worse. 🤷 More pragmatically, using FUSE means an application has to make global changes to the system (mounting an additional filesystem), I guess FUSE as a whole needs to be enabled by an admin, and even though it's possible on Windows, I'd be surprised if it's simple to set up. So I imagine there will be lots of cases where FUSE isn't really an option.

One possible long-term idea would be to look at HDF5 gaining a read-only mode which drivers can opt into to avoid requiring locks (as I'm presuming they're there to avoid multiple conflicting writes),

There are two kinds of locking going on. The locks we're talking about are to enforce that only one thread is using HDF5 at a time. IIUC, this is to do with the data libhdf5 has in memory - in particular, the fact that HDF5 IDs are reference counted: you have to ensure the counts are updated atomically, and that you don't free an object in one thread while another thread is still using it. Reference counting is also a big part of what makes it so hard to get rid of the GIL in Python.

Then there are filesystem locks, as a completely separate matter. They prevent conflicting writes, but also prevent reading and writing at the same time (unless you go into SWMR mode). By default (non-SWMR), writing may temporarily put the file in an inconsistent state which would cause problems if you were trying to read it at just that moment. But these should never result in a deadlock, because HDF5 doesn't try to wait for these locks - it errors straight away if it can't get the appropriate lock on the file.

@TomAugspurger
Contributor Author

Maybe I'm missing something, but why not use a FUSE (or equivalent) file system

Even when it is an option, like a hosted JupyterHub set up by an admin, I've never heard great things about performance and reliability. It might be worth re-confirming that, but thus far I've stayed away from it.

@martindurant

We can get to >90%

I am very skeptical of this outside of narrow use-cases.

I should have specified: of the data that's of interest to xarray, which means no complex strings, structs, tables, ragged arrays or deep hierarchies. Anecdotally, kerchunk has so far faced few problems, which were generally surmountable without too much effort ( fsspec/kerchunk#40 , fsspec/kerchunk#35 , zarr-developers/numcodecs#273 for interest).

@takluyver
Member

I can definitely believe that the HDF5 spec is an 80/20 proposition, where implementing 20% of the spec lets you read 80% of the data. Though I think we're somewhat talking at cross purposes here: if I've understood correctly, kerchunk uses h5py (and thus libhdf5) to find the chunks, so you're actually reimplementing very little of HDF5, because all you need are the filters to decompress/decode a chunk. You can avoid using libhdf5 only once someone has already used libhdf5 to make an index in another format.

I think @tacaswell is thinking of something more like Pyfive. If that works for the data you're interested in, you can avoid libhdf5 entirely, with no need to create an index beforehand. But it's massively more work even to cover normal scenarios reliably.

@martindurant

Yes, @takluyver, your understanding of kerchunk is correct. The initial scan of a file needs h5py, but it is done serially and has never run foul of deadlocks (so far). Once someone has done this, others need only the resultant set of references.

On FUSE: fsspec does support this for any of its implementations, so you split the data access into (at least) two processes. That model has been used for HDF5 access, but is probably even more flaky, as well as slow. Of course, there are plenty of FUSE implementations out there.
