Description
What happened:
NumPy arrays of byte strings saved with h5py cause errors and segfaults when loaded with xarray, and the failure is not always the same.
What you expected to happen:
This works fine with engine='h5netcdf':
In [3]: ds = xr.load_dataset("test_str_list.h5", engine='h5netcdf', phony_dims='sort')
but it consistently segfaults with engine='netcdf4'.
I'm assuming this is a netcdf backend issue, but thought I'd raise it here since xarray was how I discovered it.
Minimal Complete Verifiable Example:
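import h5py
import numpy as np
import xarray as xr

# Write a 2x2 array of fixed-length byte strings with plain h5py
with h5py.File("test_str_list.h5", "w") as hf:
    hf["pairs"] = np.array([["20200101", "20200201"], ["20200101", "20200301"]]).astype("S")

# Load with xarray's default engine; this is the call that crashes
ds = xr.load_dataset("test_str_list.h5")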
*** Error in `/home/scott/miniconda3/envs/mapping/bin/python': munmap_chunk(): invalid pointer: 0x0000559c40956070 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f7c4)[0x7f4a9a6bb7c4]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5MM_xfree+0xf)[0x7f4a7a93c3ef]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5C__untag_entry+0xc6)[0x7f4a7a854836]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5C__flush_single_entry+0x275)[0x7f4a7a846085]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(+0x80de3)[0x7f4a7a846de3]
... (few thousand line backtrace)
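For context, here is a small sketch of what the array in the example above looks like before h5py writes it. `.astype("S")` converts the unicode array into fixed-width byte strings (8 bytes per element here), and h5py documents that it stores such arrays as fixed-length HDF5 strings. Whether that storage type is what trips up the netCDF4 engine is an assumption, not something established in this report; the variable names are only illustrative.

```python
import numpy as np

# Same values as in the example above
pairs = np.array([["20200101", "20200201"], ["20200101", "20200301"]])

# .astype("S") turns fixed-width unicode into fixed-width byte strings
as_bytes = pairs.astype("S")

print(pairs.dtype)     # fixed-width unicode dtype
print(as_bytes.dtype)  # fixed-width bytes dtype, 8 bytes per element

# The bytes are plain ASCII digits, so decoding them is lossless
decoded = np.char.decode(as_bytes, "ascii")
print(decoded.tolist())
```

Every element is pure ASCII, so any non-ASCII byte that shows up on read did not come from the stored values.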
Anything else we need to know?:
Even stranger, it doesn't seem to be deterministic. After the crash, I tried the same load_dataset:
In [2]: ds = xr.load_dataset("test_str_list.h5")
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-2-475169bc9c75> in <module>
----> 1 ds = xr.load_dataset("test_str_list.h5")
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/api.py in load_dataset(filename_or_obj, **kwargs)
242
243 with open_dataset(filename_or_obj, **kwargs) as ds:
--> 244 return ds.load()
245
246
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/dataset.py in load(self, **kwargs)
871 for k, v in self.variables.items():
872 if k not in lazy_data:
--> 873 v.load()
874
875 return self
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/variable.py in load(self, **kwargs)
449 self._data = as_compatible_data(self._data.compute(**kwargs))
450 elif not is_duck_array(self._data):
--> 451 self._data = np.asarray(self._data)
452 return self
453
~/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
546
547 def __array__(self, dtype=None):
--> 548 self._ensure_cached()
549 return np.asarray(self.array, dtype=dtype)
550
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in _ensure_cached(self)
543 def _ensure_cached(self):
544 if not isinstance(self.array, NumpyIndexingAdapter):
--> 545 self.array = NumpyIndexingAdapter(np.asarray(self.array))
546
547 def __array__(self, dtype=None):
~/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
516
517 def __array__(self, dtype=None):
--> 518 return np.asarray(self.array, dtype=dtype)
519
520 def __getitem__(self, key):
~/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
417 def __array__(self, dtype=None):
418 array = as_indexable(self.array)
--> 419 return np.asarray(array[self.key], dtype=None)
420
421 def transpose(self, order):
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/netCDF4_.py in __getitem__(self, key)
89
90 def __getitem__(self, key):
---> 91 return indexing.explicit_indexing_adapter(
92 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
93 )
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
708 """
709 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 710 result = raw_indexing_method(raw_key.tuple)
711 if numpy_indices.tuple:
712 # index the loaded np.ndarray
~/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key)
102 with self.datastore.lock:
103 original_array = self.get_array(needs_lock=False)
--> 104 array = getitem(original_array, key)
105 except IndexError:
106 # Catch IndexError in netCDF4 and return a more informative
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd8 in position 0: invalid continuation byte
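As a rough illustration of why this error is suspicious: the stored values are ASCII digits, which always decode cleanly as UTF-8, while 0xd8 is a UTF-8 lead byte that must be followed by a continuation byte. The sketch below reproduces the same error message with a made-up buffer; the `raw` bytes are hypothetical and are not the bytes netCDF4 actually read.

```python
# Hypothetical buffer: 0xd8 followed by a byte that is not a UTF-8 continuation byte
raw = b"\xd8\x00"

caught = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    caught = err
    print(err)

# The value that was actually written to the file decodes without trouble
stored = b"20200101"
text = stored.decode("utf-8")
print(text)
```

Since 0xd8 never occurs in the stored strings, the decoder was presumably handed memory that does not hold the dataset's values, which would be consistent with the heap corruption reported in the backtraces.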
But then, immediately after, another segfault:
In [4]: ds = xr.load_dataset("test_str_list.h5", engine='netcdf4')
*** Error in `/home/scott/miniconda3/envs/mapping/bin/python': corrupted size vs. prev_size: 0x000055f97e7194a0 ***
Beginning of the segfault stack trace (it goes on well beyond this):
======= Backtrace: =========
/lib64/libc.so.6(+0x7f7c4)[0x7f1ba11a87c4]
/lib64/libc.so.6(+0x818bb)[0x7f1ba11aa8bb]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5MM_xfree+0xf)[0x7f1b8142d3ef]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5S_close+0x84)[0x7f1b814a69a4]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5I_dec_ref+0x77)[0x7f1b8141a407]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5I_dec_app_ref+0x29)[0x7f1b8141a4d9]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/h5py/../../../libhdf5.so.103(H5Sclose+0x73)[0x7f1b814a7023]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(NC4_get_vars+0x5ad)[0x7f1b7bbc46ad]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(NC4_get_vara+0x12)[0x7f1b7bbc4e62]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(NC_get_vara+0x6f)[0x7f1b7bb6b5df]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/rasterio/../../.././libnetcdf.so.18(nc_get_vara+0x8b)[0x7f1b7bb6c35b]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0xccf21)[0x7f1b4d0daf21]
/home/scott/miniconda3/envs/mapping/bin/python(+0x13a77e)[0x55f97aeca77e]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0x224fd)[0x7f1b4d0304fd]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0x559d9)[0x7f1b4d0639d9]
/home/scott/miniconda3/envs/mapping/bin/python(PyObject_GetItem+0x48)[0x55f97af10aa8]
/home/scott/miniconda3/envs/mapping/bin/python(+0x139acd)[0x55f97aec9acd]
Environment:
Output of xr.show_versions()
In [1]: xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 01:22:49)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.4.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.1.0
numpy: 1.19.2
scipy: 1.5.3
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.8.3
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2021.01.0
distributed: 2.20.0
matplotlib: 3.3.1
cartopy: 0.17.0
seaborn: None
numbagg: None
pint: 0.17
setuptools: 50.3.2
pip: 21.1.3
conda: 4.8.4
pytest: None
IPython: 7.18.1
sphinx: 4.0.2