Loading datasets of numpy string arrays leads to error and/or segfault #5706
Comments
@scottstanie Could you please provide the output of
Sure! Here it is:
$ h5dump test_str_list.h5
HDF5 "test_str_list.h5" {
GROUP "/" {
   DATASET "pairs" {
      DATATYPE  H5T_STRING {
         STRSIZE 8;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
      DATA {
      (0,0): "20200101", "20200201",
      (1,0): "20200101", "20200301"
      }
   }
}
}
(And just to include the specific traceback that happened now, in case my versions are different from what I showed:)
In [4]: import h5py
   ...: import numpy as np
   ...: import xarray as xr
   ...:
   ...: with h5py.File("test_str_list.h5", "w") as hf:
   ...:     hf["pairs"] = np.array([["20200101", "20200201"], ["20200101", "20200301"]]).astype("S")
   ...:
   ...: ds = xr.load_dataset("test_str_list.h5")
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/xarray/backends/plugins.py:68: RuntimeWarning: Engine 'cfgrib' loading failed:
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header
warnings.warn(f"Engine {name!r} loading failed:\n{ex}", RuntimeWarning)
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/fsspec/implementations/local.py:29: FutureWarning: The default value of auto_mkdir=True has been deprecated and will be changed to auto_mkdir=False by default in a future release.
warnings.warn(
*** Error in `/home/scott/miniconda3/envs/mapping/bin/python': free(): invalid next size (fast): 0x00005564b64622a0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81679)[0x7f56e752b679]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/../../../libnetcdf.so.18(nc_free_string+0x25)[0x7f54cf53d1a5]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0xcf3c8)[0x7f54cf7313c8]
/home/scott/miniconda3/envs/mapping/bin/python(PyCFunction_Call+0x54)[0x5564b397df44]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0x224fd)[0x7f54cf6844fd]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/netCDF4/_netCDF4.cpython-38-x86_64-linux-gnu.so(+0x559d9)[0x7f54cf6b79d9]
/home/scott/miniconda3/envs/mapping/bin/python(PyObject_GetItem+0x45)[0x5564b39d7935]
/home/scott/miniconda3/envs/mapping/bin/python(+0x128e0b)[0x5564b397ae0b]
/home/scott/miniconda3/envs/mapping/bin/python(_PyEval_EvalFrameDefault+0x947)[0x5564b3a1ec77]
/home/scott/miniconda3/envs/mapping/bin/python(+0x1b0736)[0x5564b3a02736]
/home/scott/miniconda3/envs/mapping/bin/python(_PyEval_EvalFrameDefault+0x947)[0x5564b3a1ec77]
/home/scott/miniconda3/envs/mapping/bin/python(_PyFunction_Vectorcall+0x1a6)[0x5564b3a01fc6]
/home/scott/miniconda3/envs/mapping/bin/python(_PyEval_EvalFrameDefault+0x4e03)[0x5564b3a23133]
/home/scott/miniconda3/envs/mapping/bin/python(_PyFunction_Vectorcall+0x1a6)[0x5564b3a01fc6]
/home/scott/miniconda3/envs/mapping/bin/python(+0x1800cd)[0x5564b39d20cd]
/home/scott/miniconda3/envs/mapping/bin/python(PyObject_GetItem+0x45)[0x5564b39d7935]
/home/scott/miniconda3/envs/mapping/bin/python(_PyEval_EvalFrameDefault+0xd53)[0x5564b3a1f083]
/home/scott/miniconda3/envs/mapping/bin/python(_PyEval_EvalCodeWithName+0x2c3)[0x5564b3a00db3]
/home/scott/miniconda3/envs/mapping/bin/python(_PyFunction_Vectorcall+0x378)[0x5564b3a02198]
/home/scott/miniconda3/envs/mapping/bin/python(+0x1b0841)[0x5564b3a02841]
/home/scott/miniconda3/envs/mapping/bin/python(+0x12404d)[0x5564b397604d]
/home/scott/miniconda3/envs/mapping/bin/python(_PyObject_CallFunction_SizeT+0x99)[0x5564b39761f9]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0xa11fd)[0x7f56dddfe1fd]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0xa54d7)[0x7f56dde024d7]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0x8a2d5)[0x7f56ddde72d5]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0x8adc4)[0x7f56ddde7dc4]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0xa559a)[0x7f56dde0259a]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0xa5ac9)[0x7f56dde02ac9]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0x13f2b7)[0x7f56dde9c2b7]
/home/scott/miniconda3/envs/mapping/bin/python(+0x129082)[0x5564b397b082]
/home/scott/miniconda3/envs/mapping/bin/python(_PyEval_EvalFrameDefault+0x181e)[0x5564b3a1fb4e]
/home/scott/miniconda3/envs/mapping/bin/python(_PyEval_EvalCodeWithName+0x2c3)[0x5564b3a00db3]
/home/scott/miniconda3/envs/mapping/bin/python(_PyFunction_Vectorcall+0x378)[0x5564b3a02198]
/home/scott/miniconda3/envs/mapping/bin/python(+0x1b0841)[0x5564b3a02841]
/home/scott/miniconda3/envs/mapping/bin/python(+0x12404d)[0x5564b397604d]
/home/scott/miniconda3/envs/mapping/bin/python(_PyObject_CallFunction_SizeT+0x99)[0x5564b39761f9]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0xa11fd)[0x7f56dddfe1fd]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0xa54d7)[0x7f56dde024d7]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0x8a2d5)[0x7f56ddde72d5]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0x8adc4)[0x7f56ddde7dc4]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0xa559a)[0x7f56dde0259a]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0xa5ac9)[0x7f56dde02ac9]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0x13f2b7)[0x7f56dde9c2b7]
/home/scott/miniconda3/envs/mapping/bin/python(+0x129082)[0x5564b397b082]
/home/scott/miniconda3/envs/mapping/bin/python(_PyEval_EvalFrameDefault+0x4e03)[0x5564b3a23133]
/home/scott/miniconda3/envs/mapping/bin/python(_PyFunction_Vectorcall+0x1a6)[0x5564b3a01fc6]
/home/scott/miniconda3/envs/mapping/bin/python(_PyEval_EvalFrameDefault+0xa63)[0x5564b3a1ed93]
/home/scott/miniconda3/envs/mapping/bin/python(_PyEval_EvalCodeWithName+0x2c3)[0x5564b3a00db3]
/home/scott/miniconda3/envs/mapping/bin/python(_PyFunction_Vectorcall+0x378)[0x5564b3a02198]
/home/scott/miniconda3/envs/mapping/bin/python(+0x1b0841)[0x5564b3a02841]
/home/scott/miniconda3/envs/mapping/bin/python(+0x12404d)[0x5564b397604d]
/home/scott/miniconda3/envs/mapping/bin/python(_PyObject_CallFunction_SizeT+0x99)[0x5564b39761f9]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0xa11fd)[0x7f56dddfe1fd]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0xa54d7)[0x7f56dde024d7]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so(+0x8a2d5)[0x7f56ddde72d5]
/home/scott/miniconda3/envs/mapping/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so
Aborted (core dumped)
xr.show_versions
In [2]: xr.show_versions()
INSTALLED VERSIONS
commit: None
xarray: 0.20.2
@scottstanie Here is the output of ncdump:
You can see the trailing garbage. This looks like a problem in netcdf-c/netcdf4-python, since it does not appear with pure HDF5 (h5py/h5netcdf). There is, however, a difference between attributes and datasets:
Output:
The datasets are clearly correct in the HDF5 dump, but netcdf-c somehow has issues with NULLPAD/NULLTERM strings. At least there is no segfault with attributes, unlike with datasets/variables:
import h5py
import numpy as np
import xarray as xr
# import netCDF4 as nc  # only needed for the commented-out netCDF4 checks below

with h5py.File("test_str_list_ds.h5", "w") as hf:
    blob = np.array([["20200101", "20200201"], ["20200101", "20200301"]]).astype("S")
    # Datasets
    sid = h5py.h5s.create_simple((2, 2), (2, 2))
    # Fixed-length, null-padded string type (8 bytes, no terminator)
    tid3 = h5py.h5t.TypeID.copy(h5py.h5t.C_S1)
    tid3.set_size(8)
    tid3.set_strpad(h5py.h5t.STR_NULLPAD)
    # Fixed-length, null-terminated string type (8 bytes plus terminator)
    tid4 = h5py.h5t.TypeID.copy(h5py.h5t.C_S1)
    tid4.set_size(9)
    tid4.set_strpad(h5py.h5t.STR_NULLTERM)
    aid = h5py.h5d.create(hf.id, b"NULLPAD", tid3, sid)
    ret = aid.write(sid, h5py.h5s.ALL, blob)
    aid = h5py.h5d.create(hf.id, b"NULLTERM", tid4, sid)
    ret = aid.write(sid, h5py.h5s.ALL, blob)
    # High-level h5py writes: fixed-length bytes and object dtype
    hf["numpy_S"] = blob
    hf["numpy_O"] = blob.astype("O")

!h5dump test_str_list_ds.h5
!ncdump test_str_list_ds.h5

with xr.load_dataset("test_str_list_ds.h5", engine="h5netcdf", phony_dims="sort") as ds:
    display(ds)
# with xr.load_dataset("test_str_list_ds.h5", engine="netcdf4") as ds:
#     display(ds["numpy_O"])
# with nc.Dataset("test_str_list_ds.h5") as ds:
#     display(ds)
#     #display("NULLTERM:", ds["NULLTERM"][:])
#     #display("NULLPAD:", ds["NULLPAD"][:])
#     display("numpy_O", ds["numpy_O"][:])
#     #display("numpy_S", ds["numpy_S"][:])
Output:
So here, netcdf-c/netcdf4-python will segfault for all variables beside It looks like the only option to achieve this for datasets/variables is to use numpy opaque dtype.
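To make the NULLPAD/NULLTERM distinction concrete, here is a minimal pure-Python sketch. It is an illustration of the general hazard only: the thread does not confirm that this is the exact code path inside netcdf-c, and the helper names (`pack_nullpad`, `read_bounded`, `read_as_c_string`) are hypothetical. It packs the four date strings into 8-byte null-padded fields, as in the `STRSIZE 8` / `H5T_STR_NULLPAD` dump, and compares a width-bounded read with a naive "scan until NUL" read.

```python
FIELD = 8  # STRSIZE reported by h5dump for the "pairs" dataset


def pack_nullpad(values, size=FIELD):
    """Pack ASCII strings into fixed-width fields padded with NUL bytes.

    A value that fills the whole field gets no terminator at all.
    """
    out = bytearray()
    for value in values:
        raw = value.encode("ascii")
        if len(raw) > size:
            raise ValueError(f"{value!r} does not fit in {size} bytes")
        out += raw.ljust(size, b"\x00")
    return bytes(out)


def read_bounded(buf, index, size=FIELD):
    """Read exactly one field and strip the padding (the safe approach)."""
    field = buf[index * size:(index + 1) * size]
    return field.rstrip(b"\x00").decode("ascii")


def read_as_c_string(buf, index, size=FIELD):
    """Naive read: start at the field and scan to the next NUL byte.

    This ignores the field width, so a full-width field runs into its neighbours.
    """
    start = index * size
    end = buf.find(b"\x00", start)
    if end == -1:
        end = len(buf)  # a real C reader would keep going past the buffer
    return buf[start:end].decode("ascii")


if __name__ == "__main__":
    buf = pack_nullpad(["20200101", "20200201", "20200101", "20200301"])
    print(read_bounded(buf, 0))      # one clean 8-character value
    print(read_as_c_string(buf, 0))  # the value plus everything that follows it
```

In this simulation the overrun stops at the end of the Python buffer. In C, the bytes after the buffer are whatever happens to be in memory, which would be consistent with garbage that changes from run to run, though that link is an assumption here.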
ah sorry, didn't see the request for
Interesting that my
@scottstanie I'll check my h5py/hdf5 settings, but I doubt that is the difference. I've seen the trailing garbage change from run to run, sometimes disappearing altogether.
Sounds good, but it seems like you're correct that it's a netcdf/netcdf4-python problem here, so I'll defer to others as to what the best changes to default settings would be to avoid the segfaults.
Problem source identified in netcdf-c: Unidata/netcdf-c#2159
This is resolved in recent
What happened:
NumPy arrays of strings that are saved with h5py cause errors and segfaults on load, and the result is not always the same.
What you expected to happen:
This works fine with engine='h5netcdf', but will consistently have a segfault with engine='netcdf4'.
I'm assuming this is a netcdf backend issue, but thought I'd raise it here since xarray was how I discovered it.
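As a workaround sketch, the engine can be pinned explicitly so that xarray never picks the netCDF4 backend for these files. The helper name `load_h5py_file` and its `loader` parameter are hypothetical, added here only so the function can be exercised without xarray installed. The `engine="h5netcdf"` and `phony_dims="sort"` arguments are the ones used elsewhere in this thread.

```python
def load_h5py_file(path, loader=None, **kwargs):
    """Load an h5py-written file through the h5netcdf engine.

    Hypothetical convenience wrapper; `loader` defaults to xarray.load_dataset.
    """
    if loader is None:
        # Imported lazily so the wrapper itself has no hard dependency.
        import xarray as xr
        loader = xr.load_dataset

    # phony_dims="sort" lets h5netcdf name dimensions for plain HDF5 datasets
    # that carry no dimension scales, as in the call shown in this thread.
    options = {"engine": "h5netcdf", "phony_dims": "sort"}
    options.update(kwargs)  # caller-supplied options win
    return loader(path, **options)


if __name__ == "__main__":
    ds = load_h5py_file("test_str_list.h5")
    print(ds)
```

This only avoids the crashing code path. It does not fix the underlying netcdf-c behaviour.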
Minimal Complete Verifiable Example:
Anything else we need to know?:
Even stranger, it doesn't seem to be deterministic. After the crash, I tried the same load_dataset:
But then immediately after, another segfault.
Beginning of the segfault stack trace (it goes on):
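Since a segfault or abort kills the interpreter, one way to study the non-deterministic behaviour is to run the reproducer repeatedly in fresh child processes and record each exit code. The following is a minimal sketch using only the standard library. `run_isolated` and the `REPRO` string are hypothetical names, and the file name is the one from the example in this issue.

```python
import subprocess
import sys


def run_isolated(code, repeats=3):
    """Run `code` in `repeats` fresh interpreters and return their exit codes.

    On POSIX, a negative exit code means the child was killed by a signal
    (for example -11 for SIGSEGV or -6 for SIGABRT).
    """
    exit_codes = []
    for _ in range(repeats):
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,  # keep the child's backtrace out of our output
            text=True,
        )
        exit_codes.append(proc.returncode)
    return exit_codes


# Reproducer from this issue, assuming test_str_list.h5 already exists.
REPRO = (
    "import xarray as xr; "
    "xr.load_dataset('test_str_list.h5', engine='netcdf4')"
)

if __name__ == "__main__":
    # A mix of zero and non-zero codes would confirm the run-to-run variation.
    print(run_isolated(REPRO, repeats=5))
```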
Environment:
Output of xr.show_versions()
In [1]: xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 01:22:49)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1062.4.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.1.0
numpy: 1.19.2
scipy: 1.5.3
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.8.3
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2021.01.0
distributed: 2.20.0
matplotlib: 3.3.1
cartopy: 0.17.0
seaborn: None
numbagg: None
pint: 0.17
setuptools: 50.3.2
pip: 21.1.3
conda: 4.8.4
pytest: None
IPython: 7.18.1
sphinx: 4.0.2