Write and multiple Read ends with segmentation fault #2022
Do you have an example that can reproduce this? It looks like your writer code doesn't have permission to write to the file, and thus can't close it properly.
@stangassinger mentioned a relevant detail by email:
The only major change in version 3.4 was that the wheels we publish on PyPI bundled HDF5 1.12.1 instead of 1.12.0. So assuming you're installing h5py with pip, this is almost certainly related to a change in HDF5 itself. Looking at the HDF5 1.12.1 release notes, this entry looks like it might be relevant: […]
@epourmal is it possible that this change could cause the error above (permission denied, errno 13, when trying to flush a file in SWMR write mode)? It looks like HDF5 is using Windows' […]
With some more investigation my guess above seems likely. It appears that SWMR writers unlock the file, and then SWMR readers get a shared lock. On Windows, the […] But I must be missing something, because that would mean SWMR was entirely broken on Windows, and I assume someone would have either used it or tested it before now.
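For context on the distinction at play here: on POSIX systems the file locks HDF5 takes are advisory, so a handle that never asks for the lock can still read the file, which is the behaviour SWMR readers rely on. Windows' `LockFileEx()` locks are mandatory, which is the suspected difference in the comments above. A minimal stdlib-only sketch of the advisory behaviour (assumption: running on a POSIX system; the file name is arbitrary):

```python
import fcntl
import os
import tempfile

# A writer takes an exclusive advisory lock on the file.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
with open(path, "wb") as writer:
    fcntl.flock(writer, fcntl.LOCK_EX)
    writer.write(b"data")
    writer.flush()
    # A second handle that never calls flock() can still read the file:
    # advisory locks only constrain cooperating processes.
    with open(path, "rb") as reader:
        print(reader.read())  # prints b'data'
```

On Windows a mandatory lock held by the writer would instead cause the second open or read to fail, which matches the permission-denied errors reported above.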
@stangassinger sent me example code which reproduces it for him on Windows, though it works for me under Linux. He's agreed that I can post the code here.

H5Writer.py:

```python
import numpy as np
import h5py
import typing


class H5Writer:
    """HDF5 file writer class"""

    def __init__(self, fileName: str):
        self.H5_FILE_NAME: str = fileName
        self.info_dict: typing.Dict[typing.Any, typing.Any] = {}
        self.f = h5py.File(self.H5_FILE_NAME, 'w', libver='latest')

    def __del__(self):
        self.f.close()

    def create_Datasets_and_xyAxis(self, dataSetName: str, xAxisUnit: str, yAxisUnit: str):
        dt = np.dtype('float64')
        ar = np.array([[0], [0]], dt)
        dset = self.f.create_dataset(dataSetName, chunks=(2, 1),
                                     maxshape=(2, None), data=ar, dtype=dt)
        self.info_dict.update({dataSetName: [dset, ar]})
        dset.attrs['x'] = xAxisUnit
        dset.attrs['y'] = yAxisUnit
        dset.flush()
        self.f.flush()

    def store_H5(self, dataSetName: str, xAxis: float, yAxis: float):
        dset = self.info_dict[dataSetName][0]
        ar = np.array([[float(xAxis)], [float(yAxis)]])
        new_shape = (dset.shape[0], dset.shape[1] + 1)
        dset.resize(new_shape)
        dset[:, -1:] = ar

    def flush_H5(self):
        self.f.flush()

    def swmrH5(self):
        """
        Set the single-writer multiple-reader (SWMR) option.
        Without this option it is not possible to do a live plot.
        """
        self.f.swmr_mode = True
        self.f.flush()


##########################
import time

# Test case
if __name__ == '__main__':
    dataList = ["data_1", "data_2", "data_3", "data_4"]
    h5_writer = H5Writer("TEST.h5")

    # configure datasets
    for item in dataList:
        xAxisUnit = 'time(s)'
        yAxisUnit = 'Ampere'
        h5_writer.create_Datasets_and_xyAxis(item, xAxisUnit, yAxisUnit)
    h5_writer.flush_H5()
    h5_writer.swmrH5()

    # test values
    a = 0
    for i in range(2000, 2500000):
        count = 0
        for item in dataList:
            h5_writer.store_H5(item, i, a + count + 0.1)
            h5_writer.flush_H5()
            count = count + 3
        time.sleep(1)
        # test counting dataset
        a = a + 1
        if a % 100 == 0:
            a = 0
        print(' a:', str(a))
        print(' i:', str(i))
        print('___________', flush=True)
```

H5Reader.py:

```python
import numpy as np
import h5py
from collections import deque
import typing
import time


class H5Reader:
    """HDF5 file reader class"""

    def __init__(self, fileName: str, max_queue_len: int):
        self.H5_FILE_NAME: str = fileName
        self.MAX_QUEUE_LEN: int = max_queue_len
        self.info_dict: typing.Dict[typing.Any, typing.Any] = {}
        self.dataSetNameList: typing.List[str] = []
        self.__readAttr()
        # open file for reading data
        self.f = h5py.File(self.H5_FILE_NAME, 'r', libver='latest', swmr=True)

    def __del__(self):
        self.f.close()

    def __readAttr(self):
        """Read attributes from the HDF5 file."""
        self.f = h5py.File(self.H5_FILE_NAME, 'r', libver='latest', swmr=True)
        for dataSet_name in self.f:
            self.dataSetNameList.append(dataSet_name)
            dict_tmp = {}
            for item in self.f[dataSet_name].attrs.keys():
                dict_tmp.update({item: str(self.f[dataSet_name].attrs[item])})
            dict_tmp.update({"xq": deque(maxlen=self.MAX_QUEUE_LEN)})
            dict_tmp.update({"yq": deque(maxlen=self.MAX_QUEUE_LEN)})
            self.info_dict.update({dataSet_name: dict_tmp})
        self.f.close()

    def getDatasetNamesList(self):
        """Return a list of all dataset names in the HDF5 file."""
        return self.dataSetNameList

    def get_Dataset(self, datasetName: str, shift_back_in_time_percentage: int = 100,
                    complete_set: bool = False) -> typing.Tuple[str, str, typing.Any, typing.Any]:
        """Read data for a specific dataset."""
        self.f[datasetName].refresh()
        time.sleep(0.1)
        DatasetLen = self.f[datasetName].shape[1]
        shift_back_in_time = int(DatasetLen - (DatasetLen / 100) * shift_back_in_time_percentage)
        queue_length = 0
        if complete_set:
            self.info_dict[datasetName]['xq'].clear()
            self.info_dict[datasetName]['yq'].clear()
            if DatasetLen > self.MAX_QUEUE_LEN:
                queue_length = self.MAX_QUEUE_LEN
                if shift_back_in_time >= (self.MAX_QUEUE_LEN + DatasetLen):
                    shift_back_in_time = self.MAX_QUEUE_LEN + DatasetLen
            else:
                shift_back_in_time = 0
                queue_length = DatasetLen - 1
        else:
            queue_length = 1
        time.sleep(0.1)
        if (queue_length + shift_back_in_time) >= DatasetLen:
            shift_back_in_time = DatasetLen - self.MAX_QUEUE_LEN - 1
        if shift_back_in_time <= 0:
            shift_back_in_time = 0
        if shift_back_in_time == 0:
            xAr, yAr = self.f[datasetName][:, -queue_length:]
        else:
            xAr, yAr = self.f[datasetName][:, -queue_length - shift_back_in_time: -shift_back_in_time]
        self.info_dict[datasetName]['xq'].extend(xAr)
        self.info_dict[datasetName]['yq'].extend(yAr)
        xAxisName = self.info_dict[datasetName]['x']
        yAxisName = self.info_dict[datasetName]['y']
        xQueue = self.info_dict[datasetName]['xq']
        yQueue = self.info_dict[datasetName]['yq']
        return xAxisName, yAxisName, xQueue, yQueue


##########################
# Test case
if __name__ == '__main__':
    h5_reader = H5Reader("TEST.h5", 333)
    while True:
        time.sleep(1)
        for datasetName in h5_reader.getDatasetNamesList():
            xAxisName, yAxisName, xQueue, yQueue = h5_reader.get_Dataset(datasetName, 100, True)
            print("-- xAxis --> " + str(xQueue))
            print("-- yAxis " + datasetName + " --> " + str(yQueue))
            print("________________", flush=True)
```

Start the writer, leave it running and start the reader. Apparently this causes the writer to crash immediately, which seems to fit with my ideas above.
Hello, I am having the same problem: SWMR doesn't work on Windows 10 Pro, ver 10.0.19044 Build 19044, but it works when using WSL (Ubuntu). Has anybody found a solution, or do we have to wait for a new h5py version?
I think this is a bug in HDF5, so even a new version of h5py won't help until HDF5 fixes it. I'd suggest contacting the HDF Group, either via help@hdfgroup.org or their issue tracker (https://github.com/HDFGroup/hdf5/issues). Feel free to point them to this thread as a starting point. I know they're working on a new 'VFD SWMR' implementation - it's possible that will avoid the issue, but I haven't looked.
I think you are correct in assuming this is a problem with the Windows build of the HDF5 library and trouble with its locking functionality. I found their document on file locking helpful in that regard: https://github.com/HDFGroup/hdf5/blob/develop/doc/file-locking.md It says there:

[…] so maybe indeed no one noticed yet. A quick workaround is to disable file locking for the script, e.g. using the environment variable HDF5_USE_FILE_LOCKING=FALSE.
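To make the workaround concrete, here is a minimal sketch. HDF5_USE_FILE_LOCKING is the variable described in the HDF5 file-locking document linked above; the assumption in this sketch is that it must be set before h5py is first imported, since the HDF5 library reads it when it initializes, so the h5py import is shown commented out:

```python
import os

# HDF5 reads HDF5_USE_FILE_LOCKING when the library initializes, so set it
# before the first `import h5py` anywhere in the process.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

# import h5py  # import h5py only after the variable is in place
```

Setting the variable in the shell before launching the writer and reader scripts works equally well (e.g. `set HDF5_USE_FILE_LOCKING=FALSE` in a Windows command prompt).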
Hello, I am having the exact same problem on Windows 11. The test code provided fails as expected. Is there any workaround, or news of the HDF5 library addressing the issue?
Output:

```
Traceback (most recent call last):
[…]
```