Write and multiple Read ends with segmentation fault #2022

Open

stangassinger opened this issue Dec 20, 2021 · 8 comments

stangassinger commented Dec 20, 2021

  • Operating System: Windows 10
  • Python version: 3.9
  • h5py version: 3.4.0, 3.5.0 and 3.6.0

Output:

Traceback (most recent call last):
  File "C:\Users\GSTANGASSINGER\proj\others\private__MOC\src\dbc\H5Writer.py", line 102, in <module>
    h5_writer.flush_H5()
  File "C:\Users\GSTANGASSINGER\proj\others\private__MOC\src\dbc\H5Writer.py", line 52, in flush_H5
    self.f.flush()
  File "C:\Users\GSTANGASSINGER\proj\others\private__MOC\src\dbc\env\lib\site-packages\h5py\_hl\files.py", line 535, in flush
    h5f.flush(self.id)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 164, in h5py.h5f.flush
RuntimeError: Can't flush cache (file write failed: time = Sun Dec 19 20:11:42 2021
, filename = 'TEST.h5', file descriptor = 3, errno = 13, error message = 'Permission denied', buf = 0000022558C8A898, total write size = 16, bytes this sub-write = 16, bytes actually written = 18446744073709551615, offset = 6592)

Exception ignored in: <function H5Writer.__del__ at 0x00000225592C30D0>

Traceback (most recent call last):
  File "C:\Users\name\proj\H5Writer.py", line 19, in __del__
  File "C:\Users\name\proj\env\lib\site-packages\h5py\_hl\files.py", line 525, in close
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 358, in h5py.h5f.FileID._close_open_objects
RuntimeError: Can't decrement id ref count (file write failed: time = Sun Dec 19 20:11:42 2021
, filename = 'TEST.h5', file descriptor = 3, errno = 13, error message = 'Permission denied', buf = 0000022558C8A898, total write size = 16, bytes this sub-write = 16, bytes actually written = 18446744073709551615, offset = 6592)
Segmentation fault
@takluyver (Member)
Do you have an example that can reproduce this? It looks like your writer code doesn't have permission to write to the file, and thus can't close it properly.

@takluyver (Member)
@stangassinger mentioned a relevant detail by email:

the program worked with version 3.3.0

The only major change in version 3.4 was that the wheels we publish on PyPI bundled HDF5 1.12.1 instead of 1.12.0. So assuming you're installing h5py with pip, this is almost certainly related to a change in HDF5 itself.
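If you want to confirm which HDF5 your install bundles, something like this should work (using h5py's public version module):

import h5py
print(h5py.version.version)       # h5py itself, e.g. 3.6.0
print(h5py.version.hdf5_version)  # bundled HDF5, e.g. 1.12.1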

Looking at the HDF5 1.12.1 release notes, this entry looks like it might be relevant:

    - File locking now works on Windows

      Since version 1.10.0, the HDF5 library has used a file locking scheme
      to help enforce one reader at a time accessing an HDF5 file, which can
      be helpful when setting up readers and writers to use the single-
      writer/multiple-readers (SWMR) access pattern.

      In the past, this was only functional on POSIX systems where flock() or
      fcntl() were present. Windows used a no-op stub that always succeeded.

      HDF5 now uses LockFileEx() and UnlockFileEx() to lock the file using the
      same scheme as POSIX systems. We lock the entire file when we set up the
      locks (by passing DWORDMAX as both size parameters to LockFileEx()).

      (DER - 2021/03/19, HDFFV-10191)

@epourmal is it possible that this change could cause the error above (permission denied, errno 13, when trying to flush a file in SWMR write mode)?

It looks like HDF5 is using Windows' LockFileEx function as a direct equivalent of flock on Unix systems. But flock is advisory - it only affects other processes getting a lock on the same file, not attempts to read or write - whereas if I'm reading that page right, LockFileEx is mandatory - one process getting a lock stops other processes reading/writing. That seems like a difference that might matter while coordinating SWMR access.
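To illustrate the advisory behaviour, here is a rough POSIX-only sketch (illustrative only, not the actual HDF5 code path; the filename is made up):

import fcntl

with open("locked.tmp", "w") as holder:
    fcntl.flock(holder, fcntl.LOCK_EX)       # exclusive advisory lock

    # A second descriptor that never asks for the lock can still write:
    with open("locked.tmp", "a") as writer:
        writer.write("still allowed\n")       # succeeds under flock()

With LockFileEx() the equivalent unlocked write fails, which would match the errno 13 ('Permission denied') in the traceback above.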

@takluyver (Member)
After some more investigation, my guess above looks likely. It appears that SWMR writers unlock the file, and then SWMR readers take a shared lock. On Windows, the LockFileEx docs say that "Locking a portion of a file for shared access denies all processes write access..."

But I must be missing something, because that would mean SWMR was entirely broken on Windows, and I assume someone would have either used it or tested it before now.

@takluyver (Member)
@stangassinger sent me example code which reproduces it for him on Windows, though it works for me under Linux. He's agreed that I can post the code here.

H5Writer.py
import time
import typing

import numpy as np
import h5py


class H5Writer:
    """HDF5 file writer class"""
    def __init__(self, fileName: str):
        self.H5_FILE_NAME: str = fileName
        self.info_dict: typing.Dict[typing.Any, typing.Any] = {}
        self.f = h5py.File(self.H5_FILE_NAME, 'w', libver='latest')

    def __del__(self):
        self.f.close()

    def create_Datasets_and_xyAxis(self, dataSetName: str, xAxisUnit: str, yAxisUnit: str):
        dt = np.dtype('float64')
        ar = np.array([[0], [0]], dt)
        # chunked, with an unlimited second axis so the dataset can grow under SWMR
        dset = self.f.create_dataset(dataSetName, chunks=(2, 1), maxshape=(2, None), data=ar, dtype=dt)
        self.info_dict[dataSetName] = [dset, ar]
        dset.attrs['x'] = xAxisUnit
        dset.attrs['y'] = yAxisUnit
        dset.flush()
        self.f.flush()

    def store_H5(self, dataSetName: str, xAxis: float, yAxis: float):
        dset = self.info_dict[dataSetName][0]
        ar = np.array([[float(xAxis)], [float(yAxis)]])
        # grow the dataset by one column and write the new (x, y) pair into it
        new_shape = (dset.shape[0], dset.shape[1] + 1)
        dset.resize(new_shape)
        dset[:, -1:] = ar

    def flush_H5(self):
        self.f.flush()

    def swmrH5(self):
        """
        Enable single-writer/multiple-reader (SWMR) mode.
        Without this, readers in other processes cannot
        follow the file for a live plot.
        """
        self.f.swmr_mode = True
        self.f.flush()



##########################
#  Test Case
if __name__ == '__main__':


    dataList = ["data_1","data_2","data_3","data_4"]

    h5_writer = H5Writer( "TEST.h5" )    


    # configure datasets
    for item in  dataList:    
        xAxisUnit   = 'time(s)'
        yAxisUnit   = 'Ampere'
        h5_writer.create_Datasets_and_xyAxis( item, xAxisUnit, yAxisUnit  )

    h5_writer.flush_H5()
    h5_writer.swmrH5()

    # test values
    a =  0
    for i in range(2000, 2500000):

        count = 0
        for item in dataList:
            h5_writer.store_H5( item, i, a + count + 0.1  ) 
            h5_writer.flush_H5()
            count = count + 3 

        time.sleep( 1 )

        ## test counter, wraps every 100 steps
        a = a + 1
        if a % 100 == 0:
            a = 0


        print(' a:', str(a) )
        print(' i:', str(i) )

        print( '___________', flush = True  )
H5Reader.py
import time
import typing
from collections import deque

import h5py


class H5Reader:
    """HDF5 file reader class"""
    def __init__(self, fileName: str, max_queue_len: int):
        self.H5_FILE_NAME: str = fileName
        self.MAX_QUEUE_LEN: int = max_queue_len
        self.info_dict: typing.Dict[typing.Any, typing.Any] = {}
        self.dataSetNameList: typing.List[str] = []
        self.__readAttr()

        # open the file for reading; swmr=True allows reading while a writer has it open
        self.f = h5py.File(self.H5_FILE_NAME, 'r', libver='latest', swmr=True)

    def __del__(self):
        self.f.close()

    def __readAttr(self):
        """ Read dataset names and attributes from the hdf5 file """
        self.f = h5py.File(self.H5_FILE_NAME, 'r', libver='latest', swmr=True)

        for dataSet_name in self.f:
            self.dataSetNameList.append(dataSet_name)
            dict_tmp = {}
            for item in self.f[dataSet_name].attrs.keys():
                dict_tmp[item] = str(self.f[dataSet_name].attrs[item])
            # per-dataset ring buffers used for plotting
            dict_tmp["xq"] = deque(maxlen=self.MAX_QUEUE_LEN)
            dict_tmp["yq"] = deque(maxlen=self.MAX_QUEUE_LEN)
            self.info_dict[dataSet_name] = dict_tmp

        self.f.close()

    def getDatasetNamesList(self):
        """
        Return a list of all dataset names in the hdf5 file
        """
        return self.dataSetNameList

    def get_Dataset(self, datasetName: str, shift_back_in_time_percentage: int = 100, complete_set: bool = False) -> typing.Tuple[str, str, typing.Any, typing.Any]:
        """ Read data for a specific dataset """
        self.f[datasetName].refresh()
        time.sleep(0.1)
        DatasetLen = self.f[datasetName].shape[1]

        shift_back_in_time = int(DatasetLen - (DatasetLen / 100) * shift_back_in_time_percentage)

        queue_length = 0
        if complete_set:

            self.info_dict[datasetName]['xq'].clear()
            self.info_dict[datasetName]['yq'].clear()            

            if DatasetLen > self.MAX_QUEUE_LEN:
                queue_length = self.MAX_QUEUE_LEN
                if shift_back_in_time >= ( self.MAX_QUEUE_LEN + DatasetLen ):
                    shift_back_in_time = self.MAX_QUEUE_LEN + DatasetLen
            else: 
                shift_back_in_time = 0                   
                queue_length       = DatasetLen - 1
        else:        
            queue_length = 1

        time.sleep(0.1)

        if (queue_length + shift_back_in_time) >= DatasetLen:
            shift_back_in_time = DatasetLen - self.MAX_QUEUE_LEN - 1

        if shift_back_in_time <= 0:
            shift_back_in_time = 0

        if shift_back_in_time == 0:
            xAr, yAr = self.f[datasetName][:, -queue_length:]
        else:
            xAr, yAr = self.f[datasetName][:, -queue_length - shift_back_in_time: -shift_back_in_time]


        self.info_dict[datasetName]['xq'].extend( xAr )
        self.info_dict[datasetName]['yq'].extend( yAr )
        xAxisName = self.info_dict[datasetName]['x']
        yAxisName = self.info_dict[datasetName]['y']
        xQueue    = self.info_dict[datasetName]['xq']            
        yQueue    = self.info_dict[datasetName]['yq']   
        
        return  xAxisName, yAxisName, xQueue, yQueue     


##########################
#  Test Case
if __name__ == '__main__':

    h5_reader = H5Reader( "TEST.h5", 333 )    

    while True:
        time.sleep( 1 )
        for datasetName in h5_reader.getDatasetNamesList():       
            xAxisName, yAxisName, xQueue, yQueue = h5_reader.get_Dataset( datasetName, 100, True )
            print("--  xAxis  --> " + str( xQueue ) )
            print("--  yAxis " + datasetName + " --> " + str( yQueue )  )      

        print("________________",flush = True)      

Start the writer, leave it running and start the reader. Apparently this causes the writer to crash immediately, which seems to fit with my ideas above.

dottcake commented May 6, 2022

Hello,

I am having the same problem: SWMR doesn't work on Windows 10 Pro, version 10.0.19044 (Build 19044), but it does work under WSL (Ubuntu). Has anybody found a solution, or do we have to wait for a new h5py version?

@takluyver (Member)
I think this is a bug in HDF5, so even a new version of h5py won't help until HDF5 fixes it. I'd suggest contacting the HDF Group, either via help@hdfgroup.org or their issue tracker (https://github.com/HDFGroup/hdf5/issues). Feel free to point them to this thread as a starting point.

I know they're working on a new 'VFD SWMR' implementation; it's possible that it will avoid the issue, but I haven't looked.

@maclomhair
I think you are correct in assuming this is a problem with the Windows build of the HDF5 library and its locking functionality. I found their document on file locking helpful in that regard: https://github.com/HDFGroup/hdf5/blob/develop/doc/file-locking.md

It says there:

SWMR isn't well-tested on Windows, so this scheme hasn't been as thoroughly vetted as the flock-based scheme.

... so maybe indeed no one has noticed yet.

A quick workaround is to disable file locking for the script, e.g. using the environment variable HDF5_USE_FILE_LOCKING=FALSE, or via a file access property list. (Tested, and it worked for me.)
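For example, a minimal sketch (the environment variable has to be set before h5py first loads the HDF5 library; the locking keyword is the file-access-property-list route, which h5py 3.5 and later expose directly):

import os
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"  # must happen before importing h5py

import h5py

# Alternatively, per-file via the file access property list:
f = h5py.File("TEST.h5", "r", libver="latest", swmr=True, locking=False)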

voverius commented May 6, 2023

Hello,

I am having the exact same problem on Windows 11. The test code provided fails as expected. Is there any workaround, or any news of the HDF5 library addressing the issue?
