An issue we've encountered in py4DSTEM is that if you load a memory map to an EMD dataset using `ncempy.io.emd.fileEMD.get_memmap(N)`, the handle to the HDF5 file is forcibly closed when the `fileEMD` object gets garbage collected.
If a `fileEMD` object gets created inside a function (and is not retained anywhere else), then when that function exits the `fileEMD` will be deleted, calling its `__del__` method (see `openNCEM/ncempy/io/emd.py`, lines 162 to 168 at `ab0a40a`) and closing the file handle.
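A minimal reproduction of the mechanism, using a hypothetical stand-in class instead of the real `fileEMD` (no HDF5 file is needed for the sketch):

```python
import gc

class FakeHandle:
    """Stand-in for the h5py.File held by fileEMD."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class FakeFileEMD:
    """Hypothetical mimic of ncempy.io.emd.fileEMD: it keeps the open
    file in self.file_hdl and closes it in __del__ (emd.py lines 162-168)."""
    def __init__(self):
        self.file_hdl = FakeHandle()

    def __del__(self):
        self.file_hdl.close()  # this is what forcibly closes the file

def load_memmap():
    emd = FakeFileEMD()
    return emd.file_hdl  # emd is not retained, so it is collected on return

handle = load_memmap()
gc.collect()  # CPython already collected emd via refcounting; be explicit
print(handle.closed)  # True: __del__ ran and the handle is already closed
```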
In py4DSTEM we use a context manager for most reading, so my approach when a memory map is asked for is to create a new `h5py.File` outside of the context manager. While this `File` will get garbage collected, this does not actually close the file handle because there is still a reference to the `Dataset`. A good question with my approach is whether the file handle would ever get closed; it's possible that when the `Dataset` is finally garbage collected the file gets closed as well, but I haven't tested this.
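A minimal sketch of this workaround using `h5py` alone (the file path and the dataset name `data` are invented for the demo):

```python
import os
import tempfile

import h5py
import numpy as np

# Write a small demo HDF5 file first (path and dataset name are made up).
path = os.path.join(tempfile.mkdtemp(), 'demo.emd')
with h5py.File(path, 'w') as f:
    f.create_dataset('data', data=np.arange(10))

def get_dataset(path, name):
    # Open the File outside any context manager and return only the Dataset.
    f = h5py.File(path, 'r')
    return f[name]  # f goes out of scope here; the Dataset keeps the file alive

dset = get_dataset(path, 'data')
# h5py's default "weak" closing keeps the underlying HDF5 file open as long
# as any object (here the Dataset) still references it, so reads still work:
print(dset[:3])  # [0 1 2]
```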
I am using `ncempy` version 1.8.1.
Thanks for bringing this up! I can see how this would be a problem.
I think your approach is a good solution, and I'll adopt it in an upcoming ncempy release. I don't like that the file object is lost, but the h5py documentation explicitly mentions this solution at this link. So, going out of scope with so-called 'weak' closing is a suggested way to deal with HDF5 memory maps.
This is a tricky problem, actually. NumPy just totally ignores the issue! See the NumPy documentation here:

> Currently there is no API to close the underlying mmap. It is tricky to ensure the resource is actually closed, since it may be shared between different memmap instances.
Alternatively, a user could keep the `fileEMD` object in scope by returning the file object along with the memmap. Then, when you are done with the data, you can `del` the file object and the dataset object. Or just let all of them go out of scope and let everything be garbage collected at the same time, hopefully closing the file properly. I don't think this is the best option in py4DSTEM, since you just want to return the data by itself.
This is an example of what I suggested above in case anyone runs into a similar issue.
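A minimal sketch of that return-the-handle pattern, using plain `h5py` with invented names (`load_with_handle`, the demo path) rather than the actual ncempy API:

```python
import os
import tempfile

import h5py
import numpy as np

def load_with_handle(path, name):
    # Return the open file object together with the dataset, so the caller
    # controls the lifetime of both.
    f = h5py.File(path, 'r')
    return f, f[name]

# Build a small demo file (path and dataset name are made up for the sketch).
path = os.path.join(tempfile.mkdtemp(), 'demo2.emd')
with h5py.File(path, 'w') as f:
    f.create_dataset('data', data=np.ones(4))

file_hdl, dset = load_with_handle(path, 'data')
total = float(dset[...].sum())  # use the data while the handle is alive
del dset          # drop the dataset first
file_hdl.close()  # then the caller closes the file explicitly when done
print(total)  # 4.0
```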