EMD memmaps can lose their file handles prematurely #31

Closed

sezelt opened this issue Sep 8, 2021 · 1 comment · Fixed by #35

sezelt commented Sep 8, 2021

An issue we've encountered in py4DSTEM is that if you load a memory map of an EMD dataset using ncempy.io.emd.fileEMD.get_memmap(N), the handle to the underlying HDF5 file is forcibly closed when the fileEMD object is garbage collected.

If a fileEMD object gets created inside a function (and is not retained anywhere else), then when that function exits the fileEMD will be deleted, calling its __del__ method and closing the file handle:

openNCEM/ncempy/io/emd.py

Lines 162 to 168 in ab0a40a

def __del__(self):
    """Destructor for EMD file object.
    """
    # close the file
    # if(not self.file_hdl.closed):
    self.file_hdl.close()

A minimal reproduction of the problem is like so:

from ncempy.io.emd import fileEMD

fpath = "Sample_LFP_v0,12.h5"

def load_memmap(fpath, N):
    f = fileEMD(fpath)
    # f goes out of scope when this function returns; fileEMD.__del__
    # then closes the HDF5 file, invalidating the returned dataset
    return f.get_memmap(N)

data, dims = load_memmap(fpath, 0)

print(data)
# <Closed HDF5 dataset>

In py4DSTEM we use a context manager for most reading, so my approach when a memory map is requested is to create a new h5py.File outside of the context manager. While this File object will get garbage collected, that does not actually close the file handle, because there is still a reference to the Dataset. A good question with my approach is whether the file handle would ever get closed; it's possible that when the Dataset is finally garbage collected the file gets closed as well, but I haven't tested this.
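
Roughly, the workaround looks like this (a sketch only; the internal dataset path /data/data_0/data is an assumed EMD layout, not taken from this issue):

import h5py

def load_memmap_weak(fpath, dset_path="/data/data_0/data"):
    # Open the file with no context manager. The File object itself may be
    # garbage collected, but h5py keeps the underlying HDF5 file open as
    # long as the returned Dataset still references it.
    f = h5py.File(fpath, "r")
    return f[dset_path]

data = load_memmap_weak("Sample_LFP_v0,12.h5")
print(data)  # still an open, readable HDF5 dataset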

I am using ncempy version 1.8.1

ercius (Owner) commented Sep 8, 2021

Thanks for bringing this up! I can see how this would be a problem.

I think your approach is a good solution, and I'll adopt it in an upcoming ncempy release. I don't like the idea that the file object is lost, but the h5py documentation explicitly mentions this solution at this link. So going out of scope with so-called 'weak' closing is a suggested way to deal with HDF5 memmaps.

This is a tricky problem, actually. Numpy just totally ignores the issue! See the numpy documentation here.

Currently there is no API to close the underlying mmap. It is tricky to ensure the resource is actually closed, since it may be shared between different memmap instances.
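
For illustration, a minimal numpy.memmap sketch (the scratch file name is made up): there is no public API to close the map, so the underlying mmap is only released once the array is garbage collected.

import numpy as np

# Create a memory-mapped array backed by a scratch file.
mm = np.memmap("scratch.dat", dtype="float64", mode="w+", shape=(10, 10))
mm[:] = 1.0
mm.flush()  # changes can be flushed to disk explicitly...

del mm  # ...but the mmap itself only closes when the array is collected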

Alternatively, a user could keep the fileEMD object in scope by returning the file object along with the memmap. Then, when you are done with the data, you can del the file object and the dataset object. Or just let all of them go out of scope, so everything is garbage collected at the same time, hopefully closing the file properly. I don't think this is the best option in py4DSTEM, since you just want to return the data by itself.

This is an example of what I suggested above in case anyone runs into a similar issue.

from ncempy.io.emd import fileEMD

fpath = "Sample_LFP_v0,12.h5"

def load_memmap(fpath, N):
    f = fileEMD(fpath)
    # return the fileEMD object too, so the file handle stays open
    return f, *f.get_memmap(N)

f, data, dims = load_memmap(fpath, 0)

print(data)
# <HDF5 dataset "data": shape (10, 10), type "<f8">
del f, data, dims  # drop all references so the file can be closed
