BUG: close intermediate file descriptor right after it is used in netcdf.py #3429
Conversation
The failures are segfaults on all Python versions. Haven't checked if it's related, but this is new.
I guess I fixed the problem. Please review my patch.
Doesn't this now cause a lot of extra data copying? I assume you have some large netcdf files, otherwise you wouldn't see this. Can you take one that does load without this fix and time opening it in read mode?
Without copy(), segfaults occur. Why does data.flags.writeable become True with copy()?
I mean take one that works with current scipy master, and compare with this patch. Flags: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flags.html
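The flag behaviour discussed above can be checked directly: a NumPy array built over a read-only buffer is not writeable, while copy() always produces a fresh, writeable array, which is why the copy() version changes the flag. A minimal illustration:

```python
import numpy as np

# An array created over an immutable buffer (here, a bytes object)
# is read-only; copy() allocates new memory and is always writeable.
ro = np.frombuffer(b"\x00\x01\x02\x03", dtype=np.uint8)
assert ro.flags.writeable is False
assert ro.copy().flags.writeable is True
```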
Now it sets the writeable flag properly and passes the tests.
Sorry, I didn't pay attention to the details. Given the inefficiency of copy(), I realized this is not a good way to solve the problem. It may be better simply to advise against using mmap for netcdf files with too many variables, or to pass mmap=False to netcdf_file(). A better idea may be to check the number of potential mmap() calls and, if it is greater than resource.getrlimit(resource.RLIMIT_NOFILE), set self.use_mmap to False. How can I count the number of open file descriptors in Python? Or is there a better idea?
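The heuristic proposed above could be sketched as follows. This is a hypothetical helper, not code from the patch; `num_variables` stands in for the number of variables in the netcdf file, and the comparison against the soft RLIMIT_NOFILE limit is the assumed proxy for "number of potential mmap() calls":

```python
import resource

def should_use_mmap(num_variables, requested=True):
    """Hypothetical check: fall back to mmap=False when the number of
    variables could exhaust the per-process file-descriptor limit."""
    if not requested:
        return False
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True  # no effective limit on open descriptors
    # Each variable access may open one descriptor, so stay below the limit.
    return num_variables < soft
```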
It should be possible to use a single mmap, initialized when the file is opened, |
The Travis error is a timeout; the patch looks OK now.
I further modified the code to use a buffer, as pv suggested. Memory efficiency improved significantly!
There's another major improvement from using ALLOCATIONGRANULARITY when available. When ALLOCATIONGRANULARITY is not used, the mmap for each variable is created from the start of the file to the end of the variable, since it is not possible to specify an arbitrary offset to mmap; the offset is instead applied later when creating the NumPy array. For files with many variables this is highly inefficient. The code at bitbucket checks whether ALLOCATIONGRANULARITY is available (on Python >= 2.6) and makes use of the paging.
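The alignment trick described above can be sketched like this. This is an illustrative helper (not the bitbucket code): mmap offsets must be multiples of mmap.ALLOCATIONGRANULARITY, so the mapping starts at the nearest aligned boundary at or before the variable, and the leftover offset is applied when building the NumPy array:

```python
import mmap

def aligned_window(var_offset, var_nbytes):
    """Return (map_start, map_length, extra_offset) for mapping only the
    pages that cover a variable, instead of everything from byte 0.
    extra_offset is the remainder to pass to the array constructor."""
    granularity = getattr(mmap, "ALLOCATIONGRANULARITY", None)
    if granularity is None:
        # Very old Python: no offset support, map from the start of the file.
        return 0, var_offset + var_nbytes, var_offset
    map_start = (var_offset // granularity) * granularity
    return map_start, var_offset + var_nbytes - map_start, var_offset - map_start
```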
BUG: io/netcdf: use only a single mmap in netcdf Using a separate mmap for each access of each variable can cause running out of file descriptors on some platforms. Address this by using a single mmap covering the whole file.
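The merged single-mmap approach can be sketched as follows. This is a simplified illustration, not the scipy implementation: `layout` is a hypothetical description of where each variable lives in the file, and each variable becomes a read-only view into one shared mapping, so no descriptors accumulate per variable:

```python
import mmap
import numpy as np

def open_variables(path, layout):
    """Map the whole file once and expose each variable as a view.

    layout: hypothetical list of (name, dtype, shape, byte_offset) tuples.
    Returns the mmap object and a dict of read-only arrays over it.
    """
    with open(path, "rb") as f:
        # One mapping for the entire file, created at open time.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    variables = {}
    for name, dtype, shape, offset in layout:
        count = int(np.prod(shape))
        # frombuffer makes a zero-copy, non-writeable view into the mapping.
        arr = np.frombuffer(mm, dtype=dtype, count=count, offset=offset)
        variables[name] = arr.reshape(shape)
    return mm, variables
```

Because the mapping is opened with ACCESS_READ, the resulting arrays have `writeable` False, matching the flag behaviour discussed earlier in the thread.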
Merged with minor changes in 58d8117 |
netcdf opens intermediate file descriptors, and they are closed only when the main file descriptor is closed. This causes a "too many open files" error when the number of intermediate descriptors grows too large. This patch closes each intermediate file descriptor right after it is used.
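The idea behind the patch relies on a property of mmap: closing the file descriptor immediately after the mapping is created does not invalidate the mapping. A minimal sketch of the pattern (illustrative, not the actual netcdf.py code):

```python
import mmap
import os

def map_then_close(path):
    """Map a whole file read-only and release the descriptor right away.

    The mapping remains valid after os.close(), so intermediate
    descriptors are freed immediately instead of accumulating until
    the main file is closed.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # mapping stays usable after this
    return mm
```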