file descriptors are not closed #14

benoit-pierre · 2017-05-03T22:14:34Z

For example, when running the following code on Linux:

import os

import pyexcel

pyexcel.get_book_dict(file_name='test.xlsx')

fd_dir = '/proc/%u/fd' % os.getpid()
for fd_name in os.listdir(fd_dir):
    print(fd_name, '-> ', end='')
    try:
        print(os.readlink('%s/%s' % (fd_dir, fd_name)))
    except FileNotFoundError:
        print()

The last open file descriptor points to test.xlsx.

Besides the file descriptor leak, this is really problematic on Windows, as it makes it impossible to modify the spreadsheet in Office while it has been read by another (still running) application.


chfw · 2017-05-03T23:12:51Z

The close() function should be called for the read-only and write-only modes, which are the ones used in this library. I will run your code to test a fix.

benoit-pierre · 2017-05-03T23:21:58Z

I've been trying this, but it's not enough:

 pyexcel_xlsx/xlsx.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git i/pyexcel_xlsx/xlsx.py w/pyexcel_xlsx/xlsx.py
index ed01c1b..bf8a036 100644
--- i/pyexcel_xlsx/xlsx.py
+++ w/pyexcel_xlsx/xlsx.py
@@ -101,7 +101,7 @@ class XLSXBook(BookReader):
 
     def read_sheet(self, native_sheet):
         sheet = XLSXSheet(native_sheet, **self._keywords)
-        return {sheet.name: sheet.to_array()}
+        return {sheet.name: list(sheet.to_array())}
 
     def _load_the_excel_file(self, file_alike_object):
         self._native_book = openpyxl.load_workbook(
@@ -111,6 +111,9 @@ class XLSXBook(BookReader):
         self.skip_hidden_sheets = self._keywords.get(
             'skip_hidden_sheets', True)
 
+    def close(self):
+        self._native_book.close()
+        self._native_book = None
 
 class XLSXSheetWriter(SheetWriter):
     """

There's an issue in openpyxl: it closes its ZipFile archive, but ZipFile internally tracks reference counts, and the descriptor is not closed because at least one reference is still held.
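The reference counting described above can be reproduced with the standard library alone. What follows is a minimal sketch, assuming CPython 3's zipfile implementation; the file names are made up, and `fp` is an undocumented ZipFile attribute used here only to observe the underlying file object.

```python
import os
import tempfile
import zipfile


def underlying_file_states(path, member_name):
    """Report whether the archive's file object is closed at two points:
    after ZipFile.close() while a member is still open, and after the
    member has been closed as well."""
    archive = zipfile.ZipFile(path)
    raw = archive.fp                    # undocumented: the real file object
    member = archive.open(member_name)  # holds an extra reference to raw
    archive.close()                     # releases the archive's reference only
    closed_after_archive_close = raw.closed
    member.close()                      # releases the last reference
    closed_after_member_close = raw.closed
    return closed_after_archive_close, closed_after_member_close


def demo():
    # Build a throwaway archive (hypothetical names) and inspect it.
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, 'demo.zip')
        with zipfile.ZipFile(path, 'w') as archive:
            archive.writestr('sheet.xml', '<worksheet/>')
        return underlying_file_states(path, 'sheet.xml')
```

If the first value returned by `demo()` is false, the descriptor outlived `ZipFile.close()`, which matches the leak reported in this issue.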

benoit-pierre · 2017-05-03T23:39:46Z

This additional patch in openpyxl is necessary:

 openpyxl/worksheet/read_only.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git i/openpyxl/worksheet/read_only.py w/openpyxl/worksheet/read_only.py
index fd533569..93438328 100644
--- i/openpyxl/worksheet/read_only.py
+++ w/openpyxl/worksheet/read_only.py
@@ -89,7 +89,8 @@ class ReadOnlyWorksheet(object):
     def xml_source(self):
         """Parse xml source on demand, default to Excel archive"""
         if self._xml is None:
-            return self.parent._archive.open(self.worksheet_path)
+            from six import BytesIO
+            return BytesIO(self.parent._archive.read(self.worksheet_path))
         return self._xml

Now to see if I can clean it up into a proper PR...
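The idea behind the openpyxl patch is to read the member fully into memory so that no handle into the archive outlives it. A standard-library sketch of the same approach, assuming Python 3 (io.BytesIO instead of six) and a hypothetical helper name `read_member`:

```python
import io
import os
import tempfile
import zipfile


def read_member(path, member_name):
    """Return an in-memory stream holding one archive member.

    The archive is closed before this function returns, so the caller
    keeps no reference to the file on disk.
    """
    with zipfile.ZipFile(path) as archive:
        return io.BytesIO(archive.read(member_name))


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, 'demo.zip')
        with zipfile.ZipFile(path, 'w') as archive:
            archive.writestr('sheet.xml', '<worksheet/>')
        stream = read_member(path, 'sheet.xml')
        # Deleting the file would fail on Windows if it were still open.
        os.remove(path)
        return stream.read()
```

The trade-off is that the whole member is held in memory, which gives up the streaming behaviour of `archive.open()`.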

chfw · 2017-05-04T08:50:30Z

The fix seems to be complex. My intent was to let the developer consume the data on demand, to avoid reading all of it into memory on the developer's behalf. "streaming=True" would enable this on-demand feature. For it to work, the file handle has to be kept open, and it is eventually closed by the garbage collector.

As you have noticed in your changes, calling close alone causes a RuntimeError: Attempt to read ZIP archive that was already closed. So to cater for the need to close the file handle manually, I will have to think of some alternatives.

benoit-pierre · 2017-05-04T12:10:21Z

I certainly don't expect pyexcel.get_book_dict to read only on demand, as opposed to something like iget_records. I don't think letting the garbage collector close file descriptors is a good idea: they should be closed right after I'm done consuming whatever data I'm asking pyexcel for, either explicitly through a call to close on the context I'm using or, better yet, through support for the with statement so that it's closed at the end of the block.
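For illustration, the two closing styles asked for here (an explicit close call and a with block) could look like the sketch below. `Reader` is a hypothetical class, not part of pyexcel; it only shows the shape of the interface.

```python
import os
import tempfile


class Reader:
    """Hypothetical reader that owns a file handle (not a pyexcel class)."""

    def __init__(self, path):
        self._handle = open(path, 'rb')

    def read_all(self):
        return self._handle.read()

    def close(self):
        # Safe to call more than once.
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    @property
    def closed(self):
        return self._handle is None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised in the block


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, 'data.bin')
        with open(path, 'wb') as handle:
            handle.write(b'abc')
        with Reader(path) as reader:
            data = reader.read_all()
        # The handle is released as soon as the block ends.
        return data, reader.closed
```

For a class that already has a close method but no `__enter__`/`__exit__`, `contextlib.closing` from the standard library gives the same effect without modifying the class.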

chfw · 2017-05-04T16:01:23Z

Yes, you have a point there. The fix would be to make the closing of those file handles explicit. With the get_* functions, automatic closing is expected, whereas with iget_* the closing is left to the developer after the generator has been consumed.
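The distinction between eager and generator-based functions can be sketched with plain text files. `get_lines` and `iget_lines` are hypothetical stand-ins for the get_* and iget_* families, not pyexcel functions, and the `opened` list exists only so the demo can inspect the handle.

```python
import os
import tempfile


def get_lines(path):
    """Eager: read everything, close the handle, then return."""
    with open(path) as handle:
        return [line.rstrip('\n') for line in handle]


def iget_lines(path, opened):
    """Lazy: the handle stays open until the generator is exhausted
    or explicitly closed by the caller."""
    handle = open(path)
    opened.append(handle)  # exposed only so the demo can inspect it
    try:
        for line in handle:
            yield line.rstrip('\n')
    finally:
        handle.close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, 'rows.txt')
        with open(path, 'w') as out:
            out.write('a\nb\n')
        eager = get_lines(path)
        opened = []
        rows = iget_lines(path, opened)
        first = next(rows)
        open_mid_iteration = not opened[0].closed
        rest = list(rows)  # exhausting the generator runs the finally block
        closed_after = opened[0].closed
        return eager, [first] + rest, open_mid_iteration, closed_after
```

A caller that abandons the generator early can call `rows.close()`, which also runs the `finally` block and releases the handle without waiting for the garbage collector.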

benoit-pierre · 2017-05-04T16:15:59Z

OK, for now I'll be using a patched version of pyexcel-xlsx (with the above patch) and of openpyxl (with the patch mentioned here).

chfw · 2017-05-05T05:51:27Z

I will put this fix in the 0.4.0 branch, the next major release. That way, I can focus the effort on one branch instead of supporting two.

chfw · 2017-06-04T06:50:23Z

Inconsistent behaviour was found in this test run:

#. The file handle from iget_data was closed only in Python 3.6 and PyPy, not in other Python versions.
#. The file handle from get_data was still left open in PyPy.

openpyxl version is v2.4.8.

This was referenced May 6, 2017

file handle not closed pyexcel/pyexcel#83

Closed

file handle not closed pyexcel/pyexcel-io#32

Closed

chfw added a commit that referenced this issue May 6, 2017

#14: close file handle explicitly

7c669ba

chfw mentioned this issue May 6, 2017

file handle not closed pyexcel/pyexcel-xls#15

Closed

chfw added a commit that referenced this issue May 31, 2017

test and verify the xlsx file handle is really closed #14

a435913

chfw added a commit to pyexcel/pyexcel-ods that referenced this issue May 31, 2017

verify odfpy does close file handle, pyexcel/pyexcel-xlsx#14

4c35d8f

chfw added a commit to pyexcel/pyexcel-ods3 that referenced this issue Jun 1, 2017

make sure ods3 close ods file, pyexcel/pyexcel-xlsx#14

28bb779

chfw added a commit to pyexcel/pyexcel-odsr that referenced this issue Jun 1, 2017

make sure pyexcel-odsr close ods file, pyexcel/pyexcel-xlsx#14

fccfc27

chfw mentioned this issue Jun 7, 2017

openpyxl leaks file handle in pypy #17

Closed

chfw closed this as completed Jun 7, 2017
