Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: don't close user-provided file handles in C parser (GH14418) #14520

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.1.txt
Expand Up @@ -37,6 +37,7 @@ Bug Fixes


- Bug in ``pd.read_csv`` for Python 2.x in which Unicode quote characters were no longer being respected (:issue:`14477`)
- Fixed regression where user-provided file handles were closed in ``read_csv`` (c engine) (:issue:`14418`).
- Bug in localizing an ambiguous timezone when a boolean is passed (:issue:`14402`)
- Bug in ``TimedeltaIndex`` addition with a Datetime-like object where addition overflow in the negative direction was not being caught (:issue:`14068`, :issue:`14453`)

Expand Down
2 changes: 2 additions & 0 deletions pandas/io/parsers.py
Expand Up @@ -1456,6 +1456,8 @@ def __init__(self, src, **kwds):
def close(self):
for f in self.handles:
f.close()

# close additional handles opened by C parser (for compression)
try:
self._reader.close()
except:
Expand Down
23 changes: 23 additions & 0 deletions pandas/io/tests/parser/common.py
Expand Up @@ -1602,3 +1602,26 @@ def test_internal_eof_byte(self):
expected = pd.DataFrame([["1\x1a", 2]], columns=['a', 'b'])
result = self.read_csv(StringIO(data))
tm.assert_frame_equal(result, expected)

def test_file_handles(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we have a c_parser_only set of tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but I would like to keep those tests together, and only the mmap does not work with the python engine (I don't test the actual reading of a mmap, only that the handler is not closed)

# GH 14418 - don't close user provided file handles

fh = StringIO('a,b\n1,2')
self.read_csv(fh)
self.assertFalse(fh.closed)

with open(self.csv1, 'r') as f:
self.read_csv(f)
self.assertFalse(f.closed)

# mmap not working with python engine
if self.engine != 'python':

import mmap
with open(self.csv1, 'r') as f:
m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
self.read_csv(m)
# closed attribute new in python 3.2
if PY3:
self.assertFalse(m.closed)
m.close()
9 changes: 4 additions & 5 deletions pandas/parser.pyx
Expand Up @@ -272,7 +272,7 @@ cdef class TextReader:
parser_t *parser
object file_handle, na_fvalues
object true_values, false_values
object dsource
object handle
bint na_filter, verbose, has_usecols, has_mi_columns
int parser_start
list clocks
Expand Down Expand Up @@ -554,9 +554,9 @@ cdef class TextReader:
def close(self):
# we need to properly close an open derived
# filehandle here, e.g. and UTFRecoder
if self.dsource is not None:
if self.handle is not None:
try:
self.dsource.close()
self.handle.close()
except:
pass

Expand Down Expand Up @@ -641,6 +641,7 @@ cdef class TextReader:
else:
raise ValueError('Unrecognized compression type: %s' %
self.compression)
self.handle = source

if isinstance(source, basestring):
if not isinstance(source, bytes):
Expand Down Expand Up @@ -684,8 +685,6 @@ cdef class TextReader:
raise IOError('Expected file path name or file-like object,'
' got %s type' % type(source))

self.dsource = source

cdef _get_header(self):
# header is now a list of lists, so field_count should use header[0]

Expand Down