
TemporaryFile as input to read_table raises TypeError: '_TemporaryFileWrapper' object is not an iterator #13398

Closed
mbrucher opened this issue Jun 8, 2016 · 26 comments
Labels: Bug, IO CSV (read_csv, to_csv)

@mbrucher (Contributor) commented Jun 8, 2016

Although the docs say that the input can be a file-like object, read_table doesn't work with objects from tempfile. On Windows these files can't be reopened by name, so I need to pass the object itself.

Code Sample, a copy-pastable example if possible

import pandas as pd
from tempfile import TemporaryFile
new_file = TemporaryFile("w+")
dataframe = pd.read_table(new_file, skiprows=3, header=None, sep=r"\s*")

Expected Output

Not an exception!

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64

pandas: 0.18.0

@jreback (Contributor) commented Jun 9, 2016

This happens only with engine='python', because the sep you gave is a regex. If you use sep='\s+' (the more typical whitespace separator), it works as expected.

@jreback (Contributor) commented Jun 9, 2016

In the future, please show the entire show_versions() output. You are missing crucial information there (the platform), though you did put it in the comments. We ask for these things to make issues easier to investigate.

@jreback jreback added this to the Next Major Release milestone Jun 9, 2016
@jreback (Contributor) commented Jun 9, 2016

Pull requests are welcome.

@mbrucher (Author) commented Jun 9, 2016

Do you mean that if I used sep='\s+', there would be no exception?
Yes, I removed some info because it's not relevant here (except the platform; I didn't notice I had removed the OS), and there are some things that I can't share as well.

@jreback (Contributor) commented Jun 10, 2016

Yes. If you were splitting on whitespace, it would use the C engine, which would give you an error that the data file is empty.

Since you used a regex, it went to the Python engine, which gives that weird error (only on Windows).

@mbrucher (Author):

Oh, OK. The thing is that I may have several spaces between columns, so I have to use the regex :(

@jreback (Contributor) commented Jun 11, 2016

\s+ is whitespace with at least a single space; allowing 0 spaces (\s*) is very weird.
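Since the concern was columns separated by a variable number of spaces, a quick sketch (with made-up data values) showing that sep=r"\s+" already handles that case, so the \s* regex is unnecessary:

```python
import io
import pandas as pd

# sep=r"\s+" means "one or more whitespace characters", so columns
# separated by several spaces still split correctly, and pandas can
# use the fast C engine for this separator:
data = "1   2     3\n4  5 6\n"
df = pd.read_csv(io.StringIO(data), sep=r"\s+", header=None)
print(df.shape)  # (2, 3)
```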

@mbrucher (Author):

Yes, agreed that 0 spaces is weird :)
BTW, the data file is not empty; I'm passing the file-like object. It shouldn't fail in any case!

@jreback (Contributor) commented Jun 11, 2016

Oh, the example above IS empty.

In any case, if you would like to debug, I think it's a simple fix.

@mbrucher (Author):

Oh yes, sorry. I forgot I had to remove the data as it is confidential!

@mbrucher (Author):

The issue is that you can't call next() on this file object, apparently.

@gfyoung (Member) commented Jun 18, 2016

@mbrucher:

  1. If you can't provide the original data, create dummy data that triggers the exception, ideally an example that reproduces with just read_table(new_file).

  2. If you have confidentiality issues, can you try reproducing the issue on another machine? Full version output is extremely useful when trying to debug.

  3. How does your tempfile have data? Are you calling new_file.write before you call read_table? If so, make sure to call new_file.seek(0) first to reset the stream position. Otherwise, none of your written data will be read (you can see this for yourself by calling new_file.read() before and after calling new_file.seek(0)).

I should add that this advice also applies to normal file objects (i.e. those created by calling open(...)), so IIUC this issue is not unique to tempfiles.
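Point 3 can be demonstrated with a stdlib-only sketch, no pandas involved (on POSIX, TemporaryFile returns a plain file object, so the bug discussed in this issue does not interfere):

```python
from tempfile import TemporaryFile

f = TemporaryFile("w+")
f.write("0 0\n1 1\n")
f.flush()

# The stream position now sits at end-of-file, so reading yields nothing:
assert f.read() == ""

# After rewinding, the written data is visible again:
f.seek(0)
assert f.read() == "0 0\n1 1\n"
f.close()
```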

@jreback (Contributor) commented Jun 18, 2016

@gfyoung this repros exactly as above with an empty file

@gfyoung (Member) commented Jun 18, 2016

I know, but I thought @mbrucher said the file contained data, and I was addressing that. In any case, unless a more convincing example can be provided, I think this is safe to close, as the function does work with tempfiles in the manner I described, data or no data.

@jreback (Contributor) commented Jun 18, 2016

No, it doesn't on Windows.

@jreback (Contributor) commented Jun 18, 2016

In [2]: import pandas as pd

In [3]: pd.__version__
Out[3]: '0.18.1+139.ge24ab24'

In [4]: import pandas as pd

In [5]: from tempfile import TemporaryFile

In [6]: new_file = TemporaryFile("w+")

In [7]: dataframe = pd.read_table(new_file, skiprows=3, header=None, sep=r"\s*")
C:\Miniconda2\envs\pandas3.5\Scripts\ipython-script.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not
 support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying
engine='python'.
  if __name__ == '__main__':
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-43d01852f446> in <module>()
----> 1 dataframe = pd.read_table(new_file, skiprows=3, header=None, sep=r"\s*")

C:\Users\conda\Documents\pandas3.5\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, s
queeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_va
lues, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, itera
tor, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols,
error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lin
es, memory_map, float_precision)
    627                     skip_blank_lines=skip_blank_lines)
    628
--> 629         return _read(filepath_or_buffer, kwds)
    630
    631     parser_f.__name__ = name

C:\Users\conda\Documents\pandas3.5\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    380
    381     # Create the parser.
--> 382     parser = TextFileReader(filepath_or_buffer, **kwds)
    383
    384     if (nrows is not None) and (chunksize is not None):

C:\Users\conda\Documents\pandas3.5\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    710             self.options['has_index_names'] = kwds['has_index_names']
    711
--> 712         self._make_engine(self.engine)
    713
    714     def close(self):

C:\Users\conda\Documents\pandas3.5\pandas\io\parsers.py in _make_engine(self, engine)
    894             elif engine == 'python-fwf':
    895                 klass = FixedWidthFieldParser
--> 896             self._engine = klass(self.f, **self.options)
    897
    898     def _failover_to_python(self):

C:\Users\conda\Documents\pandas3.5\pandas\io\parsers.py in __init__(self, f, **kwds)
   1742         # infer column indices from self.usecols if is is specified.
   1743         self._col_indices = None
-> 1744         self.columns, self.num_original_columns = self._infer_columns()
   1745
   1746         # Now self.columns has the set of columns that we will process.

C:\Users\conda\Documents\pandas3.5\pandas\io\parsers.py in _infer_columns(self)
   2068         else:
   2069             try:
-> 2070                 line = self._buffered_line()
   2071
   2072             except StopIteration:

C:\Users\conda\Documents\pandas3.5\pandas\io\parsers.py in _buffered_line(self)
   2136             return self.buf[0]
   2137         else:
-> 2138             return self._next_line()
   2139
   2140     def _empty(self, line):

C:\Users\conda\Documents\pandas3.5\pandas\io\parsers.py in _next_line(self)
   2164             while self.pos in self.skiprows:
   2165                 self.pos += 1
-> 2166                 next(self.data)
   2167
   2168             while True:

C:\Users\conda\Documents\pandas3.5\pandas\io\parsers.py in _read()
   1869         else:
   1870             def _read():
-> 1871                 line = next(f)
   1872                 pat = re.compile(sep)
   1873                 yield pat.split(line.strip())

TypeError: '_TemporaryFileWrapper' object is not an iterator

@mbrucher (Author):

So if the file is populated, the same issue of course occurs:

import pandas as pd
from tempfile import TemporaryFile
new_file = TemporaryFile("w+")
new_file.write("0 0")
new_file.flush()
new_file.seek(0)
dataframe = pd.read_table(new_file, header=None, sep=r"\s+", engine="python")
print(dataframe)

Tested on OS X with Python 2.7 (brew version): works like a charm, so there must be a difference in the implementation. I don't have a 3.5 on my Mac, so I can't try it to see whether it's the OS or the Python version :/

@gfyoung I know perfectly well how files work, thank you very much. I've been writing Python for more than a decade now; I hit all these issues in the past and obviously know how to avoid them. But I guess you didn't try my code before posting your message.

As @jreback said, it should be "easy" to fix, so I'll have a try when I have time.
A completely different question: why can't we use a list of strings to generate a DataFrame? (For instance, a filtered file would end up being a list of strings that could be read into pandas; that's actually my use case. I'm using a TemporaryFile because I couldn't figure out another way.)

@jreback (Contributor) commented Jun 18, 2016

@mbrucher what do you mean by a 'list of strings'?

You can! The difference is that this is not very efficient, as the input has to be introspected (to figure out what exactly you are passing, since there are many possibilities) and then converted to a storage format (e.g. numpy). These steps are not necessarily cheap, whereas the parser has more information available (e.g. it already knows the layout and can infer dtypes directly).

In [12]: DataFrame(['foo', 'bar', 'baz'])
Out[12]: 
     0
0  foo
1  bar
2  baz

In [13]: DataFrame([['foo', 'bar', 'baz']])
Out[13]: 
     0    1    2
0  foo  bar  baz

@mbrucher (Author):

Actually I was thinking of something like pd.read_table(["0 0", "1 1"], header=None, sep=r"\s+", engine="python") as the data is not yet parsed in my case (reading a report file that mixes lots of things together, only looking for specific tables that I then append to a list).

@jreback (Contributor) commented Jun 18, 2016

It's much more efficient to do this with the C engine, since you have whitespace separation. Join the strings with a line separator and you are set.

In [5]: pd.read_csv(StringIO('\n'.join(["0 0", "1 1"])), header=None, sep="\s+")
Out[5]: 
   0  1
0  0  0
1  1  1

@mbrucher (Author):

OK, thanks.

It seems that this file-like object doesn't implement next(). The issue comes from the fact that, to select the type of reader, pandas checks for the readline attribute (which is used for separators of length 1), but it calls next() for the other separators.
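The readline-vs-next() mismatch can be illustrated with a toy wrapper (a hypothetical stand-in for tempfile's _TemporaryFileWrapper, which forwards attribute access to the underlying file via __getattr__). In Python 3, next(obj) looks up __next__ on the object's type, bypassing __getattr__, so plain delegation isn't enough:

```python
import io

class Wrapper:
    """Toy stand-in for tempfile's _TemporaryFileWrapper: forwards
    attribute lookups to the wrapped file via __getattr__."""
    def __init__(self, f):
        self._f = f

    def __getattr__(self, name):
        return getattr(self._f, name)

w = Wrapper(io.StringIO("0 0\n1 1\n"))

# Ordinary attribute delegation works, so the length-1 separator path
# (which calls readline) is fine:
assert w.readline() == "0 0\n"

# But next() looks up __next__ on type(w), bypassing __getattr__,
# so the regex-separator path blows up:
try:
    next(w)
except TypeError as exc:
    print(exc)  # 'Wrapper' object is not an iterator
```

In Python 2.7, _TemporaryFileWrapper was an old-style class, for which special-method lookup does fall back to __getattr__; that would presumably explain why the error could not be reproduced under 2.7 above.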

@gfyoung (Member) commented Jun 18, 2016

@mbrucher: Whoa, slow down there; aren't we letting our ego get a bit in the way of rational conversation? First of all, your code gave no indication that you were aware of this, so if you would like to update your code example in the initial post, go right ahead and do so.

Second, I did in fact try it out on a newly-acquired Windows 7 machine with Python 2.7.11 and v0.18.1, and could not reproduce the exception. In addition, I tested the new examples that were later posted and also got no exception.

@mbrucher (Author):

@gfyoung Which is why I specified the Python version, as AFAIK there is a change in the behavior of the next API between versions. Anyway, the pull request fixes it and I'm adding a test as we speak.

@gfyoung (Member) commented Jun 18, 2016

@mbrucher: fair enough, but it's worth noting that the issue you raise isn't then a general Windows bug, but rather a change in the way TemporaryFile is implemented between Python versions.

@mbrucher (Author):

They must have forgotten about it when they changed the next() API :(

@jreback jreback modified the milestones: 0.18.2, Next Major Release Jun 19, 2016
jreback pushed a commit that referenced this issue Jun 22, 2016
dcloses #13398

Author: Matthieu Brucher <matthieu.brucher@gmail.com>

Closes #13481 from mbrucher/issue-13398 and squashes the following commits:

8b52631 [Matthieu Brucher] Yet another small update for more general regex
0d54151 [Matthieu Brucher] Simplified
5871625 [Matthieu Brucher] Grammar
aa3f0aa [Matthieu Brucher] lint change
1c33fb5 [Matthieu Brucher] Simplified test and added what's new note.
d8ceb57 [Matthieu Brucher] lint changes
fd20aaf [Matthieu Brucher] Moved the test to the Python parser test file
98e476e [Matthieu Brucher] Using same way of referencing as just above, consistency.
119fb65 [Matthieu Brucher] Added reference to original issue in the test + test the result itself (assuming that previous test is OK)
5af8465 [Matthieu Brucher] Adding a test with Python engine
d8decae [Matthieu Brucher] #13398 Change the way of reading back to readline (consistent with the test before entering the function)
@gfyoung (Member) commented Jul 2, 2016

@jreback: this issue should have been closed by @mbrucher's commit (I think it wasn't because the commit message says "dcloses" instead of "closes").

@jreback jreback closed this as completed Jul 2, 2016