to_csv regression in 0.23.1 #21471
Please provide a reproducible example:
@WillAyd Francois's example is reproducible for me on Windows 7 using master. The output file test.txt.gz is empty instead of containing data.
If I let pandas do the compression it appears to work fine:
    import sys
    import pandas as pd

    df = pd.DataFrame([0,1])
    df.to_csv(sys.stdout)
This code writes the dataframe to a file named
I also have a problem with "to_csv" specifically on 0.23.1.
It looks like the function "_get_handle()" returns "f" as a file descriptor number (an int) instead of a buffer.
    # GH 17778 handles zip compression for byte strings separately.
    buf = f.getvalue()
    if path_or_buf:
        f, handles = _get_handle(path_or_buf, self.mode,
                                 encoding=encoding,
                                 compression=self.compression)
        f.write(buf)
        f.close()
  File "/Users/wr/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 1745, in to_csv
    formatter.save()
  File "/Users/wr/anaconda3/lib/python3.6/site-packages/pandas/io/formats/csvs.py", line 168, in save
    f.write(buf)
AttributeError: 'int' object has no attribute 'write'
@WillAyd, I did some quick research.
It seems that all "file-like" objects that cannot be converted to string file paths are affected. The gzip wrapper, stdout, and file descriptor cases all have the same origin.
Example with FD:
    import pandas
    import os

    with os.fdopen(3, 'w') as f:
        print(f)
        pandas.DataFrame([0, 1]).to_csv(f)
I guess the integer comes from the "name" attribute of TextIOWrapper. For STDOUT it will be
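The guess about the "name" attribute can be checked with the standard library alone. The sketch below is only an illustration of that attribute, not of pandas internals. It uses a pipe to get a valid file descriptor, and the variable names are made up for the demo.

```python
import os

# Create a pipe just to obtain a valid file descriptor for the demo.
read_fd, write_fd = os.pipe()

with os.fdopen(write_fd, "w") as f:
    # For a handle created from a descriptor, .name is the integer
    # descriptor itself, not a path string.
    handle_name = f.name
    name_is_int = isinstance(handle_name, int)

os.close(read_fd)

print(type(handle_name).__name__, name_is_int)
```

So any code that treats "f.name" as a path to re-open would, in a case like this, be handed an int rather than a string.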
Hi, here are some additional examples of the changes in the behaviour of
A common use case is to write a file header once and then write many dataframes' data to that file. Our implementation looks like this:
This works in 0.23.0 but in 0.23.1 it produces a file that looks like this:
What happened here is that pandas has opened a second handle to the same file path in write mode, and our
Flushing alone would not help because now pandas will overwrite our data:
One workaround is both flushing manually AND giving pandas a write mode:
IMO this is not expected behaviour: if we give pandas an open file handle, we don't expect pandas to find out what the original path was, and open it again on a second file handle.
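The interaction described in this comment can be simulated without pandas. The sketch below is a stand-in, not pandas code: a second "open()" on the same path plays the role of the handle pandas 0.23.1 is reported to open. The helper name and its parameters are hypothetical, and it assumes that the "write mode" in the workaround means append mode ("a").

```python
import os
import tempfile


def write_with_second_handle(path, header, data, flush_first, reopen_mode):
    """Write a header on our own handle, then data via a second handle.

    `reopen_mode` stands in for the mode a library would use when it
    re-opens the file by path ("w" or "a"). Hypothetical helper.
    """
    # newline="" disables newline translation so offsets match everywhere.
    with open(path, "w", newline="") as ours:
        ours.write(header)  # stays in our buffer unless flushed
        if flush_first:
            ours.flush()
        # Second, independent handle to the same path.
        with open(path, reopen_mode, newline="") as theirs:
            theirs.write(data)
    # Our handle is closed here, which flushes anything still buffered.
    with open(path, newline="") as result:
        return result.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.csv")
    header = "# hdr\n"
    data = ",0\n0,0\n1,1\n"
    no_flush = write_with_second_handle(path, header, data, False, "w")
    flush_only = write_with_second_handle(path, header, data, True, "w")
    flush_and_append = write_with_second_handle(path, header, data, True, "a")

print(repr(no_flush))          # '# hdr\n\n1,1\n' (header lands on top of data)
print(repr(flush_only))        # ',0\n0,0\n1,1\n' (header truncated away)
print(repr(flush_and_append))  # '# hdr\n,0\n0,0\n1,1\n'
```

With CPython's default buffering, only the last combination keeps both the header and the data, which matches the workaround described above.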
This is the bit of code where re-opening is decided: https://github.com/pandas-dev/pandas/blob/master/pandas/io/formats/csvs.py#L139. This gives the "" behaviour pointed out by @saidie. Data is written to a StringIO first; finally, the file is opened again by path and the contents of the StringIO are written to it.
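For contrast, here is a minimal sketch of the behaviour a caller would probably expect: write directly to an object that already has a "write" method, and only open the target when it is a path. This is a hypothetical helper for illustration. It is not the pandas implementation and not a proposed patch.

```python
import io
import os
import tempfile


def write_text(path_or_buf, text, mode="w"):
    """Write `text` to an open handle directly, or open a path ourselves.

    Hypothetical helper for illustration; it is not pandas code.
    """
    if hasattr(path_or_buf, "write"):
        # The caller owns this handle: write to it and leave it open.
        path_or_buf.write(text)
        return
    # Only a real path is opened (and closed) here.
    with open(path_or_buf, mode, newline="") as handle:
        handle.write(text)


# An in-memory buffer receives the data and stays open.
buffer = io.StringIO()
write_text(buffer, "a,b\n1,2\n")
buffer_contents = buffer.getvalue()

# A path is opened, written, and closed by the helper.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "out.csv")
    write_text(csv_path, "a,b\n1,2\n")
    with open(csv_path, newline="") as handle:
        path_contents = handle.read()
```

The point of the design is that the handle's "name" is never consulted, so stdout, gzip wrappers and descriptor-backed handles are all treated the same way.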