File mode in `to_csv` is ignored, when passing a file object instead of a path #19827

colobas · 2018-02-21T21:24:14Z

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.read_csv("example.csv")
>>> df.head()
   just  a  file
0     1  2     3
1     4  5     6
2     7  8     9
>>> with open("someother.csv", "wb") as f:
...     df.to_csv(f, mode="wb")
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python3.6/site-packages/pandas/core/frame.py", line 1524, in to_csv
    formatter.save()
  File "/usr/lib/python3.6/site-packages/pandas/io/formats/format.py", line 1652, in save
    self._save()
  File "/usr/lib/python3.6/site-packages/pandas/io/formats/format.py", line 1740, in _save
    self._save_header()
  File "/usr/lib/python3.6/site-packages/pandas/io/formats/format.py", line 1708, in _save_header
    writer.writerow(encoded_labels)
TypeError: a bytes-like object is required, not 'str'

Problem description

When passing a file opened in binary mode to df.to_csv and also passing mode='wb', this mode is ignored. I think it's because of these lines: https://github.com/pandas-dev/pandas/blob/master/pandas/io/common.py#L407-L411 and these ones: https://github.com/pandas-dev/pandas/blob/master/pandas/io/formats/format.py#L1660-L1662

It seems that is_text isn't passed, and so it assumes the default value of True

Expected Output

A file should just be written.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.20-1-lts
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.0
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0b10
sqlalchemy: 1.2.3
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.1.2
fastparquet: 0.1.3
pandas_gbq: None
pandas_datareader: None

chris-b1 · 2018-02-21T21:59:50Z

Isn't csv sort of fundamentally a text format? The python csv writer, which we call out to, is ultimately what's choking. Certainly could raise a better error.

In [36]: import csv

In [37]: writer = csv.writer(open('test.csv', mode='wb'))

In [38]: writer.writerow(['a', 'b', 'c'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-bb2ef92b0247> in <module>()
----> 1 writer.writerow(['a', 'b', 'c'])

TypeError: a bytes-like object is required, not 'str'
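
For comparison, here is a minimal sketch using only the standard library, with in-memory buffers standing in for real files (the variable names are illustrative, not from the report). The same `csv.writer` call succeeds on a text buffer and raises on a binary one, which suggests the error originates below pandas.

```python
import csv
import io

# Text-mode buffer: csv.writer can write str rows directly.
text_buf = io.StringIO()
csv.writer(text_buf).writerow(["a", "b", "c"])
text_result = text_buf.getvalue()  # uses csv's default "\r\n" line terminator

# Binary-mode buffer: the writer still produces str, which BytesIO rejects.
binary_buf = io.BytesIO()
try:
    csv.writer(binary_buf).writerow(["a", "b", "c"])
    binary_error = None
except TypeError as exc:
    binary_error = exc  # "a bytes-like object is required, not 'str'"
```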

colobas · 2018-02-21T22:51:57Z

It is, but sometimes you have a filesystem-like interface (such as the one for Azure's Data Lake Store) that requires files to be written in binary mode, regardless of the format.

TomAugspurger · 2018-02-22T02:00:54Z

Does passing an ADL file-like object to to_csv, without the mode argument, work? I know you can pass an s3fs.S3File opened in binary mode to to_csv and everything works fine.

colobas · 2018-02-22T16:36:58Z

Hey @TomAugspurger, just tried it. Same error:

with adlfs_client.open("/dummy.csv", "wb") as f:
    dummy.to_csv(f)

yields

TypeError                                 Traceback (most recent call last)
<ipython-input-186-bbe6f623ffa5> in <module>()
      1 with adlfs_client.open("/dummy.csv", "wb") as f:
----> 2     dummy.to_csv(f)

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
   1522                                      doublequote=doublequote,
   1523                                      escapechar=escapechar, decimal=decimal)
-> 1524         formatter.save()
   1525 
   1526         if path_or_buf is None:

/opt/conda/lib/python3.6/site-packages/pandas/io/formats/format.py in save(self)
   1650                 self.writer = UnicodeWriter(f, **writer_kwargs)
   1651 
-> 1652             self._save()
   1653 
   1654         finally:

/opt/conda/lib/python3.6/site-packages/pandas/io/formats/format.py in _save(self)
   1738     def _save(self):
   1739 
-> 1740         self._save_header()
   1741 
   1742         nrows = len(self.data_index)

/opt/conda/lib/python3.6/site-packages/pandas/io/formats/format.py in _save_header(self)
   1706         if not has_mi_columns or has_aliases:
   1707             encoded_labels += list(write_cols)
-> 1708             writer.writerow(encoded_labels)
   1709         else:
   1710             # write out the mi

/opt/conda/lib/python3.6/site-packages/azure/datalake/store/core.py in write(self, data)
    849             raise ValueError('I/O operation on closed file.')
    850 
--> 851         out = self.buffer.write(ensure_writable(data))
    852         self.loc += out
    853         self.flush(syncFlag='DATA')

TypeError: a bytes-like object is required, not 'str'

As of now the workaround is to create a temporary file and upload it explicitly, so it's not like I'm stuck, but it's just ugly.

TomAugspurger · 2018-02-22T16:48:38Z

I think it works for s3fs because of this line:

pandas/pandas/io/common.py

Line 326 in abc4ef9

need_text_wrapping = (BytesIO, S3File)

Perhaps you can wrap your f in an io.TextIOWrapper?

I'm not sure what the best way to solve this is generically.

TomAugspurger · 2018-02-22T16:49:25Z

Perhaps just checking the buffer for a mode and, if it's binary, then we know we need to wrap it in a TextIOWrapper?
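
A rough sketch of what such a check might look like, using only the standard library. `ensure_text_handle` is a hypothetical helper written for illustration and is not pandas' actual implementation; the `csv.writer` call stands in for what `to_csv` does internally, and the `BytesIO` stands in for a file opened with "wb".

```python
import csv
import io


def ensure_text_handle(buf, encoding="utf-8"):
    """Return a text-mode view of buf, wrapping it only if it looks binary.

    Hypothetical helper for illustration; not part of pandas.
    """
    mode = getattr(buf, "mode", "")
    # Some file-like objects expose a non-str mode, so check the type first.
    is_binary = (isinstance(mode, str) and "b" in mode) or isinstance(
        buf, (io.RawIOBase, io.BufferedIOBase)
    )
    if is_binary:
        # newline="" leaves line endings to the csv writer.
        return io.TextIOWrapper(buf, encoding=encoding, newline="")
    return buf


raw = io.BytesIO()  # stands in for a file opened with "wb"
handle = ensure_text_handle(raw)
csv.writer(handle).writerow(["a", "b", "c"])
handle.flush()  # push the wrapper's buffered text into raw
written = raw.getvalue()

already_text = io.StringIO()
same_handle = ensure_text_handle(already_text)  # returned unchanged
```

The `isinstance` fallback is there because objects such as `io.BytesIO` have no `mode` attribute at all, so a mode check alone would miss them.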

colobas · 2018-02-22T18:47:13Z

I can confirm that this does work:

from io import TextIOWrapper

with adlfs_client.open("/dummy.csv", "wb") as f:
    buf = TextIOWrapper(f)
    dummy.to_csv(buf)

Thanks for the suggestion
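
One caveat about the wrapper's lifecycle, sketched with an in-memory `BytesIO` standing in for the remote file and the stdlib `csv.writer` standing in for `to_csv` (both are assumptions made to keep the example self-contained). `TextIOWrapper` buffers text internally, so it should be flushed before the underlying file is read or closed, and `detach()` keeps the wrapper from closing a file the caller still owns.

```python
import csv
import io

raw = io.BytesIO()  # stand-in for a file opened in binary mode
wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="")

writer = csv.writer(wrapper)
writer.writerow(["just", "a", "file"])
writer.writerow([1, 2, 3])

wrapper.flush()   # push buffered text down into raw
wrapper.detach()  # separate the wrapper so it cannot close raw later

contents = raw.getvalue()
still_open = not raw.closed
```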

jreback · 2018-02-23T01:56:05Z

I suppose we could add a mini-example like this to the docs (io.rst). It is a useful case I think.
@colobas would you do a PR?

TomAugspurger · 2018-02-23T13:22:49Z

Can anyone think of a reason not to layer a TextIOWrapper over any buffer opened in binary mode? I wonder why the stdlib CSV writer doesn't do this. On the one hand, the user has explicitly given a binary-mode buffer to a thing writing text, which is "wrong". But I think there's no ambiguity in layering a TextIOWrapper on top of the binary buffer. We'll be using the same default encoding as if the user had just passed a text-mode file.

colobas · 2018-02-23T14:49:11Z

@TomAugspurger doesn't the mode parameter in to_csv stop having a purpose in that case?

EDIT: I had typed read_csv instead of to_csv

TomAugspurger · 2018-02-23T14:55:37Z

I assume you meant `to_csv`. In that case the mode still matters for write vs. append vs. exclusive. It's just the text vs. binary part of `mode` that would be "ignored". Given that the current behavior is so unfriendly (raising a not great error message), is there a harm in ignoring the text vs. binary part?
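
To make the distinction concrete, here is a small sketch that separates the two parts of an `open()`-style mode string. `split_mode` is a hypothetical helper for illustration only and does not exist in pandas.

```python
def split_mode(mode):
    """Split an open()-style mode into (access, is_binary).

    Hypothetical helper for illustration; not part of pandas.
    """
    is_binary = "b" in mode
    # Drop the text/binary flags and keep write/append/exclusive (and "+").
    access = mode.replace("b", "").replace("t", "")
    return access, is_binary


examples = {m: split_mode(m) for m in ("w", "wb", "a", "ab", "x", "xb", "wt")}
```

Under this split, "wb" and "w" share the access part "w", so ignoring the binary flag would leave write vs. append vs. exclusive untouched.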

colobas · 2018-02-23T17:35:01Z

No, I think there would be no harm. I can make a PR during the weekend.

cbrnr · 2018-06-20T07:17:16Z

Appending to a file doesn't work for text mode either:

import pandas as pd


with open("test.csv", "w") as f:
    f.write("A,B,C\n")


df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [5, 6, 7, 8],
                   "C": [9, 10, 11, 12]})

with open("test.csv", "a") as f:
    df.to_csv(f, header=False, index=False)

Even though f is open in append mode, the file gets overwritten by to_csv.

However, appending does work when a file name is passed instead of a file handle:

df.to_csv("test.csv", header=False, index=False, mode="a")

I think this issue came up only recently, because I think this worked with a previous pandas version (although I didn't check).
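
The path-based variant, written out as a self-contained sketch. The temporary directory is an addition so the example cleans up after itself; the data and the `to_csv` arguments are the ones from the snippets above.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [5, 6, 7, 8],
                   "C": [9, 10, 11, 12]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")

    # Write the header by hand, as in the report above.
    with open(path, "w") as f:
        f.write("A,B,C\n")

    # Passing a path lets to_csv open the file itself in append mode.
    df.to_csv(path, header=False, index=False, mode="a")

    # Read the result back to check that the header survived.
    with open(path) as f:
        lines = f.read().splitlines()
```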

jmerkin · 2018-06-27T18:08:10Z

I updated my conda install after probably 6-12 months and this issue crept up for me, so I can confirm that it arose recently.

Roughly, I was doing the following (essentially as cbrnr describes):

with open(filename, 'w') as fout:
    for input in inputs:
        result = some_function(input)
        result.to_csv(fout)

Looking through the installed versions in anaconda2/pkgs/ I see pandas-0.20.1 and pandas-0.23.1. Previously, the code would produce a single file with all the results produced. Now, the resulting file only contains the final results from the last iteration. The above code would work fine with 0.20, but produces the error as cbrnr describes now.

mariusvniekerk · 2018-07-02T18:33:16Z

Setting mode='a' does clear it in the second example. This seems to have changed between 0.22.0 and 0.23.1

Huite · 2018-07-03T17:06:37Z

I've downgraded to 0.23.0 to check, and 0.23.0 works as expected. It appears to be specific to 0.23.1.

yrhooke · 2018-07-06T20:21:07Z

Testing on current dev version (pandas: 0.24.0.dev0+232.g04caa569e) on my machine:

append works fine.

Wrapping the file with TextIOWrapper inside the with open() block as suggested above doesn't raise an error, but doesn't write to file either.

Binary write mode appears to be an issue with the Python 3 csv module (see @chris-b1's comment).

Is there a particular fix that needs to happen? I'd be happy to work on it.

harshit-ag · 2019-04-06T08:15:31Z

Can I work on this issue?

harshit-ag · 2019-04-07T08:09:03Z

import io
import pandas as pd

# Serialize into an in-memory bytes buffer first.
# Note: to_excel produces Excel-format bytes, not CSV text.
towrite = io.BytesIO()
df = pd.read_csv("example.csv")

df.to_excel(towrite)
towrite.seek(0)

# The with block closes the file, so no explicit f.close() is needed.
with open("someother.csv", "wb") as f:
    f.write(towrite.getvalue())

This could be a possible solution: check the mode and convert the output to a bytes-like object. I tried to make the changes but could not figure out the flow of the code. @TomAugspurger, can you help me with that?

Enteee · 2020-05-13T19:40:01Z

Hi folks, I wrote an article on my blog on how to Support Binary File Objects with pandas.DataFrame.to_csv. At the end of the article I added a monkey patch I think can also be used as a work around for this problem. Hope this helps until this is resolved in pandas.

remram44 · 2020-06-08T22:41:52Z

I'm very confused about this, because passing an io.BytesIO to pandas will not work (so whatever detection you do isn't very good) but passing a simple file-like object with a write() method will make to_csv() write bytes to it for some reason. Why to_csv() puts bytes into anything is beyond me.

jreback added Docs IO CSV read_csv, to_csv good first issue labels Feb 23, 2018

jreback added this to the Next Major Release milestone Feb 23, 2018

kchawla-pi mentioned this issue Jan 31, 2019

[MRG+1] Fix Nistats test to run on Windows (using Appveyor CI) nilearn/nistats#306

Merged

Kim2212 mentioned this issue May 30, 2019

Serialization csv file with pandas.DataFrame.to_csv and luigi.format.Gzip (python 3) spotify/luigi#2719

Closed

simonjayhawkins mentioned this issue Oct 24, 2019

[WIP] fix --check-untyped-defs for MyPy #28339

Closed

twoertwein mentioned this issue Jul 8, 2020

support binary file handles in to_csv #35129

Merged

jreback removed this from the Contributions Welcome milestone Aug 3, 2020

jreback added this to the 1.2 milestone Aug 3, 2020

jreback closed this as completed in #35129 Aug 7, 2020

Hedingber mentioned this issue Feb 24, 2021

[Artifacts] Fix data frame artifacts failure to write as csv mlrun/mlrun#767

Merged
