Issue with compression in to_csv method #21241

Closed
nvm1 opened this issue May 29, 2018 · 7 comments · Fixed by #21300
Labels
IO CSV read_csv, to_csv
Milestone

Comments

@nvm1

nvm1 commented May 29, 2018

Problem description

Hi there,

After upgrading to the latest version of pandas, I have an issue with code that worked fine on the previous version (0.22.0):

            df.to_csv(
                path_or_buf=csv_path,
                encoding='utf8',
                compression='gz',
                quoting=1,
                sep='\t',
                index=False)

With pandas 0.23.0 I get:

Traceback (most recent call last):
  File "C:_script.py", line 74, in <module>
    index=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1745, in to_csv
    formatter.save()
  File "C:\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 158, in save
    data = f.read()
  File "C:\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 298: character maps to <undefined>

If I comment out compression='gz', the code works fine.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented May 29, 2018

you would have to show a reproducible example

@ghost

ghost commented May 31, 2018

My apologies if I'm missing something, but shouldn't the compression value for GNU zip be 'gzip' rather than 'gz'? https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.to_csv.html

@nvm1
Author

nvm1 commented May 31, 2018

Thank you for the responses.

Well, I can't reproduce the issue with pandas 0.23.0 on my home machine, but it appears on two of my Windows servers at work. I will try a clean install of Anaconda on one of them and see what happens.

Yeah, @CamWhitelaw, you're absolutely right. I actually edited my code before pasting it here; the original version of my code uses gzip.
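
For reference, here is a minimal sketch of the call from the report with compression='gzip', followed by a read-back to check the round trip. The sample data and the temporary file path are assumptions made for illustration; they are not taken from the original script. It assumes a pandas version that is not affected by this issue.

```python
import csv
import os
import tempfile

import pandas as pd

# Hypothetical sample data with non-ASCII text (not from the original report).
df = pd.DataFrame({"name": ["Zoë", "Привет"], "city": ["Köln", "Москва"]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "out.tsv.gz")  # hypothetical output path

    # Same arguments as in the report, with the documented 'gzip' value.
    df.to_csv(
        csv_path,
        encoding="utf-8",
        compression="gzip",
        quoting=csv.QUOTE_ALL,  # csv.QUOTE_ALL == 1, as in quoting=1
        sep="\t",
        index=False,
    )

    # Read the file back with the same separator, encoding and compression.
    round_tripped = pd.read_csv(
        csv_path, sep="\t", encoding="utf-8", compression="gzip"
    )
```

Passing the encoding explicitly on both the write and the read keeps the result independent of the platform's default encoding.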

@WillAyd WillAyd added the Needs Info label (Clarification about behavior needed to assess issue) May 31, 2018
@minggli
Contributor

minggli commented May 31, 2018

Please let us know if reinstalling Anaconda solves the issue.

@minggli
Contributor

minggli commented Jun 2, 2018

Did some digging. It might be the different default encoding that open() falls back to when none is specified: on Windows it tries to decode with CP1252 while your file is UTF-8 encoded.
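
The mechanism described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pandas code path itself: the sample character, the file name and the helper read_text are hypothetical, and cp1252 is forced explicitly so the failure shows up on any platform.

```python
import locale
import os
import tempfile


def read_text(path, encoding):
    """Read a whole file as text, decoding with the given encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# The encoding open() falls back to in text mode when none is given.
# On the Windows servers in this report that would be cp1252.
default_encoding = locale.getpreferredencoding(False)

# U+0410 (Cyrillic capital A) is b'\xd0\x90' in UTF-8, and byte 0x90
# has no mapping in Python's cp1252 codec.
text = "\u0410\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")  # hypothetical file name
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    # Reading with the encoding used for writing round-trips cleanly.
    utf8_result = read_text(path, "utf-8")

    # Simulate the Windows default by forcing cp1252.
    try:
        read_text(path, "cp1252")
        cp1252_error = None
    except UnicodeDecodeError as exc:
        cp1252_error = exc
```

The captured exception has the same shape as the one in the traceback: the 'charmap' codec cannot decode byte 0x90.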

@jreback jreback added this to the 0.23.1 milestone Jun 3, 2018
@jreback jreback added the IO CSV label (read_csv, to_csv) and removed the Needs Info label (Clarification about behavior needed to assess issue) Jun 3, 2018
@nvm1
Author

nvm1 commented Jun 5, 2018

Hi there,

Sorry for being slow. I did a clean installation of Anaconda and I still get the same error.

It works fine on my home machine, but fails on both servers.

@WillAyd
Member

WillAyd commented Jun 5, 2018

@nvm1 this should be fixed on master via the referenced PR
