Issue with compression in to_csv method #21241

Closed
nvm1 opened this Issue May 29, 2018 · 7 comments

5 participants
@nvm1

nvm1 commented May 29, 2018

Problem description

Hi there,

After upgrading to the latest version of pandas, I have an issue with code that worked fine on the previous version (0.22.0):

    df.to_csv(
        path_or_buf=csv_path,
        encoding='utf8',
        compression='gz',
        quoting=1,
        sep='\t',
        index=False)

With pandas 0.23.0 I get:

Traceback (most recent call last):
  File "C:_script.py", line 74, in <module>
    index=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1745, in to_csv
    formatter.save()
  File "C:\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 158, in save
    data = f.read()
  File "C:\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 298: character maps to <undefined>

If I comment out compression='gz', the code works fine.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback

Contributor

jreback commented May 29, 2018

you would have to show a reproducible example

@CamWhitelaw

CamWhitelaw commented May 31, 2018

My apologies if I'm missing something, but shouldn't the compression value for GNU zip be 'gzip' rather than 'gz'? https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.to_csv.html

@nvm1

nvm1 commented May 31, 2018

Thank you for the responses.

Well, I can't reproduce the issue with pandas 0.23.0 on my home machine, but it appears on two of my Windows servers at work. I will try a clean install of Anaconda on one of them and see what happens.

Yeah, @CamWhitelaw, you're absolutely right. I actually changed my code before pasting it here; the original version of my code uses gzip.

@WillAyd WillAyd added the Needs Info label May 31, 2018

@minggli

Contributor

minggli commented May 31, 2018

Please let us know whether reinstalling Anaconda solves the issue.

@minggli

Contributor

minggli commented Jun 2, 2018

I did some digging. The cause might be the default encoding that open() falls back to when none is specified: on Windows it tries to decode with CP1252 even though your file is UTF-8 encoded.
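
To illustrate the suspected mismatch, here is a minimal sketch that does not use pandas at all. The sample character is my own choice (an assumption, not taken from the reporter's data): its UTF-8 encoding happens to contain the byte 0x90, which CP1252 leaves unmapped. The encodings are passed explicitly so the behaviour is the same on any platform.

```python
import os
import tempfile

# Cyrillic capital letter A; its UTF-8 encoding is the two bytes D0 90.
text = "\u0410"
utf8_bytes = text.encode("utf-8")

# Decoding those bytes as CP1252 fails because 0x90 has no mapping there.
try:
    utf8_bytes.decode("cp1252")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(decode_error)

# The same data survives a file round trip when both open() calls
# name the encoding instead of relying on the platform default.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with open(path, encoding="utf-8") as handle:
        round_trip = handle.read()
```

If a file written as UTF-8 is reopened without an encoding argument on a machine whose default is CP1252, the read would be expected to fail the same way as the explicit decode above.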

@minggli minggli referenced this issue Jun 3, 2018

Merged

BUG: encoding error in to_csv compression #21300

4 of 4 tasks complete

@jreback jreback added this to the 0.23.1 milestone Jun 3, 2018

@jreback jreback added IO CSV and removed Needs Info labels Jun 3, 2018

@nvm1

nvm1 commented Jun 5, 2018

Hi there,

Sorry for being slow. I did a clean installation of Anaconda and I still get the same error.

It works fine on my home machine, but fails on both servers.

@WillAyd

Member

WillAyd commented Jun 5, 2018

@nvm1 this should be fixed on master via the referenced PR
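
For anyone who cannot upgrade yet, one possible workaround is to do the compression step outside to_csv and name the encoding explicitly. This is only a sketch: write_gzip_csv is a hypothetical helper, not part of pandas, and csv_text is a hard-coded stand-in for the string that to_csv returns when no path is given.

```python
import gzip
import os
import tempfile


def write_gzip_csv(csv_text, path, encoding="utf-8"):
    """Hypothetical helper: gzip-compress CSV text with an explicit encoding."""
    # newline="" writes the line endings in csv_text exactly as they are.
    with gzip.open(path, "wt", encoding=encoding, newline="") as handle:
        handle.write(csv_text)


# Stand-in for the string df.to_csv(sep="\t", quoting=1, index=False)
# would return; the values are made up for illustration.
csv_text = '"name"\t"city"\n"\u0410\u043d\u043d\u0430"\t"Z\u00fcrich"\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = os.path.join(tmp_dir, "out.csv.gz")
    write_gzip_csv(csv_text, gz_path)
    # Read it back with the same explicit encoding to check the round trip.
    with gzip.open(gz_path, "rt", encoding="utf-8", newline="") as handle:
        restored = handle.read()
```

Because the encoding is given on both the write and the read, the platform default (CP1252 on the affected servers) never comes into play.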
