read_csv fails if file name cannot be encoded with utf-8 #7345

Closed
hereidk opened this issue Jun 4, 2014 · 3 comments
Labels
IO CSV read_csv, to_csv Unicode Unicode strings

Comments

hereidk commented Jun 4, 2014

I am attempting to read CSV files whose names contain accented characters. Below is the error raised for a file named Anzoátegui.csv:

locs = pandas.read_csv(os.path.join(os.getcwd(),os.listdir()[1]),sep=',',skipinitialspace=True)


Traceback (most recent call last):
  File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_comm.py", line 891, in doIt
    result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
  File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_vars.py", line 486, in evaluateExpression
    result = eval(compiled, updated_globals, frame.f_locals)
  File "<string>", line 1, in <module>
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 404, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 486, in __init__
    self._make_engine(self.engine)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 594, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 952, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3040)
  File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5387)
OSError: File b'C:\\Python code\\DisaggregateExposure\\src\\root\\nested\\Anzo\xc3\xa1tegui.csv' does not exist

os.listdir()[1] correctly produces Anzoátegui.csv
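
For context, the bytes in the error message are the UTF-8 encoding of the file name. The sketch below shows why a lookup with those bytes could fail on Windows. It assumes the system ANSI code page is cp1252, which the report does not state, and the variable names are only illustrative.

```python
# The name from the report, written with an escape so the source stays ASCII.
name = "Anzo\u00e1tegui.csv"

# UTF-8 encodes the accented letter as two bytes, matching the traceback.
utf8_name = name.encode("utf-8")

# Assumption: the Windows ANSI code page is cp1252, where the letter is one byte.
ansi_name = name.encode("cp1252")

# Reading the UTF-8 bytes back as cp1252 yields a different name,
# so a byte-based lookup would search for a file that is not on disk.
misread_name = utf8_name.decode("cp1252")

print(utf8_name)
print(ansi_name)
print(misread_name != name)
```

If that assumption holds, the path handed to the C parser names a file that does not exist, which would be consistent with the OSError above.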

The encoding argument in read_csv handles accents inside the file, but is there a way to force an encoding for the file name other than the Python default of utf-8?

Here's what I've got installed:

INSTALLED VERSIONS
------------------
Python: 3.3.3.final.0
OS: Windows
Release: 7
Processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.13.0
Cython: Not installed
Numpy: 1.7.2
Scipy: 0.13.2
statsmodels: 0.5.0
    patsy: 0.2.1
scikits.timeseries: Not installed
dateutil: 2.2
pytz: 2013.9
bottleneck: Not installed
PyTables: Not Installed
    numexpr: Not Installed
matplotlib: 1.3.1
openpyxl: Not installed
xlrd: Not installed
xlwt: Not installed
xlsxwriter: Not installed
sqlalchemy: Not installed
lxml: Not installed
bs4: Not installed
html5lib: Not installed
bigquery: Not installed
apiclient: Not installed

jreback commented Jun 4, 2014

This is fixed in 0.14.0; give it a try.

jreback closed this as completed Jun 4, 2014

jreback commented Jun 4, 2014

The fix is in commit 79df67a.

If that doesn't work, please reopen and comment.
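
For anyone who cannot upgrade right away, one possible workaround is to open the file in Python and hand the open file object to read_csv, which accepts a buffer as well as a path. The sketch below is not taken from the fix commit. `read_with_handle` and `demo` are hypothetical helper names, and the demo uses the standard-library csv module as a stand-in reader so it runs without pandas.

```python
import csv
import os
import tempfile


def read_with_handle(path, reader, encoding="utf-8", **kwargs):
    """Open `path` in Python and pass the open file object to `reader`.

    `encoding` applies to the file contents, not to the file name.
    The reader must consume the file before this function returns,
    because the handle is closed on exit.
    """
    with open(path, "r", encoding=encoding, newline="") as handle:
        return reader(handle, **kwargs)


def demo():
    # Create a small CSV with an accented name in a temporary folder.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "Anzo\u00e1tegui.csv")
        with open(path, "w", encoding="utf-8", newline="") as handle:
            handle.write("city,count\nBarcelona,3\n")
        # Stand-in reader: parse all rows with the csv module.
        return read_with_handle(path, lambda f: list(csv.reader(f)))


print(demo())
```

With pandas installed, the call from the report would become something like `read_with_handle(path, pandas.read_csv, sep=',', skipinitialspace=True)`. Because the path is opened by Python as a str, the parser never has to encode the file name itself.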

hereidk commented Jun 4, 2014

That would explain why I was banging my head. Will give this a try shortly, thanks!
