Problem reading one-column files with guessed separator #13839

Mark531 · 2016-07-29T16:19:20Z

Code Sample, a copy-pastable example if possible

Create a simple csv file:
----- C:\myfile.csv
name
1
2
3
4
----- C:\myfile.csv
And run:
import pandas as pd
pd.read_csv(r"C:\myfile.csv", encoding="ISO-8859-1", sep=None, engine="python")
This triggers an error:
ValueError: Expected 2 fields in line 2, saw 1
If a specific separator is entered, it works.

Expected Output

The file should be imported correctly.

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr_FR

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0


gfyoung · 2016-07-31T17:00:03Z

When you don't specify a delimiter, the Python engine will try to "sniff" it out from the first line using the Sniffer class from Python's built-in csv library. The reason I think you're getting this error is because the sniffer cannot tell what delimiter you want:

>>> import csv
>>> line = 'name\n'  # that is your first line
>>> print(csv.Sniffer().sniff(line).delimiter)
n

I would not consider this a bug: your data does not provide enough information to infer a delimiter of any kind, and the problem does not originate in pandas itself. As you found, specifying the delimiter gets around this issue.
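A minimal sketch of the sniffing behaviour described above, using only the standard library. It assumes, as stated in the comment, that only the first line reaches the sniffer. The candidate string `",;\t|"` is an illustrative choice and not something pandas passes.

```python
import csv

first_line = "name\n"  # the header line of the one-column file

# With no candidate list, every character on the line is a possible
# delimiter, so the sniffer settles on one of the letters in "name".
unrestricted = csv.Sniffer().sniff(first_line).delimiter
print(repr(unrestricted))

# Restricting the candidates to common separators makes the sniffer
# raise csv.Error instead of picking a letter.
try:
    csv.Sniffer().sniff(first_line, delimiters=",;\t|")
    restricted_error = None
except csv.Error as exc:
    restricted_error = str(exc)
print(restricted_error)
```

The second call shows that the standard library can report "no delimiter found" when it is given a list of plausible separators, which is one way to tell a one-column file apart from a delimited one.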

Mark531 · 2016-08-01T14:34:21Z

I don't agree, for me this is clearly a bug since the CSV file is perfectly formed. The option "sep=None" should work in this configuration, where no delimiter is required. Specifying a delimiter will not get around the issue since my code is generic and must be able to parse CSV files with any delimiter.

gfyoung · 2016-08-01T14:38:33Z

Perfectly formed is moot. Python's parser has no way of knowing that you have no delimiters in your file, and it has to go with what it is given.

In addition, as I mentioned previously, the issue lies in Python's sniffing, which is out of the control of pandas.

TomAugspurger · 2016-08-01T14:47:52Z

Agreed with @gfyoung, since we're just using Python's csv module for sniffing here. I don't think we should try to improve on it.

@Mark531 I would look into whether this is a bug in csv.Sniffer (i.e. whether it should be able to infer that there isn't actually a delimiter; it might just not be supported), or come up with your own sniffer based on the data you'll be seeing.
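One possible shape for the "own sniffer" suggested above, as a sketch rather than a recommended implementation. The helper names `guess_delimiter` and `read_rows` and the default candidate set are hypothetical. The only library call relied on is `csv.Sniffer().sniff(sample, delimiters=...)`, which raises `csv.Error` when no candidate fits.

```python
import csv
from typing import List, Optional


def guess_delimiter(sample: str, candidates: str = ",;\t|") -> Optional[str]:
    """Return the sniffed delimiter, or None when no candidate fits."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # None of the candidate separators appears consistently.
        return None


def read_rows(text: str, candidates: str = ",;\t|") -> List[List[str]]:
    """Parse text into rows, treating 'no delimiter' as a single column."""
    delimiter = guess_delimiter(text, candidates)
    if delimiter is None:
        # Single-column fallback: each non-empty line is one field.
        return [[line] for line in text.splitlines() if line]
    return list(csv.reader(text.splitlines(), delimiter=delimiter))


rows = read_rows("name\n1\n2\n3\n4\n")
print(rows)
```

With pandas, the same idea would be to call `guess_delimiter` on a sample of the file first and, when it returns `None`, pass an explicit `sep` to `read_csv`, since the original report notes that an explicit separator works. A fixed candidate list will miss files that use an unusual separator, so the set needs to match the data you expect.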

Mark531 · 2016-08-01T14:48:17Z

No, a correctly-formed CSV file is governed by rules that are met by the file I showed. Any correct tokenizer is able to handle a single token, that's a very basic case. Now, if you rely on another library to tokenize items, I understand that you can't correct this behaviour.

gfyoung · 2016-08-01T14:51:29Z

@Mark531: I think you're not fully understanding our point here. Your point about correctness is moot, as you have not answered the question of how the Python parser is supposed to know that your data has no delimiters.

Mark531 · 2016-08-01T15:04:02Z

The correctness of a CSV file is not moot; it's specified here: https://tools.ietf.org/html/rfc4180. So, to sum up, I try to parse a correct CSV file with a generic call to read_csv (with the separator noted as "guessed") and it fails. So, it's a bug. The option "sep=None" should logically handle the case "no delimiter". If it does not, then it should be corrected.

gfyoung · 2016-08-01T15:08:42Z

It is moot because that wasn't related to what I was saying. In addition, as we have made it clear, the "bug" you are insisting upon is not with pandas.

Mark531 · 2016-08-01T15:09:12Z

I understood, I'll forward it to Python dev then.

TomAugspurger closed this as completed Aug 1, 2016

TomAugspurger added Usage Question IO CSV read_csv, to_csv labels Aug 1, 2016

MrSiuol mentioned this issue May 2, 2023

BUG: #53034 (Closed)

louisgeisler mentioned this issue May 2, 2023

BUG: read_csv, sep=None, wrong separator guessing in case of one column csv #53035 (Open)
