Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The file in the gist at pgdr#e6c6ad236666909426cf841fe2704050 is a CSV file, mwe.csv, with 3 lines and 195 characters, of which one is a carriage return.
The CSV file is "broken", so it is fine that pandas does not open it correctly.
However, when parsed with read_csv, pandas returns a dataframe with 131071 rows!
In addition, if you delete any one character from the file, the result is
Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
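Since the behavior changes when a single character is removed, it may help to confirm that a downloaded copy of the file still matches the description before parsing it. The following is only a rough sketch, not the original reproducer: the helper names `describe_file` and `try_read_csv` are made up for illustration, the file is assumed to be saved locally as UTF-8, and "lines" is assumed to mean the number of `\n` characters (as `wc -l` would count them).

```python
from pathlib import Path


def describe_file(path):
    """Return (newline_count, char_count, carriage_return_count) for a file.

    Assumption: the file decodes as UTF-8 and "lines" means "\n" characters.
    """
    # Read bytes and decode explicitly so no newline translation happens.
    text = Path(path).read_bytes().decode("utf-8")
    return text.count("\n"), len(text), text.count("\r")


def try_read_csv(path):
    """Parse the file with pandas and summarize the outcome as a string."""
    # Imported here so describe_file can be used without pandas installed.
    import pandas as pd

    try:
        frame = pd.read_csv(path)
    except pd.errors.ParserError as exc:
        # The tokenizer error quoted in this report is raised as a ParserError.
        return f"ParserError: {exc}"
    return f"parsed {len(frame)} rows"
```

For the file described in this report, `describe_file("mwe.csv")` would be expected to return `(3, 195, 1)` if the assumptions above hold, and `try_read_csv("mwe.csv")` then shows whether the local pandas build returns a frame or raises the tokenizer error.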
Expected Behavior
pandas should not overflow its buffer when parsing this file.
Installed Versions
INSTALLED VERSIONS
commit : f06c96a
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-137-generic
Version : #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_DK.UTF-8
LOCALE : en_DK.UTF-8
pandas : 2.0.0.dev0+1401.gf06c96a93f
numpy : 1.24.1
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 44.0.0
pip : 20.0.2