Old read_csv() & EOF character issue back #16559

Closed
gaptekar opened this Issue May 31, 2017 · 6 comments

gaptekar commented May 31, 2017

I recently updated from pandas 0.19.2 to 0.2
I am experiencing the exact same issue that this person reported back in 2013:
#5500
"Error tokenizing data. C error: EOF inside string starting at line 140"

Reverting to 0.19.2 fixed the issue. Can you run the unit test that was created for this problem?
gfyoung@8c4cf85

Contributor

jreback commented May 31, 2017

Can you show a reproducible example?

gorkemozkaya commented Jun 7, 2017

I'm having the same problem. Python 3, pandas 0.20.2

reproducible example:

import pandas as pd

# Write a two-line CSV whose quoted field contains the byte 0x1A (Ctrl-Z).
with open('test.csv', 'wb') as fout:
    fout.write(b'c1,c2\r\n"test \x1a    test", test\r\n')

pd.read_csv('test.csv')

# ParserError: Error tokenizing data. C error: EOF inside string starting at line 1
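
For reference, the rows that a quote-aware parser is expected to produce from those same bytes can be checked without pandas. The sketch below uses only Python's standard csv module; RAW and parse_rows are names introduced here for illustration and are not part of the original report. It says nothing about how pandas' C tokenizer works; it only shows what a correct parse of the quoted field looks like.

```python
import csv
import io

# The exact bytes written to test.csv in the example above.
RAW = b'c1,c2\r\n"test \x1a    test", test\r\n'


def parse_rows(raw):
    """Parse CSV bytes with the stdlib csv module and return a list of rows."""
    # newline='' leaves line endings untouched so csv.reader handles them itself.
    text = io.StringIO(raw.decode('ascii'), newline='')
    return list(csv.reader(text))


rows = parse_rows(RAW)
print(len(rows))      # header row plus one data row
print(repr(rows[1]))  # the 0x1A byte stays inside the first quoted field
```

The 0x1A byte is ordinary field content here because it sits between the quotes, which is the behavior the report expects from read_csv as well.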

Member

jorisvandenbossche commented Jun 8, 2017

Is this a Windows issue? (I cannot reproduce your example above on Linux.)

gorkemozkaya commented Jun 8, 2017

Yes, it happened on 64-bit Windows.

Contributor

jreback commented Jun 9, 2017

@gfyoung can you verify / see if you can fix?

@jreback jreback added this to the Next Major Release milestone Jun 9, 2017

gfyoung added a commit to gfyoung/pandas that referenced this issue Jun 11, 2017

BUG: Revert gh-16039
gh-16039 created a bug in which files containing
byte-like data could break, as EOF characters mid-field
(despite being quoted) would cause premature line breaks.

Given that this PR was a performance patch, this
commit can be safely reverted.

Closes gh-16559.

Member

gfyoung commented Jun 11, 2017

@jreback : Confirmed! git bisection reveals that #16039 is the culprit. PR coming soon.

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.20.3, Next Major Release Jun 11, 2017

@jreback jreback closed this in #16663 Jun 11, 2017

jreback added a commit that referenced this issue Jun 11, 2017

BUG: Revert gh-16039 (#16663)
gh-16039 created a bug in which files containing
byte-like data could break, as EOF characters mid-field
(despite being quoted) would cause premature line breaks.

Given that this PR was a performance patch, this
commit can be safely reverted.

Closes gh-16559.
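
Until a release containing the revert is installed, it may help to know which input files contain the byte from the example in this thread. The following is a minimal sketch using only the standard library; find_sub_bytes is a hypothetical helper introduced here, and it assumes 0x1A is the only byte of concern, as in the reported example.

```python
SUB = b"\x1a"  # ASCII 0x1A (Ctrl-Z), the byte embedded in the reported example


def find_sub_bytes(path, chunk_size=1 << 16):
    """Return the byte offsets of every 0x1A byte in the file at `path`."""
    offsets = []
    position = 0  # offset of the current chunk within the file
    with open(path, "rb") as fin:  # binary mode: no newline or EOF translation
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                idx = chunk.find(SUB, start)
                if idx == -1:
                    break
                offsets.append(position + idx)
                start = idx + 1
            position += len(chunk)
    return offsets
```

An empty list means the file contains no 0x1A byte. A non-empty list only flags the file as a candidate for this problem; it does not check whether the byte falls inside a quoted field.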

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jul 6, 2017

BUG: Revert gh-16039 (#16663)

(cherry picked from commit c550372)

TomAugspurger added a commit that referenced this issue Jul 7, 2017

BUG: Revert gh-16039 (#16663)

(cherry picked from commit c550372)