New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revert #16663, which was a revert of #16039 #16675

Merged
merged 7 commits into from Jun 13, 2017

Conversation

Projects
None yet
4 participants
@dgwynne
Contributor

dgwynne commented Jun 12, 2017

i believe the issue with #16039 was the loss of the semantics provided by the "b" part of the mode in fopen(). this is a nop on unix-like systems, but appears to be significant on windows.

the equivalent functionality on windows in open() is the O_BINARY flag. this adds that to the open() call, and provides compat for it on platforms without the flag by defining it to 0.

while here, fix an fd leak in new_file_source() if the buffer allocation fails.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
  • whatsnew entry

David Gwynne added some commits Jun 12, 2017

David Gwynne
always treat files as binary to cope with windows and EOF.
on windows, EOF can appear "in band" if the file is considered text. when
moving from fread() to read(), i lost the "b" part of the mode. at the
time i believed this was a nop, since unix doesnt treat files differently
based on that flag.

this adds O_BINARY to the flags to open to restore the behaviour lost
when taking "b" away from fopen. if a platform doesn't provide O_BINARY,
this defines it to 0 so it can still be used without effect later on in
the code.
@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Jun 12, 2017

Member

You need to re-add the test that I wrote to ensure that your change is correct.

Member

gfyoung commented Jun 12, 2017

You need to re-add the test that I wrote to ensure that your change is correct.

@dgwynne

This comment has been minimized.

Show comment
Hide comment
@dgwynne

dgwynne Jun 12, 2017

Contributor

i did think it was going too well...

Contributor

dgwynne commented Jun 12, 2017

i did think it was going too well...

@codecov

This comment has been minimized.

Show comment
Hide comment
@codecov

codecov bot Jun 12, 2017

Codecov Report

Merging #16675 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16675      +/-   ##
==========================================
- Coverage   90.92%   90.92%   -0.01%     
==========================================
  Files         161      161              
  Lines       49272    49271       -1     
==========================================
- Hits        44803    44802       -1     
  Misses       4469     4469
Flag Coverage Δ
#multiple 88.69% <100%> (-0.01%) ⬇️
#single 40.22% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.43% <100%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0281886...0a831b6. Read the comment docs.

codecov bot commented Jun 12, 2017

Codecov Report

Merging #16675 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16675      +/-   ##
==========================================
- Coverage   90.92%   90.92%   -0.01%     
==========================================
  Files         161      161              
  Lines       49272    49271       -1     
==========================================
- Hits        44803    44802       -1     
  Misses       4469     4469
Flag Coverage Δ
#multiple 88.69% <100%> (-0.01%) ⬇️
#single 40.22% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.43% <100%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0281886...0a831b6. Read the comment docs.

@codecov

This comment has been minimized.

Show comment
Hide comment
@codecov

codecov bot Jun 12, 2017

Codecov Report

Merging #16675 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16675      +/-   ##
==========================================
- Coverage   90.92%   90.92%   -0.01%     
==========================================
  Files         161      161              
  Lines       49272    49271       -1     
==========================================
- Hits        44803    44802       -1     
  Misses       4469     4469
Flag Coverage Δ
#multiple 88.69% <100%> (-0.01%) ⬇️
#single 40.22% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.43% <100%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0281886...0a831b6. Read the comment docs.

codecov bot commented Jun 12, 2017

Codecov Report

Merging #16675 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16675      +/-   ##
==========================================
- Coverage   90.92%   90.92%   -0.01%     
==========================================
  Files         161      161              
  Lines       49272    49271       -1     
==========================================
- Hits        44803    44802       -1     
  Misses       4469     4469
Flag Coverage Δ
#multiple 88.69% <100%> (-0.01%) ⬇️
#single 40.22% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/io/parsers.py 95.43% <100%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0281886...0a831b6. Read the comment docs.

@codecov

This comment has been minimized.

Show comment
Hide comment
@codecov

codecov bot Jun 12, 2017

Codecov Report

Merging #16675 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16675      +/-   ##
==========================================
- Coverage   90.92%   90.92%   -0.01%     
==========================================
  Files         161      161              
  Lines       49272    49276       +4     
==========================================
+ Hits        44803    44804       +1     
- Misses       4469     4472       +3
Flag Coverage Δ
#multiple 88.68% <ø> (-0.01%) ⬇️
#single 40.18% <ø> (-0.04%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 95.23% <0%> (-0.1%) ⬇️
pandas/plotting/_core.py 81.83% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0281886...28ec409. Read the comment docs.

codecov bot commented Jun 12, 2017

Codecov Report

Merging #16675 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16675      +/-   ##
==========================================
- Coverage   90.92%   90.92%   -0.01%     
==========================================
  Files         161      161              
  Lines       49272    49276       +4     
==========================================
+ Hits        44803    44804       +1     
- Misses       4469     4472       +3
Flag Coverage Δ
#multiple 88.68% <ø> (-0.01%) ⬇️
#single 40.18% <ø> (-0.04%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 95.23% <0%> (-0.1%) ⬇️
pandas/plotting/_core.py 81.83% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0281886...28ec409. Read the comment docs.

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Jun 12, 2017

Member

@dgwynne : LGTM, well done re-patching your patch 😄

Member

gfyoung commented Jun 12, 2017

@dgwynne : LGTM, well done re-patching your patch 😄

@dgwynne

This comment has been minimized.

Show comment
Hide comment
@dgwynne

dgwynne Jun 12, 2017

Contributor

@gfyoung : thanks. sorry for the breakage though. it never even occurred to me that there'd be encoding issues like this.

it's also encouraging to see how quickly such issues are handled. reverting the diff was obviously right at the time.

Contributor

dgwynne commented Jun 12, 2017

@gfyoung : thanks. sorry for the breakage though. it never even occurred to me that there'd be encoding issues like this.

it's also encouraging to see how quickly such issues are handled. reverting the diff was obviously right at the time.

@gfyoung

This comment has been minimized.

Show comment
Hide comment
@gfyoung

gfyoung Jun 12, 2017

Member

@dgwynne : No worries! We can't see everything that breaks if we don't test for it. ;)

Member

gfyoung commented Jun 12, 2017

@dgwynne : No worries! We can't see everything that breaks if we don't test for it. ;)

@jreback

approach lgtm. couple of comments.

@@ -54,7 +54,6 @@ Indexing
I/O
^^^
- Bug in ``pd.read_csv()`` in which files containing EOF characters mid-field could fail with the C engine on Windows (:issue:`16039`, :issue:`16559`)

This comment has been minimized.

@jreback

jreback Jun 12, 2017

Contributor

re-add this (and add this PR number on as well)

@jreback

jreback Jun 12, 2017

Contributor

re-add this (and add this PR number on as well)

This comment has been minimized.

@dgwynne

dgwynne Jun 12, 2017

Contributor

ok.

@dgwynne

dgwynne Jun 12, 2017

Contributor

ok.

Show outdated Hide outdated pandas/_libs/src/parser/io.c
Show outdated Hide outdated pandas/_libs/src/parser/io.c
Show outdated Hide outdated pandas/_libs/src/parser/io.c
Show outdated Hide outdated pandas/_libs/src/parser/io.c

@jreback jreback added the IO CSV label Jun 12, 2017

@jreback jreback added this to the 0.20.3 milestone Jun 12, 2017

David Gwynne added some commits Jun 13, 2017

@jreback jreback merged commit d298414 into pandas-dev:master Jun 13, 2017

5 checks passed

ci/circleci Your tests passed on CircleCI!
Details
codecov/patch Coverage not affected when comparing 0281886...28ec409
Details
codecov/project 90.92% (target 82%)
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Jun 13, 2017

Contributor

thanks @dgwynne !

Contributor

jreback commented Jun 13, 2017

thanks @dgwynne !

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Jul 6, 2017

revert #16663, which was a revert of #16039 (#16675)
* Revert "BUG: Revert gh-16039 (#16663)"

This reverts commit c550372.

* always treat files as binary to cope with windows and EOF.

on windows, EOF can appear "in band" if the file is considered text. when
moving from fread() to read(), i lost the "b" part of the mode. at the
time i believed this was a nop, since unix doesnt treat files differently
based on that flag.

this adds O_BINARY to the flags to open to restore the behaviour lost
when taking "b" away from fopen. if a platform doesn't provide O_BINARY,
this defines it to 0 so it can still be used without effect later on in
the code.

* dont leak the fd in new_file_source() if buffer allocation fails.

* reapply the test for EOF in the middle of a stream.

part of c550372

* pass rb to _get_handle on python 3, otherwise stick to r.

part of c550372

* replace goto with inline unwinding of state.

requested by @jreback in #16675 feedback.

* describe the fixes to the read_csv() backend and issue numbers.

requested by @jreback in feedback on #16675

(cherry picked from commit d298414)

TomAugspurger added a commit that referenced this pull request Jul 7, 2017

revert #16663, which was a revert of #16039 (#16675)
* Revert "BUG: Revert gh-16039 (#16663)"

This reverts commit c550372.

* always treat files as binary to cope with windows and EOF.

on windows, EOF can appear "in band" if the file is considered text. when
moving from fread() to read(), i lost the "b" part of the mode. at the
time i believed this was a nop, since unix doesnt treat files differently
based on that flag.

this adds O_BINARY to the flags to open to restore the behaviour lost
when taking "b" away from fopen. if a platform doesn't provide O_BINARY,
this defines it to 0 so it can still be used without effect later on in
the code.

* dont leak the fd in new_file_source() if buffer allocation fails.

* reapply the test for EOF in the middle of a stream.

part of c550372

* pass rb to _get_handle on python 3, otherwise stick to r.

part of c550372

* replace goto with inline unwinding of state.

requested by @jreback in #16675 feedback.

* describe the fixes to the read_csv() backend and issue numbers.

requested by @jreback in feedback on #16675

(cherry picked from commit d298414)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment