BUG: Delegate more of Excel parsing to CSV #23544

gfyoung · 2018-11-07T09:17:45Z

The idea is that we read the Excel file, get the data, and let the TextParser handle reading and parsing, or at least most of it. We shouldn't be doing a lot of work that is already defined in parsers.py.

In doing so, we identified several bugs:

index_col=None was not being respected
usecols behavior was inconsistent with that of read_csv for list of strings and callable inputs
usecols was not being validated as proper Excel column names when passed as a string.
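
To make the usecols points above concrete, here is a minimal sketch of the semantics being described: an Excel-style string such as "A:C,E" is validated and converted to positions, while a list of strings or a callable is matched against column names, as read_csv does. The helper names (excel_col_to_index, parse_excel_usecols, resolve_usecols) are hypothetical and written only for illustration; they are not the functions in pandas/io/excel.py.

```python
def excel_col_to_index(name):
    """Convert an Excel column name such as 'A' or 'AB' to a 0-based index."""
    name = name.strip().upper()
    # Reject anything that is not purely A-Z, e.g. 'A1' or an empty string.
    if not name or not all("A" <= ch <= "Z" for ch in name):
        raise ValueError("Invalid Excel column name: {!r}".format(name))
    index = 0
    for ch in name:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_excel_usecols(spec):
    """Expand a string like 'A:C,E' into a list of 0-based column indices."""
    cols = []
    for part in spec.split(","):
        if ":" in part:
            start, end = part.split(":")
            cols.extend(range(excel_col_to_index(start),
                              excel_col_to_index(end) + 1))
        else:
            cols.append(excel_col_to_index(part))
    return cols


def resolve_usecols(usecols, header):
    """Return the 0-based positions selected by usecols (illustrative only)."""
    if usecols is None:
        return list(range(len(header)))
    if isinstance(usecols, str):
        # A single string is treated as Excel column letters and ranges.
        return parse_excel_usecols(usecols)
    if callable(usecols):
        # A callable is evaluated against each column name.
        return [i for i, name in enumerate(header) if usecols(name)]
    if all(isinstance(c, str) for c in usecols):
        # A list of strings selects columns by name.
        return [i for i, name in enumerate(header) if name in usecols]
    # Otherwise assume a list of integer positions.
    return sorted(usecols)
```

With a header of ["id", "name", "score"], resolve_usecols(["name", "score"], header) selects by name, while resolve_usecols("A,C", header) selects by Excel column letter; a string such as "A1" raises ValueError instead of being silently accepted.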

Closes #18273.
Closes #20480.

With regard to the latter issue, I believe this PR is a cleaner implementation that integrates the new usecols functionality without having to introduce a new parameter.

pep8speaks · 2018-11-07T09:17:48Z

Hello @gfyoung! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/io/excel.py !
There are no PEP8 issues in the file pandas/tests/io/test_excel.py !

jreback · 2018-11-07T10:30:51Z

wow nice cleanup!

linting

check import format using isort
ERROR: /home/travis/build/pandas-dev/pandas/pandas/io/excel.py Imports are incorrectly sorted.

codecov · 2018-11-07T11:25:06Z

Codecov Report

Merging #23544 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #23544      +/-   ##
==========================================
- Coverage   92.24%   92.23%   -0.01%     
==========================================
  Files         161      161              
  Lines       51278    51278              
==========================================
- Hits        47299    47298       -1     
- Misses       3979     3980       +1

Flag	Coverage Δ
#multiple	`90.62% <ø> (-0.01%)`	⬇️
#single	`42.28% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/io/parsers.py	`95.55% <0%> (-0.07%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

gfyoung · 2018-11-07T21:23:05Z

@jreback : Fixed the import error, so ready for review.

WillAyd

Nice!

pandas/io/excel.py

jreback

lgtm. minor comments.

doc/source/io.rst

pandas/io/excel.py

pandas/tests/io/test_excel.py

gfyoung · 2018-11-08T21:09:47Z

@jreback : Addressed comments, and all is still green. PTAL.

doc/source/io.rst

pandas/io/excel.py

gfyoung · 2018-11-10T23:41:03Z

@jreback : Need to rebase this onto master because there are several places where tm.assert_raises_regex is used instead of pytest.raises.

gfyoung · 2018-11-11T01:36:24Z

@jreback : Comments addressed, PR rebased, and all is green. PTAL.

The idea is that we read the Excel file, get the data, and then let the TextParser handle the reading and parsing. We shouldn't be doing a lot of work that is already defined in parsers.py.

In doing so, we identified several bugs:

* index_col=None was not being respected
* usecols behavior was inconsistent with that of read_csv for list of strings and callable inputs
* usecols was not being validated as proper Excel column names when passed as a string.

Closes pandas-devgh-18273.
Closes pandas-devgh-20480.

* upstream/master:
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)
  DOC: Fixes to docstring to add validation to CI (pandas-dev#23560)
  DOC: Remove incorrect periods at the end of parameter types (pandas-dev#23600)
  MAINT: tm.assert_raises_regex --> pytest.raises (pandas-dev#23592)
  DOC: Updating Series.resample and DataFrame.resample docstrings (pandas-dev#23197)

…fixed

* upstream/master:
  DOC: Enhancing pivot / reshape docs (pandas-dev#21038)
  TST: Fix xfailing DataFrame arithmetic tests by transposing (pandas-dev#23620)
  BUILD: Simplifying contributor dependencies (pandas-dev#23522)
  BUG/REF: TimedeltaIndex.__new__ (pandas-dev#23539)
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)


gfyoung added Bug Enhancement API Design IO Excel read_excel, to_excel labels Nov 7, 2018

gfyoung force-pushed the excel-csv-standardize branch from 9be00cb to 56244c8 Compare November 7, 2018 09:20

gfyoung force-pushed the excel-csv-standardize branch from 56244c8 to bb8825f Compare November 7, 2018 10:35

gfyoung force-pushed the excel-csv-standardize branch from bb8825f to fc27613 Compare November 8, 2018 01:37

WillAyd reviewed Nov 8, 2018


pandas/io/excel.py

jreback requested changes Nov 8, 2018


jreback added this to the 0.24.0 milestone Nov 8, 2018

jreback mentioned this pull request Nov 8, 2018

BUG: read_excel return empty dataframe when using usecols #20480

Closed

4 tasks

gfyoung force-pushed the excel-csv-standardize branch 2 times, most recently from ddbc258 to 487a336 Compare November 8, 2018 18:44

jreback requested changes Nov 10, 2018


doc/source/io.rst

pandas/io/excel.py

gfyoung force-pushed the excel-csv-standardize branch 2 times, most recently from 113bc37 to 0e12260 Compare November 10, 2018 23:40

gfyoung force-pushed the excel-csv-standardize branch from 0e12260 to a36aa76 Compare November 11, 2018 10:39

jreback approved these changes Nov 11, 2018


jreback merged commit da23030 into pandas-dev:master Nov 11, 2018

gfyoung deleted the excel-csv-standardize branch November 11, 2018 22:36

gfyoung added a commit to forking-repos/pandas that referenced this pull request Nov 12, 2018

DEPR: Deprecate usecols as int in read_excel

a64e6d5

Follow-up to pandas-devgh-23544.

gfyoung added a commit to forking-repos/pandas that referenced this pull request Nov 12, 2018

DEPR: Deprecate usecols as int in read_excel

b8e98b6

Follow-up to pandas-devgh-23544.

gfyoung mentioned this pull request Nov 12, 2018

DEPR: Deprecate usecols as int in read_excel #23635

Merged

jreback pushed a commit that referenced this pull request Nov 12, 2018

DEPR: Deprecate usecols as int in read_excel (#23635)

0bc4580

Follow-up to gh-23544.

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

DEPR: Deprecate usecols as int in read_excel (pandas-dev#23635)

7fc3732

Follow-up to pandas-devgh-23544.

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

DEPR: Deprecate usecols as int in read_excel (pandas-dev#23635)

4e044e2

Follow-up to pandas-devgh-23544.

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DEPR: Deprecate usecols as int in read_excel (pandas-dev#23635)

7c8bf8d

Follow-up to pandas-devgh-23544.

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DEPR: Deprecate usecols as int in read_excel (pandas-dev#23635)

86f8d16

Follow-up to pandas-devgh-23544.

WillAyd mentioned this pull request Jun 4, 2019

Remove SharedItems from test_excel #26579

Merged
