ENH: inconsistent naming convention for read_excel column selection (#4988) #16488

abarber4gh · 2017-05-24T21:58:23Z

update API to use ‘usecols’ instead of ‘parse_cols’; ‘usecols’ is still functionally the same as ‘parse_cols’.
added test cases for ‘usecols’, and added assert_produces_warning(FutureWarning) to the other test cases that still use ‘parse_cols’.
refactored column parsing so it only occurs once per sheet.
updated What's New with the deprecated ‘parse_cols’ argument.
closes ENH: inconsistent naming convention for read_csv and read_excel column selection #4988
tests added / passed
passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
whatsnew entry
packers.* BENCHMARKS NOT SIGNIFICANTLY CHANGED.
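The rename described above can be sketched as a small decorator that accepts the old keyword, emits a FutureWarning, and forwards the value to the new keyword. This is a minimal illustration using only the standard library; `rename_kwarg` and `read_sheet` are hypothetical names, and the PR itself later switches to pandas' internal `deprecate_kwarg` helper from `_decorators` rather than anything like this.

```python
import functools
import warnings


def rename_kwarg(old_name, new_name):
    """Hypothetical decorator: accept ``old_name`` as a deprecated alias of ``new_name``."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                # Refuse ambiguous calls before warning about anything.
                if new_name in kwargs:
                    raise TypeError(
                        f"cannot pass both {old_name!r} and {new_name!r}")
                warnings.warn(
                    f"the {old_name!r} keyword is deprecated, "
                    f"use {new_name!r} instead",
                    FutureWarning,
                    stacklevel=2,  # attribute the warning to the caller's line
                )
                # Forward the old value under the new name.
                kwargs[new_name] = kwargs.pop(old_name)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@rename_kwarg("parse_cols", "usecols")
def read_sheet(path, usecols=None):
    """Hypothetical reader standing in for read_excel; echoes its column selection."""
    return usecols
```

Calling `read_sheet("book.xlsx", parse_cols=3)` returns the same selection as `read_sheet("book.xlsx", usecols=3)` but also emits a FutureWarning, which is the "functionally the same" behavior the description promises during the deprecation period.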

chris-b1 · 2017-05-24T22:47:08Z

FYI, just discussing a bug related to usecols #15316, might be easiest to work that change into your refactoring, or can be separate, cc @stanleyguan

jreback · 2017-05-24T22:49:35Z

doc/source/whatsnew/v0.21.0.txt

@@ -35,6 +35,7 @@ Other Enhancements
 - ``RangeIndex.append`` now returns a ``RangeIndex`` object when possible (:issue:`16212`)
 - :func:`to_pickle` has gained a protocol parameter (:issue:`16252`). By default, this parameter is set to `HIGHEST_PROTOCOL <https://docs.python.org/3/library/pickle.html#data-stream-format>`__
 - :func:`api.types.infer_dtype` now infers decimals. (:issue: `15690`)
+- :func:`read_excel` now allows a column character list (E.G. ['A', 'C', 'D']) with the ``usecols`` parameter (:issue:`4988`).


we already list this in deprecations, so this is unnecessary

jreback · 2017-05-24T22:49:59Z

doc/source/whatsnew/v0.21.0.txt

@@ -60,6 +61,7 @@ Other API Changes
 Deprecations
 ~~~~~~~~~~~~
 - :func:`read_excel()` has deprecated ``sheetname`` in favor of ``sheet_name`` for consistency with to_excel() (:issue:`10559`).
+- :func:`read_excel()` has deprecated ``parse_cols`` in favor of ``usecols`` for consistency with other read_ functions (:issue:`4988`).


pd.read_* functions

jreback · 2017-05-24T22:51:14Z

pandas/io/excel.py

    * If string then indicates comma separated list of column names and
      column ranges (e.g. "A:E" or "A,C,E:F")
+parse_cols : int or list, default None
+    .. deprecated:: 0.21.0


hmm, didn't know sphinx had this deprecated tag. @TomAugspurger @jorisvandenbossche should maybe change other deprecations to use this if it looks nice.

appears as:

parse_cols : int or list, default None
Deprecated since version 0.21.0: Use usecols instead

http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.read_excel.html Has an example from sheetname. Looks ok

@TomAugspurger if we like this (and lgtm), let's make an issue to update all DEPRECATED in codebase with this directive
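For reference, a numpydoc-style docstring using the directive might look like the sketch below; the rendered text quoted above ("Deprecated since version 0.21.0: Use usecols instead") is what Sphinx produces for a body like this. `read_sheet` is a hypothetical function, not pandas source, and only the `usecols` bullet quoted in the hunk is reproduced.

```python
def read_sheet(io, usecols=None, parse_cols=None):
    """
    Read columns from a worksheet (hypothetical example, not pandas source).

    Parameters
    ----------
    io : str
        Path to the workbook.
    usecols : int or list, default None
        * If string then indicates comma separated list of column names and
          column ranges (e.g. "A:E" or "A,C,E:F")
    parse_cols : int or list, default None
        .. deprecated:: 0.21.0
           Use `usecols` instead.
    """
    # Prefer the new keyword; use the deprecated alias only if usecols was not given.
    return parse_cols if usecols is None else usecols
```

The directive is indented under the parameter entry, so it renders as part of that parameter's description rather than as a separate section.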

jreback · 2017-05-24T22:52:25Z

pandas/tests/io/test_excel.py

+        dfref = self.get_csv_refdf('test1')
+        dfref = dfref.reindex(columns=['A', 'B', 'C'])
+        df1 = self.get_exceldf('test1', 'Sheet1', index_col=0, usecols=3)
+        df2 = self.get_exceldf('test1', 'Sheet2', skiprows=[1], index_col=0,


so you need to change ALL tests to use the new one (usecols), except for a single test to actually hit the deprecation.
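A sketch of the test layout this asks for: ordinary tests pass only `usecols` and treat any warning as a failure, while a single test keeps `parse_cols` to exercise the deprecation. pandas' own suite uses `tm.assert_produces_warning(FutureWarning)` for that second case; here the standard `warnings` module stands in, and `read_sheet` is a hypothetical reader, not `read_excel`.

```python
import warnings


def read_sheet(path, usecols=None, parse_cols=None):
    """Hypothetical reader: ``parse_cols`` is a deprecated alias of ``usecols``."""
    if parse_cols is not None:
        warnings.warn("parse_cols is deprecated, use usecols instead",
                      FutureWarning, stacklevel=2)
        usecols = parse_cols
    return usecols


def test_usecols_int():
    # Regular tests use only the new keyword and must not warn.
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # turn any warning into an exception
        assert read_sheet("test1.xlsx", usecols=3) == 3


def test_parse_cols_deprecated():
    # The one test that still passes the old keyword, to hit the deprecation.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        assert read_sheet("test1.xlsx", parse_cols=3) == 3
    assert any(issubclass(w.category, FutureWarning) for w in caught)
```

Keeping exactly one deprecation test means the old keyword stays covered until removal, while the rest of the suite does not emit warnings.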

jreback · 2017-05-24T22:52:47Z

pandas/tests/io/test_excel.py

+
+        tm.assert_frame_equal(df1, dfref, check_names=False)
+        tm.assert_frame_equal(df2, dfref, check_names=False)  # backward compat
+


are these new tests?

yes, new tests for the new functionality.

codecov · 2017-05-24T23:31:56Z

Codecov Report

❗ No coverage uploaded for pull request base (master@b0a51df).
The diff coverage is 38.88%.

@@            Coverage Diff            @@
##             master   #16488   +/-   ##
=========================================
  Coverage          ?    90.4%           
=========================================
  Files             ?      161           
  Lines             ?    51038           
  Branches          ?        0           
=========================================
  Hits              ?    46139           
  Misses            ?     4899           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`88.24% <38.88%> (?)`
#single	`40.16% <11.11%> (?)`

Impacted Files	Coverage Δ
pandas/io/excel.py	`62.29% <38.88%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b0a51df...5b3693d.

codecov · 2017-05-24T23:31:59Z

Codecov Report

Merging #16488 into master will decrease coverage by 0.36%.
The diff coverage is 45.45%.

@@            Coverage Diff             @@
##           master   #16488      +/-   ##
==========================================
- Coverage   90.79%   90.43%   -0.37%     
==========================================
  Files         161      161              
  Lines       51063    51046      -17     
==========================================
- Hits        46363    46162     -201     
- Misses       4700     4884     +184

Flag	Coverage Δ
#multiple	`88.27% <45.45%> (-0.37%)`	⬇️
#single	`40.16% <18.18%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/io/excel.py	`62.37% <45.45%> (-18.27%)`	⬇️
pandas/io/formats/excel.py	`74.24% <0%> (-22.41%)`	⬇️
pandas/conftest.py	`95.83% <0%> (-0.6%)`	⬇️
pandas/io/parsers.py	`95.33% <0%> (-0.33%)`	⬇️
pandas/util/testing.py	`80.79% <0%> (-0.2%)`	⬇️
pandas/core/series.py	`94.71% <0%> (-0.19%)`	⬇️
pandas/core/generic.py	`92.16% <0%> (-0.1%)`	⬇️
pandas/core/resample.py	`96.08% <0%> (-0.02%)`	⬇️
pandas/core/reshape/pivot.py	`95.08% <0%> (ø)`	⬆️
... and 4 more

Powered by Codecov. Last update b0d9ee0...03593a7.

jreback · 2017-05-25T21:04:48Z

pandas/io/excel.py

    * If string then indicates comma separated list of column names and
      column ranges (e.g. "A:E" or "A,C,E:F")
+parse_cols : int or list, default None
+    .. deprecated:: 0.21.0


@TomAugspurger if we like this (and lgtm), let's make an issue to update all DEPRECATED in codebase with this directive

jreback · 2017-05-25T21:05:37Z

pandas/io/excel.py

@@ -312,11 +327,6 @@ def _range2cols(areas):
            >>> _range2cols('A,C,Z:AB')
            [0, 2, 25, 26, 27]
            """
-            def _excel2num(x):
-                "Convert Excel column name like 'AB' to 0-based column index"
-                return reduce(lambda s, a: s * 26 + ord(a) - ord('A') + 1,


is there a reason you are changing this code? does this involve the deprecation somehow? if you are cleaning/fixing, pls do in another PR.

code changed to allow passing list of strings, similar to other read_* functions, and keep the read_excel "superpower" to pass a string that is parsed as mentioned in the original issue (#4988, point 3).

and that's fine, but needs to be in a separate PR from the deprecation change.

ok, opened #16510 for this functionality. will re-submit with only the argument change (parse_cols -> usecols).
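The docstring in the hunk above documents `_range2cols('A,C,Z:AB')` returning `[0, 2, 25, 26, 27]`, and the removed `_excel2num` helper converted an Excel column name to a 0-based index with a base-26 `reduce`. A standalone sketch of that conversion follows; the function names are hypothetical and the body is a reconstruction (the original lambda is cut off in the hunk), not the PR's code. `letters2cols` illustrates the list-of-letters form (`['A', 'C', 'D']`) moved to #16510.

```python
from functools import reduce


def excel2num(name):
    """Convert an Excel column name like 'AB' to a 0-based column index."""
    # Bijective base 26: A=1 ... Z=26, AA=27; subtract 1 at the end for 0-based.
    return reduce(lambda s, ch: s * 26 + ord(ch) - ord("A") + 1,
                  name.strip().upper(), 0) - 1


def range2cols(areas):
    """Convert a spec like 'A,C,Z:AB' to 0-based indices; ranges include both ends."""
    cols = []
    for part in areas.split(","):
        if ":" in part:
            start, stop = part.split(":")
            cols.extend(range(excel2num(start), excel2num(stop) + 1))
        else:
            cols.append(excel2num(part))
    return cols


def letters2cols(letters):
    """Hypothetical helper: map a list like ['A', 'C', 'D'] to [0, 2, 3]."""
    return [excel2num(letter) for letter in letters]
```

With these, `range2cols("A,C,Z:AB")` yields `[0, 2, 25, 26, 27]`, matching the doctest, and both the string form and the list form reduce to the same list of integer positions.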

jreback · 2017-05-25T21:06:33Z

pandas/tests/io/test_excel.py


-    def test_parse_cols_str(self):


leave the original tests structure (sure you can change the name to conform), but don't change the tests (in THIS PR).

tests are back to original but with changed function & kwarg names.

TomAugspurger · 2017-05-26T14:27:49Z

@abarber4gh can you rebase, now that the excel tests are running?

- removed usecols mention in Other Enhancements section; it remains in Deprecations.
- removed test_parse_* test methods in favor of test_usecols_* methods.
- changed parse_cols to usecols in test_read_one_empty_col_* instead of catching warning.

…n selection (#4988)

update API to use ‘usecols’ instead of ‘parse_cols’; still functionally the same as ‘parse_cols’. Added test cases for ‘usecols’ and added assert_produces_warning(FutureWarning) to other test cases that use ‘parse_cols’. Refactored column parsing so it only occurs once per sheet. Updated What's New with the deprecated parse_cols argument and other enhancements to usecols functionality. Updated documentation to show new usecols functionality in read_excel().

add `check_stacklevel=False` to `test_excel_oldindex_format()`
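Roughly, pandas' `tm.assert_produces_warning` can check that a `FutureWarning` is reported against the calling file rather than library internals, and `check_stacklevel=False` switches that check off. The mechanism it inspects is the `stacklevel` argument of `warnings.warn`, sketched below with hypothetical function names and only the standard library:

```python
import inspect
import warnings


def old_api():
    """Hypothetical deprecated library function."""
    # stacklevel=2 attributes the warning to the line that called old_api(),
    # not to this warn() call inside the "library".
    warnings.warn("old_api is deprecated", FutureWarning, stacklevel=2)


def caller():
    """Hypothetical user code; returns the recorded warning and the call-site line."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        call_line = inspect.currentframe().f_lineno + 1  # line of the next statement
        old_api()
    return caught[0], call_line


warning, call_line = caller()
print(warning.filename, warning.lineno, call_line)
```

If a deprecation shim adds extra frames between the user's call and `warnings.warn`, the recorded location can end up inside library code, which is the kind of mismatch a stack-level check reports.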

codecov · 2017-05-26T18:35:52Z

Codecov Report

Merging #16488 into master will increase coverage by 0.14%.
The diff coverage is 98.4%.

@@            Coverage Diff             @@
##           master   #16488      +/-   ##
==========================================
+ Coverage   90.79%   90.93%   +0.14%     
==========================================
  Files         161      161              
  Lines       51063    49267    -1796     
==========================================
- Hits        46363    44802    -1561     
+ Misses       4700     4465     -235

Flag	Coverage Δ
#multiple	`88.69% <96%> (+0.06%)`	⬆️
#single	`40.22% <27.2%> (+0.07%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/reshape/pivot.py	`95.08% <ø> (ø)`	⬆️
pandas/core/reshape/reshape.py	`99.28% <ø> (ø)`	⬆️
pandas/core/reshape/concat.py	`97.62% <ø> (ø)`	⬆️
pandas/core/internals.py	`93.43% <ø> (-0.01%)`	⬇️
pandas/core/indexes/period.py	`92.74% <ø> (ø)`	⬆️
pandas/tseries/offsets.py	`97.12% <ø> (-0.01%)`	⬇️
pandas/io/common.py	`69.91% <ø> (ø)`	⬆️
pandas/util/testing.py	`100% <ø> (+19.01%)`	⬆️
pandas/core/dtypes/cast.py	`86.89% <0%> (ø)`	⬆️
pandas/compat/pickle_compat.py	`69.51% <100%> (ø)`	⬆️
... and 34 more

Powered by Codecov. Last update b0d9ee0...a525222.

Replaces most uses of implicit global matplotlib state in test_datetimelike.py. This was potentially causing random failures where a plot expected to go on a new, blank figure was instead drawn on existing axes (that's the guess, at least).

…ches (#16460)

* gh-14671: check whether a string-valued usecols is a subset of names; if not, raise an error
* tests added for gh-14671; the expected behavior of using usecols and names together is unclear, so these tests are commented out
* Review comments
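The commit above (gh-14671) makes a string-valued `usecols` fail loudly when it names columns that are not present. A pure-Python sketch of such a check, assuming `names` is the list of header names; `validate_usecols_names` and its error message are hypothetical, not the parser's actual code.

```python
def validate_usecols_names(usecols, names):
    """Hypothetical check: every string in ``usecols`` must appear in ``names``."""
    known = set(names)
    # Keep the caller's order so the error lists missing columns as requested.
    missing = [col for col in usecols if col not in known]
    if missing:
        raise ValueError(f"usecols contains names not found in the header: {missing}")
    return list(usecols)
```

Raising instead of silently dropping unknown names means a typo in `usecols` surfaces as an error rather than as a quietly narrower result.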

…o issue#4988

* 'issue#4988' of https://github.com/abarber4gh/pandas:
  add `deprecate_kwarg` from `_decorators`
  add `check_stacklevel=False` to `test_excel_oldindex_format()`
  removed excess blank line.
  change parse_cols to usecols
  change tests keyword from parse_cols to usecol.
  no message
  ENH: inconsistent naming convention for read_csv and read_excel column selection (#4988)
  implement changes request in PR#16488
  - removed usecols mention in Other Enhancements section, remains in Deprecations.
  - removed test_parse_* test methods in favor of test_usecols_* methods.
  - changed parse_cols to usecols in test_read_one_empty_col_* instead of catching warning.
  rebased

jreback · 2017-07-19T10:36:09Z

can you rebase / update according to comments

jreback · 2017-09-10T14:49:24Z

closing as stale. pls ping to reopen if you want to continue.

jreback requested changes May 24, 2017


jreback added Deprecate Functionality to remove in pandas IO Excel read_excel, to_excel labels May 24, 2017

jsexauer mentioned this pull request May 24, 2017

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

jreback requested changes May 25, 2017


abarber4gh mentioned this pull request May 26, 2017

Implement usecols functionality for read_excel #16510

Closed

abarber4gh added 7 commits May 26, 2017 08:12

implement changes request in PR#16488

52f2c11

- removed usecols mention in Other Enhancements section; it remains in Deprecations.
- removed test_parse_* test methods in favor of test_usecols_* methods.
- changed parse_cols to usecols in test_read_one_empty_col_* instead of catching warning.

no message

a4341de

change tests keyword from parse_cols to usecol.

e985488

change parse_cols to usecols

d58669c

removed excess blank line.

058177b

add deprecate_kwarg from _decorators

03593a7

add `check_stacklevel=False` to `test_excel_oldindex_format()`

abarber4gh and others added 12 commits May 26, 2017 15:11

TST: ujson tests are not being run (#16499) (#16500)

6649157

closes #16499

DOC: Remove preference for pytest paradigm in assert_raises_regex (#1…

ef487d9

…6518)

TST: Specify HTML file encoding on PY3 (#16526)

e60dc4c

BUG: Fixed tput output on windows (#16496)

7efc4e8

BUG: Incorrect handling of rolling.cov with offset window (#16244)

4ca29f4

DOC: Update to docstring of DataFrame(dtype) (#14764) (#16487)

fbdae2d

* Adding some more documentation on dataframe with regards to dtype * Making example for creating dataframe from np matrix easier

DOC: correct docstring examples (#3439) (#16432)

d4f80b0

Fix unbound local with bad engine (#16511)

9b0ea41

return empty MultiIndex for symmetrical difference on equal MultiInde…

d31ffdb

…xes (#16486)

BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (#1…

03d44f3

…6317)

BUG: Bug in .resample() and .groupby() when aggregating on integers (#…

e437ad5

…16549) closes #16361

TomAugspurger and others added 26 commits June 4, 2017 05:39

PERF: vectorize _interp_limit (#16592)

473615e

* PERF: vectorize _interp_limit * CLN: remove old implementation * fixup! CLN: remove old implementation

DOC: Fix typo in merge doc for validate kwarg (#16595)

ce3b0c3

BUG: convert numpy strings in index names in HDF #13492 (#16444)

18c316b

* BUG: Handle numpy strings in index names in HDF5 #13492 * REF: refactor to _ensure_str

DOC: Whatsnew fixups (#16596)

91057f3

DOC: Update release.rst

bf99975

BUG: pickle compat with UTC tz's (#16611)

697d026

closes #16608

Fix some lgtm alerts (#16613)

10c17d4

BLD: fix numpy on 3.6 build as 1.13 was released but no deps are buil…

dfebd8a

…t for it (#16633)

BUG: Fix Series.get failure on missing NaN (#8569) (#16619)

2b44868

TST: NaN in MultiIndex should not become a string (#7031) (#16625)

722b386

TST: verify we can add and subtract from indices (#8142) (#16629)

73930c5

BUG: conversion of Series to Categorical (#16557)

9fdea65

fix #16524

BLD: fix numpy on 2.7 build as 1.13 was released but no deps are buil…

789f7bb

…t for it (#16633) (#16650)

CLN: make license file machine readable (#16649)

5aba665

Splits extra information about the license and copyright holders to AUTHORS.md.

fix pytest-xdist version as 1.17 appears buggy (#16652)

ec6bf6d

COMPAT: numpy 1.13 test compat (#16654)

dc716b0

* COMPAT: numpy 1.13 test compat * CI: fix doc build to 1.12

implement changes request in PR#16488

d6c3189

- removed usecols mention in Other Enhancements section; it remains in Deprecations.
- removed test_parse_* test methods in favor of test_usecols_* methods.
- changed parse_cols to usecols in test_read_one_empty_col_* instead of catching warning.

no message

8025c0c

change tests keyword from parse_cols to usecol.

f07a002

change parse_cols to usecols

440e6a6

removed excess blank line.

f299ea2

add deprecate_kwarg from _decorators

5948c01

add `check_stacklevel=False` to `test_excel_oldindex_format()`

rebase with #16522 changes.

a525222

jreback closed this Sep 10, 2017

jreback mentioned this pull request May 25, 2019

DEPR: deprecations log for removed issues #13777

Closed
