Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate convert parameter in take #17352

Merged
merged 1 commit into from Oct 1, 2017

Conversation

@gfyoung
Copy link
Member

commented Aug 27, 2017

xref #16948.

The parameter is not respected, nor is it a parameter in many 'take' implementations.

cc @jorisvandenbossche

@gfyoung gfyoung added the Deprecate label Aug 27, 2017

@gfyoung gfyoung force-pushed the forking-repos:take-convert-depr branch 2 times, most recently from 9e0bcad to 4471360 Aug 28, 2017

@codecov

This comment has been minimized.

Copy link

commented Aug 28, 2017

Codecov Report

Merging #17352 into master will decrease coverage by 0.01%.
The diff coverage is 86.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17352      +/-   ##
==========================================
- Coverage   91.03%   91.02%   -0.02%     
==========================================
  Files         162      162              
  Lines       49567    49577      +10     
==========================================
+ Hits        45125    45126       +1     
- Misses       4442     4451       +9
Flag Coverage Δ
#multiple 88.8% <86.66%> (ø) ⬆️
#single 40.23% <23.33%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/groupby.py 92.21% <100%> (ø) ⬆️
pandas/core/sparse/series.py 95.13% <100%> (+0.04%) ⬆️
pandas/core/series.py 95.01% <100%> (ø) ⬆️
pandas/core/indexing.py 93.94% <75%> (ø) ⬆️
pandas/core/frame.py 97.72% <75%> (-0.1%) ⬇️
pandas/core/generic.py 92% <87.5%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 473a7f3...4471360. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Aug 28, 2017

Codecov Report

Merging #17352 into master will decrease coverage by 0.01%.
The diff coverage is 92.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17352      +/-   ##
==========================================
- Coverage   91.27%   91.25%   -0.02%     
==========================================
  Files         163      163              
  Lines       49766    49785      +19     
==========================================
+ Hits        45422    45433      +11     
- Misses       4344     4352       +8
Flag Coverage Δ
#multiple 89.05% <92.45%> (ø) ⬆️
#single 40.34% <60.37%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/core/sparse/series.py 95.15% <100%> (+0.05%) ⬆️
pandas/core/series.py 95.04% <100%> (+0.1%) ⬆️
pandas/core/groupby.py 92.23% <100%> (ø) ⬆️
pandas/core/frame.py 97.73% <75%> (-0.1%) ⬇️
pandas/core/indexing.py 93.94% <75%> (ø) ⬆️
pandas/core/generic.py 92.12% <96%> (+0.04%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 030e374...9325f21. Read the comment docs.

@gfyoung gfyoung force-pushed the forking-repos:take-convert-depr branch from 4471360 to 3a1de27 Aug 28, 2017

return self.data.take(self.sort_idx, axis=self.axis, convert=False)
with warnings.catch_warnings():
warnings.simplefilter("ignore")
return self.data.take(self.sort_idx, axis=self.axis, convert=False)

This comment has been minimized.

Copy link
@gfyoung

gfyoung Aug 28, 2017

Author Member

@jreback @jorisvandenbossche : This is a tricky one for me here. On the one hand, I want to remove convert=False, but because axis of the groupby object might be invalid when calling Series.take, this is not possible for me to do (when True, it re-indexes the value according to the length of the slice along the index).

On the other hand, this catch_warnings(), while an easy way to get things to pass for now, is not a solution I want to merge. Suggestions on how to handle this problem (or is it best that we just expose convert for all relevant implementations, though I'm not a fan).

This comment has been minimized.

Copy link
@gfyoung

gfyoung Aug 28, 2017

Author Member

Actually, might have found the answer myself 😄

@gfyoung gfyoung force-pushed the forking-repos:take-convert-depr branch 2 times, most recently from 78f0c4e to bfa4bf6 Aug 28, 2017

@@ -3354,7 +3354,7 @@ def dropna(self, axis=0, how='any', thresh=None, subset=None,
else:
raise TypeError('must specify how or thresh')

result = self.take(mask.nonzero()[0], axis=axis, convert=False)

This comment has been minimized.

Copy link
@chris-b1

chris-b1 Aug 28, 2017

Contributor

I don't think these internal uses should be removed. We're guaranteed the indexer is inbounds, so it's an unnecessary performance hit to check it.

This comment has been minimized.

Copy link
@gfyoung

gfyoung Aug 28, 2017

Author Member

What do you mean by this? DataFrame.take doesn't even respect the convert parameter. Thus, this will have no effect on anything. 😄

This comment has been minimized.

Copy link
@chris-b1

chris-b1 Aug 28, 2017

Contributor

Ok, this example doesn't matter b/c it's on a frame, but does with the usages of _data.take

This comment has been minimized.

Copy link
@chris-b1

chris-b1 Aug 28, 2017

Contributor

And Series.take

This comment has been minimized.

Copy link
@gfyoung

gfyoung Aug 28, 2017

Author Member

I don't think there's a bounds check anymore in Series.take or _data.take because I removed all of them for that same reason. We have so many bounds checks all over the place that removing just one of them is okay now.

if kwargs:
nv.validate_take(tuple(), kwargs)

# check/convert indicies here
if convert:
indices = maybe_convert_indices(indices, len(self._get_axis(axis)))

This comment has been minimized.

Copy link
@chris-b1

chris-b1 Aug 28, 2017

Contributor

Not sure this is safe to remove - with non numpy types, we call to an internal take impl that doesn't bounds check. Can you try pd.Series([1,2,3], dtype='category').take([-1, -2]) on your branch?

This comment has been minimized.

Copy link
@gfyoung

gfyoung Aug 28, 2017

Author Member

Okay, so that breaks:

2    NaN
1    1.0
dtype: category
Categories (3, int64): [1, 2, 3]

That being said, that call works without dtype='category', meaning we do handle negative indices correctly in some cases already. I might as well add that handling for category too.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2017

So I originally added this because I wanted the ability to a) convert neg to positive, and b) bounds check; these are not the default because of perf (IOW internally we often know that we don't need either of these things, but from a public perspetive you do need this).

These shouldn't have been on the public NDFrame though (so good to remove). So the question is, do we need a ._take() to do these things? (and .take() would just be a wrapper)?

@jsexauer jsexauer referenced this pull request Aug 29, 2017

Open

DEPR: deprecations from prior versions #6581

0 of 89 tasks complete
@chris-b1

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2017

I'd be in favor of a _take with options to be unsafe. It's not like boundschecking is that expensive, but it's such a core operation I hate to give up any performance.

@gfyoung

This comment has been minimized.

Copy link
Member Author

commented Aug 29, 2017

I suppose that in the name of performance, that's fair. I think I would just need to implement this solely for DataFrame and Series right?

@chris-b1

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2017

Yes, that's correct. BlockManager and SingleBlockManager will need to keep their take implementation, but I would keep it unprefixed since they are already internal.

@gfyoung

This comment has been minimized.

Copy link
Member Author

commented Aug 29, 2017

@chris-b1 : Sorry, somehow your answer only confused me more. Yes or no, do I need to implement _take for just Series and DataFrame ?

@chris-b1

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2017

@gfyoung

This comment has been minimized.

Copy link
Member Author

commented Aug 30, 2017

@chris-b1 : Alright, here goes. Let's see what CI has to say about my implementation.

@gfyoung

This comment has been minimized.

Copy link
Member Author

commented Aug 30, 2017

@chris-b1 @jreback : Successfully implemented _take with all green. PTAL.

@@ -2033,7 +2033,7 @@ def _ixs(self, i, axis=0):
return self.loc[:, lab_slice]
else:
if isinstance(label, Index):
return self.take(i, axis=1, convert=True)
return self.take(i, axis=1)

This comment has been minimized.

Copy link
@jreback

jreback Aug 30, 2017

Contributor

every reference to an internal take should be to ._take (this is pretty much all of them)

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Aug 30, 2017

Member

Is this really needed?
Can't we keep take and only use _take where it is needed? IMO it just makes it more complex to understand the code.

This comment has been minimized.

Copy link
@gfyoung

gfyoung Sep 2, 2017

Author Member

Agree with @jorisvandenbossche . I'm not sure I really understand what you mean by "reference to an internal take" since the original references are to a public function.

This comment has been minimized.

Copy link
@jreback

jreback Sep 2, 2017

Contributor

you have now 2 functions to do the same thing , a public and private version

it is much more clear to simply use _take internally in all cases

the public function is just an interface

otherwise a future reader will not know which to use - recipe for confusion

pls change all uses to _take

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Sep 2, 2017

Member

There will be confusion anyhow, because then people will wonder what the difference is with pubkic take, and in majority of the cases, there is no difference. IMO that is more clear by using the public function.

This comment has been minimized.

Copy link
@gfyoung

gfyoung Sep 3, 2017

Author Member

There is AFAIK only one class where negative indexing isn't handled properly, and that is Categorical. Makes we wonder whether we should revisit the option of adding supporting given this disagreement about design though (@chris-b1 thoughts?)

This comment has been minimized.

Copy link
@gfyoung

gfyoung Sep 3, 2017

Author Member

Let me boil it down to this (in reference to my previous comment): besides potential test failures, is there reason why we call take_1d in pandas.core.categorical instead of just calling self._codes.take(...) ?

This comment has been minimized.

Copy link
@jreback

jreback Sep 3, 2017

Contributor

pls change all references to _take

.take is ONLY a wrapper function and should not be used anywhere in the code base
this eliminates any possibility of confusion and is inline with the existing style
mixing usages is cause for trouble

your other references are simply not relevant here and are again purely internal uses (take_1d)
if u want to bring them up in another issue by all means

let's keep the scope to changing refs to the NDFrame .take changes

This comment has been minimized.

Copy link
@gfyoung

gfyoung Sep 5, 2017

Author Member

If there were multiple places where I had to use ._take(), then I think I would be more inclined to agree with you @jreback . However, at this point, there is only one place where this had to happen, and if necessary, I can easily "clear up" any confusion by adding a comment.

IMO one usage isn't convincing enough for me to convert all of other invocations (both in NDFrame and by any other pandas class to use ._take()). I think I still have to side with @jorisvandenbossche on this one.

This comment has been minimized.

Copy link
@jreback

jreback Sep 5, 2017

Contributor

@gfyoung if this were a new public function, and you wrote a private impl _take as well. Then you certainly would use the private impl everywhere. I don't see a difference here. Pls make the change.

@@ -2058,6 +2058,25 @@ def __delitem__(self, key):
except KeyError:
pass

def _take(self, indices, axis=0, convert=True, is_copy=False):
"""
The internal version of `self.take` that will house the `convert`

This comment has been minimized.

Copy link
@jreback

jreback Aug 30, 2017

Contributor

more descriptive doc-string with parameters and such

This comment has been minimized.

Copy link
@gfyoung

gfyoung Sep 2, 2017

Author Member

Yep, done.

result._set_is_copy(self)

return result

def take(self, indices, axis=0, convert=True, is_copy=True, **kwargs):

This comment has been minimized.

Copy link
@jreback

jreback Aug 30, 2017

Contributor

also deprecate is_copy; that is another internal parameter; its only necessary for _take

This comment has been minimized.

Copy link
@gfyoung

gfyoung Sep 2, 2017

Author Member

That's a fair point. If you don't mind, I'll do this in a subsequent PR once the infrastructure for _take has been merged in.

See also
--------
numpy.ndarray.take

This comment has been minimized.

Copy link
@jreback

jreback Aug 30, 2017

Contributor

why do we have this? now that it is generic is shouldn't be needed

This comment has been minimized.

Copy link
@gfyoung

gfyoung Aug 30, 2017

Author Member

Have what exactly? You commented on a not-so clear part of the diff.

@@ -604,14 +604,38 @@ def sparse_reindex(self, new_index):

def take(self, indices, axis=0, convert=True, *args, **kwargs):
"""
Sparse-compatible version of ndarray.take

This comment has been minimized.

Copy link
@jreback

jreback Aug 30, 2017

Contributor

use a shared doc string for all public .take()

This comment has been minimized.

Copy link
@gfyoung

gfyoung Sep 2, 2017

Author Member

Done.


self._consolidate_inplace()
new_data = self._data.take(indices,
axis=self._get_block_manager_axis(axis),

This comment has been minimized.

Copy link
@chris-b1

chris-b1 Aug 30, 2017

Contributor

pass through convert

This comment has been minimized.

Copy link
@gfyoung

gfyoung Sep 2, 2017

Author Member

Done.

@gfyoung gfyoung force-pushed the forking-repos:take-convert-depr branch 2 times, most recently from e0b209b to 35812d0 Sep 23, 2017

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 23, 2017

so this still needs to change virtually all uses of .take() -> ._take(). We want to limit the exposure to user-facing .take() in any code that is not explicitly user facing.

@gfyoung gfyoung force-pushed the forking-repos:take-convert-depr branch from 47e1879 to f30634f Sep 23, 2017

if convert:
indices = maybe_convert_indices(indices, len(self._get_axis(axis)))

indices = _ensure_platform_int(indices)
new_index = self.index.take(indices)
new_values = self._values.take(indices)
return (self._constructor(new_values, index=new_index, fastpath=True)
.__finalize__(self))

This comment has been minimized.

Copy link
@jreback

jreback Sep 23, 2017

Contributor

any reason you can't just call the super method here?

This comment has been minimized.

Copy link
@gfyoung

gfyoung Sep 23, 2017

Author Member

For ._constructor ? I was trying to preserve existing (working) code as much as possible when making these changes.

This comment has been minimized.

Copy link
@jreback

jreback Sep 23, 2017

Contributor

no the entire routine
give it s try

This comment has been minimized.

Copy link
@gfyoung

gfyoung Sep 24, 2017

Author Member

https://travis-ci.org/pandas-dev/pandas/builds/279118227

I gave it a try, and there were a bunch of failures.

This comment has been minimized.

Copy link
@jreback

jreback Sep 27, 2017

Contributor

ok, not really sure why it shoulnt' work. its almost the same code

@gfyoung gfyoung force-pushed the forking-repos:take-convert-depr branch from f30634f to e9c9bd9 Sep 23, 2017

@gfyoung

This comment has been minimized.

Copy link
Member Author

commented Sep 23, 2017

https://travis-ci.org/pandas-dev/pandas/jobs/279023089

@jreback : Not sure why doing this conversion to _.take() is causing test failures.

@gfyoung gfyoung force-pushed the forking-repos:take-convert-depr branch from e9c9bd9 to 41504af Sep 23, 2017

@gfyoung gfyoung force-pushed the forking-repos:take-convert-depr branch 8 times, most recently from 889fb83 to ec299bc Sep 30, 2017

DEPR: Deprecate convert parameter in take
xref gh-16948. The parameter is not respected,
nor is it a parameter in many 'take' implementations.

@gfyoung gfyoung force-pushed the forking-repos:take-convert-depr branch from ec299bc to 9325f21 Oct 1, 2017

@gfyoung

This comment has been minimized.

Copy link
Member Author

commented Oct 1, 2017

@jreback : I've done all of the converting (without breaking tests). PTAL.

@jreback jreback added this to the 0.21.0 milestone Oct 1, 2017

@jreback

jreback approved these changes Oct 1, 2017

@jreback jreback merged commit 458c1dc into pandas-dev:master Oct 1, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 1, 2017

thanks @gfyoung !

@gfyoung gfyoung deleted the forking-repos:take-convert-depr branch Oct 1, 2017

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 2, 2017

Krzysztof Chomski
Merge branch 'master' of github.com:pandas-dev/pandas
* 'master' of github.com:pandas-dev/pandas: (188 commits)
  Separate out _convert_datetime_to_tsobject (pandas-dev#17715)
  DOC: remove whatsnew note for xref pandas-dev#17131
  BUG: Regression in .loc accepting a boolean Index as an indexer (pandas-dev#17738)
  DEPR: Deprecate cdate_range and merge into bdate_range (pandas-dev#17691)
  CLN: replace %s syntax with .format in pandas.core: categorical, common, config, config_init (pandas-dev#17735)
  Fixed the memory usage explanation of categorical in gotchas from O(nm) to O(n+m) (pandas-dev#17736)
  TST: add backward compat for offset testing for pickles (pandas-dev#17733)
  remove unused time conversion funcs (pandas-dev#17711)
  DEPR: Deprecate convert parameter in take (pandas-dev#17352)
  BUG:Time Grouper bug fix when applied for list groupers (pandas-dev#17587)
  BUG: Fix some PeriodIndex resampling issues (pandas-dev#16153)
  BUG: Fix unexpected sort in groupby (pandas-dev#17621)
  DOC: Fixed typo in documentation for 'pandas.DataFrame.replace' (pandas-dev#17731)
  BUG: Fix series rename called with str altering name rather index (GH17407) (pandas-dev#17654)
  DOC: Add examples for MultiIndex.get_locs + cleanups (pandas-dev#17675)
  Doc improvements for IntervalIndex and Interval (pandas-dev#17714)
  BUG: DataFrame sort_values and multiple "by" columns fails to order NaT correctly
  Last of the timezones funcs (pandas-dev#17669)
  Add missing file to _pyxfiles, delete commented-out (pandas-dev#17712)
  update imports of DateParseError, remove unused imports from tslib (pandas-dev#17713)
  ...
@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 3, 2017

pandas/tests/sparse/test_series.py::TestSparseSeries::()::test_numpy_take
/home/travis/miniconda3/envs/pandas/lib/python3.5/site-packages/numpy/core/fromnumeric.py:123: FutureWarning: The 'convert' parameter is deprecated and will be removed in a future version.
return take(indices, axis, out, mode)

coming up on the 3.5 build of master

seems we r not suppressing somewhere (or prob needs to change to use _take maybe)

can u take care of @gfyoung

@gfyoung

This comment has been minimized.

Copy link
Member Author

commented Oct 3, 2017

@jreback : The cause of the warning is that an older version of numpy passes in None to convert by default when calling np.take (see here).

Thus, this is a compatibility issue, as it disappears in later versions of numpy (I actually worked on fixing fromnumeric compatibility issues at some point).

I'll put a check in the test to expect that warning for older versions of numpy, as checking if not convert is pretty idiomatic, and I wouldn't want to change it just because an older version of numpy is issuing a warning when we call .take() in a non-recommended way.

gfyoung added a commit to forking-repos/pandas that referenced this pull request Oct 3, 2017

gfyoung added a commit to forking-repos/pandas that referenced this pull request Oct 3, 2017

jreback added a commit that referenced this pull request Oct 3, 2017

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017

alanbato added a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

DEPR: Deprecate convert parameter in take (pandas-dev#17352)
xref pandas-devgh-16948. The parameter is not respected,
nor is it a parameter in many 'take' implementations.

alanbato added a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

No-Stream added a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

DEPR: Deprecate convert parameter in take (pandas-dev#17352)
xref pandas-devgh-16948. The parameter is not respected,
nor is it a parameter in many 'take' implementations.

No-Stream added a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

@jreback jreback referenced this pull request Jul 1, 2019

Open

DEPR: deprecations log for removed issues #13777

141 of 141 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.