Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: .argmax should be positional, not an alias for idxmax #16830

Closed
TomAugspurger opened this issue Jul 5, 2017 · 12 comments
Closed

API: .argmax should be positional, not an alias for idxmax #16830

TomAugspurger opened this issue Jul 5, 2017 · 12 comments
Labels
API Design Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 5, 2017

In #6214 it was reported that argmax changed (I think since Series didn't subclass ndarray anymore). It's sometimes useful to get the index position of the max, so it'd be nice if .argmax did that, leaving idxmax to always be label-based.

In [1]: import pandas as pd
pd.Ser
In [2]: s = pd.Series([1, 2, 3, 0], index=['a', 'b', 'c', 'd'])

In [3]: s.idxmax()
Out[3]: 'c'

In [4]: s.argmax()
Out[4]: 'c'

I would expect Out[4] to be 2. The current workaround is np.argmax(x), which is OK, but doesn't generalize to DataFrames well.

We could deprecate .argmax in 0.21, pointing users to .idxmax. Then later we can change .argmax to be always be positional.


Update: We've deprecated .argmax/min. Need to change the behavior next.

@TomAugspurger TomAugspurger added this to the 0.21.0 milestone Jul 5, 2017
@TomAugspurger TomAugspurger added API Design Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves Difficulty Novice and removed Effort Low labels Jul 5, 2017
@lphk92
Copy link
Contributor

lphk92 commented Jul 15, 2017

I am working on this at the SciPy 2017 sprints

lphk92 pushed a commit to lphk92/pandas that referenced this issue Jul 15, 2017
lphk92 pushed a commit to lphk92/pandas that referenced this issue Jul 15, 2017
lphk92 pushed a commit to lphk92/pandas that referenced this issue Jul 15, 2017
lphk92 pushed a commit to lphk92/pandas that referenced this issue Jul 15, 2017
lphk92 pushed a commit to lphk92/pandas that referenced this issue Jul 15, 2017
lphk92 pushed a commit to lphk92/pandas that referenced this issue Jul 15, 2017
lphk92 pushed a commit to lphk92/pandas that referenced this issue Jul 15, 2017
lphk92 pushed a commit to lphk92/pandas that referenced this issue Jul 16, 2017
lphk92 pushed a commit to lphk92/pandas that referenced this issue Jul 17, 2017
TomAugspurger pushed a commit to lphk92/pandas that referenced this issue Sep 19, 2017
Added statements about correcting behavior in future commit

Add reference to github ticket

Fixing placement of github comment

Made test code more explicit

Fixing unrelated tests that are also throwing warnings

Updating whatsnew to give more detail about deprecation

Fixing whatsnew and breaking out tests to catch warnings

Additional comments and more concise whatsnew

Updating deprecate decorator to support custom message

DOC: Update docstrings, depr message, and whatsnew
TomAugspurger pushed a commit that referenced this issue Sep 27, 2017
* Deprecating Series.argmin and Series.argmax (#16830)

Added statements about correcting behavior in future commit

Add reference to github ticket

Fixing placement of github comment

Made test code more explicit

Fixing unrelated tests that are also throwing warnings

Updating whatsnew to give more detail about deprecation

Fixing whatsnew and breaking out tests to catch warnings

Additional comments and more concise whatsnew

Updating deprecate decorator to support custom message

DOC: Update docstrings, depr message, and whatsnew

* Added debug prints

* Try splitting the filters

* Reword whatsnew

* Change sparse series test

* Skip on py2

* Change to idxmin

* Remove py2 skips

* Catch more warnings

* Final switch to idxmax

* Consistent tests, refactor to_string

* Fixed tests
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.21.0, 1.0 Sep 28, 2017
alanbato pushed a commit to alanbato/pandas that referenced this issue Nov 10, 2017
…s-dev#16955)

* Deprecating Series.argmin and Series.argmax (pandas-dev#16830)

Added statements about correcting behavior in future commit

Add reference to github ticket

Fixing placement of github comment

Made test code more explicit

Fixing unrelated tests that are also throwing warnings

Updating whatsnew to give more detail about deprecation

Fixing whatsnew and breaking out tests to catch warnings

Additional comments and more concise whatsnew

Updating deprecate decorator to support custom message

DOC: Update docstrings, depr message, and whatsnew

* Added debug prints

* Try splitting the filters

* Reword whatsnew

* Change sparse series test

* Skip on py2

* Change to idxmin

* Remove py2 skips

* Catch more warnings

* Final switch to idxmax

* Consistent tests, refactor to_string

* Fixed tests
No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017
…s-dev#16955)

* Deprecating Series.argmin and Series.argmax (pandas-dev#16830)

Added statements about correcting behavior in future commit

Add reference to github ticket

Fixing placement of github comment

Made test code more explicit

Fixing unrelated tests that are also throwing warnings

Updating whatsnew to give more detail about deprecation

Fixing whatsnew and breaking out tests to catch warnings

Additional comments and more concise whatsnew

Updating deprecate decorator to support custom message

DOC: Update docstrings, depr message, and whatsnew

* Added debug prints

* Try splitting the filters

* Reword whatsnew

* Change sparse series test

* Skip on py2

* Change to idxmin

* Remove py2 skips

* Catch more warnings

* Final switch to idxmax

* Consistent tests, refactor to_string

* Fixed tests
@rgommers
Copy link
Contributor

np.argmax(some_series) does the correct thing right now (and did in the past, and will remain correct). But it gives:

\site-packages\numpy\core\fromnumeric.py:52: FutureWarning: 'argmax' is deprecated. Use 'idxmax' instead. The behavior of 'argmax' will be corrected to return the positional maximum in the future. Use 'series.values.argmax' to get the position of the maximum now.
  return getattr(obj, method)(*args, **kwds)

The deprecation warning appears to be for np.argmax which is confusing. So two suggestions:

  • amend the message to say pd.Series.argmax is ...
  • don't use the word "deprecated" - Series.argmax is not deprecated, it will just change behavior (hence FutureWarning). Deprecation normally means that it will be removed.

And a possible more involved change for this orpossibly for future related warnings: check whether the deprecated method is called by numpy's _wrapfunc rather than by the user directly, and avoid giving a warning if that is the case and the user code is and remains correct.

@jorisvandenbossche
Copy link
Member

np.argmax(some_series) does the correct thing right now (and did in the past, and will remain correct)

I don't think it does? Eg

In [8]: np.argmax(pd.Series([1, 2, 3], index=['a', 'b', 'c']))
/home/joris/miniconda3/envs/dev/lib/python3.5/site-packages/numpy/core/fromnumeric.py:57: FutureWarning: 'argmax' is deprecated, use 'idxmax' instead. The behavior of 'argmax'
will be corrected to return the positional maximum in the future.
Use 'series.values.argmax' to get the position of the maximum row.
  return getattr(obj, method)(*args, **kwds)
Out[8]: 'c'

In [9]: np.argmax(pd.Series([1, 2, 3], index=['a', 'b', 'c']).values)
Out[9]: 2

So it gives 'c', while it should give 2.

But I certainly agree that it is confusing that it seems that np.argmax is deprecated (not sure if we can solve this apart from the wording, as this has to do with python stacklevel printing), and that the wording can be improved (+1 on your two suggestions).

check whether the deprecated method is called by numpy's _wrapfunc rather than by the user directly, and avoid giving a warning if that is the case and the user code is and remains correct.

Is that possible? (how to do it exactly?)
Because that would certainly be nice, also in some other cases. For example in #21930 (comment) where we deprecated Series.compress (to remove it eventually) which also raises a warning for np.compress(s) (in this case that was actually OK, as it will not work anymore in the future, but I can imagine similar cases).

@jorisvandenbossche
Copy link
Member

I suppose it is easy to do with inspecting the stack. Is it guaranteed to always be _wrapfunc , or should we rather check in general if the call is coming from the numpy module?

@jorisvandenbossche
Copy link
Member

@rgommers see #22310 for edit to the deprecation message

@rgommers
Copy link
Contributor

So it gives 'c', while it should give 2.

You're right. My code does:

x = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
x[np.argmax(x)]

I think this is normal though - the majority of users probably won't look at the actual value of the index.

Is it guaranteed to always be _wrapfunc, or should we rather check in general if the call is coming from the numpy module?

_wrapfunc probably won't change, but it's private so no hard guarantee. These are the functions that use _wrapfunc:

  • take
  • reshape
  • choose
  • repeat
  • swapaxes
  • transpose
  • argpartition
  • argsort
  • argmax
  • argmin
  • searchsorted
  • nonzero
  • compress
  • clip
  • cumsum
  • cumprod
  • round

So that's not all - there are functions like np.sum that implement the logic themselves:

if type(a) is not mu.ndarray:
    try:
        sum = a.sum
    except AttributeError:
        pass
    else:
        return sum(axis=axis, dtype=dtype, out=out, **kwargs)

Indeed checking that the caller is a numpy object is probably best. But having thought about it, there's probably some way to break that, e.g. by a user wrapping a call to pandas in np.vectorize. Probably best to ignore my suggestion, not worth the trouble.

@rgommers see #22310 for edit to the deprecation message

Thanks

@jorisvandenbossche
Copy link
Member

You're right. My code does:

       x = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
       x[np.argmax(x)]

I think this is normal though - the majority of users probably won't look at the actual value of the index.

That's a good point. In case you have a non-numerical index like in the small example above, the x[np.argmax(x)] will keep working in the future (when argmax will return a positional integer), but with numerical indexes, this will not be guaranteed to give the same result. So therefore, if you have this deprecation message, you are still encouraged to switch to idxmax.
Which of course makes it more difficult to write code that works for both a Series and numpy 1D-array, but as long as [] mixes positional (numpy-like) and label-based indexing, there is no way around this complexity.

@wesm
Copy link
Member

wesm commented Aug 27, 2018

I just ran into this problem when working through errata for Python for Data Analysis. Can argmax/argmin return the positional values in 0.24.0?

@wesm wesm modified the milestones: 1.0, 0.24.0 Aug 27, 2018
@jorisvandenbossche
Copy link
Member

Generally, our rule is to keep deprecations two additional feature releases (after the first release having the warning), and only remove/change it in the release after that. For argmax/argmin, that would mean to only change it in 0.25/1.0.

@wesm the reason you ask is because you actually want to get the positional value? (and don't want to resort to np.argmin(np.array(values)) that would only be needed for one additional release?

@wesm
Copy link
Member

wesm commented Aug 27, 2018

Correct, the positional value. That's what it did before the cited refactoring caused index labels to be returned

@jreback jreback modified the milestones: 0.24.0, Contributions Welcome Nov 6, 2018
@jbrockmendel
Copy link
Member

@TomAugspurger is this closed by #30113 and #16955?

@jorisvandenbossche
Copy link
Member

Yes, the argmax method and np.argmax now work positionally.

@jorisvandenbossche jorisvandenbossche modified the milestones: Contributions Welcome, 1.0 Dec 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants