TypeError on argmax of object dtype (change from 0.20.3) #18021

Closed
keerthanpg opened this issue Oct 29, 2017 · 19 comments · Fixed by #54109
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Regression Functionality that used to work in a prior pandas version

Comments

@keerthanpg

keerthanpg commented Oct 29, 2017

>>> import pandas as pd
>>> pd.Series([0, 0], dtype='object').argmax()

I was calling action = state_action.idxmax(), where state_action is a pandas.core.series.Series. On 0.21.0 this raises the following error:

  File "/usr/local/lib/python3.5/dist-packages/pandas/core/series.py", line 1357, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

However, when I downgraded to pandas 0.20.3, it worked just fine. You might wanna look into this. :)

@TomAugspurger
Contributor

Can you give a reproducible example?

@jorisvandenbossche jorisvandenbossche added the Needs Info Clarification about behavior needed to assess issue label Oct 30, 2017
@barondu

barondu commented Oct 30, 2017

I ran into the same problem.
At first I used action = state_action.argmax(), which emits: FutureWarning: 'argmax' is deprecated. Use 'idxmax' instead. The behavior of 'argmax' will be corrected to return the positional maximum in the future. Use 'series.values.argmax' to get the position of the maximum now.
So I changed it to action = state_action.idxmax().
On 0.21.0 this gives the following error:

Traceback (most recent call last):
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/tkinter/__init__.py", line 1699, in __call__
    return self.func(*args)
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/tkinter/__init__.py", line 745, in callit
    func(*args)
  File "/Users/baron/PycharmProjects/HelloPython/test_Q.py", line 26, in update
    action = RL.choose_action(str(observation))
  File "/Users/baron/PycharmProjects/HelloPython/RL_brain.py", line 40, in choose_action
    action = state_action.idxmax()
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/site-packages/pandas/core/series.py", line 1357, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/site-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype
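
The deprecation warning quoted above turns on the difference between the label of the maximum and its position. A minimal sketch of that difference, using made-up numeric data with a non-default index (the Series and variable names are illustrative, not taken from the thread):

```python
import pandas as pd

# Illustrative data (not from the thread): numeric values with string labels.
s = pd.Series([1, 3, 2], index=["a", "b", "c"])

label = s.idxmax()                 # index label of the maximum
position = s.to_numpy().argmax()   # integer position of the maximum

print(label)     # b
print(position)  # 1
```

With a default RangeIndex the two values coincide, which is probably why the distinction is easy to miss.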

@TomAugspurger
Contributor

Can you provide a copy-pastable example @barondu?

@barondu

barondu commented Oct 30, 2017

@TomAugspurger
Contributor

Do you have a minimal test-case, something that could go in a unit test?

@barondu

barondu commented Oct 30, 2017

@TomAugspurger

import pandas as pd
import numpy as np

q_table = pd.DataFrame(columns=['a', 'b', 'c', 'd'])
q_table = q_table.append(pd.Series([0] * 4, index=q_table.columns, name='test1', ))
q_table = q_table.append(pd.Series([0] * 4, index=q_table.columns, name='test2', ))
print(q_table)
state_action = q_table.ix['test2', :]
print(state_action)
state_action = state_action.reindex(
    np.random.permutation(state_action.index))
print(state_action)
action = state_action.idxmax()
# action = state_action.argmax()
print('\naction: ', action)

@barondu

barondu commented Oct 30, 2017

Here is the error message

Traceback (most recent call last):
  File "/Users/baron/PycharmProjects/HelloPython/pandas_exercise.py", line 13, in <module>
    action = state_action.idxmax()
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/site-packages/pandas/core/series.py", line 1357, in idxmax
    i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
  File "/Users/baron/.pyenv/versions/3.6.3/lib/python3.6/site-packages/pandas/core/nanops.py", line 74, in _f
    raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
TypeError: reduction operation 'argmax' not allowed for this dtype

@TomAugspurger
Contributor

Thanks, simplified a bit:

In [11]: pd.Series([0, 0], dtype='object')
Out[11]:
0    0
1    0
dtype: object

In [12]: pd.Series([0, 0], dtype='object').argmax()
/Users/taugspurger/Envs/pandas-dev/bin/ipython:1: FutureWarning: 'argmax' is deprecated. Use 'idxmax' instead. The behavior of 'argmax' will be corrected to return the positional maximum in the future. Use 'series.values.argmax' to get the position of the maximum now.
  #!/Users/taugspurger/Envs/pandas-dev/bin/python3.6
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-e0ba19c8565d> in <module>()
----> 1 pd.Series([0, 0], dtype='object').argmax()

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/util/_decorators.py in wrapper(*args, **kwargs)
     34     def wrapper(*args, **kwargs):
     35         warnings.warn(msg, klass, stacklevel=stacklevel)
---> 36         return alternative(*args, **kwargs)
     37     return wrapper
     38

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/series.py in idxmax(self, axis, skipna, *args, **kwargs)
   1355         """
   1356         skipna = nv.validate_argmax_with_skipna(skipna, args, kwargs)
-> 1357         i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
   1358         if i == -1:
   1359             return np.nan

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/nanops.py in _f(*args, **kwargs)
     72             if any(self.check(obj) for obj in obj_iter):
     73                 msg = 'reduction operation {name!r} not allowed for this dtype'
---> 74                 raise TypeError(msg.format(name=f.__name__.replace('nan', '')))
     75             try:
     76                 with np.errstate(invalid='ignore'):

TypeError: reduction operation 'argmax' not allowed for this dtype

Is there a reason you're using object dtype here?

@TomAugspurger
Contributor

Seems like #16449 may have been the root cause (cc @DGrady)

NumPy will (somehow) handle object arrays in argmax/min, so I suppose @disallow('O') is a bit too strict.

@TomAugspurger TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Regression Functionality that used to work in a prior pandas version labels Oct 30, 2017
@TomAugspurger TomAugspurger added this to the Next Major Release milestone Oct 30, 2017
@TomAugspurger TomAugspurger removed the Needs Info Clarification about behavior needed to assess issue label Oct 30, 2017
@TomAugspurger TomAugspurger changed the title Error using pandas version 0.21.0 TypeError on argmax of object dtype (change from 0.20.3) Oct 30, 2017
@TomAugspurger
Contributor

We'll need to think about whether we want to emulate NumPy here though. It's nice to know ahead of time whether your function is valid or not for the type of the values being passed. With object dtype there's no way of knowing that.

@jorisvandenbossche
Member

I think for object dtype we should not, beforehand, decide whether such an operation works or not, but IMO we should defer that to the actual objects. Eg min/max works on strings, and so it seems logical that argmax/argmin does as well.

@TomAugspurger
Contributor

TomAugspurger commented Oct 30, 2017

Fortunately, argmin/max didn't work on strings before :)

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.20.3'

In [3]: pd.Series(['a', 'b']).argmax()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-4747fce7cbb5> in <module>()
----> 1 pd.Series(['a', 'b']).argmax()

~/miniconda3/envs/pandas-0.20.3/lib/python3.6/site-packages/pandas/core/series.py in idxmax(self, axis, skipna, *args, **kwargs)
   1262         """
   1263         skipna = nv.validate_argmax_with_skipna(skipna, args, kwargs)
-> 1264         i = nanops.nanargmax(_values_from_object(self), skipna=skipna)
   1265         if i == -1:
   1266             return np.nan

~/miniconda3/envs/pandas-0.20.3/lib/python3.6/site-packages/pandas/core/nanops.py in nanargmax(values, axis, skipna)
    476     """
    477     values, mask, dtype, _ = _get_values(values, skipna, fill_value_typ='-inf',
--> 478                                          isfinite=True)
    479     result = values.argmax(axis)
    480     result = _maybe_arg_null_out(result, axis, mask, skipna)

~/miniconda3/envs/pandas-0.20.3/lib/python3.6/site-packages/pandas/core/nanops.py in _get_values(values, skipna, fill_value, fill_value_typ, isfinite, copy)
    194     values = _values_from_object(values)
    195     if isfinite:
--> 196         mask = _isfinite(values)
    197     else:
    198         mask = isnull(values)

~/miniconda3/envs/pandas-0.20.3/lib/python3.6/site-packages/pandas/core/nanops.py in _isfinite(values)
    237             is_integer_dtype(values) or is_bool_dtype(values)):
    238         return ~np.isfinite(values)
--> 239     return ~np.isfinite(values.astype('float64'))
    240
    241

ValueError: could not convert string to float: 'b'

But yes, I suppose that we should attempt to support it.

@jorisvandenbossche
Member

Ah, yes :-) Although in numpy it works:

In [118]: a = np.array(['a', 'b', 'c'], dtype=object)

In [119]: a.min()
Out[119]: 'a'

In [120]: a.argmin()
Out[120]: 0

@DGrady
Contributor

DGrady commented Nov 1, 2017

Just refreshing my memory — so in the course of tracking down the bug that prompted #16449, it turned out that argmax etc were always trying to coerce their inputs to float, which is why they used to fail with string data. They no longer do that. But, at least at the time, it seemed pretty tricky to get argmax etc to behave consistently with arbitrary object dtypes that could also contain nulls, and we decided to disallow that case. If you remove the disallow decorator, they currently work as expected with string data, as long as there are no null values, but once you start including null values or possibly using other types of objects things would not work as expected. I think that marking argmax as not allowed with object dtypes was done mainly for expediency.
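
One way to picture the null-handling problem described above is to drop nulls before handing the values to NumPy. This is only a sketch under assumptions: `idxmax_skipping_nulls` is a hypothetical helper, not a pandas function, and it assumes the remaining objects are mutually comparable.

```python
import numpy as np
import pandas as pd


def idxmax_skipping_nulls(s):
    """Hypothetical helper: label of the maximum among the non-null values."""
    valid = s[s.notna()]
    if valid.empty:
        # Nothing left to compare, so there is no maximum.
        return np.nan
    # NumPy compares the Python objects directly, without coercing to float.
    position = np.asarray(valid, dtype=object).argmax()
    return valid.index[position]


strings = pd.Series(["a", None, "c", "b"], dtype=object)
print(idxmax_skipping_nulls(strings))  # 2
```

The helper returns the original index label (2 here), not the position within the filtered values. Mixed object types that cannot be compared with each other would still fail inside NumPy.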

@JB712

JB712 commented Nov 2, 2017

I'm not sure I follow everything involved here, but in the example given by @barondu (for MorvanZhou's code), is downgrading pandas the only solution? Is there a simpler one, like replacing argmax with another function?
(Sorry, I'm very new to Python.)

@DGrady
Contributor

DGrady commented Nov 2, 2017

As a workaround, you can call argmax on the underlying NumPy array:

Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 27 2017, 12:14:30) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.Series(['one', 'two']).values.argmax()
1
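
For the Q-table case earlier in the thread, where the object-dtype Series holds only numbers, another option may be to convert to a numeric dtype before calling idxmax. This is an assumption on top of the thread rather than something proposed in it, and the data below is an illustrative stand-in for one Q-table row:

```python
import pandas as pd

# Illustrative stand-in for one Q-table row: numbers stored with object dtype.
state_action = pd.Series([0, 2, 1], index=["a", "b", "c"], dtype=object)

# pd.to_numeric converts the values to a numeric dtype, for which idxmax is allowed.
numeric = pd.to_numeric(state_action)
action = numeric.idxmax()

print(action)  # b
```

Unlike calling argmax on the underlying array, this returns the index label, so the rest of the calling code does not need to change.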

@Choxmi

Choxmi commented Nov 19, 2017

I faced the same issue and I tried with pandas 0.19.2 and 0.18.1. None of them worked for me. I was able to run it successfully only after downgrading to pandas 0.20.3. Hope this will help someone. (y)

@mar-ses

mar-ses commented Nov 4, 2021

I'm getting this issue in pandas 1.1.5 with a Series of dtype object containing pd.Timestamp. Not sure if it was decided to fix this in the end or not, but if there needs to be a reason for why idxmax should work in this case, it is that .max() does work; if .max() works .idxmax() should work too.

@fleimgruber
Contributor

Seeing the same issue as @mar-ses in pandas 1.3.3, i.e. .max() works on pd.Timestamps, but idxmax() does not.
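
For the pd.Timestamp case in the last two comments, a possible workaround on affected versions is to convert the object-dtype Series to a datetime dtype first. A minimal sketch with made-up data, assuming every element is a valid timestamp:

```python
import pandas as pd

# Illustrative data: Timestamps deliberately stored with object dtype.
stamps = pd.Series(
    [pd.Timestamp("2021-01-01"), pd.Timestamp("2021-03-01"), pd.Timestamp("2021-02-01")],
    index=["a", "b", "c"],
    dtype=object,
)

# pd.to_datetime yields a datetime64 Series, on which idxmax is allowed.
converted = pd.to_datetime(stamps)

print(converted.idxmax())  # b
```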
