Minimum of ordered categorical data in pandas DataFrames #25299

Closed
Guillaume1801 opened this issue Feb 13, 2019 · 7 comments

@Guillaume1801
commented Feb 13, 2019

I have a pandas DataFrame with one Series containing ordered Categorical data. Some values of this Series may be missing (NaN). I want to get the minimum without taking NaNs into account, but I get strange results.

Code:

import pandas as pd

# "a" is not one of the declared categories, so both "a" entries become NaN
raw_cat = pd.Categorical(["a", "b", "c", "a"],
                         categories=["b", "c", "d"],
                         ordered=True)
s = pd.Series(raw_cat)
raw_cat.min(numeric_only=True), s.min(numeric_only=True)

Output:

('b', nan)

Expected output:

('b', 'b')

I am getting the desired output when running this code with pandas 0.23.4 but not with pandas 0.24.0 and above.

Is this an issue or a misunderstanding? Thank you for your help.

@jorisvandenbossche

Member

commented Feb 13, 2019

Thanks for the report! I can confirm this regression.

@jorisvandenbossche

Member

commented Feb 13, 2019

So it seems that the numeric_only keyword is no longer properly passed through to the Categorical.min implementation. Investigation welcome!
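While the keyword is not forwarded, one possible workaround is to drop the missing values explicitly before taking the minimum. This is only a sketch and was not proposed in the thread; the name `workaround_min` is illustrative, and the data is the reporter's example.

```python
import pandas as pd

# Same data as in the report: "a" is not a declared category, so it becomes NaN
raw_cat = pd.Categorical(["a", "b", "c", "a"],
                         categories=["b", "c", "d"],
                         ordered=True)
s = pd.Series(raw_cat)

# Remove the missing entries first, then take the minimum of what is left.
# This avoids relying on numeric_only being passed through to Categorical.min.
workaround_min = s.dropna().min()
print(workaround_min)  # b
```

Because the NaNs are removed before the reduction, the result does not depend on how `Series.min` handles keywords for categorical data.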

@arnov

Contributor

commented Feb 13, 2019

I dove into this a bit, but shouldn't the argument be skipna? Because I am unsure what numeric_only would mean for a categorical series.

@Guillaume1801

Author

commented Feb 13, 2019

I agree with you! It makes no sense to use this argument when the argument used to remove NaNs in all other pandas methods is skipna.

@arnov

Contributor

commented Feb 13, 2019

To add to the confusion, Categorical supports the dropna argument in the mode method, while it seems to be skipna in a lot of other places.

@jorisvandenbossche

Member

commented Feb 13, 2019

@arnov I actually thought exactly the same when answering on this issue, so I opened #25303 (but forgot to link to it here).

So I agree that skipna is more logical, but I don't think we can simply change it as you did in #25304; we will have to deprecate the keyword and behaviour first.

Short term, I think it would be good to "just" fix it using numeric_only (so we can include this for 0.24.2), and then for 0.25.0 we could think about deprecating it. But let's first discuss that in #25303
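To illustrate what deprecating the keyword first could look like, here is a minimal pure-Python sketch. It is not the pandas implementation: `ordered_min` is a hypothetical helper, and the default of `skipna=False` is an assumption chosen so that the behaviour stays the same when no keyword is given.

```python
import math
import warnings


def ordered_min(values, categories, skipna=None, numeric_only=None):
    """Hypothetical helper: minimum of `values` by their position in `categories`."""
    if numeric_only is not None:
        # The old keyword keeps working for now, but warns the caller
        warnings.warn(
            "'numeric_only' is deprecated, use 'skipna' instead",
            FutureWarning,
            stacklevel=2,
        )
        if skipna is None:
            skipna = numeric_only
    if skipna is None:
        skipna = False  # assumed default: missing values are not skipped

    rank = {cat: i for i, cat in enumerate(categories)}
    # Values that are not declared categories are treated as missing (code -1)
    codes = [rank.get(v, -1) for v in values]

    if not skipna and -1 in codes:
        return math.nan
    valid = [c for c in codes if c != -1]
    if not valid:
        return math.nan
    return categories[min(valid)]


print(ordered_min(["a", "b", "c", "a"], ["b", "c", "d"], skipna=True))  # b
```

Callers that still pass `numeric_only=True` get the same result plus a `FutureWarning`, so the keyword can be removed in a later release without a silent change in behaviour.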

@jreback jreback modified the milestones: 0.24.2, 0.25.0 Feb 16, 2019

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.25.0, 0.24.2 Feb 16, 2019

@jreback

Contributor

commented Feb 16, 2019

closed by #25304

@jreback jreback closed this Feb 16, 2019
