Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Some str reducers fail in Groupby.transform with axis=1 #45986

Closed
4 of 7 tasks
rhshadrach opened this issue Feb 14, 2022 · 4 comments
Closed
4 of 7 tasks

BUG: Some str reducers fail in Groupby.transform with axis=1 #45986

rhshadrach opened this issue Feb 14, 2022 · 4 comments
Labels
Apply Apply, Aggregate, Transform, Map Bug Groupby

Comments

@rhshadrach
Copy link
Member

rhshadrach commented Feb 14, 2022

Reducers:

All raise a ValueError when used with axis=1, e.g.

df = DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result = df.groupby([0, 0, 1], axis=1).transform("first")
@rhshadrach rhshadrach added Bug Groupby Apply Apply, Aggregate, Transform, Map labels Feb 14, 2022
@rhshadrach rhshadrach added this to the Contributions Welcome milestone Feb 14, 2022
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@rhshadrach rhshadrach changed the title Some str reducers fail in Groupby.transform with axis=1 BUG: Some str reducers fail in Groupby.transform with axis=1 Dec 1, 2022
@iofall
Copy link
Contributor

iofall commented Dec 4, 2022

For idxmax and idxmin.

When we make the transform call, result = df.groupby([0, 0, 1], axis=1).transform("idxmax"). The axis information from the groupby is not used.

transform calls _transform which then calls idxmax on this line:

result = getattr(self, func)(*args, **kwargs)

In the idxmax function we expect axis as an argument, which is by default set to 0. So even if the axis for self is 1, the axis for idxmax will not get overridden since it is not passed in the args or kwargs in the above line.

def idxmax(
self,
axis: Axis = 0,
skipna: bool = True,
numeric_only: bool = False,
) -> DataFrame:
axis = DataFrame._get_axis_number(axis)

This is why it works for axis = 0, since it will take 0 by default. But to work with axis = 1, we will have to explicitly pass it in the kwargs as .transform("idxmax", axis = 1)

So, to fix:

  • One way could be to explicitly pass the required axis and mention that as a caveat in the documentation.
  • The second one could be to modify the _transform method to check if axis exists in kwargs, if not add it as self.axis inside the else block.

Let me know which way makes the most sense and I will make a PR for it.

@iofall
Copy link
Contributor

iofall commented Dec 4, 2022

For ngroup, it seems the error is only raised when index length does not match the len of the list provided.

df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5], "c": [5, 6, 7]}, index=["x", "y", "z"])
result = df.groupby([0, 0, 1], axis=1).transform("ngroup")

The above transform will work, but:

df = pd.DataFrame({"a": [1, 2,], "b": [3, 4], "c": [5, 6]}, index=["x", "y"])
result =  df.groupby([0, 0, 1], axis=1).transform("ngroup")

will fail for .ngroup() as well as .transform("ngroup")


For nth,

>>> result = df.groupby([0, 0, 1], axis=1).transform("nth", 1)

# Output
ValueError: 'nth' is not a valid function name for transform(name)

It fails because nth does not exist in base.transform_kernel_allowlist when we check for it here:

elif func not in base.transform_kernel_allowlist:
msg = f"'{func}' is not a valid function name for transform(name)"
raise ValueError(msg)

@rhshadrach
Copy link
Member Author

Thanks for digging into these!

So, to fix:

  • One way could be to explicitly pass the required axis and mention that as a caveat in the documentation.
  • The second one could be to modify the _transform method to check if axis exists in kwargs, if not add it as self.axis inside the else block.

I don't think we should require the user to pass axis=1 twice to get axis=1 behavior. The 2nd looks like a good solution to me, assuming specification of axis gives the correct result (the column label of the min/max value).

It fails because nth does not exist in base.transform_kernel_allowlist when we check for it here:

This looks correct to me - nth is a filter and as such we may end up with multiple or no values in a given group. As there isn't a reliable way to transform, I believe raising is the correct thing to do.

@rhshadrach rhshadrach added this to the 2.0 milestone Jan 5, 2023
@rhshadrach rhshadrach removed this from the 2.0 milestone Jan 24, 2023
@rhshadrach
Copy link
Member Author

Closing since axis=1 is deprecated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map Bug Groupby
Projects
None yet
Development

No branches or pull requests

3 participants