Cannot reproduce documentation groupby (AwkwardExtensionArray object has no attribute all) #35

Closed
rtbs-dev opened this issue Aug 15, 2023 · 2 comments

@rtbs-dev

Hi all! Lovely utility here. I was playing with the example from the docs and can't quite seem to find a good workaround for this bug:

(df
 .set_index('name')
 .groupby('team', group_keys=True)
 .apply(lambda x: x.goals.ak.mean(axis=1))
)
[...] lib/python3.10/site-packages/pandas/core/groupby/groupby.py:1630, in GroupBy._python_apply_general(self, f, data, not_indexed_same, is_transform, is_agg)
   1623     # We want to behave as if `self.group_keys=False` when reconstructing
   1624     # the object. However, we don't want to mutate the stateful GroupBy
   1625     # object, so we just override it.
...
    986 # error: Unsupported left operand type for & ("ExtensionArray")
    987 equal_na = self.isna() & other.isna()  # type: ignore[operator]
--> 988 return bool((equal_values | equal_na).all())

AttributeError: 'AwkwardExtensionArray' object has no attribute 'all'

This seems to happen with .agg as well, and the .groupby(['team', 'name']).apply(...) call I would usually use fails with a similar error complaining about no attribute 'any'.
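
Reading the last frame of the traceback, the expression `(equal_values | equal_na)` appears to have produced an AwkwardExtensionArray rather than a boolean NumPy array, so the `.all()` call has nothing to dispatch to. Below is a minimal, pandas-free sketch of that failure mode. `ToyExtensionArray` and `equals_like_pandas` are hypothetical names made up for illustration; this is only a guess at the shape of the problem, not how pandas or awkward-pandas is actually implemented.

```python
class ToyExtensionArray:
    """Hypothetical stand-in for an extension array (not awkward-pandas code)."""

    def __init__(self, values):
        self.values = list(values)

    def __or__(self, other):
        # Element-wise OR that returns another ToyExtensionArray instead of
        # a NumPy boolean array, so the result has no `.all()` method.
        return ToyExtensionArray(a or b for a, b in zip(self.values, other.values))


def equals_like_pandas(equal_values, equal_na):
    # Same shape as the final line shown in the traceback above.
    return bool((equal_values | equal_na).all())


equal_values = ToyExtensionArray([True, False])
equal_na = ToyExtensionArray([False, True])

try:
    equals_like_pandas(equal_values, equal_na)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If this guess is right, the missing 'any' in the other call would have the same cause: a reduction method being called on an array type that does not define it.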

Here's my version info, as in the docs:

awkward         2.3.2
awkward_pandas  2023.8.0
numpy           1.23.5
pandas          1.5.2
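
For anyone wanting to compare against their own setup, here is a small sketch that collects the same four versions using only the standard library. It is not necessarily how the docs produce their listing; `installed_versions` is a hypothetical helper, and the distribution names are assumed to match what pip installs (for example `awkward-pandas`).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version string."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of raising.
            versions[name] = "not installed"
    return versions


# Distribution names as assumed above.
PACKAGES = ["awkward", "awkward-pandas", "numpy", "pandas"]

for name, version in installed_versions(PACKAGES).items():
    print(f"{name:<16}{version}")
```

Running this inside the kernel that actually executes the notebook matters, since a notebook kernel can point at a different environment than the shell.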

I should mention that the behavior of s.ak.to_columns() appears to have changed as well, since my version returns only a single column named awkward-data, vs. the docs that have a column for every field in the array.

@douglasdavis
Collaborator

douglasdavis commented Aug 16, 2023

Hi!

I have these versions installed:

awkward         2.3.2
awkward_pandas  2023.8.0
numpy           1.23.5
pandas          1.5.2

And I'm unable to reproduce the error you're seeing (the example in the docs runs for me with those versions). Would you be able to spin up a fresh conda/virtual environment with these versions and try again?

For completeness here's what I see locally:

In [20]: data = """
    ...: - name: Bob\n  team: tigers\n  goals: [0, 0, 0, 1, 2, 0, 1]\n\n- name: Alice\n  team: bears\n  goals: [3, 2, 1, 0, 1]\n\n- name: Jack\n  team: bears\n  goals: [0, 0, 0, 0,
    ...:  0, 0, 0, 0, 1]\n\n- name: Jill\n  team: bears\n  goals: [3, 0, 2]\n\n- name: Ted\n  team: tigers\n  goals: [0, 0, 0, 0, 0]\n\n- name: Ellen\n  team: tigers\n  goals: [1, 
    ...: 0, 0, 0, 2, 0, 1]\n\n- name: Dan\n  team: bears\n  goals: [0, 0, 3, 1, 0, 2, 0, 0]\n\n- name: Brad\n  team: bears\n  goals: [0, 0, 4, 0, 0, 1]\n\n- name: Nancy\n  team: ti
    ...: gers\n  goals: [0, 0, 1, 1, 1, 1, 0]\n\n- name: Lance\n  team: bears\n  goals: [1, 1, 1, 1, 1]\n\n- name: Sara\n  team: tigers\n  goals: [0, 1, 0, 2, 0, 3]\n\n- name: Ryan
    ...: \n  team: tigers\n  goals: [1, 2, 3, 0, 0, 0, 0]\n
    ...: """

In [21]: import yaml
    ...: 
    ...: data = yaml.load(data, Loader=yaml.SafeLoader)
    ...: data = ak.Array(data)

In [22]: s = akpd.from_awkward(data)

In [23]: df = s.ak.to_columns(extract_all=True)

In [24]: (df
    ...:  .set_index('name')
    ...:  .groupby('team', group_keys=True)
    ...:  .apply(lambda x: x.goals.ak.mean(axis=1))
    ...: )
Out[24]: 
team    name 
bears   Alice         1.4
        Jack     0.111111
        Jill     1.666667
        Dan          0.75
        Brad     0.833333
        Lance         1.0
tigers  Bob      0.571429
        Ted           0.0
        Ellen    0.571429
        Nancy    0.571429
        Sara          1.0
        Ryan     0.857143
dtype: awkward

In [25]: (df
    ...:  .set_index('name')
    ...:  .groupby(['team', 'name'], group_keys=True)
    ...:  .apply(lambda x: x.goals.ak.mean(axis=1))
    ...: )
Out[32]: 
team    name   name 
bears   Alice  Alice         1.4
        Brad   Brad     0.833333
        Dan    Dan          0.75
        Jack   Jack     0.111111
        Jill   Jill     1.666667
        Lance  Lance         1.0
tigers  Bob    Bob      0.571429
        Ellen  Ellen    0.571429
        Nancy  Nancy    0.571429
        Ryan   Ryan     0.857143
        Sara   Sara          1.0
        Ted    Ted           0.0
dtype: awkward

I'm also unable to reproduce this:

> I should mention that the behavior of s.ak.to_columns() appears to have changed as well, since my version returns only a single column named awkward-data, vs. the docs that have a column for every field in the array.

In [18]: s.ak.to_columns()
Out[18]: 
     name    team                            awkward-data
0     Bob  tigers        {'goals': [0, 0, 0, 1, 2, 0, 1]}
1   Alice   bears              {'goals': [3, 2, 1, 0, 1]}
2    Jack   bears  {'goals': [0, 0, 0, 0, 0, 0, 0, 0, 1]}
3    Jill   bears                    {'goals': [3, 0, 2]}
4     Ted  tigers              {'goals': [0, 0, 0, 0, 0]}
5   Ellen  tigers        {'goals': [1, 0, 0, 0, 2, 0, 1]}
6     Dan   bears     {'goals': [0, 0, 3, 1, 0, 2, 0, 0]}
7    Brad   bears           {'goals': [0, 0, 4, 0, 0, 1]}
8   Nancy  tigers        {'goals': [0, 0, 1, 1, 1, 1, 0]}
9   Lance   bears              {'goals': [1, 1, 1, 1, 1]}
10   Sara  tigers           {'goals': [0, 1, 0, 2, 0, 3]}
11   Ryan  tigers        {'goals': [1, 2, 3, 0, 0, 0, 0]}

In [19]: s.ak.to_columns(extract_all=True)
Out[19]: 
     name    team                        goals
0     Bob  tigers        [0, 0, 0, 1, 2, 0, 1]
1   Alice   bears              [3, 2, 1, 0, 1]
2    Jack   bears  [0, 0, 0, 0, 0, 0, 0, 0, 1]
3    Jill   bears                    [3, 0, 2]
4     Ted  tigers              [0, 0, 0, 0, 0]
5   Ellen  tigers        [1, 0, 0, 0, 2, 0, 1]
6     Dan   bears     [0, 0, 3, 1, 0, 2, 0, 0]
7    Brad   bears           [0, 0, 4, 0, 0, 1]
8   Nancy  tigers        [0, 0, 1, 1, 1, 1, 0]
9   Lance   bears              [1, 1, 1, 1, 1]
10   Sara  tigers           [0, 1, 0, 2, 0, 3]
11   Ryan  tigers        [1, 2, 3, 0, 0, 0, 0]

@rtbs-dev
Author

So I downloaded the exact notebook for your "quickstart", and I started a new environment with defaults via conda, and used pip install awkward awkward-pandas ipykernel pyyaml (with a subsequent python -m ipykernel install --user --name awkward to access the kernel).

Here are the versions that gives me:

awkward         2.3.2
awkward_pandas  2023.8.0
numpy           1.25.2
pandas          2.0.3
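
Compared with my first report, only numpy and pandas moved. A quick sketch that diffs the two listings, with the version strings copied from this thread (`version_diff` is just a hypothetical helper name):

```python
# Versions from the original report.
first_env = {
    "awkward": "2.3.2",
    "awkward_pandas": "2023.8.0",
    "numpy": "1.23.5",
    "pandas": "1.5.2",
}

# Versions from the fresh conda + pip environment.
fresh_env = {
    "awkward": "2.3.2",
    "awkward_pandas": "2023.8.0",
    "numpy": "1.25.2",
    "pandas": "2.0.3",
}


def version_diff(old, new):
    """Return {package: (old_version, new_version)} for packages that differ."""
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }


changed = version_diff(first_env, fresh_env)
for name, (before, after) in changed.items():
    print(f"{name}: {before} -> {after}")
```

That narrows the groupby difference between my two environments to numpy and/or pandas, though it does not by itself say which one is responsible.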

And interestingly the groupby now works, but I do reproduce the to_columns error perfectly:

s.ak.to_columns() gives

	awkward-data
0	{'name': 'Bob', 'team': 'tigers', 'goals': [0,...
1	{'name': 'Alice', 'team': 'bears', 'goals': [3...
2	{'name': 'Jack', 'team': 'bears', 'goals': [0,...
3	{'name': 'Jill', 'team': 'bears', 'goals': [3,...
4	{'name': 'Ted', 'team': 'tigers', 'goals': [0,...
5	{'name': 'Ellen', 'team': 'tigers', 'goals': [...
6	{'name': 'Dan', 'team': 'bears', 'goals': [0, ...
7	{'name': 'Brad', 'team': 'bears', 'goals': [0,...
8	{'name': 'Nancy', 'team': 'tigers', 'goals': [...
9	{'name': 'Lance', 'team': 'bears', 'goals': [1...
10	{'name': 'Sara', 'team': 'tigers', 'goals': [0...
11	{'name': 'Ryan', 'team': 'tigers', 'goals': [1...

I'll have to go now but I can try to reproduce the main error with older pandas later today, hopefully.
