group_by >> mutate return error with multiple grouping variables and different sizes #63

Closed
Gedevan-Aleksizde opened this issue Sep 26, 2021 · 3 comments
Labels
bug Something isn't working

Comments

Gedevan-Aleksizde commented Sep 26, 2021

I found that mutate raises an error after group_by when there is more than one grouping variable and the groups have different sizes.

from datar import f
from datar.dplyr import mutate, group_by
from datar.tibble import tibble

d = tibble(
    g1=['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
    g2=['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'b']
) >> mutate(x=range(9))

print(d.groupby(['g1', 'g2']).size())

The group sizes are as follows:

g1  g2
a   a     3
b   b     3
c   b     1
    c     2

Then, when I tried to use mutate after group_by, the following error occurred:

print(d >> group_by(f.g1, f.g2) >> mutate(x=f.x))
ValueError: Length mismatch: Expected axis has 2 elements, new values have 1 elements
The full log

ValueError                                Traceback (most recent call last)
/mnt/g/User/Games/Blade-and-Sorcery/translation-jp/test.py in 
     10 print(d.groupby(['g1', 'g2']).size())
     11
---> 12 print(d >> group_by(f.g1, f.g2) >> mutate(x=f.x))

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pipda/function.py in _pipda_eval(self, data, context)
     94             # leave args/kwargs for the child
     95             # verb/function/operator to evaluate
---> 96             return func(*bondargs.args, **bondargs.kwargs)  # type: ignore
     97
     98         args = evaluate_expr(

~/.pyenv/versions/3.8.10/lib/python3.8/functools.py in wrapper(*args, **kw)
    873                             '1 positional argument')
    874
--> 875         return dispatch(args[0].__class__)(*args, **kw)
    876
    877     funcname = getattr(func, '__name__', 'singledispatch function')

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/datar/dplyr/mutate.py in _(_data, _keep, _before, _after, base0_, *args, **kwargs)
    196         return ret
    197
--> 198     out = _data._datar_apply(apply_func, _drop_index=False).sort_index()
    199     if out.shape[0] > 0:
    200         # keep the original row order

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/datar/core/grouped.py in _datar_apply(self, _func, _mappings, _method, _groupdata, _drop_index, *args, **kwargs)
    214
    215             # keep the order
--> 216             out = self._grouped_df.apply(_applied).sort_index(level=-1)
    217
    218         if not _groupdata:

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
   1270         with option_context("mode.chained_assignment", None):
   1271             try:
-> 1272                 result = self._python_apply_general(f, self._selected_obj)
   1273             except TypeError:
   1274                 # gh-20949

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pandas/core/groupby/groupby.py in _python_apply_general(self, f, data)
   1304             data after applying f
   1305         """
-> 1306         keys, values, mutated = self.grouper.apply(f, data, self.axis)
   1307
   1308         return self._wrap_applied_output(

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    781             try:
    782                 sdata = splitter.sorted_data
--> 783                 result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
    784
    785             except IndexError:

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pandas/core/groupby/ops.py in fast_apply(self, f, sdata, names)
   1326         # must return keys::list, values::list, mutated::bool
   1327         starts, ends = lib.generate_slices(self.slabels, self.ngroups)
-> 1328         return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
   1329
   1330     def _chop(self, sdata: DataFrame, slice_obj: slice) -> DataFrame:

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pandas/_libs/reduction.pyx in pandas._libs.reduction.apply_frame_axis0()

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/datar/core/grouped.py in _applied(subdf)
    210                 subdf.attrs["_group_index"] = group_index
    211                 subdf.attrs["_group_data"] = self._group_data
--> 212                 ret = _func(subdf, *args, **kwargs)
    213                 return None if ret is None else ret
    214

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/datar/dplyr/mutate.py in apply_func(df)
    193             **kwargs,
    194         )
--> 195         ret.index = rows
    196         return ret
    197

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   5498         try:
   5499             object.__getattribute__(self, name)
-> 5500             return object.__setattr__(self, name, value)
   5501         except AttributeError:
   5502             pass

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
    764     def _set_axis(self, axis: int, labels: Index) -> None:
    765         labels = ensure_index(labels)
--> 766         self._mgr.set_axis(axis, labels)
    767         self._clear_item_cache()
    768

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    214     def set_axis(self, axis: int, new_labels: Index) -> None:
    215         # Caller is responsible for ensuring we have an Index object.
--> 216         self._validate_set_axis(axis, new_labels)
    217         self.axes[axis] = new_labels
    218

~/.pyenv/versions/3.8.10/lib/python3.8/site-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
     55
     56         elif new_len != old_len:
---> 57             raise ValueError(
     58                 f"Length mismatch: Expected axis has {old_len} elements, new "
     59                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 2 elements, new values have 1 elements
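
The last frames of the log show the exception being raised by the assignment `ret.index = rows` in mutate.py. As a minimal sketch of that failure mode in plain pandas (the frame and the `rows` value below are hypothetical stand-ins, not the values datar actually computes), assigning an index whose length differs from the number of rows produces the same message:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the result of one group.
ret = pd.DataFrame({"x": [7, 8]})

# Hypothetical: only one row label for a two-row frame.
rows = [8]

try:
    # pandas validates that the new index has as many labels as rows.
    ret.index = rows
except ValueError as exc:
    message = str(exc)

print(message)
```

This only illustrates why pandas rejects the assignment; it does not show how datar ends up with mismatched lengths.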

Note:

  • I confirmed this problem on Python 3.9.5 and 3.8.10 with IPython 7.28.0 and the latest datar package (73c58da).
  • This error does not occur when all groups have the same size.
  • This error does not occur with a single grouping variable (e.g., d >> group_by(f.g2) >> mutate(x=f.x)).
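
For comparison, here is a rough pandas-only sketch of what the failing call is expected to produce: a group-wise identity assignment of x that keeps the original row order. It assumes `GroupBy.transform` is an acceptable stand-in for the grouped mutate, and it does not exercise datar at all.

```python
import pandas as pd

d = pd.DataFrame({
    "g1": ["a", "b", "c", "a", "b", "c", "a", "b", "c"],
    "g2": ["a", "b", "c", "a", "b", "c", "a", "b", "b"],
    "x": range(9),
})

# Identity transform per (g1, g2) group; transform aligns the result
# to the original index, so the row order is preserved.
x_by_group = d.groupby(["g1", "g2"])["x"].transform(lambda s: s)
out = d.assign(x=x_by_group)

print(out)
```

With two grouping columns and unequal group sizes this runs without error in plain pandas, which is consistent with the problem being in datar's grouped apply path rather than in the data.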

P.S. Thank you for your attention to my post.

pwwang added the bug label Sep 26, 2021
pwwang (Owner) commented Sep 26, 2021

Nice catch, should be fixed in the next version.

Your post is very helpful, and I have started optimizing some functions based on your evaluation. I noticed that the repo you used for your post does not have a license; let me know if I can use it as well for some evaluation and optimization (I have already forked it).

pwwang added a commit that referenced this issue Sep 26, 2021
Gedevan-Aleksizde (Author) commented

Oops, that was careless of me. I have now added an MIT license: https://github.com/Gedevan-Aleksizde/pandas-tidyverse-trials

pwwang mentioned this issue Oct 5, 2021
pwwang closed this as completed in 4e2b5db Oct 5, 2021
Gedevan-Aleksizde (Author) commented

Great. Thank you for the speedy work.

pwwang added a commit that referenced this issue Oct 18, 2021
pwwang added a commit that referenced this issue Oct 21, 2021
* 👷 Fix coverage uploads multiple times in a single CI run

* 💚 Fix CI

* 🐛 Fix `filter()` restructures group_data incorrectly (#69)

* 🐛 Refix #63

* 🔖 0.5.4