summarize/group_by with categoricals sometimes throws incompatible aggregated result #142

Closed
ftobin opened this issue Aug 30, 2022 · 1 comment · Fixed by #143
Labels
bug Something isn't working

ftobin commented Aug 30, 2022

The following throws a ValueError:

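# Imports were omitted in the original report; these are assumed.
from datar.all import f, mutate, group_by, summarize, as_factor, sum_
from datar.datasets import mtcars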
(mtcars
   >> mutate(cyl = as_factor(f.cyl))
   >> group_by(f.cyl, f.gear)
   >> summarize(myx = sum_(f.disp*f.hp), _groups="drop"))
ValueError: `myx` is an incompatible aggregated result.

But this will not; the only difference is that it groups by f.am instead of f.gear:

(mtcars
   >> mutate(cyl = as_factor(f.cyl))
   >> group_by(f.cyl, f.am)
   >> summarize(myx = sum_(f.disp*f.hp), _groups="drop"))

I have no idea why one would fail but not the other. Maybe it has something to do with whether the created groups cover every combination of the categorical values.

Note: this fails with mutate too, not just summarize, but with a different message:

ValueError: Incompatible value to recycle.
pwwang added the bug label Aug 30, 2022

pwwang (Owner) commented Aug 30, 2022

This is because f.disp * f.hp generates a result like:

cyl  gear
6    3        52005.0
     4        76429.6
     5        25375.0
4    3        11649.7
     4        64670.5
     5        21693.6
8    3       846042.0
     4            0.0
     5       193499.0
Name: x, dtype: float64

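However, the index (8, 4) shouldn't be there: mtcars has no cars with 8 cylinders and 4 gears. Because cyl is categorical, the grouped result is expanded to every cyl/gear combination and the unobserved one is filled with 0.0, so the result has one more entry than there are actual groups. Every cyl/am combination does occur in mtcars, which is likely why the second example works.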
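
The extra index entry can be reproduced with plain pandas, independent of datar. The sketch below is only an illustration under stated assumptions: the toy frame is made up (it is not mtcars), and it uses the `observed` argument of `DataFrame.groupby` to show the difference. The thread does not say which groupby arguments the eventual fix passes through, so this should not be read as the actual patch.

```python
import pandas as pd

# Hypothetical toy data (not mtcars): no row has cyl == 8 together with gear == 4.
df = pd.DataFrame(
    {
        "cyl": pd.Categorical([4, 4, 6, 8]),  # categorical key, like as_factor(f.cyl)
        "gear": [3, 4, 4, 3],                 # plain integer key
        "x": [1.0, 2.0, 3.0, 4.0],
    }
)

# observed=False expands the result to every combination of the grouping
# keys, including pairs that never occur in the data.
full = df.groupby(["cyl", "gear"], observed=False)["x"].sum()

# observed=True keeps only the combinations that actually occur.
seen = df.groupby(["cyl", "gear"], observed=True)["x"].sum()

print(full)
print(seen)
print("(8, 4) in full:", (8, 4) in full.index)
print("(8, 4) in seen:", (8, 4) in seen.index)
```

The two results differ in length, which is the kind of mismatch that can surface as an "incompatible aggregated result" when the aggregated values are aligned back to the groups that really exist.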

pwwang added a commit that referenced this issue Aug 30, 2022
pwwang added this to To do in 0.9.0 via automation Aug 30, 2022
pwwang mentioned this issue Sep 14, 2022
pwwang moved this from To do to In progress in 0.9.0 Sep 14, 2022
pwwang added a commit that referenced this issue Sep 14, 2022
* ⬆️ Update deps for docs

* 🐛 Fix weighted_mean not working for grouped data (#133)

* ✅ Add tests for weighted_mean on grouped data

* ⚡️ Optimize distinct on existing columns

* 🔖 0.8.6

* 🐛 Fix core.broadcast._broadcast_base losing NA groups (#137)

* 🐛 Inherit pandas groupby() arguments wherever possible

* 🐛 Fix weighted_mean on NA raising error (#139)

* 🐛 Inherit pandas groupby() arguments wherever possible even when grouping by a grouper (fixing #138 and #142)

* ♻️ [wip] Factor func_factory to make it work on multiple arguments

* ♻️ [wip] Allow varargs in func_factory

* ✅ Pass all tests

* 📝 Fix all notebooks

* ⬆️ Upgrade pipda to 0.7.1

* 💚 Fix CI

* 🐛 Remove varname from get_versions

* ✅ Allow warnings in tests

* ✅ Use assert_ in tests so ast node retains

* 🔖 0.9.0

* 🚨 Fix linting
0.9.0 automation moved this from In progress to Done Sep 14, 2022