Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(3232): fix R.groupBy when used as transducer #3234

Merged
merged 1 commit into from
Feb 25, 2022

Conversation

xgbuils
Copy link
Contributor

@xgbuils xgbuils commented Feb 8, 2022

It fixes #3232

@xgbuils
Copy link
Contributor Author

xgbuils commented Feb 8, 2022

Sorry, I didn't see jaawerth fix in #3233 . I'm gonna close it.

@xgbuils xgbuils closed this Feb 8, 2022
@jaawerth
Copy link

jaawerth commented Feb 8, 2022

While I did submit quick-and-dirty fix, I think yours might actually be better - I think the problem is that transducers are inherently stateful (especially in the step function), they just happen to hide the state under the hood (this being a huge optimization in clojure, or in JS using ImmutableJS, since you can use transient mutable versions of your immutable structures to save the overhead while still looking stateless to the outside observer:

I don't know this codebase super well, but if I had to guess, I think your fix could end up solving similar issues if there are other "also usable as a transducer" reductions that aren't being tested in transducer form.
.
The step function in particular is often some kind of state mutation, just bookended betwen the init and result functions, though in JS this is a bit made up as we go (and tbh ends up being way more complicated to make sense of all the internal indirection you end up with) since clojure gets immutability for free and outright names the mutable versions of its structures "transients".

Other JS implementations of transduce, I usually see this handled by the final transform created by e.g. into, where into([], ...) "clones" the array when it's empty, but if a populated plain object/array is passed in, they just mutate it when plain or else leave it up to a @@transducer/init function to take care of things. I don't know what'd be best for ramda, just figured it'd be worth mentioning.

@xgbuils
Copy link
Contributor Author

xgbuils commented Feb 9, 2022

Ok, I'm gonna open the PR because my solution seems more consistent with the no-transducers implementation.

After reading what you say, @jaawerth, I don't like to much the _clone approach we have in functions like into and maybe delegating in @@transducer/init function is better. But I think it's something that I have to review more deeply. Having the same solution than the default implementation can reduce the number of unexpected behaviours for now.

@xgbuils xgbuils reopened this Feb 9, 2022
@customcommander
Copy link
Member

I am having a busy week so worse case scenario I'm going to look at this over the weekend. But thank you both for coordinating your efforts. It's great to see 🙇‍♂️

@adispring
Copy link
Member

Sorry for introduce the bug, I wll see what happen here,

@adispring
Copy link
Member

I spend some time to see the source code of groupBy, reduceBy and _xreduceBy , find out that this fix is more robust.

When using reduce and reduceBy, People should not mutate the acc in place in general. But groupBy has break this rule for performance, and use one hack method to let itself behave normally, as follows:

var groupBy = _curry2(_checkForMethod('groupBy', reduceBy(function(acc, item) {
  if (acc == null) {
    acc = [];
  }
  acc.push(item);
  return acc;
}, null)));

when groupBy act as a transducer, if acc is null, this means it's a new subgroup iteration, then groupBy make a new empty array [] to subgroup, then each subgroup has its own acc.

@adispring
Copy link
Member

When I change the source code to:

var groupBy = _curry2(_checkForMethod('groupBy', reduceBy(function(acc, item) {
  acc.push(item);
  return acc;
}, [])));

groupBy behaves normally when process data directly, but will break as a transducer.

Because when process data directly, it will clone(valueAcc, false); and when used as a transducer, it does not clone valueAcc, all subgroups share the same valueAcc.

@adispring
Copy link
Member

adispring commented Feb 10, 2022

All other Ramda apis base on reduceBy (countBy, indexBy) don't have this problem, because they do not mutate acc in place.

var countBy = reduceBy(function(acc, elem) { return acc + 1; }, 0);

var indexBy = reduceBy(function(acc, elem) { return elem; }, null);

@adispring
Copy link
Member

Sorry again for introducing this bug. Both this pr and @jaawerth 's pr will fix the bug, and this pr will be better in my opinion, because it unifies the direct data procedure and transducer behavior.

Copy link
Member

@CrossEye CrossEye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much. This looks great!

@CrossEye CrossEye merged commit 8b61243 into ramda:master Feb 25, 2022
@xgbuils xgbuils deleted the fix-3232 branch February 26, 2022 13:16
@adispring adispring mentioned this pull request Apr 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

groupBy (permanently?) enters a broken state when used as transducer
5 participants