Fix handling of duplicates for `replace` on has_many-through #33954
Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @georgeclaghorn (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review.

Please see the contribution instructions for more information.
Just fixed the naming mixup between union and intersection.
Hi @febeling! This is a great find. There were a couple things I noticed that might be improved:
I also took inspiration from this stackoverflow answer, which accomplishes the same work that this PR is doing with the

    def difference(a, b)
      counts_b = occurrences(b)
      a.reject do |object|
        occurrence?(counts_b, object)
      end
    end

    def intersection(a, b)
      counts_b = occurrences(b)
      a.select do |object|
        occurrence?(counts_b, object)
      end
    end

    def occurrences(array)
      array.each_with_object(Hash.new(0)) do |object, counts|
        counts[object] += 1
      end
    end

    def occurrence?(counts, object)
      counts[object] > 0 && counts[object] -= 1
    end

I ran benchmarks of the original version against the modified code in this gist. The speed of execution is comparable (within a ~5% margin of error), but this modified code uses roughly 5x less memory.
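To illustrate how the counting helpers quoted above differ from Ruby's built-in `Array#-` and `Array#&`, here is a minimal sketch on plain integer arrays. The `CountingOps` module name is hypothetical and only exists to keep the example self-contained; the helper bodies mirror the snippet in the comment.

```ruby
# Hypothetical wrapper module (not part of the PR); the method bodies
# follow the helpers quoted in the review comment.
module CountingOps
  module_function

  def difference(a, b)
    counts_b = occurrences(b)
    a.reject { |object| occurrence?(counts_b, object) }
  end

  def intersection(a, b)
    counts_b = occurrences(b)
    a.select { |object| occurrence?(counts_b, object) }
  end

  def occurrences(array)
    array.each_with_object(Hash.new(0)) { |object, counts| counts[object] += 1 }
  end

  # Consumes one occurrence if any is left. The decremented count may be 0,
  # which is still truthy in Ruby, so the element counts as "found".
  def occurrence?(counts, object)
    counts[object] > 0 && counts[object] -= 1
  end
end

a = [1, 1, 2, 3, 3]
b = [1, 3, 3, 3, 4]

builtin_difference    = a - b  # drops every 1 and 3, however often they occur
builtin_intersection  = a & b  # de-duplicates the result
counting_difference   = CountingOps.difference(a, b)
counting_intersection = CountingOps.intersection(a, b)

p builtin_difference     # [2]
p builtin_intersection   # [1, 3]
p counting_difference    # [1, 2]
p counting_intersection  # [1, 3, 3]
```

The counting versions match each element of `a` against at most one remaining occurrence in `b`, so duplicates survive when `b` does not contain enough copies to cancel them.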
Thanks for your thorough review, @petestreet. The optimization you have researched looks absolutely valid, and I've changed the code to apply it. I also changed the method names to ones I find more descriptive of the now changed algorithm. I hope you agree with these. What do you think?
Looks good, @febeling. Potentially would want to have comments indicating that
I realize the methods So I'd love to hear an opinion from a core member on two points. Firstly, should this move over to, say, activesupport/core_ext/array or some other more appropriate place? I'm sure there are comparable cases where code is general in applicability but rarely used, and I'd like to know whether that's considered good material for AS or not. Secondly, what can be done to help this get merged? I do realize this is plenty of code for a bug fix. The reason is that duplicates weren't actually using the correct operators up to now, meaning the implementation was actually missing (for a very narrow edge case, to be fair: partial replacement while also using duplicates, something that's uncommon generally and not even possible with regular has_many).
Just added very basic test examples for illustration of
@georgeclaghorn Any thoughts on this fix?
Force-pushed from c528c54 to ffface6.
I'm curious if perhaps it makes sense to define & and - methods directly in ActiveRecord::Associations::HasManyThroughAssociation rather than introducing the intersection/difference methods.
The implementation in this PR looks good to me, but can we change the test around a little? Thanks!
    assert_equal association.send(:difference,
      [1, 1, 2, 3, 3],
      [1, 3, 3, 3, 4]
    ), [1, 2]
I'd prefer we don't use send on an internal API. Is the above integration test not enough to prevent regression? If not, can we add another integration test? That way we are free to change the internal API later.
Thanks for your review!
Yes, good point. I looked at the code more, and I think we're good with only the integration test.
I tossed the respective commit and pushed again.
There was a bug in the handling of duplicates when assigning (replacing) associated records, which made the result dependent on whether a given record was associated already before being assigned anew. E.g.

    post.people = [person, person]
    post.people.count
    # => 2

while

    post.people = [person]
    post.people = [person, person]
    post.people.count
    # => 1

This change adds a test to provoke the former incorrect behavior, and fixes it.

Cause of the bug was the handling of record collections as sets, and using `-` (difference) and `&` (intersection) operations on them indiscriminately. This temporary conversion to sets would eliminate duplicates.

The fix is to decorate record collections for these operations, and only for the `has_many :through` case. It is done by counting occurrences and using the record together with its occurrence number as the element, so that duplicates remain distinct in set operations. Given

    a, b = *Person.all

then the collection used for finding the difference or intersection of records would be internally changed from

    [a, b, a]

to

    [[a, 1], [b, 1], [a, 2]]

for these operations. So a first occurrence and a second occurrence would be distinguishable, which is all that is necessary for this task.

Fixes rails#33942.
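The decoration idea described in this commit message can be sketched in plain Ruby as follows. This is only an illustration under assumptions: `decorate` and `undecorate` are hypothetical helper names, and symbols stand in for Active Record objects; it is not the code from the commit itself.

```ruby
# Hypothetical helper: pair each element with its running occurrence number,
# so [a, b, a] becomes [[a, 1], [b, 1], [a, 2]].
def decorate(records)
  seen = Hash.new(0)
  records.map { |record| [record, seen[record] += 1] }
end

# Hypothetical helper: strip the occurrence numbers again.
def undecorate(pairs)
  pairs.map(&:first)
end

# Symbols stand in for records (assumption for illustration only).
existing = [:person]
assigned = [:person, :person]

decorated = decorate(assigned)

# The built-in set-like operators now tell the first and second
# occurrence apart, because the pairs are different elements.
to_add  = undecorate(decorate(assigned) - decorate(existing))
to_keep = undecorate(decorate(assigned) & decorate(existing))

p decorated  # [[:person, 1], [:person, 2]]
p to_add     # [:person]
p to_keep    # [:person]
```

Because each pair carries its occurrence number, `Array#-` only cancels the first `:person` against the already associated one and leaves the second to be added.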
Force-pushed from ffface6 to f915758.
@tenderlove I think the concerns are addressed, wdyt?
Looks great! Sorry for the delayed feedback.
@tenderlove thanks!
@tenderlove Actually, I’m really stoked about suddenly becoming a Rails contributor 🎉😁
…any-through-33942 Fix handling of duplicates for `replace` on has_many-through
There was a bug in the handling of duplicates when assigning (replacing) associated records, which made the result dependent on whether a given record was associated already before being assigned anew. E.g.

    post.people = [person, person]
    post.people.count
    # => 2

while

    post.people = [person]
    post.people = [person, person]
    post.people.count
    # => 1

This change adds a test to provoke the former incorrect behavior, and fixes it.

Cause of the bug was the handling of record collections as sets, and using `-` (difference) and `&` (intersection) operations on them indiscriminately. This temporary conversion to sets would eliminate duplicates.

The fix calculates an occurrence distribution hash, with counts for each element. Based on these counts, items are kept or removed in the difference and intersection operations.
Fixes #33942.
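As a rough simulation of why the set-like operators produced the wrong count in the second scenario, the sketch below uses symbols in place of records and plain arrays in place of the association. It is a simplified model under those assumptions, not the actual Active Record code path.

```ruby
# Symbols stand in for records (assumption for illustration only).
existing = [:person]            # already associated
assigned = [:person, :person]   # models post.people = [person, person]

# Set-like difference: every :person is cancelled by the single existing
# one, so nothing is added and the association stays at one record.
naive_to_add = assigned - existing

# Count-aware difference: build an occurrence distribution of the existing
# records and consume one count per match, so the second :person survives.
remaining = Hash.new(0)
existing.each { |record| remaining[record] += 1 }
counted_to_add = assigned.reject do |record|
  remaining[record] > 0 && remaining[record] -= 1
end

p naive_to_add                          # []
p counted_to_add                        # [:person]
p existing.size + naive_to_add.size     # 1
p existing.size + counted_to_add.size   # 2
```

In this model the naive path ends with one associated record and the count-aware path with two, matching the `# => 1` and `# => 2` results in the description.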