Parallel Enumerable #231
I know this is fairly low priority compared to other issues, but does anyone have any feedback on the API implemented here?
Quick question: Why are we declaring these methods as protected?
Those are delegate accessor methods inherited from SimpleDelegator, but they serve no purpose in the public API of `Parallel`.
Why are they protected instead of private?
That's a valid question. They probably should be private; even subclasses shouldn't care about that particular implementation detail.
Actually, making those methods private breaks SimpleDelegator, as it hides them from Delegator, which SimpleDelegator inherits from.
@SebastianEdwards Thank you very much for answering. I'm super excited about this pull request.
How about the protected declaration at line 58? Could we change it to private?
Private methods in Ruby can still be called by subclasses, so I think all non-public methods should be private unless we have a good reason to make them protected.
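The point that subclasses can call private methods is easy to check with a small example. `Base`, `Child` and `helper` are hypothetical names, not code from this PR: a private method defined in a parent class is inherited and callable from a subclass, as long as it is invoked without an explicit receiver, while outside callers get a `NoMethodError`.

```ruby
# Hypothetical classes illustrating Ruby's private-method rules.
class Base
  private

  # Private: not callable from outside, but inherited by subclasses.
  def helper
    "from Base#helper"
  end
end

class Child < Base
  def use_helper
    helper # implicit receiver, so calling the inherited private method is allowed
  end
end

Child.new.use_helper # => "from Base#helper"

begin
  Child.new.helper # explicit external receiver
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}" # prints "NoMethodError: helper"
end
```

One version detail matters for the protected-versus-private question: before Ruby 2.7, even `self.helper` raised `NoMethodError` for a private method; since 2.7, a literal `self` receiver is allowed.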
Please don't take this personally, but I really dislike the name …

NOTE: I just realized that those are the protected methods. So it doesn't bother me nearly as much. At first I thought those were public methods. Sorry!
As we discussed on the original PR, I'm still not sold on the …

For me it comes down to this: Which implementation is least astonishing to the user? In one case, I call a method through the …

I'd like to hear the thoughts of others on the team.
@SebastianEdwards I don't understand the use of the `run_in_threads_return_parallel` and `run_in_threads_return_original` methods. The `collect` method, for example, should return a new array. If I understand this implementation correctly, won't a call to `Parallel.collect` return another `Parallel` rather than an array?
As I think about this a little more, I think I see where you are going. Rather than returning an array, we return an array-like object (because of the Delegator). The returned object acts just like an array, but all subsequent operations on it will run in parallel. Is this correct?
If my understanding is correct, then this is an ingenious implementation. I can definitely see the value, but I can also see it leading to some confusion. Least astonishment again: now we're talking about explicit versus implicit behavior.
If a parallelized `collect` method returns a delegator, then we get implicit parallelization of subsequent operations:

```ruby
[1, 2, 3].parallel.collect { |x| x * 2 }.collect { |y| y * 3 }
```

In this case *both* `collect` calls are parallel operations. The second call is implicitly parallel. The intent of the code isn't explicit; it must be inferred that the first `parallel` begets a second parallel call.
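To make the implicit design concrete, here is a minimal sketch of how a delegator-based wrapper produces that behavior. It is not the PR's code: `ParallelArray` is a hypothetical name, it starts one thread per element, and `serial` mirrors the escape hatch described later in this thread. Because `collect` wraps its result in a new `ParallelArray`, the second `collect` in a chain also runs in parallel without being asked to.

```ruby
require "delegate"

# Hypothetical sketch of the "implicit" design: each parallel operation returns
# another ParallelArray, so every call further down the chain also runs in parallel.
class ParallelArray < SimpleDelegator
  # Runs the block for each element in its own thread. Thread#value joins each
  # thread and returns its result, so output order matches input order.
  # Delegator inherits from BasicObject, so core constants need the :: prefix.
  def collect(&block)
    threads = __getobj__.map { |item| ::Thread.new(item, &block) }
    self.class.new(threads.map(&:value))
  end
  alias map collect

  # Returns the plain, wrapped collection.
  def serial
    __getobj__
  end
end

result = ParallelArray.new([1, 2, 3]).collect { |x| x * 2 }.collect { |y| y * 3 }
result.class  # => ParallelArray
result.serial # => [6, 12, 18]
```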
If, instead, we return a regular array from the first `collect` call, it looks like this:

```ruby
[1, 2, 3].parallel.collect { |x| x * 2 }.parallel.collect { |y| y * 3 }
```

This example takes more typing, but the parallelization of the second call is explicit. There is no ambiguity as to the intent of the code.
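For contrast, a minimal sketch of the explicit design, again under stated assumptions: `ParallelView` is a hypothetical class standing in for whatever `.parallel` would return, and it uses one thread per element rather than one of the library's thread pools. Every operation returns a plain `Array`, so parallelism has to be requested again for each step.

```ruby
# Hypothetical sketch of the "explicit" design: the wrapper only knows how to
# run operations in parallel, and every operation returns a plain Array.
class ParallelView
  def initialize(items)
    @items = items
  end

  # One thread per element; Thread#value joins the thread and returns its
  # result, so the returned Array keeps the input order.
  def collect(&block)
    @items.map { |item| Thread.new(item, &block) }.map(&:value)
  end
  alias map collect
end

doubled = ParallelView.new([1, 2, 3]).collect { |x| x * 2 }
doubled.class # => Array (further calls run serially unless wrapped again)
tripled = ParallelView.new(doubled).collect { |y| y * 3 } # explicit opt-in
tripled # => [6, 12, 18]
```

The trade-off is visible in the last two lines: the second parallel step costs an extra wrapper call, but nothing downstream becomes parallel by accident.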
So, again, I'm a little on the fence. Personally, I prefer extra typing and explicit code, so my inclination is toward the second example. But I also see the value in the first example. I'm curious to hear what others on the team think.
I agree with you @jdantonio. I'm a big fan of explicit behavior, so I think the second example would be better.
Yes, the idea was to have a fluent interface whereby one could parallelize an enumerable and all subsequent calls would run in parallel (where possible). The user could then get back the delegate at any time by calling `serial` on the `Parallel` object.
The advantage, as I saw it, isn't saving the typing of the `parallel` method call. Rather, it gives an interesting story when it comes to integrating with existing code. For example, a Repository-like object could, rather than return a simple array of records, call `parallel` on that same array before handing it off. The result is that with one declarative statement, large quantities of work across the application will be parallelized (even across third-party code). This approach only works if `Parallel` quacks exactly like an `Enumerable`, of course.
I completely agree this is perhaps several pegs higher on the astonishment scale than an explicit utility-like interface, but I think it gains some fairly hefty advantages for the trade-off.
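The Repository idea relies on delegation being transparent: code that receives the wrapper never needs to know it is not a plain array. The sketch below shows only that mechanism. `ParallelWrapper`, `RecordRepository` and `active_names` are hypothetical, and the wrapper overrides nothing, so every call simply falls through `SimpleDelegator` to the wrapped `Array`.

```ruby
require "delegate"

# Hypothetical stand-in for the PR's Parallel wrapper. It overrides nothing here,
# so every call is forwarded to the wrapped Array by SimpleDelegator.
class ParallelWrapper < SimpleDelegator
end

# Hypothetical repository that hands out wrapped results instead of a bare Array.
class RecordRepository
  def initialize(records)
    @records = records
  end

  def all
    ParallelWrapper.new(@records)
  end
end

# "Third-party" code written against plain arrays; it knows nothing about
# the wrapper.
def active_names(records)
  records.select { |r| r[:active] }.map { |r| r[:name] }
end

repo = RecordRepository.new([
  { name: "a", active: true },
  { name: "b", active: false }
])
active_names(repo.all) # => ["a"]
```

Note that the duck typing is not perfect: methods `Delegator` keeps for itself, such as `is_a?`, are answered by the wrapper, so `repo.all.is_a?(Array)` returns `false`.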
I definitely appreciate where you are trying to go with this as a fluent interface. However, I'm not sure it is a good fit for a low-level library like this one. When talking about concurrency, side effects such as "will be parallelized (even across third-party code)" concern me. The implications of concurrent/parallel code are, I believe, a little too severe to indiscriminately pass such an object off to third-party code that we have no knowledge of.

I think it's also important to note that adding to an API is always easier than removing from it. We incur much less risk by starting slowly and conservatively and then expanding later. We should also consider the possibility that some of the parallel methods may warrant method-specific optimizations (such as short-circuiting predicate methods like …).

For now I feel our user base will be best served by starting with a more explicit implementation, one that contains no surprises. We can evolve this toward a more fluent implementation over time. I'd like to start with this (and see how it evolves):

Eventually this may evolve into the suggested …
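The remark about method-specific optimizations can be illustrated with a short-circuiting predicate. This sketch uses `any?` as an example of our own choosing; `parallel_any?` is a hypothetical helper that starts one thread per element (a real implementation would more likely use one of the library's thread pools) and returns as soon as any worker reports a match instead of waiting for all of them.

```ruby
# Hypothetical helper: a parallel, short-circuiting version of Enumerable#any?.
# Assumes `items` responds to #size and #map (e.g. an Array).
def parallel_any?(items, &pred)
  results = Queue.new
  threads = items.map do |item|
    Thread.new(item) do |i|
      results << (pred.call(i) ? :match : :no_match)
    rescue StandardError => e
      results << e # hand the error to the caller instead of losing it
    end
  end

  # Wait for answers one at a time; stop at the first match.
  items.size.times do
    outcome = results.pop
    raise outcome if outcome.is_a?(Exception)
    return true if outcome == :match
  end
  false
ensure
  # After an early return (or an error), stop workers that are still running.
  threads&.each(&:kill)
end

parallel_any?([1, 2, 3]) { |x| x > 2 } # => true
parallel_any?([1, 2, 3]) { |x| x > 5 } # => false
```

Killing threads is a blunt way to cancel work; a production version would probably prefer cooperative cancellation, which is exactly the kind of per-method decision that argues for starting with a small, explicit API.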
Happy to defer to wisdom. Your approach is definitely more congruent with the rest of the library. I'd like to continue working on the PR to implement the API as you've defined it. Also, since my current code was driven by a personal use case, I'll extract the WIP into a separate gem for the time being.
@SebastianEdwards We'd love to have your help implementing this API, and I greatly appreciate your willingness to be flexible. Collaboration and community are what make open source great. I'm also excited to see how your separate gem evolves. One of our goals with this gem is to be a toolkit that other gem authors can use. That's why we expose our thread pools, synchronization objects, etc. as loosely coupled abstractions. We love seeing people use them within their projects!
Closing the PR. It was a spike of one possible implementation. New development on this feature will begin after the 1.0 release.
Do Not Merge
Ongoing development of a parallel implementation of Ruby's `Enumerable` module.