Skip to content

Conversation

@jdantonio
Copy link
Member

Do Not Merge

Ongoing development of a parallel implementation of Ruby Enumerable module.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.08%) to 90.6% when pulling 9a3fbd1 on SebastianEdwards:parallel into 30c2c47 on ruby-concurrency:master.

@SebastianEdwards
Copy link
Contributor

I know this is fairly low priority compared to other issues, but does anyone have any feedback on the API implemented here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question: Why are we declaring these methods as protected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are delegate accessor methods inherited from SimpleDelegator but have no usefulness in the public API of Parallel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are they protected instead of private ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid question. They probably should be private; even subclasses shouldn't care about that particular implementation detail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, making those methods private breaks SimpleDelegate as it hides them from Delegator which SimpleDelegate inherits from.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SebastianEdwards Thank you very much for answering it. I'm super excited with this pull request.

How about the protected declaration at line 58, could we change it to private?
private methods in ruby can still be called by the subclasses, so I think all non-public methods should be private - unless we have a good reason to make them protected.

@jdantonio
Copy link
Member Author

Please don't take this personally, but I really dislike the name run_in_threads. It describes implementation details that should be hidden (and could potentially change in future versions of Ruby or this gem). Your original spike used parallel for this method call and I thought it was quite appropriate. Can we please change the naming back to parallel?

NOTE: I just realized that those are the protected methods. So it doesn't bother me nearly as much. At first I thought those were public methods. Sorry!

@jdantonio
Copy link
Member Author

As we discussed on the original PR, I'm still not sold on the Delegator implementation. It's all about the Principle of Least Astonishment. If I call a method called parallel I expect the method to run in parallel. I appreciate that without delegation we may run into situations where the user tries to call a method in parallel and, instead, gets a NoMethodError exception. But that's what's actually happening.

For me it comes down to this: Which implementation is least astonishing to the user? In one case, I call a method through the Parallel object and the method runs, but not in parallel. In the other case I call the method through the Parallel object and I get a NoMethodError exception indicating that there is no parallel implementation.

I'd like to hear the thoughts of others on the team.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SebastianEdwards I don't understand the use of the run_in_threads_return_parallel and run_in_threads_return_original methods. The collect method, for example, should return a new array. If I understand this implementation correctly, won't a call to Parallel.collect return another parallel rather than an array?

As I think about this a little more I think I see where you are going. Rather than return an array we return an array-like object (because of the Delegator). The returned object acts just like an array, but all following operations on the returned array will be in parallel. Is this correct?

If my understanding is correct then this is definitely an ingenious implementation. I can definitely see the value, but I can also see this leading to some confusion. Least astonishment again. Now we're talking about explicit versus implicit behavior.

If a parallelized collect method returns a delegator then we get implicit parallelization of subsequent operations:

``ruby
[1, 2, 3].parallel.collect{|x| x * 2}.collect{|y| y * 3}


In this case *both* collect calls are parallel operations. The second call is implicitly parallel. The intent of the code isn't explicit, it must be inferred that the first parallel begets a second parallel call.

If, instead, we return a regular array from the first `collect` call it looks like this:

``ruby
[1, 2, 3].parallel.collect{|x| x * 2}.parallel.collect{|y| y * 3}

This example takes more typing but the parallelization of the second call is explicit. There is no ambiguity as to the intent of the code.

So, again, I'm a little on the fence. Personally, I prefer extra typing and explicit code. So my inclination would be to the second example. But I also see the value in the first example. I'm curious to hear what others on the team think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you @jdantonio. I'm a big fan of explicit behavior so I think the second example would be a better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the idea was to have a fluent interface whereby one could parallize an enumerable and it would run all subsequent calls in parallel (where possible). The user could then get back the delegate at any time by calling serial on the parallel.

The advantage, as I saw it, isn't the saving of typing out the parallel method call. Rather, it gives an interesting story when it comes to integrating with existing code. For example: a Repository-like object could, rather than return a simple array of records, call parallel on that same array before handing it off. The result is that with the one declarative statement, large quantities of work across the application will be parallized (even across third-party code). This approach only works if Parallel quacks exactly like an Enumerable of course.

I completely agree this is perhaps several pegs higher on the astonishment scale than a explicit utility-like interface but I think it gains some fairly hefty advantages for the trade-off.

@jdantonio
Copy link
Member Author

I definitely appreciate where you are trying to go with this as a fluid interface. However, I'm not sure that a low-level library like this is a good fit. When talking about concurrency side effects such as "will be parallelized (even across third-party code)" cause me concern. The implications of concurrent/parallel code are, I believe, a little too severe to indiscriminately pass such an object off to third-party code that we have no knowledge of.

I think it's also important to note that adding to an API is always easier than removing from it. We incur much less risk by starting slowly and conservatively then expanding later. We should also consider the possibility that some of the parallel methods may warrant method-specific optimizations (such as short-circuiting predicate methods like any?).

For now I feel our user base will be best served by starting with a more explicit implementation, one that contains no surprises. We can evolve this toward a more fluent implementation over time. I'd like to start with this (and see where how it evolves):

  • Create a Concurrent::Parallel class with a constructor that takes an Enumerable as the first parameter and an options hash as the second
    • The Enumerable is the object over which the parallel operation will be performed
    • The options hash will, at a minimum, support the executor common option (to explicitly provide a thread pool)
  • Provide an optional require statement that monkey-patches Array and Hash with a parallel proxy method
  • The Proxy class does not extend SimpleDelegator or implement method_missing (yet)
    • We can change this in the future without implementing breaking changes
  • We don't implement a serial method (yet) since "serial" is implied by the absence of parallel
  • Every Enumerable, Array, and Hash method is explicitly implemented
    • Each method will be optimized as appropriate when opportunities exist
    • We'll use emergent design and "extract method" to DRY-up the Parallel class and tease out redundant behavior

Eventually this may evolve into the suggested run_in_threads/run_in_threads_return_original/run_in_threads_return_parallel implementation, but we'll take our time.

@SebastianEdwards
Copy link
Contributor

Happy to defer to wisdom. Your approach is definitely more congruent with the rest of the library. I'd like to continue to work on the PR to implement the API as you've defined.

Also, since my current code was driven by a personal use-case I'll extract the WIP into a separate gem for the time being.

@jdantonio
Copy link
Member Author

@SebastianEdwards We'd love to have your help implementing this API, and I greatly appreciate your willingness to be flexible. Collaboration and community are what make open source great. I'm also excited to see how your separate gem evolves. One of our goals with this gem is to be a toolkit that other gem authors can use. That's why we expose our thread pools, synchronization, objects, etc. in loosely coupled abstractions. We love seeing people use them within their projects!

@pitr-ch pitr-ch added this to the Post 1.0 milestone Apr 23, 2015
@jdantonio
Copy link
Member Author

Closing the PR. It was a spike of one possible implementation. New development on this feature will begin after the 1.0 release.

@jdantonio jdantonio closed this Apr 24, 2015
@pitr-ch pitr-ch modified the milestones: 0.9.0 Release, Post 1.0 Jun 14, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants