Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SI-6998] Implement variants of 'par' method that allow user to set number of threads or thread pool in-line #5212

Closed
wants to merge 1 commit into from

Conversation

erikerlandson
Copy link
Contributor

No description provided.

@scala-jenkins scala-jenkins added this to the 2.12.0-RC1 milestone Jun 3, 2016
@soc
Copy link
Member

soc commented Jun 3, 2016

I'd prefer (if we can't manage to overload par) parWith to parUsing, as "using" seems to be used in resource handling contexts, and might therefore be confusing.

@erikerlandson
Copy link
Contributor Author

parWith is fine by me (I even had a previous draft using 'With'). I tried overloading par directly, but it caused a subtle problem where:

collection.par(5).rest.of.expression

Got interpreted by the compiler's operator precedence as:

(collection.par)(5).rest...

And so it failed to compile. However, I assume the par overloading would work for a task-support object.

@erikerlandson erikerlandson changed the title [SI-9800] Implement variants of 'par' method that allow user to set number of threads or thread pool in-line [SI-6998] Implement variants of 'par' method that allow user to set number of threads or thread pool in-line Jun 4, 2016
* @param pool the number of threads to use for supporting parallelism
* @return a parallel implementation of this collection using `n` threads
*/
final def parWith(n: Int): ParRepr = parWith(new ForkJoinPool(n))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of this overload?
I can live with overloaded parWith for ForkJoinPool and TaskSupport as the types are quite specific, but parWith(Int) is not self-explanatory.
As you already provide parWith(ForkJoinPool) I think we don't really need parWith(Int), right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I could live without it, although I like the Int overloading because it allows me to simply say "just give me n-fold parallelism", and not have to muck about with all the ForkJoinPool or ForkJoinTaskSupport.

It may be that the primary use case is experimenting with throughput as a function of number of threads, but it's a thing I do with some frequency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I admit an advantage is without the Int overloading we could probably just overload par

@soc
Copy link
Member

soc commented Jun 4, 2016

Regarding the general approach, the last time I reviewed this code my main concern was that setTaskSupport mutated the underlying field of the collection instance instead of returning a new instance with changed task support. As your changes also leverage setTaskSupport I fear that they suffer from the same issue.

I think it would be better to return a new instance instead of mutating internal fields and deprecate setTaskSupport for good.

@erikerlandson
Copy link
Contributor Author

I also agree with the mutable setTaskSupport issue. The new overloadings do effectively return a new copy, since the mutation is internal to the method, however if I remove any reference to setTaskSupport that makes it one step easier to deprecate

@erikerlandson
Copy link
Contributor Author

I looked into the current roles played by setTaskSupport and tasksupport_= and resultWithTaskSupport

My current view on fixing the design as it stands is that there are broadly 3 related but somewhat independent goals:

  1. Provide an immutable style interface for specifying task support (I believe the current PR does this)
  2. Deprecate the public-facing use of setTaskSupport and tasksupport_=
  3. Excise the internal use of setTaskSupport and tasksupport_=

setTaskSupport and resultWithTaskSupport are used pretty widely in the scala collection code, and presumably now out in the broader community as well. If the goal is to ultimately get rid of these mutable methods (and their associated var members) altogether, I propose something like:

  1. adopt the overloadings of par to provide people with a desirable interface to start using
  2. start the public-facing deprecation process for setTaskSupport and tasksupport_= (and also the corresponding methods on Combiner?)
  3. either afterward, or in parallel, excise their use in the internal scala collections code.

Assuming this approach, it might be worth an umbrella ticket, with jiras for 2 and 3. (3) itself might take the form of multiple stages.

@erikerlandson
Copy link
Contributor Author

Hmm. since _tasksupport is private in ParIterableLike, and we have Repr in a recursive type signature, this might be less unwieldy than I thought. could possibly be easy as:

def withTaskSupport(taskSupport: TaskSupport): Repr = {
  // construct copy of Repr, set task support ?
}

@erikerlandson
Copy link
Contributor Author

erikerlandson commented Jun 5, 2016

Here is a question that will help clarify the problem: Is it a design contract that a combiner is always used to construct a new ParSeq? Specifically, if I do something like this pseudocode:

val cb = this.newCombiner
this.foreach { x => cb += x }
cb.result()

Will the result always adhere to the correct sub-class of collection, with any sub-class specific details?
Is the above guaranteed to be the same as this.par ?

@erikerlandson
Copy link
Contributor Author

@soc, am I barking up the right tree with these questions, or just off in the weeds?

@soc
Copy link
Member

soc commented Jun 15, 2016

@erikerlandson Sorry for the delay, things are pretty busy here. I think many people are traveling to/from ScalaDays this week. Maybe @som-snytt can have a look at your combiner question?

@soc
Copy link
Member

soc commented Jun 15, 2016

I just had a look at par and taskSupport. I think your code still has issues here, as par can return the instance itself if it is already some Par..., and the usage of setTaskSupport will mutate the existing instance.

This is the issue:

val arr = Array(1,2,3).par
val task = arr.tasksupport
val newArr = arr.par(yourTaskSupport)
arr.tasksupport != task

@erikerlandson
Copy link
Contributor Author

@soc, yes that was a concern for me as well. It is pretty easy to return new collections using the idiom

val cb = this.newCombiner
this.foreach { x => cb += x }
cb.result()

I think that works fine with any immutable collections, but currently .par is also supporting some specializations for mutable collections, where I get the impression that just changing to use the above will break the expected behavior in the mutable cases.

Respecting both cases may require a lot of local overrides for .par, which I was hoping to avoid, because (a) it is ugly (b) it touches a lot more files, and (c) it may very well require 3rd-party collections to change their own code.

I'm not sure how to unwind that knot. If the convention was changed so that the combiner logic above was the universal behavior, it would be nice but that is a higher-impact strategy, at least in the mutable container cases

@adriaanm adriaanm modified the milestone: 2.12.0-RC1 Jul 18, 2016
@adriaanm adriaanm modified the milestone: 2.12.1 Jul 26, 2016
@adriaanm adriaanm modified the milestones: 2.13.0-M1, 2.12.1 Oct 28, 2016
@adriaanm
Copy link
Contributor

I regret that we didn't get this into 2.12, but we can no longer change the API here. With 2.13 being the library release (2.12 was a compiler focussed cycle), I'm moving this PR there. We'll create a branch for 2.13.x dev in the next few weeks.

@SethTisue
Copy link
Member

SethTisue commented Dec 19, 2016

The 2.13.x branch now exists, so this could be retargeted.

@SethTisue
Copy link
Member

this code has been relocated to https://github.com/scala/scala-parallel-collections ; this could be taken up there

@SethTisue SethTisue closed this Feb 10, 2017
@som-snytt
Copy link
Contributor

I'm sorry this didn't reach the point of par1, par2, .... par22.

I only just realized that when you turn it all the way up to eleven, and then double it, you get 22. OMG I have to step outside for some air.

@SethTisue
Copy link
Member

🤘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants