
Make ndarray-parallel a direct optional part of ndarray? #551

Closed
bluss opened this issue Nov 20, 2018 · 9 comments

@bluss (Member) commented Nov 20, 2018

Rayon is now 1.0, and the plumbing parts we use to implement parallel iterators are as stable as the rest of the crate. That's spelled out in this PR: rayon-rs/rayon#469

Rayon plumbing docs: https://docs.rs/rayon/*/rayon/iter/plumbing/index.html

This makes it viable to make rayon a direct dependency of ndarray, presumably an optional one, and it lets us simplify its usage.
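
For illustration, a minimal sketch of how this could be wired up, assuming the feature is simply named `rayon` and a hypothetical `parallel` module (the real names and layout would be up to the PR):

```rust
// Cargo.toml (sketch):
//
//     [dependencies]
//     rayon = { version = "1.0", optional = true }
//
// An optional dependency implicitly defines a cargo feature of the same
// name, so downstream users would opt in with
//
//     ndarray = { version = "...", features = ["rayon"] }

// lib.rs (sketch): compile the parallel support only when the feature is
// enabled, so users who skip it don't pay the extra build time.
#[cfg(feature = "rayon")]
pub mod parallel {
    // Re-export rayon's parallel iterator traits here, alongside whatever
    // ndarray-parallel provides today.
    pub use rayon::prelude::*;
}
```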

@LukeMathWalker (Member)

It would be great to have all methods that could benefit from parallelization use it out of the box, with rayon as a direct dependency.
But given that you mention it might be better to put it behind a feature flag (thus retaining separate par_ versions of those methods, I assume), I wanted to ask: what are the drawbacks? Are there significant reasons that might lead someone not to use ndarray because of a non-optional dependency on rayon?
Genuine curiosity :)

@bluss (Member, Author) commented Nov 24, 2018

Sure, I wouldn't use it if I couldn't deselect features I wasn't using.

clean build of ndarray: 22 sec
clean build of ndarray + rayon: 41 sec

(on a hot, slow laptop)

@bluss (Member, Author) commented Nov 24, 2018

Are you thinking of parallelizing a method like Array::map by default? That's a separate question from making rayon optional, I think. But parallelizing the default map means adding thread-safety bounds (Send and Sync), and that impacts its generality.
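
To make the cost concrete, here is a rough sketch (not ndarray's API; `par_map` is a made-up free function) of the bounds an always-parallel map would have to require, assuming rayon and a standard-layout array:

```rust
use ndarray::{Array, ArrayBase, Data, Dimension};
use rayon::prelude::*;

// Hypothetical always-parallel map. The extra bounds are the generality
// cost under discussion: `A: Sync`, `B: Send` and a `Sync + Send` closure
// rule out element types like `Rc<T>` or `Cell<T>`, which the sequential
// `map` handles fine.
fn par_map<A, B, S, D, F>(arr: &ArrayBase<S, D>, f: F) -> Array<B, D>
where
    S: Data<Elem = A>,
    D: Dimension,
    A: Sync,                      // elements are read from several threads
    B: Send,                      // results are moved back across threads
    F: Fn(&A) -> B + Sync + Send, // the closure is shared across threads
{
    // For simplicity this sketch only handles standard-layout arrays; a
    // real implementation would also have to deal with arbitrary strides.
    let slice = arr.as_slice().expect("sketch assumes standard layout");
    let mapped: Vec<B> = slice.par_iter().map(&f).collect();
    Array::from_shape_vec(arr.raw_dim(), mapped)
        .expect("output has the same length as the input")
}
```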

@LukeMathWalker (Member) commented Nov 24, 2018

Array::map and the iterators, as well as all the functions and methods that use them under the hood (all the axis-related methods computing reductions, for example).
But yes, introducing Send and Sync bounds everywhere would be damaging.

@bluss (Member, Author) commented Nov 24, 2018

Once upon a time in nightly you could specialize on the presence or absence of Send/Sync and parallelize transparently where allowed. I'm not even sure that's a good idea, but it's pretty cool. At the moment, though, we don't know when (or if) specialization will arrive.
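
For the record, a hypothetical nightly-only sketch of that idea; the trait is made up and it relies on the unstable `specialization` feature, which is exactly the catch:

```rust
// Nightly only; `specialization` is unstable and incomplete.
#![feature(specialization)]

// A blanket "strategy" trait: the default impl is the sequential path and a
// specialized impl kicks in only for thread-safe element types.
trait MapStrategy {
    fn strategy() -> &'static str;
}

impl<T> MapStrategy for T {
    default fn strategy() -> &'static str {
        "sequential" // fallback for types that are not Send + Sync
    }
}

impl<T: Send + Sync> MapStrategy for T {
    fn strategy() -> &'static str {
        "parallel" // a real version would branch into a rayon-based map here
    }
}

fn main() {
    // Rc is neither Send nor Sync, so it keeps the sequential path.
    assert_eq!(<std::rc::Rc<i32> as MapStrategy>::strategy(), "sequential");
    assert_eq!(<f64 as MapStrategy>::strategy(), "parallel");
}
```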

@LukeMathWalker (Member)

We could approach the problem from a different perspective: map and related functionality (mapv, *_inplace, etc.) could be moved out of the default struct implementation and provided in a trait (Mappable?); we could then provide another trait (ParMappable?) behind a feature gate that uses exactly the same method names but requires Send + Sync, so it can use rayon and make them parallel by default.

If the rayon feature gate is off, Mappable is in the prelude; otherwise we put ParMappable in the prelude. That way a user can still use both kinds of methods if necessary, by explicitly importing the Mappable trait with the rayon feature flag on.
What do you think?
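
Roughly sketched in code, under the assumption of a `rayon` feature flag (the trait and method names below are the placeholders from this comment, not anything that exists in ndarray):

```rust
use ndarray::{Array, Dimension};

// The sequential trait: always available, mirrors today's map.
pub trait Mappable<A, D: Dimension> {
    fn map_elems<B, F>(&self, f: F) -> Array<B, D>
    where
        F: Fn(&A) -> B;
}

// The parallel twin: the same method name, but with the thread-safety
// bounds rayon needs. Only compiled when the `rayon` feature is enabled.
#[cfg(feature = "rayon")]
pub trait ParMappable<A, D: Dimension> {
    fn map_elems<B, F>(&self, f: F) -> Array<B, D>
    where
        A: Sync,
        B: Send,
        F: Fn(&A) -> B + Sync + Send;
}

// The prelude exports one trait or the other depending on the feature flag;
// `Mappable` stays importable explicitly either way.
pub mod prelude {
    #[cfg(not(feature = "rayon"))]
    pub use super::Mappable;
    #[cfg(feature = "rayon")]
    pub use super::ParMappable;
}
```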

@bluss (Member, Author) commented Nov 25, 2018

That doesn't match the "feature flags are purely additive" rule, which we need as soon as more than one crate in the currently compiling project uses ndarray: cargo unifies features across the whole build, so one crate can pull in ParMappable without the other crate being ready for it.

This is a rule for cargo feature flags in general.

@bluss (Member, Author) commented Nov 25, 2018

Let's work on making the parallel methods visible. I think that in general we don't want this to happen automatically; users will know which operations to parallelize. That said, there could certainly be ways we can make it easier to opt in to this -- even a newtype, a wrapper type?

ndarray-parallel is also missing some important operations, most notably .map()! It only has in-place mappings. With the code in the main crate, and more visible, this can get some attention.

@bluss (Member, Author) commented Nov 25, 2018

Hrm, the LinalgScalar trait doesn't have the foresight to require Send + Sync, but it does require 'static, which allows ad hoc type dispatch. So we can transparently parallelize linalg methods for specific scalar types (like the primitive numeric types).
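
As an illustration of that kind of dispatch, a sketch of a generic reduction that quietly takes a rayon path when the element type turns out to be f64; the function is hypothetical and the casts are the usual same-type trick:

```rust
use std::any::TypeId;

use rayon::prelude::*;

// A generic sum with no Send/Sync in its signature. Thanks to `'static` we
// can ask at runtime whether A is really f64, and only then use rayon,
// because f64 is known to be Send + Sync.
fn sum_generic<A>(xs: &[A]) -> A
where
    A: 'static + Copy + std::iter::Sum<A>,
{
    if TypeId::of::<A>() == TypeId::of::<f64>() {
        // A is f64 in this branch, so the slice can be reinterpreted and
        // the parallel kernel used transparently.
        let xs_f64: &[f64] = unsafe { &*(xs as *const [A] as *const [f64]) };
        let s: f64 = xs_f64.par_iter().sum();
        return unsafe { std::mem::transmute_copy::<f64, A>(&s) };
    }
    // Fully generic sequential fallback for every other scalar type.
    xs.iter().copied().sum()
}
```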

bluss closed this as completed in #563 on Dec 3, 2018