Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upReduce use of the pub-in-private hack #327
Conversation
cuviper
added some commits
Apr 28, 2017
This comment has been minimized.
This comment has been minimized.
Sorry if that makes this PR a little scatter-brained... the more I look through the standard library, the more I find that we can parallelize! |
nikomatsakis
reviewed
Apr 30, 2017
|
This looks great. I am mostly wondering whether |
| pub trait ParallelString { | ||
| private_decl!{} | ||
|
|
||
| pub trait ParallelString: Borrow<str> { |
This comment has been minimized.
This comment has been minimized.
nikomatsakis
Apr 30, 2017
Member
Hmm, so I wonder -- is Borrow a better choice than Deref? The overlap between those two traits is fairly extreme, iirc the major difference has to do with the constraints that the hash of the "borrow" value and the hash of the original have to be equal. But it's also possible (in theory, at least) for a single type T to support multiple Borrow<U> implementations (i.e., you may have some FooString that implements Borrow<str> but also Borrow<Vec<_>> and so forth), so using Borrow may add flexibility.
I wonder if anyone on @rust-lang/libs might have thoughts here -- the question is, if we want to implement parallel operations on anything that can "yield up" a str, which of the various traits ought we to use? @cuviper selected Borrow, but Deref seems potentially suitable; I think @cuviper rejected AsRef because of the ambiguity it can induce:
I also considered AsRef and AsMut, which are very similar to borrows, but I decided against it for the single fact that strings implement both AsRef and AsRef<[u8]>. This would make it ambiguous if ParallelString and ParallelSlice shared any method names, and to drive that point home I went ahead and added par_split and par_split_mut on slices.
That same point might be used as an argument in favor of Deref<Target = str>, no?
This comment has been minimized.
This comment has been minimized.
cuviper
Apr 30, 2017
Author
Member
The linked internals thread was more about calling pub-in-private a negative pattern, to the point that it might deserve a warning or error - especially for cases like this that couldn't just use pub(restricted).
I didn't consider Deref, but I see how that's an even tighter choice since there can be only one Target for a given type. I'm in favor.
| fn split(mut self) -> (Self, Option<Self>) { | ||
| let SplitProducer { slice, separator, tail } = self; | ||
|
|
||
| // Look forward for the separator, and failing that look backward. |
This comment has been minimized.
This comment has been minimized.
nikomatsakis
Apr 30, 2017
Member
This is basically analogous to the split we do on strings right now? Seems very familiar. =)
This comment has been minimized.
This comment has been minimized.
cuviper
Apr 30, 2017
Author
Member
Yes indeed, literally copied and tweaked to fit the new use cases.
Which I agree is not ideal, but the code is just small enough that I fear an abstraction would weigh as much or more in boilerplate. But maybe the abstracted parts will still be simple enough compared to the tricky parts that could be shared. I'll try it.
This comment has been minimized.
This comment has been minimized.
|
When switching ParallelString to /// Parallel iterator over lines in a string
pub struct Lines<'ch>(&'ch str);
impl<'ch> ParallelIterator for Lines<'ch> {
type Item = &'ch str;
fn drive_unindexed<C>(self, consumer: C) -> C::Result
where C: UnindexedConsumer<Self::Item>
{
self.0
.par_split_terminator('\n')
.map(|line| if line.ends_with('\r') {
&line[..line.len() - 1]
} else {
line
})
.drive_unindexed(consumer)
}
}I get:
Why does And there's a similar error in |
This comment has been minimized.
This comment has been minimized.
|
They're happy if I use these type definitions instead: pub struct Lines<'ch>(SplitTerminator<'ch, char>);
pub struct SplitWhitespace<'ch>(Split<'ch, fn(char) -> bool>);i.e. creating the inner splits up front. But I worry there's a real gotcha lingering here... |
This comment has been minimized.
This comment has been minimized.
|
Is there something about |
This comment has been minimized.
This comment has been minimized.
|
If I try to give that updated type a impl<'ch> Lines<'ch> {
fn new(s: &str) -> Self {
Lines(s.par_split_terminator('\n'))
}
}
|
This comment has been minimized.
This comment has been minimized.
|
Backing up, one reason
|
This comment has been minimized.
This comment has been minimized.
|
(on abstracting
I started slightly down that path, but I'm really not feeling it. It's a mess of slightly different ways to find, rfind, split_at, index with different ranges, turn into split iterators, and probably more... |
This comment has been minimized.
This comment has been minimized.
I'll have to look. I'm not sure why |
This comment has been minimized.
This comment has been minimized.
With I also tried |
This comment has been minimized.
This comment has been minimized.
|
Hmm. I think because the methods use use std::ops::Deref;
trait Foo: Deref<Target = str> {
fn string(&self) -> &str { self }
}
impl<T: ?Sized + Deref<Target = str>> Foo for T {}
fn main() {
let x = "".string();
}
I think that And in the actual use case, we're getting tied to the lifetime of |
This comment has been minimized.
This comment has been minimized.
Oh, not for all |
This comment has been minimized.
This comment has been minimized.
|
I figured out a split abstraction that I don't hate. :) |
This comment has been minimized.
This comment has been minimized.
|
@nikomatsakis How do you feel about making |
This comment has been minimized.
This comment has been minimized.
Well, there is a downside, in that users then cannot "layer" on top of the API. i.e., they may be able to call |
This comment has been minimized.
This comment has been minimized.
|
FWIW, that's also the case for |
This comment has been minimized.
This comment has been minimized.
|
@cuviper well i'm not opposed to doing it for now. |
This comment has been minimized.
This comment has been minimized.
|
OK, it's hidden now. |
This comment has been minimized.
This comment has been minimized.
|
OK, so, I was thinking about it, and I think I see why To that end, I considered an alternative design that feels "more right" to me -- adding a JFTR, partial diff for the alternate design: diff --git a/src/slice.rs b/src/slice.rs
index 471343d..4695b90 100644
--- a/src/slice.rs
+++ b/src/slice.rs
@@ -9,14 +9,14 @@ use std::borrow::{Borrow, BorrowMut};
use std::cmp;
/// Parallel extensions for slices.
-pub trait ParallelSlice<T: Sync>: Borrow<[T]> {
+pub trait ParallelSlice<T: Sync>: AsRef<[T]> {
/// Returns a parallel iterator over subslices separated by elements that
/// match the separator.
fn par_split<P>(&self, separator: P) -> Split<T, P>
where P: Fn(&T) -> bool + Sync
{
Split {
- slice: self.borrow(),
+ slice: self.as_ref(),
separator: separator,
}
}
@@ -26,7 +26,7 @@ pub trait ParallelSlice<T: Sync>: Borrow<[T]> {
fn par_windows(&self, window_size: usize) -> Windows<T> {
Windows {
window_size: window_size,
- slice: self.borrow(),
+ slice: self.as_ref(),
}
}
@@ -35,12 +35,12 @@ pub trait ParallelSlice<T: Sync>: Borrow<[T]> {
fn par_chunks(&self, chunk_size: usize) -> Chunks<T> {
Chunks {
chunk_size: chunk_size,
- slice: self.borrow(),
+ slice: self.as_ref(),
}
}
}
-impl<T: Sync, V: ?Sized + Borrow<[T]>> ParallelSlice<T> for V {}
+impl<T: Sync, V: ?Sized + AsRef<[T]>> ParallelSlice<T> for V {}
/// Parallel extensions for mutable slices.
diff --git a/src/str.rs b/src/str.rs
index 3c63541..29024f1 100644
--- a/src/str.rs
+++ b/src/str.rs
@@ -43,10 +43,12 @@ fn find_char_midpoint(chars: &str) -> usize {
/// Parallel extensions for strings.
-pub trait ParallelString: Borrow<str> {
+pub trait ParallelString {
+ fn as_str(&self) -> &str;
+
/// Returns a parallel iterator over the characters of a string.
fn par_chars(&self) -> Chars {
- Chars { chars: self.borrow() }
+ Chars { chars: self.as_str() }
}
/// Returns a parallel iterator over substrings separated by a
@@ -55,7 +57,7 @@ pub trait ParallelString: Borrow<str> {
/// Note: the `Pattern` trait is private, for use only by Rayon itself.
/// It is implemented for `char` and any `F: Fn(char) -> bool + Sync`.
fn par_split<P: Pattern>(&self, separator: P) -> Split<P> {
- Split::new(self.borrow(), separator)
+ Split::new(self.as_str(), separator)
}
/// Returns a parallel iterator over substrings terminated by a
@@ -66,7 +68,7 @@ pub trait ParallelString: Borrow<str> {
/// Note: the `Pattern` trait is private, for use only by Rayon itself.
/// It is implemented for `char` and any `F: Fn(char) -> bool + Sync`.
fn par_split_terminator<P: Pattern>(&self, terminator: P) -> SplitTerminator<P> {
- SplitTerminator::new(self.borrow(), terminator)
+ SplitTerminator::new(self.as_str(), terminator)
}
/// Returns a parallel iterator over the lines of a string, ending with an
@@ -74,7 +76,7 @@ pub trait ParallelString: Borrow<str> {
/// The final line ending is optional, and line endings are not included in
/// the output strings.
fn par_lines(&self) -> Lines {
- Lines(self.borrow())
+ Lines(self.as_str())
}
/// Returns a parallel iterator over the sub-slices of a string that are
@@ -83,11 +85,21 @@ pub trait ParallelString: Borrow<str> {
/// As with `str::split_whitespace`, 'whitespace' is defined according to
/// the terms of the Unicode Derived Core Property `White_Space`.
fn par_split_whitespace(&self) -> SplitWhitespace {
- SplitWhitespace(self.borrow())
+ SplitWhitespace(self.as_str())
+ }
+}
+
+impl ParallelString for str {
+ fn as_str(&self) -> &str {
+ self
}
}
-impl<T: ?Sized + Borrow<str>> ParallelString for T {}
+impl<T: ?Sized + ::std::ops::Deref<Target=str>> ParallelString for T {
+ fn as_str(&self) -> &str {
+ self
+ }
+}
// ///////////////////////////////////////////////////////////////////////// |
This comment has been minimized.
This comment has been minimized.
|
So I'm a bit torn:
An alternative might be to just add (undefaulted) |
This comment has been minimized.
This comment has been minimized.
Just checked. Basically yes. We were relying on method call's auto-deref instead of manually implementing for String and Vec though. That said, I sort of like the idea of factoring things slightly differently, kind of a fourth proposal:
The idea is that you can make things usable by |
This comment has been minimized.
This comment has been minimized.
pub trait ParallelString {
fn as_str(&self) -> &str;
...
}
impl ParallelString for str {
fn as_str(&self) -> &str {
self
}
}I like the simplicity of this a lot, and I think I'd do the same for slices -- maybe just bikeshedded to
It is where we started, but I think that's a good thing. I'd even leave these implemented just for |
cuviper
force-pushed the
cuviper:less-private
branch
from
5936d62
to
4e51aa6
May 6, 2017
This comment has been minimized.
This comment has been minimized.
|
OK, pushed. We can do it with "stable" |
This comment has been minimized.
This comment has been minimized.
|
@cuviper |
This comment has been minimized.
This comment has been minimized.
So the only caveat, actually: I was wondering what we would do if we wanted to extend this trait with new utilities. I guess that as long as they have defaults, it's not really a breaking change (modulo potential method conflicts). (Same, presumably, as something like |
This comment has been minimized.
This comment has been minimized.
Right, having the required It's a breaking change only in the sense that it's a breaking change to add a public method anywhere. As long as we're adding to the public namespace, especially in a prelude trait, there's a chance we could collide. The only way I see around that is if we flipped these to inherent methods on a newtype of our own, so you'd write something more like (obligatory "API design is hard") |
This comment has been minimized.
This comment has been minimized.
Huh, interesting thought. I'm not too worried about adding methods, but I sort of like the "par mode" style, where you can use the API names you were accustomed to. But I'm not sure about the names. I'm sort of tempted by just I'm envisioning something like pub struct ParallelSlice<T, U>
where T: Deref<Target=[U]>
{
slice: T
}
impl ParallelSlice<T>
where T: Deref<Target=[U]>
{
fn new(self: T) -> Self { ... }
pub fn split(...) { ... }
pub fn split_mut(...)
where T: DerefMut<Target=U>
{ ... }
}If we went this way, though, I have to wonder if we could do it more universally. i.e., instead of If this were Rust project, I think I'd say "this needs an RFC"... |
This comment has been minimized.
This comment has been minimized.
No reason we can't have RFCs...and maybe we should. |
This comment has been minimized.
This comment has been minimized.
|
I think probably we should land this PR, which is mostly a refactoring, and then consider this |
This comment has been minimized.
This comment has been minimized.
|
I tried out But yeah, I'm happy to see this PR land if you're ready. :) |
cuviper commentedApr 29, 2017
A recent thread on internals got me thinking whether we could avoid the
private_impl!{}hack after all. With a few extra constraints, this does work!With
ParallelString: Borrow<str>, we can provide default implementations of all its methods, and should be free to add new methods the same way. Then a blanketimplonly has to meet the constraints, so there's also less repetition defining these methods.Similarly
ParallelSlice<T: Sync>: Borrow<[T]>works for slices, and a newParallelSliceMut<T: Send>: BorrowMut<[T]>for mutable slices. The latter is also added to the prelude.I also considered
AsRefandAsMut, which are very similar to borrows, but I decided against it for the single fact that strings implement bothAsRef<str>andAsRef<[u8]>. This would make it ambiguous ifParallelStringandParallelSliceshared any method names, and to drive that point home I went ahead and addedpar_splitandpar_split_muton slices.The only remaining
private_impl!{}is inrayon::str::Pattern. I don't think we can use a similar trick on that one, but the more I think about it, the more I think we should just hidePatternaltogether as pub-in-private itself. Its API is not great, and not something we really want folks to call directly. All users really need to know are which types implement it, and we can document that on the public splitter functions that use it.