Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`repeat` method for (byte) slices? #48784

Open
alexreg opened this issue Mar 6, 2018 · 14 comments

Comments

Projects
None yet
9 participants
@alexreg
Copy link
Contributor

commented Mar 6, 2018

There exists a str::repeat method, but no equivalent for (byte) slices. This would seem like a useful feature. Should we consider adding it to the stdlib?

@Centril

This comment has been minimized.

Copy link
Member

commented Mar 7, 2018

Possibly generalized (or not, but let's entertain the notion...)

pub trait Repeat {
    type Repeated;
    fn repeat(&self, times: usize) -> Self::Repeated;
}
@sfackler

This comment has been minimized.

Copy link
Member

commented Mar 7, 2018

Seems reasonable to add this to [T] where T: Clone.

@alexreg

This comment has been minimized.

Copy link
Contributor Author

commented Mar 7, 2018

@Centril Heh, I should expect something abstract from you! Could definitely do that, and implement it even for things like iterators, though it's a bit more work...

@sfackler Yep, that certainly wouldn't be too liberal

@scottmcm

This comment has been minimized.

Copy link
Member

commented Mar 7, 2018

Doesn't the general one exist already? iter::repeat(my_into_iter).take(n).flatten().collect()

That said, #48657 could easily be moved to [T] where T: Copy, and that way the str/String version would just wrap the slice/Vec version. That seems like a more interesting option to me, since it's taking advantage of the "slice-ness", rather than just being a wrapper around iterator methods.

@alexreg

This comment has been minimized.

Copy link
Contributor Author

commented Mar 7, 2018

@scottmcm What's the efficiency of that one like? I bet it's nothing close to memcpy, which could effectively be done by specialising on Repeat for [T] where T: Copy.

@scottmcm

This comment has been minimized.

Copy link
Member

commented Mar 7, 2018

@alexreg That's what I meant by my second paragraph. slice::repeat<T:Copy>(x: &[T], n: usize) -> Vec<T> sounds great, but I don't see a need for it for iterators or T: Clone.

@alexreg

This comment has been minimized.

Copy link
Contributor Author

commented Mar 7, 2018

@scottmcm Yeah, that's certainly a fair idea. Do you know if iter::repeat(my_into_iter).take(n).flatten().collect() is as efficient as pre-allocating a Vec of the requisite size and repeatedly calling extend on it? If so, I agree with you fully. Otherwise, there may be a need for a Repeat trait.

@sfackler

This comment has been minimized.

Copy link
Member

commented Mar 8, 2018

We can implement it for T: Clone and specialize to #48657's implementation for T: Clone. None of that needs to affect the method's public API though.

@alexreg

This comment has been minimized.

Copy link
Contributor Author

commented Mar 8, 2018

@sfackler Yep, that sounds like a good solution to be honest. Even if it's no more efficient that the iterator approach for T: Clone, it would at least provide a consistent interface!

@scottmcm

This comment has been minimized.

Copy link
Member

commented Mar 8, 2018

@alexreg Unfortunately I know for sure that literally .flatten().collect() is unlikely to be great (see #48544 for a discussion of why, though that PR wouldn't help this case). But I would expect that using #48597 .collect_into(Vec::with_capacity(my_slice.len() * n)) instead would let the iterator method do just as well as something more manual.

@alexreg

This comment has been minimized.

Copy link
Contributor Author

commented Mar 8, 2018

@scottmcm Fair enough, though I think in that case why might as well provide that as a default fn on the Repeat trait, and allow for specialisations when T: Copy (and potentially other cases too).

bors added a commit that referenced this issue Apr 23, 2018

bors added a commit that referenced this issue Apr 24, 2018

@bors bors closed this in #48999 Apr 24, 2018

@dtolnay dtolnay reopened this Apr 24, 2018

@kennytm

This comment has been minimized.

Copy link
Member

commented Oct 27, 2018

@rfcbot poll @rust-lang/libs I'd like to stabilize [T]::repeat.

The signature in #48999 is

fn repeat(&self, n: usize) -> Vec<T> where T: Copy;

and the implementation uses on copy_nonoverlapping which relies on the property of Copy. There's suggestion to generalize the bound to T: Clone, however I don't think this is a stabilization blocker:

  1. T: Copy is what makes slice::repeat special. If T: Clone + !Copy, the performance is probably just similar to iter::repeat(_).take(_).flatten().collect() and can be used instead.

  2. Since Copy is a subtrait of Clone, we can upgrade the bound to T: Clone without breaking backward compatibility.

OTOH it is really simple to extend the bound to T: Clone today as we could use specialization internally (as explained in #48784 (comment)):

// note: private trait 
trait RepeatSlice: Sized {
    fn repeat_slice(slice: &[Self], n: usize) -> Vec<Self>;
}

impl<T: Clone> RepeatSlice for T {
    default fn repeat_slice(slice: &[Self], n: usize) -> Vec<Self> {
        let mut res = Vec::with_capacity(n * slice.len());
        for _ in 0..n {
            res.extend_from_slice(slice);
        }
        res
    }
}

impl<T: Copy> RepeatSlice for T {
    fn repeat_slice(slice: &[Self], n: usize) -> Vec<Self> {
        // the original copy_nonoverlapping implementation
    }
}

impl<T> [T] {
    pub fn repeat(&self, n: usize) -> Vec<T> 
    where 
        T: Clone,
    {
        T::repeat_slice(self, n)
    }
}
@rfcbot

This comment has been minimized.

Copy link

commented Oct 27, 2018

Team member @kennytm has asked teams: T-libs, for consensus on:

I'd like to stabilize [T]::repeat.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

commented Oct 28, 2018

From the implementation:

        // If `n` is larger than zero, it can be split as
        // `n = 2^expn + rem (2^expn > rem, expn >= 0, rem >= 0)`.
        // `2^expn` is the number represented by the leftmost '1' bit of `n`,
        // and `rem` is the remaining part of `n`.
        // `2^expn` repetition is done by doubling `buf` `expn`-times.

I guess that it’s this way because fewer calls to copy_nonoverlapping with larger slices are assumed to be faster. This may be the case up to "medium" sizes compared to e.g. copying a single byte at a time. However at some point, going back "far away" in memory might hurt cache locality.

Do we have evidence one way or another to justify such complex code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.