Skip to content

Conversation

@alexchandel
Copy link

This moves the String methods push_char and pop_char under a MutableSeq<char> implementation and renames them to push and pop respectively. This is appropriate given String's UTF-8 guarantee, and the existing parallels between String and Vec.

@alexchandel
Copy link
Author

Unsure why the Travis build stalled.

@alexcrichton
Copy link
Member

If accepted this needs to retain the push_char and pop_char methods with a #[deprecated] tag pointing at the new methods. Additionally the commit message needs to be reworded in accordance with our breaking changes policy

@alexchandel
Copy link
Author

Ah my bad, [breaking-change] added.

@alexcrichton
Copy link
Member

Note that the policy is not just adding "breaking-change" to the commit message, in particular:

The template which breaking changes will be required to look like is:

First, a brief one-line summary of the change

Second, take as long as is necessary to explain exactly what the change is,
why it's being changed, what it can be replaced with (if applicable) and
general guidelines about dealing with the change.

In addition to a few paragraphs about the change itself, the literal string
"[breaking-change]" must appear at the end of the commit message in order
to indicate that it is a commit that has a breaking change. This will allow
filtering commits on this string to only take a look at breaking changes.

[breaking-change]

as well as

In addition to a stricter policy around commit messages, we're going to start
encouraging more aggressive use of the #[deprecated] attribute to help
transitioning code.

In short, this change needs to preserve the two functions with a #[deprecated] tag and the commit message needs an explanation as to why this change was made, why it was worth it to break, and how to migrate all current users from one function to another.

@lilyball
Copy link
Contributor

I'm not convinced it makes sense for String to implement MutableSeq<char>. This is treating String as a "collection of characters", but it's actually specifically a collection of bytes that comprise a valid UTF-8 encoding of some sequence of characters.

I guess this isn't really a strong objection, and it's primarily motivated by the fact that I don't like using .push() and .pop() on String, I much prefer .push_char() and .pop_char() to make it clear what's actually going on.

Is there any actual use-case for having String implement MutableSeq or is this being done just because it can?

@alexchandel
Copy link
Author

Sorry for the delay, vim lost my commit message the first time. The primary rationale is that because String enforces that its contents are valid UTF-8, String behaves as a collection of valid Unicode code points. So a MutableSeq<char> implementation makes sense.

@alexchandel
Copy link
Author

Added #[deprecated] tag to push_char and pop_char.

@lilyball
Copy link
Contributor

Nitpick: it behaves as a collection of Unicode scalar values.

Also, String does not actually behave the way a normal collection of scalar values would. Trivial example:

let mut s = String::new();
assert_eq!(s.len(), 0);
s.push_char('\u2022');
s.push_char('\U0001F419');
assert_eq!(s.len(), 2); // FAILS

One would normally expect a collection's len() to change in lockstep with the number of pushes/pops (assuming no other mutating operations). This is not at all true for String.

I think this is actually a big enough issue that String should definitely not implement MutableSeq.

@alexchandel
Copy link
Author

@kballard Good point. Maybe len() method of str and String should return the number of scalar values? String already has complementary char and byte methods for a few things, like push_char()/push_byte(), pop_char()/pop_byte(), shift_char()/shift_byte(). We would add a len_bytes() method to return the number of bytes, and have len() return the number of Unicode scalar values, just as push() and pop() would add and remove Unicode scalar values.

It's worth noting that the two other collection traits of String, FromIterator<T> and Extendable<T>, are both implemented as FromIterator<char> and Extendable<char>.

@lilyball
Copy link
Contributor

That ship sailed a long time ago. We can't make len() return scalar values without changing nearly the entire API, and breaking all the current performance guarantees.

Performance is the biggest reason why our String API is primarily byte-focused, and that's important to a lot of people.

If you have a real need for a MutableSeq String you could experiment with creating a newtype wrapper that provides this.

@alexchandel
Copy link
Author

I figured as much. The other alternative would be to have String implement MutableSeq<u8>, but since push_byte() and pop_byte() unsafe, this would require some hypothetical "unsafe" implementation.

You mean with #[Deriving]?

@lilyball
Copy link
Contributor

It doesn't make sense for String to implement MutableSeq<u8>, because it very deliberately does not support pushing/popping/inserting/removing bytes.

You mean with #[Deriving]?

No, you can't derive MutableSeq. I mean something like

// this is a tuple struct because clients only construct these and
// we like the tuple struct construction form
pub struct StringSeq<'a>(&'a mut String);

impl<'a> StringSeq<'a> {
    fn inner_mut(&mut self) -> &mut String {
        let StringSeq(&ref mut s) = *self;
        s
    }

    fn inner(&self) -> &String {
        let StringSeq(&ref s) = *self;
        s
    }
}

impl<'a> MutableSeq<char> for StringSeq<'a> {
    #[inline]
    fn push(&mut self, value: char) {
        self.inner_mut().push_char(value)
    }

    #[inline]
    fn pop(&mut self) -> Option<char> {
        self.inner_mut().pop_char()
    }
}

impl<'a> Mutable for StringSeq<'a> {
    #[inline]
    fn clear(&mut self) {
        self.inner_mut().clear()
    }
}

impl<'a> Collection for StringSeq<'a> {
    #[inline]
    fn len(&self) -> uint {
        self.inner().len()
    }

    #[inline]
    fn is_empty(&self) -> bool {
        self.inner().is_empty()
    }
}

fn replace_last<T, S: MutableSeq<T>>(seq: &mut S, val: T) {
    if seq.pop().is_some() {
        seq.push(val)
    }
}

This is then usable like so:

fn replace_last<T, S: MutableSeq<T>>(seq: &mut S, val: T) {
    if seq.pop().is_some() {
        seq.push(val)
    }
}

fn main() {
    let mut s = String::from_str("this works?");
    replace_last(&mut StringSeq(&mut s), '!');
    assert_eq!(s.as_slice(), "this works!");
}

@alexchandel
Copy link
Author

It doesn't make sense for String to implement MutableSeq, because it very deliberately does not support pushing/popping/inserting/removing bytes.

Isn't it possible to push and pop bytes with String::push_byte() and String::pop_byte()? And String can't insert or remove chars at arbitrary positions either. I might have missed it, but I couldn't find any insertion or removal methods for either bytes or chars.

I see the appeal of exposing byte-interfaces. Is there any particular code in Rust's libraries that relies on an O(1) guarantee from String::len()? Or is this a guarantee for all Collection implementors?

@lilyball
Copy link
Contributor

@alexchandel push_byte() and pop_byte() are unsafe functions, and personally, I think those should actually live as free functions in a collections::string::raw module (similar to how std::str::raw is where unsafe &str operations live). Although really we could actually remove them entirely and have clients use .as_mut_vec().push() etc.

Is there any particular code in Rust's libraries that relies on an O(1) guarantee from String::len()?

I don't know, but it absolutely is the default assumption by effectively all potential String users, just as how Vec::len() is assume to be O(1).

@alexchandel
Copy link
Author

push_byte() and pop_byte() are unsafe functions, and personally, I think those should actually live as free functions in a collections::string::raw module (similar to how std::str::raw is where unsafe &str operations live). Although really we could actually remove them entirely and have clients use .as_mut_vec().push() etc.

I'd be in favor of that.

I don't know, but it absolutely is the default assumption by effectively all potential String users, just as how Vec::len() is assume to be O(1).

I probably wouldn't assume Vec::len() is O(1) a priori: Vec is backed by an array. I wouldn't assume much about the time complexity of an unknown &Collection's len() method, unless Collection specified some constraint. The docs could be more explicit about the time-complexity of Vec's operations in general, especially for the innumerable trait implementations. For example, look at the documentation of Java's Collections framework. /bikeshedding

Although you're right, String carries the properties of both a sequence of bytes and chars enough that it would be misleading to do this. I have a better idea: would these push_char()/pop_char()/push_byte()/shift_char()/etc fall under the category of a "MutableStr" trait? These methods could be refactored out into a more clear unit of organization.

@lilyball
Copy link
Contributor

@alexchandel What would be the purpose of pulling these out into a MutableStr? We have no other types that would implement it. A trait with only a single implementation is not useful.

@alexchandel
Copy link
Author

What would be the purpose of pulling these out into a MutableStr? We have no other types that would implement it. A trait with only a single implementation is not useful.

@kballard Then let's delete StrSlice, UnicodeStrSlice, MutableVector, MutableVectorAllocating, ImmutableVector, VectorVector, ImmutableEqVector, ImmutableOrdVector, MutableCloneableVector, MutableByteVector, ImmutableCloneableVector, MutableVectorAllocating, MutableOrdVector, Char, UnicodeChar, UpdateIoError, BoxAny, and every other trait with only a single implementation.

Frankly that's a terrible argument. The fact that those methods form a coherent, composable unit of behavior is reason enough to factor them into a trait. Dividing the functionality of such a ubiquitous type into traits lends additional clarity to the API. And even if no other rust type implements MutableStr, that's not to say no user-defined type will.

@lilyball
Copy link
Contributor

@alexchandel

The only reason those traits exist is because they're adding methods to built-in types and you can't add methods to built-in types without a trait. There's also a chance that in the future we'll change the rules to allow libstd to define methods on built-in types so we can get rid of them.

The only exceptions in your list are UpdateIoError and BoxAny, which are used to add methods to certain parameterizations of other types.

The fact that those methods form a coherent, composable unit of behavior is reason enough to factor them into a trait.

That is not the general philosophy of the Rust standard library. Methods are only put into traits when either

  1. the trait represents a coherent set of methods that multiple types want to implement, or
  2. the trait is being used to add methods to a type defined outside of the crate (such as built-in types, or Result<_, IoError>), or
  3. the trait is working around a limitation in rust, e.g. Box<Any> (which, according to the #[unstable] attribute, will be removed once the requisite DST and coherence changes are made that will allow it to be a direct impl)

Moving methods into a trait actually makes it harder for people to find in the documentation where a method comes from, makes it harder to call the method if they don't have the trait in scope, and bloats the prelude with more and more traits.

@alexchandel
Copy link
Author

I'm convinced this isn't the right trait for String to implement. @kballard Thank you for your feedback!

@alexchandel
Copy link
Author

This issue deserves further consideration, given that fn push(&mut self, ch: char) and fn pop(&mut self) -> Option<char> were just recently implemented for String.

In particular, the complaint:

One would normally expect a collection's len() to change in lockstep with the number of pushes/pops (assuming no other mutating operations). This is not at all true for String.

is now manifestly untrue.

@alexchandel alexchandel reopened this Sep 25, 2014
@alexcrichton
Copy link
Member

Despite push and pop being implemented now, the current plan will likely involve removing all the collections traits outright.

@bors bors merged commit 8b12fb3 into rust-lang:master Oct 10, 2014
flip1995 pushed a commit to flip1995/rust that referenced this pull request Nov 15, 2025
…ust-lang#16040)

Closes rust-lang/rust-clippy#16026

changelog: [`missing_asserts_for_indexing`] fix wrongly changing
`assert_eq` to `assert`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants