Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement into_chars for String #268

Closed
al-yisun opened this issue Sep 12, 2023 · 6 comments
Closed

Implement into_chars for String #268

al-yisun opened this issue Sep 12, 2023 · 6 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@al-yisun
Copy link

Proposal

Problem statement

Currently there is no owning analog to the chars iterator. In most contexts this has no real effect, as char is Copy, but it disables some cases where one wants to pass ownership of the underlying String along with the iterator.

Motivating example

Implementing the following function requires some gymnastics.

fn get_chars() -> impl Iterator<Item=char> {
    let s = format!("for one reason or another, you have a String");
    s.chars().filter(|c| ['a','e','i','o','u'].contains(c))
    // error[E0597]: `s` does not live long enough
}

Solution sketch

If into_chars existed, the above function would be simple to write.

fn get_chars() -> impl Iterator<Item=char> {
    let s = format!("for a very good reason, you have a String");
    s.into_chars().filter(|c| ['a','e','i','o','u'].contains(c))
}

Alternatives

It is relatively straightforward to write an owning chars iterator outside of std. As a struct:

struct IntoChars {
    string: String,
    position: usize,
}

impl Iterator for IntoChars {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        let c = self.string[self.position..].chars().next()?;
        self.position += c.len_utf8();
        Some(c)
    }
}

As a closure:

fn into_chars(s: String) -> impl Iterator<Item = char> {
    let mut i = 0;
    std::iter::from_fn(move ||
        if i < s.len() {
            let c = s[i..].chars().next().unwrap();
            i += c.len_utf8();
            Some(c)
        } else {
            None
        }
    )
}

Links and related work

Discussion on irlo.

@al-yisun al-yisun added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Sep 12, 2023
@Amanieu
Copy link
Member

Amanieu commented Oct 4, 2023

We discussed this in the libs-api meeting yesterday and couldn't think of any other way to achieve this functionality. As such it's fine to add this as an unstable API.

@Amanieu Amanieu closed this as completed Oct 4, 2023
@Amanieu Amanieu added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Oct 4, 2023
@tisonkun
Copy link

@Amanieu I don't find this method on the codebase, it is still valid and I may spend some time to write an impl?

Didn't see referred PR also.

@tisonkun
Copy link

Found this related PR - rust-lang/rust#50845

But it's even 5 years earlier than this issue. Let me see if we still want to add such methods.

@tisonkun
Copy link

Seems that to avoid duplicate code (of Chars internal) and avoid always make a Chars to access those ability, we'd factor out the Chars functionality into a shared structure and perhaps extend next_code_point to accept Iterator<Item = u8> also.

See https://github.com/rust-lang/rust/pull/50845/files#r189071758 with @shepmaster and @nagisa .

I may spend some time to try to make a draft in weeks.

@programmerjake
Copy link
Member

maybe use something like:

pub struct IntoChars {
    bytes: vec::IntoIter<u8>,
}

impl Iterator for IntoChars {
    type Item = char;
    fn next(&mut self) -> Option<Self::Item> {
        let mut chars = unsafe { str::from_utf8_unchecked(self.bytes.as_slice()) }.char_indexes();
        let retval = chars.next()?;
        self.bytes.advance_by(chars.offset());
        Some(retval)
    }
    ...
}

...

@tisonkun
Copy link

@programmerjake Thanks for sharing the snippet. This would create a wrapper struct (CharIndices) each time, but it can be cheap?

I'll start with this direction then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

4 participants