Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for str_split_once #74773

Closed
4 tasks done
matklad opened this issue Jul 26, 2020 · 32 comments · Fixed by #81940
Closed
4 tasks done

Tracking Issue for str_split_once #74773

matklad opened this issue Jul 26, 2020 · 32 comments · Fixed by #81940
Labels
A-str Area: str and String B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@matklad
Copy link
Member

matklad commented Jul 26, 2020

Feature gate: #![feature(str_split_once)]

Public API

impl str {
    pub fn split_once<'a, P: Pattern<'a>>(&'a self, delimiter: P) -> Option<(&'a str, &'a str)>;
    pub fn rsplit_once<'a, P: Pattern<'a>>(&'a self, delimiter: P) -> Option<(&'a str, &'a str)>;
}

Steps / History

Unresolved Questions

@matklad matklad added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Jul 26, 2020
@jonas-schievink jonas-schievink added the B-unstable Blocker: Implemented in the nightly compiler and unstable. label Jul 26, 2020
@KodrAus KodrAus added A-str Area: str and String Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. labels Jul 29, 2020
@KamilaBorowska
Copy link
Contributor

I wonder if it shouldn't return (&str, Option<&str>), I don't think the first element when using split method could ever be None.

@jsdw
Copy link

jsdw commented Nov 20, 2020

The way I am interpreting the current signature of Option<(&str,&str)> is that either the separator is matched, and so you get a pair of &strs (they might both be empty, but that's ok), or alternately the separator is not matched, in which case the splitting failed and so you get None. This seems consistent with other functions.

Returning (&str, Option<&str>) IMO leaves the meaning of the Option more open to interpretation (do you get None back if we failed to split, or if the second string is zero length?). Having the Option at the top level feels clearer to me (you get your two strings back if it worked, and nothing back if it didnt; I have difficulty working out how else I'd interpret it (which is good; don't want ambiguity)).

@Niedzwiedzw
Copy link

Niedzwiedzw commented Nov 26, 2020

I've just used it in one of my projects and it felt very natural, I agree with your reasoning James

@yoshuawuyts
Copy link
Member

yoshuawuyts commented Dec 1, 2020

Tried it today for a quick string matching script, and it felt really good to use. Not returning the separator felt consistent with str::split, which I've used a lot in the past. I think the API as-is is probably what it should be.

edit: probably two things to consider: vec::split_first also exists, and feels similar. Should this use the same name?

@clemenswasser
Copy link
Contributor

I have used it here.

@CryZe
Copy link
Contributor

CryZe commented Dec 2, 2020

I seems like the current implementation compiles in a bunch of panic checks, which shouldn't be necessary: https://rust.godbolt.org/z/WTTn6K

@matklad
Copy link
Member Author

matklad commented Dec 2, 2020

Yeah, the impl should be changed to use unchecked indexing (I believe there's no safe API we can use here), but it probably makes sense to do this after stabilization.

@matklad
Copy link
Member Author

matklad commented Dec 2, 2020

I think we should stabilize the API as is:

  • many folks find this API useful
  • it have been backing in the nightly for some (not too long) time
  • the API surface is small enough that it feels unlikely we can improve this further

I think we should not complicate the API to return the ranged matched by the delimiter pattern: this makes the common case more difficult to use, and it goes inline with other pattern-using APIs. We can always add split_once_with_delim later, if we deem it necessary.

I don't think we should change the split_once name -- it feels like it fits the purpose well. split_first for &str would return a (char, &str) I think.

Is anyone willing to submit a stabilization PR? Or are there some concerns I am missing here.

@internetionals
Copy link

The only thing that I miss is the more generic:

 pub fn split_n<'a, P: Pattern<'a>>(&'a self, delimiter: P, n: usize) -> Option<[&'a str; n]>;

But that would need const generics and the split into two parts is by far the most common subcase, so a dedicated and easily recognizable function for that wouldn't hurt.

@internetionals
Copy link

That is different from the current splitn in that it could be used through destructuring, which helps when parsing simple delimited formats with a freeform last part.

@matklad
Copy link
Member Author

matklad commented Dec 3, 2020

@internetionals this is possible today using itertools:

use itertools::Itertools;

let (a, b, c) = "foo bar baz".split_whitespace().collect_tupple().unwrap();

I believe we'll add an fn collect_array<const N: usize>(self) -> Option<[T; N]> to Iterator in std in some form one day (not sure if there's existing feature for that already).

EDIT: submitted #79659

@matklad
Copy link
Member Author

matklad commented Dec 3, 2020

Actually, before we stabilize this, let's dogfood this a bit! I see there are quite a few opportunities to use this function in rust-lang/rust:

λ rg -F 'splitn(2'

Could someone send a PR replacing those usages with .split_once and checking that everything works?

@Eric-Arellano
Copy link
Contributor

Could someone send a PR replacing those usages with .split_once and checking that everything works?

Hey @matklad, I'd be interested in doing this, but first want to check if it's sensible for a first-time contributor to take this? With checking that it all works, would running through CI be sufficient or you'd want manual testing?

@matklad
Copy link
Member Author

matklad commented Dec 5, 2020

@Eric-Arellano yup, this would be good first issue, especially given that you can do it running only x.py check / x.py test tidy locally (which is loads faster then the whole test suite).

@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Dec 6, 2020

Thanks @matklad! I got it set up and am making progress.

FYI: one thing I'm encountering a lot is a difference in API, that split_once models a missing value as "" whereas splitn(2) as None. The call sites do seem to care about the empty string case, so I'm in most places wrapping like this:

let (key, value) = option.split_once('=').unwrap();
let value = if value.is_empty() { None } else { Some(value) };

(As discussed in https://internals.rust-lang.org/t/pre-rfc-none-if-string-is-empty/6742, sounds like we don't want to add sugar to that second line.)

Compared to:

let mut iter = option.splitn(2, '=');
let key = iter.next().unwrap();
let value = iter.next();

Note that my rewrite is missing an assertion that the key is non-empty. I'm not sure if I should be adding assert!() statements; at that point, the rewrite would be 3 LoC and more complicated than before.

While I do agree with the decision to return Option<(&'a str, &'a str)> rather than Option<(&str, Option<&str>)> or Option<(Option<&str>, Option<&str>)>, it does make this less comprehensive than before.

Lmk if it would help to put up my WIP so that you can see this diff.

@KamilaBorowska
Copy link
Contributor

KamilaBorowska commented Dec 7, 2020

The way I am interpreting the current signature of Option<(&str,&str)> is that either the separator is matched, and so you get a pair of &strs (they might both be empty, but that's ok), or alternately the separator is not matched, in which case the splitting failed and so you get None. This seems consistent with other functions.

Returning (&str, Option<&str>) IMO leaves the meaning of the Option more open to interpretation (do you get None back if we failed to split, or if the second string is zero length?). Having the Option at the top level feels clearer to me (you get your two strings back if it worked, and nothing back if it didnt; I have difficulty working out how else I'd interpret it (which is good; don't want ambiguity)).

The following program:

fn main() {
    println!("{:?}", "".splitn(2, ":").collect::<Vec<_>>());
}

Returns the following string:

[""]

Note how the first element is always here when splitting.

If a string is :: then we get the following.

["", ""]

@matklad
Copy link
Member Author

matklad commented Dec 7, 2020

@Eric-Arellano I think both versions treat the missing value ("foo=") as "". The difference is how they treat missing separator ("foo", without =). One will panic, the other would return (Some("foo"), None).

I don't think we should use split_once API for the case where the separator is optional. It will be useful to survey how many cases in rustc require support for optional values, and how many are fine without it. Posting the links to the source code into this issue would be helpful.

@Eric-Arellano
Copy link
Contributor

Thank you for pointing that out, @matklad and @xfix. See #79809 - there were several cases where the nuance that splitn(2)'s first element is always Some wasn't being used, e.g. unnecessary checks for the first element. That nuance was not obvious to me, and I like that split_once() makes the situation more clear imo.

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Dec 10, 2020
Dogfood `str_split_once()`

Part of rust-lang#74773.

Beyond increased clarity, this fixes some instances of a common confusion with how `splitn(2)` behaves: the first element will always be `Some()`, regardless of the delimiter, and even if the value is empty.

Given this code:

```rust
fn main() {
    let val = "...";
    let mut iter = val.splitn(2, '=');
    println!("Input: {:?}, first: {:?}, second: {:?}", val, iter.next(), iter.next());
}
```

We get:

```
Input: "no_delimiter", first: Some("no_delimiter"), second: None
Input: "k=v", first: Some("k"), second: Some("v")
Input: "=", first: Some(""), second: Some("")
```

Using `str_split_once()` makes more clear what happens when the delimiter is not found.
tmandry added a commit to tmandry/rust that referenced this issue Dec 11, 2020
Dogfood `str_split_once()`

Part of rust-lang#74773.

Beyond increased clarity, this fixes some instances of a common confusion with how `splitn(2)` behaves: the first element will always be `Some()`, regardless of the delimiter, and even if the value is empty.

Given this code:

```rust
fn main() {
    let val = "...";
    let mut iter = val.splitn(2, '=');
    println!("Input: {:?}, first: {:?}, second: {:?}", val, iter.next(), iter.next());
}
```

We get:

```
Input: "no_delimiter", first: Some("no_delimiter"), second: None
Input: "k=v", first: Some("k"), second: Some("v")
Input: "=", first: Some(""), second: Some("")
```

Using `str_split_once()` makes more clear what happens when the delimiter is not found.
@rvolgers
Copy link

rvolgers commented Dec 11, 2020

This function is very useful for writing simple ad-hoc parsers, and I find its current signature perfect for that.

I looked at the other suggestions but returning a single enum which shows whether or not the separator was found seems clearly the right choice to me.

I briefly considered the minor variation Result<(&str, &str), &str> (in other words, like the current implementation, but it gives you the input back as the Err variant when the separator is not found), not because I think it's actually a nicer signature, but because I thought it might lead to more ergonomical chaining. After playing with it for a bit though, I don''t think it's worth it.

(I vaguely remember there being some precedent for this kind of API where the Err variant gives you back the rejected input, but I can't for the life of me remember which std API that might have been. I remember noticing it because it felt out of place to me, so perhaps not the greatest precedent, but still.)

TLDR: I think the current signature is the right one and would very much like this to be stabilized.

@rvolgers
Copy link

rvolgers commented Feb 9, 2021

Please do! I missed this somehow, yikes.

@madsmtm
Copy link
Contributor

madsmtm commented Feb 9, 2021

No sure if this is the right place to ask, but shouldn't we have slice::split_once as well?

@jhpratt
Copy link
Member

jhpratt commented Feb 9, 2021

PR is up.

@matklad
Copy link
Member Author

matklad commented Feb 10, 2021

TOCTOU: stabilization rules changed a bit in the meantime, see https://github.com/rust-lang/rustc-dev-guide/pull/1036/files#diff-c41e6c5c33d6960efeff23f2f54c1d948ea73f64d604deda521e93ae0b4bd79bR75?

@m-ou-se could you kick an FCP here based on this excellent summary?

@m-ou-se
Copy link
Member

m-ou-se commented Feb 10, 2021

Unresolved Questions

Has this question been resolved yet?

@matklad
Copy link
Member Author

matklad commented Feb 10, 2021

I would think so (by #74773 (comment)):

  • it complicates the common case.
  • other pattren based API ('find, split`) don't return the range matched by the pattern.

@m-ou-se
Copy link
Member

m-ou-se commented Feb 10, 2021

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Feb 10, 2021

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Feb 10, 2021
@rfcbot
Copy link

rfcbot commented Feb 15, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Feb 15, 2021
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Feb 25, 2021
@rfcbot
Copy link

rfcbot commented Feb 25, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Feb 26, 2021
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Feb 26, 2021
@bors bors closed this as completed in 0db8349 Feb 26, 2021
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Mar 4, 2021
bjorn3 pushed a commit to bjorn3/rust that referenced this issue Mar 5, 2021
@nalply
Copy link

nalply commented Oct 16, 2021

@rvolgers wrote on 11 Dec 2020:

(I vaguely remember there being some precedent for this kind of API where the Err variant gives you back the rejected input, but I can't for the life of me remember which std API that might have been. I remember noticing it because it felt out of place to me, so perhaps not the greatest precedent, but still.)

Perhaps you meant OsString::into_string?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-str Area: str and String B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.