Empty strings in result of "abc".split("") #33882

Closed
frewsxcv opened this issue May 26, 2016 · 11 comments
Labels
E-easy: Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.
E-mentor: Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.

Comments

@frewsxcv
Member

fn main() {
    println!("{:?}", ("abc".to_owned()).split("").collect::<Vec<_>>());
}
["", "a", "b", "c", ""]

Is this expected behavior? If so, it should probably be documented.

@frewsxcv
Member Author

I guess it's sort of in the same category as:

If a string contains multiple contiguous separators, you will end up with empty strings in the output:

and

This can lead to possibly surprising behavior when whitespace is used as the separator. This code is correct:

Which is in the documentation here.

@steveklabnik
Member

I am on my phone at the moment, but it is expected and documented IIRC.

On May 26, 2016, 12:39 -0400, Corey Farwell <notifications@github.com> wrote:

fn main() { println!("{:?}", ("abc".to_owned()).split("").collect::<Vec<_>>()); }
["", "a", "b", "c", ""]

Is this expected behavior? If so, it should probably be documented.


@est31
Member

est31 commented May 26, 2016

The examples given in the documentation are

let x = "||||a||b|c".to_string();
let d: Vec<_> = x.split('|').collect();

assert_eq!(d, &["", "", "", "", "a", "", "b", "c"]);

and

let x = "    a  b c".to_string();
let d: Vec<_> = x.split(' ').collect();

assert_eq!(d, &["", "", "", "", "a", "", "b", "c"]);

The text doesn't go further than "will contain empty strings", but the examples suggest that each time the separator is directly followed by another occurrence of it in the string, an empty string will be the output.
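
For readers who want to avoid those empty pieces, here is a minimal sketch. `split_non_empty` is a hypothetical helper named for illustration only; `split`, `filter`, and `split_whitespace` are standard library methods.

```rust
/// Hypothetical helper: split on a single separator and drop the empty
/// pieces that leading or contiguous separators produce.
fn split_non_empty(input: &str, sep: char) -> Vec<&str> {
    input.split(sep).filter(|piece| !piece.is_empty()).collect()
}

fn main() {
    // Plain `split` keeps the empty pieces at the start and between
    // adjacent separators.
    let raw: Vec<&str> = "    a  b c".split(' ').collect();
    assert_eq!(raw, ["", "", "", "", "a", "", "b", "c"]);

    // Filtering removes them.
    assert_eq!(split_non_empty("||||a||b|c", '|'), ["a", "b", "c"]);

    // For whitespace specifically, `split_whitespace` gives the same result.
    let words: Vec<&str> = "    a  b c".split_whitespace().collect();
    assert_eq!(words, ["a", "b", "c"]);
}
```

None of this settles what `split("")` should return; it only shows how the documented separator cases can be normalised.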

From reading that specific section of the documentation, I can't really find out the output of the split method for the empty string.

The empty string is a real edge case and at least deserves to be documented IMO, or, if it is considered invalid input, it should panic.

@frewsxcv
Member Author

The empty string is a real edge case and at least deserves to be documented IMO

I agree with this. Especially since doing split("") in other languages results in different behavior.

@frewsxcv
Member Author

JavaScript:

"abc".split("")
["a", "b", "c"]

Python:

>>> "abc".split("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: empty separator

Ruby:

irb(main):002:0> "abc".split("")
=> ["a", "b", "c"]
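
For comparison, a sketch of two ways to get the JavaScript/Ruby-style result for this input in Rust. Both helper names are hypothetical. Note that Rust iterates over `char`s (Unicode scalar values), so results for non-ASCII input may not match other languages exactly.

```rust
/// Hypothetical helper: one owned piece per `char`.
fn split_chars(input: &str) -> Vec<String> {
    input.chars().map(|c| c.to_string()).collect()
}

/// Hypothetical helper: split on "" and drop the empty pieces at both ends,
/// so the result still borrows from the input.
fn split_empty_trimmed(input: &str) -> Vec<&str> {
    input.split("").filter(|piece| !piece.is_empty()).collect()
}

fn main() {
    assert_eq!(split_chars("abc"), ["a", "b", "c"]);
    assert_eq!(split_empty_trimmed("abc"), ["a", "b", "c"]);
}
```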

@est31
Member

est31 commented May 26, 2016

C/C++:

#include <string.h>
#include <stdio.h>

int main(void)
{
    char input[] = "abc";
    char delim[] = "";
    char *token = strtok(input, delim);
    printf("[");
    while (token) {
        printf("\"%s\"", token);
        token = strtok(NULL, delim);
        if (token) {
            printf(", ");
        }
    }
    printf("]\n");
}

outputs

["abc"]

nagisa reopened this May 26, 2016
nagisa added the A-docs label May 26, 2016
@nagisa
Member

nagisa commented May 26, 2016

The behaviour is expected (the algorithm we implement is greedy).

Consider "".split(""): the string being split contains one empty string you can split on (greedily, that is), adjoined by (at most; remember, the algorithm is greedy) two other empty strings, which gives the result ["", ""]. Similar reasoning can be applied to the original report.

It should be documented if it isn’t already. (To me it looks like it isn’t)
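
One way to picture the reasoning above is a small model in which the empty pattern matches once at every char boundary and the string is cut at each match. This is an illustrative sketch only, not the standard library's implementation, and `split_on_empty_model` is a hypothetical name.

```rust
/// Illustrative model: treat "" as matching at every char boundary
/// (including 0 and `input.len()`), then cut the input at each match.
fn split_on_empty_model(input: &str) -> Vec<&str> {
    // Collect every char boundary; for "" this is just [0].
    let mut boundaries: Vec<usize> = input.char_indices().map(|(i, _)| i).collect();
    boundaries.push(input.len());

    let mut pieces = Vec::new();
    let mut start = 0;
    for at in boundaries {
        // The piece before this match; empty when the match is at `start`.
        pieces.push(&input[start..at]);
        // The match has zero length, so the next piece begins right here.
        start = at;
    }
    // The piece after the last match, which is always empty in this model.
    pieces.push(&input[start..]);
    pieces
}

fn main() {
    assert_eq!(split_on_empty_model(""), ["", ""]);
    assert_eq!(split_on_empty_model("abc"), ["", "a", "b", "c", ""]);
    // The model agrees with `str::split` on these inputs.
    assert_eq!(split_on_empty_model("abc"), "abc".split("").collect::<Vec<_>>());
}
```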

nagisa added the E-easy and E-mentor labels May 26, 2016
@nagisa
Member

nagisa commented May 26, 2016

This seems like an easy enough change to make, thus marking as E-easy.

This is the documentation that needs to be edited; other examples for the function should be used as a reference. The algorithm's behaviour is described in the comment above. If anything is not clear, ask either here or on IRC (#rust-internals/#rust-libs).

@bluss
Member

bluss commented May 26, 2016

Previous related discussion in #25986

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue May 29, 2016
…uillaumeGomez

Added examples/docs to split in str.rs

Added documentation clarifying the behavior of split when used with the empty string and contiguous separators. Addresses issue [33882](rust-lang#33882). This is my first time contributing to rust, so forgive me if I'm skipping any of the contribution steps.
Fixes rust-lang#33882
Manishearth added a commit to Manishearth/rust that referenced this issue May 30, 2016
@snowmanzzz

it's best to not surprise people

@frewsxcv
Member Author

Determining what is surprising is not an easy problem


7 participants