devinus
Contributor

@devinus devinus commented Mar 27, 2013

No description provided.

@yrashk
Contributor

yrashk commented Mar 27, 2013

I strongly suggest implementing http://www.unicode.org/reports/tr29/ instead, as it is a more correct way to split text in Unicode. @josevalim ?

@devinus
Contributor Author

devinus commented Mar 27, 2013

@yrashk That report deals with segmenting text at word boundaries, not splitting on whitespace. It's not that it's more "correct"; it's entirely different functionality, something like text_segments. If we have a traditional split-on-whitespace function, I'd prefer it to treat as whitespace the same codepoints that strip removes.

@yrashk
Contributor

yrashk commented Mar 27, 2013

I understand your point. I am just trying to figure out what the best thing to do is. Isn't splitting by whitespace a subset of text segmentation? Text segmentation lets you truly break text into words, accounting for punctuation, parentheses, and such...
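
To make the distinction in this exchange concrete, here is a rough sketch in current Elixir. Whitespace splitting leaves punctuation attached to the neighbouring word, while a word segmenter would separate it. The `SplitSketch` module and its function names are hypothetical, and the `\p{L}+` regex is only a crude stand-in chosen for illustration; it is not an implementation of UAX #29.

```elixir
defmodule SplitSketch do
  # Hypothetical helper: plain split on whitespace.
  def by_whitespace(text), do: String.split(text)

  # Crude stand-in for word segmentation: runs of Unicode letters.
  # This is NOT the UAX #29 algorithm, just an approximation.
  def letter_runs(text) do
    ~r/\p{L}+/u
    |> Regex.scan(text)
    |> List.flatten()
  end
end

text = "Hello, world (again)!"

# Punctuation and parentheses stay glued to the words.
IO.inspect(SplitSketch.by_whitespace(text))
#=> ["Hello,", "world", "(again)!"]

# Only the letter runs survive.
IO.inspect(SplitSketch.letter_runs(text))
#=> ["Hello", "world", "again"]
```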

@devinus
Contributor Author

devinus commented Mar 27, 2013

@yrashk Yep, and I think it's awesome and should be added. But this is merely a split-on-whitespace function, and it should treat as whitespace the same codepoints that strip et al. do.

@devinus
Contributor Author

devinus commented Mar 27, 2013

String.split "  foo  bar  baz  " #=> ["foo", "bar", "baz"]
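
As a small illustration of the Unicode case discussed in this thread, the sketch below calls `String.split/1` on a string whose separators are not ASCII spaces. It assumes the behaviour of a current Elixir release, which may differ in detail from the commit in this pull request; the sample strings are made up for the example.

```elixir
# Runs of whitespace collapse; leading and trailing whitespace is dropped.
IO.inspect(String.split("  foo  bar  baz  "))
#=> ["foo", "bar", "baz"]

# U+2003 (EM SPACE) and U+0085 (NEXT LINE) are Unicode whitespace,
# so they act as separators just like an ASCII space.
IO.inspect(String.split("foo\u2003bar\u0085baz"))
#=> ["foo", "bar", "baz"]
```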

@devinus
Contributor Author

devinus commented Mar 27, 2013

String.text_segments

@josevalim
Member

@devinus could you please update the docs too, to make it clear that it will split on any Unicode whitespace (and not on word boundaries), and add a Unicode example? Thanks!

@devinus
Contributor Author

devinus commented Mar 27, 2013

@josevalim Done.

@yrashk
Contributor

yrashk commented Apr 3, 2013

bump?

josevalim pushed a commit that referenced this pull request Apr 6, 2013
@josevalim josevalim merged commit c0fbea8 into elixir-lang:master Apr 6, 2013