split proc in strutils inconsistent for set[char] #4305

Closed
jyapayne opened this Issue Jun 9, 2016 · 7 comments

Projects

None yet

4 participants

@jyapayne
Contributor
jyapayne commented Jun 9, 2016

For example:

import strutils
echo ":test:stuff:".split(':', maxsplit=1)
echo ":test:stuff:".split(":", maxsplit=1)
echo " test stuff ".split(Whitespace, maxsplit=1)

Outputs:

@["", "test:stuff:"]
@["", "test:stuff:"]
@["test", "stuff "]

I can create a pull request for this once #4276 is absorbed :)

@Araq
Member
Araq commented Jun 10, 2016

Well it's inconsistent, but quite useful. Split with set[char] is almost always used for whitespace and who wants leading/trailing whitespace?

@Varriount
Contributor

While I do agree that it is convenient for that use case, it's going to trip up people who decide they need to split on more than one character and go from split(..., ":") to split(..., {':', ';'}).

Additionally, it's easy enough to re-implement this behavior by adding 'strip', whereas the alternative cannot be achieved without a new version of the strip procedure.

@dom96 dom96 added the Stdlib label Jun 10, 2016
@jyapayne
Contributor

@Araq, I think that the functionality you're talking about might be useful under a different procedure name. I agree that it's useful to strip out all characters specified like how the current test performs:

let s = " this   is     an example   "
doAssert s.split() == @["this", "is", "an", "example"]

However, this is inconsistent with the other procs and the assert should be expecting something like this:

doAssert s.split() == @["", "this", "", "", "is", "", "", "", "an", "example", ""]

Which, I agree, is very difficult to work with if you don't want any extra blank strings. Maybe a new proc called stripSplit, or something similar, could exist that would replicate the functionality you mentioned for strings, chars, and set[char]. What do you think?

@Araq
Member
Araq commented Jun 17, 2016

Seems overkill to me, let's just fix the existing split for set[char] and hope it doesn't break too much. And document this breaking change!

@jyapayne jyapayne added a commit to jyapayne/Nim that referenced this issue Jun 17, 2016
@jyapayne jyapayne Fix #4305: Make split proc for set[char] consistent feb45fb
@dom96
Member
dom96 commented Jun 18, 2016

For whitespace split the rule should be that calling s.split(" ").join(" ") == s. So is the current split procedure wrong?

@jyapayne
Contributor

The two split procs using string and char work that way. split with set[char] does not.

var s = "   whitespace  is    not   cool"
assert s.split(" ").join(" ") == s
assert s.split(' ').join(" ") == s
assert s.split({' '}).join(" ") == s # This fails
# s == "whitespace is not cool"
@dom96
Member
dom96 commented Jun 18, 2016

That's good :)

@Araq Araq closed this in 76f81d4 Jul 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment