proposal: strings: add backward iterator #66206
Comments
How often does one need to iterate backwards over a string? Iterating backward over ASCII strings is trivial. Iterating backwards over non-ASCII strings is a dubious operation since many runes (flags, joiners, diacritics, etc) only make sense in the forward sequence. |
I am thinking of the numerous last-match methods. How often can you be sure to have ASCII strings? It would be the equivalent of char_indices().rev() in Rust. |
Indeed... I think any one function in the standard library would do the wrong thing for at least some use cases, and it may not be obvious to the caller that it's doing "the wrong thing" unless they are very familiar with the details of Unicode and of various languages. There are at least three different things this could mean:
All of these three interpretations are wrong in some way in various specific contexts, while all three are also reasonable things to do in certain situations. Reversing the characters in a string is also a pretty esoteric thing to do regardless of exactly how you choose to define it. Therefore this seems more like something that would be better served by three third-party libraries, rather than by a single function in stdlib. It might even be something you should just implement inline in your program directly so that you can tailor it to exactly whatever weird requirements you are trying to meet by implementing it. |
I am thinking of the counterpart of iterating over a string with range, so 2. Suppose you want the last 5 code points: I would convert the string to []rune and slice it. I do not see why it should be so complicated or inefficient to do. I am not thinking of reversing a string; for that you would probably use a library that provides Unicode segmentation. |
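To make the "last 5 code points" case concrete, here is a minimal sketch that compares the []rune conversion with a backward walk using utf8.DecodeLastRuneInString. The helper names lastRunes and lastRunesAlloc are hypothetical, n is assumed to be non-negative, and the two helpers are only expected to agree on valid UTF-8 input.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lastRunes (hypothetical helper) returns the suffix of s that holds its
// last n runes, or all of s if it has fewer than n. It walks backward from
// the end and does not allocate; the result shares memory with s.
func lastRunes(s string, n int) string {
	i := len(s)
	for ; n > 0 && i > 0; n-- {
		_, size := utf8.DecodeLastRuneInString(s[:i])
		i -= size
	}
	return s[i:]
}

// lastRunesAlloc (hypothetical helper) does the same via []rune, which
// copies the whole string into a new slice before slicing it.
func lastRunesAlloc(s string, n int) string {
	r := []rune(s)
	if len(r) > n {
		r = r[len(r)-n:]
	}
	return string(r)
}

func main() {
	s := "héllo, 世界"
	fmt.Println(lastRunes(s, 5))      // o, 世界
	fmt.Println(lastRunesAlloc(s, 5)) // o, 世界
}
```

The backward walk touches only the bytes of the last n runes, while the conversion decodes and copies the entire string.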
In the package strings there are
func lastIndexFunc(s string, f func(rune) bool, truth bool) int {
	for i := len(s); i > 0; {
		r, size := utf8.DecodeLastRuneInString(s[0:i])
		i -= size
		if f(r) == truth {
			return i
		}
	}
	return -1
} |
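For readers who have not met this helper, the exported strings.LastIndexFunc is the public entry point to the same kind of backward scan. The sketch below uses it to find the byte index of the last non-space rune; the helper name lastNonSpace is hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// lastNonSpace (hypothetical helper) returns the byte index of the last
// rune in s that is not whitespace, or -1 if there is none.
func lastNonSpace(s string) int {
	return strings.LastIndexFunc(s, func(r rune) bool {
		return !unicode.IsSpace(r)
	})
}

func main() {
	// 'é' occupies two bytes, so it starts at byte index 3.
	fmt.Println(lastNonSpace("café  ")) // 3
	fmt.Println(lastNonSpace("   "))    // -1
}
```

The returned value is a byte offset, not a rune count, which is the same convention a backward iterator yielding (index, rune) pairs would follow.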
I think this isn’t a bad idea but it’s uncommon enough that it can go in a Unicode package (maybe both utf8 and utf16) and not strings itself. That will also make the context of returning runes (not graphemes) more clear. |
It sounds then like the idea is the definition I numbered as 2 in my earlier comment. If this does end up being a function in stdlib somewhere, then I would suggest trying to name it to suggest "runes in reverse order" rather than "reversed string", so at least there's a hint that this is doing something quite specific that might not be what the potential caller wants. I like @earthboundkid's compromise of possibly putting it in package utf8
func Runes(p []byte) iter.Seq2[int, rune] { /* ... */ }
func RunesInString(s string) iter.Seq2[int, rune] { /* ... */ }
func RunesReversed(p []byte) iter.Seq2[int, rune] { /* ... */ }
func RunesReversedInString(s string) iter.Seq2[int, rune] { /* ... */ }
I do still feel a little skeptical about the reverse cases being common enough for the standard library, but I'll concede that they are reasonably straightforward -- but still slightly subtle -- wrappers around A call to
( |
The InString variants can be unified with the bytes variants using generics, I think, saving a lot of name duplication. |
That's fair. I was aiming for consistency, but not considering that I was trying to be consistent with a pre-generics API. |
Proposal Details
I propose to add the function Backward to the package strings, as iterating over strings in reverse order is not trivial without allocating a new slice.
func Backward(s string) iter.Seq2[int, rune]
For now would it be: