-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
strings, bytes: Document behavior of Replace when old is empty #8143
Labels
Milestone
Comments
Why does bytes do this? The code path has been there since the Replace function was added four years ago, but there's no discussion of this in the code review[1]. Was it a mistake in copying the strings.Replace code to bytes.Replace? 1. https://golang.org/cl/1731048 |
I think it can be changed. """" Bugs. If a compiler or library has a bug that violates the specification, a program that depends on the buggy behavior may break if the bug is fixed. We reserve the right to fix such bugs. """"[0] [0]: https://golang.org/doc/go1compat#expectations |
Feel free to change the docs. The code is correct as written. These functions are operating on UTF-8 and should generate correctly formed UTF-8. strings and bytes are meant to do the same thing, just on different input types (string and []byte). In particular, if one treats input as UTF-8, the other does too. |
I don't understand #6. You say that "these functions are operating on UTF-8" -- but no, UTF-8 only enters into it in the special case of old being empty. Consider utf8.Valid(bytes.Replace([]byte("世界"), []byte{184}, nil, -1)) // false So the special case behavior for empty 'old' is quite surprising. This bug arose from a case where the UTF-8 behavior of bytes.Replace was undesired and ultimately a custom replace function was needed. bytes.Split becomes UTF-8 aware only when provided with an empty separator, but that is well-documented. |
In general these routines guarantee that if given valid UTF-8 arguments, they produce valid UTF-8 output. UTF-8 is self-synchronizing. A valid UTF-8 sequence cannot appear in the middle of some other unrelated sequence, EXCEPT for the empty string. So to make that guarantee, the empty string is the only case that needs special handling. For example, if you are trying to do search and replace on UTF-8 text correctly, then the empty string has to be handled carefully: you only want to match the empty string between adjacent runes, not the one in the middle of multibyte runes. You can disagree with the decision to make these routines UTF-8-friendly, but they are. Count, Replace, Split all take care to handle the empty string correctly. I am sorry it surprised you, but it's working as intended. Like I said, it's fine to change the docs. |
CL https://golang.org/cl/135760043 mentions this issue. |
This issue was closed by revision 2c121b6. Status changed to Fixed. |
wheatman
pushed a commit
to wheatman/go-akaros
that referenced
this issue
Jun 25, 2018
Fixes golang#8143. LGTM=r R=rsc, bradfitz, r CC=golang-codereviews https://golang.org/cl/135760043
wheatman
pushed a commit
to wheatman/go-akaros
that referenced
this issue
Jul 9, 2018
Fixes golang#8143. LGTM=r R=rsc, bradfitz, r CC=golang-codereviews https://golang.org/cl/135760043
This issue was closed.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
The text was updated successfully, but these errors were encountered: