Implement str.replace #168
trotterdylan
left a comment
Thanks for taking this on! I didn't comment on the stuff relating to #167.
| func strReplace(f *Frame, args Args, _ KWArgs) (*Object, *BaseException) { | ||
| // TODO: Support unicode replace. | ||
| expectedTypes := []*Type{StrType, StrType, StrType, ObjectType} | ||
| argc, n := len(args), -1 |
It's better not to group variables that are unrelated since it's harder to find their definition.
Thank you, makes sense.
| } | ||
| s := toStrUnsafe(args[0]).Value() | ||
| old := toStrUnsafe(args[1]).Value() | ||
| new := toStrUnsafe(args[2]).Value() |
Try to avoid using names that collide with builtin names.
Sure, too much copying from CPython.
| s := toStrUnsafe(args[0]).Value() | ||
| old := toStrUnsafe(args[1]).Value() | ||
| new := toStrUnsafe(args[2]).Value() | ||
| // CPython only, pypy supposed to be same as Go |
Can you elaborate on this comment?
I haven't really tested on PyPy, but it looks like it behaves differently.
http://bugs.python.org/issue28029
assert ''.replace('', 'x') == 'x'
assert ''.replace('', 'x', 1000) == ''
| if len(s) == 0 && len(old) == 0 && n >= 0 { | ||
| return NewStr("").ToObject(), nil | ||
| } | ||
| return NewStr(strings.Replace(s, old, new, n)).ToObject(), nil |
I think strings.Replace() may not be compatible with the Python version: when old == "", strings.Replace() matches at the beginning of the string and after each UTF-8 sequence, whereas Python matches after each byte. So if the string contains multi-byte UTF-8 sequences, the results will differ (this is a good test case to have, as well).
Oh, I haven't thought about UTF-8 yet, or unicode.
I am not sure which UTF-8 sequences are best to test, but I added some and the tests still pass.
aa6852d to 4c614b0
4c614b0 to b68b367
Thanks for the changes! The issue I'm pointing out is that Go's strings.Replace() treats the input as a UTF-8 encoded string, so if the old string is empty it matches only around whole UTF-8 sequences. The Go example outputs: [0 208 178 0 208 190 0 208 187 0] vs. CPython, which matches after each byte. I don't think it's a good idea to use strings.Replace() because of these subtle incompatibilities.
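The byte output quoted above can be reproduced with a small program. This is only a sketch: the input "вол" and the "\x00" replacement are assumptions, chosen because they produce exactly the bytes listed in the comment, and the helper name insertAtEmptyMatches is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// insertAtEmptyMatches (hypothetical helper) returns the bytes produced by
// replacing every match of the empty string in s with sep.
// strings.Replace matches "" at the start of the string and after each
// UTF-8 sequence, so sep never lands inside a multi-byte character.
func insertAtEmptyMatches(s, sep string) []byte {
	return []byte(strings.Replace(s, "", sep, -1))
}

func main() {
	// "вол" is three characters, each encoded as two bytes in UTF-8.
	fmt.Println(insertAtEmptyMatches("вол", "\x00"))
	// prints [0 208 178 0 208 190 0 208 187 0]
}
```

A byte-oriented replace, as described for CPython in this thread, would instead place a 0 between 208 and 178 as well.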
Hmm, I see. Does that mean I need to implement it by looping through bytes?
Looks like there's a bytes replace: https://golang.org/pkg/bytes/#Replace
Thank you, but it mentions the same thing:
"If old is empty, it matches at the beginning of the slice and after each UTF-8 sequence"
Is the issue only with an empty old string? Then I can just write a loop for that.
It's probably worth looking at the Go source code to make sure that's the only peculiarity. But yeah, maybe that's the best approach.
Updated. I checked the source of both strings.Replace and bytes.Replace. Both appear to special-case len(old) == 0, so I added a loop that copies manually for the len(old) == 0 case.
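To illustrate why switching to bytes.Replace would not have helped, here is a minimal check that both standard-library functions treat an empty old value the same way. The input "вол" and the helper names are assumptions made for this sketch, not code from the PR.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// emptyOldViaBytes (hypothetical helper) replaces every empty match using bytes.Replace.
func emptyOldViaBytes(s, sub string) []byte {
	return bytes.Replace([]byte(s), []byte(""), []byte(sub), -1)
}

// emptyOldViaStrings (hypothetical helper) does the same using strings.Replace.
func emptyOldViaStrings(s, sub string) []byte {
	return []byte(strings.Replace(s, "", sub, -1))
}

func main() {
	a := emptyOldViaBytes("вол", "\x00")
	b := emptyOldViaStrings("вол", "\x00")
	// Both functions step over whole UTF-8 sequences when old is empty,
	// so the results are identical.
	fmt.Println(bytes.Equal(a, b))
}
```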
trotterdylan
left a comment
Thanks for working on this! I have a few suggestions.
| s := toStrUnsafe(args[0]).Value() | ||
| old := toStrUnsafe(args[1]).Value() | ||
| sub := toStrUnsafe(args[2]).Value() | ||
| // CPython only, pypy supposed to be same as Go. |
I'm a little confused about the meaning of this comment. Could you try to make the message a little clearer?
| old := toStrUnsafe(args[1]).Value() | ||
| sub := toStrUnsafe(args[2]).Value() | ||
| // CPython only, pypy supposed to be same as Go. | ||
| if len(s) == 0 && len(old) == 0 && n >= 0 { |
I'm probably overlooking something but don't we always want to return "" when len(s) is zero?
"".replace("", "x") returns "x", but "".replace("", "x", 1) returns "" in CPython, so I think the check is needed.
| buf.WriteString(sub) | ||
| n-- | ||
| } | ||
| for i := 0; i < len(s); i++ { |
Store len(s) as a temporary numBytes or something so it's not recomputed every iteration.
| if n == 0 { | ||
| return NewStr(s).ToObject(), nil | ||
| } | ||
| if len(old) == 0 { |
Maybe switch the conditional to len(old) > 0 and do the strings.Replace() call to exit the function early. That would mean this big block of code doesn't have to be indented and it's probably easier to read.
| if n == 0 { | ||
| return NewStr(s).ToObject(), nil | ||
| } | ||
| if len(old) == 0 { |
Worth adding a comment about why we do all this extra work instead of using strings.Replace()
| if n == 0 { | ||
| return NewStr(s).ToObject(), nil | ||
| } | ||
| if len(old) == 0 { |
Prefer old == "" to len(old) == 0
| n-- | ||
| } else { | ||
| i++ | ||
| if i < len(s) { |
Can we do this branch after the loop? I think the code would read more clearly: do the required replacements, then write the remainder of the string.
I think things would be tidier if your loop condition dictated when you exit the loop:
i := 0
for n > 0 && i < numBytes {
// .. replacement logic
i++
n--
}
As written it's not immediately obvious what the termination condition is.
Great idea, thank you.
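For readers following along, the suggestions in this review (early handling of the trivial cases, a numBytes temporary, a loop whose condition states when it stops, and writing the remainder after the loop) might combine roughly as follows. This is a sketch under assumptions, not the code that was merged: the function name replaceEmptyOld is hypothetical, and the expected results are taken from the CPython behavior described in this thread.

```go
package main

import (
	"bytes"
	"fmt"
)

// replaceEmptyOld (hypothetical) mimics str.replace("", sub, n) as described
// in this thread: matches occur before every byte and at the end of s,
// rather than around UTF-8 sequences as in strings.Replace.
// A negative n means "no limit".
func replaceEmptyOld(s, sub string, n int) string {
	numBytes := len(s)
	if n == 0 {
		return s
	}
	// CPython quirk quoted in the thread: ''.replace('', 'x', 1000) == ''.
	if numBytes == 0 && n > 0 {
		return ""
	}
	// At most numBytes+1 matches exist.
	if n < 0 || n > numBytes+1 {
		n = numBytes + 1
	}
	var buf bytes.Buffer
	i := 0
	// Do the required replacements; the condition is the exit rule.
	for n > 0 && i < numBytes {
		buf.WriteString(sub)
		buf.WriteByte(s[i])
		i++
		n--
	}
	if n > 0 {
		// All bytes consumed and a match remains: the one at the end.
		buf.WriteString(sub)
	}
	// Write whatever is left of the input (empty if fully consumed).
	buf.WriteString(s[i:])
	return buf.String()
}

func main() {
	fmt.Println(replaceEmptyOld("abc", "x", -1)) // prints xaxbxcx
	fmt.Println(replaceEmptyOld("abc", "x", 2))  // prints xaxbc
}
```

Non-empty old values are not covered here; per the discussion above, those can be handed to strings.Replace() before this path is reached.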
Updated those. Please take a look again.
trotterdylan
left a comment
This is great! Thanks for the implementation and for the thorough test cases. Merging.
Implementing str.replace.
This will have some updates from #167, so this is WIP for now.