Skip to content
This repository was archived by the owner on Mar 23, 2023. It is now read-only.

Implement str.replace#168

Merged
trotterdylan merged 5 commits intogoogle:masterfrom
S-YOU:add_str_replace
Jan 21, 2017
Merged

Implement str.replace#168
trotterdylan merged 5 commits intogoogle:masterfrom
S-YOU:add_str_replace

Conversation

@S-YOU
Copy link
Copy Markdown
Contributor

@S-YOU S-YOU commented Jan 19, 2017

Implementing str.replace.
This will have some updates from #167, so this is WIP for now.

Copy link
Copy Markdown
Contributor

@trotterdylan trotterdylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! I didn't comment on the stuff relating to #167.

Comment thread runtime/str.go Outdated
func strReplace(f *Frame, args Args, _ KWArgs) (*Object, *BaseException) {
// TODO: Support unicode replace.
expectedTypes := []*Type{StrType, StrType, StrType, ObjectType}
argc, n := len(args), -1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better not to group variables that are unrelated since it's harder to find their definition.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, make sense.

Comment thread runtime/str.go Outdated
}
s := toStrUnsafe(args[0]).Value()
old := toStrUnsafe(args[1]).Value()
new := toStrUnsafe(args[2]).Value()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to avoid using names that collide with builtin names.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, too much copying from CPython.

Comment thread runtime/str.go Outdated
s := toStrUnsafe(args[0]).Value()
old := toStrUnsafe(args[1]).Value()
new := toStrUnsafe(args[2]).Value()
// CPython only, pypy supposed to be same as Go
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this comment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't really tested on pypy, but looks like it is different.
http://bugs.python.org/issue28029
assert ''.replace('', 'x') == 'x'
assert ''.replace('', 'x', 1000) == ''

Comment thread runtime/str.go Outdated
if len(s) == 0 && len(old) == 0 && n >= 0 {
return NewStr("").ToObject(), nil
}
return NewStr(strings.Replace(s, old, new, n)).ToObject(), nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think strings.Replace() may not be compatible with the Python version because when old == "", strings.Replace: matches at the beginning of the string and after each UTF-8 sequence whereas in Python it will match after each byte. So if the string contains UTF-8 sequences, the results will differ (this is a good test case to have, as well).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I haven't think about UTF-8 yet, also unicode.

Copy link
Copy Markdown
Contributor Author

@S-YOU S-YOU Jan 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very sure which utf8 sequence is best to test, but added some, tests still passed.

@S-YOU
Copy link
Copy Markdown
Contributor Author

S-YOU commented Jan 19, 2017

  • Rebased, fix reviewed issues
  • Added utf8 replace test cases, but not sure I am testing real issues.

@S-YOU S-YOU changed the title [WIP] Implement str.replace Implement str.replace Jan 19, 2017
@trotterdylan
Copy link
Copy Markdown
Contributor

Thanks for the changes! The issue I'm pointing out is that in Go strings.Replace() treats the input as a UTF-8 encoded string and if the old string is "" then it will add a replacement in between each UTF-8 code point. E.g.:

package main

import (
	"fmt"
	"strings"
)

func main() {
	fmt.Println([]byte(strings.Replace("\xd0\xb2\xd0\xbe\xd0\xbb", "", "\x00", -1)))
}

outputs: [0 208 178 0 208 190 0 208 187 0]

vs. CPython

>>> [ord(c) for c in"\xd0\xb2\xd0\xbe\xd0\xbb".replace('', '\0')]
[0, 208, 0, 178, 0, 208, 0, 190, 0, 208, 0, 187, 0]

I don't think it's a good idea to use strings.Replace() because of these subtle incompatibilities.

@S-YOU
Copy link
Copy Markdown
Contributor Author

S-YOU commented Jan 19, 2017

Hmm, I see. Is that mean, need to implement by looping through bytes?

@trotterdylan
Copy link
Copy Markdown
Contributor

trotterdylan commented Jan 19, 2017 via email

@S-YOU
Copy link
Copy Markdown
Contributor Author

S-YOU commented Jan 19, 2017

Thank you but it mention same thing,
If old is empty, it matches at the beginning of the slice and after each UTF-8 sequence

The issue is only on empty old string? then I can just write a loop for that.

@trotterdylan
Copy link
Copy Markdown
Contributor

trotterdylan commented Jan 19, 2017 via email

@S-YOU
Copy link
Copy Markdown
Contributor Author

S-YOU commented Jan 20, 2017

Updated. I checked both strings.Replace and bytes.Replace source. It appears that they doing special checking on len(old) == 0.

	            if len(old) == 0 {
			if i > 0 {
				_, wid := utf8.DecodeRuneInString(s[start:])
				j += wid
			}

So I added a loop to copy manually for len(old) == 0 case.
I was thinking to copy codes from go source, but a lot of code to be copied on.

Copy link
Copy Markdown
Contributor

@trotterdylan trotterdylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I have a few suggestions.

Comment thread runtime/str.go Outdated
s := toStrUnsafe(args[0]).Value()
old := toStrUnsafe(args[1]).Value()
sub := toStrUnsafe(args[2]).Value()
// CPython only, pypy supposed to be same as Go.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about the meaning of this comment. Could you try to make the message a little clearer?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try.

Comment thread runtime/str.go Outdated
old := toStrUnsafe(args[1]).Value()
sub := toStrUnsafe(args[2]).Value()
// CPython only, pypy supposed to be same as Go.
if len(s) == 0 && len(old) == 0 && n >= 0 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably overlooking something but don't we always want to return "" when len(s) is zero?

Copy link
Copy Markdown
Contributor Author

@S-YOU S-YOU Jan 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"".replace("", "x") returns "x", but "".replace("", "x",1) return "" in CPython, so it is needed imo.

Comment thread runtime/str.go Outdated
buf.WriteString(sub)
n--
}
for i := 0; i < len(s); i++ {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Store len(s) as a temporary numBytes or something so it's not recomputed every iteration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Comment thread runtime/str.go Outdated
if n == 0 {
return NewStr(s).ToObject(), nil
}
if len(old) == 0 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe switch the conditional to len(old) > 0 and do the strings.Replace() call to exit the function early. That would mean this big block of code doesn't have to be indented and it's probably easier to read.

Comment thread runtime/str.go Outdated
if n == 0 {
return NewStr(s).ToObject(), nil
}
if len(old) == 0 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a comment about why we do all this extra work instead of using strings.Replace()

Comment thread runtime/str.go Outdated
if n == 0 {
return NewStr(s).ToObject(), nil
}
if len(old) == 0 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer old == "" to len(old) == 0

Comment thread runtime/str.go Outdated
n--
} else {
i++
if i < len(s) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this branch after the loop? I think the code would read more clearly: do the required replacements, then write the remainder of the string.

I think things would be tidier if your loop condition dictated when you exit the loop:

i := 0
for n > 0 && i < numBytes {
  // .. replacement logic
  i++
  n--
}

As written it's not immediately obvious what the termination condition is.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, thank you.

@S-YOU
Copy link
Copy Markdown
Contributor Author

S-YOU commented Jan 21, 2017

Updated those, Please take a look again.

Copy link
Copy Markdown
Contributor

@trotterdylan trotterdylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks for the implementation and for the thorough test cases. Merging.

@trotterdylan trotterdylan merged commit a09bb56 into google:master Jan 21, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants