New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize Write([]byte(stringVal)) to not copy the string #18822

Open
ianlancetaylor opened this Issue Jan 27, 2017 · 10 comments

Comments

Projects
None yet
6 participants
@ianlancetaylor
Contributor

ianlancetaylor commented Jan 27, 2017

It would be nice to optimize Write([]byte(stringVal)) to not copy the string value. This is normally safe because most Write methods do not modify the byte slice passed in. In fact, the documentation for io.Writer requires that the Write method not modify the byte slice, although this is not (and can not be) enforced.

Here is how we can do this.

For each function and method that takes a parameter of slice type, record whether the slice's underlying array is modified. This will require generating annotations similar to the ones we generate for escape analysis. Assembly functions will require an annotation similar to //go:noescape. (Naturally, a slice that escapes must be treated as modified.)

For each call of the form F([]byte(stringVal)), where we know that F does not modify the slice, we can pass the string value directly. This would do essentially the same thing as the existing optimization for map lookups in which a []byte is converted to a string. This fixes direct calls, but of course the interesting cases all involve calls to methods of values of type io.Writer.

For any type with a Write method that does modify the slice, generate at compile time an additional Write·2 method that makes a copy of the slice and then calls the Write method. The method is named Write·2 to ensure that it does not conflict with any user written method.

When converting any type to io.Writer (or any interface type that inherits from io.Writer), check for the presence of a Write·2 method. If that method exists, add it to the interface as an additional entry in the itab.

For any call to an interface method Write([]byte(stringVal)), modify the call to call a special function doWrite in the io package, without copying the string. doWrite will check for a Write·2 method, and call it if it exists; that will cause the string to be copied as happens today. If the Write·2 method does not exist, which will be the normal case, doWrite will call the Write method as usual, knowing that the Write method does not modify the slice.

Generalizing this to other methods is left as an exercise for the reader.

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Jan 27, 2017

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jan 27, 2017

I feel like there was a dup bug with the same plan but rejected because as I recall: nobody should be working with such large strings and stay in bytes if they want performance.

Was this motivated by something concrete?

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jan 27, 2017

There was no specific motivation; it was just an idea I had.

@philhofer

This comment has been minimized.

Contributor

philhofer commented Jan 27, 2017

Your compiler annotation for Write sounds a lot like the compiler secretly interpreting Write([]byte) to be Write(immutable []byte); the same could be done for other pointer arguments that are provably unmodified.

Such an (internal) annotation would have additional benefits:

  • Methods that do not modify their receiver won't require callers to re-load fields (I don't know how common this is or how much it would matter, but it's something)
  • Functions/methods with only immutable arguments (including implicit globals) may be hoisted, etc. (Similar to __attribute__((const)) and __attribute__((pure)).

@dsnet dsnet added the Performance label Jan 27, 2017

@dsnet

This comment has been minimized.

Member

dsnet commented Jan 27, 2017

This would remove the need for my attempted optimization in #13848, which tried to optimize string writing at the standard library level.

@dsnet

This comment has been minimized.

Member

dsnet commented Jan 27, 2017

How does this work for nested calls of io.Writer?

func (t *T) Write(p []byte) (int, error) {
	return t.wr.Write(p)
}

You do not know if T.Write mutates p or not because it depends on whether t.wr.Write mutates p. To be conservative, the compiler will need to assume that it is mutated and then it will emit T.Write·2, removing any benefit from the proposed optimization.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jan 27, 2017

@dsnet For a case like that you turn T.Write·2 into a copy of the entire method, that passes p along uninterpreted for possibly copying by t.wr.Write·2.

@dsnet

This comment has been minimized.

Member

dsnet commented Jan 27, 2017

@ianlancetaylor, Doesn't that approach fail for this example (that nobody should ever write):

	t := &T{wr: ...}
	t.Write([]byte("hello"))

func (t *T) Write(p []byte) (int, error) {
	n, err := t.wr.Write(p) // Assume t.wr.Write mutates p, but not known at compile time
	fmt.Println(p[0]) // Read p, expect to see mutated value
	return n, err
}

So T.Write·2 is a copy of T.Write but the call to t.wr.Write is replaced by doWrite(t.wr, p). If t.wr.Write does mutate p, then we are guaranteed that the "hello" string is not accidentally mutated (correct). However, the fmt.Println(p[0]) will not be able to observe the mutation that t.wr.Write caused because of the copy (wrong).

I guess that this is solvable if doWrite was changed such that the new copy was known to the caller. Something like func doWrite(w io.Writer, p *[]byte)

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jan 27, 2017

You're right: in general, if the method does something with the buffer after passing it to Write, it may not be possible to apply this optimization to callers of that method.

@minux

This comment has been minimized.

Member

minux commented Jan 28, 2017

@ALTree

This comment has been minimized.

Member

ALTree commented Feb 1, 2017

Related to #2205 (cmd/compile: read-only escape analysis and avoiding string <-> []byte copies).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment