cmd/compile: read-only escape analysis and avoiding string <-> []byte copies #2205

Open
bradfitz opened this issue Aug 29, 2011 · 22 comments

Comments

@bradfitz
Member

commented Aug 29, 2011

Many functions take a []byte but only read from it.

If the escape analysis code could flag parameters of a function as read-only, then code
which passes in a []byte(string) conversion could be cheaper.

bytes.IndexByte is one example.

For example, http/sniff.go does:

// -------------------
  // Index of the first non-whitespace byte in data.                                                                                               
  firstNonWS := 0
  for ; firstNonWS < len(data) && isWS(data[firstNonWS]); firstNonWS++ {
  }

func isWS(b byte) bool {
        return bytes.IndexByte([]byte("\t\n\x0C\n "), b) != -1
}

// -------------------

But it's faster to avoid re-creating the []byte:

func BenchmarkIndexByteLiteral(b *testing.B) {
        for i := 0; i < b.N; i++ {
                IndexByte([]byte("\t\n\x0C\n "), 'x')
        }
}

func BenchmarkIndexByteVariable(b *testing.B) {
        var whitespace = []byte("\t\n\x0C\n ")
        for i := 0; i < b.N; i++ {
                IndexByte(whitespace, 'x')
        }
}

bytes_test.BenchmarkIndexByteLiteral    20000000           125 ns/op
bytes_test.BenchmarkIndexByteVariable   100000000           25.4 ns/op

Related is issue #2204.
@gopherbot


commented Aug 30, 2011

Comment 1 by jp@webmaster.ms:

It seems much easier to add a way to take zerocopish string slices from
byte slices/arrays.
As far as I understand, string slices are actually read-only, cap()-less byte slices:
http://research.swtch.com/2009/11/go-data-structures.html
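To make the "zerocopish" idea concrete, here is a minimal sketch of viewing a byte slice as a string without copying. The helper name bytesAsString is hypothetical, and the trick relies on the unsafe package and on the slice and string headers sharing their first two words, so it is an illustration of the concept rather than something the language guarantees. It is only sound if the caller never modifies the slice afterwards.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesAsString is a hypothetical helper (not part of any library).
// It reinterprets the slice header as a string header, so the result
// shares memory with b instead of copying it. The caller must not
// modify b while the returned string is in use.
func bytesAsString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	buf := []byte("hello, gopher")
	s := bytesAsString(buf) // no copy: s points at buf's backing array
	fmt.Println(s[:5], len(s))
}
```

The ordinary conversion string(buf) is the safe alternative; it copies precisely because the compiler cannot generally prove that buf stays unmodified.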
@bradfitz

Member Author

commented Aug 30, 2011

Comment 2:

zerocopish?
@rsc

Contributor

commented Aug 30, 2011

Comment 3:

Can you be more specific?
Examples of programs that you think should be
handled specially would be a good way to do that.
@rsc

Contributor

commented Sep 15, 2011

Comment 4:

Owner changed to @rsc.

Status changed to Accepted.

@lvdlvd


commented Nov 7, 2011

Comment 5:

Labels changed: added compilerbug, performance.

@rsc

Contributor

commented Dec 9, 2011

Comment 6:

Labels changed: added priority-later.

@rsc

Contributor

commented Sep 12, 2012

Comment 8:

Labels changed: added go1.1maybe.

@bradfitz

Member Author

commented Oct 25, 2012

Comment 9:

I run into this often, but I should start listing examples.
In goprotobuf, text.go calls writeString with both a string and a []byte converted to a
string:
// writeAny writes an arbitrary field.
func writeAny(w *textWriter, v reflect.Value, props *Properties) {
        v = reflect.Indirect(v)
        // We don't attempt to serialise every possible value type; only those
        // that can occur in protocol buffers, plus a few extra that were easy.
        switch v.Kind() {
        case reflect.Slice:
                // Should only be a []byte; repeated fields are handled in writeStruct.
                writeString(w, string(v.Interface().([]byte)))
        case reflect.String:
                writeString(w, v.String())
Note that the function writeString really just wants a read-only slice of bytes:
func writeString(w *textWriter, s string) {
        w.WriteByte('"')
        // Loop over the bytes, not the runes.
        for i := 0; i < len(s); i++ {
                // Divergence from C++: we don't escape apostrophes.
                // There's no need to escape them, and the C++ parser
                // copes with a naked apostrophe.
                switch c := s[i]; c {
                case '\n':
                        w.Write([]byte{'\\', 'n'})
                case '\r':
                        w.Write([]byte{'\\', 'r'})
                case '\t':
                        w.Write([]byte{'\\', 't'})
                case '"':
                        w.Write([]byte{'\\', '"'})
                case '\\':
                        w.Write([]byte{'\\', '\\'})
                default:
                        if isprint(c) {
                                w.WriteByte(c)
                        } else {
                                fmt.Fprintf(w, "\\%03o", c)
                        }
                }
        }
        w.WriteByte('"')
}
It doesn't matter whether it's frozen (like a string) or writable (like a []byte).  But
Go lacks that type, so instead it'd be nice to write writeString with a []byte parameter
and invert the switch above to be like:
        switch v.Kind() {
        case reflect.Slice:
                // Should only be a []byte; repeated fields are handled in writeStruct.
                writeString(w, v.Interface().([]byte))
        case reflect.String:
                writeString(w, []byte(v.String())) // no copy!
Here the []byte(v.String()) would just make a slice header pointing into the string's
memory, since the compiler can verify that writeString never mutates its slice.
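For comparison, here is a minimal sketch of one way to write a function that only reads bytes and accepts either a string or a []byte without a conversion at the call site. It assumes Go 1.18 or later (type parameters), which did not exist when this comment was written. The names byteSeq and quoteBytes are hypothetical, and the escaping is a simplified subset of what writeString does. This is not the compiler optimization requested in this issue, only a source-level workaround.

```go
package main

import (
	"fmt"
	"strings"
)

// byteSeq is a hypothetical constraint matching anything that can be
// indexed to yield bytes: strings and byte slices.
type byteSeq interface {
	~string | ~[]byte
}

// quoteBytes reads s byte by byte and never writes to it, so the same
// code serves both argument types without a string <-> []byte copy.
func quoteBytes[T byteSeq](s T) string {
	var sb strings.Builder
	sb.WriteByte('"')
	for i := 0; i < len(s); i++ {
		switch c := s[i]; c {
		case '\n':
			sb.WriteString(`\n`)
		case '"':
			sb.WriteString(`\"`)
		case '\\':
			sb.WriteString(`\\`)
		default:
			sb.WriteByte(c)
		}
	}
	sb.WriteByte('"')
	return sb.String()
}

func main() {
	fmt.Println(quoteBytes("say \"hi\""))         // string argument
	fmt.Println(quoteBytes([]byte("two\nlines"))) // []byte argument
}
```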
@bradfitz

Member Author

commented Nov 17, 2012

Comment 10:

See patch and CL description at http://golang.org/cl/6850067 for the opposite
but very related case: strconv.ParseUint, ParseBool, etc take a string but calling code
has a []byte.
@robpike

Contributor

commented Mar 7, 2013

Comment 11:

Labels changed: removed go1.1maybe.

@rsc

Contributor

commented Mar 12, 2013

Comment 12:

[The time for maybe has passed.]
@bradfitz

Member Author

commented Mar 31, 2013

Comment 13:

People who like this bug also like issue #3512 (cmd/gc: optimized map[string] lookup from
[]byte key)
@dvyukov

Member

commented Mar 31, 2013

Comment 14:

FWIW
var m map[string]int
var b []byte
_ = m[string(b)]
This case must be radically simpler to implement than general read-only analysis.
It's a peephole optimization: when the compiler sees such code, it can generate a hacky
string object using the []byte pointer.
Another example would be len(string(b)), but this seems useless.
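A small sketch of how one might observe whether a given toolchain avoids the copy for this map-lookup pattern. The lookup helper and the sample key are made up for illustration; testing.AllocsPerRun is the standard library function for counting allocations. The allocation count is printed rather than assumed, since it depends on the compiler version.

```go
package main

import (
	"fmt"
	"testing"
)

// lookup is a hypothetical helper showing the pattern from the comment
// above: a []byte key converted to string only for the map index.
func lookup(m map[string]int, key []byte) int {
	return m[string(key)]
}

func main() {
	m := map[string]int{"content-type": 1}
	key := []byte("content-type")

	// Average number of heap allocations per call to lookup.
	allocs := testing.AllocsPerRun(1000, func() {
		_ = lookup(m, key)
	})
	fmt.Println("value:", lookup(m, key), "allocs per lookup:", allocs)
}
```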
@bradfitz

Member Author

commented Mar 31, 2013

Comment 15:

Yes. That's why they're separate bugs.
I want issue #3512 first, because it's easy.
@bradfitz

Member Author

commented Apr 26, 2013

Comment 16:

Labels changed: added garbage.

@rsc

Contributor

commented Jul 30, 2013

Comment 18:

Labels changed: added priority-someday, removed priority-later.

@rsc

Contributor

commented Dec 4, 2013

Comment 19:

Labels changed: added repo-main.

@rsc

Contributor

commented Mar 3, 2014

Comment 20:

Adding Release=None to all Priority=Someday bugs.

Labels changed: added release-none.

@gopherbot


commented Oct 30, 2018

Change https://golang.org/cl/146018 mentions this issue: strings: declare Index as noescape

gopherbot pushed a commit that referenced this issue Oct 30, 2018
strings: declare IndexByte as noescape
This lets []byte->string conversions which are used as arguments to
strings.IndexByte and friends have their backing store allocated on
the stack.

It only prevents allocation when the string is small enough (32
bytes), so it isn't perfect. But reusing the []byte backing store
directly requires a bunch more compiler analysis (see #2205 and
related issues).

Fixes #25864.

Change-Id: Ie52430422196e3c91e5529d6e56a8435ced1fc4c
Reviewed-on: https://go-review.googlesource.com/c/146018
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@josharian

Contributor

commented Jan 18, 2019

I run into this often, but I should start listing examples.

Good idea. From #29802/#29810: encoding/hex.Decode.

@cespare

Contributor

commented Jan 18, 2019

This comes up with hash functions (example: sha256.Sum256([]byte(s))).

For github.com/cespare/xxhash in addition to Sum64 I added Sum64String which does the unsafe string-to-slice conversion:

https://github.com/cespare/xxhash/blob/3767db7a7e183c0ad3395680d460d0b61534ae7b/xxhash_unsafe.go#L28-L35

but I'd rather the caller just be able to use Sum64([]byte(s)) and have the compiler optimize it.
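As a rough illustration of the workaround described here, the sketch below pairs a []byte hashing function with a string variant that avoids the copy. It is not the xxhash code: sum64 and sum64String are hypothetical names, FNV-1a from the standard library stands in for the real hash, and the conversion assumes Go 1.20 or later for unsafe.StringData and unsafe.Slice.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"unsafe"
)

// sum64 is a stand-in hash over a byte slice (FNV-1a, for illustration).
func sum64(b []byte) uint64 {
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

// sum64String hashes a string without copying it into a new []byte.
// The slice aliases the string's memory and must never be written to.
func sum64String(s string) uint64 {
	if len(s) == 0 {
		return sum64(nil)
	}
	b := unsafe.Slice(unsafe.StringData(s), len(s))
	return sum64(b)
}

func main() {
	s := "gopher"
	// Both paths hash the same bytes; only the second one copies.
	fmt.Println(sum64String(s) == sum64([]byte(s)))
}
```

The cost of this approach is the duplicated API surface (one function per argument type), which is exactly what a compiler-side optimization of Sum64([]byte(s)) would remove.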

@go101


commented Apr 18, 2019

If read-only byte slices were supported, we could set the cap of a read-only byte slice to a negative value to indicate that the underlying bytes are immutable. []byte(aString) would then result in a read-only byte slice without duplicating the underlying bytes of aString. Converting the resulting byte slice back to a string, if its cap is negative, would also not need to duplicate the underlying bytes.

Without the read-only slice feature, is it possible to lazily duplicate the underlying bytes for the conversion []byte(aString) on demand?
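The lazy-duplication idea can be sketched at the library level as a copy-on-write wrapper. This is only an illustration of the concept, not a proposal for how the compiler or runtime would do it; the cowBytes type and its methods are hypothetical.

```go
package main

import "fmt"

// cowBytes is a hypothetical copy-on-write view of a string.
// Reads are served from the original string; the bytes are
// duplicated only when the first write happens.
type cowBytes struct {
	src    string
	buf    []byte
	copied bool
}

// At reads byte i, from the private copy if one exists.
func (c *cowBytes) At(i int) byte {
	if c.copied {
		return c.buf[i]
	}
	return c.src[i]
}

// Set writes byte i, duplicating the string's bytes on first use.
func (c *cowBytes) Set(i int, v byte) {
	if !c.copied {
		c.buf = []byte(c.src) // the deferred copy happens here
		c.copied = true
	}
	c.buf[i] = v
}

func main() {
	c := &cowBytes{src: "hello"}
	fmt.Println(string(c.At(0)), c.copied) // read only: no copy yet
	c.Set(0, 'j')
	fmt.Println(string(c.At(0)), c.copied, c.src) // copied; src intact
}
```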
