encoding/binary: use an offset instead of slicing
While running make.bash, over 5% of all pointer writes
come from encoding/binary doing struct reads.

This change replaces slicing during such reads with an offset.
This avoids updating the slice pointer with every
struct field read or write.

This has no impact when the write barrier is off.
Running the benchmarks with GOGC=1, however,
shows significant improvement:

name          old time/op    new time/op    delta
ReadStruct-8    13.2µs ± 6%    10.1µs ± 5%  -23.24%  (p=0.000 n=10+10)

name          old speed      new speed      delta
ReadStruct-8  5.69MB/s ± 6%  7.40MB/s ± 5%  +30.18%  (p=0.000 n=10+10)

Change-Id: I22904263196bfeddc38abe8989428e263aee5253
Reviewed-on: https://go-review.googlesource.com/98757
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
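As a rough way to reproduce the numbers above, the package's ReadStruct benchmark can be run with GOGC=1 so the collector, and hence the write barrier, stays active (for example: GOGC=1 go test -bench=ReadStruct encoding/binary). A minimal stand-in for such a benchmark, with an invented struct and names rather than the package's actual test fixture, might look like:

package binarybench

import (
	"bytes"
	"encoding/binary"
	"testing"
)

// readStructInput is an invented fixture; the real benchmark decodes the
// package's own test struct.
type readStructInput struct {
	A uint8
	B uint16
	C uint32
	D uint64
	E [4]int32
}

func BenchmarkDecodeStruct(b *testing.B) {
	var s readStructInput
	buf := make([]byte, binary.Size(&s))
	b.SetBytes(int64(len(buf)))
	for i := 0; i < b.N; i++ {
		// binary.Read walks the struct via the decoder shown in the diff,
		// reading one field at a time.
		binary.Read(bytes.NewReader(buf), binary.LittleEndian, &s)
	}
}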
josharian committed Mar 6, 2018
1 parent f7739c0 commit b854339
Showing 1 changed file with 29 additions and 27 deletions.
src/encoding/binary/binary.go
@@ -419,70 +419,71 @@ func sizeof(t reflect.Type) int {
 }
 
 type coder struct {
-	order ByteOrder
-	buf   []byte
+	order  ByteOrder
+	buf    []byte
+	offset int
 }
 
 type decoder coder
 type encoder coder
 
 func (d *decoder) bool() bool {
-	x := d.buf[0]
-	d.buf = d.buf[1:]
+	x := d.buf[d.offset]
+	d.offset++
 	return x != 0
 }
 
 func (e *encoder) bool(x bool) {
 	if x {
-		e.buf[0] = 1
+		e.buf[e.offset] = 1
 	} else {
-		e.buf[0] = 0
+		e.buf[e.offset] = 0
 	}
-	e.buf = e.buf[1:]
+	e.offset++
 }
 
 func (d *decoder) uint8() uint8 {
-	x := d.buf[0]
-	d.buf = d.buf[1:]
+	x := d.buf[d.offset]
+	d.offset++
 	return x
 }
 
 func (e *encoder) uint8(x uint8) {
-	e.buf[0] = x
-	e.buf = e.buf[1:]
+	e.buf[e.offset] = x
+	e.offset++
 }
 
 func (d *decoder) uint16() uint16 {
-	x := d.order.Uint16(d.buf[0:2])
-	d.buf = d.buf[2:]
+	x := d.order.Uint16(d.buf[d.offset : d.offset+2])
+	d.offset += 2
 	return x
 }
 
 func (e *encoder) uint16(x uint16) {
-	e.order.PutUint16(e.buf[0:2], x)
-	e.buf = e.buf[2:]
+	e.order.PutUint16(e.buf[e.offset:e.offset+2], x)
+	e.offset += 2
 }
 
 func (d *decoder) uint32() uint32 {
-	x := d.order.Uint32(d.buf[0:4])
-	d.buf = d.buf[4:]
+	x := d.order.Uint32(d.buf[d.offset : d.offset+4])
+	d.offset += 4
 	return x
 }
 
 func (e *encoder) uint32(x uint32) {
-	e.order.PutUint32(e.buf[0:4], x)
-	e.buf = e.buf[4:]
+	e.order.PutUint32(e.buf[e.offset:e.offset+4], x)
+	e.offset += 4
 }
 
 func (d *decoder) uint64() uint64 {
-	x := d.order.Uint64(d.buf[0:8])
-	d.buf = d.buf[8:]
+	x := d.order.Uint64(d.buf[d.offset : d.offset+8])
+	d.offset += 8
 	return x
 }
 
 func (e *encoder) uint64(x uint64) {
-	e.order.PutUint64(e.buf[0:8], x)
-	e.buf = e.buf[8:]
+	e.order.PutUint64(e.buf[e.offset:e.offset+8], x)
+	e.offset += 8
 }
 
 func (d *decoder) int8() int8 { return int8(d.uint8()) }
@@ -646,15 +647,16 @@ func (e *encoder) value(v reflect.Value) {
 }
 
 func (d *decoder) skip(v reflect.Value) {
-	d.buf = d.buf[dataSize(v):]
+	d.offset += dataSize(v)
 }
 
 func (e *encoder) skip(v reflect.Value) {
 	n := dataSize(v)
-	for i := range e.buf[0:n] {
-		e.buf[i] = 0
+	zero := e.buf[e.offset : e.offset+n]
+	for i := range zero {
+		zero[i] = 0
 	}
-	e.buf = e.buf[n:]
+	e.offset += n
 }
 
 // intDataSize returns the size of the data required to represent the data when encoded.
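In essence, the diff above trades one pointer write per field for one integer write per field. A condensed, annotated sketch of the two patterns, with type names invented for the sketch and only the uint8 reader shown:

package pattern

// slicingDecoder mirrors the old code: every read re-slices buf, which
// stores a new data pointer into the decoder. When the decoder is
// heap-allocated and the garbage collector is running, that pointer
// store goes through the write barrier.
type slicingDecoder struct {
	buf []byte
}

func (d *slicingDecoder) uint8() uint8 {
	x := d.buf[0]
	d.buf = d.buf[1:] // pointer write: write barrier when GC is active
	return x
}

// offsetDecoder mirrors the new code: buf is set once, and each read
// only advances an integer offset, a plain store with no barrier.
type offsetDecoder struct {
	buf    []byte
	offset int
}

func (d *offsetDecoder) uint8() uint8 {
	x := d.buf[d.offset]
	d.offset++ // integer write only
	return x
}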

6 comments on commit b854339

@navytux (Contributor) commented on b854339 Mar 7, 2018

@josharian, thinking out loud: maybe it would be better to instead teach the compiler that a slice update whose new data pointer still points into the same underlying array object as the original slice, e.g. as in

buf = buf[1:]

does not need a write barrier at all?

The rationale (to my limited understanding of write barriers) is that it is the whole allocated object that is shaded by the write barrier:

//     writePointer(slot, ptr):
//         shade(*slot)
//         if current stack is grey:
//             shade(ptr)
//         *slot = ptr
...
// 1. shade(*slot) prevents a mutator from hiding an object by moving
// the sole pointer to it from the heap to its stack. If it attempts
// to unlink an object from the heap, this will shade it.
//
// 2. shade(ptr) prevents a mutator from hiding an object by moving
// the sole pointer to it from its stack into a black object in the
// heap. If it attempts to install the pointer into a black object,
// this will shade it.

(https://github.com/golang/go/blob/d7eb4901/src/runtime/mbarrier.go#L21)

func shade(b uintptr) {
        if obj, span, objIndex := findObject(b, 0, 0); obj != 0 {
        ...

(https://github.com/golang/go/blob/d7eb4901/src/runtime/mgcmark.go#L1211)

func findObject(p, refBase, refOff uintptr) (base uintptr, s *mspan, objIndex uintptr) {
        s = spanOf(p)
        ...

(https://github.com/golang/go/blob/d7eb4901/src/runtime/mbitmap.go#L354)

and so for buf = buf[1:] the slice, both before and after the update, points into the same allocated object -> thus the mutator certainly cannot hide the allocated object from the GC, and so there is no need to shade it for such updates.

Doing so in the compiler would fix the performance issue this patch solves, and would automatically remove write barriers in other, probably many, places, helping not only speed but also code size (#6853).

I apologize if there is something trivial preventing this that I missed.

Kirill

/cc @aclements
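A tiny sketch, not part of the original comment, of the pointer-arithmetic half of this observation: before and after a same-object re-slice, the slice's data pointer stays inside the same allocation, which is exactly what findObject resolves.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	buf := make([]byte, 16) // one allocation backs the slice
	before := unsafe.Pointer(&buf[0])

	buf = buf[1:] // the slice's data pointer advances by one byte...
	after := unsafe.Pointer(&buf[0])

	// ...but both values still point into the same underlying allocation.
	fmt.Println(uintptr(after) - uintptr(before)) // 1
}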

@aclements (Member) commented

The danger here is subtle. We haven't done this because the update to buf could be racy and if we did this that race could break GC. Specifically, if one goroutine executes buf = buf[1:] without a write barrier while another executes buf = x with a write barrier, it's possible for the final value of buf to continue pointing to the original buf allocation, but for the garbage collector to think it points to the x allocation and free the original buf allocation.

Of course, racy Go programs are undefined, but right now the garbage collector is immune to these sorts of non-type-safety-breaking "benign" races. This is obviously a somewhat fuzzy argument: races let you break type safety, and that lets you break memory safety, and saying anything formal about "benign" races is notoriously impossible, so what I'm saying is probably not true anyway. But we try. :)
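To make the interleaving concrete, here is a schematic of the scenario being described; it is deliberately racy and purely illustrative, and x, the sizes, and the goroutine layout are invented for the sketch.

package main

// Both goroutines store to the same slice header. Goroutine 1's store keeps
// buf inside its original allocation; goroutine 2's store points it at a
// different allocation.
var (
	buf = make([]byte, 64) // original allocation
	x   = make([]byte, 64) // a different allocation
)

func main() {
	done := make(chan struct{}, 2)

	go func() {
		// Re-slice within the same allocation; the proposal is to omit
		// the write barrier for this store.
		buf = buf[1:]
		done <- struct{}{}
	}()

	go func() {
		// Store a pointer to a different allocation; this store gets a
		// write barrier and tells the GC about x.
		buf = x
		done <- struct{}{}
	}()

	<-done
	<-done
	// If the barrier-free store lands last, buf ends up pointing at the
	// original allocation while the GC was only told about x; that is the
	// hazard described above.
	_ = buf
}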

@dgryski (Contributor) commented on b854339 Mar 8, 2018

@aclements It would be interesting to see the effect such an update would have on code size and performance to know how much we're leaving on the table with this safety check on.

@navytux (Contributor) commented on b854339 Mar 8, 2018

@aclements thanks for explaining. Am I right that, with write barriers enabled for both racy buf updates, the mechanism that prevents the GC from deallocating the wrong object is that in the current GC the written ptr is always shaded too (https://github.com/golang/go/blob/go1.10-1-g678dede7bc/src/runtime/mbarrier.go#L162 (go1.10), https://github.com/golang/go/blob/0add9a4dcf/src/runtime/mwbbuf.go#L226 (tip))?

By the way, the thread that performs buf = x with a write barrier will always do shade(*slot). If buf = buf[1:] races with buf = x, then even if the former does not use a write barrier, the shade(*slot) of the buf = x thread will shade the original buf object in any case (it sees either buf or buf+1 as *slot, but both lead to the same allocated object). If so, then since shade greys an object and grey objects never become white - only eventually black - we can say that the original buf underlying object will stay alive, not deallocated, and so it is safe to do buf = buf[1:] without a write barrier.

But this goes contrary to what you say, and so there is probably some mistake on my side. Could you please explain where the above logic fails?

I'm a GC newbie, so please forgive me if I'm missing something well-known.

@josharian (Contributor, Author) commented

@navytux would you mind opening a new issue and migrating this (interesting) discussion there? We tend to try to keep all conversation on CLs and issues. Thanks!

@navytux (Contributor) commented on b854339 Mar 8, 2018

@josharian, good idea -> #24314.
