cmd/gc: make interface updates atomic wrt garbage collector #8405

Closed
rsc opened this Issue Jul 22, 2014 · 27 comments

Comments

Projects
None yet
6 participants
Contributor

rsc commented Jul 22, 2014

Today garbage collection treats interfaces as multiword objects: the type word tells
whether the data word is a pointer.

When we move to a concurrent garbage collector that effectively races with the other
goroutines, interface updates made by an ordinary goroutine must be seen atomically by
the concurrent collector. It must not be possible for the collector to observe a
half-updated interface value.

(The same general problem applies to slices and strings, but the solution will be
different, so they are a separate issue, issue #8404.)

There are at least three possibilities:

(1) Use double-word atomic operations to read and write any interface value stored in
the heap. Interface values on the stack can still be modified using non-atomic
operations, because stack scans happen only at safe points.

(2) Change interface representations so that the data word is always a pointer. The
garbage collector would ignore type and always scan data sa a pointer, reducing the read
to a single word, making it atomic.

(3) Change interface representations to be three words: type, ptr-data, non-ptr data.
The garbage collector would ignore type and always scan ptr-data as a pointer, reducing
the read to a single word, making it atomic.

The benefit of (1) is no representation changes. The drawback is that the atomic
operation may be too expensive and may not even be available on all systems (64-bit
Intel Atom chips?). We don't actually know how many interface updates occur on the stack
vs the heap. My guess is that probably more than half are on the stack, so maybe making
heap updates more expensive is not terrible.

The benefit of (2) is no atomics and no interface value size change. It also brings the
gc and gccgo implementations in line (gccgo already does this). The drawback is the
allocations when converting from integer/bool/tiny struct to interface.

The benefit of (3) is no allocations. The drawback is the larger interface value size. I
expect that there are many places in the code that assume two-word interfaces and that
changing this will require tracking down these many latent assumptions.

I don't think we know which of these is the right solution. We need more data about (1)
how many interface read/writes are from the heap as opposed to the stack, and how
expensive/available the atomic operations are, (2) how many interface values would now
allocate in a typical program, and (3) how much extra space is taken up by the larger
interface values and how invasive the change in size is to other code.
Member

minux commented Jul 22, 2014

Comment 1:

double-word atomic operations are very rarely supported:
(hardware transaction memory based implementation not included,
and not to mention that HTM is still a new concept with supports
from only the latest processors: intel x86 [amd has another
incompatible proposal] and power)
ARMv6K+, x86/amd64 and Power ISA v2.07+ (e.g. Power 8) support
double-word atomic instructions.
ARMv5 or below, mips, pre-v2.07 power, sparc all don't support
double-word atomic operations.
Also, some (not all?) double-word atomic instructions require
double-word alignment.
For those reasons, I suggest we don't use double-word atomics
for interface values.
Member

dvyukov commented Jul 22, 2014

Comment 2:

On Intel64 double-word atomic operations must be done with LOCK CMPXCHG16B, 128-bit SSE
loads and stores are not guaranteed to be atomic. LOCK CMPXCHG16B sort of does not
require 16-byte alignment of data; but if the value happen to cross cache-line boundary
the cost just goes from 20 cycles up to 5000 cycles. I don't think we want to use
atomics here.
Member

dvyukov commented Jul 22, 2014

Comment 3:

The data is here:
https://docs.google.com/spreadsheets/d/1PTJjoWdgKJvY2S63V2Lh3VgLJrXXW3wO_IXxy2rNA8s/preview
If we rule out atomics, then it answers all questions.
Member

dvyukov commented Jul 22, 2014

Comment 4:

Through this link you can also see comments on values, which explain the meaning:
https://docs.google.com/spreadsheets/d/1PTJjoWdgKJvY2S63V2Lh3VgLJrXXW3wO_IXxy2rNA8s/edit#gid=0
Member

dvyukov commented Jul 22, 2014

Comment 5:

There is another possibility that I consider the best for now (based on the data).
Change interface representations so that the data word is always looks like a pointer.
Value that would currently be embed is allocated on heap iff it looks like a pointer
into heap. Copyin looks along the line of:
static void
copyin(Type *t, void *src, void **dst)
{
    uintptr size;
    void *p;
    byte *v;
    Alg *alg;
    size = t->size;
    alg = t->alg;
    if(size > sizeof(*dst)) {
        // large, so must be on heap anyway
        p = runtime·cnew(t);
        alg->copy(size, p, src);
        *dst = p;
    } else if((t->kind&KindNoPointers) == 0) {
        // pointer, map, chan - store directly
        alg->copy(size, dst, src);
    } else {
        // not a pointer, but fits into iface
        v = nil;
        alg->copy(size, &v, src);  // clear unused bytes if any
        if(v < arena_start || v >= arena_end) {  // note arena_end, not arena_used
            // does not look like a pointer, store directly
            alg->copy(size, dst, src);
        } else {
            // not a pointer, but looks like a pointer - move to heap
            p = runtime·cnew(t);
            alg->copy(size, p, src);
            *dst = p;
        }       
    }
}
The data word is always marked as pointer, and GC uses the usual [arena_start,
arena_used) check to determine whether it's an interesting pointer or not.
According to the data, there are very few values that fall into that last unfortunate
"not a pointer, but looks like a pointer" case even on 386 ("heap" column in "iface"
tab). So this solution allows to preserve both memory consumption and performance of the
current implementation.
Contributor

cznic commented Jul 22, 2014

Comment 6:

I initially wanted to vote for (3). I consider it breaking bad code as a bonus and an
opportunity to clean it up (in an implementation independent way where possible).
@5: Nice. Has my vote.
Contributor

rsc commented Jul 22, 2014

Comment 7:

I don't understand the spreadsheet. Did you remove the inlining of interface operations?
It doesn't look like it. Interface conversions that fit exactly a word do not use copyin
today. They are inlined.
g% cat x.go
package p
func f(x uintptr) interface{} {
    return x
}
g% go tool 6g -S x.go
"".f t=1 size=32 value=0 args=0x18 locals=0x0
    0x0000 00000 (x.go:3)   TEXT    "".f+0(SB),4,$0-24
    0x0000 00000 (x.go:3)   NOP ,
    0x0000 00000 (x.go:3)   NOP ,
    0x0000 00000 (x.go:3)   FUNCDATA    $2,gclocals·a73fd2a0c6f832642aa9216fd9c5e6be+0(SB)
    0x0000 00000 (x.go:3)   FUNCDATA    $3,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)
    0x0000 00000 (x.go:4)   MOVQ    "".x+8(FP),BX
    0x0005 00005 (x.go:4)   MOVQ    BX,"".~r1+24(FP)
    0x000a 00010 (x.go:4)   MOVQ    $type.uintptr+0(SB),"".~r1+16(FP)
    0x0013 00019 (x.go:4)   RET ,
    0x0000 48 8b 5c 24 08 48 89 5c 24 18 48 c7 44 24 10 00  H.\$.H.\$.H.D$..
...
It looks to me like the spreadsheet numbers do not include any of these conversions.
That is going to skew the results considerably.
Contributor

rsc commented Jul 22, 2014

Comment 8:

I also don't see anything in the spreadsheet that talks about the increased memory usage
that would be caused by making interface values 3 words.
Contributor

rsc commented Jul 22, 2014

Comment 9:

I also don't see anything in the spreadsheet that compares the allocation due to
interface conversions with the allocations happening in the rest of the program. Would
allocating instead of embedding double the number of allocations by the program? Would
it increase them by 1%? I can't tell. There are lots of questions it does not answer.
Member

dvyukov commented Jul 22, 2014

Comment 10:

> It looks to me like the spreadsheet numbers do not include any of these conversions.
That is going to skew the results considerably.
When did it happen? Don't say that it's this way for 3 years.
Yes, the data misses these conversions.
I think that the skew affects only iface/{embed, nil, small, heap} values. So the
following calculations are still correct.
The iface/alloc is the number of allocations due to copyin; malloc tab shows total
number of allocations in the same program; heap/iface/eface show number of ifaces in
heap during GC. These numbers must be enough to draw conclusions.
For example on garbage-amd64 benchmark heap is ~62MB; heap contains ~10M iface/efaces
which is 16MB today. If we increase sizeof iface/eface to 3 words, that will increase
heap size by 13%.
Contributor

rsc commented Jul 22, 2014

Comment 11:

changeset:   13832:8c461c433cb5
user:        Nigel Tao <nigeltao@golang.org>
date:        Thu Jun 14 10:43:20 2012 +1000
description: cmd/gc: inline convT2E when T is uintptr-shaped.
changeset:   14281:6d4707371015
user:        Daniel Morsing <daniel.morsing@gmail.com>
date:        Tue Sep 11 21:42:30 2012 +0200
description: cmd/gc: Inline pointer sized T2I interface conversions
In the spreadsheet, you are missing all the int/uint/uintptr -> interface
conversions. You are not seeing the "not a pointer but looks a pointer value" because
those are all missing. You're only seeing int16/int8 and maybe some operations coming
from reflect, and yes, very few of them look like a pointer. But the inlined ones may be
very different.
I think you need to run the numbers again without inlining of these operations enabled.
Member

dvyukov commented Jul 23, 2014

Comment 12:

I've recollected data with the optimization disabled (cl/112090043):
https://docs.google.com/spreadsheets/d/1PTJjoWdgKJvY2S63V2Lh3VgLJrXXW3wO_IXxy2rNA8s/preview
Now there is indeed much more embed allocations and some KindPointer conversions that
were absent previously.
There is still 0 "integers pointing into heap" (another bug?)
Member

dvyukov commented Jul 23, 2014

Comment 13:

> The benefit of (2) is no atomics and no interface value size change. It also brings
the gc and gccgo implementations in line (gccgo already does this).
Does gccgo allocate an object always? or only if it's not a pointer? We do want the
latter, but I suspect that gccgo can be doing the former.
According to the new data, most embed values are either pointers or small integers. We
don't need to allocate pointers, because they are already pointers. And for the small
integers we can have a persistent cache (which has rough edges as was noted previously).
Contributor

rsc commented Jul 23, 2014

Comment 14:

It could be that there are no integers pointing into the heap on a 64-bit system. I
picked the heap address so that it would be unlikely to collide, so that conservative
garbage collection would work better.
gccgo inlines pointers stored in interfaces. Give Ian some credit. :-)
Member

dvyukov commented Jul 23, 2014

Comment 15:

> It could be that there are no integers pointing into the heap on a 64-bit system.
I don't see them on 32-bits as well.
There are some integers, but they are small.
There are also float64, but on 32-bits they do not fit into the word anyway.
Contributor

rsc commented Jul 23, 2014

Comment 16:

You could write a simple test to see if integers pointing into the heap are being
detected correctly.
Member

dvyukov commented Jul 23, 2014

Comment 17:

Yes, it works. On the following program on 32 bits:
package main
func main() {
    for i := uint(0); i < 2<<30; i += 1<<20 {
        var x interface{} = i
        _ = x
    }
}
IFACE STATS
total=2051
embed=2051
alloc=0
nil=4
small=4
heap=1719
Member

dvyukov commented Jul 24, 2014

Comment 18:

I am still inclined towards #5.
(1) does not look feasible.
(3) can considerably increase heap size for some app, and looks much more complex from
implementation point of view.
(2) is strictly worser than #5 with regard to memory consumption and performance (#5 is
somewhat more complex from implementation POV, though)

Comment 19:

CL https://golang.org/cl/129090043 mentions this issue.

Comment 20:

CL https://golang.org/cl/130240043 mentions this issue.
Contributor

rsc commented Aug 23, 2014

Comment 21:

This issue was closed by revision 950ad4f.

Status changed to Fixed.

Comment 22:

CL https://golang.org/cl/133810043 mentions this issue.
Contributor

davecheney commented Aug 24, 2014

Comment 23:

This issue was closed by revision 5b70b71.

Contributor

rsc commented Aug 24, 2014

Comment 24:

Rolled back due to breakage on ARM. Take 2:
133820043   runtime: adjust errorCString definition to avoid allocation
133830043   cmd/gc: re-enable IfacePointerOnly

Owner changed to @rsc.

Status changed to Started.

Comment 25:

CL https://golang.org/cl/133820043 mentions this issue.

Comment 26:

CL https://golang.org/cl/133830043 mentions this issue.
Contributor

rsc commented Aug 25, 2014

Comment 27:

This issue was closed by revision 75b72b1.

Status changed to Fixed.

rsc self-assigned this Aug 25, 2014

rsc added this to the Go1.4 milestone Apr 14, 2015

rsc removed the release-go1.4 label Apr 14, 2015

gopherbot locked and limited conversation to collaborators Jun 25, 2016

This issue was closed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.