Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: leads to memory leaks #5749

Closed
dvyukov opened this issue Jun 21, 2013 · 2 comments

Comments

Projects
None yet
4 participants
@dvyukov
Copy link
Member

commented Jun 21, 2013

This is uncomfortably amusing bug.

So what we have:
 - a bug that depends on fraction of second when RPC connection is established 
 - a bug that intelligently works around -Z linker flag
 - a bug that exploits details of code generation to combine high part of a real pointer with a random garbage to generate random pointers
 - a bug that wipes out own tracks over time

Too much coincidences for one bug, don't you find? :)
As a result it non-deterministically but reliably pins random memory blocks in 1GB of
heap, non-deterministically unpins it over time and pins new blocks.

Not sure where to start...

Let's start with the time.Time struct layout:
https://code.google.com/p/go/source/browse/src/pkg/time/time.go
type Time struct {
        sec int64
        nsec int32
        loc *Location
}

Note that nsec is int32 with a padding right after it (on amd64).

Then time.Unix() functions:
func Unix(sec int64, nsec int64) Time {
    ...
        return Time{sec + unixToInternal, int32(nsec), Local}
}

When it stores nsec into return value, it writes 4 bytes leaving the high 4 bytes
unchanged:
  4c9ea7:       48 89 54 24 18          mov    %rdx,0x18(%rsp)
  4c9eac:       89 5c 24 20             mov    %ebx,0x20(%rsp)
  4c9eb0:       48 89 44 24 28          mov    %rax,0x28(%rsp)
  4c9eb5:       c3                      retq   

On timer goroutine this stack location episodically contains a real pointer (from
runtime.ready form or from other function calls). So this write combines 0x00c2 high
part of a real pointer with nsec value, producing completely random but real pointers
into first GB of heap.

Note that -Z does not prevent this, because (1) it does nothing for functions w/o stack
frames (time.Unix() does not have a stack frame) and (2) it does not zeroize return
values at all.

time.Ticker uses time.Unix() to create Time values. So the values received from
time.Ticker.C frequently contain values that look like random pointers.

Then, the program contains healthChecks() function along the lines of:

func (conn *Conn) healthChecks() {
    ticker := time.NewTicker(1 * time.Second)
    var err error
    msg := newMessage()
    err = conn.sendHealthCheck(msg)
    for err == nil {
        now := <-ticker.C
        if conn.needToSendHealthCheck(now) {
            err = conn.sendHealthCheck(msg, side)
        }
    }
}

It has a ticker with 1 second period (which is important as well).

After receiving from the ticker it obtains that random pointer value in 0x18(rsp).
Usually the function does not send health checks and so preserves the pointer value till
next ticker chan wait.

GC usually finds healthChecks goroutines blocked exactly on ticker chan.

When it scans the stack, it can not obtain arglen info for runtime.chanrecv1. So it
conservatively assumes that args are the whole frame of the caller (healthChecks) w/o
arg pointer info.
This way it finds the fake pointer/time value on the stack and pins a random memory
block in first GB.

Note that precision improvements on tip do not fix this. I actually describe how it
works on tip.

Here is what I observed:
goroutine 59 [chan receive]:
Conn.healthChecks(0xc2100563c0, 0xc2101ce500, 0xc2101ce5a0) fp=0x7f90070f4f88
sp=0x7f90070f4e40 arglen=0x18 varlen=0x110

The frame block 0x7f90070f4e40 is scanned twice. First time to scan local vars of
healthChecks with size 0x110:
block 0x7f90070f4e70-0x7f90070f4f80(0x110)   

and the second time to scan args to runtime.chanrecv1 with size 0x148:
block 0x7f90070f4e40-0x7f90070f4f88(0x148)   

This second scan finds the weird pointer 0xc217defd59 at 0x18(rsp) pointing into middle
of 4MB heap block 0xc217a77000-0xc217e77000.

This bug has a higher probability of pinning large memory blocks. Now, since healthCheck
ticker period is 1 second, it tends to generate similar fake pointers over and over
again (nsec values are close). E.g. for the 4MB block, it has the grace window of 4
msec. But due to time drift and delays, it will release the block eventually leaving no
traces behind itself (but pin a new block).

Phew!

Actions we need to take:

1. Fix -Z so that it is useful in such cases.

2. Of course long term we need 100% precise GC, but in short term:

3. Teach GC about argsize of chanrecv/send/select, because lots of goroutines are
blocked there during the scan.

4. Fix time.Unix() function so that it does not produce fake pointers.

5. Look into compiler code generation to prevent such issues in other places.
@robpike

This comment has been minimized.

Copy link
Contributor

commented Jun 21, 2013

Comment 1:

Labels changed: added priority-soon, garbage, removed priority-triage.

Status changed to Accepted.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Jul 30, 2013

Comment 2:

I changed the format of time.Time to avoid having any uninitialized padding. That's good
enough for this issue. The rest of the changes are part of more comprehensive garbage
collector revision and don't need to be tracked on this bug.

Status changed to Fixed.

@golang golang locked and limited conversation to collaborators Jun 24, 2016

This issue was closed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.