Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build: reverse pool locking problem in the coordinator #10750

Closed
bradfitz opened this issue May 7, 2015 · 2 comments

Comments

Projects
None yet
2 participants
@bradfitz
Copy link
Member

commented May 7, 2015

@crawshaw,

farmer.golang.org is hanging. Interesting stack goroutine:

goroutine 46046 [semacquire, 18 minutes]:
sync.runtime_Semacquire(0xc733cc)
    /home/bradfitz/go/src/runtime/sema.go:43 +0x2d
sync.(*Mutex).Lock(0xc733c8)
    /home/bradfitz/go/src/sync/mutex.go:82 +0x1c7
main.(*reverseBuildletPool).WriteHTMLStatus(0xc733c0, 0x7f27f805d588, 0xc20883aaf0)
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/reverse.go:193 +0x16a
main.handleStatus(0x7f27f806bfb8, 0xc208503970, 0xc20821b1e0)
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/status.go:60 +0xf61

...

goroutine 132 [semacquire, 23 minutes]:
sync.runtime_Semacquire(0xc733cc)
    /home/bradfitz/go/src/runtime/sema.go:43 +0x2d
sync.(*Mutex).Lock(0xc733c8)
    /home/bradfitz/go/src/sync/mutex.go:82 +0x1c7
main.(*reverseBuildletPool).tryToGrab(0xc733c0, 0x9f50f0, 0xe, 0x0, 0x0, 0x0)
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/reverse.go:72 +0x64
main.(*reverseBuildletPool).GetBuildlet(0xc733c0, 0x9f50f0, 0xe, 0xc2083df4a0, 0x28, 0x7f27f80739c0, 0xc2082addc0, 0xc2083c0a00, 0x0, 0x0)
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/reverse.go:160 +0x67
main.(*buildStatus).build(0xc2082addc0, 0x0, 0x0)
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/coordinator.go:925 +0x1f6
main.(*buildStatus).start.func1(0xc2082addc0)
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/coordinator.go:910 +0x33
created by main.(*buildStatus).start
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/coordinator.go:916 +0x7e

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented May 7, 2015

In the reverse buildlet healthcheck, this channel receive is blocking for 33+ minutes while holding the mutex:

// reverseHealthCheck requests the status page of each idle buildlet.                                                                                              
// If the buildlet fails to respond promptly, it is removed from the pool.                                                                                         
func (p *reverseBuildletPool) reverseHealthCheck() {
        p.mu.Lock()
        responses := make(map[*reverseBuildlet]chan error)
        for _, b := range p.buildlets {
                if b.inUseAs == "health" { // sanity check                                                                                                         
                        panic("previous health check still running")
                }
                if b.inUseAs != "" {
                        continue // skip busy buildlets                                                                                                            
                }
                b.inUseAs = "health"
                res := make(chan error, 1)
                responses[b] = res
                client := b.client
                go func() {
                        _, err := client.Status()
                        res <- err
                }()
        }
        p.mu.Unlock()
        time.Sleep(5 * time.Second) // give buildlets time to respond                                                                                              
        p.mu.Lock()

        var buildlets []*reverseBuildlet
        for _, b := range p.buildlets {
                res := responses[b]
                if b.inUseAs != "health" || res == nil {
                        // buildlet skipped or registered after health check                                                                                       
                        buildlets = append(buildlets, b)
                        continue
                }
                b.inUseAs = ""
                err, done := <-res     // <------------ HERE

(that final line)

@gopherbot

This comment has been minimized.

Copy link

commented May 7, 2015

CL https://golang.org/cl/9851 mentions this issue.

@golang golang locked and limited conversation to collaborators Jun 25, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.