runtime: GOMAXPROCS not working with cpu-bound goroutines #1492

Closed
rsc opened this Issue Feb 10, 2011 · 9 comments

rsc commented Feb 10, 2011

Report from golang-nuts:

package main
import ("os"; "fmt"; "runtime")
func main() {
       runtime.GOMAXPROCS(2)
       fmt.Println("let's rock")
       go func() {fmt.Println("Hello world"); os.Exit(1)}()
       runtime.Gosched()
       for {}
}

does not exit.

The program should still exit, because GOMAXPROCS is 2.
I don't know why it does not. One possibility is that the garbage collector
is waiting to run and cannot interrupt the cpu-bound goroutine.

gopherbot Feb 10, 2011

Comment 1 by ejsherry:

Yeah, it looks like the garbage collector is holding things up. Here's a slightly more
explicit example:
package main
import (
    "os"
    "runtime"
)
func main() {
    runtime.GOMAXPROCS(2)
    go func() {
        println("b0")
        _ = new(int)
        println("b1")
        os.Exit(0)
    }()
    println("a0")
    for {
    }
}
The goroutine hangs at new(int) because the allocator is trying to start a garbage
collection. The collector calls stoptheworld, which hangs because the other goroutine never
enters the scheduler. Removing the new(int) allows the example to exit.

gopherbot Mar 5, 2011

Comment 2:

Maybe the amount of real time (note: not the CPU time) consumed by the stop-the-world
action could be lowered by unmapping (the 'munmap' system call in Linux) the stacks of
all OS threads except the stack of the OS thread (=X) that was the first to enter the
stop-the-world function. Let Ys = [all threads except X]. Since each thread in Ys will most
likely touch its stack within a very short time (for example, via the ESP register on i386),
it will trigger a page fault. The sole purpose of these page faults is to let thread X
"stop" Ys as soon as possible.
Note: "stop" is in quotes because it could also be implemented by a C for-loop
that is entered as a consequence of the page fault.
I wonder what the actual performance of this would be in Linux.
The GNU debugger (gdb) would be confused by this. Maybe this can be partially solved by
giving the Go program a special boolean variable that controls whether to use the
current stop-the-world implementation or the page-faulting stop-the-world implementation.

rsc Mar 5, 2011

Contributor

Comment 3:

Instead of unmapping pages I think we will eventually
have to send all the threads a reserved signal, the same
way that NPTL does for things like setuid.  (We'll have to
do that for setuid too, sadly.)
Russ

gopherbot Mar 6, 2011

Comment 4:

Sending a reserved signal ('tgkill' system call) seems like a better option than the
page-faulting (which translates into SIGSEGV).

rsc Dec 9, 2011

Contributor

Comment 5:

Labels changed: added priority-later, removed priority-medium.

rsc Sep 12, 2012

Contributor

Comment 7:

Labels changed: added go1.1maybe.

robpike Mar 7, 2013

Contributor

Comment 8:

Labels changed: removed go1.1maybe.

rsc Mar 12, 2013

Contributor

Comment 9:

[The time for maybe has passed.]

robpike Aug 7, 2013

Contributor

Comment 10:

The goroutine preemption work seems to have fixed this. The examples all work correctly
for me today (Aug 7 2013).
Please reopen if I have missed something.

Status changed to Fixed.
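
One way to re-check this behaviour on a current toolchain is sketched below. It assumes a runtime with asynchronous preemption (Go 1.14 and later on most platforms); on a runtime without it, the call to runtime.GC would be expected to hang just as in the original report. The helper name gcWhileSpinning and its channels are hypothetical and are not taken from the examples in this issue.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// gcWhileSpinning starts a goroutine that spins in a tight loop, forces a
// garbage collection from the calling goroutine, and reports how long the
// collection took. The collection can only finish if the runtime is able to
// stop the spinning goroutine.
func gcWhileSpinning() time.Duration {
	var stop int32
	started := make(chan struct{})
	finished := make(chan struct{})
	go func() {
		close(started)
		// Tight loop: it never yields to the scheduler voluntarily.
		for atomic.LoadInt32(&stop) == 0 {
		}
		close(finished)
	}()
	<-started

	begin := time.Now()
	runtime.GC() // needs to stop the world, including the spinner
	elapsed := time.Since(begin)

	atomic.StoreInt32(&stop, 1) // let the spinner exit
	<-finished
	return elapsed
}

func main() {
	runtime.GOMAXPROCS(2)
	fmt.Println("GC completed in", gcWhileSpinning())
}
```

The measured duration depends on the machine and the runtime version, so no particular value should be expected.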

rsc added fixed labels Aug 7, 2013

rsc self-assigned this Aug 7, 2013

golang locked and limited conversation to collaborators Jun 24, 2016

This issue was closed.
