runtime: goroutine starvation with runnext and short-running goroutines #73964

Open
@prattmic

Reproduces at tip and 1.24, probably earlier releases as well.

package main

import (
        "fmt"
        "time"
)

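func main() {
        // Two goroutines that do nothing but sleep for a very short time.
        go func() {
                for {
                        time.Sleep(30 * time.Nanosecond)
                }
        }()

        go func() {
                for {
                        time.Sleep(30 * time.Nanosecond)
                }
        }()

        // The main goroutine spins and reports any gap of more than 50ms
        // between consecutive iterations, i.e. time it spent not running.
        last := time.Now()
        for {
                if since := time.Since(last); since > 50*time.Millisecond {
                        fmt.Println(since, "since last run")
                }
                last = time.Now()
        }
}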

When running at GOMAXPROCS=2, I get results like:

$ GOMAXPROCS=2 ./example.com
125.997512ms since last run
645.290254ms since last run
437.888173ms since last run
121.121452ms since last run
152.611431ms since last run
232.376202ms since last run
1.296179568s since last run
...

The three goroutines should get approximately fair scheduling, but the two sleeping goroutines are starving the main goroutine for long periods of time. I believe this is what is happening:

  1. One of the sleeping goroutines sets its sleep timer for 30ns and then parks.
  2. We enter the scheduler. One of the first things that happens is checking timers. It takes more than 30ns to get this far, so the sleep timer has expired.
  3. The sleep timer fires, placing the goroutine in p.runnext.
  4. The scheduler sees there is a runnable goroutine in p.runnext and runs that goroutine.
  5. GOTO 1

In the meantime, the main goroutine is sitting in the local run queue, which doesn't get considered if p.runnext is set.
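
To make the loop above concrete, here is a toy model of the pick order. It is only a sketch, not runtime code: the `p` type, `pick`, and `simulate` are hypothetical names, and the sleeper's timer is assumed to have always expired by the time the scheduler runs.

```go
package main

import "fmt"

// p is a toy stand-in for the per-P scheduling state. It is NOT the real
// runtime.p; only the pick order described above is modeled.
type p struct {
	runnext string   // goroutine to run next, "" if empty
	runq    []string // local run queue, FIFO
}

// pick prefers runnext and only falls back to the local run queue
// when runnext is empty.
func (pp *p) pick() string {
	if pp.runnext != "" {
		g := pp.runnext
		pp.runnext = ""
		return g
	}
	if len(pp.runq) == 0 {
		return ""
	}
	g := pp.runq[0]
	pp.runq = pp.runq[1:]
	return g
}

// simulate makes `rounds` scheduling decisions and counts how often
// each goroutine gets to run.
func simulate(rounds int) map[string]int {
	pp := &p{runnext: "sleeper", runq: []string{"main"}}
	runs := map[string]int{}
	for i := 0; i < rounds; i++ {
		g := pp.pick()
		runs[g]++
		switch g {
		case "sleeper":
			// Assumption: the 30ns timer has already expired when the
			// scheduler checks timers, so the sleeper lands in runnext again.
			pp.runnext = "sleeper"
		case "main":
			// A preempted main goroutine would go back on the run queue.
			pp.runq = append(pp.runq, "main")
		}
	}
	return runs
}

func main() {
	runs := simulate(1000)
	fmt.Println("sleeper:", runs["sleeper"], "main:", runs["main"])
	// prints "sleeper: 1000 main: 0"
}
```

In this model the main goroutine never runs at all. The real program is less extreme because the timing does not line up on every iteration.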

Occasionally we do skip both runnext and the local run queue entirely and look at the global run queue instead. But that doesn't help here, because the main goroutine is waiting in the local run queue, not the global one.

The primary mechanism to avoid runnext starvation is that when taking a goroutine from runnext, we don't increment p.schedtick (i.e., inheritTime == true). This should cause sysmon to preempt whichever goroutine is running after 10ms.

I believe that doesn't work in this case because the goroutine runs for such a short time that sysmon rarely successfully preempts it. By the time the preemption flag is seen, the goroutine is already parking in the scheduler, so the flag is simply ignored.

We have discussed two potential ways to address this issue (perhaps we should do both):

  1. Much like the global run queue check, we could periodically ignore runnext and go straight to the main local run queue.
  2. When we reach the scheduler, if the preempt flag is set on the previous goroutine, don't allow use of runnext (maybe only if it is equal to the previous G?). Basically behaving as if we did preempt.
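
Option 1 could look roughly like the following toy model. This is a sketch under assumptions, not a proposed patch: the `p` type and its `tick` counter are hypothetical, and the interval `skipEvery = 61` is an illustrative value chosen to resemble the global run queue check, not a tuned one.

```go
package main

import "fmt"

// skipEvery is a hypothetical interval: every skipEvery-th scheduling
// decision ignores runnext if the local run queue has work.
const skipEvery = 61

// p is a toy stand-in for the per-P scheduling state, not the real runtime.p.
type p struct {
	tick    int      // counts scheduling decisions (toy counter, not schedtick)
	runnext string   // goroutine to run next, "" if empty
	runq    []string // local run queue, FIFO
}

// pick prefers runnext, except on every skipEvery-th decision, where it
// goes straight to the local run queue and leaves runnext in place.
func (pp *p) pick() string {
	pp.tick++
	skipRunnext := pp.tick%skipEvery == 0 && len(pp.runq) > 0
	if pp.runnext != "" && !skipRunnext {
		g := pp.runnext
		pp.runnext = ""
		return g
	}
	if len(pp.runq) == 0 {
		return ""
	}
	g := pp.runq[0]
	pp.runq = pp.runq[1:]
	return g
}

// simulate makes `rounds` scheduling decisions and counts runs per goroutine.
func simulate(rounds int) map[string]int {
	pp := &p{runnext: "sleeper", runq: []string{"main"}}
	runs := map[string]int{}
	for i := 0; i < rounds; i++ {
		g := pp.pick()
		runs[g]++
		switch g {
		case "sleeper":
			// Assumption: the short timer has always expired by the time we
			// are back in the scheduler, so the sleeper refills runnext.
			pp.runnext = "sleeper"
		case "main":
			pp.runq = append(pp.runq, "main")
		}
	}
	return runs
}

func main() {
	runs := simulate(610)
	fmt.Println("sleeper:", runs["sleeper"], "main:", runs["main"])
	// prints "sleeper: 600 main: 10"
}
```

With the periodic skip, the main goroutine gets a bounded share of scheduling decisions even when runnext is always occupied. Option 2 would instead key the skip off the previous goroutine's preempt flag rather than a counter.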

Alternatively, or in addition, we may want to make extremely short sleeps behave more like runtime.Gosched by avoiding putting them on runnext in the first place. In theory I believe this issue could occur with any form of wakeup, though it is likely difficult to achieve the tight timing required via other mechanisms.
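
For anyone who wants to compare the two behaviors from user code, a bounded variant of the reproducer might look like this. It is only a measurement sketch: `maxGap` is a hypothetical helper, the 2s duration is arbitrary, and it makes no claim about what numbers a given machine will print.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// maxGap starts two goroutines that call yield in a loop, spins on the
// calling goroutine for roughly d, and returns the largest gap observed
// between two consecutive iterations of the spin loop.
func maxGap(yield func(), d time.Duration) time.Duration {
	var stop atomic.Bool
	for i := 0; i < 2; i++ {
		go func() {
			for !stop.Load() {
				yield()
			}
		}()
	}
	// Ask the helper goroutines to exit once the measurement is done.
	defer stop.Store(true)

	var worst time.Duration
	start := time.Now()
	last := start
	for time.Since(start) < d {
		now := time.Now()
		if gap := now.Sub(last); gap > worst {
			worst = gap
		}
		last = now
	}
	return worst
}

func main() {
	runtime.GOMAXPROCS(2) // same setting as the GOMAXPROCS=2 run in the report

	sleep := func() { time.Sleep(30 * time.Nanosecond) }
	fmt.Println("time.Sleep(30ns):", maxGap(sleep, 2*time.Second))
	fmt.Println("runtime.Gosched: ", maxGap(runtime.Gosched, 2*time.Second))
}
```

If starvation occurs, the spin loop can overshoot d, so the measurement may take longer than the requested duration.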

cc @golang/runtime

Labels: NeedsInvestigation, compiler/runtime
