Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timer related go routine schedule problems under high concurrency #40786

Closed
lw396285v opened this issue Aug 14, 2020 · 7 comments
Closed

Timer related go routine schedule problems under high concurrency #40786

lw396285v opened this issue Aug 14, 2020 · 7 comments

Comments

@lw396285v
Copy link

@lw396285v lw396285v commented Aug 14, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.7 linux/amd64

Does this issue reproduce with the latest release?

Latest is 1.15, so I think the answer should be No

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/wli1/.cache/go-build"
GOENV="/home/wli1/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/wli1/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build258194693=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I write a server to handle requests from clients. I hope for each connection the server should process it in a limited time, for example 10ms, which means no matter has the server done the work or not, it should give a respones to the client when the time reachs 10ms. Pseudo-codes are here:

func RecieveRequest(req Request) {
    // tic here
    done := Process(req) // done is a channel to get the response to send, process usually will be done in 6 ms
    select {
    case result := <-done:
    // toc here
        SendResponse(result)
    case <-time.After(time.Duration(10 * time.Millisecond):
    // toc here
        SendResponse(ServerTimeOut) // ServerTimeOut is a predefined response
    }
}

func Serve() {
    // ... codes ...
    for {
     // ... codes ...
        req := accept()
        go RecieveRequest(req)
     // ... codes ...
    }
    // ... codes ...
}

What did you expect to see?

I use tic-toc to messure the actual time interval, the actual time interval should be within 10ms, or a little bit more than 10ms.

What did you see instead?

under low concurrency, most request is processed normally, but there are still very few(under 0.01%) requests' actual process intervals are like 15ms.

under high concurrency, situation worsened. There are more(above 0.1%) actual intervals are larger than 10ms, and the worst case could reach 50~500 ms, which will cause the client timeout.

I think timers should be correct since there is nothing like tight-loop in the codes, during ervery schedule event the processor would check the timer heap to fire those should be fired. Then the problem seems to be the scheduler itself, the scheduler seems not be able to schedule the timer related goroutines to run immediately. But this is just my guess, I'm not sure.

So is this phenomenon normal? Or I did something wrong?

If my guess is correct, should the timer related routines be pushed on the front of the queue?

Hope someone could help, thanks.

@AlexRouSg
Copy link
Contributor

@AlexRouSg AlexRouSg commented Aug 14, 2020

func RecieveRequest(req Request) {
    // tic here
    done := Process(req) // done is a channel to get the response to send, process usually will be done in 6 ms
    select {
    case result := <-done:
        SendResponse(result)
    case <-time.After(time.Duration(10 * time.Millisecond):
        SendResponse(ServerTimeOut) // ServerTimeOut is a predefined response
    }
    // toc here
}

Time = ~10ms + time for SendResponse

You should measure it like ...

func RecieveRequest(req Request) {
    // tic here
    done := Process(req) // done is a channel to get the response to send, process usually will be done in 6 ms
    select {
    case result := <-done:
        SendResponse(result)
    case <-time.After(time.Duration(10 * time.Millisecond):
        // toc here
        SendResponse(ServerTimeOut) // ServerTimeOut is a predefined response
    }
}
@lw396285v
Copy link
Author

@lw396285v lw396285v commented Aug 14, 2020

@AlexRouSg
Copy link
Contributor

@AlexRouSg AlexRouSg commented Aug 14, 2020

The only way to be sure it's the timer and not SendResponse that's taking longer than expected is to time it as such.

@lw396285v
Copy link
Author

@lw396285v lw396285v commented Aug 14, 2020

@lw396285v
Copy link
Author

@lw396285v lw396285v commented Aug 14, 2020

@AlexRouSg Hello, thanks for your adivice, I have done the test exactly after the timer is fired. I have edited the description. and the result is same.

@networkimprov
Copy link

@networkimprov networkimprov commented Aug 14, 2020

See #38860 which is fixed in 1.15 but not in 1.14... #39383

cc @ianlancetaylor

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 14, 2020

Yes, this looks like a dup of #38860. Closing as a dup, but please comment if you disagree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.