runtime: goroutines aren't scheduled in time #29394

Closed
zhao-kun opened this issue Dec 22, 2018 · 4 comments

@zhao-kun

commented Dec 22, 2018

What version of Go are you using (go version)?

$ go 1.8.3

Does this issue reproduce with the latest release?

Unknown

What operating system and processor architecture are you using (go env)?

The program runs on CentOS 7.4.

What did you do?

I have a Kubernetes cluster running version 1.7.4, built with Go 1.8.3. Yesterday at noon, one of the nodes in the cluster stopped working: it no longer reported its status to the master because the kubelet process had hung. I checked the logs and found that the kubelet had stopped printing logs at 12:09. About 20 minutes later I sent it SIGABRT; it dumped the stack traces of all goroutines and exited.

I grepped the dump for "goroutine"; the result is:

...
3144:12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6413899 [select, 22 minutes]:
3149:12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6121144 [select, 22 minutes]:
3154:12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6147441 [sleep, 22 minutes]:
3161:12月 21 12:31:29 machine-name kubelet[28486]: goroutine 16532140 [chan receive, 1283 minutes]:
...

I noticed that many goroutines had been blocked for about 20 minutes, which roughly matches how long the kubelet had been hung.
I checked the stack trace of goroutine 6147441, which is:

...
12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6147441 [sleep, 22 minutes]:
12月 21 12:31:29 machine-name kubelet[28486]: time.Sleep(0x2ba79f8ef)
12月 21 12:31:29 machine-name kubelet[28486]: /usr/local/go/src/runtime/time.go:59 +0xf9
12月 21 12:31:29 machine-name kubelet[28486]: k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager.(*containerData).housekeeping(0xc422794f00)
12月 21 12:31:29 machine-name kubelet[28486]: /workspace/anago-v1.7.16-beta.0.18+e8846c1d7e7e63/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager/container.go:457 +0x340
....

The goroutine is blocked in time.Sleep(0x2ba79f8ef). The argument 0x2ba79f8ef is a duration in nanoseconds, roughly 11.7 seconds, which is what the code logic expects.
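To double-check the conversion, here is a minimal sketch that decodes the raw stack-trace argument. It assumes the value printed in the trace is the time.Duration passed to time.Sleep, that is, a count of nanoseconds; sleepArg is a hypothetical helper name, not something from the kubelet source.

```go
package main

import (
	"fmt"
	"time"
)

// sleepArg (hypothetical helper) interprets the raw argument word shown
// in the stack trace as nanoseconds, the unit of time.Duration.
func sleepArg(raw int64) time.Duration {
	return time.Duration(raw)
}

func main() {
	d := sleepArg(0x2ba79f8ef)
	fmt.Println(d) // 11.718490351s
}
```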

My question is: the goroutine should have slept for roughly 11 seconds, so why was it blocked for almost 22 minutes? I suspect the Go runtime did not schedule it in time. What happened at 12:09 that stopped the scheduler from working?

PS: About 83 goroutines were blocked in sleep.

The attachment is the complete goroutine stack trace; I have redacted the machine name.

abort.log

@agnivade

Member

commented Dec 24, 2018

Hi @zhao-kun - 1.8.3 is quite an old version. Please give a try with 1.12 beta1 and let us know.

Also, without proper steps to reproduce this issue, it is very hard to understand what's going on just by looking at the stack trace. Is there a way you can give us the exact steps to reproduce it?

@zhao-kun

Author

commented Dec 24, 2018

Hi @agnivade, it's hard to reproduce. We do nothing unusual to our K8s cluster (or the kubelet); we run it in the normal way. We do know there are some issues in K8s 1.7.4, especially in the cAdvisor implementation, and we plan to upgrade to the latest K8s version in the future.

If the current information is too little to diagnose the problem, could you give me some advice from the Go side on what we can do to help diagnose it the next time the issue occurs?

@agnivade

Member

commented Dec 24, 2018

You could try with the 1.12beta version and see.

Overall, it is hard to say whether the problem is with Go or K8s. I would advise filing an issue on the K8s repo and investigating that. And only when there is some concrete evidence that this is a Go issue, then file a new issue with proper repro steps.

@odeke-em changed the title from "Goroutines aren't scheduled in time" to "runtime: goroutines aren't scheduled in time" on Dec 25, 2018

@gopherbot


commented Jan 27, 2019

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@gopherbot closed this on Jan 27, 2019
