runtime: server hangs forever in 1.22.9 #70689

Closed
limpo1989 opened this issue Dec 5, 2024 · 4 comments
Labels
compiler/runtime: Issues related to the Go compiler and/or runtime.
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
WaitingForInfo: Issue is not actionable because of missing required information, which needs to be provided.


limpo1989 commented Dec 5, 2024

Go version

go version go1.22.9 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/limpo1989/.cache/go-build'
GOENV='/home/limpo1989/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/limpo1989/go/pkg/mod'
GONOPROXY='git.idianhun.com,git.dianhun.cn'
GONOSUMDB='git.idianhun.com,git.dianhun.cn'
GOOS='linux'
GOPATH='/home/limpo1989/go'
GOPRIVATE='git.idianhun.com,git.dianhun.cn'
GOPROXY='https://goproxy.cn,direct'
GOROOT='/usr/local/go'
GOSUMDB='off'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.9'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/limpo1989/common-apps/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build829717516=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I used a stress program to benchmark our gateapp service, and the server hung on about the third day.

Before this, I compiled with Go 1.22.7, and the server hung after about 6 days of stress testing. I then upgraded to Go 1.22.9 and repeated the stress test, and the server hung after 3 days of continuous running.

Server: Ubuntu 22.04.4 LTS x86_64

            .-/+oossssoo+/-.                                                                                                                                                                                                                                                      
        `:+ssssssssssssssssss+:`           ---------------- 
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 22.04.4 LTS x86_64 
    .ossssssssssssssssssdMMMNysssso.       Host: Precision Tower 7810 
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 6.8.0-48-generic 
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 16 days, 19 hours, 40 mins 
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 1859 (dpkg), 12 (snap) 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: bash 5.1.16 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Resolution: 1920x1080 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   Terminal: /dev/pts/3 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   CPU: Intel Xeon E5-2690 v3 (48) @ 3.500GHz 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   GPU: AMD ATI Radeon HD 4550 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Memory: 16192MiB / 128848MiB 
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/
  +sssssssssdmydMMMMMMMMddddyssssssss+                             
   /ssssssssssshdmNNNNmyNMMMMhssssss/                              
    .ossssssssssssssssssdMMMNysssso.
      -+sssssssssssssssssyyyssss+-
        `:+ssssssssssssssssss+:`
            .-/+oossssoo+/-.

[Screenshot: QQ_1733367260359]

The Delve core dump and the gateapp binary are here: https://drive.google.com/file/d/1WvN1aPAruGL180yjRL-6nDmZ3N-qWpKy/view?usp=sharing

$ ~/go/bin/dlv core gateapp dump2304043.core
(dlv) goroutines -without user
  Goroutine 2 - User: runtime/proc.go:409 runtime.goparkunlock (0x44b44f)
  Goroutine 3 - User: runtime/proc.go:409 runtime.goparkunlock (0x44b44f) [GC sweep wait]
  Goroutine 4 - User: runtime/proc.go:409 runtime.goparkunlock (0x44b44f) [GC scavenge wait]
  Goroutine 5 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [finalizer wait 1196607687507027]
  Goroutine 6 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 7 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 8 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 9 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 10 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 11 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 12 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 13 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 14 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 15 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 16 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 17 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 18 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 19 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 20 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 21 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 22 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 23 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 24 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 33 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 34 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 35 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 36 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 37 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 38 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 39 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 40 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 41 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 42 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 43 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 44 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 49 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 50 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 51 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 52 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 53 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 54 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 55 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 56 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 57 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 58 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 59 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 65 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 66 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 67 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 68 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 69 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 70 - User: runtime/proc.go:403 runtime.gopark (0x44b30e) [debug call]
  Goroutine 499 - User: runtime/select.go:328 runtime.selectgo (0x460b0b) [select 1196512353477187]
[53 goroutines]
(dlv) goroutines -with running
  Goroutine 529 - User: runtime/sigqueue.go:152 os/signal.signal_recv (0x488569) (thread 2304112)
  Goroutine 598629449 - User: git.idianhun.com/linbo/galaxy@v1.6.3/stream/stream.go:150 git.idianhun.com/linbo/galaxy/injector.(*BaseInjector[go.shape.*common-apps/pkg/api.BaseAttrib]).Inject (0x1be8999) (thread 2304372)
  Goroutine 600001576 - User: git.idianhun.com/linbo/drpc@v1.9.1/httpx/proxy-ws-nettyws.go:109 git.idianhun.com/linbo/drpc/httpx.proxyNettyWS.func2 (0x120d914) (thread 2304081)
[3 goroutines]
(dlv) gr 598629449
Switched from 0 to 598629449 (thread 2304372)
(dlv) bt
 0  0x000000000048dde3 in runtime.futex
    at runtime/sys_linux_amd64.s:558
 1  0x0000000000444945 in runtime.futexsleep
    at runtime/os_linux.go:75
 2  0x0000000000412193 in runtime.notetsleep_internal
    at runtime/lock_futex.go:212
 3  0x00000000004122c9 in runtime.notetsleep
    at runtime/lock_futex.go:235
 4  0x000000000044e5df in runtime.stopTheWorldWithSema
    at runtime/proc.go:1505
 5  0x0000000000424065 in runtime.gcStart.func1
    at runtime/mgc.go:681
 6  0x000000000048a18a in runtime.systemstack
    at runtime/asm_amd64.s:509
 7  0x000000000048a128 in runtime.systemstack_switch
    at runtime/asm_amd64.s:474
 8  0x0000000000423ce7 in runtime.gcStart
    at runtime/mgc.go:680
 9  0x0000000000414226 in runtime.mallocgc
    at runtime/malloc.go:1308
10  0x0000000000466989 in runtime.makeslice
    at runtime/slice.go:107
11  0x0000000001be8999 in git.idianhun.com/linbo/galaxy/stream.(*Stream[go.shape.struct { FieldAnn git.idianhun.com/linbo/pbmeta/annotation.Annotation; Local reflect.Value; FieldType reflect.Type; FieldValue reflect.Value },go.shape.*common-apps/pkg/api.BaseAttrib]).AsyncWait
    at git.idianhun.com/linbo/galaxy@v1.6.3/stream/stream.go:150
12  0x0000000001be8999 in git.idianhun.com/linbo/galaxy/injector.(*BaseInjector[go.shape.*common-apps/pkg/api.BaseAttrib]).Inject
    at git.idianhun.com/linbo/galaxy@v1.6.3/injector/base-injector.go:108
13  0x0000000001c20225 in common-apps/pkg/common/injectors.(*baseAttribInjector).Inject
    at <autogenerated>:1
14  0x0000000001989206 in git.idianhun.com/linbo/galaxy/injector.(*groupInjector).DeepInject.(*groupInjector).Inject.func4
    at git.idianhun.com/linbo/galaxy@v1.6.3/injector/injector.go:111
15  0x000000000198d270 in git.idianhun.com/linbo/galaxy/stream.execMap[go.shape.3a806d7a75a4cfb5aa2d266f904c476e25733c6c5d5370d6632ab8a32fde4892,go.shape.bool]
    at git.idianhun.com/linbo/galaxy@v1.6.3/stream/stream.go:213
16  0x000000000198d270 in git.idianhun.com/linbo/galaxy/stream.(*Stream[go.shape.3a806d7a75a4cfb5aa2d266f904c476e25733c6c5d5370d6632ab8a32fde4892,go.shape.bool]).AsyncWait
    at git.idianhun.com/linbo/galaxy@v1.6.3/stream/stream.go:155
17  0x0000000001988ce5 in git.idianhun.com/linbo/galaxy/injector.(*groupInjector).Inject
    at git.idianhun.com/linbo/galaxy@v1.6.3/injector/injector.go:118
18  0x0000000001988ce5 in git.idianhun.com/linbo/galaxy/injector.(*groupInjector).DeepInject
    at git.idianhun.com/linbo/galaxy@v1.6.3/injector/injector.go:203
19  0x000000000198a2d4 in git.idianhun.com/linbo/galaxy/injector.(*groupInjector).AsyncInject
    at git.idianhun.com/linbo/galaxy@v1.6.3/injector/injector.go:232
20  0x00000000019934de in git.idianhun.com/linbo/galaxy/interceptors/starter.(*injectorConfiguration).Injector.func1.1
    at git.idianhun.com/linbo/galaxy@v1.6.3/interceptors/starter/injector-configuration.go:90
21  0x000000000124e37b in git.idianhun.com/linbo/drpc.(*BaseHandler).Invoke.func1.1
    at git.idianhun.com/linbo/drpc@v1.9.1/handler.go:379
22  0x0000000000d28ef5 in github.com/panjf2000/ants/v2.(*goWorker).run.func1
    at github.com/panjf2000/ants/v2@v2.10.0/worker.go:71
23  0x000000000048bfc1 in runtime.goexit
    at runtime/asm_amd64.s:1695

What did you see happen?

The server hangs forever, the HTTP pprof endpoint does not respond, and the server's TCP connections remain in CLOSE_WAIT after the stress program is closed.

# after 10 minutes; 192.168.0.90 is the stress client's IP address
$ netstat -tan | grep "192.168.0.90" | grep "CLOSE_WAIT" | wc -l
4097

What did you expect to see?

The server keeps running normally.

gopherbot added the compiler/runtime label Dec 5, 2024
Contributor

mknyszek commented Dec 5, 2024

Looks like the GC is trying to preempt everything and wait for it to stop, but it's not happening. Though, without more information, it's hard to say what exactly is going wrong.

  • Can you reproduce the issue?
  • Do you have any diagnostic data you're able to share so we can poke around? A full stack dump would be helpful to try and identify what goroutine is failing to yield to the GC.
  • What version of Linux are you using?

Thanks.

CC @golang/runtime
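
As an illustration of the "full stack dump" requested above, here is a minimal Go sketch that captures the stacks of all goroutines with runtime.Stack and can also emit them on SIGUSR1. It is not taken from the gateapp service; the names allStacks and dumpOnSignal and the choice of SIGUSR1 are assumptions made for the example. One caveat under the same assumptions: runtime.Stack with all=true needs to stop the world itself, so in a hang like this one, where a stop-the-world is already stuck, an in-process dump would likely block as well, and a core dump (as the reporter provided) remains the more reliable source.

```go
package main

import (
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// allStacks (hypothetical helper) returns the stack traces of every
// goroutine, growing the buffer until the whole dump fits.
func allStacks() []byte {
	buf := make([]byte, 1<<16)
	for {
		n := runtime.Stack(buf, true) // true = include all goroutines
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

// dumpOnSignal (hypothetical helper) writes a full goroutine dump to
// stderr every time the process receives sig.
func dumpOnSignal(sig os.Signal) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, sig)
	go func() {
		for range ch {
			os.Stderr.Write(allStacks())
		}
	}()
}

func main() {
	// In a long-running server this would be installed once at startup;
	// "kill -USR1 <pid>" then triggers a dump.
	dumpOnSignal(syscall.SIGUSR1)

	// For demonstration, also print one dump immediately.
	os.Stderr.Write(allStacks())
}
```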

mknyszek added the NeedsInvestigation label Dec 5, 2024
mknyszek added this to the Backlog milestone Dec 5, 2024
Author

limpo1989 commented

Can you reproduce the issue?

Yes, I can reproduce this, but it will take a few days (maybe 3-6 days)

Do you have any diagnostic data you're able to share so we can poke around? A full stack dump would be helpful to try and identify what goroutine is failing to yield to the GC.

The Delve core dump and the gateapp binary are here: https://drive.google.com/file/d/1WvN1aPAruGL180yjRL-6nDmZ3N-qWpKy/view?usp=sharing

What version of Linux are you using?

OS: Ubuntu 22.04.4 LTS x86_64
Kernel: 6.8.0-48-generic

Also, I found that a build made with go build -race reports some data races. I will fix them and then try to reproduce the hang again.

@mknyszek
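
For readers unfamiliar with what the race detector flags, here is a minimal Go sketch. It is unrelated to the reporter's actual code, which is not shown in this thread; racyCount and safeCount are hypothetical names. The first function performs an unsynchronized read-modify-write from several goroutines, which go run -race reports; the second uses sync/atomic (atomic.Int64, available since Go 1.19) and is race-free.

```go
package main

import (
	"fmt"
	"os"
	"sync"
	"sync/atomic"
)

// racyCount increments a shared counter without synchronization.
// The race detector reports the n++ below, and the result may be wrong.
func racyCount(workers, iters int) int64 {
	var n int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < iters; j++ {
				n++ // data race: concurrent read-modify-write
			}
		}()
	}
	wg.Wait()
	return n
}

// safeCount does the same work with an atomic counter, so every
// increment is observed and the race detector stays silent.
func safeCount(workers, iters int) int64 {
	var n atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < iters; j++ {
				n.Add(1)
			}
		}()
	}
	wg.Wait()
	return n.Load()
}

func main() {
	// Run "go run -race . racy" to see the detector's report.
	if len(os.Args) > 1 && os.Args[1] == "racy" {
		fmt.Println("racy:", racyCount(8, 100000))
		return
	}
	fmt.Println("safe:", safeCount(8, 100000)) // prints "safe: 800000"
}
```

Races on larger values such as slices, maps, or interfaces can corrupt memory rather than just lose updates, which is one reason symptoms may show up far from the racing code.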

seankhliao added the WaitingForInfo label Dec 6, 2024
Author

limpo1989 commented

I think this issue can be closed. After I fixed multiple data races, the stress test has been running properly for more than 6 days, and everything seems to be fine.

[Screenshot: QQ_1733999486281]

Thanks everyone, please close this issue.

seankhliao closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 12, 2024

5 participants