This issue was moved to a discussion.

k3s server crashes abruptly when approximately 300 requests are sent in a span of 60 seconds (client QPS is 5 and burst limit is 10) #10416

Closed
sharrurashmi opened this issue Jun 27, 2024 · 1 comment

sharrurashmi commented Jun 27, 2024

Environmental Info:
K3s Version:
1.30.1

Node(s) CPU architecture, OS, and Version:
Linux 5.15.0-107-generic #117-Ubuntu SMP Fri Apr 26 12:26:49 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
One server, one agent

Describe the bug:
The K3s server crashes when a helm upgrade of a chart is triggered. In effect, the server crashes when around 300 API calls are sent to it.

Steps To Reproduce:

  • Installed K3s:
  • We have a Helm chart installation of our product, which was installed successfully.
  • When we start upgrading the Helm chart, the k3s server occasionally crashes.
  • For the API server, this means that it crashes while handling around 300 requests (the number of requests is calculated from the resources we have in our cluster).

Expected behavior:
k3s server should not crash

Actual behavior:
k3s server crashes
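
For context on the "approximately 300 requests in 60 seconds" figure: if the client is throttled by a token bucket (as client-go's default rate limiter is), a QPS of 5 and a burst of 10 cap the number of requests it can send in a window at roughly burst + QPS × seconds. A minimal sketch of that arithmetic, assuming the bucket starts full; the helper name maxRequests is hypothetical and not part of client-go:

```go
package main

import (
	"fmt"
	"math"
)

// maxRequests returns an upper bound on the requests a token-bucket-limited
// client can send within the given window, assuming the bucket starts full.
// Hypothetical helper for illustration; not part of client-go.
func maxRequests(qps float64, burst int, seconds float64) int {
	// The initial burst is spent immediately; after that, tokens refill at qps.
	return burst + int(math.Floor(qps*seconds))
}

func main() {
	// QPS 5, burst 10, 60-second window: 10 + 5*60 = 310.
	fmt.Println(maxRequests(5, 10, 60))
}
```

With the values from the title this gives 310, which is consistent with the roughly 300 requests reported for the upgrade.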

Additional context / logs:

This is the stack trace

18]: I0617 20:04:05.902329   37918 replica_set.go:676] "Finished syncing" kind="ReplicaSet" key="default/f5-csm-api-engine-5bc6f6b55" duration="167.85µs"
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime: g 16385: unexpected return pc for k8s.io/client-go/pkg/apis/clientauthentication.(*ExecCredential).XXX_DiscardUnknown called from 0x40
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: stack: frame={sp:0xc009ab1d90, fp:0xc009ab1db8} stack=[0xc009ab1000,0xc009ab2000)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1c90:  0x000000c009ab1cc0  0x000000000293812f <go:(*struct { io.ReadCloser }).Read+0x000000000000002f>
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1ca0:  0x000000c0095681c0  0x000000c00a2d24e8
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1cb0:  0x0000000000000000  0x00007f0151ceca68
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1cc0:  0x000000c009ab1d08  0x00000000004b4efa <io.ReadAtLeast+0x000000000000009a>
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1cd0:  0x000000c00717a340  0x000000c00a2d24e8
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1ce0:  0x000000c009ab1d08  0x000000000040d976 <runtime.convI2I+0x0000000000000036>
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1cf0:  0x00000000051cce60  0x0000000000000000
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1d00:  0x000000c00717a300  0x000000c009ab1d80
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1d10:  0x00000000010e60a8 <k8s.io/apimachinery/pkg/util/framer.(*lengthDelimitedFrameReader).Read+0x0000000000000088>  0x00007f00299c29a8
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1d20:  0x000000c008d4bba0  0x000000c00a2d24e8
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1d30:  0x0000000000000004  0x0000000000000004
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1d40:  0x500000e8b7565000  0x004500082eacb756
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1d50:  0x06ff00401c023c00  0x1e0a44001e0adb64
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1d60:  0x507d010013804500  0x02a000000000ff2f
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1d70:  0x0402000057ca405b  0x020409030301b405
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1d80:  0x000071b8cc270a08  0x00000000010e0000 <k8s.io/client-go/pkg/apis/clientauthentication.(*ExecCredential).XXX_DiscardUnknown+0x0000000000000060>
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1d90: <0x000000c009758048  0x000000c009802000
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1da0:  0x0000000000000400  0x0000000000000400
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1db0: !0x0000000000000040 >0x0000000000000038
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1dc0:  0x000000000537e3c0  0x0000000000000000
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1dd0:  0x0000000000000000  0x00007f0151ceca68
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1de0:  0x0000000000000040  0x000000c000100000
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1df0:  0x000000c00617ca40  0x0000000000000000
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1e00:  0x0000000000000000  0x000000000040ff87 <runtime.newobject+0x0000000000000027>
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1e10:  0x0000000000000038  0x000000000537e3c0
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1e20:  0x000000c009ab1e01  0x000000c009ab1e88
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1e30:  0x00000000010e0a2f <k8s.io/client-go/rest/watch.(*Decoder).Decode+0x000000000000004f>  0x000000c00927cd20
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1e40:  0x000000c009ab1e68  0x000000000656a480
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1e50:  0x000000c00617ca40  0x0000000000001bc0
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1e60:  0x000000000963bb68  0x000000c009ab1eb8
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1e70:  0x000000c00617ca40  0x000000000963bb60
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1e80:  0x000000c009ab1eb8  0x000000c009ab1fb8
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1e90:  0x0000000001040e7c <k8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive+0x00000000000000dc>  0x000000c0095681e0
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1ea0:  0x000000000963bb60  0x000000c002f23f1c
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: 0x000000c009ab1eb0:  0x000000000963bb70
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: fatal error: unknown caller pc
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime stack:
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.throw({0x59e685b?, 0x94eddc0?})
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7f0029835740 sp=0x7f0029835710 pc=0x43cebd
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.gentraceback(0x30?, 0x6511658?, 0x0?, 0xc00717a340, 0x0, 0x0, 0x7fffffff, 0x7f0029835c78, 0x0?, 0x0)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/traceback.go:270 +0x1bb0 fp=0x7f0029835a98 sp=0x7f0029835740 pc=0x466130
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.scanstack(0xc00717a340, 0xc000087738)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/mgcmark.go:804 +0x1f2 fp=0x7f0029835ca0 sp=0x7f0029835a98 pc=0x4229b2
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.markroot.func1()
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/mgcmark.go:239 +0xb5 fp=0x7f0029835cf0 sp=0x7f0029835ca0 pc=0x4217b5
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.markroot(0xc000087738, 0xe2d, 0x1)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/mgcmark.go:213 +0x1a5 fp=0x7f0029835d90 sp=0x7f0029835cf0 pc=0x421465
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.gcDrain(0xc000087738, 0x3)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/mgcmark.go:1069 +0x39f fp=0x7f0029835df0 sp=0x7f0029835d90 pc=0x42355f
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.gcBgMarkWorker.func2()
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/mgc.go:1348 +0xad fp=0x7f0029835e40 sp=0x7f0029835df0 pc=0x41fa2d
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.systemstack()
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/asm_amd64.s:496 +0x49 fp=0x7f0029835e48 sp=0x7f0029835e40 pc=0x474029
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: goroutine 34 [GC worker (active)]:
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.systemstack_switch()
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/asm_amd64.s:463 fp=0xc000508750 sp=0xc000508748 pc=0x473fc0
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.gcBgMarkWorker()
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/mgc.go:1335 +0x205 fp=0xc0005087e0 sp=0xc000508750 pc=0x41f6c5
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.goexit()
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0005087e8 sp=0xc0005087e0 pc=0x4761e1
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: created by runtime.gcBgMarkStartWorkers
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/mgc.go:1199 +0x25
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: goroutine 1 [chan receive, 1 minutes]:
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.gopark(0xc003f68fc0?, 0xc00971ae10?, 0x20?, 0xb3?, 0xc000eb2d80?)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0012ed798 sp=0xc0012ed778 pc=0x43fff6
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.chanrecv(0xc000ad46c0, 0x0, 0x1)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/chan.go:583 +0x49d fp=0xc0012ed828 sp=0xc0012ed798 pc=0x408a7d
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: runtime.chanrecv1(0x6596e48?, 0xc00056ca50?)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/usr/local/go/src/runtime/chan.go:442 +0x18 fp=0xc0012ed850 sp=0xc0012ed828 pc=0x408578
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: github.com/k3s-io/k3s/pkg/agent.run({_, _}, {{0xc001d88000, 0x6a}, {0x0, 0x0}, {0x0, 0x0}, {0xc000cacd38, 0x16}, ...}, ...)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/go/src/github.com/k3s-io/k3s/pkg/agent/run.go:177 +0xa45 fp=0xc0012edb38 sp=0xc0012ed850 pc=0x4146ce5
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: github.com/k3s-io/k3s/pkg/agent.Run({_, _}, {{0xc001d88000, 0x6a}, {0x0, 0x0}, {0x0, 0x0}, {0xc000cacd38, 0x16}, ...})
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/go/src/github.com/k3s-io/k3s/pkg/agent/run.go:274 +0x14e fp=0xc0012edda0 sp=0xc0012edb38 pc=0x414760e
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: github.com/k3s-io/k3s/pkg/cli/server.run(0xc000b5a580, 0x9634840, {0xc000ace9f8, 0x0, 0x0?}, {0xc000ace9f8, 0x0, 0x942aa20?})
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/go/src/github.com/k3s-io/k3s/pkg/cli/server/server.go:550 +0x3b88 fp=0xc0012ee9b8 sp=0xc0012edda0 pc=0x46f4128
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: github.com/k3s-io/k3s/pkg/cli/server.Run(0xc000b5a580?)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/go/src/github.com/k3s-io/k3s/pkg/cli/server/server.go:41 +0x35 fp=0xc0012eea08 sp=0xc0012ee9b8 pc=0x46f0575
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: github.com/urfave/cli.HandleAction({0x4ee6d80?, 0x5e432b8?}, 0x6?)
Jun 17 20:04:11 10918364-mhr-ci-10918364-dut-2 k3s[37918]: #011/go/src/github.com/k3s-io/k3s/vendor/gith
brandond (Member) commented Jun 27, 2024

This is a crash DEEP in a core Go garbage collection function. I do not believe that this is something we can address here in K3s. The fact that you see this under load, and that the resulting stack is in the Go runtime's garbage collection routine, suggests that it is running out of memory and trying to force GC, and that it is somehow running into memory corruption while doing so (ref golang/go#57550 (comment)).

How much memory does this node have? How much of that is available to k3s when your workload is present? Have you tried profiling k3s while updating your application?
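
One way to answer the memory questions on a Linux node is to read MemTotal and MemAvailable from /proc/meminfo while the upgrade is running. A minimal sketch, assuming a Linux host; parseMeminfo is a hypothetical helper name, and this only reports node-level memory, not a profile of k3s itself:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMeminfo extracts numeric values (in kB) from /proc/meminfo-style text.
// Hypothetical helper for illustration only.
func parseMeminfo(text string) map[string]uint64 {
	out := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		// Lines look like "MemAvailable:    1234567 kB".
		key, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseUint(fields[0], 10, 64); err == nil {
			out[key] = v
		}
	}
	return out
}

func main() {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/meminfo:", err)
		return
	}
	m := parseMeminfo(string(data))
	fmt.Printf("MemTotal: %d kB, MemAvailable: %d kB\n", m["MemTotal"], m["MemAvailable"])
}
```

Running this before and during the helm upgrade would show whether available memory on the server node drops sharply under the request load.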

k3s-io locked and limited conversation to collaborators Jun 27, 2024
brandond converted this issue into discussion #10418 Jun 27, 2024
