
Kubelet crashes trying to free memory under MemoryPressure #58541

Closed
Dema opened this issue Jan 19, 2018 · 8 comments · Fixed by #58574
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@Dema

Dema commented Jan 19, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
The kubelet constantly crashes while trying to evict pods under memory pressure.
What you expected to happen:
The kubelet should evict pods without crashing.
How to reproduce it (as minimally and precisely as possible):
A fresh Kubernetes 1.9.1 cluster, installed via kubeadm.

Here are the relevant eviction parameters for the kubelet:

Environment="KUBELET_EVICTION_HARD=--eviction-hard=memory.available<3000Mi,nodefs.available<1Gi,imagefs.available<40Gi"
Environment="KUBELET_EVICTION_SOFT=--eviction-soft=memory.available<3500Mi,nodefs.available<2Gi,imagefs.available<50Gi --eviction-soft-grace-period=memory.available=1m30s,nodefs.available=5m,imagefs.available=5m"

Anything else we need to know?:

янв 20 00:20:07 o1.home kubelet[32406]: W0120 00:20:07.544864   32406 eviction_manager.go:142] Failed to admit pod kube-proxy-bbrr6_kube-system(860629c4-fd5e-11e7-880c-6c626d3ed96f) - node has conditions: [MemoryPressure]
янв 20 00:20:07 o1.home kubelet[32406]: W0120 00:20:07.726836   32406 eviction_manager.go:332] eviction manager: attempting to reclaim memory
янв 20 00:20:07 o1.home kubelet[32406]: I0120 00:20:07.726862   32406 eviction_manager.go:346] eviction manager: must evict pod(s) to reclaim memory
янв 20 00:20:07 o1.home kubelet[32406]: panic: runtime error: index out of range
янв 20 00:20:07 o1.home kubelet[32406]: goroutine 565 [running]:
янв 20 00:20:07 o1.home kubelet[32406]: k8s.io/kubernetes/vendor/k8s.io/api/core/v1.(*ResourceList).Memory(...)
янв 20 00:20:07 o1.home kubelet[32406]:         /common/var/tmp/portage/sys-cluster/kubelet-1.9.1/work/kubelet-1.9.1/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/eviction/helpers.go:612
янв 20 00:20:07 o1.home kubelet[32406]: k8s.io/kubernetes/pkg/kubelet/eviction.podRequest(0xc421a5bc00, 0x38a4b12, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
янв 20 00:20:07 o1.home kubelet[32406]:         /common/var/tmp/portage/sys-cluster/kubelet-1.9.1/work/kubelet-1.9.1/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/eviction/helpers.go:612 +0xab1
янв 20 00:20:07 o1.home kubelet[32406]: k8s.io/kubernetes/pkg/kubelet/eviction.exceedMemoryRequests.func1(0xc4212cf180, 0xc421a5bc00, 0x1)
янв 20 00:20:07 o1.home kubelet[32406]:         /common/var/tmp/portage/sys-cluster/kubelet-1.9.1/work/kubelet-1.9.1/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/eviction/helpers.go:555 +0x479
янв 20 00:20:07 o1.home kubelet[32406]: k8s.io/kubernetes/pkg/kubelet/eviction.(*multiSorter).Less(0xc4204112c0, 0x0, 0x4, 0x0)
янв 20 00:20:07 o1.home kubelet[32406]:         /common/var/tmp/portage/sys-cluster/kubelet-1.9.1/work/kubelet-1.9.1/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/eviction/helpers.go:503 +0xa2
янв 20 00:20:07 o1.home kubelet[32406]: sort.doPivot(0x577ff80, 0xc4204112c0, 0x0, 0xd, 0x7f877bcbe6c8, 0x0)
янв 20 00:20:07 o1.home kubelet[32406]:         /usr/lib64/go/src/sort/sort.go:123 +0x144
янв 20 00:20:07 o1.home kubelet[32406]: sort.quickSort(0x577ff80, 0xc4204112c0, 0x0, 0xd, 0x8)
янв 20 00:20:07 o1.home kubelet[32406]:         /usr/lib64/go/src/sort/sort.go:192 +0x8a
янв 20 00:20:07 o1.home kubelet[32406]: sort.Sort(0x577ff80, 0xc4204112c0)
янв 20 00:20:07 o1.home kubelet[32406]:         /usr/lib64/go/src/sort/sort.go:220 +0x79
янв 20 00:20:07 o1.home kubelet[32406]: k8s.io/kubernetes/pkg/kubelet/eviction.(*multiSorter).Sort(0xc4204112c0, 0xc4216f6580, 0xd, 0x10)
янв 20 00:20:07 o1.home kubelet[32406]:         /common/var/tmp/portage/sys-cluster/kubelet-1.9.1/work/kubelet-1.9.1/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/eviction/helpers.go:477 +0x5b
янв 20 00:20:07 o1.home kubelet[32406]: k8s.io/kubernetes/pkg/kubelet/eviction.rankMemoryPressure(0xc4216f6580, 0xd, 0x10, 0xc420248c30)
янв 20 00:20:07 o1.home kubelet[32406]:         /common/var/tmp/portage/sys-cluster/kubelet-1.9.1/work/kubelet-1.9.1/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/eviction/helpers.go:692 +0xf0

Environment:

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.1", GitCommit:"3a1c9449a956b6026f075fa3134ff92f7d55f812", GitTreeState:"archive", BuildDate:"2018-01-05T20:28:28Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.1", GitCommit:"3a1c9449a956b6026f075fa3134ff92f7d55f812", GitTreeState:"clean", BuildDate:"2018-01-04T11:40:06Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:

8GiB of RAM

  • OS (e.g. from /etc/os-release):
    Gentoo
  • Kernel (e.g. uname -a):

Linux o1.home 4.9.34-gentoo #5 SMP Sat Jan 13 23:06:16 MSK 2018 x86_64 Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz GenuineIntel GNU/Linux

  • Install tools:
  • Others:
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Jan 19, 2018
@Dema
Author

Dema commented Jan 20, 2018

helpers.go:612 contains the following code:

	for i := range pod.Spec.Containers {
		switch resourceName {
		case v1.ResourceMemory:
			containerValue.Add(*pod.Spec.Containers[i].Resources.Requests.Memory())
		case resourceDisk:
			containerValue.Add(*pod.Spec.Containers[i].Resources.Requests.StorageEphemeral())
		}
	}
	initValue := resource.Quantity{Format: resource.BinarySI}
	for i := range pod.Spec.InitContainers {
		switch resourceName {
		case v1.ResourceMemory:
			containerValue.Add(*pod.Spec.Containers[i].Resources.Requests.Memory())
^^^^^^^^^^^^^^^^^^^^^^^^^ this is line 612
		case resourceDisk:
			containerValue.Add(*pod.Spec.Containers[i].Resources.Requests.StorageEphemeral())
		}
	}

I think there is a bug here: the loop iterates over InitContainers but reads the resource requests of the regular Containers. I have a pod with 3 initContainers, so once a pod has more init containers than regular containers the index goes out of range. It looks like a copy-and-paste error.
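
A minimal, runnable Go sketch of that failure mode (illustrative code, not the kubelet's; the slice names stand in for pod.Spec.InitContainers and pod.Spec.Containers):

package main

import "fmt"

func main() {
	initContainers := []string{"init-a", "init-b", "init-c"} // stand-in for pod.Spec.InitContainers
	containers := []string{"main"}                           // stand-in for pod.Spec.Containers

	// Same buggy pattern as helpers.go:612: the loop ranges over the init
	// containers but indexes the regular containers, so as soon as there are
	// more init containers than regular ones it panics with "index out of range".
	for i := range initContainers {
		fmt.Println(containers[i]) // panics at i == 1
	}
}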

@yastij
Member

yastij commented Jan 21, 2018

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 21, 2018
@yastij
Member

yastij commented Jan 21, 2018

Indeed, there's a mismatch between the initContainers loop and the spec's Containers it indexes; I'll take a look.

@dims
Member

dims commented Jan 21, 2018

@yastij looks like we need to do the following:

diff --git a/pkg/kubelet/eviction/helpers.go b/pkg/kubelet/eviction/helpers.go
index 4376c84288..ee403779e3 100644
--- a/pkg/kubelet/eviction/helpers.go
+++ b/pkg/kubelet/eviction/helpers.go
@@ -609,9 +609,9 @@ func podRequest(pod *v1.Pod, resourceName v1.ResourceName) resource.Quantity {
        for i := range pod.Spec.InitContainers {
                switch resourceName {
                case v1.ResourceMemory:
-                       containerValue.Add(*pod.Spec.Containers[i].Resources.Requests.Memory())
+                       initValue.Add(*pod.Spec.InitContainers[i].Resources.Requests.Memory())
                case resourceDisk:
-                       containerValue.Add(*pod.Spec.Containers[i].Resources.Requests.StorageEphemeral())
+                       initValue.Add(*pod.Spec.InitContainers[i].Resources.Requests.StorageEphemeral())
                }
        }
        if containerValue.Cmp(initValue) > 0 {

and the problem was introduced in 8b3bd5a
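
For reference, here is roughly how that section of podRequest reads once the patch is applied (paraphrased from the diff; the trailing comparison and returns are my reading of the surrounding code, not a verbatim quote):

	initValue := resource.Quantity{Format: resource.BinarySI}
	for i := range pod.Spec.InitContainers {
		switch resourceName {
		case v1.ResourceMemory:
			// Fixed: accumulate the init containers' own requests, indexed
			// from pod.Spec.InitContainers rather than pod.Spec.Containers.
			initValue.Add(*pod.Spec.InitContainers[i].Resources.Requests.Memory())
		case resourceDisk:
			initValue.Add(*pod.Spec.InitContainers[i].Resources.Requests.StorageEphemeral())
		}
	}
	// Init containers run before (not alongside) the regular containers, so the
	// pod's effective request is the larger of the two totals.
	if containerValue.Cmp(initValue) > 0 {
		return containerValue
	}
	return initValue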

cc @dashpole

@yastij
Member

yastij commented Jan 21, 2018

@dims - Indeed, I'll open a PR for this

@dashpole
Contributor

Reopening until it is cherry-picked to 1.9.

@dashpole dashpole reopened this Jan 26, 2018
k8s-github-robot pushed a commit that referenced this issue Feb 1, 2018
…74-upstream-release-1.9

Automatic merge from submit-queue.

Automated cherry pick of #58574: fixing array out of bound by checking initContainers instead

Cherry pick of #58574 on release-1.9.

#58574: fixing array out of bound by checking initContainers instead

Fixes #58541
@pires
Contributor

pires commented Feb 10, 2018

It was cherry-picked in 1.9.3.

@k82cn
Member

k82cn commented Feb 25, 2018

/close
