Kubernetes pods still running even if they are evicted #81383

Open
Swetad90 opened this issue Aug 13, 2019 · 6 comments

Comments

@Swetad90
commented Aug 13, 2019

What happened:
I am running a MinIO DaemonSet that uses hostPath volumes. I have attached dedicated disks to these nodes, mounted them at "/data/minio", and mounted that path into each pod. I haven't changed the kubelet's default eviction threshold values on any node. When a node comes under disk pressure, the minio pods are evicted, but the kubelet keeps trying to delete the pods:

Aug 13 21:05:45 staging-node2 kubelet[2188]: I0813 21:05:45.968179 2188 kubelet_pods.go:1073] Killing unwanted pod "minio-kjrkc"
Aug 13 21:05:45 staging-node2 kubelet[2188]: I0813 21:05:45.975372 2188 kuberuntime_container.go:559] Killing container "docker://6da1247718f8e6c92399e231f8c31ff1c510737c658ac2aca87c1659aa6b51cc" with 30 second grace period

docker logs for the container shows that it received the TERMINATED signal, but the container is still up.

Now, even though new pods are scheduled on these nodes, they are evicted immediately because the node is always under disk pressure.
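
For reference, the kubelet's documented default hard eviction thresholds on Linux are memory.available<100Mi, nodefs.available<10%, nodefs.inodesFree<5% and imagefs.available<15%. A minimal Python sketch of the 10% disk check follows. It is only an approximation for manual inspection: the kubelet reads these values from its own stats provider, and the helper names here are hypothetical, not part of any Kubernetes API.

```python
import shutil

# Documented default for nodefs.available on Linux. Adjust this if the
# kubelet was started with a custom hard eviction threshold.
NODEFS_AVAILABLE_MIN_PERCENT = 10.0


def percent_available(free_bytes: int, total_bytes: int) -> float:
    """Return the share of the filesystem that is still free, in percent."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def under_disk_pressure(free_bytes: int, total_bytes: int,
                        min_percent: float = NODEFS_AVAILABLE_MIN_PERCENT) -> bool:
    """True when free space is strictly below the threshold."""
    return percent_available(free_bytes, total_bytes) < min_percent


def check_path(path: str) -> bool:
    """Apply the threshold to the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return under_disk_pressure(usage.free, usage.total)
```

Calling `check_path("/var/lib/kubelet")` on a node would test the filesystem that usually holds the kubelet root directory; that path is an assumption and may differ in a given cluster.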

What you expected to happen:
The kubelet should kill the container gracefully once the pod is evicted.

How to reproduce it (as minimally and precisely as possible):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: minio
  labels:
    app: minio
spec:
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      nodeSelector:
        minio-server: "true"
      hostNetwork: true
      volumes:
      - name: storage
        hostPath:
          path: /data/minio/
      containers:
      - name: minio
        env:
        - name: MINIO_ACCESS_KEY
          value: "minio"
        - name: MINIO_SECRET_KEY
          value: "minio"
        image: minio/minio:RELEASE.2019-07-24T02-02-23Z
        args:
        - server
        - http://node1:9000/data/minio
        - http://node2:9000/data/minio
        - http://node3:9000/data/minio
        - http://node4:9000/data/minio
        ports:
        - containerPort: 9000
        volumeMounts:
        - name: storage
          mountPath: /data/minio/

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:
    VMware VMs

  • OS (e.g: cat /etc/os-release):
    NAME="Ubuntu"
    VERSION="18.04.1 LTS (Bionic Beaver)"
    ID=ubuntu

  • Kernel (e.g. uname -a):
    4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

@Swetad90

Author

commented Aug 13, 2019

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node and removed needs-sig labels Aug 13, 2019

@tedyu

Contributor

commented Aug 14, 2019

Can you format the snippet in the 'How to reproduce' section so that it is more readable?

thanks

@Swetad90

Author

commented Aug 14, 2019

Attached the daemonset.

daemoset.txt

@zouyee

Member

commented Aug 14, 2019

Could you show more kubelet logs?
/area kubelet

@Swetad90

Author

commented Aug 14, 2019

I noticed this is not limited to one DaemonSet; the behaviour is consistent across the entire cluster.
Pods are evicted, but the oldest container on that node never gets killed. It receives the terminated signal but never exits.

Kubelet logs:

Aug 14 15:54:02 node-4 kubelet[2166]: W0814 15:54:02.603150    2166 container_manager_linux.go:804] CPUAccounting not enabled for pid: 2166
Aug 14 15:54:02 node-4 kubelet[2166]: W0814 15:54:02.603187    2166 container_manager_linux.go:807] MemoryAccounting not enabled for pid: 2166
Aug 14 15:54:04 node-4 kubelet[2166]: I0814 15:54:04.396184    2166 setters.go:72] Using node IP: "172.17.21.137"
Aug 14 15:54:04 node-4 kubelet[2166]: I0814 15:54:04.477305    2166 kubelet_pods.go:1073] Killing unwanted pod "thanos-store-gateway-0"
Aug 14 15:54:04 node-4 kubelet[2166]: I0814 15:54:04.483285    2166 kuberuntime_container.go:559] Killing container "docker://827d5f3878ee145e2603f16421e28a760e0cbf557e18c8fc0bee113cd29a95e9" with 30 second grace period
Aug 14 15:54:04 node-4 kubelet[2166]: E0814 15:54:04.485206    2166 kuberuntime_container.go:71] Can't make a ref to pod "thanos-store-gateway-0_thanos(b5152916-be46-11e9-84a1-005056bcb43b)", container thanos: selfLink was empty, can't make reference
Aug 14 15:54:06 node-4 kubelet[2166]: I0814 15:54:06.479336    2166 kubelet_pods.go:1073] Killing unwanted pod "thanos-store-gateway-0"
Aug 14 15:54:06 node-4 kubelet[2166]: I0814 15:54:06.483401    2166 kuberuntime_container.go:559] Killing container "docker://827d5f3878ee145e2603f16421e28a760e0cbf557e18c8fc0bee113cd29a95e9" with 30 second grace period
Aug 14 15:54:06 node-4 kubelet[2166]: E0814 15:54:06.484076    2166 kuberuntime_container.go:71] Can't make a ref to pod "thanos-store-gateway-0_thanos(b5152916-be46-11e9-84a1-005056bcb43b)", container thanos: selfLink was empty, can't make reference
Aug 14 15:54:08 node-4 kubelet[2166]: I0814 15:54:08.470220    2166 kubelet_pods.go:1073] Killing unwanted pod "thanos-store-gateway-0"
Aug 14 15:54:08 node-4 kubelet[2166]: I0814 15:54:08.476465    2166 kuberuntime_container.go:559] Killing container "docker://827d5f3878ee145e2603f16421e28a760e0cbf557e18c8fc0bee113cd29a95e9" with 30 second grace period
Aug 14 15:54:08 node-4 kubelet[2166]: E0814 15:54:08.477680    2166 kuberuntime_container.go:71] Can't make a ref to pod "thanos-store-gateway-0_thanos(b5152916-be46-11e9-84a1-005056bcb43b)", container thanos: selfLink was empty, can't make reference


admin@node-4:~$ docker ps -a | grep thanos
827d5f3878ee        855881c0f940                                                                    "/bin/thanos store -…"   11 hours ago        Up 11 hours                                     k8s_thanos_thanos-store-gateway-0_thanos_b5152916-be46-11e9-84a1-005056bcb43b_0
ddb6d25e5008        platform-kubespray/pause-amd64:3.1-staging   "/pause"                 11 hours ago        Exited (0) 11 hours ago                         k8s_POD_thanos-store-gateway-0_thanos_b5152916-be46-11e9-84a1-005056bcb43b_0

Docker logs:

admin@node-4:~$ docker logs 827d5f3878ee
level=info ts=2019-08-14T03:50:59.726593524Z caller=main.go:154 msg="Tracing will be disabled"
level=info ts=2019-08-14T03:50:59.726659323Z caller=factory.go:39 msg="loading bucket configuration"
level=info ts=2019-08-14T03:50:59.727165888Z caller=cache.go:172 msg="created index cache" maxItemSizeBytes=262144000 maxSizeBytes=524288000 maxItems=math.MaxInt64
level=debug ts=2019-08-14T03:50:59.727787482Z caller=store.go:144 msg="initializing bucket store"
level=debug ts=2019-08-14T03:51:04.460451996Z caller=store.go:148 msg="bucket store ready" init_duration=4.73266052s
level=info ts=2019-08-14T03:51:04.461313307Z caller=main.go:274 msg="disabled TLS, key and cert must be set to enable"
level=info ts=2019-08-14T03:51:04.461669525Z caller=store.go:191 msg="starting store node"
level=info ts=2019-08-14T03:51:04.463177456Z caller=store.go:181 msg="Listening for StoreAPI gRPC" address=0.0.0.0:10901
level=info ts=2019-08-14T03:51:04.463277237Z caller=main.go:326 msg="Listening for metrics" address=0.0.0.0:10902
level=info ts=2019-08-14T03:54:36.909533557Z caller=main.go:210 msg="caught signal. Exiting." signal=terminated
level=info ts=2019-08-14T03:54:36.910003926Z caller=main.go:202 msg=exiting
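
The kubelet log above repeats the same "Killing container" message for one container ID every two seconds. A small Python sketch like the following could summarize how often each container is retried across a longer log. The function name is hypothetical, and the regular expression assumes the message format shown in this excerpt.

```python
import re
from collections import Counter

# Matches the kubelet message format seen in the log excerpt,
# e.g.: Killing container "docker://<64 hex chars>" with 30 second grace period
KILL_RE = re.compile(r'Killing container "(?P<runtime>\w+)://(?P<cid>[0-9a-f]+)"')


def count_kill_attempts(log_lines):
    """Count 'Killing container' messages per container ID.

    IDs are shortened to 12 characters so they line up with the
    first column of `docker ps` output.
    """
    attempts = Counter()
    for line in log_lines:
        match = KILL_RE.search(line)
        if match:
            attempts[match.group("cid")[:12]] += 1
    return attempts
```

The lines could come from `journalctl -u kubelet`, assuming the kubelet runs as a systemd unit with that name. A container whose count keeps growing while `docker ps` still reports it as "Up" matches the behaviour described here.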
@Swetad90

Author

commented Aug 14, 2019

#54525 (comment)

According to that comment, when a pod is evicted its containers should also be killed. Is there any reason why that is not happening?
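
To cross-check which pods the API server considers evicted against the containers still running on a node, a minimal Python sketch could filter a pod list on `status.reason`. Evicted pods are reported with reason "Evicted"; the function names below are hypothetical, and the input is assumed to be the JSON printed by `kubectl get pods --all-namespaces -o json`.

```python
import json


def evicted_pods(pod_list: dict):
    """Return (namespace, name, node) for each pod with status.reason == "Evicted"."""
    result = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("reason") == "Evicted":
            meta = pod.get("metadata", {})
            node = pod.get("spec", {}).get("nodeName")
            result.append((meta.get("namespace"), meta.get("name"), node))
    return result


def evicted_pods_from_json(text: str):
    """Convenience wrapper for raw JSON text, e.g. captured kubectl output."""
    return evicted_pods(json.loads(text))
```

Any pod listed here whose container still shows as "Up" in `docker ps` on the reported node would be an instance of the problem in this issue.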
