
feat(chaos): add kubelet-restart chaos test #890

Merged: 4 commits into master, Sep 15, 2020

Conversation

@harshshekhar15 (Contributor) commented Sep 8, 2020

Signed-off-by: Harsh Shekhar harsh.shekhar@mayadata.io

This PR intends to do the following:

  • Add Kubera chaos test - TCID-KUBELET-RESTART.
  • Add Docker installation to the Dockerfile.

Exact application name that is under test.

Storage engine that is under test

OpenEBS version if required.

Assumptions of this PR

Notes to reviewer.

Anything else we need to know?

Versions:

  • Kubernetes version:
  • Kubernetes platform:
  • kubectl version:

# Restart Kubelet container present on the node
- name: Restarting Kubelet container in Rancher
  shell: docker restart kubelet

Contributor

It needs to be restarted on the node where the pods are scheduled.

Contributor

  • If restarted, it will be recreated immediately. Can we instead stop the kubelet container and wait until the node gets into NotReady?
  • The pods on the respective node should be rescheduled onto some other node if resources are available.
  • Once things are set, you can start the kubelet container back on the node (see the sketch below).
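A minimal Ansible sketch of that stop/wait/start flow, assuming the target node name is available in a variable (app_node here is hypothetical, and the retry values are illustrative, not part of this PR):

- name: Stopping Kubelet container on the target node
  shell: docker stop kubelet

- name: Waiting until the node goes into NotReady state
  shell: kubectl get node {{ app_node }} --no-headers
  register: nodeStatus
  until: "'NotReady' in nodeStatus.stdout"
  retries: 30
  delay: 10

- name: Starting Kubelet container back on the node
  shell: docker start kubelet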

Contributor

To do this, we need root access, right?

Contributor Author

@gprasath it will restart the docker on the node on which the pod is scheduled. We have made use of Docker-out-of-Docker for this: we have mounted the host's Docker socket into this pod.
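For context, a minimal sketch of how the host's Docker socket can be mounted into the chaos pod; the container, image, and volume names here are illustrative, not taken from this PR:

spec:
  containers:
  - name: kubelet-restart          # illustrative container name
    image: oep-chaos-runner:ci     # illustrative image
    volumeMounts:
    - name: docker-socket
      mountPath: /var/run/docker.sock
  volumes:
  - name: docker-socket
    hostPath:
      path: /var/run/docker.sock
      type: File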

Contributor Author

@gprasath what will happen to the StatefulSets scheduled on that node if we stop the kubelet until the node is in NotReady state?

Contributor Author

Yes, we need root access, so we are running the container with securityContext set to privileged: true.
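For reference, a minimal sketch of that securityContext on the chaos container (the container name is illustrative):

containers:
- name: kubelet-restart
  securityContext:
    privileged: true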

Contributor

> @gprasath what will happen to the StatefulSets scheduled on that node if we stop the kubelet until the node is in NotReady state?

That StatefulSet replica pod will be in Pending state if the node is not ready.

Contributor

> @gprasath it will restart the docker on the node on which the pod is scheduled. We have made use of Docker-out-of-Docker for this: we have mounted the host's Docker socket into this pod.

Restart is instantaneous. There won't be any considerable impact.

Contributor Author

@gprasath I think I was not clear -- the above command will restart the kubelet container running on the host node itself.

# Task will fail if any of the pods is not in the 'Running' phase
- name: Checking Kubera pods status
  shell: kubectl get pods -n {{ kuberaNamespace }} --field-selector=status.phase!=Running --no-headers
  register: podStatus

Contributor

When restarting, the pods will not enter a not-Running state, as the kubelet comes back online immediately.

Contributor Author

@gprasath we are verifying that Kubera is working fine after the restart.
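For context, a minimal sketch of how such a post-restart verification could be written with a retry loop; the retries and delay values are illustrative and not taken from this PR:

- name: Verifying Kubera pods are Running after the kubelet restart
  shell: kubectl get pods -n {{ kuberaNamespace }} --field-selector=status.phase!=Running --no-headers
  register: podStatus
  until: podStatus.stdout == ""
  retries: 30
  delay: 10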

Contributor

Restart won't have any impact.

Contributor

The default pod eviction timeout is 5 minutes. There will be an impact on container status only if that timeout is exceeded @harshshekhar15
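For reference, on clusters where it is configurable, this timeout corresponds to the kube-controller-manager flag, shown here with its default value:

--pod-eviction-timeout=5m0s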


How do we ensure the Kubera pods and the restart job are running on the same node?

Contributor Author

@AmitKumarDas some of the Kubera pods will be present on whatever node the job's pod is scheduled on.

    path: /var/run/docker.sock
    type: File
imagePullSecrets:
- name: oep-secret

Contributor

Where do we create this secret? Can we add more info about this?

@harshshekhar15 (Contributor Author) Sep 9, 2020

@gprasath we are creating this secret after installing Kubera, as a part of the litmus prerequisites; ref: https://github.com/mayadata-io/oep-e2e/blob/master/litmus/prerequisite/docker-secret.yml
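For reference, an image pull secret like this is typically created with something along these lines; the namespace and credential values below are placeholders, not taken from the linked file:

kubectl create secret docker-registry oep-secret \
  --namespace <kubera-namespace> \
  --docker-server <registry-url> \
  --docker-username <username> \
  --docker-password <password>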

Contributor

Add a comment in the places where we use this secret.

@gprasath (Contributor) commented Sep 9, 2020

And can you add a README for this scenario, describing the procedure?

@amitbhatt818 (Contributor) left a comment

/lgtm

apiVersion: batch/v1
kind: Job
metadata:
  generateName: kubelet-restart-


Do we restart the kubelet of specific nodes?
We may want to target those nodes where the specific app is running.
Otherwise this might result in flakiness.
Please show otherwise if this will not result in flakiness.

Contributor Author

@AmitKumarDas no, we are not specifying any node; on whichever node the job's pod is scheduled, it will restart docker on that host node. This will not result in flakiness, as a kubelet restart does not have much effect on the Kubera components; it is just that the pods of any StatefulSet scheduled on that node will go into NotReady state and will be back to Ready state as soon as the kubelet starts running on that node again.

  when: platform == "RANCHER"

- name: Printing the status of nodes of the cluster
  shell: kubectl get nodes -o wide

Contributor

Can we check the status of the node where the chaos is injected?
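A possible way to check just that node, assuming the target node name has been captured earlier into a variable (app_node here is hypothetical):

- name: Printing the status of the node where the chaos was injected
  shell: kubectl get node {{ app_node }} -o wide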

@gprasath (Contributor)

Enhancement to this test: #892

@gprasath merged commit 2abfb6a into master on Sep 15, 2020.
@harshshekhar15 deleted the add-kubelet-restart-test branch on September 15, 2020 at 10:38.