
chore(exp): Add pod autoscaler experiment used to check the scalability of the application pod #65

Merged: 5 commits merged into litmuschaos:master on Aug 15, 2020

Conversation

uditgaurav (Member) commented Jul 23, 2020

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>

Issue:

Details:

  • This PR adds a new experiment, "pod-autoscaler", to litmus-go. The experiment checks the ability of the nodes to accommodate a given number of replicas of the same application pod. It first scales up the replicas of the application, then waits for the specified chaos duration and checks whether all pods reach the Running state; this becomes the deciding factor for whether the experiment passes or fails.

The experiment takes two inputs:

  • Total Chaos Duration: the time period for which the experiment waits and checks whether the replicas are scheduled successfully; in other words, the timeout for the chaos.
  • Replica Count: the number of replicas to which the application is scaled up for the test.
For example:
If Total Chaos Duration is set to 600 seconds and Replica Count to 15, the experiment will wait up to 10 minutes for all 15 replicas to reach the Running state.
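
A minimal sketch of this scale-and-wait flow, assuming a client-go based approach, a local kubeconfig, a 5-second poll interval, and an illustrative target deployment `nginx` in the `default` namespace (the actual litmus-go experiment is structured differently):

```go
// Sketch only: scale a deployment to REPLICA_COUNT and wait up to
// TOTAL_CHAOS_DURATION for all replicas to become ready, then revert.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clients, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	appNS, appName := "default", "nginx" // illustrative target deployment
	replicaCount := int32(15)            // REPLICA_COUNT
	chaosDuration := 600 * time.Second   // TOTAL_CHAOS_DURATION

	// Record the original replica count (pre-chaos) so it can be restored later.
	deploy, err := clients.AppsV1().Deployments(appNS).Get(context.TODO(), appName, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	originalReplicas := int32(1)
	if deploy.Spec.Replicas != nil {
		originalReplicas = *deploy.Spec.Replicas
	}

	// Scale the deployment up to the desired replica count.
	deploy.Spec.Replicas = &replicaCount
	if _, err := clients.AppsV1().Deployments(appNS).Update(context.TODO(), deploy, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}

	// Wait until all replicas are ready or the chaos duration (timeout) expires.
	passed := false
	deadline := time.Now().Add(chaosDuration)
	for time.Now().Before(deadline) {
		d, err := clients.AppsV1().Deployments(appNS).Get(context.TODO(), appName, metav1.GetOptions{})
		if err == nil && d.Status.ReadyReplicas == replicaCount {
			passed = true
			break
		}
		time.Sleep(5 * time.Second)
	}
	fmt.Println("experiment passed:", passed)

	// Scale back down to the original replica count in both pass and fail cases.
	deploy, err = clients.AppsV1().Deployments(appNS).Get(context.TODO(), appName, metav1.GetOptions{})
	if err == nil {
		deploy.Spec.Replicas = &originalReplicas
		_, _ = clients.AppsV1().Deployments(appNS).Update(context.TODO(), deploy, metav1.UpdateOptions{})
	}
}
```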

Node auto-scaling test:

This experiment can also be used for other scenarios, such as checking the node auto-scaling feature: if the pods cannot be scheduled on the existing nodes, it verifies whether new nodes get auto-scaled to accommodate them.

Pod auto-scaling check:

The pod scaling check can be done by noting the amount of time taken by the pods to get scheduled after scale-up; a heavier application may need a larger chaos duration to get scheduled.
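
If you want to record that time explicitly, a hypothetical helper along the lines of the sketch above (reusing its imports and clientset) could look like this:

```go
// timedWait is a hypothetical helper: it reports how long the replicas took to
// become Ready after scale-up, and whether they all made it within the timeout.
func timedWait(clients kubernetes.Interface, ns, name string, want int32, timeout time.Duration) (time.Duration, bool) {
	start := time.Now()
	deadline := start.Add(timeout)
	for time.Now().Before(deadline) {
		d, err := clients.AppsV1().Deployments(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err == nil && d.Status.ReadyReplicas == want {
			return time.Since(start), true
		}
		time.Sleep(5 * time.Second)
	}
	return time.Since(start), false
}
```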

Pre-Chaos and Post-Chaos Checks:

  • Pre-chaos check: verifies the health of the application pods, i.e. that all replicas are in the Running state. After the pre-chaos check, the initial replica count is recorded.
  • Post-chaos check: runs the same status check over the application pods again after the chaos (a minimal helper for such a check is sketched below). In both cases (Passed/Failed), the deployment is scaled back down to the original replica count after chaos injection.
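
A minimal sketch of such a status check, assuming the same client-go setup as the earlier sketch, the `run=nginx` label from the ChaosEngine below, and one extra import (`corev1 "k8s.io/api/core/v1"`):

```go
// allPodsRunning is a hypothetical helper: it returns true only if every pod
// matching the label selector (e.g. "run=nginx") is in the Running phase.
func allPodsRunning(clients kubernetes.Interface, ns, labelSelector string) (bool, error) {
	pods, err := clients.CoreV1().Pods(ns).List(context.TODO(), metav1.ListOptions{LabelSelector: labelSelector})
	if err != nil {
		return false, err
	}
	for _, p := range pods.Items {
		if p.Status.Phase != corev1.PodRunning {
			return false, nil
		}
	}
	return len(pods.Items) > 0, nil
}
```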

Abort:

If anything goes wrong and you want to abort the chaos, this experiment supports that: when the experiment is aborted, either by running the abort command or by deleting the experiment mid-run, the experiment resources are removed and the replica count is scaled back down to its original (pre-chaos) value.
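
A hedged sketch of such an abort handler, again reusing the setup from the first sketch (extra imports assumed: "os", "os/signal", "syscall"): it traps SIGINT/SIGTERM and reverts the deployment to the recorded pre-chaos replica count before exiting.

```go
// watchForAbort is a hypothetical handler: on SIGINT/SIGTERM it scales the
// deployment back to the replica count recorded before chaos, then exits.
func watchForAbort(clients kubernetes.Interface, ns, name string, originalReplicas int32) {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-sigCh
		if d, err := clients.AppsV1().Deployments(ns).Get(context.TODO(), name, metav1.GetOptions{}); err == nil {
			d.Spec.Replicas = &originalReplicas
			_, _ = clients.AppsV1().Deployments(ns).Update(context.TODO(), d, metav1.UpdateOptions{})
		}
		os.Exit(1)
	}()
}
```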

Limitations:

  • Currently only deployment-type applications are supported. Support for statefulsets will be added soon.

…y of the application pod

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>
uditgaurav requested a review from ksatchit on July 27, 2020 00:50
uditgaurav self-assigned this on Jul 27, 2020
uditgaurav added the "enhancement" (New feature or request) label on Jul 27, 2020
saiyam1814 commented

This is a very good feature: it lets us test node and pod additions.
I see a lot of commits here; can you squash them into a single commit? Something like `git rebase -i HEAD~3`, then follow the instructions in the interactive rebase.

saiyam1814 commented

So if I understand correctly, it will check whether a pod lacks sufficient node capacity, and then if it goes to Running, the test passes for node scaling?

uditgaurav (Member, Author) commented Aug 13, 2020

@saiyam1814 yes! It will do that based on the two inputs (number of replicas and timeout).

saiyam1814 commented

Great, sounds good. I am sure the community will really like this feature, which is not currently available in any other chaos tool.

uditgaurav (Member, Author) commented Aug 15, 2020

Chaos Result

Name:         nginx-chaos-pod-autoscaler
Namespace:    default
Labels:       name=nginx-chaos-pod-autoscaler
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2020-08-15T10:09:54Z
  Generation:          2
  Resource Version:    13007592
  Self Link:           /apis/litmuschaos.io/v1alpha1/namespaces/default/chaosresults/nginx-chaos-pod-autoscaler
  UID:                 ede02311-330e-41a8-8cf7-b29f38dc314f
Spec:
  Engine:      nginx-chaos
  Experiment:  pod-autoscaler
Status:
  Experimentstatus:
    Fail Step:                 N/A
    Phase:                     Completed
    Probe Success Percentage:  Awaited
    Verdict:                   Pass
Events:
  Type    Reason   Age    From                         Message
  ----    ------   ----   ----                         -------
  Normal  Summary  2m20s  pod-autoscaler-s6kmw6-q52t5  pod-autoscaler experiment has been Passed

Chaos Engine

Name:         nginx-chaos
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"litmuschaos.io/v1alpha1","kind":"ChaosEngine","metadata":{"annotations":{},"name":"nginx-chaos","namespace":"default"},"spe...
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosEngine
Metadata:
  Creation Timestamp:  2020-08-15T10:03:03Z
  Finalizers:
    chaosengine.litmuschaos.io/finalizer
  Generation:        26
  Resource Version:  13007628
  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/default/chaosengines/nginx-chaos
  UID:               a9a42b4a-e342-47b0-b804-d9007bfb33c2
Spec:
  Annotation Check:  false
  Appinfo:
    Appkind:              deployment
    Applabel:             run=nginx
    Appns:                default
  Chaos Service Account:  pod-autoscaler-sa
  Components:
    Runner:
      Image:     litmuschaos/chaos-runner:1.6.2
  Engine State:  stop
  Experiments:
    Name:  pod-autoscaler
    Spec:
      Components:
        Env:
          Name:   TOTAL_CHAOS_DURATION
          Value:  60
          Name:   REPLICA_COUNT
          Value:  5
        Status Check Timeouts:
      Rank:             0
  Job Clean Up Policy:  retain
Status:
  Engine Status:  completed
  Experiments:
    Experiment Pod:    pod-autoscaler-s6kmw6-q52t5
    Last Update Time:  2020-08-15T10:10:40Z
    Name:              pod-autoscaler
    Runner:            nginx-chaos-runner
    Status:            Completed
    Verdict:           Pass
Events:
  Type    Reason                     Age                    From                         Message
  ----    ------                     ----                   ----                         -------
  Normal  ChaosEngineInitialized     9m26s                  chaos-operator               nginx-chaos-runner created successfully
  Normal  ExperimentDependencyCheck  9m23s                  nginx-chaos-runner           Experiment resources validated for Chaos Experiment: 'pod-autoscaler'
  Normal  ExperimentJobCreate        9m18s                  nginx-chaos-runner           Experiment Job 'pod-autoscaler-ifyjk0' created for Chaos Experiment 'pod-autoscaler'
  Normal  ExperimentJobCleanUp       8m32s                  nginx-chaos-runner           Experiment Job 'pod-autoscaler-ifyjk0' will be retained
  Normal  RestartInProgress          2m47s (x2 over 2m47s)  chaos-operator               Chaos Engine restarted, will re-create all chaos-resources
  Normal  ExperimentDependencyCheck  2m44s                  nginx-chaos-runner           Experiment resources validated for Chaos Experiment: 'pod-autoscaler'
  Normal  ExperimentJobCreate        2m39s                  nginx-chaos-runner           Experiment Job 'pod-autoscaler-s6kmw6' created for Chaos Experiment 'pod-autoscaler'
  Normal  PreChaosCheck              2m31s (x2 over 9m8s)   pod-autoscaler-ifyjk0-nxpmn  AUT is Running successfully
  Normal  PostChaosCheck             119s (x2 over 8m38s)   pod-autoscaler-ifyjk0-nxpmn  AUT is Running successfully
  Normal  Summary                    119s (x2 over 8m38s)   pod-autoscaler-ifyjk0-nxpmn  pod-autoscaler experiment has been Passed
  Normal  ExperimentJobCleanUp       109s                   nginx-chaos-runner           Experiment Job 'pod-autoscaler-s6kmw6' will be retained
  Normal  ChaosEngineCompleted       103s (x2 over 8m26s)   chaos-operator               Chaos Engine completed, will delete or retain the resources according to jobCleanUpPolicy

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>
ksatchit merged commit 9b866d7 into litmuschaos:master on Aug 15, 2020
ksatchit pushed a commit to ksatchit/litmus-go that referenced this pull request Aug 15, 2020
…ty of the application pod (litmuschaos#65)

* chore(exp): Add pod autoscaler experiment used to check the salability of the application pod

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>

* Adding abort in the experiment

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>
ksatchit pushed a commit that referenced this pull request Aug 15, 2020
* chore(probe): Adding probes in all go experiments (#80)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* (fix)resource-chaoslib: run command with a shell instance (#81)

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* refactor(go-experiments): Refactor all the go experiments (#82)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* chore(exp): Add pod autoscaler experiment used to check the scalability of the application pod (#65)

* chore(exp): Add pod autoscaler experiment used to check the salability of the application pod

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>

* Adding abort in the experiment

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>

* update(chaosresult): updating the chaosresult for probe score (#85)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* refactor(pod-autoscaler): refactor the pod-scaler experiment (#86)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
Co-authored-by: UDIT GAURAV <35391335+uditgaurav@users.noreply.github.com>
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Pod Creation for having Cluster autoscaling tested at scale
4 participants