
chore(exp): Add pod autoscaler experiment used to check the scalability of the application pod #65

Merged: 5 commits merged into litmuschaos:master on Aug 15, 2020

Conversation

uditgaurav (Member) commented Jul 23, 2020

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>

Issue:

Details:

  • This PR adds a new experiment, "pod-autoscaler", to litmus-go. The experiment checks the ability of the nodes to accommodate a given number of replicas of the same application pod. It first scales up the replicas of the application, then waits for the specified chaos duration and checks whether all pods reach the Running state; this becomes the deciding factor for whether the experiment passes or fails.

The experiment takes two inputs:

  • Total Chaos Duration: the time period for which the experiment waits and checks whether the replicas are scheduled successfully; in other words, the timeout for the chaos.
  • Replica Count: the number of replicas to which the application is scaled up for the test.
For example:
If Total Chaos Duration is set to 600 seconds and Replica Count to 15, the experiment will wait up to 10 minutes for all 15 replicas to reach the Running state.
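
A minimal sketch of this scale-and-wait flow, assuming a client-go based approach, a local kubeconfig, a 5-second poll interval, and an illustrative target deployment `nginx` in the `default` namespace (the actual litmus-go experiment is structured differently):

```go
// Sketch only: scale a deployment to REPLICA_COUNT and wait up to
// TOTAL_CHAOS_DURATION for all replicas to become ready, then revert.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clients, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	appNS, appName := "default", "nginx" // illustrative target deployment
	replicaCount := int32(15)            // REPLICA_COUNT
	chaosDuration := 600 * time.Second   // TOTAL_CHAOS_DURATION

	// Record the original replica count (pre-chaos) so it can be restored later.
	deploy, err := clients.AppsV1().Deployments(appNS).Get(context.TODO(), appName, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	originalReplicas := int32(1)
	if deploy.Spec.Replicas != nil {
		originalReplicas = *deploy.Spec.Replicas
	}

	// Scale the deployment up to the desired replica count.
	deploy.Spec.Replicas = &replicaCount
	if _, err := clients.AppsV1().Deployments(appNS).Update(context.TODO(), deploy, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}

	// Wait until all replicas are ready or the chaos duration (timeout) expires.
	passed := false
	deadline := time.Now().Add(chaosDuration)
	for time.Now().Before(deadline) {
		d, err := clients.AppsV1().Deployments(appNS).Get(context.TODO(), appName, metav1.GetOptions{})
		if err == nil && d.Status.ReadyReplicas == replicaCount {
			passed = true
			break
		}
		time.Sleep(5 * time.Second)
	}
	fmt.Println("experiment passed:", passed)

	// Scale back down to the original replica count in both pass and fail cases.
	deploy, err = clients.AppsV1().Deployments(appNS).Get(context.TODO(), appName, metav1.GetOptions{})
	if err == nil {
		deploy.Spec.Replicas = &originalReplicas
		_, _ = clients.AppsV1().Deployments(appNS).Update(context.TODO(), deploy, metav1.UpdateOptions{})
	}
}
```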

Node auto-scaling test:

This experiment can also be used for other scenarios, such as checking the node auto-scaling feature: if the pods cannot be scheduled on the existing nodes, it verifies whether new nodes get auto-scaled to accommodate them.

Pod auto-scaling check:

The pod scaling check can be done by noting the amount of time taken by the pods to get scheduled after scale-up; a heavier application may need a larger chaos duration to get scheduled.
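
If you want to record that time explicitly, a hypothetical helper along the lines of the sketch above (reusing its imports and clientset) could look like this:

```go
// timedWait is a hypothetical helper: it reports how long the replicas took to
// become Ready after scale-up, and whether they all made it within the timeout.
func timedWait(clients kubernetes.Interface, ns, name string, want int32, timeout time.Duration) (time.Duration, bool) {
	start := time.Now()
	deadline := start.Add(timeout)
	for time.Now().Before(deadline) {
		d, err := clients.AppsV1().Deployments(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err == nil && d.Status.ReadyReplicas == want {
			return time.Since(start), true
		}
		time.Sleep(5 * time.Second)
	}
	return time.Since(start), false
}
```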

Pre-Chaos and Post-Chaos Checks:

  • Pre-chaos check: verifies the health of the application pods, i.e. that all replicas are in the Running state. After the pre-chaos check, the initial replica count is recorded.
  • Post-chaos check: runs the same status check over the application pods again after the chaos (a minimal helper for such a check is sketched below). In both cases (Passed/Failed), the deployment is scaled back down to the original replica count after chaos injection.
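
A minimal sketch of such a status check, assuming the same client-go setup as the earlier sketch, the `run=nginx` label from the ChaosEngine below, and one extra import (`corev1 "k8s.io/api/core/v1"`):

```go
// allPodsRunning is a hypothetical helper: it returns true only if every pod
// matching the label selector (e.g. "run=nginx") is in the Running phase.
func allPodsRunning(clients kubernetes.Interface, ns, labelSelector string) (bool, error) {
	pods, err := clients.CoreV1().Pods(ns).List(context.TODO(), metav1.ListOptions{LabelSelector: labelSelector})
	if err != nil {
		return false, err
	}
	for _, p := range pods.Items {
		if p.Status.Phase != corev1.PodRunning {
			return false, nil
		}
	}
	return len(pods.Items) > 0, nil
}
```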

Abort:

If anything goes wrong and you want to abort the chaos, this experiment supports that: when the experiment is aborted, either by running the abort command or by deleting the experiment mid-run, the experiment resources are removed and the replica count is scaled back down to its original (pre-chaos) value.
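
A hedged sketch of such an abort handler, again reusing the setup from the first sketch (extra imports assumed: "os", "os/signal", "syscall"): it traps SIGINT/SIGTERM and reverts the deployment to the recorded pre-chaos replica count before exiting.

```go
// watchForAbort is a hypothetical handler: on SIGINT/SIGTERM it scales the
// deployment back to the replica count recorded before chaos, then exits.
func watchForAbort(clients kubernetes.Interface, ns, name string, originalReplicas int32) {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-sigCh
		if d, err := clients.AppsV1().Deployments(ns).Get(context.TODO(), name, metav1.GetOptions{}); err == nil {
			d.Spec.Replicas = &originalReplicas
			_, _ = clients.AppsV1().Deployments(ns).Update(context.TODO(), d, metav1.UpdateOptions{})
		}
		os.Exit(1)
	}()
}
```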

Limitations:

  • Currently only deployment-type applications are supported. Support for statefulsets will be added soon.

…y of the application pod

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>
uditgaurav requested a review from ksatchit on July 27, 2020 00:50
uditgaurav self-assigned this on Jul 27, 2020
uditgaurav added the "enhancement" (New feature or request) label on Jul 27, 2020
saiyam1814 commented

This is a very good feature: it lets us test node and pod additions.
I see a lot of commits here; can you squash them into a single commit? Something like `git rebase -i HEAD~3`, then follow the instructions in the interactive rebase.

saiyam1814 commented

So if I understand correctly, it will check whether a pod lacks sufficient node capacity, and then if it goes to Running, the test passes for node scaling?

uditgaurav (Member, Author) commented Aug 13, 2020

@saiyam1814 yes! It will do that based on the two inputs (number of replicas and timeout).

saiyam1814 commented

Great, sounds good. I am sure the community will really like this feature, which is not currently available in any other chaos tool.

uditgaurav (Member, Author) commented Aug 15, 2020

Chaos Result

Name:         nginx-chaos-pod-autoscaler
Namespace:    default
Labels:       name=nginx-chaos-pod-autoscaler
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2020-08-15T10:09:54Z
  Generation:          2
  Resource Version:    13007592
  Self Link:           /apis/litmuschaos.io/v1alpha1/namespaces/default/chaosresults/nginx-chaos-pod-autoscaler
  UID:                 ede02311-330e-41a8-8cf7-b29f38dc314f
Spec:
  Engine:      nginx-chaos
  Experiment:  pod-autoscaler
Status:
  Experimentstatus:
    Fail Step:                 N/A
    Phase:                     Completed
    Probe Success Percentage:  Awaited
    Verdict:                   Pass
Events:
  Type    Reason   Age    From                         Message
  ----    ------   ----   ----                         -------
  Normal  Summary  2m20s  pod-autoscaler-s6kmw6-q52t5  pod-autoscaler experiment has been Passed

Chaos Engine

Name:         nginx-chaos
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"litmuschaos.io/v1alpha1","kind":"ChaosEngine","metadata":{"annotations":{},"name":"nginx-chaos","namespace":"default"},"spe...
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosEngine
Metadata:
  Creation Timestamp:  2020-08-15T10:03:03Z
  Finalizers:
    chaosengine.litmuschaos.io/finalizer
  Generation:        26
  Resource Version:  13007628
  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/default/chaosengines/nginx-chaos
  UID:               a9a42b4a-e342-47b0-b804-d9007bfb33c2
Spec:
  Annotation Check:  false
  Appinfo:
    Appkind:              deployment
    Applabel:             run=nginx
    Appns:                default
  Chaos Service Account:  pod-autoscaler-sa
  Components:
    Runner:
      Image:     litmuschaos/chaos-runner:1.6.2
  Engine State:  stop
  Experiments:
    Name:  pod-autoscaler
    Spec:
      Components:
        Env:
          Name:   TOTAL_CHAOS_DURATION
          Value:  60
          Name:   REPLICA_COUNT
          Value:  5
        Status Check Timeouts:
      Rank:             0
  Job Clean Up Policy:  retain
Status:
  Engine Status:  completed
  Experiments:
    Experiment Pod:    pod-autoscaler-s6kmw6-q52t5
    Last Update Time:  2020-08-15T10:10:40Z
    Name:              pod-autoscaler
    Runner:            nginx-chaos-runner
    Status:            Completed
    Verdict:           Pass
Events:
  Type    Reason                     Age                    From                         Message
  ----    ------                     ----                   ----                         -------
  Normal  ChaosEngineInitialized     9m26s                  chaos-operator               nginx-chaos-runner created successfully
  Normal  ExperimentDependencyCheck  9m23s                  nginx-chaos-runner           Experiment resources validated for Chaos Experiment: 'pod-autoscaler'
  Normal  ExperimentJobCreate        9m18s                  nginx-chaos-runner           Experiment Job 'pod-autoscaler-ifyjk0' created for Chaos Experiment 'pod-autoscaler'
  Normal  ExperimentJobCleanUp       8m32s                  nginx-chaos-runner           Experiment Job 'pod-autoscaler-ifyjk0' will be retained
  Normal  RestartInProgress          2m47s (x2 over 2m47s)  chaos-operator               Chaos Engine restarted, will re-create all chaos-resources
  Normal  ExperimentDependencyCheck  2m44s                  nginx-chaos-runner           Experiment resources validated for Chaos Experiment: 'pod-autoscaler'
  Normal  ExperimentJobCreate        2m39s                  nginx-chaos-runner           Experiment Job 'pod-autoscaler-s6kmw6' created for Chaos Experiment 'pod-autoscaler'
  Normal  PreChaosCheck              2m31s (x2 over 9m8s)   pod-autoscaler-ifyjk0-nxpmn  AUT is Running successfully
  Normal  PostChaosCheck             119s (x2 over 8m38s)   pod-autoscaler-ifyjk0-nxpmn  AUT is Running successfully
  Normal  Summary                    119s (x2 over 8m38s)   pod-autoscaler-ifyjk0-nxpmn  pod-autoscaler experiment has been Passed
  Normal  ExperimentJobCleanUp       109s                   nginx-chaos-runner           Experiment Job 'pod-autoscaler-s6kmw6' will be retained
  Normal  ChaosEngineCompleted       103s (x2 over 8m26s)   chaos-operator               Chaos Engine completed, will delete or retain the resources according to jobCleanUpPolicy

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>
ksatchit merged commit 9b866d7 into litmuschaos:master on Aug 15, 2020
ksatchit pushed a commit to ksatchit/litmus-go that referenced this pull request Aug 15, 2020
…ty of the application pod (litmuschaos#65)

* chore(exp): Add pod autoscaler experiment used to check the salability of the application pod

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>

* Adding abort in the experiment

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>
ksatchit pushed a commit that referenced this pull request Aug 15, 2020
* chore(probe): Adding probes in all go experiments (#80)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* (fix)resource-chaoslib: run command with a shell instance (#81)

Signed-off-by: ksatchit <karthik.s@mayadata.io>

* refactor(go-experiments): Refactor all the go experiments (#82)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* chore(exp): Add pod autoscaler experiment used to check the scalability of the application pod (#65)

* chore(exp): Add pod autoscaler experiment used to check the salability of the application pod

Signed-off-by: Udit Gaurav <uditgaurav@gmail.com>

* Adding abort in the experiment

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>

* update(chaosresult): updating the chaosresult for probe score (#85)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* refactor(pod-autoscaler): refactor the pod-scaler experiment (#86)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
Co-authored-by: UDIT GAURAV <35391335+uditgaurav@users.noreply.github.com>
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Pod Creation for having Cluster autoscaling tested at scale
4 participants