Fix(container-kill): Adds statusCheckTimeout to container kill recovery #498
Merged
Signed-off-by: uditgaurav <udit@chaosnative.com>
ksatchit approved these changes on Mar 30, 2022
avaakash approved these changes on Apr 13, 2022
uditgaurav added a commit that referenced this pull request on Jun 13, 2022
* Chore(stress-chaos): Run CPU chaos with percentage of cpu cores (#482)
* Chore(stress-chaos): Run CPU chaos with percentage of cores Signed-off-by: uditgaurav <udit@chaosnative.com>
* Fixeing alpine CVEs by upgrading the version (#486)
* Chore(vulnerability): Remove openebs retry module and update pkgs (#488)
* Chore(vulnerability): Fix some vulnerability by updaing the pkgs Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(vulnerability): Remove openebs retry module and update pkgs Signed-off-by: udit <udit@chaosnative.com>
* Chore(cgroup): Add support for cgroup version2 in stress-chaos experiment (#490) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(snyk): Fix snyk security scan on litmus-go (#492) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(network-chaos): Randomize Chaos Tunables for Netowork Chaos Experiment (#491)
* Chore(network-chaos): Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(network-chaos): Randomize Chaos Tunables for Netowork Chaos Experiment Signed-off-by: uditgaurav <udit@chaosnative.com> Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
* Chore(randomize): Randomize stress-chaos tunables (#487)
* Chore(randomize): Randomize stress-chaos tunables Signed-off-by: uditgaurav <udit@chaosnative.com>
* Update stress-chaos.go
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill (#493)
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill Signed-off-by: uditgaurav <udit@chaosnative.com>
* (enahncement)experiment: add node label filter for pod network and stress chaos (#494) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Fix(targetContainer): Incorrect target container passed in the helper pod for pod level experiments (#496)
* Fix target container issue Signed-off-by: uditgaurav <udit@chaosnative.com>
* Fix target container issue Signed-off-by: uditgaurav <udit@chaosnative.com>
* Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#498) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#499) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(warn): Remove warning Neither --kubeconfig nor --master was specified for InClusterConfig (#507) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(ssm): Update the ssm file path in the Dockerfile (#508) Signed-off-by: uditgaurav <udit@chaosnative.com>
* GCP Experiments Refactor, New Label Selector Experiments and IAM Integration (#495)
* experiment init Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated experiment file Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated experiment lib Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated post chaos validation Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated empty slices to nil, updated experiment name in environment.go Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed experiment charts Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* bootstrapped gcp-vm-disk-loss-by-label artiacts Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed device-names input for gcp-vm-disk-loss experiment, added API calls to derive device name internally Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed redundant condition check in gcp-vm-disk-loss experiment pre-requisite checks Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* reformatted error messages Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* replaced the SetTargetInstances function Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added settargetdisk function for getting target disk names using label Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* refactored Target Disk Attached VM Instance memorisation, updated vm-disk-loss and added lib logic for vm-disk-loss-by-label experiment Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added experiment to bin and cleared default experiment name in environment.go Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed charts Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated test.yml Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated AutoScalingGroup to ManagedInstanceGroup; updated logic for checking InstanceStop recovery for ManagedInstanceGroup VMs; Updated log and error messages with VM names Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed redundant computeService code snippets Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed redundant computeService code snippets in gcp-disk-loss experiments Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated logic for deriving default gcp sa credentials for computeService Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated logging for IAM integration Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* refactored log and error messages and wait for start/stop instances logic Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* fixed logs, optimised control statements, added comments, corrected experiment names Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* fixed file exists check logic Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* updated instance and device name fetch logic for disk loss Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* updated logs Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* update(sdk): updating litmus sdk for the defaultAppHealthCheck (#513) Signed-off-by: shubhamc <shubhamc@jfrog.com> Co-authored-by: shubhamc <shubhamc@jfrog.com>
* fix: updated release workflow (#512) Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
* Added Active Node Count Check using AWS APIs (#500)
* Added node count check using aws apis Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* Added node count check using aws apis to instance terminate by tag experiment Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* Log improvements; Code improvement in findActiveNodeCount function; Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* Added log for instance status check failed in find active node count Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* Added check if active node count is less than provided instance ids Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* updated appns podlist filtering error handling (#515) Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io> Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com> Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
* return error if node not present (#516) Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* Chore(helper pod): Make setHelper data as tunable (#519) Signed-off-by: uditgaurav <udit@chaosnative.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Raj Babu Das <mail.rajdas@gmail.com>
Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
Co-authored-by: shubhamc <shubhamc@jfrog.com>
Co-authored-by: Soumya Ghosh Dastidar <44349253+gdsoumya@users.noreply.github.com>
Co-authored-by: Akash Shrivastava <akash@chaosnative.com>
Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
uditgaurav added a commit that referenced this pull request on Jun 14, 2022
* modified the cmdProbe for inline mode of execution to accomodate litmusd Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* go mod tidy Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* bootstrapped process-kill experiment files Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated types.go and environment.go Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated secret envs Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated experiment logic and added steady state validation steps Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed action from probe refactor function parameters Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added serial and parallel chaos execution steps Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added conn parameter to probe Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added logic for closing websocket in the end of the experiment Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added experiment to bin Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* corrected the agent endpoint Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* corrected environement.go Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated logs, removed close message and added parallel sequence as default Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated experiment charts Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated experiment charts Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated authorization header, replaced Processes struct with int slice of pids Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* restored experiment image Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated test.yml Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added rbac, README, exported charts Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added websocket connection to chaos details struct, restored probe functions params Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed websocket connection in chaoslib params Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated code function Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated readme Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* restructured directories, added m-agent tag Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated workflow branch Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed guest-os pkg Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* Chore(stress-chaos): Run CPU chaos with percentage of cpu cores (#482)
* Chore(stress-chaos): Run CPU chaos with percentage of cores Signed-off-by: uditgaurav <udit@chaosnative.com>
* updated client side m-agent design; added channelised message sending Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added liveness check for process kill Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated mutex lock to an RWMutex lock, locked read operations on the map Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* Fixeing alpine CVEs by upgrading the version (#486)
* updated WaitForDurationAndCheckLiveness function Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated cpu-stress experiment and steady-state condition Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* corrected probe format Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added functionality for multiple websocket connections Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated liveness check to test for all the connections and added parallel chaos injection Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated m-agent cmd probe for only one agent endpoint Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated underChaosEndpoints for abort Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* optimised make connections logic Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed redundant check and comments Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated comments for function Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated chaosInterval timer for fixing infinitely running chaosInterval Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added CLOSE_CONNECTION action for closure of websocket connections Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* Chore(vulnerability): Remove openebs retry module and update pkgs (#488)
* Chore(vulnerability): Fix some vulnerability by updaing the pkgs Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(vulnerability): Remove openebs retry module and update pkgs Signed-off-by: udit <udit@chaosnative.com>
* added chaos revert logic Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated connection close on ERROR functionalty and return on Read error Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added log for chaos revert Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* reverted env params Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added abort log info, added defer close statement to message listener, added load percentage validation Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated probe error feedback, removed charts Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated mutex locks for RLock and RUnlock, updated connect agent function parameters Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* Chore(cgroup): Add support for cgroup version2 in stress-chaos experiment (#490) Signed-off-by: uditgaurav <udit@chaosnative.com>
* updated mutex locks Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* Chore(snyk): Fix snyk security scan on litmus-go (#492) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(network-chaos): Randomize Chaos Tunables for Netowork Chaos Experiment (#491)
* Chore(network-chaos): Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(network-chaos): Randomize Chaos Tunables for Netowork Chaos Experiment Signed-off-by: uditgaurav <udit@chaosnative.com> Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
* Chore(randomize): Randomize stress-chaos tunables (#487)
* Chore(randomize): Randomize stress-chaos tunables Signed-off-by: uditgaurav <udit@chaosnative.com>
* Update stress-chaos.go
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill (#493)
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill Signed-off-by: uditgaurav <udit@chaosnative.com>
* (enahncement)experiment: add node label filter for pod network and stress chaos (#494) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Fix(targetContainer): Incorrect target container passed in the helper pod for pod level experiments (#496)
* Fix target container issue Signed-off-by: uditgaurav <udit@chaosnative.com>
* Fix target container issue Signed-off-by: uditgaurav <udit@chaosnative.com>
* Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#498) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#499) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(warn): Remove warning Neither --kubeconfig nor --master was specified for InClusterConfig (#507) Signed-off-by: uditgaurav <udit@chaosnative.com>
* Chore(ssm): Update the ssm file path in the Dockerfile (#508) Signed-off-by: uditgaurav <udit@chaosnative.com>
* GCP Experiments Refactor, New Label Selector Experiments and IAM Integration (#495)
* experiment init Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated experiment file Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated experiment lib Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated post chaos validation Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated empty slices to nil, updated experiment name in environment.go Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed experiment charts Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* bootstrapped gcp-vm-disk-loss-by-label artiacts Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed device-names input for gcp-vm-disk-loss experiment, added API calls to derive device name internally Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed redundant condition check in gcp-vm-disk-loss experiment pre-requisite checks Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* reformatted error messages Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* replaced the SetTargetInstances function Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added settargetdisk function for getting target disk names using label Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* refactored Target Disk Attached VM Instance memorisation, updated vm-disk-loss and added lib logic for vm-disk-loss-by-label experiment Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* added experiment to bin and cleared default experiment name in environment.go Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed charts Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated test.yml Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated AutoScalingGroup to ManagedInstanceGroup; updated logic for checking InstanceStop recovery for ManagedInstanceGroup VMs; Updated log and error messages with VM names Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed redundant computeService code snippets Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* removed redundant computeService code snippets in gcp-disk-loss experiments Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated logic for deriving default gcp sa credentials for computeService Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* updated logging for IAM integration Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* refactored log and error messages and wait for start/stop instances logic Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* fixed logs, optimised control statements, added comments, corrected experiment names Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* fixed file exists check logic Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* updated instance and device name fetch logic for disk loss Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* updated logs Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* update(sdk): updating litmus sdk for the defaultAppHealthCheck (#513) Signed-off-by: shubhamc <shubhamc@jfrog.com> Co-authored-by: shubhamc <shubhamc@jfrog.com>
* fix: updated release workflow (#512) Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
* Added Active Node Count Check using AWS APIs (#500)
* Added node count check using aws apis Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* Added node count check using aws apis to instance terminate by tag experiment Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* Log improvements; Code improvement in findActiveNodeCount function; Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* Added log for instance status check failed in find active node count Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* Added check if active node count is less than provided instance ids Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* updated appns podlist filtering error handling (#515) Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io> Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com> Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
* go mod tidy Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
* return error if node not present (#516) Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
* Chore(helper pod): Make setHelper data as tunable (#519) Signed-off-by: uditgaurav <udit@chaosnative.com>
* added CPUs check in prerequisites check Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* removed .DS_Store Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* removed .DS_Store Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* updated rbac and readme Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* removed .DS_Store Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* updated qemu github action Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* updated qemu action version Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* updated m-agent go-runner tag to 2.10.0-Beta1 Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* updated target names Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
* updated machine=>Machine targets, removed .DS_Store Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Raj Babu Das <mail.rajdas@gmail.com>
Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
Co-authored-by: shubhamc <shubhamc@jfrog.com>
Co-authored-by: Soumya Ghosh Dastidar <44349253+gdsoumya@users.noreply.github.com>
Co-authored-by: Akash Shrivastava <akash@chaosnative.com>
Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
Signed-off-by: uditgaurav <udit@chaosnative.com>
What this PR does / why we need it:
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #
Special notes for your reviewer:
Checklist:
breaking-changes tag
requires-upgrade tag