cpu-stress Chaos Experiment (#518)

* modified the cmdProbe for inline mode of execution to accommodate litmusd

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* go mod tidy

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* bootstrapped process-kill experiment files

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated types.go and environment.go

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated secret envs

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated experiment logic and added steady state validation steps

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed action from probe refactor function parameters

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added serial and parallel chaos execution steps

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added conn parameter to probe

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added logic for closing websocket in the end of the experiment

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added experiment to bin

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* corrected the agent endpoint

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* corrected environment.go

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated logs, removed close message and added parallel sequence as default

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated experiment charts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated experiment charts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated authorization header, replaced Processes struct with int slice of pids

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* restored experiment image

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated test.yml

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added rbac, README, exported charts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added websocket connection to chaos details struct, restored probe functions params

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed websocket connection in chaoslib params

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated code function

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated readme

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* restructured directories, added m-agent tag

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated workflow branch

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed guest-os pkg

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* Chore(stress-chaos): Run CPU chaos with percentage of cpu cores (#482)

* Chore(stress-chaos): Run CPU chaos with percentage of cores

Signed-off-by: uditgaurav <udit@chaosnative.com>

* updated client side m-agent design; added channelised message sending

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added liveness check for process kill

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated mutex lock to an RWMutex lock, locked read operations on the map

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* Fixing Alpine CVEs by upgrading the version (#486)

* updated WaitForDurationAndCheckLiveness function

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated cpu-stress experiment and steady-state condition

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* corrected probe format

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added functionality for multiple websocket connections

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated liveness check to test for all the connections and added parallel chaos injection

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated m-agent cmd probe for only one agent endpoint

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated underChaosEndpoints for abort

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* optimised make connections logic

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed redundant check and comments

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated comments for function

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated chaosInterval timer for fixing infinitely running chaosInterval

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added CLOSE_CONNECTION action for closure of websocket connections

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* Chore(vulnerability): Remove openebs retry module and update pkgs (#488)

* Chore(vulnerability): Fix some vulnerabilities by updating the pkgs

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(vulnerability): Remove openebs retry module and update pkgs

Signed-off-by: udit <udit@chaosnative.com>

* added chaos revert logic

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated connection close on ERROR functionality and return on Read error

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added log for chaos revert

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* reverted env params

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added abort log info, added defer close statement to message listener, added load percentage validation

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated probe error feedback, removed charts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated mutex locks for RLock and RUnlock, updated connect agent function parameters

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* Chore(cgroup): Add support for cgroup version2 in stress-chaos experiment (#490)

Signed-off-by: uditgaurav <udit@chaosnative.com>

* updated mutex locks

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* Chore(snyk): Fix snyk security scan on litmus-go (#492)

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(network-chaos): Randomize Chaos Tunables for Network Chaos Experiment (#491)

* Chore(network-chaos):

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(network-chaos): Randomize Chaos Tunables for Network Chaos Experiment

Signed-off-by: uditgaurav <udit@chaosnative.com>

Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>

* Chore(randomize): Randomize stress-chaos tunables (#487)

* Chore(randomize): Randomize stress-chaos tunables

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Update stress-chaos.go

* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill (#493)

* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill

Signed-off-by: uditgaurav <udit@chaosnative.com>

* (enhancement)experiment: add node label filter for pod network and stress chaos (#494)

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Fix(targetContainer): Incorrect target container passed in the helper pod for pod level experiments (#496)

* Fix target container issue

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Fix target container issue

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#498)

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#499)

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(warn): Remove warning Neither --kubeconfig nor --master was specified for InClusterConfig (#507)

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(ssm): Update the ssm file path in the Dockerfile (#508)

Signed-off-by: uditgaurav <udit@chaosnative.com>

* GCP Experiments Refactor, New Label Selector Experiments and IAM Integration (#495)

* experiment init

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated experiment file

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated experiment lib

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated post chaos validation

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated empty slices to nil, updated experiment name in environment.go

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed experiment charts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* bootstrapped gcp-vm-disk-loss-by-label artifacts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed device-names input for gcp-vm-disk-loss experiment, added API calls to derive device name internally

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed redundant condition check in gcp-vm-disk-loss experiment pre-requisite checks

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* reformatted error messages

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* replaced the SetTargetInstances function

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added settargetdisk function for getting target disk names using label

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* refactored Target Disk Attached VM Instance memorisation, updated vm-disk-loss and added lib logic for vm-disk-loss-by-label experiment

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added experiment to bin and cleared default experiment name in environment.go

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed charts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated test.yml

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated AutoScalingGroup to ManagedInstanceGroup; updated logic for checking InstanceStop recovery for ManagedInstanceGroup VMs; Updated log and error messages with VM names

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed redundant computeService code snippets

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed redundant computeService code snippets in gcp-disk-loss experiments

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated logic for deriving default GCP SA credentials for computeService

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated logging for IAM integration

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* refactored log and error messages and wait for start/stop instances logic

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* fixed logs, optimised control statements, added comments, corrected experiment names

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* fixed file exists check logic

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated instance and device name fetch logic for disk loss

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated logs

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* update(sdk): updating litmus sdk for the defaultAppHealthCheck (#513)

Signed-off-by: shubhamc <shubhamc@jfrog.com>

Co-authored-by: shubhamc <shubhamc@jfrog.com>

* fix: updated release workflow (#512)

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

* Added Active Node Count Check using AWS APIs (#500)

* Added node count check using aws apis

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added node count check using aws apis to instance terminate by tag experiment

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Log improvements; Code improvement in findActiveNodeCount function;

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added log for instance status check failed in find active node count

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added check if active node count is less than provided instance ids

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* updated appns podlist filtering error handling (#515)

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>

* go mod tidy

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* return error if node not present (#516)

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Chore(helper pod): Make setHelper data as tunable (#519)

Signed-off-by: uditgaurav <udit@chaosnative.com>

* added CPUs check in prerequisites check

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* removed .DS_Store

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* removed .DS_Store

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated rbac and readme

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* removed .DS_Store

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated qemu github action

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated qemu action version

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated m-agent go-runner tag to 2.10.0-Beta1

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated target names

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated machine=>Machine targets, removed .DS_Store

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Raj Babu Das <mail.rajdas@gmail.com>
Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
Co-authored-by: shubhamc <shubhamc@jfrog.com>
Co-authored-by: Soumya Ghosh Dastidar <44349253+gdsoumya@users.noreply.github.com>
Co-authored-by: Akash Shrivastava <akash@chaosnative.com>
Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
9 people committed Jun 14, 2022
1 parent 7a11fd5 commit f4892a7
Showing 12 changed files with 758 additions and 5 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/build.yml
@@ -67,7 +67,7 @@ jobs:
ref: ${{ github.event.pull_request.head.sha }}

- name: Set up QEMU
uses: docker/setup-qemu-action@v1
uses: docker/setup-qemu-action@v2
with:
platforms: all

@@ -83,7 +83,7 @@ jobs:
push: false
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:m-agent
tags: litmuschaos/go-runner:2.10.0-Beta1

trivy:
needs: pre-checks
4 changes: 2 additions & 2 deletions .github/workflows/push.yml
@@ -47,7 +47,7 @@ jobs:
- uses: actions/checkout@v2

- name: Set up QEMU
uses: docker/setup-qemu-action@v1
uses: docker/setup-qemu-action@v2
with:
platforms: all

@@ -69,4 +69,4 @@
push: true
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:m-agent
tags: litmuschaos/go-runner:2.10.0-Beta1
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -45,7 +45,7 @@ jobs:
echo "${RELEASE_TAG}" > ${{ github.workspace }}/tag.txt
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
uses: docker/setup-qemu-action@v2
with:
platforms: all

3 changes: 3 additions & 0 deletions bin/experiment/experiment.go
@@ -51,6 +51,7 @@ import (
ebsLossByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-tag/experiment"
ec2TerminateByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-id/experiment"
ec2TerminateByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-tag/experiment"
cpuStress "github.com/litmuschaos/litmus-go/experiments/os/cpu-stress/experiment"
processKill "github.com/litmuschaos/litmus-go/experiments/os/process-kill/experiment"
vmpoweroff "github.com/litmuschaos/litmus-go/experiments/vmware/vm-poweroff/experiment"

@@ -165,6 +166,8 @@ func main() {
redfishNodeRestart.NodeRestart(clients)
case "process-kill":
processKill.ProcessKill(clients)
case "cpu-stress":
cpuStress.CPUStressExperiment(clients)
case "gcp-vm-instance-stop-by-label":
gcpVMInstanceStopByLabel.GCPVMInstanceStopByLabel(clients)
case "gcp-vm-disk-loss-by-label":
308 changes: 308 additions & 0 deletions chaoslib/litmus/cpu-stress/lib/cpu-stress.go
@@ -0,0 +1,308 @@
package lib

import (
"os"
"os/signal"
"strconv"
"strings"
"sync"
"syscall"
"time"

"github.com/gorilla/websocket"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/machine/common/messages"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/os/cpu-stress/types"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
)

var inject, abort chan os.Signal
var timeDuration = 60 * time.Second
var chaosRevert sync.WaitGroup
var underChaosEndpoints []int

type cpuStressParams struct {
Workers string
Load string
Timeout string
}

// InjectCPUStressChaos contains the preparation and injection steps for the experiment
func InjectCPUStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {

// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)

// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)

// waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}

agentEndpointList := strings.Split(experimentsDetails.AgentEndpoints, ",")

select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:

// watching for the abort signal and revert the chaos
go AbortWatcher(chaosDetails.WebsocketConnections, agentEndpointList, abort, chaosDetails)
chaosRevert.Add(1)

switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(experimentsDetails, agentEndpointList, clients, resultDetails, eventsDetails, chaosDetails, abort); err != nil {
return err
}
case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, agentEndpointList, clients, resultDetails, eventsDetails, chaosDetails, abort); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}

// wait for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
}

return nil
}

// injectChaosInSerialMode injects CPU stress chaos in serial mode i.e. one after the other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, agentEndpointList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, abort chan os.Signal) error {

// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())

for duration < experimentsDetails.ChaosDuration {

if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}

for i := range agentEndpointList {

log.Infof("[Chaos]: Injecting CPU stress for %s agent endpoint", agentEndpointList[i])
feedback, payload, err := messages.SendMessageToAgent(chaosDetails.WebsocketConnections[i], "EXECUTE_EXPERIMENT", cpuStressParams{experimentsDetails.CPUs, experimentsDetails.LoadPercentage, strconv.Itoa(experimentsDetails.ChaosInterval)}, &timeDuration)
if err != nil {
return errors.Errorf("failed while sending message to agent, err: %v", err)
}

// ACTION_SUCCESSFUL feedback is received only if the cpu stress chaos has been injected successfully
if feedback != "ACTION_SUCCESSFUL" {
if feedback == "ERROR" {

agentError, err := messages.GetErrorMessage(payload)
if err != nil {
return errors.Errorf("failed to interpret error message from agent, err: %v", err)
}

return errors.Errorf("error occurred while injecting CPU stress chaos for %s agent endpoint, err: %s", agentEndpointList[i], agentError)
}

return errors.Errorf("unintelligible feedback received from agent: %s", feedback)
}

underChaosEndpoints = append(underChaosEndpoints, i)

common.SetTargets(agentEndpointList[i], "injected", "Machine", chaosDetails)

log.Infof("[Chaos]: CPU stress chaos injected successfully in %s agent endpoint", agentEndpointList[i])

// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}

// wait for the chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
if err := common.WaitForDurationAndCheckLiveness([]*websocket.Conn{chaosDetails.WebsocketConnections[i]}, []string{agentEndpointList[i]}, experimentsDetails.ChaosInterval, abort, &chaosRevert); err != nil {
return errors.Errorf("error occurred during liveness check, err: %v", err)
}

log.Infof("[Chaos]: Reverting CPU stress for %s agent endpoint", agentEndpointList[i])
feedback, payload, err = messages.SendMessageToAgent(chaosDetails.WebsocketConnections[i], "REVERT_CHAOS", nil, &timeDuration)
if err != nil {
return errors.Errorf("failed while sending message to agent, err: %v", err)
}

// ACTION_SUCCESSFUL feedback is received only if the cpu stress chaos has been reverted successfully
if feedback != "ACTION_SUCCESSFUL" {
if feedback == "ERROR" {

agentError, err := messages.GetErrorMessage(payload)
if err != nil {
return errors.Errorf("failed to interpret error message from agent, err: %v", err)
}

return errors.Errorf("error occurred while reverting CPU stress chaos for %s agent endpoint, err: %s", agentEndpointList[i], agentError)
}

return errors.Errorf("unintelligible feedback received from agent: %s", feedback)
}

underChaosEndpoints = underChaosEndpoints[:len(underChaosEndpoints)-1]

common.SetTargets(agentEndpointList[i], "reverted", "Machine", chaosDetails)
}

duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}

return nil
}

// injectChaosInParallelMode injects CPU stress chaos in parallel mode i.e. all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, agentEndpointList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, abort chan os.Signal) error {

// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())

for duration < experimentsDetails.ChaosDuration {

if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}

// inject cpu stress chaos
for i := range agentEndpointList {

log.Infof("[Chaos]: Injecting CPU stress for %s agent endpoint", agentEndpointList[i])
feedback, payload, err := messages.SendMessageToAgent(chaosDetails.WebsocketConnections[i], "EXECUTE_EXPERIMENT", cpuStressParams{experimentsDetails.CPUs, experimentsDetails.LoadPercentage, strconv.Itoa(experimentsDetails.ChaosInterval)}, &timeDuration)
if err != nil {
return errors.Errorf("failed while sending message to agent, err: %v", err)
}

// ACTION_SUCCESSFUL feedback is received only if the cpu stress chaos has been injected successfully
if feedback != "ACTION_SUCCESSFUL" {
if feedback == "ERROR" {

agentError, err := messages.GetErrorMessage(payload)
if err != nil {
return errors.Errorf("failed to interpret error message from agent, err: %v", err)
}

return errors.Errorf("error occurred while injecting CPU stress chaos for %s agent endpoint, err: %s", agentEndpointList[i], agentError)
}

return errors.Errorf("unintelligible feedback received from agent: %s", feedback)
}

underChaosEndpoints = append(underChaosEndpoints, i)

common.SetTargets(agentEndpointList[i], "injected", "Machine", chaosDetails)

log.Infof("[Chaos]: CPU stress chaos injected successfully in %s agent endpoint", agentEndpointList[i])
}

// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}

// wait for the chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
if err := common.WaitForDurationAndCheckLiveness(chaosDetails.WebsocketConnections, agentEndpointList, experimentsDetails.ChaosInterval, abort, &chaosRevert); err != nil {
return errors.Errorf("error occurred during liveness check, err: %v", err)
}

for i := range agentEndpointList {

log.Infof("[Chaos]: Reverting CPU stress for %s agent endpoint", agentEndpointList[i])
feedback, payload, err := messages.SendMessageToAgent(chaosDetails.WebsocketConnections[i], "REVERT_CHAOS", nil, &timeDuration)
if err != nil {
return errors.Errorf("failed while sending message to agent, err: %v", err)
}

// ACTION_SUCCESSFUL feedback is received only if the cpu stress chaos has been reverted successfully
if feedback != "ACTION_SUCCESSFUL" {
if feedback == "ERROR" {

agentError, err := messages.GetErrorMessage(payload)
if err != nil {
return errors.Errorf("failed to interpret error message from agent, err: %v", err)
}

return errors.Errorf("error occurred while reverting CPU stress chaos for %s agent endpoint, err: %s", agentEndpointList[i], agentError)
}

return errors.Errorf("unintelligible feedback received from agent: %s", feedback)
}

common.SetTargets(agentEndpointList[i], "reverted", "Machine", chaosDetails)

underChaosEndpoints = underChaosEndpoints[1:]
}

duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}

return nil
}

// AbortWatcher will watch for the abort signal and revert the chaos
func AbortWatcher(connections []*websocket.Conn, agentEndpointList []string, abort chan os.Signal, chaosDetails *types.ChaosDetails) {

<-abort

log.Info("[Abort]: Chaos Revert Started")

for _, i := range underChaosEndpoints {

log.Infof("[Abort]: Reverting CPU stress for %s agent endpoint", agentEndpointList[i])
feedback, payload, err := messages.SendMessageToAgent(connections[i], "ABORT_EXPERIMENT", nil, &timeDuration)
if err != nil {
log.Errorf("unable to send abort chaos message to %s agent endpoint, err: %v", agentEndpointList[i], err)
}

// ACTION_SUCCESSFUL feedback is received only if the cpu stress chaos has been aborted successfully
if feedback != "ACTION_SUCCESSFUL" {
if feedback == "ERROR" {

agentError, err := messages.GetErrorMessage(payload)
if err != nil {
log.Errorf("failed to interpret error message from agent, err: %v", err)
}

log.Errorf("error occurred while aborting the experiment for %s agent endpoint, err: %s", agentEndpointList[i], agentError)
}

log.Errorf("unintelligible feedback received from agent: %s", feedback)
}

common.SetTargets(agentEndpointList[i], "reverted", "Machine", chaosDetails)
}

log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
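
For reference, the serial and parallel flows above both reduce to the same per-endpoint exchange with the m-agent: send EXECUTE_EXPERIMENT with the stress parameters, hold the stress for the chaos interval, then send REVERT_CHAOS. The sketch below distills that exchange into a single helper. It is an illustrative sketch only, not part of the diff above; the stressOnce name and its parameters are hypothetical, and it assumes messages.SendMessageToAgent and messages.GetErrorMessage accept and return values exactly as they are used in the chaoslib above.

package lib

import (
	"time"

	"github.com/gorilla/websocket"
	"github.com/litmuschaos/litmus-go/pkg/machine/common/messages"
	"github.com/pkg/errors"
)

// stressOnce sketches one inject-then-revert cycle against a single m-agent
// endpoint, mirroring the per-endpoint flow of injectChaosInSerialMode above.
// The params argument stands in for the cpuStressParams payload sent with the
// EXECUTE_EXPERIMENT action (hypothetical helper, for illustration only).
func stressOnce(conn *websocket.Conn, params interface{}, interval time.Duration) error {
	timeout := 60 * time.Second

	// ask the agent to start the CPU stress
	feedback, payload, err := messages.SendMessageToAgent(conn, "EXECUTE_EXPERIMENT", params, &timeout)
	if err != nil {
		return errors.Errorf("failed while sending message to agent, err: %v", err)
	}
	if feedback != "ACTION_SUCCESSFUL" {
		if feedback == "ERROR" {
			agentError, err := messages.GetErrorMessage(payload)
			if err != nil {
				return errors.Errorf("failed to interpret error message from agent, err: %v", err)
			}
			return errors.Errorf("agent reported an error: %s", agentError)
		}
		return errors.Errorf("unintelligible feedback received from agent: %s", feedback)
	}

	// hold the stress for the chaos interval, then ask the agent to revert it
	time.Sleep(interval)

	feedback, _, err = messages.SendMessageToAgent(conn, "REVERT_CHAOS", nil, &timeout)
	if err != nil {
		return errors.Errorf("failed while sending message to agent, err: %v", err)
	}
	if feedback != "ACTION_SUCCESSFUL" {
		return errors.Errorf("revert did not succeed, feedback: %s", feedback)
	}
	return nil
}

Serial mode runs this cycle one endpoint at a time, while parallel mode issues all EXECUTE_EXPERIMENT messages first and all REVERT_CHAOS messages after the shared liveness wait.
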
14 changes: 14 additions & 0 deletions experiments/os/cpu-stress/README.md
@@ -0,0 +1,14 @@
## Experiment Metadata

<table>
<tr>
<th> Name </th>
<th> Description </th>
<th> Documentation Link </th>
</tr>
<tr>
<td> CPU Stress </td>
<td> The CPU Stress experiment stresses the CPUs of the target machine(s). </td>
<td> Coming Soon </td>
</tr>
</table>
