
test/openshift/e2e: Smoke test for scale down #32

Merged

Conversation

@frobware frobware commented Feb 8, 2019

Reworked the e2e test to delete the workload. As a result:

  • all workload-based pods are deleted
  • the autoscaler notices that there are nodes without any utilisation
  • the autoscaler scales down

The smoke test asserts that:

  • the MachineSet's replica count drops to its original value
  • the number of nodes in the cluster drops back to the pre-scale-up count

I also reduced the number of replicas we spin up from 12 => 2, as we're only testing for a delta of 1.
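For orientation, here is a minimal sketch of that flow. It assumes hypothetical helpers: waitForScaleDown, deleteWorkload, machineSetReplicas, nodeCount and the polling interval/timeout are illustrative placeholders, not the actual functions in this repository.

```go
package e2e

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForScaleDown deletes the workload and then waits until both the
// MachineSet replica count and the cluster node count return to their
// pre-scale-up values. The accessor funcs are passed in to keep this
// sketch self-contained; they stand in for the test's real helpers.
func waitForScaleDown(
	deleteWorkload func() error,
	machineSetReplicas func() (int32, error),
	nodeCount func() (int, error),
	initialReplicas int32,
	initialNodes int,
) error {
	if err := deleteWorkload(); err != nil {
		return err
	}
	return wait.PollImmediate(10*time.Second, 15*time.Minute, func() (bool, error) {
		replicas, err := machineSetReplicas()
		if err != nil {
			return false, nil // tolerate transient API errors and retry
		}
		nodes, err := nodeCount()
		if err != nil {
			return false, nil
		}
		// Done once both values match their pre-scale-up counts.
		return replicas == initialReplicas && nodes == initialNodes, nil
	})
}
```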

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 8, 2019

frobware commented Feb 8, 2019

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2019
As we're only concerned with delta values when validating the size of
a machine set and the resultant number of nodes, we only need to
consider one additional node. This commit reduces MaxReplicas from
12 => 2.

frobware commented Feb 8, 2019

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2019

frobware commented Feb 8, 2019

This PR would also benefit from openshift/cluster-autoscaler-operator#37

frobware commented Feb 8, 2019

/cc @ingvagabund @bison @enxebre

@bison bison left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 8, 2019
@ingvagabund (Member)

/approve

```go
if err != nil {
	return err
}
// As we have just deleted the workload the autoscaler will
```
A reviewer (Member) commented on this hunk:

I'm not sure whether deleting the workload (job) will remove the pods as well; if not, the autoscaler will still try to allocate new resources instead of scaling down.

@frobware frobware (Author) replied Feb 8, 2019:

I believe it does. That's how I've done 100% of my manual testing over the last few months.
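Whether deleting a Job also removes its pods depends on the deletion propagation policy; a follow-up commit referenced further down explicitly sets cascading delete on the batch job. A minimal sketch of a cascading delete, assuming a current client-go (the Delete signature differed in 2019-era client libraries):

```go
package e2e

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteJobCascading deletes a batch Job and asks the API server to
// cascade the deletion to the Job's pods. Without an explicit
// propagation policy, orphaned pods could keep the autoscaler busy
// instead of letting it scale down.
func deleteJobCascading(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	policy := metav1.DeletePropagationForeground
	return client.BatchV1().Jobs(namespace).Delete(ctx, name, metav1.DeleteOptions{
		PropagationPolicy: &policy,
	})
}
```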

frobware commented Feb 8, 2019

/refresh

frobware commented Feb 8, 2019

/retest

@openshift-merge-robot openshift-merge-robot merged commit 0261cba into openshift:master Feb 8, 2019
openshift-merge-robot pushed a commit that referenced this pull request Feb 15, 2019
We were adjusting the replica count while the cluster-autoscaler was
still running, which meant that the test would occasionally flake. It
is likely that this bug was introduced in PR #32.

For example, you would occasionally see the following:

```console
I0214 12:51:41.034700 1 scale_up.go:584] Scale-up: setting group size to 3
I0214 12:51:51.129943 1 scale_up.go:584] Scale-up: setting group size to 3
```

Between these two calls we were adjusting the replica count in an
attempt to clean up at the end of the test. But occasionally the
autoscaler would run its scan of the state of the cluster and add new
nodes because the replica count was less than desired. When this
condition occurred the node count could never drop back to the
initial node count, as we had just added a further max-min nodes. The
test would eventually time out trying to assert that the node count
matched the initial node count.

The fix here is to not reset the replica count but instead rely on the
autoscaler to scale down and adjust the replica count naturally; this
change further helps to verify that scale down is working properly.

There are additional smaller fixes here too:

- we set cascading delete in the batch job (i.e., workload)
- we assert that the replica count == the initial replica count
- we explicitly set the ClusterAutoscaler's ScaleDown config (a sketch of these settings follows)
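For reference, the last bullet refers to the ClusterAutoscaler resource's spec.scaleDown settings. The snippet below is only an illustrative sketch: the field names mirror that stanza, but the Go type and the example durations are assumptions, not the operator's API package or the values the test actually uses.

```go
package e2e

// scaleDownConfig is an illustrative stand-in for the ClusterAutoscaler
// resource's spec.scaleDown stanza; durations are strings such as "10s".
type scaleDownConfig struct {
	Enabled           bool
	DelayAfterAdd     string
	DelayAfterDelete  string
	DelayAfterFailure string
	UnneededTime      string // how long a node must be unneeded before removal
}

// Short delays keep an e2e run fast: the extra node is removed soon
// after the workload's pods are gone. (Example values only.)
var e2eScaleDown = scaleDownConfig{
	Enabled:           true,
	DelayAfterAdd:     "10s",
	DelayAfterDelete:  "10s",
	DelayAfterFailure: "10s",
	UnneededTime:      "10s",
}
```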
@frobware frobware deleted the smoke-test-for-scale-down branch March 22, 2019 06:22
openshift-merge-robot pushed a commit that referenced this pull request Apr 4, 2019
…a count
