UPSTREAM: <carry>: openshift: Implement scale from zero #137

Conversation

elmiko

@elmiko elmiko commented Mar 12, 2020

This allows a Machine{Set,Deployment} to scale up/down from 0,
provided the following annotations are set:

```yaml
apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"
```

Please note that `machine.openshift.io/maxPods` is not required for scale to/from zero operations.

This PR is a continuation of the work started in #110.

Counterpart PRs:
openshift/cluster-api-provider-aws#301
openshift/cluster-api-provider-azure#112
Design details:
openshift/enhancements#186
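
For illustration only (not code from this PR), a minimal sketch of how an autoscaler provider could turn annotations like the ones above into a node capacity estimate for a node group that currently has zero machines. The function name `capacityFromAnnotations` is hypothetical; see openshift/enhancements#186 for the actual design.

```go
package sketch

import (
	"fmt"
	"strconv"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Annotation keys mirroring the example above.
const (
	cpuKey    = "machine.openshift.io/vCPU"
	memoryKey = "machine.openshift.io/memoryMb"
	podsKey   = "machine.openshift.io/maxPods"
)

// capacityFromAnnotations (hypothetical helper) builds a node capacity list
// from MachineSet annotations so the autoscaler can estimate the shape of a
// node it has never seen.
func capacityFromAnnotations(annotations map[string]string) (corev1.ResourceList, error) {
	capacity := corev1.ResourceList{}

	if v, ok := annotations[cpuKey]; ok {
		q, err := resource.ParseQuantity(v)
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %v", cpuKey, err)
		}
		capacity[corev1.ResourceCPU] = q
	}

	if v, ok := annotations[memoryKey]; ok {
		// Values such as "8G" parse directly as resource quantities.
		q, err := resource.ParseQuantity(v)
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %v", memoryKey, err)
		}
		capacity[corev1.ResourceMemory] = q
	}

	if v, ok := annotations[podsKey]; ok {
		n, err := strconv.ParseInt(v, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %v", podsKey, err)
		}
		capacity[corev1.ResourcePods] = *resource.NewQuantity(n, resource.DecimalSI)
	}

	return capacity, nil
}
```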

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 12, 2020
@enxebre
Member

enxebre commented Mar 12, 2020

Counterpart PRs:
openshift/cluster-api-provider-aws#301
openshift/cluster-api-provider-azure#112
Design details:
openshift/enhancements#186

Can we update the description to match the annotations in the links above and include these links as well?

@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch from 65be1d7 to e572aba Compare March 16, 2020 20:48
@elmiko
Author

elmiko commented Mar 16, 2020

updated to incorporate the comments here, and squashed my additions into a single commit.

@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch from e572aba to 8dbc526 Compare March 17, 2020 14:23
@enxebre
Member

enxebre commented Mar 18, 2020

This is looking great @elmiko. We need to rebase as #137 got in.
In parallel, can we put a PR against https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/autoscaler/autoscaler.go#L209 to include and validate the scaling-from-zero scenario?

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2020
@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch from 8dbc526 to 9c4af16 Compare March 18, 2020 13:27
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2020
@elmiko
Author

elmiko commented Mar 18, 2020

rebased, still working on the outstanding comments

@enxebre
Member

enxebre commented Mar 23, 2020

can we squash both commits into one? the split is meaningless since the second one just introduces changes on top of the first, and we never had support for the first one on its own.

@elmiko
Author

elmiko commented Mar 23, 2020

This is looking great @elmiko. We need to rebase as #137 got in.

sounds good, i will rebase again and squash the 2 commits into one.

In parallel, can we put a PR against https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/autoscaler/autoscaler.go#L209 to include and validate the scaling-from-zero scenario?

i am still working through my understanding of the test framework, but i will give it my best =)
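
(Editorial aside, illustration only: a rough Ginkgo-style skeleton of the kind of scale from/to zero scenario being discussed. The helpers `newScaleFromZeroMachineSet`, `newUnschedulableWorkload`, `waitForReadyNodes`, and `k8sClient` are hypothetical placeholders, not code from cluster-api-actuator-pkg, where the real spec lives.)

```go
package autoscaler_test

import (
	"context"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

// Hypothetical sketch of the scenario; helper names are placeholders.
var _ = Describe("[Feature:Machines] Autoscaler should", func() {
	It("scales from/to zero", func() {
		By("creating a MachineSet with zero replicas and scale-from-zero annotations")
		machineSet := newScaleFromZeroMachineSet("scale-from-zero", 0)
		Expect(k8sClient.Create(context.TODO(), machineSet)).To(Succeed())

		By("creating a workload that cannot fit on the existing nodes")
		workload := newUnschedulableWorkload()
		Expect(k8sClient.Create(context.TODO(), workload)).To(Succeed())

		By("waiting for the autoscaler to grow the MachineSet above zero")
		Expect(waitForReadyNodes(machineSet, 1)).To(Succeed())

		By("removing the workload and waiting for scale back down to zero")
		Expect(k8sClient.Delete(context.TODO(), workload)).To(Succeed())
		Expect(waitForReadyNodes(machineSet, 0)).To(Succeed())
	})
})
```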

@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch 5 times, most recently from 9bcacb3 to 17014e1 Compare March 31, 2020 22:31
@elmiko
Author

elmiko commented Mar 31, 2020

@JoelSpeed ptal, i think i've covered all the fields we are manually copying through the converters, even the timestamps!

after looking deeper, we are not actually copying anything more from the underlying machines, etc., so i feel good that this test should cover everything we have added.

i have squashed my extra commit, and i just need to clean up the last few comments about the varying unit types for the memory/cpu stuff.

@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch from 17014e1 to c06ed43 Compare March 31, 2020 22:46
@elmiko
Author

elmiko commented Mar 31, 2020

one more comment for tonight, i cleaned up the tests a little more (to make the converters stuff look similar to the others).

@JoelSpeed i also added a few more unit type tests, i'm curious how many of these we should add? i'm avoiding creating a matrix of every possible combo... ;)

@JoelSpeed

I've reviewed this and am happy for this to merge on these 2 provisos:

  • We follow up on the cloud providers' MachineSet controllers to set the memory units properly
  • We follow up to do fuzzing and improve the conversion stuff

@JoelSpeed i also added a few more unit type tests, i'm curious how many of these we should add? i'm avoiding creating a matrix of every possible combo... ;)

I think what you've got is good for now, though we aren't currently testing all of the code in that file. We aren't testing the structured-to-unstructured converters at the moment, which is why my second proviso is there; this is the kind of thing that could easily trip someone up, as explained offline yesterday.
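
(Editorial aside: a minimal sketch of the kind of structured ↔ unstructured round-trip check being suggested here, using the generic apimachinery converter. The `machinev1` import path and the test name are assumptions for illustration, not code from this PR.)

```go
package conversion_test

import (
	"reflect"
	"testing"

	machinev1 "github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1"
	"k8s.io/apimachinery/pkg/runtime"
)

// Round-trip a typed MachineSet through the generic unstructured converter
// and verify nothing is lost. The machinev1 import path is an assumption.
func TestMachineSetUnstructuredRoundTrip(t *testing.T) {
	in := &machinev1.MachineSet{}
	in.Name = "scale-from-zero"
	in.Annotations = map[string]string{
		"machine.openshift.io/cluster-api-autoscaler-node-group-min-size": "0",
		"machine.openshift.io/cluster-api-autoscaler-node-group-max-size": "6",
	}

	// Structured -> unstructured.
	u, err := runtime.DefaultUnstructuredConverter.ToUnstructured(in)
	if err != nil {
		t.Fatalf("ToUnstructured failed: %v", err)
	}

	// Unstructured -> structured.
	out := &machinev1.MachineSet{}
	if err := runtime.DefaultUnstructuredConverter.FromUnstructured(u, out); err != nil {
		t.Fatalf("FromUnstructured failed: %v", err)
	}

	if !reflect.DeepEqual(in, out) {
		t.Errorf("round trip changed the object:\n in: %#v\nout: %#v", in, out)
	}
}
```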

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2020
@enxebre
Member

enxebre commented Apr 1, 2020

related to @JoelSpeed's comment above: kubernetes#3011
/approve
/retest
/hold
Please unhold once it's verified this passes the validation here: openshift/cluster-api-actuator-pkg#140

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 1, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 1, 2020
@elmiko
Author

elmiko commented Apr 1, 2020

Please unhold once it's verified this passes the validation here: openshift/cluster-api-actuator-pkg#140

ack, i will spend some time with those tests and then post results here when it's working.

@elmiko
Author

elmiko commented Apr 1, 2020

i am fairly confident that this code is passing the end-to-end tests from cluster-api-actuator-pkg#140.

i have run the "from/to zero" tests and they appear to be working as expected:

```
$ hack/ci-integration.sh -v -focus="zero"
...
[3] • [SLOW TEST:398.530 seconds]
[3] [Feature:Machines] Autoscaler should
[3] /home/mike/workspace/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:210
[3]   It scales from/to zero
[3]   /home/mike/workspace/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:453
[3] ------------------------------
[3]
[3] Ran 1 of 5 Specs in 398.534 seconds
[3] SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 4 Skipped
[3] PASS

Ginkgo ran 1 suite in 6m43.904044806s
Test Suite Passed
```

i am following up with the full suite, but i am seeing unrelated issues.

@elmiko
Author

elmiko commented Apr 1, 2020

/retest

This allows a Machine{Set,Deployment} to scale up/down from 0,
provided the following annotations are set:

```yaml
apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"
```

Note that `machine.openshift.io/GPU` and `machine.openshift.io/maxPods`
are optional.
@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch from c06ed43 to fe61822 Compare April 1, 2020 18:46
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2020
@enxebre
Member

enxebre commented Apr 2, 2020

/retest

@elmiko
Author

elmiko commented Apr 2, 2020

as per our conversation on slack, i am going to remove the hold given that the proposed tests are passing.

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2020
@elmiko
Author

elmiko commented Apr 2, 2020

/retest

@elmiko
Author

elmiko commented Apr 2, 2020

/test e2e-azure-operator


@JoelSpeed JoelSpeed left a comment


/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 3, 2020
@openshift-merge-robot openshift-merge-robot merged commit ec12030 into openshift:master Apr 3, 2020
@elmiko elmiko deleted the openshift/scale-from-0-using-annotations branch April 7, 2020 13:58