This repository has been archived by the owner on May 22, 2020. It is now read-only.

add MachineSet type #501

Merged

Conversation

mrIncompetent
Contributor

Add a simple MachineSet type.
Basically copied from ReplicaSet

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 13, 2018
@k8s-reviewable

This change is Reviewable

// Label keys and values that must match in order to be controlled by this MachineSet.
// It must match the machine template's labels.
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
Selector *metav1.LabelSelector `json:"selector"`
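The invariant stated in the snippet's comment — the selector must match the machine template's labels, or the set would never adopt the machines it creates — can be sketched with plain maps standing in for `metav1.LabelSelector` matchLabels semantics (the helper name here is illustrative, not an actual cluster-api function):

```go
package main

import "fmt"

// selectorMatches reports whether every key/value pair in the selector
// is present in the template's labels -- a simplified stand-in for
// metav1.LabelSelector matchLabels semantics. Illustrative only; the
// real selector types live in k8s.io/apimachinery.
func selectorMatches(selector, templateLabels map[string]string) bool {
	for k, v := range selector {
		if templateLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"machineset": "workers"}
	labels := map[string]string{"machineset": "workers", "zone": "a"}
	// true: the MachineSet would adopt machines carrying these labels
	fmt.Println(selectorMatches(selector, labels))
}
```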
Contributor

@mvladev mvladev Jan 15, 2018

This should be a required value, right? Auto-generated Selector doesn't work very well with kubectl apply

Also keeping it immutable can save us a lot of pain with orphaned resources.

Contributor Author

Yes, the Selector is a required value.
Should I state that in the comment? Looking at the other fields, it seems that everything not explicitly marked with +optional is required.

Regarding immutability:
Based on the discussion in kubernetes/kubernetes#50808, yes, the selector should be immutable.
Should this be stated in the comment?
It seems it's not common to state that in the comment; other types just fail when you try to update them.

Contributor

The immutability is enforced in the API server, so I guess the comment is optional.

According to the API Conventions, optional fields should be pointers and marked with // +optional; required fields should not be pointers and should not be marked with // +optional.

Contributor Author

Changed it in d48e775.

@rsdcastro

/cc rsdcastro

@rsdcastro rsdcastro added this to the cluster-api-alpha milestone Feb 2, 2018
@rsdcastro

/assign roberthbailey

// Template is the object that describes the machine that will be created if
// insufficient replicas are detected.
// +optional
Template MachineTemplateSpec `json:"template,omitempty"`
Contributor

Why do we have a MachineTemplateSpec that nests a MachineSpec?

That seems somewhat redundant at first glance. Wondering what the engineering reasoning was here.

Contributor

@kris-nova this is how ReplicaSet and Deployment work. For the template you need both MachineSpec and ObjectMeta; ObjectMeta is needed in case you want to set some extra labels and annotations on every machine.
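The nesting described here mirrors PodTemplateSpec (ObjectMeta + PodSpec). A stripped-down sketch, with simplified stand-ins for the real apimachinery types (only the fields needed for the illustration are included):

```go
package main

import "fmt"

// Simplified stand-ins for metav1.ObjectMeta and the real MachineSpec.
type ObjectMeta struct {
	Labels      map[string]string
	Annotations map[string]string
}

type MachineSpec struct {
	// ProviderConfig is a placeholder for the provider-specific config.
	ProviderConfig string
}

// MachineTemplateSpec nests metadata alongside the spec, just like
// PodTemplateSpec does in ReplicaSet/Deployment: the metadata is
// stamped onto every Machine the MachineSet creates.
type MachineTemplateSpec struct {
	ObjectMeta
	Spec MachineSpec
}

func main() {
	tmpl := MachineTemplateSpec{
		ObjectMeta: ObjectMeta{Labels: map[string]string{"machineset": "workers"}},
		Spec:       MachineSpec{ProviderConfig: "..."},
	}
	// Extra labels set on the template propagate to each created machine.
	fmt.Println(tmpl.Labels["machineset"])
}
```

Without the nested ObjectMeta, a MachineSet would have no place to declare per-machine labels and annotations, including the labels its own selector needs to match.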

@rsdcastro

Ping to all reviewers for any last comments or concerns about this PR. @mvladev @roberthbailey @kris-nova

@mrIncompetent Can you please address @mvladev's comment?

Putting on hold for a day or two to allow for any other comments.

/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 12, 2018
@mrIncompetent
Contributor Author

Initially I removed some fields to keep it simple, but after a discussion with @scheeles we decided to stick as close as possible to ReplicaSet.

Now there's only one difference: MachineSet.Status.Conditions.
Should we also stick to ErrorReason & ErrorMessage here?
If so, what kinds of errors could we have?

@rsdcastro

For reference, this is a discussion on ReplicaSet and failure conditions:
kubernetes/kubernetes#32863

It does look like we will want to communicate failure conditions back, and some of the scenarios discussed for replica sets could apply here, like quota or permissions. Clouds could also fail to create VMs for other internal reasons (stockouts?).

We can add those to this PR to finalize it. Alternatively, I am also fine with merging this PR as it is and adding conditions later if you prefer (just please file an issue and follow up on it).

@mrIncompetent
Contributor Author

Based on the comment from @pipejakob a list of conditions will get deprecated at some point: #298 (comment)

@rsdcastro

Since GitHub collapses the comment #298 (comment), I'm pasting it below as it's very relevant.

"As for ErrorMessage / ErrorReason, this is definitely one of the warts of the design so far that I'd like to flesh out with any better approaches.

We did get the advice from Brian Grant and Eric Tune that the existing pattern of Conditions in object statuses is near-deprecated. As part of trying to bring several controllers to GA, a survey was sent out to understand how people use Conditions and how effective they are for the intent. The overall response was that having Conditions be a list of state transitions generally made them not useful for the kinds of checks people wanted to make against the Status, which are to answer the question "is this done yet?". This resulted in clients always just looking at the most recent Condition in the list and treating it as the current state, which on top of making the client logic more difficult, also made them deteriorate into Phases anyway (which are thoroughly deprecated).

So, they suggested two different patterns to replace Conditions:

1. If you really want a timeseries stream of state transitions, we should use Events.
2. For Status fields that we think clients will care to watch, we should just have fine-grained top-level entries (rather than lists) for the current state.

We're on the bleeding edge, so there aren't other parts of Kubernetes that have migrated off of Conditions yet, and we might be setting the precedents, or we may just need to slightly alter our types once better recommendations are in place.

The Error* fields were my attempt at (2), with the general guideline that if a client modifies a field in the Machine's Spec, it should specifically watch the corresponding field of the Machine Status to see whether or not it has been reconciled, while watching the Error* fields for any errors that occur in the meantime. If you're updating the version of kubelet in the Spec, you should watch the corresponding field in the Status to know when it's been reconciled. This works decently with the Error* fields so long as you have a single controller responsible for the entirety of the Machine Spec, but breaks down somewhat if you want different controllers to handle different fields of the same object, or handle reconciling the same Machine under different circumstances.

For instance, one controller may be responsible whenever a full VM replacement is needed, while another may specialize in being able to update a VM in-place for certain Spec changes. It's not fantastic if they're unable to report errors separately, and instead have to overwrite the same fields in the Status. One mitigation is to always publish an Event on the Machine as well, so anyone who cares can still see the full stream of all errors. Another mitigation is to provide very strong guidance over what constitutes an error worth reporting in the Status.

I'll clarify this in the documentation, but I think it's a good idea for Machine.Status.Error* to be reserved for errors that are considered terminal, rather than transient. A terminal error would be something like the fact that the Machine.Spec has an invalid configuration, so the controller won't be able to make progress until someone fixes some aspect of it. Another terminal error would be if the machine-controller is getting Unauthorized or Permission Denied responses from the cloud provider it's calling to create/delete VMs -- it's likely going to require some manual intervention to fix IAM permissions or service credentials before it's able to do anything useful. However, any transient service failures can just be logged in the controller's output and/or added as Events to the Machine, since they should only represent delays in reconciliation and not errors that require intervention.

If two different controllers want to report terminal errors on the same Machine object, then I think it's okay that they are overwriting each other's errors in the Machine.Status, since they are both valid errors that need to be taken care of. By definition, neither controller is able to make progress until someone steps in and modifies the Machine.Spec or some aspect of the environment to fix the error, so someone will need to address both errors anyway (which may have the same underlying root cause). Based on the timing, they'll either see one error or the other first, and will have to fix it. Then, that error will disappear as the first controller gets unwedged, but a second error might replace it from the second controller, and the admin can take action on that as well.

We can figure out some way to represent both errors in the Status at the same time, but I'm not sure how much value there is in that over the above model, or above the cluster admin looking at the Machine events anyway."

@rsdcastro

@mrIncompetent and I chatted on Slack. Rather than follow the current pattern with conditions, we'll follow the advice from #298 (comment) (pasted above) to have Error for terminal errors only and anything transient to be output via Events.
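The agreed approach — terminal errors in Error* fields, transient ones surfaced as Events — might look roughly like this (the type and constant names are illustrative guesses, not necessarily the exact ones merged in this PR):

```go
package main

import "fmt"

// MachineSetStatusError names a class of terminal error. The names
// here are illustrative, not the exact constants merged in the PR.
type MachineSetStatusError string

const (
	// InvalidConfigurationError: the spec cannot be reconciled until
	// a human fixes it (e.g. the selector doesn't match the template labels).
	InvalidConfigurationError MachineSetStatusError = "InvalidConfiguration"
)

// MachineSetStatus carries ErrorReason/ErrorMessage instead of a
// Conditions list; both are optional pointers, set only for terminal
// errors. Transient failures are logged and emitted as Events instead.
type MachineSetStatus struct {
	Replicas int32 `json:"replicas"`

	// +optional
	ErrorReason *MachineSetStatusError `json:"errorReason,omitempty"`
	// +optional
	ErrorMessage *string `json:"errorMessage,omitempty"`
}

func main() {
	reason := InvalidConfigurationError
	msg := "selector does not match template labels"
	status := MachineSetStatus{Replicas: 3, ErrorReason: &reason, ErrorMessage: &msg}
	if status.ErrorReason != nil {
		// A non-nil ErrorReason signals a terminal error needing intervention.
		fmt.Printf("terminal error %s: %s\n", *status.ErrorReason, *status.ErrorMessage)
	}
}
```

Clients then check a single pointer (`ErrorReason != nil`) for "is something terminally wrong?" rather than scanning a list of Conditions for the most recent entry.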

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 13, 2018
@mrIncompetent
Contributor Author

ErrorReason & ErrorMessage have been added.

@rsdcastro

/lgtm

Planning on merging by end of day if we don't have any further comments/concerns.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrIncompetent, rsdcastro

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these OWNERS Files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rsdcastro

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 14, 2018
@k8s-ci-robot k8s-ci-robot merged commit b8bb7a9 into kubernetes-retired:master Feb 14, 2018
@mrIncompetent mrIncompetent deleted the add-machineset-type branch February 16, 2018 16:52
@roberthbailey
Contributor

/lgtm for posterity (I reviewed this a couple of hours after it merged)

k4leung4 pushed a commit to k4leung4/kube-deploy that referenced this pull request Apr 4, 2018