This repository has been archived by the owner on May 22, 2020. It is now read-only.

add MachineSet type #501

Merged

Conversation

mrIncompetent
Contributor

Add a simple MachineSet type.
Basically copied from ReplicaSet

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 13, 2018
@k8s-reviewable

This change is Reviewable

// Label keys and values that must match in order to be controlled by this MachineSet.
// It must match the machine template's labels.
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
Selector *metav1.LabelSelector `json:"selector"`
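The invariant stated in the snippet's comment — the selector must match the machine template's labels, or the set would never adopt the machines it creates — can be sketched with plain maps standing in for `metav1.LabelSelector` matchLabels semantics (the helper name here is illustrative, not an actual cluster-api function):

```go
package main

import "fmt"

// selectorMatches reports whether every key/value pair in the selector
// is present in the template's labels -- a simplified stand-in for
// metav1.LabelSelector matchLabels semantics. Illustrative only; the
// real selector types live in k8s.io/apimachinery.
func selectorMatches(selector, templateLabels map[string]string) bool {
	for k, v := range selector {
		if templateLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"machineset": "workers"}
	labels := map[string]string{"machineset": "workers", "zone": "a"}
	// true: the MachineSet would adopt machines carrying these labels
	fmt.Println(selectorMatches(selector, labels))
}
```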
Contributor

@mvladev mvladev Jan 15, 2018

This should be a required value, right? Auto-generated Selector doesn't work very well with kubectl apply

Also keeping it immutable can save us a lot of pain with orphaned resources.

Contributor Author

Yes, the Selector is a required value.
Should I state that in the comment? Looking at the other fields, it seems that everything not explicitly marked with +optional is required.

Regarding immutability:
Based on the discussion in kubernetes/kubernetes#50808, yes, the selector should be immutable.
Should this be stated in the comment?
It seems it's not common to state that in the comment; other types just fail when you try to update them.

Contributor

The immutability is enforced in the API server, so I guess the comment is optional.

According to the API Conventions, optional fields should be pointers and marked with // +optional; required fields should not be pointers and should not be marked with // +optional.

Contributor Author

Changed it in d48e775.

@rsdcastro

/cc rsdcastro

@rsdcastro rsdcastro added this to the cluster-api-alpha milestone Feb 2, 2018
@rsdcastro

/assign roberthbailey

// Template is the object that describes the machine that will be created if
// insufficient replicas are detected.
// +optional
Template MachineTemplateSpec `json:"template,omitempty"`
Contributor

Why do we have a MachineTemplateSpec that nests a MachineSpec?

That seems somewhat redundant at first glance. Wondering what the engineering reasoning was here.

Contributor

@kris-nova this is how ReplicaSet and Deployment work. For the template you need both MachineSpec and ObjectMeta; ObjectMeta is needed in case you want to set some extra labels and annotations on every machine.
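The nesting described here mirrors PodTemplateSpec (ObjectMeta + PodSpec). A stripped-down sketch, with simplified stand-ins for the real apimachinery types (only the fields needed for the illustration are included):

```go
package main

import "fmt"

// Simplified stand-ins for metav1.ObjectMeta and the real MachineSpec.
type ObjectMeta struct {
	Labels      map[string]string
	Annotations map[string]string
}

type MachineSpec struct {
	// ProviderConfig is a placeholder for the provider-specific config.
	ProviderConfig string
}

// MachineTemplateSpec nests metadata alongside the spec, just like
// PodTemplateSpec does in ReplicaSet/Deployment: the metadata is
// stamped onto every Machine the MachineSet creates.
type MachineTemplateSpec struct {
	ObjectMeta
	Spec MachineSpec
}

func main() {
	tmpl := MachineTemplateSpec{
		ObjectMeta: ObjectMeta{Labels: map[string]string{"machineset": "workers"}},
		Spec:       MachineSpec{ProviderConfig: "..."},
	}
	// Extra labels set on the template propagate to each created machine.
	fmt.Println(tmpl.Labels["machineset"])
}
```

Without the nested ObjectMeta, a MachineSet would have no place to declare per-machine labels and annotations, including the labels its own selector needs to match.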

@rsdcastro

Ping to all reviewers for any last comments or concerns about this PR. @mvladev @roberthbailey @kris-nova

@mrIncompetent Can you please address @mvladev's comment?

Putting on hold for a day or two to allow for any other comments.

/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 12, 2018
@mrIncompetent
Contributor Author

Initially I removed some fields to keep it simple, but after a discussion with @scheeles we decided to stick as close as possible to ReplicaSet.

Now there's only one difference: MachineSet.Status.Conditions.
Should we also stick to ErrorReason & ErrorMessage here?
If so, what kinds of errors could we have?

@rsdcastro

For reference, this is a discussion on ReplicaSet and failure conditions:
kubernetes/kubernetes#32863

It does look like we will want to communicate failure conditions back, and some of the scenarios discussed for replica sets could apply here, like quota or permissions. Clouds could also fail to create VMs for other internal reasons (stockouts?).

We can add those to this PR to finalize it. Alternatively, I am also fine with merging this PR as it is and adding conditions later if you prefer (just please file an issue and follow up on it).

@mrIncompetent
Contributor Author

Based on the comment from @pipejakob a list of conditions will get deprecated at some point: #298 (comment)

@rsdcastro

Since GitHub collapses the comment #298 (comment), I'm pasting it below as it's very relevant.

"As for ErrorMessage / ErrorReason, this is definitely one of the warts of the design so far that I'd like to flesh out with any better approaches.

We did get the advice from Brian Grant and Eric Tune that the existing pattern of Conditions in object statuses is near-deprecated. As part of trying to bring several controllers to GA, a survey was sent out to understand how people use Conditions and how effective they are for the intent. The overall response was that having Conditions be a list of state transitions generally made them not useful for the kinds of checks people wanted to make against the Status, which are to answer the question "is this done yet?". This resulted in clients always just looking at the most recent Condition in the list and treating it as the current state, which on top of making the client logic more difficult, also made them deteriorate into Phases anyway (which are thoroughly deprecated).

So, they suggested two different patterns to replace Conditions:

1. If you really want a timeseries stream of state transitions, we should use Events.
2. For Status fields that we think clients will care to watch, we should just have fine-grained top-level entries (rather than lists) for the current state.

We're on the bleeding edge, so there aren't other parts of Kubernetes that have migrated off of Conditions yet, and we might be setting the precedents, or we may just need to slightly alter our types once better recommendations are in place.

The Error* fields were my attempt at (2), with the general guideline that if a client modifies a field in the Machine's Spec, it should specifically watch the corresponding field of the Machine Status to see whether or not it has been reconciled, while watching the Error* fields for any errors that occur in the meantime. If you're updating the version of kubelet in the Spec, you should watch the corresponding field in the Status to know when it's been reconciled. This works decently with the Error* fields so long as you have a single controller responsible for the entirety of the Machine Spec, but breaks down somewhat if you want different controllers to handle different fields of the same object, or handle reconciling the same Machine under different circumstances.

For instance, one controller may be responsible whenever a full VM replacement is needed, while another may specialize in being able to update a VM in-place for certain Spec changes. It's not fantastic if they're unable to report errors separately, and instead have to overwrite the same fields in the Status. One mitigation is to always publish an Event on the Machine as well, so anyone who cares can still see the full stream of all errors. Another mitigation is to provide very strong guidance over what constitutes an error worth reporting in the Status.

I'll clarify this in the documentation, but I think it's a good idea for Machine.Status.Error* to be reserved for errors that are considered terminal, rather than transient. A terminal error would be something like the fact that the Machine.Spec has an invalid configuration, so the controller won't be able to make progress until someone fixes some aspect of it. Another terminal error would be if the machine-controller is getting Unauthorized or Permission Denied responses from the cloud provider it's calling to create/delete VMs -- it's likely going to require some manual intervention to fix IAM permissions or service credentials before it's able to do anything useful. However, any transient service failures can just be logged in the controller's output and/or added as Events to the Machine, since they should only represent delays in reconciliation and not errors that require intervention.

If two different controllers want to report terminal errors on the same Machine object, then I think it's okay that they are overwriting each other's errors in the Machine.Status, since they are both valid errors that need to be taken care of. By definition, neither controller is able to make progress until someone steps in and modifies the Machine.Spec or some aspect of the environment to fix the error, so someone will need to address both errors anyway (which may have the same underlying root cause). Based on the timing, they'll either see one error or the other first, and will have to fix it. Then, that error will disappear as the first controller gets unwedged, but a second error might replace it from the second controller, and the admin can take action on that as well.

We can figure out some way to represent both errors in the Status at the same time, but I'm not sure how much value there is in that over the above model, or above the cluster admin looking at the Machine events anyway."

@rsdcastro

@mrIncompetent and I chatted on Slack. Rather than follow the current pattern with conditions, we'll follow the advice from #298 (comment) (pasted above) to have Error for terminal errors only and anything transient to be output via Events.
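The agreed approach — terminal errors in Error* fields, transient ones surfaced as Events — might look roughly like this (the type and constant names are illustrative guesses, not necessarily the exact ones merged in this PR):

```go
package main

import "fmt"

// MachineSetStatusError names a class of terminal error. The names
// here are illustrative, not the exact constants merged in the PR.
type MachineSetStatusError string

const (
	// InvalidConfigurationError: the spec cannot be reconciled until
	// a human fixes it (e.g. the selector doesn't match the template labels).
	InvalidConfigurationError MachineSetStatusError = "InvalidConfiguration"
)

// MachineSetStatus carries ErrorReason/ErrorMessage instead of a
// Conditions list; both are optional pointers, set only for terminal
// errors. Transient failures are logged and emitted as Events instead.
type MachineSetStatus struct {
	Replicas int32 `json:"replicas"`

	// +optional
	ErrorReason *MachineSetStatusError `json:"errorReason,omitempty"`
	// +optional
	ErrorMessage *string `json:"errorMessage,omitempty"`
}

func main() {
	reason := InvalidConfigurationError
	msg := "selector does not match template labels"
	status := MachineSetStatus{Replicas: 3, ErrorReason: &reason, ErrorMessage: &msg}
	if status.ErrorReason != nil {
		// A non-nil ErrorReason signals a terminal error needing intervention.
		fmt.Printf("terminal error %s: %s\n", *status.ErrorReason, *status.ErrorMessage)
	}
}
```

Clients then check a single pointer (`ErrorReason != nil`) for "is something terminally wrong?" rather than scanning a list of Conditions for the most recent entry.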

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 13, 2018
@mrIncompetent
Contributor Author

ErrorReason & ErrorMessage have been added.

@rsdcastro

/lgtm

Planning on merging by end of day if we don't have any further comments/concerns.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrIncompetent, rsdcastro

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these OWNERS Files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rsdcastro

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 14, 2018
@k8s-ci-robot k8s-ci-robot merged commit b8bb7a9 into kubernetes-retired:master Feb 14, 2018
@mrIncompetent mrIncompetent deleted the add-machineset-type branch February 16, 2018 16:52
@roberthbailey
Contributor

/lgtm for posterity (I reviewed this a couple of hours after it merged)

k4leung4 pushed a commit to k4leung4/kube-deploy that referenced this pull request Apr 4, 2018