add MachineSet type #501
Conversation
// Label keys and values that must match in order to be controlled by this MachineSet.
// It must match the machine template's labels.
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
Selector *metav1.LabelSelector `json:"selector"`
This should be a required value, right? An auto-generated Selector doesn't work very well with kubectl apply.
Also, keeping it immutable can save us a lot of pain with orphaned resources.
Yes, the Selector is a required value.
Should I state that in the comment? Looking at the other fields, it seems that everything not explicitly marked with +optional is required.
Regarding immutability: based on the discussion in kubernetes/kubernetes#50808, yes, the selector should be immutable.
Should this be stated in the comment? It seems it's not common to state that in the comment; other types just fail when trying to update.
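The immutability discussed above could be enforced in update validation. Below is a minimal, self-contained sketch of that idea; `LabelSelector`, `MachineSetSpec`, and `validateSelectorUpdate` are simplified stand-ins invented for illustration, not the actual types or validation code in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// LabelSelector is a minimal stand-in for metav1.LabelSelector,
// used here only to keep the sketch self-contained.
type LabelSelector struct {
	MatchLabels map[string]string
}

// MachineSetSpec is a simplified sketch of the spec under discussion.
type MachineSetSpec struct {
	Selector *LabelSelector
}

// validateSelectorUpdate is a hypothetical validation helper: it rejects
// any update that changes the selector, making the field immutable.
func validateSelectorUpdate(oldSpec, newSpec *MachineSetSpec) error {
	if !reflect.DeepEqual(oldSpec.Selector, newSpec.Selector) {
		return errors.New("spec.selector is immutable")
	}
	return nil
}

func main() {
	oldSpec := &MachineSetSpec{Selector: &LabelSelector{MatchLabels: map[string]string{"set": "web"}}}
	same := &MachineSetSpec{Selector: &LabelSelector{MatchLabels: map[string]string{"set": "web"}}}
	changed := &MachineSetSpec{Selector: &LabelSelector{MatchLabels: map[string]string{"set": "api"}}}

	fmt.Println(validateSelectorUpdate(oldSpec, same))    // <nil>
	fmt.Println(validateSelectorUpdate(oldSpec, changed)) // spec.selector is immutable
}
```

In the real API this check would live in the API server's update validation path, which is why the comment on the field can stay optional.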
The immutability is enforced in the API server, so I guess the comment is optional.
According to the API Conventions, optional fields should be pointers and marked with // +optional; required values should not be pointers and should not be marked with // +optional.
Changed it in d48e775.
/cc rsdcastro
/assign roberthbailey
// Template is the object that describes the machine that will be created if
// insufficient replicas are detected.
// +optional
Template MachineTemplateSpec `json:"template,omitempty"`
Why do we have a MachineTemplateSpec that nests a MachineSpec? That seems somewhat redundant at first glance. Wondering what the engineering reasoning here was.
@kris-nova this is how ReplicaSet and Deployment work. For the template you need both MachineSpec and ObjectMetadata. ObjectMetadata is needed in case you want to set some extra labels and annotations for every machine.
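The nesting described above can be sketched as follows. The type names mirror the PR, but `ObjectMeta`, `MachineSpec`, and `newMachine` here are simplified stand-ins invented for illustration, not the real apimachinery types.

```go
package main

import "fmt"

// Minimal stand-ins to keep the sketch self-contained.
type ObjectMeta struct {
	Labels      map[string]string
	Annotations map[string]string
}

type MachineSpec struct {
	ProviderConfig string // placeholder for the real provider config
}

// MachineTemplateSpec mirrors the ReplicaSet/Deployment pattern of
// PodTemplateSpec: metadata for the stamped-out objects plus their spec.
type MachineTemplateSpec struct {
	ObjectMeta ObjectMeta
	Spec       MachineSpec
}

// newMachine is a hypothetical helper showing why the extra metadata
// layer exists: every Machine created from the template inherits the
// template's labels and annotations, not just its spec.
func newMachine(tmpl MachineTemplateSpec) (ObjectMeta, MachineSpec) {
	return tmpl.ObjectMeta, tmpl.Spec
}

func main() {
	tmpl := MachineTemplateSpec{
		ObjectMeta: ObjectMeta{Labels: map[string]string{"set": "web"}},
		Spec:       MachineSpec{ProviderConfig: "example"},
	}
	meta, spec := newMachine(tmpl)
	fmt.Println(meta.Labels["set"], spec.ProviderConfig) // web example
}
```

Without the wrapping template type, there would be no place to put per-machine labels and annotations, which the controller also needs so that created Machines match the MachineSet's selector.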
Ping to all reviewers for any last comments or concerns about this PR. @mvladev @roberthbailey @kris-nova

@mrIncompetent Can you please address @mvladev's comment? Putting on hold for a day or two to allow for any other comments. /lgtm
Force-pushed from 7c8e3e8 to 22a1e8f.
Initially I removed some fields to keep it simple, but after discussion with @scheeles we decided to stick as close as possible to the ReplicaSet. Now there's only one difference:
For reference, this is a discussion on ReplicaSet and failure conditions:

It does look like we will want to communicate failure conditions back, and some of the conditions discussed for replica sets could apply here, like quota or permissions. Clouds could also fail to create VMs due to other internal reasons (stockouts?). We can add those to this PR to finalize it. Alternatively, I am also fine merging this PR as it is and adding conditions later if you prefer (just please file an issue and follow up on it).
Based on the comment from @pipejakob, a list of conditions will get deprecated at some point: #298 (comment)
As GitHub collapses the comment #298 (comment), pasting it below as it's very relevant.

"As for ErrorMessage / ErrorReason, this is definitely one of the warts of the design so far that I'd like to flesh out with any better approaches. We did get the advice from Brian Grant and Eric Tune that the existing pattern of Conditions in object statuses is near-deprecated. As part of trying to bring several controllers to GA, a survey was sent out to understand how people use Conditions and how effective they are for the intent. The overall response was that having Conditions be a list of state transitions generally made them not useful for the kinds of checks people wanted to make against the Status, which are to answer the question "is this done yet?". This resulted in clients always just looking at the most recent Condition in the list and treating it as the current state, which on top of making the client logic more difficult, also made them deteriorate into Phases anyway (which are thoroughly deprecated).

So, they suggested two different patterns to replace Conditions: (1) if you really want a timeseries stream of state transitions, we should use Events; (2) otherwise, represent the current observed state directly with dedicated top-level Status fields.

The Error* fields were my attempt at (2), with the general guideline that if a client modifies a field in the Machine's Spec, it should specifically watch the corresponding field of the Machine Status to see whether or not it has been reconciled, while watching the Error* fields for any errors that occur in the meantime. If you're updating the version of kubelet in the Spec, you should watch the corresponding field in the Status to know when it's been reconciled. This works decently with the Error* fields so long as you have a single controller responsible for the entirety of the Machine Spec, but breaks down somewhat if you want different controllers to handle different fields of the same object, or to handle reconciling the same Machine under different circumstances.

For instance, one controller may be responsible whenever a full VM replacement is needed, while another may specialize in being able to update a VM in-place for certain Spec changes. It's not fantastic if they're unable to report errors separately, and instead have to overwrite the same fields in the Status. One mitigation is to always publish an Event on the Machine as well, so anyone who cares can still see the full stream of all errors. Another mitigation is to provide very strong guidance over what constitutes an error worth reporting in the Status. I'll clarify this in the documentation, but I think it's a good idea for Machine.Status.Error* to be reserved for errors that are considered terminal, rather than transient. A terminal error would be something like the fact that the Machine.Spec has an invalid configuration, so the controller won't be able to make progress until someone fixes some aspect of it. Another terminal error would be if the machine-controller is getting Unauthorized or Permission Denied responses from the cloud provider it's calling to create/delete VMs -- it's likely going to require some manual intervention to fix IAM permissions or service credentials before it's able to do anything useful. However, any transient service failures can just be logged in the controller's output and/or added as Events to the Machine, since they should only represent delays in reconciliation and not errors that require intervention.

If two different controllers want to report terminal errors on the same Machine object, then I think it's okay that they overwrite each other's errors in the Machine.Status, since they are both valid errors that need to be taken care of. By definition, neither controller is able to make progress until someone steps in and modifies the Machine.Spec or some aspect of the environment to fix the error, so someone will need to address both errors anyway (which may have the same underlying root cause). Based on the timing, they'll either see one error or the other first, and will have to fix it. Then, that error will disappear as the first controller gets unwedged, but a second error might replace it from the second controller, and the admin can take action on that as well. We can figure out some way to represent both errors in the Status at the same time, but I'm not sure how much value there is in that over the above model, or over the cluster admin looking at the Machine events anyway."
@mrIncompetent and I chatted on Slack. Rather than follow the current pattern with conditions, we'll follow the advice from #298 (comment) (pasted above) to have Error* for terminal errors only, with anything transient output via Events.
ErrorReason & ErrorMessage are added.
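The terminal-vs-transient split agreed above can be sketched as a small classification helper. This is an illustrative sketch only: `MachineSetStatus` here carries just the two Error* fields from the PR, and `recordError` is a hypothetical helper, not code from the repository.

```go
package main

import "fmt"

// MachineSetStatus sketches the Error* fields added in this PR:
// per the discussion, they are reserved for terminal errors that
// require human intervention.
type MachineSetStatus struct {
	ErrorReason  string
	ErrorMessage string
}

// recordError is a hypothetical helper illustrating the agreed policy:
// terminal errors land in Status.Error*, while transient errors are
// only surfaced as Events (stubbed here as a print) and leave the
// status untouched so reconciliation can simply retry.
func recordError(status *MachineSetStatus, reason, msg string, terminal bool) {
	if terminal {
		status.ErrorReason = reason
		status.ErrorMessage = msg
		return
	}
	// Transient: in a real controller this would emit an Event and
	// requeue rather than mutate the status.
	fmt.Println("event:", reason, msg)
}

func main() {
	var status MachineSetStatus
	// Transient cloud throttling: event only, no status change.
	recordError(&status, "CloudThrottled", "rate limited, retrying", false)
	// Invalid spec: terminal, so it is recorded in the status.
	recordError(&status, "InvalidConfiguration", "machine spec is invalid", true)
	fmt.Println(status.ErrorReason) // InvalidConfiguration
}
```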
/lgtm

Planning on merging by end of day if we don't have any further comments/concerns.
[APPROVALNOTIFIER] This PR is APPROVED.
This pull request has been approved by: mrIncompetent, rsdcastro. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these OWNERS files:
Approvers can indicate their approval by writing /approve in a comment.
/hold cancel
/lgtm for posterity (I reviewed this a couple of hours after it merged)
…set-type add MachineSet type
Add a simple MachineSet type, basically copied from ReplicaSet.