Minimalistic Machines API proposal. #298
Conversation
e4232ee to 0302668
Hey, we are working on a very similar concept. The basic idea is to rely on the already existing node resources.
The flow is:
We have an example node-controller (using docker-machine): https://github.com/kube-node/kube-machine. How can we align and possibly collaborate on this topic? Contributors (not sorted by anything):
@mrIncompetent Your nodeset looks similar to what we're trying to do, but I'm wondering if there's a write-up somewhere of the explicit goals of the project? It's not clear to me from reading the code where its boundaries are, or what user problems it's trying to solve. For instance, I don't see any notions of software versions in the nodeset, so it doesn't seem like you're targeting being able to upgrade nodes like we are. Also, only having a "set" concept for nodes without being able to address specific ones seems like it would give you no control over which nodes to scale down when you have excess capacity (which could be important, if you want to target the most idle nodes for deletion). I could be wrong, though, if you're intending for the usecase to be handled by deleting Node objects themselves.

That said, it would be great to have any/all of its contributors join us in our ongoing Cluster/Machines API discussions. We're meeting weekly on Wednesdays at 11:00 PST via Zoom (https://zoom.us/j/166836624). If you want to get the invite on your calendar, you can join the SIG Cluster Lifecycle mailing list, where we can also start a thread to discuss this before our next Zoom meeting.
Looks great!
The ProviderConfig is recommended to be a serialized API object in a format
owned by that provider, akin to the [Component Config](https://goo.gl/opSc2o)
pattern. This will allow the configuration to be strongly typed, versioned, and
I personally think this is a little bit of a cheat, and I'd like us to avoid this if possible, but I also recognize that we'll always need an escape hatch in the long term. My concern in the short/medium term is that allowing this means we avoid defining common fields where we actually could do so.
I absolutely, 100% agree that it's a simplification for the short-term. I'll be fully transparent here:
My other proposal was a little too over-indexed on trying to support a cloud-agnostic cluster autoscaler as an initial customer. That was appealing from a design standpoint, because the cluster autoscaler already exists and actually has concrete requirements to build off of, but also seemed like a logical thing you would want to do with a declarative NodeSet concept. After a lot of discussions, though, the feedback was:
- For a very first proposal of a brand-new API, it had a lot of new concepts to grasp.
To be fair, I had new types across two dimensions: Machine / MachineSet / MachineDeployment in one dimension, and Machine / MachineClass / MachineTemplate across the other. I was urged to think about the absolute minimum we could get away with for the first iteration, and only add more complexity and new concepts later when we were certain they were necessary.
My other design also had the requirement that in order to create even a single simple node, you had to first create a cloud-specific MachineTemplate, then register it as a MachineClass, and then finally create a Machine that referenced the class. That seemed like a lot of overhead just to support a single custom node that you had no intent to reuse. So, one design principle that emerged was: if I just want to create a single Machine and don't care about portability, I should be able to, without the overhead of the other concepts. I think there's still room to evolve the API and introduce other concepts down the road, but supporting that simple case means we'll need something akin to the opaque ProviderConfig blob we have now to actually be able to feed the right values into the cloud-specific node creation APIs.
- Rebasing the cluster autoscaler on top of our new APIs isn't actually delivering any new value, since cluster autoscaling already exists today.
It's a nice-to-have down the road, but in order to get the momentum we want for the project, we should focus first on the new functionality this enables that was never possible before (like cloud- and deployment-agnostic cluster upgrades). If you remove cluster autoscaling (and node autoprovisioning) as a client, at least for the short term, then I think most of the benefits of having more cloud-agnostic fields in the MachineSpec disappear as well.
I could definitely be wrong here, and welcome more feedback. But, the intent of this proposal was "what is the absolute bare minimum that we can all agree is a good starting point?" and then continue to add more as we see appropriate.
Some of the things I had considered for inclusion in the MachineSpec:
- OS image
- This could be represented as a single string across most providers, but the values wouldn't be portable anyway. If the values are cloud-specific, then moving the entire field to be cloud-specific means that each provider can represent this however it would like. For instance, in GCE, OS images are actually more naturally represented in a structured way, since they have a project, family, name, etc. In DigitalOcean, OS Images can also be referred to via int IDs, so an IntOrStr might be more appropriate there.
- Disk configuration
- I don't know if I personally have enough of a grasp on how much power is needed to be useful here. I've swung back and forth between two extremes: a simplified view of just specifying a total amount of working space desired, and on the other end of the spectrum, having a full array of structs to represent disks, along with fields for how much capacity each disk should have, whether it should be bootable, etc. At this point, deferring disk setup to the ProviderConfig lets us look at how different early adopters represent this in their config, and we can always bubble it back up to a generic representation in MachineSpec in the future once we think we have a grasp of where we want to end up on the power/usability trade-off scale.
- Preemptible
- AWS and GCE support these, but I think Azure only supports low-priority VMs in Batch, and I don't know if this concept exists in other on-premise environments like vSphere.
- Topology
- Like OS Image, these are similar-enough concepts across environments (regions, zones, availability zones, etc), but I couldn't see the actual values being portable at all. Also, the number of dimensions you might want to support in different environments might differ wildly. In some clouds, a single availability zone might be enough to specify. If you're on-premise, you might want a lot of custom fields like datacenter, rack, etc.
There are many more, but you get the point. Are there particular concepts that you feel we should represent in the MachineSpec now (or in the near term)?
@justinsb it shouldn't be a problem to lift common fields into this API as we notice them. Providers can always extend their controllers to check for the API field and fall back on their old provider-specific field.
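The fallback pattern described here can be sketched in a few lines of Go. The field names (`Zone`) and types below are hypothetical, purely for illustration of "check the common API field, fall back to the provider-specific one":

```go
package main

import "fmt"

// machineSpec sketches a spec where a hypothetical field (Zone) has been
// lifted into the common API, while older objects still only carry it
// inside their provider-specific config.
type machineSpec struct {
	Zone string // common field; empty on objects created before the lift
}

type legacyProviderConfig struct {
	Zone string // old provider-specific field
}

// effectiveZone shows the fallback pattern described above: prefer the
// common API field, and fall back to the provider-specific one.
func effectiveZone(spec machineSpec, legacy legacyProviderConfig) string {
	if spec.Zone != "" {
		return spec.Zone
	}
	return legacy.Zone
}

func main() {
	// New-style object: the common field wins.
	fmt.Println(effectiveZone(machineSpec{Zone: "us-central1-a"}, legacyProviderConfig{Zone: "us-east1-b"}))
	// Old-style object: fall back to the provider-specific field.
	fmt.Println(effectiveZone(machineSpec{}, legacyProviderConfig{Zone: "us-east1-b"}))
}
```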
@pipejakob After re-watching all the meeting recordings, I see the justification for using a serialized blob: simplicity. I'm 100% for simplicity. However, I think that `providerConfig` is very similar to `EnvVar` in `Container`: there you can define a literal value when you don't need reusability (or want to do it quickly), and a reference when you do. Something like
```yaml
providerConfig: # either value or valueRef
  value: >
    {
      "apiVersion": "gceproviderconfig/v1alpha1",
      "kind": "GCEProviderConfig",
      "machineType": "n1-standard-1",
      ...
    }
  valueRef:
    apiVersion: gceproviderconfig/v1alpha1
    kind: GCEProviderConfig
    name: config-1
```
can be suitable for both use cases, and it feels very similar to what people know and use in Pods.
Edit:
Even better would be if we had:
```yaml
providerConfig:
  value: # either value (runtime.Object/runtime.RawExtension) or valueRef
    apiVersion: gceproviderconfig/v1alpha1
    kind: GCEProviderConfig
    machineType: n1-standard-1
    project: test-1-test
    ...
  valueRef:
    apiVersion: gceproviderconfig/v1alpha1
    kind: GCEProviderConfig
    name: config-1
```
p.s. this is a re-post of a previous comment (wrong user)
Why not use `runtime.Object`/`runtime.RawExtension` instead of `string`?
I second this. `runtime.RawExtension` needs an `apiVersion` and will probably fall back to `Unstructured` (@mrIncompetent can you confirm that?). Alternatively, a generic JSON type will do; compare https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1/types_jsonschema.go#L28.
@sttts `runtime.RawExtension.Object` would fall back to `runtime.Unknown`. Though I've been using it without `apiVersion`, and more like https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1/types_jsonschema.go#L28, so I only used `runtime.RawExtension.Raw`. In this case, `RawExtension.Object` will be `nil`.
@mvladev @pipejakob @justinsb @sttts @roberthbailey Any objections to using `runtime.RawExtension` for `providerConfig`?
sounds good
No objection, it's way better than `string`.
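If `providerConfig` does land as a `runtime.RawExtension`, a provider's controller would typically decode the `.Raw` bytes into its own versioned type. A stdlib-only sketch, where `rawExtension` and `gceProviderConfig` are simplified stand-ins for the real API types, not the actual Kubernetes definitions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawExtension mimics the shape of runtime.RawExtension for illustration:
// the API server hands the controller the provider config as opaque bytes.
type rawExtension struct {
	Raw []byte
}

// gceProviderConfig is a hypothetical provider-owned, versioned type
// mirroring the example config discussed above.
type gceProviderConfig struct {
	APIVersion  string `json:"apiVersion"`
	Kind        string `json:"kind"`
	MachineType string `json:"machineType"`
	Project     string `json:"project"`
}

// decodeGCEConfig is roughly what a GCE machine controller would do with
// the opaque providerConfig blob: decode it into its own typed struct.
func decodeGCEConfig(ext rawExtension) (gceProviderConfig, error) {
	var cfg gceProviderConfig
	err := json.Unmarshal(ext.Raw, &cfg)
	return cfg, err
}

func main() {
	ext := rawExtension{Raw: []byte(`{
		"apiVersion": "gceproviderconfig/v1alpha1",
		"kind": "GCEProviderConfig",
		"machineType": "n1-standard-1",
		"project": "test-1-test"
	}`)}
	cfg, err := decodeGCEConfig(ext)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Kind, cfg.MachineType)
}
```

The same blob stays opaque to the generic Machines controller; only the provider-specific controller ever decodes it.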
update machine
and allow the provider to decide if it is capable of performing an in-place
update, or if a full Node replacement is necessary.
Agree - and this could also be an option / policy that can be set by the machine controller. For example, kops encourages full node replacement (a little more reliable, perhaps). But certainly it is slower, and some people would likely choose in-place replacement if it was available. And if we have a kops-controller, it'll have to support in-place for bare-metal.
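A provider-side policy like the one suggested could be as simple as a predicate over the old and new specs. A sketch, with entirely hypothetical fields and an assumed policy (kubelet bumps in place, OS image changes via replacement):

```go
package main

import "fmt"

// machineSpec carries only the hypothetical fields needed for this sketch.
type machineSpec struct {
	KubeletVersion string
	OSImage        string
}

// needsReplacement is one possible policy a provider (or a kops-style
// controller) might apply: version bumps can be handled in place, while
// changing the OS image requires provisioning a replacement Node.
func needsReplacement(current, desired machineSpec) bool {
	return current.OSImage != desired.OSImage
}

func main() {
	cur := machineSpec{KubeletVersion: "1.7.0", OSImage: "ubuntu-16.04"}
	// Kubelet upgrade only: in-place is enough under this policy.
	fmt.Println(needsReplacement(cur, machineSpec{KubeletVersion: "1.8.0", OSImage: "ubuntu-16.04"}))
	// OS image change: full replacement is required.
	fmt.Println(needsReplacement(cur, machineSpec{KubeletVersion: "1.8.0", OSImage: "ubuntu-18.04"}))
}
```

A controller preferring reliability (as kops does) could simply make this predicate return `true` unconditionally.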
cluster-api/machines-api/types.go
Outdated
// |               | Master present        | Master absent          |
// +---------------+-----------------------+------------------------+
// | Node present: | Install control plane | Join the cluster as    |
// |               | and be schedulable    | just a node            |
I'm not sure schedulable exists as a concept any more... AIUI the masters are supposed to be schedulable, but tainted so user pods aren't scheduled to it. But e.g. a monitoring daemonset or networking daemonset should tolerate the taint and thus be on the master.
Ah, this is poor wording on my part. If I replace "schedulable" with "untainted" and "unschedulable" with "tainted," is that sufficient? Or should we not differentiate between (`Master`) and (`Master`, `Node`)? Or do you think there's a better way to represent the desire to install the control plane altogether?
From what @justinsb said, it sounds like there shouldn't be a distinction between Master and Node, and I think I agree with this position. Is there any scenario where an un-tainted master would be significantly preferable to a tainted one?
cluster-api/machines-api/types.go
Outdated
type MachineStatus struct {
	// If the corresponding Node exists, this will point to its object.
	// +optional
	NodeRef *api.ObjectReference
Not sure if we should define a field like ProviderId (https://github.com/kubernetes/kubernetes/blob/master/pkg/api/types.go#L3092), for the window in between when a machine is created and when it registers with kube-apiserver. But happy to wait and see...
This should be `*corev1.ObjectReference`.
cluster-api/machines-api/types.go
Outdated
// When was this status last observed
// +optional
LastUpdated metav1.Time
Let's make clear that this is last-transition, rather than a heartbeat. (If we hadn't done the node heartbeat, I'd wager we'd be at 10k nodes by now...)
Will do.
cluster-api/machines-api/types.go
Outdated
Name string

// Semantic version of the container runtime to use
Version string
Going to guess these should be optional to mean "use the controller-recommended setting for the k8s/kubelet version" (which IMO is what we should be encouraging!)
Ah, good point. I think it's a good idea to make these optional at cluster creation time, but do you think they should stay optional afterwards?
Be careful with default values for fields. Changing defaults can easily break an API and unlike ProviderConfig there's no clear provider-specific versioning mechanism for the defaults applied to these common fields.
Committing these types so we can start prototyping against them, but the full proposal is still under review (and accepting feedback) here: kubernetes-retired#298
@kubernetes/sig-cluster-lifecycle-pr-reviews
@pipejakob Just wanted to share some thoughts while we were working on a project called Archon with similar ideas. The basic idea is the same: using declarative resources in Kubernetes to define the node machines and delegating the real work to controllers. We chose Here is the definition for

We put lots of information in To adapt for different cloud controllers, we put common fields like

There are two minor differences: We chose to treat Another difference is that we target Archon as a general-purpose computing resource management tool instead of a Kubernetes-specific one. The fundamental design could be used to build an etcd cluster or any distributed system, but we have Kubernetes support built in.

I can understand that Whether the I just wanted to raise another question here. Is the Working with @mrIncompetent and others, we introduced concepts like BTW, we use

I'm really looking forward to seeing a common resource shared by all the Kubernetes bootstrapping tools get defined. My thoughts on this topic might not be correct nor optimal. I just wanted to bring some discussion on the design decisions to be made. Hope we can attract more people interested in this idea and polish the resource definition together.
cluster-api/machines-api/types.go
Outdated
ContainerRuntime ContainerRuntimeInfo
}

type ContainerRuntimeInfo struct {
What is the scope of this particular object/concept? Should we expect to see `{Rkt,Frakti,Containerd}RuntimeConfig` structs at some point?
The intended purpose was to:
- Know what runtime to install at provisioning time. It's completely fine for an implementation to say "sorry, I only know how to install Docker. Anything else will result in an `UnsupportedConfiguration` error," or even "I have no idea how to install that version of Docker," etc.
- Know how to set kubelet's `--container-runtime` flag.

You're right to question how this will evolve in the future, with cluster admins potentially wanting to fine-tune the settings of the runtime itself. I think it would largely follow the same pattern we use to handle the provider-specific configuration for `Machines`. At first, we would likely have an opaque blob to capture all of the settings that were fed to the runtime installer, which would allow each container runtime to version its configuration independently of the Machines API. Then, we could potentially upgrade this to an ObjectReference so that identical configuration wouldn't have to be inlined with every object that uses it. Any time we have configuration that seems to be useful across every runtime we support, we can graduate it out of the opaque config blob and into the `ContainerRuntimeInfo` struct if desired.

I actually think `ContainerRuntimeInfo` is a good candidate for culling in v1alpha1, until there's a strong need to add it. Definitely worth discussion.
cluster-api/machines-api/types.go
Outdated
// | Node absent:  | Install control plane | Invalid configuration  |
// |               | and be unscheduleable |                        |
// +---------------+-----------------------+------------------------+
Roles []string
I'd suggest just making this a string and having a single identifier for each possible configuration. It's much less confusing that way. e.g. have "SchedulableMaster," "UnschedulableMaster," and "Node."
I actually originally had it that way in an earlier draft that was Google-internal only, and got the opposite feedback from Brian Grant, Tim Hockin, and Daniel Smith: make it a list of strings instead.
I think there might be contention over exactly what roles or node installation scenarios we want to support directly in this API, however.
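Whatever set of roles is ultimately agreed on, a controller would reduce the `Roles` list back to one of the cells in the proposal's 2x2 table. A sketch, where the role names and plan descriptions are assumptions for illustration:

```go
package main

import "fmt"

// setupPlan names the four cells of the 2x2 table from the proposal.
type setupPlan string

const (
	masterAndNode setupPlan = "install control plane, untainted"
	masterOnly    setupPlan = "install control plane, tainted"
	nodeOnly      setupPlan = "join the cluster as just a node"
	invalidConfig setupPlan = "invalid configuration"
)

// planFromRoles interprets a list-of-strings Roles field against the
// table: presence of "Master" and/or "Node" picks the cell.
func planFromRoles(roles []string) setupPlan {
	var master, node bool
	for _, r := range roles {
		switch r {
		case "Master":
			master = true
		case "Node":
			node = true
		}
	}
	switch {
	case master && node:
		return masterAndNode
	case master:
		return masterOnly
	case node:
		return nodeOnly
	default:
		return invalidConfig
	}
}

func main() {
	fmt.Println(planFromRoles([]string{"Master", "Node"}))
	fmt.Println(planFromRoles([]string{"Node"}))
}
```

Note that a single-string enum (as suggested above) would collapse this mapping into the value itself, at the cost of a new enum member for every future combination.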
Can you ping me the internal doc so I can get up to speed on the rationale?
cluster-api/machines-api/types.go
Outdated
type MachineSpec struct {
	// This ObjectMeta will autopopulate the Node created. Use this to
	// indicate what labels, annotations, name prefix, etc., should be used
Is the `Name` of the `MachineSpec` used as the name prefix? This sounds reasonable to me, just double checking.
Actually, I was aiming to follow the same pattern as we do for pods (and other objects): you can specify `name: value` if you know exactly what name you want to use, or `generateName: value` to use `value` as a prefix and have a unique suffix generated for you.
I can call this out much more explicitly, though.
Isn't `generateName` handled generically by the API server? How will you prevent it from renaming the `MachineSpec` you create, so that providers can generate the names (e.g. GKE has a way of generating names that is not the same as what the API server does)?
OTOH you could just force providers to use the name generated by the API server, though I don't know what kinds of incompatibility that would introduce.
Keep in mind that using `generateName` will prevent you from making idempotent creation requests, because it is not deterministic.
attempt to upgrade machine in-place
if error:
    create new machine
    delete old machine
Remember to drain the old machine before deleting it, so that the containers get a chance to exit gracefully. IIRC drain also ensures you respect any pod disruption budgets that are set up in the cluster.
@mtaufen there is an open issue to move the drain operation into the k8s server.
In the meantime, I think it's much better to have a special drain controller that will mark `Machines` for draining, adding a finalizer to prevent deletion and removing it once the node has been drained.
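The drain-controller flow sketched here (mark for draining, hold with a finalizer, release once drained) could look roughly like this; the finalizer name and the `machine` fields are invented for illustration:

```go
package main

import "fmt"

const drainFinalizer = "machines.example.com/drain" // hypothetical finalizer name

// machine sketches the fields a drain controller would care about.
// Finalizers block actual deletion until every entry has been removed.
type machine struct {
	Name              string
	Finalizers        []string
	DeletionRequested bool
	Drained           bool
}

func hasFinalizer(m *machine, f string) bool {
	for _, x := range m.Finalizers {
		if x == f {
			return true
		}
	}
	return false
}

// reconcileDrain: on deletion, hold the Machine with a finalizer, drain
// the node, then release the finalizer so the delete can complete.
func reconcileDrain(m *machine) {
	if !m.DeletionRequested {
		return
	}
	if !m.Drained {
		if !hasFinalizer(m, drainFinalizer) {
			m.Finalizers = append(m.Finalizers, drainFinalizer)
		}
		m.Drained = true // stands in for actually evicting the pods
		return
	}
	// Drained: drop the finalizer so the API server can delete the object.
	kept := m.Finalizers[:0]
	for _, f := range m.Finalizers {
		if f != drainFinalizer {
			kept = append(kept, f)
		}
	}
	m.Finalizers = kept
}

func main() {
	m := &machine{Name: "machine-1", DeletionRequested: true}
	reconcileDrain(m) // first pass: finalizer added, node drained
	reconcileDrain(m) // second pass: finalizer removed, deletion proceeds
	fmt.Println(len(m.Finalizers))
}
```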
built on top of the Machines API would follow the same pattern:

for machine in machines:
    attempt to upgrade machine in-place
Just a note: With in-place upgrades, providers should determine how disruptive a given in-place mutation is and ensure that they respect the pod disruption budget.
## In-place vs. Replace

One simplification that might be controversial in this proposal is the lack of
API control over "in-place" versus "replace" reconciliation strategies. For
Users may end up wanting this to make a trade-off between the disruptiveness and cleanliness of a rollout, but I think it's fine to push this down to the ProviderConfig and leave it out of the top-level API.
* Dynamic API endpoint

This proposal lacks the ability to declaratively update the kube-apiserver
endpoint for the kubelet to register with. This feature could be added later,
I'm unclear on what this section means, but it kinda sounds like something CRD could handle?
cluster-api/machines-api/types.go
Outdated
// controller observes that the spec has changed and no longer matches
// reality, it should update Ready to false before reconciling the
// state, and then set back to true when the state matches the spec.
Ready bool
What criteria will govern this value? Also, is there an expectation that this value will always be accurate?
This feels like it might be a duplication of data that already exists in the API. Moreover, what is actually updating this? My take is that this would be a lot of overhead and may not scale. Making this generic seems to be an issue. Can we rely on the kubelet's ready status, or what other options do we have?
I am very much in favor of building out a pre-alpha version of this so we can start testing sooner than later. The whole point is that we can version these, and I will probably have much more concrete feedback once I tried mutating infrastructure with these 😄
TLDR; LGTM
@adieu Yes, Archon is one of the many projects we looked at when starting to work on the Cluster API project. I do think there's a lot of overlap conceptually, but the two efforts have fundamentally different goals. As you say yourself, Archon's

One of the most important use cases we're targeting with the Cluster and Machines APIs is for developers to be able to write operational tooling on top of these APIs that is completely agnostic of the cluster's environment, the cloud it's running in, and even the deployment mechanism used to provision the cluster. With the current proposal of having Kubernetes concepts be first-class citizens of these APIs, we will enable tooling like generic cluster upgraders that only have to update the value of a single field on an object, and have the right thing happen with little room for shooting oneself in the foot. In the Archon world, I don't see how a tool could generically upgrade a single

I understand the desire for creating a completely generic abstraction of hosts, but I think that direction is off the spectrum of what would be usable for the use cases we're targeting with this particular project. However, one way I can see these two projects potentially collaborating is by having a

As for your question of whether a
@pipejakob Couldn't the abstraction still be done with Archon-like language, with some kind of helper on the host and annotations of some kind on the node? Say the user sets a k8s version previous+1 annotation; the node itself could pull the annotation and tweak things like performing a kubelet upgrade. Maybe some of the logic you're talking about belongs in kubeadm? Being a node-level thing has the advantage of being very agnostic to the tool (could be driven in the image by Go, Ansible, Chef, etc). You could map annotations to Chef roles, for example.

It would be really nice if both use cases could be handled by the same object with some extra fields somehow. For an operator, sometimes the distinction between a fully k8s node and "oh, I need to override this one thing on the host" is a very fine line. Being able to mix both use cases together would really be helpful, I think.
cluster-api/machines-api/types.go
Outdated
type MachineVersionInfo struct {
	// Semantic version of kubelet to run
	Kubelet string
Is this an optional value? In most cases you want to match your kubelet and api-server versions.
I was thinking that it would be optional at installation time, but that the installer would fill it in with the value that was actually used. Then, tooling built on top of this API can inspect and potentially modify a concrete value here, instead of having to reason about what an empty value here means. I'll document the expectations here more clearly.
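That "optional at creation time, concrete afterwards" behavior is easy to sketch: the installer writes back the version it actually used unless the user pinned one. Field and function names here are illustrative:

```go
package main

import "fmt"

// machineVersionInfo mirrors the versions struct under discussion.
type machineVersionInfo struct {
	Kubelet string
}

// fillInstalledVersions sketches the behavior described above: the field
// is optional at creation time, and the installer records the version it
// actually used, so later tooling always sees a concrete value.
func fillInstalledVersions(v *machineVersionInfo, installedKubelet string) {
	if v.Kubelet == "" {
		v.Kubelet = installedKubelet
	}
}

func main() {
	v := machineVersionInfo{}          // user omitted the version
	fillInstalledVersions(&v, "1.8.0") // installer records what it used
	fmt.Println(v.Kubelet)

	pinned := machineVersionInfo{Kubelet: "1.7.4"} // explicit request wins
	fillInstalledVersions(&pinned, "1.8.0")
	fmt.Println(pinned.Kubelet)
}
```

This sidesteps the defaulting hazard raised above: the concrete value lives on the object, so a later change in the installer's recommendation never silently retargets existing Machines.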
Can we make this a struct? `kubelet` to me is an object, not a string. Is a struct with a single string a good start? For instance, we have multiple components. Also, how does this relate (or not) to component config?
I'm not against making this a struct, but I'm wondering how you see that evolving in the future. What other fields would you envision in the struct? The way this is laid out currently is:

```yaml
machine:
  spec:
    versions:
      kubelet: 1.8.0
```

Are you hoping to have different ways of specifying the version of kubelet to use beyond a single semver, or were you hoping to gather together any configuration related to the kubelet into a single struct, rather than having the version be stand-alone here?
cluster-api/machines-api/types.go
Outdated
import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/pkg/api"
This can be removed and we can use `k8s.io/api/core/v1` instead.
Yup, this was an accident that got fixed in the merged types but not this PR. I'll update the PR with the newest definition from the codebase.
cluster-api/machines-api/types.go
```go
	// If set, indicates that there is a problem reconciling state, and
	// will be set to a human readable string to indicate the problem.
	ErrorMessage *string
}
```
Where does the Provider put the status for the cloud resources it creates? ProviderA might create/update/delete `security_groups`, `keys`, or `anything_related` for every machine it reconciles, and those cloud resources are going to be completely different from ProviderB's resources.
Would `ErrorReason *MachineStatusError` and `ErrorMessage *string` make more sense as a list of a struct that encapsulates those values? What does the event data structure look like? This almost appears to be a list of Error Events.
```diff
+
+    // +optional
+    // If set, indicates that there is a problem reconciling state, and
+    // will be set to a human readable string to indicate the problem.
+    ErrorMessage *string
```
Maybe this is a better question, since the struct may be below. What is the `ErrorMessage`? How does it relate to `ErrorReason`?
We discussed this in one of the Cluster API breakout sessions, but I forgot to follow-up here: it's completely up to the controller as an implementation detail. A decent pattern is for the controller to add custom annotations to the Machines it's reconciling, to keep track of information about external resources it has created. The controller could also create its own ConfigMaps or CustomResources to have better control (and RBAC scoping) of its state, or even store state outside of the cluster if it makes sense.
As for ErrorMessage / ErrorReason, this is definitely one of the warts of the design so far that I'd like to flesh out with any better approaches.
We did get the advice from Brian Grant and Eric Tune that the existing pattern of Conditions in object statuses is near-deprecated. As part of trying to bring several controllers to GA, a survey was sent out to understand how people use Conditions and how effective they are for the intent. The overall response was that having Conditions be a list of state transitions generally made them not useful for the kinds of checks people wanted to make against the Status, which are to answer the question "is this done yet?". This resulted in clients always just looking at the most recent Condition in the list and treating it as the current state, which on top of making the client logic more difficult, also made them deteriorate into Phases anyway (which are thoroughly deprecated).
So, they suggested two different patterns to replace Conditions:
- If you really want a timeseries stream of state transitions, we should use Events.
- For Status fields that we think clients will care to watch, we should just have fine-grained top-level entries (rather than lists) for the current state.
We're on the bleeding edge, so there aren't other parts of Kubernetes that have migrated off of Conditions yet, and we might be setting the precedents, or we may just need to slightly alter our types once better recommendations are in place.
The Error* fields were my attempt at (2), with the general guideline that if a client modifies a field in the Machine's Spec, it should watch the corresponding field of the Machine Status to see whether or not it has been reconciled, while watching the Error* fields for any errors that occur in the meantime. If you're updating the version of kubelet
in the Spec, you should watch the corresponding field in the Status to know when it's been reconciled. This works decently with the Error* fields so long as you have a single controller responsible for the entirety of the Machine Spec, but breaks down somewhat if you want different controllers to handle different fields of the same object, or handle reconciling the same Machine under different circumstances.
For instance, one controller may be responsible whenever a full VM replacement is needed, while another may specialize in being able to update a VM in-place for certain Spec changes. It's not fantastic if they're unable to report errors separately, and instead have to overwrite the same fields in the Status. One mitigation is to always publish an Event on the Machine as well, so anyone who cares can still see the full stream of all errors. Another mitigation is to provide very strong guidance over what constitutes an error worth reporting in the Status.
I'll clarify this in the documentation, but I think it's a good idea for Machine.Status.Error* to be reserved for errors that are considered terminal, rather than transient. A terminal error would be something like the fact that the Machine.Spec has an invalid configuration, so the controller won't be able to make progress until someone fixes some aspect of it. Another terminal error would be if the machine-controller is getting Unauthorized or Permission Denied responses from the cloud provider it's calling to create/delete VMs -- it's likely going to require some manual intervention to fix IAM permissions or service credentials before it's able to do anything useful. However, any transient service failures can just be logged in the controller's output and/or added as Events to the Machine, since they should only represent delays in reconciliation and not errors that require intervention.
If two different controllers want to report terminal errors on the same Machine object, then I think it's okay that they are overwriting each other's errors in the Machine.Status, since they are both valid errors that need to be taken care of. By definition, neither controller is able to make progress until someone steps in and modifies the Machine.Spec or some aspect of the environment to fix the error, so someone will need to address both errors anyway (which may have the same underlying root cause). Based on the timing, they'll either see one error or the other first, and will have to fix it. Then, that error will disappear as the first controller gets unwedged, but a second error might replace it from the second controller, and the admin can take action on that as well.
We can figure out some way to represent both errors in the Status at the same time, but I'm not sure how much value there is in that over the above model, or above the cluster admin looking at the Machine events anyway.
Do we really need a new object that represents a Node? We can just use the existing Node object. I think using a separate object type will create another point of reconciliation. Especially for cloud providers like AWS/GCE where the actual nodes are created via an autoscaler (hence their names are generated randomly), this will require a sync among three things: the cloud provider's InstanceGroup, []Machine, and []Node.

In the case of appscode/pharmer, we defined a NodeGroup object. This gets translated to the appropriate group concept for cloud providers that support them (AWS & GCE). For simple VPS providers this just becomes a simple loop over Node creation. To maintain the sync, we pass the NodeGroup name via kubelet's --node-labels flag.
Thanks @pipejakob both for your work putting this PR together and handling feedback, and also for helping us wrap it up as you ramp up on Istio. I will try to take a pass at comparing the types files, but it may take a day or two, so if someone wants to jump in (@medinatiger? @jessicaochen?) that'd be really helpful.
@mvladev volunteered during our meeting today to take a pass on step one (sanity checking the latest changes). Once that is done I'll do a quick pass and lgtm as well.
@roberthbailey after I:
see that there is a little misalignment in the comment for

```
$ git diff
diff --git a/cluster-api/api/cluster/v1alpha1/types.go b/cluster-api/api/cluster/v1alpha1/types.go
index c5b829a7..d2787c39 100644
--- a/cluster-api/api/cluster/v1alpha1/types.go
+++ b/cluster-api/api/cluster/v1alpha1/types.go
@@ -168,8 +168,8 @@ type Machine struct {
 type MachineSpec struct {
     // This ObjectMeta will autopopulate the Node created. Use this to
-    // indicate what labels, annotations, name prefix, etc., should be used
-    // when creating the Node.
+    // indicate what labels, annotations, etc., should be used when
+    // creating the Node.
     // +optional
     metav1.ObjectMeta `json:"metadata,omitempty"`
```

added lines are coming from

Except for that small comment difference,
Thanks Martin! @pipejakob - go ahead and delete
Hey @pipejakob, can you address the conflict? Also, the PR LGTM.
On it, but it looks like the upstream build of
#568 is the fix for the compilation error, which is being reviewed.
4e58ac6
to
a9dcd93
Compare
Okay, rebased and tested end-to-end via
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pipejakob, roberthbailey

The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
This is a proposal to add a new API for managing Nodes in a declarative way: Machines. It is part of the overall Cluster API effort.
a9dcd93
to
ff3a7fc
Compare
Committing these types so we can start prototyping against them, but the full proposal is still under review (and accepting feedback) here: kubernetes-retired#298
…posal Minimalistic Machines API proposal.
This is a proposal to add a new API for managing Nodes in a declarative way: Machines.
It is part of the overall Cluster API effort.