Should not error if worker node does not have JoinConfiguration #213

chuckha · 2019-09-05T15:17:50Z

/kind bug

https://github.com/kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm/blob/cc2b1e48885f2f7bda4b8a3048b1c33367322a0f/controllers/kubeadmconfig_controller.go#L258-L260

What steps did you take and what happened:
After the control plane is initialized and a worker gets created but that worker does not have a JoinConfiguration the reconciler returns an error.

What did you expect to happen:
I expected a nil join configuration would be absolutely fine in this case. The reconciler should imply set it to an initialized JoinConfiguration. If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

detiber · 2019-09-05T15:24:45Z

If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

Should we do this though? Maybe we should error if a Cluster or Init config is provided rather than ignoring?

detiber · 2019-09-05T15:25:12Z

I expected a nil join configuration would be absolutely fine in this case. The reconciler should imply set it to an initialized JoinConfiguration.

Yes, please

detiber · 2019-09-05T15:26:10Z

I'm also wondering if we should default Cluster/Init configuration if one is not provided for the initial control plane instance?

chuckha · 2019-09-05T15:26:54Z

If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

Should we do this though? Maybe we should error if a Cluster or Init config is provided rather than ignoring?

Yeah, I think this was designed to allow the user to be quite lazy with their configuration. I don't have strong feelings on omitting a log or straight up erroring.

@fabriziopandini any thoughts here?

fabriziopandini · 2019-09-06T07:23:12Z

@chuckha @detiber

I expected a nil join configuration would be absolutely fine in this case. The reconciler should imply set it to an initialized JoinConfiguration.

I'm ok with this (see also this issue #112)

I'm also wondering if we should default Cluster/Init configuration if one is not provided for the initial control plane instance?

If we are going to generate a default JoinConfiguration, I don't see strong reasons for generating default Cluster/Init configuration as well

If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

Should we do this though? Maybe we should error if a Cluster or Init config is provided rather than ignoring?

I'm personally in favor of strict validation, but lazy gives you the possibility to use the same configuration for all your control plane nodes vs having one configuration for the first control plane and another configuration for the additional control plane nodes.
As a side note, also in kubeadm we reverted to lazy because people were using a single file with all the objects for all the nodes.

Ok for adding a warning when we detect lazy behaviors

PS. if we get to an agreement, I would like to work on this 😉

chuckha · 2019-09-06T13:34:31Z

That's really good to know.

I'm ok with allowing all control planes to be defined with Init and Cluster configurations, mostly because that's what Kubeadm does.

However, if they are not the initial control plane then we should generate the JoinConfiguration for them.

Sorry @fabriziopandini I'm a little confused about this line:

If we are going to generate a default JoinConfiguration, I don't see strong reasons for generating default Cluster/Init configuration as well

Are you in favor of generating default Cluster/Init configuration if it does not exist for the first found control plane node?

detiber · 2019-09-06T14:07:25Z

I'm personally in favor of strict validation, but lazy gives you the possibility to use the same configuration for all your control plane nodes vs having one configuration for the first control plane and another configuration for the additional control plane nodes.
As a side note, also in kubeadm we reverted to lazy because people were using a single file with all the objects for all the nodes.

My major worry here is not that someone provides the same configuration to all Machines, but rather that they provide a different configuration to an additional Machine and expect that it would mutate the config of the existing Cluster.

chuckha · 2019-09-06T15:08:29Z

@detiber that is a good point. Different configurations would not be good.

Without adding aggregation/ensuring that the configs are the same, the only thing we can really do here is fail on secondary control planes with Init/Config. I'm ok with this approach.

fabriziopandini · 2019-09-09T12:57:43Z

So, if I got this right,

If join configuration is missing, we are going to generate a default one.
If secondary control plane nodes have Init/Cluster config, we are going to fail
Right?

detiber · 2019-09-09T14:31:43Z

@fabriziopandini that and:

if init/cluster config is not found on initial control plane, generate a default one.

Is my understanding, at least.

fabriziopandini · 2019-09-17T14:11:46Z

/lifecycle active

chuckha · 2019-09-17T14:31:42Z

/assign @fabriziopandini

accepting · 2019-09-22T04:56:32Z

I have a suggestion here.

If secondary control plane nodes or worker nodes have Init/Cluster config, we set them as nil with corresponding warnings.
If initial control plane has Join config, we set it as nil with corresponding warnings.

What do you think of this suggesion@fabriziopandini @chuckha @detiber

fabriziopandini · 2019-09-23T07:56:38Z

@accepting please correct me if I'm wrong, but I'm not sure what are you proposing work in use cases different than yours where all the KubeadmConfig have Init/Cluster/Join configuration.

fi. if the user is providing:
a) a KubeadmConfig object specifically configured for init
b) a KubeadmConfig object specifically configured for joining a second control plane

If b is processed as first:

the join configuration in b will be set to nil
a default Init/Cluster configuration will be created and the node will act as a bootstrap control plane

then, when a is processed:

the Init/Cluster configuration a will be set to nil
a default JoinConfiguration will be created and the node will act as a secondary control plane

This is not correct, because CABPK has to respect user-provided Init/Cluster configuration in a and JoinCongiration in b.

chuckha · 2019-09-23T15:16:20Z

I think this comes down to how control planes are configured.

If we give the same InitConfig, ClusterConfig and JoinConfig to every machine and mark some as control planes then the configmap lock prevents multiple kubeadm init calls. The first one that gets the lock runs a kubeadm init. In this case, once we have the lock, we set the Join configuration to nil and move on. The next reconcile loop the following control plane that succeeds will be a control plane join. In this case we set the init and cluster configuration to nil and move on.

However, as @fabriziopandini points out, if we give all configurations to the control plane and specifically configure one config as a join but leave the init and cluster configuration there will be a problem.

I wonder what makes sense here? I hadn't accounted for the first use case, but that is a very SIG-cluster-lifecycle way to use cluster-api. Machines are treated as cattle. I'd almost say that if a user wants to specifically create a machine that is the bootstrap control plane then they should mark the it as such by setting the JoinConfiguration to nil.

@fabriziopandini do you think it's acceptable that if a user wants a specific machine to be a primary control plane they must configure it as such by setting JoinConfiguration to nil and if they want a specific instance to be a secondary control plane they must configure as such by string Init and Cluster configurations to nil?

ncdc · 2019-09-23T15:17:48Z

Are there use cases where it makes sense to distinguish between control plane node 1 vs all other control plane nodes?

vincepri · 2019-09-23T15:33:45Z

Just chiming in a little here. I'd prefer to keep things explicit and avoid making users have to understand the way controllers work internally. A possible solution could be:

A user has to set at least one control plane machine with InitConfiguration and ClusterConfiguration, there should be only one machine with these values set, all the others fail / are ignored.
Any other machine may or may not have a JoinConfiguration, one is generated for Machines that don't have it iff the first control plane has been already initialized

chuckha · 2019-09-24T17:57:51Z

@ncdc I haven't heard of any but I also don't have many points of data.

I'm in favor of being lenient to allow the "copy configs across every machine" since that is actually easier to implement and would help @accepting out, but I want to do a little more questioning of users to see how people are actually using this

chuckha · 2019-09-24T18:41:36Z

let's not change behavior right now and have a discussion about behavior going forward so we can come to some agreement before implementing this change.

k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 5, 2019

fabriziopandini mentioned this issue Sep 9, 2019

🐛 Test and logic fixes #211

Merged

ncdc added this to the Next milestone Sep 11, 2019

ncdc added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Sep 11, 2019

k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Sep 17, 2019

k8s-ci-robot assigned fabriziopandini Sep 17, 2019

fabriziopandini mentioned this issue Sep 17, 2019

✨ Initialize JoinConfiguration if missing #234

Merged

k8s-ci-robot closed this as completed in #234 Oct 1, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should not error if worker node does not have JoinConfiguration #213

Should not error if worker node does not have JoinConfiguration #213

chuckha commented Sep 5, 2019

detiber commented Sep 5, 2019

detiber commented Sep 5, 2019

detiber commented Sep 5, 2019

chuckha commented Sep 5, 2019

fabriziopandini commented Sep 6, 2019

chuckha commented Sep 6, 2019

detiber commented Sep 6, 2019

chuckha commented Sep 6, 2019

fabriziopandini commented Sep 9, 2019

detiber commented Sep 9, 2019

fabriziopandini commented Sep 17, 2019

chuckha commented Sep 17, 2019

accepting commented Sep 22, 2019 •

edited

Loading

fabriziopandini commented Sep 23, 2019

chuckha commented Sep 23, 2019

ncdc commented Sep 23, 2019

vincepri commented Sep 23, 2019

chuckha commented Sep 24, 2019

chuckha commented Sep 24, 2019

Should not error if worker node does not have JoinConfiguration #213

Should not error if worker node does not have JoinConfiguration #213

Comments

chuckha commented Sep 5, 2019

detiber commented Sep 5, 2019

detiber commented Sep 5, 2019

detiber commented Sep 5, 2019

chuckha commented Sep 5, 2019

fabriziopandini commented Sep 6, 2019

chuckha commented Sep 6, 2019

detiber commented Sep 6, 2019

chuckha commented Sep 6, 2019

fabriziopandini commented Sep 9, 2019

detiber commented Sep 9, 2019

fabriziopandini commented Sep 17, 2019

chuckha commented Sep 17, 2019

accepting commented Sep 22, 2019 • edited Loading

fabriziopandini commented Sep 23, 2019

chuckha commented Sep 23, 2019

ncdc commented Sep 23, 2019

vincepri commented Sep 23, 2019

chuckha commented Sep 24, 2019

chuckha commented Sep 24, 2019

accepting commented Sep 22, 2019 •

edited

Loading