Skip to content
This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

Should not error if worker node does not have JoinConfiguration #213

Closed
chuckha opened this issue Sep 5, 2019 · 19 comments · Fixed by #234
Closed

Should not error if worker node does not have JoinConfiguration #213

chuckha opened this issue Sep 5, 2019 · 19 comments · Fixed by #234
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Milestone

Comments

@chuckha
Copy link
Contributor

chuckha commented Sep 5, 2019

/kind bug

https://github.com/kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm/blob/cc2b1e48885f2f7bda4b8a3048b1c33367322a0f/controllers/kubeadmconfig_controller.go#L258-L260

What steps did you take and what happened:
After the control plane is initialized and a worker gets created but that worker does not have a JoinConfiguration the reconciler returns an error.

What did you expect to happen:
I expected a nil join configuration would be absolutely fine in this case. The reconciler should imply set it to an initialized JoinConfiguration. If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 5, 2019
@detiber
Copy link
Contributor

detiber commented Sep 5, 2019

If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

Should we do this though? Maybe we should error if a Cluster or Init config is provided rather than ignoring?

@detiber
Copy link
Contributor

detiber commented Sep 5, 2019

I expected a nil join configuration would be absolutely fine in this case. The reconciler should imply set it to an initialized JoinConfiguration.

Yes, please

@detiber
Copy link
Contributor

detiber commented Sep 5, 2019

I'm also wondering if we should default Cluster/Init configuration if one is not provided for the initial control plane instance?

@chuckha
Copy link
Contributor Author

chuckha commented Sep 5, 2019

If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

Should we do this though? Maybe we should error if a Cluster or Init config is provided rather than ignoring?

Yeah, I think this was designed to allow the user to be quite lazy with their configuration. I don't have strong feelings on omitting a log or straight up erroring.

@fabriziopandini any thoughts here?

@fabriziopandini
Copy link
Contributor

@chuckha @detiber

I expected a nil join configuration would be absolutely fine in this case. The reconciler should imply set it to an initialized JoinConfiguration.

I'm ok with this (see also this issue #112)

I'm also wondering if we should default Cluster/Init configuration if one is not provided for the initial control plane instance?

If we are going to generate a default JoinConfiguration, I don't see strong reasons for generating default Cluster/Init configuration as well

If we detect that a Cluster or Init configuration is set then we should log a line saying "the worker will ignore a set Cluster/Init configuration" and proceed with the join. Right now we are ignoring Cluster/Init without notifying the user which may lead to confusion.

Should we do this though? Maybe we should error if a Cluster or Init config is provided rather than ignoring?

I'm personally in favor of strict validation, but lazy gives you the possibility to use the same configuration for all your control plane nodes vs having one configuration for the first control plane and another configuration for the additional control plane nodes.
As a side note, also in kubeadm we reverted to lazy because people were using a single file with all the objects for all the nodes.

Ok for adding a warning when we detect lazy behaviors

PS. if we get to an agreement, I would like to work on this 😉

@chuckha
Copy link
Contributor Author

chuckha commented Sep 6, 2019

That's really good to know.

I'm ok with allowing all control planes to be defined with Init and Cluster configurations, mostly because that's what Kubeadm does.

However, if they are not the initial control plane then we should generate the JoinConfiguration for them.

Sorry @fabriziopandini I'm a little confused about this line:

If we are going to generate a default JoinConfiguration, I don't see strong reasons for generating default Cluster/Init configuration as well

Are you in favor of generating default Cluster/Init configuration if it does not exist for the first found control plane node?

@detiber
Copy link
Contributor

detiber commented Sep 6, 2019

I'm personally in favor of strict validation, but lazy gives you the possibility to use the same configuration for all your control plane nodes vs having one configuration for the first control plane and another configuration for the additional control plane nodes.
As a side note, also in kubeadm we reverted to lazy because people were using a single file with all the objects for all the nodes.

My major worry here is not that someone provides the same configuration to all Machines, but rather that they provide a different configuration to an additional Machine and expect that it would mutate the config of the existing Cluster.

@chuckha
Copy link
Contributor Author

chuckha commented Sep 6, 2019

@detiber that is a good point. Different configurations would not be good.

Without adding aggregation/ensuring that the configs are the same, the only thing we can really do here is fail on secondary control planes with Init/Config. I'm ok with this approach.

@fabriziopandini
Copy link
Contributor

So, if I got this right,

  • If join configuration is missing, we are going to generate a default one.
  • If secondary control plane nodes have Init/Cluster config, we are going to fail
    Right?

@detiber
Copy link
Contributor

detiber commented Sep 9, 2019

@fabriziopandini that and:

  • if init/cluster config is not found on initial control plane, generate a default one.

Is my understanding, at least.

@ncdc ncdc added this to the Next milestone Sep 11, 2019
@ncdc ncdc added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Sep 11, 2019
@fabriziopandini
Copy link
Contributor

/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Sep 17, 2019
@chuckha
Copy link
Contributor Author

chuckha commented Sep 17, 2019

/assign @fabriziopandini

@accepting
Copy link
Contributor

accepting commented Sep 22, 2019

I have a suggestion here.

  • If secondary control plane nodes or worker nodes have Init/Cluster config, we set them as nil with corresponding warnings.
  • If initial control plane has Join config, we set it as nil with corresponding warnings.

What do you think of this suggesion@fabriziopandini @chuckha @detiber

@fabriziopandini
Copy link
Contributor

@accepting please correct me if I'm wrong, but I'm not sure what are you proposing work in use cases different than yours where all the KubeadmConfig have Init/Cluster/Join configuration.

fi. if the user is providing:
a) a KubeadmConfig object specifically configured for init
b) a KubeadmConfig object specifically configured for joining a second control plane

If b is processed as first:

  • the join configuration in b will be set to nil
  • a default Init/Cluster configuration will be created and the node will act as a bootstrap control plane

then, when a is processed:

  • the Init/Cluster configuration a will be set to nil
  • a default JoinConfiguration will be created and the node will act as a secondary control plane

This is not correct, because CABPK has to respect user-provided Init/Cluster configuration in a and JoinCongiration in b.

@chuckha
Copy link
Contributor Author

chuckha commented Sep 23, 2019

I think this comes down to how control planes are configured.

If we give the same InitConfig, ClusterConfig and JoinConfig to every machine and mark some as control planes then the configmap lock prevents multiple kubeadm init calls. The first one that gets the lock runs a kubeadm init. In this case, once we have the lock, we set the Join configuration to nil and move on. The next reconcile loop the following control plane that succeeds will be a control plane join. In this case we set the init and cluster configuration to nil and move on.

However, as @fabriziopandini points out, if we give all configurations to the control plane and specifically configure one config as a join but leave the init and cluster configuration there will be a problem.

I wonder what makes sense here? I hadn't accounted for the first use case, but that is a very SIG-cluster-lifecycle way to use cluster-api. Machines are treated as cattle. I'd almost say that if a user wants to specifically create a machine that is the bootstrap control plane then they should mark the it as such by setting the JoinConfiguration to nil.

@fabriziopandini do you think it's acceptable that if a user wants a specific machine to be a primary control plane they must configure it as such by setting JoinConfiguration to nil and if they want a specific instance to be a secondary control plane they must configure as such by string Init and Cluster configurations to nil?

@ncdc
Copy link
Contributor

ncdc commented Sep 23, 2019

Are there use cases where it makes sense to distinguish between control plane node 1 vs all other control plane nodes?

@vincepri
Copy link
Contributor

Just chiming in a little here. I'd prefer to keep things explicit and avoid making users have to understand the way controllers work internally. A possible solution could be:

  • A user has to set at least one control plane machine with InitConfiguration and ClusterConfiguration, there should be only one machine with these values set, all the others fail / are ignored.
  • Any other machine may or may not have a JoinConfiguration, one is generated for Machines that don't have it iff the first control plane has been already initialized

@chuckha
Copy link
Contributor Author

chuckha commented Sep 24, 2019

@ncdc I haven't heard of any but I also don't have many points of data.

I'm in favor of being lenient to allow the "copy configs across every machine" since that is actually easier to implement and would help @accepting out, but I want to do a little more questioning of users to see how people are actually using this

@chuckha
Copy link
Contributor Author

chuckha commented Sep 24, 2019

let's not change behavior right now and have a discussion about behavior going forward so we can come to some agreement before implementing this change.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants