Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: move responsibility for managing k3s token to control plane controller #71

Merged
merged 6 commits into from
Nov 28, 2023

Conversation

cannonpalms
Copy link
Contributor

@cannonpalms cannonpalms commented Nov 17, 2023

Closes #70
Closes #69
Related to #53

This re-solves one of the problems already solved by #68, but in a way that will not require ownership to be manually patched for existing clusters. It also moves responsibility for the token to the control plane controller, so that coordination around token creation between different KThreesConfigs is not required.

I've added tests for the token functionality that I moved into ./pkg/token, and performed the following manual tests:

  • Initialize non-HA control plane
  • Initialize HA control plane
  • Scale out non-HA control plane to be HA
  • HA control plane rolling upgrade
  • Overwriting of existing controller ref if one exists

#72 should be merged first and this branch rebased on top of it so that the new token pkg tests are executed by CI


  • fix: move responsibility for managing k3s token to control plane controller (7fcf70b)

    - Moves responsibility for creating the token required by nodes to join the cluster to the KThreesControlPlane controller
    - Replaces the previous "create once, never update" strategy with a reconciliation strategy to ensure that ownership
    changes for existing clusters
    
  • test: add tests for token.Lookup and token.Reconcile (1c20218)

  • style: satisfy linter (d4a3a48)

  • test: extend tests for pkg/token (0ec58e8)

    This commit:
    - Adds an additional test covering the entirety of token#upsertControllerRef(...)
    - Corrects errors and omissions in the existing test convering token#Reconcile(...)
    
  • fix(token/upsertControllerRef): check for controlling ownership of the current ownerRef (798eecb)

  • style(token_test): satisfy linter (188de1e)

    We do not care to error check the return value of these function calls because we know
    them to be error-free in our toy test examples.
    

@cannonpalms
Copy link
Contributor Author

cannonpalms commented Nov 17, 2023

@zawachte If you give me direction on release notes, I'm happy to add them. Not sure how you intend to release this (an argument could be made for incrementing patch version or minor version).

Copy link
Contributor

@luisdavim luisdavim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@mogliang mogliang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the upsertControllerRef func looks not right

var updatedOwnerReferences []metav1.OwnerReference
var controllerRefUpdated bool
for _, ownerRef := range controllee.GetOwnerReferences() {
// Identify and replace the controlling owner reference
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems not right?

Copy link
Contributor Author

@cannonpalms cannonpalms Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you mean here by "right"--a potential correctness issue, or a style issue perhaps?

The goal of this function is to replace an existing controlling owner reference if one exists, hence its verbosity. I'd love to clean it up if you have any ideas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe using https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/controller/controllerutil#SetOwnerReference ? But I'm not sure that covers this specific use case...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @cannonpalms , sorry, i should describe my thought clearer.

consider the case where ownerReferences has 1 controller ref, but not kcp, then this func will add a new controller ref to kcp, but this causes problem, because only 1 controller ref is allowed.

consider another case where ownerReferences has several ref, and one of it is controller ref to kcp. then this func will remove all existing ref and add same number of duplicate controller ref to kcp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. The answer to both of your questions is actually the same.


tl;dr-- What we need to do in the case of existing secrets is to replace any existing controller ref with a KThreesControlPlane controller ref while maintaining any existing non-controlling owner refs. That's exactly what this upsertControllerRef(...) function does; it properly handles both cases you mentioned, @mogliang.


I evaluated controllerutil.SetControllerReference(...), but this function throws an error if there is already an existing controller reference set.

From the doc @luisdavim linked:

// Since only one OwnerReference can be a controller, it returns an error if
// there is another OwnerReference with Controller flag set.
func SetControllerReference(owner, controlled metav1.Object, scheme *runtime.Scheme) error {

That makes it unsuitable for our use case.


Knowing that we cannot use controllerutil.SetControllerReference, I looked at other options. I considered a naive implementation that would simply overwrite the entire slice of owner references, replacing it with a slice containing only our new controller ref for KThreesControlPlane.

controllerRef := metav1.NewControllerRef(owner, owner.GetObjectKind().GroupVersionKind())
secret.OwnerReferences = []metav1.OwnerReference{controllerRef}

This solution is also not an option, because it is not acceptable to remove the owner references set by other operators or humans.


Therefore, we needed a solution that would respect existing owner references, but remove the existing controller ref and replace it with our new KThreesControlPlane controller ref.

controllerutil has exactly the function we need, but it sadly is not exported:
https://github.com/kubernetes-sigs/controller-runtime/blob/2154ffbc22e26ffd8c3b713927f0df2fa40841f2/pkg/controller/controllerutil/controllerutil.go#L177-L185

Therefore, I wrote our own implementation.

// upsertControllerRef takes controllee and controller objects, either replaces the existing controller ref
// if one exists or appends the new controller ref if one does not exist, and returns the updated controllee
// This is meant to be used in place of controllerutil.SetControllerReference(...), which would throw an error
// if there were already an existing controller ref.
func upsertControllerRef(controllee client.Object, controller client.Object) {
	newControllerRef := metav1.NewControllerRef(controller, controller.GetObjectKind().GroupVersionKind())
	// Iterate through existing owner references
	var updatedOwnerReferences []metav1.OwnerReference
	var controllerRefUpdated bool
	for _, ownerRef := range controllee.GetOwnerReferences() {
		// Identify and replace the controlling owner reference
		if metav1.IsControlledBy(controllee, controller) {
			updatedOwnerReferences = append(updatedOwnerReferences, *newControllerRef)
			controllerRefUpdated = true
		} else {
			// Keep non-controlling owner references intact
			updatedOwnerReferences = append(updatedOwnerReferences, ownerRef)
		}
	}
	// If the controlling owner reference was not found, add the new one
	if !controllerRefUpdated {
		updatedOwnerReferences = append(updatedOwnerReferences, *newControllerRef)
	}
	controllee.SetOwnerReferences(updatedOwnerReferences)
}

This function iterates over all existing owner references and builds a new slice of owner references that (1) does not include any existing controller ref and (2) does include our new controller ref. Exactly what we need.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use SetControllerReference and even if we use a custom function for setting the controller reference it will return the same error if more than one is marked as the controller, for this case we don't need it to be the controller right?

// SetOwnerReference is a helper method to make sure the given object contains an object reference to the object provided.
// This allows you to declare that owner has a dependency on the object without specifying it as a controller.
// If a reference to the same object already exists, it'll be overwritten with the newly provided version.
func SetOwnerReference(owner, object metav1.Object, scheme *runtime.Scheme) error {

yeah, i think we don't need add it as controller ref, normal owner ref shall work as well.

Copy link
Contributor Author

@cannonpalms cannonpalms Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need and want to replace the controlling ownership by a KThreesConfig with controlling ownership by our KThreesControlPlane.

After this change is deployed:

  • we do not want existing Secrets to maintain any owner reference, controlling or non-controlling, to a KThreesConfig object
  • we do want existing Secrets to have a controlling owner reference to our KThreesControlPlane

I've just pushed additional tests for this function and Reconcile(...) that helped me find a bug in upsertControllerRef(...), so I very much appreciate this feedback!

Copy link
Contributor Author

@cannonpalms cannonpalms Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that I have been clear in my responses here. I'm going to resolve this thread before merge, but please unresolve it if you have further concerns. I'll summarize my opinions below. Please correct me where I am wrong!

  • We do not want to maintain the existing reference to a KThreesConfig. It must be removed. Keeping it would send the wrong message about our intentions and desired garbage collection behavior.
  • We need to use a controlling ownership reference to the KThreesControlPlane. Controlling ownership is an oft-misused (or underused) concept, but is necessary in order to clearly call out that the KThreesControlPlane is responsible for managing this k3s token Secret. Although controlling ownership does not impact default (without a finalizer) garbage collection behavior in a way different to non-controlling ownership, it is commonly used to signal that the given owner is responsible for the deletion of the child (in addition to its creation/management over time).
  • The helper functions in controllerutil are not equipped to deal with this scenario.
  • This thread brought a helpful increased emphasis on the correctness of our ownership management of the Secret holding k3s tokens. This emphasis helped me to discover a bug in the implementation of the upsertControllerRef function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a possible point of contention is whether we want to use a controlling reference here. My thoughts are above; would you mind articulating your reasoning for using a non-controlling reference?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks cannonpalms, i just thought set owner can be easier to implement. Since upsertControllerRef has been fixed, and with good test coverage, i will perfer keep this way, and we can reuse it.

…roller

- Moves responsibility for creating the token required by nodes to join the cluster to the KThreesControlPlane controller
- Replaces the previous "create once, never update" strategy with a reconciliation strategy to ensure that ownership
changes for existing clusters
@cannonpalms cannonpalms force-pushed the fix/controlplane-ctrl-manage-token branch from 2ec687e to d4a3a48 Compare November 22, 2023 17:20
zawachte
zawachte previously approved these changes Nov 27, 2023
This commit:
- Adds an additional test covering the entirety of token#upsertControllerRef(...)
- Corrects errors and omissions in the existing test convering token#Reconcile(...)
We do not care to error check the return value of these function calls because we know
them to be error-free in our toy test examples.
@zawachte zawachte merged commit 090c3e0 into k3s-io:main Nov 28, 2023
4 checks passed
@lukebond
Copy link
Contributor

FYI @zawachte we put this through its paces in staging in our environment, did a bunch of cluster resizes and clusterctl rollout restart off machine deployments, which is what causes the token to disappear before. all went well, token didn't disappear, stayed owned by the KThreesControlPlane and our clusters were healthy according to all our other tests throughout. we're going to promote it to prod.

thought it might be useful data for you to consider a release!

@zawachte
Copy link
Collaborator

Thanks @lukebond! Awesome to hear. Will cut a release today/tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants