Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SURE-6353]fix: Reconcile ACE #45188

Open
wants to merge 1 commit into
base: release/v2.9
Choose a base branch
from

Conversation

pratikjagrut
Copy link
Contributor

Issue:

#41832
SURE-6353

Problem

The issue is that enabling ACE on an RKE2 downstream cluster is not functioning as anticipated. While the kube-api-auth pod starts successfully and the kube-apiserver flags are set correctly, the controller within the kube-api-auth pod fails to start. As a result, the necessary Custom Resource Definitions (CRDs) are not created on the downstream cluster, leading to the absence of tokens.

Solution

The solution to the aforementioned problem was identified through troubleshooting. It was observed that there was a discrepancy between the appliedSpec and Spec fields:

spec:
  desiredAgentImage: rancher/rancher-agent:v2.7.3
  desiredAuthImage: rancher/kube-api-auth:v0.1.8
  localClusterAuthEndpoint:
    enabled: true
------------
status:
  appliedSpec:
    localClusterAuthEndpoint:
      enabled: false 

Further investigation revealed that the ACE was not being reconciled upon updates. As a result, the status.appliedSpec.localClusterAuthEndpoint.enabled was not updated, preventing the controller from triggering the installation of the necessary CRDs on the downstream cluster. To address this issue, this PR introduces code to reconcile ACE, ensuring that the status.appliedSpec.localClusterAuthEndpoint.enabled is updated appropriately.

Testing

Engineering Testing

Manual Testing

Followed repro steps from JIRA issue SURE-6353

Automated Testing

  • Test types added/modified:
    • Unit
    • Integration (Go Framework)
    • Integration (v2prov Framework)
    • Validation (Go Framework)
    • Other - Explain: EXPLAIN
    • None
    • REMOVE NOT APPLICABLE BULLET POINTS ABOVE
  • If "None" - Reason: EXPLAIN THE REASON
  • If "None" - GH Issue/PR: LINK TO GH ISSUE/PR TO ADD TESTS

Summary: TODO

QA Testing Considerations

Regressions Considerations

TODO

Existing / newly added automated tests that provide evidence there are no regressions:

  • TODO

@Tbag12
Copy link

Tbag12 commented Apr 22, 2024

Will this problem be fixed in the next version? ? ? Currently I am using v2.8.3 and also encountered.

@pratikjagrut pratikjagrut marked this pull request as ready for review April 23, 2024 07:52
@snasovich snasovich requested review from bfbachmann and a team April 23, 2024 15:59
Copy link
Contributor

@bfbachmann bfbachmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pratikjagrut Did you see @thatmidwesterncoder's code suggestion in the original issue? It looks like this

diff --git a/pkg/controllers/managementuser/controllers.go b/pkg/controllers/managementuser/controllers.go
index f08e9909a..7a9129214 100644
--- a/pkg/controllers/managementuser/controllers.go
+++ b/pkg/controllers/managementuser/controllers.go
@@ -48,13 +48,12 @@ func Register(ctx context.Context, mgmt *config.ScaledContext, cluster *config.U
 	// register secrets controller for impersonation
 	cluster.Core.Secrets("").Controller()
 
-	if clusterRec.Spec.LocalClusterAuthEndpoint.Enabled {
-		err := clusterauthtoken.CRDSetup(ctx, cluster.UserOnlyContext())
-		if err != nil {
-			return err
-		}
-		clusterauthtoken.Register(ctx, cluster)
+	// create the auth CRDs
+	err := clusterauthtoken.CRDSetup(ctx, cluster.UserOnlyContext())
+	if err != nil {
+		return err
 	}
+	clusterauthtoken.Register(ctx, cluster)
 
 	// Ensure these caches are started
 	cluster.Core.Namespaces("").Controller()

I don't have that much context on this functionality, but is it possible that we can just fix this by enabling those controllers like the above patch does? Do you know what the tradeoffs are between that solution and the one you've implemented?

To be clear, I'm not suggesting that we should go with the alternative solution instead of what you have, I'm just trying to get a better understanding of the difference between the two from your perspective.

@pratikjagrut
Copy link
Contributor Author

@bfbachmann After reviewing the code above, you may have noticed that the if condition has been removed. If I'm not wrong then this could cause clusterauthtoken.CRDSetup() to run even when ACE is not enabled.

If you refer to the JIRA ticket, the main issue was that the status.appliedSpec.localClusterAuthEndpoint.enabled field of the cluster object was not updated. As a result, the status of the existing cluster object remains the same as the updated cluster object, preventing the change from triggering here: link.

This heppened because the ACE was never reconciled for updated cluster object, so even if change ACE in spec field it was never refelcted in status.AppliedSpec field. So I added the reconcileACE for the imported cluster which is basically RKE2 clusters.

@bfbachmann bfbachmann requested a review from a team April 26, 2024 21:09
@jakefhyde
Copy link
Contributor

jakefhyde commented Apr 26, 2024

@pratikjagrut Thanks for doing this, I don't see any immediate issues, but I'm curious which test cases you've validated thus far? I think we want to ensure that imported and non-imported clusters with ACE enabled still work, for both RKE1 and RKE2.

I also restarted the drone build since I don't believe the failure was related.

@pratikjagrut
Copy link
Contributor Author

pratikjagrut commented May 6, 2024

@jakefhyde I manually tested issue SURE-6353 with this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Enabling ACE after cluster provisioning results in unusable kubeconfig contexts
4 participants