[BUG] Enabling ACE after cluster provisioning results in unusable kubeconfig contexts #41832

Open
thatmidwesterncoder opened this issue Jun 13, 2023 · 8 comments · May be fixed by #45188
Labels
area/ace · internal · JIRA (To be used in correspondence with the internal ticketing system.) · kind/bug (Issues that are defects reported by users or that we know have reached a real release) · QA/XS · team/hostbusters (The team that is responsible for provisioning/managing downstream clusters + K8s version support)

Comments


thatmidwesterncoder commented Jun 13, 2023

Rancher Server Setup

  • Rancher version: v2.7.3+ (originally reported on 2.7.3, but it persists in the newest version)
  • Installation option (Docker install/Helm Chart): Docker and running locally from source
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): k3s latest via k3d
  • Proxy/Cert Details:

Information about the Cluster

  • Kubernetes version: 1.25.9, but any version is most likely affected
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Custom and DigitalOcean

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
    • If custom, define the set of permissions: all

Describe the bug

When you create an RKE2 cluster (on existing nodes or provisioned through an infrastructure provider) and activate "Authorized Cluster Endpoint" (ACE) after the cluster is available, connecting to the ACE endpoint results in a "401 Unauthorized" even though a cert/token is present.

The ACE works fine if it is enabled when creating the cluster. This behavior only occurs when enabling ACE after creation.

To Reproduce

  1. Create a RKE2 cluster without ACE, can leave everything default
  2. Once it is provisioned, edit the cluster and activate ACE
  3. Download a kubeconfig and try to use an ACE context

Result
The ACE context does not work.
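For reference, this is roughly what the failure looks like when testing the downloaded kubeconfig (the file name and context name below are illustrative; the ACE contexts are the ones that point directly at the downstream nodes rather than at the Rancher server):

# List the contexts in the downloaded kubeconfig; the ACE entries target the
# downstream control-plane nodes instead of the Rancher proxy URL.
kubectl --kubeconfig mycluster.yaml config get-contexts

# Using an ACE context fails even though a client cert and token are present:
kubectl --kubeconfig mycluster.yaml --context mycluster-node1 get nodes
# error: You must be logged in to the server (Unauthorized)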

Expected Result

The ACE context should work exactly as it does when ACE is enabled during provisioning, i.e. like a regular k8s context hitting the infrastructure node(s) directly.

Screenshots

n/a

Additional context

  • The kubeconfig looks fine: a cert is present for the ACE context, and a token is used for the relay context.
  • I dug into it a bit; it appears the CRDs for clusterauthtoken aren't present on the downstream cluster, so the kube-api-proxy pod complains that it cannot fetch the token when the user comes in on the ACE endpoint.
  • Applying the diff below seems to fix it, but I am not sure what other repercussions having the agent watch more CRDs could cause. Maybe these CRDs are harder/heavier to watch, and that's why we wouldn't want them installed on every cluster.
  • This also makes Rancher the component that says "install X CRDs"; I'm not sure whether this logic should be moved somewhere else.
always_install_token_crds.diff
diff --git a/pkg/controllers/managementuser/controllers.go b/pkg/controllers/managementuser/controllers.go
index f08e9909a..7a9129214 100644
--- a/pkg/controllers/managementuser/controllers.go
+++ b/pkg/controllers/managementuser/controllers.go
@@ -48,13 +48,12 @@ func Register(ctx context.Context, mgmt *config.ScaledContext, cluster *config.U
 	// register secrets controller for impersonation
 	cluster.Core.Secrets("").Controller()
 
-	if clusterRec.Spec.LocalClusterAuthEndpoint.Enabled {
-		err := clusterauthtoken.CRDSetup(ctx, cluster.UserOnlyContext())
-		if err != nil {
-			return err
-		}
-		clusterauthtoken.Register(ctx, cluster)
+	// create the auth CRDs
+	err := clusterauthtoken.CRDSetup(ctx, cluster.UserOnlyContext())
+	if err != nil {
+		return err
 	}
+	clusterauthtoken.Register(ctx, cluster)
 
 	// Ensure these caches are started
 	cluster.Core.Namespaces("").Controller()
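
A quick way to confirm the missing pieces on an affected downstream cluster is to check for the CRD and for ClusterAuthToken objects directly (a sketch; the resource names below follow the cluster.cattle.io group used by these objects):

# The CRD that the ACE auth path looks tokens up against:
kubectl get crd clusterauthtokens.cluster.cattle.io

# On an affected cluster this CRD is missing (or no objects exist), so the
# ACE endpoint answers with 401 Unauthorized:
kubectl -n cattle-system get clusterauthtokens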

SURE-6359
SURE-6353

@thatmidwesterncoder added the kind/bug, internal, team/hostbusters, and JIRA labels Jun 13, 2023
@Sahota1225

duplicate of #42255

@KevinKeo

Same issue here, with stable Rancher and provisioning the latest RKE2.


ifelsefi commented Nov 10, 2023

As a workaround, restarting the upstream Rancher pods on the local cluster fixes the issue. I am running Rancher 2.7.1 on v1.24.10+rke2r1.

@Oats87 added the area/ace label Dec 5, 2023
@Zappelphilipp

Running Rancher 2.8 and K8s v1.26.11+rke2r1, same problem here.
I can confirm that restarting Rancher (development environment: docker restart rancher) fixed it.

@harrisonbc

Running Rancher v2.8.2; I can confirm the issue, and the resolution was to restart the upstream Rancher pods.
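For a Helm-based install, that restart workaround is roughly the following (assuming Rancher runs as the rancher Deployment in the cattle-system namespace, which is the chart default):

# Restart the upstream Rancher pods and wait for the rollout to finish:
kubectl -n cattle-system rollout restart deploy/rancher
kubectl -n cattle-system rollout status deploy/rancher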


dantecl commented Mar 21, 2024

+1 on Rancher 2.8.2 here, also affected by this bug. For us, a kubectl rollout restart deployment/rancher -n cattle-system populated the CRD/token and things worked without downtime, but it is really not ideal.


atsai1220 commented Apr 12, 2024

Not sure if this is exactly the same issue. We have Rancher v2.7.9 and the CRDs for ClusterAuthToken exist in the cluster, but generating a new kubeconfig does not create new ClusterAuthTokens in some RKE2 clusters. They all have ACE enabled.

In my kube-audit logs on a working RKE2 cluster, I can see an event with verb=create for the clusterauthtokens requestURI:

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  "requestURI": "/apis/cluster.cattle.io/v3/namespaces/cattle-system/clusterauthtokens",
  "verb": "create",
  "user": {
    "username": "system:serviceaccount:cattle-system:cattle",
    "groups": [
      "system:serviceaccounts",
      "system:serviceaccounts:cattle-system",
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "10.252.199.199"
  ],
  "userAgent": "rancher/v0.0.0 (linux/amd64) kubernetes/$Format cluster c-m-6cwv8brf",
  "objectRef": {
    "resource": "clusterauthtokens",
    "namespace": "cattle-system",
    "name": "kubeconfig-user-p5rddsl2kp",
    "apiGroup": "cluster.cattle.io",
    "apiVersion": "v3"
  },
  "responseStatus": {
    "metadata": {},
    "code": 201
  },
  "requestReceivedTimestamp": "2024-04-12T00:52:30.881375Z",
  "stageTimestamp": "2024-04-12T00:52:30.888528Z",
}

That verb=create event does not happen at all on the clusters that are not creating the ClusterAuthToken object.
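To check whether those create requests are reaching a given downstream API server at all, filtering the audit log with jq looks roughly like this (the audit log path depends on how auditing is configured; the path below is a common RKE2 location):

# Print every audit event that creates a ClusterAuthToken object:
jq -c 'select(.objectRef.resource == "clusterauthtokens" and .verb == "create")' \
  /var/lib/rancher/rke2/server/logs/audit.log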

@snasovich added this to the v2.9-Next1 milestone Apr 16, 2024
@pratikjagrut linked a pull request Apr 22, 2024 that will close this issue

Tbag12 commented Apr 22, 2024

Will this problem be fixed in the next version? I am currently using v2.8.3 and have also encountered it.
