[BUG] Enabling ACE after cluster provisioning results in unusable kubeconfig contexts #41832

Open
thatmidwesterncoder opened this issue Jun 13, 2023 · 8 comments · May be fixed by #45188
Labels
area/ace · internal · JIRA (To be used in correspondence with the internal ticketing system.) · kind/bug (Issues that are defects reported by users or that we know have reached a real release) · QA/XS · team/hostbusters (The team that is responsible for provisioning/managing downstream clusters + K8s version support)

Comments


thatmidwesterncoder commented Jun 13, 2023

Rancher Server Setup

  • Rancher version: v2.7.3+ (originally reported on 2.7.3, but it persists in the newest version)
  • Installation option (Docker install/Helm Chart): Docker and running locally from source
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): k3s latest via k3d
  • Proxy/Cert Details:

Information about the Cluster

  • Kubernetes version: 1.25.9, but any version is most likely affected
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Custom and DigitalOcean

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
    • If custom, define the set of permissions: all

Describe the bug

When you create an RKE2 cluster (on existing nodes or provisioned through an infrastructure provider) and activate "Authorized Cluster Endpoint" (ACE) after the cluster is available, connecting to the ACE endpoint results in a "401 Unauthorized" even though a cert/token is present.

The ACE works fine if it is enabled when creating the cluster. This behavior only occurs when enabling ACE after creation.

To Reproduce

  1. Create a RKE2 cluster without ACE, can leave everything default
  2. Once it is provisioned, edit the cluster and activate ACE
  3. Download a kubeconfig and try to use an ACE context

Result
The ACE context does not work.
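For reference, this is roughly what the failure looks like when testing the downloaded kubeconfig (the file name and context name below are illustrative; the ACE contexts are the ones that point directly at the downstream nodes rather than at the Rancher server):

# List the contexts in the downloaded kubeconfig; the ACE entries target the
# downstream control-plane nodes instead of the Rancher proxy URL.
kubectl --kubeconfig mycluster.yaml config get-contexts

# Using an ACE context fails even though a client cert and token are present:
kubectl --kubeconfig mycluster.yaml --context mycluster-node1 get nodes
# error: You must be logged in to the server (Unauthorized)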

Expected Result

The ACE context should work exactly as it does when ACE is enabled during provisioning, i.e. like a regular k8s context hitting the infrastructure node(s) directly.

Screenshots

n/a

Additional context

  • The kubeconfig looks fine: a cert is present for the ACE context, and a token is used for the relay context.
  • I dug into it a bit; it appears the CRDs for clusterauthtoken aren't present on the downstream cluster, so the kube-api-proxy pod complains that it cannot fetch the token when the user comes in on the ACE endpoint.
  • Applying the diff below seems to fix it, but I am not sure what other repercussions having the agent watch more CRDs could cause. Maybe these CRDs are harder/heavier to watch, and that's why we wouldn't want them installed on every cluster.
  • This also makes Rancher the component that says "install X CRDs"; I'm not sure whether this logic should be moved somewhere else.
always_install_token_crds.diff
diff --git a/pkg/controllers/managementuser/controllers.go b/pkg/controllers/managementuser/controllers.go
index f08e9909a..7a9129214 100644
--- a/pkg/controllers/managementuser/controllers.go
+++ b/pkg/controllers/managementuser/controllers.go
@@ -48,13 +48,12 @@ func Register(ctx context.Context, mgmt *config.ScaledContext, cluster *config.U
 	// register secrets controller for impersonation
 	cluster.Core.Secrets("").Controller()
 
-	if clusterRec.Spec.LocalClusterAuthEndpoint.Enabled {
-		err := clusterauthtoken.CRDSetup(ctx, cluster.UserOnlyContext())
-		if err != nil {
-			return err
-		}
-		clusterauthtoken.Register(ctx, cluster)
+	// create the auth CRDs
+	err := clusterauthtoken.CRDSetup(ctx, cluster.UserOnlyContext())
+	if err != nil {
+		return err
 	}
+	clusterauthtoken.Register(ctx, cluster)
 
 	// Ensure these caches are started
 	cluster.Core.Namespaces("").Controller()
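
A quick way to confirm the missing pieces on an affected downstream cluster is to check for the CRD and for ClusterAuthToken objects directly (a sketch; the resource names below follow the cluster.cattle.io group used by these objects):

# The CRD that the ACE auth path looks tokens up against:
kubectl get crd clusterauthtokens.cluster.cattle.io

# On an affected cluster this CRD is missing (or no objects exist), so the
# ACE endpoint answers with 401 Unauthorized:
kubectl -n cattle-system get clusterauthtokens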

SURE-6359
SURE-6353

@thatmidwesterncoder added the kind/bug, internal, team/hostbusters, and JIRA labels Jun 13, 2023
@Sahota1225

duplicate of #42255

@KevinKeo

Same issue here, with stable Rancher and provisioning the latest RKE2.


ifelsefi commented Nov 10, 2023

As a workaround, restarting the upstream Rancher pods on the local cluster fixes the issue. I am running Rancher 2.7.1 on v1.24.10+rke2r1.

@Oats87 added the area/ace label Dec 5, 2023
@Zappelphilipp

Running Rancher 2.8 and K8s v1.26.11+rke2r1, same problem here.
I can confirm that restarting Rancher (development environment: docker restart rancher) fixed it.

@harrisonbc

Running Rancher v2.8.2; I can confirm the issue, and the resolution was to restart the upstream Rancher pods.
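For a Helm-based install, that restart workaround is roughly the following (assuming Rancher runs as the rancher Deployment in the cattle-system namespace, which is the chart default):

# Restart the upstream Rancher pods and wait for the rollout to finish:
kubectl -n cattle-system rollout restart deploy/rancher
kubectl -n cattle-system rollout status deploy/rancher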


dantecl commented Mar 21, 2024

+1 on Rancher 2.8.2 here, also affected by this bug. For us, a kubectl rollout restart deployment/rancher -n cattle-system populated the CRD/token and things worked without downtime, but it is really not ideal.


atsai1220 commented Apr 12, 2024

Not sure if this is exactly the same issue. We have Rancher v2.7.9 and the CRDs for ClusterAuthToken exist in the cluster, but generating a new kubeconfig does not create new ClusterAuthTokens in some RKE2 clusters. They all have ACE enabled.

In my kube-audit logs on a working RKE2 cluster, I can see an event with verb=create for the clusterauthtokens requestURI:

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  "requestURI": "/apis/cluster.cattle.io/v3/namespaces/cattle-system/clusterauthtokens",
  "verb": "create",
  "user": {
    "username": "system:serviceaccount:cattle-system:cattle",
    "groups": [
      "system:serviceaccounts",
      "system:serviceaccounts:cattle-system",
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "10.252.199.199"
  ],
  "userAgent": "rancher/v0.0.0 (linux/amd64) kubernetes/$Format cluster c-m-6cwv8brf",
  "objectRef": {
    "resource": "clusterauthtokens",
    "namespace": "cattle-system",
    "name": "kubeconfig-user-p5rddsl2kp",
    "apiGroup": "cluster.cattle.io",
    "apiVersion": "v3"
  },
  "responseStatus": {
    "metadata": {},
    "code": 201
  },
  "requestReceivedTimestamp": "2024-04-12T00:52:30.881375Z",
  "stageTimestamp": "2024-04-12T00:52:30.888528Z",
}

That verb=create event does not happen at all on the clusters that are not creating the ClusterAuthToken object.
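To check whether those create requests are reaching a given downstream API server at all, filtering the audit log with jq looks roughly like this (the audit log path depends on how auditing is configured; the path below is a common RKE2 location):

# Print every audit event that creates a ClusterAuthToken object:
jq -c 'select(.objectRef.resource == "clusterauthtokens" and .verb == "create")' \
  /var/lib/rancher/rke2/server/logs/audit.log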

@snasovich added this to the v2.9-Next1 milestone Apr 16, 2024
@pratikjagrut linked a pull request Apr 22, 2024 that will close this issue

Tbag12 commented Apr 22, 2024

Will this problem be fixed in the next version? I am currently using v2.8.3 and have also encountered it.
