
KEv2 clusters imported using the Rancher client have their config improperly rewritten #36128

Closed
thedadams opened this issue Jan 13, 2022 · 7 comments
Labels: area/aks area/cli area/eks area/gke area/terraform feature/kev2 internal QA/S team/hostbusters

thedadams (Contributor) commented Jan 13, 2022

Rancher Server Setup

  • Rancher version: v2.6-head
  • Installation option (Docker install/Helm Chart): any
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): any
  • Proxy/Cert Details: any

Information about the Cluster

  • Kubernetes version:
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): imported KEv2

Describe the bug
If a KEv2 cluster is imported using the Rancher client (for example, via terraform), the config is improperly applied. For instance, importing an EKS cluster with terraform while omitting node groups entirely (which is the correct configuration for an import) ends up deleting all of the node groups from the EKS cluster once it is imported into Rancher.

To Reproduce
Use the terraform provider to import an EKS cluster. Do not specify any node groups in the terraform config.

Result
All the node groups of the EKS cluster will be deleted after the EKS cluster is imported.

Expected Result
The cluster should be imported and not updated at all in EKS.

Additional context
The underlying issue here is that the slice fields in the KEv2 structs carry the struct tag norman:pointer. This is incorrect because a slice is already a nil-able, pointer-like type, so the tag is unnecessary.
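
For illustration only, here is a minimal, hypothetical Go sketch of the pattern (the struct and field names are stand-ins, not the actual operator source). It shows the kind of slice field that carried the tag and why the tag is redundant:

package main

import "fmt"

type NodeGroup struct {
    Name string `json:"name"`
}

type ClusterConfigSpec struct {
    // Before the fix the field carried an extra tag:
    //   NodeGroups []NodeGroup `json:"nodeGroups" norman:"pointer"`
    // A slice can already be nil, so the pointer treatment adds nothing.
    NodeGroups []NodeGroup `json:"nodeGroups"`
}

func main() {
    var spec ClusterConfigSpec
    fmt.Println(spec.NodeGroups == nil) // true: an unset slice field is nil, not []
}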

To fix this:

  1. All slice types in the three KEv2 operators need to have this struct tag removed.
  2. The dependencies for these three operators should be bumped in Rancher.
  3. Run go generate in Rancher.

After Rancher is updated, the terraform provider and the CLI should also be updated.

rancher/terraform-provider-rancher2#800

SURE-3842
SURE-3833

a-blender (Contributor) commented Feb 11, 2022

I fixed this issue with direction from @thedadams.

If a node_group was not defined in Terraform, Rancher received an empty NodeGroups array in the ClusterConfigSpec and reconciled that state by removing the node groups in the KEv2 provider.
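
As a standalone illustration (not Rancher or Norman code), the distinction the fix relies on can be seen with plain Go JSON marshaling: a nil slice serializes to null, which Rancher treats as "leave the node groups alone", while an empty slice serializes to [], which reads as "reconcile down to zero node groups":

package main

import (
    "encoding/json"
    "fmt"
)

type spec struct {
    NodeGroups []string `json:"nodeGroups"`
}

func main() {
    unset, _ := json.Marshal(spec{})                       // field left unset -> nil slice
    empty, _ := json.Marshal(spec{NodeGroups: []string{}}) // field set to an empty list
    fmt.Println(string(unset)) // {"nodeGroups":null}
    fmt.Println(string(empty)) // {"nodeGroups":[]}
}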

I opened three PRs to fix each KEv2 operator, and three more PRs to fix code in the terraform provider and bump the operator and CRD versions in rancher/rancher and rancher/charts.

Fixes

Integration PRs

a-blender (Contributor) commented Feb 11, 2022

Testing template

Root cause

Unnecessary norman:pointer struct tags on the NodeGroups field in the ClusterConfigSpec for each KEv2 provider.

What was fixed, or what changes have occurred

  • norman:pointer struct tags removed from each KEv2 operator
  • Updated code in terraform provider
  • KEv2 operator versions bumped in rancher/rancher and rancher/charts

Areas or cases that should be tested

Test importing an EKS and AKS cluster into Rancher using terraform (GKE didn't actually have the bug but did receive code updates)

  1. Create EKS cluster in AWS
  2. Create terraform config, don't specify node_group
  3. Import cluster into Rancher
  4. Check in EKS if your node groups have been deleted or not. They should still be active!
  5. Repeat for AKS

EKS terraform config

terraform {
  required_providers {
    rancher2 = {
      source = "rancher/rancher2"
      version = "1.0.0"
    }
  }
}

provider "rancher2" {
  api_url    = "<Rancher URL>"
  token_key = "<Rancher API bearer token>"
  insecure = true
}

resource "rancher2_cluster" "bar" {
  name = "ablender-eks-test"
  description = "Terraform EKS cluster"
  eks_config_v2 {
    cloud_credential_id = "cattle-global-data:<cloud credential token>"
    region = "<region>"
    imported = true
  }
}

AKS terraform config

terraform {
  required_providers {
    rancher2 = {
      source = "rancher/rancher2"
      version = "1.0.0"
    }
  }
}

provider "rancher2" {
  api_url    = "<Rancher URL>"
  token_key = "<Rancher API bearer token>"
  insecure = true
}

resource "rancher2_cloud_credential" "aks-cluster" {
  name = "aks-cluster"
  azure_credential_config {
    client_id = "<Client ID"
    client_secret = "<Client secret value>"
    subscription_id = "<Subscription ID"
  }
}

What areas could experience regressions?

KEv2 provider import and cluster provisioning.

Are the repro steps accurate/minimal?

Yes.

sowmyav27 (Contributor) commented

Test cases to validate:

  • Import EKS v2 into Rancher using TF
  • Import EKS v2 in Rancher using Rancher UI
- Create an EKS cluster `cluster-1` in the AWS console (latest k8s version). Add node groups to this cluster in the AWS console.
- Import this cluster in Rancher by navigating to Cluster Management --> Import --> EKS
- Add in the credentials, select the right region where EKS is created in AWS console.
- Select the cluster from the dropdown - `cluster-1`
- Click on Create/Save
- Verify the cluster comes up Active and that node groups/nodes are available on the cluster.
- Edit cluster
- Add another nodegroup
- Save changes made
- Wait for the cluster to come back up Active.
- Verify in Rancher that all node groups, including the existing ones and the newly added one, are available.
- Verify in AWS that all node groups are available.
  • Provision an EKS v2 cluster from Rancher UI

timhaneunsoo commented Feb 23, 2022

Test Environment:

Rancher version: v2.6-head
Rancher cluster type: HA
Docker version: 20.10

Downstream cluster type: EKS v2 cluster


Testing:

Tested this issue with the following scenarios:

  • Import EKS v2 into Rancher using TF
  • Import EKS v2 in Rancher using Rancher UI
  • Provision an EKS v2 cluster from Rancher UI

Result

  • Import EKS v2 into Rancher using TF - Low Pass
    The cluster gets added into Rancher but is stuck on Waiting
    [screenshot: Screen Shot 2022-02-22 at 7.13.54 PM.png]

  • Import EKS v2 in Rancher using Rancher UI - Low Pass
    The cluster gets added into Rancher but is stuck on Waiting
    [screenshot: Screen Shot 2022-02-22 at 7.14.28 PM.png]

  • Provision an EKS v2 cluster from Rancher UI - Pass

a-blender (Contributor) commented Mar 1, 2022

@timhaneunsoo I was unable to reproduce the specific error that you showed above. I did, however, test importing an EKS cluster on v2.6-head running on a DO node, and ran into an intermittent issue that I've seen before where the UI hangs with Waiting for API to be available.

This is my terraform config

terraform {
  required_providers {
    rancher2 = {
      source = "terraform.example.com/local/rancher2"
      version = "1.0.0"
    }
  }
}

provider "rancher2" {
  api_url    = "<DO node ip>"
  token_key = "<Rancher API bearer token>"
  insecure = true
}

resource "rancher2_cluster" "bar" {
  name = "ablender-eks-test"
  description = "Terraform EKS cluster"
  eks_config_v2 {
    cloud_credential_id = "cattle-global-data:<Rancher AWS cloud credential>"
    region = "us-east-2"
    imported = true
  }
}

Here's the result of trying to import EKS ablender-eks-test

image

The mgmt cluster shows NodeGroups: null in the eksConfig which is correct.

image

This means node groups in EKS will not get deleted on import. The node group for my cluster is also still active. @kinarashah and I believe this issue is fixed. I will open a new GitHub issue for the error I saw on 2.6-head.

This issue can be retested.

a-blender (Contributor) commented Mar 1, 2022

Here's the new GitHub issue #36700 that blocked my EKS import when I replicated the QA tests. To verify the fix for this issue, check the mgmt cluster yaml in the Rancher UI and verify that NodeGroups: null. If this is set and your node group still exists in AWS, the node deletion bug is fixed.

timhaneunsoo commented

Test Environment:

Rancher version: v2.6-head
Rancher cluster type: HA
Docker version: 20.10

Downstream cluster type: EKS v2 cluster


Testing:

Retested this issue with the following scenarios:

  • Import EKS v2 into Rancher using TF
  • Import EKS v2 in Rancher using Rancher UI
  • Provision an EKS v2 cluster from Rancher UI (Pass from previous testing)

Result

  • Import EKS v2 into Rancher using TF - Pass
    Node groups are set to null and the node group is still preserved after importing into Rancher.
    [screenshot: Screen Shot 2022-03-03 at 1.52.11 PM.png]

  • Import EKS v2 in Rancher using Rancher UI - Pass
    Node groups are set to null and the node group is still preserved after importing into Rancher.
    [screenshot: Screen Shot 2022-03-02 at 6.49.30 PM.png]

  • Provision an EKS v2 cluster from Rancher UI - Pass (from previous testing)
