
Rewrite GKEv2 cluster refresher #31789

Closed
cbron opened this issue Mar 22, 2021 · 7 comments
Labels: kind/enhancement (Issues that improve or augment existing functionality), release-note (Note this issue in the milestone's release notes)

cbron (Contributor) commented Mar 22, 2021

In order to accommodate all the KEv2 settings and clusters, we are generalizing the cluster refresher. This requires re-testing the EKS syncing mechanism.
The cron job is being retired in favor of an integer-seconds setting. Any previously customized cron setting should be migrated to the new setting.
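As a sketch, both settings can be inspected from the local cluster with kubectl (the setting names eks-refresh-cron and eks-refresh are taken from the test notes below):

# Old setting: a cron expression, retired by this change
kubectl get setting eks-refresh-cron -o jsonpath='{.value}'

# New setting: a plain interval in seconds
kubectl get setting eks-refresh -o jsonpath='{.value}'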

cbron added this to the v2.5.8 milestone Mar 22, 2021
cbron (Contributor, Author) commented Mar 23, 2021

2.6 PR: #31674
2.5 PR: #31885

aaronyeeski (Contributor) commented Apr 2, 2021

Tested and validated the following test cases on Rancher master-head (89c4d85) and v2.5-head (bdec2f2), with EKS operator v1.0.8-rc2.

Scenario 1:
From Rancher create a hosted EKS cluster.
After cluster becomes available, edit cluster from Rancher.
Validate the following field edits are seen in EKS console:

  • SSH Key
  • Instance Type
  • User Data
  • Resource Tag
  • Node group Tag

Add node group and verify it is available in EKS console.
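As a hedged verification sketch, the same fields can also be checked from the AWS CLI instead of the console (cluster and node group names are placeholders). Note that for node groups created with a launch template, as Rancher does here, the instance type, SSH key, and user data live in the template version rather than on the node group object:

# List node groups on the cluster
aws eks list-nodegroups --cluster-name my-cluster

# Tags and the launch template reference live on the node group
aws eks describe-nodegroup --cluster-name my-cluster --nodegroup-name my-nodegroup \
  --query 'nodegroup.{launchTemplate:launchTemplate,tags:tags}'

# Instance type, SSH key, and user data live in the template versions
aws ec2 describe-launch-template-versions --launch-template-id lt-0123456789abcdef0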

Scenario 2:
From EKS console, create a cluster.
From Rancher, import the EKS cluster.
From Rancher, scale the node group.
Verify the node group is scaled in both Rancher and the EKS console.
From Rancher, add a node group. Verify the node group is available in the EKS console.
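A quick way to confirm the scaling from the AWS side (names are placeholders; the output shape is illustrative):

# Should reflect the new desired/min/max counts after the scale
aws eks describe-nodegroup --cluster-name my-cluster --nodegroup-name my-nodegroup \
  --query 'nodegroup.scalingConfig'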

Scenario 3:
From Rancher create a hosted EKS cluster.
After cluster becomes available, edit cluster from EKS console.
Because hosted EKS clusters use launch templates, add a launch template version to the Rancher-managed template from the EKS console, and make the following edits in the new version (users are discouraged from doing this):

  • SSH Key
  • Instance Type
  • User Data
  • Resource Tag
  • Node group Tag

When Rancher syncs again, the changes are reverted to Rancher's previous launch template version.
From the EKS console, add a node group. Verify the nodes appear on the cluster node view. From Rancher, the node group created from the EKS console does not appear on cluster edit. Created an issue to track this: #31896
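For reference, a new version of the managed launch template, like the one used in this scenario, can also be created from the AWS CLI (IDs and values are placeholders); on the next sync Rancher should revert the node group to its own managed version:

# Create a new template version that overrides the instance type
aws ec2 create-launch-template-version \
  --launch-template-id lt-0123456789abcdef0 \
  --source-version 1 \
  --launch-template-data '{"InstanceType":"t3.large"}'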

Additional upgrade tests are still needed.

aaronyeeski (Contributor) commented

The syncing interval can be changed by running kubectl edit setting eks-refresh on the local cluster. Edit the value field (not the default field) to change the sync timing. The refresher produces logs similar to: checking cluster [<cluster-name>] upstream state for changes. Since the refresher code lives entirely in Rancher, the EKS operator does not need an update.
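As a non-interactive alternative to kubectl edit, the value field can be patched directly (a sketch; "120" means a 2-minute interval):

# Set the refresh interval to 120 seconds on the local cluster
kubectl patch setting eks-refresh --type=merge -p '{"value":"120"}'

# Confirm the active value
kubectl get setting eks-refresh -o jsonpath='{.value}'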

aaronyeeski (Contributor) commented Apr 2, 2021

Tested and validated the following upgrade use cases:

Scenario 1:
Deploy Rancher v2.5.7.
Create a hosted EKS cluster.
Verify EKS operator is version v1.0.7
Upgrade Rancher setup to v2.5-head
Verify EKS operator is upgraded to v1.0.8-rc2
From Rancher, edit the EKS cluster.
Validate the following field edits are seen in EKS console:

  • SSH Key
  • Instance Type
  • User Data
  • Resource Tag
  • Node group Tag

Add node group and verify it is available in EKS console.
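One way to check the operator version without the UI, assuming the operator runs as a deployment in the cattle-system namespace (the exact deployment name may vary by setup):

# Find the EKS operator deployment
kubectl -n cattle-system get deployments | grep eks

# Inspect its image tag (the deployment name here is an assumption)
kubectl -n cattle-system get deployment eks-config-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'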

Scenario 2:
Deploy Rancher v2.5.7.
Create a hosted EKS cluster.
Verify EKS operator is version v1.0.7
From the local cluster, run kubectl edit setting eks-refresh-cron to change the timing of the cron sync.
I changed it to 2 minutes:

apiVersion: management.cattle.io/v3
customized: false
default: '*/5 * * * *'
kind: Setting
metadata:
  creationTimestamp: "2021-04-02T17:45:12Z"
  generation: 2
  name: eks-refresh-cron
  resourceVersion: "8801"
  selfLink: /apis/management.cattle.io/v3/settings/eks-refresh-cron
  uid: 1d4a5c77-b6b1-4520-a1ad-0b62aa521e4d
source: ""
value: '*/2 * * * *'

Upgrade Rancher setup to v2.5-head
Verify EKS operator is upgraded to v1.0.8-rc2
From the local cluster, run kubectl edit setting eks-refresh to check the timing.
The timing used by the generic cloud config has been migrated to 2 minutes (120 seconds):

apiVersion: management.cattle.io/v3
customized: false
default: "300"
kind: Setting
metadata:
  creationTimestamp: "2021-04-02T18:22:39Z"
  generation: 2
  managedFields:
  - apiVersion: management.cattle.io/v3
    fieldsType: FieldsV1
    fieldsV1:
      f:customized: {}
      f:default: {}
      f:source: {}
      f:value: {}
    manager: rancher
    operation: Update
    time: "2021-04-02T18:22:39Z"
  name: eks-refresh
  resourceVersion: "17083"
  selfLink: /apis/management.cattle.io/v3/settings/eks-refresh
  uid: 96dfec25-5ab4-485c-a9f7-1edef3f87fcf
source: ""
value: "120"

From Rancher, edit the EKS cluster.
Validate the following field edits are seen in EKS console:

  • SSH Key
  • Instance Type
  • User Data
  • Resource Tag
  • Node group Tag

Add node group and verify it is available in EKS console.

In both scenarios, cluster edits from Rancher are working, but I did notice a consistently repeating error in the Rancher logs:

2021/04/02 18:39:09 [DEBUG] EKS cluster [c-8kqbt] will be managed by eks-operator-controller, skipping update
2021/04/02 18:39:09 [INFO] waiting for cluster EKS [c-8kqbt] to update
2021/04/02 18:39:09 [ERROR] error syncing 'c-8kqbt': handler cluster-refresher-controller: lastRefreshTime is required, requeuing

mjura (Contributor) commented Apr 2, 2021

In both scenarios, cluster edits from Rancher are working, but I did notice a consistently repeating error in the Rancher logs:

Indeed, on the first sync, when the clusterLastRefreshTime annotation doesn't exist yet, we shouldn't report an error about lastRefreshTime.

I have prepared a fix for it:
2.6 PR: #31898
2.5 PR: #31899

aaronyeeski (Contributor) commented Apr 2, 2021

Tested and validated an upgrade scenario for EKS V1.

Steps:
Bring up Rancher v2.4.15.
Deploy an EKS V1 cluster (k8s version 1.16).
Verify cluster is available. Verify system workloads are healthy.
Upgrade Rancher setup to v2.5-head version bdec2f2
Verify cluster is available. Verify system workloads are healthy.
Verify cluster rancher-agent is upgraded: rancher-agent:v2.5-bdec2f2
Edit and upgrade the cluster k8s version to 1.17
After cluster is done upgrading, verify cluster is available. Verify system workloads are healthy.
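A hedged sketch for confirming the control-plane upgrade from the AWS side (the cluster name is a placeholder):

# Should report "1.17" once the upgrade completes
aws eks describe-cluster --name my-cluster --query 'cluster.version'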

sowmyav27 (Contributor) commented

An issue is seen when syncing a k8s version change from EKS to Rancher: #31909

shpwrck added the kind/enhancement label May 4, 2021