
[BUG] vSphere RKE2 node driver Windows cluster with cloud provider enabled fails to bring Windows worker node to active state. #39060

Closed
vivek-shilimkar opened this issue Sep 21, 2022 · 5 comments
Labels
area/windows kind/bug Issues that are defects reported by users or that we know have reached a real release kind/bug-qa Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement QA/S team/hostbusters The team that is responsible for provisioning/managing downstream clusters + K8s version support [zube]: Done


@vivek-shilimkar
Member

vivek-shilimkar commented Sep 21, 2022

Rancher Server Setup

  • Rancher version: v2.6-head
  • Installation option (Docker install/Helm Chart): Docker
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):

Information about the Cluster

  • Kubernetes version: v1.23.10+rke2r1
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): vSphere Node Driver

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom) Admin

Describe the bug

A vSphere RKE2 node driver Windows cluster with the cloud provider enabled fails to bring the Windows worker node to an active state.

To Reproduce

  1. Create a Rancher server (v2.6-head).
  2. Provision an RKE2 node driver cluster (v1.23.10+rke2r1) on vSphere with the vSphere cloud provider enabled, using the following configuration:
     • 1 control plane - Linux (focal-server-cloudimg-amd64 template in the data center)
     • 1 etcd - Linux (focal-server-cloudimg-amd64 template in the data center)
     • 1 worker - Linux (focal-server-cloudimg-amd64 template in the data center)
     • 1 worker - Windows (windows-server-2022)
  3. Once the vSphere cloud provider is selected, add the CPI/CSI details under Add-On Config in Cluster Configuration.
  4. Once all the details are filled in correctly, create the cluster.
  5. Create a deployment with storage using the default PVC.
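Step 5 can also be done from the command line. The sketch below, assuming the cluster has a default StorageClass (for example one created by the vSphere CSI add-on), builds a PersistentVolumeClaim with no `storageClassName` (so the default class is used) plus a Deployment that mounts it, and prints both as a JSON `List` that can be piped into `kubectl apply -f -`. The names `demo-pvc` and `demo-app`, the `nginx` image, the 1Gi size, and the Linux node selector are illustrative assumptions, not what was used in the original test.

```python
import json

# Hypothetical names chosen for illustration only.
PVC_NAME = "demo-pvc"
APP_NAME = "demo-app"


def build_manifests(storage: str = "1Gi") -> dict:
    """Return a v1 List holding a PVC and a Deployment that mounts it.

    storageClassName is deliberately omitted so that the cluster's
    default StorageClass (assumed to exist) provisions the volume.
    """
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": PVC_NAME},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": storage}},
        },
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": APP_NAME},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": APP_NAME}},
            "template": {
                "metadata": {"labels": {"app": APP_NAME}},
                "spec": {
                    # Assumption: pin to Linux nodes, since the image is Linux-only.
                    "nodeSelector": {"kubernetes.io/os": "linux"},
                    "containers": [{
                        "name": "web",
                        "image": "nginx",
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                    "volumes": [{
                        "name": "data",
                        "persistentVolumeClaim": {"claimName": PVC_NAME},
                    }],
                },
            },
        },
    }
    # kubectl accepts a "List" wrapper containing several objects.
    return {"apiVersion": "v1", "kind": "List", "items": [pvc, deployment]}


if __name__ == "__main__":
    # Usage: python manifests.py | kubectl apply -f -
    print(json.dumps(build_manifests(), indent=2))
```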

Result
All the Linux nodes come up in an active state, except the Windows worker node.
However, the deployment with storage is created successfully with the default PVC.
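To see the affected nodes from the Kubernetes side, a small helper can parse the output of `kubectl get nodes -o json`. This is a sketch: `node_problems` is a hypothetical name, and Rancher's "Active" state is not identical to the Kubernetes `Ready` condition, so treat it only as a diagnostic aid. Besides `NotReady`, it flags nodes still carrying the `node.cloudprovider.kubernetes.io/uninitialized` taint, which the kubelet adds when an external cloud provider is configured and which the CPI removes once it has matched the node to a VM. A node that never registered with the API server will not appear in the output at all.

```python
import json

# Taint added by the kubelet when running with an external cloud provider;
# the cloud provider (CPI) removes it after initializing the node.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def node_problems(nodes_json: str) -> dict:
    """Summarize unhealthy nodes from `kubectl get nodes -o json` output.

    Returns {node_name: {"os": ..., "issues": [...]}} for every node that
    is not Ready or still has the cloud-provider "uninitialized" taint.
    """
    problems = {}
    for node in json.loads(nodes_json).get("items", []):
        meta = node.get("metadata", {})
        os_name = meta.get("labels", {}).get("kubernetes.io/os", "unknown")
        issues = []

        # A node is Ready only if its Ready condition has status "True".
        conditions = node.get("status", {}).get("conditions", [])
        if not any(c.get("type") == "Ready" and c.get("status") == "True"
                   for c in conditions):
            issues.append("NotReady")

        # A lingering taint suggests the CPI never matched this node to a VM.
        taints = node.get("spec", {}).get("taints", [])
        if any(t.get("key") == UNINITIALIZED_TAINT for t in taints):
            issues.append("not initialized by the cloud provider (CPI)")

        if issues:
            problems[meta.get("name", "<unnamed>")] = {"os": os_name, "issues": issues}
    return problems
```

For example, save `kubectl get nodes -o json` to a file, read it into a string, and pass it to `node_problems`; in the scenario above one would expect only the Windows worker to be listed.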

Expected Result

Windows worker node should also become active.

@vivek-shilimkar vivek-shilimkar added kind/bug Issues that are defects reported by users or that we know have reached a real release team/hostbusters The team that is responsible for provisioning/managing downstream clusters + K8s version support labels Sep 21, 2022
@vivek-shilimkar vivek-shilimkar added this to the v2.7.0 milestone Sep 21, 2022
@vivek-shilimkar vivek-shilimkar self-assigned this Sep 21, 2022
@Sahota1225 Sahota1225 modified the milestones: v2.7.0, v2.7.1 Sep 21, 2022
@slickwarren
Contributor

Note that this was closed; however, QA never tested it, so this could be a regression: rancher/windows#127

@Sahota1225 Sahota1225 modified the milestones: v2.7.1, v2.7.2 Oct 10, 2022
@sowmyav27 sowmyav27 modified the milestones: v2.7.2, v2.7.x Dec 21, 2022
@sowmyav27 sowmyav27 modified the milestones: v2.7.x, 2024-v2.8x-Backlog Oct 27, 2023
@slickwarren slickwarren added [zube]: To Test kind/bug-qa Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement and removed [zube]: To Triage labels Jan 10, 2024
@susesgartner
Contributor

susesgartner commented Jan 16, 2024

I was unable to reproduce this on v2.6-head or v2.8-head.

@slickwarren
Contributor

I'm able to reproduce this again on 2.7.11 and 2.8.3-rc2

@HarrisonWAffel
Copy link
Contributor

I believe the root cause was how we were generating the Windows vSphere templates. The templates did not properly use cloudbase-init when setting the hostname, so the hostname, VM name, and DNS name differed on initial provisioning. This caused lookup errors in the CPI, and as a result the Windows nodes never became available.
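A mismatch like this can be checked on a freshly provisioned VM before blaming the CPI. The sketch below compares the guest hostname and DNS name against the vSphere VM name after normalizing each to its lower-case first DNS label. The normalization rule and the helper names (`short_name`, `name_mismatches`) are assumptions for illustration; this is not how the vSphere CPI itself performs its lookup.

```python
def short_name(name: str) -> str:
    """Normalize a name to its lower-case first DNS label.

    'WIN-01.example.local.' -> 'win-01'
    """
    return name.strip().rstrip(".").split(".", 1)[0].lower()


def name_mismatches(hostname: str, vm_name: str, dns_name: str) -> list:
    """Return which sources disagree with the VM name.

    An empty list means hostname, VM name, and DNS name all agree
    under the (assumed) short-name comparison above.
    """
    expected = short_name(vm_name)
    observed = {"hostname": hostname, "dns_name": dns_name}
    return [source for source, value in observed.items()
            if short_name(value) != expected]
```

On the Windows node, `socket.gethostname()` and `socket.getfqdn()` from Python's standard library can supply the hostname and DNS name; the VM name has to be read from vCenter.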

I've fixed the templates and uploaded them to the vSphere 8 environment, located in the hostbusters-windows-development-library. We should retest this using those templates to confirm if this is a misconfiguration in our environment or a genuine bug.

@slickwarren
Copy link
Contributor

Tested using v2.8.3: this works as intended now that our VMs have the correct setup. Closing this ticket.
