
[Bug] Creating windows node group actually creates linux nodes #7645

Closed
mjj29 opened this issue Mar 13, 2024 · 10 comments · Fixed by #7681

mjj29 commented Mar 13, 2024

I've been trying to create Windows EKS clusters and every time I try it I actually get Linux nodes instead. I tried adding them to an existing cluster with this command:

    eksctl create nodegroup --cluster=apama-windows --node-ami-family=WindowsServer2022CoreContainer --node-volume-size=200 --instance-selector-vcpus=16 --instance-selector-cpu-architecture x86_64 --spot

We tried creating a complete cluster including nodes with:

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: apama-windows
  region: eu-north-1

managedNodeGroups:
  - name: linux-ng
    instanceSelector:
      vCPUs: 2
    minSize: 1
    maxSize: 2
    spot: true

  - name: windows-managed-ng
    amiFamily: WindowsServer2022CoreContainer
    instanceSelector:
      vCPUs: 16
    minSize: 1
    maxSize: 3
    volumeSize: 200
    spot: true

It complained that we would need to run eksctl utils install-vpc-controllers --name=apama-windows --region=eu-north-1 --approve, but since it created Linux nodes anyway, that made no difference.

I tried creating Linux nodes only, then running that command, then adding Windows nodes. The command failed with Error: error installing VPC controller: creating CertificateSigningRequest: constructing REST client mapping for certificates.k8s.io/v1beta1, Kind=CertificateSigningRequest: no matches for kind "CertificateSigningRequest" in versions ["certificates.k8s.io/" "certificates.k8s.io/v1beta1"]. I nevertheless ensured we had running VPC controllers, then tried to add nodes again, and still got Linux nodes.

I tried with 2022Core, 2019Core and 2022Full.

What's going on?

mjj29 added the kind/bug label Mar 13, 2024

yuxiang-zhang (Member) commented:

Hey @mjj29 could you please check if this answers your question? #6158

mjj29 (Author) commented Mar 14, 2024

I read that one, and the other one about errors occurring during the process. I'm not asking for (nor do I need) GPUs, and there weren't any errors displayed.

mjj29 (Author) commented Mar 18, 2024

Hi there, any updates?

yuxiang-zhang (Member) commented:

Hi, I can't reproduce the issue; I can create Windows nodegroups with Windows nodes without any problem (even with the same config file you used).

Regarding the VPC controller error, please take a look at #7521 (comment)

TiberiuGC (Collaborator) commented:

> I read that one, and the other one about errors occurring during the process. I'm not asking for (nor do I need) GPUs, and there weren't any errors displayed.

The problem here is similar to the previously indicated issue (#6158). The instance selector returns a set of instance types that satisfy your requirement (vCPUs: 16), among which there are GPU instances. eksctl then tries to select a GPU-compatible AMI, which is not available for Windows:

func selectManagedInstanceType(ng *api.ManagedNodeGroup) string {
    if len(ng.InstanceTypes) > 0 {
        // If the instance selector matched any GPU instance type,
        // that type wins and is later used to resolve the AMI.
        for _, instanceType := range ng.InstanceTypes {
            if instanceutils.IsGPUInstanceType(instanceType) {
                return instanceType
            }
        }
        return ng.InstanceTypes[0]
    }
    return ng.InstanceType
}

As per this comment, I agree that this is a bug and eksctl could EITHER:

  • return an error, if the desired instance type was requested by the user
    OR
  • ignore GPU instances if they were returned by instance selector, provided other instance types fulfil the requirements

For the moment, I'm not sure about the implementation details of the fix. I'll investigate further.
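
To illustrate the second option, here is a minimal hypothetical sketch (reusing api.ManagedNodeGroup and instanceutils.IsGPUInstanceType from the snippet above; the actual fix landed in #7681 and may look different): drop GPU instance types from the selector results, and only error out if nothing non-GPU remains.

// Hypothetical helper, not eksctl's actual fix: filter GPU instance
// types out of the instance selector results so that a Windows AMI
// can be resolved from the remaining types.
func filterGPUInstanceTypes(ng *api.ManagedNodeGroup) error {
    nonGPU := make([]string, 0, len(ng.InstanceTypes))
    for _, instanceType := range ng.InstanceTypes {
        if !instanceutils.IsGPUInstanceType(instanceType) {
            nonGPU = append(nonGPU, instanceType)
        }
    }
    // If every match was a GPU type, there is no Windows AMI to pick,
    // so fail loudly instead of silently creating Linux nodes.
    if len(nonGPU) == 0 {
        return errors.New("instance selector matched only GPU instance types; narrow the criteria (e.g. gpus: 0)")
    }
    ng.InstanceTypes = nonGPU
    return nil
}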

mjj29 (Author) commented Mar 20, 2024

OK, that sounds plausible. I can presumably work around it by specifying gpu: false in the selector? I definitely think that your second option is what it should do.

TiberiuGC (Collaborator) commented:

Yes, I've tried the configuration below and it works:

amiFamily: WindowsServer2022CoreContainer
instanceSelector:
  vCPUs: 16
  gpus: 0
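
For context, gpus: 0 makes the instance selector exclude GPU instance types entirely, so the GPU branch in selectManagedInstanceType above never fires and the Windows AMI resolves as expected.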

mjj29 (Author) commented Mar 21, 2024

OK, I can confirm this has now created Windows nodes for me. However, I was still getting the IP address error, so I followed the steps at https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html#enable-windows-support, then deleted and recreated my Windows pod deployment, and I still get:

Warning FailedCreatePodSandBox 10s kubelet, ip-10-78-134-13.eu-central-1.compute.internal Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b60ef7a1d8e2a3889cf3dc44daedecd756bb9994a3cd3f58ee795a20f1c390e0": plugin type="vpc-bridge" name="vpc" failed (add): failed to parse Kubernetes args: failed to get pod IP address win-worker-node-6b76c4cc65-fcdxv: error executing k8s connector: error executing connector binary: exit status 1 with execution error: pod win-worker-node-6b76c4cc65-fcdxv does not have label vpc.amazonaws.com/PrivateIPv4Address
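
For anyone hitting the same error: the key step in the linked guide is enabling Windows IP address management on the VPC CNI, by setting enable-windows-ipam: "true" in the amazon-vpc-cni ConfigMap in kube-system (for example via kubectl patch configmap/amazon-vpc-cni -n kube-system --type merge -p '{"data":{"enable-windows-ipam":"true"}}'). Pods only receive the vpc.amazonaws.com/PrivateIPv4Address label once that setting is active, and Windows nodes created before the change may need to be recycled.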

mjj29 (Author) commented Mar 21, 2024

Now I have this error: Warm pool for resource vpc.amazonaws.com/PrivateIPv4Address is currently empty, will retry in 600ms
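
That warning normally comes from the Windows IPAM warm pool: the VPC resource controller pre-allocates private IPv4 addresses for the node, and the message usually clears on its own once addresses have been attached.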
