rke2-worker-plan fail on windows server 2022 node #45912

Open
Ducatel opened this issue Jun 21, 2024 · 9 comments

@Ducatel

Ducatel commented Jun 21, 2024

Environmental Info:
RKE2 Version:

On windows nodes:

rke2.exe version v1.27.12+rke2r1 (25b27b4e4709a2ac4c550609ad730a9e172d110a)
go version go1.21.8

On linux node:

rke2 version v1.27.12+rke2r1 (25b27b4e4709a2ac4c550609ad730a9e172d110a)
go version go1.21.8 X:boringcrypto

Node(s) CPU architecture, OS, and Version:

On windows nodes:

windows server 2022 21H2 Build 20348.2527

On linux node:

Linux  5.14.0-427.18.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 13 10:47:09 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

  • 3 RHEL9 master
  • 2 RHEL9 worker
  • 2 windows server 2022 worker

Describe the bug:

I repeatedly observe a failing pod named apply-rke2-worker-plan* on the Windows nodes, due to ImagePullBackOff (Back-off pulling image "rancher/rke2-upgrade:v1.27.12-rke2r1").

Pods scheduled on the Linux nodes seem to work properly.

When I tried to manually pull the image from a windows node:

> crictl pull rancher/rke2-upgrade:v1.27.12-rke2r1
E0621 14:41:14.505286    9092 remote_image.go:167] "PullImage from image service failed" err="rpc error: code = NotFound desc = failed to pull and unpack image \"docker.io/rancher/rke2-upgrade:v1.27.12-rke2r1\": no match for platform in manifest: not found" image="rancher/rke2-upgrade:v1.27.12-rke2r1"
time="2024-06-21T14:41:14+02:00" level=fatal msg="pulling image: rpc error: code = NotFound desc = failed to pull and unpack image \"docker.io/rancher/rke2-upgrade:v1.27.12-rke2r1\": no match for platform in manifest: not found"

So the image does not seem to have a Windows-compatible build.
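
For readers unfamiliar with the "no match for platform in manifest" error, here is a minimal Python sketch of how a container runtime picks an entry from a multi-platform image index. The index contents, the placeholder digests, and the `select_manifest` helper are hypothetical; they are not the actual manifest of rancher/rke2-upgrade, and real runtimes may also compare further fields such as the variant or OS version.

```python
# Hypothetical image index with Linux-only entries (placeholder digests).
HYPOTHETICAL_INDEX = {
    "manifests": [
        {"digest": "sha256:linux-amd64-placeholder",
         "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:linux-arm64-placeholder",
         "platform": {"os": "linux", "architecture": "arm64"}},
    ]
}


def select_manifest(index, os_name, arch):
    """Return the digest of the first entry matching the node's OS and architecture."""
    for manifest in index["manifests"]:
        platform = manifest.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return manifest["digest"]
    # No entry for this platform: the pull cannot proceed.
    raise LookupError("no match for platform in manifest")


if __name__ == "__main__":
    print(select_manifest(HYPOTHETICAL_INDEX, "linux", "amd64"))
    try:
        select_manifest(HYPOTHETICAL_INDEX, "windows", "amd64")
    except LookupError as err:
        print(f"windows/amd64: {err}")
```

If an index really contains no Windows entry, a Windows node has nothing to select, which would be consistent with the pull working on the Linux nodes only.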

Steps To Reproduce:

  • Installed RKE2:

    1. by following the quickstart
    2. started server with --cni=calico (to be able to handle windows node)
  • On top of that, I installed the Rancher UI with Helm, following the official documentation

Expected behavior:

The upgrade should not fail on Windows nodes. (I don't really know what this upgrade does.)

Actual behavior:

The upgrade fails on Windows nodes.

Thanks in advance for your help

@manuelbuil

Automatic upgrading does not work on Windows nodes, but I think we should mention that in the docs or show a useful log message to avoid confusion. Would you mind explaining how you triggered the upgrade process? Thanks!

@Ducatel
Author

Ducatel commented Jun 24, 2024

Hi,

As I said, I just followed the steps from the Rancher quick start documentation, and this upgrade plan is not mentioned anywhere in it.
So I don't know at all how to remove this schedule from the Windows nodes.
After some searching, it seems many people face this issue (on the Rancher forum, Stack Overflow, etc.) without a solution.

I can probably update the node selector in the rke2-worker-plan to avoid that, but I don't know whether it's safe.

@manuelbuil

We explain that plan in the RKE2 docs: https://docs.rke2.io/upgrade/automated_upgrade

Could you be more specific about the Rancher quick start documentation? Which exact doc are you looking at? Unfortunately, I don't see anything describing upgrades in the Quick Start Guide.

I think we should create some anti-affinity with Windows nodes and warn users that they need to do that upgrade manually.

@Ducatel
Author

Ducatel commented Jun 24, 2024

I followed :

And yes, there is nothing about upgrades in this documentation.

So I tried to update the plan with:

    matchExpressions:
    - key: node-role.kubernetes.io/master
      operator: DoesNotExist
    - key: kubernetes.io/os
      operator: NotIn
      values:
        - windows

But the change does not seem to be applied.

@manuelbuil

Try with this item in the matchExpressions ==> - {key: beta.kubernetes.io/os, operator: In, values: ["linux"]}
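
As a rough illustration of how these selector expressions behave, the following Python sketch evaluates both variants against a few node label sets, assuming standard Kubernetes label-selector semantics. The node names and labels are hypothetical, and `matches` is an illustrative helper, not part of any Kubernetes or Rancher API.

```python
def matches(labels, expressions):
    """Return True if the labels satisfy every expression (expressions are ANDed)."""
    for expr in expressions:
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            # NotIn also matches nodes that do not carry the key at all.
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


# Hypothetical label sets for three nodes.
NODES = {
    "linux-master": {"kubernetes.io/os": "linux", "beta.kubernetes.io/os": "linux",
                     "node-role.kubernetes.io/master": "true"},
    "linux-worker": {"kubernetes.io/os": "linux", "beta.kubernetes.io/os": "linux"},
    "windows-worker": {"kubernetes.io/os": "windows", "beta.kubernetes.io/os": "windows"},
}

# Variant 1: exclude masters and anything labelled as Windows.
NOT_WINDOWS = [
    {"key": "node-role.kubernetes.io/master", "operator": "DoesNotExist"},
    {"key": "kubernetes.io/os", "operator": "NotIn", "values": ["windows"]},
]

# Variant 2: exclude masters and require an explicit Linux label.
ONLY_LINUX = [
    {"key": "node-role.kubernetes.io/master", "operator": "DoesNotExist"},
    {"key": "beta.kubernetes.io/os", "operator": "In", "values": ["linux"]},
]


def selected(expressions):
    """Return the sorted names of the hypothetical nodes the selector picks."""
    return sorted(name for name, labels in NODES.items() if matches(labels, expressions))


if __name__ == "__main__":
    print(selected(NOT_WINDOWS))
    print(selected(ONLY_LINUX))
```

Under these assumptions both variants select only the Linux worker, so either form should exclude Windows nodes once the plan actually carries the expression.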

@manuelbuil

I'll probably update the docs warning about windows not being supported in the SUC and add that matchExpression in the example, so that it is less likely that people get confused

@Ducatel
Author

Ducatel commented Jun 26, 2024

> Try with this item in the matchExpressions ==> - {key: beta.kubernetes.io/os, operator: In, values: ["linux"]}

It's not a matter of how I write the matchExpressions. When I edit the plan through the Rancher UI or with kubectl edit, my change does not seem to be saved: when I fetch the YAML config again, it is still the same as before.

> I'll probably update the docs warning about windows not being supported in the SUC and add that matchExpression in the example, so that it is less likely that people get confused

The thing is, I didn't create this plan myself. So even with properly updated documentation, some people will still face the issue.

@manuelbuil

OK, I thought this was a pure RKE2 issue, but I now understand it's a Rancher issue. I'll have a look at how RM generates that plan. I think updating the docs will also help users who do the upgrade by following the RKE2 docs: https://docs.rke2.io/upgrade/automated_upgrade

@brandond
Contributor

Yeah, if this is the Rancher-managed SUC deployment and plan, then this is a Rancher issue, not RKE2.

I don't know that Rancher currently supports imported clusters with Windows nodes; I suspect it only properly handles Windows clusters that are provisioned via Rancher. I would defer to the support matrix as to whether this is something that is supposed to work.

@brandond transferred this issue from rancher/rke2 on Jun 26, 2024