Add Linux tolerations for all deployments #324

Closed
nickgerace opened this issue Apr 5, 2021 · 8 comments
@nickgerace
Contributor

With a "Windows cluster", there may not actually be any Windows node in a "Ready" state yet. In case pods have to fall back to Linux nodes, they need a toleration for the Linux node taint to do so.

Discovered by @sowmyav27

@nickgerace added this to the v2.5.8 milestone Apr 5, 2021
@nickgerace self-assigned this Apr 5, 2021
@sowmyav27 self-assigned this Apr 5, 2021
@sowmyav27

sowmyav27 commented Apr 5, 2021

I have a Windows cluster with no Windows nodes yet: 1 etcd+control node and 3 worker nodes, all Linux.
I see Fleet is in an unscheduled state (on 2.5-head, commit id: 2f2cfe63).

0/4 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/controlplane: true}, that the pod didn't tolerate, 3 node(s) had taint {cattle.io/os: linux}, that the pod didn't tolerate.

Expected:
Fleet should be deployed successfully
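
For readers unfamiliar with how the scheduler arrives at the message above, here is a minimal Python sketch of taint/toleration matching. It is a simplified model, not the scheduler's actual code. The class and function names (`Taint`, `Toleration`, `tolerates`, `untolerated`) are hypothetical, and the `NoSchedule` effect is an assumption, since the error message only shows each taint's key and value.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"  # assumed; the error message omits the effect


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # empty means "any effect"


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this toleration matches this taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    if toleration.operator in ("", "Equal"):
        return toleration.value == taint.value
    return False


def untolerated(taints: List[Taint], tolerations: List[Toleration]) -> List[Taint]:
    """Taints that would keep the pod off the node (hard effects only)."""
    return [
        t for t in taints
        if t.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


# Taint reported on the three Linux worker nodes in the message above.
linux_worker = [Taint(key="cattle.io/os", value="linux")]

# Without a toleration the taint blocks scheduling.
print(untolerated(linux_worker, []))

# Hypothetical toleration of the kind this issue asks for.
linux_toleration = Toleration(
    key="cattle.io/os", operator="Equal", value="linux", effect="NoSchedule"
)
print(untolerated(linux_worker, [linux_toleration]))
```

With no tolerations, the `cattle.io/os` taint is returned as blocking. With the Linux toleration added, the list is empty, so the worker nodes become eligible.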

@nickgerace
Contributor Author

#323 has been merged, but we should get a newly tagged chart into rancher/charts

@nickgerace
Contributor Author

A new chart has been tagged: v0.3.5-rc2

@thehejik

thehejik commented Apr 6, 2021

Successfully validated the fix.

HA upstream cluster based on rancher:v2.5-head 93d921f with rancher/fleet:v0.3.5-rc2 automatically deployed from dev-v2.5 helm chart repo.

After bootstrapping a downstream Windows cluster (flannel, VXLAN) with only 3 Linux nodes (all with the same roles: etcd, control, worker), I see one Running/Active replica of the fleet-agent pod using the rancher/fleet-agent:v0.3.5-rc2 image.


The issue is reproducible on rancher:2.5.7 with rancher/fleet:v0.3.4: the fleet-agent pod on a downstream Windows cluster (Linux nodes only) is not running and fails with:

[Unschedulable] 0/3 nodes are available: 3 node(s) had taint {cattle.io/os: linux}, that the pod didn't tolerate.

@sowmyav27

Reopening to check on master-head in the dev-v2.6 feature charts branch.

@thehejik

thehejik commented Apr 7, 2021

Blocked by a broken Cluster Explorer in master-head 077fbf6: the fleet-agent pod is not deployed when checking via kubectl/ember.

@sowmyav27

Blocked by this issue: rancher/rancher#31964

@thehejik

Closing this; master-head validation is tracked in #339.
