
fix: incorrect runOnMaster config #1358

Merged
1 commit merged into kubernetes-sigs:master on May 29, 2022

Conversation

andyzhangx
Member

What type of PR is this?
/kind bug

What this PR does / why we need it:
fix: incorrect runOnMaster config

Which issue(s) this PR fixes:

Fixes #

Requirements:

Special notes for your reviewer:

Release note:

fix: incorrect runOnMaster config
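
For context, the runOnMaster option controls whether the csi-azuredisk-controller Deployment is pinned to master nodes via a nodeSelector. A minimal sketch of how such a toggle typically maps into a Helm deployment template (illustrative only; the chart's actual template and key names may differ):

nodeSelector:
  kubernetes.io/os: linux
{{- if .Values.controller.runOnMaster }}
  node-role.kubernetes.io/master: ""
{{- end }}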

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 28, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 28, 2022
@andyzhangx andyzhangx removed the request for review from edreed May 28, 2022 15:05
@andyzhangx
Member Author

/retest

1 similar comment
@andyzhangx
Member Author

/retest

@andyzhangx andyzhangx merged commit 7d115ab into kubernetes-sigs:master May 29, 2022
@jackfrancis

@marosset
Copy link
Contributor

marosset commented Jun 1, 2022

On failed runs I see the csi-azuredisk-controller pod not getting scheduled, with:

Node-Selectors:              kubernetes.io/os=linux
                             node-role.kubernetes.io/master=
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/controlplane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  63s (x5 over 21m)  default-scheduler  0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.

but on the provisioned cluster the Linux node only has the control-plane role:

timeout --foreground 600 bash -c "while ! kubectl --kubeconfig=./kubeconfig get nodes | grep control-plane; do sleep 1; done"
Unable to connect to the server: dial tcp 20.31.26.118:6443: i/o timeout
capz-vrnxgy-control-plane-qb9m5   NotReady   control-plane   13s   v1.25.0-alpha.0.715+c6970e64528ba7
run "kubectl --kubeconfig=./kubeconfig ..." to work with the new target cluster
make[1]: Leaving directory '/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure'
Waiting for 1 control plane machine(s), 0 worker machine(s), and 2 windows machine(s) to become Ready
node/capz-vrnx-96nm9 condition met
node/capz-vrnx-9p2rm condition met
node/capz-vrnxgy-control-plane-qb9m5 condition met
NAME                              STATUS   ROLES           AGE     VERSION                              INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION     CONTAINER-RUNTIME
capz-vrnx-96nm9                   Ready    <none>          72s     v1.25.0-alpha.0.715+c6970e64528ba7   10.1.0.5      <none>        Windows Server 2019 Datacenter   10.0.17763.2366    containerd://1.6.0-rc.1
capz-vrnx-9p2rm                   Ready    <none>          72s     v1.25.0-alpha.0.715+c6970e64528ba7   10.1.0.4      <none>        Windows Server 2019 Datacenter   10.0.17763.2366    containerd://1.6.0-rc.1
capz-vrnxgy-control-plane-qb9m5   Ready    control-plane   3m59s   v1.25.0-alpha.0.715+c6970e64528ba7   10.0.0.4      <none>        Ubuntu 18.04.5 LTS               5.3.0-1034-azure   containerd://1.3.4

@jackfrancis - should the clusters created with ci-entrypoint.sh have the control-plane nodes also be marked as master?
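
For reference, a quick way to confirm which role labels the nodes actually carry (same kubeconfig and node name as in the output above):

kubectl --kubeconfig=./kubeconfig get node capz-vrnxgy-control-plane-qb9m5 --show-labels
kubectl --kubeconfig=./kubeconfig get nodes -l node-role.kubernetes.io/master
# the second command printing "No resources found" would indicate that no node satisfies the master nodeSelector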

@jackfrancis

It would be nice if we could kubectl get node <control plane node> -o yaml and verify that the nodeSelector change no longer works.
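
For example, something along these lines would show just the labels (placeholder node name kept as in the comment above):

kubectl get node <control plane node> -o jsonpath='{.metadata.labels}'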

@marosset
Contributor

marosset commented Jun 1, 2022

I have another cluster I recently built with ci-entrypoint.sh.

Here is the output for the control-plane node:

kubectl get node marosset-hpc-new-control-plane-l4b2h -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    cluster.x-k8s.io/cluster-name: marosset-hpc-new
    cluster.x-k8s.io/cluster-namespace: marosset-hpc-new
    cluster.x-k8s.io/machine: marosset-hpc-new-control-plane-zjkkq
    cluster.x-k8s.io/owner-kind: KubeadmControlPlane
    cluster.x-k8s.io/owner-name: marosset-hpc-new-control-plane
    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
    node.alpha.kubernetes.io/ttl: "0"
    projectcalico.org/IPv4Address: 10.0.0.4/16
    projectcalico.org/IPv4VXLANTunnelAddr: 192.168.20.65
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2022-05-24T20:32:19Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: Standard_D2s_v3
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: westus2
    failure-domain.beta.kubernetes.io/zone: westus2-1
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: marosset-hpc-new-control-plane-l4b2h
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
    node.kubernetes.io/exclude-from-external-load-balancers: ""
    node.kubernetes.io/instance-type: Standard_D2s_v3
    topology.kubernetes.io/region: westus2
    topology.kubernetes.io/zone: westus2-1
  name: marosset-hpc-new-control-plane-l4b2h
  resourceVersion: "1007416"
  uid: 1ac3030d-61a3-44f1-a08c-760c55dc45b2

@edreed
Collaborator

edreed commented Jun 1, 2022

According to Well-Known Labels, Annotations and Taints, "node-role.kubernetes.io/master" is deprecated and has been removed in 1.25. It has been replaced by "node-role.kubernetes.io/control-plane" from 1.20 onward.
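
As a rough sketch, moving the controller off the deprecated label would mean scheduling constraints along these lines (illustrative pod-spec YAML, not the chart's actual manifest):

nodeSelector:
  kubernetes.io/os: linux
  node-role.kubernetes.io/control-plane: ""
tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule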

@marosset
Contributor

marosset commented Jun 1, 2022

Sounds like we need a runOnControlPlane option similar to runOnMaster then :)
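
A hypothetical values.yaml shape for such an option (key names here are assumptions for illustration, not necessarily what ultimately landed):

controller:
  runOnMaster: false        # selects nodes labeled node-role.kubernetes.io/master (deprecated)
  runOnControlPlane: true   # would select nodes labeled node-role.kubernetes.io/control-plane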

@marosset
Contributor

marosset commented Jun 1, 2022

It also looks like kubeadm is probably setting this label.
On another cluster I created with ci-entrypoint.sh that was configured with K8s v1.23.5 components, I see both master and control-plane roles on the linux/master node.

@jackfrancis It was probably the combination of this change and the changes you made to use 'latest' K8s bits instead of the hardcoded v1.23.5 bits that caused this job to stop working.

@jackfrancis

Thanks @marosset and @edreed!

#1360
