nodeup will fail in nodes #16270

Closed
zetaab opened this issue Jan 22, 2024 · 6 comments · Fixed by #16271
Labels
blocks-next kind/bug Categorizes issue or PR as related to a bug.

Comments

@zetaab
Member

zetaab commented Jan 22, 2024

/kind bug

1. What kops version are you running? The command kops version, will display
this information.

master

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.29.1

3. What cloud provider are you using?

openstack

4. What commands did you run? What is the simplest way to reproduce this issue?

updating cluster from 1.28.x to 1.29.1 using kops (current master aka 1.29 alpha 3)

5. What happened after the commands executed?

The control planes update fine. However, none of the normal nodes update.

Jan 22 13:43:12 nodes-esptnl-11yc4j nodeup[1176]: W0122 13:43:12.140729    1176 main.go:133] got error running nodeup (will retry in 30s): failed to get node config from server: lookup kops-controller.internal.xx.k8s.local on 127.0.0.53:53: server misbehaving
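
The log line above is a plain name-resolution failure, so it can be reproduced on the node without nodeup. The following is a minimal sketch, not part of kOps: the helper name `resolves` is made up for illustration, and the hostname is the redacted one from the log, so substitute the real cluster name.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Covers both "name not known" and "server misbehaving"-style failures.
        return False


if __name__ == "__main__":
    # Redacted placeholder copied from the log line; replace with the real name.
    name = "kops-controller.internal.xx.k8s.local"
    print(name, "resolves" if resolves(name) else "does not resolve")
```

If this reports that the name does not resolve on a worker node, nodeup cannot reach the config server by that name either.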

/opt/kops/conf/kube_env.yaml config from 1.28.x working node https://gist.github.com/zetaab/29f7660159010f6327d526fd2a0dc635

vs https://gist.github.com/zetaab/daf6dd1f3b28c778a91d49e6fdbaf466

What changed? The APIServerIPs array field was removed, and ConfigServer.servers was changed to use a DNS name instead of IPs. I tried changing it back to IP addresses, but the certs do not work after that. The problem with the DNS name is that it does not exist in /etc/hosts without gossip or similar(?)

6. What did you expect to happen?

I expect to still be able to use kOps with dns=none.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?

@k8s-ci-robot k8s-ci-robot added the kind/bug label Jan 22, 2024
@zetaab zetaab changed the title from "nodeup does not start" to "nodeup will fail in nodes" Jan 22, 2024
@zetaab
Member Author

zetaab commented Jan 22, 2024

One possible breaking change is #15829, which modifies the same part of the code.

@zetaab
Member Author

zetaab commented Jan 22, 2024

@justinsb I can confirm that #15829 modified the behaviour of APIServerIPs and ConfigServer.servers. Do you have any idea how we could get it working? The problem is that the DNS name for kops-controller does not exist in dns=none clusters, at least in OpenStack.

@zetaab
Member Author

zetaab commented Jan 22, 2024

I created new clusters on AWS and OpenStack. AWS works, OpenStack does not. The commands are

./kops create cluster \
  --cloud openstack \
  --name jessetesti.k8s.local \
  --state ${KOPS_STATE_STORE} \
  --zones x,y,z \
  --network-cidr 10.2.0.0/16 \
  --image ubuntu-2004-081223-devops \
  --bastion \
  --dns=none \
  --control-plane-count=3 \
  --node-count=3 \
  --node-size m1.medium \
  --control-plane-size m1.medium \
  --etcd-storage-type solidfire \
  --topology private \
  --networking calico \
  --api-loadbalancer-type public \
  --os-octavia=true \
  --os-ext-net xx-nap \
  --os-ext-subnet ext-ha-v4 \
  --os-lb-floating-subnet ext-ha-v4 --kubernetes-version 1.29.1 --yes

and

./kops create cluster --name jesseaws2.k8s.local --dns=none --zones eu-north-1a,eu-north-1b,eu-north-1c --control-plane-count=3 --node-count=3 --node-size t3.small --kubernetes-version 1.29.1 --control-plane-size t3.small

The resulting kube_env.yaml on an AWS node:

cat /opt/kops/conf/kube_env.yaml
APIServerIPs:
- 172.20.120.96
- 172.20.146.122
- 172.20.54.100
CloudProvider: aws
ClusterName: jesseaws2.k8s.local
ConfigServer:
  CACertificates: |
    -----BEGIN CERTIFICATE-----
    MIIC+DCCAeCgAwIBAgIMF6y145gxZShmQ1iPMA0GCSqGSIb3DQEBCwUAMBgxFjAU
    BgNVBAMTDWtxxxX4tTNotU=
    -----END CERTIFICATE-----
  servers:
  - https://172.20.120.96:3988/
  - https://172.20.146.122:3988/
  - https://172.20.54.100:3988/
InstanceGroupName: nodes-eu-north-1a
InstanceGroupRole: Node
NodeupConfigHash: cYaiGjBPTqbN1eCGvAJxjastnXGFNWRR2i2mUoTC3M0=

And on an OpenStack node:

cat /opt/kops/conf/kube_env.yaml
CloudProvider: openstack
ClusterName: jessetesti.k8s.local
ConfigServer:
  CACertificates: |
    -----BEGIN CERTIFICATE-----
    MIIC+DCCAeCgAwIBAgIMF6y2D8jb/2jJ9N3HMA0GCSqGSIb3DQEBCwUAMBgxFjAU
    BgNVBAMTDWt1YmVybmV0ZXMtY2EwHhcNMjQwMTIwMTU0ODU3WhcNMzQwMTE5MTU0
    ODU3Wxxxx+gL
    62ktwBmZ9w90b9Y1n+7tC5ujAyqcGIe7CEUOUavY4XBQQoXvAwpzJaY6FaIoZpJ+
    X+12JyIAXpHlmA4NV/7VjKkDyAWDncEOsk0ImWXDXB8L3xipCaUJrtI5LTU=
    -----END CERTIFICATE-----
  servers:
  - https://kops-controller.internal.jessetesti.k8s.local:3988/
InstanceGroupName: nodes-xxx
InstanceGroupRole: Node
NodeupConfigHash: /hwCnaFYXi1GUHTKbWsKqR/FyjIBBH73e0s4vhi1OrI=

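So we can clearly see that this is the issue: the AWS node gets APIServerIPs and IP-based ConfigServer.servers, while the OpenStack node gets only the kops-controller DNS name. Next I will investigate why it does not work the same way in OpenStack.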
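
The difference between the two files can be checked mechanically. The sketch below is only an illustration, assuming the `servers` entries have already been read out of kube_env.yaml into a list; the helper name `server_kind` is hypothetical, and the URLs are copied from the two dumps above.

```python
import ipaddress
from urllib.parse import urlsplit


def server_kind(url: str) -> str:
    """Classify a ConfigServer.servers entry as 'ip' or 'dns' by its host part."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return "ip"
    except ValueError:
        # Not a valid IP literal, so nodeup has to resolve it as a name.
        return "dns"


# Entries copied from the AWS and OpenStack kube_env.yaml files shown above.
aws_servers = [
    "https://172.20.120.96:3988/",
    "https://172.20.146.122:3988/",
    "https://172.20.54.100:3988/",
]
openstack_servers = [
    "https://kops-controller.internal.jessetesti.k8s.local:3988/",
]

for url in aws_servers + openstack_servers:
    print(server_kind(url), url)
```

Any entry classified as `dns` only works if the node has some way to resolve that name, which is exactly what is missing here with dns=none.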

@zetaab
Member Author

zetaab commented Jan 22, 2024

Earlier, the function at https://github.com/kubernetes/kops/blame/master/upup/pkg/fi/cloudup/apply_cluster.go#L1461 contained apiserverAdditionalIPs, but now I cannot see anything like it. I can see the load balancer IP address, but in the case of OpenStack we are interested in the API server IPs, which are not part of that array at all.

@zetaab
Member Author

zetaab commented Jan 22, 2024

it did not solve the whole issue

Jan 22 20:32:18 nodes-xx-p6tj9t nodeup[1175]: W0122 20:32:18.368505    1175 main.go:133] got error running nodeup (will retry in 30s): failed to get node config from server: Post "https://100.68.2.89:3988/bootstrap": tls: failed to verify certificate: x509: cannot validate certificate for 100.68.2.89 because it doesn't contain any IP SANs; Post "https://100.72.3.40:3988/bootstrap": tls: failed to verify certificate: x509: cannot validate certificate for 100.72.3.40 because it doesn't contain any IP SANs; Post "https://100.76.3.165:3988/bootstrap": tls: failed to verify certificate: x509: cannot validate certificate for 100.76.3.165 because it doesn't contain any IP SANs
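
The error above means the client connected by IP address, and the server certificate has no matching IP entry in its subject alternative names. The check can be sketched as follows. This is an illustration only: `cert_covers_ip` is a hypothetical helper, the certificate dicts are made up, and they are assumed to have the shape returned by Python's `ssl.SSLSocket.getpeercert()`.

```python
import ipaddress


def cert_covers_ip(cert: dict, ip: str) -> bool:
    """Return True if cert's subjectAltName lists ip as an 'IP Address' entry."""
    wanted = ipaddress.ip_address(ip)
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "IP Address":
            continue  # DNS entries never match a connection made by IP
        try:
            if ipaddress.ip_address(value.strip()) == wanted:
                return True
        except ValueError:
            continue  # skip malformed entries
    return False


# Made-up certificates: one with only a DNS SAN, one that also has an IP SAN.
dns_only = {
    "subjectAltName": (("DNS", "kops-controller.internal.jessetesti.k8s.local"),)
}
with_ip = {
    "subjectAltName": (
        ("DNS", "kops-controller.internal.jessetesti.k8s.local"),
        ("IP Address", "100.68.2.89"),
    )
}

print(cert_covers_ip(dns_only, "100.68.2.89"))
print(cert_covers_ip(with_ip, "100.68.2.89"))
```

A certificate like `dns_only` is what produces the "doesn't contain any IP SANs" failure when nodeup posts to `https://100.68.2.89:3988/bootstrap`.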

@zetaab
Member Author

zetaab commented Jan 22, 2024

I had old control planes that did not have the correct certs in kops-controller.

@zetaab zetaab closed this as completed Jan 22, 2024