Broken "On-Premise" Installation on a Fresh Kubernetes 1.22 Cluster #4875
Same problem upgrading from Kubernetes 1.21.2 to 1.22.1: CoreDNS no longer works, but that is because there is no longer communication between pods on different nodes, so a pod on node X can't reach a CoreDNS instance deployed on node Y. The problem seems to be in Calico 3.20; Calico 3.19.1 works fine with the same configuration on Kubernetes 1.21.2. I will try upgrading Kubernetes without updating Calico to see if it works.
Hi again! I've upgraded my Kubernetes installation with kubeadm from version 1.21.2 to 1.22.1 without upgrading Calico (version 3.19.1), and the pod networking works OK. So I'm afraid the problem is in Calico version 3.20.
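For reference, a hedged sketch of that upgrade path (kubeadm upgrade with the CNI left untouched; the apt version pins assume a Debian/Ubuntu control-plane node and are examples, not from the thread):

```shell
# Upgrade kubeadm itself first (version pin is an example for 1.22.1)
sudo apt-get update && sudo apt-get install -y kubeadm=1.22.1-00

# Review and apply the control-plane upgrade
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.22.1

# Then upgrade kubelet/kubectl and restart the kubelet
sudo apt-get install -y kubelet=1.22.1-00 kubectl=1.22.1-00
sudo systemctl restart kubelet

# Note: no Calico manifest is re-applied here, so the CNI stays at 3.19.1
```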
@jeliseocd thanks for adding some extra info to my original bug report; I've been a bit low on time.
@jeliseocd @pmyjavec could you share the logs from one of your calico/node pods? The logs in the OP appear to be from the init container rather than from the calico-node container, which is where I would expect to see the relevant diags. I am guessing this is related to v1.22: I tried those steps on a v1.21 cluster and it appears to work fine. I will try again on v1.22 to see if I can repro.
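For anyone else gathering diags, a sketch of pulling logs from the main calico-node container rather than the init containers (label and container names per the standard manifests; the namespace is kube-system for manifest installs and calico-system for operator installs, so adjust accordingly):

```shell
# Find the calico-node pod running on the affected node
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide

# Logs from the main container (replace the pod name with yours)
kubectl logs -n kube-system calico-node-xxxxx -c calico-node

# Init container logs, e.g. the CNI installer, for comparison
kubectl logs -n kube-system calico-node-xxxxx -c install-cni
```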
Which manifest from that page are you guys using? |
tigera-operator.yaml.old.zip
This works fine. Calico version 3.20 was downloaded from https://docs.projectcalico.org/archive/v3.20/manifests/tigera-operator.yaml. In a while I will update Calico to version 3.20 again and upload the logs.
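For completeness, a sketch of fetching and applying that manifest (same URL as above):

```shell
MANIFEST_URL=https://docs.projectcalico.org/archive/v3.20/manifests/tigera-operator.yaml

# Download the operator manifest, then create it in the cluster
curl -fsSL -o tigera-operator.yaml "$MANIFEST_URL"
kubectl create -f tigera-operator.yaml
```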
I'm seeing what I think is a related problem and did some testing with various versions. Testing was primarily on Ubuntu 21.04 (Hirsute) on Vagrant, but I also tested the first case with 20.04 (Focal) on AWS for another data point, with the same results. All clusters are three-node HA installed via kubeadm.

Calico 3.20 / k8s 1.22.1: Not working. This is where I started. I first observed an issue with services not being able to find endpoints, and captured a log message that was repeated in all of my calico-node pods. While the routes existed, they appeared to be set up as I would expect (the next hop was the correct IP address for the other node).

Calico 3.20 / k8s 1.21.4: Not working. Same route removal behavior and same log messages as with k8s 1.22.1.

Calico 3.19 / k8s 1.22.1: Not working, for a new reason: Calico never gets installed, as the resource type it depends on is no longer available in 1.22. This is not surprising.

Calico 3.19 / k8s 1.21.4: Success! Everything works as expected, and I don't see the above messages in the logs.

Hope this helps. Let me know if there's any additional data I can pull from any of these combinations.
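If it helps anyone reproduce the route observations above, a hedged sketch of what can be run on each node to inspect the pod-CIDR routes (interface names like tunl0 or vxlan.calico depend on your chosen encapsulation):

```shell
# Show routes programmed by Calico/BIRD; for a working cluster, the next
# hop for another node's pod CIDR should be that node's IP address
ip route show | grep -E 'bird|tunl0|vxlan'

# Watch route churn live, to catch routes being added and then removed
ip monitor route
```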
The parent directory is created fine; I don't know why it's referring to issues resolving symlinks:
The same issue on a k8s cluster v1.20.4 with RHEL 7.9 nodes.
I have the same problem with Kubernetes 1.22.1, CentOS 8, and Calico 3.20. In the Calico pod logs I don't see errors, but it definitely seems like a networking issue related to Calico: the deployed pods can't communicate with each other. I don't know if my problem is related to this one, but it is also a fresh install of a 1.22 Kubernetes cluster.
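A hypothetical smoke test for the pod-to-pod symptom (the busybox/nginx images and pod names are arbitrary choices, not from the thread; to test cross-node traffic specifically, pin the two pods to different nodes with nodeName or a nodeSelector):

```shell
# Start a target web server and a client pod
kubectl run target --image=nginx --restart=Never
kubectl run pinger --image=busybox --restart=Never --command -- sleep 3600
kubectl wait --for=condition=Ready pod/target pod/pinger --timeout=120s

# Fetch the target's pod IP and try to reach it from the client;
# in the broken clusters described here, this times out across nodes
TARGET_IP=$(kubectl get pod target -o jsonpath='{.status.podIP}')
kubectl exec pinger -- wget -qO- --timeout=5 "http://$TARGET_IP"
```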
I had a nearly identical problem with Kubernetes 1.22.1, Debian 11, and Calico 3.20. There was an interesting log entry on
It seems to access the wrong path.
@pmyjavec it seems like the "resolving symlinks" part of that message is a bit misleading; the real issue is that there is no "7.log" file in that directory. Did you by any chance change logging drivers? I ask because googling for this turned up https://stackoverflow.com/questions/63028034/kubernetes-pod-logging-broken-with-journald-logging-driver. Maybe passing a
We solved this issue, at least for our use case. The problem was that we needed to comment out the following:
One thing that I didn't make clear is that while we were running a fresh cluster, it was hosted inside an LXD container, and it seemed the /sys/fs/bpf filesystem was already mounted for us. Thanks for the help; feel free to close this issue.
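Since the trigger here was /sys/fs/bpf already being mounted inside the LXD guest, a quick way to check for an existing bpf mount before installing (standard Linux commands; nothing Calico-specific):

```shell
# Look for an existing bpf filesystem mount on the node / in the guest
grep -w bpf /proc/mounts

# Equivalent check via findmnt, if available
findmnt -t bpf
```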
When following the official documentation for an on-premise installation on a freshly provisioned Kubernetes cluster created with kubeadm, Calico has issues and the CoreDNS containers can no longer start. When using the instructions in the quick start guide instead, Calico seems to work fine and the cluster seems OK.
Expected Behavior
I would expect that the installation instructions on a new cluster would work fine.
Current Behavior
After installing Calico, none of the Calico containers can start, and the logs contain the following:
Logs from the pods in /var/log/pods.
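For reference, a sketch of pulling those logs directly off the node; the directory layout is /var/log/pods/\<namespace\>_\<pod\>_\<uid\>/\<container\>/N.log, and the pod names below are placeholders:

```shell
# List per-pod log directories on the node
sudo ls /var/log/pods/

# Tail the calico-node container logs for a matching pod directory
sudo tail -n 50 /var/log/pods/kube-system_calico-node-*/calico-node/*.log
```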
The CoreDNS pods / containers also have trouble starting.
Possible Solution
Not sure yet.
Steps to Reproduce (for bugs)
# kubeadm init --pod-network-cidr=192.168.0.0/16
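To make the repro self-contained, a hedged sketch of the full sequence as I understand the v3.20 operator install flow (the custom-resources.yaml step and the calico-system namespace are my assumptions from that flow, not stated in this thread; verify against the docs page):

```shell
# 1. Bootstrap the control plane with the Calico default pod CIDR
kubeadm init --pod-network-cidr=192.168.0.0/16
export KUBECONFIG=/etc/kubernetes/admin.conf

# 2. Install the Tigera operator, then the Calico custom resources
kubectl create -f https://docs.projectcalico.org/archive/v3.20/manifests/tigera-operator.yaml
kubectl create -f https://docs.projectcalico.org/archive/v3.20/manifests/custom-resources.yaml

# 3. Watch the Calico pods come up; this is where the failures appear
kubectl get pods -n calico-system -w
```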
Context
Your Environment