Calico can't live without a default route #8481

Closed
uablrek opened this issue Feb 3, 2024 · 11 comments

Comments

@uablrek
Contributor

uablrek commented Feb 3, 2024

Expected Behavior

Calico should work even if the K8s nodes have no default route.

Other CNI-plugins can handle this case. I have tested: Cilium, Flannel, Kindnet and Antrea.

Current Behavior

If there is no default route, the "kubernetes" service becomes unavailable (and likely all other services, but I didn't get that far).

With proxy-mode=ipvs, calico-kube-controllers never becomes "ready" and keeps restarting:

NAMESPACE     NAME                                       READY   STATUS    RESTARTS      AGE
kube-system   calico-kube-controllers-5fc7d6cf67-prrkb   0/1     Running   3 (33s ago)   3m3s
kube-system   calico-node-r8rxc                          1/1     Running   0             3m3s
# Logs (narrowed)
...
2024-02-03 10:25:10.127 [INFO][1] main.go 138: Failed to initialize datastore error=Get "https://12.0.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 12.0.0.1:443: connect: no route to host
2024-02-03 10:25:18.187 [ERROR][1] client.go 295: Error getting cluster information config ClusterInformation="default" error=Get "https://12.0.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 12.0.0.1:443: connect: no route to host
2024-02-03 10:25:18.187 [INFO][1] main.go 138: Failed to initialize datastore error=Get "https://12.0.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 12.0.0.1:443: connect: no route to host

12.0.0.1 is the IP of the "kubernetes" service.

With proxy-mode=iptables it's even worse:

NAMESPACE     NAME                                       READY   STATUS                  RESTARTS      AGE
kube-system   calico-kube-controllers-5fc7d6cf67-mvxcj   0/1     Pending                 0             47s
kube-system   calico-node-dlfq4                          0/1     Init:CrashLoopBackOff   2 (25s ago)   47s

Possible Solution

No idea

Steps to Reproduce (for bugs)

Start a K8s cluster with Calico and no default route on the nodes.
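
To confirm the precondition before reproducing, one option is to check the node's IPv4 routing table for a default route. The sketch below is an illustration, not part of the original report: `has_default_route` is a hypothetical helper that parses text in the format of Linux's /proc/net/route, where a default route has destination and mask both equal to 00000000.

```python
def has_default_route(route_table: str) -> bool:
    """Return True if /proc/net/route-style text lists an IPv4 default route.

    Hypothetical helper for illustration. Columns in /proc/net/route are:
    Iface, Destination, Gateway, Flags, RefCnt, Use, Metric, Mask, ...
    Destination and Mask are hexadecimal; Flags is a hexadecimal bit field.
    """
    rtf_up = 0x1  # RTF_UP: the route is usable
    lines = route_table.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 8:
            continue  # skip malformed or truncated rows
        destination = fields[1]
        flags = int(fields[3], 16)
        mask = fields[7]
        if destination == "00000000" and mask == "00000000" and flags & rtf_up:
            return True
    return False
```

On a node this could be called as `has_default_route(open("/proc/net/route").read())`; the output of `ip route` gives the same information interactively.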

Context

Experiment in a virtual test-cluster. When no router VMs are started, the K8s nodes don't get a default route.
No impact on any production, and no hurry to fix this.

Your Environment

  • Calico version: calico/cni:v3.27.0
  • Orchestrator version: K8s v1.29.1
  • Operating System and version: Own built busybox system, kernel Linux 6.7.0

@cyclinder
Contributor

cyclinder commented Feb 4, 2024

As far as I know, Calico doesn't depend on a default route. Can you show the calico-node manifest and logs?

@uablrek
Contributor Author

uablrek commented Feb 4, 2024

Manifest calico.yaml.txt

@uablrek
Contributor Author

uablrek commented Feb 4, 2024

Some logs with proxy-mode=iptables:

vm-001 ~ # kubectl logs -n kube-system calico-node-5462m -c install-cni
2024-02-04 07:49:20.598 [INFO][1] cni-installer/<nil> <nil>: Running as a Kubernetes pod
2024-02-04 07:49:20.687 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/bandwidth"
2024-02-04 07:49:20.687 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/bandwidth
2024-02-04 07:49:20.737 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/calico"
2024-02-04 07:49:20.737 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/calico
2024-02-04 07:49:20.801 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/calico-ipam"
2024-02-04 07:49:20.801 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/calico-ipam
2024-02-04 07:49:20.803 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/flannel"
2024-02-04 07:49:20.803 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/flannel
2024-02-04 07:49:20.805 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/host-local"
2024-02-04 07:49:20.805 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/host-local
2024-02-04 07:49:20.867 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/install"
2024-02-04 07:49:20.867 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/install
2024-02-04 07:49:20.875 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/loopback"
2024-02-04 07:49:20.875 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/loopback
2024-02-04 07:49:20.882 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/portmap"
2024-02-04 07:49:20.882 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/portmap
2024-02-04 07:49:20.888 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/tuning"
2024-02-04 07:49:20.888 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/tuning
2024-02-04 07:49:20.888 [INFO][1] cni-installer/<nil> <nil>: Wrote Calico CNI binaries to /host/opt/cni/bin

2024-02-04 07:49:20.907 [INFO][1] cni-installer/<nil> <nil>: CNI plugin version: v3.27.0

2024-02-04 07:49:20.907 [INFO][1] cni-installer/<nil> <nil>: /host/secondary-bin-dir is not writeable, skipping
2024-02-04 07:49:20.907 [WARNING][1] cni-installer/<nil> <nil>: Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2024-02-04 07:49:20.908 [ERROR][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://12.0.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": dial tcp 12.0.0.1:443: connect: network is unreachable
2024-02-04 07:49:20.908 [FATAL][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://12.0.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": dial tcp 12.0.0.1:443: connect: network is unreachable

vm-001 ~ # ip ro add default via 192.168.1.201
# (wait for restart... then)
kubectl logs -n kube-system calico-node-5462m -c install-cni
2024-02-04 07:54:11.865 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/tuning
2024-02-04 07:54:11.865 [INFO][1] cni-installer/<nil> <nil>: Wrote Calico CNI binaries to /host/opt/cni/bin

2024-02-04 07:54:11.876 [INFO][1] cni-installer/<nil> <nil>: CNI plugin version: v3.27.0

2024-02-04 07:54:11.876 [INFO][1] cni-installer/<nil> <nil>: /host/secondary-bin-dir is not writeable, skipping
2024-02-04 07:54:11.876 [WARNING][1] cni-installer/<nil> <nil>: Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "datastore_type": "kubernetes",
      "nodename": "vm-001",
      "mtu": 0,
      "ipam": {
          "type": "calico-ipam",
          "assign_ipv4": "true",
          "assign_ipv6": "true"
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    },
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}
2024-02-04 07:54:11.880 [INFO][1] cni-installer/<nil> <nil>: Using CNI config template from CNI_NETWORK_CONFIG environment variable.
2024-02-04 07:54:11.880 [INFO][1] cni-installer/<nil> <nil>: Created /host/etc/cni/net.d/10-calico.conflist
2024-02-04 07:54:11.880 [INFO][1] cni-installer/<nil> <nil>: Done configuring CNI.  Sleep= false

@uablrek
Contributor Author

uablrek commented Feb 4, 2024

vm-001 ~ # kubectl describe pod -n kube-system   calico-node-7sf2p
...
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  51s               default-scheduler  Successfully assigned kube-system/calico-node-7sf2p to vm-001
  Normal   Pulling    50s               kubelet            Pulling image "docker.io/calico/cni:v3.27.0"
  Normal   Pulled     49s               kubelet            Successfully pulled image "docker.io/calico/cni:v3.27.0" in 1.7s (1.7s including waiting)
  Normal   Created    49s               kubelet            Created container upgrade-ipam
  Normal   Started    49s               kubelet            Started container upgrade-ipam
  Normal   Pulled     3s (x4 over 48s)  kubelet            Container image "docker.io/calico/cni:v3.27.0" already present on machine
  Normal   Created    3s (x4 over 48s)  kubelet            Created container install-cni
  Normal   Started    3s (x4 over 48s)  kubelet            Started container install-cni
  Warning  BackOff    2s (x5 over 46s)  kubelet            Back-off restarting failed container install-cni in pod calico-node-7sf2p_kube-system(40fc324f-d550-488d-998d-6798495813ee)

@cyclinder
Contributor

➜  test git:(e687fb89) ✗ kubectl get po -A
NAMESPACE            NAME                                           READY   STATUS              RESTARTS      AGE
kube-system          calico-kube-controllers-5fc7d6cf67-pld87       0/1     ContainerCreating   0             3m34s
kube-system          calico-node-nmc8l                              0/1     Init:Error          4             3m34s
kube-system          calico-node-rfnz9                              0/1     Init:Error          4 (75s ago)   3m34s
...
2024-02-04 08:41:24.403 [INFO][1] cni-installer/<nil> <nil>: /host/secondary-bin-dir is not writeable, skipping
2024-02-04 08:41:24.403 [WARNING][1] cni-installer/<nil> <nil>: Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2024-02-04 08:41:24.411 [ERROR][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://10.233.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": dial tcp 10.233.0.1:443: connect: network is unreachable
2024-02-04 08:41:24.411 [FATAL][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://10.233.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": dial tcp 10.233.0.1:443: connect: network is unreachable

After I added the default route, the calico-node is running.

~# ip r a default via 172.25.0.1 dev eth0
➜  test git:(e687fb89) ✗ kubectl get po -A
NAMESPACE            NAME                                           READY   STATUS       RESTARTS      AGE
kube-system          calico-kube-controllers-5fc7d6cf67-pld87       1/1     Running      0             52m
kube-system          calico-node-2bmn4                              1/1     Running      0             3m1s

I think it has something to do with iptables MASQUERADE?

@cyclinder
Contributor

Without a default route, I can't reach the apiserver from the node through the address of the kubernetes service, but access to the endpoint is fine. After adding the default route, everything works fine.

test git:(e687fb89) ✗ nsenter -t 108333 -n
cyclinder3# curl -k https://10.233.0.1:443
curl: (7) Couldn't connect to server
cyclinder3# curl -k https://10.233.0.1:443
curl: (7) Couldn't connect to server

cyclinder3# curl -k https://172.25.0.3:6443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}
cyclinder3#
cyclinder3# ip r a default via 172.25.0.1 dev eth0
cyclinder3# curl -k https://10.233.0.1:6443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

This doesn't seem to have anything to do with Calico; it looks more like iptables behavior.
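
The logs in this thread show two different connect failures, "no route to host" and "network is unreachable", which correspond to the errno values EHOSTUNREACH and ENETUNREACH. A minimal sketch of a probe that tells them apart when comparing the service address with the endpoint address; the function names `classify_connect_error` and `probe` are hypothetical and not part of Calico or Kubernetes:

```python
import errno
import socket
from typing import Optional


def classify_connect_error(err_no: Optional[int]) -> str:
    """Map a connect() errno to the wording seen in the logs above."""
    if err_no == errno.ENETUNREACH:
        return "network is unreachable"
    if err_no == errno.EHOSTUNREACH:
        return "no route to host"
    if err_no == errno.ECONNREFUSED:
        return "connection refused"
    return "other"  # includes timeouts, where errno is None


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a TCP connect and report either success or the failure class."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return classify_connect_error(exc.errno)
```

Run on a node, `probe("10.233.0.1", 443)` and `probe("172.25.0.3", 6443)` would mirror the two curl calls above without depending on TLS or the HTTP response.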

@uablrek
Contributor Author

uablrek commented Feb 4, 2024

No, it works with just about every other cni-plugin:

Other CNI-plugins can handle this case. I have tested: Cilium, Flannel, Kindnet and Antrea.

so this is a Calico-only issue. It may be that Calico can live without a default route, but can't be installed without it? I start with an empty cluster.

@cyclinder
Contributor

This seems very strange: when there is no default route, DNAT doesn't seem to work. I'm not sure how other CNIs handle this; these iptables rules were created by kube-proxy.

@uablrek
Contributor Author

uablrek commented Feb 4, 2024

You are right. With proxy-mode=iptables, Flannel doesn't work without a default route either. My default setup is proxy-mode=ipvs so I didn't notice, sorry.

But with proxy-mode=ipvs the kubernetes service address works from the main netns on a node, even with Calico:

vm-001 ~ # curl -k https://12.0.0.1
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
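
The "network is unreachable" error in the iptables case is consistent with a plain routing lookup failing for the service address. The sketch below models a longest-prefix-match lookup with Python's `ipaddress` module. It is a simplified illustration, not kernel code: `lookup_route` is a hypothetical helper, and the 192.168.1.0/24 node subnet is an assumption inferred from the gateway 192.168.1.201 used earlier in the thread.

```python
import ipaddress
from typing import Optional, Sequence


def lookup_route(destination: str, routes: Sequence[str]) -> Optional[str]:
    """Return the most specific route prefix covering destination, or None.

    None models the "network is unreachable" case: no prefix matches.
    """
    dest = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(prefix) for prefix in routes]
    matches = [net for net in networks if dest in net]
    if not matches:
        return None
    # Longest prefix wins, as in an ordinary routing table lookup.
    return str(max(matches, key=lambda net: net.prefixlen))


# Assumed node subnet; only the connected route exists without a default route.
without_default = ["192.168.1.0/24"]
with_default = ["192.168.1.0/24", "0.0.0.0/0"]

print(lookup_route("12.0.0.1", without_default))  # no matching route
print(lookup_route("12.0.0.1", with_default))     # falls through to the default
```

A likely reason ipvs mode behaves differently on the node is that kube-proxy binds service addresses to a local dummy interface (kube-ipvs0), so the address is local and no default route is needed. That explanation comes from general kube-proxy behavior and was not verified in this thread.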

@uablrek
Contributor Author

uablrek commented Feb 4, 2024

I guess this is a K8s "peculiarity" after all, not a Calico issue. I don't think it's a big deal, everybody has a default route, right...

Some security buffs may refuse to set it though, so it should probably be documented. I'll open an issue on K8s...

Should I close this one? Or do you want to track it?

It took some time to troubleshoot 😄

@uablrek
Contributor Author

uablrek commented Feb 4, 2024

Moved to kubernetes/kubernetes#123120
