I know you said you don't have an easy way to reproduce this, but could you try enabling debug logs on felix so that we have some more info in case it happens again?
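For reference, a minimal sketch of how the debug-log request could be scripted. It assumes `calicoctl` is installed and pointed at the etcd datastore, and that the `default` FelixConfiguration resource is the one in effect; the helper name `felix_log_level_command` is hypothetical.

```python
import json
import shlex


def felix_log_level_command(severity="Debug"):
    """Build a calicoctl command that sets Felix's screen log severity.

    Assumption: the cluster-wide FelixConfiguration is named "default".
    """
    patch = {"spec": {"logSeverityScreen": severity}}
    return [
        "calicoctl", "patch", "felixconfiguration", "default",
        "--patch=" + json.dumps(patch),
    ]


if __name__ == "__main__":
    # Print a shell-safe command line to review before running it by hand.
    print(shlex.join(felix_log_level_command()))
```

Setting the severity back to `Info` afterwards keeps log volume down once the data has been captured.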
Expected Behavior
Calico should automatically retry configuring the WireGuard interface IP address after a failure.
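To make the request concrete, here is a rough sketch of the kind of retry loop meant here. It is not Calico's implementation: `configure` and `is_configured` are hypothetical callables standing in for "assign the tunnel address" and "check that the interface has it".

```python
import time


def configure_with_retry(configure, is_configured, attempts=5,
                         base_delay=1.0, sleep=time.sleep):
    """Call configure() until is_configured() is true or attempts run out.

    Returns True on success, False if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            configure()
        except Exception as exc:  # keep retrying on any failure
            print(f"attempt {attempt + 1} failed: {exc}")
        if is_configured():
            return True
        if attempt < attempts - 1:
            # Exponential backoff between attempts: 1x, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

The important part is that success is judged by re-checking the resulting state rather than by the absence of an error from the configuration call.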
Current Behavior
Calico does not recover automatically after failing to configure the WireGuard interface IP address.
As a result, pod communication between the affected node and the other cluster nodes is broken.
On the affected node:
WG interface:
40: wireguard.cali: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
node object:
spec:
  bgp:
    ipv4Address: 10.0.0.122/24
  ipv4VXLANTunnelAddr: 10.233.72.0
  orchRefs:
  - nodeName: node-3eb2f20d-fbc7-4a92-8948-f2f72ef68ce7
    orchestrator: k8s
status:
  wireguardPublicKey: TLki4UgTiy04biGtGl690AuMZzLNHO20GntWI9iipl0=
On a healthy node:
WG interface:
40: wireguard.cali: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
    inet 10.233.67.1/32 scope global wireguard.cali
       valid_lft forever preferred_lft forever
node object:
spec:
  bgp:
    ipv4Address: 10.0.0.120/24
  ipv4VXLANTunnelAddr: 10.233.67.0
  orchRefs:
  - nodeName: node-48e5cbf4-d2d2-49a9-a407-4cb3fe79559b
    orchestrator: k8s
  wireguard:
    interfaceIPv4Address: 10.233.67.1
status:
  wireguardPublicKey: m9TXUUo74FvTEOHtg4w7fYrcBl4SwOweSFuDGkJeIn8=
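The difference between the two nodes can be checked mechanically. The sketch below assumes the node object has already been loaded into a Python dict (for example from `calicoctl get node <name> -o yaml`) and that the interface text is the output of `ip addr show dev wireguard.cali`; both function names are hypothetical.

```python
import re


def interface_ipv4(ip_addr_output):
    """Return the first IPv4 address/prefix found in `ip addr` output, or None."""
    match = re.search(r"\binet\s+(\d+\.\d+\.\d+\.\d+/\d+)", ip_addr_output)
    return match.group(1) if match else None


def wireguard_ip_missing(node, ip_addr_output):
    """True when a WireGuard public key is published but the tunnel address
    is absent from the node object or from the interface itself."""
    has_key = bool(node.get("status", {}).get("wireguardPublicKey"))
    node_ip = node.get("spec", {}).get("wireguard", {}).get("interfaceIPv4Address")
    return has_key and (node_ip is None or interface_ipv4(ip_addr_output) is None)
```

With the data above, the affected node matches this condition (public key present, no `spec.wireguard` block, no `inet` line) and the healthy node does not.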
Possible Solution
N/A
(Workaround: restart the calico-node pod on the affected node. After the restart, the IPv4 address was assigned to the WireGuard interface.)
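A sketch of how the workaround could be scripted, for illustration only. It assumes a manifest-based install where calico-node runs in the `kube-system` namespace with the label `k8s-app=calico-node`; both values, and the helper name, are assumptions and may differ in other installs.

```python
import shlex


def restart_calico_node_command(node_name, namespace="kube-system",
                                selector="k8s-app=calico-node"):
    """Build a kubectl command that deletes the calico-node pod on one node.

    The DaemonSet controller then recreates the pod, which restarts it.
    """
    return [
        "kubectl", "delete", "pod",
        "--namespace", namespace,
        "--selector", selector,
        "--field-selector", f"spec.nodeName={node_name}",
    ]


if __name__ == "__main__":
    cmd = restart_calico_node_command("node-3eb2f20d-fbc7-4a92-8948-f2f72ef68ce7")
    print(shlex.join(cmd))  # review the command, then run it manually
```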
Steps to Reproduce (for bugs)
The issue was observed once, on one node of a three-node cluster, after several similar deployment runs, so it appears to be rare. No exact reproduction steps have been found. WireGuard was enabled for Calico before the affected calico-node pod started. The related calico-node log is attached.
calico-node-log.zip
The possibly related error message:
2023-09-06T20:28:15.696034464Z 2023-09-06 20:28:15.695 [INFO][71] tunnel-ip-allocator/allocateip.go 686: Error updating node node-3eb2f20d-fbc7-4a92-8948-f2f72ef68ce7: %!s(<nil>). Retrying. type="wireguardTunnelAddress"
https://github.com/projectcalico/calico/blob/v3.25.1/node/pkg/allocateip/allocateip.go#L686
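One detail in that message: `%!s(<nil>)` is what Go's `fmt` package prints when a nil value is passed to the `%s` verb, which suggests the error value being logged was nil at that point. The sketch below is a hypothetical helper, not part of Calico, for finding such lines in the attached log; the regular expression is written against the single line quoted above and may need adjusting for other messages.

```python
import re

RETRY_LINE = re.compile(
    r'\[(?P<level>[A-Z]+)\]\[\d+\]\s+(?P<source>\S+)\s+(?P<line>\d+):\s+'
    r'Error updating node (?P<node>\S+): (?P<error>.*?)\. Retrying\.\s+'
    r'type="(?P<type>[^"]+)"'
)


def parse_retry_log(line):
    """Extract fields from a tunnel-ip-allocator retry message, or return None."""
    match = RETRY_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    # Go renders a nil argument to %s as "%!s(<nil>)".
    fields["error_was_nil"] = fields["error"] == "%!s(<nil>)"
    return fields
```

Counting how many retry lines carry a nil error, and for which `type`, may help narrow down which update path was taken.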
Your Environment
Calico version
v3.25.1 (etcd as the Calico datastore, VXLAN overlay, WireGuard enabled)
Orchestrator version (e.g. kubernetes, mesos, rkt):
Kubernetes v1.24.6.
Operating System and version:
Ubuntu 20.04.6 LTS, kernel version 5.4.0-150-generic