
eBPF NAT table broken for a service mapped to multiple pods with local storage when a node shutdown affects one of the pods #8650

Closed
plesoun-stein opened this issue Mar 25, 2024 · 1 comment
Labels
area/bpf eBPF Dataplane issues

Comments

@plesoun-stein

plesoun-stein commented Mar 25, 2024

Expected Behavior

eBPF is enabled.
I have a service for multiple pods with local storage.
After a node shutdown, the NAT records of the failed pods should be deleted
and the service should map only to the running pods.
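
In this expected state the Kubernetes endpoints and the eBPF NAT table should agree; a quick check of the Kubernetes side could look like this (the service name is a placeholder for the one in the attached manifests):

kubectl get endpoints test-service -o wide

Only the running, ready pods should be listed as backends.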

Current Behavior

eBPF is enabled.
We have a Kubernetes service for multiple pods with local storage.

After the node shutdown:

  1. During the node-monitor-grace-period (40 s)
    the service works as expected and rotates the pods,
    the pods on the failed node generate timeouts.
    The NAT table (captured as sketched after the strace output below) looks like:
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), mount-bpffs (init), install-cni (init)
10.106.177.114 port 80 proto 6 id 29 count 2 local 1
	29:0	 10.250.31.171 : 80
	29:1	 10.250.151.62 : 80
  2. During the node.kubernetes.io/unreachable tolerations period (set to 30 s for testing)
    the service still works as expected,
    the IPs of the failed pods are deleted from the service NAT table:
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), mount-bpffs (init), install-cni (init)
10.106.177.114 port 80 proto 6 id 29 count 1 local 1
  29:0	 10.250.31.171 : 80
  3. After the tolerations period expires
    the pods on the failed node are marked as Terminating,
    the service behaves unexpectedly:
    a record is added back to the NAT table but its backend entry is missing,
    the service keeps rotating the pods, and
    connections routed to the pod on the failed node fail with Operation not permitted.
    Output from the NAT table:
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), mount-bpffs (init), install-cni (init)
10.106.177.114 port 80 proto 6 id 29 count 2 local 1
  29:0	 10.250.31.171 : 80
  29:1	 is missing

strace of the failing connection looks like:

connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.106.177.114")}, 16) = -1 EPERM (Operation not permitted)
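
For reference, the NAT table dumps above come from a calico-node pod and the strace from the client pod, roughly like this (the pod names, the calico-system namespace, and the availability of strace and curl in the client image are assumptions):

# dump the eBPF NAT maps; the "Defaulted container ..." line is printed by
# kubectl exec because no container was selected with -c
kubectl exec -n calico-system <calico-node-pod> -- calico-node -bpf nat dump

# trace the failing connect() from the client pod
kubectl exec <client-pod> -- strace -f -e trace=connect curl -s --max-time 5 http://10.106.177.114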

Possible Solution

As a short-term workaround we can disable eBPF.
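
For an operator-managed install, disabling eBPF means switching the dataplane back to iptables on the Installation resource; a minimal sketch, assuming the default Installation name and that kube-proxy is available to take over service handling again:

kubectl patch installation.operator.tigera.io default --type merge \
  -p '{"spec":{"calicoNetwork":{"linuxDataplane":"Iptables"}}}'

If kube-proxy was scaled down or disabled when eBPF mode was enabled, it needs to be restored as well.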

Steps to Reproduce (for bugs)

  1. Enable eBPF mode for Calico.
  2. Install the rancher.io/local-path provisioner.
  3. Create pods with a locally mounted volume using the local-path storage class.
  4. Create a service mapped to the pods.
  5. Shut down one of the nodes running a pod.
  6. Wait until the grace period and the tolerations expire.

It's the same with a StatefulSet.
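
For reference, a minimal sketch of the kind of objects used in steps 3 and 4 (the image, names and sizes are placeholders; the actual definitions are in the attached test-service-pods.txt, and a second pod/PVC pinned to the other node is omitted for brevity):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc-4
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path        # provisioned by rancher.io/local-path
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-4
  labels:
    app: test-service
spec:
  nodeSelector:
    number: "4"                       # pin the pod (and its local volume) to node4
  tolerations:
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 30             # shortened for testing, as described above
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc-4
---
apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  selector:
    app: test-service
  ports:
  - port: 80
    targetPort: 80
EOF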

Context

We were testing a cloud-native PostgreSQL operator
and observed this unexpected behavior during failover.

Your Environment

Testing cluster:
VMs in VMware
3 control plane nodes (1, 3, 5)
3 worker nodes (2, 4, 6)
3 locations: P1 (nodes 1, 2), P2 (nodes 3, 4), B (nodes 5, 6)
node4 is labeled number=4 and node6 is labeled number=6 (label commands sketched below)
pods created on nodes 4 and 6
The attached file test-service-pods.txt contains the definitions for the PVs, PVCs, pods and services;
the pods expect nodes labeled as described.
test-service-pods.txt
A graceful shutdown was triggered with the shutdown -h now command.
A non-graceful shutdown was triggered with a VMware power-off simulation.
The behavior does not depend on how the node was shut down.
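
The node labels above would have been applied along these lines (a sketch; the exact node names are assumed):

kubectl label node node4 number=4
kubectl label node node6 number=6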

  • IPv4 only

  • Calico version
    Client Version: v3.27.0
    Git commit: 711528e
    Cluster Version: v3.27.0
    Cluster Type: typha,kdd,k8s,operator,bgp,kubeadm

  • Orchestrator version (e.g. kubernetes, mesos, rkt):
    kubectl version output
    Client Version: v1.28.7
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.28.7

  • Operating System and version:
    CentOS Stream release 9

  • Link to your project (optional):

@caseydavenport added the area/bpf eBPF Dataplane issues label on Mar 27, 2024
@tomastigera
Contributor

I believe this has been fixed in 3.27.1 by #8460, so I am closing this issue. Feel free to reopen if 3.27.1 does not work as expected.
