
eBPF NAT table broken for a service mapped to multiple pods with local storage when a node shutdown affects one of the pods #8650

Closed
plesoun-stein opened this issue Mar 25, 2024 · 1 comment
Labels
area/bpf eBPF Dataplane issues

Comments

@plesoun-stein

plesoun-stein commented Mar 25, 2024

Expected Behavior

eBPF is enabled.
I have a service for multiple pods with local storage.
After a node shutdown, the NAT records of the failed pods should be deleted
and the service should map only to the running pods.
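
In this expected state the Kubernetes endpoints and the eBPF NAT table should agree; a quick check of the Kubernetes side could look like this (the service name is a placeholder for the one in the attached manifests):

kubectl get endpoints test-service -o wide

Only the running, ready pods should be listed as backends.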

Current Behavior

eBPF is enabled.
We have a Kubernetes service for multiple pods with local storage.

After the node shutdown:

  1. During the node-monitor-grace-period (40 s)
    the service works as expected and rotates the pods,
    the pods on the failed node generate timeouts.
    The NAT table (captured as sketched after the strace output below) looks like:
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), mount-bpffs (init), install-cni (init)
10.106.177.114 port 80 proto 6 id 29 count 2 local 1
	29:0	 10.250.31.171 : 80
	29:1	 10.250.151.62 : 80
  2. During the node.kubernetes.io/unreachable tolerations period (set to 30 s for testing)
    the service still works as expected,
    the IPs of the failed pods are deleted from the service NAT table:
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), mount-bpffs (init), install-cni (init)
10.106.177.114 port 80 proto 6 id 29 count 1 local 1
  29:0	 10.250.31.171 : 80
  3. After the tolerations period expires
    the pods on the failed node are marked as Terminating,
    the service behaves unexpectedly:
    a record is added back to the NAT table but its backend entry is missing,
    the service keeps rotating the pods, and
    connections routed to the pod on the failed node fail with Operation not permitted.
    Output from the NAT table:
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), mount-bpffs (init), install-cni (init)
10.106.177.114 port 80 proto 6 id 29 count 2 local 1
  29:0	 10.250.31.171 : 80
  29:1	 is missing

strace of the failing connection looks like:

connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.106.177.114")}, 16) = -1 EPERM (Operation not permitted)
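
For reference, the NAT table dumps above come from a calico-node pod and the strace from the client pod, roughly like this (the pod names, the calico-system namespace, and the availability of strace and curl in the client image are assumptions):

# dump the eBPF NAT maps; the "Defaulted container ..." line is printed by
# kubectl exec because no container was selected with -c
kubectl exec -n calico-system <calico-node-pod> -- calico-node -bpf nat dump

# trace the failing connect() from the client pod
kubectl exec <client-pod> -- strace -f -e trace=connect curl -s --max-time 5 http://10.106.177.114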

Possible Solution

As a short-term workaround we can disable eBPF.
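
For an operator-managed install, disabling eBPF means switching the dataplane back to iptables on the Installation resource; a minimal sketch, assuming the default Installation name and that kube-proxy is available to take over service handling again:

kubectl patch installation.operator.tigera.io default --type merge \
  -p '{"spec":{"calicoNetwork":{"linuxDataplane":"Iptables"}}}'

If kube-proxy was scaled down or disabled when eBPF mode was enabled, it needs to be restored as well.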

Steps to Reproduce (for bugs)

  1. Enable eBPF mode for Calico.
  2. Install the rancher.io/local-path provisioner.
  3. Create pods with a locally mounted volume using the local-path storage class.
  4. Create a service mapped to the pods.
  5. Shut down one of the nodes running a pod.
  6. Wait until the grace period and the tolerations expire.

It's the same with a StatefulSet.
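
For reference, a minimal sketch of the kind of objects used in steps 3 and 4 (the image, names and sizes are placeholders; the actual definitions are in the attached test-service-pods.txt, and a second pod/PVC pinned to the other node is omitted for brevity):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc-4
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path        # provisioned by rancher.io/local-path
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-4
  labels:
    app: test-service
spec:
  nodeSelector:
    number: "4"                       # pin the pod (and its local volume) to node4
  tolerations:
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 30             # shortened for testing, as described above
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc-4
---
apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  selector:
    app: test-service
  ports:
  - port: 80
    targetPort: 80
EOF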

Context

We were testing a cloud-native PostgreSQL operator
and observed this unexpected behavior during failover.

Your Environment

Testing cluster:
VMs in VMware
3 control plane nodes (1, 3, 5)
3 worker nodes (2, 4, 6)
3 locations: P1 (nodes 1, 2), P2 (nodes 3, 4), B (nodes 5, 6)
node4 is labeled number=4 and node6 is labeled number=6 (label commands sketched below)
pods created on nodes 4 and 6
The attached file test-service-pods.txt contains the definitions for the PVs, PVCs, pods and services;
the pods expect nodes labeled as described.
test-service-pods.txt
A graceful shutdown was triggered with the shutdown -h now command.
A non-graceful shutdown was triggered with a VMware power-off simulation.
The behavior does not depend on how the node was shut down.
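
The node labels above would have been applied along these lines (a sketch; the exact node names are assumed):

kubectl label node node4 number=4
kubectl label node node6 number=6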

  • IPv4 only

  • Calico version
    Client Version: v3.27.0
    Git commit: 711528e
    Cluster Version: v3.27.0
    Cluster Type: typha,kdd,k8s,operator,bgp,kubeadm

  • Orchestrator version (e.g. kubernetes, mesos, rkt):
    kubectl version output
    Client Version: v1.28.7
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.28.7

  • Operating System and version:
    CentOS Stream release 9

  • Link to your project (optional):

@caseydavenport added the area/bpf eBPF Dataplane issues label on Mar 27, 2024
@tomastigera
Contributor

I believe this has been fixed in 3.27.1 by #8460, so I am closing this issue. Feel free to reopen if 3.27.1 does not work as expected.
