
servicelb isn't using all nodes #8809

Closed · kyrofa opened this issue Nov 11, 2023 · 4 comments

kyrofa commented Nov 11, 2023

Environmental Info:
K3s Version:
$ k3s -v
k3s version v1.27.7+k3s2 (575bce7)
go version go1.20.10

Node(s) CPU architecture, OS, and Version:
Debian 12 (Bookworm)
Linux s1 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux

Cluster Configuration:

  • Dual stack (IPv6 primary; roughly the server config sketched below)
  • 3 servers
  • No agents
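
A hedged sketch of a k3s server invocation matching that description; the CIDR ranges are placeholders rather than values from this report, and disabling traefik is assumed only because it is replaced with nginx-ingress below:

# Sketch: first server of a 3-server, IPv6-primary dual-stack cluster with
# embedded etcd and the packaged traefik disabled. CIDRs are placeholders.
$ sudo k3s server \
    --cluster-init \
    --disable=traefik \
    --cluster-cidr=fd00:42::/56,10.42.0.0/16 \
    --service-cidr=fd00:43::/112,10.43.0.0/16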

Describe the bug:

I have three servers:

$ sudo k3s kubectl get nodes -o wide
NAME   STATUS   ROLES                       AGE    VERSION        INTERNAL-IP              EXTERNAL-IP                OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
s1     Ready    control-plane,etcd,master   106m   v1.27.7+k3s2   fda5:8888:9999:310::10   2603:1111:2222:2e:10::10   Debian GNU/Linux 12 (bookworm)   6.1.0-13-amd64   containerd://1.7.7-k3s1.27
s2     Ready    control-plane,etcd,master   106m   v1.27.7+k3s2   fda5:8888:9999:310::20   2603:1111:2222:2e:10::20   Debian GNU/Linux 12 (bookworm)   6.1.0-13-amd64   containerd://1.7.7-k3s1.27
s3     Ready    control-plane,etcd,master   106m   v1.27.7+k3s2   fda5:8888:9999:310::30   2603:1111:2222:2e:10::30   Debian GNU/Linux 12 (bookworm)   6.1.0-13-amd64   containerd://1.7.7-k3s1.27

I disabled traefik and replaced it with the nginx ingress controller (the one from NGINX, not the Kubernetes community). None of these nodes are labeled in any special way, so servicelb should be happy using all of them. And indeed, you can see the three servicelb pods running properly here:

$ sudo k3s kubectl -n kube-system get pods
NAME                                            READY   STATUS    RESTARTS      AGE
coredns-77ccd57875-lf9jc                        1/1     Running   0             36m
local-path-provisioner-957fdf8bc-d7clp          1/1     Running   0             36m
metrics-server-648b5df564-vp8pp                 1/1     Running   2 (35m ago)   36m
svclb-nginx-ingress-controller-44270a7a-4hmf4   2/2     Running   2 (38m ago)   101m
svclb-nginx-ingress-controller-44270a7a-hd77d   2/2     Running   4 (37m ago)   101m
svclb-nginx-ingress-controller-44270a7a-vkbqk   2/2     Running   2 (33m ago)   101m

You can see that I have nginx configured to have two replicas:

$ sudo k3s kubectl -n nginx-ingress get pods
NAME                                        READY   STATUS    RESTARTS   AGE
nginx-ingress-controller-5cdd57d7cc-ffc78   1/1     Running   0          40m
nginx-ingress-controller-5cdd57d7cc-lzjfc   1/1     Running   0          38m

Interestingly, its service only has two nodes in status.loadBalancer.ingress (four addresses, but remember this is dual stack).

$ sudo k3s kubectl -n nginx-ingress get svc -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: nginx-ingress
      meta.helm.sh/release-namespace: nginx-ingress
    creationTimestamp: "2023-11-11T00:09:39Z"
    finalizers:
    - service.kubernetes.io/load-balancer-cleanup
    labels:
      app.kubernetes.io/instance: nginx-ingress
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: nginx-ingress
      app.kubernetes.io/version: 3.3.1
      helm.sh/chart: nginx-ingress-1.0.1
    name: nginx-ingress-controller
    namespace: nginx-ingress
    resourceVersion: "19685"
    uid: 44270a7a-35a9-4cf0-b4e8-47620eb6529f
  spec:
    allocateLoadBalancerNodePorts: true
    clusterIP: fda5:8888:eeee:312::b4c4
    clusterIPs:
    - fda5:8888:eeee:312::b4c4
    - 10.3.125.169
    externalTrafficPolicy: Local
    healthCheckNodePort: 30393
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv6
    - IPv4
    ipFamilyPolicy: PreferDualStack
    ports:
    - name: http
      nodePort: 32298
      port: 80
      protocol: TCP
      targetPort: 80
    - name: https
      nodePort: 31523
      port: 443
      protocol: TCP
      targetPort: 443
    selector:
      app.kubernetes.io/instance: nginx-ingress
      app.kubernetes.io/name: nginx-ingress
    sessionAffinity: None
    type: LoadBalancer
  status:
    loadBalancer:
      ingress:
      - ip: 2603:1111:2222:2e:10::20
      - ip: 2603:1111:2222:2e:10::30
      - ip: 50.100.200.229
      - ip: 50.100.200.230
kind: List
metadata:
  resourceVersion: ""

Indeed, if I curl port 80 of the external IPs of s2 or s3, I hit my ingress. But not s1. Shouldn't I be able to hit s1 as well, and have servicelb forward that traffic to either s2 or s3? What am I missing, here?
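
For illustration, the test described above looks roughly like this (IPv6 literals need brackets plus curl's -g flag; the failure mode on s1 is paraphrased, the report only says it cannot be reached):

$ curl -g 'http://[2603:1111:2222:2e:10::20]/'   # s2: reaches the ingress
$ curl -g 'http://[2603:1111:2222:2e:10::30]/'   # s3: reaches the ingress
$ curl -g 'http://[2603:1111:2222:2e:10::10]/'   # s1: no response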


kyrofa commented Nov 13, 2023

Okay, one new datapoint. It appears that servicelb, as configured in k3s, doesn't actually have dual-stack support. Check out this snippet from the svclb pod's entry script:

start_proxy() {
    for src_range in ${SRC_RANGES}; do
    if echo ${src_range} | grep -Eq ":"; then
        ip6tables -t filter -I FORWARD -s ${src_range} -p ${DEST_PROTO} --dport ${DEST_PORT} -j ACCEPT
    else
        iptables -t filter -I FORWARD -s ${src_range} -p ${DEST_PROTO} --dport ${DEST_PORT} -j ACCEPT
    fi
    done

    for dest_ip in ${DEST_IPS}; do
        if echo ${dest_ip} | grep -Eq ":"; then
            [ $(cat /proc/sys/net/ipv6/conf/all/forwarding) == 1 ] || exit 1
            ip6tables -t filter -A FORWARD -d ${dest_ip}/128 -p ${DEST_PROTO} --dport ${DEST_PORT} -j DROP
            ip6tables -t nat -I PREROUTING -p ${DEST_PROTO} --dport ${SRC_PORT} -j DNAT --to [${dest_ip}]:${DEST_PORT}
            ip6tables -t nat -I POSTROUTING -d ${dest_ip}/128 -p ${DEST_PROTO} -j MASQUERADE
        else
            [ $(cat /proc/sys/net/ipv4/ip_forward) == 1 ] || exit 1
            iptables -t filter -A FORWARD -d ${dest_ip}/32 -p ${DEST_PROTO} --dport ${DEST_PORT} -j DROP
            iptables -t nat -I PREROUTING -p ${DEST_PROTO} --dport ${SRC_PORT} -j DNAT --to ${dest_ip}:${DEST_PORT}
            iptables -t nat -I POSTROUTING -d ${dest_ip}/32 -p ${DEST_PROTO} -j MASQUERADE
        fi
    done
}

As configured in k3s, DEST_IPS is only ever set to a single IP: the node's primary IP. In my case that's the IPv6 address, so IPv4 is never actually configured. Similarly, in dual-stack clusters where IPv4 is primary, IPv6 is never configured. As far as I can tell, k3s needs to be updated to set DEST_IPS to all of the node IPs, not just the first one.

The src_range logic isn't working properly either, because k3s sets SRC_RANGES to 0.0.0.0/0, which means only the iptables (IPv4) rule is ever added, never the ip6tables one. Shouldn't ::/0 be in there too?
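
One way to confirm what k3s actually passes to the svclb pods is to dump the container env (pod name taken from the listing above):

$ sudo k3s kubectl -n kube-system get pod svclb-nginx-ingress-controller-44270a7a-4hmf4 \
    -o jsonpath='{range .spec.containers[*]}{.name}: {.env}{"\n"}{end}'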


brandond commented Nov 14, 2023

> You can see that I have nginx configured to have two replicas:
> if I curl port 80 of the external IPs of s2 or s3, I hit my ingress. But not s1. Shouldn't I be able to hit s1 as well, and have servicelb forward that traffic to either s2 or s3? What am I missing, here?
> externalTrafficPolicy: Local

Two nodes have replicas of nginx. You set the externalTrafficPolicy to Local, so servicelb only advertises the two nodes that have replicas. The third node doesn't have a replica, and the external traffic policy dictates that traffic to it be dropped, so we don't advertise its address. If you want to allow sending traffic to, and advertise the address of, nodes without replicas, change the external traffic policy.

> It appears that servicelb, as configured in k3s, doesn't actually have dual stack support.
> DEST_IPS is only ever defined to a single IP: the primary node IP.

That is not correct. The behavior depends on the external traffic policy, and how Kubernetes sets status.hostIP in dual-stack environments. See: https://github.com/k3s-io/k3s/blob/master/pkg/cloudprovider/servicelb.go#L550-L576

tl;dr it sounds like you don't want to use externalTrafficPolicy: Local - everything you're having trouble with will be resolved if you change that to Cluster.
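
For example, a quick way to test this (the Service is Helm-managed here, so the durable fix would be to set the equivalent value in the chart; the exact key name depends on the chart):

$ sudo k3s kubectl -n nginx-ingress patch svc nginx-ingress-controller \
    --type=merge -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'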


brandond commented Nov 14, 2023

There is a feature gate, alpha in 1.28, that will improve dual-stack support for externalTrafficPolicy: Local:

https://kubernetes.io/docs/concepts/workloads/pods/downward-api/

> status.hostIPs
> the IP addresses is a dual-stack version of status.hostIP, the first is always the same as status.hostIP. The field is available if you enable the PodHostIPs feature gate.

Once that feature-gate goes beta and is on by default, we can evaluate how to best make use of it in servicelb.
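
Once it is worth experimenting with, the alpha gate could be switched on for a k3s server through the component-arg passthrough flags, roughly like this (a sketch, not a recommendation for production):

# Sketch: enable the alpha PodHostIPs gate on the apiserver and kubelet.
$ sudo k3s server \
    --kube-apiserver-arg=feature-gates=PodHostIPs=true \
    --kubelet-arg=feature-gates=PodHostIPs=true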

Ref: kubernetes/enhancements#2681


kyrofa commented Nov 16, 2023

Confirmed, thank you @brandond. Switching to Cluster made this all work as expected.
