`--proxy-mode=nftables` fails if non-canonical (but standardized) IPv6 addresses are used #122611

uablrek · 2024-01-05T14:42:38Z

What happened?

When a non-canonical IPv6 address, e.g. fd00::10.0.0.34, is used as a loadBalancerIP, communication fails.

/sig network
/area kube-proxy

What did you expect to happen?

Addresses like fd00::10.0.0.34 should work, as they do for proxy-mode ipvs and iptables

How can we reproduce it (as minimally and precisely as possible)?

Create a dual-stack service with type loadBalancer
Assign a non-canonical ipv6 loadBalancerIP
Try to access the service with the non-canonical ipv6 address

I will add a manifest for test, and more instructions later.

Anything else we need to know?

What "canonical" really is can be debated. The rfc5952 seems to accept "embedded ipv4", as long as the prefix follows the rules. So, as I understand it, fd00::10.0.0.34 is a canonical IPv6 address.

@danwinship If you don't mind I would like to take a shot at this myself to get acquainted with the code

Kubernetes version

v1.29.0

Cloud provider

None

OS version

Linux

Install tools

None

Container runtime (CRI) and version (if applicable)

N/A (cri-o)

Related plugins (CNI, CSI, ...) and versions (if applicable)

N/A (but tested with calico and flannel)

The text was updated successfully, but these errors were encountered:

uablrek · 2024-01-05T15:27:08Z

Use the manifest below, and the assign-lb-ip tool:

kubectl create namespace alpine-nc
kubectl create -n alpine-nc -f alpine-nc.yaml
assign-lb-ip -n alpine-nc -svc alpine-nc -ip 203.0.113.1,2001:db8::203.0.113.1
echo | nc 203.0.113.1 7007
echo | nc 2001:db8::203.0.113.1 7007

The loadBalancerIPs are taken from the documentation ranges, specified in rfc5737 and rfc3849.

alpine-nc.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-nc
spec:
  selector:
    matchLabels:
      app: alpine-nc
  replicas: 4
  template:
    metadata:
      labels:
        app: alpine-nc
    spec:
      containers:
      - name: alpine
        image: alpine:latest
        imagePullPolicy: IfNotPresent
        command: ["nc", "-lk", "-p", "7007", "-e", "hostname"]
        ports:
        - name: nc
          containerPort: 7007
---
apiVersion: v1
kind: Service
metadata:
  name: alpine-nc
spec:
  ipFamilyPolicy: RequireDualStack
  selector:
    app: alpine-nc
  type: LoadBalancer
  ports:
  - port: 7007
    name: nc

uablrek · 2024-01-05T15:36:19Z

/assign @danwinship

uablrek · 2024-01-05T16:23:49Z

If you are using KinD, you can set static routes to your control-plane for testing (--name default assumed below):

ipv4=$(docker inspect default-control-plane | jq -r .[0].NetworkSettings.Networks.kind.IPAddress)
sudo ip ro replace 203.0.113.0/24 via $ipv4
ipv6=$(docker inspect default-control-plane | jq -r .[0].NetworkSettings.Networks.kind.GlobalIPv6Address)
sudo ip -6 ro replace 2001:db8::203.0.113.0/120 via $ipv6

Since the addresses are reserved for documentation, it shouldn't disturb your network setup (in theory 😄 )

danwinship · 2024-01-05T16:39:46Z

What "canonical" really is can be debated. The rfc5952 seems to accept "embedded ipv4", as long as the prefix follows the rules. So, as I understand it, fd00::10.0.0.34 is a canonical IPv6 address.

No, fd00::10.0.0.34 is a valid IPv6 address, but it's not "canonical" because it can't be "distinguished as having IPv4 addresses embedded in the lower 32 bits solely from the address field through the use of a well-known prefix" (RFC 5932 §5). (IOW, fd00::/96 isn't one of the prefixes that is normally used with embedded IPv4 addresses, so the canonical representation of an address that starts with fd00::/96 doesn't have an embedded IPv4 address. It should be fd00::a00:22.)

But anyway, that said, it seems like this could only be failing if the nftables CLI doesn't handle those IPs (which would be a bug in nftables), and kube-proxy is directly passing the value from the Service object to nftables, which it shouldn't be doing anyway because of the whole "IPv4 with leading 0s" fiasco. (I just wrote a warning about this in another PR!). We should be parsing-and-restringifying the IP from the Service and passing the restringified version to external APIs. (The service tracker code enforces this for ClusterIP but it looks like we're not doing that for other Service/Endpoint IPs...)

@danwinship If you don't mind I would like to take a shot at this myself to get acquainted with the code

Go ahead, but as above, I think the fixes will be in pkg/proxy, not pkg/proxy/nftables specifically.

aojea · 2024-01-05T18:23:42Z

+1 to Dan's opinion on canonicalizing everything before plumbing it down

it seems like this could only be failing if the nftables CLI doesn't handle those IPs (which would be a bug in nftables)

@uablrek you should see in the logs if it fails in nftables , because we should open a bug there if that is the problem

/triage accepted
/assign @uablrek

uablrek · 2024-01-06T05:42:52Z

you should see in the logs if it fails in nftables , because we should open a bug there if that is the problem

Yes.

E0106 05:36:20.530237     365 proxier.go:1564] "nftables sync failed" err=<
        /dev/stdin:73:50-55: Error: syntax error, unexpected string, expecting comma or '}'
        add element ip6 kube-proxy service-ips { fd00::10.0.0.3 . tcp . 7007 : goto external-45IHSLOA-default/alpine/tcp/nc }
                                                         ^^^^^^
 >
I0106 05:36:20.530298     365 proxier.go:958] "Sync failed" retryingTime="30s"

Also manually tested with the latest nft 1.0.9:

nft -v
nftables v1.0.9 (Old Doc Yak #3)
nft add table ip6 test6
nft 'add chain ip6 test6 test6 { type filter hook prerouting priority 0; }'
nft insert rule ip6 test6 test6 ip6 saddr fd00::10.0.0.1 log
Error: syntax error, unexpected log
insert rule ip6 test6 test6 ip6 saddr fd00::10.0.0.1 log

nft insert rule ip6 test6 test6 ip6 saddr fd00::a00:1 log
nft list table ip6 test6
table ip6 test6 {
        chain test6 {
                type filter hook prerouting priority filter; policy accept;
                ip6 saddr fd00::a00:1 log
        }
}
nft delete table ip6 test6

uablrek · 2024-01-06T06:15:15Z

nft uses flex/bison, and the parsing doesn't handle embedded ipv4 addresses (and is quite horrible IMHO).

ipv6 parsing from scanner.l

hex4		([[:xdigit:]]{1,4})
v680		(({hex4}:){7}{hex4})
v670		((:)((:{hex4}){7}))
v671		((({hex4}:){1})((:{hex4}){6}))
v672		((({hex4}:){2})((:{hex4}){5}))
v673		((({hex4}:){3})((:{hex4}){4}))
v674		((({hex4}:){4})((:{hex4}){3}))
v675		((({hex4}:){5})((:{hex4}){2}))
v676		((({hex4}:){6})(:{hex4}{1}))
v677		((({hex4}:){7})(:))
v67		({v670}|{v671}|{v672}|{v673}|{v674}|{v675}|{v676}|{v677})
v660		((:)((:{hex4}){6}))
v661		((({hex4}:){1})((:{hex4}){5}))
v662		((({hex4}:){2})((:{hex4}){4}))
v663		((({hex4}:){3})((:{hex4}){3}))
v664		((({hex4}:){4})((:{hex4}){2}))
v665		((({hex4}:){5})((:{hex4}){1}))
v666		((({hex4}:){6})(:))
v66		({v660}|{v661}|{v662}|{v663}|{v664}|{v665}|{v666})
v650		((:)((:{hex4}){5}))
v651		((({hex4}:){1})((:{hex4}){4}))
v652		((({hex4}:){2})((:{hex4}){3}))
v653		((({hex4}:){3})((:{hex4}){2}))
v654		((({hex4}:){4})(:{hex4}{1}))
v655		((({hex4}:){5})(:))
v65		({v650}|{v651}|{v652}|{v653}|{v654}|{v655})
v640		((:)((:{hex4}){4}))
v641		((({hex4}:){1})((:{hex4}){3}))
v642		((({hex4}:){2})((:{hex4}){2}))
v643		((({hex4}:){3})((:{hex4}){1}))
v644		((({hex4}:){4})(:))
v64		({v640}|{v641}|{v642}|{v643}|{v644})
v630		((:)((:{hex4}){3}))
v631		((({hex4}:){1})((:{hex4}){2}))
v632		((({hex4}:){2})((:{hex4}){1}))
v633		((({hex4}:){3})(:))
v63		({v630}|{v631}|{v632}|{v633})
v620		((:)((:{hex4}){2}))
v620_rfc4291	((:)(:{ip4addr}))
v621		((({hex4}:){1})((:{hex4}){1}))
v622		((({hex4}:){2})(:))
v62_rfc4291	((:)(:[fF]{4})(:{ip4addr}))
v62		({v620}|{v621}|{v622}|{v62_rfc4291}|{v620_rfc4291})
v610		((:)(:{hex4}{1}))
v611		((({hex4}:){1})(:))
v61		({v610}|{v611})
v60		(::)

macaddr		(([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2})
ip4addr		(([[:digit:]]{1,3}"."){3}([[:digit:]]{1,3}))
ip6addr		({v680}|{v67}|{v66}|{v65}|{v64}|{v63}|{v62}|{v61}|{v60})
ip6addr_rfc2732	(\[{ip6addr}\])

uablrek · 2024-01-06T10:01:58Z

If a user specifies externalIPs with "fd00::10.0.0.1", then every sync for ipv6 will fail until the address is removed. So, services added after that will not work for ipv6, and on a reboot, ipv6 services may not work at all.

sftim · 2024-01-06T12:35:03Z

/retitle --proxy-mode=nftables fails if non-canonical (but standardized) IPv6 addresses are used

aojea · 2024-01-06T14:28:54Z

opened a bug in netfilter https://bugzilla.netfilter.org/show_bug.cgi?id=1730 so community can fix it there

uablrek · 2024-01-06T14:43:20Z

A more relaxed regexp for ip6addr may be used. The address is parsed later on anyway. Just try a ipv4 "900.900.900.0". It's recognized as an ip4addr, but can't be used (obviously), and the error points (correctly) at the address, not at the "action" as with an embedded ipv4. BTW, I tried to alter the scanner.l, but it didn't work. I don't know enough about flex.

uablrek added the kind/bug Categorizes issue or PR as related to a bug. label Jan 5, 2024

k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. area/kube-proxy needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 5, 2024

k8s-ci-robot assigned danwinship Jan 5, 2024

k8s-ci-robot assigned uablrek Jan 5, 2024

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 5, 2024

uablrek mentioned this issue Jan 6, 2024

kube-proxy: canonicalize external IPs #122620

Closed

k8s-ci-robot changed the title ~~proxy-mode=nftables fails is non-canonical (but standardized) IPv6 addresses are used~~ --proxy-mode=nftables fails if non-canonical (but standardized) IPv6 addresses are used Jan 6, 2024

uablrek mentioned this issue Jan 7, 2024

Kube-proxy: use type net.IP for addresses #122630

Merged

k8s-ci-robot closed this as completed in #122630 Jan 9, 2024

aroradaman mentioned this issue Jan 13, 2024

proxy/nftables: reject packets destined for invalid ports of service ips #122692

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`--proxy-mode=nftables` fails if non-canonical (but standardized) IPv6 addresses are used #122611

`--proxy-mode=nftables` fails if non-canonical (but standardized) IPv6 addresses are used #122611

uablrek commented Jan 5, 2024

uablrek commented Jan 5, 2024

uablrek commented Jan 5, 2024

uablrek commented Jan 5, 2024 •

edited

danwinship commented Jan 5, 2024

aojea commented Jan 5, 2024

uablrek commented Jan 6, 2024

uablrek commented Jan 6, 2024

uablrek commented Jan 6, 2024

sftim commented Jan 6, 2024

aojea commented Jan 6, 2024

uablrek commented Jan 6, 2024 •

edited

--proxy-mode=nftables fails if non-canonical (but standardized) IPv6 addresses are used #122611

--proxy-mode=nftables fails if non-canonical (but standardized) IPv6 addresses are used #122611

Comments

uablrek commented Jan 5, 2024

What happened?

What did you expect to happen?

How can we reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

Kubernetes version

Cloud provider

OS version

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

uablrek commented Jan 5, 2024

uablrek commented Jan 5, 2024

uablrek commented Jan 5, 2024 • edited

danwinship commented Jan 5, 2024

aojea commented Jan 5, 2024

uablrek commented Jan 6, 2024

uablrek commented Jan 6, 2024

uablrek commented Jan 6, 2024

sftim commented Jan 6, 2024

aojea commented Jan 6, 2024

uablrek commented Jan 6, 2024 • edited

`--proxy-mode=nftables` fails if non-canonical (but standardized) IPv6 addresses are used #122611

`--proxy-mode=nftables` fails if non-canonical (but standardized) IPv6 addresses are used #122611

uablrek commented Jan 5, 2024 •

edited

uablrek commented Jan 6, 2024 •

edited