Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--proxy-mode=nftables fails if non-canonical (but standardized) IPv6 addresses are used #122611

Closed
uablrek opened this issue Jan 5, 2024 · 11 comments · Fixed by #122630
Closed
Assignees
Labels
area/kube-proxy kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@uablrek
Copy link
Contributor

uablrek commented Jan 5, 2024

What happened?

When a non-canonical IPv6 address, e.g. fd00::10.0.0.34, is used as a loadBalancerIP, communication fails.

/sig network
/area kube-proxy

What did you expect to happen?

Addresses like fd00::10.0.0.34 should work, as they do for proxy-mode ipvs and iptables

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a dual-stack service with type loadBalancer
  2. Assign a non-canonical ipv6 loadBalancerIP
  3. Try to access the service with the non-canonical ipv6 address

I will add a manifest for test, and more instructions later.

Anything else we need to know?

What "canonical" really is can be debated. The rfc5952 seems to accept "embedded ipv4", as long as the prefix follows the rules. So, as I understand it, fd00::10.0.0.34 is a canonical IPv6 address.

@danwinship If you don't mind I would like to take a shot at this myself to get acquainted with the code

Kubernetes version

v1.29.0

Cloud provider

None

OS version

Linux

Install tools

None

Container runtime (CRI) and version (if applicable)

N/A (cri-o)

Related plugins (CNI, CSI, ...) and versions (if applicable)

N/A (but tested with calico and flannel)

@uablrek uablrek added the kind/bug Categorizes issue or PR as related to a bug. label Jan 5, 2024
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. area/kube-proxy needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 5, 2024
@uablrek
Copy link
Contributor Author

uablrek commented Jan 5, 2024

Use the manifest below, and the assign-lb-ip tool:

kubectl create namespace alpine-nc
kubectl create -n alpine-nc -f alpine-nc.yaml
assign-lb-ip -n alpine-nc -svc alpine-nc -ip 203.0.113.1,2001:db8::203.0.113.1
echo | nc 203.0.113.1 7007
echo | nc 2001:db8::203.0.113.1 7007

The loadBalancerIPs are taken from the documentation ranges, specified in rfc5737 and rfc3849.

alpine-nc.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-nc
spec:
  selector:
    matchLabels:
      app: alpine-nc
  replicas: 4
  template:
    metadata:
      labels:
        app: alpine-nc
    spec:
      containers:
      - name: alpine
        image: alpine:latest
        imagePullPolicy: IfNotPresent
        command: ["nc", "-lk", "-p", "7007", "-e", "hostname"]
        ports:
        - name: nc
          containerPort: 7007
---
apiVersion: v1
kind: Service
metadata:
  name: alpine-nc
spec:
  ipFamilyPolicy: RequireDualStack
  selector:
    app: alpine-nc
  type: LoadBalancer
  ports:
  - port: 7007
    name: nc

@uablrek
Copy link
Contributor Author

uablrek commented Jan 5, 2024

/assign @danwinship

@uablrek
Copy link
Contributor Author

uablrek commented Jan 5, 2024

If you are using KinD, you can set static routes to your control-plane for testing (--name default assumed below):

ipv4=$(docker inspect default-control-plane | jq -r .[0].NetworkSettings.Networks.kind.IPAddress)
sudo ip ro replace 203.0.113.0/24 via $ipv4
ipv6=$(docker inspect default-control-plane | jq -r .[0].NetworkSettings.Networks.kind.GlobalIPv6Address)
sudo ip -6 ro replace 2001:db8::203.0.113.0/120 via $ipv6

Since the addresses are reserved for documentation, it shouldn't disturb your network setup (in theory 😄 )

@danwinship
Copy link
Contributor

What "canonical" really is can be debated. The rfc5952 seems to accept "embedded ipv4", as long as the prefix follows the rules. So, as I understand it, fd00::10.0.0.34 is a canonical IPv6 address.

No, fd00::10.0.0.34 is a valid IPv6 address, but it's not "canonical" because it can't be "distinguished as having IPv4 addresses embedded in the lower 32 bits solely from the address field through the use of a well-known prefix" (RFC 5932 §5). (IOW, fd00::/96 isn't one of the prefixes that is normally used with embedded IPv4 addresses, so the canonical representation of an address that starts with fd00::/96 doesn't have an embedded IPv4 address. It should be fd00::a00:22.)

But anyway, that said, it seems like this could only be failing if the nftables CLI doesn't handle those IPs (which would be a bug in nftables), and kube-proxy is directly passing the value from the Service object to nftables, which it shouldn't be doing anyway because of the whole "IPv4 with leading 0s" fiasco. (I just wrote a warning about this in another PR!). We should be parsing-and-restringifying the IP from the Service and passing the restringified version to external APIs. (The service tracker code enforces this for ClusterIP but it looks like we're not doing that for other Service/Endpoint IPs...)

@danwinship If you don't mind I would like to take a shot at this myself to get acquainted with the code

Go ahead, but as above, I think the fixes will be in pkg/proxy, not pkg/proxy/nftables specifically.

@aojea
Copy link
Member

aojea commented Jan 5, 2024

+1 to Dan's opinion on canonicalizing everything before plumbing it down

it seems like this could only be failing if the nftables CLI doesn't handle those IPs (which would be a bug in nftables)

@uablrek you should see in the logs if it fails in nftables , because we should open a bug there if that is the problem

/triage accepted
/assign @uablrek

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 5, 2024
@uablrek
Copy link
Contributor Author

uablrek commented Jan 6, 2024

you should see in the logs if it fails in nftables , because we should open a bug there if that is the problem

Yes.

E0106 05:36:20.530237     365 proxier.go:1564] "nftables sync failed" err=<
        /dev/stdin:73:50-55: Error: syntax error, unexpected string, expecting comma or '}'
        add element ip6 kube-proxy service-ips { fd00::10.0.0.3 . tcp . 7007 : goto external-45IHSLOA-default/alpine/tcp/nc }
                                                         ^^^^^^
 >
I0106 05:36:20.530298     365 proxier.go:958] "Sync failed" retryingTime="30s"

Also manually tested with the latest nft 1.0.9:

nft -v
nftables v1.0.9 (Old Doc Yak #3)
nft add table ip6 test6
nft 'add chain ip6 test6 test6 { type filter hook prerouting priority 0; }'
nft insert rule ip6 test6 test6 ip6 saddr fd00::10.0.0.1 log
Error: syntax error, unexpected log
insert rule ip6 test6 test6 ip6 saddr fd00::10.0.0.1 log

nft insert rule ip6 test6 test6 ip6 saddr fd00::a00:1 log
nft list table ip6 test6
table ip6 test6 {
        chain test6 {
                type filter hook prerouting priority filter; policy accept;
                ip6 saddr fd00::a00:1 log
        }
}
nft delete table ip6 test6

@uablrek
Copy link
Contributor Author

uablrek commented Jan 6, 2024

nft uses flex/bison, and the parsing doesn't handle embedded ipv4 addresses (and is quite horrible IMHO).

ipv6 parsing from scanner.l
hex4		([[:xdigit:]]{1,4})
v680		(({hex4}:){7}{hex4})
v670		((:)((:{hex4}){7}))
v671		((({hex4}:){1})((:{hex4}){6}))
v672		((({hex4}:){2})((:{hex4}){5}))
v673		((({hex4}:){3})((:{hex4}){4}))
v674		((({hex4}:){4})((:{hex4}){3}))
v675		((({hex4}:){5})((:{hex4}){2}))
v676		((({hex4}:){6})(:{hex4}{1}))
v677		((({hex4}:){7})(:))
v67		({v670}|{v671}|{v672}|{v673}|{v674}|{v675}|{v676}|{v677})
v660		((:)((:{hex4}){6}))
v661		((({hex4}:){1})((:{hex4}){5}))
v662		((({hex4}:){2})((:{hex4}){4}))
v663		((({hex4}:){3})((:{hex4}){3}))
v664		((({hex4}:){4})((:{hex4}){2}))
v665		((({hex4}:){5})((:{hex4}){1}))
v666		((({hex4}:){6})(:))
v66		({v660}|{v661}|{v662}|{v663}|{v664}|{v665}|{v666})
v650		((:)((:{hex4}){5}))
v651		((({hex4}:){1})((:{hex4}){4}))
v652		((({hex4}:){2})((:{hex4}){3}))
v653		((({hex4}:){3})((:{hex4}){2}))
v654		((({hex4}:){4})(:{hex4}{1}))
v655		((({hex4}:){5})(:))
v65		({v650}|{v651}|{v652}|{v653}|{v654}|{v655})
v640		((:)((:{hex4}){4}))
v641		((({hex4}:){1})((:{hex4}){3}))
v642		((({hex4}:){2})((:{hex4}){2}))
v643		((({hex4}:){3})((:{hex4}){1}))
v644		((({hex4}:){4})(:))
v64		({v640}|{v641}|{v642}|{v643}|{v644})
v630		((:)((:{hex4}){3}))
v631		((({hex4}:){1})((:{hex4}){2}))
v632		((({hex4}:){2})((:{hex4}){1}))
v633		((({hex4}:){3})(:))
v63		({v630}|{v631}|{v632}|{v633})
v620		((:)((:{hex4}){2}))
v620_rfc4291	((:)(:{ip4addr}))
v621		((({hex4}:){1})((:{hex4}){1}))
v622		((({hex4}:){2})(:))
v62_rfc4291	((:)(:[fF]{4})(:{ip4addr}))
v62		({v620}|{v621}|{v622}|{v62_rfc4291}|{v620_rfc4291})
v610		((:)(:{hex4}{1}))
v611		((({hex4}:){1})(:))
v61		({v610}|{v611})
v60		(::)

macaddr		(([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2})
ip4addr		(([[:digit:]]{1,3}"."){3}([[:digit:]]{1,3}))
ip6addr		({v680}|{v67}|{v66}|{v65}|{v64}|{v63}|{v62}|{v61}|{v60})
ip6addr_rfc2732	(\[{ip6addr}\])

@uablrek
Copy link
Contributor Author

uablrek commented Jan 6, 2024

If a user specifies externalIPs with "fd00::10.0.0.1", then every sync for ipv6 will fail until the address is removed. So, services added after that will not work for ipv6, and on a reboot, ipv6 services may not work at all.

@sftim
Copy link
Contributor

sftim commented Jan 6, 2024

/retitle --proxy-mode=nftables fails if non-canonical (but standardized) IPv6 addresses are used

@k8s-ci-robot k8s-ci-robot changed the title proxy-mode=nftables fails is non-canonical (but standardized) IPv6 addresses are used --proxy-mode=nftables fails if non-canonical (but standardized) IPv6 addresses are used Jan 6, 2024
@aojea
Copy link
Member

aojea commented Jan 6, 2024

opened a bug in netfilter https://bugzilla.netfilter.org/show_bug.cgi?id=1730 so community can fix it there

@uablrek
Copy link
Contributor Author

uablrek commented Jan 6, 2024

A more relaxed regexp for ip6addr may be used. The address is parsed later on anyway. Just try a ipv4 "900.900.900.0". It's recognized as an ip4addr, but can't be used (obviously), and the error points (correctly) at the address, not at the "action" as with an embedded ipv4. BTW, I tried to alter the scanner.l, but it didn't work. I don't know enough about flex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kube-proxy kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
5 participants