
Add partial support for IPv6 only mode #4450

Merged
1 commit merged into master on Mar 8, 2022

Conversation

@olljanat
Contributor

olljanat commented Nov 10, 2021

Proposed Changes

Upstream IPv6 support work is progressing on two different tracks.

Dual-stack was recently promoted to stable (kubernetes/enhancements#2962), which was handled here by #3212.

In addition, IPv6-only has already been in beta for a couple of years (kubernetes/enhancements#1138 and https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/506-ipv6).

This PR adds an experimental flag --ipv6-only which lets the user bypass the IPv4 cluster-cidr and service-cidr requirement added by #3212.

This PR also adds new logic which switches to IPv6-only mode automatically when (illustrated below):
a) the host does not have an IPv4 address
b) only an IPv6 service CIDR is given
c) only an IPv6 host IP is given
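
For illustration, each trigger can be hit from the command line roughly like this (a sketch; the addresses and CIDRs are placeholders):

# (a) the host has no IPv4 address: no extra flags needed
k3s server --flannel-backend=none --disable-network-policy
# (b) only an IPv6 service CIDR is given
k3s server --service-cidr=fd:1::/108 --flannel-backend=none --disable-network-policy
# (c) only an IPv6 node IP is given
k3s server --node-ip=fd:1000::1 --flannel-backend=none --disable-network-policy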

Types of Changes

Verification

$ k3s server --node-ip=fd:1000::1 --disable-network-policy --flannel-backend=none

Linked Issues

#284 #2123 #4389

User-Facing Change


Further Comments

Currently neither Flannel nor the network policy controller supports IPv6, so both are forced to be disabled.

The focus of this PR is to add enough logic to allow IPv6-only mode to be used with vcluster, once support is added on that side in loft-sh/vcluster#209.

olljanat requested a review from a team as a code owner November 10, 2021 20:25
@olljanat
Contributor Author

Hmm. I'm not sure why CI failed, but the failure does not look related.

@brandond
Contributor

Does this work? Last I heard, the in-cluster apiserver endpoint was IPv4-only, so you can't actually run a cluster with only IPv6 service CIDRs.

@olljanat
Contributor Author

olljanat commented Nov 10, 2021

Works on which level, do you mean? When started without Flannel, an error message like:

E1110 22:36:50.565086   66116 kubelet.go:2337] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"

is of course constantly printed to the console, and kubectl says that the node is in "NotReady" state, but the apiserver is up to the level that it responds to commands like kubectl get nodes and kubectl get namespaces just fine.

And here is an example showing that the apiserver really is listening on IPv6:

$ curl -k https://[::1]:6443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}

Also, the configuration I'm currently looking for is one where connections inside the cluster, and between pods in a multi-cluster setup, would be IPv6-only; most likely there will anyway be a load balancer that also exposes services to IPv4 clients.

And in theory my custom version of RKE2 https://github.com/olljanat/rke2/releases/tag/v1.22.3-olljanat1 should now support this too, but I haven't had time to test it yet.

@olljanat
Contributor Author

Hmm. I guess the issue you are referring to is that on an IPv6 cluster there is no port NAT at all, so services like Calico, CoreDNS, etc. need to be configured to use port 6443 instead of 443. Otherwise they cannot connect to the apiserver.

With Calico the solution is to set the environment variable KUBERNETES_SERVICE_PORT=6443 for all of its containers (sketched below).
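
A minimal sketch of that workaround, assuming a manifest-based Calico install in kube-system; the workload names may differ depending on how Calico was deployed:

kubectl -n kube-system set env daemonset/calico-node KUBERNETES_SERVICE_PORT=6443
kubectl -n kube-system set env deployment/calico-kube-controllers KUBERNETES_SERVICE_PORT=6443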

For CoreDNS I haven't found a solution (yet). However, for those reasons this flag is proposed as experimental for now.

@brandond
Contributor

Yes, the apiserver certainly binds to ipv6 just fine, but the in-cluster Endpoint list for kubernetes.default.svc will not currently ever show an ipv6 address as far as I know, which means that pods would require an ipv4 address to reach it.
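
A quick way to see what the in-cluster Endpoint list actually advertises (a sketch; plain kubectl, nothing k3s-specific):

kubectl -n default get endpoints kubernetes -o yaml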

@manuelbuil
Contributor

I don't think flannel supports IPv6-only. Another CNI plugin would need to be used, or we should contribute that to upstream flannel.

@olljanat
Contributor Author

Yes, the apiserver certainly binds to ipv6 just fine, but the in-cluster Endpoint list for kubernetes.default.svc will not currently ever show an ipv6 address as far as I know, which means that pods would require an ipv4 address to reach it.

It looks like in the IPv4 world it is kube-proxy that listens for kubernetes.default.svc on port 443 and proxies to the apiserver on port 6443, and it does not do that for IPv6.

Example of how to make it work

Here is an example of how to make the apiserver listen on port 443 in a way that works both inside and outside of containers, without needing kube-proxy in the middle.

Basic setup

# Start K3s with needed parameters:
k3s server --service-cidr=fd:1::/108 --cluster-cidr=fd:2::/64 --ipv6-only --flannel-backend=none --https-listen-port=443 --disable=metrics-server --disable-network-policy --disable traefik

# Remove ip6table line added by kube-proxy
# NOTE! kube-proxy will add this back when you start pods, so you need to remove it again then
ip6tables -D KUBE-SERVICES -d fd:1::1/128 -p tcp -m comment --comment "default/kubernetes:https has no endpoints" -m tcp --dport 443 -j REJECT --reject-with icmp6-port-unreachable

# Add IPv6 node address to loopback adapter
ip addr add fd:1::1/64 dev lo

# Test that the apiserver responds over IPv6:
curl -k https://[fd:1::1]:443/api/v1/nodes/foo

CNI installation

Deploy Calico as described in their guide at https://docs.projectcalico.org/networking/ipv6#enable-ipv6-only
A ready-made deployment YAML with the needed parameters can be found at https://pastebin.com/raw/0sRxugmk

Test deployment

Deploy something; here is an example with an ASP.NET Core hello-world application.
See that both the pod and the service have IPv6 addresses:

$ kubectl -n foo get all -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP                                        NODE                               NOMINATED NODE   READINESS GATES
pod/example-69fc586f6d-qh5sp   1/1     Running   0          78m   fd12:a9f4:7ee5:7d08:75f2:ae9a:a7e3:4043   k3s   <none>           <none>

NAME              TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE   SELECTOR
service/example   ClusterIP   fd:1::561e   <none>        8080/TCP   78m   service=example

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                       SELECTOR
deployment.apps/example   1/1     1            1           78m   example      sample-app:dev   service=example

NAME                                 DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                       SELECTOR
replicaset.apps/example-69fc586f6d   1         1         1       78m   example      sample-app:dev   pod-template-hash=69fc586f6d,service=example

Test inside of pod

Start a shell inside the pod and see how things look there:

$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fd12:a9f4:7ee5:7d08:75f2:ae9a:a7e3:4043  prefixlen 128  scopeid 0x0<global>
        inet6 fe80::c9e:61ff:fe6b:f903  prefixlen 64  scopeid 0x20<link>
        ether 0e:9e:61:6b:f9:03  txqueuelen 0  (Ethernet)
        RX packets 9  bytes 906 (906.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9  bytes 770 (770.0 B)
        TX errors 0  dropped 1 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

$ ping kubernetes.default.svc
PING kubernetes.default.svc(kubernetes.default.svc.cluster.local (fd:1::1)) 56 data bytes
64 bytes from kubernetes.default.svc.cluster.local (fd:1::1): icmp_seq=1 ttl=64 time=0.041 ms

$ curl -k https://kubernetes.default.svc:443/api/v1/
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}

Yes, the apiserver certainly binds to ipv6 just fine, but the in-cluster Endpoint list for kubernetes.default.svc will not currently ever show an ipv6 address as far as I know, which means that pods would require an ipv4 address to reach it.

Yes, and as described above kube-proxy is also problematic, which is why, for example, Cilium with a kube-proxy-free setup would be a good option: https://docs.cilium.io/en/v1.9/gettingstarted/kubeproxy-free/

However, those are outside the scope of this PR. This just aims to let users play with an IPv6-only config and find out where the limits are, without needing to build a custom version of K3s.

Also, my goal in the long run is to get this working with RKE2, which is why it needs to be solved here first.

olljanat marked this pull request as draft November 11, 2021 17:43
@olljanat
Contributor Author

olljanat commented Nov 11, 2021

FYI, I marked this as draft as I figured out an intermediate solution: I can use a MutatingAdmissionWebhook to force services in non-default/non-system namespaces to use IPv6 only, as described at https://kubernetes.io/docs/concepts/services-networking/dual-stack/#services (see the sketch below).
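
A sketch of the single-stack service spec such a webhook would produce, per the dual-stack docs linked above; the name, namespace, selector and port are hypothetical:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: example      # hypothetical service name
  namespace: foo     # hypothetical non-default/non-system namespace
spec:
  ipFamilyPolicy: SingleStack
  ipFamilies:
  - IPv6
  selector:
    service: example
  ports:
  - port: 8080
EOF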

For the record, the problematic places in kube-proxy look to be:

* https://github.com/kubernetes/kubernetes/blob/1d8966f4f92429fc1c1073d121dca1cc3f53974a/cmd/kube-apiserver/app/server.go#L563

* https://github.com/kubernetes/kubernetes/blob/9fba771fe9ea8fc4e68cada314c464d90abe4ae3/pkg/proxy/iptables/proxier.go#L1133

Also, at least in theory, it should be possible to use https://github.com/kubernetes-sigs/ip-masq-agent here to solve the problem at least partly.

EDIT: I just noticed that the Calico documentation contains info on how to make the Kubernetes control plane operate on IPv6 only: https://docs.projectcalico.org/networking/ipv6-control-plane

@manuelbuil
Contributor

FYI, I marked this as draft as I figured out an intermediate solution: I can use a MutatingAdmissionWebhook to force services in non-default/non-system namespaces to use IPv6 only, as described at https://kubernetes.io/docs/concepts/services-networking/dual-stack/#services

For the record, the problematic places in kube-proxy look to be:

* https://github.com/kubernetes/kubernetes/blob/1d8966f4f92429fc1c1073d121dca1cc3f53974a/cmd/kube-apiserver/app/server.go#L563

* https://github.com/kubernetes/kubernetes/blob/9fba771fe9ea8fc4e68cada314c464d90abe4ae3/pkg/proxy/iptables/proxier.go#L1133

Also, at least in theory, it should be possible to use https://github.com/kubernetes-sigs/ip-masq-agent here to solve the problem at least partly.

EDIT: I just noticed that the Calico documentation contains info on how to make the Kubernetes control plane operate on IPv6 only: https://docs.projectcalico.org/networking/ipv6-control-plane

Are you perhaps aware whether upstream Kubernetes (e.g. the kube-proxy folks) is trying to add IPv6-only support?

@olljanat
Contributor Author

Are you perhaps aware whether upstream Kubernetes (e.g. the kube-proxy folks) is trying to add IPv6-only support?

I expected that would be the case based on https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/506-ipv6, but I'm actually not sure, as I'm still new to the Kubernetes world.

@manuelbuil
Contributor

Are you perhaps aware whether upstream Kubernetes (e.g. the kube-proxy folks) is trying to add IPv6-only support?

I expected that would be the case based on https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/506-ipv6, but I'm actually not sure, as I'm still new to the Kubernetes world.

Thanks

@olljanat
Contributor Author

FYI, it looks like on K8s v1.21 and v1.22 the dual-stack code is broken in a way that requires its feature gate to be disabled on IPv6-only nodes.

However, it looks like v1.23 will promote it to stable, which is why the logic is changed so that as long as --cluster-cidr, --service-cidr and --node-ip all point to IPv6 only, dual-stack gets disabled automatically (see the sketch below).
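
Roughly, on v1.21/v1.22 the automatic behaviour amounts to the following manual invocation; a sketch with placeholder addresses, using k3s component-arg flags and the upstream IPv6DualStack feature gate (which only exists until it goes stable):

k3s server \
  --node-ip=fd:1000::1 \
  --cluster-cidr=fd:2::/64 \
  --service-cidr=fd:1::/108 \
  --kube-apiserver-arg=feature-gates=IPv6DualStack=false \
  --kube-controller-manager-arg=feature-gates=IPv6DualStack=false \
  --kubelet-arg=feature-gates=IPv6DualStack=false \
  --kube-proxy-arg=feature-gates=IPv6DualStack=false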

olljanat force-pushed the support-ipv6-only branch 3 times, most recently from f1651b4 to 9307a80 on November 28, 2021 21:37
olljanat changed the title from "Add experimental support for IPv6 only mode" to "Add partial support for IPv6 only mode" Nov 28, 2021
olljanat marked this pull request as ready for review November 28, 2021 22:06
@olljanat
Contributor Author

@brandond @manuelbuil I managed to get this working, at least to the level that I can use vcluster with loft-sh/vcluster#209 to spin up either IPv4-only or IPv6-only K3s clusters on top of a dual-stack host cluster (I use RKE2 as the host cluster).

Kube-proxy also now creates the correct ip6tables rules for the Kubernetes API. However, I don't currently have a pure IPv6 lab, so that scenario might still need some work (at the very least it needs to be tested), but I would prefer to leave that for the next PR, as the vcluster scenario is very useful.

@manuelbuil
Contributor

@brandond @manuelbuil I managed to get this working, at least to the level that I can use vcluster with loft-sh/vcluster#209 to spin up either IPv4-only or IPv6-only K3s clusters on top of a dual-stack host cluster (I use RKE2 as the host cluster).

Kube-proxy also now creates the correct ip6tables rules for the Kubernetes API. However, I don't currently have a pure IPv6 lab, so that scenario might still need some work (at the very least it needs to be tested), but I would prefer to leave that for the next PR, as the vcluster scenario is very useful.

How did you fix the kube-proxy issue?

@olljanat
Contributor Author

How did you fix the kube-proxy issue?

The key thing with that one is that the default node-ip is IPv6, so this rule applies: https://github.com/kubernetes/kubernetes/blob/c1153d3353bd4f4b68d85245d53d2745586be474/cmd/kube-proxy/app/server_others.go#L178-L180

However, all the other bindings also need to use IPv6 instead of IPv4, otherwise other issues appear (see the sketch below).
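
A sketch of pinning those remaining bindings to IPv6 by hand; the kube-proxy flags exist upstream, and the values here are illustrative:

k3s server \
  --node-ip=fd:1000::1 \
  --kube-proxy-arg=bind-address=:: \
  '--kube-proxy-arg=healthz-bind-address=[::]:10256' \
  '--kube-proxy-arg=metrics-bind-address=[::1]:10249'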

@manuelbuil
Contributor

I have just realized that github.com still does not support ipv6, so I guess you did not download the binary using curl -sfL https://get.k3s.io | sh -, right?

@olljanat
Contributor Author

I used these DNS servers in my IPv6-only lab to avoid that issue: https://www.nat64.net/ (see the sketch below).
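
A sketch of the host-side setup; check nat64.net for the currently published DNS64 resolver addresses, as the one below may be out of date:

# Point the host at a public DNS64/NAT64 resolver (address from nat64.net; verify before use)
cat > /etc/resolv.conf <<EOF
nameserver 2a00:1098:2c::1
EOF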

@manuelbuil
Contributor

manuelbuil left a comment

Thanks for the PR. I'm actually thinking that creating a utils function, executing it at the beginning in pkg/cli/server/server.go, and saving the result in a variable could be very useful. You could save a lot of lines by just checking that value, which would tell you whether we are in an IPv4, dual-stack or IPv6 scenario.

Review threads (outdated, resolved):
pkg/agent/config/config.go (2)
pkg/agent/run.go (1)
pkg/cli/server/server.go (4)
@manuelbuil
Contributor

manuelbuil commented Dec 10, 2021

BTW, why does the title say "Add partial support"? Why partial?

@olljanat
Contributor Author

BTW, why does the title say "Add partial support"? Why partial?

Because Flannel and the network policy controller need to be disabled and a separate CNI plugin used. Also, I did most of the testing with vcluster, which disables a lot of other features too, so there might still be some missing changes for those features: https://github.com/loft-sh/vcluster/blob/49258d6242885262fb8b6ee67f7c215f9a3ff67d/charts/k3s/values.yaml#L40-L49

@manuelbuil
Contributor

BTW, why does the title say "Add partial support"? Why partial?

Because Flannel and the network policy controller need to be disabled and a separate CNI plugin used. Also, I did most of the testing with vcluster, which disables a lot of other features too, so there might still be some missing changes for those features: https://github.com/loft-sh/vcluster/blob/49258d6242885262fb8b6ee67f7c215f9a3ff67d/charts/k3s/values.yaml#L40-L49

I'll test it next week on AWS and see if it works.

Hopefully, in the next weeks/months we can provide IPv6 support in flannel and the kube-router network policy controller, let's see!

@codecov-commenter

codecov-commenter commented Dec 10, 2021

Codecov Report

Merging #4450 (9307a80) into master (0c1f816) will decrease coverage by 0.52%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4450      +/-   ##
==========================================
- Coverage   12.03%   11.51%   -0.53%     
==========================================
  Files         135      135              
  Lines        9179     9248      +69     
==========================================
- Hits         1105     1065      -40     
- Misses       7838     7960     +122     
+ Partials      236      223      -13     
Flag        Coverage Δ
inttests    ?
unittests   11.51% <0.00%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                     Coverage Δ
pkg/agent/config/config.go         0.00% <0.00%> (ø)
pkg/agent/run.go                   0.00% <0.00%> (ø)
pkg/cli/server/server.go           0.00% <0.00%> (ø)
pkg/daemons/agent/agent_linux.go   0.00% <0.00%> (ø)
pkg/util/net.go                    0.00% <0.00%> (ø)
pkg/flock/flock_unix.go            0.00% <0.00%> (-46.67%) ⬇️
tests/util/cmd.go                  0.00% <0.00%> (-42.31%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c1f816...9307a80. Read the comment docs.

@manuelbuil
Contributor

I tested today on Ubuntu 20 on AWS, using nat64 (server: https://www.nat64.net/) to work around github and dockerhub not working well with IPv6. I deployed a control-plane node and an agent.

control-plane config:

write-kubeconfig-mode: 644
token: "secret"
node-ip: 2a05:d012:c6f:4657:6feb:ca31:3c6f:9d05 
cluster-cidr: 2001:cafe:42:0::/56
service-cidr: 2001:cafe:42:1::/112
disable-network-policy: true
flannel-backend: none

agent config:

server: "https://[2a05:d012:c6f:4657:6feb:ca31:3c6f:9d05]:6443"
token: "secret"
node-ip: 2a05:d012:c6f:4657:d8ea:bc19:4495:9b80

Deployed calico via the tigera operator using the config:

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    bgp: Enabled
    ipPools:
    - cidr: 2001:cafe:42:0::/56
      encapsulation: None
      natOutgoing: Enabled
      nodeSelector: all()

I enabled ports in the AWS security groups for IPv6 (e.g. 5473 for typha, 6443 for k3s...) and also disabled the source/destination check (required when there is no encapsulation); see the sketch below.
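
A sketch of the AWS side; the aws CLI invocations are standard, but the security-group and instance IDs are placeholders:

# Open k3s (6443) and typha (5473) to IPv6 sources
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --ip-permissions 'IpProtocol=tcp,FromPort=6443,ToPort=6443,Ipv6Ranges=[{CidrIpv6=::/0}]'
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --ip-permissions 'IpProtocol=tcp,FromPort=5473,ToPort=5473,Ipv6Ranges=[{CidrIpv6=::/0}]'
# Disable the source/destination check on each instance (needed without encapsulation)
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check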

TESTS

1. Ping between pods on the same node
2. Ping between pods on different nodes
3. Access to services via the ClusterIP from the host
4. Access to services via the ExternalIP provided by the svclb (klipper-lb) from the host

All 4 tests work! 🥳

@j-landru

Is there any chance this PR will soon be validated for a next release? I'd like to build a PoC on an IPv6-only k3s cluster. It would be great if #284 could be closed.
Thanks

Automatically switch to IPv6 only mode if first node-ip is IPv6 address

Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
@olljanat
Contributor Author

@j-landru thanks for the reminder. It has been waiting on my todo list for a long time.

@manuelbuil updated now; hopefully it covers all the changes you requested. Note that I don't currently have an IPv6-only lab, so I was not able to test this again, but I did my best to make sure the logic works the same as before, with most of the code just moved to utils.

brandond requested reviews from manuelbuil and a team February 10, 2022 19:46
@manuelbuil
Contributor

Code looks good! I'll try to test it again as soon as I find some time (Monday at the latest). Note that we are already in code freeze, so this feature will need to be part of the March release.

@manuelbuil
Contributor

I tested IPv6-only, dual-stack and IPv4-only, and things work! Of course using nat64 as explained above and Cilium as the CNI plugin :). +1 from my side, but note that this would need to be merged in the March release of k3s.
