
[Cilium] Full Kube proxy replacement not working #1210

Closed
byRoadrunner opened this issue Feb 10, 2024 · Discussed in #1199 · 22 comments · Fixed by #1222
@byRoadrunner

byRoadrunner commented Feb 10, 2024

If further information is required, I will be happy to provide it.

Discussed in #1199

Originally posted by byRoadrunner January 31, 2024
Hi,

I'm trying to use Cilium in completely kube-proxy-free mode (https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/).
For this, I disabled the k3s kube-proxy:

k3s_exec_server_args = "--disable-kube-proxy"
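For context, roughly what the relevant part of my kube.tf looks like (the module source line and the surrounding settings are trimmed and only illustrative):

module "kube-hetzner" {
  source = "kube-hetzner/kube-hetzner/hcloud"
  # ... provider, network and nodepool settings omitted ...

  cni_plugin           = "cilium"
  # extra args for the k3s server so its bundled kube-proxy is never started
  k3s_exec_server_args = "--disable-kube-proxy"
}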

But this ends in the following error:

module.kube-hetzner.null_resource.kustomization (remote-exec): + echo 'Waiting for the system-upgrade-controller deployment to become available...'
module.kube-hetzner.null_resource.kustomization (remote-exec): Waiting for the system-upgrade-controller deployment to become available...
module.kube-hetzner.null_resource.kustomization (remote-exec): + kubectl -n system-upgrade wait --for=condition=available --timeout=360s deployment/system-upgrade-controller
module.kube-hetzner.null_resource.kustomization: Still creating... [20s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [30s elapsed]
[... repeats every 10 seconds ...]
module.kube-hetzner.null_resource.kustomization: Still creating... [6m10s elapsed]
module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller
╷
│ Error: remote-exec provisioner error
│ 
│   with module.kube-hetzner.null_resource.kustomization,
│   on .terraform/modules/kube-hetzner/init.tf line 288, in resource "null_resource" "kustomization":
│  288:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_140755124.sh": Process exited with status 1

Probably Cilium is deployed afterwards and networking does not work correctly at that point, so the nodes are unschedulable?

Has anyone done this before? Do you have any advice on how to do this? Or is this a bug that should not happen and should be moved to an issue?

Thanks!

@lpellegr

I am getting the same error while simply trying to use cilium with this option:

cni_plugin = "cilium"

@pvtrn

pvtrn commented Feb 12, 2024

When I install for the first time with default settings, I get the same error.

@spmse

spmse commented Feb 17, 2024

Receiving the same error (see logs) when trying to create a new cluster, but without backup_kustomization. As far as I could track it down in the source, this might be related to the presence or absence [1] of any values for the remote-exec handler.

[1] The provisioner has dependencies on multiple values, but also on available resources, e.g. the load balancer or volumes. I also noticed that some of the csi-nodes seem to get stuck during creation, which might be related to the provisioner hanging. Expand the following section to see the code snippet.

Details

depends_on = [
    hcloud_load_balancer.cluster,
    null_resource.control_planes,
    random_password.rancher_bootstrap,
    hcloud_volume.longhorn_volume
]

module.kube-hetzner.null_resource.kustomization: Still creating... [6m0s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [6m10s elapsed]
module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller
╷
│ Error: remote-exec provisioner error
│ 
│   with module.kube-hetzner.null_resource.kustomization,
│   on .terraform/modules/kube-hetzner/init.tf line 288, in resource "null_resource" "kustomization":
│  288:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_1207280219.sh": Process exited with status 1

Although this error occurs, it seems the resources are prepared and the cluster is reachable.

$ k get nodes
NAME                                                        STATUS   ROLES                       AGE   VERSION
training-shared-cluster-agent-large-bxf          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-cnb          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-ddd          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-iui          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-qtp          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-ric          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-agent-large-rpr          Ready    <none>                      28m   v1.28.6+k3s2
training-shared-cluster-control-plane-fsn1-fdf   Ready    control-plane,etcd,master   28m   v1.28.6+k3s2
training-shared-cluster-control-plane-fsn1-gvp   Ready    control-plane,etcd,master   27m   v1.28.6+k3s2
training-shared-cluster-control-plane-fsn1-uli   Ready    control-plane,etcd,master   27m   v1.28.6+k3s2

Other than that, I noticed that each node runs a cilium instance, and several of them show multiple restarts and are 0/1 ready:

$ k get pods -n kube-system
NAMESPACE        NAME                                              READY   STATUS              RESTARTS        AGE
kube-system      cilium-9jp8p                                      1/1     Running             0               27m
kube-system      cilium-chbwp                                      0/1     Running             9 (5m9s ago)    27m
kube-system      cilium-czp8w                                      1/1     Running             0               27m
kube-system      cilium-jv7cz                                      1/1     Running             0               27m
kube-system      cilium-mcmft                                      0/1     Running             8 (33s ago)     27m
kube-system      cilium-ns945                                      1/1     Running             0               27m
kube-system      cilium-operator-f5dcdcc8d-prm4z                   1/1     Running             0               27m
kube-system      cilium-operator-f5dcdcc8d-wpf6n                   1/1     Running             0               27m
kube-system      cilium-qgtql                                      1/1     Running             0               27m
kube-system      cilium-svjx2                                      0/1     Running             9 (5m32s ago)   27m
kube-system      cilium-t9r7x                                      1/1     Running             0               27m
kube-system      cilium-zsmkr                                      1/1     Running             0               27m

@mysticaltech
Collaborator

mysticaltech commented Feb 20, 2024

@M4t7e @Silvest89 Any ideas on this issue?

@Silvest89
Contributor

Silvest89 commented Feb 20, 2024

@byRoadrunner What makes you think that the current implementation with Cilium has kube-proxy? My cluster is kube-proxy-free without the need to use --disable-kube-proxy

@mysticaltech
Haven't had to bootstrap a cluster from scratch in a while :P If it happens even with default settings, it will need to be looked into

@byRoadrunner
Author

byRoadrunner commented Feb 20, 2024

@Silvest89
The last time I checked it against the validation part of the docs (https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#validate-the-setup), I still got iptables results. This led me to the conclusion that the cluster is still running with kube-proxy coexisting (https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#kube-proxy-hybrid-modes). If I made a mistake, feel free to correct me, but I want the completely standalone mode, if that's possible.
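For reference, these are roughly the checks from that validation section (ds/cilium just picks one of the cilium pods):

# what Cilium itself reports about kube-proxy replacement
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement
# run on a node: with full replacement this should return nothing
iptables-save | grep KUBE-SVC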

@Silvest89
Contributor

@byRoadrunner
There is no kube-proxy pod, so that means your system is kube-proxy-free. Where did you execute the iptables command? On your own computer? :P

@byRoadrunner
Author

@Silvest89 definitely not on my own computer 😉
It's been some time since I tested, so I can't remember the exact details; I will try to test again this evening. But did you check the iptables rules on your system?

@Silvest89
Contributor

@byRoadrunner
Not so long ago, @M4t7e rewrote the Cilium part. So yes, it is kube-proxy-free when using Cilium as the CNI. No comment on the other CNIs.

@mysticaltech
Collaborator

@Silvest89 Thanks for the clarifications, will have a look.

@kube-hetzner/core FYI, if you have any ideas.

@byRoadrunner
Author

byRoadrunner commented Feb 20, 2024

@byRoadrunner Not so long ago, @M4t7e rewrote the Cilium part. So yes, it is kube-proxy-free when using Cilium as the CNI. No comment on the other CNIs.

I definitely used the latest available version for this testing, which was, and still is, v2.11.8.
My point was not that the kube-proxy replacement isn't working at all; it was that iptables rules are still present, which should not be the case when using full kube-proxy replacement mode.
iptables-save | grep KUBE-SVC
This still returned rules when I had a NodePort service running on the cluster.
Anyway, I will test this evening whether I made a mistake or can still reproduce this behaviour. As soon as I know more, I will post a follow-up 👍

@byRoadrunner
Author

@Silvest89 just to clarify, a standard installation with just changing the cni to cilium should be kube-proxy free?

@Silvest89
Contributor

@Silvest89 just to clarify, a standard installation with just changing the cni to cilium should be kube-proxy free?

Yes.

@byRoadrunner
Author

byRoadrunner commented Feb 20, 2024

Now I'm getting the same error as before/as the others, but with a completely default installation (only the CNI set to cilium).

module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller

EDIT: Ignore this, it was my fault; I forgot to increase the server_type from the defaults (which is needed for Cilium).
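For anyone else running into this: the fix was simply a bigger server_type in the nodepool definitions, something along these lines (the concrete server type and pool layout are only an example):

control_plane_nodepools = [
  {
    name        = "control-plane-fsn1",
    server_type = "cpx21", # bumped up from the small default; Cilium needs more resources
    location    = "fsn1",
    labels      = [],
    taints      = [],
    count       = 3
  }
]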

@byRoadrunner
Author

So I just tested, and when running iptables-save | grep KUBE-SVC on the node where the services are running, I get the following:

k3s-agent-large-qtr:~ # iptables-save | grep KUBE-SVC
:KUBE-SVC-E3IBCFULSWKQCT47 - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-JD5MR3NA4I4DYORP - [0:0]
:KUBE-SVC-L65ENXXZWWSAPRCR - [0:0]
:KUBE-SVC-LODJXQNF3DWSNB7B - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-RY6ZSH2GAUYGLHMF - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
:KUBE-SVC-UIRTXPNS5NKAPNTY - [0:0]
:KUBE-SVC-UZ2GNDIHHRV7XITW - [0:0]
:KUBE-SVC-Z4ANX4WAEWEBLCTM - [0:0]
:KUBE-SVC-ZUD4L6KQKCHD52W4 - [0:0]
-A KUBE-EXT-L65ENXXZWWSAPRCR -j KUBE-SVC-L65ENXXZWWSAPRCR
-A KUBE-EXT-LODJXQNF3DWSNB7B -j KUBE-SVC-LODJXQNF3DWSNB7B
-A KUBE-EXT-UIRTXPNS5NKAPNTY -j KUBE-SVC-UIRTXPNS5NKAPNTY
-A KUBE-SERVICES -d 10.43.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -d 10.43.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES -d 10.43.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -d 10.43.188.115/32 -p tcp -m comment --comment "kube-system/hcloud-csi-controller-metrics:metrics cluster IP" -m tcp --dport 9189 -j KUBE-SVC-RY6ZSH2GAUYGLHMF
-A KUBE-SERVICES -d 10.43.33.175/32 -p tcp -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics cluster IP" -m tcp --dport 9189 -j KUBE-SVC-UZ2GNDIHHRV7XITW
-A KUBE-SERVICES -d 10.43.186.42/32 -p tcp -m comment --comment "traefik/traefik:web cluster IP" -m tcp --dport 80 -j KUBE-SVC-UIRTXPNS5NKAPNTY
-A KUBE-SERVICES -d 10.43.186.42/32 -p tcp -m comment --comment "traefik/traefik:websecure cluster IP" -m tcp --dport 443 -j KUBE-SVC-LODJXQNF3DWSNB7B
-A KUBE-SERVICES -d 10.43.132.82/32 -p tcp -m comment --comment "cert-manager/cert-manager-webhook:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-ZUD4L6KQKCHD52W4
-A KUBE-SERVICES -d 10.43.91.27/32 -p tcp -m comment --comment "cert-manager/cert-manager:tcp-prometheus-servicemonitor cluster IP" -m tcp --dport 9402 -j KUBE-SVC-E3IBCFULSWKQCT47
-A KUBE-SERVICES -d 10.43.33.192/32 -p tcp -m comment --comment "kube-system/metrics-server:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-Z4ANX4WAEWEBLCTM
-A KUBE-SERVICES -d 10.43.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -d 10.43.252.212/32 -p tcp -m comment --comment "default/my-nginx cluster IP" -m tcp --dport 80 -j KUBE-SVC-L65ENXXZWWSAPRCR
-A KUBE-SVC-E3IBCFULSWKQCT47 ! -s 10.42.0.0/16 -d 10.43.91.27/32 -p tcp -m comment --comment "cert-manager/cert-manager:tcp-prometheus-servicemonitor cluster IP" -m tcp --dport 9402 -j KUBE-MARK-MASQ
-A KUBE-SVC-E3IBCFULSWKQCT47 -m comment --comment "cert-manager/cert-manager:tcp-prometheus-servicemonitor -> 10.42.0.46:9402" -j KUBE-SEP-I2YMRF6X5XNTHNZY
-A KUBE-SVC-ERIFXISQEP7F7OF4 ! -s 10.42.0.0/16 -d 10.43.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp -> 10.42.5.127:53" -j KUBE-SEP-YJQITTE5EFTFYMQA
-A KUBE-SVC-JD5MR3NA4I4DYORP ! -s 10.42.0.0/16 -d 10.43.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics -> 10.42.5.127:9153" -j KUBE-SEP-UQZEKWHFV7M5EZRG
-A KUBE-SVC-L65ENXXZWWSAPRCR ! -s 10.42.0.0/16 -d 10.43.252.212/32 -p tcp -m comment --comment "default/my-nginx cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SVC-L65ENXXZWWSAPRCR -m comment --comment "default/my-nginx -> 10.42.0.29:80" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-7GYOAUIAMQUOKVDR
-A KUBE-SVC-L65ENXXZWWSAPRCR -m comment --comment "default/my-nginx -> 10.42.4.79:80" -j KUBE-SEP-IIS32MVGMNZKV3T6
-A KUBE-SVC-LODJXQNF3DWSNB7B ! -s 10.42.0.0/16 -d 10.43.186.42/32 -p tcp -m comment --comment "traefik/traefik:websecure cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-LODJXQNF3DWSNB7B -m comment --comment "traefik/traefik:websecure -> 10.42.0.145:8443" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-CB7E2YRDT4Z4QBTW
-A KUBE-SVC-LODJXQNF3DWSNB7B -m comment --comment "traefik/traefik:websecure -> 10.42.3.82:8443" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YIDA72PITHMP23YL
-A KUBE-SVC-LODJXQNF3DWSNB7B -m comment --comment "traefik/traefik:websecure -> 10.42.5.85:8443" -j KUBE-SEP-FBV2XYRHLSEUTYAL
-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s 10.42.0.0/16 -d 10.43.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 10.253.0.101:6443" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-DYW6NS6B3Z6DCBYV
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 10.254.0.101:6443" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-C4RBGSB4EWYWEBUX
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 10.255.0.101:6443" -j KUBE-SEP-4VCRLYVIMEEKDPZE
-A KUBE-SVC-RY6ZSH2GAUYGLHMF ! -s 10.42.0.0/16 -d 10.43.188.115/32 -p tcp -m comment --comment "kube-system/hcloud-csi-controller-metrics:metrics cluster IP" -m tcp --dport 9189 -j KUBE-MARK-MASQ
-A KUBE-SVC-RY6ZSH2GAUYGLHMF -m comment --comment "kube-system/hcloud-csi-controller-metrics:metrics -> 10.42.5.54:9189" -j KUBE-SEP-66GUL3YV6P7YZ4PQ
-A KUBE-SVC-TCOU7JCQXEZGVUNU ! -s 10.42.0.0/16 -d 10.43.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns -> 10.42.5.127:53" -j KUBE-SEP-EOVD533DIZYTLRLX
-A KUBE-SVC-UIRTXPNS5NKAPNTY ! -s 10.42.0.0/16 -d 10.43.186.42/32 -p tcp -m comment --comment "traefik/traefik:web cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SVC-UIRTXPNS5NKAPNTY -m comment --comment "traefik/traefik:web -> 10.42.0.145:8000" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-7GIJ33LAD7V5FOMW
-A KUBE-SVC-UIRTXPNS5NKAPNTY -m comment --comment "traefik/traefik:web -> 10.42.3.82:8000" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-EF5ZOLPEVQ6KPKV5
-A KUBE-SVC-UIRTXPNS5NKAPNTY -m comment --comment "traefik/traefik:web -> 10.42.5.85:8000" -j KUBE-SEP-KEEKTLA2JCCDM4TY
-A KUBE-SVC-UZ2GNDIHHRV7XITW ! -s 10.42.0.0/16 -d 10.43.33.175/32 -p tcp -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics cluster IP" -m tcp --dport 9189 -j KUBE-MARK-MASQ
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.0.227:9189" -m statistic --mode random --probability 0.12500000000 -j KUBE-SEP-OTIHTLUJMKCO6SJF
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.1.23:9189" -m statistic --mode random --probability 0.14285714272 -j KUBE-SEP-JPR2HJKAN3EI7DKT
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.2.97:9189" -m statistic --mode random --probability 0.16666666651 -j KUBE-SEP-LGKW3HMYZ2EVIUFP
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.3.55:9189" -m statistic --mode random --probability 0.20000000019 -j KUBE-SEP-OWAFQ4MLCJKHS6VS
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.4.67:9189" -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-WXXIE2DBZOKGLXOH
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.5.96:9189" -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-H6Q4WP6KZPHOWVUC
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.6.191:9189" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YL5OSEYGQQS3VPSO
-A KUBE-SVC-UZ2GNDIHHRV7XITW -m comment --comment "kube-system/hcloud-csi-node-metrics:metrics -> 10.42.7.160:9189" -j KUBE-SEP-N5CVT4WDXUVTBOM4
-A KUBE-SVC-Z4ANX4WAEWEBLCTM ! -s 10.42.0.0/16 -d 10.43.33.192/32 -p tcp -m comment --comment "kube-system/metrics-server:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-Z4ANX4WAEWEBLCTM -m comment --comment "kube-system/metrics-server:https -> 10.42.5.140:10250" -j KUBE-SEP-ELRRLVYKTAQW7W2Q
-A KUBE-SVC-ZUD4L6KQKCHD52W4 ! -s 10.42.0.0/16 -d 10.43.132.82/32 -p tcp -m comment --comment "cert-manager/cert-manager-webhook:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-ZUD4L6KQKCHD52W4 -m comment --comment "cert-manager/cert-manager-webhook:https -> 10.42.3.106:10250" -j KUBE-SEP-RTVYB7WCGOMJZGQE

But according to the Cilium documentation (https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#validate-the-setup), this should just return an empty result.

@mysticaltech
Collaborator

mysticaltech commented Feb 21, 2024

@byRoadrunner How do you do the kube-proxy replacement? Via the cilium_values var? It should be done there! See cilium-values.yaml for the available options; you can find the link in kube.tf.example.

Then look at the examples section in the readme for how to get info about the Cilium install; it should say kube-proxy-free mode or something similar when you run that.
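Roughly something like this in kube.tf (just to illustrate where the values go; check cilium-values.yaml for the actual defaults):

cilium_values = <<EOT
kubeProxyReplacement: true
# ... any other Cilium Helm values ...
EOT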

@byRoadrunner
Author

byRoadrunner commented Feb 21, 2024

@mysticaltech kube-proxy replacement is already the default in the Helm values that are deployed:

kubeProxyReplacement: true

You are right that it says KubeProxyReplacement: True when running cilium status. But that's not the problem I'm having.

In my previous comments I mentioned the hybrid mode and the validation steps provided by Cilium: if there are still KUBE-SVC rules in iptables, something is not right, and we would be running in hybrid mode, which I don't want.
(screenshot of the cilium status output)

@M4t7e
Contributor

M4t7e commented Feb 21, 2024

Hey @byRoadrunner, you're right in your assumption that the current replacement is a hybrid solution. kube-proxy is still running in the background and manages a few functionalities.

I'm currently working on a new PR to update Cilium to the 1.15 release, and I can include the full kube-proxy replacement as well. I already have a working setup, but I want to test a few more things. I'll probably file the PR tomorrow.

@mysticaltech
Collaborator

Great to hear from you @M4t7e 🙏

@byRoadrunner
Author

@M4t7e works like a charm, no more KUBE-SVC rules, thanks!

@maggie44
Contributor

maggie44 commented Sep 10, 2024

@Silvest89 just to clarify, a standard installation with just changing the cni to cilium should be kube-proxy free?

Yes.

@Silvest89 @M4t7e Is this part of the README no longer true, then:

Cilium supports full kube-proxy replacement. Cilium runs by default in hybrid kube-proxy replacement mode. To achieve a completely kube-proxy-free cluster, set disable_kube_proxy = true.

Is this related? #1267

I am looking to upgrade an existing cluster from the current default Flannel to Cilium, and it's a bit confusing what the config should be.

@M4t7e
Contributor

M4t7e commented Sep 11, 2024

@maggie44 I added a comment to the issue: #1267 (comment)

Cilium was not properly configured to take over the full kube-proxy functionality. If you don't craft your own cilium_values, it should work by default. Otherwise, you must ensure the configuration is properly done on your own.

I am looking to upgrade an existing cluster from the current default flannel to Cilium and its a bit confusing what the config should be.

If you want to replace kube-proxy with Cilium by setting disable_kube_proxy = true, then Cilium needs this configured to take over the full kube-proxy functionality:

k8sServiceHost: "127.0.0.1"
k8sServicePort: "6444"

This is already the default configuration used when you do not specify custom cilium_values:

# Access to Kube API Server (mandatory if kube-proxy is disabled)
k8sServiceHost: "127.0.0.1"
k8sServicePort: "6444"
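Putting it together, a minimal sketch of a kube-proxy-free setup in kube.tf (the surrounding module settings are illustrative, and the cilium_values block is only needed if you override the defaults):

module "kube-hetzner" {
  # ... provider, network, nodepools, etc. ...

  cni_plugin         = "cilium"
  disable_kube_proxy = true

  # only needed when supplying custom cilium_values; keep these lines in that case
  cilium_values = <<EOT
kubeProxyReplacement: true
# Access to Kube API Server (mandatory if kube-proxy is disabled)
k8sServiceHost: "127.0.0.1"
k8sServicePort: "6444"
EOT
}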
