Add masquerade mode #782

Merged: thebsdbox merged 1 commit into kube-vip:main on Apr 2, 2024

Conversation

@lou-lan (Contributor) commented Mar 8, 2024:

Additional configuration list

| Configuration Key | Type | Unit | Default | Options | Required |
| --- | --- | --- | --- | --- | --- |
| backend_health_check_interval | int | seconds | 5 | | No |
| iptables_backend | string | | | nft, legacy | No |
| masquerade_mark | string | | 0x1119 | | No |

- `iptables_backend`: Although we already have `egress_withnftables`, it is not sufficiently clear for users who want to configure masquerade mode. The newly added field does not require user specification by default; the program determines it automatically.
- `masquerade_mark`: we changed the iptables rule to `iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.113.200 --vport 8443 -j MASQUERADE`, so this configuration item has been removed (a sketch of adding that rule programmatically follows below), thanks @wyike.
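For illustration, here is a minimal sketch of appending that ipvs-match rule programmatically with coreos/go-iptables. The library choice, helper name, and example VIP/port are assumptions for the sketch, not necessarily how this PR's code is structured:

```go
package main

import (
    "github.com/coreos/go-iptables/iptables"
)

// addMasqueradeRuleForVIP appends the ipvs-match MASQUERADE rule described above:
//   iptables -t nat -A POSTROUTING -m ipvs --vaddr <vip> --vport <port> -j MASQUERADE
func addMasqueradeRuleForVIP(vip, port string) error {
    ipt, err := iptables.New()
    if err != nil {
        return err
    }
    // AppendUnique avoids inserting a duplicate rule if one is already present.
    return ipt.AppendUnique("nat", "POSTROUTING",
        "-m", "ipvs", "--vaddr", vip, "--vport", port,
        "-j", "MASQUERADE")
}

func main() {
    if err := addMasqueradeRuleForVIP("192.168.113.200", "8443"); err != nil {
        panic(err)
    }
}
```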

Required Permissions

```yaml
securityContext:
  privileged: true
```

We require the necessary permissions to set net.ipv4.vs.conntrack=1.
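The PR enables this through a sysctl helper (the diff excerpt further down shows `sysctl.WriteProcSys("/proc/sys/net/ipv4/vs/conntrack", "1")`). As a minimal standard-library sketch of the same write, assuming a privileged container where /proc/sys is writable:

```go
package main

import (
    "log"
    "os"
)

func main() {
    // Needs a privileged container so that /proc/sys is writable.
    if err := os.WriteFile("/proc/sys/net/ipv4/vs/conntrack", []byte("1"), 0o644); err != nil {
        log.Fatalf("failed to set net.ipv4.vs.conntrack=1: %v", err)
    }
    log.Println("net.ipv4.vs.conntrack set to 1")
}
```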

Usage

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: kube-vip
spec:
  containers:
    - args:
        - manager
      env:
        - name: vip_arp
          value: "True"
        - name: port
          value: "6443"
        - name: vip_cidr
          value: "32"
        - name: cp_enable
          value: "true"
        - name: cp_namespace
          value: kube-system
        - name: vip_ddns
          value: "False"
        - name: vip_leaderelection
          value: "true"
        - name: vip_leaseduration
          value: "5"
        - name: vip_renewdeadline
          value: "3"
        - name: vip_retryperiod
          value: "1"
        - name: address
          value: "172.16.25.136"
        - name: lb_enable
          value: "true"
        - name: lb_fwdmethod
          value: "masquerade"
        - name: "kubernetes_addr"
          value: "172.16.25.131:6443" # current node api-server
      image: "" 
      imagePullPolicy: IfNotPresent
      name: kube-vip
      resources: {}
      securityContext:
        privileged: true
      volumeMounts:
        - mountPath: /etc/kubernetes/admin.conf
          name: kubeconfig
  hostNetwork: true
  volumes:
    - hostPath:
        path: /etc/kubernetes/admin.conf
      name: kubeconfig
```

Test

The cluster I am testing has 3 nodes.

```
$ kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
workstation1   Ready    control-plane   47h   v1.28.7
workstation2   Ready    control-plane   41h   v1.28.7
workstation3   Ready    control-plane   39h   v1.28.7
```

172.16.25.136 is the VIP address in my cluster. It is currently located on node workstation1.

```
$ ip a | grep 136 -B 5
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:d8:0d:79 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 172.16.25.131/24 metric 100 brd 172.16.25.255 scope global dynamic ens160
       valid_lft 1293sec preferred_lft 1293sec
    inet 172.16.25.136/32 scope global ens160
$ sudo ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.16.25.136:6443 rr
  -> 172.16.25.131:6443           Masq    1      4          0
  -> 172.16.25.132:6443           Masq    1      0          0
  -> 172.16.25.133:6443           Masq    1      0          0
```

When I run `kubectl get pods -n some-not-found-ns`, I can observe the corresponding request logged by each kube-apiserver; note that the srcIP in the log is the VIP (172.16.25.136).

I0313 07:04:01.881205       1 httplog.go:132] "HTTP" verb="LIST" URI="/api/v1/namespaces/some-not-found-ns/pods?limit=500" latency="2.603368ms" userAgent="kubectl/v1.29.2 (linux/arm64) kubernetes/4b8e819" audit-ID="bb650a25-5565-413e-8616-ca7402b7861b" srcIP="172.16.25.136:34090" apf_pl="exempt" apf_fs="exempt" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_execution_time="1.926017ms" resp=200

Then I shut down workstation1, and the VIP moves to workstation3.

```
$ sudo ipvsadm -Ln
[sudo] password for node:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.16.25.136:6443 rr
  -> 172.16.25.132:6443           Masq    1      0          2
  -> 172.16.25.133:6443           Masq    1      2          0
```

In the output above, we can see that 172.16.25.131 has been removed from the list of real servers.

@lou-lan requested a review from thebsdbox as a code owner, March 8, 2024 12:48
@thebsdbox (Collaborator) commented:
Woah, we don't need to have /vendor :) as much as I love a big PR.

@lou-lan force-pushed the fix/masquerade branch 2 times, most recently from 17735c6 to 3bd1635, March 11, 2024 02:04
@lou-lan changed the title from "WIP: Add masquerade code" to "WIP: Add masquerade mode", Mar 11, 2024
@lou-lan force-pushed the fix/masquerade branch 6 times, most recently from bdafd39 to 9f793cf, March 11, 2024 13:44
@thebsdbox (Collaborator) commented:
Love this! Other than some linting, looks good.

@lou-lan (Author) commented Mar 11, 2024:

> Love this! Other than some linting, looks good.

I still need to spend some time making minor modifications and testing.

@lou-lan force-pushed the fix/masquerade branch 3 times, most recently from d413c09 to 243a28a, March 13, 2024 03:51
@lou-lan changed the title from "WIP: Add masquerade mode" to "Add masquerade mode", Mar 13, 2024
@lou-lan force-pushed the fix/masquerade branch 2 times, most recently from 0bb11ea to 425ac7a, March 13, 2024 06:52
@lou-lan (Author) commented Mar 13, 2024:

cc @thebsdbox. This PR is now ready for review.

```
@@ -350,6 +352,14 @@ var kubeVipManager = &cobra.Command{
        }
    }

    if initConfig.LoadBalancerForwardingMethod == "masquerade" {
        log.Infof("sysctl set net.ipv4.vs.conntrack to 1")
        err := sysctl.WriteProcSys("/proc/sys/net/ipv4/vs/conntrack", "1")
```
@wyike (Contributor) commented Mar 14, 2024:

Do you think it would be better (or at least nice to have) to also add `sysctl.WriteProcSys("/proc/sys/net/ipv4/ip_forward", "1")`? If it is 0, packets still cannot be forwarded.

@lou-lan (Author) commented Mar 14, 2024:

> Do you think it would be better (or at least nice to have) to also add `sysctl.WriteProcSys("/proc/sys/net/ipv4/ip_forward", "1")`? If it is 0, packets still cannot be forwarded.

Yes, if it is 0, packets will not be forwarded. If kube-vip does not handle it, users who need this mode must set it manually.

@wyike (Contributor) commented:

Shall we add it? What do you think?

@lou-lan (Author) commented:

> Shall we add it? What do you think?

I think kube-vip doesn't need to set ip_forward. When using Kubernetes, other components or scripts typically set this parameter.

@wyike (Contributor) commented:

Oh, got it. Components such as kube-proxy always need ip_forward set to 1 anyway.

@thebsdbox (Collaborator) commented:

The only case where this could be an issue is Cilium and its kube-proxy replacement.
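kube-vip itself leaves ip_forward alone, as agreed above. For a node where nothing else has enabled it, a hedged standard-library sketch of the manual check-and-set described in this thread (paths as discussed; running it requires the same privileged context):

```go
package main

import (
    "log"
    "os"
    "strings"
)

const ipForwardPath = "/proc/sys/net/ipv4/ip_forward"

func main() {
    // If ip_forward is 0, masqueraded packets cannot be forwarded to the backends.
    cur, err := os.ReadFile(ipForwardPath)
    if err != nil {
        log.Fatalf("read %s: %v", ipForwardPath, err)
    }
    if strings.TrimSpace(string(cur)) == "1" {
        log.Println("ip_forward already enabled")
        return
    }
    if err := os.WriteFile(ipForwardPath, []byte("1"), 0o644); err != nil {
        log.Fatalf("enable ip_forward: %v", err)
    }
    log.Println("ip_forward set to 1")
}
```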

```go
// addIptablesRulesForMasquerade add iptables rules for MASQUERADE
// insert example
// sudo iptables -t mangle -I PREROUTING -d 10.1.105.1 -p tcp --dport 6443 -j MARK --set-xmark 0x1119
// sudo iptables -t nat -I POSTROUTING -m mark --mark 0x1119 -j MASQUERADE
```
@wyike (Contributor) commented:

Hi Lan, just curious: I use `iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.113.200 --vport 8443 -j MASQUERADE` to configure masquerade directly, based on some material I found on the internet. Is there any additional benefit to either approach when comparing these two methods?

@lou-lan (Author) commented:

> Hi Lan, just curious: I use `iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.113.200 --vport 8443 -j MASQUERADE` to configure masquerade directly, based on some material I found on the internet. Is there any additional benefit to either approach when comparing these two methods?

I haven't used the rule you mentioned; it seems a better fit for IPVS. I will test it tomorrow.

@thebsdbox (Collaborator) commented:

I'm not familiar with this rule, can anyone shed some light?

@lou-lan (Author) commented:

> I use `iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.113.200 --vport 8443 -j MASQUERADE` to configure masquerade directly [...] Is there any additional benefit to either approach?

For kube-vip, the command `iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.113.200 --vport 8443 -j MASQUERADE` is more appropriate and eliminates the need to configure the mark parameter. I have updated the code.

@lou-lan (Author) commented:

> I'm not familiar with this rule, can anyone shed some light?

The purpose of the two iptables rules is the same: both MASQUERADE the traffic when the node holding the VIP forwards a request to the api-server on another node.

@thebsdbox (Collaborator) commented:
Apologies for the delay in review, I've been on PTO. Back now, and it's KubeCon, but I'll be as quick as I can.

Signed-off-by: lou-lan <loulan@loulan.me>
@thebsdbox merged commit f1cf044 into kube-vip:main on Apr 2, 2024
9 checks passed
```go
    // Return our created load-balancer
    return lb, nil
}

func (lb *IPVSLoadBalancer) RemoveIPVSLB() error {
    close(lb.stop)
```
@wyike (Contributor) commented Apr 9, 2024:

Hi @lou-lan, just curious: has the cleanup of iptables, IPVS, and the VIP ever worked in your setup when you delete kube-vip?


```go
func delMasqueradeRuleForVIP(ipt *iptables.IPTables, vip, comment string) error {
    err := ipt.DeleteIfExists(iptables.TableNat, iptables.ChainPOSTROUTING,
        "-d", "-m", "ipvs", "--vaddr", vip, "-j", "MASQUERADE", "-m", "comment", "--comment", comment)
```
@wyike (Contributor) commented:

One bug here: the stray "-d" in the deletion.

@lou-lan (Author) commented Apr 11, 2024:

> One bug here: the stray "-d" in the deletion.

Yes, I was negligent here. When switching from the old mark-based iptables MASQUERADE rules to the new form, I failed to remove this parameter.
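For reference, a corrected sketch of that deletion. It is written here against coreos/go-iptables with the table and chain as string literals and an illustrative comment string; kube-vip's own wrapper and constants may differ. The only substantive point is that the leftover "-d" is gone:

```go
package main

import (
    "fmt"

    "github.com/coreos/go-iptables/iptables"
)

// delMasqueradeRuleForVIP removes the POSTROUTING MASQUERADE rule for a VIP,
// matching on the IPVS virtual address instead of a destination ("-d") flag.
func delMasqueradeRuleForVIP(ipt *iptables.IPTables, vip, comment string) error {
    return ipt.DeleteIfExists("nat", "POSTROUTING",
        "-m", "ipvs", "--vaddr", vip,
        "-j", "MASQUERADE",
        "-m", "comment", "--comment", comment)
}

func main() {
    ipt, err := iptables.New()
    if err != nil {
        panic(err)
    }
    if err := delMasqueradeRuleForVIP(ipt, "192.168.113.200", "kube-vip masquerade"); err != nil {
        fmt.Println("delete failed:", err)
    }
}
```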
