Skip to content

use iptables for proxying instead of userspace #3760

Closed
@thockin

Description

@thockin

I was playing with iptables yesterday, and I protoyped (well, copied from Google hits and mutated) a set of iptables rules that essentially do all the proxying for us without help from userspace. It's not urgent, but I want to file my notes before I lose them.

This has the additional nice side-effect (as far as I can tell) of preserving the source IP and being a large net simplification. Now kube-proxy would just need to sync Services -> iptables. This has the downside of not being compatible with older iptables and kernels. We had a problem with this before - at some point we need to decide just how far back in time we care about.

This can probably be optimized further, but in basic testing, I see sticky sessions working and if I comment that part out I see ~equal probability of hitting each backend. I was not able to get deterministic round-robin working properly (with --nth instead of --probability) but we could come back to that if we want.

This sets up a service portal with the backends listed below

iptables -t nat -N TESTSVC
iptables -t nat -F TESTSVC
iptables -t nat -N TESTSVC_A
iptables -t nat -F TESTSVC_A
iptables -t nat -N TESTSVC_B
iptables -t nat -F TESTSVC_B
iptables -t nat -N TESTSVC_C
iptables -t nat -F TESTSVC_C
iptables -t nat -A TESTSVC -m recent --name hostA --rcheck --seconds 1 --reap -j TESTSVC_A
iptables -t nat -A TESTSVC -m recent --name hostB --rcheck --seconds 1 --reap -j TESTSVC_B
iptables -t nat -A TESTSVC -m recent --name hostC --rcheck --seconds 1 --reap -j TESTSVC_C
iptables -t nat -A TESTSVC -m statistic --mode random --probability 0.333 -j TESTSVC_A
iptables -t nat -A TESTSVC -m statistic --mode random --probability 0.500 -j TESTSVC_B
iptables -t nat -A TESTSVC -m statistic --mode random --probability 1.000 -j TESTSVC_C

iptables -t nat -A TESTSVC_A -m recent --name hostA --set -j DNAT -p tcp --to-destination 10.244.4.6:9376
iptables -t nat -A TESTSVC_B -m recent --name hostB --set -j DNAT -p tcp --to-destination 10.244.1.15:9376
iptables -t nat -A TESTSVC_C -m recent --name hostC --set -j DNAT -p tcp --to-destination 10.244.4.7:9376

iptables -t nat -F KUBE-PORTALS-HOST
iptables -t nat -A KUBE-PORTALS-HOST -d 10.0.0.93/32 -m state --state NEW -p tcp -m tcp --dport 80 -j TESTSVC
iptables -t nat -F KUBE-PORTALS-CONTAINER
iptables -t nat -A KUBE-PORTALS-CONTAINER -d 10.0.0.93/32 -m state --state NEW -p tcp -m tcp --dport 80 -j TESTSVC

Metadata

Metadata

Assignees

No one assigned

    Labels

    priority/awaiting-more-evidenceLowest priority. Possibly useful, but not yet enough support to actually get it done.release-noteDenotes a PR that will be considered when it comes time to generate release notes.sig/networkCategorizes an issue or PR as relevant to SIG Network.sig/scalabilityCategorizes an issue or PR as relevant to SIG Scalability.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions