--egress-selector-mode=cluster|pod are broken #7332

Closed
brandond opened this issue Apr 21, 2023 · 3 comments

brandond commented Apr 21, 2023

At some point we picked up a race condition that causes wrangler controller startup to run before the tunnel server handlers are added. If the controller's Pod callbacks aren't registered before the wrangler shared informers start, the tunnel server is unable to watch for pod changes and properly tunnel connections to pods.

When this is working and the egress-selector-mode is set to either cluster or pod, you should see messages like:

DEBU[0080] Tunnel server egress proxy updating Node k3s-agent-1 Pod IP 10.42.0.3/32
DEBU[0081] Tunnel server handing HTTP/1.1 CONNECT request for //10.42.0.3:10250 from 127.0.0.1:60070
DEBU[0081] Tunnel server egress proxy dialing 10.42.0.3:10250 via Session to k3s-agent-1

On recent releases this is not happening: the tunnel server falls back to dialing all endpoints directly, which does not work if the server is not running an agent. I suspect this was caused by #6922, which streamlined the controller startup process a bit.
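
One quick way to tell the two behaviors apart is to count how the egress proxy dialed its endpoints. The following is a minimal sketch, not part of k3s: `count_dial_modes` is a hypothetical helper name, and it only matches the debug message text quoted in this issue ("via Session to" versus "directly").

```shell
# Hypothetical helper: summarize egress proxy dial modes from k3s debug logs.
# Reads log lines on stdin and prints "tunneled=<n> direct=<n>".
count_dial_modes() {
  awk '
    /Tunnel server egress proxy dialing/ {
      # Lines ending in "via Session to <node>" went through the tunnel;
      # lines ending in "directly" are the fallback path.
      if ($0 ~ /via Session to/) tunneled++
      else if ($0 ~ /directly/) direct++
    }
    END { printf "tunneled=%d direct=%d\n", tunneled, direct }
  '
}
```

Assuming the server runs under the default `k3s` systemd unit with `debug: true`, it could be fed with `journalctl -u k3s --no-pager | count_dial_modes`. A server without a local agent that reports only direct dials would match the failure described here.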

This would have been caught sooner if we had a test for --disable-agent - we should add one.
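
A reproduction for such a test might start from a server config like the sketch below. It assumes `disable-agent` is accepted as a config-file key equivalent to the `--disable-agent` flag, and `write_agentless_config` is a hypothetical helper name; the token is a placeholder. A separate agent node would still need to join before any pod traffic can be tunneled.

```shell
# Hypothetical helper: write a k3s server config with the local agent disabled.
# $1: target directory (a real install reads /etc/rancher/k3s).
write_agentless_config() {
  dir=$1
  mkdir -p "$dir"
  cat > "$dir/config.yaml" <<EOF
debug: true
token: secret
cluster-init: true
egress-selector-mode: pod
# Assumption: config-file form of the --disable-agent flag.
disable-agent: true
EOF
}
```

For example, `write_agentless_config /tmp/k3s-test` writes the file to a scratch directory for inspection before copying it into place.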


aganesh-suse commented May 11, 2023

OS: Ubuntu 22.04
Test for egress-selector-mode: pod option:
Installed the k3s server with this config.yaml:

sudo mkdir -p /etc/rancher/k3s
sudo bash -c 'cat <<EOF > /etc/rancher/k3s/config.yaml
write-kubeconfig-mode: "0644"
debug: true
token: secret
cluster-init: true
egress-selector-mode: pod
EOF'

Installed using:

curl -fL https://get.k3s.io | INSTALL_K3S_COMMIT=8f450bafe1cad0e962e521d56eb74a38a73722c7 sh -s - server

Verified that journal log messages contained:

May 11 20:30:57 ip-172-31-38-203 k3s[69072]: time="2023-05-11T20:30:57Z" level=debug msg="Tunnel server egress proxy updating Node ip-172-31-38-203 Pod IP 10.42.0.3/32"
May 11 20:30:57 ip-172-31-38-203 k3s[69072]: time="2023-05-11T20:30:57Z" level=debug msg="Tunnel authorizer adding Pod IP 10.42.0.3/32"
May 11 20:30:59 ip-172-31-38-203 k3s[69072]: time="2023-05-11T20:30:59Z" level=debug msg="Tunnel server egress proxy updating Node ip-172-31-38-203 IP 172.31.38.203/32"
May 11 20:31:06 ip-172-31-38-203 k3s[69072]: time="2023-05-11T20:31:06Z" level=debug msg="Tunnel server egress proxy updating Node ip-172-31-38-203 Pod IP 10.42.0.5/32"
May 11 20:31:06 ip-172-31-38-203 k3s[69072]: time="2023-05-11T20:31:06Z" level=debug msg="Tunnel authorizer adding Pod IP 10.42.0.5/32"
May 11 20:31:06 ip-172-31-38-203 k3s[69072]: time="2023-05-11T20:31:06Z" level=debug msg="Tunnel server handing HTTP/1.1 CONNECT request for //10.42.0.5:10250 from 127.0.0.1:49960"
May 11 20:31:06 ip-172-31-38-203 k3s[69072]: time="2023-05-11T20:31:06Z" level=debug msg="Tunnel server egress proxy dialing 10.42.0.5:10250 directly"
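
To pull just the Pod IP registrations out of a journal like the one above, something along these lines could work. This is a sketch with a hypothetical helper name, `list_registered_pod_ips`, and it relies only on the "updating Node ... Pod IP ..." message format shown in these logs.

```shell
# Hypothetical helper: list Pod IPs the egress proxy has registered.
# Reads k3s debug log lines on stdin; prints "<node> <pod-ip-cidr>" per match.
# Node IP updates ("updating Node <node> IP ...") are intentionally skipped.
list_registered_pod_ips() {
  sed -n 's#.*Tunnel server egress proxy updating Node \([^ ]*\) Pod IP \([^ "]*\).*#\1 \2#p'
}
```

Assuming the default `k3s` systemd unit, usage would be `journalctl -u k3s --no-pager | list_registered_pod_ips`. If this prints nothing while pods are running, the Pod callbacks were probably never registered.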

@aganesh-suse

Test for egress-selector-mode: cluster option:

cat /etc/rancher/k3s/config.yaml
write-kubeconfig-mode: "0644"
debug: true
token: secret
cluster-init: true
egress-selector-mode: cluster

Verified that the journal logs had:

May 11 20:51:41 ip-172-31-38-203 k3s[72313]: time="2023-05-11T20:51:41Z" level=debug msg="Tunnel server handing HTTP/1.1 CONNECT request for //10.42.0.4:10250 from 127.0.0.1:45306"
May 11 20:51:41 ip-172-31-38-203 k3s[72313]: time="2023-05-11T20:51:41Z" level=debug msg="Tunnel server egress proxy dialing 10.42.0.4:10250 directly"
May 11 20:52:19 ip-172-31-38-203 k3s[72313]: time="2023-05-11T20:52:19Z" level=debug msg="Tunnel server egress proxy updating Node ip-172-31-38-203 IP 172.31.38.203/32"
May 11 20:56:41 ip-172-31-38-203 k3s[72313]: time="2023-05-11T20:56:41Z" level=debug msg="Tunnel server handing HTTP/1.1 CONNECT request for //10.42.0.4:10250 from 127.0.0.1:36776"
May 11 20:56:41 ip-172-31-38-203 k3s[72313]: time="2023-05-11T20:56:41Z" level=debug msg="Tunnel server egress proxy dialing 10.42.0.4:10250 directly"
May 11 20:57:26 ip-172-31-38-203 k3s[72313]: time="2023-05-11T20:57:26Z" level=debug msg="Tunnel server egress proxy updating Node ip-172-31-38-203 IP 172.31.38.203/32"
May 11 21:01:41 ip-172-31-38-203 k3s[72313]: time="2023-05-11T21:01:41Z" level=debug msg="Tunnel server handing HTTP/1.1 CONNECT request for //10.42.0.4:10250 from 127.0.0.1:36180"


jacksgt commented May 12, 2023

I can confirm that this is fixed with v1.25.8+ - thanks for the fix!
#7064
