systemd 245 (channel 2605) breaks cilium pod to out-of-node traffic #181
Comments
The rules were added in systemd-245 and break cluster networking, e.g. cilium. Please see flatcar/Flatcar#181 for details. Signed-off-by: Thilo Fromm <thilo@kinvolk.io>
On my setup (bare metal, calico), with stable flatcar, rp_filter is all set to 1, which is more strict than 2, which happens on beta. Connectivity on stable is fine though. So it seems for me, at least for calico, rp_filter is not (alone) the culprit. Example from stable:
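For anyone comparing channels, the per-interface values can be read straight from procfs. The following is a minimal sketch, not the output the commenter posted; the function name `show_rp_filter` and its optional directory argument are illustrative assumptions.

```shell
#!/bin/sh
# Print the rp_filter value of every interface the kernel knows about.
# conf_dir defaults to the standard procfs location; the optional argument
# only exists so the function can be pointed at another directory tree.
show_rp_filter() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for f in "$conf_dir"/*/rp_filter; do
        # Skip the unexpanded glob when the directory has no entries.
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        printf '%s = %s\n' "$iface" "$(cat "$f")"
    done
}

show_rp_filter
```

Note that the kernel applies the maximum of `conf/all/rp_filter` and the per-interface value when validating sources on an interface, so the `all` entry matters as much as the individual ones.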
Today I found out that my connectivity issues were caused by a different change, which I have now resolved and which is unrelated to the Flatcar update. 2605.2.0 currently looks good in my VirtualBox setup; I will go to bare metal next.
Can we close this now or is there any remaining issue?
The issue is actually that due to the sorting, the baselayout settings get applied after |
PR here with an ordering prefix of 60, which allows us to overwrite the systemd defaults: flatcar/baselayout#13
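The ordering point can be checked locally: sysctl.d fragments are applied in lexicographic order of their file names across all directories, and a later fragment wins for the same key. The sketch below lists fragments in that order; the function name `sysctl_d_order` is an illustrative assumption, and it deliberately ignores the rule that a file in /etc replaces a same-named file in /usr/lib.

```shell
#!/bin/sh
# List sysctl.d fragments sorted by file name rather than by directory,
# which approximates the order in which systemd-sysctl applies them.
# Assumes file names contain no spaces.
sysctl_d_order() {
    for d in "$@"; do
        for f in "$d"/*.conf; do
            # Skip the unexpanded glob when a directory is empty or missing.
            [ -e "$f" ] || continue
            printf '%s %s\n' "$(basename "$f")" "$f"
        done
    done | LC_ALL=C sort -s -k1,1 | cut -d' ' -f2-
}

sysctl_d_order /usr/lib/sysctl.d /run/sysctl.d /etc/sysctl.d
```

On a systemd host, `systemd-analyze cat-config sysctl.d` should give a more faithful view, since it also prints the contents of each fragment.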
The systemd changes that we did before were not enough not only because of the ordering but also because they did not set |
This is fixed in the latest Stable release (2605.10.0).
Description
Flatcar's recent move to systemd 245 introduces this problem with Cilium: cilium/cilium#10645
A quick fix is to create a sysctl override: echo 'net.ipv4.conf.lxc*.rp_filter = 0' | sudo tee -a /etc/sysctl.d/90-override.conf && sudo systemctl restart systemd-sysctl
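Because `tee -a` appends a new line on every run, a provisioning script may prefer an idempotent variant of the quick fix. This is a minimal sketch under that assumption; the function name `write_rp_filter_override` and its optional directory argument are illustrative and not part of the original report.

```shell
#!/bin/sh
# Write the rp_filter override once instead of appending it on every run.
# The directory argument is an illustrative addition; the issue itself
# targets /etc/sysctl.d/90-override.conf directly.
write_rp_filter_override() {
    dir="${1:-/etc/sysctl.d}"
    file="$dir/90-override.conf"
    line='net.ipv4.conf.lxc*.rp_filter = 0'
    # -F matches the line as a fixed string, -x requires a whole-line match.
    if ! grep -qxF "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Typical use, as root:
#   write_rp_filter_override && systemctl restart systemd-sysctl
```

Restarting `systemd-sysctl` reapplies all fragments, as in the one-liner above.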
Or via ignition:
Impact
This breaks our production clusters after moving to the channel. After this update, a previously healthy cluster can no longer respond to inbound traffic (it can't ACK back), and components that require external connectivity no longer work. Inter-node / inter-cluster traffic stays the same.
Environment and steps to reproduce
Cilium 1.8.2
Flatcar Beta or Alpha channel 2605
Kubernetes 1.18.8
Pod egress traffic fails
Moving from previous flatcar version to new 2605 channel versions with systemd 245
The system reports no direct errors; failing egress traffic only triggers synthetic / business-specific alerts about service connectivity
Expected behavior
I would have expected these filters to be dropped from the sysctl settings, similar to Debian: