systemd 245 (channel 2605) breaks cilium pod to out-of-node traffic #181

Closed
matoszz opened this issue Sep 4, 2020 · 7 comments
Labels
channel/alpha Issue concerns the Alpha channel. channel/beta Issue concerns the Beta channel. kind/bug Something isn't working


matoszz commented Sep 4, 2020

Description

The latest flatcar move to systemd 245 introduces this problem with cilium: cilium/cilium#10645

A quick fix is to create a sysctl override:

    echo 'net.ipv4.conf.lxc*.rp_filter = 0' | sudo tee -a /etc/sysctl.d/90-override.conf && sudo systemctl restart systemd-sysctl

Or via ignition:

    - path: /etc/sysctl.d/90-override.conf
      filesystem: root
      mode: 420
      contents:
        inline: net.ipv4.conf.lxc*.rp_filter = 0
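
To check whether the override took effect, something like the following sketch can print the current values. It assumes the Cilium interfaces use the `lxc` prefix, as in the workaround above; the function name `show_rp_filter` and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "<interface> <rp_filter value>" for every
# interface whose name starts with the given prefix (default: lxc).
show_rp_filter() {
    prefix="${1:-lxc}"
    # The second argument exists only so the function can be pointed at a
    # scratch directory; on a real node the default path is what you want.
    root="${2:-/proc/sys/net/ipv4/conf}"
    for f in "$root"/"$prefix"*/rp_filter; do
        # Skip the unexpanded pattern when no interface matches.
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$iface" "$(cat "$f")"
    done
}

show_rp_filter lxc
```

If the override is active, every listed interface should report 0.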

Impact

This broke our production clusters after moving to the channel. After the update, a previously healthy cluster can no longer respond to inbound traffic (it can't ACK back), and components that require external connectivity stop working. Inter-node / inter-cluster traffic is unaffected.

Environment and steps to reproduce

  1. Set-up:

Cilium 1.8.2
Flatcar Beta or Alpha channel 2605
Kubernetes 1.18.8

  2. Task:

Pod egress traffic fails

  3. Action(s):

Moving from the previous Flatcar version to the new 2605 channel versions with systemd 245

  4. Error:

The system reports no direct errors; the failing egress traffic only shows up as synthetic / business-specific alerts about service connectivity

Expected behavior

I would have expected these rp_filter settings to be dropped from the shipped sysctl defaults, as the Debian/Ubuntu packaging does:

$ grep -R rp_filter systemd_245.4-4ubuntu3.2/debian/patches/*
systemd_245.4-4ubuntu3.2/debian/patches/debian/UBUNTU-drop-kernel.-settings-from-sysctl-defaults-shipped.patch:-net.ipv4.conf.default.rp_filter = 2
systemd_245.4-4ubuntu3.2/debian/patches/debian/UBUNTU-drop-kernel.-settings-from-sysctl-defaults-shipped.patch:-net.ipv4.conf.*.rp_filter = 2
systemd_245.4-4ubuntu3.2/debian/patches/debian/UBUNTU-drop-kernel.-settings-from-sysctl-defaults-shipped.patch:--net.ipv4.conf.all.rp_filter
@margamanterola margamanterola added channel/alpha Issue concerns the Alpha channel. channel/beta Issue concerns the Beta channel. kind/bug Something isn't working labels Sep 10, 2020
t-lo added a commit to kinvolk-archives/systemd-legacy that referenced this issue Sep 10, 2020
The rules were added in systemd-245 and break cluster
networking, e.g. cilium. Please see
flatcar/Flatcar#181
for details.

Signed-off-by: Thilo Fromm <thilo@kinvolk.io>
Contributor

sfudeus commented Sep 11, 2020

On my setup (bare metal, Calico) running stable Flatcar, rp_filter is set to 1 on every interface except lo, which is stricter than the value of 2 that beta applies. Connectivity on stable is fine, though. So at least for Calico, rp_filter does not seem to be the (sole) culprit.

Example from stable:

net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.cali0005d567dca.rp_filter = 1
net.ipv4.conf.cali255fb8921b2.rp_filter = 1
net.ipv4.conf.cali2a55941878e.rp_filter = 1
net.ipv4.conf.cali2f32aba01f4.rp_filter = 1
net.ipv4.conf.cali3326a34accc.rp_filter = 1
net.ipv4.conf.cali45bab155a51.rp_filter = 1
net.ipv4.conf.cali4df7e55cd03.rp_filter = 1
net.ipv4.conf.cali5f7cc3b44b6.rp_filter = 1
net.ipv4.conf.cali61819286827.rp_filter = 1
net.ipv4.conf.cali6ea791f950f.rp_filter = 1
net.ipv4.conf.cali6f088619731.rp_filter = 1
net.ipv4.conf.cali78cced7ba65.rp_filter = 1
net.ipv4.conf.cali7f163f16c6d.rp_filter = 1
net.ipv4.conf.cali8af5bd79d8a.rp_filter = 1
net.ipv4.conf.cali96b4a3b4b5f.rp_filter = 1
net.ipv4.conf.cali9909ab3bc49.rp_filter = 1
net.ipv4.conf.cali9eb2eebec9e.rp_filter = 1
net.ipv4.conf.calib39d6436775.rp_filter = 1
net.ipv4.conf.calic85a7c380d3.rp_filter = 1
net.ipv4.conf.calidef0d03b873.rp_filter = 1
net.ipv4.conf.calif369183b044.rp_filter = 1
net.ipv4.conf.califacf8e13aad.rp_filter = 1
net.ipv4.conf.califfef9a2492d.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.docker0.rp_filter = 1
net.ipv4.conf.eno1.rp_filter = 1
net.ipv4.conf.eno2.rp_filter = 1
net.ipv4.conf.eno3.rp_filter = 1
net.ipv4.conf.eno4.rp_filter = 1
net.ipv4.conf.lo.rp_filter = 0
net.ipv4.conf.tunl0.rp_filter = 1
net.ipv4.conf.vethd9c0347.rp_filter = 1

Contributor

sfudeus commented Sep 13, 2020

Today I found out that my connectivity issues were caused by a different change, which I have now resolved and which is unrelated to the Flatcar update. 2605.2.0 currently looks good in my VirtualBox setup; I will move on to bare metal next.

krnowak pushed a commit to kinvolk-archives/systemd-legacy that referenced this issue Sep 28, 2020
The rules were added in systemd-245 and break cluster
networking, e.g. cilium. Please see
flatcar/Flatcar#181
for details.

Signed-off-by: Thilo Fromm <thilo@kinvolk.io>
Member

pothos commented Oct 13, 2020

Can we close this now or is there any remaining issue?

krnowak pushed a commit to kinvolk-archives/systemd-legacy that referenced this issue Oct 19, 2020
The rules were added in systemd-245 and break cluster
networking, e.g. cilium. Please see
flatcar/Flatcar#181
for details.

Signed-off-by: Thilo Fromm <thilo@kinvolk.io>
Member

pothos commented Dec 14, 2020

The actual issue is that, because of the sorting, the baselayout settings are applied after the [0-9]* files and override custom settings in files whose names start with [0-9]. We should rename the baselayout settings to start with 00-.
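
The ordering effect can be reproduced with a small sketch. It assumes that sysctl.d files are applied in plain byte-wise order of their file names, with later files winning; `baselayout.conf` is an assumed name for the distribution's drop-in, and `apply_order` is a made-up helper.

```shell
#!/bin/sh
# Hypothetical helper: print the given sysctl.d file names in the order they
# would be applied, assuming byte-wise (C locale) sorting by file name.
# The file printed last wins when several files set the same key.
apply_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# baselayout.conf is an assumed file name; digits sort before letters,
# so it lands after both numbered files.
apply_order 90-override.conf baselayout.conf 50-default.conf
```

Under these assumptions the unnumbered file comes last, so it overrides the custom `90-override.conf`; giving it a numeric prefix below 90 puts the custom file back on top.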

Member

pothos commented Dec 14, 2020

PR here with an ordering at 60, which allows us to overwrite the systemd defaults: flatcar/baselayout#13

Member

pothos commented Dec 14, 2020

The systemd changes we made earlier were insufficient, not only because of the ordering but also because they did not set net.ipv4.conf.default.rp_filter, which is the value new interfaces inherit.
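
For illustration only, a drop-in that covers both existing and future interfaces might look like the sketch below. This is not the content of the baselayout PR: the value 0 and the `lxc*` pattern are taken from the workaround at the top of this issue, pairing them with the `default` key is an assumption, and `write_rp_filter_override` is a made-up helper. As far as I understand, glob keys are expanded only against interfaces that exist when systemd-sysctl runs.

```shell
#!/bin/sh
# Hypothetical helper: write an illustrative sysctl.d drop-in to the given path.
write_rp_filter_override() {
    dest="$1"
    cat > "$dest" <<'EOF'
# Inherited by interfaces created after this file is applied.
net.ipv4.conf.default.rp_filter = 0
# Expanded against interfaces that exist when systemd-sysctl runs.
net.ipv4.conf.lxc*.rp_filter = 0
EOF
}

# Write to a scratch file for inspection rather than straight into /etc/sysctl.d.
write_rp_filter_override "${TMPDIR:-/tmp}/90-override.conf.example"
```

After reviewing the result, the file could be copied into /etc/sysctl.d under a name that sorts after the defaults it is meant to override.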

Member

pothos commented Dec 16, 2020

This is fixed in the latest Stable release (2605.10.0)

@pothos pothos closed this as completed Dec 16, 2020