Cilium with IPSec tunneling fails to start on 3033.2.2 #626
Comments
Thanks for the report. Maybe our Cilium tests need some extensions to catch these cases in the future.
In the meantime, make sure you disable automatic updates: https://www.flatcar.org/docs/latest/setup/releases/update-strategies/#disable-automatic-updates
Looked at the kernel config under
Here's a minimal repro of the failing part of Cilium: https://gist.github.com/tormath1/bf3af973de9a4232698cc42199496496 With strace, we can see that we're hitting the following change: torvalds/linux@8dce439. Investigating on the netlink side...
@tormath1 if it's failing on the changelink, then it's a kernel bug (if_id is always 0).
@jepio based on the linked commit,
Missed the lower half of the gist on my phone xD
It's really something to discuss how this (LTS) bug fix kernel update ended up in this situation, given the leading principle that userspace ABI should not be broken. Maybe this change could be reverted in the next kernel bug fix release? Edit: also posted to lkml: https://marc.info/?l=linux-kernel&m=164483790014524&w=2
In this patch: https://patchwork.kernel.org/project/netdevbpf/patch/20220106093606.3046771-6-steffen.klassert@secunet.com/ we see that `if_id` must be different from 0 for policy and state construction. With a 0 value, the creation of the dummy interface fails with the following error:
```
level=fatal msg="IPSec with tunneling requires support for xfrm state output masks (Linux 4.19 or later)." error="invalid argument" subsys=daemon
```
Related-To: flatcar/Flatcar#626
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Hi @shosti, Cilium's PR has been merged. If you're feeling adventurous, you can build a Cilium image and try deploying it by updating this value: https://github.com/cilium/cilium/blob/v1.11.0/install/kubernetes/cilium/values.yaml#L84 Otherwise, the fix should be part of the next Cilium release (1.11.3). :)
The proposal to revert upstream got rejected, but we decided to do it for Flatcar anyway: flatcar-archive/coreos-overlay#1682. This is not part of a release yet.
Hi @shosti, it seems that Cilium has already fixed the issue as of the latest release, 1.11.2. Do you think you could give it a try?
As additional info, there is a regression: cilium/cilium#19019
Edit: With 1.11.2 this is not a problem.
I think we can close this now, as a Cilium release was done and we have reverted the change anyway for next week's release.
Description
Since updating to 3033.2.2, all of my Cilium pods are in a `CrashLoopBackOff` state with the following error message:
After rolling back to 3033.2.1, Cilium starts up again.
Impact
Cilium fails -> all other pods can't get network -> general mayhem 😈
Environment and steps to reproduce
`cilium/cilium` Helm chart at 1.11.0 and the following Helm values: https://gist.github.com/shosti/f8c93283a200af0f8dd9de0f73f794bd
`kubectl logs` for a Cilium pod
Expected behavior
Cilium pods start up correctly.