Cilium fix bpffs check #8599
Conversation
Hi @olemarkus. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
@olemarkus I see the original PR was backported all the way to 1.15. Should this be considered a blocker for 1.16.0 stable?
I don't understand why, but the 1.16 and 1.15 branches are not affected. The /sys/fs/bpf path doesn't exist at the time of the check on those branches, even if I use the same AMI. Another change added to kops after the 1.16 branch was cut must be causing this.
Just a quick update, the Cilium periodic e2e is working again:
upup/models/cloudup/resources/addons/networking.cilium.io/k8s-1.12.yaml.template (Outdated)
@rifelpet unfortunately, this does indeed break cilium back to 1.15.2. It should probably block 1.16.
Force-pushed from bdf4a39 to 60a8f34.
@olemarkus the sorting in
I am using go1.13.4 and just running
Force-pushed from 60a8f34 to 5df745d.
Force-pushed from 5df745d to d728ac8.
For some reason
Thanks for the update @olemarkus. I was just about to start checking why it worked differently on my end.
@rifelpet and I did some research on this topic yesterday and will add some questions.
```go
if os.IsNotExist(err) {
	alreadyMounted = false
} else {
	return fmt.Errorf("error checking for /sys/fs/bpf: %v", err)
}
```
I think this is still valid. If there is an unexpected error when reading from /sys/fs/bpf, the systemd mount would not help, right?
This is probably still valid. Most likely this would be due to the path being absent, and if the unit tries to mount, it will just fail. The result will be quite similar and, I suppose, equally easy to detect.
👍
nodeup/pkg/model/network.go (Outdated)
```diff
 } else {
-	alreadyMounted = true
+	alreadyMounted = int32(magic) == int32(fsdata.Type)
```
What happens if the fs type is not BPF? Would it be better to also return an error?
The Cilium code describes the logic for mounting the BPF fs, and there is also a mention of a fallback location, /run/cilium/bpffs:
https://github.com/cilium/cilium/blob/cbd63be46569751ac45060e54f6787f1286b75cd/pkg/bpf/bpffs_linux.go#L213-L224
```go
// checkOrMountDefaultLocations tries to check or mount the BPF filesystem in
// standard locations, which are:
// - /sys/fs/bpf
// - /run/cilium/bpffs
// There is a procedure of determining which directory is going to be used:
// 1. Checking whether BPFFS filesystem is mounted in /sys/fs/bpf.
// 2. If there is no mount, then mount BPFFS in /sys/fs/bpf and finish there.
// 3. If there is a BPFFS mount, finish there.
// 4. If there is a mount, but with the other filesystem, then it means that most
//    probably Cilium is running inside container which has mounted /sys/fs/bpf
//    from host, but host doesn't have proper BPFFS mount, so that mount is just
//    the empty directory. In that case, mount BPFFS under /run/cilium/bpffs.
```
If BPF is not mounted on /sys/fs/bpf, the path will still exist, which is why the old check fails. So most likely the fs type will not be bpf before we install the unit.
/run/cilium/bpffs is a fallback if the host is not mounting bpf. This will cause network disruptions every time the cilium pods restart, though, since the bpf mounts are lost.
OK, clearer now. Thanks for clarifying the fallback part.
Regarding the check: if the path exists, unix.Statfs would succeed and return SYSFS_MAGIC=0x62656572 or something similar. If there is an error other than "no such file or directory", maybe it's not a good idea to continue. The same goes for a magic number that is neither SYSFS_MAGIC nor BPF_FS_MAGIC.
Shouldn't you be able to mount bpffs on whatever path exists?
I will add back the return on error shortly. unix.Statfs("/sys/fs/bpf", &fsdata) would return an error on file not found too, wouldn't it? So the os.Stat check would be redundant.
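That point is easy to verify: statfs(2) itself fails with ENOENT when the path is missing, so a separate os.Stat pre-check adds nothing. A Linux-only sketch using the stdlib syscall package (the code under review uses unix.Statfs from golang.org/x/sys/unix, which behaves the same way here):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// statfsIsNotExist reports whether statfs(2) on path fails specifically
// because the path does not exist.
func statfsIsNotExist(path string) bool {
	var fsdata syscall.Statfs_t
	err := syscall.Statfs(path, &fsdata)
	return err != nil && os.IsNotExist(err)
}

func main() {
	// Statfs itself reports ENOENT for a missing path, so a preliminary
	// os.Stat call would be redundant.
	fmt.Println(statfsIsNotExist("/definitely/not/a/real/path")) // true
	fmt.Println(statfsIsNotExist("/"))                           // false
}
```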
Just checked and you are right, if the mount path doesn't exist, the unit won't help. Probably for any type of error BPF mounting will fail.
This part is mostly OK. There may be a rare case where the magic number is different from SYSFS_MAGIC=0x62656572 or BPF_FS_MAGIC=0xCAFE4A11 and mounting will fail. I don't think it's worth adding a check for that for now.
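For reference, both magic values in this thread come from include/uapi/linux/magic.h. A tiny illustrative helper (`describeFSType` is a hypothetical name, not code from this PR) that classifies a statfs f_type:

```go
package main

import "fmt"

// Filesystem magic numbers from include/uapi/linux/magic.h.
const (
	sysfsMagic uint32 = 0x62656572 // SYSFS_MAGIC
	bpfFSMagic uint32 = 0xCAFE4A11 // BPF_FS_MAGIC
)

// describeFSType classifies a statfs f_type value into the two
// filesystems relevant to this thread.
func describeFSType(typ int64) string {
	switch uint32(typ) {
	case sysfsMagic:
		return "sysfs"
	case bpfFSMagic:
		return "bpffs"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(describeFSType(0xCAFE4A11)) // bpffs
	fmt.Println(describeFSType(0x62656572)) // sysfs
	fmt.Println(describeFSType(0x9123683E)) // other (BTRFS_SUPER_MAGIC)
}
```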
nodeup/pkg/model/network.go (Outdated)
```diff
-	} else {
-		return fmt.Errorf("error checking for /sys/fs/bpf: %v", err)
-	}
+	return fmt.Errorf("error checking for /sys/fs/bpf: %v", err)
+} else {
```
nit: we don't need this else statement anymore.
nodeup/pkg/model/network.go (Outdated)
```diff
@@ -70,15 +70,15 @@ func (b *NetworkBuilder) Build(c *fi.ModelBuilderContext) error {
 	if networking.Cilium != nil {
 		// systemd v238 includes the bpffs mount by default; and gives an error "has a bad unit file setting" if we try to mount it again (see mount_point_is_api)
 		var alreadyMounted bool
-		_, err := os.Stat("/sys/fs/bpf")
+		// bpffs magic number
+		magic := uint32(0xCAFE4A11)
```
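Putting the pieces of the discussion together, the corrected detection could be sketched as follows, assuming the stdlib syscall package in place of golang.org/x/sys/unix; `bpffsMounted` is an illustrative helper name, not the actual kops function:

```go
package main

import (
	"fmt"
	"syscall"
)

// bpfFSMagic is BPF_FS_MAGIC from include/uapi/linux/magic.h.
const bpfFSMagic uint32 = 0xCAFE4A11

// bpffsMounted reports whether a bpffs is mounted at path. Any statfs
// failure, including a missing path, is surfaced as an error, matching
// the "add back the return on error" conclusion in the thread.
func bpffsMounted(path string) (bool, error) {
	var fsdata syscall.Statfs_t
	if err := syscall.Statfs(path, &fsdata); err != nil {
		return false, fmt.Errorf("error checking for %s: %v", path, err)
	}
	// Statfs_t.Type is a signed integer whose width varies by
	// architecture, so compare through uint32.
	return uint32(fsdata.Type) == bpfFSMagic, nil
}

func main() {
	mounted, err := bpffsMounted("/")
	fmt.Println(mounted, err) // "/" is not a bpffs mount: false <nil>
}
```

Unlike the old os.Stat check, this distinguishes "the path exists" from "bpffs is actually mounted there", which is exactly the case systemd v238's default mount creates.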
it might be worth adding this link as a comment: https://github.com/torvalds/linux/blob/v4.8/include/uapi/linux/magic.h#L80
It would also be nice to move the var and its comment to just before they're used.
/lgtm We increased the frequency of the cilium periodic job to run hourly. We'll keep an eye on it to confirm things are still stable before we merge cherry-picks of this 👍
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: olemarkus, rifelpet. The full list of commands accepted by this bot can be found here. The pull request process is described here.
I am a little confused ... https://k8s-testgrid.appspot.com/google-aws#kops-aws-cni-cilium seems to suggest this broke two days ago, but my original (insufficient) PR is from a month or two ago. Do we know what caused it to break? (Was it a newer version of cilium that makes the directory?)
Just bad timing. The e2e tests broke because of the typo fixed here #8591
Ah - thanks @olemarkus (and thanks also for fixing!)
…9-origin-release-1.15 Automated cherry pick of #8599: Properly detect that bpffs has been mounted
…9-origin-release-1.17 Automated cherry pick of #8599: Properly detect that bpffs has been mounted
…9-origin-release-1.16 Automated cherry pick of #8599: Properly detect that bpffs has been mounted
@olemarkus I'm running k8s 1.15.9 with cilium from kops 1.15.2, but I'm not having this issue. Is it clear to you why this version is not affected?
It is affected too. If you look at the cilium agent logs, you should see a warning about bpf not already being mounted on the host.
I see. I'll take a look at that. What are the effects of this? I did not notice anything weird in the cluster. Everything is operative and healthy in terms of networking.
The check introduced in #7832 is unfortunately too naïve to detect whether bpffs has been mounted, leading to the unit file never being installed. This is quite disastrous for cilium.
The check in this PR should be more robust.
/kind bug