Bug 1705686: Work around a kernel bug in the assign-macvlan code #22784
|
@danwinship Would it work to grab the macvlan's /sys/class/net//iflink when the netlink parent is 0? That should always be the correct iflink. Then we can jump into host namespace and find the parent by index. |
The code in question is in the CNI plugin, not the egress-router script, so it doesn't have access to the pod's /sys/class/net. |
|
/retest |
If the macvlan is already created, then it does have access to that directory; and if you 'ls /sys/class/net' you only see lo and macvlan0, so it's not using the host's /sys/class/net. |
|
oh... so /sys/class/net reflects the contents of your network namespace, not your mount namespace? I suppose that makes sense actually. But is getting the value from sysfs really worthwhile anyway? We don't know of any other circumstance where |
|
/retest |
|
@danwinship eh, either way. Scraping sysfs would at least get us the correct value for IFLINK instead of assuming. But it's fine the way it is now, at least until we start getting multiple links in pod namespaces. We might have to revisit this when AdditionalNetworks get more popular. |
|
/retest |
|
/retest |
|
FWIW a patch to fix the upstream kernel issue was posted by Sabrina Dubroca to netdev with the subject "[PATCH net v2] rtnetlink: always put ILFA_LINK for links with a link-netnsid" |
|
/retest |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship, dcbw |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
As seen in https://bugzilla.redhat.com/show_bug.cgi?id=1705686, there's a bug in the kernel where if you create a macvlan directly into a different namespace from its parent, and it ends up getting the same ifindex in that namespace as its parent has in its namespace, then if you ask about the macvlan's attributes via netlink later, the kernel won't report the parent link ifindex, so our golang netlink code leaves the field 0-initialized, and since we don't actually check that the field got set (because it's a macvlan, it has to have a parent) we end up using the "0" later and asking for information about a non-existent interface and getting back `EINVAL`.

There should eventually be a kernel fix, but this works around the problem on our side by assuming that if netlink reports that the macvlan has a `ParentIndex` of `0`, then that means we hit this bug and so the `ParentIndex` is actually the same as the `Index`. I'm pretty sure there shouldn't be any other case where `ParentIndex` was 0, and even if there was, the effect of this patch would be either (a) `Index` is not a valid ifindex in the root netnamespace, so we still get an error (maybe `ENOENT` instead of `EINVAL`), or (b) `Index` is valid in the root netnamespace, but it's the wrong interface, in which case we'd end up adding a route to the wrong IP address (and not adding a route to the parent interface IP address), which is not great, but is at least better than failing the pod creation entirely.

(Does not need to make it into 4.1.0 but we do want to get it merged and tested soonish so we can backport it to 3.11.)