Skip to content

Bug 1705686: Work around a kernel bug in the assign-macvlan code#22784

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
danwinship:macvlan-ifindex-workaround
May 16, 2019
Merged

Bug 1705686: Work around a kernel bug in the assign-macvlan code#22784
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
danwinship:macvlan-ifindex-workaround

Conversation

@danwinship
Copy link
Copy Markdown
Contributor

As seen in https://bugzilla.redhat.com/show_bug.cgi?id=1705686, there's a bug in the kernel where if you create a macvlan directly into a different namespace from its parent, and it ends up getting the same ifindex in that namespace as its parent has in its namespace, then if you ask about the macvlan's attributes via netlink later, the kernel won't report the parent link ifindex, so our golang netlink code leaves the field 0-initialized, and since we don't actually check that the field got set (because it's a macvlan, it has to have a parent) we end up using the "0" later and asking for information about a non-existent interface and getting back EINVAL.

There should eventually be a kernel fix, but this works around the problem on our side by assuming that if netlink reports that the macvlan has a ParentIndex of 0, then that means we hit this bug and so the ParentIndex is actually the same as the Index. I'm pretty sure there shouldn't be any other case where ParentIndex was 0, and even if there was, the effect of this patch would be either (a) Index is not a valid ifindex in the root netnamespace, so we still get an error (maybe ENOENT instead of EINVAL), or (b) Index is valid in the root netnamespace, but it's the wrong interface, in which case we'd end up adding a route to the wrong IP address (and not adding a route to the parent interface IP address), which is not great, but is at least better than failing the pod creation entirely.

(Does not need to make it into 4.1.0 but we do want to get it merged and tested soonish so we can backport it to 3.11.)

@danwinship danwinship added kind/bug Categorizes issue or PR as related to a bug. component/networking sig/networking labels May 6, 2019
@danwinship danwinship requested review from dcbw and squeed May 6, 2019 15:36
@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label May 6, 2019
@dcbw
Copy link
Copy Markdown
Contributor

dcbw commented May 6, 2019

@danwinship Would it work to grab the macvlan's /sys/class/net//iflink when the netlink parent is 0? That should always be the correct iflink. Then we can jump into host namespace and find the parent by index.

@danwinship
Copy link
Copy Markdown
Contributor Author

Would it work to grab the macvlan's /sys/class/net//iflink when the netlink parent is 0? That should always be the correct iflink.

The code in question is in the CNI plugin, not the egress-router script, so it doesn't have access to the pod's /sys/class/net.

@danwinship
Copy link
Copy Markdown
Contributor Author

/retest

@dcbw
Copy link
Copy Markdown
Contributor

dcbw commented May 7, 2019

The code in question is in the CNI plugin, not the egress-router script, so it doesn't have access to the pod's /sys/class/net.

If the macvlan is created already, then it does have access to that directory:

ip netns add blue
ip link add macvlan0 link enp0s31f6 netns blue type macvlan mode vepa
ip netns exec blue cat /sys/class/net/macvlan0/iflink

and if you 'ls' /sys/class/net' you only see lo and macvlan0, so it's not using the host's /sys/class/net.

@danwinship
Copy link
Copy Markdown
Contributor Author

oh... so /sys/class/net reflects the contents of your network namespace, not your mount namespace? I suppose that makes sense actually.

But is getting the value from sysfs really worthwhile anyway? We don't know of any other circumstance where IFLA_LINK will be wrong, and if there was some other situation where a kernel bug caused IFLA_LINK to be wrong, then we don't have any strong reason to believe that sysfs would not also be wrong in that hypothetical buggy case. (And it's a bunch more code; ioutil.ReadFile() and then strconv.Atoi(), with error checking for both.)

@danwinship
Copy link
Copy Markdown
Contributor Author

/retest

@dcbw
Copy link
Copy Markdown
Contributor

dcbw commented May 7, 2019

@danwinship eh, either way. Scraping sysfs would at least get us the correct value for IFLINK instead of assuming. But it's fine the way it is now, at least until we start getting multiple links in pod namespaces. We might have to revisit this when AdditionalNetworks get more popular.

@dcbw
Copy link
Copy Markdown
Contributor

dcbw commented May 7, 2019

/retest

1 similar comment
@dcbw
Copy link
Copy Markdown
Contributor

dcbw commented May 13, 2019

/retest

@dcbw
Copy link
Copy Markdown
Contributor

dcbw commented May 13, 2019

FWIW a patch to fix the upstream kernel issue was posted by Sabrina Dubroca to netdev with the subject "[PATCH net v2] rtnetlink: always put ILFA_LINK for links with a link-netnsid"

@danwinship
Copy link
Copy Markdown
Contributor Author

/retest

@dcbw
Copy link
Copy Markdown
Contributor

dcbw commented May 16, 2019

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 16, 2019
@openshift-ci-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, dcbw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 16, 2019
@openshift-bot
Copy link
Copy Markdown
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@danwinship danwinship changed the title Work around a kernel bug in the assign-macvlan code Bug 1705686: Work around a kernel bug in the assign-macvlan code May 16, 2019
@openshift-merge-robot openshift-merge-robot merged commit cf89496 into openshift:master May 16, 2019
@danwinship danwinship deleted the macvlan-ifindex-workaround branch May 29, 2019 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. component/networking kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. sig/networking size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants