-
Couldn't load subscription status.
- Fork 4
[PWCI] "net/tap: add network namespace support" #248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add functions to get/set link flags, MAC address, and MTU using netlink RTM_GETLINK/RTM_SETLINK messages instead of ioctl. These will be used in the next commits for a more robust solution that does not rely on interface names. Signed-off-by: Robin Jarry <rjarry@redhat.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
Prepare to replace ioctl with netlink by renaming enum ioctl_mode to ctrl_mode and wrapping tap_ioctl with tap_ctrl. Signed-off-by: Robin Jarry <rjarry@redhat.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
Make netlink socket available unconditionally, not just for rte_flow. Use netlink for get/set operations on link flags, MAC, and MTU when available. Fall back to ioctl if netlink socket creation fails. Signed-off-by: Robin Jarry <rjarry@redhat.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
When an interface is moved to another network namespace, the kernel sends RTM_DELLINK. Detect this case by using TUNGETDEVNETNS ioctl on the keep-alive fd. If successful, the interface still exists but in a different namespace. To handle this, temporarily switch to the new namespace using setns(), query the new ifindex, recreate netlink and LSC interrupt sockets in that namespace, then switch back. Replace the old netlink socket with the new one so subsequent operations work in the target namespace. This allows the driver to track interfaces across namespace changes without losing control. Signed-off-by: Robin Jarry <rjarry@redhat.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
Reviewer's GuideThis PR refactors the TAP PMD to abstract interface control via a new tap_ctrl layer that uses netlink when available (falling back to ioctl), implements support for detecting and handling network namespace moves, and adds a suite of netlink helper functions for flag, MTU, and MAC management. Sequence diagram for TAP PMD interface control abstraction (netlink fallback to ioctl)sequenceDiagram
participant App
participant TAP_PMD
participant Netlink
participant Ioctl
App->>TAP_PMD: Request interface control (e.g., set MAC, MTU)
TAP_PMD->>Netlink: Try netlink operation (tap_ctrl)
alt Netlink available
Netlink-->>TAP_PMD: Success/failure
else Netlink unavailable
TAP_PMD->>Ioctl: Fallback to ioctl operation
Ioctl-->>TAP_PMD: Success/failure
end
TAP_PMD-->>App: Return result
Sequence diagram for network namespace move detection and handlingsequenceDiagram
participant Kernel
participant TAP_PMD
participant Netlink
participant Namespace
Kernel->>Netlink: RTM_DELLINK event (interface moved)
Netlink->>TAP_PMD: Notify of RTM_DELLINK
TAP_PMD->>Namespace: Query new netns via TUNGETDEVNETNS
TAP_PMD->>Namespace: setns() to new namespace
TAP_PMD->>Netlink: Recreate netlink socket in new namespace
TAP_PMD->>Namespace: setns() back to original namespace
TAP_PMD-->>Netlink: Update ifindex and sockets
Class diagram for TAP PMD interface control abstraction and netlink helpersclassDiagram
class pmd_internals {
int remote_if_index
int if_index
int ioctl_sock
int nlsk_fd
...
}
class tap_ctrl {
+tap_ctrl(pmd, request, ifr, set, mode)
}
class tap_nl_ctrl {
+tap_nl_ctrl(pmd, request, ifr, set, mode)
}
class tap_ioctl {
+tap_ioctl(pmd, request, ifr, set, mode)
}
class NetlinkHelpers {
+tap_nl_get_link_flags()
+tap_nl_set_link_flags()
+tap_nl_set_link_mtu()
+tap_nl_set_link_mac()
+tap_nl_get_link_mac()
}
pmd_internals --> tap_ctrl
tap_ctrl --> tap_nl_ctrl
tap_ctrl --> tap_ioctl
tap_nl_ctrl --> NetlinkHelpers
Class diagram for network namespace change handlingclassDiagram
class pmd_internals {
int if_index
int nlsk_fd
int ka_fd
char name[]
...
}
class tap_netns_change {
+tap_netns_change(dev)
}
pmd_internals --> tap_netns_change
tap_netns_change : - open("/proc/self/ns/net")
tap_netns_change : - setns(netns_fd, CLONE_NEWNET)
tap_netns_change : - update if_index
tap_netns_change : - recreate netlink socket
tap_netns_change : - setns(orig_netns_fd, CLONE_NEWNET)
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
WalkthroughThis pull request introduces a netlink-based control path abstraction layer for the TAP PMD driver, adding netlink socket management, namespace awareness, and new link management APIs while maintaining backward compatibility with existing ioctl-based operations through a fallback mechanism. Changes
Sequence DiagramsequenceDiagram
participant caller as PMD Control Path
participant ctrl as tap_ctrl() wrapper
participant nlctrl as tap_nl_ctrl() netlink
participant ioctrl as tap_ioctl() ioctl
participant kernel as Kernel
caller->>ctrl: tap_ctrl(request, mode)
alt nlsk_fd available
ctrl->>nlctrl: tap_nl_ctrl(request, mode)
nlctrl->>kernel: netlink: RTM_SETLINK / RTM_GETLINK
kernel-->>nlctrl: response + attributes
nlctrl-->>ctrl: result (0 or -1)
else nlsk_fd unavailable (fallback)
ctrl->>ioctrl: tap_ioctl(request, mode)
ioctrl->>kernel: ioctl(SIOCSIFFLAGS / SIOCGIFMTU / ...)
kernel-->>ioctrl: result
ioctrl-->>ctrl: result (0 or -1)
end
ctrl-->>caller: return status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
drivers/net/tap/tap_netlink.h (1)
13-13: Header API additions are fine; consider normalizing ifindex typesPrototypes look correct. For consistency with kernel structs, consider using int for ifindex to match struct ifinfomsg.ifi_index across header/impl in a follow-up.
Also applies to: 41-47
drivers/net/tap/tap_netlink.c (1)
416-471: Netlink link helpers LGTMParsing and RTM_[GET|SET]LINK usage are correct; errno propagation is sane. Minor nit: consider sizing the attribute buffer via RTA_SPACE(...) instead of a fixed 64 bytes for future extensibility.
Also applies to: 486-521, 538-563, 577-608, 623-658, 673-704
drivers/net/tap/rte_eth_tap.c (1)
1905-1952: Make tap_lsc_intr_handle_set idempotent when set==1If an FD is already present, setting a new one should first clean the old one to avoid leaks. You can optionally add:
@@ static int tap_lsc_intr_handle_set(struct rte_eth_dev *dev, int set) - if (set) { + if (set) { + if (rte_intr_fd_get(pmd->intr_handle) >= 0) + (void)tap_lsc_intr_handle_set(dev, 0); /* * Subscribe to RTMGRP_LINK ... */This complements the netns change fix and hardens repeated enable paths.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
drivers/net/tap/rte_eth_tap.c(24 hunks)drivers/net/tap/rte_eth_tap.h(1 hunks)drivers/net/tap/tap_netlink.c(2 hunks)drivers/net/tap/tap_netlink.h(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
drivers/net/tap/tap_netlink.h (1)
drivers/net/tap/tap_netlink.c (5)
tap_nl_get_link_flags(486-521)tap_nl_set_link_flags(538-562)tap_nl_set_link_mtu(577-608)tap_nl_set_link_mac(673-704)tap_nl_get_link_mac(623-658)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (2)
drivers/net/tap/rte_eth_tap.h (1)
80-80: Unconditional nlsk_fd field looks goodInitialized to -1 and properly closed in tap_dev_close; no ABI concerns for internal struct. Please confirm no stale references remain under HAVE_TCA_FLOWER blocks.
drivers/net/tap/rte_eth_tap.c (1)
931-933: Control-path migration to tap_ctrl is consistentCall sites correctly switch from ioctl to tap_ctrl with proper mode selection; fallbacks are preserved. After error-propagation fix in tap_nl_ctrl, behavior parity with ioctl is maintained.
Also applies to: 941-943, 1337-1345, 1359-1370, 1388-1399, 1418-1429, 1448-1459, 1475-1516, 1763-1764, 2201-2253, 2281-2306
| #include <net/if.h> | ||
| #include <linux/if_tun.h> | ||
| #include <linux/if_ether.h> | ||
| #include <linux/sched.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include the correct header for setns()
setns() is declared in <sched.h>, not <linux/sched.h>. Add <sched.h> to avoid implicit declaration (often a build error with -Werror). Keeping <linux/sched.h> only for CLONE_NEWNET is unnecessary; <sched.h> provides the prototype and CLONE_NEWNET as well on glibc systems.
Apply:
-#include <linux/sched.h>
+#include <sched.h>📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #include <linux/sched.h> | |
| #include <sched.h> |
🤖 Prompt for AI Agents
In drivers/net/tap/rte_eth_tap.c around line 40, the code currently includes
<linux/sched.h> but setns() is declared in <sched.h>; add an #include <sched.h>
and remove the #include <linux/sched.h> (or at minimum replace it) so the
setns() prototype is available and CLONE_NEWNET is still defined on glibc
systems, avoiding implicit-declaration build errors.
| static int | ||
| tap_nl_ctrl(struct pmd_internals *pmd, unsigned long request, | ||
| struct ifreq *ifr, int set, enum ctrl_mode mode) | ||
| { | ||
| bool remote = pmd->remote_if_index && (mode == REMOTE_ONLY || mode == LOCAL_AND_REMOTE); | ||
| struct rte_ether_addr *mac; | ||
| int ret = 0; | ||
|
|
||
| switch (request) { | ||
| case SIOCSIFFLAGS: | ||
| if (mode == LOCAL_ONLY || mode == LOCAL_AND_REMOTE) { | ||
| ret = tap_nl_set_link_flags(pmd->nlsk_fd, pmd->if_index, | ||
| ifr->ifr_flags, set); | ||
| if (ret < 0) | ||
| return ret; | ||
| } | ||
| if (remote) | ||
| ret = tap_nl_set_link_flags(pmd->nlsk_fd, pmd->remote_if_index, | ||
| ifr->ifr_flags, set); | ||
| break; | ||
|
|
||
| case SIOCGIFFLAGS: | ||
| if (mode == REMOTE_ONLY && pmd->remote_if_index) { | ||
| unsigned int flags = 0; | ||
| ret = tap_nl_get_link_flags(pmd->nlsk_fd, pmd->remote_if_index, &flags); | ||
| if (ret == 0) | ||
| ifr->ifr_flags = flags; | ||
| } else { | ||
| unsigned int flags = 0; | ||
| ret = tap_nl_get_link_flags(pmd->nlsk_fd, pmd->if_index, &flags); | ||
| if (ret == 0) | ||
| ifr->ifr_flags = flags; | ||
| } | ||
| break; | ||
|
|
||
| case SIOCGIFHWADDR: | ||
| mac = (struct rte_ether_addr *)ifr->ifr_hwaddr.sa_data; | ||
| if (mode == REMOTE_ONLY && pmd->remote_if_index) { | ||
| ret = tap_nl_get_link_mac(pmd->nlsk_fd, pmd->remote_if_index, mac); | ||
| if (ret == 0) | ||
| ifr->ifr_hwaddr.sa_family = AF_LOCAL; | ||
| } else { | ||
| ret = tap_nl_get_link_mac(pmd->nlsk_fd, pmd->if_index, mac); | ||
| if (ret == 0) | ||
| ifr->ifr_hwaddr.sa_family = AF_LOCAL; | ||
| } | ||
| break; | ||
|
|
||
| case SIOCSIFHWADDR: | ||
| mac = (struct rte_ether_addr *)ifr->ifr_hwaddr.sa_data; | ||
| if (mode == LOCAL_ONLY || mode == LOCAL_AND_REMOTE) { | ||
| ret = tap_nl_set_link_mac(pmd->nlsk_fd, pmd->if_index, mac); | ||
| if (ret < 0) | ||
| return ret; | ||
| } | ||
| if (remote) | ||
| ret = tap_nl_set_link_mac(pmd->nlsk_fd, pmd->remote_if_index, mac); | ||
| break; | ||
|
|
||
| case SIOCSIFMTU: | ||
| if (mode == LOCAL_ONLY || mode == LOCAL_AND_REMOTE) { | ||
| ret = tap_nl_set_link_mtu(pmd->nlsk_fd, pmd->if_index, ifr->ifr_mtu); | ||
| if (ret < 0) | ||
| return ret; | ||
| } | ||
| if (remote) | ||
| ret = tap_nl_set_link_mtu(pmd->nlsk_fd, pmd->remote_if_index, ifr->ifr_mtu); | ||
| break; | ||
|
|
||
| default: | ||
| TAP_LOG(WARNING, "%s: unsupported netlink request", pmd->name); | ||
| return -EINVAL; | ||
| } | ||
|
|
||
| if (ret < 0) { | ||
| TAP_LOG(DEBUG, "%s: netlink %s failed: %s(%d)", pmd->name, | ||
| tap_ctrl_req2str(request), strerror(errno), errno); | ||
| return -errno; | ||
| } | ||
|
|
||
| return 0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normalize error returns to negative errno in tap_nl_ctrl
Early returns propagate -1 instead of -errno, unlike tap_ioctl. This breaks callers that rely on negative errno codes.
Apply:
@@
- if (ret < 0)
- return ret;
+ if (ret < 0)
+ return -errno;
@@
- if (ret < 0)
- return ret;
+ if (ret < 0)
+ return -errno;
@@
- if (ret < 0)
- return ret;
+ if (ret < 0)
+ return -errno;Optionally, capture errno into a local before returning to avoid clobbering.
🤖 Prompt for AI Agents
In drivers/net/tap/rte_eth_tap.c around lines 778 to 859, the function
tap_nl_ctrl currently returns ret directly on early error (e.g. "if (ret < 0)
return ret;"), which propagates -1 instead of a canonical negative errno; change
those early returns to capture errno into a local int (e.g. int err = errno;)
and return -err; update every place inside the switch that does "if (ret < 0)
return ret;" (SIOCSIFFLAGS, SIOCSIFHWADDR, SIOCSIFMTU cases) to use this
pattern, and if any code calls other functions that may clobber errno before
returning, capture errno immediately after the failing call before invoking any
other function.
| tap_netns_change(struct rte_eth_dev *dev) | ||
| { | ||
| struct pmd_internals *pmd = dev->data->dev_private; | ||
| int netns_fd, orig_netns_fd, new_nlsk_fd; | ||
|
|
||
| netns_fd = ioctl(pmd->ka_fd, TUNGETDEVNETNS); | ||
| if (netns_fd < 0) { | ||
| TAP_LOG(INFO, "%s: interface deleted", pmd->name); | ||
| return 0; | ||
| } | ||
|
|
||
| /* Interface was moved to another namespace */ | ||
| pmd->if_index = 0; | ||
|
|
||
| /* Save current namespace */ | ||
| orig_netns_fd = open("/proc/self/ns/net", O_RDONLY); | ||
| if (orig_netns_fd < 0) { | ||
| TAP_LOG(ERR, "%s: failed to open original netns: %s", | ||
| pmd->name, strerror(errno)); | ||
| close(netns_fd); | ||
| return -1; | ||
| } | ||
|
|
||
| /* Switch to new namespace */ | ||
| if (setns(netns_fd, CLONE_NEWNET) < 0) { | ||
| TAP_LOG(ERR, "%s: failed to enter new netns: %s", | ||
| pmd->name, strerror(errno)); | ||
| close(netns_fd); | ||
| close(orig_netns_fd); | ||
| return -1; | ||
| } | ||
|
|
||
| /* | ||
| * Update ifindex by querying interface name. | ||
| * The interface now has a new ifindex in the new namespace. | ||
| */ | ||
| pmd->if_index = if_nametoindex(pmd->name); | ||
|
|
||
| /* Recreate netlink socket in new namespace */ | ||
| new_nlsk_fd = tap_nl_init(0); | ||
|
|
||
| /* Recreate LSC interrupt netlink socket in new namespace */ | ||
| rte_intr_callback_unregister_pending(pmd->intr_handle, tap_dev_intr_handler, dev, NULL); | ||
| if (tap_lsc_intr_handle_set(dev, 1) < 0) | ||
| TAP_LOG(WARNING, "%s: failed to recreate LSC interrupt socket", | ||
| pmd->name); | ||
|
|
||
| /* Switch back to original namespace */ | ||
| if (setns(orig_netns_fd, CLONE_NEWNET) < 0) | ||
| TAP_LOG(ERR, "%s: failed to return to original netns: %s", | ||
| pmd->name, strerror(errno)); | ||
|
|
||
| close(orig_netns_fd); | ||
| close(netns_fd); | ||
|
|
||
| if (pmd->if_index == 0) { | ||
| TAP_LOG(WARNING, "%s: interface moved to another namespace, " | ||
| "failed to get new ifindex", | ||
| pmd->name); | ||
| if (new_nlsk_fd >= 0) | ||
| close(new_nlsk_fd); | ||
| return -1; | ||
| } | ||
|
|
||
| if (new_nlsk_fd < 0) { | ||
| TAP_LOG(WARNING, "%s: failed to recreate netlink socket in new namespace", | ||
| pmd->name); | ||
| return -1; | ||
| } | ||
|
|
||
| /* Close old netlink socket and replace with new one */ | ||
| if (pmd->nlsk_fd >= 0) | ||
| tap_nl_final(pmd->nlsk_fd); | ||
| pmd->nlsk_fd = new_nlsk_fd; | ||
|
|
||
| TAP_LOG(INFO, "%s: interface moved to another namespace, new ifindex: %u", | ||
| pmd->name, pmd->if_index); | ||
|
|
||
| return 0; | ||
| } | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespace switch: close and re-arm LSC netlink socket to avoid FD leaks; replace non-portable API
- The sequence re-registers LSC in the new netns without closing the previous intr FD, leaking the old netlink socket and risking stale events.
- rte_intr_callback_unregister_pending(...) is non-standard across DPDK versions; prefer the proven unregister loop used elsewhere in this file.
Fix by explicitly disabling LSC before enabling it again:
@@ static int tap_netns_change(struct rte_eth_dev *dev)
- /* Recreate LSC interrupt netlink socket in new namespace */
- rte_intr_callback_unregister_pending(pmd->intr_handle, tap_dev_intr_handler, dev, NULL);
- if (tap_lsc_intr_handle_set(dev, 1) < 0)
+ /* Recreate LSC interrupt netlink socket in new namespace */
+ (void)tap_lsc_intr_handle_set(dev, 0); /* closes old FD and unregisters */
+ if (tap_lsc_intr_handle_set(dev, 1) < 0)
TAP_LOG(WARNING, "%s: failed to recreate LSC interrupt socket",
pmd->name);This both unregisters the callback and closes the prior FD (clean path), preventing leaks.
🤖 Prompt for AI Agents
drivers/net/tap/rte_eth_tap.c around lines 1783-1863: the LSC netlink socket and
interrupt callback are re-created in the new netns without properly
disabling/unregistering the old callback or closing its FD, and the code uses
the non-portable rte_intr_callback_unregister_pending API; fix by first
disabling LSC on the device in the original namespace (unregister the callback
using the same unregister loop/pattern used elsewhere in this file to remove all
callbacks), close the old LSC netlink FD, then create the new netlink socket in
the new namespace and register the LSC callback; ensure error paths close
new_nlsk_fd when failing and that pmd->intr_handle state reflects the change to
avoid FD leaks or duplicate callbacks.
Auto-submission for "http://patchwork.dpdk.org/project/dpdk/list/?series=36474"
Summary by Sourcery
Add network namespace support and netlink-based control to the tap PMD, replacing direct ioctl calls and enabling dynamic interface movement handling.
New Features:
Enhancements:
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes