network: Adds IPVLAN support #5716
8 times, most recently
May 2, 2019
Ok, so I'm going to take the
We can then rebase this branch onto master once merged and just fix the IPv4 and IPv6 addresses list to be comma separated (with optional spaces).
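As a rough illustration of the "comma separated (with optional spaces)" format, here is a minimal Python sketch of how such a list could be parsed and validated. The function name `parse_address_list` is hypothetical and this is not the LXD implementation (LXD is written in Go); skipping empty entries is a design assumption made for the example.

```python
import ipaddress


def parse_address_list(value, version):
    """Split a comma-separated address list and validate each entry.

    Hypothetical helper: `version` is 4 or 6.
    """
    addresses = []
    for item in value.split(","):
        item = item.strip()  # tolerate optional spaces around the commas
        if not item:
            continue  # assumption: ignore empty entries such as a trailing comma
        addr = ipaddress.ip_address(item)  # raises ValueError on malformed input
        if addr.version != version:
            raise ValueError(f"{item} is not an IPv{version} address")
        addresses.append(str(addr))
    return addresses
```

For example, `parse_address_list("192.168.1.200, 192.168.1.201", 4)` yields a list of the two addresses, while passing an IPv4 address with `version=6` raises `ValueError`.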
May 9, 2019
To clarify a bit, the liblxc ipvlan code sets the needed sysctls on the generated interface itself but will not alter interfaces that are outside of liblxc's control; if those aren't set to the right value, an error is raised.
@tomponline is the error handling for this good enough for LXD or would we want to improve how we surface liblxc's errors on start or alternatively have a check for those sysctls directly in LXD?
@stgraber I am glad that IPVLAN is finally in the code base, but you are too quick to discard community efforts and issues. This makes community contribution to the project almost impossible.
@stgraber the error handling could definitely be made clearer. Currently, if the required sysctls are missing, the container will not start, and you have to run the suggested lxd log retrieval command for the container to see the underlying LXC error. It's there, but takes some effort to get to.
Adding explicit checks for the same sysctls in LXD would certainly help the experience, so I can add a check for those no problem.
@s3rj1k the IPVLAN functionality has been implemented in the underlying LXC codebase so as to provide the functionality to a wider set of users. It was implemented as 3 discrete sets of functionality:
Then in LXD we made use of this new functionality to implement a simplified IPVLAN mode similar to your original pull-request and the existing MACVLAN experience in LXD.
Additionally, the specific sysctls required have been slightly modified from your original PR, as during my testing I found that only these 3 are specifically required when using l3s mode and proxy ARP/NDP:
I found that these sysctls from the original pull-request weren't specifically needed:
Finally, @s3rj1k is correct that we did originally discuss adding gratuitous ARP and NDP adverts when a container boots, so as to announce it to the network. This is slightly different to the proxy ARP/NDP mode in that the packets are sent out into the network irrespective of whether a device is asking "who has xxx?". It is primarily useful for devices on the network that have already resolved the layer 2 address of the container and cached it in their IP neighbour cache. If the container is then migrated to a different host with a different MAC address, those devices could continue to send packets to the old host because they have cached the old MAC address (until the cache entry expires or the OS notices there is no response and re-issues a "who has" request).
This could be added as an extension to the l2proxy mode in LXC (although the code I have today for it is written in Go rather than C so would need to be re-worked). This would then benefit both the IPVLAN mode (without code changes to LXD) and the future routed veth mode.
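To make the gratuitous ARP idea concrete, here is a minimal Python sketch that builds the raw Ethernet frame for an ARP announcement (an ARP request in which the sender and target IP are both the announced address). It only constructs the bytes and does not send anything. The function name is hypothetical, and this is not the Go code mentioned above, which is not shown in this thread.

```python
import socket
import struct


def build_gratuitous_arp(mac, ipv4):
    """Return a 42-byte Ethernet frame announcing that `ipv4` is at `mac`.

    Hypothetical helper: `mac` is colon-separated hex, `ipv4` is dotted quad.
    """
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ipv4)
    broadcast = b"\xff" * 6

    # Ethernet header: broadcast destination, our source MAC, EtherType ARP.
    ethernet = broadcast + mac_bytes + struct.pack("!H", 0x0806)

    # ARP header: hardware type Ethernet (1), protocol type IPv4 (0x0800),
    # hardware length 6, protocol length 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)

    # Sender MAC/IP, then a zeroed target MAC and the same IP as the target.
    arp += mac_bytes + ip_bytes + b"\x00" * 6 + ip_bytes
    return ethernet + arp
```

Actually transmitting such a frame would need a raw socket bound to the parent interface and the appropriate privileges, which is left out of this sketch.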
@tomponline Enabling IP forwarding is a good idea because of inconsistent kernel behaviour: from what I observed, older kernels need IP forwarding enabled, while newer ones enable this behaviour when IPVLAN L3s mode is enabled with proxy ARP/NDP records.
also per the kernel docs
Forwarding must be mentioned in documentation. But in my opinion this is needed in code to have defined behaviour across kernel versions.
rp_filter is needed for IPv4 when you have multiple VLANs on the parent interface and inside the CT.
At a minimum, this must also be in the documentation.
I managed to get working GARP in my test environment, so this one is simple.
The NDP (IPv6) part is not so simple (does not work in my setup).
@s3rj1k RE the ipv4 forwarding sysctls, I made a mistake and missed out the
I am going to add better error output to LXD so that it checks for those 3 sysctls directly and lets the user know if they are incorrect.
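As a rough sketch of what such a direct check could look like, the Python below reads sysctl values from `/proc/sys` and reports any that differ from the expected value. This is not the LXD implementation (which is written in Go); the function names are hypothetical, and the sysctl key used in the usage note is an illustrative placeholder, not the list of three sysctls referred to in this thread.

```python
import os


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file path.

    Limitation of this sketch: interface names that themselves contain
    dots (for example VLAN sub-interfaces) are not handled.
    """
    return os.path.join(root, *key.split("."))


def check_sysctls(expected, root="/proc/sys"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key, want in expected.items():
        path = sysctl_path(key, root)
        try:
            with open(path) as handle:
                got = handle.read().strip()
        except OSError as err:
            problems.append(f"{key}: cannot read {path}: {err.strerror}")
            continue
        if got != want:
            problems.append(f"{key}: expected {want}, found {got}")
    return problems
```

A caller could run, say, `check_sysctls({"net.ipv4.conf.eth0.forwarding": "1"})` before starting the container and show the returned messages to the user instead of requiring them to dig through the LXC log.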
I've updated my original post here:
Regarding your note about older kernels requiring additional global forwarding sysctls enabled, I thought I would have a go at re-creating this using a CentOS 7 box which uses an older kernel 3.10.0-957.5.1.el7.x86_64 (it doesn't have IPVLAN support anyway, so it's moot, but useful for testing the behaviour of proxy ARP/NDP settings).
For this experiment, the aim was to see if I could get a test host to respond to ARP and NDP requests for an IP that it didn't have on its LAN interface, but did have a route to via the lo interface. As I am not testing "routed mode" proper (as that is not required for IPVLAN l3s mode), I just needed to check that the ping packets were arriving at the LAN interface of the test node to consider the test a success (as opposed to requiring a response to be generated) - as this will indicate that ARP and NDP resolution has succeeded.
Here are the test steps I ran:
Disable proxy_arp and IPv4 forwarding globally so we have a baseline config:
Enable forwarding on the LAN interface of the node:
Add manual IP proxy entry on LAN interface (this is where IPv4 differs to IPv6 in the kernel because adding a manual proxy entry activates proxy ARP on that interface without needing to set it in the sysctls):
Add a route to the IP via the lo interface (this is to allow proxy ARP to work, using any non-LAN interface works the same):
Then run a tcpdump session:
On another device connected to the same LAN, clear the local neighbour cache and ping 192.168.1.200
We should expect to see this in the tcpdump output:
ARP resolution occurring, ICMP requests arriving at LAN interface (no ICMP reply expected though).
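The exact commands for the IPv4 steps are not preserved in this copy of the thread. As a hedged reconstruction from the prose descriptions only, the Python sketch below assembles one plausible command sequence using standard `sysctl`, `ip` and `tcpdump` invocations. The function name is hypothetical, the LAN interface name is an assumption supplied by the caller, and the original commands may well have differed (for example in how global forwarding was disabled or in the tcpdump filter).

```python
def ipv4_proxy_arp_test_commands(lan_iface, address):
    """Return the test steps as argv lists; nothing is executed here.

    Reconstruction under assumptions, not the commands from the thread.
    """
    return [
        # Baseline: proxy ARP and IPv4 forwarding disabled globally.
        ["sysctl", "-w", "net.ipv4.conf.all.proxy_arp=0"],
        ["sysctl", "-w", "net.ipv4.conf.all.forwarding=0"],
        # Enable forwarding on the LAN interface only.
        ["sysctl", "-w", f"net.ipv4.conf.{lan_iface}.forwarding=1"],
        # Manual proxy neighbour entry for the address on the LAN interface.
        ["ip", "neigh", "add", "proxy", address, "dev", lan_iface],
        # Route to the address via lo so proxy ARP has somewhere to send it.
        ["ip", "route", "add", f"{address}/32", "dev", "lo"],
        # Watch ARP and ICMP traffic on the LAN interface.
        ["tcpdump", "-n", "-i", lan_iface, "arp", "or", "icmp"],
    ]
```

Each list could be passed to `subprocess.run` as root on a disposable test host; the sketch deliberately stops short of running anything.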
Similarly for IPv6:
On another device connected to the same LAN, clear the local neighbour cache and ping 2a02:xxx:76f4:1::200
We would expect to see this:
So as IPVLAN mode doesn't actually use the host as a router (it just needs to get the packets to arrive at the LAN interface and the IPVLAN device takes it from there), these settings seem to be the only ones required, even on older kernels without IPVLAN support.
I'd be interested to know about specific kernel versions that do not behave like this.
RE rp_filter, I'd be hesitant to encourage people to disable that feature in the docs, as it is intended to filter out spoofed packets and, as I understand it, should only be disabled in situations where you have the LXD node multi-homed (as you say, on multiple VLANs perhaps). Presumably this is true whether you are using ipvlan or some other sort of container networking (such as bridging) where packets arrive at a different interface than the replies would be delivered out of. Have I misunderstood? Is there something specific to ipvlan and rp_filter?
@tomponline You are correct about rp_filter usage; I am saying this should be a documented case for when multiple VLANs are needed.
For IPv4 IP Forwarding per interface should be sufficient.
For IPv6 this is not the case, you need to enable
To check this, use IPVLAN L2/L3 (without L3S) mode with IPv6
Actually L3 mode is in fact a router mode, see https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/Documentation/networking/ipvlan.txt#L56
P.S. Do Ping6 checks against google IPv6 DNS for example. Also ND cache must be cleared on GW.
@s3rj1k Yes, the docs imply the host is performing routing, and in a way it is, but it's more like an L3 switch, because the host does not appear in MTR traces as a traditional router hop. Also, even in l3s mode, the packets are filtered by the INPUT and OUTPUT chains in iptables rather than the FORWARD chain like a normally routed packet would be.
I found this out to my disappointment when I tried to use ipvlan on my home router which has a direct ppp connection to the Internet independent of the LAN the ipvlan device was connected to. Packets always went out of the LAN interface rather than use the host's routing table.
With that in mind, I've re-tested proxy NDP with
The problem comes when you bring IPv4 into the mix, as proxy ARP does not seem to work in l2 and l3 modes (and only l3s), even with
This is why I settled on forcing l3s mode for LXD's IPVLAN.
@tomponline actually L2 mode is also usable; it does not need proxy ARP or proxy NDP, only IP forwarding. Look at the output of
I would consider adding l2 mode as an option. Works similar to MacVtap but saves you MAC table entries on a dumb hardware switch.
IPVLAN is designed to send packets ONLY from the parent interface, so it is no wonder you had issues with your home router. To overcome this you need to set up policy routing.
Plain L3 mode is useless in my opinion.
And I am actually a bit disappointed in @stgraber's behaviour: he did not mention at all any of the work that was done by me, from convincing him to actually do something about IPVLAN to spending lots of personal time rebasing, testing, and rewriting the initial PR.
May this be a warning to other members of community to think twice before contributing.
@tomponline thanks for your work.
@s3rj1k OK thanks will give l2 another go, as you say l3 seems to be fairly redundant given l3s mode exists.
Now that the l3s mode that your original PR instigated has been merged, which put in place a lot of the underlying 'plumbing' in LXC (including l2 mode), adding an l2 mode and the required gateway params shouldn't be too hard. It may be that we use the presence of the "gateway" setting to switch into l2 mode, although that has security implications (in that the host's firewall is bypassed on the inbound). I'll have a think and create a card to track the additional features we discussed here.