
A More Usable IPv6 Network Approach #45296

Open
bradleypeabody opened this issue Apr 7, 2023 · 9 comments
Labels
area/networking/ipv6 · area/networking · kind/feature · status/0-triage

Comments

@bradleypeabody

Description

After reading through the various features and modes related to IPv6 support, I'd like to propose a behavior that I think would make sense, in the hopes that it could simplify IPv6 deployments in the wild.

The basic requirements I would expect from such a feature would be:

  • Global addresses - link-local and ULA addresses have their uses, but one of the major benefits IPv6 brings to the table is removing NAT and just using global addresses everywhere.
  • I would generally not expect it to convert host machines into routers. Router advertisements are often carefully controlled in a network environment, and assuming that any host can simply be converted into a router is unlikely to work as a widespread solution. (Allowing any machine on a network to advertise whatever prefixes it wants via IPv6 RA has network security and reliability implications.)

What I would expect to see instead is something like:

  • Each container gets its own virtual interface with a unique global IPv6 address from a block specified by the user, or perhaps, as a default behavior, implied from the primary interface of the machine. MAC addresses and the corresponding SLAAC IPv6 addresses could be generated at random and ideally checked for conflicts before being assigned.
  • These devices are connected together via a Linux bridge device.
  • nftables or similar tooling could be used to impose firewall-like rules on these bridged packets, so giving containers public IPv6 addresses does not automatically expose them to inbound traffic (although that could be an option); see the sketch after this list.
  • Because these interfaces are bridged (not routed), the usual IPv6 multicast approach of asking "who has this address" works the same way it would for any device connected via a switch - no router advertisements are needed.
  • For outbound requests, containers could just use their IPv6 address as-is, no tricks needed.
  • For inbound requests, it should be possible to proxy (either via userland daemon or iptables) such requests from the main address on the machine to the specific IPv6 address of the container serving that port.
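
To make the firewalling point above concrete, here is a rough sketch (not an actual implementation; the table/chain names are made up and the addresses use the documentation prefix) of how nftables' bridge family could filter bridged IPv6 traffic, allowing one published service while dropping other unsolicited inbound TCP to the container prefix. Something like this could be loaded with nft -f:

table bridge containerfw {
    chain forward {
        type filter hook forward priority 0; policy accept;
        # allow a service published from one container
        ip6 daddr 2001:db8:0:68::f0f0 tcp dport 443 accept
        # drop other inbound TCP connection attempts to the container prefix
        ip6 daddr 2001:db8:0:68::/64 tcp flags & (syn | ack) == syn drop
    }
}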

I believe such a configuration would provide IPv6 functionality that "just works" in most environments in a way that is also familiar to existing IPv4 docker users.

Does this seem like a workable solution? Has this been considered?

@bradleypeabody added the kind/feature and status/0-triage labels on Apr 7, 2023
@benw10-1

benw10-1 commented Apr 8, 2023

I would really appreciate this change; I have been having issues with using IPv6 addresses for my current project as well.

@bradleypeabody
Author

Just to preempt the question: this would likely require the main interface to be able to participate in a network bridge (related), which means (sadly) that the machine's address would have to be moved from its primary interface (e.g. eth0 or enp1s0) to a bridge interface. This step would be annoying for users, but it would only need to be done once, and every other step would be much saner. The corresponding Docker configuration could simply indicate which bridge interface to participate with.

@yorickdowne

This may work on standalone Docker; I am a little worried about swarm mode. Host networking isn't a thing there, and you have the behavior that ingress to any node in the swarm gets moved to the container, regardless of where in the swarm it is.

@bradleypeabody
Author

@yorickdowne Does using bridged networking (as opposed to routing or something else) fundamentally make this:

ingress to any node in the swarm gets moved to the container

problematic?

@sam-thibault added the area/networking/ipv6 label on Apr 11, 2023
@yorickdowne

I admit I do not know. Swarm mode uses overlay networks. How that interacts with the rest of its networking, I am unsure. I wanted to raise it as something to take into account, though.

I generally like the idea of not using NPT for IPv6, so I get where you're coming from. If NPT is required, it wouldn't be the end of the world. The current situation is not truly usable, particularly in a home / PD setting; fully agreed.

@gertvdijk

@bradleypeabody
The general use case for most people's hosts running Docker standalone would be: one interface on a LAN in a /64 IPv6 global network, with a self-assigned SLAAC host address (or a static one), correct? Given that situation, I would like to comment on this part with 'no tricks needed':

  • For outbound requests, containers could just use their IPv6 address as-is, no tricks needed.
  • For inbound requests, it should be possible to proxy (either via userland daemon or iptables) such requests from the main address on the machine to the specific IPv6 address of the container serving that port.

Depending on what you define as outbound requests exactly, I don't think it will work out of the box beyond the on-host Docker network. The host running Docker would not automatically announce a container IP from the Docker network's range on its LAN interface for responses back to the host/container. IIUC, neighbour discovery solicitations will go unanswered when communicating outside of the host, on the LAN or further, because these addresses aren't neighbours on that interface (they're one hop away on the Docker bridge).

In order to bridge the gap to the Docker network bridge - without requiring a separate routeable subnet from the upstream provider/router - the best one can do is proxying the neighbour discovery packets. This is analogous to proxy-ARP in IPv4. systemd-networkd can already do this; one would have to include every individual address in the IPv6ProxyNDPAddress setting on the LAN interface. ndppd is a tool that can do it independently of the network orchestration used. Regardless of how it's set up, once the host is answering the neighbour discovery solicitations, regular forwarding of packets should work, and to the other hosts in the LAN it would look like these IPv6 addresses are directly available on the LAN, including to the default gateway. No routing setup is needed, yet firewalling is still possible using ip6tables/nftables. I think that would cover this generic use case.
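
To illustrate the systemd-networkd side (a sketch only; the interface name is from the examples in this thread and the addresses are placeholders from the documentation prefix, with one IPv6ProxyNDPAddress line per container address), the LAN interface's .network file would contain roughly:

# /etc/systemd/network/20-lan.network (sketch)
[Match]
Name=enp1s0

[Network]
IPv6ProxyNDP=yes
IPv6ProxyNDPAddress=2001:db8:0:1::100
IPv6ProxyNDPAddress=2001:db8:0:1::101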

Issues that will show up in practice though:

  • The internal Docker bridge network would need to have a longer prefix (a smaller subnet), because otherwise the host would have two directly connected interfaces on the same /64 prefix. So, one could assign a unique /112 to the Docker bridge network.
  • The IPv6 global address prefix can be very dynamic. My ISP at home delegates me a /56, but it changes regularly. My ISP-provided router, or cascading routers behind it, assign a /64 out of this on a LAN and things work, also dynamically. Hosts change their prefix on demand just fine, using updated Router Advertisements. If this happens, the Docker bridge network should also fully reconfigure the addresses on all containers. And while a transition happens, a host can have multiple global addresses, each with a preference and lifetime... it's not at all trivial to see how and when to act on prefix changes.
  • Any other host on the LAN that self-assigns a random address and is unlucky enough to land inside the same sub-subnet as one's Docker bridge network would become unreachable (or the container would), depending on where in the network a packet originates. Mandating DAD (duplicate address detection) may fix this; unsure if that is feasible.
  • A downside of all this is having to enumerate all the addresses individually, which sets a requirement for the Docker daemon to inform the proxy daemon about every address assignment change of each container. I'm not aware of a way to proxy-NDP a whole subnet at once.

Another downside I see is that a 'proper' network setup, with an IPv6 subnet routed to your Docker host, would become a second-class citizen, as one would have to disable the automatic 'proxy NDP' feature. 😕

FWIW, that ndppd/IPv6ProxyNDPAddress approach is how I set up IPv6 global addresses on containers running on a host with a statically assigned /64 (and even larger prefixes), as commonly found on budget cloud hosting services that don't offer a routed subnet.
See also https://gdevillele.github.io/engine/userguide/networking/default_network/ipv6/#/using-ndp-proxying

Sorry for being a party-pooper with this long post, just wanted to address my concerns over a proposal that seems to oversimplify the actual situation and would set back those running a 'proper' routed network. HTH.

@bradleypeabody
Author

bradleypeabody commented Apr 12, 2023

@gertvdijk Thanks for the detailed reply.

In order to bridge the gap to the Docker network bridge - without requiring a separate routeable subnet from the upstream provider/router - the best one can do is proxying the neighbour discovery packets.

I think there is still a simpler option using a regular Linux network "bridge". I spent a little time trying to make things work as described but ran into some issues using dummy devices (and I don't seem to have all of the right tooling to hand to quickly try it with tap devices, which may be necessary). But here's what I was thinking so far:

The target environment begins with a regular IPv6 address directly on an ethernet interface:

# ip a
...
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
...
    link/ether 36:ec:16:86:cf:79 brd ff:ff:ff:ff:ff:ff
    inet6 2602:fbbc:0:68:34ec:16ff:fe86:cf79/64 scope global
...

We need to convert this configuration into a bridge, which basically means making a bridge interface, assigning enp1s0 as a member, and moving this IP address to the bridge - and doing these last two steps at once so we don't lose our SSH connection. I would not expect Docker to do this automatically; my suggestion would instead be to simply provide instructions for how to do this in common environments and tell the user to do it. This is not an obscure feature or approach - it's used frequently - but it does modify the main network interface, and I'm thinking the user would have to be responsible for getting this into the relevant Linux-distro-specific configuration (e.g. [this sort of thing](https://www.cyberciti.biz/faq/how-to-configuring-bridging-in-debian-linux/)) so it is applied again upon reboot, and for ensuring that other network tooling doesn't interfere with the config. Anyway, the command-line way to effect this change would be:

Create the bridge interface:

# ip link add name dockerbr0 type bridge
# ip link set dev dockerbr0 up

And then make the main interface a member and move the IP address to the bridge (this is required just due to how Linux bridges work):

# IPADDR=2602:fbbc:0:68:34ec:16ff:fe86:cf79/64
# ip link set enp1s0 master dockerbr0; ip addr del $IPADDR dev enp1s0; ip addr add $IPADDR dev dockerbr0

Which results in this:

# ip a
...
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 36:ec:16:86:cf:79 brd ff:ff:ff:ff:ff:ff
    inet6 2602:fbbc:0:68:34ec:16ff:fe86:cf79/64 scope global
       valid_lft forever preferred_lft forever
...
3: dockerbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 8e:7c:bf:51:00:d4 brd ff:ff:ff:ff:ff:ff
    inet6 2602:fbbc:0:68:34ec:16ff:fe86:cf70/64 scope global
       valid_lft forever preferred_lft forever
...

The bridge device dockerbr0 is now acting as a software bridge (aka switch) and all of the usual NDP traffic should travel across it automatically (you can filter it with nftables/ebtables, but by default, as far as I understand it, everything passes).
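
(As an aside, to make this survive a reboot on a Debian-style system, the equivalent /etc/network/interfaces stanza would look roughly like the sketch below. This is untested here and assumes ifupdown with bridge-utils, reusing the names and address from the example above; the default route would still come from router advertisements or an explicit gateway line.)

auto enp1s0
iface enp1s0 inet manual

auto dockerbr0
iface dockerbr0 inet6 static
    bridge_ports enp1s0
    address 2602:fbbc:0:68:34ec:16ff:fe86:cf79/64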

So if we add a new interface and join it to the bridge, it should be exposed to the external network in the same way as the main IP address. Here's the example I tried and died on using a dummy interface, adding another address on the same /64 network:

ip link add dummy0 type dummy
ip link set dev dummy0 up
ip link set dummy0 master dockerbr0
ip addr add 2602:fbbc:0:68::f0f0/64 dev dummy0

So far this didn't work for me, but as mentioned, I suspect it's due to the use of a dummy interface - I could be wrong. (I did also try enabling the arp and multicast options on the interface with ip link set dev dummy0 arp on; ip link set dev dummy0 multicast on, but no change.)

I do know for a fact that this works as expected if you add the additional address to the bridge directly, but obviously that doesn't help for what we're discussing here.
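
For what it's worth, a closer analogue to what a container would actually get is a veth pair, with one end moved into a network namespace and the other end joined to the bridge; an address on a dummy device that is itself a bridge port likely won't behave like an address on a standalone interface. A rough sketch (untested; the namespace and veth names are made up, while the bridge and address are from the example above):

ip netns add c1
ip link add veth-c1 type veth peer name veth-c1-ns
ip link set veth-c1-ns netns c1
ip link set veth-c1 master dockerbr0
ip link set veth-c1 up
ip netns exec c1 ip link set lo up
ip netns exec c1 ip link set veth-c1-ns up
ip netns exec c1 ip addr add 2602:fbbc:0:68::f0f0/64 dev veth-c1-ns

The default route inside the namespace would then come from the LAN router's advertisements (with accept_ra enabled) or could be added manually against the router's link-local address.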

Let me know if at least that explanation so far makes sense. Maybe I'm missing some basic aspect of this, but still seems to me like this can be made to work.

References:

@bradleypeabody
Author

Also, to address this:

Sorry for being a party-pooper with this long post, just wanted to address my concerns over a proposal that seems to oversimplify the actual situation and would set back those running a 'proper' routed network. HTH.

I do appreciate the detail, and to some degree I understand where you're coming from. The aspect that I don't fully understand is why using routing instead of bridging is considered more 'proper'. The way I look at it (and I'm not trying to argue here, just articulate another perspective): network routing is typically handled by routing hardware/software and often involves propagation using a routing protocol such as BGP, OSPF, etc. Although many topologies are possible, I think it's fair to say that a common scenario is to have routing handled by dedicated routing infrastructure ("above" the host, at its gateway(s)), and within each subnet there exists any number of switches (or software bridges - effectively the same thing). The rationale of my approach here is that for a lot of environments (not just home ISP setups, but many enterprise and cloud environments too), converting a host running Docker from a single-IP host into a software switch/bridge with multiple IPs on the same subnet fits, I believe, a lot more naturally into many network topologies.

Also just to be clear, I'm thinking this bridged mode would be something entirely separate from other network modes that use routed prefixes - I agree that there are other scenarios possible and bridging doesn't solve everything for every scenario, but I'm thinking it should be an option for those cases where it fits well with the topology.

@yorickdowne

I'm going down this rabbit hole a little. There is a solution that works, though it is experimental and it really needs #43033 to land.

Without that fix, it looks something like this:

{
    "userland-proxy": false,
    "ipv6": true,
    "ip6tables": true,
    "fixed-cidr-v6": "fd00:1::/64",
    "experimental": true,
    "default-address-pools":[
      {"base": "172.31.0.0/16", "size": 24},
      {"base": "fd12:3456::/64", "size": 64},
      {"base": "fd12:3456:1::/64", "size": 64},
      {"base": "fd12:3456:2::/64", "size": 64},
      {"base": "fd12:3456:3::/64", "size": 64}
    ]     
}

The "many many address pools" (I actually have 32) is why the lazy allocation or a fix like it is needed, so this can become fd12:3456:789a::/48 as the base.

And it would be even better if it actually worked with something in the ULA range by default, without needing to set default-address-pools. Baby steps.

The advantage of iterating on what exists is that it's, well, iterative. It doesn't require an entirely new way of doing things, and it doesn't require changes to how host interfaces are configured or to the v6 networking in the DC. It works with PD as well as without.
