Enable IPv6 NAT (rebase of #2023) #2572

Merged: 13 commits, merged into moby:master on Oct 29, 2020

Conversation

bboehmke (Contributor):

This is mainly a rebase (with some additions) of PR #2023 from @wrridgwa.

Original description:

This pull request addresses this open issue moby/moby#25407
Previously when IPv6 was enabled on Docker, ip6tables weren't manipulated to do Port Forwarding and IP masquerading. This PR addresses that in a way similar to the IPv4 implementation.

Relies on the following Moby PR to bring the EnableIPv6 flag through to libnetwork: moby/moby#35252

When combined with the Docker pull request, the fix can be verified after starting a container in bridge mode with port forwarding and IPv6 enabled:

1. Compare the output of iptables -t nat -L with ip6tables -t nat -L.
2. Compare the output of iptables -t filter -L with ip6tables -t filter -L.
3. Run ping6 from within a running container to an outside device.

Note: This is the same as #1992 but with cleaner commits.

wrridgwa and others added 3 commits July 19, 2020 16:16
Signed-off-by: Billy Ridgway <wrridgwa@us.ibm.com>
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
removeIPChains()
natChain, filterChain, isolationChain1, isolationChain2, err = setupIPChains(config)

removeIPChains(iptables.IPv4)
Contributor:

Shouldn't we remove/set up either the IPv4 or the IPv6 chains in this function?

Contributor Author:

The old patch was built so that IPv6 handling is only enabled when IPv4 handling is also enabled.
It would certainly be an option to enable it independently.

Contributor Author:

Reordered the EnableIP6Tables usage in driver.configure.
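For illustration, a minimal sketch of that gating, assuming config fields named EnableIPTables/EnableIP6Tables as discussed in this PR (driverConfig and setupChains are stand-ins, not the actual libnetwork code):

```go
package main

import "fmt"

// driverConfig mirrors the options discussed in this PR; names are illustrative.
type driverConfig struct {
	EnableIPTables  bool
	EnableIP6Tables bool
}

// setupChains stands in for the real setupIPChains helper shown in the diff above.
func setupChains(version string) error {
	fmt.Println("setting up NAT/filter/isolation chains for", version)
	return nil
}

// configure keeps the old IPv4 behaviour and only adds the IPv6 chains when
// the new option is turned on.
func configure(c driverConfig) error {
	if c.EnableIPTables {
		if err := setupChains("IPv4"); err != nil {
			return err
		}
	}
	if c.EnableIP6Tables {
		if err := setupChains("IPv6"); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	_ = configure(driverConfig{EnableIPTables: true, EnableIP6Tables: true})
}
```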

@@ -667,16 +707,16 @@ func (d *driver) createNetwork(config *networkConfiguration) (err error) {

// Add inter-network communication rules.
setupNetworkIsolationRules := func(config *networkConfiguration, i *bridgeInterface) error {
if err := network.isolateNetwork(networkList, true); err != nil {
if err = network.isolateNetwork(networkList, false); err != nil {
if err := network.isolateNetwork(iptables.IPv4, networkList, true); err != nil {
Contributor:

Curious why we are setting this to IPv4?

Contributor Author:

I think this was done to avoid changing the previous behavior of the library; basically, the IPv6 patch is not 100% complete.

Unfortunately I am (at least for now) not really familiar with the code base of libnetwork or Docker, so I cannot really say whether IPv6 handling should also be added here.

arkodg (Contributor), Jul 20, 2020:

I think we should be passing the iptables version in this API; this call will eventually reach setINC, which should hopefully do the right thing.

Contributor Author:

Moved the handling of the IP version inside network.isolateNetwork, which now also handles IPv6 (see the sketch below).
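A minimal, self-contained sketch of that idea, with stand-in names rather than the real libnetwork signatures (setINC here is just a callback, and the real isolateNetwork operates on the bridge driver's network type):

```go
package main

import "fmt"

// IPVersion mimics the iptables.IPVersion concept from this PR; the constants
// are illustrative, not the real libnetwork API.
type IPVersion string

const (
	IPv4 IPVersion = "IPv4"
	IPv6 IPVersion = "IPv6"
)

// isolateNetwork sketches the idea: the caller no longer passes an IP version;
// the function programs the isolation rules for IPv4 and, when IPv6 is
// enabled on the network, repeats them for IPv6.
func isolateNetwork(enableIPv6 bool, enable bool, setINC func(v IPVersion, enable bool) error) error {
	if err := setINC(IPv4, enable); err != nil {
		return err
	}
	if enableIPv6 {
		return setINC(IPv6, enable)
	}
	return nil
}

func main() {
	_ = isolateNetwork(true, true, func(v IPVersion, enable bool) error {
		fmt.Printf("programming isolation rules: version=%s enable=%v\n", v, enable)
		return nil
	})
}
```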

if err := m.userlandProxy.Start(); err != nil {
if err := cleanup(); err != nil {
return nil, fmt.Errorf("Error during port allocation cleanup: %v", err)
if hostIP.To4() != nil {
Contributor:

Why are we skipping this section for IPv6?

Contributor Author:

This function is called twice for each port, which causes a problem with the userland proxy: it tries to bind the port twice.
Currently this is handled by only starting the userland proxy for the IPv4 port forwarding.
At the moment I have no idea how to handle this properly.

Contributor:

Are you testing this with the userland proxy enabled?
Does it work with it disabled?
We could add a TODO for the userland proxy rather than adding the hostIP.To4() checks in the code.

Contributor Author:

I simply ran the unit tests, which fail, so I think the userland proxy is enabled there.

As part of the rework of allocatePorts I may have found a better solution: the userland proxy is now only enabled for the IPv4 allocation and not for IPv6 (see the comment and sketch below).
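A simplified sketch of that approach, assuming illustrative signatures (the allocate callback stands in for the real allocatePortsInternal):

```go
package main

import "fmt"

// allocatePorts sketches the approach: the binding is allocated once for the
// IPv4 address with the userland proxy (as before), and once for the IPv6
// address with the proxy disabled, so the proxy never binds the same port twice.
func allocatePorts(v4Addr, v6Addr string, proxyEnabled bool,
	allocate func(addr string, useProxy bool) error) error {

	// IPv4 binding keeps the existing userland-proxy behaviour.
	if err := allocate(v4Addr, proxyEnabled); err != nil {
		return err
	}
	// IPv6 binding skips the proxy so the port is only bound once.
	if v6Addr != "" {
		return allocate(v6Addr, false)
	}
	return nil
}

func main() {
	_ = allocatePorts("0.0.0.0", "::", true, func(addr string, useProxy bool) error {
		fmt.Printf("allocating port binding on %s (userland proxy: %v)\n", addr, useProxy)
		return nil
	})
}
```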

logrus.Warnf("Setting the default DROP policy on firewall reload failed, %v", err)
}
})
}

// add only iptables rules - forwarding is handled by setupIPv6Forwarding in setup_ipv6
if enableIP6Tables {
Contributor:

This needs an iptables.OnReloaded call as well (for firewalld).

Contributor Author:

Added iptables.OnReloaded for IPv6.
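A rough sketch of the shape of that change, with onReloaded standing in for the firewalld reload hook that libnetwork's iptables package provides via OnReloaded (names below are illustrative):

```go
package main

import "fmt"

// reloadCallbacks stands in for the firewalld reload hook.
var reloadCallbacks []func()

func onReloaded(cb func()) { reloadCallbacks = append(reloadCallbacks, cb) }

// setupRules registers a reload callback for the IPv4 rules and, with this
// change, an equivalent one for the IPv6 rules, so both rule sets are
// re-applied after a firewalld reload.
func setupRules(enableIPTables, enableIP6Tables bool) {
	if enableIPTables {
		onReloaded(func() { fmt.Println("re-applying IPv4 rules after firewalld reload") })
	}
	if enableIP6Tables {
		onReloaded(func() { fmt.Println("re-applying IPv6 rules after firewalld reload") })
	}
}

func main() {
	setupRules(true, true)
	for _, cb := range reloadCallbacks {
		cb()
	}
}
```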

@@ -69,9 +83,15 @@ func (n *bridgeNetwork) allocatePort(bnd *types.PortBinding, containerIP, defHos
return err
}

portmapper := n.portMapper

if containerIP.To4() == nil {
Contributor:

Hoping we can be consistent in how we distinguish IPv6 and IPv4 throughout the codebase.

Contributor Author:

All checks now use net.IP.To4(), so this should be more consistent (see the illustration below).
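The convention boils down to a single nil check; a small runnable illustration:

```go
package main

import (
	"fmt"
	"net"
)

// isIPv6 shows the convention mentioned above: net.IP.To4() returns nil for
// addresses that cannot be represented as IPv4, so a nil result means IPv6.
func isIPv6(ip net.IP) bool {
	return ip.To4() == nil
}

func main() {
	fmt.Println(isIPv6(net.ParseIP("172.17.0.2")))  // false
	fmt.Println(isIPv6(net.ParseIP("2001:db8::1"))) // true
}
```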


portmapper := n.portMapper

if strings.ContainsAny(host.String(), "]") == true {
Contributor:

same comment as above

Contributor Author:

Replaced with a net.IP.To4() check.

logrus.Warnf("Failed on removing the inter-network iptables rules on cleanup: %v", err)
}
return err
}
// register the cleanup function
network.registerIptCleanFunc(func() error {
nwList := d.getNetworks()
return network.isolateNetwork(nwList, false)
return network.isolateNetwork(iptables.IPv4, nwList, false)
Contributor:

same as above

Contributor Author:

Moved the handling of the IP version inside network.isolateNetwork, which now also handles IPv6.

if err := network.isolateNetwork(networkList, true); err != nil {
if err = network.isolateNetwork(networkList, false); err != nil {
if err := network.isolateNetwork(iptables.IPv4, networkList, true); err != nil {
if err = network.isolateNetwork(iptables.IPv4, networkList, false); err != nil {
Contributor:

same as above

Contributor Author:

Moved the handling of the IP version inside network.isolateNetwork, which now also handles IPv6.

return n.allocatePortsInternal(ep.extConnConfig.PortBindings, ep.addr.IP, defHostIP, ulPxyEnabled)
var pb []types.PortBinding

if ep.addrv6 != nil {
Contributor:

If a network will be either an IPv4 or an IPv6 network, why do we need to allocate both kinds of ports here?

Contributor Author:

I am not sure whether it is possible to have a network without an IPv4 address. Otherwise, a network can be either an IPv4-only network or an IPv6/IPv4 (dual-stack) network.

In the case of a dual-stack network the port is allocated for both versions. For example, a web server container that exposes port 80 gets it allocated as 0.0.0.0:80 and [::]:80.

If IPv6-only networks are also possible, this should of course be changed.

Contributor:

Can we take care of this case later and only pick one binding for now?
if ep.addrv6 != nil {
// bind to ipv6 addr
} else {
// bind to ipv4 addr
}

bboehmke (Contributor Author), Jul 22, 2020:

If it were implemented this way, it would only create a forward for IPv6, which would break the IPv4 behavior.

I have reworked this section a bit and now also handle EnableIP6Tables correctly. I also changed the signature back and handle the different calls completely outside. I hope this is a proper way to handle the port allocation.

})

n.portMapper.SetIptablesChain(natChain, n.getNetworkBridgeName())
}

d.Lock()
err = iptables.EnsureJumpRule("FORWARD", IsolationChain1)
err = iptable.EnsureJumpRule("FORWARD", IsolationChain1)
Contributor:

Can we split up setupIPTables into setupIP4Tables and setupIP6Tables based on the IP version value?

Contributor Author:

Split into setupIP4Tables and setupIP6Tables, with a shared setupIPTables for the common parts (see the sketch below).
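A structural sketch of the split, using placeholder bodies rather than the real rule-programming code:

```go
package main

import "fmt"

// ipVersion and the three functions below are an illustrative skeleton of the
// split, not the real libnetwork implementation.
type ipVersion string

const (
	ipv4 ipVersion = "IPv4"
	ipv6 ipVersion = "IPv6"
)

// setupIPTables holds the parts common to both families, parameterised by the
// table version.
func setupIPTables(version ipVersion) error {
	fmt.Println("programming common NAT/forward/isolation rules for", version)
	return nil
}

// setupIP4Tables adds the IPv4-only pieces on top of the common setup.
func setupIP4Tables() error {
	return setupIPTables(ipv4)
}

// setupIP6Tables adds the IPv6-only pieces on top of the common setup.
func setupIP6Tables() error {
	return setupIPTables(ipv6)
}

func main() {
	_ = setupIP4Tables()
	_ = setupIP6Tables()
}
```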

arkodg (Contributor) commented Jul 20, 2020:

Thanks for carrying forward this PR, @bboehmke. I've taken a quick pass and the architecture mostly looks fine to me. Hoping you can address the comments.

We also need to make sure that EnableIPMasquerade in the bridge config is false by default for IPv6 networks, to maintain backward compatibility.

@arkodg arkodg requested a review from thaJeztah July 20, 2020 16:49
bboehmke (Contributor Author):

Thanks for the comments.

I will try my best to address them, but I don't think I will be able to find a good solution for everything without some help.

Regarding the activation of this change, I thought about something like an "--ip6tables" daemon flag that is false by default. But I think that is a topic for later; in any case, IPv6 NAT should be disabled by default.

Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
bboehmke (Contributor Author):

@arkodg I tried to address all of your comments with the exception of two.

The first is the allocatePorts function, which seems to work as intended/required here, but maybe I have missed something.

The second (and in my opinion really bad) part is the handling of the port forwarding for the userland proxy. I don't really have an idea how to handle this in a better way to ensure the ports are only bound once.

It would be great if you could take a look at the changes and maybe give me a hint on what I can do about the userland proxy.

@@ -54,11 +62,17 @@ var (
initOnce sync.Once
)

// IPTable defines struct with IPVersion and few other members are added later
Contributor:

Can we remove the "and few other members are added later" part for now?

Contributor Author:

Done.
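For context, a minimal sketch of the resulting shape (the exact fields and helpers in libnetwork may differ from these stand-ins):

```go
package main

import "fmt"

// IPVersion and IPTable follow the shape discussed in this diff.
type IPVersion string

const (
	IPv4 IPVersion = "IPV4"
	IPv6 IPVersion = "IPV6"
)

// IPTable carries only the IP version, so callers can obtain the table for
// either family and use the same rule helpers on it.
type IPTable struct {
	Version IPVersion
}

// GetIptable returns the table for the requested version.
func GetIptable(version IPVersion) IPTable {
	return IPTable{Version: version}
}

func main() {
	fmt.Println(GetIptable(IPv6).Version)
}
```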

@@ -26,18 +26,19 @@ func arrangeUserFilterRule() {
if ctrl == nil || !ctrl.iptablesEnabled() {
return
}
_, err := iptables.NewChain(userChain, iptables.Filter, false)
iptable := iptables.GetIptable(iptables.IPv4)
Contributor:

Looks like we will need to add another DOCKER-USER jump rule for IPv6; you can add that as a TODO for now.

Contributor Author:

There are also some more of these hard-coded IPv4 (iptables.GetIptable(iptables.IPv4)) locations:
resolver_unix.go
service_linux.go
drivers/overlay/encryption.go
drivers/overlay/filter.go

These are mainly sections for overlay networks. It would be really great if this also worked with IPv6, because I think it would make swarm mode compatible with IPv6 NAT. But some sections cannot be translated to IPv6 because some features are missing (addrtype LOCAL in ip6tables in combination with the route_localnet sysctl for IPv6).

I am currently trying to change as much as possible and test whether it works. If this will not be added in this PR, should I add a TODO on each of these iptables.GetIptable(iptables.IPv4) calls?

Contributor:

Sure, you can add a TODO above those.

Contributor Author:

Added TODOs.

@@ -17,7 +17,7 @@ func TestSetupIPForwarding(t *testing.T) {
}

// Set IP Forwarding
if err := setupIPForwarding(true); err != nil {
if err := setupIPForwarding(true, false); err != nil {
Contributor:

Can we enable this in the test case? I think the docker0 bridge is usually a dual IPv4+IPv6 bridge, so this would be a good test for it.

Contributor Author:

Changed the second parameter to true.

Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>

// IPv6 port binding excluding user land proxy
if n.driver.config.EnableIP6Tables && ep.addrv6 != nil {
pbv6, err := n.allocatePortsInternal(ep.extConnConfig.PortBindings, ep.addrv6.IP, defaultBindingIPV6, false)
arkodg (Contributor), Jul 23, 2020:

Another TODO is to have a defaultBindingIPv6, similar to what IPv4 has, that can be specified by the user via driver options.

Contributor Author:

Added TODO.

arkodg (Contributor) left a review:

LGTM! Thanks for carrying this PR forward and addressing all the changes, I really appreciate it.
@thaJeztah can you PTAL? :)

Signed-off-by: Benjamin Böhmke <benjamin@boehmke.net>
bboehmke (Contributor Author):

I added the TODOs to the sections that only support IPv4 for now.

This is mainly the case for all overlay network components. Sadly, it seems hard if not impossible to enable IPv6 here: something like iptables -t nat -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER-INGRESS is simply not available for IPv6 (or I have not found it yet).

Hopefully we will find a solution for that in the future. A fully functional IPv6 swarm would be really great.

bboehmke (Contributor Author) commented Aug 3, 2020:

@arkodg @thaJeztah Is there anything I can do to move this PR forward?

bboehmke (Contributor Author):

@arkodg It has now been 3 months since the last reaction.
Is there anything missing from this pull request that prevents the merge?

I would be happy to make any changes so we can bring this feature forward.

arkodg (Contributor) commented Oct 23, 2020:

@bboehmke I think @thaJeztah might be busy with the 20.10 release.
I would ideally like 2 approvals before merging this one.
Sorry for the delay.

@arkodg arkodg merged commit d511c60 into moby:master Oct 29, 2020
arkodg pushed a commit to arkodg/moby that referenced this pull request Oct 29, 2020
Bring in changes from moby/libnetwork#2572 to moby

Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
docker-jenkins pushed a commit to docker/docker-ce that referenced this pull request Nov 5, 2020
Bring in changes from moby/libnetwork#2572 to moby

Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
Upstream-commit: 1623e6b222a95f1b0e4e031ebebf88808b362d9b
Component: engine
grigorig commented Nov 23, 2020:

I don't really understand the change. What's the point of NAT if you have a publicly routable IPv6 prefix allocated for Docker? Docker only needs to do stateful filtering for IPv6, not NAT. I'm looking at this particularly with respect to issue #2557.

bboehmke (Contributor Author):

@grigorig Sure, if you have a routable IPv6 prefix for Docker, that should be the preferred way to go. But there are situations where only one IPv6 address is available (some VPS providers or private networks). That is also why this feature was requested multiple times in the past, and why a separate project was even created for this use case (see docker-ipv6nat).

But of course this does not solve your issue #2557, which is a completely different use case.

timcoote:
NAT for IPv6? Is that a joke or some sort of evil plot?

msva commented Mar 9, 2022:

Sorry for semi-necroposting, but this PR/issue seems like the best fit for the situation I'm hitting. If this is the wrong place for it, please point me to the right one. Thanks in advance!

The problem is that even with the ip6tables option enabled, Docker doesn't create ip6tables DNAT rules to forward published ports to the container. It does this perfectly for containers started with docker run -p 1234:1234, but it doesn't bother to create similar rules for containers deployed through swarm...

And if I add the rule manually, like

ip6tables -t nat -A DOCKER ! -i docker0 -p tcp -m tcp --dport 33333 -j DNAT --to-destination [d0ce::242:c0a8:1]:33333

everything works fine. But the big flaw here is that I can't predict the destination address for every possible container every time, so I can neither hardcode such a rule into the ip6tables chains nor automate it somehow.

Is there any way to fix this on my side, or is it a bug/regression?

Thanks in advance!

bboehmke (Contributor Author) commented Mar 9, 2022:

@msva If I understand correctly, you are using Docker in swarm mode.

Sadly, the IPv6 NAT only works with regular containers. Based on my basic understanding of how the swarm network works, this does not seem to be possible in the same way for IPv6.
