Proposal: Links: Upgrading the network model #7467

Closed
erikh opened this Issue Aug 7, 2014 · 19 comments

@erikh (Contributor) commented Aug 7, 2014

This is a two part proposal. The other portion is at #7468.

Problem

Links currently do not satisfy the needs of users for a couple of reasons:

  • service discovery suffers from the static nature of linking
  • links are volatile: IP addresses, port mappings, and link names can change as the result of manipulating a link, and other containers are neither notified of these changes nor able to deal with them trivially.

Solution

These subjects will be addressed:

  • fixating containers to resources
  • essential service discovery changes

Fixating containers to resources

We should guarantee that:

  • IP addresses should not change between specific container invocations. A container is assigned a specific IP address that will not change until the container is removed or, if it is stopped, until the existing address space is exhausted (addresses of stopped containers are reclaimed in least recently used order).
  • Likewise, the above should apply to external port mappings.
  • Changing a link should:
    • retarget the port mappings
    • retarget the IP address(es)
    • rewrite the hosts files for all linked resources with updated content (see more on service discovery below)
    • Strategies for retargeting (only one will be used for now; a sketch of the first follows this list):
      1. moving the veth into a new network namespace for the target container (via ip link set) from the host should incur no ARP issues. This method still needs to be tested, but it should provide great failover benefits that we wouldn't get with IP address reassignment.
      2. reassigning the IP to the veth in the target container. This has some ARP issues but may be significantly less surprising or complicated for end users digging through the links abstraction.
      3. using an iptables facade to mimic a higher-level network that links containers. The facade would be the point of reference, making it easy to retarget these addresses. It would use another internal subnet but is rather flexible.
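
A minimal sketch of strategy 1, assuming a hypothetical veth pair and a named namespace standing in for the target container (interface names, the namespace name, and the address are illustrative, not Docker's actual identifiers):

    # create a veth pair on the host
    ip link add veth-host type veth peer name veth-cont
    # named network namespace standing in for the target container
    ip netns add target
    # move one end of the pair into the namespace
    ip link set veth-cont netns target
    # reassign the container's existing address inside the namespace and bring the link up
    ip netns exec target ip addr add 172.17.0.5/16 dev veth-cont
    ip netns exec target ip link set veth-cont up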

Service Discovery Changes

Service discovery currently relies on two methods of propagating link information: the environment and hosts files. Since the environment is essentially static and will become outdated, the solution is to rely on hosts files only. A future proposal will bring DNS support to this scheme.
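
For illustration only, a hypothetical hosts file for a container linked to a database container named db might look like this after a rewrite (the address is an example):

    127.0.0.1   localhost
    172.17.0.5  db    # rewritten in place whenever the db link retargets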

As mentioned above, global rewriting of hosts files will be necessary. This should be done at link change time, unless the container is started with --link.

Port Discovery

Port discovery will be solved in two ways; both are subject to further discussion.

  • The environment will provide DOCKER_LINK_PORT_${link_name}. The value will be a comma-separated list of ports.
  • SRV records will be provided for querying over DNS. The record name format will be _${link_name}._${proto}, and a query will respond with multiple port -> A mappings. (A sketch of both mechanisms follows this list.)
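
A hedged sketch of how a consumer might use both mechanisms for a link named db; the variable name follows the proposal, while the DNS domain is unspecified here and left to the resolver's search path:

    # environment-based discovery: a comma-separated port list
    echo "$DOCKER_LINK_PORT_db"     # e.g. 5432,5433
    # DNS-based discovery: SRV lookup for the db link over TCP
    dig +short SRV _db._tcp         # priority weight port target, one line per mapping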

Tickets Resolved:

#5186
#2733
#3285
#3155
#2658
#2588 (I think)

@gabrtv commented Aug 7, 2014

reassigning the IP to the veth in the target container. This has some ARP issues but may be significantly less surprising or complicated for end users digging through the links abstraction.

Not a fan of swapping IPs of existing veths and the ARP issues that would ensue.

using an iptables facade to mimic a higher-level network that links containers. The facade would be the point of reference, making it easy to retarget these addresses. It would use another internal subnet but is rather flexible.

Seems the most flexible and understandable option for admins, but it feels like the five-minute fix for a problem that needs a longer-term solution.

moving the veth into a new network namespace for the target container (via ip link set) from the host should incur no ARP issues. This method still needs to be tested, but it should provide great failover benefits that we wouldn't get with IP address reassignment.

This seems like the cleanest implementation with the most long-term appeal. Do we have a sense for how feasible this really is?

@erikh (Contributor) commented Aug 7, 2014

I'll be testing the netns stuff later today, hopefully. I will get back to you once I know more.

@thaJeztah (Member) commented Aug 7, 2014

Links ideally do not need to be named, the container ID should suffice in a hosts file.

Not sure I understand this; currently, I'm able to name a link (e.g. db) so that I can pre-configure an application to connect to a linked database container using db as host name.

If this is reloaded with the container-id of the linked container, how would I use that (without using additional service-discovery software)?

@LK4D4 (Contributor) commented Aug 8, 2014

@thaJeztah +1, this is the main use case of links for me and all my friends.

@erikh (Contributor) commented Aug 8, 2014

Sorry, that was tacked on at the end, and in retrospect it probably shouldn't have been; it's more related to the links UI (which was split out from this proposal) than to this one.

I’ll retool in a few minutes here.

@erikh (Contributor) commented Aug 11, 2014

@gabrtv @crosbymichael and I worked out the details and think the veth move should be feasible, but we're probably going to have to tear up a bit of Docker to accomplish it. He gave me some sample code to chew on, and I'm fairly confident we'll be able to do this.

The trick is to orchestrate it with named network namespaces as opposed to PID namespaces (which is what Docker and libcontainer currently use); then the namespaces should be portable. I'm still a little hazy on the implementation, but that should be resolved in a few days. I just need some time to play. (A sketch of the named-namespace trick is below.)
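
A minimal sketch of making a container's PID-based network namespace addressable by name, using the common technique of the era; the PID and namespace name here are hypothetical:

    # expose the netns of PID 12345 under the name container1
    mkdir -p /var/run/netns
    ln -s /proc/12345/ns/net /var/run/netns/container1
    # the named namespace can now be manipulated from the host
    ip netns exec container1 ip link show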

@jpetazzo (Contributor) commented Aug 15, 2014

+1!

Minor remark though:

A container is assigned a specific IP address that will not change until the container is removed or, if it is stopped, until the existing address space is exhausted (addresses of stopped containers are reclaimed in least recently used order).

I would not free up IP addresses when the address space is exhausted. I would rather require the user to explicitly clean up old containers. I understand that it could be an annoyance, but it makes me feel uncomfortable for two reasons:

  • some things might behave differently depending on whether the address space is under pressure or not (e.g. people might have weird link issues because they have many containers and IP addresses get reused too fast)
  • IMHO, IP addresses are a resource just like any other, e.g. disk usage. We wouldn't automatically destroy old containers if the disk is full (would we?), so why would we free up IP addresses if the address space is full?

Just my 2c.

Other than that, everything else is +1 +1 +1 :-)

@erikh (Contributor) commented Aug 15, 2014

I agree it is surprising. I’ll bring it up with the powers that be.

-Erik

@bfirsh added the Proposal label Aug 15, 2014

@erikh (Contributor) commented Aug 19, 2014

@gabrtv we accomplished this yesterday, so hopefully we'll have some code to show soon.

@tristanz commented Aug 31, 2014

+1 for not automatically reusing IPs. This would help make things more predictable; I run into this when implementing firewall rules.

@erikh (Contributor) commented Oct 20, 2014

This has been updated to accommodate port discovery. Comments welcome.

@duglin (Contributor) commented Oct 20, 2014

For port discovery, I assume it'll be "port/protocol" and not just "port", right?

@erikh (Contributor) commented Oct 20, 2014

Perhaps we should make it DOCKER_LINK_${proto}_PORTS or something like that.

I don't like the idea of carrying the proto in the value largely because it makes it harder to parse with tools like cut and awk.

@duglin (Contributor) commented Oct 20, 2014

I'd prefer to avoid having to go "looking for" info like this. Today we have tcp and udp; if a third protocol pops up, we have to modify code (hard-coded) to search for that env var too. If it's all in one list, then we only need to grab one env var and parse it, and we don't need to presume any protocols at all. As long as it's always of the form #/proto, I don't see the issue in parsing.

Adding the info we're trying to "discover" into the env name is part of the issue I have with our current solution. It feels like you have to already know what the answer is in order to find what you're looking for, which is kind of silly. That's why I proposed #8515.

Keep in mind use cases where the code doing the "discovery" knows nothing, and assume all it might be doing is displaying what it discovers (via some kind of "inspect" op). In those cases it shouldn't really need to have a list of protocols; it should discover those too, and having the protocol in the name would make that harder, if not impossible.
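
A minimal sketch of parsing the single-list format duglin describes, assuming a hypothetical variable carrying port/proto pairs (neither the variable name nor the format was finalized in this thread):

    # hypothetical combined format: DOCKER_LINK_PORTS_db="5432/tcp,53/udp"
    for entry in $(echo "$DOCKER_LINK_PORTS_db" | tr ',' ' '); do
      port=${entry%/*}    # everything before the slash
      proto=${entry#*/}   # everything after the slash
      echo "port=$port proto=$proto"
    done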

@erikh (Contributor) commented Oct 20, 2014

I don't really foresee us adding a new layer 4 protocol, but I suppose anything's possible. :)

Let's not over-architect this. I need to think about the rest of your message, but I'm not sure what the problem is here if you know the link name:

echo "${DOCKER_LINK_TCP_PORTS_MYLINK}" | cut -d,

Is not that complicated.

@duglin (Contributor) commented Oct 20, 2014

If you know in advance what the answer is, there's no problem.
edit: I'm not concerned about the parsing

@adamkdean commented May 31, 2015

Any update on this?

@cpuguy83 (Contributor) commented May 31, 2015

@adamkdean A tremendous amount has happened.
See github.com/docker/libnetwork
This is now being used by Docker, and it includes plugin support as well.
And see: #13441

@erikh (Contributor) commented May 31, 2015

We should probably close this.
