Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predictive interface order / default gateway #2093

Open
cziebuhr opened this issue Feb 27, 2018 · 8 comments · Fixed by moby/moby#37209
Open

Predictive interface order / default gateway #2093

cziebuhr opened this issue Feb 27, 2018 · 8 comments · Fixed by moby/moby#37209

Comments

@cziebuhr
Copy link
Contributor

cziebuhr commented Feb 27, 2018

I tried to setup a container with several networks, but didn't succeed in successfully configuring them. Especially when using ipv6 and trying to get a stable interface name of a macvlan network, I got mad. After looking at the docker sources, I stumbled across the following items:

a) I wonder about the usage of container/heap for sb.endpoints. IMHO heap is useful when using heap.Pop() for removing the respective minimum element. Instead, sb.Endpoints() and sb.getConnectedEndpoints() iterate directly over the sb.endpoints slice. In this case, only the first element is guaranteed to be the minimum element of the slice. All other elements have random order.
In some setups, this leads to interface names inside the container having unpredictive order (moby/moby#25181) and an unpredictive default-gateway. I created a patch to change from heap to an ordered array instead.

b) The eh.Less() function also produces unstable results, when e.g. several networks are internal. I'm unsure if checking for endpointInGWNetwork(), getNetwork().Internal() and joinInfo.gw is essential or if it was just convenience for most setups. I think we should skip processing these checks in eh.Less() and instead move them to sb.getGatewayEndpoint(). Also see #1609, #1443 and #1141.

c) There is already some code to set epPriority, unfortunately it's not yet exported through the api and it's still subject to additional auto-magic. I'd favor a solution where I can change the interface order as a user instead of having different results in each docker version.

@cziebuhr
Copy link
Contributor Author

In my usecase, I need these networks:

  • default bridge with ipv4 and default gw
  • custom macvlan with ipv6 (for use with multicast, so I need to know the interface name)
  • optional additional networks

Result: When only having two networks, the macvlan becomes eth0 and default gw. When having more networks, it's totally unpredictable.

@networkop
Copy link

Brilliant solution @cziebuhr , your patch worked for me. Do you think it's likely to get merged soon (or at all)?

@cziebuhr
Copy link
Contributor Author

cziebuhr commented Mar 8, 2018

Created PR #2103 which at least solves point a)

@ctelfer
Copy link
Contributor

ctelfer commented Mar 25, 2018

Hey. Sorry for the delayed response. Needed to do some background research as I wasn't familiar with this section of the code.

Overall, as stated, I think that the change in #2103 takes us in the right direction. From my query of those who know the history of this part of the code a bit better, the intent of the priority was definitely to allow the user to order the interfaces. Agree that keeping them in a heap works against that goal and so putting them in an ordered array as in the PR will fix that.

Now regarding part b) my reading of the code is that the sorting was stable, but perhaps not what we might want. The endpoints were effectively sorted into categories. Non-internal endpoints in the gateway networks come last preceeded by internal endpoints in gateway networks. Internal endpoints in non-gateway networks come precede those and finally the first category is non-internal endpoints in non-gateway networks. Within those categories, if both endpoints have join info, then interfaces with v6 info always come before those without. Finally, they are sorted by priority with ties being resolved by network name lexicographic order. So the question becomes whether this is the correct ordering.

Pictorally, this looks something like:

First: +----
       |   non Gateway, non Internal sorted by prio w/ more specific (i.e. v6) join info first
       |...
       +----
       |   Internal, non-Gateway sorted by prio w/ more specific (i.e. v6) join info first
       |...
       +----
       |   Gateway, non-Internal sorted by prio w/ more specific (i.e. v6) join info first
       |...
       +----
       |   Gateway, internal sorted by prio w/ more specific (i.e. v6) join info first
       |...
Last:  +----

Now we could make it so that priority is considered first so instead what we have is a set of lists like the following ordered by priority:

      +----
      |  non-GW, non-Internal, specific joinInfo
      |  non-GW, non-internal, less specific joinInfo
High  |  non-GW, Internal, specific joinInfo
Prio  |  non-GW, Internal, less specific joinInfo
      |  GW, non-Internal, specific joinInfo
      |....
      +----
Lower |...
Prio  |...

This would allow the user (when we get to part c)) to specify ordering but still have a set of criteria for breaking ties when priority is the same. Also, if priority isn't specified, the behavior should match what we have today.

W.r.t. c) Yes, we need to expose the priority in some way for the endpoints in order to allow the users to specify the ordering. This would require more investigation and obviously would require a separate PR in https://github.com/moby/moby. Suggestions / patches are by all means welcome.

@cziebuhr
Copy link
Contributor Author

Am I right that there can only be one gateway network? If yes, then if epX.endpointInGWNetwork() { return X } is fine. But I assume there can be multiple internal networks. In that case, comparing two internal endpoints currently isn't stable, because one of them will return true or false before the next level of comparison is done. IMHO we need to change the fallthrough mechanism, to advance to the next level when both endpoints are considered equal on one level.
So I suggest this algorithm:

# <=> Returns true if a < b, false if a > b and advances to next level if a == b
epi.prio <=> epj.prio           # 1 < 2
epi.gw <=> epj.gw               # non-gw < gw
epi.internal <=> epj.internal   # non-internal < internal
epi.joininfo <=> epj.joininfi   # ipv6 < ipv4
epi.name <=> epj.name           # bar < foo

@ctelfer
Copy link
Contributor

ctelfer commented Mar 27, 2018

Yep, that's what I was trying to get at. Sorry if unclear. There is one difference though: priority comparison currently puts higher priority as lower in the comparison so higher priority interfaces come first:

func (eh epHeap) Less(i, j int) bool {
...
        return cip > cjp      // endpoint i < endpoint j if prio i > prio j
}

(annotation mine) I'd think we'd want to keep this. Although it's less numerically intuitive in the code, having higher priority endpoint interfaces come earlier in the listing would match my intuition.

@aradmidias
Copy link

Regarding c) The PRs docker/cli#2529 and moby/moby#40974 seem related to this one, the latter requiring #2550 to compile.

I do wonder, whether this implementation would expose users to "auto-magic" of epPriority, mentioned in @cziebuhr's initial post.

@Vlad1mir-D
Copy link

Any updates on this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants
@aradmidias @cziebuhr @Vlad1mir-D @ctelfer @networkop and others