kube-proxy sometimes incorrectly chooses a veth as the host interface #4218

Closed
justinsb opened this issue Feb 6, 2015 · 21 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@justinsb
Member

justinsb commented Feb 6, 2015

This is on Ubuntu 14.04 (on AWS, though I think that doesn't matter).

Log output from kube-proxy (with -v2):

I0206 18:48:38.551724   11226 proxier.go:786] Choosing interface veth29028d0 for from-host portals
I0206 18:48:38.552133   11226 proxier.go:791] Interface veth29028d0 = fe80::e046:dcff:fe5e:6554/64
I0206 18:48:38.552149   11226 proxier.go:328] Initializing iptables

I am pretty sure it should be choosing eth0 or similar. veth29028d0 is a particularly bad choice because it only has an IPv6 address, which then breaks the iptables logic.

@justinsb
Member Author

justinsb commented Feb 6, 2015

Hmmm... not really sure how to fix this one. Here is the output from a test program:

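package main

import (
	"log"
	"net"
)

// List every network interface with its flags, to see which ones
// kube-proxy might consider when choosing the host interface.
func main() {
	intfs, err := net.Interfaces()
	if err != nil {
		log.Fatal("interfaces failed: ", err)
	}
	for _, intf := range intfs {
		log.Printf("Interface flags=%v v=%v", intf.Flags, intf)
	}
}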
2015/02/06 19:16:30 Interface flags=up|loopback v={1 65536 lo  up|loopback}
2015/02/06 19:16:30 Interface flags=up|broadcast|multicast v={2 1500 eth0 22:00:0a:e9:8b:fc up|broadcast|multicast}
2015/02/06 19:16:30 Interface flags=up|broadcast|multicast v={3 1500 cbr0 0e:19:16:13:f7:d1 up|broadcast|multicast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={535 1500 veth6955e04 7e:e2:e7:3f:da:55 up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={537 1500 veth170c37b ee:86:e9:9f:9a:2b up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={539 1500 vethec42502 ca:8a:3f:66:55:0c up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={541 1500 veth23cdc8e f6:13:9c:d9:84:73 up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={543 1500 vethacfda1e 7e:60:6c:19:f5:c2 up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={545 1500 vetha38fa16 fe:e0:aa:dd:65:9a up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={547 1500 veth416c9db ae:6e:07:04:82:af up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={549 1500 vethc10fba7 0e:19:16:13:f7:d1 up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={551 1500 veth941b819 26:35:c3:bb:8e:e4 up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={553 1500 vethc840b94 22:1c:c8:88:8a:39 up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={555 1500 veth17d8699 1a:7b:09:0e:bd:67 up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={557 1500 veth5964a64 fe:21:af:af:73:e8 up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={559 1500 veth5b352bf 76:56:92:f8:ab:ad up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={561 1500 veth611d238 a6:a4:00:4f:56:8e up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={563 1500 veth7fc64ef 26:e3:b3:02:af:51 up|broadcast}
2015/02/06 19:16:30 Interface flags=up|broadcast v={565 1500 vethe740014 d6:f9:9e:0b:96:8f up|broadcast}

@justinsb
Member Author

justinsb commented Feb 6, 2015

Including addresses this time. We could eliminate anything without an IPv4 address, but that still leaves an arbitrary choice between cbr0 and eth0:

2015/02/06 19:20:27 Interface flags=up|loopback v={1 65536 lo  up|loopback}
2015/02/06 19:20:27     [127.0.0.1/8 ::1/128]
2015/02/06 19:20:27 Interface flags=up|broadcast|multicast v={2 1500 eth0 22:00:0a:e9:8b:fc up|broadcast|multicast}
2015/02/06 19:20:27     [10.233.XXX.XXX/26 fe80::2000:aff:fee9:8bfc/64]
2015/02/06 19:20:27 Interface flags=up|broadcast|multicast v={3 1500 cbr0 0e:19:16:13:f7:d1 up|broadcast|multicast}
2015/02/06 19:20:27     [100.64.1.1/24 fe80::a498:70ff:fe05:4cbc/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={535 1500 veth6955e04 7e:e2:e7:3f:da:55 up|broadcast}
2015/02/06 19:20:27     [fe80::7ce2:e7ff:fe3f:da55/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={537 1500 veth170c37b ee:86:e9:9f:9a:2b up|broadcast}
2015/02/06 19:20:27     [fe80::ec86:e9ff:fe9f:9a2b/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={539 1500 vethec42502 ca:8a:3f:66:55:0c up|broadcast}
2015/02/06 19:20:27     [fe80::c88a:3fff:fe66:550c/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={541 1500 veth23cdc8e f6:13:9c:d9:84:73 up|broadcast}
2015/02/06 19:20:27     [fe80::f413:9cff:fed9:8473/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={543 1500 vethacfda1e 7e:60:6c:19:f5:c2 up|broadcast}
2015/02/06 19:20:27     [fe80::7c60:6cff:fe19:f5c2/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={545 1500 vetha38fa16 fe:e0:aa:dd:65:9a up|broadcast}
2015/02/06 19:20:27     [fe80::fce0:aaff:fedd:659a/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={547 1500 veth416c9db ae:6e:07:04:82:af up|broadcast}
2015/02/06 19:20:27     [fe80::ac6e:7ff:fe04:82af/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={549 1500 vethc10fba7 0e:19:16:13:f7:d1 up|broadcast}
2015/02/06 19:20:27     [fe80::c19:16ff:fe13:f7d1/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={551 1500 veth941b819 26:35:c3:bb:8e:e4 up|broadcast}
2015/02/06 19:20:27     [fe80::2435:c3ff:febb:8ee4/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={553 1500 vethc840b94 22:1c:c8:88:8a:39 up|broadcast}
2015/02/06 19:20:27     [fe80::201c:c8ff:fe88:8a39/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={555 1500 veth17d8699 1a:7b:09:0e:bd:67 up|broadcast}
2015/02/06 19:20:27     [fe80::187b:9ff:fe0e:bd67/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={557 1500 veth5964a64 fe:21:af:af:73:e8 up|broadcast}
2015/02/06 19:20:27     [fe80::fc21:afff:feaf:73e8/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={559 1500 veth5b352bf 76:56:92:f8:ab:ad up|broadcast}
2015/02/06 19:20:27     [fe80::7456:92ff:fef8:abad/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={561 1500 veth611d238 a6:a4:00:4f:56:8e up|broadcast}
2015/02/06 19:20:27     [fe80::a4a4:ff:fe4f:568e/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={563 1500 veth7fc64ef 26:e3:b3:02:af:51 up|broadcast}
2015/02/06 19:20:27     [fe80::24e3:b3ff:fe02:af51/64]
2015/02/06 19:20:27 Interface flags=up|broadcast v={565 1500 vethe740014 d6:f9:9e:0b:96:8f up|broadcast}
2015/02/06 19:20:27     [fe80::d4f9:9eff:fe0b:968f/64]

@justinsb
Member Author

justinsb commented Feb 6, 2015

One last comment: this happens when we restart kube-proxy while pods/Docker containers are running. (It doesn't happen in the more common scenario where kube-proxy is started before any pods.)

@thockin
Member

thockin commented Feb 6, 2015

This issue has popped up in a few contexts and I am really stumped on it. I don't really know how to crack this nut. Maybe we can do the hostname -i trick and do a DNS lookup of $(hostname)?

@justinsb
Member Author

justinsb commented Feb 6, 2015

It's not a perfect solution, but we could also just skip anything that doesn't have an IPv4 address, and anything that starts with "lo", "veth", "br" or "cbr".

We should also probably not early-exit from the loop, but instead collect all the candidates. If there is more than one candidate, we should log them all and then make an arbitrary choice.
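A rough sketch of the candidate-collecting approach: keep every interface that has an IPv4 address and whose name does not start with one of the proposed prefixes, log all of them when there is more than one, and take the first. The candidate type, helper names, and the sort (to make the arbitrary choice at least repeatable) are illustrative assumptions, not the eventual patch.

```go
package main

import (
	"log"
	"net"
	"sort"
	"strings"
)

// Prefixes proposed in this comment; illustrative, not exhaustive.
var skippedPrefixes = []string{"lo", "veth", "br", "cbr"}

// candidate is a hypothetical, simplified view of an interface.
type candidate struct {
	Name    string
	HasIPv4 bool
}

func hasSkippedPrefix(name string) bool {
	for _, p := range skippedPrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

// hostCandidates collects all matching interfaces instead of stopping at
// the first one, and sorts them so the choice is repeatable.
func hostCandidates(ifaces []candidate) []string {
	var out []string
	for _, c := range ifaces {
		if !c.HasIPv4 || hasSkippedPrefix(c.Name) {
			continue
		}
		out = append(out, c.Name)
	}
	sort.Strings(out)
	return out
}

func main() {
	ifs, err := net.Interfaces()
	if err != nil {
		log.Fatal("interfaces failed: ", err)
	}
	var cs []candidate
	for _, i := range ifs {
		addrs, err := i.Addrs()
		if err != nil {
			continue
		}
		c := candidate{Name: i.Name}
		for _, a := range addrs {
			if ipn, ok := a.(*net.IPNet); ok && ipn.IP.To4() != nil {
				c.HasIPv4 = true
			}
		}
		cs = append(cs, c)
	}
	names := hostCandidates(cs)
	if len(names) == 0 {
		log.Fatal("no candidate host interface")
	}
	if len(names) > 1 {
		log.Printf("multiple candidate host interfaces %v; choosing %s", names, names[0])
	}
	log.Printf("choosing interface %s", names[0])
}
```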

I can put together a strawman patch if that would be useful.

@thockin
Member

thockin commented Feb 6, 2015

I'm fine with collecting them all and logging the arbitrary choice. I'm less happy about a list of prefixes to skip (to which you would also have to add "docker" and "kbr"). Do you see problems with doing a DNS lookup of your own hostname? I think that should be a reasonable foundation to depend on.

@justinsb
Member Author

justinsb commented Feb 7, 2015

Would hostname -i just mean that kube-proxy takes a command-line argument to get this IP? That seems reasonable to me.

@justinsb
Member Author

justinsb commented Feb 7, 2015

On second thought, I have read more about why we actually need hostIP. It is only used in iptablesHostPortalArgs, and only as a workaround for some weirdness around listening on 0.0.0.0 vs. DNAT. Based on the code comment there, it sounds like any IPv4 address that isn't loopback would be fine.

So I think a good patch for now might be: filter out IPv6 addresses along with loopbacks (they cause an immediate problem), and log if there are multiple choices (this seems likely to cause problems in the future).

I think that's simple, and avoids creating another required parameter.

I'm not wild about directly calling hostname -i (or doing something similar); e.g. what do we do if it resolves to a loopback address or if the hostname can't be resolved? (I've had both of these problems while working on AWS)

@thockin
Member

thockin commented Feb 7, 2015

I was suggesting doing the equivalent of hostname -i. Is it unreasonable to expect a machine to be able to nslookup its own name? Maybe we could start with that and then add this filter list as the last resort?

@zmerlynn
Member

zmerlynn commented Feb 7, 2015

I've seen nslookup of self fail in a variety of misconfigured (especially on-prem) environments, but it's the most reliable/sane way.

@zmerlynn
Member

zmerlynn commented Feb 7, 2015

Actually, couldn't it also scour the route table for the default gateway network's interface and camp on that?

@thockin
Member

thockin commented Feb 7, 2015

Ooh, that's good, I like that one.

@thockin
Member

thockin commented Feb 8, 2015

Can folks sanity-check this?

http://play.golang.org/p/kzib_NygC6

We could have a cascade of modes, if we're not confident in this.

First try the default gateway, then a DNS lookup of os.Hostname(), and finally fall back on the first interface that passes the proposed filters? Of course, I would say to start with one mode and see if that fails for anyone...

@mbforbes mbforbes added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Feb 8, 2015
@sub-mod
Contributor

sub-mod commented Feb 9, 2015

hostname -i is kinda unreliable.
Just for my clarity, why are we favoring the NIC with the gateway? That could be the public IP of the node, and we could end up exposing the services to the outside world. I think we need a consistent parameter across all daemons to define the network each one should use. #4115

@thockin
Copy link
Member

thockin commented Feb 9, 2015

I agree we need a consistent parameter across daemons. But I don't think we want to force most users to use the override; I think we need some discovery heuristic that works in most cases.

@a-robinson
Contributor

Using the default gateway as in @thockin's playground code with fallbacks as needed sounds good to me, for what it's worth. The majority of users (particularly cloud users) shouldn't need to worry about configuring this stuff.

The only caveat I'd have is that it'd be nice to verify it works on standard AWS and Azure instances before running with it.

@thockin thockin modified the milestone: v1.0 Feb 17, 2015
@larsks

larsks commented Feb 26, 2015

I just upgraded from 0.7 to 0.9 (kubernetes-0.9.1-0.2.git3623a01 in Fedora 21), and I have been bitten by this hard. Specifically, on startup, kube-proxy is selecting an interface that looks like this:

2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
    link/ether 20:cf:30:46:7e:62 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::22cf:30ff:fe46:7e62/64 scope link
       valid_lft forever preferred_lft forever

This interface doesn't have any IPv4 addresses (it's a member of a bridge), so kube-proxy takes the IPv6 address and starts trying to create rules that look like:

iptables.go:186] running iptables -C [KUBE-PORTALS-HOST -t nat -m comment --comment test-web -p tcp -m tcp -d 10.254.75.66/32 --dport 8888 -j DNAT --to-destination [fe80::22cf:30ff:fe46:7e62]:45131]

Of course, iptables doesn't know what to do with an IPv6 address, so it blows up:

proxier.go:375] Failed to ensure portal for "test-web": error checking rule: exit status 2: iptables v1.4.21: Port `:22cf:30ff:fe46:7e62]:45131' not valid

@sub-mod
Contributor

sub-mod commented Feb 26, 2015

The code in #4115 would fix this issue if proxier.go used the same method as apiserver.go to fetch an IP based on the gateway.

@thockin
Member

thockin commented Mar 2, 2015

I like #4115 as a heuristic. Once that is in, we can use that for proxy.

@a-robinson
Contributor

This should be fixed now that #4865 is in, right? @sub-mod
@sub-mod
Contributor

sub-mod commented Mar 11, 2015

@a-robinson yes
