DualStack / IPv6-only: parseIP error with IPVS proxy on CentOS 7 #89520

Closed
duylong opened this issue Mar 26, 2020 · 40 comments · Fixed by #90555
Assignees
Labels
area/ipv6 area/ipvs kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@duylong

duylong commented Mar 26, 2020

Hi,

Since upgrading to version 1.18, I have been seeing errors in kube-proxy:

E0326 13:14:23.847364       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847388       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::a]:53/UDP, err: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847479       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847501       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::a]:53/TCP, err: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847595       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847617       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::a]:9153/TCP, err: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847706       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 0 0 16 2 69 0 0 192 187 182 147 174 207 103 7]
E0326 13:14:23.847728       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::7964]:443/TCP, err: parseIP Error ip=[253 0 0 16 2 69 0 0 192 187 182 147 174 207 103 7]
E0326 13:14:23.847813       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 0 0 16 2 69 0 0 192 187 182 147 174 207 103 41]
E0326 13:14:23.847835       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::dc23]:80/TCP, err: parseIP Error ip=[253 0 0 16 2 69 0 0 192 187 182 147 174 207 103 41]
E0326 13:14:23.848063       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 221 172 173 0 21 1 42 2 80 86 255 254 177 6 5]
E0326 13:14:23.848085       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::1]:443/TCP, err: parseIP Error ip=[253 221 172 173 0 21 1 42 2 80 86 255 254 177 6 5]
...

I have IPv4/IPv6 dual-stack enabled. The cluster has no problems and IPVS works despite the errors.

Do you also have this issue?

  • OS: RHEL7
  • Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
@duylong duylong added the kind/bug Categorizes issue or PR as related to a bug. label Mar 26, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 26, 2020
@neolit123
Member

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 26, 2020
@liggitt
Member

liggitt commented Mar 27, 2020

Related to https://github.com/kubernetes/kubernetes/pull/89343/files#diff-71a24186bd4c7e94c34819c55dfa89af?

cc @aojea @thockin

@liggitt liggitt changed the title parseIP Error with kube-proxy 1.18.0: parseIP Error with kube-proxy Mar 27, 2020
@SataQiu
Member

SataQiu commented Mar 27, 2020

/cc

@aojea
Member

aojea commented Mar 27, 2020

Decoding one of the byte arrays from the log output gives me the following addresses:
https://play.golang.org/p/K196eO64r5v

Address: fd00:10:245:0:c0bb:b693:aecf:6707, [253 0 0 16 2 69 0 0 192 187 182 147 174 207 103 7]
Address: fd00:10:96::a [253 0 0 16 0 150 0 0 0 0 0 0 0 0 0 10]
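
For readers who cannot open the playground link, the decoding step can be sketched in Go as follows. This is a minimal illustration, not the contents of the linked playground; the helper name bytesToIP is made up for this example. It only shows that the 16 bytes printed in the log are an ordinary IPv6 address.

```go
package main

import (
	"fmt"
	"net"
)

// bytesToIP (hypothetical helper) interprets a 16-byte slice, as printed
// in the kube-proxy log, as an IPv6 address and returns its text form.
func bytesToIP(b []byte) string {
	return net.IP(b).String()
}

func main() {
	// Byte array taken from the "parseIP Error" log line.
	endpoint := []byte{253, 0, 0, 16, 2, 69, 0, 0, 192, 187, 182, 147, 174, 207, 103, 7}
	// Byte array corresponding to the service address.
	service := []byte{253, 0, 0, 16, 0, 150, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10}

	fmt.Println(bytesToIP(endpoint)) // fd00:10:245:0:c0bb:b693:aecf:6707
	fmt.Println(bytesToIP(service))  // fd00:10:96::a
}
```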

@duylong can you share more details about your environment?

  • Is it possible that the CoreDNS pod has the address fd00:10:245:0:c0bb:b693:aecf:6707?

  • what are your cluster-cidr and service-cidr?

Indeed, that new net.Utils change looks suspicious. A quick grep in the repo gives me some suspects, maybe the service/ipallocator?

$ grep -r GetIndexedIP *
cmd/kubeadm/app/util/apiclient/init_dryrun.go:  internalAPIServerVirtualIP, err := utilnet.GetIndexedIP(svcSubnet, 1)
cmd/kubeadm/app/constants/constants.go: dnsIP, err := utilnet.GetIndexedIP(svcSubnetCIDR, 10)
cmd/kubeadm/app/constants/constants.go: internalAPIServerVirtualIP, err := utilnet.GetIndexedIP(svcSubnet, 1)
cmd/kubeadm/app/preflight/checks.go:    testIP, err := utilsnet.GetIndexedIP(cidr, 1)
pkg/master/services.go: apiServerServiceIP, err := utilnet.GetIndexedIP(&serviceClusterIPRange, 1)
pkg/registry/core/service/ipallocator/allocator.go:             ip, _ := utilnet.GetIndexedIP(r.net, offset+1) // +1 because Range doesn't store IP 0

It's important to mention that there were similar issues with IPVS before, where IPv4 addresses were interpreted as IPv6. @uablrek has some experience with this kind of issue, maybe he can see something :)

#87604
#65006

/cc

@uablrek
Contributor

uablrek commented Mar 27, 2020

This is a bit tricky to trace since libnetwork/ipvs is moving around at the moment. The problem seems to be in:
https://github.com/moby/ipvs/blob/8f137da6850a975020f4f739c589d293dd3a9d7b/netlink.go#L260-L274

The "family" must be wrong. It is called from:
https://github.com/moby/ipvs/blob/8f137da6850a975020f4f739c589d293dd3a9d7b/netlink.go#L463

@uablrek
Contributor

uablrek commented Mar 27, 2020

@duylong Please run ipvsadm -Ln and attach the output.

@duylong
Author

duylong commented Mar 27, 2020

Hi,

Some information:

default       kubernetes                   [fddd:acad:15:12a:250:56ff:feb1:605]:6443                                                                                           42d
kube-system   kube-dns                     [fd00:10:245:0:c0bb:b693:aecf:6727]:53,[fd00:10:245:0:dde8:fb36:cc62:37c]:53,[fd00:10:245:0:c0bb:b693:aecf:6727]:9153 + 3 more...   42d
kube-system   metrics-server               [fd00:10:245:0:c0bb:b693:aecf:6707]:4443          

cluster-cidr and service-cidr:

  • fd00:10:245:0:0:0:0:0/64,10.245.0.0/16
  • fd00:10:96:0:0:0:0:0/112,10.96.0.0/12

ipvsadm output:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  [fd00:10:96::1]:443 lc
  -> [fddd:acad:15:12a:250:56ff:feb1:605]:6443 Masq    1      1          0
TCP  [fd00:10:96::a]:53 lc
  -> [fd00:10:245:0:c0bb:b693:aecf:6727]:53 Masq    1      0          0
  -> [fd00:10:245:0:dde8:fb36:cc62:37c]:53 Masq    1      0          0
TCP  [fd00:10:96::a]:9153 lc
  -> [fd00:10:245:0:c0bb:b693:aecf:6727]:9153 Masq    1      0          0
  -> [fd00:10:245:0:dde8:fb36:cc62:37c]:9153 Masq    1      0          0
TCP  [fd00:10:96::7964]:443 lc
  -> [fd00:10:245:0:c0bb:b693:aecf:6707]:4443 Masq    1      2          0
UDP  [fd00:10:96::a]:53 lc
  -> [fd00:10:245:0:c0bb:b693:aecf:6727]:53 Masq    1      0          0
  -> [fd00:10:245:0:dde8:fb36:cc62:37c]:53 Masq    1      0          0
TCP  [fd00:10:96::dc23]:80 lc
  -> [fd00:10:245:0:c0bb:b693:aecf:6729]:80 Masq    1      0          0
....

@SataQiu
Member

SataQiu commented Mar 27, 2020

Ref: 8d7780d

@uablrek
Contributor

uablrek commented Mar 27, 2020

@SataQiu The commit in the ref above should be in v1.18.0 which is the version used in this issue. So the commit does not fix the problem.

@SataQiu
Member

SataQiu commented Mar 27, 2020

@uablrek Maybe something is wrong around here: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/docker/libnetwork/ipvs/netlink.go#L463
Before this commit, it looked like this:

ip, err := parseIP(attr.Value, syscall.AF_INET)

Now it looks like this:

ip, err := parseIP(attr.Value, d.AddressFamily)

However, d.AddressFamily is not initialized, so it is zero :(.
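
To illustrate why a zero family would produce exactly the log line from the report, here is a minimal sketch. It assumes the parser switches on the address family and rejects anything other than IPv4 or IPv6. parseIPSketch is a hypothetical stand-in written for this thread, not the vendored libnetwork code, and the family numbers are the Linux values hard-coded so the sketch builds on any platform.

```go
package main

import (
	"fmt"
	"net"
)

// Linux address family numbers (AF_INET, AF_INET6), hard-coded here
// so the example does not depend on the syscall package.
const (
	afInet  uint16 = 2
	afInet6 uint16 = 10
)

// parseIPSketch is a hypothetical stand-in for the library's parseIP:
// it accepts only the two known families and errors on anything else,
// including the zero value of an uninitialized family field.
func parseIPSketch(ip []byte, family uint16) (net.IP, error) {
	switch {
	case family == afInet && len(ip) >= 4:
		return net.IP(ip[:4]), nil
	case family == afInet6 && len(ip) >= 16:
		return net.IP(ip[:16]), nil
	default:
		return nil, fmt.Errorf("parseIP Error ip=%v", ip)
	}
}

func main() {
	addr := []byte{253, 0, 0, 16, 2, 69, 0, 0, 192, 187, 182, 147, 174, 207, 103, 7}

	var unset uint16 // zero value, as when the attribute was never received
	_, err := parseIPSketch(addr, unset)
	fmt.Println(err)

	ip, _ := parseIPSketch(addr, afInet6)
	fmt.Println(ip)
}
```

With the family left at zero the sketch returns the same "parseIP Error ip=[...]" text seen in the kube-proxy log; with the family set to IPv6 the same bytes parse cleanly.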

@uablrek
Contributor

uablrek commented Mar 27, 2020

I do not get these faults. Has the netlink API changed?

@duylong What kernel version are you using?

@uablrek
Contributor

uablrek commented Mar 27, 2020

d.AddressFamily is supposed to be set from the "NetlinkRouteAttr" slice:

d.AddressFamily = native.Uint16(attr.Value)

It should definitely not be hard-coded to ipv4.

@rikatz
Contributor

rikatz commented Mar 27, 2020

I made a gist to try to reproduce this in my environment but didn't get this error. I may try it on an older kernel, such as the one from CentOS 7.

Added exactly the same IPVS configuration here:

ipvsadm -A -t [fd00:10:96::1]:443 -s lc
ipvsadm -a -t [fd00:10:96::1]:443 -r [fddd:acad:15:12a:250:56ff:feb1:605]:6443 -m
ipvsadm -L -n
ipvsadm -A -t [fd00:10:96::a]:53 -s lc
ipvsadm -L -n
ipvsadm -a -t [fd00:10:96::a]:53 -r [fd00:10:245:0:c0bb:b693:aecf:6727]:53
ipvsadm -a -t [fd00:10:96::a]:53 -r [fd00:10:245:0:dde8:fb36:cc62:37c]:53
ipvsadm -A -u [fd00:10:96::a]:53 -s lc
ipvsadm -a -u [fd00:10:96::a]:53 -r [fd00:10:245:0:dde8:fb36:cc62:37c]:53
ipvsadm -a -u [fd00:10:96::a]:53 -r [fd00:10:245:0:c0bb:b693:aecf:6727]:53

When I ran this gist program against the IPVS entry for DNS (fd00:10:96::a, port 53, UDP), I got the expected result:

&{fd00:10:245:0:c0bb:b693:aecf:6727 53 1 0 0}&{fd00:10:245:0:dde8:fb36:cc62:37c 53 1 0 0}

Running this in other scenarios might clarify whether the problem comes from a Netlink API change between kernels, from the library in use, or from something else.

@SataQiu
Member

SataQiu commented Mar 28, 2020

After some investigation, I found that this issue is most likely caused by an old Linux kernel version.
I have reproduced the problem on CentOS with kernel 3.10.0-693.el7.x86_64.

According to the code, netlink tries to read the d.AddressFamily attribute, but this kernel does not provide it :(

These are the destination attributes defined in /usr/include/linux/ip_vs.h (kernel 3.10):

/*
 * Attributes used to describe a destination (real server)
 *
 * Used inside nested attribute IPVS_CMD_ATTR_DEST
 */
enum {
	IPVS_DEST_ATTR_UNSPEC = 0,
	IPVS_DEST_ATTR_ADDR,		/* real server address */
	IPVS_DEST_ATTR_PORT,		/* real server port */

	IPVS_DEST_ATTR_FWD_METHOD,	/* forwarding method */
	IPVS_DEST_ATTR_WEIGHT,		/* destination weight */

	IPVS_DEST_ATTR_U_THRESH,	/* upper threshold */
	IPVS_DEST_ATTR_L_THRESH,	/* lower threshold */

	IPVS_DEST_ATTR_ACTIVE_CONNS,	/* active connections */
	IPVS_DEST_ATTR_INACT_CONNS,	/* inactive connections */
	IPVS_DEST_ATTR_PERSIST_CONNS,	/* persistent connections */

	IPVS_DEST_ATTR_STATS,		/* nested attribute for dest stats */
	__IPVS_DEST_ATTR_MAX,
};

No IPVS_DEST_ATTR_ADDR_FAMILY attribute is defined!

In newer kernel versions, however, the destination attributes are defined like this:

/*
 * Attributes used to describe a destination (real server)
 *
 * Used inside nested attribute IPVS_CMD_ATTR_DEST
 */
enum {
	IPVS_DEST_ATTR_UNSPEC = 0,
	IPVS_DEST_ATTR_ADDR,		/* real server address */
	IPVS_DEST_ATTR_PORT,		/* real server port */

	IPVS_DEST_ATTR_FWD_METHOD,	/* forwarding method */
	IPVS_DEST_ATTR_WEIGHT,		/* destination weight */

	IPVS_DEST_ATTR_U_THRESH,	/* upper threshold */
	IPVS_DEST_ATTR_L_THRESH,	/* lower threshold */

	IPVS_DEST_ATTR_ACTIVE_CONNS,	/* active connections */
	IPVS_DEST_ATTR_INACT_CONNS,	/* inactive connections */
	IPVS_DEST_ATTR_PERSIST_CONNS,	/* persistent connections */

	IPVS_DEST_ATTR_STATS,		/* nested attribute for dest stats */

	IPVS_DEST_ATTR_ADDR_FAMILY,	/* Address family of address */

	IPVS_DEST_ATTR_STATS64,		/* nested attribute for dest stats */

	IPVS_DEST_ATTR_TUN_TYPE,	/* tunnel type */

	IPVS_DEST_ATTR_TUN_PORT,	/* tunnel port */

	IPVS_DEST_ATTR_TUN_FLAGS,	/* tunnel flags */

	__IPVS_DEST_ATTR_MAX,
};

Evidently, newer kernels have added several attributes (IPVS_DEST_ATTR_ADDR_FAMILY, IPVS_DEST_ATTR_STATS64, ...).

That is why kube-proxy works on systems with a newer kernel.

So the issue can be worked around by upgrading the Linux kernel.
I am not sure which kernel version is the minimum requirement; maybe we should document it.

@SataQiu
Member

SataQiu commented Mar 28, 2020

Here are the destination attributes defined in docker/libnetwork:
https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/docker/libnetwork/ipvs/constants.go#L76-L89

// Attributes used to describe a destination (real server). Used
// inside nested attribute ipvsCmdAttrDest.
const (
	ipvsDestAttrUnspec int = iota
	ipvsDestAttrAddress
	ipvsDestAttrPort
	ipvsDestAttrForwardingMethod
	ipvsDestAttrWeight
	ipvsDestAttrUpperThreshold
	ipvsDestAttrLowerThreshold
	ipvsDestAttrActiveConnections
	ipvsDestAttrInactiveConnections
	ipvsDestAttrPersistentConnections
	ipvsDestAttrStats
	ipvsDestAttrAddressFamily
)
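
As a quick check of how this list lines up with the kernel header, the constant block above can be compiled on its own. The sketch below copies the constants from the comment and prints the values iota assigns. ipvsDestAttrAddressFamily comes out as 11, which is the position that __IPVS_DEST_ATTR_MAX occupies in the 3.10 header quoted earlier, so that kernel has no attribute with this number to send.

```go
package main

import "fmt"

// Copied from the constant block quoted above; iota numbers the
// entries 0, 1, 2, ... in declaration order.
const (
	ipvsDestAttrUnspec int = iota
	ipvsDestAttrAddress
	ipvsDestAttrPort
	ipvsDestAttrForwardingMethod
	ipvsDestAttrWeight
	ipvsDestAttrUpperThreshold
	ipvsDestAttrLowerThreshold
	ipvsDestAttrActiveConnections
	ipvsDestAttrInactiveConnections
	ipvsDestAttrPersistentConnections
	ipvsDestAttrStats
	ipvsDestAttrAddressFamily
)

func main() {
	fmt.Println(ipvsDestAttrStats, ipvsDestAttrAddressFamily) // 10 11
}
```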

@uablrek
Contributor

uablrek commented Mar 30, 2020

@SataQiu Good finding.

So we can conclude that this really is kernel-version dependent. It is hard to know what to do about it, since we must check the family. I would propose that we make it work for IPv4 with some fallback and require a minimum kernel version for IPv6-only/dual-stack. But that decision is for others to make.

/cc @andrewsykim @thockin

@uablrek
Contributor

uablrek commented Mar 30, 2020

/cc @m1093782566

@uablrek
Contributor

uablrek commented Mar 30, 2020

The symbol in the kernel tree https://github.com/torvalds/linux/blob/7111951b8d4973bda27ff663f2cf18b663d15b48/include/uapi/linux/ip_vs.h#L404

The IPVS_SVC_ATTR_AF for service family https://github.com/torvalds/linux/blob/7111951b8d4973bda27ff663f2cf18b663d15b48/include/uapi/linux/ip_vs.h#L360 seems very old. Since the service must have the same family as the real-servers perhaps it can be used for compatibility with old kernels?
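
The compatibility idea in this comment could look roughly like the sketch below. It is only an illustration of the proposal, under the assumption stated above that a service and its real servers always share one family; it is not necessarily how the library was eventually fixed. The function name effectiveFamily and the constants are hypothetical.

```go
package main

import "fmt"

// Linux address family numbers, hard-coded for portability.
const (
	afUnspec uint16 = 0
	afInet   uint16 = 2
	afInet6  uint16 = 10
)

// effectiveFamily (hypothetical) prefers the family reported for the
// destination and falls back to the service's family when the kernel
// did not send a destination family attribute.
func effectiveFamily(destFamily, serviceFamily uint16) uint16 {
	if destFamily != afUnspec {
		return destFamily
	}
	return serviceFamily
}

func main() {
	// Old kernel: destination family missing, service is IPv6.
	fmt.Println(effectiveFamily(afUnspec, afInet6)) // 10
	// Newer kernel: destination family is present and wins.
	fmt.Println(effectiveFamily(afInet, afInet)) // 2
}
```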

@uablrek
Contributor

uablrek commented Mar 30, 2020

The commit that adds the symbol: torvalds/linux@6cff339
It seems to have gone into linux-3.18.
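
For anyone who wants to compare a node's kernel against that 3.18 mark, a rough check might look like this. It is a sketch only: kernelAtLeast is a hypothetical helper, the release string is whatever uname -r prints, and distribution kernels can backport features, so the version number is a heuristic rather than proof that the attribute exists.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelAtLeast (hypothetical) reports whether a release string such as
// "3.10.0-1062.9.1.el7.x86_64" is at or above the given major.minor.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	// Keep only the leading digits of the minor field (e.g. "4-rc1" -> "4").
	minorField := parts[1]
	if i := strings.IndexFunc(minorField, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorField = minorField[:i]
	}
	mnr, err := strconv.Atoi(minorField)
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && mnr >= minor), nil
}

func main() {
	for _, rel := range []string{"3.10.0-1062.9.1.el7.x86_64", "4.4.0"} {
		ok, err := kernelAtLeast(rel, 3, 18)
		fmt.Println(rel, ok, err)
	}
}
```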

@duylong
Author

duylong commented Mar 30, 2020

I did not know there was a minimum kernel version requirement; I was using the stable, official RHEL7 kernel.

My kernel version is:

Linux XXX 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST 2019 x86_64 x86_64 x86_64 GNU/Linux

Currently the latest version in the repository is:

3.10.0-1062.18.1.el7.x86_64

According to the documentation (https://access.redhat.com/articles/3078), I should switch to RHEL8 if I want a recent kernel :-\

@uablrek
Contributor

uablrek commented Mar 30, 2020

There is no minimum kernel version requirement afaik, so I think this must be fixed for ipv4-only clusters. However, ipv6-only and dual-stack are in "alpha", so a requirement on kernel version might be ok, but I can't say myself.
A complication is that the problem is not in k8s itself but in a 3rd-party lib.

@aojea
Member

aojea commented Mar 30, 2020

However ipv6-only and dual-stack are in "alpha"

ipv6-only is beta since 1.18 😄

@andrewsykim
Member

/assign

@k8s-ci-robot k8s-ci-robot added the triage/unresolved Indicates an issue that can not or will not be resolved. label Apr 2, 2020
@andrewsykim
Member

/remove-triage unresolved

@k8s-ci-robot k8s-ci-robot removed the triage/unresolved Indicates an issue that can not or will not be resolved. label Apr 2, 2020
@andrewsykim
Member

Thanks @rikatz, just waiting on moby/ipvs#15 now.

@EvanPrivate

Same issue, solved by updating the kernel to 5.x.

@duylong
Author

duylong commented Apr 3, 2020

I didn't want to wait, so I updated to kernel-lt version 4.4. The problem is solved on my side :)

@andrewsykim
Member

Thanks for confirming. We'll still try to get moby/ipvs#15 in since I'm sure other folks with older kernels will run into this issue.

@andrewsykim
Member

/retitle parseIP error with IPVS proxy

@moweiraul

Upgrade your system kernel to a 4.x version.

@andrewsykim
Member

FYI: moby/ipvs#15 just merged, need to run some validation & cut a new release (v1.0.1).

@andrewsykim
Member

FYI #90555

@andrewsykim
Member

/retitle DualStack / IPv6-only: parseIP error with IPVS proxy on CentOS 7

@k8s-ci-robot k8s-ci-robot changed the title DualStack / IPv6-only: parseIP error with IPVS proxy DualStack / IPv6-only: parseIP error with IPVS proxy on CentOS 7 Apr 28, 2020
@andrewsykim
Member

v1.18 cherry-pick #90678

@Duanzhiwei

Hi,

Since my upgrade to 1.18 version, I have errors in kube-proxy:

E0326 13:14:23.847364       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847388       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::a]:53/UDP, err: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847479       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847501       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::a]:53/TCP, err: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847595       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847617       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::a]:9153/TCP, err: parseIP Error ip=[253 0 0 16 2 69 0 0 221 232 251 54 204 98 3 124]
E0326 13:14:23.847706       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 0 0 16 2 69 0 0 192 187 182 147 174 207 103 7]
E0326 13:14:23.847728       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::7964]:443/TCP, err: parseIP Error ip=[253 0 0 16 2 69 0 0 192 187 182 147 174 207 103 7]
E0326 13:14:23.847813       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 0 0 16 2 69 0 0 192 187 182 147 174 207 103 41]
E0326 13:14:23.847835       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::dc23]:80/TCP, err: parseIP Error ip=[253 0 0 16 2 69 0 0 192 187 182 147 174 207 103 41]
E0326 13:14:23.848063       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[253 221 172 173 0 21 1 42 2 80 86 255 254 177 6 5]
E0326 13:14:23.848085       1 proxier.go:1192] Failed to sync endpoint for service: [fd00:10:96::1]:443/TCP, err: parseIP Error ip=[253 221 172 173 0 21 1 42 2 80 86 255 254 177 6 5]
...

I have ipv4/ipv6 dualstack enable. No problem with cluster and IPVS works despite errors.

Do you have also this issue ?

  • OS: RHEL7
  • Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}

@rikatz
Contributor

rikatz commented Jul 27, 2020

Hi, it seems that you're using v1.18.0, and this has been fixed in v1.18.3. Can you please update and check?

Thanks
