This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Add [experimental] option for using IPVS proxy mode #1074

Merged · 3 commits merged into kubernetes-retired:master on Dec 26, 2017

Conversation

ivanilves
Contributor

@ivanilves ivanilves commented Dec 14, 2017

Add an [experimental] option to use the IPVS kube-proxy mode instead of the [default] iptables one.

We hope to address the performance issues of iptables mode for clusters with a large number of nodes and services!

(>20 nodes and >5k services is a big cluster to me)

We also hope IPVS will handle UDP traffic better (this still needs to be validated; a hope-based guess only for now).
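For reference, the new option lives under the experimental section of cluster.yaml; a rough sketch of how it looks, using the field names and example values that come up later in this review (exact defaults may differ):

# cluster.yaml (sketch only)
experimental:
  ipvsProxy:
    enabled: true      # switch kube-proxy from the default iptables mode to IPVS
    scheduler: lc      # IPVS scheduling algorithm (lc = least connections)
    syncPeriod: 900s   # how often IPVS rules are fully re-synced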

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 14, 2017
@codecov-io

codecov-io commented Dec 14, 2017

Codecov Report

Merging #1074 into master will increase coverage by 0.15%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #1074      +/-   ##
==========================================
+ Coverage   35.13%   35.28%   +0.15%     
==========================================
  Files          59       59              
  Lines        3393     3401       +8     
==========================================
+ Hits         1192     1200       +8     
  Misses       2039     2039              
  Partials      162      162
Impacted Files                       Coverage Δ
core/controlplane/config/config.go   61.18% <100%> (+0.38%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae23d78...0c88cc0. Read the comment docs.

@ivanilves
Contributor Author

Could we get a review on this? /cc @mumoshu @redbaron 🙏

Sorry for being intrusive! 😭

@camilb
Contributor

camilb commented Dec 18, 2017

Hi @ivanilves, did you create a cluster using these changes? From what I'm seeing, on 1.8.4, kube-dns and flannel fail to start. I didn't have time to investigate further.

Contributor

@mumoshu mumoshu left a comment

Thanks a lot for your contribution - I'm really looking forward to trying it myself!

Mostly questions/nits. Would you mind addressing these?

        - mountPath: /etc/kubernetes/kubeconfig
          name: kubeconfig
          readOnly: true
        - mountPath: /etc/kubernetes/kube-proxy
          name: kube-proxy-config
          readOnly: true
      volumes:
        - name: lib-modules
          hostPath:
            path: /lib/modules
Contributor

Can we enclose this volume and the corresponding volumeMount inside {{if .Experimental.IPVSProxy.Enabled}}, if it is needed when and only when IPVSProxy is enabled?
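A rough sketch of what that might look like in the DaemonSet template, assuming the .Experimental.IPVSProxy.Enabled field referenced above (the mountPath and indentation are illustrative, not taken from this PR):

{{ if .Experimental.IPVSProxy.Enabled }}
- mountPath: /lib/modules
  name: lib-modules
  readOnly: true
{{ end }}
# ... other volumeMounts unchanged ...
volumes:
{{ if .Experimental.IPVSProxy.Enabled }}
- name: lib-modules
  hostPath:
    path: /lib/modules
{{ end }}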

ipvsProxy:
  enabled: true
  scheduler: lc
  syncPeriod: 900s
Contributor

Any reference for properly configuring this setting?
At first glance, a sync period of 15 minutes seems a bit long - does it mean that a newly added pod/svc IP becomes accessible from the entire cluster only after up to 15 minutes?

@@ -1261,6 +1261,14 @@ experimental:
  kube2IamSupport:
    enabled: false

  # Use IPVS kube-proxy mode instead of [default] iptables one (requires Kubernetes 1.8.3+)
  # This is intended to address performance issues of iptables mode for clusters with big number of nodes and services
  ipvsProxy:
Contributor

Could you move this to kubeProxy.ipvsMode?
Sorry for taking your time, but we're migrating away from the experimental settings - not only the experimental ones, but everything could change while we're pre v1.0 😉
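A sketch of how the suggested layout might look in cluster.yaml (only the kubeProxy.ipvsMode name comes from this suggestion; the nested keys are carried over from the existing ipvsProxy block and are not confirmed here):

kubeProxy:
  ipvsMode:
    enabled: true
    scheduler: lc
    syncPeriod: 900s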

@ivanilves
Contributor Author

ivanilves commented Dec 18, 2017

@camilb yes, we did, for both 1.8.4 and 1.8.5. 😟 Could you share some logs? 🙏

@mumoshu thanks for your comments! 👏 I will try to address them all ASAP.

@camilb
Contributor

camilb commented Dec 18, 2017

Hi @ivanilves, I managed to fix kube-dns by specifying these parameters in kube-proxy's DaemonSet:

env:
  - name: KUBEPROXY_MODE
    value: ipvs
command:
  - /hyperkube
  - proxy
  - --config=/etc/kubernetes/kube-proxy/kube-proxy-config.yaml
  - --feature-gates=SupportIPVSProxyMode=true

They are mentioned here:

https://github.com/kubernetes/kubernetes/tree/master/pkg/proxy/ipvs#why-kube-proxy-cant-start-ipvs-mode

However, it started to crash again after rebooting the nodes. kube-dns logs:

I1218 12:01:17.148395       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
E1218 12:01:17.148904       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
E1218 12:01:17.149031       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.3.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
I1218 12:01:17.648421       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...

Also:

  • trying to run a pod in the cluster, it wasn't able to resolve any external DNS even when kube-dns was up
  • flannel started well in the end
  • if you can, please also run https://scanner.heptio.com in your cluster and check the results
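For reference, the file passed to kube-proxy via --config above is a KubeProxyConfiguration. A minimal sketch, assuming the v1alpha1 schema available around Kubernetes 1.8/1.9 (values are illustrative, not taken from this PR):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# "ipvs" still needs the SupportIPVSProxyMode feature gate on these versions,
# hence the --feature-gates=SupportIPVSProxyMode=true flag above
mode: "ipvs"
ipvs:
  scheduler: "lc"     # IPVS scheduling algorithm
  syncPeriod: "900s"  # full resync interval for IPVS rules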

@ivanilves
Contributor Author

OK, at least kube-proxy starts well with the settings from this PR (using 1.8.5_coreos.0 with Calico):

I1218 12:08:36.732790       1 feature_gate.go:156] feature gates: map[SupportIPVSProxyMode:true]
I1218 12:08:36.744495       1 server_others.go:164] Using ipvs Proxier.
I1218 12:08:36.782612       1 server_others.go:187] Tearing down inactive rules.
I1218 12:08:37.289500       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1218 12:08:37.289633       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1218 12:08:37.289694       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1218 12:08:37.289781       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1218 12:08:37.289952       1 config.go:102] Starting endpoints config controller
I1218 12:08:37.290010       1 controller_utils.go:1041] Waiting for caches to sync for endpoints config controller
I1218 12:08:37.290062       1 config.go:202] Starting service config controller
I1218 12:08:37.290103       1 controller_utils.go:1041] Waiting for caches to sync for service config controller
I1218 12:08:37.390208       1 controller_utils.go:1048] Caches are synced for service config controller
I1218 12:08:37.390208       1 controller_utils.go:1048] Caches are synced for endpoints config controller

@ivanilves
Contributor Author

ivanilves commented Dec 19, 2017

FYI @camilb I've run a decent amount of cluster creations and upgrades.

And I can state this:

  • both your settings and mine for kube-proxy IPVS mode are correct.
  • ... but kube-proxy in IPVS mode starts in a flaky manner - sometimes it outputs this:
E1219 12:27:06.901138       1 proxier.go:1362] Failed to unbind service from dummy interface, error: error unbind address: 172.16.0.10 from dummy interface: kube-ipvs0, err: exit status 2: RTNETLINK answers: Cannot assign requested address

and does not load the DNS service (which has IP 172.16.0.10 in my cluster and is UDP). Still, I can connect to other cluster services (which are TCP) via curl/telnet, and that leaves me a little bit puzzled... 🤔

I think it's something about the way kube-proxy in IPVS mode gets initialized.
That would explain both why it gets "fixed" after a restart (e.g. after a DaemonSet change) and why it can break again after a node reboot.

Meanwhile I'm working towards:

  • a fix/workaround for the kube-proxy IPVS issue, to give this PR more value (see the sketch below)
  • integrating Heptio Sonobuoy into my pipeline (actually this is already done by one brilliant man from my team)

Thank you again for your input!
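One initialization-order suspect worth noting: the IPVS README linked earlier says the IPVS kernel modules must be loaded on the node before kube-proxy can run in IPVS mode. A rough sketch of one possible workaround, as a hypothetical initContainer on the kube-proxy DaemonSet (not part of this PR; the module names are the ones listed in that README):

# Hypothetical addition to the kube-proxy DaemonSet pod spec - NOT part of this PR
initContainers:
  - name: load-ipvs-modules
    image: busybox:1.28          # any image with modprobe works
    securityContext:
      privileged: true           # loading kernel modules requires privileges
    command:
      - sh
      - -c
      - for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4; do modprobe $m || true; done
    volumeMounts:
      - name: lib-modules        # hostPath /lib/modules, already added by this PR
        mountPath: /lib/modules
        readOnly: true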

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 21, 2017
@ivanilves
Contributor Author

ivanilves commented Dec 21, 2017

Hey @camilb, I've been running the IPVS setup for the last few days, and here's what I've found:

How could I know it is decent?

I've made my own version of hyperkube, ivanilves/hyperkube:v1.9.0_coreos.0 (https://hub.docker.com/r/ivanilves/hyperkube/), and have been creating/updating/rebooting clusters with it without any issues for the time being 😅
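If someone wants to try that image, it can be pointed at from cluster.yaml via the hyperkube image override; a sketch, assuming the hyperkubeImage repo/tag keys (the key names are an assumption here - please verify against your kube-aws version):

# cluster.yaml (sketch; key names assumed)
hyperkubeImage:
  repo: ivanilves/hyperkube
  tag: v1.9.0_coreos.0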

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 21, 2017
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Dec 21, 2017
@ivanilves
Contributor Author

@mumoshu could you take a look at this PR now? (I've changed it)

Due to some concerns about the state of the IPVS integration, I would prefer not to merge it now, but to keep it open until kube-aws has K8s 1.9.1+ as the default. Meanwhile I would maintain/rebase it. WDYT?

@ivanilves
Contributor Author

ivanilves commented Dec 22, 2017

BTW Heptio tests ran well: https://scanner.heptio.com/d30af2b8fc3d94de31f341036da6165e/diagnostics/ ☺️

@mumoshu
Contributor

mumoshu commented Dec 26, 2017

@ivanilves Hi, thanks a lot for your efforts!

After seeing your great explanations in cluster.yaml about the various gotchas to get it running, I'm willing to merge this now to save you the hassle of resolving conflicts again and again.

Would it be ok for you, too? 😃

@ivanilves
Contributor Author

@mumoshu YES!!! Please!!! 🙏

@mumoshu mumoshu merged commit d15e5cf into kubernetes-retired:master Dec 26, 2017
@mumoshu
Contributor

mumoshu commented Dec 26, 2017

@ivanilves Thanks again for your contribution 👍

@mumoshu mumoshu added this to the v0.9.10.rc-1 milestone Dec 26, 2017
@camilb
Contributor

camilb commented Jan 5, 2018

@ivanilves @mumoshu Tested Google's hyperkube 1.9.1 image with IPVS in #1104.
Everything looks fine: https://scanner.heptio.com/59e89093d850dc7b5c8719cb87757646/diagnostics/

@mumoshu
Contributor

mumoshu commented Jan 5, 2018

Great!

@ivanilves
Contributor Author

👏 👏 👏

kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018
@cknowles
Contributor

cknowles commented Apr 5, 2018

These docs mention that IPVS falls back to iptables when kube-proxy is started with the --cluster-cidr option set. Does anyone know if that applies specifically to the flag only, and not to the config file?
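For reference, the two ways of setting the cluster CIDR being compared in that question (the CIDR value is illustrative; whether the fallback behaviour differs between them is exactly the open question):

# As a kube-proxy command-line flag:
#   --cluster-cidr=10.2.0.0/16
# As a field in the KubeProxyConfiguration file passed via --config:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clusterCIDR: "10.2.0.0/16"
mode: "ipvs"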
