
kube-proxy ProxyLoop() contends on mutex with iptables sync loop #11977

Closed
bnprss opened this issue Jul 29, 2015 · 12 comments · Fixed by #13345
Labels: area/kube-proxy, priority/backlog, sig/node, sig/scalability

Comments


bnprss commented Jul 29, 2015

While investigating why my sites sometimes mysteriously slow down for a short period of time, I found this: every 5 seconds, DNS query times jump from < 2 ms to the values below.

until [[ 0 -eq 1 ]]; do dig mysql-change @10.0.0.10 | grep -E "time: [0-9]{3,}" && date; done
;; Query time: 2187 msec
Wed Jul 29 13:39:31 UTC 2015
;; Query time: 2114 msec
Wed Jul 29 13:39:36 UTC 2015
;; Query time: 1890 msec
Wed Jul 29 13:39:41 UTC 2015
;; Query time: 1874 msec
Wed Jul 29 13:39:46 UTC 2015
;; Query time: 1930 msec
Wed Jul 29 13:39:51 UTC 2015
;; Query time: 1942 msec
Wed Jul 29 13:39:56 UTC 2015
;; Query time: 2350 msec
Wed Jul 29 13:40:01 UTC 2015
;; Query time: 2469 msec
Wed Jul 29 13:40:06 UTC 2015
;; Query time: 1898 msec
Wed Jul 29 13:40:11 UTC 2015
;; Query time: 2083 msec
Wed Jul 29 13:40:16 UTC 2015
;; Query time: 2110 msec
Wed Jul 29 13:40:21 UTC 2015
;; Query time: 1921 msec
Wed Jul 29 13:40:26 UTC 2015
;; Query time: 2023 msec
Wed Jul 29 13:40:31 UTC 2015
;; Query time: 1836 msec
Wed Jul 29 13:40:36 UTC 2015
;; Query time: 2182 msec
Wed Jul 29 13:40:41 UTC 2015
;; Query time: 2079 msec
Wed Jul 29 13:40:46 UTC 2015
;; Query time: 2066 msec
Wed Jul 29 13:40:51 UTC 2015
;; Query time: 1905 msec
Wed Jul 29 13:40:56 UTC 2015
;; Query time: 1999 msec
Wed Jul 29 13:41:01 UTC 2015
;; Query time: 1855 msec
Wed Jul 29 13:41:06 UTC 2015
;; Query time: 2043 msec
Wed Jul 29 13:41:11 UTC 2015

But it's not over; curl against a service IP gives me this:
until [[ 0 -eq 1 ]]; do curl -I 10.0.126.115:9200 -s -w '%{time_total}' -o /dev/null | grep -E "^[1-9]" && date; done
2.083
Wed Jul 29 13:50:16 UTC 2015
2.247
Wed Jul 29 13:50:21 UTC 2015
1.874
Wed Jul 29 13:50:26 UTC 2015
2.019
Wed Jul 29 13:50:31 UTC 2015
2.006
Wed Jul 29 13:50:36 UTC 2015
1.718
Wed Jul 29 13:50:41 UTC 2015
1.954
Wed Jul 29 13:50:46 UTC 2015
2.065
Wed Jul 29 13:50:51 UTC 2015
1.896
Wed Jul 29 13:50:56 UTC 2015
1.968
Wed Jul 29 13:51:01 UTC 2015
1.556
Wed Jul 29 13:51:05 UTC 2015

There is something wrong in the way kube-proxy works and updates its rules. It's a real source of trouble on my side.

bnprss changed the title from "Major slow down each 5s on kube-proxy refresh on 1.0.1" to "Major slow down each 5s on kube-proxy refresh on 1.0.1 cluster is unreachable up to 40% of time (2s/5s)" on Jul 29, 2015

bnprss commented Jul 29, 2015

Further analysis shows everything runs fine on the Docker side: querying a pod's virtual IP address directly always returns in time!
Checked on CoreOS and on the default Kubernetes Debian image.


bnprss commented Jul 29, 2015

Open question:
Why use iptables alone to handle inter-cluster communication instead of routing? Each pod has its own IP, which doesn't suffer from iptables refresh slowness or missing rules. Node forwarding works great, and on GCE it's fine to do so.

SkyDNS could serve pod IPs as round-robin DNS A records, and all that mess would run smoother.

I currently run databases with replication and PHP with dependencies on other pods, so you can easily understand how much I depend on a clean and really fast network.

I've got ~150 services, which leads to 2 s of complete network stall; with 375 services it would take the whole refresh loop, and with more services, what would happen?

I've got the feeling that generating a complete rules file and loading it with iptables-restore would do less harm.
It's important that the Kubernetes team look at this. "Production ready" isn't really true in my case.


thockin commented Jul 29, 2015

http://docs.k8s.io/v1.0/user-guide/services.html#why-not-use-round-robin-dns

There is not enough information here to properly address your issue. What Kubernetes version, what OS, what cloud, how did you set it up, what network overlay if any, etc.

Running on my own cluster, for example:

root@af0e40b6c43d:/# dig +search @10.0.0.10 kubernetes.default

; <<>> DiG 9.9.5-3ubuntu0.4-Ubuntu <<>> +search @10.0.0.10 kubernetes.default
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44773
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 30 IN A   10.0.0.1

;; Query time: 4 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Wed Jul 29 18:01:48 UTC 2015
;; MSG SIZE  rcvd: 70

root@af0e40b6c43d:/# for i in $(seq 1 10); do dig +search @10.0.0.10 kubernetes.default | grep "Query time:"; done
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec

and

root@af0e40b6c43d:/# curl -Ik https://kubernetes.default -s -w '%{time_total}' -o /dev/null; echo
0.016
root@af0e40b6c43d:/# for i in $(seq 1 10); do curl -Ik https://kubernetes.default -s -w '%{time_total}' -o /dev/null; echo; done
0.018
0.016
0.017
0.016
0.016
0.027
0.020
0.017
0.017
0.017


bnprss commented Jul 29, 2015

@thockin you're not running enough samples to be sure to hit at least one pass of the kube-proxy refresh. You'd have to run more than 5000 loops. That's why I filtered on times > 1 s.

Good points on DNS RR. But it's the easiest way to handle multiple endpoints while real load balancing is a work in progress.

So I'm on a full 1.0.1 cluster with 1 master and 12 nodes: 11 CoreOS, 1 Debian. The GCE nodes are all the same machine type: n1-highmem-2.

The installation is from kube-up on the 1.0.1 release package.
Two modifications were made to the kubelet package to fit my needs: a relocation of the safe_format_and_mount shell script and a safety parameter on iptables (the -w switch), and that's all for CoreOS.
The Debian node is fully stock, "only" there to perform debug tasks and analysis like what I've posted here.
The Debian master is fully stock too, except for the installation of ... htop.
The tests provided are from the Debian node, since I know it is the mainline OS.

I've got:
104 pods (2 kube-dns pods)
102 rc
101 svc
19 namespaces
17 external LBs

No overlays, nothing special; I stayed as close as possible to upstream, even for the CoreOS installation.

We are not live yet, so I still have time to debug anything you want and test whatever might make a difference.


thockin commented Jul 30, 2015

We know kube-proxy has a long-tail problem, but it's not usually multi-second. See the "histogram" below. Your point about doing iptables-restore in one shot is taken.

root@af0e40b6c43d:/# for i in $(seq 1 100000); do curl -Ik https://kubernetes.default -s -w '%{time_total}' -o /dev/null; echo; done | sort | uniq -c
171 0.014
14170 0.015
39305 0.016
19709 0.017
9367 0.018
4064 0.019
2361 0.020
1163 0.021
691 0.022
564 0.023
1097 0.024
981 0.025
2267 0.026
725 0.027
351 0.028
175 0.029
98 0.030
75 0.031
55 0.032
47 0.033
95 0.034
204 0.035
275 0.036
98 0.037
33 0.038
55 0.039
229 0.040
183 0.041
100 0.042
37 0.043
30 0.044
35 0.045
32 0.046
16 0.047
14 0.048
6 0.049
14 0.050
16 0.051
9 0.052
18 0.053
16 0.054
14 0.055
7 0.056
14 0.057
10 0.058
6 0.059
12 0.060
10 0.061
7 0.062
9 0.063
11 0.064
8 0.065
8 0.066
6 0.067
17 0.068
6 0.069
5 0.070
5 0.071
26 0.072
29 0.073
16 0.074
7 0.075
8 0.076
7 0.077
4 0.078
2 0.079
3 0.080
1 0.081
2 0.082
1 0.085
1 0.086
3 0.091
4 0.092
1 0.094
2 0.095
4 0.096
1 0.099
1 0.102
1 0.104
1 0.107
2 0.108
1 0.111
1 0.115
2 0.119
2 0.129
1 0.138
1 0.141
1 0.142
2 0.143
4 0.144
3 0.145
6 0.146
3 0.147
8 0.148
12 0.149
25 0.150
24 0.151
26 0.152
28 0.153
37 0.154
33 0.155
36 0.156
36 0.157
34 0.158
30 0.159
40 0.160
25 0.161
32 0.162
29 0.163
35 0.164
22 0.165
17 0.166
17 0.167
16 0.168
14 0.169
17 0.170
11 0.171
11 0.172
9 0.173
8 0.174
8 0.175
6 0.176
8 0.177
5 0.178
6 0.179
3 0.180
7 0.181
5 0.182
6 0.183
4 0.184
3 0.185
3 0.186
3 0.187
4 0.188
3 0.189
1 0.190
1 0.191
2 0.192
1 0.193
1 0.194
1 0.196
2 0.197
2 0.198
1 0.201
1 0.203
1 0.204
1 0.205
1 0.211
2 0.216
1 0.222
1 0.224
1 0.225
1 0.361
1 0.373
2 0.374
1 0.378
1 0.379
1 0.381
1 0.383
1 0.388
1 0.389
2 0.391
1 0.960



ghost commented Jul 30, 2015

I was thinking more about iptables-restore, and I don't really see how to use it safely. There's no compare-and-swap / optimistic-concurrency mode, and there's no way I can see to load just a single chain - only whole tables. This makes it very awkward to use for our "check if it exists and if not add it" mode, especially if anyone else is using the NAT table. Consider:

1. kube-proxy reads the state of -t nat
2. kube-proxy iterates over the state, checking for each rule it needs
3. a user modifies -t nat
4. kube-proxy needs to make a change, so it rewrites the -t nat section for just its own chain
5. kube-proxy restores the state
6. the user's change is wiped out
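
To make the lost update concrete, here is a rough Go sketch of that read-modify-write cycle (an illustration with a made-up chain name, not actual kube-proxy code); the comment marks the window where a concurrent change is silently dropped:

package main

import (
	"bytes"
	"os/exec"
	"strings"
)

// syncNATWithRestore rewrites the whole nat table to refresh one chain.
// Illustration only: "KUBE-PORTALS-CONTAINER" stands in for the proxy's
// chain and is assumed to already exist in the saved table.
func syncNATWithRestore(kubeRules []string) error {
	// 1. Read the whole nat table.
	saved, err := exec.Command("iptables-save", "-t", "nat").Output()
	if err != nil {
		return err
	}

	// 2. Rebuild the table text: keep everything that isn't ours and
	//    re-emit our own rules just before COMMIT.
	var table bytes.Buffer
	for _, line := range strings.Split(string(saved), "\n") {
		if line == "" {
			continue // skip blank lines
		}
		if strings.HasPrefix(line, "-A KUBE-PORTALS-CONTAINER") {
			continue // drop our old rules; replaced below
		}
		if line == "COMMIT" {
			for _, r := range kubeRules {
				table.WriteString(r + "\n")
			}
		}
		table.WriteString(line + "\n")
	}

	// *** Race window: any rule another process adds to -t nat between the
	// iptables-save above and the iptables-restore below is not in `table`
	// and gets wiped out. There is no compare-and-swap to detect it. ***

	// 3. Write the whole table back (iptables-restore replaces the table).
	restore := exec.Command("iptables-restore")
	restore.Stdin = &table
	return restore.Run()
}

func main() {
	_ = syncNATWithRestore([]string{
		"-A KUBE-PORTALS-CONTAINER -d 10.0.0.10/32 -p udp -m udp --dport 53 -j REDIRECT --to-ports 40053",
	})
}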



bnprss commented Jul 30, 2015

@thockin I made some tests, and I'm now sure: no traffic is handled at all while the kube-proxy iptables loop runs, verified by logging at level 5.

I see three potential solutions:

1. Keep the proxysocket.go loop running whatever it costs.
   Advantages: little change.
   Drawbacks: it will be a major problem with a lot of services, since the check will take much more time than the sync loop allows; it might be a huge CPU consumer.

2. Change the way kube-proxy checks and inserts rules (see the sketch at the end of this comment):
   step one: iptables-save
   step two: check rule existence
   step three: iptables insertion of the missing rule (like it does right now)
   Advantages: limits the rate of external program calls, does the thing safely, and keeps custom rules by not using iptables-restore. I guess there will be some performance improvement from just parsing/checking one file.
   Drawbacks: none?

3. Use an external tool. I have haproxy in mind.
   Advantages: all the logic in the roadmap (and more) is already implemented and well tested. Performance is really good.
   Drawbacks: dynamic inclusion/deletion of things must trigger a reload of a map file...

Network traffic shouldn't be stopped at any point by the kube-proxy process.
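
Here is a rough Go sketch of what I mean by option 2 (an illustration only: the chain name and rules below are made up, and a real implementation would have to generate rules in exactly the normalized form that iptables-save prints):

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Desired rules, written the way iptables-save prints them ("-A CHAIN <spec>").
// Made-up examples; the real proxier would generate these from its services.
var wantedNATRules = []string{
	"-A KUBE-PORTALS-CONTAINER -d 10.0.0.10/32 -p udp -m udp --dport 53 -j REDIRECT --to-ports 40053",
	"-A KUBE-PORTALS-CONTAINER -d 10.0.126.115/32 -p tcp -m tcp --dport 9200 -j REDIRECT --to-ports 49200",
}

func main() {
	// Step 1: one iptables-save call per sync instead of one iptables
	// existence check per rule.
	out, err := exec.Command("iptables-save", "-t", "nat").Output()
	if err != nil {
		fmt.Println("iptables-save failed:", err)
		return
	}
	existing := string(out)

	// Steps 2 and 3: only shell out again for rules that are actually missing.
	// Naive substring match for illustration; iptables-save normalizes rules,
	// so the wanted rules must be generated in that same normalized form.
	for _, rule := range wantedNATRules {
		if strings.Contains(existing, rule) {
			continue // rule already present, nothing to do
		}
		args := append([]string{"-t", "nat", "-w"}, strings.Fields(rule)...)
		if err := exec.Command("iptables", args...).Run(); err != nil {
			fmt.Println("failed to add rule:", rule, err)
		}
	}
}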


bnprss commented Jul 30, 2015

I'll post the following PR, which covers the 1st case:
#12017

Command used to measure: for i in $(seq 1 100000); do curl -Ik 10.0.249.163 -s -w '%{time_total}' -o /dev/null; echo; done | sort | uniq -c

With the PR and ~420 netfilter rules to check, I get the following histogram (100,000 samples is a long run :) ):
6370 0.008
33409 0.009
18312 0.010
6803 0.011
2961 0.012
1593 0.013
1418 0.014
2506 0.015
3225 0.016
3621 0.017
3558 0.018
2965 0.019
2627 0.020
2544 0.021
2167 0.022
1499 0.023
1051 0.024
779 0.025
542 0.026
395 0.027
253 0.028
205 0.029
144 0.030
113 0.031
77 0.032
76 0.033
64 0.034
59 0.035
45 0.036
44 0.037
43 0.038
34 0.039
20 0.040
28 0.041
16 0.042
18 0.043
16 0.044
16 0.045
13 0.046
10 0.047
12 0.048
8 0.049
14 0.050
9 0.051
9 0.052
4 0.053
5 0.054
7 0.055
8 0.056
5 0.057
10 0.058
7 0.059
5 0.060
9 0.061
5 0.062
4 0.063
4 0.064
3 0.065
3 0.066
6 0.067
5 0.068
5 0.069
3 0.070
8 0.071
6 0.072
3 0.073
4 0.074
7 0.075
1 0.076
4 0.077
5 0.078
5 0.079
1 0.080
4 0.081
4 0.082
4 0.083
4 0.084
5 0.085
3 0.086
3 0.088
3 0.089
4 0.090
3 0.092
2 0.093
4 0.094
3 0.095
6 0.096
4 0.097
1 0.098
4 0.100
1 0.101
1 0.102
2 0.103
1 0.104
2 0.105
1 0.106
3 0.107
2 0.108
2 0.109
1 0.110
1 0.111
1 0.113
4 0.114
3 0.115
1 0.118
2 0.119
1 0.122
1 0.124
1 0.125
1 0.126
1 0.127
2 0.128
3 0.131
2 0.132
1 0.134
1 0.135
3 0.136
1 0.138
1 0.142
2 0.144
1 0.145
1 0.146
2 0.150
1 0.151
1 0.156
1 0.162
1 0.170
1 0.172
2 0.173
1 0.177
1 0.179
1 0.180
1 0.181
1 0.184
1 0.186
1 0.198
1 0.201
1 0.202
1 0.206
1 0.211
1 0.215
3 0.219
1 0.234
2 0.237
1 0.238
1 0.240
1 0.243
1 0.249
2 0.256
1 0.260
1 0.263
1 0.266
1 0.268
1 0.271
1 0.275
1 0.290
1 0.301
1 0.303
1 0.313
1 0.322
1 0.327
1 0.336
1 0.369
1 0.374
1 0.381
1 0.463
1 0.492
1 0.493
1 0.504
1 0.567
1 0.681

Without the PR, with 5000 samples:
959 0.008
2333 0.009
816 0.010
267 0.011
143 0.012
102 0.013
75 0.014
84 0.015
37 0.016
31 0.017
33 0.018
18 0.019
10 0.020
15 0.021
8 0.022
11 0.023
6 0.024
6 0.025
3 0.026
3 0.027
2 0.028
1 0.030
2 0.031
1 0.032
1 0.033
2 0.034
2 0.035
2 0.036
1 0.037
2 0.038
1 0.042
1 0.043
1 0.045
1 0.046
1 0.057
1 0.075
1 1.386
1 1.392
1 1.394
1 1.420
1 1.439
1 1.454
1 1.466
1 1.467
1 1.484
1 1.503
1 1.513
1 1.514
1 1.519
1 1.520
1 1.532
1 1.539
1 1.548
1 1.691

The low number of curl samples > 1 s is because the checks run sequentially. By the way, the time to complete the whole run is a good indicator as well.


thockin commented Jul 31, 2015

So the real issue is that ProxyLoop() calls proxier.getServiceInfo(), which takes the mutex. This is an excellent point, thanks for raising it. I think the algorithm we're discussing in the other thread will mostly mitigate the problem, but it's still not a great design overall.

A better design would be to store an atomic value in the ServiceInfo struct and simply check that value in each ProxyLoop().

I'm going to re-title this bug to reflect the problem.
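
A minimal Go sketch of that direction (simplified stand-ins, not the actual kube-proxy code): the sync loop still takes the mutex while it reconciles iptables, but each ProxyLoop() only does a lock-free atomic read per iteration, so traffic no longer stalls behind the sync:

package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// Simplified stand-in for kube-proxy's per-service bookkeeping.
type serviceInfo struct {
	// isAlive is written by the sync machinery and read by the proxy loop
	// on every iteration; atomic access means the hot path never blocks
	// on the proxier mutex.
	isAlive int32
	portal  string
}

func (si *serviceInfo) setAlive(alive bool) {
	var v int32
	if alive {
		v = 1
	}
	atomic.StoreInt32(&si.isAlive, v)
}

func (si *serviceInfo) alive() bool { return atomic.LoadInt32(&si.isAlive) == 1 }

// Simplified stand-in for the proxier: the mutex still guards the service
// map and the iptables sync, but not the per-connection path.
type proxier struct {
	mu       sync.Mutex
	services map[string]*serviceInfo
}

// syncLoop holds the mutex while reconciling iptables; with the atomic
// flag, proxyLoop no longer has to wait for it.
func (p *proxier) syncLoop() {
	for {
		p.mu.Lock()
		time.Sleep(2 * time.Second) // placeholder for a slow iptables sync
		p.mu.Unlock()
		time.Sleep(5 * time.Second)
	}
}

// proxyLoop serves one service: a lock-free atomic read per iteration
// replaces the mutex-guarded getServiceInfo() call.
func (p *proxier) proxyLoop(si *serviceInfo) {
	for si.alive() {
		// accept a connection and forward it to an endpoint...
		time.Sleep(10 * time.Millisecond) // placeholder for real work
	}
}

func main() {
	p := &proxier{services: make(map[string]*serviceInfo)}
	si := &serviceInfo{portal: "10.0.0.10:53"}
	si.setAlive(true)
	p.services["kube-dns"] = si

	go p.syncLoop()
	go p.proxyLoop(si)

	time.Sleep(200 * time.Millisecond)
	si.setAlive(false) // service removed: the loop exits without touching the mutex
	time.Sleep(50 * time.Millisecond)
}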


thockin changed the title from "Major slow down each 5s on kube-proxy refresh on 1.0.1 cluster is unreachable up to 40% of time (2s/5s)" to "kube-proxy ProxyLoop() contends on mutex with iptables sync loop" on Jul 31, 2015

bnprss commented Jul 31, 2015

It really is a simple, ugly workaround focused on the big loop that takes ages to complete, I agree with that, and I accept its role as a mitigation.

My knowledge of Go won't help with your proposal; as you might have gathered, this was the first time I wrote code in that language, and as a sysadmin my development skills are limited.
So I can't handle such logic changes efficiently.

Could you take care of it?


thockin commented Jul 31, 2015

I'll see if I can throw something together. Do you want me to take over the lock mitigation too, or do you want to do that? It should be a relatively simple change given that you already found the place to do that.


bnprss commented Jul 31, 2015

No, the lock mitigation will be fine on my side; I figured out why the red lights came from Shippable and Travis, and I will comment on the PR.

Thank you for taking care of the mid-term solution.

@mbforbes added the priority/backlog, sig/scalability, and team/master labels on Aug 16, 2015
@davidopp added the sig/node label and removed the team/master label on Aug 22, 2015
thockin added a commit to thockin/kubernetes that referenced this issue Sep 1, 2015
This should make throughput better on the userspace proxier.

Fixes kubernetes#11977
thockin added a commit to thockin/kubernetes that referenced this issue Sep 2, 2015
This should make throughput better on the userspace proxier.

Fixes kubernetes#11977