Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory allocations in kube proxy #46033

Conversation

wojtek-t
Copy link
Member

Memory allocation (and Go GarbageCollection) seems to be one of the most expensive operations in kube-proxy (I've seen profiles where it was more than 50%).

The commits are mostly independent from each other and all of them are mostly about reusing already allocated memory.

This PR is reducing memory allocation by ~5x (results below from 100-node load test):

before:

(pprof) top
38.64GB of 39.11GB total (98.79%)
Dropped 249 nodes (cum <= 0.20GB)
Showing top 10 nodes out of 61 (cum >= 0.20GB)
      flat  flat%   sum%        cum   cum%
   15.10GB 38.62% 38.62%    15.10GB 38.62%  bytes.makeSlice
    9.48GB 24.25% 62.87%     9.48GB 24.25%  runtime.rawstringtmp
    8.30GB 21.21% 84.07%    32.47GB 83.02%  k8s.io/kubernetes/pkg/proxy/iptables.(*Proxier).syncProxyRules
    2.08GB  5.31% 89.38%     2.08GB  5.31%  fmt.(*fmt).padString
    1.90GB  4.86% 94.24%     3.82GB  9.77%  strings.Join
    0.67GB  1.72% 95.96%     0.67GB  1.72%  runtime.hashGrow
    0.36GB  0.92% 96.88%     0.36GB  0.92%  runtime.stringtoslicebyte
    0.31GB  0.79% 97.67%     0.62GB  1.58%  encoding/base32.(*Encoding).EncodeToString
    0.24GB  0.62% 98.29%     0.24GB  0.62%  strings.genSplit
    0.20GB   0.5% 98.79%     0.20GB   0.5%  runtime.convT2E

after:

7.94GB of 8.13GB total (97.75%)
Dropped 311 nodes (cum <= 0.04GB)
Showing top 10 nodes out of 65 (cum >= 0.11GB)
      flat  flat%   sum%        cum   cum%
    3.32GB 40.87% 40.87%     8.05GB 99.05%  k8s.io/kubernetes/pkg/proxy/iptables.(*Proxier).syncProxyRules
    2.85GB 35.09% 75.95%     2.85GB 35.09%  runtime.rawstringtmp
    0.60GB  7.41% 83.37%     0.60GB  7.41%  runtime.hashGrow
    0.31GB  3.76% 87.13%     0.31GB  3.76%  runtime.stringtoslicebyte
    0.28GB  3.43% 90.56%     0.55GB  6.80%  encoding/base32.(*Encoding).EncodeToString
    0.19GB  2.29% 92.85%     0.19GB  2.29%  strings.genSplit
    0.18GB  2.17% 95.03%     0.18GB  2.17%  runtime.convT2E
    0.10GB  1.28% 96.31%     0.71GB  8.71%  runtime.mapassign
    0.10GB  1.21% 97.51%     0.10GB  1.21%  syscall.ByteSliceFromString
    0.02GB  0.23% 97.75%     0.11GB  1.38%  syscall.SlicePtrFromStrings

@wojtek-t wojtek-t added area/kube-proxy release-note-none Denotes a PR that doesn't merit a release note. labels May 18, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 18, 2017
@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 18, 2017
@wojtek-t wojtek-t force-pushed the reduce_memory_allocations_in_kube_proxy branch from 3dd0222 to 524c13a Compare May 18, 2017 15:59
@wojtek-t
Copy link
Member Author

We can probably optimize much more, but I would like to wait with that for all other changes to be merged first to see where we will be (in particular periodic runner)

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Sort of sad how easy it is to write slice-abusive code.

Just a few nits. Will approve, you can fixup and self-lgtm

@@ -417,6 +425,11 @@ func NewProxier(ipt utiliptables.Interface,
recorder: recorder,
healthChecker: healthChecker,
healthzServer: healthzServer,
iptablesLines: bytes.NewBuffer(nil),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This isn't really parsed into lines yet - call it iptablesData or iptablesRaw ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

filterRules := bytes.NewBuffer(nil)
natChains := bytes.NewBuffer(nil)
natRules := bytes.NewBuffer(nil)
// Reset all buffers used later.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment that this is to avoid re-allocations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -228,36 +228,27 @@ func saveChain(chain *fakeChain, data *bytes.Buffer) {
}

func (f *fakeIPTables) Save(tableName utiliptables.Table) ([]byte, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need plain Save() any more?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are still using it in few places in kubelet. Though I'm happy to send a PR migrating it to SaveInto (should be pretty simple). But I would prefer to do it in a separate one. Will send one on Monday.

args := []string{"-t", string(table)}
glog.V(4).Infof("running iptables-save %v", args)
cmd := runner.exec.Command(cmdIPTablesSave, args...)
cmd.SetStdout(buffer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment on this vs CombinedOutput() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@thockin
Copy link
Member

thockin commented May 19, 2017

/approve

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thockin, wojtek-t

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@wojtek-t wojtek-t force-pushed the reduce_memory_allocations_in_kube_proxy branch from 524c13a to 919f931 Compare May 19, 2017 18:49
Copy link
Member Author

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tim - comments applied.

@@ -228,36 +228,27 @@ func saveChain(chain *fakeChain, data *bytes.Buffer) {
}

func (f *fakeIPTables) Save(tableName utiliptables.Table) ([]byte, error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are still using it in few places in kubelet. Though I'm happy to send a PR migrating it to SaveInto (should be pretty simple). But I would prefer to do it in a separate one. Will send one on Monday.

@@ -417,6 +425,11 @@ func NewProxier(ipt utiliptables.Interface,
recorder: recorder,
healthChecker: healthChecker,
healthzServer: healthzServer,
iptablesLines: bytes.NewBuffer(nil),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

filterRules := bytes.NewBuffer(nil)
natChains := bytes.NewBuffer(nil)
natRules := bytes.NewBuffer(nil)
// Reset all buffers used later.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

args := []string{"-t", string(table)}
glog.V(4).Infof("running iptables-save %v", args)
cmd := runner.exec.Command(cmdIPTablesSave, args...)
cmd.SetStdout(buffer)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@wojtek-t
Copy link
Member Author

Applying label based on Tim`s comment above.

@wojtek-t wojtek-t added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2017
@wojtek-t wojtek-t force-pushed the reduce_memory_allocations_in_kube_proxy branch from 919f931 to f53440e Compare May 19, 2017 19:05
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2017
@wojtek-t
Copy link
Member Author

Just fixed compile errors that appeared (didn't rename all occurences of variable).

@wojtek-t wojtek-t added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2017
@wojtek-t wojtek-t force-pushed the reduce_memory_allocations_in_kube_proxy branch from f53440e to a3da8d7 Compare May 19, 2017 19:34
@wojtek-t
Copy link
Member Author

Sorry - fixed gofmt.

@wojtek-t wojtek-t added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels May 19, 2017
Copy link
Contributor

@freehan freehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just nits. Looks good overall.


glog.V(3).Infof("Restoring iptables rules: %s", lines)
err = proxier.iptables.RestoreAll(lines, utiliptables.NoFlushTables, utiliptables.RestoreCounters)
if glog.V(4) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove if condition here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - initially it was different in my code.
So it sometimes makes sense to have such iffs, if the arguments of glog are expensive to compute. So imagine that you have the following line:

glog.V(4).Infof("xxxL: %v", callFunction(args))

Then callFunction(args) will be called no matter what the verbosity is and whether the log will be printed or not. In case of expensive computations, it's simply waste of time.

This is no longer the case here, so will send a PR to remove it (or will integrate it to one of other PRs I already have).

@@ -1497,8 +1497,8 @@ func (proxier *Proxier) syncProxyRules(reason syncReason) {
proxier.iptablesLines.Write(proxier.natChains.Bytes())
proxier.iptablesLines.Write(proxier.natRules.Bytes())

if glog.V(4) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? I feel like it has to be 4 at least. Otherwise, there is no enough actionable information to debug e2e tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is extremely expensive (in my tests in just 100-node cluster, this line is responsible for ~10% of whole cpu usage and ~20% of all memory allocations and GC).

I think that we can just bump the verbosity in tests in case of problems.

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented May 20, 2017

@wojtek-t: The following test(s) failed:

Test name Commit Details Rerun command
pull-kubernetes-federation-e2e-gce a3da8d7 link @k8s-bot pull-kubernetes-federation-e2e-gce test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 3456d4d into kubernetes:master May 20, 2017

glog.V(3).Infof("Restoring iptables rules: %s", lines)
err = proxier.iptables.RestoreAll(lines, utiliptables.NoFlushTables, utiliptables.RestoreCounters)
if glog.V(5) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird constructs like this warrant a comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it's not longer needed - I will send a PR removing it on Monday.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#46201 is already out for review

@wojtek-t wojtek-t deleted the reduce_memory_allocations_in_kube_proxy branch May 23, 2017 09:59
k8s-github-robot pushed a commit that referenced this pull request May 30, 2017
Automatic merge from submit-queue (batch tested with PRs 45534, 37212, 46613, 46350)

Speed up and reduce number of memory allocations in kube-proxy

This is a second (and last PR) in this series - this solves all very-low-hanging fruits.

This PR:
- reduces cpu usage by ~25%
- reduces memory allocations by ~3x (together with #46033 by 10-12x)

Without this PR:
```
(pprof) top
8.59GB of 8.79GB total (97.75%)
Dropped 238 nodes (cum <= 0.04GB)
Showing top 10 nodes out of 64 (cum >= 0.11GB)
      flat  flat%   sum%        cum   cum%
    3.66GB 41.60% 41.60%     8.72GB 99.17%  k8s.io/kubernetes/pkg/proxy/iptables.(*Proxier).syncProxyRules
    3.07GB 34.96% 76.56%     3.07GB 34.96%  runtime.rawstringtmp
    0.62GB  7.09% 83.65%     0.62GB  7.09%  runtime.hashGrow
    0.34GB  3.82% 87.46%     0.34GB  3.82%  runtime.stringtoslicebyte
    0.29GB  3.24% 90.71%     0.58GB  6.61%  encoding/base32.(*Encoding).EncodeToString
    0.22GB  2.47% 93.18%     0.22GB  2.47%  strings.genSplit
    0.18GB  2.04% 95.22%     0.18GB  2.04%  runtime.convT2E
    0.11GB  1.22% 96.44%     0.73GB  8.36%  runtime.mapassign
    0.10GB  1.08% 97.52%     0.10GB  1.08%  syscall.ByteSliceFromString
    0.02GB  0.23% 97.75%     0.11GB  1.25%  syscall.SlicePtrFromStrings
```

with this PR:
```
(pprof) top
2.98GB of 3.08GB total (96.78%)
Dropped 246 nodes (cum <= 0.02GB)
Showing top 10 nodes out of 70 (cum >= 0.10GB)
      flat  flat%   sum%        cum   cum%
    1.99GB 64.60% 64.60%     1.99GB 64.60%  runtime.rawstringtmp
    0.58GB 18.95% 83.55%     0.58GB 18.95%  runtime.hashGrow
    0.10GB  3.40% 86.95%     0.69GB 22.47%  runtime.mapassign
    0.09GB  2.86% 89.80%     0.09GB  2.86%  syscall.ByteSliceFromString
    0.08GB  2.63% 92.44%     0.08GB  2.63%  runtime.convT2E
    0.03GB  1.13% 93.56%     0.03GB  1.13%  syscall.Environ
    0.03GB  0.99% 94.56%     0.03GB  0.99%  bytes.makeSlice
    0.03GB  0.97% 95.52%     0.03GB  1.06%  os.Stat
    0.02GB  0.65% 96.18%     3.01GB 97.79%  k8s.io/kubernetes/pkg/proxy/iptables.(*Proxier).syncProxyRules
    0.02GB   0.6% 96.78%     0.10GB  3.35%  syscall.SlicePtrFromStrings
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kube-proxy cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants