Move iptables logging in kubeproxy from Errorf to V(2).Infof #48085

Merged Jun 26, 2017 (1 commit)

@shyamjvs shyamjvs commented Jun 26, 2017

Fixes #48052

This will stop fluentd from OOM'ing in reasonably large clusters with many services, where kube-proxy dumps the full iptables rule set into its error log. Setups running at verbosity >= 2 will still get the iptables rules printed, but we can at least opt out.
@bowei Does this look reasonable?

cc @kubernetes/sig-network-misc

@k8s-ci-robot k8s-ci-robot added the sig/network and cncf-cla: yes labels Jun 26, 2017
@k8s-github-robot k8s-github-robot added the size/XS and release-note-label-needed labels Jun 26, 2017
@bowei
Member

bowei commented Jun 26, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jun 26, 2017
@k8s-github-robot k8s-github-robot added the do-not-merge label Jun 26, 2017
@shyamjvs shyamjvs added the release-note-none label and removed the do-not-merge and release-note-label-needed labels Jun 26, 2017
@shyamjvs
Member Author

@gmarek We can try turning on fluentd again with this change.

@shyamjvs
Member Author

@thockin / @freehan Could one of you approve the PR?

@gmarek
Contributor

gmarek commented Jun 26, 2017

If it's not that important, I'd do it at v3.

@shyamjvs
Member Author

We can, but currently quite a few tests (and normal kube-up setups) run at v2, and having the iptables rules printed might be useful there for debugging. If need be, we can raise the level in the future IMO.

@@ -1571,7 +1571,8 @@ func (proxier *Proxier) syncProxyRules() {
     glog.V(5).Infof("Restoring iptables rules: %s", proxier.iptablesData.Bytes())
     err = proxier.iptables.RestoreAll(proxier.iptablesData.Bytes(), utiliptables.NoFlushTables, utiliptables.RestoreCounters)
     if err != nil {
-        glog.Errorf("Failed to execute iptables-restore: %v\nRules:\n%s", err, proxier.iptablesData.Bytes())
+        glog.Errorf("Failed to execute iptables-restore: %v", err)
+        glog.V(2).Infof("Rules:\n%s", proxier.iptablesData.Bytes())
Contributor

This is an error... it shouldn't be happening at all. Why is it happening so much that it causes problems?

Contributor

Please don't merge this PR without also filing an issue about the actual bug (i.e., that apparently there are locking timeout issues causing some kube-proxy iptables updates to get dropped).

Member Author

Filed #48107. Thanks.

@gmarek
Contributor

gmarek commented Jun 26, 2017

SGTM

@dchen1107
Member

/lgtm

I am OK with cherry-picking this one to 1.7 to help the scalability tests, but we need a real solution for production. :-)

@dchen1107 dchen1107 added the cherry-pick-approved and cherrypick-candidate labels Jun 26, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bowei, dchen1107, shyamjvs

Associated issue: 48052

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved label Jun 26, 2017
@dchen1107 dchen1107 added this to the v1.7 milestone Jun 26, 2017
@shyamjvs
Member Author

Sounds good. Thanks @dchen1107 !

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)

@k8s-github-robot k8s-github-robot merged commit a3df4bf into kubernetes:master Jun 26, 2017
@gmarek
Contributor

gmarek commented Jun 27, 2017

@dchen1107 - I completely agree.

The thing is that it's a complex problem with multiple actors involved, so it'd need to be a very wide effort, which was never prioritized highly enough. There are at least two problems:

  • fluentd OOM-crashloops when it needs to process more logs than it can handle (which can very easily be triggered by a user) @crassirostris @piosz
  • in a big enough cluster, Node agents react to the OOM-crashloop by spamming the API server to the point that it responds with 429 to the vast majority of requests, including Node heartbeats, which destabilizes the cluster @kubernetes/sig-node-proposals (I'm missing sig-node-misc group...) @kubernetes/sig-api-machinery-misc

@k8s-ci-robot k8s-ci-robot added the sig/node, kind/design, and sig/api-machinery labels Jun 27, 2017
@shyamjvs shyamjvs deleted the reduce-kubeproxy-logs branch June 27, 2017 09:03
k8s-github-robot pushed a commit that referenced this pull request Jun 27, 2017
…#47986-#47152-#47860-#47945-#47961-#47986-#47993-#48012-#48085-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #47986 #47152 #47860 #47945 #47961 #47986 #47993 #48012 #48085

Cherry pick of #47986 #47152 #47860 #47945 #47961 #47986 #47993 #48012 #48085 on release-1.7.

#47986: Change service port to avoid collision
#47152: Kubelet doesn't override addrs from Cloud provider
#47860: Make fluentd log to stdio instead of a dedicated file
#47945: add level for print flags
#47961: Bumped Heapster to v1.4.0-beta.0
#47986: Change service port to avoid collision
#47993: Use a different env var to enable the ip-masq-agent addon. We
#48012: Extending timeout waiting for delete node to become ready
#48085: Move iptables logging in kubeproxy from Errorf to V(2).Infof
@k8s-cherrypick-bot

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

k8s-github-robot pushed a commit that referenced this pull request Aug 2, 2017
Automatic merge from submit-queue

Log abridged set of rules at v2 in kube-proxy on error

**What this PR does / why we need it**:
this is a follow-on to #48085

**Special notes for your reviewer**:
We hit this in operations, where we typically run at v2, and would like to log an abridged version of the output rather than the full output.

**Release note**:
```release-note
NONE
```
Labels
approved, cherry-pick-approved, cncf-cla: yes, kind/design, lgtm, release-note-none, sig/api-machinery, sig/network, sig/node, size/XS

10 participants