Do not attempt to overwrite higher system (sysctl) values #103174

Merged
merged 1 commit into from Sep 16, 2021

@Napsty Napsty commented Jun 25, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:

With this commit, kube-proxy accepts current system values (retrieved via sysctl) that are higher than the values it knows and expects internally.

When Kubernetes runs on a node that is itself a container (e.g. LXC) and the value is changed on the (LXC) host, kube-proxy fails at its next start: it does not accept the current value and attempts to overwrite it with the previously known one. This results in:

I0624 07:38:23.053960      54 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
F0624 07:38:23.053999      54 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

However, a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that should not bother kube-proxy; it should simply carry on with the higher value.
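
As a rough illustration of the comparison described above, here is a minimal Go sketch. It is not the code changed in this PR: the `sysctlStore` interface, the `fakeSysctl` type, and the `ensureAtLeast` function are hypothetical names made up for this example, and the fake store's `readOnly` flag only imitates the permission error from the log.

```go
package main

import (
	"fmt"
)

// sysctlStore is a hypothetical abstraction over reading and writing
// integer sysctl values. It is NOT the interface kube-proxy uses.
type sysctlStore interface {
	Get(name string) (int, error)
	Set(name string, value int) error
}

// fakeSysctl is an in-memory store. When readOnly is true, Set fails the
// way a write to /proc/sys fails inside a non-init network namespace.
type fakeSysctl struct {
	values   map[string]int
	readOnly bool
}

func (f *fakeSysctl) Get(name string) (int, error) {
	v, ok := f.values[name]
	if !ok {
		return 0, fmt.Errorf("sysctl %q not found", name)
	}
	return v, nil
}

func (f *fakeSysctl) Set(name string, value int) error {
	if f.readOnly {
		return fmt.Errorf("open /proc/sys/%s: permission denied", name)
	}
	f.values[name] = value
	return nil
}

// ensureAtLeast writes wanted only when the current value is lower.
// It reports whether a write happened.
func ensureAtLeast(s sysctlStore, name string, wanted int) (bool, error) {
	current, err := s.Get(name)
	if err != nil {
		return false, err
	}
	if current >= wanted {
		// Already high enough: leave the value alone, no write attempted.
		return false, nil
	}
	if err := s.Set(name, wanted); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	const entry = "net/netfilter/nf_conntrack_max"
	// The host already raised the value and the file is not writable.
	host := &fakeSysctl{values: map[string]int{entry: 1048576}, readOnly: true}
	changed, err := ensureAtLeast(host, entry, 524288)
	fmt.Println(changed, err)
}
```

With the value already above the wanted one, no write is attempted, so the read-only store never returns its permission error.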

Signed-off-by: Claudio Kuenzler ck@claudiokuenzler.com

Which issue(s) this PR fixes:

Fixes rancher/rancher#33360

Special notes for your reviewer:

The code change was mistakenly opened as a PR in the k3s project (see k3s-io/k3s#3505).
A real-life use case is described in Rancher issue rancher/rancher#33360.

Does this PR introduce a user-facing change?

Changes the startup behaviour of kube-proxy: it no longer attempts to set specific sysctl values (which recent kernel versions no longer permit in non-init namespaces) when the current sysctl values are already set higher.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot commented Jun 25, 2021

@Napsty: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot commented Jun 25, 2021

Welcome @Napsty!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot commented Jun 25, 2021

Hi @Napsty. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@Napsty Napsty commented Jul 23, 2021

ping @andrewsykim and @dcbw . Not really sure who else to ping. Let me know if someone else needs to do something first so this gets rolling. thx

@brandond brandond commented Jul 30, 2021

Might this be sig-node since it's kubelet sysctl stuff?

Worked around in downstream projects:

@Napsty Napsty commented Aug 25, 2021

ping @andrewsykim and @dcbw , who should be assigned to this?

@khenidak khenidak commented Sep 3, 2021

/assign @khenidak

is it about running proxy with lower permission? or preventing the proxy from setting a value that might have impact outside lxc container?

@brandond brandond commented Sep 3, 2021

It's about the kernel no longer allowing these sysctls to be set within non-root namespaces.

@Napsty Napsty commented Sep 3, 2021

@khenidak as @brandond said, or, to be more precise, within non "net init" namespaces, which is the case for all started containers (LXC, Docker, ...).
The PR does not solve this 100%, but it allows a workaround: the server admin can set certain sysctl values high enough that kube-proxy accepts them (which is actually already the case on e.g. Ubuntu 20.04). In the current situation kube-proxy tries to set sysctl values to a certain pre-defined value, even if that value is smaller than the current sysctl value.
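
For admins who want to check up front whether the workaround applies, here is a small Go sketch. It is an illustration only and not part of kube-proxy. It reads a value below `/proc/sys` and reports whether it is still below the wanted value. `checkSysctl`, `parseSysctlInt`, and `needsWrite` are hypothetical helper names, and 524288 is simply the value from the log in the PR description, assumed here for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseSysctlInt converts the raw content of a /proc/sys file to an int.
func parseSysctlInt(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// needsWrite reports whether a component that only raises values would
// still have to write the sysctl, i.e. whether current is below wanted.
func needsWrite(current, wanted int) bool {
	return current < wanted
}

// checkSysctl reads name (e.g. "net/netfilter/nf_conntrack_max") below
// procRoot and reports whether it would still need to be raised.
func checkSysctl(procRoot, name string, wanted int) (current int, write bool, err error) {
	raw, err := os.ReadFile(filepath.Join(procRoot, name))
	if err != nil {
		return 0, false, err
	}
	current, err = parseSysctlInt(string(raw))
	if err != nil {
		return 0, false, err
	}
	return current, needsWrite(current, wanted), nil
}

func main() {
	// 524288 is the value from the log in this thread, used here as an
	// assumption; it is not a universal default.
	current, write, err := checkSysctl("/proc/sys", "net/netfilter/nf_conntrack_max", 524288)
	if err != nil {
		fmt.Println("cannot read sysctl:", err)
		return
	}
	fmt.Printf("current=%d needs write=%t\n", current, write)
}
```

If this reports that no write is needed inside the container, raising the value on the host was sufficient for the behaviour described in this PR.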

@khenidak khenidak commented Sep 15, 2021

@Napsty ACK. can you add release note?

@khenidak khenidak commented Sep 15, 2021

/retest

@Napsty Napsty commented Sep 15, 2021

@khenidak

I'm sorry, but what exactly is meant by release note? I read https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md but I still don't understand whether this involves an additional file or just a comment in the commit. Do you have an example at hand, or can you point to another PR for comparison? Thank you!

@brandond brandond commented Sep 15, 2021

@Napsty there's a bit in the PR template where it says Does this PR introduce a user-facing change and you've responded with NONE. You should replace this with the actual user-facing change, as it will be worded in the release notes. You can look at pretty much any other PR for an example.

@Napsty Napsty commented Sep 15, 2021

@brandond correct. I understood "user facing" as something requiring user input - which is not the case here. In fact, the PR is without any user interaction.

We could still mention the different behavior when Kernel sys values are already set higher than the kube-proxy expected value. But I fail to understand where this needs to be done. I read the contributors release notes twice now and I'm still none the wiser ;-)

A bit of help/guidance for a first timer please :-)

@brandond brandond commented Sep 15, 2021

It doesn't need to be something that the user has to take action on, just something that they should know about when upgrading. Think of it from a user or administrator's perspective - what would you like to know about this change? Would you like to know that you will no longer have to manually set sysctls before starting kube-proxy?

Here's an example of a PR with an informational changelog entry:
#104997

@khenidak khenidak commented Sep 16, 2021

/retest
/lgtm
/approve

The user facing section has been filled. I think we are good to go. Thanks @Napsty for this.

@k8s-ci-robot k8s-ci-robot commented Sep 16, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khenidak, Napsty

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 16823fc into kubernetes:master Sep 16, 2021
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Sep 16, 2021
@Napsty Napsty deleted the rancher-33360 branch Sep 17, 2021