Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update client/kernel version requirements for nftables kube-proxy #124152

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

danwinship
Copy link
Contributor

@danwinship danwinship commented Apr 2, 2024

What type of PR is this?

/kind cleanup
/sig network
/area kube-proxy
/priority important-soon

What this PR does / why we need it:

Addresses the problem that (a) sometimes older versions of nft will crash if they see rules created by newer versions of nft, and that (b) with nft < 1.0.1, when you do nft -f ... it will try to parse the entire ruleset at startup, rather than only trying to parse the tables that you actually reference in the commands passed to nft. Thus:

  1. Require that kube-proxy be using nft 1.0.1 or later, to ensure that other people's use of nftables won't break kube-proxy.
  2. Require that the kernel is 5.13 or newer, to hopefully ensure that the system has nft 1.0.1 or later, to ensure that kube-proxy's use of nftables won't break other people. (We can't directly check the system nft version, but all known distros with kernel 5.13 or later also have nft 1.0.1 or later.)
  3. The nftables.skipKernelVersionCheck config option allows bypassing the above check for dev/testing purposes. (As of April 2024 there are still supported versions of LTS distros using slightly older kernels, though by the time we go GA this should no longer be true.)

More discussion in #122743.

Which issue(s) this PR fixes:

Fixes #122743

Special notes for your reviewer:

knftables rebase stolen from #123389...

Does this PR introduce a user-facing change?

The (alpha) nftables mode of kube-proxy now requires version 1.0.1 or later
of the nft command-line, and kernel 5.13 or later. (For testing/development
purposes, you can use older kernels, as far back as 5.4, if you set the
`nftables.skipKernelVersionCheck` option in the kube-proxy config, but this is not
recommended in production since it may cause problems with other nftables
users on the system.)

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/network Categorizes an issue or PR as relevant to SIG Network. labels Apr 2, 2024
@k8s-ci-robot
Copy link
Contributor

Please note that we're already in Test Freeze for the release-1.30 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.30.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Tue Apr 2 14:19:09 UTC 2024.

@k8s-ci-robot k8s-ci-robot added area/kube-proxy priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 2, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/code-generation area/dependency Issues or PRs related to dependency changes kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Apr 2, 2024
@k8s-ci-robot k8s-ci-robot requested review from andrewsykim, bowei and a team April 2, 2024 14:42
@danwinship danwinship mentioned this pull request Apr 2, 2024
27 tasks
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

// should only be used if you know nothing else on the system is using nftables,
// since nft 0.9.8 and older will crash if kube-proxy's nftables rules are
// present.
AllowOlderKernel bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be more specific?

AllowNFTablesLEQ_0_9_8

I have some concerns that AllowOlderKernel is pretty vague as a flag name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vagueness is intentional, since we can't actually check for the thing that we want to check for (host filesystem's nft is 0.9.8 or later), so we're doing something vague to approximate it.

This flag will hopefully go away in GA anyway because (a) by that point the "older kernels" will be old enough that we don't have to feel as guilty about not supporting them, and (b) the problems caused by too-old-nft are annoying enough that we don't really want anybody using this option in production.

Probably I should document that better...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, renamed this to skipKernelVersionCheck and tried to clarify the docs. I didn't say anything about removing this in GA because maybe we won't.

pkg/proxy/nftables/proxier.go Outdated Show resolved Hide resolved
@cici37
Copy link
Contributor

cici37 commented Apr 4, 2024

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Apr 4, 2024
@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Apr 5, 2024
@tnqn
Copy link
Member

tnqn commented Apr 7, 2024

/lgtm

The PR description is stale about the name of the new option.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 7, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: fac0697c8091a789ac01e4d7a7df4fd8103dcc56

@danwinship
Copy link
Contributor Author

/retest-required
/assign @thockin
for KubeProxyConfiguration API changes (and vendor update but that commit is identical to the already-/approved one in #123389)

@aojea
Copy link
Member

aojea commented Apr 7, 2024

/test pull-kubernetes-e2e-kind-nftables

@@ -261,7 +265,7 @@ func NewProxier(ipFamily v1.IPFamily,
}

// Create a knftables.Interface and check if we can use the nftables proxy mode on this host.
func getNFTablesInterface(ipFamily v1.IPFamily) (knftables.Interface, error) {
func getNFTablesInterface(ipFamily v1.IPFamily, skipKernelVersionCheck bool) (knftables.Interface, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in iptables we do this stuff before creating the proxy

// getIPTables returns an array of [IPv4, IPv6] utiliptables.Interfaces. If primaryFamily
// is not v1.IPFamilyUnknown then it will also separately return the interface for just
// that family.
func getIPTables(primaryFamily v1.IPFamily) ([2]utiliptables.Interface, utiliptables.Interface) {

should we move this validation to that place instead at the time of creating the proxier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do it that way for iptables to avoid having identical iptables-constructing functions in pkg/proxy/iptables and pkg/proxy/ipvs. But really, the logic belongs inside the individual backends; cmd/kube-proxy shoudn't be making assumptions about what they need.

Copy link
Contributor Author

@danwinship danwinship Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or alternatively, we could call into a pkg/proxy/nftables function from cmd/kube-proxy to construct and sanity-check before we actually create the Proxier; we sort of do that with ipvs. But that's mostly only advantageous because it lets us avoid separately doing sanity-checking for the IPv4 and IPv6 Proxiers, and I have plans to eventually fix that anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was a curious comment 👍

// versions of the nft CLI in the host filesystem. Skipping the kernel version
// check will allow running nftables kube-proxy on older distros, but beware that
// it may interfere with the host OS's use of nftables.
SkipKernelVersionCheck bool
Copy link
Member

@aojea aojea Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: @thockin @danwinship these options are kind of developer options, adding it to the config sounds a bit excessive to me, we have some places that use environment variables for similar reasons

staging/src/k8s.io/apimachinery/pkg/util/net/http.go:   if s := os.Getenv("DISABLE_HTTP2"); len(s) > 0 {
staging/src/k8s.io/apimachinery/pkg/util/net/http.go:   if s := os.Getenv("HTTP2_READ_IDLE_TIMEOUT_SECONDS"); len(s) > 0 {
staging/src/k8s.io/apimachinery/pkg/util/net/http.go:   if s := os.Getenv("HTTP2_PING_TIMEOUT_SECONDS"); len(s) > 0 {
staging/src/k8s.io/apiserver/pkg/storage/etcd3/watcher.go:      fatalOnDecodeError, _ = strconv.ParseBool(os.Getenv("KUBE_PANIC_WATCH_DECODE_ERROR"))

WDYT if we use an environment variables for this instead of leaking this into the config?
KUBE_PROXY_NFTABLES_SKIP_KERNEL_VERSION_CHECK

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love it, but I see the argument. Config feels "more serious".

Another option would be a devMode sub-stanza of the config?

@alexzielenski
Copy link
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Apr 9, 2024
@danwinship
Copy link
Contributor Author

Another possibility is that we could have kubelet pass an "nftables hint" to kube-proxy like we do with iptables. eg

table inet kubernetes-hints {
    chain networking.k8s.io/nft-version {
        continue comment "v1.0.7"
    }
}

or something. (Older-but-still-supported kernels don't allow putting comments on chains or tables, so we need an actual rule (eg continue).)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 19, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aojea
Copy link
Member

aojea commented Apr 19, 2024

Another possibility is that we could have kubelet pass an "nftables hint" to kube-proxy like we do with iptables. eg

table inet kubernetes-hints {
    chain networking.k8s.io/nft-version {
        continue comment "v1.0.7"
    }
}

or something. (Older-but-still-supported kernels don't allow putting comments on chains or tables, so we need an actual rule (eg continue).)

it took you a while to get rid of the kproxy/kubelet combo , I think we should avoid creating dependencies between components if possible

What about the magic env variable #124152 (comment) ,

@danwinship
Copy link
Contributor Author

it took you a while to get rid of the kproxy/kubelet combo , I think we should avoid creating dependencies between components if possible

The annoying part of IPTablesOwnershipCleanup was the fact that kube-proxy's rules depended on chains that kubelet created. The KUBE-IPTABLES-HINT has always worked pretty well and not caused problems. (And it helps components other than kube-proxy too.)

What about the magic env variable #124152 (comment) ,

I'm not opposed but I feel like that's less kubernetes-y than having a proper config option. I guess it partly depends on whether we expect this flag to go away or not. (I don't know of any reason right now why we'd need it in a year or so, but I can't say for sure that we wouldn't either.)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 24, 2024
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot requested review from thockin and tnqn May 24, 2024 15:38
@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label May 24, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: danwinship
Once this PR has been reviewed and has the lgtm label, please ask for approval from thockin. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Move the "nftables is supported" check into a separate function, and
call it before the --init-only return.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2024
@cici37
Copy link
Contributor

cici37 commented May 30, 2024

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/code-generation area/dependency Issues or PRs related to dependency changes area/kube-proxy cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

figure out / document nftables version requirements
9 participants