update client/kernel version requirements for nftables kube-proxy #124152
base: master
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Tue Apr 2 14:19:09 UTC 2024.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
pkg/proxy/apis/config/types.go
// should only be used if you know nothing else on the system is using nftables,
// since nft 0.9.8 and older will crash if kube-proxy's nftables rules are
// present.
AllowOlderKernel bool
Should this be more specific, e.g. AllowNFTablesLEQ_0_9_8? I have some concerns that AllowOlderKernel is pretty vague as a flag name.
The vagueness is intentional, since we can't actually check for the thing that we want to check for (that the host filesystem's nft is 0.9.8 or later), so we're doing something vague to approximate it.
This flag will hopefully go away in GA anyway, because (a) by that point the "older kernels" will be old enough that we don't have to feel as guilty about not supporting them, and (b) the problems caused by a too-old nft are annoying enough that we don't really want anybody using this option in production.
Probably I should document that better...
OK, renamed this to skipKernelVersionCheck and tried to clarify the docs. I didn't say anything about removing this in GA because maybe we won't.
/remove-sig api-machinery
Force-pushed from 9cfb85a to edda5e7.
/lgtm
The PR description is stale about the name of the new option.
LGTM label has been added. Git tree hash: fac0697c8091a789ac01e4d7a7df4fd8103dcc56
/test pull-kubernetes-e2e-kind-nftables
@@ -261,7 +265,7 @@ func NewProxier(ipFamily v1.IPFamily,
}

// Create a knftables.Interface and check if we can use the nftables proxy mode on this host.
-func getNFTablesInterface(ipFamily v1.IPFamily) (knftables.Interface, error) {
+func getNFTablesInterface(ipFamily v1.IPFamily, skipKernelVersionCheck bool) (knftables.Interface, error) {
In iptables we do this stuff before creating the proxy:
kubernetes/cmd/kube-proxy/app/server_linux.go
Lines 108 to 111 in 9791f0d
// getIPTables returns an array of [IPv4, IPv6] utiliptables.Interfaces. If primaryFamily
// is not v1.IPFamilyUnknown then it will also separately return the interface for just
// that family.
func getIPTables(primaryFamily v1.IPFamily) ([2]utiliptables.Interface, utiliptables.Interface) {
Should we move this validation there, instead of doing it at the time of creating the proxier?
We do it that way for iptables to avoid having identical iptables-constructing functions in pkg/proxy/iptables and pkg/proxy/ipvs. But really, the logic belongs inside the individual backends; cmd/kube-proxy shouldn't be making assumptions about what they need.
Or alternatively, we could call into a pkg/proxy/nftables function from cmd/kube-proxy to construct and sanity-check before we actually create the Proxier; we sort of do that with ipvs. But that's mostly only advantageous because it lets us avoid separately sanity-checking the IPv4 and IPv6 Proxiers, and I have plans to eventually fix that anyway.
it was a curious comment 👍
// versions of the nft CLI in the host filesystem. Skipping the kernel version
// check will allow running nftables kube-proxy on older distros, but beware that
// it may interfere with the host OS's use of nftables.
SkipKernelVersionCheck bool
Thought: @thockin @danwinship these options are kind of developer options, so adding them to the config seems a bit excessive to me. We have some places that use environment variables for similar reasons:
staging/src/k8s.io/apimachinery/pkg/util/net/http.go: if s := os.Getenv("DISABLE_HTTP2"); len(s) > 0 {
staging/src/k8s.io/apimachinery/pkg/util/net/http.go: if s := os.Getenv("HTTP2_READ_IDLE_TIMEOUT_SECONDS"); len(s) > 0 {
staging/src/k8s.io/apimachinery/pkg/util/net/http.go: if s := os.Getenv("HTTP2_PING_TIMEOUT_SECONDS"); len(s) > 0 {
staging/src/k8s.io/apiserver/pkg/storage/etcd3/watcher.go: fatalOnDecodeError, _ = strconv.ParseBool(os.Getenv("KUBE_PANIC_WATCH_DECODE_ERROR"))
WDYT about using an environment variable for this instead of leaking it into the config? e.g. KUBE_PROXY_NFTABLES_SKIP_KERNEL_VERSION_CHECK
I don't love it, but I see the argument. Config feels "more serious".
Another option would be a devMode sub-stanza of the config?
/remove-sig api-machinery
Another possibility is that we could have kubelet pass an "nftables hint" to kube-proxy like we do with iptables. eg
or something. (Older-but-still-supported kernels don't allow putting comments on chains or tables, so we need an actual rule (eg
PR needs rebase.
It took you a while to get rid of the kproxy/kubelet combo; I think we should avoid creating dependencies between components if possible. What about the magic env variable (#124152 (comment))?
The annoying part of
I'm not opposed, but I feel like that's less kubernetes-y than having a proper config option. I guess it partly depends on whether we expect this flag to go away or not. (I don't know of any reason right now why we'd need it in a year or so, but I can't say for sure that we wouldn't, either.)
Force-pushed from edda5e7 to 5578e5b.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: danwinship
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Move the "nftables is supported" check into a separate function, and call it before the --init-only return.
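A sketch of the ordering described in this commit message, with hypothetical options, platformCheckSupported, and run names, and a fake kernel probe standing in for real detection. With the check placed before the early return, a --init-only invocation also fails on an unsupported host.

```go
package main

import (
	"errors"
	"fmt"
)

// options is a hypothetical subset of kube-proxy's options.
type options struct {
	initOnly               bool
	skipKernelVersionCheck bool
}

// platformCheckSupported is a hypothetical stand-in for the separate
// "nftables is supported" check; kernelOK fakes the real kernel probe.
func platformCheckSupported(kernelOK func() bool, skip bool) error {
	if skip || kernelOK() {
		return nil
	}
	return errors.New("nftables proxy mode not supported on this host")
}

// run sketches the ordering: the support check runs first, then the
// --init-only early return, then proxier construction.
func run(o options, kernelOK func() bool) (startedProxier bool, err error) {
	if err := platformCheckSupported(kernelOK, o.skipKernelVersionCheck); err != nil {
		return false, err
	}
	if o.initOnly {
		return false, nil
	}
	// Constructing and running the proxier would happen here.
	return true, nil
}

func main() {
	tooOld := func() bool { return false }
	fmt.Println(run(options{initOnly: true}, tooOld))
}
```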
Force-pushed from 5578e5b to 7da9b2e.
Force-pushed from 7da9b2e to f4a189d.
/remove-sig api-machinery
What type of PR is this?
/kind cleanup
/sig network
/area kube-proxy
/priority important-soon
What this PR does / why we need it:
Addresses the problem that (a) sometimes older versions of nft will crash if they see rules created by newer versions of nft, and that (b) with nft < 1.0.1, when you do nft -f ..., it will try to parse the entire ruleset at startup, rather than only trying to parse the tables that you actually reference in the commands passed to nft. Thus:
- … nft 1.0.1 or later, to ensure that other people's use of nftables won't break kube-proxy.
- … nft 1.0.1 or later, to ensure that kube-proxy's use of nftables won't break other people. (We can't directly check the system nft version, but all known distros with kernel 5.13 or later also have nft 1.0.1 or later.)
- The nftables.skipKernelVersionCheck config option allows bypassing the above check for dev/testing purposes. (As of April 2024 there are still supported versions of LTS distros using slightly older kernels, though by the time we go GA this should no longer be true.)
More discussion in #122743.
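A self-contained sketch of the kernel-version approximation described above: parse the major and minor numbers out of a kernel release string (the format printed by uname -r, e.g. "5.15.0-91-generic") and compare against 5.13. The helper names are hypothetical, and this hand-rolled stdlib parser is not necessarily how kube-proxy itself reads or compares the kernel version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelRelease (hypothetical) extracts the major and minor numbers from
// a kernel release string such as "5.15.0-91-generic".
func parseKernelRelease(release string) (major, minor int, err error) {
	// Keep only the leading run of digits and dots ("5.15.0" above).
	numeric := release
	if end := strings.IndexFunc(release, func(r rune) bool {
		return r != '.' && (r < '0' || r > '9')
	}); end >= 0 {
		numeric = release[:end]
	}
	parts := strings.Split(numeric, ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("cannot parse kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return major, minor, nil
}

// kernelAtLeast513 (hypothetical) reports whether release is 5.13 or newer,
// the threshold the PR uses as a stand-in for "host nft is 1.0.1 or later".
func kernelAtLeast513(release string) (bool, error) {
	major, minor, err := parseKernelRelease(release)
	if err != nil {
		return false, err
	}
	return major > 5 || (major == 5 && minor >= 13), nil
}

func main() {
	for _, rel := range []string{"5.15.0-91-generic", "5.10.0-28-amd64", "6.1.0"} {
		ok, err := kernelAtLeast513(rel)
		fmt.Println(rel, ok, err)
	}
}
```

Note that this only approximates the real requirement: a host with a new kernel but a hand-downgraded nft would still pass.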
Which issue(s) this PR fixes:
Fixes #122743
Special notes for your reviewer:
knftables rebase stolen from #123389...
Does this PR introduce a user-facing change?