Fail Kubelet at startup if swap is configured with cgroup v1 #122241
Conversation
Please note that we're already in Test Freeze. Fast forwards are scheduled to happen every 6 hours; the most recent run was: Fri Dec 8 16:13:50 UTC 2023.
This PR addresses the comments in #119327. I decided to drop the swap_test changes to minimize noise.
@@ -224,6 +224,8 @@ func NewContainerManager(mountUtil mount.Interface, cadvisorInterface cadvisor.I
 			return nil, fmt.Errorf("running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: %v", swapLines)
 		}
 	}
+} else if utilfeature.DefaultFeatureGate.Enabled(kubefeatures.NodeSwap) && !cgroups.IsCgroup2UnifiedMode() {
OK, there is some complexity here. There is an option that defines the behavior of the kubelet if swap is enabled or not:
// failSwapOn tells the Kubelet to fail to start if swap is enabled on the node.
// Default: true
// +optional
FailSwapOn *bool `json:"failSwapOn,omitempty"`
If that option is not enabled, the kubelet keeps working.
Now we are going to add a new failure mode: if the NodeSwap feature gate is enabled and the host is on cgroup v1, the kubelet will fail too.
The question here is that feature gates are not flags, so they cannot be used to define conditional behaviors. When the feature graduates, the feature gate will no longer be available, and the kubelet will then always fail on hosts with cgroup v1 and swap enabled even with FailSwapOn set to false. That will be confusing.
/cc @SergeyKanzhelev
Thanks for pointing that out.
Generally, we only hit this change if someone sets failSwapOn=false, and then we verify that it must be cgroup v2. We do hit an undefined case if NodeSwap is enabled but failSwapOn=true; it essentially makes the feature gate useless. So we will require both NodeSwap=true and failSwapOn=false for swap to be enabled. We also require that the node actually has swap enabled at the kernel level.
I don't really know what we want to do with FailSwapOn for GA for this feature tbh.
IMO, I think this should be fine as is.
Even though swap is in beta (technically beta 2), we are rolling this out with the feature gate off. It's a more involved feature, as the node needs to have swap set up correctly for it to work. As with any feature, we will eventually drop the feature gate when we GA, but I think we need a lot more soak time for this one.
Those are the right questions, Kevin, and that is the goal of KEPs: you discuss these ambiguous problems and reach a conclusion, and there is a place to go back to and see "oh, this was done because of this". There is already a section for this kind of problem that PRR reviewers check. cc: @wojtek-t
I see @SergeyKanzhelev also made similar comments in the previous PR #119327 (comment).
I decided not to use the feature toggle; I use the config option --fail-swap-on=false instead, as that will be required to run the kubelet with swap.
We are still debating about how swap rollout should go so I think this is fine now.
/cc @harche
/triage accepted
/test
Force-pushed cb9abec to 09e4cd3 (Compare)
Force-pushed 09e4cd3 to a864519 (Compare)
cmd/kubelet/app/server.go
Outdated
@@ -839,7 +839,7 @@ func run(ctx context.Context, s *options.KubeletServer, kubeDeps *kubelet.Depend
 		TopologyManagerScope:         s.TopologyManagerScope,
 		TopologyManagerPolicyOptions: topologyManagerPolicyOptions,
 	},
-	s.FailSwapOn,
+	s.KubeletConfiguration.FailSwapOn,
The test is correct, the flag is true:
Jan 10 16:09:36 kind-control-plane kubelet[324]: I0110 16:09:36.749688 324 flags.go:64] FLAG: --fail-swap-on="true"
The problem is that here you are setting the value from the configuration. This is a breaking change; you cannot change the order of preference of flags vs configuration so lightly.
Well, I logged what s.FailSwapOn is, and it's false in this case. So this is probably not being set.
We are parsing this bool from the CLI and for some reason it's set to false (or not being set at all).
Force-pushed eabc119 to 7657e46 (Compare)
Signed-off-by: Itamar Holder <iholder@redhat.com>
Force-pushed 7657e46 to bbf69eb (Compare)
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: kannon92. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
@aojea I tried to reproduce it locally and the config is parsed correctly. I started to read the logs (https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/122241/pull-kubernetes-e2e-kind/1745131801791172608/build-log.txt) and I noticed that kubeadm is actually setting the KubeletConfiguration with failSwapOn set to false. We are actually ignoring the CLI and using the config. I think this is WAI, as the CLI options are actually deprecated. It looks like kind has
So there seem to be some legacy reasons for this setting. I don't really know why it is set up as is, but I'm not sure how to proceed with this PR then. What I see is that there are a lot of templates setting swap to false. We also set swap to false for the development environment (in hack/local-up-cluster). So I am not sure if this PR is the right path forward. The swap KEP is not intended to be used with cgroup v1, but I realize that it has been (and is) possible to deploy the kubelet on a swap-enabled node. We just don't have workloads actually use swap, but that doesn't mean we can make swap plus cgroup v1 a failure path.
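For reference, this is the shape of the KubeletConfiguration field in question; a cluster tool such as kubeadm can render something like the following (values illustrative), which then takes effect in place of the deprecated CLI flag:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Explicitly tolerate swap on the node; when the config file sets this
# field, it is what the kubelet ends up using, not --fail-swap-on.
failSwapOn: false
```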
} else if !cgroups.IsCgroup2UnifiedMode() {
	return nil, fmt.Errorf("failSwapOn is %v but running swap with cgroups v1 is not supported", failSwapOn)
I think that what you want to do is
if failSwapOn && !cgroups.IsCgroup2UnifiedMode() {
return nil, fmt.Errorf("failSwapOn is %v but running swap with cgroups v1 is not supported", failSwapOn)
}
and place this before the if on line 207.
So thinking about this, I think we have three cases that we need to support. Case 3 is the main one we want to fail. Case 2 should not be a fail case for swap, as kube isn't actually using swap.
/test pull-kubernetes-e2e-kind-ipv6
Everything that worked before has to keep working the same. Case 1 and case 2 are things that people already use and we cannot break. It is not only kind, CI, and dev environments; there are people out there setting that value. Those have to keep the same behavior; you cannot change it.
These are the kinds of things that have to be decided in the KEP, and that is the reason we have PRR reviews: to evaluate the risks of rolling these changes out in production.
/retest
/cc @harche
@@ -803,6 +803,10 @@ func run(ctx context.Context, s *options.KubeletServer, kubeDeps *kubelet.Depend
 		return fmt.Errorf("topology manager policy options %v require feature gates %q enabled",
 			s.TopologyManagerPolicyOptions, features.TopologyManagerPolicyOptions)
 	}
+	workloadsUseSwap := false
+	if utilfeature.DefaultFeatureGate.Enabled(features.NodeSwap) {
Checking with @aojea whether a feature gate check is fine to use here.
@harche @kannon92 I was rereading the KEP, https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2400-node-swap#api-changes, and we need to change it, as what we are implementing here is different from what is described there. That is completely OK, no problem with that, but KEPs are the de facto source of truth for docs, so the behavior implemented has to be reflected in the KEP.
I understand that we want to re-scope the feature to work only with cgroup v2, so let me suggest another angle:

diff --git a/pkg/kubelet/kuberuntime/kuberuntime_container_linux.go b/pkg/kubelet/kuberuntime/kuberuntime_container_linux.go
index f14b6d1a6fc..b0200953bf2 100644
--- a/pkg/kubelet/kuberuntime/kuberuntime_container_linux.go
+++ b/pkg/kubelet/kuberuntime/kuberuntime_container_linux.go
@@ -167,6 +167,11 @@ func (m *kubeGenericRuntimeManager) configureContainerSwapResources(lcr *runtime
 		klog.InfoS("No swap cgroup controller present", "swapBehavior", m.memorySwapBehavior, "pod", klog.KObj(pod), "containerName", container.Name)
 		return
 	}
+
+	if !libcontainercgroups.IsCgroup2UnifiedMode() {
+		klog.InfoS("Swap behavior only available with cgroupsv2", "swapBehavior", m.memorySwapBehavior, "pod", klog.KObj(pod), "containerName", container.Name)
+		return
+	}
 	swapConfigurationHelper := newSwapConfigurationHelper(*m.machineInfo)
 	if !utilfeature.DefaultFeatureGate.Enabled(kubefeatures.NodeSwap) {
@aojea I have been working on https://docs.google.com/document/d/1S75d_0N0i1taGTGVjQ8YLYymq8D5Yc0aVu-ROMqQkSs/edit?usp=sharing to summarize our findings from evaluating swap. We are planning to update the KEP to reflect a lot of these changes. Currently this was called out in the Beta 1 section, but I guess the main text was not updated.
Thanks @aojea for that feedback. As @kannon92 already pointed out, the Goals section should have been updated when we updated the graduation criteria for Beta 1. However, we need to reach a consensus on kubelet behavior in the case of swap with cgroup v1. @iholder101 started working on the original PR to fail the kubelet after receiving that feedback during the review process. cc @SergeyKanzhelev @pacoxu
I opened up kubernetes/enhancements#4401 to reflect some of this, @aojea. We added a non-goal to support cgroup v1 with this feature. I think we need a new configuration for swap-enabled nodes with this feature off. This is a tricky feature, and we didn't actually have any way for a user to opt out of kube using swap. It's clear to me that this feature will be an advanced one, and I think we should have fallback options even after this feature is GAed.
It is a technical problem, and you know more than me in this area, so I trust your judgment. You have several options on the table: flags, a noop for cgroup v1, ... so describe them, study the pros and cons, and choose the one you think is better for the project and the users. Thanks for the healthy discussion.
@kannon92: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Since Swap requires cgroup v2 to work, we want to fail starting kubelet if swap is enabled and the node is using cgroup v1.
Which issue(s) this PR fixes:
Fixes #119427
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: