Remove the restart kubelet check from the test. #108563
Conversation
Upon reconsidering the purpose of the test, i.e. to test the lock contention flags (--exit-on-lock-contention and --lock-file), it makes sense that we test only the actual functionality: the kubelet should stop once there is lock contention. It is in no way the kubelet's responsibility to restart itself; that falls to a higher-level system such as systemd. Hence the check that releases the lock and verifies the kubelet becomes healthy again is out of scope for the kubelet's responsibilities, and it has been removed. Signed-off-by: Imran Pochi <imran@kinvolk.io>
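To make the intended scope concrete, here is a minimal Ginkgo-style sketch of the trimmed check. The helpers `startKubeletWithLockFile`, `contendForLock`, and `kubeletHealthy` are hypothetical stand-ins for the suite's real plumbing, not the actual test code:

```go
package e2enode

import (
	"time"

	"github.com/onsi/ginkgo/v2"
	"github.com/onsi/gomega"
)

// Hypothetical helpers; the real suite wires these up through the
// node e2e framework.
func startKubeletWithLockFile(lockFile string) { /* start kubelet with --lock-file */ }
func contendForLock(lockFile string)           { /* grab the lock from a second process */ }
func kubeletHealthy() bool                     { return false /* probe the healthz endpoint */ }

var _ = ginkgo.Describe("Lock contention", func() {
	ginkgo.It("kubelet should stop once the lock is contended", func() {
		const lockFile = "/var/run/kubelet.lock"
		startKubeletWithLockFile(lockFile)

		// Acquire the lock from another process; the running kubelet
		// must observe the contention and exit.
		contendForLock(lockFile)

		gomega.Eventually(kubeletHealthy, 30*time.Second, time.Second).Should(gomega.BeFalse())

		// Deliberately no assertion that the kubelet becomes healthy
		// again: restarting it is the job of a supervisor such as
		// systemd, not of the kubelet itself.
	})
})
```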
/test pull-kubernetes-integration
kubernetes/cmd/kubelet/app/server.go Lines 488 to 503 in d7d1219
kubernetes/cmd/kubelet/app/server_linux.go Lines 26 to 48 in c3d9b10
Looking at the logic, the old kubelet will exit after the new kubelet is started. This exited kubelet should indeed not be restarted.
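For readers without the tree handy, the referenced server code builds on a file lock. A rough standalone sketch of the general flock pattern follows; it is not the actual kubelet implementation, which additionally watches the lock file and exits when contention appears:

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// acquireLockFile takes an exclusive, non-blocking lock on path.
// A second process calling this while the first still holds the
// lock fails immediately instead of blocking.
func acquireLockFile(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	if err := unix.Flock(int(f.Fd()), unix.LOCK_EX|unix.LOCK_NB); err != nil {
		f.Close()
		return nil, fmt.Errorf("lock %s is held elsewhere: %w", path, err)
	}
	return f, nil
}

func main() {
	f, err := acquireLockFile("/var/run/kubelet.lock")
	if err != nil {
		// A kubelet run with --exit-on-lock-contention would give up here.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	fmt.Println("lock acquired; process exit releases it")
}
```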
kubernetes/cmd/kubelet/app/options/options.go Lines 308 to 309 in 06e1070
By the way, these two flags look very old, and there is still a `<Warning: Alpha feature>` marker on them.
We want to graduate the flags and move them to the Kubelet configuration, hence this work. One of the prerequisites for graduating is having an e2e test for the feature.
/lgtm
> find . -name '*.go' | xargs -I {} grep -rni "<Warning: Alpha feature>" {}
./cmd/kubelet/app/options/options.go:307: fs.Var(&bindableNodeLabels, "node-labels", fmt.Sprintf("<Warning: Alpha feature> Labels to add when registering the node in the cluster. Labels must be key=value pairs separated by ','. Labels in the 'kubernetes.io' namespace must begin with an allowed prefix (%s) or be in the specifically allowed set (%s)", strings.Join(kubeletapis.KubeletLabelNamespaces(), ", "), strings.Join(kubeletapis.KubeletLabels(), ", ")))
./cmd/kubelet/app/options/options.go:308: fs.StringVar(&f.LockFilePath, "lock-file", f.LockFilePath, "<Warning: Alpha feature> The path to file for kubelet to use as a lock file.")
./cmd/kubelet/app/options/options.go:310: fs.BoolVar(&f.SeccompDefault, "seccomp-default", f.SeccompDefault, "<Warning: Alpha feature> Enable the use of `RuntimeDefault` as the default seccomp profile for all workloads. The SeccompDefault feature gate must be enabled to allow this flag, which is disabled per default.")
./cmd/kubelet/app/options/options.go:452: fs.DurationVar(&c.CPUManagerReconcilePeriod.Duration, "cpu-manager-reconcile-period", c.CPUManagerReconcilePeriod.Duration, "<Warning: Alpha feature> CPU Manager reconciliation period. Examples: '10s', or '1m'. If not supplied, defaults to 'NodeStatusUpdateFrequency'")
./cmd/kubelet/app/options/options.go:453: fs.Var(cliflag.NewMapStringString(&c.QOSReserved), "qos-reserved", "<Warning: Alpha feature> A set of ResourceName=Percentage (e.g. memory=50%) pairs that describe how pod resource requests are reserved at the QoS level. Currently only memory is supported. Requires the QOSReserved feature gate to be enabled.")
It looks like there are a few more flags of this type; migrating them to the Kubelet configuration file, I feel, needs a KEP, since it involves API changes.
/lgtm
/triage accepted
@cynepco3hahue I remember you were looking into this in one of the tests. Do you have any comments here?
Long term we want to merge this test back into the serial lane (once it is configured using the config file), and the fact that the kubelet wouldn't restart will become a problem then. I pinged @cynepco3hahue in case there is a quick fix here. For now this is OK to merge.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ipochi, SergeyKanzhelev. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Thanks @SergeyKanzhelev for the insight; now I know why we were checking the restart as well. But rather than making it part of the test itself, we can add a kubelet restart as cleanup in `AfterEach`. Let this pass for now; we can iterate later to merge it into the serial lane.
I'm a little hesitant about this change - I think it makes sense for a test like this (especially in an e2e suite) to also describe expected interactions with other systems, but I appreciate that this could hinder testing the Kubelet in stranger deployment cases with the upstream suite. We definitely need to add a cleanup step at a minimum here, though.
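As a concrete version of the cleanup both comments ask for, here is a sketch that assumes the test node runs the kubelet as a systemd unit named `kubelet`; the real node e2e framework has its own service helpers, so this only shows the shape:

```go
package e2enode

import (
	"fmt"
	"os/exec"

	"github.com/onsi/ginkgo/v2"
)

// Restart the kubelet after each spec so later tests start from a
// healthy node. Assumes a systemd-managed unit named "kubelet".
var _ = ginkgo.AfterEach(func() {
	out, err := exec.Command("systemctl", "restart", "kubelet").CombinedOutput()
	if err != nil {
		ginkgo.Fail(fmt.Sprintf("restarting kubelet after lock-contention test: %v: %s", err, out))
	}
})
```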
This commit moves the `--exit-on-lock-contention` and `--lock-file` kubelet flags to the Kubelet Configuration. This PR is built on the following PRs:
Corresponding E2E test PRs: kubernetes#103608, kubernetes#104334, kubernetes#108563
Corresponding job in test-infra: https://github.com/kubernetes/test-infra/blob/e684255cc8701ef97b6832e3daadb6841c00cc65/config/jobs/kubernetes/sig-node/containerd.yaml#L1315-#L1343
Signed-off-by: Imran Pochi <imran@kinvolk.io>
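Assuming that follow-up lands as described, the flags would be expressed in the kubelet's config file instead. A hypothetical KubeletConfiguration snippet is below; the field names `lockFilePath` and `exitOnLockContention` are illustrative only, not a confirmed API shape:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Illustrative field names only; the final API shape would be settled
# in the KEP/review process, not here.
lockFilePath: /var/run/kubelet.lock
exitOnLockContention: true
```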
Thanks to @adisky for this perspective.
/cc @endocrimes @SergeyKanzhelev
Signed-off-by: Imran Pochi <imran@kinvolk.io>
What type of PR is this?
/kind failing-test
What this PR does / why we need it:
This PR addresses the failing test.
https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-gce-e2e-lock-contention
Which issue(s) this PR fixes:
Fixes #108348
Special notes for your reviewer:
Does this PR introduce a user-facing change?