Revert #104308 to bring back LockContention tests #104334
/approve cancel
It is failing on the serial tab. Can you please check why?
The lock contention flags are not added to the tests that are failing. I created a PR to update the job. Question: if I want to re-run the failing test, but this time have it use the job YAML from the PR and not the master branch, how can I do that?
/cc @ehashman
@ipochi Unfortunately the easiest way to do that is by running them from your machine, which requires having a Google Cloud account configured, with a default application profile and project that won't interfere with any production environments you care about ( and then e.g:
/triage accepted
const contentionLockFile = "/var/run/kubelet.lock"

var _ = SIGDescribe("Lock contention [Slow] [Disruptive] [Serial] [NodeFeature:LockContention]", func() {
- var _ = SIGDescribe("Lock contention [Slow] [Disruptive] [Serial] [NodeFeature:LockContention]", func() {
+ var _ = SIGDescribe("Lock contention [Slow] [Disruptive] [Feature:LockContention]", func() {
We'll need to update the test-infra manifest. This should avoid this getting picked up by the suites that launch the kubelet without the right command line flags.
@SergeyKanzhelev do we want to consider marking this `[NodeSpecialFeature]` and skipping that in the regular feature suite?
I think this might have to stay serial, given that it messes with kubelet health?
@ehashman I've made it `NodeSpecialFeature:Contention`; I think that works better than `Feature:LockContention`.
What do you think?
/assign
7b3eb73 to a6a558e
435d353 to 61c4732
…608-imran/e2e-lock-contention"

This reverts commit 9d09c9d

This E2E test was reverted because the test was failing continuously.
More on the issue here kubernetes#104307

This commit re-reverts and brings back the LockContention test, with
the addition of [Serial] tag to the test.
@SergeyKanzhelev friendly ping :)
var _ = SIGDescribe("Lock contention [Slow] [Disruptive] [Serial] [NodeFeature:LockContention]", func() {

ginkgo.It("Kubelet should stop when the test acquires the lock on lock file and restart once the lock is released", func() {
It may be useful to check that the kubelet was started with the proper flag and skip if not. Or at least add a comment here explaining that this is needed for the test to run.
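A minimal sketch of the kind of check suggested above, assuming the kubelet's arguments are available as a NUL-separated command line (as in `/proc/<pid>/cmdline` on Linux) and that both `--lock-file` and `--exit-on-lock-contention` are passed as command-line flags. The helper names `splitCmdline`, `hasFlag`, and `lockContentionEnabled` are hypothetical and are not part of the e2e framework; a real test would call its skipper when the check returns false.

```go
package main

import "strings"

// splitCmdline splits a NUL-separated command line (the format used by
// /proc/<pid>/cmdline on Linux) into individual arguments, dropping
// empty fields such as the one after the trailing NUL.
func splitCmdline(raw string) []string {
	return strings.FieldsFunc(raw, func(r rune) bool { return r == 0 })
}

// hasFlag reports whether args contains the long flag --name, written
// either as "--name" or as "--name=value".
func hasFlag(args []string, name string) bool {
	for _, a := range args {
		if a == "--"+name || strings.HasPrefix(a, "--"+name+"=") {
			return true
		}
	}
	return false
}

// lockContentionEnabled reports whether both flags the lock contention
// test depends on are present (hypothetical helper for illustration).
func lockContentionEnabled(args []string) bool {
	return hasFlag(args, "lock-file") && hasFlag(args, "exit-on-lock-contention")
}
```

If the settings are supplied through the kubelet configuration file instead of flags, a command-line check like this would not see them, so it would need to be replaced by a check against the configuration.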
lgtm, beyond file rename and adding a comment or skipper
This commit is to be squashed and merged with the first commit. Signed-off-by: Imran Pochi <imran@kinvolk.io>
@SergeyKanzhelev Thank you for reviewing the PR. I think I accidentally force-pushed an older version of the commit, which resulted in the name change (_linux) not appearing, along with a couple of other review comments I had addressed, such as setting the permalinks. This time I've made sure to push the latest commit. Please review once again (I've created a secondary commit containing the changes). Sorry for the mess.
The job for the lock contention test was removed in the test-infra repo. What needs to be done on that repo to get the job back?
@ipochi: The following tests failed, say
/retest-required
@SergeyKanzhelev @ehashman friendly ping :)
@ipochi We'll need a new PR in
to the test suite.

This test was created because `--lock-file` and `--exit-on-lock-contention` need to be moved from flags into the Kubelet configuration file rather than be dropped.

Adds a new tab in TestGrid named `kubelet-gce-e2e-lock-contention` running tests focused on `NodeSpecialFeature:LockContention`.

Corresponding e2e test is at kubernetes/kubernetes#104334

Signed-off-by: Imran Pochi <imranpochi@microsoft.com>
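To illustrate how a focused job picks up only the tagged test, here is a rough sketch of regex-based focus and skip selection over spec names, in the spirit of Ginkgo's focus and skip patterns. The `selected` helper is hypothetical, and the spec names in the usage note are assumed from the tags discussed in this thread rather than taken from the merged test.

```go
package main

import "regexp"

// selected reports whether a spec with the given name would run under the
// given focus and skip patterns. An empty pattern places no constraint.
// Hypothetical helper: it mirrors the general idea of focus/skip
// filtering, not the exact Ginkgo implementation.
func selected(name, focus, skip string) bool {
	if focus != "" && !regexp.MustCompile(focus).MatchString(name) {
		return false
	}
	if skip != "" && regexp.MustCompile(skip).MatchString(name) {
		return false
	}
	return true
}
```

With a focus of `\[NodeSpecialFeature:LockContention\]`, a spec named `Lock contention [Slow] [Disruptive] [NodeSpecialFeature:LockContention]` is selected, while a suite that skips `\[NodeSpecialFeature:.+\]` leaves it out. A spec still tagged `[Serial]` would match a serial suite's `\[Serial\]` focus, which is how it could end up running under a kubelet started without the required flags.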
/lgtm
/approve
this looks fine, we can revert if it's not
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ehashman, ipochi
This commit moves the `--exit-on-lock-contention` and `--lock-file` kubelet flags to Kubelet Configuration.

This PR is built on the following PRs:

Corresponding E2E test PRs : kubernetes#103608, kubernetes#104334, kubernetes#108563

Corresponding Job in test-infra : https://github.com/kubernetes/test-infra/blob/e684255cc8701ef97b6832e3daadb6841c00cc65/config/jobs/kubernetes/sig-node/containerd.yaml#L1315-#L1343

Signed-off-by: Imran Pochi <imran@kinvolk.io>
This reverts commit 9d09c9d
This E2E test was reverted because the test was failing continuously.
More on the issue here #104307
What type of PR is this?
/kind feature
/sig node
What this PR does / why we need it:
This PR re-reverts and brings back the LockContention test, with
the addition of [Serial] tag to the test.
Which issue(s) this PR fixes:
This E2E test was reverted because the test was failing continuously.
More on the issue here #104307