Fix evented pleg mirror pod & use IsEventedPLEGInUse instead of FG status check #122778

Open
wants to merge 5 commits into
base: master
from


pacoxu
Member

@pacoxu pacoxu commented Jan 15, 2024

This PR is based on #122763, and we use IsEventedPLEGInUse() instead of utilfeature.DefaultFeatureGate.Enabled(features.EventedPLEG).
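The following minimal, self-contained Go sketch shows why a feature-gate check and IsEventedPLEGInUse() can disagree. It assumes that the "in use" state is a mutex-guarded flag that the Evented PLEG sets when it starts and clears when it stops or falls back to relisting. The names featureGateEnabled, isEventedPLEGInUse and setEventedPLEGUsage are illustrative stand-ins, not the kubelet's exact API.

```go
package main

import (
	"fmt"
	"sync"
)

// featureGateEnabled is a hypothetical stand-in for
// utilfeature.DefaultFeatureGate.Enabled(features.EventedPLEG).
// It is fixed at startup and does not change at runtime.
const featureGateEnabled = true

var (
	// eventedPLEGUsage records whether the Evented PLEG is actually running.
	eventedPLEGUsage   bool
	eventedPLEGUsageMu sync.RWMutex
)

// isEventedPLEGInUse reports the runtime state, not the configuration.
func isEventedPLEGInUse() bool {
	eventedPLEGUsageMu.RLock()
	defer eventedPLEGUsageMu.RUnlock()
	return eventedPLEGUsage
}

// setEventedPLEGUsage is assumed to be called only when the Evented PLEG
// starts (true) or stops / falls back to the Generic PLEG (false).
func setEventedPLEGUsage(enable bool) {
	eventedPLEGUsageMu.Lock()
	defer eventedPLEGUsageMu.Unlock()
	eventedPLEGUsage = enable
}

func main() {
	setEventedPLEGUsage(true)
	fmt.Println(featureGateEnabled, isEventedPLEGInUse()) // true true

	// Simulate the runtime's event stream failing: relisting takes over,
	// but the feature gate is still enabled.
	setEventedPLEGUsage(false)
	fmt.Println(featureGateEnabled, isEventedPLEGInUse()) // true false
}
```

After a fallback, code that gates on the feature flag would still take the Evented PLEG path, while the runtime check follows the PLEG that is actually running.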

Fixes #123087

Context can be found in #122763.

kubelet: fix a static pod startup bug with Evented PLEG

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 15, 2024
@pacoxu
Member Author

pacoxu commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features
/test pull-kubernetes-e2e-kind-beta-features

@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 15, 2024
@pacoxu
Member Author

pacoxu commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features

@pacoxu
Member Author

pacoxu commented Jan 15, 2024

/test pull-kubernetes-unit
/test pull-kubernetes-integration

@pacoxu
Member Author

pacoxu commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features

2 similar comments
@pacoxu
Member Author

pacoxu commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features

@pacoxu
Member Author

pacoxu commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features

@pacoxu pacoxu marked this pull request as ready for review January 15, 2024 08:41
@pacoxu
Member Author

pacoxu commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features

1 similar comment
@pacoxu
Member Author

pacoxu commented Jan 15, 2024

/test pull-kubernetes-e2e-kind-alpha-beta-features
/test pull-kubernetes-e2e-kind-alpha-features

@pacoxu

This comment was marked as duplicate.

1 similar comment
@pacoxu

This comment was marked as duplicate.

@pacoxu
Member Author

pacoxu commented Feb 7, 2024

rerun
/test pull-kubernetes-e2e-kind-evented-pleg

@sairameshv
Member

/test pull-kubernetes-e2e-kind-evented-pleg

@MaryamTavakkoli

Hello!
Release Signal shadow here.
The code freeze starts at 02:00 UTC Wednesday 6th March 2024 / 18:00 PST Tuesday 5th March 2024 (about three weeks from now), and while there is still plenty of time, we want to ensure that each PR has a chance to be merged on time.

As this PR is tagged for 1.30, is it still planned for this release?

@pacoxu pacoxu changed the title Fix evented pleg mirro pod & use IsEventedPLEGInUse instead of FG status check Fix evented pleg mirror pod & use IsEventedPLEGInUse instead of FG status check Feb 18, 2024
@pacoxu
Member Author

pacoxu commented Feb 18, 2024

As this PR is tagged for 1.30, is it still planned for this release?

PTAL @smarterclayton @dchen1107 when you have time.

@pacoxu
Member Author

pacoxu commented Feb 18, 2024

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 18, 2024
@harche
Contributor

harche commented Feb 21, 2024

Just to summarize where we are at with this PR.

The most important change in this PR is how ShouldContainerBeRestarted treats containers that are in the Created state. The original behaviour is to restart containers found in the Created state; that behaviour was brought in with this PR to cover an edge case.

This PR restricts the change in ShouldContainerBeRestarted to the Evented PLEG only. IMO, this is the safer approach, considering we still don't have a good answer for why the function ShouldContainerBeRestarted is never reached when using the Generic PLEG.

Since there are two distinct CRI calls, one to create a container and another to start it, there is always a possibility that a container gets caught in the Created state (and gets restarted as a result), irrespective of which PLEG is in use. It may simply be that frequent relisting by the Generic PLEG is masking this issue.

/cc @pacoxu @mikebrow @smarterclayton @dchen1107 @mrunalp @sairameshv @SergeyKanzhelev
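A simplified Go sketch of the restart decision summarized above. It assumes the PR's change amounts to not restarting a Created container while the Evented PLEG is in use, and keeping the original restart-on-Created behaviour otherwise. The types, constants and the shouldContainerBeRestarted signature are hypothetical and trimmed down (for example, pod deletion is ignored); this is not the kubelet's real ShouldContainerBeRestarted.

```go
package main

import "fmt"

// ContainerState is a hypothetical, simplified container state.
type ContainerState string

const (
	StateCreated ContainerState = "created"
	StateRunning ContainerState = "running"
	StateExited  ContainerState = "exited"
	StateUnknown ContainerState = "unknown"
)

// RestartPolicy mirrors the pod-level restart policies by name.
type RestartPolicy string

const (
	RestartAlways    RestartPolicy = "Always"
	RestartOnFailure RestartPolicy = "OnFailure"
	RestartNever     RestartPolicy = "Never"
)

// containerStatus is a hypothetical, trimmed-down container status.
type containerStatus struct {
	State    ContainerState
	ExitCode int
}

// shouldContainerBeRestarted sketches the decision. A nil status means the
// container has never been created, so it should be started.
func shouldContainerBeRestarted(status *containerStatus, policy RestartPolicy, eventedPLEGInUse bool) bool {
	if status == nil {
		return true
	}
	switch status.State {
	case StateRunning:
		return false
	case StateUnknown:
		return true
	case StateCreated:
		// Assumed behaviour: with the Evented PLEG, a container can be seen
		// between CreateContainer and StartContainer, so creating another
		// copy would duplicate it. Otherwise keep the original behaviour.
		return !eventedPLEGInUse
	}
	// Exited container: apply the restart policy.
	switch policy {
	case RestartNever:
		return false
	case RestartOnFailure:
		return status.ExitCode != 0
	}
	return true
}

func main() {
	created := &containerStatus{State: StateCreated}
	fmt.Println(shouldContainerBeRestarted(created, RestartAlways, false)) // true
	fmt.Println(shouldContainerBeRestarted(created, RestartAlways, true))  // false

	exitedOK := &containerStatus{State: StateExited, ExitCode: 0}
	fmt.Println(shouldContainerBeRestarted(exitedOK, RestartOnFailure, true)) // false
}
```

Gating only the Created branch on the PLEG in use leaves Generic PLEG behaviour unchanged, which matches the "safer approach" argued above.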

@k8s-ci-robot
Contributor

@harche: GitHub didn't allow me to request PR reviews from the following users: pacoxu.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.


@pacoxu
Member Author

pacoxu commented Feb 22, 2024

cc @yujuhong
as Yu-ju may be familiar with this, according to #123057 (comment).

@pacoxu
Member Author

pacoxu commented Feb 28, 2024

Hello!
on behalf of the Release Signal Team.

The code freeze starts in a week, at 02:00 UTC Wednesday 6th March 2024 / 18:00 PST Tuesday 5th March 2024, and we want to ensure that each PR has a chance to be merged on time.

@pacoxu
Member Author

pacoxu commented Mar 7, 2024

We missed the v1.30 code freeze.
Ping @smarterclayton if you have time to take a look.
We can cherry-pick it to older releases later.

@liggitt
Member

liggitt commented Mar 14, 2024

/milestone clear

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 12, 2024
@pacoxu
Member Author

pacoxu commented Jun 13, 2024

/remove-lifecycle stale
@smarterclayton @yujuhong Do you have time to take a look in the v1.31 release cycle?

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 13, 2024
@aojea
Member

aojea commented Jun 17, 2024

/assign @yujuhong

Labels
area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Status: Needs Reviewer
Status: Release Blocker
SIG Node PR Triage
Needs Reviewer
Development

Successfully merging this pull request may close these issues.

kubelet started multi-containers for static pods when EventedPLEG is enabled