@Priyankasaggu11929
Member

@Priyankasaggu11929 Priyankasaggu11929 commented Oct 16, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

Ongoing work for KEP-3085 to prepare for stable graduation.

  • First part (747aaf4) of the PR implements the following changes, which allow the kubelet to update the PodReadyToStartContainers pod condition as soon as all three requirements (pod sandbox, networking, volume) are ready, before container images are pulled or containers are created.

    • add OnPodSandboxReady method to the RuntimeHelper interface in container/helpers.go
    • implement the OnPodSandboxReady method in Kubelet
    • inside (containerRuntime).SyncPod, after sandbox creation and network configuration, invoke runtimeHelper.OnPodSandboxReady() directly
      (this method retrieves current pod status, generates updated API status, and notifies the status manager to sync to the API server)
  • Second part (b566e88) of the PR adds tests that verify the invocation of the OnPodSandboxReady method and the PodReadyToStartContainers condition. These tests also validate the ordering between the DRA allocate calls and the PodReadyToStartContainers condition.

The new code is gated behind the PodReadyToStartContainersCondition feature gate and fails gracefully, i.e., it only logs the error and continues the pod creation process, so that these changes don't block pod startup.
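
The gating and graceful-failure behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `runtimeHelper` interface is reduced to one method, the `OnPodSandboxReady(podName string) error` signature is an assumption, and `fakeHelper`, `syncPod`, `runDemo`, and the boolean `gateEnabled` parameter are hypothetical stand-ins for the real kubelet types and feature-gate lookup.

```go
package main

import (
	"errors"
	"fmt"
)

// runtimeHelper is a reduced stand-in for the kubelet's RuntimeHelper
// interface. Only the method discussed in this PR is modeled, and its
// signature here is an assumption.
type runtimeHelper interface {
	OnPodSandboxReady(podName string) error
}

// fakeHelper (hypothetical) counts invocations and can be told to fail.
type fakeHelper struct {
	calls int
	err   error
}

func (f *fakeHelper) OnPodSandboxReady(podName string) error {
	f.calls++
	return f.err
}

// syncPod is a hypothetical, heavily simplified sync function. It returns
// the steps that ran so the caller can see that startup was not blocked.
func syncPod(h runtimeHelper, podName string, gateEnabled bool) []string {
	steps := []string{"create sandbox"}
	if gateEnabled {
		if err := h.OnPodSandboxReady(podName); err != nil {
			// Fail gracefully: log the error and keep going.
			fmt.Printf("OnPodSandboxReady failed for pod %q: %v\n", podName, err)
		}
		steps = append(steps, "sandbox ready callback")
	}
	return append(steps, "pull images", "start containers")
}

// runDemo wires a fake helper into syncPod and reports what happened.
func runDemo(gateEnabled, helperFails bool) (steps []string, calls int) {
	h := &fakeHelper{}
	if helperFails {
		h.err = errors.New("status update failed")
	}
	steps = syncPod(h, "demo-pod", gateEnabled)
	return steps, h.calls
}

func main() {
	steps, calls := runDemo(true, true)
	fmt.Println(steps, calls)
}
```

In this sketch a failing callback still leaves "pull images" and "start containers" in the returned steps, which is the property the description calls failing gracefully.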

Which issue(s) this PR is related to:

Fixes #134460

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The kubelet now sets the `PodReadyToStartContainers` condition immediately after sandbox creation rather than after image pull, reducing the time until the condition becomes `True`.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

KEP: https://github.com/kubernetes/enhancements/issues/4138

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 16, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Oct 16, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 16, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Priyankasaggu11929
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Priyankasaggu11929 Priyankasaggu11929 changed the title Kep 3085 add callback [KEP-3085] kubelet - update PodSync flow to set PodReadyToStartContainers condition immediately after sandbox creation Oct 16, 2025
@Priyankasaggu11929
Member Author

The Windows unit test failure is tracked in #134300.

@Priyankasaggu11929
Member Author

cc: @SergeyKanzhelev @haircommander for review too. Thanks!

klet.allocationManager.SetContainerRuntime(runtime)

// Register pod sandbox ready callback with runtime manager.
// The Type assertion is used because the `SetPodSandboxReadyCallback` method
Contributor

why should it not be part of the interface?

Member Author

Hello @haircommander, do you mean adding the OnPodSandboxReady callback to the RuntimeHelper interface directly and calling it as m.runtimeHelper.OnPodSandboxReady() in containerruntime.SyncPod()?

I tried that approach, and it was cleaner as well.
My only concern was whether extending the RuntimeHelper interface is acceptable, given that every component implementing it will now need to define this additional function.

But I now think that is the simpler approach, since within the Kubernetes codebase only two types implement this interface (Kubelet and FakeRuntimeHelper).

I'll update the PR.
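
For readers following the trade-off discussed in this thread, here is a rough sketch contrasting the two designs. It is illustrative only: apart from the method names `SetPodSandboxReadyCallback` and `OnPodSandboxReady`, which appear in the thread, every type, function, and signature below (`callbackSetter`, `runtimeA`, `registerA`, `runtimeHelper`, `kubeletHelper`, `fakeRuntimeHelper`) is a hypothetical stand-in and not the kubelet's actual code.

```go
package main

import "fmt"

// Option A (the earlier revision, as described in the thread): the runtime
// exposes an optional setter that the caller discovers via a type assertion.
type callbackSetter interface {
	SetPodSandboxReadyCallback(cb func(pod string) error)
}

type runtimeA struct{ cb func(pod string) error }

func (r *runtimeA) SetPodSandboxReadyCallback(cb func(pod string) error) { r.cb = cb }

// registerA reports whether the callback was registered. A runtime that
// lacks the setter is skipped at run time, not rejected at compile time.
func registerA(rt any, cb func(pod string) error) bool {
	setter, ok := rt.(callbackSetter)
	if !ok {
		return false
	}
	setter.SetPodSandboxReadyCallback(cb)
	return true
}

// Option B (the updated revision): the method is part of the helper
// interface, so every implementer must define it.
type runtimeHelper interface {
	OnPodSandboxReady(pod string) error
}

type kubeletHelper struct{ notified []string }

func (k *kubeletHelper) OnPodSandboxReady(pod string) error {
	k.notified = append(k.notified, pod)
	return nil
}

type fakeRuntimeHelper struct{}

func (fakeRuntimeHelper) OnPodSandboxReady(string) error { return nil }

// Compile-time checks: the build fails if either type stops implementing
// the interface.
var (
	_ runtimeHelper = (*kubeletHelper)(nil)
	_ runtimeHelper = fakeRuntimeHelper{}
)

func main() {
	noop := func(string) error { return nil }
	fmt.Println(registerA(&runtimeA{}, noop), registerA(struct{}{}, noop))

	k := &kubeletHelper{}
	var h runtimeHelper = k
	_ = h.OnPodSandboxReady("demo-pod")
	fmt.Println(k.notified)
}
```

The cost of option B is the one raised above: each implementer gains a method. With only two implementers, that cost is small, and the compiler enforces the contract.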

Member Author

747aaf4 addresses this.

@haircommander, please review again. Thanks!

@SergeyKanzhelev
Member

What about DRA? Can we include it in the description as well? I wonder if there are more places where it should be mentioned that DRA allocation is also included in the time before this condition is set.

Comment on lines +1205 to +1211
  // SyncPod syncs the running pod into the desired pod by executing following steps:
  //
  // 1. Compute sandbox and container changes.
  // 2. Kill pod sandbox if necessary.
  // 3. Kill any containers that should not be running.
  // 4. Create sandbox if necessary.
- // 5. Create ephemeral containers.
- // 6. Create init containers.
- // 7. Resize running containers (if InPlacePodVerticalScaling==true)
- // 8. Create normal containers.
+ // 5. Invoke sandbox ready callback to Kubelet to update pod status
+ // 6. Create ephemeral containers.
+ // 7. Create init containers.
+ // 8. Resize running containers (if InPlacePodVerticalScaling==true)
+ // 9. Create normal containers.
Member Author

@Priyankasaggu11929 Priyankasaggu11929 Oct 21, 2025

Hello @SergeyKanzhelev,

> What about DRA? Can we include it in the description as well? I wonder if there are more places where it should be mentioned that DRA allocation is also included in the time before this condition is set.

I was preparing a separate PR, but I will instead update this PR to add tests that validate the order between the DRA allocate calls and the PodReadyToStartContainers condition.

I can also extend the documentation comments on the SyncPod function to clarify this order.

In addition, I am planning to update the Pod Lifecycle docs to mention this.

Would that be sufficient, or am I missing other places? Thanks!

Member Author

b566e88 adds tests that validate the order of the DRA Allocate calls and PodReadyToStartContainers.

@SergeyKanzhelev, please review. Thanks!
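
One way such an ordering check could be structured is sketched below: fakes append to a shared recorder, and the assertion compares positions in the recorded sequence. This is an assumption-laden illustration and not the tests added in b566e88. The names `recorder`, `fakeHelper`, `allocateDRA`, `syncPod`, `calledBefore`, and `runOrderDemo` are hypothetical, and the step order inside `syncPod` simply encodes the ordering discussed in this thread (DRA allocation, then the sandbox-ready callback, then image pull).

```go
package main

import "fmt"

// recorder collects call names in the order they happen.
type recorder struct{ calls []string }

func (r *recorder) record(name string) { r.calls = append(r.calls, name) }

// indexOf returns the position of the first call with this name, or -1.
func (r *recorder) indexOf(name string) int {
	for i, c := range r.calls {
		if c == name {
			return i
		}
	}
	return -1
}

// calledBefore reports whether both calls happened and first preceded second.
func calledBefore(r *recorder, first, second string) bool {
	i, j := r.indexOf(first), r.indexOf(second)
	return i >= 0 && j >= 0 && i < j
}

// fakeHelper (hypothetical) stands in for the kubelet side of the runtime
// helper and records each call it receives.
type fakeHelper struct{ rec *recorder }

func (f *fakeHelper) allocateDRA(pod string) error {
	f.rec.record("allocateDRA")
	return nil
}

func (f *fakeHelper) OnPodSandboxReady(pod string) error {
	f.rec.record("OnPodSandboxReady")
	return nil
}

// syncPod is a simplified stand-in for the code under test.
func syncPod(h *fakeHelper, pod string) error {
	if err := h.allocateDRA(pod); err != nil {
		return err
	}
	h.rec.record("CreateSandbox")
	if err := h.OnPodSandboxReady(pod); err != nil {
		fmt.Printf("OnPodSandboxReady failed for pod %q: %v\n", pod, err)
	}
	h.rec.record("PullImage")
	return nil
}

// runOrderDemo runs one sync and returns the recorded call sequence.
func runOrderDemo() *recorder {
	rec := &recorder{}
	_ = syncPod(&fakeHelper{rec: rec}, "demo-pod")
	return rec
}

func main() {
	rec := runOrderDemo()
	fmt.Println(rec.calls)
	fmt.Println(calledBefore(rec, "allocateDRA", "OnPodSandboxReady"))
}
```

Recording names in a shared slice keeps the assertion independent of timing, so the check does not rely on sleeps or timestamps.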

@Priyankasaggu11929 Priyankasaggu11929 changed the title [KEP-3085] kubelet - update PodSync flow to set PodReadyToStartContainers condition immediately after sandbox creation [KEP-3085] kubelet - extend RuntimeHelper interface with OnPodSandboxReady to update PodReadyToStartContainers condition correctly Oct 22, 2025
…ce to update `PodReadyToStartContainers` condition immediately after sandbox creation

This addresses the bug (gh-issue 134460) which reported that the `PodReadyToStartContainers` condition is currently set to `True` only after the container image pull has completed. So, if the image is large and the pull takes significant time to finish, the pod status manager is blocked and the condition remains `False`.

The commit implements the following changes, which allow the kubelet to update the `PodReadyToStartContainers` pod condition as soon as all three requirements (pod sandbox, networking, volume) are ready, before container images are pulled or containers are created.
* add `OnPodSandboxReady` method to the `RuntimeHelper` interface in `container/helpers.go`
* implement the `OnPodSandboxReady` method in Kubelet
* inside `(containerRuntime).SyncPod`, after sandbox creation and network configuration, invoke `runtimeHelper.OnPodSandboxReady()` directly
  (this method retrieves current pod status, generates updated API status, and notifies the status manager to sync to the API server)

This implementation is gated behind the `PodReadyToStartContainersCondition` feature gate and fails gracefully, i.e., it only logs the error and continues the pod creation process, so that these changes don't block pod startup.
@Priyankasaggu11929
Member Author

Hello @haircommander @SergeyKanzhelev - just another ping to request a review, so that I can address any follow-up review items in time for the upcoming 1.35 code freeze. Thank you!

Labels

area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

PodReadyToStartContainers condition gets flipped to true after image pull.

4 participants