Update KEP with new name: PodReadyToStartContainers condition #3778

Merged
merged 2 commits into from Feb 9, 2023

Conversation

ddebroy
Member

@ddebroy ddebroy commented Jan 24, 2023

  • One-line PR description: Update KEP with the PodReadyToStartContainers condition and update the milestone for Beta
  • Other comments: This KEP updates the name of the condition to PodReadyToStartContainers (from the previous PodHasNetwork that sig-network had concerns with) as previously discussed with sig-node and sig-network leads. The description of the condition and goals are updated accordingly.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Jan 24, 2023
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jan 24, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 24, 2023
Signed-off-by: Deep Debroy <ddebroy@gmail.com>
@dchen1107
Member

dchen1107 commented Jan 24, 2023

/lgtm
/approve

I'm not sure the new name is better than the previous one, though, since it implies that whether a pod can start its containers depends purely on the network, which is not true. But I am not good at naming, so I'll go with the majority here. :-)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 24, 2023
@ddebroy
Member Author

ddebroy commented Jan 24, 2023

/hold

for sig-network lgtm as well.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 24, 2023
@aojea
Member

aojea commented Jan 25, 2023

/lgtm

The previous name implies a lot of things about the network that may not be true or may be misleading. I have no strong opinion about the name, but I'll venture a suggestion: PodInitialized 😄

@mrunalp
Contributor

mrunalp commented Jan 25, 2023

I like PodInitialized :)

@bart0sh bart0sh added this to Needs Reviewer in SIG Node PR Triage Jan 26, 2023
@bart0sh bart0sh moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Jan 28, 2023
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 31, 2023
@ddebroy
Member Author

ddebroy commented Jan 31, 2023

The latest commit updates the PRR section to Beta. No sections that were already LGTM'ed were altered.

and CNI plugins). Completion of all these phases puts the pod sandbox in a
state where the containers in a pod can be started. This KEP proposes a
`PodReadyToStartContainers` condition in pod status to indicate a pod has
reached a state where it's containers are ready to be started. The
Contributor

nit: its
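
As an illustration of how a consumer might read the condition proposed in the excerpt above, here is a minimal Python sketch. It assumes a pod object shaped like the JSON returned by the API (a `status.conditions` list whose entries carry `type` and `status`); the helper names and the example pod are hypothetical and not part of the KEP.

```python
from typing import Any, Dict, Optional

# Condition name taken from the KEP excerpt above.
CONDITION_TYPE = "PodReadyToStartContainers"


def condition_status(pod: Dict[str, Any], cond_type: str = CONDITION_TYPE) -> Optional[str]:
    """Return the condition's status string, or None if the condition is absent."""
    conditions = (pod.get("status") or {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def ready_to_start_containers(pod: Dict[str, Any]) -> bool:
    """True only when the condition is present and its status is "True"."""
    return condition_status(pod) == "True"


# Hypothetical pod object for illustration only.
example_pod = {
    "metadata": {"name": "demo"},
    "status": {
        "conditions": [
            {"type": "PodScheduled", "status": "True"},
            {"type": "PodReadyToStartContainers", "status": "True"},
            {"type": "Initialized", "status": "False"},
        ]
    },
}
```

A pod without the condition (for example, on a node where the feature is not enabled) yields `None` from `condition_status`, so callers can tell "absent" apart from "False".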

@mrunalp
Contributor

mrunalp commented Feb 1, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 1, 2023
Contributor

@cici37 cici37 left a comment

A couple of comments regarding PRR.
One more question, regarding "Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?" (asking here since I cannot comment on an unchanged line): given the Pod startup latency SLI/SLO details, will this feature slightly add to the latency?

/cc @johnbelamaric

```
apiserver_request_total{verb="PATCH", resource="pods", subresource="status"}
```
for this.
Contributor

An increase in this metric could potentially be caused by issues other than the current feature. Is there any other information a cluster admin could use to inform a rollback due to a PodReadyToStartContainersCondition issue?

Member Author

@ddebroy ddebroy Feb 8, 2023

There isn't any specific metric right now for distinct Pod Status updates (specific pod conditions or other aspects of pod status like container statuses). We may consider that as an independent enhancement to the Kubelet status manager (covering all pod status fields) as mentioned below.

Member

Do you know if there is a node label on those metrics? If the spike is specifically from nodes with this feature enabled, that could be why. I believe @logicalhan recently added feature gate metrics, so in theory you could construct a query that looked for spikes only on nodes with the feature enabled.
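
To make the idea in the comment above concrete, here is a rough Python sketch of the comparison it describes: filter samples by the label selector from the KEP excerpt, then split the request rate by whether the originating node has the feature enabled. This rests on a loud assumption that a `node` label is available on the samples, which is exactly what the comment asks and is not confirmed in this thread. The sample data, the `rate` field, and the function names are all hypothetical.

```python
from typing import Any, Dict, Iterable, Set

# Label selector copied from the KEP excerpt under review.
SELECTOR = {"verb": "PATCH", "resource": "pods", "subresource": "status"}


def matches(labels: Dict[str, str], selector: Dict[str, str] = SELECTOR) -> bool:
    """True when every selector label is present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def patch_rate_by_group(samples: Iterable[Dict[str, Any]], enabled_nodes: Set[str]) -> Dict[str, float]:
    """Sum per-second rates for matching samples, grouped by feature state.

    ASSUMPTION: each sample carries a "node" label. The thread does not
    confirm that such a label exists on this metric.
    """
    totals = {"enabled": 0.0, "disabled": 0.0}
    for sample in samples:
        labels = sample["labels"]
        if not matches(labels):
            continue
        group = "enabled" if labels.get("node") in enabled_nodes else "disabled"
        totals[group] += sample["rate"]
    return totals


# Hypothetical samples for illustration only.
samples = [
    {"labels": {"verb": "PATCH", "resource": "pods", "subresource": "status", "node": "node-a"}, "rate": 2.0},
    {"labels": {"verb": "PATCH", "resource": "pods", "subresource": "status", "node": "node-b"}, "rate": 1.5},
    {"labels": {"verb": "PATCH", "resource": "pods", "subresource": "status", "node": "node-c"}, "rate": 0.5},
    {"labels": {"verb": "GET", "resource": "pods", "subresource": "status", "node": "node-a"}, "rate": 9.0},
]
totals = patch_rate_by_group(samples, enabled_nodes={"node-a", "node-b"})
```

If the "enabled" group shows a spike that the "disabled" group does not, that would point toward the feature rather than an unrelated cause.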

Member Author

@ddebroy ddebroy Feb 8, 2023

Member

ok

@ddebroy
Member Author

ddebroy commented Feb 8, 2023

> One more question, regarding "Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?" (asking here since I cannot comment on an unchanged line): given the Pod startup latency SLI/SLO details, will this feature slightly add to the latency?

This may result in more API calls from the Kubelet Status Manager to the API server (under certain circumstances, when the Kubelet Status Manager does not batch a set of updates for some reason). This aspect is covered under the section "Will enabling / using this feature result in any new API calls?"

No specific SLIs or metrics are surfaced by Kubelet associated with distinct pod status updates. So that can be considered a separate enhancement to broadly address the overall Kubelet Status manager.
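
The effect of batching on the number of API calls can be illustrated with a toy Python model. This is only a sketch under stated assumptions: it coalesces status updates that arrive within a fixed window into one call, which is a simplification and not the Kubelet Status Manager's actual logic. The function name, the window parameter, and the timestamps are hypothetical.

```python
from typing import Iterable


def count_patch_calls(update_times: Iterable[float], window: float) -> int:
    """Count API calls when updates within `window` seconds of the update
    that opened the current batch are coalesced into a single call.

    A window of 0 models no batching: every update becomes its own call.
    """
    calls = 0
    batch_start = None
    for t in sorted(update_times):
        if batch_start is None or t - batch_start >= window:
            calls += 1
            batch_start = t
    return calls


# Hypothetical arrival times (seconds) of pod status updates for one pod.
update_times = [0.0, 0.1, 0.2, 5.0]

batched_calls = count_patch_calls(update_times, window=1.0)
unbatched_calls = count_patch_calls(update_times, window=0.0)
```

In this model an extra condition update that falls inside an existing batch adds no call, while one that misses the batch adds exactly one, which matches the "under certain circumstances" caveat above.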

Signed-off-by: Deep Debroy <ddebroy@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 9, 2023
@ddebroy
Member Author

ddebroy commented Feb 9, 2023

@johnbelamaric @cici37 the PRR section has been updated with the suggestions above. Please note that the update (scoped to the last commit) also removed the original LGTM from sig-node reviews.

Member

@johnbelamaric johnbelamaric left a comment

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 9, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, ddebroy, johnbelamaric

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 9, 2023
@ddebroy
Member Author

ddebroy commented Feb 9, 2023

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 9, 2023
@k8s-ci-robot k8s-ci-robot merged commit c80f656 into kubernetes:master Feb 9, 2023
SIG Node PR Triage automation moved this from Needs Approver to Done Feb 9, 2023
@ddebroy ddebroy changed the title Update KEP with PodReadyToStartContainers condition for Beta Update KEP with new name: PodReadyToStartContainers condition Jun 15, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

7 participants