Use user-defined readinessProbe in queue-proxy #4731

Merged
merged 9 commits into from Jul 16, 2019

Member

joshrider commented Jul 12, 2019

Signed-off-by: Shash Reddy shashwathireddy@gmail.com
Co-authored-by: Shash Reddy shashwathireddy@gmail.com

Fixes #4014

Proposed Changes

  • add a default readiness probe to the revision spec when the user does not specify one
  • remove HTTP and TCP readiness probes from the user-container when creating deployments; instead, translate them into a probe that the queue-proxy performs against the user-container
  • when the user specifies an Exec readiness probe, it stays on the user-container, and the queue-proxy performs a TCP probe against the user-container to ensure a path is open
  • have the handler used by the activator (to check that the pod is ready) use the same readiness criteria defined by the user
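
The placement rules in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ProbeKind`, `Plan` and `planProbes` are hypothetical names, the probe is reduced to its handler type, and using TCP as the default probe is an assumption.

```go
package main

import "fmt"

// ProbeKind is a hypothetical, simplified stand-in for the handler types of a
// Kubernetes readiness probe (HTTPGet, TCPSocket, Exec).
type ProbeKind int

const (
	None ProbeKind = iota // no readiness probe
	HTTP
	TCP
	Exec
)

func (k ProbeKind) String() string {
	return [...]string{"none", "http", "tcp", "exec"}[k]
}

// Plan records which probe ends up where (hypothetical type).
type Plan struct {
	UserContainer ProbeKind // probe left on the user-container for the kubelet
	QueueProxy    ProbeKind // probe the queue-proxy performs against the user-container
}

// planProbes applies the rules from the list above to the user's probe kind.
func planProbes(user ProbeKind) Plan {
	switch user {
	case HTTP, TCP:
		// Removed from the user-container; the queue-proxy performs it instead.
		return Plan{UserContainer: None, QueueProxy: user}
	case Exec:
		// Exec stays on the user-container; the queue-proxy adds a TCP probe
		// to check that a path to the user-container is open.
		return Plan{UserContainer: Exec, QueueProxy: TCP}
	default:
		// No probe specified: a default is added. TCP is an assumption here.
		return Plan{UserContainer: None, QueueProxy: TCP}
	}
}

func main() {
	for _, k := range []ProbeKind{None, HTTP, TCP, Exec} {
		p := planProbes(k)
		fmt.Printf("user probe %-4s -> user-container: %-4s queue-proxy: %s\n",
			k, p.UserContainer, p.QueueProxy)
	}
}
```

The point of the sketch is only that HTTP and TCP probes move wholesale to the queue-proxy, while Exec is the one case where two probes coexist.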

NOTE: for the activator's probe, we are using the same count of "successful probes" as the pod's usual readiness probe. That is, if the activator and "kubelet" are both probing concurrently and the probe's SuccessThreshold is 4, they will only need 4 consecutive successes collectively (as opposed to 4 each). Please poke holes in this.
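
To make the NOTE concrete, here is a minimal sketch of a success counter shared by every caller, assuming a mutex-guarded counter that resets on any failure. `sharedProbe` and its fields are hypothetical names for illustration, not the types used in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedProbe is a hypothetical type: one counter of consecutive successes,
// shared by every caller (e.g. the kubelet and the activator).
type sharedProbe struct {
	mu               sync.Mutex
	successThreshold int
	consecutive      int
	check            func() bool // the actual probe against the user-container
}

// Probe runs one check and reports whether the shared count of consecutive
// successes has reached the threshold. Any failure resets the count.
func (p *sharedProbe) Probe() bool {
	ok := p.check()

	p.mu.Lock()
	defer p.mu.Unlock()
	if ok {
		p.consecutive++
	} else {
		p.consecutive = 0
	}
	return p.consecutive >= p.successThreshold
}

func main() {
	p := &sharedProbe{successThreshold: 4, check: func() bool { return true }}

	// Two callers alternate; neither reaches 4 successes alone.
	callers := []string{"kubelet", "activator", "kubelet", "activator"}
	for i, who := range callers {
		fmt.Printf("probe %d by %s: ready=%v\n", i+1, who, p.Probe())
	}
}
```

With a threshold of 4 and two alternating callers, the fourth probe overall reports ready even though each caller has made only two calls. That is the "collectively" behaviour the NOTE asks reviewers to poke holes in.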

Release Note

HTTP and TCP readinessProbes are performed by the queue-proxy against the user-container

googlebot commented Jul 12, 2019

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.

knative-prow-robot left a comment

@joshrider: 0 warnings.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Member Author

joshrider commented Jul 12, 2019

/test pull-knative-serving-integration-tests

@joshrider joshrider changed the title use user-defined readinessprobe in queue-proxy Use user-defined readinessProbe in queue-proxy Jul 12, 2019

knative-metrics-robot commented Jul 12, 2019

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/apis/serving/k8s_validation.go | 98.9% | 98.6% | -0.3 |
| pkg/apis/serving/v1beta1/revision_defaults.go | 87.5% | 89.5% | 2.0 |
@joshrider joshrider force-pushed the joshrider:queue-probe branch from e75388c to 41e095d Jul 12, 2019
@shashwathi shashwathi force-pushed the joshrider:queue-probe branch from 41e095d to 08cd78e Jul 12, 2019
@joshrider joshrider force-pushed the joshrider:queue-probe branch from 08cd78e to 24a1e5c Jul 12, 2019
Contributor

shashwathi commented Jul 13, 2019

/test pull-knative-serving-integration-tests

Copy link
Contributor

markusthoemmes left a comment

A few flyby comments. I have a really hard time keeping track of what calls what, which probes go where and which retries are applied at which spots.

Do you mind drawing a picture of where we want to apply which retry? The nested retrying feels a little odd to me, maybe there's room for an interim change there as well as this PR is pretty big.

Thanks for doing this though, this is great stuff 🙂

if probeUserContainer() {
// Respond with the name of the component handling the request.
w.Write([]byte(queue.Name))
if prober != nil {

markusthoemmes Jul 15, 2019

Contributor

Maybe in a separate PR: Is there a reason why we don't return the state from healthState here? Seems unnecessarily redundant to probe on this path 🤔

@greghaynes do you need that for your "direct to ip" work?

joshrider Jul 15, 2019

Author Member

That seems like an excellent suggestion. 👍

cmd/queue/main.go (outdated, resolved)
cmd/queue/main.go (resolved)
cmd/queue/main.go (outdated, resolved)
pkg/reconciler/revision/resources/queue.go (outdated, resolved)
pkg/reconciler/revision/resources/queue.go (outdated, resolved)
Member Author

joshrider commented Jul 15, 2019

/test pull-knative-serving-smoke-tests

@joshrider joshrider force-pushed the joshrider:queue-probe branch from 24a1e5c to dbe0ec8 Jul 15, 2019
@shashwathi shashwathi force-pushed the joshrider:queue-probe branch from c141077 to a37f22b Jul 15, 2019
cmd/queue/main.go (outdated, resolved)
pkg/apis/serving/k8s_validation.go (outdated, resolved)
pkg/reconciler/revision/resources/queue.go (outdated, resolved)
joshrider and others added 6 commits May 31, 2019
Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>
Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
- merge logic for knative probes and user defined probes
- use probe-period as argument name
- pass probe as environment variable instead of container args

Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>
- Use context for timeout
- do not override exec probe
- simplify the logic for errors when multiple probes are mentioned

Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>
@shashwathi shashwathi force-pushed the joshrider:queue-probe branch from 468c0cd to 7ab8681 Jul 16, 2019
Contributor

shashwathi commented Jul 16, 2019

@mattmoor : Addressed all your comments. Ready for another review 👍

cmd/queue/main.go (resolved)
  // started as early as possible while still wanting to give the container some breathing
  // room to get up and running.
- timeoutErr := wait.PollImmediate(25*time.Millisecond, timeout, func() (bool, error) {
+ timeoutErr := wait.PollImmediateUntil(aggressivePollInterval, func() (bool, error) {

vagababov Jul 16, 2019

Contributor

Though I know Matt suggested it I liked the previous version more, it's shorter :)
🤷‍♀

mattmoor Jul 16, 2019

Member

What'd I suggest?

cmd/queue/main_test.go (outdated, resolved)
joshrider and others added 2 commits Jul 16, 2019
Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
Co-authored-by: Shash Reddy <shashwathireddy@gmail.com>
Member

mattmoor commented Jul 16, 2019

I think things largely look good. Going to give others a chance to leave comments, but if nothing comes up I'll do a final pass later so we can get this baking. It may be worth checking out the data race failure above, since this PR touches the queue logic. Thanks for all the work leading up to this!

Member Author

joshrider commented Jul 16, 2019

Sounds good. Neither of us has been able to recreate that data race locally. Would be curious to hear if someone else knows how it happened.

/test pull-knative-serving-unit-tests

Signed-off-by: Shash Reddy <shashwathireddy@gmail.com>
Member

mattmoor left a comment

/lgtm
/approve
🎉

googlebot commented Jul 16, 2019

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

ℹ️ Googlers: Go here for more info.

Member

mattmoor commented Jul 16, 2019

/approve

knative-prow-robot commented Jul 16, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joshrider, mattmoor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot merged commit f53271a into knative:master Jul 16, 2019
8 checks passed
cla/google: CLAs have been manually verified by Googler who has set the 'cla: yes' label
pull-knative-serving-build-tests: Job succeeded.
pull-knative-serving-go-coverage: Job succeeded.
pull-knative-serving-integration-tests: Job succeeded.
pull-knative-serving-smoke-tests: Job succeeded.
pull-knative-serving-unit-tests: Job succeeded.
pull-knative-serving-upgrade-tests: Job succeeded.
tide: In merge pool.
nak3 added a commit to nak3/serving that referenced this pull request Jul 17, 2019
This patch makes a tiny fix which removes invalid setting in
configuration example.

After knative#4731, `periodSeconds`
needs to be set with `failureThreshold` and `timeoutSeconds`. This
patch simply removes `periodSeconds` from the config.
@joshrider joshrider deleted the joshrider:queue-probe branch Aug 6, 2019
knative-prow-robot added a commit that referenced this pull request Sep 29, 2019
This patch makes a tiny fix which removes invalid setting in
configuration example.

After #4731, `periodSeconds`
needs to be set with `failureThreshold` and `timeoutSeconds`. This
patch simply removes `periodSeconds` from the config.