Gate readiness probes #6707

Merged

vagababov
Contributor

This limits readiness probing to a single probe in flight at a time.
We noticed a problem with the activator daemon set: when a new pod comes to life and is probed by, say, 30 probes at the same time (the size of our load benchmark cluster), the user container can be overwhelmed and fail the probes.
So this gates the readiness probes.
If a probe is already in progress, we attach to its result; otherwise we create a new one.
There are more details due to the possible races, so the code feels complex, but in essence it's simple :)

/assign @markusthoemmes mattmoor

1. Foremost, make sure the tests don't run for 25 seconds, but rather just 4
2. Remove the unneeded method
3. More to come
@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Feb 3, 2020
@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/networking approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 3, 2020
@knative-metrics-robot

The following is the coverage report on the affected files.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/queue/readiness/probe.go | 100.0% | 96.4% | -3.6 |

@vagababov
Contributor Author

Fixes #6680

@knative-test-reporter-robot

The following jobs failed:

| Test name | Triggers | Retries |
| --- | --- | --- |
| pull-knative-serving-integration-tests | pull-knative-serving-integration-tests | 1/3 |

Automatically retrying due to test flakiness...
/test pull-knative-serving-integration-tests

Contributor

@markusthoemmes markusthoemmes left a comment

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 3, 2020
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markusthoemmes, vagababov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot merged commit bccfd01 into knative:master Feb 3, 2020
@vagababov vagababov deleted the 20200131-clean-readiness branch June 23, 2020 23:16
7 participants