Avoid thundering herd of probes during failover. #231

mattmoor · 2020-08-28T14:47:36Z

This change guards the status prober with a simple readiness check as an
optimization for the case of failing over and globally resyncing resources.
When a fresh controller pod slurps through ~100 kingress for the first time
we will currently probe ALL of them. On a 10 Node cluster, with 1 public and
3 private hostnames, this works out to a lot of probes:

  100 * (1 * 10 + 3 * 10) = 4000
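
As a quick check of the figure above, here is a minimal Go sketch of the same arithmetic. It assumes that every hostname is probed once per node; `probeCount` and its parameter names are hypothetical and are not taken from the net-contour code.

```go
package main

import "fmt"

// probeCount estimates how many probes one global resync triggers,
// assuming each hostname of each ingress is probed once per node.
// This helper is illustrative only; it is not part of net-contour.
func probeCount(ingresses, nodes, publicHosts, privateHosts int) int {
	return ingresses * (publicHosts*nodes + privateHosts*nodes)
}

func main() {
	// 100 kingress, 10 nodes, 1 public and 3 private hostnames.
	fmt.Println(probeCount(100, 10, 1, 3)) // prints 4000
}
```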

Factor in what we do with endpoint probing, and it adds roughly 2000 probes.

Currently, this is what the scale-100 test in serving is doing every 30s or so,
in addition to the standard work it is doing.
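
The guard described above might look roughly like the sketch below. This is not the diff from this PR: `Ingress`, `Prober`, `isReady`, and `reconcile` are hypothetical stand-ins, and the assumption is that a resource counts as ready when its status says so and that status was recorded for the current generation of the spec.

```go
package main

import "fmt"

// Ingress is a hypothetical, stripped-down stand-in for a kingress.
type Ingress struct {
	Name               string
	Generation         int64 // bumped whenever the spec changes
	ObservedGeneration int64 // generation the status was computed for
	Ready              bool
}

// isReady reports whether the status already says Ready for the current spec.
func isReady(ing *Ingress) bool {
	return ing.Ready && ing.ObservedGeneration == ing.Generation
}

// Prober is a hypothetical prober that only counts the probes it issues.
type Prober struct {
	Probes int
}

// Probe pretends to probe the ingress and always reports success.
func (p *Prober) Probe(ing *Ingress) bool {
	p.Probes++
	return true
}

// reconcile skips probing when the ingress is already ready, so a global
// resync of unchanged resources issues no probes.
func reconcile(p *Prober, ing *Ingress) {
	if isReady(ing) {
		return
	}
	if p.Probe(ing) {
		ing.Ready = true
		ing.ObservedGeneration = ing.Generation
	}
}

func main() {
	p := &Prober{}
	// Simulate a fresh controller resyncing 100 ingresses that are already ready.
	for i := 0; i < 100; i++ {
		ing := &Ingress{
			Name:               fmt.Sprintf("ing-%d", i),
			Generation:         1,
			ObservedGeneration: 1,
			Ready:              true,
		}
		reconcile(p, ing)
	}
	fmt.Println("probes issued:", p.Probes) // prints "probes issued: 0"
}
```

Comparing generations keeps the guard from hiding real changes: an ingress whose spec changed after it last became ready still gets probed.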

Fixes: #203

Possibly related to: #226

mattmoor · 2020-08-28T15:46:02Z

Ok, this clearly has bugs.

knative-metrics-robot · 2020-08-28T16:12:42Z

The following is the coverage report on the affected files.
Say /test pull-knative-sandbox-net-contour-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/contour/contour.go	91.0%	91.5%	0.5

n3wscott · 2020-08-28T17:14:02Z

/lgtm
/approve

knative-prow-robot · 2020-08-28T17:14:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor, n3wscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [mattmoor]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mattmoor assigned tcnghia and n3wscott Aug 28, 2020

googlebot added the cla: yes label Aug 28, 2020

knative-prow-robot requested review from tcnghia and vaikas August 28, 2020 14:47

knative-prow-robot added the approved and size/M labels Aug 28, 2020

With moar !Ready-ness

6f83918

mattmoor mentioned this pull request Aug 28, 2020

Roll forward to contour 1.8 #232

Merged

knative-prow-robot added the lgtm label Aug 28, 2020

knative-prow-robot merged commit 6bb057e into knative-extensions:master Aug 28, 2020

mattmoor deleted the ready-already branch August 28, 2020 17:23

mattmoor mentioned this pull request Sep 4, 2020

net-istio should guard the prober against resync flooding knative-extensions/net-istio#262

Closed
