
Autoscaler: rash decisions after crash or start #2705

Closed
mgencur opened this issue Dec 12, 2018 · 1 comment · Fixed by #3771
Labels
area/autoscale kind/bug Categorizes issue or PR as related to a bug.

Comments

@mgencur
Contributor

mgencur commented Dec 12, 2018

When the autoscaler first starts, or crashes and restarts, it makes rash scaling decisions.
For example, if it crashes and restarts and there are no requests for 2 seconds, it immediately scales down significantly (to 1). See my test below, which simulates the situation where, after a crash, there are no statistics except those from the last two seconds. The test returns 0, but there is a guard against scaling to 0, so in practice the autoscaler scales to 1.
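To make the arithmetic behind this example concrete, here is a minimal Go sketch. It is a simplified model, not Knative's actual code: `stat`, `desiredPods` and `clampToOne` are hypothetical names, and it assumes the desired pod count is the average total concurrency over the available samples divided by the per-pod target. Because nothing checks how much of the stable window the samples cover, two seconds of zero load are enough to produce a desired scale of 0, which the guard then raises to 1.

```go
package main

import "math"

// stat is a hypothetical per-second sample: average concurrency per pod
// and the number of pods that reported it.
type stat struct {
	avgConcurrency float64
	podCount       int
}

// desiredPods is a simplified model of the stable-window calculation:
// average the total concurrency over whatever samples exist and divide
// by the per-pod target. It does not check how much of the window the
// samples actually cover, which is the problem described above.
func desiredPods(samples []stat, target float64) int {
	if len(samples) == 0 {
		// In this naive model, no data is treated like zero load.
		return 0
	}
	total := 0.0
	for _, s := range samples {
		total += s.avgConcurrency * float64(s.podCount)
	}
	avg := total / float64(len(samples))
	return int(math.Ceil(avg / target))
}

// clampToOne models the guard that prevents scaling to zero.
func clampToOne(n int) int {
	if n < 1 {
		return 1
	}
	return n
}
```

For example, two samples of zero concurrency from 10 pods with a target of 10 give a `desiredPods` result of 0, and `clampToOne(0)` returns 1.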
Another example is entering panic mode: a high load for only two seconds (instead of the default 6-second panic window) is enough to put the autoscaler into panic mode. In this case, however, it is probably less of a concern.

Expected Behavior

Make sure we have statistics covering a full stable window (60 seconds by default) before scaling down significantly.

Actual Behavior

The autoscaler scales down very quickly when there are not enough statistics.

Steps to Reproduce the Problem

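```go
package autoscaler

import (
	"testing"
	"time"
)

// Uses the helpers newTestAutoscaler, recordLinearSeries, linearSeries
// and expectScale from the package's existing autoscaler tests.
func TestAutoscaler_ShortlyAfterStartOrCrash_ScaleToZero(t *testing.T) {
	a := newTestAutoscaler(10.0)
	// Simulate a fresh start or restart: the only stats are 2 seconds
	// of zero concurrency reported by 10 pods.
	now := a.recordLinearSeries(
		t,
		time.Now(),
		linearSeries{
			startConcurrency: 0,
			endConcurrency:   0,
			durationSeconds:  2,
			podCount:         10,
		})

	// With this little data the autoscaler should keep the current 10 pods.
	a.expectScale(t, now, 10, true)
	// ^^^fails with "Expected 10. Got 0."
}
```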
@knative-prow-robot added the area/autoscale and kind/bug labels Dec 12, 2018
@mgencur changed the title from "Autoscaler: rash decision after crash or start" to "Autoscaler: rash decisions after crash or start" Dec 12, 2018
@hohaichi
Contributor

@mgencur PR #3771 should fix this issue. The repro test above does not apply, though: it tests how the autoscaler calculates the desired scale from (concurrency) stats, not where the scaling action is decided from that desired scale. The PR includes tests that assert scale-to-minScale and bounce-from-0 behavior when there is no metric (look for scaleTo: -1).
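As an illustration of why a sentinel such as `scaleTo: -1` helps, here is a minimal Go sketch, assuming -1 simply means "no scaling decision" rather than "zero load". It does not model the minScale or bounce-from-0 handling that the PR's tests cover; `scaleUnknown`, `decide` and `apply` are hypothetical names.

```go
package main

// scaleUnknown is a hypothetical sentinel meaning "no scaling decision".
const scaleUnknown = -1

// decide returns scaleUnknown when there are no metrics, instead of
// treating missing data as zero load.
func decide(desired int, haveMetrics bool) int {
	if !haveMetrics {
		return scaleUnknown
	}
	return desired
}

// apply leaves the current scale untouched when no decision was made.
func apply(current, want int) int {
	if want == scaleUnknown {
		return current
	}
	return want
}
```

With this convention, a freshly restarted autoscaler that has no metrics yet leaves 10 pods at 10 instead of computing a desired scale of 0.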

4 participants