
Immediately scale up autoscaler for cold-start #2963

Conversation

greghaynes (Contributor):

When the autoscaler gets a stat to scale a service up from 0, we should act on it immediately in order to decrease cold-start time.

Closes #2961

Release Note

Immediately scale from zero in autoscaler.
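For illustration, a minimal self-contained sketch of the idea in Go - the names (scalerRunner, multiScaler, tickNow, recordStat) are illustrative stand-ins, not the actual pkg/autoscaler API: when a stat arrives for a revision whose desired scale is 0, run a scale evaluation immediately instead of waiting up to one tick interval.

```go
package main

import (
	"fmt"
	"sync"
)

// scalerRunner is an illustrative stand-in for the per-revision runner;
// it guards the last computed desired scale with a mutex.
type scalerRunner struct {
	mu           sync.Mutex
	desiredScale int32
}

func (r *scalerRunner) getLatestScale() int32 {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.desiredScale
}

// multiScaler is a toy version holding one runner per revision key and a
// hypothetical tickNow hook that runs a scale evaluation immediately.
type multiScaler struct {
	runners map[string]*scalerRunner
	tickNow func(key string)
}

// recordStat models the hot path: after recording the stat (elided), a
// revision sitting at zero gets an immediate tick, shaving up to one
// tick interval off the cold-start latency.
func (ms *multiScaler) recordStat(key string) {
	r, ok := ms.runners[key]
	if !ok {
		return
	}
	if r.getLatestScale() == 0 {
		ms.tickNow(key)
	}
}

func main() {
	ms := &multiScaler{
		runners: map[string]*scalerRunner{"rev-1": {}},
		tickNow: func(key string) { fmt.Println("immediate tick for", key) },
	}
	ms.recordStat("rev-1") // desiredScale is 0, so this ticks right away
}
```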

@knative-prow-robot knative-prow-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jan 23, 2019
knative-prow-robot (Contributor) left a comment:

@greghaynes: 0 warnings.


markusthoemmes (Contributor) left a comment:

Approach is LGTM, but I think there's a bug here w.r.t. how we find out whether it's the activator sending the metrics.

On that note: is it even important that the activator is the one sending the metric in the end? Is there any harm in dropping that requirement and forcing a tick on any incoming metric when at 0?

```go
scaler.scaler.Record(ctx, stat)
if scaler.metric.Status.DesiredScale == 0 && stat.PodName == ActivatorPodName {
```
markusthoemmes (Contributor):

Since we're scaling the activator, this won't work anymore, I believe. You presumably need to do the same check the autoscaler does, which is: if strings.HasPrefix(stat.PodName, ActivatorPodName) {

Has this worked locally?

Can the first half of this be scaler.getLatestScale() == 0?
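For illustration, a self-contained sketch of the suggested prefix check - the constant's value and the isActivator helper are assumptions for the sketch; only the strings.HasPrefix(stat.PodName, ActivatorPodName) condition comes from the comment above:

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed value for the sketch; the real constant lives in the
// autoscaler package.
const ActivatorPodName = "activator"

// isActivator reports whether a stat came from an activator pod. An
// exact equality check breaks once the activator itself is scaled out,
// because each replica gets a suffixed pod name; matching on the
// prefix covers every replica.
func isActivator(podName string) bool {
	return strings.HasPrefix(podName, ActivatorPodName)
}

func main() {
	fmt.Println(isActivator("activator-7d9b645d6-x2swl")) // true
	fmt.Println(isActivator("my-app-00001-deployment-0")) // false
}
```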

greghaynes (Contributor, Author):

Good find - I removed the pod name check entirely, since it really shouldn't matter where the stat came from (as you point out).

@mattmoor mattmoor added this to the Serving 0.4 milestone Jan 28, 2019
mattmoor (Member):

+1 to @markusthoemmes's comment, but once we reverse the flow of metrics (@yanweiguo is working on this), any push will come from the activator anyway - so will it still matter?

@greghaynes I put this in 0.4, so LMK if that timeframe is a problem.

@greghaynes greghaynes force-pushed the fix/autoscaler-immediate-scale-from-zero branch from 4be0da9 to fcaf0c1 Compare January 30, 2019 21:32
markusthoemmes (Contributor):

Thanks for making the change. Can you add a test for this to bump coverage up again?

mattmoor (Member):

/retest

mattmoor (Member):

@greghaynes I'm curious what speedup you see with this?

greghaynes (Contributor, Author):

@mattmoor In theory it should be ~0.5 s on average (0-1 s, depending on when a stat is sent). I've been trying to test this out over the past day on my shared k8s cluster, and unfortunately my k8s control plane is way too noisy to get useful data out of (another interesting argument for us doing local scheduling). I'll try to set up a local private cluster to check, but if anyone has one and wants to try, we now have a perf test for this: `go test -v -tags performance ./test/performance -run ScaleFromZero1 | grep Average:`

greghaynes (Contributor, Author):

@mattmoor Using #3059 I got the following results (times in seconds):

| | min | max | avg |
| --- | --- | --- | --- |
| without this patch | 6.3 | 8.1 | 6.9 |
| with this patch | 5.9 | 6.5 | 6.2 |

mattmoor (Member):

mattmoor commented Feb 5, 2019

Ping @greghaynes - I think there's one outstanding comment to address before this can merge.

markusthoemmes (Contributor):

For clarity, that comment is:

Can you add a test for this to bump coverage up again?

markusthoemmes (Contributor):

/assign @markusthoemmes

So it appears in my review list and I can keep track of it. We should land this in 0.4 (as scheduled).

greghaynes (Contributor, Author):

Thanks for the reminder - I'll get to this today.

@greghaynes greghaynes force-pushed the fix/autoscaler-immediate-scale-from-zero branch from fcaf0c1 to 17e9996 Compare February 7, 2019 20:28
@knative-prow-robot knative-prow-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 7, 2019
mattmoor (Member):

/test pull-knative-serving-go-coverage

mattmoor (Member):

mattmoor commented Feb 12, 2019

I think not commenting may mean same coverage? If so, we should file a bug that it doesn't remove the old comment. I reran it to test.

markusthoemmes (Contributor) left a comment:

I think we should refine the test a bit to prevent regression in the future.

```go
// We got the signal!
case <-time.After(30 * time.Millisecond):
	t.Fatalf("timed out waiting for Watch()")
}
```
markusthoemmes (Contributor):

I think this test would also pass without the changes made? I get coverage on the newly added path only intermittently. The multiscaler runs on a tick interval of 1ms, so it will be quick either way. Should we increase the tick interval of the multiscaler and assert that we get a tick in a substantially shorter time? (For example: a tickInterval of 100ms, make the other waits 200ms and this one 30ms.)

mattmoor (Member):

+1, let's add a parallel test with a tick interval larger than the overall test timeout. Then we should be able to test this properly, right?

greghaynes (Contributor, Author):

Hrm, I'm having a hard time coming up with a way to do this without chopping things up or adding some multiscaler access. When we create a metric, which in turn creates a scaler, we need to go through one tickScaler cycle for the scaler's desiredScale to be set to 0. If I raise the tick interval, we'll have to wait for that time period. AFAICT there's no good interface for me to inject this tick cycle - do you have any thoughts on this, @markusthoemmes?

markusthoemmes (Contributor):

@greghaynes yeah, I realize it's not as easy. The following adjustments trigger the path we want in the test pretty reliably:

```diff
diff --git a/pkg/autoscaler/multiscaler_test.go b/pkg/autoscaler/multiscaler_test.go
index a39ba0d7..840220d5 100644
--- a/pkg/autoscaler/multiscaler_test.go
+++ b/pkg/autoscaler/multiscaler_test.go
@@ -110,9 +110,11 @@ func TestMultiScalerScaling(t *testing.T) {
 }
 
 func TestMultiScalerScaleToZero(t *testing.T) {
+	tickInterval := 50 * time.Millisecond
+
 	ctx := context.TODO()
 	ms, stopCh, uniScaler := createMultiScaler(t, &autoscaler.Config{
-		TickInterval:      time.Millisecond * 1,
+		TickInterval:      tickInterval,
 		EnableScaleToZero: true,
 	})
 	defer close(stopCh)
@@ -160,18 +162,10 @@ func TestMultiScalerScaleToZero(t *testing.T) {
 	select {
 	case <-done:
 		// We got the signal!
-	case <-time.After(30 * time.Millisecond):
+	case <-time.After(tickInterval * 2):
 		t.Fatalf("timed out waiting for Watch()")
 	}
 
-	// Verify that we stop seeing "ticks"
-	select {
-	case <-done:
-		t.Fatalf("Got unexpected tick")
-	case <-time.After(30 * time.Millisecond):
-		// We got nothing!
-	}
-
 	desiredScaleMutex.Lock()
 	desiredScale = 1
 	desiredScaleMutex.Unlock()
@@ -188,7 +182,7 @@ func TestMultiScalerScaleToZero(t *testing.T) {
 	select {
 	case <-done:
 		// We got the signal!
-	case <-time.After(30 * time.Millisecond):
+	case <-time.After(tickInterval / 2):
 		t.Fatalf("timed out waiting for Watch()")
 	}

```

Does that make sense?

greghaynes (Contributor, Author):

@markusthoemmes hrm, check out what I just pushed - I added a method to directly set the desiredScale of a scalerRunner, which lets us actually test this code path 100% of the time.
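For illustration, a sketch of that kind of test hook, assuming it is a plain mutex-guarded setter - the names below are illustrative and not necessarily the PR's actual helper: forcing desiredScale to 0 directly removes the dependence on a prior tick cycle, so the immediate scale-from-zero path can be exercised deterministically.

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative runner type; the real scalerRunner lives in
// pkg/autoscaler/multiscaler.go.
type scalerRunner struct {
	mu           sync.Mutex
	desiredScale int32
}

func (r *scalerRunner) getLatestScale() int32 {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.desiredScale
}

// setLatestScale lets a test force the "scaled to zero" state without
// waiting for a tickScaler cycle to compute it.
func (r *scalerRunner) setLatestScale(s int32) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.desiredScale = s
}

func main() {
	r := &scalerRunner{}
	r.setLatestScale(0)
	// A test would now send a stat and assert an immediate tick.
	fmt.Println("at zero:", r.getLatestScale() == 0)
}
```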

mattmoor (Member):

Ping, I'd like to see this land in 0.4

@greghaynes greghaynes force-pushed the fix/autoscaler-immediate-scale-from-zero branch from 17e9996 to 000e179 Compare February 14, 2019 05:51
@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 14, 2019
@greghaynes greghaynes force-pushed the fix/autoscaler-immediate-scale-from-zero branch 2 times, most recently from 9616881 to b691dff Compare February 14, 2019 06:49
@knative-prow-robot knative-prow-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 14, 2019
knative-metrics-robot:

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report.

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/autoscaler/multiscaler.go | 92.3% | 94.7% | 2.4 |

@greghaynes greghaynes force-pushed the fix/autoscaler-immediate-scale-from-zero branch 3 times, most recently from 723019a to 3b777b3 Compare February 14, 2019 06:54
@greghaynes greghaynes force-pushed the fix/autoscaler-immediate-scale-from-zero branch from 3b777b3 to ebc0a58 Compare February 14, 2019 06:55
markusthoemmes (Contributor) left a comment:

Thank you for making these changes.

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 14, 2019
knative-prow-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: greghaynes, markusthoemmes

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 14, 2019
@knative-prow-robot knative-prow-robot merged commit 5b7e155 into knative:master Feb 14, 2019