Activator to perform health checks before forwarding real requests #2856

Closed
tcnghia opened this issue Jan 4, 2019 · 4 comments · Fixed by #3257

Comments

@tcnghia (Contributor) commented Jan 4, 2019

Expected Behavior

We can handle POST requests without needing to retry them, and we minimize unneeded GET retries.

If the activator performed a health check before forwarding requests, we could limit POST attempts to one and also greatly reduce the number of GET retries of real requests.

Actual Behavior

Currently the activator doesn't perform health checks before forwarding requests. This means we have to retry POSTs, and we perform unnecessary GETs before the Revision is handling traffic.

Steps to Reproduce the Problem

Additional Info
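
For illustration, here is a minimal sketch of the proposed probe-then-forward flow in Go. This is not the actual activator code; the handler, helper names, and target address are hypothetical:

```go
package main

import (
	"context"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

// probeUntilReady polls the target with plain GETs until it answers 200 OK or
// the request context is cancelled. These probes never count as real traffic;
// they only establish that the Revision's network path is ready.
func probeUntilReady(ctx context.Context, target *url.URL, interval time.Duration) error {
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, target.String(), nil)
		if err != nil {
			return err
		}
		resp, err := http.DefaultClient.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(interval):
		}
	}
}

// handler probes first, then forwards the real request exactly once, so a POST
// is never replayed and no real GETs are burned on a not-yet-ready Revision.
func handler(target *url.URL) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(target)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := probeUntilReady(r.Context(), target, 100*time.Millisecond); err != nil {
			http.Error(w, "revision not ready", http.StatusServiceUnavailable)
			return
		}
		proxy.ServeHTTP(w, r)
	})
}

func main() {
	// Hypothetical Revision address; the real activator resolves this dynamically.
	target, _ := url.Parse("http://my-revision-service.default.svc.cluster.local")
	http.ListenAndServe(":8080", handler(target))
}
```

With this shape, only the probe GETs are ever retried; the real request (including a POST) goes through exactly once.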

@tcnghia (Contributor, Author) commented Jan 5, 2019

This may be done in conjunction with the queue-proxy, since the activator no longer reads from the Revision.

@vvraskin (Member) commented Jan 5, 2019

Referring to the GET retries, do you mean this piece of code? https://github.com/knative/serving/blob/master/pkg/activator/revision.go
It will be replaced by a Revision informer soonish as part of #1573.

@markusthoemmes (Contributor) commented Jan 8, 2019

@vvraskin this is about retrying GET HTTP requests to the user application while we wait for network programming to complete. It's not about retries against the Kubernetes API.

@mattmoor (Member) commented Feb 7, 2019

This feels like it fits with the class of probing we're doing for positive handoff.

mattmoor added a commit to mattmoor/serving that referenced this issue Feb 17, 2019
This adds a header `k-network-probe`, to which the Knative networking elements
respond without forwarding the requests.  They also identify themselves in their
response, so that we know what component is handling the probe.

This is related to: knative#2856, knative#2849, knative#3239
knative-prow-robot added a commit that referenced this issue Feb 17, 2019
This adds a header `k-network-probe`, to which the Knative networking elements
respond without forwarding the requests.  They also identify themselves in their
response, so that we know what component is handling the probe.

This is related to: #2856, #2849, #3239
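
A rough sketch of that mechanism in Go follows. The header name comes from the commit message above; the response body, probe value, and wiring are illustrative assumptions, not the actual Knative implementation:

```go
package main

import (
	"io"
	"net/http"
)

// probeHeader is the header named in the commit above; the exact value
// semantics and response body here are assumptions for illustration.
const probeHeader = "k-network-probe"

// withProbeHandling wraps a proxying handler. When a request carries the probe
// header, the component answers directly and identifies itself in the response
// body instead of forwarding the request to the user container.
func withProbeHandling(component string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get(probeHeader) != "" {
			w.WriteHeader(http.StatusOK)
			io.WriteString(w, component) // e.g. "activator" or "queue-proxy"
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	backend := http.NotFoundHandler() // stand-in for the real reverse proxy
	http.ListenAndServe(":8080", withProbeHandling("queue-proxy", backend))
}
```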
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 17, 2019
When the flag `-enable-network-probing` is passed, the activator will replace its retrying transport logic with a simple network probe based on knative#3256, with a similar number of retries to what the retrying transport was previously configured to use. Enabling this allows the gRPC test with streaming and cold-start to pass fairly reliably on my cluster (and also with the gRPC ping sample in knative/docs, with my fixes).

This change also refactors the gRPC test into 4 tests, one for each of the logical things tested, which should dramatically reduce the time this adds to e2e once we switch to `t.Parallel()`, since the two waits for a scale-to-zero will run in parallel.

Still TODO:
 - Unit testing for the GET probes
 - Disable by default, and `t.Skip()` the streaming gRPC test

These will be `Fixes:` when this is enabled by default.
Related: knative#3239
Related: knative#2856
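
A hedged sketch of such a probe-with-retries loop in Go (the header name is taken from the probe commit above; the probe value, retry budget, and target address are placeholders, not the activator's actual defaults):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// probeWithRetries sends up to maxRetries GET probes carrying the probe header
// and succeeds once the expected component answers. Only probes are retried;
// the real request is forwarded a single time after this returns nil.
func probeWithRetries(url, wantComponent string, maxRetries int, backoff time.Duration) error {
	for i := 0; i < maxRetries; i++ {
		req, err := http.NewRequest(http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		req.Header.Set("k-network-probe", "probe") // probe value is a placeholder
		resp, err := http.DefaultClient.Do(req)
		if err == nil {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK && string(body) == wantComponent {
				return nil // the network path to the component is programmed
			}
		}
		time.Sleep(backoff)
	}
	return fmt.Errorf("%s did not answer after %d probes", wantComponent, maxRetries)
}

func main() {
	// Address and retry budget are illustrative only.
	if err := probeWithRetries("http://my-revision-service:8012", "queue-proxy", 20, 500*time.Millisecond); err != nil {
		fmt.Println(err)
	}
}
```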
knative-prow-robot added a commit that referenced this issue Feb 22, 2019
When the flag `-enable-network-probing` is passed (on by default), the activator will replace its retrying transport logic with a simple network probe based on #3256, with a similar number of retries to what the retrying transport was previously configured to use. Enabling this allows the gRPC test with streaming and cold-start to pass fairly reliably on my cluster (and also with the gRPC ping sample in knative/docs, with my fixes).

This change also refactors the gRPC test into 4 tests, one for each of the logical things tested, which should dramatically reduce the time this adds to e2e once we switch to `t.Parallel()`, since the two waits for a scale-to-zero will run in parallel.

Fixes: #3239
Fixes: #2856
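
For context, the flag-driven selection might be wired roughly like this; the flag name is from the commit message, but the selection logic is a simplified assumption rather than the activator's actual main:

```go
package main

import (
	"flag"
	"log"
)

var enableNetworkProbing = flag.Bool("enable-network-probing", true,
	"Probe the revision's network path before forwarding, instead of retrying real requests.")

func main() {
	flag.Parse()
	if *enableNetworkProbing {
		// Probe-then-forward path (see the earlier sketches): real requests,
		// including POSTs, are sent exactly once after a successful probe.
		log.Println("network probing enabled")
	} else {
		// Legacy behavior: a retrying transport replays the real request.
		log.Println("falling back to the retrying transport")
	}
}
```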