Retry on Get Revision error? #1558

Closed
akyyy opened this issue Jul 10, 2018 · 6 comments
Labels
area/API API objects and controllers kind/bug Categorizes issue or PR as related to a bug.

Comments

@akyyy
Contributor

akyyy commented Jul 10, 2018

/area API
/kind bug

Expected Behavior

The Get API should be reliable.

Actual Behavior

We saw this error a few times:
Unable to get revision: Get https://10.35.240.1:443/apis/serving.knative.dev/v1alpha1/namespaces/default/revisions/configuration-example-00001: unexpected EOF
In this case, one potential cause is that the master was down.

Additional Info

@google-prow-robot google-prow-robot added area/API API objects and controllers kind/bug Categorizes issue or PR as related to a bug. labels Jul 10, 2018
@akyyy
Contributor Author

akyyy commented Jul 10, 2018

@mdemirhan saw this as well.

@mdemirhan
Contributor

In the reconciliation case, this might not be an issue, because another attempt will happen in 30 seconds and will likely succeed.

In the activator's case, though, a transient failure like this causes the call to be dropped. We should instead retry a couple of times before dropping the call. @akyyy, can you please open a tracking item for the activator to handle this case?

@akyyy
Contributor Author

akyyy commented Jul 11, 2018

The activator specific issue is #1573

@mattmoor
Member

Given a distinct activator issue for this, I'm not sure what the scope of this issue is?

tl;dr Without HA masters I think that this is just a reality of the world in which we live.

The availability of our control plane is tied to master availability, which can be low (~99.5%?).

We should strive to maximize the availability of our data plane, which would ideally be distinct, but creeps in when you start to scale based on data plane metrics. I think we maximize data plane availability by minimizing our hard dependency on the control plane in the data plane.

I believe the only place with a truly hard dependency is the activator.

Autoscaling is clearly affected as well, but besides 0->1 its success doesn't block request routing.

@mdemirhan
Contributor

Given that we have #1573 for activator, I propose that we close this one. @akyyy @mattmoor WDYT?

@akyyy
Contributor Author

akyyy commented Jul 12, 2018

sgtm

@akyyy akyyy closed this as completed Jul 12, 2018

4 participants