Retry on Get Revision error? #1558
Comments
@mdemirhan saw this as well.
In cases of reconciliation, this might not be an issue because another attempt will happen in 30 seconds and will likely succeed. In the activator's case, though, a transient failure like this causes the call to be dropped. We should instead retry a couple of times before dropping the call. @akyyy can you please open a tracking item for the activator to handle this case?
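The "retry a couple of times before dropping the call" idea could be sketched in Go roughly as follows. This is only an illustration, not the activator's actual code: `Revision`, `GetFunc`, `getWithRetry`, `isTransient`, `flakyGet`, and `simulate` are all hypothetical names, and the sketch assumes the `unexpected EOF` failure surfaces as an error wrapping `io.ErrUnexpectedEOF`, which may not hold for every client.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"time"
)

// Revision is a hypothetical stand-in for the real Revision object.
type Revision struct{ Name string }

// GetFunc is a hypothetical signature for whatever call fetches a revision.
type GetFunc func(ctx context.Context, name string) (*Revision, error)

// isTransient reports whether err wraps io.ErrUnexpectedEOF.
// Assumption: this is how the "unexpected EOF" failure shows up.
func isTransient(err error) bool {
	return errors.Is(err, io.ErrUnexpectedEOF)
}

// getWithRetry calls get up to attempts times, doubling the wait between
// tries. Non-transient errors are returned immediately without retrying.
func getWithRetry(ctx context.Context, get GetFunc, name string, attempts int, backoff time.Duration) (*Revision, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		rev, err := get(ctx, name)
		if err == nil {
			return rev, nil
		}
		if !isTransient(err) {
			return nil, err
		}
		lastErr = err
		if i == attempts-1 {
			break // no point waiting after the final attempt
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(backoff):
		}
		backoff *= 2
	}
	return nil, fmt.Errorf("get revision %q failed after %d attempts: %w", name, attempts, lastErr)
}

// flakyGet returns a fake GetFunc that fails with a wrapped
// io.ErrUnexpectedEOF for the first `failures` calls, then succeeds.
func flakyGet(failures int, calls *int) GetFunc {
	return func(ctx context.Context, name string) (*Revision, error) {
		*calls++
		if *calls <= failures {
			return nil, fmt.Errorf("Get %s: %w", name, io.ErrUnexpectedEOF)
		}
		return &Revision{Name: name}, nil
	}
}

// simulate runs getWithRetry against flakyGet and returns the revision
// name (empty on error), the number of calls made, and the final error.
func simulate(failures, attempts int) (string, int, error) {
	calls := 0
	rev, err := getWithRetry(context.Background(), flakyGet(failures, &calls),
		"configuration-example-00001", attempts, time.Millisecond)
	if err != nil {
		return "", calls, err
	}
	return rev.Name, calls, nil
}

func main() {
	name, calls, err := simulate(2, 3)
	fmt.Println(name, calls, err)
}
```

Only errors classified as transient are retried here, so a genuine "not found" would still fail fast. The attempt count and backoff are placeholders; in a request path like the activator's they would need to stay small so that retries do not add much latency.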
The activator-specific issue is #1573.
Given a distinct activator issue for this, I'm not sure what the scope of this issue is. tl;dr: Without HA masters, I think this is just a reality of the world in which we live. The availability of our control plane is tied to master availability, which can be low (~99.5%?). We should strive to maximize the availability of our data plane, which would ideally be distinct, but a control plane dependency creeps in when you start to scale based on data plane metrics. I think we maximize data plane availability by minimizing the data plane's hard dependencies on the control plane. I believe the only place with a truly hard dependency is the activator. Autoscaling is clearly affected as well, but apart from 0->1 its success doesn't block request routing.
sgtm
/area API
/kind bug
Expected Behavior
The Get API should be reliable.
Actual Behavior
We saw this error a few times:
Unable to get revision: Get https://10.35.240.1:443/apis/serving.knative.dev/v1alpha1/namespaces/default/revisions/configuration-example-00001: unexpected EOF
In this case, one potential cause is that the master is down.
Additional Info