State of a failing revision is not carried forward to the CR status and the logs #10267
Comments
This is indeed a hole in our readiness state. A revision that was once considered Ready will currently never become unready.
Do you think this known hole also covers the behavior I noticed in v0.14.0 - the status never reporting an unready state when the first request comes in on a ksvc that has scaled down to zero? ref: https://knative.slack.com/archives/CA4DNJ9A4/p1606458112460300
yep, same with v0.18.0, just tested the flow. The status remains
I'm wondering how seriously we should take this, for a few reasons:
I think that there are two scenarios worth spelling out:
I think that if we can manage
cc @dprotaso
slack thread discussion: https://knative.slack.com/archives/CA4DNJ9A4/p1606458112460300
/area API
Is there more that needs to be done here? @dprotaso, can you put a priority/hint here?
/triage needs-user-input |
Hey, I just noticed the label, is there any more info I could provide? |
I think Evan was looking for input from me. I'll surface what I mentioned in Slack: we can't automatically distinguish when a runtime failure should gate readiness of the revision; we need hints from the user via readiness/liveness probes. Matt's description of 2), in my opinion, should be managed by some higher-level continuous delivery tool. That would capture not only runtime errors (counting HTTP status codes) but also performance regressions (changes in request latency). I'm going to close this issue out and recommend looking at Spinnaker and other CD tools.
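As an illustration of the kind of hint being discussed here, a Knative Service can declare readiness/liveness probes on its container. This is only a sketch: the service name, image, and `/healthz` path are hypothetical, and probe support/restrictions vary by Knative version.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                      # hypothetical service name
spec:
  template:
    spec:
      containers:
        - image: example.com/hello:latest   # hypothetical image
          readinessProbe:
            httpGet:
              path: /healthz       # assumes the app exposes a health endpoint
          livenessProbe:
            httpGet:
              path: /healthz
```

With such probes in place, a revision whose container starts failing gives the platform an explicit signal, rather than requiring it to guess from HTTP status codes whether readiness should be gated.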
I made a follow-up issue to confirm that, if we have the right hints (readiness/liveness failing), we behave properly.
What version of Knative?
I have tested the behavior in v0.14.0 and v0.19.0
Expected Behavior
When a revision fails and the pod shows an `Error` state (due to a bad gateway 502 response, for example), this state should be captured in the logs of the controller pod via https://github.com/knative/serving/blob/master/pkg/reconciler/configuration/configuration.go#L111, and similarly it should be carried forward to the status of the Revision, Configuration, and ksvc CRs through https://github.com/knative/serving/blob/master/pkg/reconciler/configuration/configuration.go#L113, for example.

Actual Behavior
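The carry-forward being asked for can be sketched as follows. This is purely illustrative and is not the actual knative/serving code (the real logic lives in the linked configuration.go); the types here are simplified stand-ins for knative's condition machinery.

```go
package main

import "fmt"

// Condition is a simplified stand-in for knative's apis.Condition type.
type Condition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False", or "Unknown"
	Reason string // e.g. "RevisionFailed"
}

// RevisionStatus and ConfigurationStatus are simplified stand-ins for the CR statuses.
type RevisionStatus struct{ Conditions []Condition }
type ConfigurationStatus struct{ Conditions []Condition }

// readyCondition returns the revision's Ready condition, if present.
func readyCondition(rs RevisionStatus) *Condition {
	for i := range rs.Conditions {
		if rs.Conditions[i].Type == "Ready" {
			return &rs.Conditions[i]
		}
	}
	return nil
}

// propagate carries the revision's Ready state forward to the configuration
// status - the behavior the issue reports as missing once a revision that was
// Ready later fails.
func propagate(rev RevisionStatus, cfg *ConfigurationStatus) {
	c := readyCondition(rev)
	if c == nil {
		return
	}
	cfg.Conditions = []Condition{{Type: "Ready", Status: c.Status, Reason: c.Reason}}
}

func main() {
	rev := RevisionStatus{Conditions: []Condition{
		{Type: "Ready", Status: "False", Reason: "RevisionFailed"},
	}}
	var cfg ConfigurationStatus
	propagate(rev, &cfg)
	fmt.Println(cfg.Conditions[0].Status, cfg.Conditions[0].Reason)
}
```

The point of the sketch is only that a later transition of the revision's Ready condition to `False` should be reflected in the owning Configuration (and ksvc) status, not just the initial one.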
I will distinguish the behavior in the two versions I have tested:
v0.14.0 - Here the logs and the status are captured correctly, but only once the `queue-proxy` container is ready (I am not certain this is the reason, but it appears so on observation). When the ksvc is scaled down to zero and the first request arrives, even though the response may be a 502 and the pod goes into `Error`, neither the log nor the CR status (`RevisionFailed`) is reported; the pod goes from `0/2` to `1/2` and then into `Error`. But after some seconds, when the pod goes back to `2/2` with a new `user-container` (note: the ksvc hasn't scaled down to zero) and I send a request, the pod goes into `Error` and both the logs and the CR statuses are captured correctly. So essentially, the failure is not reported the first time, until the pod reaches the `2/2` stage for the first time.

v0.19.0 - Here even the above behavior is absent: I can't see the relevant log, and the status stays `True` for all the CRs at all times.

Steps to Reproduce the Problem
unmarshalling because of the variable `result` having a wrong data type: