Python/Core: Parsing error on client since v1.44.0, UNAUTHENTICATED error converted to UNKNOWN, Stream closed
#31657
Comments
Thanks for raising this issue!
Tagging @ctiller for promises-related changes.
I have a suspicion as to what's happening here, but I need to add a repro test case first, which is non-trivial work that I need to find time for. I'll try to do it this week or next.
@yashykt Friendly ping.
So, I tried debugging this with https://github.com/yashykt/grpc/tree/31657Debugging. The bad news is that I wasn't able to reproduce the issue where the final status changes to UNKNOWN, but I did see similar logs.
These logs seem to start in 1.46.x and disappear in 1.51.x with #31474. This makes me suspect that the issue is fixed by that PR, but I cannot say for sure until I have a repro. The repro would also help with creating a backport.
I know I forgot about this bug, but given that this was fixed in later branches, I am slightly inclined not to try to isolate the fix, and instead recommend using the later branches.
Closing since this issue was fixed in 1.51.3.
@gnossen @yashykt: This issue seems to have resurfaced with
Summary
There may be a bunch of red herrings in here, but I want to give as much detail as possible in case any of it turns out to be relevant:
We use Python grpcio on the client.
We have a StartTx RPC that we expect to return 401 Unauthenticated errors for some calls, and we test that in our CI.
In failing runs, the 401 Unauthenticated error becomes an UNKNOWN error, with details "Stream removed".
With GRPC_TRACE=all, I established that the first discrepancy on the client side appears to be whether a READ operation sees just the initial HEADER frame (in passing runs), or the concatenated HEADER and DATA frames (in failing runs).
The issue does not reproduce on grpcio==1.44.0, but does on all more recent releases.
I believe that there is some issue with the network reading/parsing logic, either due to a platform switch or a race condition, potentially introduced/exacerbated since v1.44.0, that results in valid gRPC responses being lost and converted to UNKNOWN: "Stream removed" errors.
What version of gRPC and what language are you using?
Using Python grpcio, initially on v1.50.0. Was able to reproduce this issue on v1.46.0, v1.47.0, and v1.49.1. Was not able to reproduce the issue on v1.44.0.
What operating system (Linux, Windows,...) and version?
What runtime / compiler are you using (e.g. python version or version of gcc)
What did you do?
I don't have a minimal repro, but I'm happy to work with you to attempt to construct one. We have discussion of the issue here, including verbose grpc trace logging.
The failing client side code boils down to this:
The server produces the following response frames:
The first is a HEADERS frame containing a 401 status, the second is a DATA frame containing an HTTP/2 response body.
Note that the response is actually content-type: application/json. We get the following error from that:
We believe that's harmless, since it's also printed on passing runs, but maybe that's what produces a racy failure state?
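For context on why a 401 would be expected to surface as UNAUTHENTICATED: gRPC's "HTTP to gRPC Status Code Mapping" document describes how a client should synthesize a status when a response carries no grpc-status, as with a plain application/json reply. The sketch below reproduces that table in plain Python. The helper name synthesize_grpc_status is hypothetical, and this is only an illustration of the documented mapping, not grpcio's actual implementation.

```python
# Table from gRPC's "HTTP to gRPC Status Code Mapping" document.
# Any HTTP status not listed here maps to UNKNOWN.
HTTP_TO_GRPC_STATUS = {
    400: "INTERNAL",
    401: "UNAUTHENTICATED",
    403: "PERMISSION_DENIED",
    404: "UNIMPLEMENTED",
    429: "UNAVAILABLE",
    502: "UNAVAILABLE",
    503: "UNAVAILABLE",
    504: "UNAVAILABLE",
}


def synthesize_grpc_status(http_status: int) -> str:
    """Hypothetical helper: pick a gRPC status name for a non-gRPC HTTP response."""
    return HTTP_TO_GRPC_STATUS.get(http_status, "UNKNOWN")


if __name__ == "__main__":
    # The 401 response described above should map to UNAUTHENTICATED.
    print(synthesize_grpc_status(401))
```

Under this mapping, UNKNOWN is the fallback for unlisted HTTP codes, so seeing UNKNOWN for a 401 suggests the HTTP status was not the input to the final status at all.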
What did you expect to see?
(Note, the logs below are trimmed to hopefully highlight the interesting details. Full trace logs are available here as works.client.txt and fails.client.txt.)
I expect the StartTx call to throw an RpcError with an UNAUTHENTICATED status code, so the assertion should pass. Here are trace logs around the parsing of this response on a passing run:
What did you see instead?
The status code in the thrown error is instead UNKNOWN. Tracing shows that the expected response was received, and passed around, but at some point is lost and replaced by this UNKNOWN: Stream removed error. Trace logs around a failing run:
The first difference is that the READ line sees both frames, rather than just the HEADER frame. Then the set_final_status appears to drop the original 401, and replaces it with a "Stream removed" error. Despite continued logging around the 401 response, the final error that reaches Python is UNKNOWN.
Anything else we should know about your project / environment?
Hopefully covered everything important above. Happy to answer any further questions.
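To illustrate the difference between passing and failing reads described above, here is a minimal sketch of how one read buffer can hold either a single HTTP/2 frame or several concatenated ones. It relies only on the 9-byte HTTP/2 frame header layout (24-bit length, 8-bit type, 8-bit flags, 31-bit stream ID). The names split_frames and make_frame are hypothetical helpers, the payloads are placeholders rather than the real HPACK or JSON bytes from the trace, and none of this is gRPC core's parser.

```python
FRAME_HEADER_LEN = 9  # HTTP/2 frame header: length(3) + type(1) + flags(1) + stream id(4)
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS"}


def make_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Hypothetical helper: serialize one HTTP/2 frame with an opaque payload."""
    return (
        len(payload).to_bytes(3, "big")
        + bytes([frame_type, flags])
        + stream_id.to_bytes(4, "big")
        + payload
    )


def split_frames(buf: bytes):
    """Hypothetical helper: return (complete frames, leftover bytes) for one read."""
    frames = []
    offset = 0
    while offset + FRAME_HEADER_LEN <= len(buf):
        length = int.from_bytes(buf[offset:offset + 3], "big")
        frame_type = buf[offset + 3]
        flags = buf[offset + 4]
        # The top bit of the stream identifier field is reserved, so mask it off.
        stream_id = int.from_bytes(buf[offset + 5:offset + 9], "big") & 0x7FFFFFFF
        end = offset + FRAME_HEADER_LEN + length
        if end > len(buf):
            break  # incomplete frame: the rest arrives in a later read
        name = FRAME_TYPES.get(frame_type, hex(frame_type))
        frames.append((name, flags, stream_id, buf[offset + FRAME_HEADER_LEN:end]))
        offset = end
    return frames, buf[offset:]


if __name__ == "__main__":
    headers = make_frame(0x1, 0x4, 1, b"placeholder-headers")  # flag 0x4 = END_HEADERS
    data = make_frame(0x0, 0x1, 1, b"placeholder-body")        # flag 0x1 = END_STREAM

    # Passing-run shape: the read contains only the HEADERS frame.
    print([f[0] for f in split_frames(headers)[0]])
    # Failing-run shape: the same read contains HEADERS followed by DATA.
    print([f[0] for f in split_frames(headers + data)[0]])
```

A parser has to produce the same final result in both cases, so a status that depends on whether the frames arrive in one read or two points at ordering or state handling after framing rather than at the bytes themselves.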