Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISPN-14571 Propagate errors in REST chunked response #10693

Merged
merged 1 commit into from Apr 14, 2023

Conversation

jabolina
Copy link
Member

@jabolina jabolina commented Mar 9, 2023

https://issues.redhat.com/browse/ISPN-14571

It doesn't seem the nicest way. If an error happens midway through writing the chunked response (key and entries), we cancel the stream, write HTTP last content, and add a trailer header with the exception message.

@wburns
Copy link
Member

wburns commented Mar 9, 2023

While the changes themselves seem fine, I am not sure if this what we want to do. From I have seen many clients don't pay attention to trailer headers. The safest may be just to kill the connection, but maybe someone with more HTTP experience could chime in.

@wburns
Copy link
Member

wburns commented Mar 30, 2023

I would vote to kill the connection instead of using trailer headers, but as I mentioned previously I am open to either suggestion. I just can't imagine a client would know to check this specific trailer header for an exception. Where as if the client dies, they will try again and realize it disconnected again, so maybe something is wrong. And also since this is logged as an ERROR it should be more than visible in the server log file.

@jabolina
Copy link
Member Author

jabolina commented Apr 3, 2023

While changing the behavior to close the connection, everything works fine with HTTP 1.1. With HTTP 2.0, we have a slightly different behavior. The server sends back to the client a frame for the stream canceled and then closes. OkHttp has a bug with that, entering into an infinite retrying loop. We can work around that by not retrying requests when using HTTP 2.0.

I need to check if we can disconnect the client with HTTP 2.0 without sending the cancel stream. But I am afraid that could cause an issue with other HTTP clients.

@wburns
Copy link
Member

wburns commented Apr 3, 2023

I need to check if we can disconnect the client with HTTP 2.0 without sending the cancel stream. But I am afraid that could cause an issue with other HTTP clients.

I don't think so. Not sending a cancel stream would be like the socket was killed by something else, which is perfectly fine for behavior here as we aren't doing this for correct propagation but rather because it is an error we can't recover from and this is the only way we can notify the client that something really went wrong.

@@ -116,6 +116,8 @@ public RestClientOkHttp(RestClientConfiguration configuration) {
} else {
builder.protocols(Arrays.asList(Protocol.HTTP_2, Protocol.HTTP_1_1));
}
// OkHttp might retry infinitely on HTTP/2.
builder.retryOnConnectionFailure(false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we have HTTP/2 and close the connection on the server, OkHttp will wrap the exception into a StreamResetException and retry infinitely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed if the CANCEL frame isn't sent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both cases :/ If the server sends the CANCEL frame, OkHttp creates the StreamResetException. If the server closes the connection, OkHttp swallows the EofException and turns it into a StreamResetException. So, in either case, it ends up retrying infinitely.

return;
}

unsafe.closeForcibly();
Copy link
Member Author

@jabolina jabolina Apr 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling ctx.close() or the closeForcibly() directly from the channel calls the AbstractHttp2StreamChannel.close(...) to send the CANCEL frame before closing the connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is no difference between ctx.close and unsafe.closeForcibly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am misreading, but it seems there is a difference that calling unsafe.closeForcibly() doesn't produce the cancel frame but calling ctx.close() would produce the frame right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but on the parent unsafe in the case of HTTP 2. From debugging:

  • ctx.close() and ctx.channel().unsafe().closeForcibly() => Both calls AbstractHttp2StreamChannel and produces the frame;
  • ctx.channel().parent().close() and ctx.channel().parent().unsafe().closeForcibly() => Do not produce the cancel frame.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay, the key difference is the parent then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check the parent in the first if then instead? Or does the HTTP2 first if check not pass?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the check order causes issues with HTTP/1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all the checks, and now it is just calling ctx.close() for simplicity. When using HTTP/1, all is good. If it is HTTP/2, it sends the cancel the frame first, but it ends up closing the connection after.

@wburns wburns merged commit a28cef4 into infinispan:main Apr 14, 2023
3 of 4 checks passed
@wburns
Copy link
Member

wburns commented Apr 14, 2023

Integrated into main, thanks @jabolina !

@jabolina jabolina deleted the ISPN-14571 branch April 14, 2023 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants