Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spawn H2 Data frame processing into a separate task #2033

Merged
merged 2 commits into from Sep 26, 2023

Conversation

yaroslavros
Copy link
Contributor

I noticed that under load DoH server once in a while goes into deadlock. This is particularly visible when ECH is enabled in Chrome and it makes three simultaneous requests in a single DoH session (for A, AAAA and HTTPS records). If all Data frames do not arrive at the same time as Header frames, DoH server goes into deadlock and never responds to requests with missing data frames.

Spawning Data frame processing into a separate Tokio task resolves this issue. Same logic is actually used in the basic h2 server example at https://github.com/hyperium/h2/blob/master/examples/server.rs (line 34).

@djc
Copy link
Collaborator

djc commented Sep 22, 2023

The changes seem reasonable but I'd like to better understand why this is necessary.

In HTTP land, I guess the more common strategy is to spawn a task per client (connection) rather than one per request -- at least, that's how hyper does things, which makes sense to me.

Have you looked at how this is handled for the other protocols? We should probably use the same strategy across protocols unless there are overriding reasons to do different things.

@yaroslavros
Copy link
Contributor Author

This is not about efficiency but about preventing a deadlock.
Underlying H2 body stream that we extract in https://github.com/bluejekyll/trust-dns/blob/90edda6b0558282abf852494d7c4f4004e47e1d3/crates/proto/src/https/https_server.rs#L52 only streams from received buffer, https://github.com/bluejekyll/trust-dns/blob/90edda6b0558282abf852494d7c4f4004e47e1d3/crates/proto/src/https/https_server.rs#L68 does not progress underlying socket and if we don't have complete request in the buffer immediately (which happens in case of multiple concurrent requests) it just hangs.
The way to resolve this is to keep progressing the socket which is what this PR accomplishes.

Key difference between DoH and other protocols is that it can get multiple concurrent in-flight requests on a single socket.

@djc
Copy link
Collaborator

djc commented Sep 22, 2023

Okay, so if I understand correctly the model here is that we have to keep pulling frames off the H2 connection so that it is able to (in addition to new requests) read further body chunks for previous requests?

@djc
Copy link
Collaborator

djc commented Sep 22, 2023

I think this is the relevant bit of documentation from the h2 crate:

The Connection instance is used to manage connection state. The caller is required to call either Connection::accept or Connection::poll_close in order to advance the connection state. Simply operating on SendStream or RecvStream will have no effect unless the connection state is advanced.

@yaroslavros
Copy link
Contributor Author

Exactly. By not calling Connection::accept we are blocking new data frames.

@djc
Copy link
Collaborator

djc commented Sep 23, 2023

(The CI failures are unrelated, and addressed in #2028.)

Copy link
Member

@bluejekyll bluejekyll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I'm rebasing, assuming tests are passing I'll merge after. Thanks for the PR!

@bluejekyll bluejekyll merged commit 615c5b4 into hickory-dns:main Sep 26, 2023
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants