Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify & simplify quic traits #177

Open
FlorianUekermann opened this issue Mar 17, 2023 · 6 comments
Open

Clarify & simplify quic traits #177

FlorianUekermann opened this issue Mar 17, 2023 · 6 comments
Labels
A-trait Area: quic trait abstraction B-rfc Blocked: request for comments. Needs more discussion. C-refactor Category: refactor. This would improve the clarity of internal code.

Comments

@FlorianUekermann
Copy link
Contributor

FlorianUekermann commented Mar 17, 2023

Big picture problem

I believe there are a few issues with the h3::quic::Connection trait, which lead to expected behavior being under-defined and some hidden bugs in h3, which don't surface in tests due to h3-quinn implementation decisions. The topic easily gets out of hand, because the space of possible interpretations of "correct", as well as the solution space is large, so I'll break it up a bit and leave some stuff out.

Example

Possibly a bug, subject to interpretation:

h3/h3/src/connection.rs

Lines 244 to 257 in da29aea

loop {
match self.conn.poll_accept_recv(cx)? {
Poll::Ready(Some(stream)) => self
.pending_recv_streams
.push(AcceptRecvStream::new(stream)),
Poll::Ready(None) => {
return Poll::Ready(Err(Code::H3_GENERAL_PROTOCOL_ERROR.with_reason(
"Connection closed unexpected",
crate::error::ErrorLevel::ConnectionError,
)))
}
Poll::Pending => break,
}
}

The Poll::Ready(None) arm implies that a connection closure is not expected when this is called. Yet, this snippet is actually almost certainly where connection close is observed for the first time. But h3-quinn treats any close as an error, even non-error ones. Therefore the arm that is selected is actually the Poll::Ready(Err(_)) inside the ? in L245, which is why no tests have caught this.

If you use other quic implementations, which return None on non-error closes, graceful shutdown tests start failing. I think this is a tip-of-the-iceberg situation, because I observe other shutdown related oddities, depending on similar details, which I don't want to get into here for brevity.
Even if you do know what h3 actually expects, I believe there are race conditions, for example: Observing control stream reset is never expected , but implied by a quic connection close.

Underlying problem

These methods are the meat of the quic traits.

h3/h3/src/quic.rs

Lines 45 to 71 in da29aea

/// Accept an incoming unidirectional stream
///
/// Returning `None` implies the connection is closing or closed.
fn poll_accept_recv(
&mut self,
cx: &mut task::Context<'_>,
) -> Poll<Result<Option<Self::RecvStream>, Self::Error>>;
/// Accept an incoming bidirectional stream
///
/// Returning `None` implies the connection is closing or closed.
fn poll_accept_bidi(
&mut self,
cx: &mut task::Context<'_>,
) -> Poll<Result<Option<Self::BidiStream>, Self::Error>>;
/// Poll the connection to create a new bidirectional stream.
fn poll_open_bidi(
&mut self,
cx: &mut task::Context<'_>,
) -> Poll<Result<Self::BidiStream, Self::Error>>;
/// Poll the connection to create a new unidirectional stream.
fn poll_open_send(
&mut self,
cx: &mut task::Context<'_>,
) -> Poll<Result<Self::SendStream, Self::Error>>;

Especially taking into account the comments explicitly stating that None is a way to indicate a close. There is a good amount of ambiguity in where connection closes should be communicated first and how. From a naive reading, even a first error surfacing via a RecvStream method seems reasonable, growing the list of affected methods even more.
With the current implementation, there doesn't seem to be a clear distinction between non-error closes and other terminations in the quic traits, which should be handled differently.
I think the example demonstrates that these issues not only make life hard for implementers of the traits, but also for contributors to h3.

Small detour to increase the solution space

Neither Quic nor Http3 provide a way to accept/reject opening streams/requests by the remote. In Http3 the stream can be closed and stopped immediately with an appropriate error code to indicate rejection after the fact, but opening a stream/request is a unilateral decision by the remote.
As a result, the differentiated h3::quic::Connection::poll_accept_* methods aren't particularly useful. At least with quinn, calling "accept" has no stream concurrency implications. Other Quic implementations may choose a different approach, but without more explicit ways to apply backpressure on remote stream creation I'm not sure that would even be desirable.

As a result, it would be an option to merge the h3::quic::Connection::poll_accept_* methods to something like this:

fn poll(&mut self) -> Poll<Result<RecvOrBidiStream, Error>>

The Result vs Option<Result> and Error type considerations aren't the point here, see next section for that stuff.

Solutions

As mentioned above the solution space is large, but there are some key choices:

A. Connection trait complexity

  1. Keep the current or a similar Connection trait and exhaustively document the contract (expectations on how and in which order closes become visible on different methods and promises what is polled first and under which conditions).
    • Cons: Test coverage will always be problematic and the contract is surprisingly complex.
    • Pros: Unclear to me.
  2. Reduce the Connection trait to a single poll (plus poll_open_*), which emits streams, as well as connection errors and close information. This should either:
    1. not include an Option, but require explicit communication of a final close or error and return Poll::Pending instead of None: fn poll(&mut self) -> Poll<Result<RecvOrBidiStream, Close>>
    2. include only an option: fn poll(&mut self) -> Poll<Option<RecvOrBidiStream>>, with None implying connection termination, and an extra fn closed(&mut self) -> Option<Close>.
    • Cons: Giving up hypothetical implicit stream concurrency control, if a Quic implementation chooses to delay increasing remote stream opening budgets via MAX_STREAMS frames until the application layer calls "poll_accept_*". As mentioned earlier, I think adding explicit methods for this would be more appropriate if this is ever desired.
    • Pros: A single place for connection errors and closes to surface.

B. Close type
Above a Close type is mentioned. Currently the equivalent is Box<dyn Error> and maybe Option::None. Note: Application closes are an expected way to lose connection, even if no GOAWAY was sent. Other closes are not and may indicate a problem that the application needs to report or react to. Therefore they should be clearly distinguishable. Options:

  1. Keep the simple Error trait and maybe expand the required methods a bit more to distinguish application closes.
  2. A Close enum, either custom or just Result<ApplicationClose, ConnectionError>. ConnectionError would be constrained like Error is atm. I'm using this on the quic side and it is very nice. The content of ApplicationClose is fairly well defined by the Quic RFC. Benefit: Meaning is obvious to the user if propagated, and obvious to the implementer of the quic side.

C. Stream errors
It should be defined whether connection errors are picked up and processed via stream interactions the same way as they are if they surface via Connection methods. I think there is no point in doing so. It just complicates the code. If a stream error implies that the connection terminated, the connection level method(s) should still be required to emit the respective Close.

Additionally, the RFC does not require any stream error to have connection level effects, but allows implementations to choose to handle them that way. Again, I don't think that would help simplicity.

E. Stream order
The emission order of streams of same type should be defined as ascending by id. It keeps the h3 side simple and most Quic implementations guarantee this anyway. If they don't it is trivial to fix in the trait implementation due to underlying guarantees in Quic. There have been examples of streams being emitted in slightly shuffled order by Quic implementations, so this should be explicitly required or handled internally.

E. Redundancy
There's a bit of redundancy in the OpenStreams and Connection trait, which is unnecessary and has a few awkward side-effects. #173 deals with that. It is largely orthogonal to this topic, but would be nice to get out of the way, because it makes (Draft) PRs for this issue a bit easier to read.

Conclusion

I hope the above makes sense. And I hope my concerns aren't based on a series of misunderstandings on my side. I think there's potential for h3 code to become simpler and easier to reason about. A draft PR should be fairly straightforward and I'm happy to give it a shot. But with a change that is this substantial, I wanted to explain my perspective a bit and check if you are open to such changes.

@FlorianUekermann
Copy link
Contributor Author

A PR should probably cover #144 as well.

@Ruben2424
Copy link
Contributor

Thanks for this issue. I can see the Problem.

A + B:
I think it is worth to explore the possibility of just one poll method to “accept” streams.
And especially the idea of a Close Enum sounds promising to me.

C. Stream errors
It should be defined whether connection errors are picked up and processed via stream interactions the same way as they are if they surface via Connection methods. I think there is no point in doing so. It just complicates the code. If a stream error implies that the connection terminated, the connection level method(s) should still be required to emit the respective Close.

I'm afraid I didn't quite understand what you meant with this. Could you give me an example?

The emission order of streams of same type should be defined as ascending by id. It keeps the h3 side simple and most Quic implementations guarantee this anyway. If they don't it is trivial to fix in the trait implementation due to underlying guarantees in Quic. There have been examples of streams being emitted in slightly shuffled order by Quic implementations, so this should be explicitly required or handled internally.

I have not thought about this. To what extent could that be a problem?

There's a bit of redundancy in the OpenStreams and Connection trait, which is unnecessary and has a few awkward side-effects. #173 deals with that. It is largely orthogonal to this topic, but would be nice to get out of the way, because it makes (Draft) PRs for this issue a bit easier to read.

Thanks for that PR. I hope to have time to review that this weekend.


This are just my thoughts. I'm interested in getting other people's perspectives on this.

@Ruben2424 Ruben2424 added C-refactor Category: refactor. This would improve the clarity of internal code. B-rfc Blocked: request for comments. Needs more discussion. A-trait Area: quic trait abstraction labels Mar 17, 2023
@FlorianUekermann
Copy link
Contributor Author

FlorianUekermann commented Mar 17, 2023

C. Stream errors
It should be defined whether connection errors are picked up and processed via stream interactions the same way as they are if they surface via Connection methods. I think there is no point in doing so. It just complicates the code. If a stream error implies that the connection terminated, the connection level method(s) should still be required to emit the respective Close.

I'm afraid I didn't quite understand what you meant with this. Could you give me an example?

Stream errors can include info about connection closure (e.g. "read failed because connection was implicitly reset due to receiving close frame"). I'm suggesting that h3 should explicitly document whether it'll pick up on the fact that a connection is terminated from an error returned by a quic stream method. This will presumably be obvious from the types in the new traits, but atm it isn't (and it does matter in my example, where Quic must not return Ready(None) from certain methods until we know h3 has understood that the connection terminated).

The emission order of streams of same type should be defined as ascending by id. It keeps the h3 side simple and most Quic implementations guarantee this anyway. If they don't it is trivial to fix in the trait implementation due to underlying guarantees in Quic. There have been examples of streams being emitted in slightly shuffled order by Quic implementations, so this should be explicitly required or handled internally.

I have not thought about this. To what extent could that be a problem?

Well, confusing the control stream with another one would be really bad for example.
Edit: The RFC says: Each side MUST initiate a single control stream at the beginning of the connection, whether that implies that the control stream is the first one being initiated isn't 100% clear to me. Either way, we should probably not rely on it.
To send GOAWAY frames which limit the number of requests in flight it is also necessary to know which stream ids we haven't seen yet. They need to come in in ascending order for the bookkeeping to be trivial (remember max id).
It's just something to be documented. No implementation will struggle to solve this, because in Quic this order is enforced. So it's just about the quic implementation not accidentally shuffling them if multiple streams are opened between calls to poll_accept_*.

Thanks for the quick response. It's good to know that you are in principle open to this. The big step is to start being precise about how termination and close info flows from quic to h3. The details can be iterated easily and decisions will be more obvious with code to look at.

@Ruben2424 Ruben2424 mentioned this issue May 16, 2023
9 tasks
@FlorianUekermann
Copy link
Contributor Author

It's been a while. Is there a realistic chance of making progress soonish? I see 3 reasons for urgency:

  • Adding additional quic implementations is desirable because working with one obscures bugs/problems/ambiguities. Making trait changes beforehand reduces the total amount of work. I'm working on a runtime independent quic implementation, but have paused that effort waiting for resolution of seemingly broken behavior that isn't triggered by h3-quinn.
  • There are known bugs/problems stemming from the current trait design (see above for an example) and probably more we don't know about yet.
  • Runtime independent quic implementations are unnecessarily complex with the current API

@ten3roberts
Copy link
Contributor

Adding to this we have run into limitations with the current traits when implementing webtransport #183.

Now Webtransport is merged and there have been proposals for additions to the API which find further blockings on the api and need to extend it and create Ext traits.

The current situation seems to be in a position to get unwieldy if we keep stacking things ontop of the current traits in my opinion.

@Ruben2424
Copy link
Contributor

I implemented A.2.i in #200 to see how it could work out. Is it like you proposed?
What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-trait Area: quic trait abstraction B-rfc Blocked: request for comments. Needs more discussion. C-refactor Category: refactor. This would improve the clarity of internal code.
Projects
None yet
Development

No branches or pull requests

3 participants