Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quinn-proto: Chunks panic-on-drop behavior is a footgun and makes control flow annoying #1531

Closed
abonander opened this issue Apr 12, 2023 · 12 comments · Fixed by #1983
Closed
Assignees
Labels

Comments

@abonander
Copy link
Contributor

abonander commented Apr 12, 2023

I was just bitten by this assert while trying to debug a silent failure of my application during testing: https://github.com/quinn-rs/quinn/blob/main/quinn-proto/src/connection/streams/recv.rs#L289-L292

This was in a spawned Tokio task which apparently suppresses the normal printing of panic backtraces to stderr and I was running this as a test which buffered the panic backtrace but not letting it complete. And I hadn't yet wired it up to check the task's exit status as it's supposed to be long-running anyway and I'm still working on the principal implementation here.

On StreamEvent::Opened or StreamEvent::Readable I essentially call into code that looks like this:

impl<'a> StreamReadContext<'a> {
    fn read(mut self) -> anyhow::Result<()> {
        // `self.stream: quinn_proto::RecvStream`
        let mut chunks = self.stream.read(true)?;

        loop {
            match self.activity {
                StreamActivity::PendingHeader => {
                    if let Some(header) = self.reader.try_peek_header().context("invalid header")? {
                        Notification::validate_header(header).context("invalid header")?;
                        *self.activity = StreamActivity::ReadingNotification { header };
                    };
                }
                StreamActivity::ReadingNotification { header } => {
                    if self.reader.input_len() >= header.length {
                        let notification = Notification::decode(&mut *self.reader)
                            .context("invalid notification")?;

                        // Handle `Notification`
                    }
                }
            }

            // Lazily read chunks so we don't buffer a ton of data up-front.
            if let Some(chunk) = chunks.next(MAX_CHUNK_SIZE)? { // 8 KiB
                tracing::trace!("read {} bytes", chunk.bytes.len());
                self.reader.push(chunk.bytes)?;
            } else {
                break;
            }
        }

        // If we need to transmit it'll be checked when we return to the event loop.
        let _ = chunks.finalize();

        Ok(())
    }
}

As implied by the code here, Notification messages have a header that gives their length so we know how much data we need to buffer to fully decode them. reader is the type that buffers chunks and implements the right traits for Notification::decode().

Due to some bug in my code, .try_peek_header() is returning an error which normally would be logged when returned from this method. However, because the impl Drop for Chunks executes on the return, and because I hadn't called .finalize() yet, this is turned into a panic!(), which in the current state of my code just causes the task to silently exit.

The thing is, I don't need the ShouldTransmit flag because this immediately returns to an event loop where I call all the appropriate .handle*() and .poll*() methods on the quinn_proto::Connection. So the implicit finalize triggered by the Drop impl would otherwise be sufficient for me.

Because the panic message was being suppressed, it took me an annoyingly long time to figure this out. It wasn't until I ran the code in a debugger (which helpfully automatically breaks on panics) to figure this out. While I understand the point of the debug_assert!() is to teach the user how to use this API correctly, in my case it was much more of a hindrance than anything.

(Tokio's partly to blame here... why suppress backtraces in debug mode? That doesn't make any sense. However, the panic doesn't strictly need to happen either. quinn-proto is by design a low-level API, so why is it assuming the user might not know what they're doing here?)

@abonander
Copy link
Contributor Author

abonander commented Apr 12, 2023

As a workaround, I'm calling RecvStream::read() outside this method and including &mut Chunks in StreamReadContext instead of the RecvStream, then always calling .finalize() afterward regardless of what it returns.

@abonander
Copy link
Contributor Author

abonander commented Apr 13, 2023

Belatedly I have realized that panics not being printed out were my fault. My code was technically running as a test, so Cargo was buffering the output as it normally does, but I was not letting the test complete, as I was cancelling it early while watching the log output, which wasn't buffered.

Still, my primary issue stands, that the panic doesn't need to happen. Worse yet, it's a debug_assert!() so it's not strictly required for correct functioning of the API. It's just a weird little random speedbump that doesn't really need to be there, IMHO.

@djc
Copy link
Member

djc commented Apr 13, 2023

Sorry for the trouble you've been having with our code! When we designed this we felt it was important to make sure callers use finalize(). We tried to find some way to make a "must move"-like type (something that can't be dropped), but unfortunately that's not possible in today's Rust. I think the debug_assert!() functioned exactly as intended, to make you aware that you should be calling transmit soon after dropping Chunks. In your case, you were already aware, but it's not obvious. Then there were further issues with your test strategy that made it so that this was much harder to localize.

If you had gotten a full panic (plus backtrace) on your first test failures, would this have been as aggravating to you as it seems to have been now?

@abonander
Copy link
Contributor Author

If you had gotten a full panic (plus backtrace) on your first test failures, would this have been as aggravating to you as it seems to have been now?

I still had to restructure my code to make it not panic, so that was annoying regardless. If you're married to the debug_assert!() I would suggest to at least make it clear in the docs that it will panic if you don't call .finalize() which it isn't right now: https://docs.rs/quinn-proto/latest/quinn_proto/struct.Chunks.html#method.next

@djc
Copy link
Member

djc commented Apr 14, 2023

I'm not "married" to the assert -- there's clearly a trade-off for which the best solution is not obvious to me, and I was asking for your feedback.

@abonander
Copy link
Contributor Author

abonander commented Apr 14, 2023

My original point still stands: this is a low-level API, right? So why not assume the user knows what they're doing?

It's certainly worth documenting that one should check for pending transmits after reading, because that might not immediately be obvious. Either way, documentation on the Chunks type could be better.

@aochagavia
Copy link
Contributor

I also was bitten by this when writing tests that use Chunks, because one of the assertions failed and it caused a double panic (the assertion panics, then Chunk is dropped, and the drop code panics too). It was puzzling and difficult to debug (somehow I wasn't able to get a proper backtrace).

Not sure what an alternative solution would be, just wanted to mention that in my eyes it wasn't very ergonomic. I ended up extracting the logic that used Chunks into a separate function that never panics, to avoid problems:

fn stream_chunks(mut recv: RecvStream) -> Vec<u8> {
let mut buf = Vec::new();
let mut chunks = recv.read(true).unwrap();
while let Ok(Some(chunk)) = chunks.next(usize::MAX) {
buf.extend(chunk.bytes);
}
let _ = chunks.finalize();
buf
}

@djc
Copy link
Member

djc commented Apr 18, 2023

Okay, I think I would be on board with removing the assertion in favor of better docs. Anyone want to submit a PR?

@djc djc added good first issue Good for newcomers Hacktoberfest labels Oct 13, 2023
@boserohan
Copy link

Can this be assigned to me? I am new to contributing, so if anyone can guide me to the process that would be great.

@djc
Copy link
Member

djc commented Nov 6, 2023

Sure -- what do you want to know? It's mostly just -- send a PR, ideally with small atomic commits each of which should be able to pass CI.

@boserohan
Copy link

Thanks, I have submitted PR 1703. Please could you kindly review the changes. I think I will have more questions later on as I get to contributing more to this project.

@abonander
Copy link
Contributor Author

abonander commented Sep 7, 2024

Just ran into this again. It ends up being really annoying with control flow:

        'inner: loop {
            match chunks.next(MAX_CHUNK_SIZE) {
                Ok(Some(chunk)) => {
                    // use `chunk`
                }
                Ok(None) => {
                    // Note: this needs to be called in every diverging branch
                    // or `chunks` will panic on-drop
                    
                    let _ = chunks.finalize();
                    break 'outer;
                },
                Err(proto::ReadError::Blocked) => { 
                    let _ = chunks.finalize();
                    break 'inner; 
                },
                Err(proto::ReadError::Reset(code)) => {
                    let _ = chunks.finalize();
                    eyre::bail!("peer reset stream {stream_id:?} with code {code}");
                }
            }
        }

Since #1703 appears stalled, I may just open my own PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants