HTTP/1 client busy-loops at 100% CPU when peer half-closes with an open request body (regression in 1.10.0) #4085

@shikhar

This issue (and the linked reproducer) was authored by Claude Opus 4.8, reviewed before posting.

Version

hyper 1.10.0

Platform

aarch64-apple-darwin

Summary

On hyper 1.10.0, an HTTP/1 client connection busy-loops at 100% CPU when the peer half-closes (sends FIN) without sending a response while the client still has an open streaming request body. The task does no I/O while spinning — it just re-polls flush forever.

Code Sample

https://github.com/shikhar/hyper-1.10-h1-spin-repro

Expected Behavior

With the request body open and the read side finished (peer half-closed), the connection task should park and be woken when the body produces its next frame — the behavior of hyper 1.9.0.
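To make the difference between "parking" and "rescheduling immediately" concrete, here is a minimal std-only sketch. It is not hyper's code: `CountingWaker`, `poll_parked`, `poll_self_waking`, and `count_polls` are hypothetical names, and the toy executor only re-polls when a wake has arrived since the previous poll.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that only counts how often it is woken.
struct CountingWaker {
    wakes: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

/// Parks: returns Pending and relies on someone else (e.g. the body
/// producing a frame) to wake the task later.
fn poll_parked(_cx: &mut Context<'_>) -> Poll<()> {
    Poll::Pending
}

/// Yield-style: wakes itself before returning Pending, so an executor
/// will poll it again right away.
fn poll_self_waking(cx: &mut Context<'_>) -> Poll<()> {
    cx.waker().wake_by_ref();
    Poll::Pending
}

/// Toy executor (an assumption for illustration): polls once, then
/// re-polls only if a wake arrived since the last poll. Returns the
/// number of polls performed, capped at `max_polls`.
fn count_polls(poll: fn(&mut Context<'_>) -> Poll<()>, max_polls: usize) -> usize {
    let state = Arc::new(CountingWaker { wakes: AtomicUsize::new(0) });
    let waker = Waker::from(state.clone());
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    let mut seen = 0;
    while polls < max_polls {
        polls += 1;
        if poll(&mut cx).is_ready() {
            break;
        }
        let now = state.wakes.load(Ordering::SeqCst);
        if now == seen {
            break; // no wake arrived: the task stays parked
        }
        seen = now;
    }
    polls
}
```

With these definitions, `count_polls(poll_parked, 1000)` performs a single poll, while `count_polls(poll_self_waking, 1000)` uses the entire budget. The second pattern is the shape of the spin reported here, just without a cap.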

Actual Behavior

The connection task spins at 100% CPU indefinitely (until the application happens to produce the next body frame). The transport's poll_flush is called hundreds of millions of times while poll_write/poll_read are called once or twice — i.e. no I/O progress, purely spinning in proto::h1::dispatch::Dispatcher::poll_loop.

Same repro over real loopback TCP, only the hyper pin changes:

hyper    output
1.10.0   poll_flush in 2s: 127164355  poll_write=1  poll_read=2  BUSY-LOOP
1.9.0    poll_flush in 2s: 2  poll_write=1  poll_read=2  parked (ok)

Additional Context

  • Root cause: the only behavioral change here is the rewrite of Dispatcher::poll_loop (src/proto/h1/dispatch.rs). 1.9.0 parked as soon as it didn't want to read again; 1.10.0 added a write-continuation gate:

    let write_ready = self.poll_write(cx)?.is_ready();
    let flush_ready = self.poll_flush(cx)?.is_ready();
    let wants_write_again = self.can_write_again() && (write_ready || flush_ready);
    // ...
    let wants_read_again = self.conn.wants_read_again();
    if !(wants_write_again || wants_read_again) {
        return Poll::Ready(Ok(()));   // only parks if BOTH are false
    }
    if !wants_read_again && wants_write_again {
        if write_ready { continue; }                       // <-- hot path
        if self.poll_write(cx)?.is_pending() { return Poll::Ready(Ok(())); }
    }

    with fn can_write_again(&mut self) -> bool { self.body_rx.is_some() }.

    can_write_again() is true for the entire lifetime of a streaming request body (it only checks that a body channel exists, not that a frame is available). After the peer half-closes with no response, the read side is finished (wants_read_again() == false) and poll_write returns Ready without doing any I/O (the write side is winding down; the transport's poll_write is never called again, hence poll_write=1). So write_ready == true, flush_ready == true (a real TCP poll_flush is a no-op that returns Ready), and body_rx.is_some() == true ⇒ wants_write_again == true. The loop therefore never parks: if write_ready { continue; } is taken on every iteration, the for _ in 0..16 budget runs out, and task::yield_now(cx) reschedules the task immediately ⇒ a 100% CPU busy-loop. The code even anticipates this hazard (comment below the continue: "the case of an unbuffered writer where flush is always ready would cause us to hot loop"), but the guard only covers the write_ready == false branch.

  • Profile:

    client::conn::http1::UpgradeableConnection::poll
    └ proto::h1::dispatch::Dispatcher::poll_catch        (poll_loop, inlined)
      ├ proto::h1::io::Buffered::poll_flush
      ├ proto::h1::dispatch::Dispatcher::poll_write
      └ proto::h1::conn::{State::try_keep_alive, Conn::maybe_notify}
        └ tokio::task::waker::wake_by_ref → hyper::common::task::yield_now
    
  • Scope (HTTP/1 vs HTTP/2): the bug is h1-only code (proto::h1::dispatch::poll_loop); the profile is entirely proto::h1. 1.10.0 also separately reworked the h2 body-send path (proto::h2::mod's PipeToSendStream, new buffered_data + RST_STREAM registration); I tried to provoke an analogous spin on the h2 client (open request body, then server RST_STREAMs / closes the connection without responding) and it parks correctly in both cases, so I have no evidence h2 is affected — flagging the rework only so it isn't assumed untouched.

  • Workaround: pin hyper = "=1.9.0".
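The root-cause reasoning above can be checked with a small truth-table model. This is a simplified sketch, not hyper's implementation: `Pass`, `Outcome`, `decide_1_9`, `decide_1_10`, and `passes_before_park` are hypothetical names, the 1.9.0 rule is modeled only from the one-line description in this issue, and the 1.10.0 rule follows the quoted snippet with the elided `// ...` part ignored.

```rust
/// Simplified snapshot of what the loop observes on one pass.
#[derive(Clone, Copy)]
struct Pass {
    body_open: bool,        // stands in for `self.body_rx.is_some()`
    write_ready: bool,      // first `poll_write(cx)?.is_ready()`
    flush_ready: bool,      // `poll_flush(cx)?.is_ready()`
    wants_read_again: bool, // `self.conn.wants_read_again()`
    rewrite_pending: bool,  // second `poll_write(cx)?.is_pending()`
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Park,     // return from poll_loop and wait for a wakeup
    Continue, // go around the loop again
}

/// 1.9.0 as described in this issue: park as soon as the read side
/// does not want to go again.
fn decide_1_9(p: Pass) -> Outcome {
    if p.wants_read_again { Outcome::Continue } else { Outcome::Park }
}

/// 1.10.0 gate, modeled on the quoted snippet.
fn decide_1_10(p: Pass) -> Outcome {
    let wants_write_again = p.body_open && (p.write_ready || p.flush_ready);
    if !(wants_write_again || p.wants_read_again) {
        return Outcome::Park;
    }
    if !p.wants_read_again && wants_write_again {
        if p.write_ready {
            return Outcome::Continue; // the hot path
        }
        if p.rewrite_pending {
            return Outcome::Park; // the existing hot-loop guard
        }
    }
    Outcome::Continue
}

/// Runs up to `budget` passes over an unchanging state. Returns the pass
/// on which the loop parked, or None if the budget ran out (the point
/// where, per this issue, hyper calls `task::yield_now`).
fn passes_before_park(decide: fn(Pass) -> Outcome, p: Pass, budget: usize) -> Option<usize> {
    for n in 1..=budget {
        if decide(p) == Outcome::Park {
            return Some(n);
        }
    }
    None
}
```

For the half-closed state described above (`body_open`, `write_ready`, and `flush_ready` all true, `wants_read_again` false), `decide_1_9` parks on the first pass, while `decide_1_10` continues on every pass and exhausts a budget of 16. Setting `write_ready` to false and `rewrite_pending` to true reaches the guarded branch, which parks as intended.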
