Skip to content

Slower client can cause h1 server to disconnect without flushing Full body #4022

@deven96

Description

@deven96

Version

hyper 0.14.x -> 1.8.x, h1

Platform

Linux 6.12.67 aarch64 GNU/Linux

Summary

Focusing on a recurrent issue in production that happens from 0.14.x up to the latest 1.8.x upgrade where responses end up as 200 but the actual body length seems not to match the headers that were sent and the server shuts down the connection.

It's a bit difficult to run strace on all the important syscalls as this issue dissapears with that, however by focusing on recvfrom,sendto,shutdown,epoll_ctl I'm able to capture a request that exhibits this behavior and I see the following

// Request read begins at this point ✅ 
1977315 1770641958.064243 recvfrom(20, "POST /transform HTTP/1.1 ...", 8192,...) = 1027
1977315 1770641958.067062 recvfrom(20, "\xff\xd8...", 16384,...) = 16384
...
1977315 1770641958.631908 recvfrom(20, "....", 57031, ....) = 62
// Request body is complete at this point ✅
....
// Response write begins here ✅ 
1977315 1770641958.930401 sendto(20, "HTTP/1.1 200 OK\r\nContent-Length: 14896092\r\n ...", 14896450,...) = 219264
1977315 1770641958.930539 shutdown(20, SHUT_WR) = 0 // Socket shutdown just 138 microseconds after first chunk was sent! ⚠️ 
1977315 1770641958.930601 epoll_ctl(8, EPOLL_CTL_DEL, 20, NULL) = 0 

We didn't have any shutdowns that quick and so I did a bit of digging into hyper and realized that if all the right pieces aligned, we could get into a state where a h1 server shuts down a connection that still has data that hasn't yet been flushed.
The issue resides within the logic of the poll_loop and the step by step looks something like this

  • Socket default size 212_922
  • Request comes in and gets read fully
  • A body (X) larger than the socket size gets created and passed to buffer with Full::new(Bytes::new())
  • On poll_write, Full provides the data to internal buffer at once which means encoder.is_eof() is equals true
  • State is transitioned to Writing::Closed
  • Within poll_flush loop
  • First loop attempt to write X to socket but only successfully writes 212_922
  • Second loop gets pending as reader is slow and socket is blocked
  • poll_flush returns Pending
  • poll_loop ignores result from poll_flush. Sees !wants_read_again() as request has been fully read and exits
  • is_done() returns true as read is done and write is closed

Code Sample

Expected Behavior

Connection stays open until entier body is flushed

Actual Behavior

Connection gets closed once the socket is filled up.. leaving the rest of Full body

Additional Context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-bugCategory: bug. Something is wrong. This is bad!

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions