
DM-44641: Extend qhttp to allow streaming read of requests #857

Merged: 2 commits merged into main from tickets/DM-44641 on Jun 30, 2024

Conversation

@iagaponenko (Contributor)

No description provided.

@iagaponenko force-pushed the tickets/DM-44641 branch 4 times, most recently from 085e7ba to 958800c, on June 11, 2024 18:00
@fritzm (Contributor) left a comment

This looks good, but one question: in the old mode of read-everything-at-once, the server-side request timeout protected the entire request. It wasn't clear to me whether the timer is now reset at the beginning of each "chunk" of the chained read. If so, a malicious client could theoretically hold on to a connection indefinitely that way. Maybe we don't care, because it would be an odd corner case and we are our own client for qhttp.

Also, had you considered an API where the client specifies the amount wanted in each partial read (like standard library read calls) and gets a byte count back, rather than fixed size chunk buffers? Sometimes that can make for cleaner/simpler code on the client side.

@iagaponenko (Contributor, Author)

Regarding your comment on the change in the timer-use policy: yes, the timer is reset (restarted) before attempting to read each chunk of data (the maximum chunk size is limited by Request::_recordSizeBytes). I thought that having the common server-wide constraint affect all requests (Server::_requestTimeout) might pose a risk of premature timer activation when handling very large inputs sent to the server. I'm not sure I understand the risk of having a client hold on to a connection indefinitely. In the worst-case scenario, the connection couldn't be held longer than:

Server::_requestTimeout * <num-chunks>

Where:

<num-chunks> := <content-length> / Request::_recordSizeBytes

What we may want to implement here is a fine-grained timeout management scheme (sketched after this list), in which:

  • there would be some "global" server-wide timeout to limit the duration of any request (Server::_requestTimeout)
  • and, a much smaller request-level timeout for reading chunks (Request::_chunkReadTimeout)
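For illustration, below is a minimal, self-contained sketch of such a two-level scheme on top of boost::asio. None of the names in it (ChunkedReader, requestTimer, chunkTimer, the concrete timeout values) come from qhttp; they are assumptions standing in for Server::_requestTimeout and the proposed Request::_chunkReadTimeout, and the sketch is not the actual qhttp code.

#include <boost/asio.hpp>
#include <chrono>
#include <memory>
#include <vector>

namespace asio = boost::asio;

// Hypothetical two-level timeout scheme: a long request-wide timer armed once,
// plus a short per-chunk timer restarted before every partial read.
struct ChunkedReader : std::enable_shared_from_this<ChunkedReader> {
    asio::ip::tcp::socket socket;
    asio::steady_timer requestTimer;  // stands in for Server::_requestTimeout
    asio::steady_timer chunkTimer;    // stands in for the proposed Request::_chunkReadTimeout
    std::vector<char> buffer = std::vector<char>(8192);

    explicit ChunkedReader(asio::io_context& io) : socket(io), requestTimer(io), chunkTimer(io) {}

    void start() {
        // Bound the whole request, no matter how many chunks it takes.
        requestTimer.expires_after(std::chrono::minutes(5));
        requestTimer.async_wait([self = shared_from_this()](boost::system::error_code const& ec) {
            if (!ec) self->socket.close();  // fires only if not cancelled
        });
        readChunk();
    }

    void readChunk() {
        // Much shorter bound on each individual chunk read.
        chunkTimer.expires_after(std::chrono::seconds(10));
        chunkTimer.async_wait([self = shared_from_this()](boost::system::error_code const& ec) {
            if (!ec) self->socket.close();
        });
        socket.async_read_some(asio::buffer(buffer),
            [self = shared_from_this()](boost::system::error_code const& ec, std::size_t n) {
                self->chunkTimer.cancel();
                if (ec) return;  // closed, timed out, or EOF
                // ... hand the n bytes just read to the handler, then, when the
                // handler is ready, schedule the next chunk:
                self->readChunk();
            });
    }
};

With that split, a stalled client would be dropped after the per-chunk timeout rather than after Server::_requestTimeout * <num-chunks>, while a slow but legitimate consumer would still be bounded by the request-wide timer.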

Regarding your suggestion to give the request handler an option to specify how much data to read in each chunk: this is a good idea. I can add the optional parameter to:

void Request::readPartialBodyAsync(
    BodyReadCallback onFinished,
    std::size_t bytesToRead = 0);

Getting the number of bytes read is not strictly required since the streaming handlers are supposed to track the progress by monitoring (comparing):

std::size_t Request::contentLengthBytes() const;
std::size_t Request::contentReadBytes() const;

However, it wouldn't be a big problem to add this parameter to the callback function definition:

class Request : public std::enable_shared_from_this<Request> {
public:
    using Ptr = std::shared_ptr<Request>;
    using BodyReadCallback = std::function<void(std::shared_ptr<Request>, std::shared_ptr<Response>, bool, std::size_t)>;
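
To make the intended call pattern concrete, a streaming handler driving this API could look roughly like the sketch below. This is not code from the PR: the qhttp:: namespace qualification, the assumption that the bool callback argument signals success, the Response::sendStatus calls, and the 1 MB chunk size are all illustrative assumptions.

// A sketch only: one way a handler might stream a large request body using
// the API discussed above, tracking progress via contentReadBytes() and
// contentLengthBytes().
void handleUpload(qhttp::Request::Ptr request, qhttp::Response::Ptr response) {
    request->readPartialBodyAsync(
        [](qhttp::Request::Ptr req, qhttp::Response::Ptr resp, bool success, std::size_t bytesRead) {
            if (!success) {
                resp->sendStatus(400);  // the read failed or timed out (assumed API)
                return;
            }
            // ... consume the bytesRead bytes of the current chunk here
            //     (e.g. append to a temporary file or feed a MySQL loader) ...
            if (req->contentReadBytes() < req->contentLengthBytes()) {
                handleUpload(req, resp);  // not everything has arrived yet: ask for the next chunk
            } else {
                resp->sendStatus(200);    // the whole body has been consumed
            }
        },
        1024 * 1024);  // optional: ask for up to 1 MB per chunk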

@iagaponenko (Contributor, Author)

I forgot to mention another important reason for the change in the timer management logic. In the previous version of the code, the reading sequence was completely controlled by the server. The new API allows request handlers to initiate chunk read requests at arbitrary points, whenever the handler is ready to do so. It wouldn't be uncommon to have a handler pulling data from a client, processing the data (possibly using an intermediate file on disk), and writing the data into MySQL, which as we know may be a lengthy operation. Hitting the global server-wide timeout could cause problems there. The new code makes a compromise by only timing the data reads, not the whole duration of a request.

@fritzm (Contributor) commented Jun 29, 2024

Well, the global server-side request timeout protects the server from a client that sends in a valid header and then never sends the full number of promised bytes. Without that timeout protection, each such stalled request consumes a socket and the associated server-side data structures indefinitely.

@fritzm (Contributor) commented Jun 29, 2024

Re. the read request length argument, I only wanted to raise the possibility in case it had been overlooked and was something you thought might work out better. Leaving this as-is is also fine; both will work, and I trust your judgement here.

Refactored request handling flow of qhttp
Added unit tests for the new API.
The pre-commit version of the code was using std::istream_iterator
which truncated line termination sequences. The new code uses
std::istreambuf_iterator.
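
As an aside, the iterator difference noted in the commit message is easy to demonstrate. The standalone example below is not from the PR; it only shows why std::istream_iterator loses the line terminators while std::istreambuf_iterator preserves them.

#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

int main() {
    // std::istream_iterator<char> uses formatted extraction (operator>>),
    // which skips whitespace by default, so CR/LF sequences disappear.
    std::istringstream in1("line1\r\nline2\r\n");
    std::string lossy{std::istream_iterator<char>(in1), std::istream_iterator<char>()};
    // lossy == "line1line2" (10 characters)

    // std::istreambuf_iterator<char> reads the underlying stream buffer
    // character by character, preserving the body byte for byte.
    std::istringstream in2("line1\r\nline2\r\n");
    std::string exact{std::istreambuf_iterator<char>(in2), std::istreambuf_iterator<char>()};
    // exact == "line1\r\nline2\r\n" (14 characters)

    std::cout << lossy.size() << " vs " << exact.size() << "\n";  // prints "10 vs 14"
    return 0;
}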
@iagaponenko merged commit 9d18e74 into main on Jun 30, 2024
15 checks passed
@iagaponenko deleted the tickets/DM-44641 branch on June 30, 2024 02:26