
DM-44641: Extend qhttp to allow streaming read of requests #857

Merged: 2 commits merged into main from tickets/DM-44641 on Jun 30, 2024

Conversation

@iagaponenko (Contributor)

No description provided.

@iagaponenko force-pushed the tickets/DM-44641 branch 4 times, most recently from 085e7ba to 958800c, on June 11, 2024 18:00
@fritzm (Contributor) left a comment

This looks good, but one question: in the old mode of read-everything-at-once, the server-side request timeout protected the entire request. It wasn't clear to me whether the timer is now reset at the beginning of each "chunk" of the chained read. If so, a malicious client could theoretically hold on to a connection indefinitely that way. Maybe we don't care, because it would be an odd corner case and we are our own client for qhttp.

Also, had you considered an API where the client specifies the amount wanted in each partial read (like standard library read calls) and gets a byte count back, rather than fixed size chunk buffers? Sometimes that can make for cleaner/simpler code on the client side.

@iagaponenko (Contributor, Author)

Regarding your comment on the change in the timer-use policy: yes, the timer is reset (restarted) before attempting to read each chunk of data (the maximum chunk size is limited by Request::_recordSizeBytes). I thought that having the common server-wide constraint affect all requests (Server::_requestTimeout) might pose a risk of premature timer activation when handling very large inputs sent to the server. I'm not sure I understand the risk of having a client hold on to a connection indefinitely. In the worst-case scenario, the connection couldn't be held longer than:

Server::_requestTimeout * <num-chunks>

Where:

<num-chunks> := <content-length> / Request::_recordSizeBytes

What we may want to implement here is a fine-grained timeout management scheme (sketched after this list), in which:

  • there would be some "global" server-wide timeout to limit the duration of any request (Server::_requestTimeout)
  • and, a much smaller request-level timeout for reading chunks (Request::_chunkReadTimeout)
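For illustration, below is a minimal, self-contained sketch of such a two-level scheme on top of boost::asio. None of the names in it (ChunkedReader, requestTimer, chunkTimer, the concrete timeout values) come from qhttp; they are assumptions standing in for Server::_requestTimeout and the proposed Request::_chunkReadTimeout, and the sketch is not the actual qhttp code.

#include <boost/asio.hpp>
#include <chrono>
#include <memory>
#include <vector>

namespace asio = boost::asio;

// Hypothetical two-level timeout scheme: a long request-wide timer armed once,
// plus a short per-chunk timer restarted before every partial read.
struct ChunkedReader : std::enable_shared_from_this<ChunkedReader> {
    asio::ip::tcp::socket socket;
    asio::steady_timer requestTimer;  // stands in for Server::_requestTimeout
    asio::steady_timer chunkTimer;    // stands in for the proposed Request::_chunkReadTimeout
    std::vector<char> buffer = std::vector<char>(8192);

    explicit ChunkedReader(asio::io_context& io) : socket(io), requestTimer(io), chunkTimer(io) {}

    void start() {
        // Bound the whole request, no matter how many chunks it takes.
        requestTimer.expires_after(std::chrono::minutes(5));
        requestTimer.async_wait([self = shared_from_this()](boost::system::error_code const& ec) {
            if (!ec) self->socket.close();  // fires only if not cancelled
        });
        readChunk();
    }

    void readChunk() {
        // Much shorter bound on each individual chunk read.
        chunkTimer.expires_after(std::chrono::seconds(10));
        chunkTimer.async_wait([self = shared_from_this()](boost::system::error_code const& ec) {
            if (!ec) self->socket.close();
        });
        socket.async_read_some(asio::buffer(buffer),
            [self = shared_from_this()](boost::system::error_code const& ec, std::size_t n) {
                self->chunkTimer.cancel();
                if (ec) return;  // closed, timed out, or EOF
                // ... hand the n bytes just read to the handler, then, when the
                // handler is ready, schedule the next chunk:
                self->readChunk();
            });
    }
};

With that split, a stalled client would be dropped after the per-chunk timeout rather than after Server::_requestTimeout * <num-chunks>, while a slow but legitimate consumer would still be bounded by the request-wide timer.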

Regarding your suggestion to give the request handler an option to specify how much data to read in each chunk: this is a good idea. I can add the optional parameter to:

void Request::readPartialBodyAsync(
    BodyReadCallback onFinished,
    std::size_t bytesToRead = 0);

Getting the number of bytes read is not strictly required since the streaming handlers are supposed to track the progress by monitoring (comparing):

std::size_t Request::contentLengthBytes() const;
std::size_t Request::contentReadBytes() const;

However, it wouldn't be a big problem to add this parameter to the callback function definition:

class Request : public std::enable_shared_from_this<Request> {
public:
    using Ptr = std::shared_ptr<Request>;
    using BodyReadCallback = std::function<void(std::shared_ptr<Request>, std::shared_ptr<Response>, bool, std::size_t)>;
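
To make the intended call pattern concrete, a streaming handler driving this API could look roughly like the sketch below. This is not code from the PR: the qhttp:: namespace qualification, the assumption that the bool callback argument signals success, the Response::sendStatus calls, and the 1 MB chunk size are all illustrative assumptions.

// A sketch only: one way a handler might stream a large request body using
// the API discussed above, tracking progress via contentReadBytes() and
// contentLengthBytes().
void handleUpload(qhttp::Request::Ptr request, qhttp::Response::Ptr response) {
    request->readPartialBodyAsync(
        [](qhttp::Request::Ptr req, qhttp::Response::Ptr resp, bool success, std::size_t bytesRead) {
            if (!success) {
                resp->sendStatus(400);  // the read failed or timed out (assumed API)
                return;
            }
            // ... consume the bytesRead bytes of the current chunk here
            //     (e.g. append to a temporary file or feed a MySQL loader) ...
            if (req->contentReadBytes() < req->contentLengthBytes()) {
                handleUpload(req, resp);  // not everything has arrived yet: ask for the next chunk
            } else {
                resp->sendStatus(200);    // the whole body has been consumed
            }
        },
        1024 * 1024);  // optional: ask for up to 1 MB per chunk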

@iagaponenko (Contributor, Author)

I forgot to mention another important reason for the change in the timer management logic. In the previous version of the code, the reading sequence was completely controlled by the server. The new API allows request handlers to initiate chunk read requests at arbitrary points, whenever the handler is ready to do so. It wouldn't be uncommon to have a handler pulling data from a client, processing the data (possibly using an intermediate file on disk), and writing the data into MySQL, which as we know may be a lengthy operation. Hitting the global server-wide timeout could cause problems there. The new code makes a compromise by only timing the data reads, not the whole duration of a request.

@fritzm (Contributor) commented Jun 29, 2024

Well, the global server-side request timeout protects the server from a client that sends in a valid header and then never sends the full number of promised bytes. Without that timeout protection, each such stalled request consumes a socket and the associated server-side data structures indefinitely.

@fritzm (Contributor) commented Jun 29, 2024

Re. the read request length argument, I only wanted to raise the possibility in case it had been overlooked and was something you thought might work out better. Leaving this as-is is also fine; both will work, and I trust your judgement here.

Refactored request handling flow of qhttp
Added unit tests for the new API.
The pre-commit version of the code was using std::istream_iterator
which truncated line termination sequences. The new code uses
std::istreambuf_iterator.
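
As an aside, the iterator difference noted in the commit message is easy to demonstrate. The standalone example below is not from the PR; it only shows why std::istream_iterator loses the line terminators while std::istreambuf_iterator preserves them.

#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

int main() {
    // std::istream_iterator<char> uses formatted extraction (operator>>),
    // which skips whitespace by default, so CR/LF sequences disappear.
    std::istringstream in1("line1\r\nline2\r\n");
    std::string lossy{std::istream_iterator<char>(in1), std::istream_iterator<char>()};
    // lossy == "line1line2" (10 characters)

    // std::istreambuf_iterator<char> reads the underlying stream buffer
    // character by character, preserving the body byte for byte.
    std::istringstream in2("line1\r\nline2\r\n");
    std::string exact{std::istreambuf_iterator<char>(in2), std::istreambuf_iterator<char>()};
    // exact == "line1\r\nline2\r\n" (14 characters)

    std::cout << lossy.size() << " vs " << exact.size() << "\n";  // prints "10 vs 14"
    return 0;
}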
@iagaponenko merged commit 9d18e74 into main on Jun 30, 2024
15 checks passed
@iagaponenko deleted the tickets/DM-44641 branch on June 30, 2024 02:26