engineering: Multiple http streaming improvements #560

Merged: frhuelsz merged 12 commits into main from user/frhuelsz/download-timeouts on Mar 24, 2026

@frhuelsz
Contributor

@frhuelsz frhuelsz commented Mar 14, 2026

Changes

  1. Replace total request timeout with connect-only timeout (file.rs)

    • Large HTTP range requests (multi-GB) were failing with error decoding response body: TimedOut because reqwest's .timeout() on each GET request applied to the entire transfer, including body streaming, not just connection establishment. On slow or congested links, the body could not be fully received within the timeout window, so the read failed. HttpSubFile's retry logic would then resume the download, but the extra requests and log entries could be misread as Trident failing when the only cause was a self-imposed timeout.
    • Client::new() → ClientBuilder::new().connect_timeout(10s), so only connection establishment is bounded, not body streaming.
    • Removed .timeout(self.timeout) from individual GET requests in HttpSubFile.
    • The existing retriable_request_sender retry loop still bounds connection/header-level retries.
  2. Add ReadMonitor (read_monitor.rs, new file)

    • Generic Read wrapper that tracks download speed using a moving average over the last 10 reads (ring buffer).
    • Emits debug! logs when speed drops below a configurable threshold (default: 15 Mbps), including progress %, throughput, and estimated time remaining.
    • Configurable reporting cadence to avoid log spam (default: at least 10 seconds between reports).
  3. Wrap section_reader/complete_reader with HttpDownloadMonitor (file.rs)

    • Both methods now return HttpDownloadMonitor<HttpSubFile> for automatic slow-download detection during streaming.
  4. Use saturating_sub in HttpSubFile::size() (subfile.rs)

    • Prevents underflow panic if start > end.
  5. Expand error formatting in HttpSubFile::read() (subfile.rs)

    • Changed {e} → {e:?} to show the full error chain (e.g., reqwest::Error { kind: Body, source: TimedOut }).
  6. Log throughput after image streaming (image_streamer.rs)

    • Added Mbps to the existing "Copied ... in N seconds" debug message.
  7. Harden HttpFile Read impl (file.rs)

    • read(): clamp the requested size to remaining bytes and return Ok(0) at EOF, preventing out-of-bounds range requests.
    • read_exact(): return UnexpectedEof early if the buffer is larger than the remaining file, instead of issuing a doomed range request.
    • read_to_end(): return Ok(0) at EOF instead of requesting a zero-length range.
    • Added doc comments clarifying that each call results in a new HTTP request and recommending section_reader()/complete_reader() for efficiency.
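The moving-average monitor described in item 2 can be sketched with the standard library alone. This is an illustrative approximation, not the PR's code: the type name `SpeedMonitor`, its fields, and the `record`/`mbps` methods are all hypothetical, and the window stores per-read byte counts and durations rather than whatever the real ring buffer holds.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};
use std::time::{Duration, Instant};

/// Hypothetical throughput-tracking `Read` wrapper (names are illustrative).
pub struct SpeedMonitor<R> {
    inner: R,
    /// (bytes, elapsed) for the most recent reads; oldest sample at the front.
    window: VecDeque<(usize, Duration)>,
    capacity: usize,
    total_read: u64,
}

impl<R: Read> SpeedMonitor<R> {
    pub fn new(inner: R, capacity: usize) -> Self {
        Self {
            inner,
            window: VecDeque::with_capacity(capacity),
            capacity: capacity.max(1), // a window of zero samples is meaningless
            total_read: 0,
        }
    }

    /// Records one read sample, evicting the oldest once the window is full.
    pub fn record(&mut self, bytes: usize, elapsed: Duration) {
        if self.window.len() == self.capacity {
            self.window.pop_front();
        }
        self.window.push_back((bytes, elapsed));
        self.total_read += bytes as u64;
    }

    /// Moving-average throughput in megabits per second over the window,
    /// or `None` when no time has been recorded yet.
    pub fn mbps(&self) -> Option<f64> {
        let bytes: usize = self.window.iter().map(|(b, _)| *b).sum();
        let secs: f64 = self.window.iter().map(|(_, d)| d.as_secs_f64()).sum();
        if secs > 0.0 {
            Some(bytes as f64 * 8.0 / 1_000_000.0 / secs)
        } else {
            None
        }
    }

    pub fn total_read(&self) -> u64 {
        self.total_read
    }
}

impl<R: Read> Read for SpeedMonitor<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Time the inner read and feed the sample into the window.
        let start = Instant::now();
        let n = self.inner.read(buf)?;
        self.record(n, start.elapsed());
        Ok(n)
    }
}
```

Averaging over a small window rather than the whole transfer lets a recent slowdown show up quickly, while a single slow read does not dominate the estimate.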

@frhuelsz frhuelsz requested a review from a team as a code owner March 14, 2026 21:49
Copilot AI review requested due to automatic review settings March 14, 2026 21:49
Contributor

Copilot AI left a comment

Pull request overview

This PR improves HTTP streaming robustness and observability in Trident’s HTTP-backed readers, aiming to avoid self-imposed timeouts on large range downloads while adding throughput monitoring and better logging.

Changes:

  • Switched from reqwest total request timeouts to connect-only timeouts and removed per-GET .timeout(...) for range streaming.
  • Introduced HttpDownloadMonitor (a Read wrapper) to track moving-average throughput and emit debug logs on slow downloads.
  • Hardened HttpFile Read behavior and enhanced throughput logging after image streaming.
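The throughput figures and the rate-limited slow-download logging summarized above come down to a few small calculations. The sketch below is an assumption-laden illustration: `throughput_mbps`, `eta`, and `SlowReportGate` are hypothetical names, and the real monitor may compute or gate its reports differently.

```rust
use std::time::{Duration, Instant};

/// Megabits per second for `bytes` moved in `elapsed` (0.0 if no time elapsed).
pub fn throughput_mbps(bytes: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        return 0.0;
    }
    bytes as f64 * 8.0 / 1_000_000.0 / secs
}

/// Estimated time remaining at the current rate, or `None` when the rate is
/// not positive or the estimate does not fit in a `Duration`.
pub fn eta(remaining_bytes: u64, mbps: f64) -> Option<Duration> {
    if mbps <= 0.0 {
        return None;
    }
    let secs = remaining_bytes as f64 * 8.0 / 1_000_000.0 / mbps;
    Duration::try_from_secs_f64(secs).ok()
}

/// Hypothetical gate: report only when the speed is below the threshold AND
/// at least `cadence` has passed since the previous report.
pub struct SlowReportGate {
    threshold_mbps: f64,
    cadence: Duration,
    last_report: Option<Instant>,
}

impl SlowReportGate {
    pub fn new(threshold_mbps: f64, cadence: Duration) -> Self {
        Self { threshold_mbps, cadence, last_report: None }
    }

    pub fn should_report(&mut self, mbps: f64, now: Instant) -> bool {
        if mbps >= self.threshold_mbps {
            return false;
        }
        let due = match self.last_report {
            None => true,
            Some(previous) => now.saturating_duration_since(previous) >= self.cadence,
        };
        if due {
            self.last_report = Some(now);
        }
        due
    }
}
```

Passing `now` in explicitly, instead of calling `Instant::now()` inside the gate, keeps the cadence logic deterministic under test.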

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| crates/trident/src/io_utils/image_streamer.rs | Adds Mbps throughput to the existing “Copied …” debug log. |
| crates/trident/src/io_utils/http/subfile.rs | Adjusts subfile sizing and request construction; improves read error logging detail. |
| crates/trident/src/io_utils/http/mod.rs | Wires in the new download monitor module and re-exports it. |
| crates/trident/src/io_utils/http/file.rs | Uses connect-timeout client, wraps readers with HttpDownloadMonitor, and hardens Read impl behavior. |
| crates/trident/src/io_utils/http/download_monitor.rs | New moving-average speed monitor wrapper with unit tests. |
Comments suppressed due to low confidence (1)

crates/trident/src/io_utils/http/subfile.rs:252

  • Removing the per-request reqwest timeout means a stalled request (e.g., server accepts TCP but never sends headers, or body stops making progress) can block forever; retriable_request_sender can't enforce its overall timeout if req.send()/body streaming never returns. Please add a bounded timeout for at least the response/header phase or a no-progress/read timeout (potentially scaled to the requested range size) so downloads can’t hang indefinitely.
        let response = super::retriable_request_sender(
            || {
                let mut req = self.client.get(self.url.clone());

                if let Some(range) = range.to_header_value_option() {
                    req = req.header(RANGE, range);
                }

                if let Some(auth) = &self.authorization {
                    req = req.header(AUTHORIZATION, auth);
                }

                req.send()
            },
            self.timeout,
        )?;
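One way to read the "scaled to the requested range size" suggestion is a timeout that grows with the number of bytes requested. The helper below is only a sketch of that idea: `scaled_timeout` and its parameters are hypothetical, and the PR as described does not add such a function.

```rust
use std::time::Duration;

/// Hypothetical helper: a fixed allowance for connection and headers (`base`)
/// plus the time needed to move `range_len` bytes at the slowest rate that is
/// still considered acceptable (`min_bytes_per_sec`).
pub fn scaled_timeout(range_len: u64, min_bytes_per_sec: u64, base: Duration) -> Duration {
    // Treat a zero rate as 1 B/s so the division below cannot panic.
    let rate = min_bytes_per_sec.max(1);
    // Round up so that a partial second of transfer still gets a full second.
    let body_secs = range_len / rate + u64::from(range_len % rate != 0);
    base.saturating_add(Duration::from_secs(body_secs))
}
```

With a 10 second base and a floor of 1 MiB/s, a 4 GiB range would get 4106 seconds, while a zero-length range would still get the 10 second base.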

Comment thread crates/trident/src/io_utils/http/file.rs
Comment thread crates/trident/src/io_utils/image_streamer.rs
Comment thread crates/trident/src/io_utils/http/subfile.rs
Comment thread crates/trident/src/io_utils/http/file.rs
Comment thread crates/trident/src/io_utils/http/file.rs
@frhuelsz
Contributor Author

/AzurePipelines run [GITHUB]-trident-pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@frhuelsz
Contributor Author

/AzurePipelines run [GITHUB]-trident-pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread crates/trident/src/io_utils/image_streamer.rs Outdated
Comment thread crates/trident/src/io_utils/http/subfile.rs
    /// retries may result in additional requests.
    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> IoResult<usize> {
        if self.position >= self.size {
            return Ok(0);
Member

should this be error?

Member

same question for read()

Contributor Author

No, per read's docs, Ok(0) is the expected response for EOF.

Member

is position being past the size really the same as EOF?

Member

maybe?

    if self.position == self.size {
        return Ok(0);
    }
    if self.position > self.size {
        return Err(..);
    }

Contributor

@fintelia fintelia Mar 18, 2026

Shouldn't self.position > self.size already be caught in seek? That seems like the more natural place to return an error

Member

i don't know ... was just reacting to the code here :)

Contributor Author

Indeed, seek does cover all out-of-bounds requests. The check here is >= and not just == for safety. But I can also see the argument for making this an error.

Contributor Author

@frhuelsz frhuelsz Mar 23, 2026

In the standard library, the general convention does not appear to be to return an error in this case. Cursor behaves the same way: the position is clamped (via min(position, len)), and Ok(0) is returned for any position >= size.

Same for BufReader. Here they add a debug_assert to check for the condition in debug builds.
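The Cursor behavior mentioned above can be checked with the standard library alone. The helper name `read_at` is made up for this illustration; the `Cursor` calls themselves are standard `std::io` APIs.

```rust
use std::io::{Cursor, Read};

/// Reads once from a `Cursor` over `data` after moving it to `pos`,
/// returning how many bytes the read produced.
pub fn read_at(data: &[u8], pos: u64) -> std::io::Result<usize> {
    let mut cursor = Cursor::new(data);
    // `set_position` accepts any value, including one past the end.
    cursor.set_position(pos);
    let mut buf = [0u8; 8];
    cursor.read(&mut buf)
}
```

For a four-byte slice, a read at position 0 yields 4 bytes, while reads at position 4 and at position 100 both yield `Ok(0)` rather than an error.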

fintelia previously approved these changes Mar 23, 2026
Copilot AI review requested due to automatic review settings March 23, 2026 23:49
Contributor

Copilot AI left a comment

Pull request overview

This PR improves reliability and observability of large HTTP-backed image streaming by adjusting reqwest timeout behavior, adding a Read wrapper to monitor throughput, and hardening HTTP range-reading edge cases.

Changes:

  • Replace reqwest total request timeouts with connect-only timeout behavior for HTTP range requests.
  • Introduce ReadMonitor to detect/log slow streaming and wire it into image/ESP streaming paths.
  • Harden HttpFile/HttpSubFile behaviors around EOF, 0-length reads, and error logging.
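The EOF and zero-length hardening can be illustrated with an in-memory stand-in for a ranged remote file. Everything here is hypothetical: `RangedFile`, its `requests` counter, and the copy from a `Vec<u8>` only simulate what an HTTP range request would do, and the real HttpFile code is not reproduced.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a remote file read via range requests.
pub struct RangedFile {
    data: Vec<u8>, // simulates the remote object
    position: u64,
    /// Number of simulated range requests issued so far.
    pub requests: usize,
}

impl RangedFile {
    pub fn new(data: Vec<u8>) -> Self {
        Self { data, position: 0, requests: 0 }
    }

    pub fn set_position(&mut self, position: u64) {
        self.position = position;
    }

    /// Bytes left before EOF; saturates to 0 if the position is past the end.
    fn remaining(&self) -> u64 {
        (self.data.len() as u64).saturating_sub(self.position)
    }
}

impl Read for RangedFile {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Clamp the request to what is left, and issue nothing at EOF.
        let len = (buf.len() as u64).min(self.remaining()) as usize;
        if len == 0 {
            return Ok(0);
        }
        self.requests += 1;
        let start = self.position as usize;
        buf[..len].copy_from_slice(&self.data[start..start + len]);
        self.position += len as u64;
        Ok(len)
    }

    fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()> {
        // Fail early instead of issuing a request that cannot be satisfied.
        if buf.len() as u64 > self.remaining() {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "buffer is larger than the remaining bytes",
            ));
        }
        let n = self.read(buf)?;
        debug_assert_eq!(n, buf.len());
        Ok(())
    }
}
```

The `requests` counter makes the point of the hardening visible: reads at or past EOF and oversized `read_exact` calls return without touching the simulated network.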

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| crates/trident_api/src/constants.rs | Adds internal param keys for slow-stream reporting threshold/cadence. |
| crates/trident/src/subsystems/esp.rs | Wraps ESP image stream with ReadMonitor during extraction. |
| crates/trident/src/osimage/cosi/mod.rs | Simplifies reader creation (removes redundant boxing). |
| crates/trident/src/lib.rs | Adds default slow-stream threshold/cadence constants. |
| crates/trident/src/io_utils/read_monitor.rs | New ReadMonitor implementation + unit tests. |
| crates/trident/src/io_utils/mod.rs | Exposes the new read_monitor module. |
| crates/trident/src/io_utils/image_streamer.rs | Logs throughput (MB/s) after streaming completes. |
| crates/trident/src/io_utils/http/subfile.rs | Adjusts subfile sizing, removes per-request timeout, improves error formatting. |
| crates/trident/src/io_utils/http/file.rs | Uses connect-timeout client, adds 0-size handling, hardens Read impls. |
| crates/trident/src/engine/storage/image.rs | Wraps OS image readers with ReadMonitor; makes deploy helper generic over Read. |
| crates/trident/src/engine/context/mod.rs | Adds read_monitor_params() to read internal params with defaults. |

Comment thread crates/trident/src/lib.rs
Comment thread crates/trident/src/io_utils/http/file.rs Outdated
Comment thread crates/trident/src/io_utils/http/file.rs Outdated
Comment thread crates/trident/src/io_utils/http/subfile.rs
Comment thread crates/trident_api/src/constants.rs Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 24, 2026 00:02
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@frhuelsz
Contributor Author

/AzurePipelines run [GITHUB]-trident-pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Comment thread crates/trident_api/src/constants.rs
Comment thread crates/trident/src/lib.rs Outdated
Comment thread crates/trident/src/io_utils/http/file.rs Outdated
Comment thread crates/trident/src/engine/context/mod.rs
Comment thread crates/trident/src/io_utils/read_monitor.rs
Copilot AI review requested due to automatic review settings March 24, 2026 00:16
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

crates/trident/src/io_utils/http/subfile.rs:257

  • Removing the per-request .timeout(...) means a single req.send() can block indefinitely while waiting for response headers (or if the server stalls), and the retriable_request_sender timeout will not be able to interrupt it because it only checks elapsed time after request_sender() returns.

Suggestion: keep some bounded timeout for the send() phase (headers) and/or implement a progress-based timeout for body streaming (e.g., abort/retry if no bytes are read for N seconds), so the overall timeout argument still provides a hard upper bound on how long a read can hang.

        let response = super::retriable_request_sender(
            || {
                let mut req = self.client.get(self.url.clone());

                if let Some(range) = range.to_header_value_option() {
                    req = req.header(RANGE, range);
                }

                if let Some(auth) = &self.authorization {
                    req = req.header(AUTHORIZATION, auth);
                }

                req.send()
            },
            self.timeout,
        )?;
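The progress-based timeout suggested here ("abort/retry if no bytes are read for N seconds") can be modeled as a small state machine. This is a sketch under assumptions: `StallDetector` is a hypothetical type that does not exist in the PR, and it can only flag a stall between reads that return. A read that blocks forever would still need a timeout enforced at the socket or client level.

```rust
use std::time::{Duration, Instant};

/// Hypothetical no-progress detector: stalled once `limit` has passed
/// without any read returning at least one byte.
pub struct StallDetector {
    limit: Duration,
    last_progress: Instant,
}

impl StallDetector {
    pub fn new(limit: Duration, now: Instant) -> Self {
        Self { limit, last_progress: now }
    }

    /// Call after every read; only non-empty reads count as progress.
    pub fn record(&mut self, bytes_read: usize, now: Instant) {
        if bytes_read > 0 {
            self.last_progress = now;
        }
    }

    /// True when no progress has been seen for at least `limit`.
    pub fn is_stalled(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_progress) >= self.limit
    }
}
```

Unlike a total-request timeout, this bound does not depend on the size of the range: a slow but steadily progressing multi-GB download never trips it.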

Comment thread crates/trident/src/lib.rs Outdated
Comment thread crates/trident/src/io_utils/image_streamer.rs
@frhuelsz
Contributor Author

/AzurePipelines run [GITHUB]-trident-pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@frhuelsz frhuelsz merged commit b4c759e into main Mar 24, 2026
92 checks passed
4 participants