engineering: Multiple http streaming improvements#560
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves HTTP streaming robustness and observability in Trident’s HTTP-backed readers, aiming to avoid self-imposed timeouts on large range downloads while adding throughput monitoring and better logging.
Changes:
- Switched from reqwest total request timeouts to connect-only timeouts and removed per-GET
.timeout(...)for range streaming. - Introduced
HttpDownloadMonitor(aReadwrapper) to track moving-average throughput and emit debug logs on slow downloads. - Hardened
HttpFileReadbehavior and enhanced throughput logging after image streaming.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/trident/src/io_utils/image_streamer.rs | Adds Mbps throughput to the existing “Copied …” debug log. |
| crates/trident/src/io_utils/http/subfile.rs | Adjusts subfile sizing and request construction; improves read error logging detail. |
| crates/trident/src/io_utils/http/mod.rs | Wires in the new download monitor module and re-exports it. |
| crates/trident/src/io_utils/http/file.rs | Uses connect-timeout client, wraps readers with HttpDownloadMonitor, and hardens Read impl behavior. |
| crates/trident/src/io_utils/http/download_monitor.rs | New moving-average speed monitor wrapper with unit tests. |
Comments suppressed due to low confidence (1)
crates/trident/src/io_utils/http/subfile.rs:252
- Removing the per-request
reqwesttimeout means a stalled request (e.g., server accepts TCP but never sends headers, or body stops making progress) can block forever;retriable_request_sendercan't enforce its overall timeout ifreq.send()/body streaming never returns. Please add a bounded timeout for at least the response/header phase or a no-progress/read timeout (potentially scaled to the requested range size) so downloads can’t hang indefinitely.
let response = super::retriable_request_sender(
|| {
let mut req = self.client.get(self.url.clone());
if let Some(range) = range.to_header_value_option() {
req = req.header(RANGE, range);
}
if let Some(auth) = &self.authorization {
req = req.header(AUTHORIZATION, auth);
}
req.send()
},
self.timeout,
)?;
|
/AzurePipelines run [GITHUB]-trident-pr-e2e |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/AzurePipelines run [GITHUB]-trident-pr-e2e |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| /// retries may result in additional requests. | ||
| fn read_to_end(&mut self, buf: &mut Vec<u8>) -> IoResult<usize> { | ||
| if self.position >= self.size { | ||
| return Ok(0); |
There was a problem hiding this comment.
No, per read's docs, Ok(0) is the expected response for EOF.
There was a problem hiding this comment.
is position being passed the size really the same as EOF?
There was a problem hiding this comment.
maybe?
if self.position == self.size {
return Ok(0);
if self.position > self.size {
return Err(..);
There was a problem hiding this comment.
Shouldn't self.position > self.size already be caught in seek? That seems like the more natural place to return an error
There was a problem hiding this comment.
i don't know ... was just reacting to the code here :)
There was a problem hiding this comment.
Indeed, seek does cover all out-of-bounds requests. The check here >= and not just == for safety. But I can also see the argument for making this an error.
There was a problem hiding this comment.
From stdlib, the general convention does not appear to be to return errors in this case. Cursor has the same behavior as this where the position is clamped (via max(position, size) ), and Ok(0) is returned for any position >= size.
Same for BufReader. Here they add a debug_assert to check for the condition in debug builds.
There was a problem hiding this comment.
Pull request overview
This PR improves reliability and observability of large HTTP-backed image streaming by adjusting reqwest timeout behavior, adding a Read wrapper to monitor throughput, and hardening HTTP range-reading edge cases.
Changes:
- Replace reqwest total request timeouts with connect-only timeout behavior for HTTP range requests.
- Introduce
ReadMonitorto detect/log slow streaming and wire it into image/ESP streaming paths. - Harden
HttpFile/HttpSubFilebehaviors around EOF, 0-length reads, and error logging.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/trident_api/src/constants.rs | Adds internal param keys for slow-stream reporting threshold/cadence. |
| crates/trident/src/subsystems/esp.rs | Wraps ESP image stream with ReadMonitor during extraction. |
| crates/trident/src/osimage/cosi/mod.rs | Simplifies reader creation (removes redundant boxing). |
| crates/trident/src/lib.rs | Adds default slow-stream threshold/cadence constants. |
| crates/trident/src/io_utils/read_monitor.rs | New ReadMonitor implementation + unit tests. |
| crates/trident/src/io_utils/mod.rs | Exposes the new read_monitor module. |
| crates/trident/src/io_utils/image_streamer.rs | Logs throughput (MB/s) after streaming completes. |
| crates/trident/src/io_utils/http/subfile.rs | Adjusts subfile sizing, removes per-request timeout, improves error formatting. |
| crates/trident/src/io_utils/http/file.rs | Uses connect-timeout client, adds 0-size handling, hardens Read impls. |
| crates/trident/src/engine/storage/image.rs | Wraps OS image readers with ReadMonitor; makes deploy helper generic over Read. |
| crates/trident/src/engine/context/mod.rs | Adds read_monitor_params() to read internal params with defaults. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
/AzurePipelines run [GITHUB]-trident-pr-e2e |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
crates/trident/src/io_utils/http/subfile.rs:257
- Removing the per-request
.timeout(...)means a singlereq.send()can block indefinitely while waiting for response headers (or if the server stalls), and theretriable_request_sendertimeout will not be able to interrupt it because it only checks elapsed time afterrequest_sender()returns.
Suggestion: keep some bounded timeout for the send() phase (headers) and/or implement a progress-based timeout for body streaming (e.g., abort/retry if no bytes are read for N seconds), so the overall timeout argument still provides a hard upper bound on how long a read can hang.
let response = super::retriable_request_sender(
|| {
let mut req = self.client.get(self.url.clone());
if let Some(range) = range.to_header_value_option() {
req = req.header(RANGE, range);
}
if let Some(auth) = &self.authorization {
req = req.header(AUTHORIZATION, auth);
}
req.send()
},
self.timeout,
)?;
|
/AzurePipelines run [GITHUB]-trident-pr-e2e |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Changes
Replace total request timeout with connect-only timeout (
file.rs)error decoding response body: TimedOutbecause reqwest's.timeout()on each GET request applied to the entire transfer including body streaming — not just the connection. On slow or congested links, the body could not be fully received within the timeout window, causing the download to fail.HttpSubFile's retry logic would take over and continue the download, but this would lead to a lot of requests and logs that could be interpreted to mean that trident was failing, but it was just a self-imposed timeout.Client::new()→ClientBuilder::new().connect_timeout(10s)so only connection establishment is bounded, not body streaming..timeout(self.timeout)from individual GET requests inHttpSubFile.retriable_request_senderretry loop still bounds connection/header-level retries.Add
ReadMonitor(read_monitor.rs, new file)Readwrapper that tracks download speed using a moving average over the last 10 reads (ring buffer).debug!logs when speed drops below a configurable threshold (default: 15 Mbps), including progress %, throughput, and estimated time remaining.Wrap
section_reader/complete_readerwithHttpDownloadMonitor(file.rs)HttpDownloadMonitor<HttpSubFile>for automatic slow-download detection during streaming.Use
saturating_subinHttpSubFile::size()(subfile.rs)start > end.Expand error formatting in
HttpSubFile::read()(subfile.rs){e}→{e:?}to show the full error chain (e.g.,reqwest::Error { kind: Body, source: TimedOut }).Log throughput after image streaming (
image_streamer.rs)Harden
HttpFileReadimpl (file.rs)read(): clamp the requested size to remaining bytes and returnOk(0)at EOF, preventing out-of-bounds range requests.read_exact(): returnUnexpectedEofearly if the buffer is larger than the remaining file, instead of issuing a doomed range request.read_to_end(): returnOk(0)at EOF instead of requesting a zero-length range.section_reader()/complete_reader()for efficiency.