Fix daemon client timeout and improve network timeout UX#39
Conversation
Increase daemon client read timeout from 500ms to 30s so commands like screenshot that take >500ms no longer time out when routed through the daemon. Also bump default --network-timeout from 5s to 10s, and add timeout_reached/hint fields to network JSON output so users know when results may be truncated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis change addresses daemon client timeout issues and improves network timeout UX. It increases the default navigation network timeout from 5 to 10 seconds, extends network event handling to track timeout occurrence, propagates timeout information through the call chain to summary generation, increases the daemon server client TCP read timeout from 500ms to 30 seconds, and adds an end-to-end test for daemon screenshot functionality. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR addresses daemon-mode command timeouts (notably screenshot) by increasing the daemon’s client socket read timeout, and improves navigate --with-network UX by increasing the default network collection timeout and exposing timeout truncation signals in JSON output.
Changes:
- Increase daemon client read timeout from 500ms to 30s to prevent daemon-mode operations from timing out.
- Bump
--network-timeoutdefault from 5s to 10s and addtimeout_reached+ conditionalhintin network summaries. - Add a daemon parity e2e test for
screenshotand unit tests forbuild_network_summarytimeout behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| kb/iterations/iteration-38-daemon-client-timeout.md | Iteration plan/notes documenting the timeout chain and proposed fixes. |
| crates/ff-rdp-cli/tests/daemon_parity_e2e_test.rs | Adds daemon-mode screenshot e2e test using the mock RDP server. |
| crates/ff-rdp-cli/src/daemon/server.rs | Raises daemon per-client socket read timeout to 30s. |
| crates/ff-rdp-cli/src/commands/network.rs | Extends network summary to include timeout_reached and optional hint, updates tests/callers. |
| crates/ff-rdp-cli/src/commands/network_events.rs | Adds timeout_reached return value to the timed drain API. |
| crates/ff-rdp-cli/src/commands/navigate.rs | Threads timeout_reached through navigate-with-network output pipeline. |
| crates/ff-rdp-cli/src/cli/args.rs | Changes default --network-timeout from 5000ms to 10000ms. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| last_recv_was_event = false; | ||
| match transport.recv() { | ||
| Ok(msg) => { | ||
| last_recv_was_event = true; | ||
| let msg_type = msg.get("type").and_then(Value::as_str).unwrap_or_default(); |
There was a problem hiding this comment.
In drain_network_events_timed, last_recv_was_event is set to true for any received message, even if msg_type is not a network resource event. This can incorrectly mark timeout_reached as true due to unrelated traffic (e.g., navigateTo ack or console events), causing misleading output/hints. Consider only setting the flag when the received message is a resources-available-array or resources-updated-array (or rename the flag to reflect “any message”).
| if use_detail { | ||
| let controls = OutputControls::from_cli(cli, SortDir::Desc); | ||
| let mut detail = network_entries; | ||
| if cli.sort.is_none() { | ||
| let dir = controls.sort_dir; | ||
| detail.sort_by(|a, b| { | ||
| let da = a["duration_ms"].as_f64().unwrap_or(0.0); | ||
| let db = b["duration_ms"].as_f64().unwrap_or(0.0); | ||
| let cmp = da.partial_cmp(&db).unwrap_or(std::cmp::Ordering::Equal); | ||
| match dir { | ||
| SortDir::Asc => cmp, | ||
| SortDir::Desc => cmp.reverse(), | ||
| } | ||
| }); | ||
| } else { | ||
| controls.apply_sort(&mut detail); | ||
| } | ||
| let (limited, total, truncated) = controls.apply_limit(detail, Some(20)); | ||
| let limited = controls.apply_fields(limited); | ||
| if truncated { | ||
| let shown = limited.len(); | ||
| json!({ | ||
| "entries": limited, | ||
| "shown": shown, | ||
| "total": total, | ||
| "truncated": true, | ||
| "hint": format!("showing {shown} of {total}, use --all for complete list"), | ||
| }) | ||
| } else { | ||
| json!(limited) | ||
| } | ||
| } else { | ||
| super::network::build_network_summary(&network_entries) | ||
| super::network::build_network_summary(&network_entries, timeout_reached) | ||
| } |
There was a problem hiding this comment.
apply_network_controls forwards timeout_reached only into the summary path. In detail mode (array output, or the {entries, shown, ...} truncation object) the timeout_reached signal and timeout hint are lost, which conflicts with the PR goal of making truncation detectable for users/LLMs. Consider surfacing timeout_reached (and hint when true) alongside detail results as well (e.g., wrap detail output in an object that includes the entries plus these fields, or add separate fields at the parent navigate result level).
| // Kill daemon before asserting so cleanup always happens, even on panic. | ||
| daemon.kill(); | ||
| let _ = mock_handle.join(); | ||
|
|
||
| // Clean up the PNG file regardless of test outcome. | ||
| let _ = std::fs::remove_file(&out_path); | ||
|
|
||
| assert!( | ||
| output.status.success(), | ||
| "daemon screenshot must succeed; stderr: {}", | ||
| String::from_utf8_lossy(&output.stderr) | ||
| ); | ||
|
|
||
| let json: serde_json::Value = | ||
| serde_json::from_slice(&output.stdout).expect("stdout must be valid JSON"); | ||
|
|
||
| assert_eq!(json["total"], 1, "total should be 1"); | ||
| assert_eq!(json["results"]["width"], 1, "width should match fixture"); | ||
| assert_eq!(json["results"]["height"], 1, "height should match fixture"); | ||
| assert!( | ||
| json["results"]["bytes"].as_u64().unwrap_or(0) > 0, | ||
| "bytes should be non-zero" | ||
| ); | ||
| let path_str = json["results"]["path"].as_str().unwrap_or(""); | ||
| assert!( | ||
| std::path::Path::new(path_str) | ||
| .extension() | ||
| .is_some_and(|ext| ext.eq_ignore_ascii_case("png")), | ||
| "results.path should end with .png; got: {path_str}" | ||
| ); |
There was a problem hiding this comment.
The test deletes out_path before performing assertions, so it never verifies that the screenshot file was actually created/written. It also only checks that results.path ends with .png, not that it matches the requested --output path. Consider asserting the file exists (and optionally has non-zero length / PNG header) and that the returned path equals out_path, then removing it after assertions.
| stream | ||
| .set_read_timeout(Some(Duration::from_millis(500))) | ||
| .set_read_timeout(Some(Duration::from_millis(30_000))) | ||
| .context("setting client read timeout")?; |
There was a problem hiding this comment.
Increasing the per-client socket read timeout to 30s means an idle client handler can block in recv_from for up to 30s before noticing state.shutdown is set. This can noticeably delay daemon shutdown/cleanup while a client is connected but idle. If the goal is to avoid timing out long-running commands while still polling shutdown promptly, consider decoupling the shutdown poll interval from the command/IO timeout (e.g., keep a short poll interval and handle long waits differently, or reduce this to a smaller value).
| let server = screenshot_daemon_server(); | ||
| let mock_port = server.port(); | ||
| let mock_handle = std::thread::spawn(move || server.serve_one()); | ||
|
|
There was a problem hiding this comment.
This new daemon screenshot e2e test doesn’t appear to exercise the original 500ms-timeout failure mode: the mock server replies/followups are sent immediately, and the screenshot fixture payload is tiny. With the previous 500ms read timeout, this test would likely still pass. Consider making the mock server intentionally delay the evaluationResult followup (e.g., sleep >500ms before sending it) so the test would fail on the old behavior and protect against regressions.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/ff-rdp-cli/src/commands/network_events.rs (1)
73-107:⚠️ Potential issue | 🟠 MajorOnly count network watcher frames toward
timeout_reached.
last_recv_was_eventflips totruefor every successfulrecv(), but in daemon mode the same socket can also deliverconsoleAPICall/pageErrorforwarded bycrates/ff-rdp-cli/src/daemon/server.rs:228-233. A late console push would therefore settimeout_reached=trueeven after network traffic finished, which makes the new hint inaccurate. Please set the flag only forresources-available-array/resources-updated-arrayand add a regression test around a non-network frame near the deadline.💡 Suggested fix
- let mut last_recv_was_event = false; + let mut last_recv_was_network_event = false; @@ - last_recv_was_event = false; + last_recv_was_network_event = false; match transport.recv() { Ok(msg) => { - last_recv_was_event = true; let msg_type = msg.get("type").and_then(Value::as_str).unwrap_or_default(); match msg_type { "resources-available-array" => { + last_recv_was_network_event = true; all_resources.extend(parse_network_resources(&msg)); } "resources-updated-array" => { + last_recv_was_network_event = true; all_updates.extend(parse_network_resource_updates(&msg)); } _ => {} } } @@ - Ok((all_resources, all_updates, last_recv_was_event)) + Ok((all_resources, all_updates, last_recv_was_network_event))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/ff-rdp-cli/src/commands/network_events.rs` around lines 73 - 107, The loop currently sets last_recv_was_event = true for any successful transport.recv(), causing non-network frames (e.g., consoleAPICall/pageError) to count as network activity; change the code so last_recv_was_event is set only inside the match arms for "resources-available-array" and "resources-updated-array" (where parse_network_resources / parse_network_resource_updates are called) instead of immediately after Ok(msg), and ensure Err(ProtocolError::Timeout) handling is unchanged; add a regression test that simulates a non-network frame arriving just before the total timeout and asserts that the returned last_recv_was_event (used for timeout_reached) remains false.
🧹 Nitpick comments (1)
crates/ff-rdp-cli/tests/daemon_parity_e2e_test.rs (1)
287-310: This mock never reaches the timeout-causing screenshot path.The server only exercises the
evaluateJSAsync/drawWindow flow.crates/ff-rdp-cli/src/commands/screenshot.rs:22-53falls back toprepareCapture+capturewhen that returnsnull, and that's the path described in this PR's notes as timing out in daemon mode. A regression there would still leave this test green. If this test is meant to lock in the bug, please force the fallback path and, ideally, delay the final reply past 500ms.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/ff-rdp-cli/tests/daemon_parity_e2e_test.rs` around lines 287 - 310, The mock server in screenshot_daemon_server currently exercises the evaluateJSAsync/drawWindow flow so the fallback path (prepareCapture + capture) and its timeout behavior are never tested; modify screenshot_daemon_server to force the fallback by having the evaluateJSAsync response return null (e.g., change the fixture used for "evaluateJSAsync" to one that yields null or adjust the on_with_followup to simulate a null evaluationResult) so the code paths in crates/ff-rdp-cli/src/commands/screenshot.rs that call prepareCapture and capture are exercised, and additionally make the final reply from the mock delayed beyond 500ms (adjust the follow-up fixture or server response timing) to reproduce the timeout behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/ff-rdp-cli/tests/daemon_parity_e2e_test.rs`:
- Around line 328-377: The test currently deletes out_path before verifying the
daemon actually wrote the PNG; move the cleanup
(std::fs::remove_file(&out_path)) to after the assertions and add an assertion
that the path reported in json["results"]["path"] exists and is a PNG file
(e.g., check std::path::Path::exists and either the .extension() equals "png" or
read the file header bytes for the PNG magic) using the path_str or out_path
variable before calling remove_file; update references around out_path, json,
and remove_file so the file existence/PNG check runs after parsing output but
before cleanup.
In `@kb/iterations/iteration-38-daemon-client-timeout.md`:
- Around line 23-40: Add language identifiers to the fenced code blocks in
kb/iterations/iteration-38-daemon-client-timeout.md: mark the top error block
(```
error: screenshot: screenshotActor.capture failed (operation timed out)
```) as e.g. `text` or `console`, and mark the "Current timeout chain" block
(the triple-backtick list showing CLI → daemon → Firefox timeouts) as `text` or
`console` as well so markdownlint stops flagging them; update both fenced blocks
to use the chosen language tags.
- Around line 96-100: The output currently always sets timeout_reached; change
the serialization/emit logic in the network command so timeout_reached is true
only when the collection actually hit the deadline while events were still
streaming, and false (and no hint string) when the collection finished naturally
before the deadline; locate and modify the code that builds/emits the JSON (look
for symbols like NetworkCollectionResult / timeout_reached field or the function
that writes the navigate --with-network output in network.rs) and update that
branch so tests that assert timeout_reached == false on the non-timeout path
pass without changing the tests.
---
Outside diff comments:
In `@crates/ff-rdp-cli/src/commands/network_events.rs`:
- Around line 73-107: The loop currently sets last_recv_was_event = true for any
successful transport.recv(), causing non-network frames (e.g.,
consoleAPICall/pageError) to count as network activity; change the code so
last_recv_was_event is set only inside the match arms for
"resources-available-array" and "resources-updated-array" (where
parse_network_resources / parse_network_resource_updates are called) instead of
immediately after Ok(msg), and ensure Err(ProtocolError::Timeout) handling is
unchanged; add a regression test that simulates a non-network frame arriving
just before the total timeout and asserts that the returned last_recv_was_event
(used for timeout_reached) remains false.
---
Nitpick comments:
In `@crates/ff-rdp-cli/tests/daemon_parity_e2e_test.rs`:
- Around line 287-310: The mock server in screenshot_daemon_server currently
exercises the evaluateJSAsync/drawWindow flow so the fallback path
(prepareCapture + capture) and its timeout behavior are never tested; modify
screenshot_daemon_server to force the fallback by having the evaluateJSAsync
response return null (e.g., change the fixture used for "evaluateJSAsync" to one
that yields null or adjust the on_with_followup to simulate a null
evaluationResult) so the code paths in
crates/ff-rdp-cli/src/commands/screenshot.rs that call prepareCapture and
capture are exercised, and additionally make the final reply from the mock
delayed beyond 500ms (adjust the follow-up fixture or server response timing) to
reproduce the timeout behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a201b72b-b8e3-498f-8634-fb2cfa7e8ce7
📒 Files selected for processing (7)
crates/ff-rdp-cli/src/cli/args.rscrates/ff-rdp-cli/src/commands/navigate.rscrates/ff-rdp-cli/src/commands/network.rscrates/ff-rdp-cli/src/commands/network_events.rscrates/ff-rdp-cli/src/daemon/server.rscrates/ff-rdp-cli/tests/daemon_parity_e2e_test.rskb/iterations/iteration-38-daemon-client-timeout.md
| // Use a unique temp path for the output PNG so tests don't collide. | ||
| let ts = std::time::SystemTime::now() | ||
| .duration_since(std::time::UNIX_EPOCH) | ||
| .expect("system time") | ||
| .as_nanos(); | ||
| let out_path = std::env::temp_dir().join(format!("daemon_screenshot_{ts}.png")); | ||
|
|
||
| let mut args = daemon_args(mock_port); | ||
| args.extend([ | ||
| "screenshot".to_owned(), | ||
| "--output".to_owned(), | ||
| out_path.to_string_lossy().into_owned(), | ||
| ]); | ||
|
|
||
| let output = Command::new(ff_rdp_bin()) | ||
| .env("HOME", home.path()) | ||
| .env("USERPROFILE", home.path()) | ||
| .args(&args) | ||
| .output() | ||
| .expect("failed to spawn ff-rdp"); | ||
|
|
||
| // Kill daemon before asserting so cleanup always happens, even on panic. | ||
| daemon.kill(); | ||
| let _ = mock_handle.join(); | ||
|
|
||
| // Clean up the PNG file regardless of test outcome. | ||
| let _ = std::fs::remove_file(&out_path); | ||
|
|
||
| assert!( | ||
| output.status.success(), | ||
| "daemon screenshot must succeed; stderr: {}", | ||
| String::from_utf8_lossy(&output.stderr) | ||
| ); | ||
|
|
||
| let json: serde_json::Value = | ||
| serde_json::from_slice(&output.stdout).expect("stdout must be valid JSON"); | ||
|
|
||
| assert_eq!(json["total"], 1, "total should be 1"); | ||
| assert_eq!(json["results"]["width"], 1, "width should match fixture"); | ||
| assert_eq!(json["results"]["height"], 1, "height should match fixture"); | ||
| assert!( | ||
| json["results"]["bytes"].as_u64().unwrap_or(0) > 0, | ||
| "bytes should be non-zero" | ||
| ); | ||
| let path_str = json["results"]["path"].as_str().unwrap_or(""); | ||
| assert!( | ||
| std::path::Path::new(path_str) | ||
| .extension() | ||
| .is_some_and(|ext| ext.eq_ignore_ascii_case("png")), | ||
| "results.path should end with .png; got: {path_str}" |
There was a problem hiding this comment.
The test removes the PNG before proving it exists.
remove_file(&out_path) runs before any filesystem assertion, so this still passes if the command reports a .png path but never writes it. Please assert the file exists and is a PNG before cleanup.
💡 Suggested fix
- let out_path = std::env::temp_dir().join(format!("daemon_screenshot_{ts}.png"));
+ let out_dir = tempfile::tempdir().expect("temp screenshot dir");
+ let out_path = out_dir.path().join(format!("daemon_screenshot_{ts}.png"));
@@
- // Clean up the PNG file regardless of test outcome.
- let _ = std::fs::remove_file(&out_path);
-
assert!(
output.status.success(),
"daemon screenshot must succeed; stderr: {}",
String::from_utf8_lossy(&output.stderr)
);
+ let bytes = std::fs::read(&out_path).expect("screenshot file should exist");
+ assert!(
+ bytes.starts_with(b"\x89PNG\r\n\x1a\n"),
+ "saved screenshot should be a PNG"
+ );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/ff-rdp-cli/tests/daemon_parity_e2e_test.rs` around lines 328 - 377,
The test currently deletes out_path before verifying the daemon actually wrote
the PNG; move the cleanup (std::fs::remove_file(&out_path)) to after the
assertions and add an assertion that the path reported in
json["results"]["path"] exists and is a PNG file (e.g., check
std::path::Path::exists and either the .extension() equals "png" or read the
file header bytes for the PNG magic) using the path_str or out_path variable
before calling remove_file; update references around out_path, json, and
remove_file so the file existence/PNG check runs after parsing output but before
cleanup.
| ``` | ||
| error: screenshot: screenshotActor.capture failed (operation timed out) | ||
| ``` | ||
|
|
||
| **Root cause:** The daemon's `handle_client()` in `server.rs` sets a **hardcoded 500ms read timeout** on the client TCP stream (line ~567). Screenshot's two-step protocol (`prepareCapture` → `capture`) takes >500ms on Firefox 149, so the client socket times out before the daemon relays Firefox's response. | ||
|
|
||
| Direct mode uses the CLI's `--timeout` (default 5000ms), giving Firefox enough time. | ||
|
|
||
| ## Context | ||
|
|
||
| ### Current timeout chain (daemon mode) | ||
| ``` | ||
| CLI --timeout 5000ms (unused — only affects direct Firefox connection) | ||
| ↓ | ||
| Client → Daemon TCP read timeout: 500ms (hardcoded in handle_client) | ||
| ↓ | ||
| Daemon → Firefox TCP read timeout: 1000ms (for shutdown-flag polling) | ||
| ``` |
There was a problem hiding this comment.
Add languages to the fenced blocks.
markdownlint is already flagging both examples here. Use something like console / text so the docs stay lint-clean.
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 23-23: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 34-34: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@kb/iterations/iteration-38-daemon-client-timeout.md` around lines 23 - 40,
Add language identifiers to the fenced code blocks in
kb/iterations/iteration-38-daemon-client-timeout.md: mark the top error block
(```
error: screenshot: screenshotActor.capture failed (operation timed out)
```) as e.g. `text` or `console`, and mark the "Current timeout chain" block
(the triple-backtick list showing CLI → daemon → Firefox timeouts) as `text` or
`console` as well so markdownlint stops flagging them; update both fenced blocks
to use the chosen language tags.
| ### Part B: Network timeout UX | ||
| - [x] Default `--network-timeout` is 10000ms | ||
| - [x] `navigate --with-network` output includes `"timeout_reached": true` when deadline was hit while events were still flowing | ||
| - [x] `navigate --with-network` output includes `"hint"` string when `timeout_reached` is true | ||
| - [x] No hint/timeout_reached when collection finishes naturally before the deadline |
There was a problem hiding this comment.
The acceptance criteria disagree with the shipped JSON.
crates/ff-rdp-cli/src/commands/network.rs now always emits timeout_reached, and the new tests assert false on the non-timeout path. As written, “No hint/timeout_reached” can never be satisfied. This should say “no hint and timeout_reached=false”, or the code/tests need to change instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@kb/iterations/iteration-38-daemon-client-timeout.md` around lines 96 - 100,
The output currently always sets timeout_reached; change the serialization/emit
logic in the network command so timeout_reached is true only when the collection
actually hit the deadline while events were still streaming, and false (and no
hint string) when the collection finished naturally before the deadline; locate
and modify the code that builds/emits the JSON (look for symbols like
NetworkCollectionResult / timeout_reached field or the function that writes the
navigate --with-network output in network.rs) and update that branch so tests
that assert timeout_reached == false on the non-timeout path pass without
changing the tests.
- Only set last_recv_was_event for actual network event messages (resources-available-array / resources-updated-array), not unrelated traffic, to avoid false timeout_reached signals - Assert screenshot PNG file exists before cleanup in daemon e2e test - Compare results.path against requested output path - Canonicalize temp dir to handle macOS /var → /private/var symlink Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
screenshot(which take >500ms for the two-step capture protocol) no longer time out when routed through the daemon--network-timeoutfrom 5s to 10s — real-world pages like comparis.ch need ~8s to capture most requeststimeout_reachedboolean andhintstring to network JSON output so users/LLMs can detect when results may be truncated and know to increase the timeoutscreenshotthrough the daemon using the mock serverTest plan
daemon_screenshot_saves_png_filee2e test passesbuild_network_summarywith timeout_reached true/falsecargo fmt/cargo clippy/cargo testall cleanff-rdp --port 6000 screenshot --output /tmp/test.pngvia daemon🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Tests
Documentation