Make reconnect timeout mandatory with 5-minute default #1443

Merged
kixelated merged 3 commits into main from claude/beautiful-curie-RGTRf on May 22, 2026

@kixelated
Collaborator

Summary

Changes the reconnect timeout from an optional field to a mandatory one with a sensible 5-minute default. This ensures reconnection attempts always have a bounded timeout rather than potentially retrying indefinitely.

Changes

Rust (rs/moq-native/src/reconnect.rs)

  • Changed Backoff.timeout from Option<Duration> to Duration with a default_value = "5m" in the CLI argument parser
  • Updated the default implementation to use Duration::from_secs(300) (5 minutes) instead of None
  • Simplified timeout check logic: removed the if let Some(timeout) guard since timeout is now always present
  • Enhanced error reporting: the closed_rx watch channel now carries Option<String> (the error message) instead of a boolean flag, allowing callers to see the actual connection error that triggered the timeout
  • Track the most recent connection error in last_error and include it in the timeout error context for better diagnostics
  • Updated closed() method to extract and return the error message from the watch channel

TypeScript (js/net/src/connection/reload.ts)

  • Changed ReloadDelay.timeout from optional to mandatory with a default of 300000 ms (5 minutes)
  • Updated the default delay object in the Reload constructor to include timeout: 300000
  • Simplified timeout check logic: removed the if (this.delay.timeout !== undefined) guard
  • Improved error reporting: when timeout is exceeded, now passes the actual connection error to #closedReject instead of a generic "reconnect timed out" message
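
A minimal sketch of the timeout handling described in the list above, assuming the optional-field-with-default shape that the review comments on this PR ask for. `DelayOptions`, `resolveTimeout`, `hasTimedOut`, and `toError` are hypothetical names used only for illustration, not the library's API; the 300000 ms default, the zero-means-unlimited rule, and the `err instanceof Error` normalization are the parts taken from this PR.

```typescript
// Hypothetical stand-in for the timeout part of ReloadDelay; not the real interface.
interface DelayOptions {
	// Maximum total retry time in milliseconds. 0 means unlimited retries.
	timeout?: number;
}

// Default taken from the PR: 300000 ms (5 minutes).
const DEFAULT_TIMEOUT_MS = 300000;

// Resolve the effective timeout, falling back to the default when the field is unset.
function resolveTimeout(delay: DelayOptions): number {
	return delay.timeout ?? DEFAULT_TIMEOUT_MS;
}

// True once the retry window is exhausted; a timeout of 0 never expires.
function hasTimedOut(elapsedMs: number, timeoutMs: number): boolean {
	return timeoutMs > 0 && elapsedMs >= timeoutMs;
}

// Normalize an unknown thrown value into an Error before rejecting with it.
function toError(err: unknown): Error {
	return err instanceof Error ? err : new Error(String(err));
}
```

Using `??` rather than `||` matters in this sketch: an explicit `timeout: 0` is kept as 0 (unlimited) instead of being replaced by the default.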

Implementation Details

  • With the default timeout, both implementations now eventually give up instead of retrying forever
  • Error context is preserved and propagated to callers, making it easier to diagnose why reconnection failed
  • The 5-minute default provides a reasonable balance between resilience and resource usage
  • The API surface is simpler, but the default behavior changes: callers that relied on the previous unlimited default now stop retrying after 5 minutes

https://claude.ai/code/session_01HdKf7HkEmT2pJr9Fs143yh

claude added 2 commits May 22, 2026 19:03
The retry timeout was optional and defaulted to infinite, so a permanently
down server caused the reload loop to spin forever. When the timeout did
fire it returned a generic "reconnect timed out" message, hiding the
underlying connect failure.

Default the timeout to 5 minutes in both the Rust Backoff and the JS
ReloadDelay, and propagate the most recent connect error through to the
closed() future / closed promise so callers can see why the loop stopped.
@kixelated kixelated marked this pull request as ready for review May 22, 2026 20:29
Comment thread js/net/src/connection/reload.ts Outdated
-	timeout?: DOMHighResTimeStamp;
+	// Resets after each successful connection. Set to 0 for unlimited retries.
+	// default: 300000 (5 minutes)
+	timeout: DOMHighResTimeStamp;
Collaborator Author

Keep the timeout? API. Set it to 5 minutes if undefined.

Comment thread rs/moq-native/src/reconnect.rs Outdated
 pub struct Reconnect {
 	abort: tokio::task::AbortHandle,
-	closed_rx: tokio::sync::watch::Receiver<bool>,
+	closed_rx: tokio::sync::watch::Receiver<Option<String>>,
Collaborator Author

anyhow::Error instead of a String?

- JS: keep ReloadDelay.timeout optional, default to 5 minutes when undefined.
- Rust: store Arc<anyhow::Error> in the closed channel instead of String, so
  the structured error is preserved instead of being stringified at the
  boundary.
@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

Walkthrough

This pull request unifies timeout handling for the reconnection logic in the JavaScript and Rust implementations. In Rust, the timeout changes from an optional field (where omission disabled the timeout) to a required Duration with a 5-minute default. In JavaScript, the field stays optional but falls back to 300000 ms when undefined. In both, a timeout of zero enables unlimited retries. The reconnection loops now track the most recent connection error and surface it when the timeout is exceeded, replacing the generic timeout message with the actual connection failure; in Rust the error travels through the watch channel read by closed().

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title clearly and concisely describes the main change: making reconnect timeout mandatory with a 5-minute default, which is the core objective across both Rust and TypeScript implementations.
  • Description check: ✅ Passed. The description comprehensively details the changes in both implementations, explaining the rationale, implementation details, and error handling improvements related to the reconnect timeout changes.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


@kixelated kixelated enabled auto-merge (squash) May 22, 2026 20:41
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
rs/moq-native/src/reconnect.rs (1)

152-159: 🏗️ Heavy lift

Cover the new timeout contract, not just the default value.

This test only proves the default is 300s. The behavioral changes here are timeout == 0 meaning unlimited retries and closed() surfacing the last connection error on timeout; those paths need regression tests.
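
One way to exercise both paths without real timers is to drive the loop with a fake clock. The sketch below is a simplified synchronous model, not the actual Reload or Reconnect implementation; `simulateRetries`, `SimResult`, the fixed per-attempt delay, and the `maxAttempts` safety cap are all assumptions made for illustration.

```typescript
// Simplified synchronous model of a retry loop, driven by a fake clock.
// This is NOT the real implementation; every name here is hypothetical.
interface SimResult {
	attempts: number;
	// The last connect error if the loop gave up; undefined if it connected
	// or was still retrying when the safety cap was reached.
	closedWith: Error | undefined;
}

function simulateRetries(
	timeoutMs: number, // 0 means unlimited retries
	delayMs: number, // fixed wait between attempts (no backoff growth in this model)
	maxAttempts: number, // safety cap so the simulation itself terminates
	connect: (attempt: number) => void, // throws to signal a failed connect
): SimResult {
	let now = 0; // fake clock, in milliseconds
	const retryStart = now;
	let attempts = 0;

	while (attempts < maxAttempts) {
		attempts += 1;
		try {
			connect(attempts);
			return { attempts, closedWith: undefined }; // connected
		} catch (err) {
			const lastError = err instanceof Error ? err : new Error(String(err));
			const elapsed = now - retryStart;
			if (timeoutMs > 0 && elapsed >= timeoutMs) {
				// Give up and surface the most recent connect error.
				return { attempts, closedWith: lastError };
			}
			now += delayMs; // "sleep" by advancing the fake clock
		}
	}
	return { attempts, closedWith: undefined }; // cap reached while still retrying
}
```

In this model, a connect function that always throws with `timeoutMs = 3000` and `delayMs = 1000` gives up on the fourth attempt and reports that attempt's error, while `timeoutMs = 0` keeps retrying until the safety cap. A real regression test would still need to run against the actual loop and its timer or sleep.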

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rs/moq-native/src/reconnect.rs` around lines 152 - 159, Add tests that cover
the new timeout contract beyond Backoff::default(): create one test (e.g.,
test_backoff_unlimited_timeout) that constructs a Backoff with timeout ==
Duration::ZERO and verifies retry behavior does not stop due to timeout
(simulate advancing the backoff or counting retry attempts to ensure retries
continue); and create another test (e.g.,
test_backoff_closed_returns_last_error) that exercises Backoff::closed() by
driving retries until a timeout is reached and asserting closed() returns the
last connection error observed. Use the existing Backoff struct and methods
(Backoff::default(), Backoff::closed()) and simulate failures/advancing time as
done elsewhere in reconnect.rs to trigger the two distinct code paths (timeout
== 0 unlimited retries, and timeout expiry returning last error).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@js/net/src/connection/reload.ts`:
- Around line 132-137: Compute the remaining timeout budget (remaining = timeout
- (performance.now() - this.#retryStart)) before scheduling the next retry; if
remaining <= 0 call this.#closedReject(...) immediately, otherwise pass
Math.min(remaining, desiredDelay) to the effect.timer call so the next wait is
clamped to the remaining time. Update the logic around timeout, this.delay and
the effect.timer invocation (use this.delay or the delay value you currently
compute for the next retry) and keep using this.#retryStart and
this.#closedReject for consistency.

In `@rs/moq-native/src/reconnect.rs`:
- Around line 92-96: The loop currently sleeps for the full delay after a failed
connect, which can exceed the configured backoff.timeout; modify the reconnect
loop around where delay is awaited (referencing backoff.delay, backoff.timeout,
retry_start, and last_error) to compute the remaining_time =
backoff.timeout.saturating_sub(retry_start.elapsed()) and sleep for
min(backoff.delay, remaining_time) (or skip sleeping if remaining_time is_zero),
so the sleep is capped to the remaining timeout budget and the function returns
the timed-out Err using last_error when no time remains.

📥 Commits

Reviewing files that changed from the base of the PR and between c42187a and 7ae13f2.

📒 Files selected for processing (2)
  • js/net/src/connection/reload.ts
  • rs/moq-native/src/reconnect.rs

Comment on lines +132 to +137
+const timeout = this.delay.timeout ?? 300000;
+if (timeout > 0) {
 	const elapsed = performance.now() - this.#retryStart;
-	if (elapsed >= this.delay.timeout) {
+	if (elapsed >= timeout) {
 		console.warn("reconnect timed out");
-		this.#closedReject(new Error("reconnect timed out"));
+		this.#closedReject(err instanceof Error ? err : new Error(String(err)));
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clamp the next retry to the remaining timeout budget.

This check only rejects after a failed attempt. If this.#delay is larger than timeout - elapsed, the following effect.timer(...) can keep the loop alive well past the documented "maximum total time" before closed rejects.

Suggested fix
-				const timeout = this.delay.timeout ?? 300000;
-				if (timeout > 0) {
-					const elapsed = performance.now() - this.#retryStart;
-					if (elapsed >= timeout) {
-						console.warn("reconnect timed out");
-						this.#closedReject(err instanceof Error ? err : new Error(String(err)));
-						return;
-					}
-				}
+				const timeout = this.delay.timeout ?? 300000;
+				let retryDelay = this.#delay;
+				if (timeout > 0) {
+					const elapsed = performance.now() - this.#retryStart;
+					const remaining = timeout - elapsed;
+					if (remaining <= 0) {
+						console.warn("reconnect timed out");
+						this.#closedReject(err instanceof Error ? err : new Error(String(err)));
+						return;
+					}
+					retryDelay = Math.min(retryDelay, remaining);
+				}
 
 				const tick = this.#tick.peek() + 1;
-				effect.timer(() => this.#tick.update((prev) => Math.max(prev, tick)), this.#delay);
+				effect.timer(() => this.#tick.update((prev) => Math.max(prev, tick)), retryDelay);
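
To make the overshoot concrete, the clamping rule from the suggested fix can be isolated as a pure function. `nextRetryDelay` is a hypothetical helper written for this illustration (the suggested change itself is inlined in the diff above); returning `undefined` when the budget is exhausted is also an assumption of this sketch.

```typescript
// Hypothetical helper isolating the clamp from the suggested fix.
// Returns the wait before the next retry, or undefined when the budget is spent.
function nextRetryDelay(timeoutMs: number, elapsedMs: number, delayMs: number): number | undefined {
	if (timeoutMs <= 0) return delayMs; // 0 means unlimited: never clamp
	const remaining = timeoutMs - elapsedMs;
	if (remaining <= 0) return undefined; // timed out: the caller should reject closed
	return Math.min(delayMs, remaining);
}

// 2 s of budget left with a 30 s backoff: wait 2 s instead of overshooting by 28 s.
const clampedDelay = nextRetryDelay(300000, 298000, 30000);
```

Without the clamp, the same inputs would schedule the next attempt 30 s out, so the loop would outlive the 5-minute budget by 28 s before the timeout check ran again.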

Comment on lines +92 to +96
if !backoff.timeout.is_zero() && retry_start.elapsed() > backoff.timeout {
let timeout = backoff.timeout;
return Err(last_error
.map(|e| e.context(format!("reconnect timed out after {timeout:?}")))
.unwrap_or_else(|| anyhow::anyhow!("reconnect timed out after {timeout:?}")));
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cap the backoff sleep to the remaining timeout budget.

The timeout is checked before the next connect attempt, but the loop still sleeps for the full delay after each failure. If only 2 seconds remain and delay is 30 seconds, this can exceed the configured maximum retry time by another 28 seconds before returning.

Suggested fix
-			if !backoff.timeout.is_zero() && retry_start.elapsed() > backoff.timeout {
-				let timeout = backoff.timeout;
-				return Err(last_error
-					.map(|e| e.context(format!("reconnect timed out after {timeout:?}")))
-					.unwrap_or_else(|| anyhow::anyhow!("reconnect timed out after {timeout:?}")));
-			}
+			let mut sleep_for = delay;
+			if !backoff.timeout.is_zero() {
+				let elapsed = retry_start.elapsed();
+				let timeout = backoff.timeout;
+				if elapsed >= timeout {
+					return Err(last_error
+						.map(|e| e.context(format!("reconnect timed out after {timeout:?}")))
+						.unwrap_or_else(|| anyhow::anyhow!("reconnect timed out after {timeout:?}")));
+				}
+				sleep_for = sleep_for.min(timeout.saturating_sub(elapsed));
+			}
@@
-					tokio::time::sleep(delay).await;
+					tokio::time::sleep(sleep_for).await;

@kixelated kixelated merged commit cf12478 into main May 22, 2026
1 check passed
@kixelated kixelated deleted the claude/beautiful-curie-RGTRf branch May 22, 2026 20:47