-
-
Notifications
You must be signed in to change notification settings - Fork 107
refactor: remove redundant inline timeout check from connect op #2175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: remove redundant inline timeout check from connect op #2175
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a 30-second timeout mechanism to detect and fail stalled connect operations when a joiner receives no acceptances or progress updates. The timeout prevents operations from waiting indefinitely when no peers respond.
Key changes:
- Added
JOINER_PROGRESS_TIMEOUTconstant (30 seconds) andhas_timed_out()method toJoinerState - Added timeout check in
process_messagethat returnsOpError::Timeoutwhen timeout is exceeded - Added
Timeoutvariant to theOpErrorenum
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/core/src/operations/mod.rs | Adds Timeout error variant to OpError enum for operation timeout failures |
| crates/core/src/operations/connect.rs | Implements timeout detection for joiner operations in WaitingForResponses state with 30-second progress timeout |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Check for joiner timeout before processing any message | ||
| if self.gateway.is_some() { | ||
| if let Some(ConnectState::WaitingForResponses(ref state)) = self.state { | ||
| if state.has_timed_out(Instant::now()) { | ||
| tracing::warn!( | ||
| tx = %self.id, | ||
| last_progress_secs = state.last_progress.elapsed().as_secs(), | ||
| accepted_count = state.accepted.len(), | ||
| "connect: joiner timed out waiting for responses" | ||
| ); | ||
| return Err(OpError::Timeout); | ||
| } | ||
| } | ||
| } |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout check only executes when a message is received. If a joiner receives no acceptances and no messages for 30 seconds (the exact scenario described in the PR description), process_message will never be called, so this timeout check will never execute and the operation will wait indefinitely.
Consider implementing the timeout via one of these approaches:
- A periodic background task that checks for timed-out operations
- A tokio timer/timeout wrapper around the operation
- A timeout future that races with message processing
The current implementation only protects against stalls between messages, not the absence of messages entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed this is not the place for this... we have a background task checking for transactions which may have timed out so this is kind of redundant actually. If we are not cleaning up, that's the place where we should be looking at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood - I'll remove the inline timeout check. The existing op_state_manager background task already handles transaction timeouts via ttl_set (lines 708-785 in op_state_manager.rs), which covers Connect operations along with all other transaction types.
Removing the redundant code now.
[AI-assisted - Claude]
| /// Returns true if no progress has been made within the timeout period. | ||
| pub(crate) fn has_timed_out(&self, now: Instant) -> bool { | ||
| now.duration_since(self.last_progress) >= JOINER_PROGRESS_TIMEOUT | ||
| } |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new timeout functionality lacks test coverage. Consider adding a unit test that verifies:
has_timed_out()returnsfalsewhenlast_progressis recenthas_timed_out()returnstruewhenlast_progressexceedsJOINER_PROGRESS_TIMEOUTlast_progressis updated correctly when progress occurs (acceptance or observed address update)
Example test structure:
#[test]
fn joiner_state_timeout_detection() {
let old_instant = Instant::now() - JOINER_PROGRESS_TIMEOUT - Duration::from_secs(1);
let state = JoinerState {
target_connections: 1,
observed_address: None,
accepted: HashSet::new(),
last_progress: old_instant,
};
assert!(state.has_timed_out(Instant::now()));
let recent_state = JoinerState {
target_connections: 1,
observed_address: None,
accepted: HashSet::new(),
last_progress: Instant::now(),
};
assert!(!recent_state.has_timed_out(Instant::now()));
}d596c99 to
872bee8
Compare
67ec62c to
f67d5c0
Compare
872bee8 to
26e7660
Compare
f67d5c0 to
8596569
Compare
26e7660 to
e960382
Compare
8596569 to
ef7454b
Compare
e960382 to
99ab273
Compare
ef7454b to
86c1367
Compare
99ab273 to
01943b7
Compare
86c1367 to
605d86e
Compare
01943b7 to
3359306
Compare
605d86e to
3c6f81f
Compare
d01029d to
0be445d
Compare
3c6f81f to
ef6bce1
Compare
The JoinerState tracks last_progress but never enforced a timeout. If a connect operation received no acceptances, it would wait indefinitely. Now we check for timeout at the start of process_message: - If gateway is set (joiner) and in WaitingForResponses state - And last_progress exceeds JOINER_PROGRESS_TIMEOUT (30s) - Return OpError::Timeout to fail the operation This prevents indefinitely stalled connect operations from blocking new connection attempts. Fixes the WaitingForResponses timeout issue from #2173. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
3e9a61f to
29ccbbd
Compare
Per Nacho's review: the op_state_manager already has a background task that handles transaction timeouts via ttl_set. Adding inline timeout checks in process_message is redundant and not the right place for this. Removes the timeout additions from the previous commit. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Consolidates changes from PRs #2172, #2174, and #2175: This builds on PR #2191 (wire protocol cleanup) and adds: - Fix seeding/subscribe operations to handle PeerAddr::Unknown for NAT scenarios - Gateway properly fills in observed addresses from packet source - Improved subscriber address tracking in seeding manager - Update live_tx and connection tests for new address model NOTE: This PR requires review - previous PRs (#2174, #2175) had CHANGES_REQUESTED from Nacho. Submitting consolidated changes for fresh review. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Adds NAT address handling to subscribe/seeding operations: - Subscribers with PeerAddr::Unknown have their address filled in by gateway - Gateway observes real UDP source address and updates subscriber address - SeedingManager tracks subscriber addresses properly - live_tx tests updated for new address model - In-memory testing infrastructure updated for PeerAddr Supersedes PRs #2172, #2174, #2175 (which had changes requested). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Context
PR #2174 initially added inline timeout checks to the connect operation's
process_messagemethod. However, Nacho pointed out in review that this is redundant.Why This Is Redundant
The
op_state_managermodule already has a background task that handles transaction timeouts viattl_set(seeop_state_manager.rslines 708-785). This handles timeouts for all operation types including Connect, so adding inline timeout checks in individual operation handlers is unnecessary duplication.This PR
Removes the timeout additions that were added in the previous PR, per Nacho's feedback.
Stack
This PR stacks on #2174.
[AI-assisted - Claude]