fix(tunnel): implement ping pong #3491
base: 11-19-chore_pegboard-gateway_add_new_message_id_format_add_deprecated_tunnel_ack
3 Skipped Deployments
Pull Request Review: Implement Ping/Pong for Tunnel
Summary
This PR implements a ping/pong mechanism for the pegboard tunnel to monitor connection health and measure RTT between the gateway and runner. Overall, this is a solid implementation that addresses tunnel health monitoring.
✅ Strengths
🔍 Issues & Suggestions
1. Potential Race Condition in Ping Timeout Check
11b75bb to 73e19ae
759df3e to e47241f
Code Review: Ping-Pong Implementation for Tunnel Protocol
This PR implements a ping-pong mechanism to detect and handle tunnel timeouts. Overall, the implementation is solid, with good separation of concerns.
Strengths
Potential Issues
1. Ping Timeout Timing Mismatch (Critical): The ping is sent every 3 seconds (UPDATE_PING_INTERVAL), but the timeout is 30 seconds (TUNNEL_PING_TIMEOUT). This means it would take 10 missed pings before timing out. Consider:
2. Initial Ping Timing: In pegboard-gateway/src/ping_task.rs:13-21, the first ping is sent only after waiting UPDATE_PING_INTERVAL, which delays initial connection validation by 3 seconds. Consider sending the first ping immediately, before entering the loop.
3. RTT Calculation Asymmetry: In pegboard-runner/src/ws_to_tunnel_task.rs:109-110, the code assumes symmetric network latency (rtt = delta * 2). While common, this may be inaccurate under asymmetric network conditions. The existing comment is helpful, but this limitation should be noted.
4. Error Handling in Ping Task: If send_and_check_ping returns an error, the ping task terminates without logging why. Consider adding error context before returning.
5. Potential Race Condition: In shared_state.rs:266-267, there is a potential race between the ping task reading last_pong and the pong handler updating it. However, this is likely acceptable since the check is conservative (it only leads to false negatives).
Code Quality Observations
1. Duplicate Abort Logic: The abort signal handling is duplicated across three join arms (lib.rs:435-473). Consider extracting it into a helper function.
2. Magic Number in Metrics: The 0.001 constant in shared_state.rs:270 converts milliseconds to seconds. Consider a named constant for clarity.
Test Coverage
Missing: No tests were added for the new ping-pong functionality. Consider adding:
Security & Performance
Security: No concerns identified. Timestamp validation and clock skew handling are done properly.
Performance: 3-second ping intervals are reasonable, with minimal metrics overhead and good use of concurrent tasks.
Summary
Well-structured implementation that adds necessary timeout detection. Main concerns:
The code follows repository conventions well and demonstrates good async Rust patterns.
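The interval-to-timeout ratio raised in the first issue above can be made explicit. A minimal sketch, assuming timestamps are `i64` milliseconds (the quoted `TUNNEL_PING_TIMEOUT: i64` suggests an integer, but the unit is an assumption); `missed_pings_tolerated` and `is_ping_timed_out` are hypothetical helpers, not code from the PR.

```rust
use std::time::Duration;

// Values quoted in the reviews; expressing the timeout in milliseconds is an assumption.
const TUNNEL_PING_TIMEOUT_MS: i64 = 30_000;
const UPDATE_PING_INTERVAL: Duration = Duration::from_secs(3);

/// Hypothetical helper: number of full ping intervals that fit inside the timeout,
/// i.e. roughly how many pings can go unanswered before the tunnel is declared dead.
fn missed_pings_tolerated(timeout_ms: i64, interval: Duration) -> i64 {
    timeout_ms / interval.as_millis() as i64
}

/// Hypothetical helper: true once strictly more than `timeout_ms` has elapsed
/// since the last pong was recorded.
fn is_ping_timed_out(now_ms: i64, last_pong_ms: i64, timeout_ms: i64) -> bool {
    now_ms - last_pong_ms > timeout_ms
}
```

With the quoted values the ratio is 10. Whether that is "too many" is a policy choice: a larger ratio tolerates transient packet loss, a smaller one detects dead tunnels sooner.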
Pull Request Review: Implement Ping-Pong for Tunnel
Overall Assessment
This is a well-structured implementation that adds a critical ping-pong mechanism to detect tunnel connection failures. The refactoring into separate tasks improves code organization and maintainability.
Strengths
Issues and Concerns
1. Potential Race Condition in Ping Timeout Check (High Priority)
Location: engine/packages/pegboard-gateway/src/shared_state.rs:221
The ping task sends a ping every 3 seconds and then immediately checks whether the last pong was more than 30 seconds ago. Because the check happens after the ping is sent, message reordering or delivery delays could lead to race conditions.
Recommendation: Consider checking the timeout before sending the ping rather than after; this makes the logic clearer.
2. Ping Task Doesn't Send First Ping Immediately (Medium Priority)
Location: engine/packages/pegboard-gateway/src/ping_task.rs:14-21
The ping task sleeps for 3 seconds before sending the first ping. If the connection is already dead, it takes 30 seconds to detect rather than 3 seconds.
Recommendation: Consider sending the first ping immediately, before entering the sleep loop.
3. RTT Calculation Assumes Symmetric Latency (Low Priority)
Location: engine/packages/pegboard-runner/src/ws_to_tunnel_task.rs:109-110
The code multiplies delta by 2, assuming symmetric latency, which may not hold in real-world networks.
Recommendation: Document this assumption more prominently or use one-way latency directly.
4. Missing Error Context in Metrics (Low Priority)
Location: engine/packages/pegboard-gateway/src/shared_state.rs:270
The metric is recorded without attributes, making it difficult to debug issues for specific gateways or runners.
Recommendation: Consider adding attributes for better observability.
5. Protocol Version Compatibility (Medium Priority)
The protocol change from ToRunnerKeepAlive to ToRunnerPing breaks backward compatibility.
Recommendation: Document the minimum required protocol version for this feature and consider adding compatibility checks.
6. Inconsistent Import Ordering (Style)
Location: engine/packages/pegboard-gateway/src/shared_state.rs:11
Per CLAUDE.md conventions, imports should be at the top of the file.
Test Coverage
Missing test scenarios:
Recommendation: Add integration tests to cover these scenarios.
Security Considerations
No security concerns identified. The ping-pong mechanism doesn't introduce any new attack surfaces.
Performance Considerations
Summary
This is a solid implementation that adds necessary tunnel health monitoring. The main concerns are:
The code is well-structured and follows the project's conventions. With the above issues addressed, this will be a valuable addition to the codebase. Great work overall!
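The first two recommendations above (check the timeout before sending, and send the first ping immediately), together with the earlier point about error context, can be sketched as a small synchronous state machine. This is not the PR's `ping_task.rs`: `PingScheduler`, `PingAction`, and `PingError` are hypothetical names, times are assumed to be `i64` milliseconds, and the real task is async and timer-driven.

```rust
use std::fmt;

/// What the (hypothetical) ping task should do on a given tick.
#[derive(Debug, PartialEq)]
enum PingAction {
    SendPing { sent_at_ms: i64 },
}

/// Error carrying enough context to log why the task stopped.
#[derive(Debug, PartialEq)]
enum PingError {
    Timeout { elapsed_ms: i64, timeout_ms: i64 },
}

impl fmt::Display for PingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PingError::Timeout { elapsed_ms, timeout_ms } => write!(
                f,
                "tunnel ping timed out: no pong for {elapsed_ms} ms (limit {timeout_ms} ms)"
            ),
        }
    }
}

struct PingScheduler {
    interval_ms: i64,
    timeout_ms: i64,
    last_pong_ms: i64,
    next_ping_ms: i64,
}

impl PingScheduler {
    /// The first ping is due at `start_ms` itself, not one interval later.
    fn new(start_ms: i64, interval_ms: i64, timeout_ms: i64) -> Self {
        Self { interval_ms, timeout_ms, last_pong_ms: start_ms, next_ping_ms: start_ms }
    }

    fn on_pong(&mut self, now_ms: i64) {
        self.last_pong_ms = now_ms;
    }

    /// Checks the timeout *before* deciding to send, so a dead tunnel is
    /// reported instead of being sent one more ping.
    fn on_tick(&mut self, now_ms: i64) -> Result<Option<PingAction>, PingError> {
        let elapsed_ms = now_ms - self.last_pong_ms;
        if elapsed_ms > self.timeout_ms {
            return Err(PingError::Timeout { elapsed_ms, timeout_ms: self.timeout_ms });
        }
        if now_ms >= self.next_ping_ms {
            self.next_ping_ms = now_ms + self.interval_ms;
            return Ok(Some(PingAction::SendPing { sent_at_ms: now_ms }));
        }
        Ok(None)
    }
}
```

In this sketch a tunnel that never answers is still reported only after `timeout_ms`, because the check is anchored to the last pong; sending the first ping immediately mainly yields the first pong (and RTT sample) one interval sooner.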
73e19ae to 1340648
e47241f to 6521b98
1340648 to 633be91
Code Review: Ping/Pong Implementation for Tunnels
Summary
This PR implements a ping/pong mechanism to monitor tunnel health between the gateway and pegboard runners. The implementation is well-structured, with good separation of concerns through dedicated task modules. Overall, this is solid work with good architectural decisions.
✅ Strengths
🐛 Issues & Concerns
High Priority
1. Potential Race Condition in
In
Entry::Occupied(mut entry) => {
    entry.receiver_subject = receiver_subject;
    entry.msg_tx = msg_tx;
    entry.drop_tx = drop_tx;
    entry.opened = false;
    entry.message_index = 0;
    // Missing: entry.last_pong = util::timestamp::now();
Recommendation: Reset
2. Ping Task Error Handling Could Cause Silent Failures (
The ping task calls
shared_state.send_and_check_ping(request_id).await?;
Recommendation: Consider whether timeout detection should be handled more gracefully, perhaps with retry logic or exponential backoff before terminating the connection.
3. Missing RTT Tracking on Runner Side (
The runner responds to pings but doesn't track its own RTT or connection health, so it has no way to detect whether the gateway has gone away or the connection has degraded.
Recommendation: Consider implementing symmetric ping/pong, where both sides track RTT and can detect connection issues.
Medium Priority
4. Hardcoded Timeout Constants Need Documentation
const TUNNEL_PING_TIMEOUT: i64 = util::duration::seconds(30);
const UPDATE_PING_INTERVAL: Duration = Duration::from_secs(3);
These constants define critical behavior but lack documentation explaining:
Recommendation: Add doc comments explaining the rationale for these values and the relationship between them.
5. Inconsistent Timeout Type (
Recommendation: Use
6. Metrics Missing Attributes (
The comment says "Has no expected attributes", but adding attributes like
Recommendation: Consider adding optional attributes to correlate metrics with specific gateways or connection states.
7. GC Debug Messages Use
WebSocketMessageNotAcked {
    #[allow(dead_code)]
    first_msg_index: u16,
    #[allow(dead_code)]
    last_msg_index: u16,
},
These fields are only used in debug output but are marked as dead code, which suggests the debug formatting may not be as useful as intended.
Recommendation: Either use these fields in structured logging or remove them if they provide no value.
🔍 Minor Issues
8. Code Style: Import Ordering (
use rivet_guard_core::{
    WebSocketHandle,
    custom_serve::{CustomServeTrait, HibernationResult},
According to CLAUDE.md, imports should follow a consistent pattern. The
9. Commented-Out Dead Code (
// fn wrapping_lt(a: u16, b: u16) -> bool {
//     b.wrapping_sub(a) < u16::MAX / 2
// }
Per CLAUDE.md: "Avoid backwards-compatibility hacks like... adding
Recommendation: Remove the commented-out code.
10. Pattern Match Result Type Collision (
let msg = match versioned::ToRunner::deserialize_with_embedded_version(&ups_msg.payload) {
    Result::Ok(x) => x,
Using
Recommendation: Verify whether this is necessary or if standard
✨ Suggestions
📊 Test Coverage
The PR doesn't include tests for the ping/pong mechanism. Given this is a critical reliability feature, I'd strongly recommend adding:
Security & Performance
✅ No security concerns identified
Conclusion
This is a solid implementation that significantly improves tunnel reliability. The main concerns are:
Once the
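The missing `last_pong` reset in the reuse branch can be illustrated with the entry API of `std::collections::HashMap`. This is a simplified sketch: `InFlightRequest` and `upsert_request` are hypothetical, the field types are guesses, the `msg_tx`/`drop_tx` channels from the quoted snippet are omitted, and the PR's map type may differ (with a std `OccupiedEntry`, fields are reached through `get_mut()` rather than assigned on the entry directly).

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Simplified stand-in for the gateway's in-flight request state. Field names
/// follow the quoted snippet; the types are hypothetical simplifications.
#[derive(Debug)]
struct InFlightRequest {
    receiver_subject: String,
    opened: bool,
    message_index: u16,
    last_pong: i64,
}

/// Inserts or reuses the entry for `request_id`. Both branches set
/// `last_pong` to `now_ms`, so a reused entry never keeps a stale timestamp.
fn upsert_request(
    requests: &mut HashMap<u64, InFlightRequest>,
    request_id: u64,
    receiver_subject: String,
    now_ms: i64,
) {
    match requests.entry(request_id) {
        Entry::Vacant(entry) => {
            entry.insert(InFlightRequest {
                receiver_subject,
                opened: false,
                message_index: 0,
                last_pong: now_ms,
            });
        }
        Entry::Occupied(mut entry) => {
            let req = entry.get_mut();
            req.receiver_subject = receiver_subject;
            req.opened = false;
            req.message_index = 0;
            req.last_pong = now_ms; // the reset the review says is missing
        }
    }
}
```

Without the reset, a reused entry carries the previous connection's timestamp, so a timeout check against it could declare a brand-new connection dead almost immediately.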
PR Review: fix(tunnel): implement ping pong
Summary
This PR implements a ping/pong mechanism for tunnel connections between the gateway and pegboard runner to detect connection failures and measure round-trip time (RTT). The implementation adds active health monitoring to replace the previous passive keep-alive approach.
✅ Strengths
1. Good Code Organization
2. Proper Concurrency Handling
3. Observability
4. Protocol Design
Code Review: Ping-Pong Implementation for Tunnel
Summary
This PR implements a WebSocket ping-pong mechanism between the pegboard gateway and runner to detect stale connections and measure round-trip time (RTT). The implementation adds proper connection health monitoring to the tunnel protocol.
✅ Strengths
🔍 Issues & Concerns
1. Potential Ordering Issue in Abort Logic
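Both the abort-ordering concern above and the earlier note about abort handling duplicated across three join arms come down to one rule: whichever task exits first decides the outcome, and the others are told to stop. A sketch of a single shared helper, using std threads and an `AtomicBool` as stand-ins for the PR's async tasks and abort signal; `Task` and `run_until_first_exit` are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Hypothetical task shape: receives the shared abort flag, returns a result.
type Task<T> = Box<dyn FnOnce(Arc<AtomicBool>) -> T + Send>;

/// Runs each task on its own thread. The first task to return wins; the
/// shared abort flag is then set so the remaining tasks can exit, and all
/// threads are joined. Returns (index of the first finisher, its result).
fn run_until_first_exit<T: Send + 'static>(tasks: Vec<Task<T>>) -> (usize, T) {
    let abort = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = tasks
        .into_iter()
        .enumerate()
        .map(|(i, task)| {
            let abort = Arc::clone(&abort);
            let tx = tx.clone();
            thread::spawn(move || {
                let out = task(abort);
                // Later finishers' results are simply dropped with the channel.
                let _ = tx.send((i, out));
            })
        })
        .collect();
    drop(tx);
    // The single place where "one task finished, abort the rest" is handled.
    let first = rx.recv().expect("at least one task must finish");
    abort.store(true, Ordering::SeqCst);
    for h in handles {
        h.join().expect("task panicked");
    }
    first
}
```

Centralizing the abort means every exit path sets the flag in the same order (record the winner, then signal the rest), which is the ordering guarantee the duplicated per-arm code has to re-establish each time.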
6521b98 to d6a2e79
633be91 to 4c35d85
Code Review: Implement Ping/Pong for Tunnel Connection
Summary
This PR implements a ping/pong mechanism to detect and handle stale tunnel connections between the pegboard gateway and runner. The implementation adds a periodic ping task that sends ping messages every 3 seconds and tracks RTT (round-trip time) metrics.
✅ Positive Observations
🔍 Issues and Concerns
1. Critical: Ping Timing Issue
Pull Request Review: Implement Ping-Pong for Tunnel Connections
Overview
This PR implements a ping-pong mechanism for tunnel connections between the gateway and pegboard runners. The implementation adds RTT (round-trip time) tracking and timeout detection for WebSocket tunnels.
Code Quality & Best Practices
Positives:
Issues:
Potential Bugs & Issues
Performance Considerations
Security Concerns
No critical security issues identified.
Test Coverage
Recommended test structure:
#[cfg(test)]
mod tests {
    #[tokio::test]
    async fn test_ping_timeout_detection() { /* ... */ }

    #[tokio::test]
    async fn test_clock_skew_handling() { /* ... */ }

    #[tokio::test]
    async fn test_rtt_calculation() { /* ... */ }
}
Protocol Changes
The protocol schema changes look good:
Question: Are there any backwards-compatibility concerns with older runners/gateways that don't support ping-pong?
Suggestions
Summary
Overall Assessment: This is a solid implementation of ping-pong for tunnel connections. The code is well-structured and handles most edge cases properly.
Recommendation: Approve with minor fixes:
Risk Level: Low. The implementation is defensive and includes proper timeout handling.
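On the backwards-compatibility question above (and the earlier note that replacing `ToRunnerKeepAlive` with `ToRunnerPing` breaks older runners), one option is to gate the new message on the runner's protocol version. A sketch only: `ToRunnerMsg`, `MIN_PING_PROTOCOL_VERSION`, and the version numbers are hypothetical, and the PR's actual protocol types (such as `versioned::ToRunner`) are not reproduced here.

```rust
/// Hypothetical subset of gateway-to-runner liveness messages.
#[derive(Debug, PartialEq)]
enum ToRunnerMsg {
    /// Legacy liveness message understood by older runners.
    KeepAlive,
    /// New ping carrying the gateway's send timestamp (assumed milliseconds).
    Ping { ts_ms: i64 },
}

/// Hypothetical first protocol version whose runners answer pings with pongs.
const MIN_PING_PROTOCOL_VERSION: u16 = 2;

/// Chooses the liveness message for a runner speaking `runner_version`.
fn liveness_message(runner_version: u16, now_ms: i64) -> ToRunnerMsg {
    if runner_version >= MIN_PING_PROTOCOL_VERSION {
        ToRunnerMsg::Ping { ts_ms: now_ms }
    } else {
        ToRunnerMsg::KeepAlive
    }
}
```

If this kind of gating were used, the gateway would also need to skip the pong timeout for legacy runners, since they never reply with a pong.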
d6a2e79 to adeb05d
4c35d85 to 879f607
PR Review: fix(tunnel): implement ping pong
Summary
This PR implements a ping/pong mechanism for the tunnel between the pegboard gateway and runner to detect connection failures and measure RTT. The implementation introduces a periodic ping task on both sides and updates the protocol to support ping/pong messages.
Code Quality & Best Practices ✅
Strengths:
Observations:
Potential Issues & Bugs 🔍
1. Clock Skew Handling on Gateway Side
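Several reviews touch on how RTT is derived (`rtt = delta * 2`), on clock skew, and on the bare `0.001` millisecond-to-second factor. A sketch contrasting two approaches under stated assumptions: a one-way estimate that compares timestamps from two different clocks (so skew leaks in, and negative deltas are clamped to zero), and a same-clock estimate that only works if the pong echoes the ping's send time. All names are hypothetical, and whether the PR's pong echoes a timestamp is not shown here.

```rust
/// Named conversion instead of a bare `0.001` (hypothetical constant name).
const MS_TO_SECONDS: f64 = 0.001;

/// One-way estimate: receiver clock minus sender timestamp, doubled on the
/// assumption that both directions take equally long. A negative delta
/// (receiver clock behind the sender's) is clamped to zero.
fn estimate_rtt_from_one_way_ms(sent_at_ms: i64, received_at_ms: i64) -> i64 {
    (received_at_ms - sent_at_ms).max(0) * 2
}

/// Same-clock alternative: if the pong echoes the ping's timestamp, the
/// gateway subtracts two readings of its own clock, so skew between machines
/// cancels out. Clamped in case the wall clock steps backwards.
fn rtt_from_echo_ms(echoed_sent_at_ms: i64, pong_received_at_ms: i64) -> i64 {
    (pong_received_at_ms - echoed_sent_at_ms).max(0)
}

/// Converts a millisecond RTT into seconds for a metrics histogram.
fn rtt_seconds(rtt_ms: i64) -> f64 {
    rtt_ms as f64 * MS_TO_SECONDS
}
```

The echo approach removes both the symmetric-latency assumption and the dependence on synchronized clocks, at the cost of carrying the original timestamp in the pong.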
Pull Request Review: fix(tunnel): implement ping pong
Overview
This PR implements a ping/pong mechanism for the tunnel protocol between the pegboard gateway and runner to detect connection health and timeouts. The implementation adds a dedicated ping task that runs alongside the existing tunnel message forwarding tasks.
Positive Aspects
Architecture & Design
Code Quality
Issues & Concerns
1. Potential Race Condition in
879f607 to f0f270e
adeb05d to d6a2e79
PR Review: Implement Ping Pong for Tunnel
Summary
This PR implements a ping/pong mechanism for the tunnel connection between the pegboard gateway and runner. The implementation adds health checking and RTT (round-trip time) monitoring to detect disconnected or slow connections.
Code Quality & Best Practices ✅
Strengths:
Minor Issues:
Potential Bugs & Issues
PR Review: Implement ping/pong for tunnel connections
Overview
This PR implements a ping/pong mechanism for tunnel connections between the gateway and pegboard runner, replacing the previous keep-alive system. This improves connection health monitoring and allows tracking of round-trip time (RTT) metrics.
Strengths
✅ Good architectural improvements:
✅ Protocol enhancements:
✅ Code quality:
Issues & Concerns
1. Potential Race Condition in Ping Timeout Logic
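The race several reviews flag is between the pong handler writing `last_pong` and the ping task reading it. If that value is a single integer, an atomic is enough to make each read and write indivisible; a slightly stale read can then only delay the timeout by one check, never corrupt it. A sketch using `std::sync::atomic::AtomicI64`, assuming millisecond timestamps; `PongClock` is hypothetical, and the synchronization the PR's `shared_state.rs` actually uses is not shown here.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared cell holding the last pong timestamp (ms).
struct PongClock {
    last_pong_ms: AtomicI64,
}

impl PongClock {
    fn new(now_ms: i64) -> Self {
        Self { last_pong_ms: AtomicI64::new(now_ms) }
    }

    /// Pong-handler side. `fetch_max` keeps the newest timestamp even if two
    /// pongs are processed out of order by different threads.
    fn record_pong(&self, at_ms: i64) {
        self.last_pong_ms.fetch_max(at_ms, Ordering::AcqRel);
    }

    /// Ping-task side: one atomic load, then a pure comparison.
    fn timed_out(&self, now_ms: i64, timeout_ms: i64) -> bool {
        now_ms - self.last_pong_ms.load(Ordering::Acquire) > timeout_ms
    }
}
```

Usage: share the clock as `Arc<PongClock>` between the pong handler and the ping task; no lock is held across an `.await` or a send.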

No description provided.