fix(transport): cap close-handshake timeout to prevent disconnect() blocking #79
brendanobra merged 6 commits into main
Pull request overview
Caps the WebSocket close-handshake timeout during Transport::disconnect() to prevent long blocking waits when the gateway stops responding but keeps the TCP connection open (RDKEMW-15695), and adds a regression test to ensure disconnect returns promptly in that scenario.
Changes:
- Set `close_handshake_timeout` to 2000 ms immediately before issuing `client_->close()` in `Transport::disconnect()`.
- Add a component-style unit test using a raw TCP “silent after upgrade” server to reproduce and prevent the disconnect stall.
- Fix GMock usage for returning move-only `std::future` values by switching to `Return(ByMove(...))`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/transport.cpp | Caps websocketpp close-handshake timeout to limit disconnect() blocking time. |
| test/unit/transportTest.cpp | Adds a regression test server + test case for the “ignore CLOSE handshake” scenario. |
| test/unit/helperTest.cpp | Fixes mocking of move-only futures using ByMove. |
@copilot apply changes based on the comments in this thread
…emeral port
Agent-Logs-Url: https://github.com/rdkcentral/firebolt-cpp-transport/sessions/31eb894b-d97e-42be-968b-3b2e76a05dba
Co-authored-by: brendanobra <740575+brendanobra@users.noreply.github.com>
Applied all review changes in commit 41fc8f6:
fail fast(er))
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Fixes RDKEMW-15695
When the Firebolt gateway process was unresponsive but the TCP connection remained open, calling disconnect() would block the caller for ~5 seconds before returning. The root cause was that close() initiates an async WebSocket CLOSE handshake and the subsequent connectionThread_.join() blocks until the ASIO io_service exits — which only happens once the handshake completes or its timeout fires. websocketpp's default close_handshake_timeout is 5 000 ms; with a hung gateway that full timeout elapsed on every call.
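The blocking behaviour described above can be modelled without websocketpp. The sketch below is a stdlib-only illustration, not the transport's code: `closeAndJoinMs`, `handshakeTimeout`, and `peerAcks` are hypothetical names, and a condition variable stands in for the ASIO io_service waiting on the CLOSE handshake.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns how long a close-then-join sequence blocks the caller, in milliseconds.
// handshakeTimeout models close_handshake_timeout; peerAcks models whether the
// remote side ever answers the CLOSE frame. Both names are illustrative.
long long closeAndJoinMs(std::chrono::milliseconds handshakeTimeout, bool peerAcks) {
    assert(handshakeTimeout.count() >= 0);
    std::mutex m;
    std::condition_variable cv;
    bool acked = false;

    const auto start = std::chrono::steady_clock::now();

    // Stand-in for the io_service thread: it exits only when the handshake
    // completes or the handshake timeout fires.
    std::thread loop([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait_for(lock, handshakeTimeout, [&] { return acked; });
    });

    if (peerAcks) {
        {
            std::lock_guard<std::mutex> lock(m);
            acked = true;
        }
        cv.notify_one();
    }

    loop.join();  // the caller is stuck here for up to handshakeTimeout
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::steady_clock::now() - start)
        .count();
}
```

With a silent peer, the join lasts for the whole timeout, so the only lever the caller has is the timeout value itself.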
Fix: immediately before calling close(), retrieve the connection pointer via get_con_from_hdl() and call set_close_handshake_timeout(2000) to cap the wait at 2 s. get_con_from_hdl() is wrapped in try/catch because it throws bad_weak_ptr when the connection has already been torn down at the network level, in which case close() fails through its error_code path and join() returns promptly regardless.
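The best-effort shape of this fix can be sketched with standard-library stand-ins. `FakeConnection`, `FakeHandle`, `getConFromHdl`, and `capCloseTimeout` are hypothetical names for illustration only; the real code calls websocketpp's `get_con_from_hdl()` and `set_close_handshake_timeout(2000)` as described above. The sketch catches `std::exception`, so it does not depend on the exact exception type the lookup throws.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <memory>

// Hypothetical stand-in for a connection object; only the timeout matters here.
struct FakeConnection {
    // 5000 ms mirrors the library default quoted in the description.
    std::chrono::milliseconds closeHandshakeTimeout{5000};
    void set_close_handshake_timeout(long ms) {
        closeHandshakeTimeout = std::chrono::milliseconds(ms);
    }
};

// Hypothetical stand-in for a connection handle: a weak reference.
using FakeHandle = std::weak_ptr<FakeConnection>;

// Models the lookup step: constructing a shared_ptr from an expired weak_ptr
// throws std::bad_weak_ptr, i.e. the connection is already gone.
std::shared_ptr<FakeConnection> getConFromHdl(const FakeHandle& hdl) {
    return std::shared_ptr<FakeConnection>(hdl);
}

// Applies the cap if the connection still exists. Returns true when the cap
// was applied and false when the lookup failed; in the failure case the caller
// still proceeds to close(), which reports the dead connection on its own.
bool capCloseTimeout(const FakeHandle& hdl, long capMs) {
    assert(capMs > 0);
    try {
        getConFromHdl(hdl)->set_close_handshake_timeout(capMs);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

Swallowing the lookup failure is deliberate in this sketch: a connection that no longer exists has no handshake to wait for, so there is nothing to cap.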
A component test (TransportDisconnectTimeoutComponentTest) is added to regression-test this. It uses a raw TCP server (SilentAfterUpgradeServer) that accepts the WebSocket HTTP upgrade but then discards all incoming bytes without ever sending a CLOSE response — the exact freeze scenario. The test asserts that disconnect() returns in under 3 s. Without the fix, the test fails at ~5 001 ms; with the fix it completes in ~2 003 ms.
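The timing assertion in such a test reduces to measuring a blocking call against a limit with a monotonic clock. The helper below is a minimal sketch of that idea, not the code from transportTest.cpp; `returnsWithin` and `fakeDisconnect` are hypothetical names, and a sleep stands in for `disconnect()` waiting on a silent peer.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Runs op and reports whether it returned before limit elapsed.
// steady_clock is used so wall-clock adjustments cannot skew the result.
template <typename Op>
bool returnsWithin(Op&& op, std::chrono::milliseconds limit) {
    assert(limit.count() > 0);
    const auto start = std::chrono::steady_clock::now();
    op();
    return std::chrono::steady_clock::now() - start < limit;
}

// Hypothetical stand-in for disconnect() against a peer that never answers
// the CLOSE frame: it simply blocks for the handshake timeout.
inline void fakeDisconnect(std::chrono::milliseconds handshakeTimeout) {
    std::this_thread::sleep_for(handshakeTimeout);
}
```

In the PR's terms the limit is 3 s, which sits between the 2 s cap and the 5 s default, so the same check passes with the fix and fails without it.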
Also fixes a pre-existing build failure in helperTest.cpp where Return() was used with std::future (a move-only type). Changed to Return(ByMove(...)) which is correct for move-only return values in GMock.
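The underlying constraint can be shown without GMock. The sketch below is a stdlib-only model, not GMock's implementation: `ReturnByMoveOnce` and `makeReadyFuture` are hypothetical names. It illustrates why an action holding a `std::future` has to move the value out rather than copy it, and why such an action can only yield its value once.

```cpp
#include <cassert>
#include <future>
#include <type_traits>
#include <utility>

// The reason a copying return action cannot compile for this type.
static_assert(!std::is_copy_constructible<std::future<int>>::value,
              "std::future cannot be copied, only moved");

// Hypothetical stand-in for an action that stores a move-only value and
// hands it out by moving. After the first call the stored value is spent.
template <typename T>
class ReturnByMoveOnce {
public:
    explicit ReturnByMoveOnce(T value) : value_(std::move(value)) {}

    T operator()() {
        assert(!consumed_);  // a moved-from value must not be returned again
        consumed_ = true;
        return std::move(value_);
    }

private:
    T value_;
    bool consumed_ = false;
};

// Builds a future whose result is already available.
inline std::future<int> makeReadyFuture(int result) {
    std::promise<int> promise;
    promise.set_value(result);
    return promise.get_future();
}
```

A mock expectation that returns a future therefore needs one such action per expected call, each wrapping its own future.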