Skip to content

fix(transport): cap close-handshake timeout to prevent disconnect() b…#79

Merged
brendanobra merged 6 commits intomainfrom
bugfix/RDKEMW-15695
Mar 30, 2026
Merged

fix(transport): cap close-handshake timeout to prevent disconnect() b…#79
brendanobra merged 6 commits intomainfrom
bugfix/RDKEMW-15695

Conversation

@brendanobra
Copy link
Copy Markdown
Contributor

Fixes RDKEMW-15695

When the Firebolt gateway process was unresponsive but the TCP connection remained open, calling disconnect() would block the caller for ~5 seconds before returning. The root cause was that close() initiates an async WebSocket CLOSE handshake and the subsequent connectionThread_.join() blocks until the ASIO io_service exits — which only happens once the handshake completes or its timeout fires. websocketpp's default close_handshake_timeout is 5 000 ms; with a hung gateway that full timeout elapsed on every call.

Fix: immediately before calling close(), retrieve the connection pointer via get_con_from_hdl() and call set_close_handshake_timeout(2000) to cap the wait at 2 s. get_con_from_hdl() is wrapped in try/catch because it throws bad_weak_ptr when the connection has already been torn down at the network level, in which case close() fails through its error_code path and join() returns promptly regardless.

A component test (TransportDisconnectTimeoutComponentTest) is added to regression-test this. It uses a raw TCP server (SilentAfterUpgradeServer) that accepts the WebSocket HTTP upgrade but then discards all incoming bytes without ever sending a CLOSE response — the exact freeze scenario. The test asserts that disconnect() returns in under 3 s. Without the fix, the test fails at ~5 001 ms; with the fix it completes in ~2 003 ms.

Also fixes a pre-existing build failure in helperTest.cpp where Return() was used with std::future (a move-only type). Changed to Return(ByMove(...)) which is correct for move-only return values in GMock.

…locking

Fixes RDKEMW-15695

When the Firebolt gateway process was unresponsive but the TCP connection
remained open, calling disconnect() would block the caller for ~5 seconds
before returning. The root cause was that close() initiates an async
WebSocket CLOSE handshake and the subsequent connectionThread_.join() blocks
until the ASIO io_service exits — which only happens once the handshake
completes or its timeout fires. websocketpp's default close_handshake_timeout
is 5 000 ms; with a hung gateway that full timeout elapsed on every call.

Fix: immediately before calling close(), retrieve the connection pointer via
get_con_from_hdl() and call set_close_handshake_timeout(2000) to cap the wait
at 2 s. get_con_from_hdl() is wrapped in try/catch because it throws
bad_weak_ptr when the connection has already been torn down at the network
level, in which case close() fails through its error_code path and join()
returns promptly regardless.

A component test (TransportDisconnectTimeoutComponentTest) is added to
regression-test this. It uses a raw TCP server (SilentAfterUpgradeServer) that
accepts the WebSocket HTTP upgrade but then discards all incoming bytes without
ever sending a CLOSE response — the exact freeze scenario. The test asserts
that disconnect() returns in under 3 s. Without the fix, the test fails at
~5 001 ms; with the fix it completes in ~2 003 ms.

Also fixes a pre-existing build failure in helperTest.cpp where Return() was
used with std::future (a move-only type). Changed to Return(ByMove(...)) which
is correct for move-only return values in GMock.
Copilot AI review requested due to automatic review settings March 27, 2026 21:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Caps the WebSocket close-handshake timeout during Transport::disconnect() to prevent long blocking waits when the gateway stops responding but keeps the TCP connection open (RDKEMW-15695), and adds a regression test to ensure disconnect returns promptly in that scenario.

Changes:

  • Set close_handshake_timeout to 2000ms immediately before issuing client_->close() in Transport::disconnect().
  • Add a component-style unit test using a raw TCP “silent after upgrade” server to reproduce and prevent the disconnect stall.
  • Fix GMock usage for returning move-only std::future values by switching to Return(ByMove(...)).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/transport.cpp Caps websocketpp close-handshake timeout to limit disconnect() blocking time.
test/unit/transportTest.cpp Adds a regression test server + test case for the “ignore CLOSE handshake” scenario.
test/unit/helperTest.cpp Fixes mocking of move-only futures using ByMove.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/transport.cpp Outdated
Comment thread test/unit/transportTest.cpp
Comment thread test/unit/transportTest.cpp Outdated
Comment thread src/transport.cpp
@brendanobra
Copy link
Copy Markdown
Contributor Author

@copilot apply changes based on the comments in this thread

@brendanobra
Copy link
Copy Markdown
Contributor Author

@copilot apply changes based on the comments in this thread

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 30, 2026

@copilot apply changes based on the comments in this thread

Applied all review changes in commit 41fc8f6:

  • src/transport.cpp: Added #include <memory> (explicit header for std::bad_weak_ptr); narrowed the catch clause from std::exception to std::bad_weak_ptr so unexpected exceptions are not silently swallowed.
  • test/unit/transportTest.cpp: Added #include <future> explicitly; changed SilentAfterUpgradeServer to bind to ephemeral port 0 with a port() getter, and updated the test to use the dynamically assigned port instead of hardcoded 9005.

Copilot AI review requested due to automatic review settings March 30, 2026 15:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/unit/transportTest.cpp Outdated
Comment thread test/unit/transportTest.cpp
Comment thread src/transport.cpp
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread fmt.sh
Comment thread .github/copilot-instructions.md
Comment thread test.sh
Comment thread test.sh
Comment thread fmt.sh Outdated
@brendanobra brendanobra requested a review from satlead March 30, 2026 17:39
@brendanobra brendanobra merged commit 4b74a1f into main Mar 30, 2026
10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 30, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants