fix(runtime): harden connmgr teardown drop contract#949
Merged
Conversation
Defense-in-depth so drop does not rely on callers having already closed the transport before dropping a ConnectionActor. Previously ConnectionActor::drop only set reader_stop and called join, which hangs if the reader thread is blocked inside recv() and the transport connection was never explicitly closed (e.g., a late add that was rejected after teardown drain but before close reached the actor, or a future code path that forgets the close-then-drop ordering). Changes: - Add transport (*mut HewTransport) and transport_closed (AtomicBool) fields to ConnectionActor. transport is set in hew_connmgr_add; null for test-only actors created directly (null-safe via close_transport_conn). - Add ConnectionActor::close_transport() which swaps transport_closed true before calling close_transport_conn, guarding against double- close. - Drop now calls reader_stop then close_transport() then join, so a reader blocked in recv() is always unblocked before the join. - hew_connmgr_free and hew_connmgr_remove: set transport_closed=true before the existing explicit close_transport_conn call so drop skips the redundant close. - install_connection_actor error path: set transport_closed=true on the rejected actor before the explicit close so drop is a no-op on the transport. - hew_connmgr_add thread-spawn failure path: use actor.close_transport() to mark and close in one step. New regression tests: - conn_actor_drop_closes_transport_before_join: verifies that dropping an actor with a running reader (blocked waiting for a channel close signal) unblocks and joins via the drop's close_transport() call. - connmgr_free_rejects_concurrent_add_and_closes_transport: verifies that hew_connmgr_add returns -1 and closes the transport when the shutdown flag is set, and that hew_connmgr_free completes cleanly.
…nmgr_add Before this commit, hew_connmgr_add had three failure paths that did NOT close conn_id (mutex-poisoned, duplicate, and two callers post-failure), and two that DID (shutdown fence in install_connection_actor and early-shutdown check), with reconnect_worker_loop and hew_node::hew_node_connect also closing on -1 return. HewTransportOps::close_conn has no idempotence contract, so the overlapping closes were a double-close hazard. New invariant: hew_connmgr_add owns conn_id from entry. On success: ownership transfers to the manager (closed by remove/free). On failure: hew_connmgr_add closes conn_id before returning -1. Exception: if mgr is null (API misuse), no transport pointer is available; the caller retains ownership (documented in the null-check comment). Changes: - connection.rs: add close_transport_conn in the mutex-poisoned and duplicate early-return paths so all non-null-mgr failure paths close conn_id. - connection.rs: document ownership contract on hew_connmgr_add entry. - connection.rs: remove the post-failure close from reconnect_worker_loop (hew_connmgr_add now owns it). - hew_node.rs: remove the post-failure close from hew_node_connect (hew_connmgr_add now owns it). The actor-drop defense-in-depth (transport_closed AtomicBool) is unchanged. install_connection_actor still marks transport_closed=true before closing on its rejection path, preventing a second close through actor Drop. Validation: cargo test -p hew-runtime — 1067 passed, 0 failed.
c08c823 to
05dff1c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
hew_connmgr_addduring teardown so late reconnect installs are rejected fail-closedConnectionActor::dropbefore join as defense in depthhew_connmgr_addthe single owner of failed-install cleanup and add focused teardown regressionsValidation