Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event Loop shutdown stuck due to uncompleted connect (Fixes #192) #193

Merged
merged 1 commit into from Feb 22, 2023

Conversation

franz1981
Copy link
Contributor

Motivation:

uncompleted CONNECT_SCHEDULE prevent force closing and issuing IORING_OP_CLOSE, causing a graceful shutdown to complete its quiet period and ignore any later CONNECT completion (at the mercy of OS configuration for connect timeout) due to already shutdown executor.
This will cause the executor to not be able to complete its cleanup because of unclosed channels and making it to hang awaiting an already happened CQE.

Modification:

Allow force close to initiate a close despite the uncompleted CONNECT_SCHEDULE, allowing it to issue a IORING_OP_CLOSE. When the FD close would complete, a later CQE of CONNECT_SCHEDULE on the closed FD will be ignored because its channel is already deregistered.

Result:

Prompt channel close on short timeout (regardless peer presence)

@franz1981
Copy link
Contributor Author

I've chosen the easy path here: force close just mark the socket to be closed and prompt initiate a deregistration,
without delaying the close operation: when the CQEs of both CONNECT and CLOSE arrive (in this order), they will be ignored because the channel of the FD they refer to is already gone.

@franz1981 franz1981 force-pushed the force_close_on_connect_scheduled branch from 018c251 to b077404 Compare February 18, 2023 22:09
Motivation:

uncompleted CONNECT_SCHEDULE prevent force closing and issuing IORING_OP_CLOSE, causing a graceful shutdown
to complete its quiet period and ignore any later CONNECT completion (at the mercy of OS configuration for connect timeout)
due to already shutdown executor.
This will cause the executor to not be able to complete its cleanup because of unclosed channels and making it to hang awaiting
an already happened CQE.

Modification:

Allow force close to initiate a close despite the uncompleted CONNECT_SCHEDULE, allowing it to issue a IORING_OP_CLOSE.
When the FD close would complete, a later CQE of CONNECT_SCHEDULE on the closed FD will be ignored because its channel is already deregistered.

Result:

Prompt channel close on short timeout (regardless peer presence)
@franz1981 franz1981 force-pushed the force_close_on_connect_scheduled branch from b077404 to 410c9d2 Compare February 18, 2023 22:10
@franz1981 franz1981 changed the title Event Loop shutdown stuck due to uncompleted connect (#Fixes 192) Event Loop shutdown stuck due to uncompleted connect (Fixes #192) Feb 18, 2023
@franz1981
Copy link
Contributor Author

franz1981 commented Feb 19, 2023

I see that we still have

        final void processDelayedClose() {
            ChannelPromise promise = delayedClose;
            if (promise != null && (ioState & (READ_SCHEDULED | WRITE_SCHEDULED | CONNECT_SCHEDULED)) == 0) {
                delayedClose = null;
                forceClose(promise);
            }
        }

I see no harm to remove CONNECT_SCHEDULED from here as well, although it seems less impactful, or I have still to think to a test case where it can be important.

@normanmaurer any thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants