8373409: java/net/httpclient/http3/H3ErrorHandlingTest.java failed due to deadlock #28788

Closed
djelinski wants to merge 4 commits into openjdk:master from djelinski:h3-deadlock

Conversation

@djelinski djelinski commented Dec 12, 2025

This PR fixes a deadlock between the localConnectionIdManager and the connections map by closing the manager before calling connections.compute.

No new tests; the issue requires a complex setup to reproduce, and the new code is easy enough to reason about. Existing tests continue to pass.
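As a rough illustration of the ordering change described above, here is a minimal sketch. The class and method names are hypothetical stand-ins rather than the JDK's QUIC classes, and the assumption that the manager guards its state with its own lock is made only for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a component that guards its state with its own lock.
final class IdManagerSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private boolean closed;

    void close() {
        lock.lock();
        try {
            closed = true;
        } finally {
            lock.unlock();
        }
    }

    boolean isClosed() {
        lock.lock();
        try {
            return closed;
        } finally {
            lock.unlock();
        }
    }
}

// Hypothetical stand-in for an endpoint that tracks connections by ID.
final class EndpointSketch {
    final ConcurrentHashMap<String, String> connections = new ConcurrentHashMap<>();

    // Risky ordering: close() requests the manager lock while compute still
    // holds the map's internal bin lock. If another thread holds the manager
    // lock and then touches the same bin, the two threads can block each other.
    void closeInsideCompute(String id, IdManagerSketch manager) {
        connections.compute(id, (key, current) -> {
            manager.close();
            return "closing";
        });
    }

    // Safer ordering: the manager lock is taken and released before compute,
    // so the two locks are never held at the same time by this thread.
    void closeBeforeCompute(String id, IdManagerSketch manager) {
        manager.close();
        connections.compute(id, (key, current) -> "closing");
    }
}
```

Both methods leave the map in the same state when run on a single thread; they differ only in which locks are held simultaneously.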


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8373409: java/net/httpclient/http3/H3ErrorHandlingTest.java failed due to deadlock (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28788/head:pull/28788
$ git checkout pull/28788

Update a local copy of the PR:
$ git checkout pull/28788
$ git pull https://git.openjdk.org/jdk.git pull/28788/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28788

View PR using the GUI difftool:
$ git pr show -t 28788

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28788.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Dec 12, 2025

👋 Welcome back djelinski! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Dec 12, 2025

@djelinski This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8373409: java/net/httpclient/http3/H3ErrorHandlingTest.java failed due to deadlock

Reviewed-by: dfuchs

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 85 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title from "8373409" to "8373409: java/net/httpclient/http3/H3ErrorHandlingTest.java failed due to deadlock" Dec 12, 2025
@openjdk openjdk bot added the net net-dev@openjdk.org label Dec 12, 2025
openjdk bot commented Dec 12, 2025

@djelinski The following label will be automatically applied to this pull request:

  • net

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

djelinski commented Dec 12, 2025

I checked all other uses of CHM.compute in the java.net.http module; most of them are trivially correct. The ones in AltServiceRegistry are not trivial, and might need to be replaced with something easier to reason about. I'll look into that.

EDIT - checked the AltServiceRegistry; compute is only used on a (non-concurrent) HashMap while holding a lock, and the lambdas only use functions internal to the registry. This should be fine as well.
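For readers unfamiliar with the pattern mentioned in the EDIT, a minimal sketch of compute on a plain HashMap guarded by an explicit lock might look like the following. RegistrySketch and its fields are hypothetical and do not reflect the actual AltServiceRegistry code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical registry: a non-concurrent HashMap protected by a single lock.
final class RegistrySketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, Integer> entries = new HashMap<>();

    // The lambda only calls a private helper of this class, so no other
    // lock can be acquired while the registry lock is held.
    int register(String origin) {
        lock.lock();
        try {
            return entries.compute(origin, (key, count) -> next(count));
        } finally {
            lock.unlock();
        }
    }

    private static Integer next(Integer count) {
        return count == null ? 1 : count + 1;
    }
}
```

Because HashMap.compute takes no lock of its own, the only lock involved is the registry's, which keeps the lock ordering trivial to reason about.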

@djelinski djelinski marked this pull request as ready for review December 12, 2025 12:37
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 12, 2025
mlbridge bot commented Dec 12, 2025

Webrevs

DrainingConnection draining = new DrainingConnection(connection.connectionIds(), idleTimeout);
// we can ignore stateless reset in the draining state.
remapPeerIssuedResetToken(connection, draining);
draining.startTimer();
Member

shouldn't we start the timer only if the connection has been added, and therefore call startTimer in compute?

Member Author

I'd rather not call this from compute; every method called from the compute lambda increases the risk of reintroducing the deadlock, so I'd like to keep the lambda to a minimum.

Most of the time the connection will be added; the only case where it won't is when there are multiple threads attempting to close the connection in parallel. The timer task only removes the connection from the endpoint, so, worst case, we will remove the connection IDs and the reset tokens twice. The second removal will likely be a no-op, unless we somehow manage to reassign the IDs or the tokens to a different connection.
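A small sketch of the double-removal argument, using a plain ConcurrentHashMap keyed by connection ID. The class, the String values and the removeAll helper are illustrative assumptions, not the endpoint's real data structures.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

final class DoubleRemovalSketch {
    // Hypothetical map from connection ID to a label for its owner.
    final ConcurrentHashMap<String, String> connections = new ConcurrentHashMap<>();

    // Unconditionally unmaps every ID and reports how many mappings were removed.
    int removeAll(List<String> connectionIds) {
        int removed = 0;
        for (String id : connectionIds) {
            if (connections.remove(id) != null) {
                removed++;
            }
        }
        return removed;
    }
}
```

Calling removeAll twice with the same IDs removes nothing the second time. The second call only has an effect if one of the IDs was mapped again in between, which is the reassignment case mentioned above.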

Member

@dfuch dfuch Dec 15, 2025

A look at startTimer makes me think that it should be safe, but OK. I suspect the double removal would not be an issue - IIRC we check with == on the removed connection - but we would potentially be adding an event to the timer queue that wakes it up for nothing. We could add a boolean field to ClosingConnection, set it to true if the connection is added, check it after compute has been called, and start the timer then. I'll let you decide if it's worth it.
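The boolean-flag idea could be sketched roughly as follows. ClosingSketch, FlagSketch and the identity check used to decide whether the connection counts as "added" are assumptions made for the example, not the actual ClosingConnection code.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a closing connection with a timer.
final class ClosingSketch {
    boolean added;        // set inside compute, read afterwards by the same thread
    boolean timerStarted;

    void startTimer() {
        timerStarted = true;
    }
}

final class FlagSketch {
    final ConcurrentHashMap<String, Object> connections = new ConcurrentHashMap<>();

    // Replaces `expected` with `closing` only if `expected` is still mapped.
    // The lambda does nothing but set a flag; the timer starts outside compute.
    boolean replaceAndMaybeStart(String id, Object expected, ClosingSketch closing) {
        connections.compute(id, (key, current) -> {
            if (current == expected) {
                closing.added = true;
                return closing;
            }
            return current;
        });
        if (closing.added) {
            closing.startTimer();
        }
        return closing.added;
    }
}
```

This keeps the lambda free of calls into other components while still avoiding a timer for a connection that lost the race.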

Member Author

We don't check with ==; technically we could, but with multiple threads accessing the map, I'm not sure if we would guarantee that all connection IDs are unmapped when the connection is removed.

Multiple threads racing to update the connection map should be rare. Most of the time the compute calls will replace the connection, so the extra check is probably not worth the effort.

I'll move the startTimer call after the connection map updates; I observed occasional failures to update the map, because the timer fired before the map was updated.
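A minimal sketch of that ordering, assuming a ScheduledExecutorService stands in for the timer and a String value stands in for the closing connection (both are hypothetical simplifications):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class TimerOrderSketch {
    final ConcurrentHashMap<String, String> connections = new ConcurrentHashMap<>();

    ScheduledFuture<?> moveToClosing(String id, ScheduledExecutorService timer, long delayMillis) {
        // 1. Publish the closing state in the map first.
        connections.compute(id, (key, current) -> "closing");
        // 2. Only then arm the timer, so its task cannot run before the update.
        return timer.schedule(() -> {
            connections.remove(id);
        }, delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

With the two steps swapped, a very short delay would allow the removal task to run before the map update, leaving the entry mapped after the timer has already fired.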

connection.localConnectionIdManager().close();
var closingConnection = new ClosingConnection(connection.connectionIds(), idleTimeout, datagram);
remapPeerIssuedResetToken(connection, closingConnection);
closingConnection.startTimer();
Member

Same question here.

dfuch commented Dec 15, 2025

> I checked all other uses of CHM.compute in the java.net.http module; most of them are trivially correct. The ones in AltServiceRegistry are not trivial, and might need to be replaced with something easier to reason about. I'll look into that.
>
> EDIT - checked the AltServiceRegistry; compute is only used on a (non-concurrent) HashMap while holding a lock, and the lambdas only use functions internal to the registry. This should be fine as well.

Thanks for checking the other uses of CHM.compute @djelinski !

@dfuch dfuch left a comment

LGTM

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 16, 2025
@djelinski
Member Author

Thanks @dfuch for the review!

/integrate

openjdk bot commented Dec 17, 2025

Going to push as commit 386ad61.
Since your change was applied there have been 89 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 17, 2025
@openjdk openjdk bot closed this Dec 17, 2025
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Dec 17, 2025
openjdk bot commented Dec 17, 2025

@djelinski Pushed as commit 386ad61.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@djelinski djelinski deleted the h3-deadlock branch December 17, 2025 08:03

Labels

integrated Pull request has been integrated net net-dev@openjdk.org

2 participants