Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8277969: HttpClient SelectorManager shuts down when custom Executor rejects a task #7196

Closed
wants to merge 14 commits into from

Conversation

dfuch
Copy link
Member

@dfuch dfuch commented Jan 24, 2022

These changes make sure that pending requests are terminated if the selector manager thread exits due to exceptions.
This includes:

  1. completing CompletableFutures that were returned to the caller code
  2. cancelling requests that are in flight
  3. calling onError on BodySubscribers that may not have been completed
    Note that step 3 is necessary as certain CompletableFutures, such as those returned by BodySubscribers.ofInputStream, complete immediately, the operation being eventually completed when the last bite of the response is read. Completing a completable future that is already completed has no effect, this case is handled by completing the BodySubscriber too.

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8277969: HttpClient SelectorManager shuts down when custom Executor rejects a task

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7196/head:pull/7196
$ git checkout pull/7196

Update a local copy of the PR:
$ git checkout pull/7196
$ git pull https://git.openjdk.java.net/jdk pull/7196/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7196

View PR using the GUI difftool:
$ git pr show -t 7196

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7196.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 24, 2022

👋 Welcome back dfuchs! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 24, 2022
@openjdk
Copy link

openjdk bot commented Jan 24, 2022

@dfuch The following label will be automatically applied to this pull request:

  • net

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the net net-dev@openjdk.org label Jan 24, 2022
@mlbridge
Copy link

mlbridge bot commented Jan 24, 2022

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 21, 2022

@dfuch This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@@ -497,7 +497,8 @@ void onReadError(Throwable ex) {
final Throwable t = (recorded == null ? ex : recorded);
if (debug.on())
debug.log("recorded " + t + "\n\t delegate: " + delegate
+ "\t\t queue.isEmpty: " + queue.isEmpty(), ex);
+ "\n\t queue.isEmpty: " + queue.isEmpty()
+ " stopRequested: " + stopRequested, ex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be preceded with a \n\t to be consistent with the rest of this log message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - will do

@@ -479,11 +366,12 @@ public void onComplete() {
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that later in this code there's another finally block where we have now added a tryRelease call which first checks the acquired flag. A couple of questions:

  1. Should this above line too first check the acquire flag?
  2. Should we reset the acquire local variable once we tryRelease here so as to prevent another tryRelease in the other finally block? A short glance at the implementation of tryRelease suggests that multiple tryRelease() with a single acquire() won't cause a problem in its state but it does end up logging some messages, which perhaps could be avoided.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryRelease can be called multiple time - it will only decrement the ref counting once. It could happen when different threads notice that the operation is finished (usually due to some exceptions being propagated) and try concurrently to decrease the ref counter. It is very important to decrease the ref counter (otherwise the HttpClient may never shutdown when it's released) and equally very important not to decrease it twice for the same operation. Here we don't need to check whether it's been acquired because we know it's been acquired if we reach here. IIRC what we don't want to do is call tryRelease() if it's never been acquired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we don't need to check whether it's been acquired because we know it's been acquired if we reach here.

Hello Daniel, may be I am misreading the diff but from what I can see, the acquire() method now returns a boolean which we are assigning to the local acquire. But we aren't checking it for true for doing the subsequent operation. So when it reaches the finally block here, as far as I can see, it's not guaranteed that we did indeed acquire it.

Copy link
Member Author

@dfuch dfuch Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The acquire() method will return true the first time it's been called. And it is called only once. So we only need to check whether acquired is true at places where we are in doubt about whether the method has been called.
The code that calls acquire() above could simply have ignored the result of acquire() and set the boolean to true. That said, it would not be wrong to check whether acquired==true here too - maybe I should do it for consistency...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The acquire() method will return true the first time it's been called. And it is called only once. So we only need to check whether acquired is true at places where we are in doubt about whether the method has been called.

I see what you mean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking acquired in the lambda - which requires acquired being final, I will add a line to assert that assert acquired == true after calling acquire().

try {
subscriber.onError(t);
cf.completeExceptionally(t);
} finally {
if (acquired) refCountTracker.tryRelease();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that before this change, we never did a tryRelease in this finally block. Was that a separate bug in itself and something that is being fixed as part of this PR?

Copy link
Member Author

@dfuch dfuch Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been fighting with trying to get the ref counter back to 0 after Executor::shutdown has been called.
Throwing TaskRejectedException (especially when SequentialScheduler::runOrSchedule(Executor) is called) is very disruptive to the logic.
In practice - it does not matter as much provided that all resources have been released and all body subscribers have been completed - and that's what this PR does.

if (client.debug.on()) {
var msg = reason instanceof RejectedExecutionException
? reason.getClass() : reason;
client.debug.log("aborting pending requests due to: %s", msg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we are using a different log message for RejectedExecutionException? Would it instead be useful to let the underlying message from the RejectedExecutionException be logged too just like we are doing for other exception types here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the message was too long and not very informative.

var pendingRequests = client.pendingRequests;
while (!pendingRequests.isEmpty()) {
var pendings = pendingRequests.iterator();
while (pendings.hasNext()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a case where while we are aborting these current pending requests, some user code does a sendAsync and we end up add a new pending request to the Set? Would we then end up holding those new requests in the Set never to be cleared again because, I guess, abortPendingRequests won't be called again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's been taken care of. The first thing we do is set up a flag that we're shutting down. I remember asking myself the same question and making sure it doesn't happen - that is - everything will be aborted.

@jaikiran
Copy link
Member

jaikiran commented Mar 9, 2022

Hello Daniel, I couldn't fully finish the review, but added my initial comments/questions.

Sorry about the duplicate review comments. Looks like something to do with the GitHub review UI.

Copy link
Member Author

@dfuch dfuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review Jaikiran!

@@ -497,7 +497,8 @@ void onReadError(Throwable ex) {
final Throwable t = (recorded == null ? ex : recorded);
if (debug.on())
debug.log("recorded " + t + "\n\t delegate: " + delegate
+ "\t\t queue.isEmpty: " + queue.isEmpty(), ex);
+ "\n\t queue.isEmpty: " + queue.isEmpty()
+ " stopRequested: " + stopRequested, ex);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - will do

@@ -479,11 +366,12 @@ public void onComplete() {
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryRelease can be called multiple time - it will only decrement the ref counting once. It could happen when different threads notice that the operation is finished (usually due to some exceptions being propagated) and try concurrently to decrease the ref counter. It is very important to decrease the ref counter (otherwise the HttpClient may never shutdown when it's released) and equally very important not to decrease it twice for the same operation. Here we don't need to check whether it's been acquired because we know it's been acquired if we reach here. IIRC what we don't want to do is call tryRelease() if it's never been acquired.

try {
subscriber.onError(t);
cf.completeExceptionally(t);
} finally {
if (acquired) refCountTracker.tryRelease();
Copy link
Member Author

@dfuch dfuch Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been fighting with trying to get the ref counter back to 0 after Executor::shutdown has been called.
Throwing TaskRejectedException (especially when SequentialScheduler::runOrSchedule(Executor) is called) is very disruptive to the logic.
In practice - it does not matter as much provided that all resources have been released and all body subscribers have been completed - and that's what this PR does.

BodyParser parser = bodyParser;
if (t instanceof EOFException && parser != null &&
parser instanceof UnknownLengthBodyParser) {
((UnknownLengthBodyParser)parser).complete();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

parser.onError(error);
} else {
if (debug.on()) {
debug.log("no parser");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

delegate.execute(command);
} catch (Throwable t) {
errorHandler.accept(command, t);
ASYNC_POOL.execute(command);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Dependent actions added by the caller to the CF returned by the HttpClient are executed in the FJP
  2. This is an exceptional case - we're doing this only when we're shutting down the HttpClient. I don't think we should document it - this is an implementation detail. But we should probably document that shutting down the executor while operations are still in progress can lead to unpredictable/unspecified behavior.

Note: If we had VirtualThreads we would probably create a new VirtualThread here, rather than using the FJP.

// We maintain a list of pending requests that will be aborted if ever
// the selector manager thread exists abnormally.
// The request, its id, and its completable future, are stored in a record-like
// PendingRequest object added to the pending requests set (pendingRequests).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the normal course of action the HttpClient will not shutdown if any operation are still in progress. So holding onto the multiexchange here doesn't prevent the client from shutting down, since the multiexchange will be remove from this list when the operation completes.

if (client.debug.on()) {
var msg = reason instanceof RejectedExecutionException
? reason.getClass() : reason;
client.debug.log("aborting pending requests due to: %s", msg);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the message was too long and not very informative.

var pendingRequests = client.pendingRequests;
while (!pendingRequests.isEmpty()) {
var pendings = pendingRequests.iterator();
while (pendings.hasNext()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's been taken care of. The first thing we do is set up a flag that we're shutting down. I remember asking myself the same question and making sure it doesn't happen - that is - everything will be aborted.

@jaikiran
Copy link
Member

I went over the remaining changes in this PR and to me these changes look fine.

Copy link
Member

@Michael-Mc-Mahon Michael-Mc-Mahon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thorny problem. Nice work.

@openjdk
Copy link

openjdk bot commented Apr 8, 2022

@dfuch this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout executor-shutdown-8277969
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed rfr Pull request is ready for review labels Apr 8, 2022
@openjdk
Copy link

openjdk bot commented Apr 8, 2022

@dfuch This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8277969: HttpClient SelectorManager shuts down when custom Executor rejects a task

Reviewed-by: jpai, michaelm

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 67 new commits pushed to the master branch:

  • 6c6d522: 8284758: [linux] improve print_container_info
  • 46b2e54: 8075816: Deprecate AliasLevel flag since it is broken
  • 1b71621: 8042381: Test javax/swing/JRootPane/4670486/bug4670486.java fails with Action has not been received
  • 0f81d8f: 8284933: Improve debug in jdk.crypto.cryptoki
  • 72726c4: 8284563: AArch64: bitperm feature detection for SVE2 on Linux
  • 98d54e8: 8282397: createTempFile method of java.io.File is failing when called with suffix of spaces character
  • 8d456ac: 8282300: Throws NamingException instead of InvalidNameException after JDK-8278972
  • f61d121: 8281388: Change wrapping of EncryptedPrivateKeyInfo
  • 9e051d5: 8278972: Improve URL supports
  • 395bb5b: 8278356: Improve file creation
  • ... and 57 more: https://git.openjdk.java.net/jdk/compare/ea0706de82fffcb634cedf2cb6048c33a7d15004...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review and removed merge-conflict Pull request has merge conflict with target branch labels Apr 8, 2022
@dfuch
Copy link
Member Author

dfuch commented Apr 15, 2022

While retesting after the rebase I got some intermittent test failures, mostly in the new tests (and some others that were unrelated). I have updated this PR with a small change that makes the new tests more stable. See combined changes brought by 2ea851b, 4953167, 0989903, c07341b, 559c852, and b8a0ff2

@Michael-Mc-Mahon could you review these new changes?

Copy link
Member

@Michael-Mc-Mahon Michael-Mc-Mahon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I assume the main point of the change is to "remember" the original cause of the selector manager to close and always throw that exception cause going forward. Makes sense, and there is some useful refactoring in the tests. LGTM

@dfuch
Copy link
Member Author

dfuch commented Apr 20, 2022

Thanks Michael!

@dfuch
Copy link
Member Author

dfuch commented Apr 20, 2022

/integrate

@openjdk
Copy link

openjdk bot commented Apr 20, 2022

Going to push as commit 5291ec8.
Since your change was applied there have been 67 commits pushed to the master branch:

  • 6c6d522: 8284758: [linux] improve print_container_info
  • 46b2e54: 8075816: Deprecate AliasLevel flag since it is broken
  • 1b71621: 8042381: Test javax/swing/JRootPane/4670486/bug4670486.java fails with Action has not been received
  • 0f81d8f: 8284933: Improve debug in jdk.crypto.cryptoki
  • 72726c4: 8284563: AArch64: bitperm feature detection for SVE2 on Linux
  • 98d54e8: 8282397: createTempFile method of java.io.File is failing when called with suffix of spaces character
  • 8d456ac: 8282300: Throws NamingException instead of InvalidNameException after JDK-8278972
  • f61d121: 8281388: Change wrapping of EncryptedPrivateKeyInfo
  • 9e051d5: 8278972: Improve URL supports
  • 395bb5b: 8278356: Improve file creation
  • ... and 57 more: https://git.openjdk.java.net/jdk/compare/ea0706de82fffcb634cedf2cb6048c33a7d15004...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 20, 2022
@openjdk openjdk bot closed this Apr 20, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 20, 2022
@openjdk
Copy link

openjdk bot commented Apr 20, 2022

@dfuch Pushed as commit 5291ec8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

// abort anything pending, then close
abortPendingRequests(owner, t);
}
Set<SelectionKey> keys = new HashSet<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite understand, what for is this keys are here?
Nothing happens with this collection after addAll call

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation. Probably meant to cancel the keys, but that will happen anyway when connections are closed (or the selector is closed). Maybe we should get rid of that collection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated net net-dev@openjdk.org
4 participants