Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

6968351: httpserver clashes with delayed TCP ACKs for low Content-Length #18667

Closed
wants to merge 33 commits into from

Conversation

robaho
Copy link
Contributor

@robaho robaho commented Apr 6, 2024

fix bug JDK-B6968351 by avoiding flush after response headers


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-6968351: httpserver clashes with delayed TCP ACKs for low Content-Length (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18667/head:pull/18667
$ git checkout pull/18667

Update a local copy of the PR:
$ git checkout pull/18667
$ git pull https://git.openjdk.org/jdk.git pull/18667/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18667

View PR using the GUI difftool:
$ git pr show -t 18667

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18667.diff

Webrev

Link to Webrev Comment

@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label Apr 6, 2024
@bridgekeeper
Copy link

bridgekeeper bot commented Apr 6, 2024

Hi @robaho, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user robaho" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

@openjdk
Copy link

openjdk bot commented Apr 6, 2024

@robaho This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

6968351: httpserver clashes with delayed TCP ACKs for low Content-Length

Reviewed-by: dfuchs, djelinski, michaelm, jpai

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 418 new commits pushed to the master branch:

  • 02a799c: 8331695: Serial: DefNewGeneration:_promotion_failed used without being initialized
  • 23a72a1: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer
  • a2584a8: 8331714: Make OopMapCache installation lock-free
  • df1ff05: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store()
  • 3b8227b: 8326836: Incorrect @since tags for ClassSignature methods
  • f308e10: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4
  • a8b3f19: 8330077: Allow max number of events to be buffered to be configurable to avoid OVERFLOW_EVENT
  • ae60d84: 8328501: Incorrect @since tags for java security interfaces
  • 7a35f92: 8331660: Parallel: Cleanup includes in parallelScavangeHeap files
  • fa02667: 8322726: C2: Unloaded signature class kills argument value
  • ... and 408 more: https://git.openjdk.org/jdk/compare/862e6156960639564aed5de16de9a26320770a80...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dfuch, @djelinski, @Michael-Mc-Mahon, @jaikiran) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk
Copy link

openjdk bot commented Apr 6, 2024

@robaho The following label will be automatically applied to this pull request:

  • net

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the net net-dev@openjdk.org label Apr 6, 2024
@robaho robaho changed the title fix for small messages with tcp_nodelay off issue number: 6968351 fix for small messages with tcp_nodelay off Apr 7, 2024
@robaho
Copy link
Contributor Author

robaho commented Apr 7, 2024

/signed

@bridgekeeper bridgekeeper bot added the oca-verify Needs verification of OCA signatory status label Apr 7, 2024
@bridgekeeper
Copy link

bridgekeeper bot commented Apr 7, 2024

Thank you! Please allow for up to two weeks to process your OCA, although it is usually done within one to two business days. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

@robaho robaho changed the title issue number: 6968351 fix for small messages with tcp_nodelay off 6968351: fix for small messages with tcp_nodelay off Apr 12, 2024
@robaho robaho changed the title 6968351: fix for small messages with tcp_nodelay off JDK-6968351: httpserver clashes with delayed TCP ACKs for low Content-Length Apr 15, 2024
@openjdk openjdk bot changed the title JDK-6968351: httpserver clashes with delayed TCP ACKs for low Content-Length 6968351: httpserver clashes with delayed TCP ACKs for low Content-Length Apr 15, 2024
@bridgekeeper bridgekeeper bot removed oca Needs verification of OCA signatory status oca-verify Needs verification of OCA signatory status labels Apr 18, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 18, 2024
Copy link
Member

@dfuch dfuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Robert, thanks for working on this issue.

Copy link
Member

@djelinski djelinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not as complex as I expected it to be. Which is a good thing!

Some tests are failing with this change. See:

test/jdk/sun/net/www/http/KeepAliveCache/B8293562.java
test/jdk/java/net/Authenticator/B4769350.java

These tests don't close the exchange or the output stream after sending the response, and the headers are never flushed. Users might have similar code, so I think a release note might be needed.

test/jdk/com/sun/net/httpserver/bugs/B6968351.java Outdated Show resolved Hide resolved
test/jdk/com/sun/net/httpserver/bugs/B6968351.java Outdated Show resolved Hide resolved
test/jdk/com/sun/net/httpserver/bugs/B6968351.java Outdated Show resolved Hide resolved
test/jdk/com/sun/net/httpserver/bugs/B6968351.java Outdated Show resolved Hide resolved
test/jdk/com/sun/net/httpserver/bugs/B6968351.java Outdated Show resolved Hide resolved
@robaho
Copy link
Contributor Author

robaho commented Apr 19, 2024

This is not as complex as I expected it to be. Which is a good thing!

Some tests are failing with this change. See:

test/jdk/sun/net/www/http/KeepAliveCache/B8293562.java
test/jdk/java/net/Authenticator/B4769350.java

These tests don't close the exchange or the output stream after sending the response, and the headers are never flushed. Users might have similar code, so I think a release note might be needed.

I think this is a trivial fix. We should be able to either:

  1. Flush the FixedLengthOutputStream when the bytes remaining is 0.
  2. Always flush the output stream after an exchange completes. This won't affect performance if it has already been flushed()/closed() previously.

I think 1 is the better option - because you can't really use a chunked output stream without flushing or closing the stream - as the client would hang.

@djelinski
Copy link
Member

I think this is a trivial fix. We should be able to either:

  1. Flush the FixedLengthOutputStream when the bytes remaining is 0.

This is reasonable, and should fix B8293562, but it won't fix B4769350, which creates a chunked response, and then leaks it. See the proxyReply method in that file.

  1. Always flush the output stream after an exchange completes. This won't affect performance if it has already been flushed()/closed() previously.

The exchange does not automatically complete when you exit the handle method; you need to explicitly complete it, either by calling close or by closing the output stream. Either of these actions will flush the output stream.

This behavior enables users to asynchronously handle the exchange outside of the handle method. I don't know if anyone actually uses that functionality, but it's been here forever, and I don't think we want to change it now.

@Michael-Mc-Mahon
Copy link
Member

This is not as complex as I expected it to be. Which is a good thing!

Some tests are failing with this change. See:

test/jdk/sun/net/www/http/KeepAliveCache/B8293562.java
test/jdk/java/net/Authenticator/B4769350.java

These tests don't close the exchange or the output stream after sending the response, and the headers are never flushed. Users might have similar code, so I think a release note might be needed.

This is a concern. Release note needed at least or I wonder if the new behavior should be enabled through a configuration switch. Probably not worth going to that extreme, but we need to consider the compatibility impact carefully.

@robaho
Copy link
Contributor Author

robaho commented Apr 19, 2024

This is not as complex as I expected it to be. Which is a good thing!
Some tests are failing with this change. See:

test/jdk/sun/net/www/http/KeepAliveCache/B8293562.java
test/jdk/java/net/Authenticator/B4769350.java

These tests don't close the exchange or the output stream after sending the response, and the headers are never flushed. Users might have similar code, so I think a release note might be needed.

This is a concern. Release note needed at least or I wonder if the new behavior should be enabled through a configuration switch. Probably not worth going to that extreme, but we need to consider the compatibility impact carefully.

I think these tests that are failing are incorrect, as in this case, the connection is corrupted - if the client attempts to send another request on the connection it will hang - since the handler never closed the output stream it won't be ready for another request.

@robaho
Copy link
Contributor Author

robaho commented Apr 19, 2024

B4769350

I don't understand this. If the handler "leaks it", isn't that a bug? If sending a chunked response I think you have to close the output stream or end the exchange otherwise it is a bug in the handler code.

Also, in reviewing the code, I don't think the old code could have worked either. If you don't close() the stream or the exchange() then the write finished event is never fired so the internals will be messed up - and the connection will hang.

@djelinski
Copy link
Member

Yes it is a bug. But even with the bug the code used to work, and now it doesn't. Users might have similar buggy code that will stop working when they get this fix. We need to post a release note so that they know what to expect.

@robaho
Copy link
Contributor Author

robaho commented Apr 19, 2024

Yes it is a bug. But even with the bug the code used to work, and now it doesn't. Users might have similar buggy code that will stop working when they get this fix. We need to post a release note so that they know what to expect.

Are you sure it worked? I think the connection would be "dead" - the client would see the last response - but if it tried to send another request on the connection it would hang - because WriteFinished event would never have been emitted - and so the exchange/connection would never be read for another request.

robaho and others added 3 commits April 25, 2024 10:59
…putStream.java

Co-authored-by: Daniel Fuchs <67001856+dfuch@users.noreply.github.com>
@robaho
Copy link
Contributor Author

robaho commented Apr 25, 2024

I created a PR for the related proposed api changes #18955

@robaho
Copy link
Contributor Author

robaho commented Apr 25, 2024

I created a PR for the related proposed api changes #18955

I don't think I have the ability to create issues, so I can't create one for the api changes.

@openjdk
Copy link

openjdk bot commented Apr 25, 2024

⚠️ @robaho This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@@ -25,7 +25,7 @@
* @test
* @summary Test to stress directory listings
* @library /test/lib
* @run testng/othervm/timeout=180 -Dsun.net.httpserver.nodelay=true StressDirListings
* @run testng/othervm/timeout=180 StressDirListings
Copy link
Member

@jaikiran jaikiran Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Varada @varada1110, in #17363 this StressDirListings test was updated to set the -Dsun.net.httpserver.nodelay=true because without it, the test was failing on AIX. The current PR here is proposing to fix the underlying issue in the HttpServer implementation and is rolling back the addition of this system property to this StressDirListings test. Could you or someone from the AIX team please build the latest mainline JDK and apply this current PR on top of it and rerun the test(s) to verify that this change is OK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jaikiran, I have performed jtreg test on AIX and all the tests are passing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Varada for running the tests.

@Michael-Mc-Mahon
Copy link
Member

I created a PR for the related proposed api changes #18955

I don't think I have the ability to create issues, so I can't create one for the api changes.

I created a new Jira issue for this. It's at: https://bugs.openjdk.org/browse/JDK-8331195
Can you rename the PR using this issue?

@robaho
Copy link
Contributor Author

robaho commented Apr 26, 2024

I created a PR for the related proposed api changes #18955

I don't think I have the ability to create issues, so I can't create one for the api changes.

I created a new Jira issue for this. It's at: https://bugs.openjdk.org/browse/JDK-8331195 Can you rename the PR using this issue?

done.

@dfuch
Copy link
Member

dfuch commented May 2, 2024

FYI: I'm running tests with the proposed changes in our CI to check that we do not get new intermittent failures. If tests come back green, and if @Michael-Mc-Mahon is happy with the changes, then one of us will sponsor.

Copy link
Member

@Michael-Mc-Mahon Michael-Mc-Mahon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. Thanks for the contribution.

Copy link
Member

@djelinski djelinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. In order to proceed, use the /integrate command; see the guide for details.

Copy link
Member

@dfuch dfuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests were green. Please integrate and one of us will sponsor.

@robaho
Copy link
Contributor Author

robaho commented May 7, 2024

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label May 7, 2024
@openjdk
Copy link

openjdk bot commented May 7, 2024

@robaho
Your change (at version f385444) is now ready to be sponsored by a Committer.

@dfuch
Copy link
Member

dfuch commented May 7, 2024

/sponsor

@openjdk
Copy link

openjdk bot commented May 7, 2024

Going to push as commit 02c95a6.
Since your change was applied there have been 418 commits pushed to the master branch:

  • 02a799c: 8331695: Serial: DefNewGeneration:_promotion_failed used without being initialized
  • 23a72a1: 8331626: unsafe.cpp:162:38: runtime error in index_oop_from_field_offset_long - applying non-zero offset 4563897424 to null pointer
  • a2584a8: 8331714: Make OopMapCache installation lock-free
  • df1ff05: 8331085: Crash in MergePrimitiveArrayStores::is_compatible_store()
  • 3b8227b: 8326836: Incorrect @since tags for ClassSignature methods
  • f308e10: 8331400: AArch64: Sync aarch64_vector.ad with aarch64_vector_ad.m4
  • a8b3f19: 8330077: Allow max number of events to be buffered to be configurable to avoid OVERFLOW_EVENT
  • ae60d84: 8328501: Incorrect @since tags for java security interfaces
  • 7a35f92: 8331660: Parallel: Cleanup includes in parallelScavangeHeap files
  • fa02667: 8322726: C2: Unloaded signature class kills argument value
  • ... and 408 more: https://git.openjdk.org/jdk/compare/862e6156960639564aed5de16de9a26320770a80...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 7, 2024
@openjdk openjdk bot closed this May 7, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels May 7, 2024
@openjdk
Copy link

openjdk bot commented May 7, 2024

@dfuch @robaho Pushed as commit 02c95a6.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated net net-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

6 participants