8268714: [macos-aarch64] 7 java/net/httpclient/websocket tests failed #79
Looks good to me. (A pragmatic approach to the issue at hand: resolve intermittently failing tests.)
@dfuch This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the
LGTM
/integrate
Hi,
Please find below a test-only change to fix some intermittent failures observed with the httpclient/websocket tests:
these tests intermittently and randomly fail with ENOBUFS ("No buffer space available").
Some machines in our CI seem to allow a higher level of concurrency while being (maybe) configured with lower system resources (such as available buffer space for the TCP stack).
Some of the httpclient/websocket tests attempt to fill the socket buffers in order to assert some conditions when the buffers are full and writing is paused. When the test process terminates, this leaves behind TCP sockets in the TIME_WAIT state that still hold system buffer resources in case retransmission is needed. When several such tests are run, this ends up causing random "No buffer space available" errors in other tests (including these tests themselves) running concurrently or shortly afterwards on the same machine.
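To make the "fill the buffers until writing is paused" pattern concrete, here is a minimal sketch. It is not taken from the JDK tests: the class `BufferFill`, the method `fillUntilPaused`, and the 64 MB safety cap are hypothetical names and values chosen for illustration. It writes to a loopback connection that nobody reads from until a non-blocking write accepts 0 bytes, then closes both ends explicitly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper for illustration only; not part of the JDK test suite.
class BufferFill {
    // Safety cap (an assumption) so the loop always terminates.
    static final long CAP = 64L * 1024 * 1024;

    /**
     * Writes to a loopback connection whose peer never reads, until the
     * non-blocking write accepts 0 bytes (buffers full) or CAP is reached.
     * Returns the number of bytes the TCP stack accepted.
     */
    static long fillUntilPaused() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // try-with-resources closes both ends, so no socket is left for
            // process exit to clean up.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                client.configureBlocking(false);
                ByteBuffer chunk = ByteBuffer.allocate(16 * 1024);
                long total = 0;
                while (total < CAP) {
                    chunk.clear();               // reuse the same 16 KB payload
                    int n = client.write(chunk); // 0 means the buffers are full
                    if (n == 0) {
                        break;
                    }
                    total += n;
                }
                return total;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes buffered before write paused: " + fillUntilPaused());
    }
}
```

The number of bytes accepted depends on the platform's socket buffer sizes, so the sketch reports it rather than asserting a particular value.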
This change implements a few tricks to alleviate the situation:
With these changes, I have run the HttpClient tests 200 times on the problematic machines without observing any failures (where previously there were at least a couple of failures per 50 runs). I also ran tier1 once and tier2 twice, and the results came back clean.
I am therefore claiming success (even if it might prove temporary ;-) )
If these failures come back to haunt the CI again after this fix, a further remediation policy could be to put the httpclient/websocket directory in exclusive test execution mode (in TEST.root) - this seems to work too - but cleaning up garbage in the tests themselves seems preferable.
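For reference, a sketch of what that fallback might look like in jtreg's TEST.root, assuming the directory is listed relative to the test root and appended to the existing `exclusiveAccess.dirs` property (the other entries on that line are elided here, not reproduced):

```
# Sketch only: append the websocket directory to the existing list of
# directories whose tests must not run concurrently with other tests.
exclusiveAccess.dirs=... java/net/httpclient/websocket
```

Tests in directories listed there run in exclusive access mode, which trades some test throughput for isolation.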
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk17 pull/79/head:pull/79
$ git checkout pull/79
Update a local copy of the PR:
$ git checkout pull/79
$ git pull https://git.openjdk.java.net/jdk17 pull/79/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 79
View PR using the GUI difftool:
$ git pr show -t 79
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk17/pull/79.diff