Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8264200: java/nio/channels/DatagramChannel/SRTest.java fails intermittently #3354

Closed
wants to merge 8 commits into from

Conversation

c-cleary
Copy link
Contributor

@c-cleary c-cleary commented Apr 6, 2021

Description

SRTest.java has been seen to fail intermittently in a similar manner to other recent failures in DatagramChannel tests. With SRTest.java, these failures are associated with a number of issues listed below. Note that XReader refers to both ClassicReader and NioReader classes. Likewise for XWriter.

Issues

  • Test is run in AgentVM mode.
  • Receivers bind to wildcard/localhost which has been a source of instability.
  • Sockets are not closed properly if an exception is thrown, even more of an issue due to test running in AgentVM mode.
  • An XReader instance will wait forever to receive a DatagramPacket if an XSender has not sent one.

Fixes

To Address these issues, the following was done respective to the order of the Issues above:

  • Test now runs with testng/othervm and uses its own thread pool. Note that the change to testng was not absolutely essential but means that if one test case fails the others will still be run which could provide useful information on why a failure occured.
  • Receivers now bind to loopback address instead of localhost.
  • XReader and XWriter classes implement Autocloseable and are declared within a try-with-resources block for each test case. As well as this, the test cases are now executed with CompleteableFuture.allOf(futures) such that if any Reader or Writer throws an exception during execution, the threads will exit in a safe manner and the exception concerned will be thrown (wrapped in a CompletionException).
  • Test will now timeout with respect to timeout limit of Jtreg which is set to 20 seconds for this test. Note that this is given in the @run as testng/othervm/timeout=20 tag in seconds. See Action Options in Jtreg docs here.

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8264200: java/nio/channels/DatagramChannel/SRTest.java fails intermittently

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3354/head:pull/3354
$ git checkout pull/3354

Update a local copy of the PR:
$ git checkout pull/3354
$ git pull https://git.openjdk.java.net/jdk pull/3354/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3354

View PR using the GUI difftool:
$ git pr show -t 3354

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3354.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 6, 2021

👋 Welcome back ccleary! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 6, 2021

@c-cleary The following label will be automatically applied to this pull request:

  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the nio nio-dev@openjdk.org label Apr 6, 2021
@c-cleary c-cleary marked this pull request as ready for review April 6, 2021 11:06
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 6, 2021
@mlbridge
Copy link

mlbridge bot commented Apr 6, 2021

Webrevs

Copy link
Member

@dfuch dfuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good cleanup overall.

newDecoder().decode(bb);
log.println("From: "+sa+ " said " +cb);
dc.close();
CharBuffer cb = StandardCharsets.US_ASCII.newDecoder().decode(bb);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a charset is used, then it should be used consistently - in both classic and nio readers and writers.
That is, whenever String::getBytes is called or new String(byte[]) is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I think I will opt for using US_ASCII across all of the classic and nio readers/writers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an 8-bit charset might be a better choice, US_ASCII is only 7bit and might produce some unmappable characters.
ISO_8859_1 would be a better choice.

Copy link
Contributor Author

@c-cleary c-cleary Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Is there any reason then why UTF-8 would be used over ISO_8859_1 then, given what you said above about character mapping? Or vice-versa

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message sent by the writer is "hello" - it only contains ASCII characters, so any of US-ASCII, UTF-8, or ISO-8859-1, which are all compatible with ASCII (meaning: all these encode ASCII chars in the same way) would work.
In fact any charset in which "hello" is mappable would work. We simply need to make sure to use the same charset when encoding or decoding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the test is for Datagram send and receive all the data is in byte arrays or equivalent.
The entire test could be written using only byte arrays (no Strings, Charsets or conversions) and omit that possibility of confusion about what the test is doing.
(And there's no need to change anything).

Copy link
Contributor Author

@c-cleary c-cleary Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright then so I will continue to to just use US_ASCII in this test as "hello" is mappable with it. Encoding and Decoding the string will be done with US_ASCII. Appreciate the feedback guys, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Conor. Yes - we could work with byte arrays but strings are easier to print & log & more human friendly...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, but please use getBytes(StandardCharsets.US_ASCII) to get the bytes avoiding ambiguity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most recent commit performs a static import of the charset with import static java.nio.charset.StandardCharsets.US_ASCII;. I feel that it is clear enough and keeps references to the Charset concise.

@openjdk
Copy link

openjdk bot commented Apr 6, 2021

@c-cleary This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8264200: java/nio/channels/DatagramChannel/SRTest.java fails intermittently

Reviewed-by: dfuchs, rriggs, msheppar

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 75 new commits pushed to the master branch:

  • 125184e: 8265012: Shenandoah: Backout JDK-8264718
  • be0d46c: 8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio
  • f71be8b: 8264954: unified handling for VectorMask object re-materialization during de-optimization
  • 3c9858d: 8264827: Large mapped buffer/segment crash the VM when calling isLoaded
  • e604320: 8264783: G1 BOT verification should not verify beyond allocation threshold
  • cb2806d: 8265018: [AIX] FileDispatcherImpl.c:31:10: fatal error: 'sys/mount.h' file not found
  • ecef1fc: 8264972: Unused TypeFunc declared in OptoRuntime
  • 440c34a: 8264644: Add PrintClassLoaderDataGraphAtExit to print the detailed CLD graph
  • b1ebf82: 8264358: Don't create invalid oop in method handle tracing
  • 627ad9f: 8262328: Templatize JVMFlag boilerplate access methods
  • ... and 65 more: https://git.openjdk.java.net/jdk/compare/81325483d8ef95a597349e18dc8413eeb8a45008...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dfuch, @RogerRiggs, @msheppar) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 6, 2021
@@ -192,10 +192,10 @@ public void run() {
byte[] buf = new byte[256];
DatagramPacket dp = new DatagramPacket(buf, buf.length);
ds.receive(dp);
String received = new String(dp.getData());
String received = new String(dp.getData(), US_ASCII);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not quite correct, you need to take into account the offset and length of the received data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... good spot, this results in a needlessly long string.

Copy link
Contributor Author

@c-cleary c-cleary Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to String received = new String(dp.getData(), 0, DATA_STRING.length(), US_ASCII); seems to correct the issue, using a constant DATA_STRING

@msheppar
Copy link

msheppar commented Apr 6, 2021

I note in the senders (ClassicWriter and NioWriter) there are two send invocation while in the receivers there is only one receive invocation. That raises a question as to why there are two send invocations, and which of these a receiver processes - no way of telling if the first has been lost?

now a bit of conjecture:
In theory based on the structure of the test the receiver or reader thread could finish before the writer have executed the second send. So if the receiver has finished, executed close and released their port, making it available for re-allocation in another concurrently executing test, then the second send could be a stray send to another now unrelated UDP end point.

@c-cleary
Copy link
Contributor Author

c-cleary commented Apr 6, 2021

I note in the senders (ClassicWriter and NioWriter) there are two send invocation while in the receivers there is only one receive invocation. That raises a question as to why there are two send invocations, and which of these a receiver processes - no way of telling if the first has been lost?

now a bit of conjecture:
In theory based on the structure of the test the receiver or reader thread could finish before the writer have executed the second send. So if the receiver has finished, executed close and released their port, making it available for re-allocation in another concurrently executing test, then the second send could be a stray send to another now unrelated UDP end point.

@msheppar it seems to me that the duplicate send is a feature from the old test which may no longer be needed. In the old test, the invoke() method contained the following code:

static void invoke(Sprintable reader, Sprintable writer) throws Exception {
        Thread readerThread = new Thread(reader);
        readerThread.start();
        Thread.sleep(50);

        Thread writerThread = new Thread(writer);
        writerThread.start();
        ...

This thread.sleep(50) I'm guessing is to ensure the readerThread has fully started and is waiting to receive before starting the writer thread. Following on from this, both writer classes (ClassicWriter & NioWriter) contain something like...

       dc.send(bb, isa);
       Thread.sleep(50);
       dc.send(bb, isa);

I would subsequently guess that this serves to be extra sure that a Datagram reaches the reader. Assuming the first packet doesnt make it (probably unlikely), waiting for 50 milliseconds ensures that the reader is at the very least waiting to receive.

@msheppar
Copy link

msheppar commented Apr 6, 2021

Yes, the original test had this conundrum.
But for the refactored test there still exists the possibility that the second send could be received by another test process, because the test's receiver has completed and released its socket resources before the second send has been invoked. In your refactored test that possibility is diminished as you start the writer thread prior to the reader thread. While in the original test it was reader thread first and writer next. But as with all multithreaded scenarios there is a strong element of non determinism so the possibility still remains.

As such there is no synchronicity between the sender and the receiver in the test, other than the receiver may block indefinitely if a datagram is not received, which is now diminished by using the loopback as the receiver's bind address and as the destination address. But the rationale for invoking two sends and one receive is obscure and still remains a potential, if somewhat rare, problem.

So I'd proffer some symmetry between the sender and receiver with one send and one receive, or that the receiver should remain extant until the sender has terminated, as such it would wait on "signal" from the sender that it has finished.

Copy link
Member

@dfuch dfuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@msheppar
Copy link

msheppar commented Apr 7, 2021

another observation on the test structure is that each sender is bound to wildcard address, ClassicWriter explicitly through constructor and the NioWriter implicitly when it invokes send - the DatagramChannel is unbound initially, no explicit bind is invoked, and when send is invoked then the underlying socket is bound to wildcard during that call flow.
There are some subtleties when sending from a socket bound to a wildcard address bound - in this case this is minimal as the recipient address is the loopback address.
One possible consideration is to explicitly bind the sender to the loopback address also?

Both ClassicWriter and NioWriter constructors take a port, which is the destination port of the recipient rather than a port to which they'll bind -- maybe a rename of the parameter to dstPort ?

@dfuch
Copy link
Member

dfuch commented Apr 7, 2021

@msheppar I would rather not change the way that the senders are bound, unless it is verified that this is causing problems and we are able to figure out why.

@c-cleary
Copy link
Contributor Author

c-cleary commented Apr 8, 2021

Both ClassicWriter and NioWriter constructors take a port, which is the destination port of the recipient rather than a port to which they'll bind -- maybe a rename of the parameter to dstPort ?

Most recent commit makes this change as suggested by @msheppar

Copy link

@msheppar msheppar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@c-cleary
Copy link
Contributor Author

c-cleary commented Apr 8, 2021

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Apr 8, 2021
@openjdk
Copy link

openjdk bot commented Apr 8, 2021

@c-cleary
Your change (at version 5942445) is now ready to be sponsored by a Committer.

@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Apr 12, 2021
@c-cleary
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Apr 12, 2021
@openjdk
Copy link

openjdk bot commented Apr 12, 2021

@c-cleary
Your change (at version 33f95b3) is now ready to be sponsored by a Committer.

@AlekseiEfimov
Copy link
Member

/sponsor

@openjdk openjdk bot closed this Apr 13, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 13, 2021
@openjdk
Copy link

openjdk bot commented Apr 13, 2021

@AlekseiEfimov @c-cleary Since your change was applied there have been 91 commits pushed to the master branch:

  • a4f644e: 8265064: Move clearing and setting of members into helpers in ReservedSpace
  • 7006070: 8265061: Simplify MethodHandleNatives::canBeCalledVirtual
  • 1935655: 8264957: Cleanup unused array Type::dual_type
  • 954b9a1: 8264795: IGV: Upgrade NetBeans platform
  • f2f7aa3: 8262291: Refactor reserve_memory_special_huge_tlbfs
  • 008fc75: 8264224: Add macosx-aarch64 to Oracle build configurations
  • f4e6395: 8264190: Harden TLS interop tests
  • 18bec9c: 8265084: [BACKOUT] 8264954: unified handling for VectorMask object re-materialization during de-optimization
  • 9dd9625: 8263763: Synthetic constructor parameters of enum are not considered for annotation indices
  • 1ee80e0: 8261355: No data buffering in SunPKCS11 Cipher encryption when the underlying mechanism has no padding
  • ... and 81 more: https://git.openjdk.java.net/jdk/compare/81325483d8ef95a597349e18dc8413eeb8a45008...master

Your commit was automatically rebased without conflicts.

Pushed as commit 784f1c1.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated nio nio-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

6 participants