Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8310807: java/nio/channels/DatagramChannel/Connect.java timed out #16661

Closed
wants to merge 7 commits into from

Conversation

dfuch
Copy link
Member

@dfuch dfuch commented Nov 14, 2023

Please find here a fix for a timeout issue observed in DatagramChannel/Connect.java

The suspicion is that the responder may have received some message from some other source and replied to that, leaving the initiator waiting forever.

The proposed fix makes sure the messages exchanged are unique and identifiable and ensure that both the initiator and responder will discard any message that do not have the expected content.

Additional logging should help with diagnosis if the test fails again.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8310807: java/nio/channels/DatagramChannel/Connect.java timed out (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16661/head:pull/16661
$ git checkout pull/16661

Update a local copy of the PR:
$ git checkout pull/16661
$ git pull https://git.openjdk.org/jdk.git pull/16661/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16661

View PR using the GUI difftool:
$ git pr show -t 16661

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16661.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 14, 2023

👋 Welcome back dfuchs! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8310807 8310807: java/nio/channels/DatagramChannel/Connect.java timed out Nov 14, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 14, 2023
@openjdk
Copy link

openjdk bot commented Nov 14, 2023

@dfuch The following label will be automatically applied to this pull request:

  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the nio nio-dev@openjdk.org label Nov 14, 2023
@mlbridge
Copy link

mlbridge bot commented Nov 14, 2023

Webrevs

Copy link

@msheppar msheppar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very good enhancements ... because of the inherent unreliability of UDP, even for local systems, there is the possibility that the initial send may not reach the responder and vice verse ... but that's for another story.
LGTM

@openjdk
Copy link

openjdk bot commented Nov 15, 2023

@dfuch This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8310807: java/nio/channels/DatagramChannel/Connect.java timed out

Reviewed-by: msheppar, jpai

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 47 new commits pushed to the master branch:

  • 8ec6b8d: 8319876: Reduce memory consumption of VM_ThreadDump::doit
  • bbf52e0: 8319897: Move StackWatermark handling out of LockStack::contains
  • 129c470: 8311932: Suboptimal compiled code of nested loop over memory segment
  • 369bbec: 8319896: Remove monitor deflation from final audit
  • 1588dd9: 8319567: Update java/lang/invoke tests to support vm flags
  • 9727f4b: 8320199: Fix HTML 5 errors in java.math.BigInteger
  • d6aa7c8: 8314621: ClassNotFoundException due to lambda reference to elided anonymous inner class
  • 52e2878: 8319987: compilation of sealed classes leads to infinite recursion
  • b05e69f: 8320209: VectorMaskGen clobbers rflags on x86_64
  • f3ed275: 8319103: Popups that request focus are not shown on Linux with Wayland
  • ... and 37 more: https://git.openjdk.org/jdk/compare/58af9aeeb07b7a392a8fbf04ef5cb2607b7b2462...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 15, 2023
dc.write(bb);

// Try to send to some other address
try {
int port = dc.socket().getLocalPort();
InetAddress loopback = InetAddress.getLoopbackAddress();
InetSocketAddress otherAddress = new InetSocketAddress(loopback, (port == 3333 ? 3332 : 3333));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Daniel,
It was a bit surprising to see a specific port value being used here (before the changes in the PR). So I looked at the history of this test to see if it was intentional and was there for some specific reason. It appears that this was an unintentional leftover from one of the previous refactoring of this test. Using an ephemeral port for this otherAddress, like you now do in this PR, looks good to me then.

log.println("Initiator received from Responder at " + connectSocketAddress + ": " + cb);
if (!RESPONSE.equals(cb.toString())) {
log.println("Initiator received unexpected message: continue waiting");
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatagramChannel.read(ByteBuffer), through the contract of ReadableByteChannel.read(ByteBuffer), says that the number of bytes added into the passed ByteBuffer may not always be the remaining space in the buffer. i.e. dc.read(bb) may just read in few bytes at a time.

Do you think that's a practical possibility in the context of this test and would that then mean, in this current proposed form, the RESPONSE.equals(cb.toString()) will never evaluate to true and we would end up looping forever?

Perhaps we should use the return value of dc.read(bb) to determine when to do this response equality check and until then keep accumulating the received bytes?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UDP send and receive is atomic, there are no partial send or receive of a datagram (unlike a stream). Thus the initiator send its message as a complete datagram, then when the responder invokes receive and if the send has been successful, the receive will be the complete datagram sent by the initiator -- no partial datagram. That is unless the datagram is a stray datagram from another source, in which the responder continues to wait for the expected datagram. The reverse is true for the responder, it will receive a complete (atomic) message and echo that back to the initiator

Copy link
Member Author

@dfuch dfuch Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - for DatagramChannel the whole UDP payload is copied in the ByteBuffer. It may get truncated if the ByteBuffer doesn't have enough remaining bytes, but I made sure that the ByteBuffers used in the test are big enough to contain the UDP payload of the datagrams sent by the test. Bytes that don't fit in the ByteBuffer are discarded. Two calls to read would get you two payloads from two different datagrams.

public class Connect {

static PrintStream log = System.err;
static final PrintStream log = System.err;
static final String NOW = Instant.now().toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to rename "log" to "err" so it's a bit clearer at the use-sites where the output is going?

Also I think we should rename "NOW" as it's the time that the class is initialized so very confusing at every usage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dfuch
Copy link
Member Author

dfuch commented Nov 16, 2023

Pushed review feedback. Also refactored Responder to put try { } finally { } around the while loop, instead of inside.

Copy link
Member

@jaikiran jaikiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Mark and Daniel for those inputs. The changes look good to me.

@dfuch
Copy link
Member Author

dfuch commented Nov 21, 2023

/integrate

@openjdk
Copy link

openjdk bot commented Nov 21, 2023

Going to push as commit 570dffb.
Since your change was applied there have been 89 commits pushed to the master branch:

  • 21a59b9: 8282726: java/net/vthread/BlockingSocketOps.java timeout/hang intermittently on Windows
  • 9232070: 8318480: Obsolete UseCounterDecay and remove CounterDecayMinIntervalLength
  • e055fae: 8264425: Update building.md on non-English locales on Windows
  • c4aee66: 8320222: Wrong bytecode accepted, and StackMap table generated
  • 604d29a: 8304446: javap --system flag doesn't override system APIs
  • 839dd65: 8319244: implement JVMTI handshakes support for virtual threads
  • 46e4028: 8316592: RISC-V: implement poly1305 intrinsic
  • 3544d2d: 8319784: VM crash during heap dump after JDK-8287061
  • 303757b: 8319879: Stress mode to randomize incremental inlining decision
  • 099a8f5: 8308144: Uncontrolled memory consumption in SSLFlowDelegate.Reader
  • ... and 79 more: https://git.openjdk.org/jdk/compare/58af9aeeb07b7a392a8fbf04ef5cb2607b7b2462...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 21, 2023
@openjdk openjdk bot closed this Nov 21, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 21, 2023
@openjdk
Copy link

openjdk bot commented Nov 21, 2023

@dfuch Pushed as commit 570dffb.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated nio nio-dev@openjdk.org
4 participants