Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8293696: java/nio/channels/DatagramChannel/SelectWhenRefused.java fails with "Unexpected wakeup" #10851

Closed
wants to merge 13 commits into from

Conversation

DarraghClarke
Copy link
Contributor

@DarraghClarke DarraghClarke commented Oct 25, 2022

Added logging to SelectWhenRefused for an intermittent failure caused by unexpected wakeups, this includes trying to receive data if there is any available to identify where it is coming from.

I also set the test to run in othervm mode which should reduce the chances of this failure happening in the first place.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8293696: java/nio/channels/DatagramChannel/SelectWhenRefused.java fails with "Unexpected wakeup"

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10851/head:pull/10851
$ git checkout pull/10851

Update a local copy of the PR:
$ git checkout pull/10851
$ git pull https://git.openjdk.org/jdk pull/10851/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10851

View PR using the GUI difftool:
$ git pr show -t 10851

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10851.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 25, 2022

👋 Welcome back DarraghClarke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 25, 2022
@openjdk
Copy link

openjdk bot commented Oct 25, 2022

@DarraghClarke The following label will be automatically applied to this pull request:

  • nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the nio nio-dev@openjdk.org label Oct 25, 2022
@mlbridge
Copy link

mlbridge bot commented Oct 25, 2022

@@ -22,15 +22,17 @@
*/

/* @test
* @bug 6935563 7044870
* @bug 6935563 7044870 8293696

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can omit the bug number in this instance as it is not at this point in time a product bug -- as you are adding diagnostics to the test, maybe a separate subtask or sub-bug could have been used

for (SelectionKey key : selectedKeys) {
if (!key.isValid() || !key.isReadable()) {
System.out.println("Invalid or unreadable key: " + key);
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop over the selection keys and checking if valid or readable doesn't make sense here. Maybe that can be just removed. If you really want to print out the selection key then save it at L52 so that you have the SelectionKey to print.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have no idea of what is going on it's fine to loop over the set of selected keys - even if it should contain at most one entry. But if it contains a key that is either not valid or not readable we should continue after printing it, not return.

try {
System.out.println("Attempting to read datagram from key: " + key);
ByteBuffer buf = ByteBuffer.allocate(100);
SocketAddress sa = dc.receive(buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The datagram channel is configured non-blocking and there may not be a datagram to receive, therefore sa may be null and has to be checked here.

Copy link
Member

@dfuch dfuch Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlanBateman If we reach here the key is both valid and readable so shouldn't there be something to receive, even if the channel is non-blocking?

ByteBuffer buf = ByteBuffer.allocate(100);
n = dc.read(buf);
String message = new String(buf.array());
System.out.format("received %s at %s%n", message, dc.getLocalAddress());
Copy link

@msheppar msheppar Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are interested in the sender detail here ... where did the stray datagram come from, which would necessitate invoking receive on the datagram channel ... but as of yet there has been no failures observed here afaik

Also prior to each DC send it could be worth invoking selectNow on the selector to disperse any stray pending events on the low level OS "select"

@msheppar
Copy link

i see the logUnexpectedWakeup now ... missed that in my first pass :-(

Copy link
Member

@dfuch dfuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One improvement that could be done is put the whole code from line 55 to line 94 into a loop and respin if we get a datagram that doesn't correspond to what was sent to the refuser. It could be that we are receiving a datagram from something/somewhere else in which case we could just ignore it, clear all keys, and go back to line 55. Otherwise throw if we detected an error or break the loop if everything occurred as expected.

for (SelectionKey key : selectedKeys) {
if (!key.isValid() || !key.isReadable()) {
System.out.println("Invalid or unreadable key: " + key);
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have no idea of what is going on it's fine to loop over the set of selected keys - even if it should contain at most one entry. But if it contains a key that is either not valid or not readable we should continue after printing it, not return.

try {
System.out.println("Attempting to read datagram from key: " + key);
ByteBuffer buf = ByteBuffer.allocate(100);
SocketAddress sa = dc.receive(buf);
Copy link
Member

@dfuch dfuch Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlanBateman If we reach here the key is both valid and readable so shouldn't there be something to receive, even if the channel is non-blocking?

@AlanBateman
Copy link
Contributor

@AlanBateman If we reach here the key is both valid and readable so shouldn't there be something to receive, even if the channel is non-blocking?

It's okay to print the selection key (singular). The useful part of the change is to do the non-blocking receive and print the sender's socket address when not null. If it trips then it may help identify where the datagram is coming from. That is, I assume the theory here is that some other test is using the "refuser" port (the port that dc1.closer released). The alternative is to just retry when there is interference, which I think you were suggesting in one of the comments.

@DarraghClarke
Copy link
Contributor Author

One improvement that could be done is put the whole code from line 55 to line 94 into a loop and respin if we get a datagram that doesn't correspond to what was sent to the refuser. It could be that we are receiving a datagram from something/somewhere else in which case we could just ignore it, clear all keys, and go back to line 55. Otherwise throw if we detected an error or break the loop if everything occurred as expected.

I think that this would be good, but because of the nature of the refuser, would we be reliably able to read datagrams sent, since they should be getting refused and throwing an exception before they can be read?

If they can be read, I could change the logUnexpectedWakeup method to return a boolean for whether its local or not since it attempts to read it already

@dfuch
Copy link
Member

dfuch commented Oct 27, 2022

I think that this would be good, but because of the nature of the refuser, would we be reliably able to read datagrams sent, since they should be getting refused and throwing an exception before they can be read?

If they can be read, I could change the logUnexpectedWakeup method to return a boolean for whether its local or not since it attempts to read it already

Yes - that's a good idea - and maybe change the method name to "checkUnexpectedWakeup" if you change it to return a boolean.

@msheppar
Copy link

I think before making any majory changes to the test's execution in terms of respin etc, the focus should be to add diagnostics to the test, to determine if there is actualy any data available and the origin of that unexpected data. Additionally, to check the verascity of the returned values from the selector, to esnure that the select has behaved in a consistent and reliable manner and that it has not responded to indeterminate intermittent input (or other) event.

The test is open to ta race condition in that the "refuser" port could have been resused by another socket in another concurrent test, after the DC1 has closed -- this condition is tested in the test through a bind to refuser port, which is the failed scenarios has not rasied a BindException. But again timing is of the essence.

I would suggest the initial focus for the changes should be for diagnostics of the failed scenario, and determing whether the selector returns actually reflect some input read event and not another spurious "low level event trigger".

In the test the SelectionKey from the DC::register is not recorded, and not compared with the SelectionKey set returned by the selector.
It would be informative if the number of keys returned by the select could be output.
Currently in the test or in the updated logUnexpectedWakeup there is no check that the SelectioKey associated with the registered DC is actually returned. It is assumed that this is the case only a single channel has been registered, which is reasonable, but if consistency is to be deteremined then such a check may be worthwhile. Additionally, the registration is for READ events. Again, it may be worth checking that the select has returned SelectioKey for OP_READ and not something else.

@dfuch
Copy link
Member

dfuch commented Oct 27, 2022

It would be informative if the number of keys returned by the select could be output.

I believe that Alan already implied that this number could only take two values: 0 or 1, because the selector has only one key.

@msheppar
Copy link

yes that's fine, but in this instance, something unexpected has ocurred then why not validate that those assumptions are true

sel.selectedKeys().clear();
try {
ByteBuffer buf = ByteBuffer.allocate(100);
SocketAddress sa = dc.receive(buf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the test to use receive instead of read. I am not sure we want to do that. It's true that if we use read we won't be able to tell where the message comes from - but we're connected, so in theory it could only come from the refuser address, which would indicate that some other process has reused the port. So it should be enough to call read() and print the content of the received buffer. @AlanBateman can we use receive() here instead of read() without changing the nature of the test?

@dfuch
Copy link
Member

dfuch commented Oct 27, 2022

I would suggest the initial focus for the changes should be for diagnostics of the failed scenario, and determing whether the selector returns actually reflect some input read event and not another spurious "low level event trigger".

Given the intermittent nature of the failure I'd rather do both at the same time. If we do receive a datagram, when we expect none, and we do verify that the datagram doesn't come from this test (it doesn't contain "Greetings!") then I'd argue that it must come from somewhere else (whatever that somewhere is) and it's safe to respin and try again.

There might be some subtle issues to fix in the retry logic though. I believe we should respin when getting unexpected wake ups only if we actually manage to read a datagram - for instance, and that we can assert that this datagram is not ours (and it should never be, since we sent it to a refuser that has a different port than our receiver).
We should fail in all other cases.

@dfuch
Copy link
Member

dfuch commented Oct 27, 2022

It's also interesting in that if other DC/DS tests are running in parallel on the same machine they could be reusing the refuser port, and they could receive the datagram we sent to the refuser, and that may make these other tests fail too!

DatagramChannel.open().bind(refuser).close();
throw new RuntimeException("Unexpected wakeup");
}
for (;;) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treat each test scenario individually wrt retry and set an upper bound on the number of potential retries ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an upper bound - e.g. 3 and failing if we reach the upper bound would be prudent yes :-)

if (n > 0) {
ignoreStrayWakeup = checkUnexpectedWakeup(dc, sel.selectedKeys());
sel.selectedKeys().clear();
if (ignoreStrayWakeup) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scenario 1 and 3 treat the received message differently to that of scenario 2 -- should they be handled the same ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so - scenario 2 is the one that expect the PortUnreachableException - so the wakeup is expected there and the read is expected to throw PortUnreadableException. The other two scenario are expected to timeout of the select with n == 0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the actual testing of the received message -- scenario performs the read, and expects an Exception but if it gets a message, which like the other scenarios it shouldn't, then the test of that message should be the same across all 3 scenarios ?
scenario 1 and 3 test if they receive what has been sent then retry, while scenario 2 will throw a RuntimeException if it receives what it has sent. This seems inconsistent.

Maybe change the send message to be completely unique to this test, there maybe other DatagramSocket or Channel tests that have message containing "Greetings"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I believe scenario 2 is different - but you are right - it would be more correct to replace lines 90-91 with this:

                    // BindException will be thrown if another service is using
                    // our expected refuser port, cannot run just exit.
                    DatagramChannel.open().bind(refuser).close();
                    throw new RuntimeException("PortUnreachableException not raised");

In other words: we are connected - so we could only receive a message from something bound to the refuser's port. So check whether that port is still unbound, and if so, complain that PortUnreachableException should have been received.
Note that there still might be a race condition here because something could have transiently bound to the refuser and then released it. But the window of opportunity for the race should be smaller.
Whether we should throw new RuntimeException("PortUnreachableException not raised"); or simply continue and try again could be discussed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change the send message to be completely unique to this test, there maybe other DatagramSocket or Channel tests that have message containing "Greetings"

Good point. Although it's the only one I could find that uses Datagrams. We could add a time stamp - or a UUID for extra security.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think even something as simple as changing the message to Greetings from SelectWhenRefused! makes it unique, though I could do a UUID too?

Copy link
Member

@dfuch dfuch Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point too :-) Greetings from SelectWhenRefused! sounds fine to me. Also it will help identify the culprit if another test receives it! That's a good idea.

@msheppar
Copy link

Given the intermittent nature of the failure I'd rather do both at the same time. If we do receive a datagram, when we expect none, and we do verify that the datagram doesn't come from this test (it doesn't contain "Greetings!") then I'd argue that it must come from somewhere else (whatever that somewhere is) and it's safe to respin and try again.

There might be some subtle issues to fix in the retry logic though. I believe we should respin when getting unexpected wake ups only if we actually manage to read a datagram - for instance, and that we can assert that this datagram is not ours (and it should never be, since we sent it to a refuser that has a different port than our receiver). We should fail in all other cases.

I don't think it should matter what the source of the unexpected data is.
The main issue for the failed scenario is that data may have been reeived.
If the test's sending DatagramChannel receives its own data, then that is a more problematic situation than if it is has received some data from an external UDP src. If it receives its own data then the DatagramChannel has had its data looped back, similar to what happens in multicast scenarios when IP_MULTICAST_LOOP is set on a DatagramChannel, and that channel sends to a multicast address. But this is a directed send and that shouldn't happen. That wiuld all suggest anomalous behaviour at the OS level

Outputting the socket address of the sender would be helpful to determine if there is a possibility that any datagram originates from itself.

if (n > 0) {
ignoreStrayWakeup = checkUnexpectedWakeup(dc, sel.selectedKeys());
sel.selectedKeys().clear();
if (ignoreStrayWakeup) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I believe scenario 2 is different - but you are right - it would be more correct to replace lines 90-91 with this:

                    // BindException will be thrown if another service is using
                    // our expected refuser port, cannot run just exit.
                    DatagramChannel.open().bind(refuser).close();
                    throw new RuntimeException("PortUnreachableException not raised");

In other words: we are connected - so we could only receive a message from something bound to the refuser's port. So check whether that port is still unbound, and if so, complain that PortUnreachableException should have been received.
Note that there still might be a race condition here because something could have transiently bound to the refuser and then released it. But the window of opportunity for the race should be smaller.
Whether we should throw new RuntimeException("PortUnreachableException not raised"); or simply continue and try again could be discussed.

if (n > 0) {
ignoreStrayWakeup = checkUnexpectedWakeup(dc, sel.selectedKeys());
sel.selectedKeys().clear();
if (ignoreStrayWakeup) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change the send message to be completely unique to this test, there maybe other DatagramSocket or Channel tests that have message containing "Greetings"

Good point. Although it's the only one I could find that uses Datagrams. We could add a time stamp - or a UUID for extra security.

@DarraghClarke
Copy link
Contributor Author

Thanks for the comments, I've just pushed some changes based on them

@@ -55,7 +55,7 @@ public static void main(String[] args) throws IOException {
dc.configureBlocking(false);
dc.register(sel, SelectionKey.OP_READ);

for (;;) {
for (int i = 0; i < 3; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to define a static int MAX_TRIES = 3; and use that here:

for (int i = 0; i < MAX_TRIES; i++) {

@@ -70,12 +70,13 @@ public static void main(String[] args) throws IOException {
DatagramChannel.open().bind(refuser).close();
throw new RuntimeException("Unexpected wakeup");
}

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be:

                    if (ignoreStrayWakeup) {
                        if (i < MAX_TRIES - 1) continue;
                    }
                    // BindException will be thrown if another service is using 
                    // our expected refuser port, cannot run just exit.
                    DatagramChannel.open().bind(refuser).close();
                    throw new RuntimeException("Unexpected wakeup");
                } else {
                    break; 
                }

We don't want to silently finish and exit the loop if the last attempt failed. And we don't want to retry if n == 0.

Comment on lines 94 to 97
throw new RuntimeException("PortUnreachableException not raised");
} catch (PortUnreachableException pue) {
// expected
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be:

                            throw new RuntimeException("PortUnreachableException not raised");
                        } catch (PortUnreachableException pue) {
                            // expected
                        }
                    }
                    break;

If we get PortUnreachableException we have what we expected and we can exit the loop. Same if n == 0;

@@ -70,12 +70,13 @@ public static void main(String[] args) throws IOException {
DatagramChannel.open().bind(refuser).close();
throw new RuntimeException("Unexpected wakeup");
}

}
for (int i = 0; i < 3; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop should start after connecting.
It should be:

                /* Test 2: connected so ICMP port unreachable may be received */
                dc.connect(refuser);
                try {
                     for (int i = 0;  i < MAX_TRIES; i++) {
                         ...
                     }
                 } finally {
                      ...

/* Test 3: not connected so ICMP port unreachable should not be received */
sendDatagram(dc, refuser);
n = sel.select(2000);
int n = sel.select(2000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here:

                    if (ignoreStrayWakeup) {
                        if (i < MAX_TRIES - 1) continue;
                    }

@@ -121,11 +125,12 @@ public static void main(String[] args) throws IOException {
static void sendDatagram(DatagramChannel dc, SocketAddress remote)
throws IOException
{
ByteBuffer bb = ByteBuffer.wrap("Greetings!".getBytes());
ByteBuffer bb = ByteBuffer.wrap("Greetings from SelectWhenRefused!".getBytes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could put the message in a static string constant, that would avoid repeating it.

} catch (PortUnreachableException pue) {
// expected
}
}
} finally {
dc.disconnect();
}

}
for (int i = 0; i < 3; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use MAX_TRIES here too

@DarraghClarke
Copy link
Contributor Author

I refactored and cleaned up the test, it should hopefully be a lot more readable now

Comment on lines 57 to 60
for (int i = 0; i < MAX_TRIES; i++) {
if (!testNoPUEBeforeConnection(dc, refuser, sel, i)) {
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactoring. It does make the test much easier to read! But we can still go one bit further:

  1. we could have the method return true if the test was successful. That would make the method result easier to understand, and would avoid the negation in the if statement above, which would make it easier to read too!
  2. we could change the loop to run from i = 1 ; i <= MAX_TRIES ; i++ - this way we could check whether the maximum number of attempts was reached within the loop by comparing i == MAX_TRIES without having to subtract 1.
  3. assuming the changes above, we should throw an AssertionError if i == MAX_TRIES and we haven't broken out of the loop yet.

So that loop would turn into:

           for (int i = 1; i <= MAX_TRIES; i++) {
                if (testNoPUEBeforeConnection(dc, refuser, sel, i)) {
                    break; // test succeeded
                }
                assertNotEquals(i, MAX_TRIES, "testNoPUEBeforeConnection: too many retries");
           }

The two other loops could be modified in the same way.

Comment on lines 103 to 115
boolean ignoreStrayWakeup = checkUnexpectedWakeup(sel.selectedKeys());
sel.selectedKeys().clear();

if (ignoreStrayWakeup) {
if (retryCount < MAX_TRIES - 1) {
return true;
}
}

// BindException will be thrown if another service is using
// our expected refuser port, cannot run just exit.
DatagramChannel.open().bind(refuser).close();
throw new RuntimeException("Unexpected wakeup");
Copy link
Member

@dfuch dfuch Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the changes suggested above, we could simplify that into:

    boolean mayRetry = checkUnexpectedWakeup(sel.selectedKeys());
    boolean tooManyRetries = retryCount >= MAX_TRIES;
    sel.selectedKeys().clear();
    if (mayRetry && !tooManyRetries) {
        return false; // will retry
    } 
    // BindException will be thrown if another service is using
    // our expected refuser port, cannot run just exit.
    DatagramChannel.open().bind(refuser).close();
    throw new RuntimeException("Unexpected wakeup"); 
}
return true; // test succeeded

Comment on lines 154 to 162
if (retryCount < MAX_TRIES - 1) {
return true;
}
throw new RuntimeException("PortUnreachableException not raised");
} catch (PortUnreachableException pue) {
System.out.println("Got expected PortUnreachableException " + pue);
}
}
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the changes suggested above that would become:

             boolean mayRetry = retryCount < MAX_TRIES;
             if (mayRetry) return false;
             throw new RuntimeException("PortUnreachableException not raised");
         } catch (PortUnreachableException pue) {
             System.out.println("Got expected PortUnreachableException " + pue);
         }
     }
     return true; // test succeeded

Comment on lines 179 to 190
boolean ignoreStrayWakeup = checkUnexpectedWakeup(sel.selectedKeys());
sel.selectedKeys().clear();

if (ignoreStrayWakeup) {
if (retryCount < MAX_TRIES - 1) {
return true;
}
}

throw new RuntimeException("Unexpected wakeup after disconnect");
}
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the changes suggested above that would become:

    boolean mayRetry = checkUnexpectedWakeup(sel.selectedKeys());
    boolean tooManyRetries = retryCount >= MAX_TRIES;
    sel.selectedKeys().clear();
    if (mayRetry && !tooManyRetries) {
        return false; // will retry
    } 
    throw new RuntimeException("Unexpected wakeup after disconnect");
}
return true;  // test succeeded

Comment on lines 76 to 79
for (int i = 0; i < MAX_TRIES; i++) {
if (!testNoPUEAfterDisconnect(dc, refuser, sel, i)) {
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See suggestion above (1 to MAX_TRIES inclusive, assertNoEquals)

Comment on lines 66 to 68
for (int i = 0; i < MAX_TRIES; i++) {
if (!testPUEOnConnect(dc, refuser, sel, i)) {
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See suggestion above (1 to MAX_TRIES inclusive, assertNoEquals)

@DarraghClarke
Copy link
Contributor Author

Thanks for those suggestions, I've added them in

Copy link
Member

@dfuch dfuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! This is much nicer. Just one small comment and one idle thought...

Comment on lines -88 to +87
} catch(BindException e) {
} catch (BindException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should actually log that:

System.out.println("Skipping test: refuser port has been reused: " + e);

*/

import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.net.*;
import java.io.IOException;
import java.util.Set;

import static jdk.test.lib.Asserts.assertNotEquals;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Darn. I wonder if we should convert the test to testng or junit instead, just for the purpose of using junit/testng assertions. It would avoid the need of linking to /test/lib too. Just some idle thought.
The other day Pavel and I were wondering if that Asserts class should be retired - given that we have them available in testng/junit now...

Copy link
Contributor Author

@DarraghClarke DarraghClarke Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be against changing it, At a glance using Junit is only around 10 lines of changes, though I'm not sure if theres any additional patterns I should follow in swapping it to junit beyond using @Test instead of a main method? (and of course changing imports and @run)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change there and made sure the test was still working as intended

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Nov 8, 2022
Copy link
Member

@dfuch dfuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Good job.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 8, 2022
@AlanBateman
Copy link
Contributor

I see you've overhauled this test to be a JUnit test. If you are going that far then maybe it should go a bit further and have 3 test methods rather than having all tests in on test method. It just means setup/teardown is done 3 times, which is okay.

@dfuch
Copy link
Member

dfuch commented Nov 8, 2022

I see you've overhauled this test to be a JUnit test. If you are going that far then maybe it should go a bit further and have 3 test methods rather than having all tests in on test method. It just means setup/teardown is done 3 times, which is okay.

Does it actually make sense? I thought the purpose of this test was to test before connecting, while connected, and after disconnecting. If we split into three different @TEST methods there's no telling in which order they will be executed.

@AlanBateman
Copy link
Contributor

Does it actually make sense? I thought the purpose of this test was to test before connecting, while connected, and after disconnecting. If we split into three different @TEST methods there's no telling in which order they will be executed.

I think we created this test as part of work to get PUE on Windows. Looking at it now, the "not connected" and the "connected" cases are two distinct scenarios. They could be separate test methods, each with their own setup and teardown. The execution order shouldn't matter as they will use different refusers.

The more advanced case is where you connect, then disconnect, and test that a PUE is not thrown.

I'll leave it you, I was just pointing out that it's one test method running three tests.

@msheppar
Copy link

msheppar commented Nov 8, 2022

Using the setup strategy would be in keeping with junit idioms, but it is debatable if it is worth it here (i'm not advocating it).
If you were to use the junit setUp, then I think it would be more appropriately applied on the class

private static DatagramChannel dc;
private static Selector sel;
private static SocketAddress refuser;
private static int port;


@BeforeAll
static void setupClass () {
// test fixture resources setup

}

@AfterAll
static void tearDownClass () {
   // close resources

}

then your test fixtures become
DatagramChannel dc,
SocketAddress refuser,
Selector sel,
int port
which would reduce the parameters passed to each test scenario method. But you'll have to handle any setup failures in the setup method and barf if there are any problems, also. Where as what you have now the test will just fail if any setup exceptions are encountered.

I think the test as one logical unit is best, because then the test has to handle the allocation of a ephemeral refuser port only once. But note that ephemeral refuser port is available for re-allocation by the OS for the duration of the test, and not just the initial test scenario where an OS re-allocation is tested. If the second connected scenario does not raise a PUE, then refuser port re-allocation could be a contributing factor, that is, the delivery of a datagram to an extant UDP end point. Again, it should be stated that this has not been the observed failure, but rather the observed failures have been the unexpected wakeup in scenario 1 no PUE unconnected.

@dfuch
Copy link
Member

dfuch commented Nov 9, 2022

Using the setup strategy would be in keeping with junit idioms, but it is debatable if it is worth it here (i'm not advocating it).

Agreed. I don't think we should do this.

If the second connected scenario does not raise a PUE, then refuser port re-allocation could be a contributing factor, that is, the delivery of a datagram to an extant UDP end point.

Good observation. Maybe we should copy:

            // BindException will be thrown if another service is using
            // our expected refuser port, cannot run just exit.
            DatagramChannel.open().bind(refuser).close();

just before line 166.

If we really wanted to split into three tests then each test should probably create its own refuser etc...
That would mean cloning the test() method into 3 new method where:

  1. the first one would return after the first loop.
  2. the second one would remove the first loop and return after the second loop.
  3. the third test would carry out the actions in the second loop without without performing any checks and without looping, and then would perform the third loop.

All in all I believe it's simpler to keep the single one test method that we have, even if it tests 3 scenarios - because at least we will be checking our assumption and print diagnosis information after each step along the way. It will be easier to reason about if it fails, I think.

@msheppar
Copy link

msheppar commented Nov 9, 2022

All in all I believe it's simpler to keep the single one test method that we have, even if it tests 3 scenarios - because at least we will be checking our assumption and print diagnosis information after each step along the way. It will be easier to reason about if it fails, I think.

👍

@DarraghClarke
Copy link
Contributor Author

I'm in favor of keeping it to one test, though I can make that change to Line166 that was mentioned

Copy link

@msheppar msheppar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good job

@DarraghClarke
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Nov 11, 2022
@openjdk
Copy link

openjdk bot commented Nov 11, 2022

@DarraghClarke
Your change (at version 4e26134) is now ready to be sponsored by a Committer.

@AlekseiEfimov
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Nov 11, 2022

Going to push as commit fdabd37.
Since your change was applied there have been 317 commits pushed to the master branch:

  • 4a30081: 8296747: com/sun/net/httpserver/simpleserver/StressDirListings.java timed out
  • 12e76cb: 8296349: [aarch64] Avoid slicing Address::extend
  • 7244eac: 8296771: RISC-V: C2: assert(false) failed: bad AD file
  • 956d75b: 8295099: vmTestbase/nsk/stress/strace/strace013.java failed with "TestFailure: wrong lengths of stack traces: strace013Thread0: NNN strace013Thread83: MMM"
  • 2f9a94f: 8296824: ProblemList compiler/jvmci/jdk.vm.ci.code.test/src/jdk/vm/ci/code/test/NativeCallTest.java
  • 43ebd96: 8296822: ProblemList jdk/jfr/api/consumer/TestRecordingFileWrite.java
  • 84e1224: 8296496: Overzealous check in sizecalc.h prevents large memory allocation
  • 27527b4: 8296612: CertAttrSet is useless
  • 6b456f7: 8262901: [macos_aarch64] NativeCallTest expected:<-3.8194101E18> but was:<3.02668882E10>
  • e1badb7: 8295871: G1: Use different explicit claim marks for CLDs
  • ... and 307 more: https://git.openjdk.org/jdk/compare/172006c0e9433046252bd79e8864890ab7c0ce56...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 11, 2022
@openjdk openjdk bot closed this Nov 11, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Nov 11, 2022
@openjdk
Copy link

openjdk bot commented Nov 11, 2022

@AlekseiEfimov @DarraghClarke Pushed as commit fdabd37.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated nio nio-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

5 participants