Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8272600: (test) Use native "sleep" in Basic.java #5239

Closed
wants to merge 6 commits into from

Conversation

RogerRiggs
Copy link
Contributor

@RogerRiggs RogerRiggs commented Aug 24, 2021

The intermittent test in java/lang/ProcessBuilder/Basic.java has identified unexpected messages from a child Java VM
as the cause of the test failure. Attempts to control the output of the child VM have failed, the VM is unrepentant .

There is no functionality in the child except to wait long enough for the test to finish and the child is destroyed.
The fix is to switch from using a Java child to using a native child; a new executable sleepmillis.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5239/head:pull/5239
$ git checkout pull/5239

Update a local copy of the PR:
$ git checkout pull/5239
$ git pull https://git.openjdk.java.net/jdk pull/5239/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5239

View PR using the GUI difftool:
$ git pr show -t 5239

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5239.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Aug 24, 2021

👋 Welcome back rriggs! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Aug 24, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Aug 24, 2021

@RogerRiggs The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs label Aug 24, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Aug 24, 2021

Webrevs

@@ -27,7 +27,7 @@
* 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313
* 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958
* 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464
* 8067796 8224905 8263729 8265173
* 8067796 8224905 8263729 8265173 8272600 8231297
Copy link
Member

@iklam iklam Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test should also be modified to use @run main/othervm/native/timeout=300 so that this test will be flagged by jtreg if -nativepath: is not specified.

Copy link
Contributor Author

@RogerRiggs RogerRiggs Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to run this test as a main, without the overhead of building the native image.
The use of a Java child greatly reduces the complexity of the test and improves its maintainability.
Requiring a native special built program raises the overhead considerably.
And all because the VM can't or won't allow its output to be managed.

Copy link
Member

@dholmes-ora dholmes-ora Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same way the test uses:

private static final String[] winEnvCommand = {"cmd.exe", "/c", "set"};

you could also have:

private static final String[] winSleepCommand = {"cmd.exe", "/c", "timeout", "/T", "60", "/NOBREAK"};

private static List<String> getSleepArgs() {
List<String> childArgs = null;
Path sleepExe = TEST_NATIVEPATH.resolve("sleepmillis");
if (sleepExe.toFile().canExecute()) {
Copy link
Member

@iklam iklam Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the fallback necessary? Other test cases such as test/jdk/tools/launcher/JliLaunchTest.java do not have such a fallback.

Also, I noticed that JliLaunchTest does something like this:

        Path launcher = Paths.get(System.getProperty("test.nativepath"),
            "JliLaunchTest" + (Platform.isWindows() ? ".exe" : ""));

but test/jdk/java/lang/reflect/exeCallerAccessTest/CallerAccessTest.java doesn't add ".exe", so it may not be necessary.

Copy link
Contributor Author

@RogerRiggs RogerRiggs Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, the java Child is more portable and lower maintenance.
Windows looks for xxx.exe if xxx is not found.

Copy link
Member

@iklam iklam Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a comment to explain why the fallback is necessary?

Also, if jtreg -nativepath:xxx is specified, but there's no xxx/sleepmillis, or xxx/sleepmillis is not executable, that should be a setup error (or a bug in the test itself). E.g., if we are testing inside mach5, we should always execute the native program, and should not silently fallback to the java Child program. Otherwise, setup problems in mach5 might bring us back to the mysterious intermittent failure.

(The current version of the code is buggy on Windows and will always silently fall back to Child because the executable is named "sleepmillis.exe", not "sleepmillis").

sleeptime.tv_sec = millis / 1000;
sleeptime.tv_nsec = (millis % 1000) * 1000 * 1000;
int rc;
while ((rc = nanosleep(&sleeptime, &sleeptime)) > 0) {
Copy link
Member

@iklam iklam Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is nanosleep a portable call? I couldn't find documentation for it with google search of nanosleep site:docs.microsoft.com.

Copy link
Contributor Author

@RogerRiggs RogerRiggs Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, true.
Falling back to the portable 'sleep(seconds)' is necessary; but the timing will be less precise.
C++ supports a higher resolution 'sleep_for' but the JDK native test build does not support building
main programs using C++ and its not worth the complications added.

iklam
iklam approved these changes Aug 26, 2021
Copy link
Member

@iklam iklam left a comment

LGTM

@openjdk
Copy link

@openjdk openjdk bot commented Aug 26, 2021

@RogerRiggs This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8272600: (test) Use native "sleep" in Basic.java

Reviewed-by: iklam, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 302 new commits pushed to the master branch:

  • 9c91ff5: 8274031: Typo in StringBuilder.readObject
  • d16bf04: 8273613: JFR: RemoteRecordingStream::start() blocks close()
  • 7ce60c6: 8273651: JFR: onMetadata(), setStartTime(), setEndTime() lacks test coverage
  • 1bd11a7: 8273887: [macos] java/awt/color/ICC_ColorSpace/MTTransformReplacedProfile.java timed out
  • ee3576a: 8256735: JFR: 'jfr' tool displays incorrect timestamps
  • 5fde4b6: 8273909: vmTestbase/nsk/jdi/Event/request/request001 can still fail with "ERROR: new event is not ThreadStartEvent"
  • 04891c9: 8273912: Add threadControl_dumpThread(jthread) function
  • 5c21c00: 8267163: Rename anonymous loader tests to hidden loader tests
  • b3b4b1c: 8273907: Cleanup redundant Math.max/min calls in DefaultHighlighter
  • a67f0f9: 8273505: runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java#default-cl crashed with SIGSEGV in MetaspaceShared::link_shared_classes
  • ... and 292 more: https://git.openjdk.java.net/jdk/compare/f608e81ad8309a001b8a92563a93b8adee1ce2b8...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Aug 26, 2021
For Windows, a native "BasicSleep" program is used.
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Hi Roger,

I think this can be simplified - see comments.

Thanks,
David

Path exePath = Path.of("/bin").resolve("sleep");
if (exePath.toFile().canExecute()) {
return exePath;
}
Copy link
Member

@dholmes-ora dholmes-ora Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this is so elaborate when elsewhere in the test we just assume /usr/bin/env exists?

Copy link
Contributor Author

@RogerRiggs RogerRiggs Sep 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True enough, will simplify. At one point, I was going to name the BasicSleep just "sleep" and put the native path at the end of a search list. Some other executables are not reliably in the same directories in all Unix versions.

@@ -27,7 +27,7 @@
* 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313
* 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958
* 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464
* 8067796 8224905 8263729 8265173
* 8067796 8224905 8263729 8265173 8272600 8231297
Copy link
Member

@dholmes-ora dholmes-ora Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same way the test uses:

private static final String[] winEnvCommand = {"cmd.exe", "/c", "set"};

you could also have:

private static final String[] winSleepCommand = {"cmd.exe", "/c", "timeout", "/T", "60", "/NOBREAK"};

test/jdk/java/lang/ProcessBuilder/Basic.java Outdated Show resolved Hide resolved
Path exePath = Path.of(TEST_NATIVEPATH).resolve("BasicSleep.exe");
System.out.println("exePath: " + exePath + ", canExec: " + exePath.toFile().canExecute());
Copy link
Member

@dholmes-ora dholmes-ora Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for??

Copy link
Contributor Author

@RogerRiggs RogerRiggs Sep 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ioi was concerned that the fallback to Java would be used silently and the old intermittent issue would return.

Copy link
Member

@dholmes-ora dholmes-ora Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well Ioi asked for a comment explaining why the fallback was necessary, but the println doesn't explain anything. I assume it should say something like "Unable to find ... falling back to Java version"?

@RogerRiggs
Copy link
Contributor Author

@RogerRiggs RogerRiggs commented Sep 1, 2021

I've had very inconsistent results using the Windows 'timeout' command.
There appear to multiple versions. Some give an error message "Type "TIMEOUT /?" for usage.", (Windows 10)
others a error message "Try 'timeout --help' for more information." (Windows 2016)
And in some cases, it does not appear to wait the entire time requested and one of the test fails.

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Sep 2, 2021

I've had very inconsistent results using the Windows 'timeout' command.
There appear to multiple versions. Some give an error message "Type "TIMEOUT /?" for usage.", (Windows 10)
others a error message "Try 'timeout --help' for more information." (Windows 2016)
And in some cases, it does not appear to wait the entire time requested and one of the test fails.

The timeout --help does not appear to be the Windows command, but a bash variant so I'm assuming you were running under Cygwin or some such environment? Note it is completely different to the Windows timeout command and not what you want.

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Sep 2, 2021

BTW if running the tests on Windows under Cygwin etc then it should find /usr/bin/sleep anyway.

@RogerRiggs
Copy link
Contributor Author

@RogerRiggs RogerRiggs commented Sep 2, 2021

The timeout --help version is from a CI job:

Unexpected child output, to: getErrorStream
Child args: [cmd.exe, /c, timeout, 60]
Try 'timeout --help' for more information.

But ...\cygwin\bin is in PATH before Windows\System32

So sleep would be available, but on cygwin it is in /usr/bin.
sleep would work everywhere, except a real Windows build.

What a tortuous workaround because of a 'feature' of the VM!

@RogerRiggs
Copy link
Contributor Author

@RogerRiggs RogerRiggs commented Sep 2, 2021

And also about Timeout on Windows: it can't be used because it requires a working input stream.

Child args: [C:\Windows\System32\Timeout.exe, 600]
ERROR: Input redirection is not supported, exiting the process immediately.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 3, 2021

Mailing list message from David Holmes on core-libs-dev:

On 3/09/2021 12:45 am, Roger Riggs wrote:

On Sat, 28 Aug 2021 02:34:48 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

The intermittent test in java/lang/ProcessBuilder/Basic.java has identified unexpected messages from a child Java VM
as the cause of the test failure. Attempts to control the output of the child VM have failed, the VM is unrepentant .

There is no functionality in the child except to wait long enough for the test to finish and the child is destroyed.
The fix is to switch from using a Java child to using a native child; a new executable `sleepmillis`.

Roger Riggs has updated the pull request incrementally with one additional commit since the last revision:

Revised to use native /bin/sleep program on Unix* (non-Windows).
For Windows, a native "BasicSleep" program is used.

And also about Timeout on Windows: it can't be used because it requires a working input stream.

Child args\: \[C\:\\Windows\\System32\\Timeout\.exe\, 600\]
ERROR\: Input redirection is not supported\, exiting the process immediately\.

Did you try with the /NOBREAK flag?

David

@RogerRiggs
Copy link
Contributor Author

@RogerRiggs RogerRiggs commented Sep 3, 2021

On Windows Timeout, /NOBREAK does not change the behavior, still does not support if input is redirected.

Added diagnostic output for a test that sometimes fails on Linux when using /bin/sleep.
Addressed review comments.
@RogerRiggs
Copy link
Contributor Author

@RogerRiggs RogerRiggs commented Sep 15, 2021

Are there any other comments on this, or updated reviews?
Passes CI tests Tier1..tier3.

if (p.waitFor(10, TimeUnit.MILLISECONDS)) {
System.out.println("WaitFor didn't wait long enough: " + (System.nanoTime() - start));
Copy link
Member

@dholmes-ora dholmes-ora Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either the condition or the message seems wrong here. If waitFor returns true then the process has exited and we obviously did wait long enough.

Copy link
Contributor Author

@RogerRiggs RogerRiggs Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is diagnostic.
After switching to native sleep, I had intermittent failures claiming it did not sleep long enough.
I was unable to find a specific cause for those failures.
Many of the tests fail to check if the sleep processes terminate prematurely and if the executable is not found, it never launched.

Copy link
Member

@dholmes-ora dholmes-ora Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay but my comment still stands.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 15, 2021

Mailing list message from Bernd Eckenfels on core-libs-dev:

The message should probably more along the line of be ?external sleep process terminated unexpected early?. But maybe it is better to actually fail the test when true is returned as it should not happen instead of diag output? (And for diag output the exit code would be more helpful than the time)

Gruss
Bernd

--
http://bernd.eckenfels.net
________________________________
Von: core-libs-dev <core-libs-dev-retn at openjdk.java.net> im Auftrag von David Holmes <dholmes at openjdk.java.net>
Gesendet: Donnerstag, September 16, 2021 1:01 AM
An: core-libs-dev at openjdk.java.net
Betreff: Re: RFR: 8272600: (test) Use native "sleep" in Basic.java [v4]

On Wed, 15 Sep 2021 22:36:14 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

test/jdk/java/lang/ProcessBuilder/Basic.java line 2452:

2450:
2451: if (p.waitFor(10, TimeUnit.MILLISECONDS)) {
2452: System.out.println("WaitFor didn't wait long enough: " + (System.nanoTime() - start));

Either the condition or the message seems wrong here. If waitFor returns true then the process has exited and we obviously did wait long enough.

This code is diagnostic.
After switching to native sleep, I had intermittent failures claiming it did not sleep long enough.
I was unable to find a specific cause for those failures.
Many of the tests fail to check if the sleep processes terminate prematurely and if the executable is not found, it never launched.

Okay but my comment still stands.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5239

iklam
iklam approved these changes Sep 16, 2021
The test will fail if the process has terminated early.
(It would have failed anyway due to too short wait).
@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 16, 2021

Mailing list message from Bernd Eckenfels on core-libs-dev:

I like it, but I think you don?t Need the %n linebreak (at least the other fail message has none)

--
http://bernd.eckenfels.net
________________________________
Von: core-libs-dev <core-libs-dev-retn at openjdk.java.net> im Auftrag von Roger Riggs <rriggs at openjdk.java.net>
Gesendet: Donnerstag, September 16, 2021 6:13 PM
An: core-libs-dev at openjdk.java.net
Betreff: Re: RFR: 8272600: (test) Use native "sleep" in Basic.java [v5]

The intermittent test in java/lang/ProcessBuilder/Basic.java has identified unexpected messages from a child Java VM
as the cause of the test failure. Attempts to control the output of the child VM have failed, the VM is unrepentant .

There is no functionality in the child except to wait long enough for the test to finish and the child is destroyed.
The fix is to switch from using a Java child to using a native child; a new executable `sleepmillis`.

Roger Riggs has updated the pull request incrementally with one additional commit since the last revision:

Improve diagnostic message to add the exit value of the process.
The test will fail if the process has terminated early.
(It would have failed anyway due to too short wait).

-------------

Changes:
- all: https://git.openjdk.java.net/jdk/pull/5239/files
- new: https://git.openjdk.java.net/jdk/pull/5239/files/2a9c33fb..43a5480

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5239&range=04
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5239&range=03-04

Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/5239.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5239/head:pull/5239

PR: https://git.openjdk.java.net/jdk/pull/5239

to fail on Linux.  The cleanup for a test used /usr/bin/pkill "sleep 60".
A race between that cleanup and subsequent tests that used sleep 60 or 600
could kill the sleep before the test of waitFor completed.
Changing the test using pkill to use 59 seconds makes the test cleanup
selective to the sleep spawned for that test.
The test that was failing (lines 2626-2624) passes consistently.
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Hi Roger,

One suggestion based on your latest change, but that can be handled later. Otherwise this seems fine.

Thanks,
David

// Running the grandchild for 59s should be more than enough.
// A unique (59s) time is needed to avoid killing other sleep processes.
final String[] cmd = { "/bin/bash", "-c", "(/bin/sleep 59)" };
final String[] cmdkill = { "/bin/bash", "-c", "(/usr/bin/pkill -f \"sleep 59\")" };
Copy link
Member

@dholmes-ora dholmes-ora Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe future RFE but why do we even need pkill here when we can get the PID of the sleep process we create and kill only that process?

Copy link
Contributor Author

@RogerRiggs RogerRiggs Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of that too, but notice the parens "()" around that /bin/sleep; that creates and extra level of forked processes and its harder to get that pid. There probably is a way to traverse the hierarchy but I'll keep it as is for now.

Copy link
Member

@dholmes-ora dholmes-ora Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. Begs the question why we need to use bash to execute sleep? Could it be shell script instead of a binary?

Copy link
Contributor Author

@RogerRiggs RogerRiggs Sep 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the author's prerogative; the test was written in 2010
to test conditions related to deeply inherited file descriptors.

@RogerRiggs
Copy link
Contributor Author

@RogerRiggs RogerRiggs commented Sep 22, 2021

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Sep 22, 2021

Going to push as commit 0a36163.
Since your change was applied there have been 332 commits pushed to the master branch:

  • c6df3c9: 8274071: Clean up java.lang.ref comments and documentation
  • 71788c6: 8271287: jdk/jshell/CommandCompletionTest.java fails with "lists don't have the same size expected"
  • ba7d550: 8270139: jshell InternalError crash for import of @repeatable followed by unresolved ref
  • aefd4ac: 8258734: jdk/jfr/event/oldobject/TestClassLoaderLeak.java failed with "RuntimeException: Could not find class leak"
  • d245a8c: 8274069: Clean up g1ParScanThreadState a bit
  • 3f73ca7: 8274068: Rename G1ScanInYoungSetter to G1SkipCardEnqueueSetter
  • 7f78803: 8273961: jdk/nio/zipfs/ZipFSTester.java fails if file path contains '+' character
  • 51085b5: 8274054: Add custom enqueue calls during reference processing
  • c77ebe8: 8274060: C2: Incorrect computation after JDK-8273454
  • d9872ba: 8273590: Move helper classes in G1 post evacuation sub tasks to cpp files
  • ... and 322 more: https://git.openjdk.java.net/jdk/compare/f608e81ad8309a001b8a92563a93b8adee1ce2b8...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Sep 22, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Sep 22, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Sep 22, 2021

@RogerRiggs Pushed as commit 0a36163.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@RogerRiggs RogerRiggs deleted the 8272600-native-sleep branch Sep 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs integrated
3 participants