-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8272600: (test) Use native "sleep" in Basic.java #5239
Conversation
👋 Welcome back rriggs! A progress list of the required criteria for merging this PR into |
@RogerRiggs The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
@@ -27,7 +27,7 @@ | |||
* 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 | |||
* 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 | |||
* 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 | |||
* 8067796 8224905 8263729 8265173 | |||
* 8067796 8224905 8263729 8265173 8272600 8231297 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test should also be modified to use @run main/othervm/native/timeout=300
so that this test will be flagged by jtreg if -nativepath:
is not specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to run this test as a main, without the overhead of building the native image.
The use of a Java child greatly reduces the complexity of the test and improves its maintainability.
Requiring a native special built program raises the overhead considerably.
And all because the VM can't or won't allow its output to be managed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the same way the test uses:
private static final String[] winEnvCommand = {"cmd.exe", "/c", "set"};
you could also have:
private static final String[] winSleepCommand = {"cmd.exe", "/c", "timeout", "/T", "60", "/NOBREAK"};
private static List<String> getSleepArgs() { | ||
List<String> childArgs = null; | ||
Path sleepExe = TEST_NATIVEPATH.resolve("sleepmillis"); | ||
if (sleepExe.toFile().canExecute()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the fallback necessary? Other test cases such as test/jdk/tools/launcher/JliLaunchTest.java do not have such a fallback.
Also, I noticed that JliLaunchTest does something like this:
Path launcher = Paths.get(System.getProperty("test.nativepath"),
"JliLaunchTest" + (Platform.isWindows() ? ".exe" : ""));
but test/jdk/java/lang/reflect/exeCallerAccessTest/CallerAccessTest.java doesn't add ".exe", so it may not be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, the java Child is more portable and lower maintenance.
Windows looks for xxx.exe if xxx is not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a comment to explain why the fallback is necessary?
Also, if jtreg -nativepath:xxx
is specified, but there's no xxx/sleepmillis, or xxx/sleepmillis is not executable, that should be a setup error (or a bug in the test itself). E.g., if we are testing inside mach5, we should always execute the native program, and should not silently fallback to the java Child program. Otherwise, setup problems in mach5 might bring us back to the mysterious intermittent failure.
(The current version of the code is buggy on Windows and will always silently fall back to Child because the executable is named "sleepmillis.exe", not "sleepmillis").
sleeptime.tv_sec = millis / 1000; | ||
sleeptime.tv_nsec = (millis % 1000) * 1000 * 1000; | ||
int rc; | ||
while ((rc = nanosleep(&sleeptime, &sleeptime)) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is nanosleep
a portable call? I couldn't find documentation for it with google search of nanosleep site:docs.microsoft.com
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, true.
Falling back to the portable 'sleep(seconds)' is necessary; but the timing will be less precise.
C++ supports a higher resolution 'sleep_for' but the JDK native test build does not support building
main programs using C++ and its not worth the complications added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@RogerRiggs This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 302 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
For Windows, a native "BasicSleep" program is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Roger,
I think this can be simplified - see comments.
Thanks,
David
Path exePath = Path.of("/bin").resolve("sleep"); | ||
if (exePath.toFile().canExecute()) { | ||
return exePath; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this is so elaborate when elsewhere in the test we just assume /usr/bin/env
exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True enough, will simplify. At one point, I was going to name the BasicSleep just "sleep" and put the native path at the end of a search list. Some other executables are not reliably in the same directories in all Unix versions.
@@ -27,7 +27,7 @@ | |||
* 5026830 5023243 5070673 4052517 4811767 6192449 6397034 6413313 | |||
* 6464154 6523983 6206031 4960438 6631352 6631966 6850957 6850958 | |||
* 4947220 7018606 7034570 4244896 5049299 8003488 8054494 8058464 | |||
* 8067796 8224905 8263729 8265173 | |||
* 8067796 8224905 8263729 8265173 8272600 8231297 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the same way the test uses:
private static final String[] winEnvCommand = {"cmd.exe", "/c", "set"};
you could also have:
private static final String[] winSleepCommand = {"cmd.exe", "/c", "timeout", "/T", "60", "/NOBREAK"};
Path exePath = Path.of(TEST_NATIVEPATH).resolve("BasicSleep.exe"); | ||
System.out.println("exePath: " + exePath + ", canExec: " + exePath.toFile().canExecute()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ioi was concerned that the fallback to Java would be used silently and the old intermittent issue would return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well Ioi asked for a comment explaining why the fallback was necessary, but the println doesn't explain anything. I assume it should say something like "Unable to find ... falling back to Java version"?
I've had very inconsistent results using the Windows 'timeout' command. |
The |
BTW if running the tests on Windows under Cygwin etc then it should find /usr/bin/sleep anyway. |
The
But ...\cygwin\bin is in PATH before Windows\System32 So sleep would be available, but on cygwin it is in /usr/bin. What a tortuous workaround because of a 'feature' of the VM! |
And also about Timeout on Windows: it can't be used because it requires a working input stream. Child args: [C:\Windows\System32\Timeout.exe, 600] |
Mailing list message from David Holmes on core-libs-dev: On 3/09/2021 12:45 am, Roger Riggs wrote:
Did you try with the /NOBREAK flag? David |
On Windows Timeout, /NOBREAK does not change the behavior, still does not support if input is redirected. |
Added diagnostic output for a test that sometimes fails on Linux when using /bin/sleep. Addressed review comments.
Are there any other comments on this, or updated reviews? |
if (p.waitFor(10, TimeUnit.MILLISECONDS)) { | ||
System.out.println("WaitFor didn't wait long enough: " + (System.nanoTime() - start)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either the condition or the message seems wrong here. If waitFor returns true then the process has exited and we obviously did wait long enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is diagnostic.
After switching to native sleep, I had intermittent failures claiming it did not sleep long enough.
I was unable to find a specific cause for those failures.
Many of the tests fail to check if the sleep processes terminate prematurely and if the executable is not found, it never launched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay but my comment still stands.
Mailing list message from Bernd Eckenfels on core-libs-dev: The message should probably more along the line of be ?external sleep process terminated unexpected early?. But maybe it is better to actually fail the test when true is returned as it should not happen instead of diag output? (And for diag output the exit code would be more helpful than the time) Gruss -- On Wed, 15 Sep 2021 22:36:14 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
Okay but my comment still stands. ------------- PR: https://git.openjdk.java.net/jdk/pull/5239 |
The test will fail if the process has terminated early. (It would have failed anyway due to too short wait).
Mailing list message from Bernd Eckenfels on core-libs-dev: I like it, but I think you don?t Need the %n linebreak (at least the other fail message has none) --
Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: Improve diagnostic message to add the exit value of the process. ------------- Changes: Webrevs: Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod PR: https://git.openjdk.java.net/jdk/pull/5239 |
to fail on Linux. The cleanup for a test used /usr/bin/pkill "sleep 60". A race between that cleanup and subsequent tests that used sleep 60 or 600 could kill the sleep before the test of waitFor completed. Changing the test using pkill to use 59 seconds makes the test cleanup selective to the sleep spawned for that test. The test that was failing (lines 2626-2624) passes consistently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Roger,
One suggestion based on your latest change, but that can be handled later. Otherwise this seems fine.
Thanks,
David
// Running the grandchild for 59s should be more than enough. | ||
// A unique (59s) time is needed to avoid killing other sleep processes. | ||
final String[] cmd = { "/bin/bash", "-c", "(/bin/sleep 59)" }; | ||
final String[] cmdkill = { "/bin/bash", "-c", "(/usr/bin/pkill -f \"sleep 59\")" }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe future RFE but why do we even need pkill here when we can get the PID of the sleep process we create and kill only that process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of that too, but notice the parens "()" around that /bin/sleep; that creates and extra level of forked processes and its harder to get that pid. There probably is a way to traverse the hierarchy but I'll keep it as is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right. Begs the question why we need to use bash to execute sleep? Could it be shell script instead of a binary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's the author's prerogative; the test was written in 2010
to test conditions related to deeply inherited file descriptors.
/integrate |
Going to push as commit 0a36163.
Your commit was automatically rebased without conflicts. |
@RogerRiggs Pushed as commit 0a36163. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
The intermittent test in java/lang/ProcessBuilder/Basic.java has identified unexpected messages from a child Java VM
as the cause of the test failure. Attempts to control the output of the child VM have failed, the VM is unrepentant .
There is no functionality in the child except to wait long enough for the test to finish and the child is destroyed.
The fix is to switch from using a Java child to using a native child; a new executable
sleepmillis
.Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5239/head:pull/5239
$ git checkout pull/5239
Update a local copy of the PR:
$ git checkout pull/5239
$ git pull https://git.openjdk.java.net/jdk pull/5239/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5239
View PR using the GUI difftool:
$ git pr show -t 5239
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5239.diff