8265261: java/nio/file/Files/InterruptCopy.java fails with java.lang.RuntimeException: Copy was not interrupted #5154

bplb · 2021-08-18T02:32:53Z

This proposal suggests to change the timing of testing whether a file copy is terminated by an interrupt.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8265261: java/nio/file/Files/InterruptCopy.java fails with java.lang.RuntimeException: Copy was not interrupted

Reviewers

Daniel Fuchs (@dfuch - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5154/head:pull/5154
$ git checkout pull/5154

Update a local copy of the PR:
$ git checkout pull/5154
$ git pull https://git.openjdk.java.net/jdk pull/5154/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5154

View PR using the GUI difftool:
$ git pr show -t 5154

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5154.diff

…RuntimeException: Copy was not interrupted

bridgekeeper · 2021-08-18T02:33:28Z

👋 Welcome back bpb! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2021-08-18T02:35:35Z

@bplb The following label will be automatically applied to this pull request:

nio

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2021-08-18T02:39:33Z

Webrevs

dfuch

LGTM

openjdk · 2021-08-18T09:10:13Z

@bplb This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8265261: java/nio/file/Files/InterruptCopy.java fails with java.lang.RuntimeException: Copy was not interrupted

Reviewed-by: dfuchs

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 49 new commits pushed to the master branch:

7f80683: 8272783: Epsilon: Refactor tests to improve performance
22ef4f0: 5015261: NPE may be thrown if JDesktopIcon is set to null on a JInternalFrame
9bc0232: 8269223: -Xcheck:jni WARNINGs working with fonts on Linux
2ff4c01: 8271600: C2: CheckCastPP which should closely follow Allocate is sunk of a loop
ad92033: 8272736: [JVMCI] Add API for reading and writing JVMCI thread locals
709b591: 8272553: several hotspot runtime/CommandLine tests don't check exit code
1884072: 8265253: javac -Xdoclint:all gives "no comment" warning for code that can't be commented
594e516: 8272778: Consolidate is_instance and is_instance_inlined in java_lang_String
d542745: 8267894: Skip work for empty regions in G1 Full GC
741f58c: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData
... and 39 more: https://git.openjdk.java.net/jdk/compare/a68b5b9c1dd9dd69fa3ef29d8f1186707ec09311...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

msheppar · 2021-08-18T10:03:33Z

is a countdown of 2 required?
or is it sufficient that the Interrupter thread should wait for the main to tell it to proceed?
Does the fact that the test runs agentvm influence the scheduling of the test's threads or is
is that irrelevant on linux? That is to say would running as othervm allow for more consistent
and deterministic interruption

bplb · 2021-08-18T15:55:53Z

It did indeed look as if the copy starts before the interrupter thread but that is not guaranteed. I do not know about the difference between agentvm and othervm. Another alternative would be to repeatedly interrupt at a fixed rate with zero delay but I don't know whether the original intent was to have only one interrupt.

msheppar · 2021-08-18T21:28:42Z

ah yes! with the count of 2 you wish to avoid either thread racing ahead, and so are attempting to arrange that the interrupter is at least Ready to Run when the copy thread initiates its copy, as such creating a reasonable probability that the interrupter will invoke the interrupt while the copy is in progress. As such, the ping pong between the two threads is a sort of ad hoc barrier, which seems reasonable and will most likely work the majority of the time.

But there is, I think, still some probability of missing the interrupt, if the interrupter thread "got stuck" after its little sleep, and was less than fairly scheduled? This is where agentvm or othervm may have influence, depending on the scheduling algorithm of the OS. Are threads, on Windows, scheduled independent of their owning process or relative to their owning process. othervm mode, afaik, launches a new process to executed the test, not sure if that would influence the scheduling of its threads.

Would some additional diagnostic output be useful in the Interrupter thread to indicate that it starting to run and it is invoking interrupt or has invoked interrupt?

In the second phase of the test the cancellation of scheduled task, is a latch also required in this instance, to ensure that when the cancel is invoked that the copy is in progress.

bplb · 2021-08-18T21:39:34Z

@msheppar Right, the latch is intended to do exactly what you described. There is definitely still some probability of missing the interrupt however, but this results in a failure only if the copy took more than DURATION_MAX_IN_MS (5 seconds). I don't know how this limit was determined.

I think you are correct about the problem of the interrupter getting stuck. I don't know which vm mode would best minimize this problem.

I had some print statements in the test when I was playing around with different approaches, but I was afraid that they might affect the timing. I'll take another look at it however as well as the idea of a latch added to the cancellation section.

bplb · 2021-08-18T23:31:55Z

Strangely enough it looks like no exception is thrown if cancellation fails.

dfuch

Updated logic LGTM.

dfuch · 2021-08-19T11:04:05Z

test/jdk/java/nio/file/Files/InterruptCopy.java

                result.get();
-            System.out.println("Copy cancelled.");
+                throw new RuntimeException("Copy was not cancelled");


Should DURATION_MAX_IN_MS be taken into account here too to decide whether to throw or not?

I don't know. It seems like cancellation ought to be a bit more predictable than interruption. I am not sure how DURATION_MAX_IN_MS was determined. If there were no "hiccups" in the threads starting, then with the correct timing I am not sure that such a threshold is even needed. The execution time for the copy that I have observed is between 150ms (macOS) and 400ms (Windows) which is a lot less than 5s, so there would have to be some slowdown for this threshold even to be reached.

Not wishing to labour the discussion on the change, which will in the main deal with the issue, but is it not the case with the current set of Intermittent failures that there is this slowdown, i.e. the copy is taking longer than the threshold of 5 seconds hence the RuntimeException with Copy was not Interrupted?

As such,

if (duration > DURATION_MAX_IN_MS)
throw new RuntimeException("Copy was not interrupted");

is triggering a failure.
You would expect if a typical copy duration is 400msecs that if the duration is taking a lot longer that the interrupt has a greater probability to happen.

the rationale for this logic is not clear - so if the duration is less than the copy threshold then test proceeds as if an interrupt has occurred?

the success and failure semantics for the test are not absolute. As such, maybe there's an opportunity to be more exact with test semantics e.g. retrying to interrupt the copy if it didn't happen.

as an aside it would appear that invoking Thread.interrupted(); in the main thread seems redundant ?

bplb · 2021-08-19T20:44:01Z

It looks like the duration threshold DURATION_MAX_IN_MS was added in the change which resolved JDK-6993267. A comment in that issue is "We should improve the test to work on very fast machines." So the copy must have been too fast for it to be interrupted and without the duration threshold there were spurious failures.

Thread.interrupted() in the main thread indeed seems unnecessary.

I think a further look at the semantics, as suggested, is in order.

bplb · 2021-08-19T20:44:18Z

It looks like the duration threshold DURATION_MAX_IN_MS was added in the change which resolved JDK-6993267. A comment in that issue is "We should improve the test to work on very fast machines." So the copy must have been too fast for it to be interrupted and without the duration threshold there were spurious failures.

Thread.interrupted() in the main thread indeed seems unnecessary.

I think a further look at the semantics, as suggested, is in order.

bplb · 2021-08-20T01:25:22Z

This latest version might be overthinking it, but I doubled the size of the test file created, and changed the interrupt and cancellation sections to make multiple attempts to interrupt the copy. I ran 80 iterations on each of the pertinent platforms with no failures.

dfuch · 2021-08-20T08:51:23Z

test/jdk/java/nio/file/Files/InterruptCopy.java

            final Thread me = Thread.currentThread();
-            Future<?> wakeup = pool.schedule(new Runnable() {
+            Future<?> wakeup = pool.scheduleAtFixedRate(new Runnable() {


Is it really a good idea? The interrupted state should stick to the thread until it is cleared. It should not really matter if you interrupt the thread too early - but if you interrupt the thread again after the File copy was interrupted, won't it skew the logic in the test? Especially WRT double checking that the interrupt status was/wasn't cleared? Also shouldn't you cancel this wakeup at some point before testing cancel below?

I was not sure about the fixed rate firing here so you might be right. The wakeup is cancelled by wakeup.get() unless I misunderstand.

Correction: I think an explicit wakeup.cancel() is needed.

Apparently setting the interrupted state prior to the copy does not cause the copy to be interrupted.

bplb · 2021-08-21T00:02:13Z

I have another version in progress and it would be worth waiting to review that one once it is published. The new version succeeded in 539 of 540 runs on four platforms (135 runs per platform). The failed case printed this information:

Source file created.
Copying file at 1629496891924ms...
Interrupting at 1629496891934ms...
Done copying at 1629496892261ms...
----------System.err:(13/761)----------
java.lang.RuntimeException: Copy was not interrupted
	at InterruptCopy.doTest(InterruptCopy.java:110)
	at InterruptCopy.main(InterruptCopy.java:60)

The interrupt call was made 10ms after the copy started, but the copy nonetheless ran another 327ms somehow without catching the interrupt. At present this is baffling.

bplb · 2021-08-23T18:02:04Z

I tested in my work in progress by delaying the sleep until 10ms after the interrupt was issued and the copy was not interrupted. I expect that this is the reason for the 1 failure out of 540 executions mentioned above.

bplb · 2021-08-24T00:31:04Z

Version 04 revives the use of CountDownLatch but with a count of 1 instead of 2. It is used for both the interrupt and the cancel cases to make the interrupting thread wait until just before copy() before the interrupting thread starts its sleep delay. For the interrupt case, a minimum duration for copying leading to a test failure is reinstated. This tries to prevent a false failure if the copy was much faster than expected.

dfuch

If the test campaign shows better results with this latter version then I have no objection: it looks better to me than the previous repeated interrupt logic.

bplb · 2021-08-24T17:02:47Z

/integrate

openjdk · 2021-08-24T17:04:47Z

Going to push as commit aaedac6.
Since your change was applied there have been 58 commits pushed to the master branch:

d34f17c: 8270195: Add missing links between methods of JavaFX properties
f608e81: 8264322: Generate CDS archive when creating custom JDK image
f681d65: 8272806: [macOS] "Apple AWT Internal Exception" when input method is changed
6e0328f: 8272725: G1: add documentation on needs_remset_update_t vs bool
2309b7d: 8253178: Replace LinkedList Impl in net.http.FilterFactory
94f5e44: 8271258: @param with non-ascii variable names produces incorrect results
7454306: 8272526: Cleanup ThreadStateTransition class
0597cde: 8221360: Eliminate Shared_DirtyCardQ_lock
928b972: 8271930: Simplify end_card calculation in G1BlockOffsetTablePart::verify
7f80683: 8272783: Epsilon: Refactor tests to improve performance
... and 48 more: https://git.openjdk.java.net/jdk/compare/a68b5b9c1dd9dd69fa3ef29d8f1186707ec09311...master

Your commit was automatically rebased without conflicts.

openjdk · 2021-08-24T17:04:59Z

@bplb Pushed as commit aaedac6.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

mlbridge · 2021-09-01T14:41:35Z

Mailing list message from Brian Burkhalter on nio-dev:

On Aug 24, 2021, at 4:04 AM, Daniel Fuchs <dfuchs at openjdk.java.net<mailto:dfuchs at openjdk.java.net>> wrote:

On Tue, 24 Aug 2021 00:30:48 GMT, Brian Burkhalter <bpb at openjdk.org<mailto:bpb at openjdk.org>> wrote:

This proposal suggests to change the timing of testing whether a file copy is terminated by an interrupt.

Brian Burkhalter has updated the pull request incrementally with one additional commit since the last revision:

8265261: Reinstate duration check and latches but with countdown 1

If the test campaign shows better results with this latter version then I have no objection: it looks better to me than the previous repeated interrupt logic.

This does show better results. I am not however convinced that it is better than the previously posted fixed rate interrupt version. I verified that if the interrupt is issued before the copy begins then it is not recognized by `copy()`. Given that, repeating might give a more predictable result. I would agree though to go with this version and revisit it once more should failures be seen.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/nio-dev/attachments/20210824/1715a4e8/attachment.htm>

8265261: java/nio/file/Files/InterruptCopy.java fails with java.lang.…

7c75978

…RuntimeException: Copy was not interrupted

openjdk bot added the rfr Pull request is ready for review label Aug 18, 2021

openjdk bot added the nio nio-dev@openjdk.org label Aug 18, 2021

dfuch approved these changes Aug 18, 2021

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Aug 18, 2021

8265261: Fix the section which tests cancellation

85f7886

dfuch approved these changes Aug 19, 2021

View reviewed changes

dfuch reviewed Aug 19, 2021

View reviewed changes

8265261: Make multiple attempts to interrupt and cancel

c02a8d1

dfuch reviewed Aug 20, 2021

View reviewed changes

8265261: Drop countdown latch and repeats of interrupts

752eb4b

8265261: Reinstate duration check and latches but with countdown 1

4ba5134

dfuch approved these changes Aug 24, 2021

View reviewed changes

openjdk bot closed this Aug 24, 2021

openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 24, 2021

bplb deleted the Files-InterruptCopy-8265261 branch August 24, 2021 17:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8265261: java/nio/file/Files/InterruptCopy.java fails with java.lang.RuntimeException: Copy was not interrupted #5154

8265261: java/nio/file/Files/InterruptCopy.java fails with java.lang.RuntimeException: Copy was not interrupted #5154

bplb commented Aug 18, 2021 •

edited by openjdk bot

bridgekeeper bot commented Aug 18, 2021

openjdk bot commented Aug 18, 2021

mlbridge bot commented Aug 18, 2021 •

edited

dfuch left a comment

openjdk bot commented Aug 18, 2021 •

edited

msheppar commented Aug 18, 2021 •

edited

bplb commented Aug 18, 2021

msheppar commented Aug 18, 2021

bplb commented Aug 18, 2021 •

edited

bplb commented Aug 18, 2021

dfuch left a comment

dfuch Aug 19, 2021

bplb Aug 19, 2021

msheppar Aug 19, 2021

bplb commented Aug 19, 2021

bplb commented Aug 19, 2021

bplb commented Aug 20, 2021

dfuch Aug 20, 2021

bplb Aug 20, 2021

bplb Aug 20, 2021

bplb Aug 23, 2021

bplb commented Aug 21, 2021

bplb commented Aug 23, 2021

bplb commented Aug 24, 2021

dfuch left a comment

bplb commented Aug 24, 2021

openjdk bot commented Aug 24, 2021

openjdk bot commented Aug 24, 2021

mlbridge bot commented Sep 1, 2021

8265261: java/nio/file/Files/InterruptCopy.java fails with java.lang.RuntimeException: Copy was not interrupted #5154

8265261: java/nio/file/Files/InterruptCopy.java fails with java.lang.RuntimeException: Copy was not interrupted #5154

Conversation

bplb commented Aug 18, 2021 • edited by openjdk bot

Progress

Issue

Reviewers

Reviewing

bridgekeeper bot commented Aug 18, 2021

openjdk bot commented Aug 18, 2021

mlbridge bot commented Aug 18, 2021 • edited

Webrevs

dfuch left a comment

Choose a reason for hiding this comment

openjdk bot commented Aug 18, 2021 • edited

msheppar commented Aug 18, 2021 • edited

bplb commented Aug 18, 2021

msheppar commented Aug 18, 2021

bplb commented Aug 18, 2021 • edited

bplb commented Aug 18, 2021

dfuch left a comment

Choose a reason for hiding this comment

dfuch Aug 19, 2021

Choose a reason for hiding this comment

bplb Aug 19, 2021

Choose a reason for hiding this comment

msheppar Aug 19, 2021

Choose a reason for hiding this comment

bplb commented Aug 19, 2021

bplb commented Aug 19, 2021

bplb commented Aug 20, 2021

dfuch Aug 20, 2021

Choose a reason for hiding this comment

bplb Aug 20, 2021

Choose a reason for hiding this comment

bplb Aug 20, 2021

Choose a reason for hiding this comment

bplb Aug 23, 2021

Choose a reason for hiding this comment

bplb commented Aug 21, 2021

bplb commented Aug 23, 2021

bplb commented Aug 24, 2021

dfuch left a comment

Choose a reason for hiding this comment

bplb commented Aug 24, 2021

openjdk bot commented Aug 24, 2021

openjdk bot commented Aug 24, 2021

mlbridge bot commented Sep 1, 2021

bplb commented Aug 18, 2021 •

edited by openjdk bot

mlbridge bot commented Aug 18, 2021 •

edited

openjdk bot commented Aug 18, 2021 •

edited

msheppar commented Aug 18, 2021 •

edited

bplb commented Aug 18, 2021 •

edited