8265261: java/nio/file/Files/InterruptCopy.java fails with java.lang.RuntimeException: Copy was not interrupted #5154
LGTM
@bplb This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 49 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
Is a countdown of 2 required?
It did indeed look as if the copy starts before the interrupter thread, but that is not guaranteed. I do not know about the difference between agentvm and othervm. Another alternative would be to interrupt repeatedly at a fixed rate with zero initial delay, but I don't know whether the original intent was to have only one interrupt.
Ah yes! With the count of 2 you wish to avoid either thread racing ahead, and so are attempting to arrange that the interrupter is at least ready to run when the copy thread initiates its copy, creating a reasonable probability that the interrupter will invoke the interrupt while the copy is in progress. As such, the ping-pong between the two threads is a sort of ad hoc barrier, which seems reasonable and will most likely work the majority of the time.

But there is, I think, still some probability of missing the interrupt if the interrupter thread "got stuck" after its little sleep and was less than fairly scheduled. This is where agentvm or othervm may have an influence, depending on the scheduling algorithm of the OS. Are threads on Windows scheduled independently of their owning process or relative to it? othervm mode, afaik, launches a new process to execute the test; I am not sure whether that would influence the scheduling of its threads.

Would some additional diagnostic output be useful in the interrupter thread to indicate that it is starting to run, and that it is invoking or has invoked interrupt?

In the second phase of the test, the cancellation of the scheduled task, is a latch also required to ensure that the copy is in progress when cancel is invoked?
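The "ad hoc barrier" described in this comment can be sketched as follows. This is only an illustration of the count-of-2 latch idea, not the code in InterruptCopy.java: the class name `LatchHandshake`, the method `runHandshake`, and the thread names are hypothetical, and the copy and the interrupt themselves are left out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; names are not taken from the test under review.
final class LatchHandshake {

    // Starts two threads that each count down a shared latch and then wait on it,
    // so neither proceeds until both have arrived. Returns how many got past it.
    static int runHandshake() {
        CountDownLatch bothReady = new CountDownLatch(2);
        AtomicInteger passed = new AtomicInteger();

        Runnable arriveAndWait = () -> {
            bothReady.countDown();          // announce "I am running"
            try {
                bothReady.await();          // wait for the other thread to arrive
                passed.incrementAndGet();   // the copy or the interrupt would start here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Thread copier = new Thread(arriveAndWait, "copier");
        Thread interrupter = new Thread(arriveAndWait, "interrupter");
        copier.start();
        interrupter.start();
        try {
            copier.join();
            interrupter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return passed.get();
    }

    public static void main(String[] args) {
        System.out.println("threads past the latch: " + runHandshake());
    }
}
```

The latch only guarantees that both threads have started before either continues. It says nothing about how the OS schedules them afterwards, which is the residual race raised above.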
@msheppar Right, the latch is intended to do exactly what you described. There is definitely still some probability of missing the interrupt however, but this results in a failure only if the copy took more than I think you are correct about the problem of the interrupter getting stuck. I don't know which vm mode would best minimize this problem. I had some print statements in the test when I was playing around with different approaches, but I was afraid that they might affect the timing. I'll take another look at it however as well as the idea of a latch added to the cancellation section. |
Strangely enough, it looks like no exception is thrown if cancellation fails.
Updated logic LGTM.
result.get();
System.out.println("Copy cancelled.");
throw new RuntimeException("Copy was not cancelled");
Should DURATION_MAX_IN_MS be taken into account here too to decide whether to throw or not?
I don't know. It seems like cancellation ought to be a bit more predictable than interruption. I am not sure how DURATION_MAX_IN_MS was determined. If there were no "hiccups" in the threads starting, then with the correct timing I am not sure that such a threshold is even needed. The copy execution times I have observed are between 150 ms (macOS) and 400 ms (Windows), far below 5 s, so there would have to be some slowdown for this threshold even to be reached.
Not wishing to labour the discussion on the change, which will in the main deal with the issue, but is it not the case with the current set of intermittent failures that there is this slowdown, i.e. the copy is taking longer than the threshold of 5 seconds, hence the RuntimeException with "Copy was not interrupted"?
As such,
if (duration > DURATION_MAX_IN_MS)
    throw new RuntimeException("Copy was not interrupted");
is triggering a failure.
You would expect that, if a typical copy takes 400 ms, a copy that takes a lot longer gives the interrupt a greater probability of landing.
The rationale for this logic is not clear: if the duration is less than the threshold, does the test proceed as if an interrupt has occurred?
The success and failure semantics of the test are not absolute. As such, maybe there is an opportunity to be more exact with the test semantics, e.g. retrying the interrupt if the copy was not interrupted.
As an aside, invoking Thread.interrupted() in the main thread appears to be redundant?
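One reading of the pass/fail semantics questioned in this comment, written out as a small decision function. This is a sketch of the logic as described in the discussion, not the test's actual code: the class `CopyVerdict`, the enum `Outcome`, and the method `classify` are hypothetical names, and the 5-second value is taken from the figure mentioned in the thread.

```java
// Hypothetical illustration of the threshold logic discussed above.
final class CopyVerdict {

    enum Outcome { INTERRUPTED, INCONCLUSIVE, NOT_INTERRUPTED }

    // Assumed to be 5 seconds, per the figure quoted in the discussion.
    static final long DURATION_MAX_IN_MS = 5_000;

    static Outcome classify(boolean copyWasInterrupted, long durationMs) {
        if (copyWasInterrupted) {
            return Outcome.INTERRUPTED;        // the case the test wants to observe
        }
        if (durationMs > DURATION_MAX_IN_MS) {
            return Outcome.NOT_INTERRUPTED;    // slow copy, interrupt had ample time: failure
        }
        return Outcome.INCONCLUSIVE;           // copy finished quickly: not treated as a failure
    }

    public static void main(String[] args) {
        System.out.println(classify(false, 400));    // INCONCLUSIVE
        System.out.println(classify(false, 6_000));  // NOT_INTERRUPTED
        System.out.println(classify(true, 6_000));   // INTERRUPTED
    }
}
```

Written this way, the middle case makes the concern explicit: a copy that completes under the threshold passes without any interrupt having been observed.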
It looks like the duration threshold
I think a further look at the semantics, as suggested, is in order. |
This latest version might be overthinking it, but I doubled the size of the test file created and changed the interrupt and cancellation sections to make multiple attempts to interrupt the copy. I ran 80 iterations on each of the pertinent platforms with no failures.
final Thread me = Thread.currentThread();
Future<?> wakeup = pool.schedule(new Runnable() {
Future<?> wakeup = pool.scheduleAtFixedRate(new Runnable() {
Is it really a good idea? The interrupted state should stick to the thread until it is cleared. It should not really matter if you interrupt the thread too early - but if you interrupt the thread again after the File copy was interrupted, won't it skew the logic in the test? Especially WRT double checking that the interrupt status was/wasn't cleared? Also shouldn't you cancel this wakeup at some point before testing cancel below?
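The point about the interrupt status "sticking" until it is cleared can be checked with standard `java.lang.Thread` calls. The following is a minimal sketch with a hypothetical class name (`InterruptStatusDemo`). It does not involve `Files.copy` at all and only shows which call reads the status and which one clears it.

```java
import java.util.Arrays;

// Hypothetical illustration of interrupt-status semantics.
final class InterruptStatusDemo {

    // Returns { status after interrupt(), value of Thread.interrupted(), status afterwards }.
    static boolean[] observe() {
        Thread me = Thread.currentThread();
        me.interrupt();                             // set the interrupt status
        boolean sticky = me.isInterrupted();        // reads the status without clearing it
        boolean reported = Thread.interrupted();    // reads the status and clears it
        boolean afterwards = me.isInterrupted();    // status is now clear
        return new boolean[] { sticky, reported, afterwards };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observe())); // [true, true, false]
    }
}
```

A repeating interrupter that fires again after the status has been checked would set it once more, which is why a late extra interrupt could skew a later check.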
I was not sure about the fixed rate firing here so you might be right. The wakeup is cancelled by wakeup.get() unless I misunderstand.
Correction: I think an explicit wakeup.cancel() is needed.
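The correction matches how `ScheduledExecutorService.scheduleAtFixedRate` is documented to behave: a periodic task keeps running until it is cancelled or throws, and `get()` on its future only waits. A minimal sketch follows, using a hypothetical class name (`FixedRateCancel`) and a counter in place of the interrupting task.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; the periodic task here just counts its own runs.
final class FixedRateCancel {

    // Returns { isDone() before cancel, isCancelled() after cancel, get() threw CancellationException }.
    static boolean[] demo() {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        try {
            AtomicInteger ticks = new AtomicInteger();
            ScheduledFuture<?> wakeup =
                pool.scheduleAtFixedRate(ticks::incrementAndGet, 0, 5, TimeUnit.MILLISECONDS);

            while (ticks.get() < 3) {       // let the task fire a few times
                Thread.sleep(1);
            }
            boolean doneBeforeCancel = wakeup.isDone();  // a periodic task is not done on its own

            wakeup.cancel(true);            // explicit cancellation stops further runs
            boolean cancelled = wakeup.isCancelled();

            boolean getThrew = false;
            try {
                wakeup.get();               // after cancel, get() throws instead of blocking
            } catch (CancellationException e) {
                getThrew = true;
            } catch (ExecutionException e) {
                // not expected: the task body does not throw
            }
            return new boolean[] { doneBeforeCancel, cancelled, getThrew };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("done before cancel: " + r[0]
            + ", cancelled: " + r[1] + ", get() threw: " + r[2]);
    }
}
```

Calling `get()` before `cancel()` in this sketch would block indefinitely, since the periodic task never completes normally.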
Apparently setting the interrupted state prior to the copy does not cause the copy to be interrupted.
I have another version in progress and it would be worth waiting to review that one once it is published. The new version succeeded in 539 of 540 runs on four platforms (135 runs per platform). The failed case printed this information:
The interrupt call was made 10ms after the copy started, but the copy nonetheless ran another 327ms somehow without catching the interrupt. At present this is baffling. |
I tested in my work in progress by delaying the sleep until 10ms after the interrupt was issued and the copy was not interrupted. I expect that this is the reason for the 1 failure out of 540 executions mentioned above. |
Version 04 revives the use of |
If the test campaign shows better results with this latter version then I have no objection: it looks better to me than the previous repeated interrupt logic.
/integrate |
Going to push as commit aaedac6.
Your commit was automatically rebased without conflicts. |
Mailing list message from Brian Burkhalter on nio-dev:

On Aug 24, 2021, at 4:04 AM, Daniel Fuchs <dfuchs at openjdk.java.net> wrote:

> On Tue, 24 Aug 2021 00:30:48 GMT, Brian Burkhalter <bpb at openjdk.org> wrote:
>
>> This proposal suggests to change the timing of testing whether a file copy is terminated by an interrupt.
>>
>> Brian Burkhalter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> 8265261: Reinstate duration check and latches but with countdown 1
>
> If the test campaign shows better results with this latter version then I have no objection: it looks better to me than the previous repeated interrupt logic.

This does show better results. I am not however convinced that it is better than the previously posted fixed rate interrupt version. I verified that if the interrupt is issued before the copy begins then it is not recognized by `copy()`. Given that, repeating might give a more predictable result. I would agree though to go with this version and revisit it once more should failures be seen.
This proposal changes the timing of the test for whether a file copy is terminated by an interrupt.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5154/head:pull/5154
$ git checkout pull/5154
Update a local copy of the PR:
$ git checkout pull/5154
$ git pull https://git.openjdk.java.net/jdk pull/5154/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5154
View PR using the GUI difftool:
$ git pr show -t 5154
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5154.diff