-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8197825: [Test] Intermittent timeout with javax/swing JColorChooser Test #2220
Conversation
👋 Welcome back psadhukhan! A progress list of the required criteria for merging this PR into |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Yet I suggest fixing the typo in the bug synopsis: Intermiitent → Intermittent
@prsadhuk This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 136 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
This bug looks like a product bug, infinite loop in the "Toolkit.shutdown", so it is better to fix the jdk than the test. |
while (!tk.IsDisposed()) { | ||
Sleep(100); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can cause infinite looping as the message queue to get DISPOSE events is removed so if we are not getting disposed by now, we probably will not get DISPOSE event ever causing infinite loop.
XToolkit, LWToolkit does not have this infinite loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that this is not a test bug, so the test should not be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at the "AwtToolkit::Dispose()" method, on how much stuff should be done to properly shutdown the toolkit. This Dispose() method is executed immediately when we exit the message loop in the "Java_sun_awt_windows_WToolkit_eventLoop". So when the shutdown hook is executed we should have the message loop, then we call tk.QuitMessageLoop to stop it, and wait until all code in the Dispose() is executed. But since the IsDisposed() return false we for unknow reason hang, does it mean that the message loop still operates? Or we got some error during "QuitMessageLoop"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems in successful run, when the test finish
- AwtToolkit::MessageLoop starts
- DoQuitMessageLoop is called
- AwtToolkit::QuitMessageLoop starts
- AwtToolkit::QuitMessageLoop finishes
- AwtToolkit::MessageLoop finish
- Dispose() is called, m_isDisposed sets to true
- shutdown hook isDisposed is true so no infinite loop
During unsuccessful run,
- AwtToolkit::MessageLoop starts
- DoQuitMessageLoop is called
- AwtToolkit::QuitMessageLoop starts
- AwtToolkit::QuitMessageLoop finishes
- AwtToolkit::MessageLoop NEVER ends so Dispose() is not called so m_isDisposed is not set to true so shutdown hook goes in infinite loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking at the code yesterday. Could it be because of synchronisation? I mean, do we need to use native synchronisation to guarantee variable changes are seen across the threads?
Does MessageLoop not receive Quit / Null message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that this is not a test bug, so the test should not be changed.
The test never dispose of the frame. Why is it expected to shut down the toolkit? Shall the frame be disposed of when the main thread in the test finishes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test never dispose of the frame. Why is it expected to shut down the toolkit? Shall the frame be disposed of when the main thread in the test finishes?
The shutdown is caused by the System.exit call while the toolkit active, so we should shut down it before the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems "m_breakMessageLoop" is never true for unsuccessful run even though AwtToolkit::QuitMessageLoop finish execution (where m_breakMessageLoop is set to true),
so AwtToolkit::MessageLoop never ends and shutdown hook gets called where tk.isDisposed() loop start spinning. seems to be some timing issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems "m_breakMessageLoop" is never true for unsuccessful run even though AwtToolkit::QuitMessageLoop finish execution (where m_breakMessageLoop is set to true),
so AwtToolkit::MessageLoop never ends and shutdown hook gets called where tk.isDisposed() loop start spinning. seems to be some timing issue.
Then this does look like a synchronisation problem. One thread changes the value of m_breakMessageLoop
but another doesn't see it's changed. Should m_breakMessageLoop
be declared as volatile
?
if (frame != null) { | ||
SwingUtilities.invokeAndWait(() -> frame.dispose()); |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
I've edited the JBS synopsis. Please also update the PR subject. |
@@ -475,7 +475,7 @@ class AwtToolkit { | |||
BOOL m_verifyComponents; | |||
BOOL m_breakOnError; | |||
|
|||
BOOL m_breakMessageLoop; | |||
volatile BOOL m_breakMessageLoop; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this volatile modifier resolve the now-removed infinite loop in while (!tk.IsDisposed())
in WToolkit_shutdown
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this volatile modifier resolve the now-removed infinite loop in
while (!tk.IsDisposed())
inWToolkit_shutdown
?
The loop should not be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, volatile modifier has no effect if infinite loop is reinstated..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this volatile modifier resolve the now-removed infinite loop in
while (!tk.IsDisposed())
inWToolkit_shutdown
?The loop should not be removed.
No, it should not, as you noted previously.
However, making m_breakMessageLoop
volatile does not look right either. If QuitMessageLoop
is called from !IsMainThread()
thread, it is posted as a message to run on the correct thread. Thus m_breakMessageLoop
should be accessed on a single thread only; if it's the case, volatile is unneeded.
And @prsadhuk's latest test confirms it. There must be something else…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the test locally for several iterations but it does not fail locally so I believe it's a test issue. Re reverting the test changes made in 1st webrev...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see a reason to change the test until the root cause of the hang is not recognized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were many instances faced during test sprint where a test was failing in one platform and we added the stability code of waitForIdle,delay etc to make it work. It was not cited as product bug...I believe it's the same here...I am not able to reproduce it locally at all, and I also checked with other who was also not able to reproduce it locally. It only failed in mach5 and it passed once the test is modified with stability fixes...I dont see a reason to stall it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this is not the same, this a hang in the product due to failed toolkit shutdown, please create one more bug and attach the old test to it reproduce the bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. JDK-8261016
/integrate |
@prsadhuk Since your change was applied there have been 172 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit c008410. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This test was failing in our nightly mach5 testing. Appropriate stability code in form of waitForIdle and delay is added.
mach5 job running for several iterations on all platforms is ok. Link in JBS.
Progress
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/2220/head:pull/2220
$ git checkout pull/2220