8255248: NullPointerException in JFXPanel due to race condition in HostContainer #1968
prsadhuk wants to merge 13 commits into openjdk:master
| debugPrint = "true".equalsIgnoreCase(debugStr); | ||
| } | ||
|
|
||
| protected static void debug_println(String str) { |
There was a problem hiding this comment.
javadoc complains about this new public API. Or is it a temporary debugging thing? Can it be declared private?
If it is a permanent thing, it incurs a string concatenation overhead even when disabled. Use lambdas instead? Alternatively (and faster), one needs to check if debug printout is enabled on each use inline:
if(DEBUG) {
debug_println("JFXPanel Thread " + Thread.currentThread().getName() + " isFXUserThread " + Toolkit.getToolkit().isFxUserThread());
}
There was a problem hiding this comment.
I intend to keep it as permanent to enable logging of flow and thread context viewing.
Updated to keep the method private
| String debugStr = System.getProperty(JFXPANEL_DEBUG); | ||
|
|
||
| debugPrint = "true".equalsIgnoreCase(debugStr); | ||
| } |
There was a problem hiding this comment.
suggestion:
private static final boolean DEBUG = Boolean.getBoolean("jfxpanel.debug");
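The one-line suggestion above can replace the longer form because `Boolean.getBoolean` does the same job: it reads the named system property and returns true only if it equals "true", ignoring case. A minimal sketch comparing the two forms follows; the property name `demo.debug` and the class name are assumptions made for illustration, not names from the patch.

```java
final class DebugFlagDemo {
    // Hypothetical property name, used only in this sketch.
    static final String PROP = "demo.debug";

    // Long form, as in the earlier revision: read the property, compare ignoring case.
    static boolean longForm() {
        String value = System.getProperty(PROP);
        return "true".equalsIgnoreCase(value);
    }

    // Short form, as suggested in the review.
    static boolean shortForm() {
        return Boolean.getBoolean(PROP);
    }

    public static void main(String[] args) {
        System.clearProperty(PROP);
        System.out.println("unset: " + longForm() + " / " + shortForm());
        System.setProperty(PROP, "TRUE");
        System.out.println("TRUE:  " + longForm() + " / " + shortForm());
    }
}
```

In the review suggestion the result is stored in a `static final` field, so the property is read once when the class is initialized; changing the property later has no effect on that field.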
|
@prsadhuk Can you provide your evaluation as to why this is the right fix in light of the comments on the earlier PR #1178 and the discussion in JBS issue JDK-8255248? Reviewers: @kevinrushforth @andy-goryachev-oracle Also, @mstr2 and @hjohn might want to weigh in since they had comments on #1178 /reviewers 2 |
I couldn't find anything adverse w.r.t thread usage using the added debug logs and I think it's standard and proven practice to store global (and in this case transient) variable into temp variable and use it to prevent |
| private static final String debugPrefix = "JFXPanel:>> "; | ||
|
|
||
| private static void debug_println(String str) { | ||
| if (DEBUG) { |
There was a problem hiding this comment.
I am sorry, I was not sufficiently clear.
This method does not need the conditional. The conditionals are needed in every place that calls here.
In other words, we don't want to incur the string concatenation overhead if DEBUG is false.
There was a problem hiding this comment.
will that be too much overhead? We had used the same in jdk having DEBUG check in one place...but I will modify it for FX..
There was a problem hiding this comment.
Then you might want to fix all these places in JDK!
The code you currently have now (after 115001f ) has minimum runtime overhead - basically a boolean check.
Anything else consumes more CPU and memory at runtime (we don't have a pre-processor in java thankfully), so if(DEBUG) { debug_print(...); } is the best, albeit a bit verbose option.
Even the modern logging facades incur more overhead with lambdas or formats, since one needs to actually enter the logging function to check whether the level/logger is enabled. I mean logger.debug("param {}", param); or logger.debug("param {}", () -> param());
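The overhead argument in this thread can be made concrete with a small sketch that counts how often the message string is actually built while logging is disabled. Everything here is illustrative: the class, the method names, and the `demo.jfxpanel.debug` property are assumptions, not code from the PR.

```java
import java.util.function.Supplier;

final class DebugOverheadDemo {
    // Hypothetical property name; unset by default, so DEBUG is false here.
    static final boolean DEBUG = Boolean.getBoolean("demo.jfxpanel.debug");
    static int messagesBuilt; // how many times the message string was constructed

    static String buildMessage() {
        messagesBuilt++;
        return "JFXPanel Thread " + Thread.currentThread().getName();
    }

    // Variant 1: the check lives inside the method; the caller has already built the string.
    static void debugChecked(String msg) {
        if (DEBUG) System.err.println(msg);
    }

    // Variant 2: lambda; the string is built only when enabled, but the call is still made.
    static void debugLazy(Supplier<String> msg) {
        if (DEBUG) System.err.println(msg.get());
    }

    // Variant 3: no check inside; every call site is wrapped in if (DEBUG).
    static void debugPrintln(String msg) {
        System.err.println(msg);
    }

    static int builtWithCheckInside() {
        messagesBuilt = 0;
        debugChecked(buildMessage());
        return messagesBuilt;
    }

    static int builtWithLambda() {
        messagesBuilt = 0;
        debugLazy(() -> buildMessage());
        return messagesBuilt;
    }

    static int builtWithCallSiteGuard() {
        messagesBuilt = 0;
        if (DEBUG) {
            debugPrintln(buildMessage());
        }
        return messagesBuilt;
    }

    public static void main(String[] args) {
        System.out.println("check inside method: " + builtWithCheckInside());
        System.out.println("lambda:              " + builtWithLambda());
        System.out.println("call-site guard:     " + builtWithCallSiteGuard());
    }
}
```

With the property unset, the first variant still builds the message once per call, while the lambda and the call-site guard never build it. The lambda variant still pays for a method call and a lambda object, which is the residual cost described in the comment above.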
| public JFXPanel() { | ||
| super(); | ||
|
|
||
| debug_println("JFXPanel Thread " + Thread.currentThread().getName() + " isFXUserThread " + Toolkit.getToolkit().isFxUserThread()); |
There was a problem hiding this comment.
Basically, here (and elsewhere debug_println() is called) we should have
if(DEBUG) {
debug_println("JFXPanel Thread " + Thread.currentThread().getName() + " isFXUserThread " + Toolkit.getToolkit().isFxUserThread());
}
As a separate note, you may want to consider removing some noise from debugging output and only print when "unexpected" condition occurs, such as (thread != fx), right?
There was a problem hiding this comment.
I guess different methods will have different unexpected conditions, so I kept it generic to cross-check that the calling threads are indeed what they should be.
|
Sorry, what exactly are we trying to do here? Is the problem that intractable that we must add a million debug lines that will be shipped to everyone in the hope someone will debug the problem on their machine? This stuff belongs on some local branch, not be considered for inclusion in FX IMHO. |
This is a good point - perhaps we ought to create a follow-up ticket to remove the debug prints once we determine they served their purpose? |
andy-goryachev-oracle
left a comment
There was a problem hiding this comment.
The latest changes look good to me.
I am also curious why we need these debug prints in the first place.
|
Implementing a system with shared state that is concurrently accessed by different threads is difficult, and it requires rigorous analysis because it's almost impossible to empirically verify that code is not racy. Neither the original implementation nor the changes in this PR inspire confidence that this analysis has happened. It seems like we are presented yet again with a supposed fix without a clear explanation of why this is the right approach. Just by looking at As a side note, I don't understand the point of all the logging. If these methods have thread invariants (which you imply in your comment), then enforce those invariants. If they don't, what's the point? |
kevinrushforth
left a comment
There was a problem hiding this comment.
The debug print logging is OK for debugging purposes to help validate your analysis, as long as you realize that it isn't a substitute for analysis. I do agree with John and Michael that keeping the debug statements isn't needed (or wanted) as part of the product, so I'd like to see it removed once this has proceeded to the point that we are otherwise ready to approve it.
On that note, there are still unanswered questions about the thread safety of the JFXPanel class. We do know that, with the exception of setScene (and by extension, getScene()), all public methods are specified to be called on the EDT (the AWT thread). setScene checks the thread and always calls setSceneImpl on the FX thread to do the "heavy lifting". The question then is around the rest of the implementation methods, some of which are called on the FX thread and some on the EDT.
The NPE being hit by the test program is coming because setScene(null), which is called on the FX app thread, will lead to calling HostContainer::setEmbeddedStage(null) and HostContainer::setEmbeddedScene(null). Those methods will set the stagePeer and scenePeer fields to null. Meanwhile, a repaint operation or a mouse move or ... on the EDT could be trying to access the scene peer.
Locally capturing the stagePeer and scenePeer fields in methods that are called on the EDT so that it doesn't change out from under us will prevent the NPE, but doesn't guarantee thread safety. In the case where it prevents the NPE, we go on to call a method on a scene peer that is no longer being used. This might be OK or it might not be.
As long as this doesn't make things worse (and I don't see how it would), we could consider taking some variant of your proposed fix as a "workaround" to solve the NPE, but I'd like to understand the problem better. If we do take this fix, we would need to file a follow-on bug to fix the root cause (which could involve some design work to ensure thread safety without introducing deadlocks).
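The difference between reading the field twice and capturing it locally can be sketched deterministically by simulating the other thread's write at the worst possible moment. This is a single-threaded illustration under stated assumptions: `Peer`, `otherThreadClearsPeer`, and the class itself are hypothetical stand-ins, not the real JFXPanel or EmbeddedSceneInterface code.

```java
final class LocalCaptureDemo {
    // Hypothetical stand-in for the scene peer.
    static final class Peer {
        int calls;
        void repaint() { calls++; }
    }

    private volatile Peer peer = new Peer();

    // Simulates the FX thread running setScene(null) between the check and the use.
    private void otherThreadClearsPeer() {
        peer = null;
    }

    // Reads the field twice: the second read can observe null.
    boolean racyRepaint() {
        if (peer != null) {
            otherThreadClearsPeer();
            try {
                peer.repaint();
            } catch (NullPointerException e) {
                return false; // the NPE reported in the bug
            }
        }
        return true;
    }

    // Reads the field once: no NPE, but the call may reach a peer that is no longer in use.
    boolean capturedRepaint() {
        Peer hPeer = peer;
        if (hPeer != null) {
            otherThreadClearsPeer();
            hPeer.repaint();
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("racy survives:     " + new LocalCaptureDemo().racyRepaint());
        System.out.println("captured survives: " + new LocalCaptureDemo().capturedRepaint());
    }
}
```

The captured variant avoids the exception, but `hPeer.repaint()` still runs on the stale peer, which is the open question raised above about whether that call is harmless.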
Yes, the test does this so when FX thread calls Also, removed the debug logging |
There was a problem hiding this comment.
I wonder if we are going to continue chasing the issues until we do it right.
What do you think of the following idea:
1. rename all the variables to have fx and edt (or sw?) prefix to clearly indicate which thread controls the variable (sets, mutates, etc.)
2. similarly rename the methods that can be renamed, and add comments to methods that cannot be renamed
3. whenever the fields are read in a wrong thread, make a local copy (and declare the field as volatile)
and maybe even go a bit further, instead of suggestion in 3), we explicitly disallow accessing of fields from a wrong thread, and use the message passing via Platform.runLater() and EventQueue.invokeLater() ?
| scenePeer.setPixelScaleFactors((float) newScaleFactorX, | ||
| var hScenePeer = scenePeer; | ||
| if (hScenePeer != null) { | ||
| hScenePeer.setPixelScaleFactors((float) newScaleFactorX, |
There was a problem hiding this comment.
minor: the indentation is now a bit off (maybe remove the wrapping altogether?)
| private void sendMoveEventToFX() { | ||
| if (stagePeer == null) { | ||
| return; | ||
| if (stagePeer != null) { |
There was a problem hiding this comment.
stagePeer is set in the FX thread (L1009), but this method is called from the EDT (L608).
Please assign to a temp. variable like the other cases.
Ideally, this component should be redesigned to ensure proper communication between threads.
| t.substring(0, e.getCommittedCharacterCount()), | ||
| insertionIndex); | ||
| if (scenePeer != null) { | ||
| scenePeer.inputMethodEvent( |
There was a problem hiding this comment.
did we trip over the same problem again here?
which thread modifies the scenePeer field? which thread calls sendInputMethodEventToFX() ?
That would be the best long-term solution, but that would be a large effort.
Possibly. This gets back to the question I asked earlier: is the approach proposed in this PR a reasonable workaround that we might employ until such time as we "do it right"? If so, then we might continue down this path, addressing all outstanding comments, and making sure that we locally capture the scene/stage peer in all places where they are accessed from the EDT.
I think adding comments as to which methods are called on which threads would be very useful (I wouldn't bother renaming any of the methods, since many/most of the interesting ones can't be). Similarly, adding comments to the fields as to which are modified on which thread, would be useful; I guess there might be some value in renaming the fields, but let's at least document this clearly.
This is basically what this PR attempts to do. It is a workaround for the lack of a proper MT solution, but it might very well be "good enough" until we can get a proper MT solution.
I don't think this is practical, or even desired. There are other ways to ensure that we consistently communicate values between the threads, possibly synchronizing operations at the right level of granularity (not just when you read or write the value). |
|
Modified PR to add comments to specify which methods and fields are called on which threads
Synchronizing at atomic granularity needs to be done but many methods internally context switch to other thread which can be problematic for this so I have followed the present approach.. |
There was a problem hiding this comment.
I left a few comments inline. I still want to look more closely at the sendXxxxxEventToFX() methods to see if there are any problems using a stale value of scenePeer() or stagePeer(). I do think this PR improves the situation by avoiding spurious NPEs, so if further analysis doesn't show any additional problems, it is probably reasonable to proceed with this PR and file a follow-on bug to do a proper MT design for JFXPanel.
I also want to do some additional testing.
|
|
||
| private transient volatile Scene scene; |
There was a problem hiding this comment.
Minor: I would remove the blank line between these two declarations to make it clear that the comment applies to both.
There was a problem hiding this comment.
I intentionally added the blank line to separate it out as stage is set in setSceneImpl in FX thread and accessed in paintComponent in EDT thread whereas scene is set in setSceneImpl in FX thread and never accessed in EDT thread
| var hStagePeer = JFXPanel.this.stagePeer; | ||
|
|
||
| if (jfxPanelIOP.isUngrabEvent(event)) { | ||
| SwingNodeHelper.runOnFxThread(() -> { | ||
| if (JFXPanel.this.stagePeer != null && | ||
| if (hStagePeer != null && | ||
| getScene() != null && | ||
| getScene().getFocusOwner() != null && | ||
| getScene().getFocusOwner().isFocused()) { | ||
| JFXPanel.this.stagePeer.focusUngrab(); | ||
| hStagePeer.focusUngrab(); |
There was a problem hiding this comment.
This can be reverted. stagePeer is only accessed on the FX thread.
There was a problem hiding this comment.
stagePeer is accessed in processMouseEvent, sendResizeEventToFX, sendMoveEventToFX, sendFocusEventToFX in EDT so I did that.
But it seems in this particular method, it is only accessed in FX thread as is mentioned so reverted..
There was a problem hiding this comment.
Right, I only meant this specific instance (and the usage later on in this same lambda). It looks good now.
| if (JFXPanel.this.stagePeer != null) { | ||
| if (hStagePeer != null) { | ||
| // No need to check if grab is active or not. | ||
| // NoAutoHide popups don't request the grab and | ||
| // ignore the Ungrab event anyway. | ||
| // AutoHide popups actually should be hidden when | ||
| // user clicks some non-FX content, even if for | ||
| // some reason they didn't install the grab when | ||
| // they were shown. | ||
| JFXPanel.this.stagePeer.focusUngrab(); | ||
| hStagePeer.focusUngrab(); |
There was a problem hiding this comment.
This can be reverted. stagePeer is only accessed on the FX thread.
| SwingNodeHelper.runOnFxThread(() -> { | ||
| if ((stage != null) && !stage.isShowing()) { | ||
| stage.show(); | ||
| sendMoveEventToFX(); | ||
| } | ||
| }); | ||
| sendMoveEventToFX(); |
There was a problem hiding this comment.
This is not an equivalent change, since sendMoveEventToFX() can now happen before stage.show() and will also happen unconditionally even if the stage is null or already showing. Rather than moving it out of the runOnFxThread block, leave it where it is and call runOnEDT like this:
SwingNodeHelper.runOnFxThread(() -> {
if ((stage != null) && !stage.isShowing()) {
stage.show();
SwingNodeHelper.runOnEDT(() -> sendMoveEventToFX());
}
});
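The ordering point can be checked with a small model in which two single-thread executors stand in for the FX thread and the EDT. This is a sketch under that assumption only; it does not use SwingNodeHelper or any JavaFX API, and the event names are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ShowThenMoveDemo {
    // Returns the recorded events in the order they happened.
    static List<String> run(boolean alreadyShowing) {
        List<String> events = new CopyOnWriteArrayList<>();
        ExecutorService fx = Executors.newSingleThreadExecutor();  // stand-in for the FX thread
        ExecutorService edt = Executors.newSingleThreadExecutor(); // stand-in for the EDT
        try {
            fx.execute(() -> {
                if (!alreadyShowing) {
                    events.add("show");
                    // Hop back to the "EDT" only after show(), and only when show() ran.
                    edt.execute(() -> events.add("move"));
                }
            });
            fx.shutdown();
            fx.awaitTermination(5, TimeUnit.SECONDS);
            edt.shutdown();
            edt.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            fx.shutdownNow();
            edt.shutdownNow();
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Posting the move event from inside the guarded block keeps both properties of the original code: the move is sent after the show, and it is not sent at all when the stage is already showing.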
| if (stagePeer == null) { | ||
| var hStagePeer = stagePeer; | ||
| if (hStagePeer == null) { |
There was a problem hiding this comment.
If this is always invoked from the FX thread, you don't need to locally capture the peers.
There was a problem hiding this comment.
but down below it is accessed in EDT so I did the local capture
| Platform.runLater(() -> contentPane.setScene(webView.getScene())); | ||
| Thread.sleep(100); | ||
| } | ||
| System.out.println("failure = " + failure.get()); |
There was a problem hiding this comment.
This print can be removed.
kevinrushforth
left a comment
There was a problem hiding this comment.
The fix now looks good with one question about the change in EmbeddedScene. I also looked at the implementation of the EmbeddedStage and EmbeddedScene methods, and they all do the right thing in that they forward the requests to the FX thread.
I do want to see a follow-up issue filed to consider redesigning the threading model, but I think this PR is a good workaround for the NPEs.
As for the newly added test, it passes with the fix and with no exception messages (good). However, at least one of the times I ran the test without the fix (using the existing 100 msec sleep), it printed a couple exceptions and passed anyway. It seems that some of the exceptions that can occur will cause the test to fail but others will not. You might consider using the test.javafx.util.OutputRedirect utility instead of relying on the UncaughtExceptionHandler. This might not be a problem in practice if you reduce the sleep time, but will make the test more likely to catch any problems.
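One way an exception can be printed yet leave the test passing is when the dispatching code catches it and prints the stack trace, so it never reaches an UncaughtExceptionHandler. The sketch below illustrates that case and shows why inspecting stderr catches it. It is an assumption-laden illustration using only `System.setErr`; it is not the implementation of test.javafx.util.OutputRedirect, and `dispatch` is a hypothetical stand-in for an event loop.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.concurrent.atomic.AtomicBoolean;

final class StderrCaptureDemo {
    // Hypothetical event loop step: exceptions are caught and printed, not propagated.
    static void dispatch(Runnable task) {
        try {
            task.run();
        } catch (RuntimeException e) {
            e.printStackTrace();
        }
    }

    // Returns { handlerSawException, stderrContainsException }.
    static boolean[] run() {
        AtomicBoolean handlerSaw = new AtomicBoolean();
        Thread.UncaughtExceptionHandler oldHandler = Thread.getDefaultUncaughtExceptionHandler();
        PrintStream oldErr = System.err;
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> handlerSaw.set(true));
        System.setErr(new PrintStream(captured, true));
        try {
            Thread worker = new Thread(() ->
                    dispatch(() -> { throw new NullPointerException("simulated"); }));
            worker.start();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            System.setErr(oldErr);
            Thread.setDefaultUncaughtExceptionHandler(oldHandler);
        }
        return new boolean[] {
            handlerSaw.get(),
            captured.toString().contains("NullPointerException")
        };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("handler saw it: " + result[0]);
        System.out.println("stderr saw it:  " + result[1]);
    }
}
```

In this model the handler never fires, while the captured stderr contains the exception, so a check on the redirected output fails the test where the handler alone would let it pass.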
| if (getSceneState() != null) { | ||
| updateSceneState(); | ||
| } |
There was a problem hiding this comment.
None of the other calls to updateSceneState() check for a null sceneState. Why is this the only one that needs to? If it is needed, it might be safer to move the null check into updateSceneState itself.
There was a problem hiding this comment.
We were getting this so it is needed
java.lang.NullPointerException: Cannot invoke "com.sun.javafx.tk.quantum.SceneState.update()" because "this.sceneState" is null
at javafx.graphics@26-internal/com.sun.javafx.tk.quantum.GlassScene.updateSceneState(GlassScene.java:253)
at javafx.graphics@26-internal/com.sun.javafx.tk.quantum.EmbeddedScene.lambda$setPixelScaleFactors$1(EmbeddedScene.java:158)
at javafx.graphics@26-internal/com.sun.javafx.tk.quantum.QuantumToolkit.runWithRenderLock(QuantumToolkit.java:447)
at javafx.graphics@26-internal/com.sun.javafx.tk.quantum.EmbeddedScene.lambda$setPixelScaleFactors$0(EmbeddedScene.java:157)
at javafx.graphics@26-internal/com.sun.javafx.application.PlatformImpl.lambda$runLater$0(PlatformImpl.java:424)
at javafx.graphics@26-internal/com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
at javafx.graphics@26-internal/com.sun.glass.ui.win.WinApplication._runLoop(Native Method)
| Platform.runLater(() -> contentPane.setScene(null)); | ||
| Thread.sleep(100); | ||
| Platform.runLater(() -> contentPane.setScene(webView.getScene())); | ||
| Thread.sleep(100); |
There was a problem hiding this comment.
With a sleep(100) this only occasionally gets an exception when I run it without your fix. I recommend sleep(1) which is what the manual test does. The test will take less time and also be a better stress test.
kevinrushforth
left a comment
There was a problem hiding this comment.
Looks good with one more minor suggestion. As mentioned offline, we'll do a CI headful test run.
| if (getSceneState() != null) { | ||
| sceneState.update(); |
There was a problem hiding this comment.
| if (getSceneState() != null) { | |
| sceneState.update(); | |
| if (sceneState != null) { | |
| sceneState.update(); |
Minor: it seems cleaner to access sceneState directly in both the test and usage rather than mixing them.
| contentPane.setScene(new Scene(webView)); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Very minor: there is an extra blank line here that you might remove.
kevinrushforth
left a comment
There was a problem hiding this comment.
Updated test looks good with one additional suggestion.
| Thread.sleep(1); | ||
| Platform.runLater(() -> contentPane.setScene(webView.getScene())); | ||
| Thread.sleep(1); | ||
| } |
There was a problem hiding this comment.
The calls in the loop to the FX and AWT threads are asynchronous, so I recommend adding the following after the loop to ensure that both threads are finished with the work submitted in the loops:
// Wait for both threads to process the earlier runnables
SwingUtilities.invokeAndWait(() -> {});
Util.runAndWait(() -> {});
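The reason an empty runnable works as a barrier is that each thread processes its queue in order, so once the empty task has run, everything submitted before it has run too. A minimal sketch of that idea follows, with two single-thread executors standing in for the FX thread and the EDT; this is an assumption for illustration and does not use SwingUtilities or the test's Util class.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class DrainQueuesDemo {
    // Returns how many tasks had completed by the time both barriers returned.
    static int run(int iterations) {
        ExecutorService fx = Executors.newSingleThreadExecutor();  // stand-in for the FX thread
        ExecutorService edt = Executors.newSingleThreadExecutor(); // stand-in for the EDT
        AtomicInteger processed = new AtomicInteger();
        try {
            for (int i = 0; i < iterations; i++) {
                // Asynchronous submissions, like the runLater/invokeLater calls in the loop.
                fx.execute(processed::incrementAndGet);
                edt.execute(processed::incrementAndGet);
            }
            // Barriers: an empty task queued behind all earlier work, then waited on.
            edt.submit(() -> {}).get();
            fx.submit(() -> {}).get();
            return processed.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            fx.shutdown();
            edt.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(100));
    }
}
```

The sketch does not model tasks on one thread that post further work to the other, so in a real test the order of the two barriers can matter.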
andy-goryachev-oracle
left a comment
There was a problem hiding this comment.
Looked at how stage, scene, stagePeer, and scenePeer fields are used, looks good. Left some minor comments for the follow-up/redesign.
Only the test needs to be changed, otherwise looks good. Thank you @prsadhuk for persistence!
| Platform.runLater(() -> contentPane.setScene(null)); | ||
| Thread.sleep(1); | ||
| Platform.runLater(() -> contentPane.setScene(webView.getScene())); | ||
| Thread.sleep(1); |
There was a problem hiding this comment.
discussed offline:
we should add some code (invokeAndWait + Util.runAndWait) here to make sure all the prior events are processed before leaving the try block and doing OutputRedirect.checkAndRestoreStderr();
Also, maybe add a @Timeout to make sure the test does not hang.
| final void updateSceneState() { | ||
| // should only be called on the event thread | ||
| sceneState.update(); | ||
| if (sceneState != null) { |
There was a problem hiding this comment.
L253, this is the original code, but I found the "event thread" words misleading - too similar to "event dispatcher" thread. Should it be "fx application thread"?
| if (hStagePeer != null) { | ||
| int focusCause = AbstractEvents.FOCUSEVENT_ACTIVATED; | ||
| stagePeer.setFocused(true, focusCause); | ||
| hStagePeer.setFocused(true, focusCause); |
There was a problem hiding this comment.
suggestion (for a follow up): document which methods in EmbeddedStageInterface can be called from which thread.
setFocused() method seems to be thread-safe.
| if ((stage != null) && !stage.isShowing()) { | ||
| stage.show(); | ||
| sendMoveEventToFX(); | ||
| SwingNodeHelper.runOnEDT(() -> sendMoveEventToFX()); |
There was a problem hiding this comment.
for possible followup: L1032 the field returned in getInputMethodRequests() in EmbeddedScene is not volatile.
| dnd.addNotify(); | ||
| if (scenePeer != null) { | ||
| scenePeer.setDragStartListener(dnd.getDragStartListener()); | ||
| if (hScenePeer != null) { |
There was a problem hiding this comment.
this null check can be removed, since the code will bail out on L1106 if hScenePeer == null
|
Updated PR to rename var w.r.t thread setting it... |
kevinrushforth
left a comment
There was a problem hiding this comment.
LGTM
@prsadhuk Please file the follow-on bug as discussed and add a comment with the bug ID.
andy-goryachev-oracle
left a comment
There was a problem hiding this comment.
Looks good. Thank you for all the work!
| import test.javafx.util.OutputRedirect; | ||
| import test.util.Util; | ||
|
|
||
| @Timeout(value=30000, unit=TimeUnit.MILLISECONDS) |
There was a problem hiding this comment.
just FYI: the default time unit is SECONDS, so we can simply write
@Timeout(30)
(we used MILLISECONDS earlier to minimize the changes going from junit4)
|
Looks like JBS is still having connectivity problems. Once it wakes up, this should be marked as ready to integrate. /touch |
|
@kevinrushforth The pull request is being re-evaluated and the inactivity timeout has been reset. |
|
/issue add JDK-8334593 |
|
@prsadhuk |
|
/integrate |
|
Going to push as commit 33abf5d.
Your commit was automatically rebased without conflicts. |
|
Regarding my comment:
I see you filed JDK-8372322 as a follow-up issue. Thank you. |
An NPE is seen while accessing the transient "scenePeer" variable, because its value can change between reads.
The fix stores the field in a temp variable rather than reading it twice, since the value can change between successive reads in the many places where it is accessed.
Some debug logs were also added, to be enabled via the jfxpanel.debug property.